Results in Distribution Testing

Configuration

How to Read This Table

The Problem Setup

In the distribution equivalence (closeness) testing problem, we are given two unknown discrete probability distributions $P$ and $Q$ over a common support $\Sigma$ of size $N$, and we wish to decide:

either $d(P,Q) \leq \varepsilon$ (they are close) vs $d(P,Q) \geq \eta$ (they are far),

where $d$ is a distance measure ($d_{\mathrm{TV}}$, etc.), $\varepsilon$ is the closeness threshold, and $\eta > \varepsilon$ is the farness threshold. The non-tolerant case sets $\varepsilon = 0$ (exact equality vs. far), leaving $\eta$ as the sole parameter.

Since $P$ and $Q$ are unknown, we can only interact with them through sampling oracles. This table considers the setting where you have different oracle types for each distribution:

$P$

ROW ORACLE

First distribution
oracle type given by row

⟷

Test:
$d(P,Q) \leq \varepsilon$?
vs
$d(P,Q) \geq \eta$?

$Q$

COLUMN ORACLE

Second distribution
oracle type given by column

The cell at row A and column B gives the query complexity — the number of oracle calls (to either oracle) required by the best algorithm to solve the problem, up to constant and polylogarithmic factors. A ? means the result is an open problem.

Known Oracle Relationships (Partial Order)

The chain SAMP ⊂ DUAL ⊂ SUBCOND ⊂ COND ⊂ FULL is fully established. PAIRCOND is a separate branch: the only known relationship is PAIRCOND ⊂ COND (and transitively PAIRCOND ⊂ FULL). Whether PAIRCOND can simulate — or be simulated by — SAMP, DUAL, or SUBCOND remains open.

Oracle Definitions

SAMP Standard Sampling Oracle

The weakest and most natural access model. Each query returns one independent sample drawn from the distribution. No other information is available.

$\mathsf{SAMP}(P)$: draw $x \sim P$ uniformly at random.

DUAL Dual / Evaluation Oracle

Each query is either a sample draw from the distribution, or a probability evaluation at any chosen element. Provides access to probability values.

$\mathsf{DUAL}(P, x)$: either draw $x \sim P$, or query $P(x)$ for any chosen $x \in \Sigma$.

SUBCOND Subcube Conditional Oracle

For high-dimensional distributions over $\{0,1\}^d$, conditions on a subcube: a partial assignment fixing some coordinates. Known to be simulatable by COND; strictly stronger than DUAL.

$\mathsf{SUBCOND}(P, S, v)$: draw $x \sim P(\cdot \mid x_S = v)$ for $S \subseteq [d]$, $v \in \{0,1\}^{|S|}$.

COND Conditional Oracle

The strongest oracle that returns samples. Allows conditioning on any subset of the support. Known to simulate SAMP, DUAL, SUBCOND, and PAIRCOND.

$\mathsf{COND}(P, S)$: draw $x \sim P(\cdot \mid S)$ for any non-empty $S \subseteq \Sigma$.

PAIRCOND Pair Conditional Oracle

Returns a sample from the distribution restricted to a queried pair of elements. Only known to be simulatable by COND (and FULL). Its relationship to SAMP, DUAL, and SUBCOND is open.

$\mathsf{PAIRCOND}(P,\{a,b\})$: draw $x \sim P(\cdot \mid \{a,b\})$ for any $a,b \in \Sigma$.

FULL Full Evaluation Oracle

The strongest oracle. Essentially provides read access to the full probability table.

$\mathsf{FULL}(P)$: returns the entire $P(x)$ table for all $x \in \Sigma$.

Reading a Cell

Upper Bound (UB)

An upper bound $f(N,\varepsilon)$ means there exists an algorithm that uses at most $O(f(N,\varepsilon))$ queries and solves the problem correctly with probability $\geq 2/3$.

Lower Bound (LB)

A lower bound $g(N,\varepsilon)$ means that every algorithm for this oracle combination must use at least $\Omega(g(N,\varepsilon))$ queries in the worst case.

Notation

$N = |\Sigma|$ is the support size. $\varepsilon$ is the closeness parameter and $\eta$ is the farness parameter, with $\eta > \varepsilon \geq 0$. In the non-tolerant setting $\varepsilon = 0$, so bounds are stated in terms of $\eta$ alone (written as $\varepsilon$ in that column for brevity). $d$ is the dimension for high-dimensional distributions over $\{0,1\}^d$.

Open Problems (?)

A ? in a cell means no non-trivial bound is known for that oracle combination and distance setting. These represent open problems in distribution testing!

Query Complexity — Row: oracle for $P$ | Column: oracle for $Q$

	DUAL	PAIRCOND	SUBCOND	COND
SAMP	—
DUAL		—
PAIRCOND			—
SUBCOND				—
COND
FULL

Cell note & reference

Click a cell in the table above to see its reference or derivation.

Notes

The Total Variation distance $d_{\mathrm{TV}}$ between two distributions $P$ and $Q$ is: $d_{\mathrm{TV}}(P,Q) = \tfrac{1}{2}\sum_{x}|P(x)-Q(x)|$.
The support of a distribution is $\Sigma$ with $|\Sigma|=N$. For joint distributions on $d$ dimensions, the support is $\Sigma^d$ and the support size is $N^d$.
Query complexity counts the number of oracle calls needed to distinguish the null hypothesis from the alternative hypothesis, up to the specified distances.
Notation convention: $\varepsilon$ denotes the closeness threshold and $\eta$ the farness threshold ($\eta > \varepsilon \geq 0$). In the non-tolerant setting the closeness threshold is $\varepsilon = 0$, so bounds depend only on the farness threshold, written as $\varepsilon$ for brevity in that column. In the tolerant setting both parameters appear.
Cells marked ? represent open problems.

References

Testing Closeness of Discrete Distributions [ Paper | ]
@article{batu2013testing, title = {Testing closeness of discrete distributions}, author = {Batu, Tu{\u{g}}kan and Fortnow, Lance and Rubinfeld, Ronitt and Smith, Warren D and White, Patrick}, journal = {Journal of the ACM (JACM)}, volume = {60}, number = {1}, pages = {1--25}, year = {2013}, publisher = {ACM} }

Tugkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D. Smith, Patrick White
An Automatic Inequality Prover and Instance Optimal Identity Testing [ Paper | ]
@article{valiant2017automatic, title = {An Automatic Inequality Prover and Instance Optimal Identity Testing}, author = {Gregory Valiant and Paul Valiant}, journal = {{SIAM} J. Comput.}, volume = {46}, number = {1}, pages = {429--455}, year = {2017} }

Gregory Valiant, Paul Valiant
On Distribution Testing in the Conditional Sampling Model [ Paper | ]
@article{nar2020cond, author = {Shyam Narayanan}, title = {Tolerant Distribution Testing in the Conditional Sampling Model}, journal = {CoRR}, volume = {abs/2007.09895}, year = {2020} }

Shyam Narayanan
Property Testing of Joint Distributions using Conditional Samples [ Paper | ]
@article{bhattacharyya2018property, author = {Rishiraj Bhattacharyya and Sourav Chakraborty}, title = {Property Testing of Joint Distributions using Conditional Samples}, journal = {CoRR}, volume = {abs/1702.01454}, year = {2017} }

Rishiraj Bhattacharyya, Sourav Chakraborty
Testing Bayesian Networks [ Paper | ]
@inproceedings{canonne2017testing, title = {Testing Bayesian Networks}, author = {Canonne, Cl{\'e}ment L and Diakonikolas, Ilias and Kane, Daniel M and Stewart, Alistair}, booktitle = {Conference on Learning Theory}, pages = {370--448}, year = {2017}, organization = {PMLR} }

Clement L. Canonne, Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart
Testing Probability Distributions using Conditional Samples [ Paper | ]
@article{canonne2015testing, title = {Testing probability distributions using conditional samples}, author = {Canonne, Cl{\'e}ment L and Ron, Dana and Servedio, Rocco A}, volume = {44}, number = {3}, pages = {540--616}, year = {2015}, publisher = {SIAM} }

Clement L. Canonne, Dana Ron, Rocco A. Servedio
On the Power of Conditional Samples in Distribution Testing [ Paper | ]
@inproceedings{chakraborty2013power, title = {On the power of conditional samples in distribution testing}, author = {Chakraborty, Sourav and Fischer, Eldar and Goldhirsh, Yonatan and Matsliah, Arie}, booktitle = {Proceedings of the 4th ITCS}, pages = {561--580}, year = {2013} }

Sourav Chakraborty, Eldar Fischer, Yonatan Goldhirsh, Arie Matsliah
On Scalable Testing of Samplers [ Paper | ]
@inproceedings{potescalable, title = {On Scalable Testing of Samplers}, author = {Pote, Yash and Meel, Kuldeep S}, booktitle = {Advances in Neural Information Processing Systems} }

Yash Pote, Kuldeep S. Meel
Faster Algorithms for Testing under Conditional Sampling [ Paper | ]
@inproceedings{falahatgar2015faster, title = {Faster algorithms for testing under conditional sampling}, author = {Falahatgar, Moein and Jafarpour, Ashkan and Orlitsky, Alon and Pichapati, Venkatadheeraj and Suresh, Ananda Theertha}, booktitle = {Conference on Learning Theory}, pages = {607--636}, year = {2015}, organization = {PMLR} }

Moein Falahatgar, Ashkan Jafarpour, Alon Orlitsky, Venkatadheeraj Pichapati, Ananda Theertha Suresh
Testing Self-Reducible Samplers [ Paper | ]
@inproceedings{bhattacharyya2024testing, title = {Testing Self-Reducible Samplers}, author = {Bhattacharyya, Rishiraj and Chakraborty, Sourav and Pote, Yash and Sarkar, Uddalok and Sen, Sayantan}, booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence}, volume = {38}, number = {8}, pages = {7952--7960}, year = {2024} }

Rishiraj Bhattacharyya, Sourav Chakraborty, Yash Pote, Uddalok Sarkar, Sayantan Sen
Tolerant Testing of High-Dimensional Samplers with Subcube Conditioning [ Paper | ]
@article{kumar2023tolerant, title = {Tolerant Testing of High-Dimensional Samplers with Subcube Conditioning}, author = {Kumar, Gunjan and Meel, Kuldeep S and Pote, Yash}, journal = {arXiv preprint arXiv:2308.04264}, year = {2023} }

Gunjan Kumar, Kuldeep S. Meel, Yash Pote
Testing Symmetric Properties of Distributions [ Paper | ]
@inproceedings{valiant2008testing, title = {Testing symmetric properties of distributions}, author = {Valiant, Paul}, booktitle = {Proceedings of the 40th Annual ACM STOC}, pages = {383--392}, year = {2008} }

Paul Valiant
Testing Probability Distributions Underlying Aggregated Data [ Paper | ]
@inproceedings{canonne2014testing, title = {Testing probability distributions underlying aggregated data}, author = {Canonne, Cl{\'e}ment and Rubinfeld, Ronitt}, booktitle = {ICALP}, pages = {283--295}, year = {2014}, organization = {Springer} }

Clement Canonne, Ronitt Rubinfeld
Minimax Estimation of the $L_1$ Distance [ Paper | ]
@article{jiao2018minimax, title = {Minimax estimation of the $L_1$ distance}, author = {Jiao, Jiantao and Han, Yanjun and Weissman, Tsachy}, journal = {IEEE Transactions on Information Theory}, volume = {64}, number = {10}, pages = {6672--6706}, year = {2018}, publisher = {IEEE} }

Jiantao Jiao, Yanjun Han, Tsachy Weissman
Tight Lower Bound on Equivalence Testing in Conditional Sampling Model [ Paper | ]
@inproceedings{chakraborty2024tight, title = {Tight lower bound on equivalence testing in conditional sampling model}, author = {Chakraborty, Diptarka and Chakraborty, Sourav and Kumar, Gunjan}, booktitle = {SODA 2024}, pages = {4371--4394}, year = {2024}, organization = {SIAM} }

Diptarka Chakraborty, Sourav Chakraborty, Gunjan Kumar
Improved Bounds for High-Dimensional Equivalence and Product Testing using Subcube Queries [ Paper | ]
@article{adar2024improved, title = {Improved Bounds for High-Dimensional Equivalence and Product Testing using Subcube Queries}, author = {Adar, Tomer and Fischer, Eldar and Levi, Amit}, journal = {arXiv preprint arXiv:2408.02347}, year = {2024} }

Tomer Adar, Eldar Fischer, Amit Levi
The Price of Tolerance in Distribution Testing [ Paper | ]
@article{canonne2021price, title = {The Price of Tolerance in Distribution Testing}, author = {Canonne, Cl{\'e}ment L. and Jain, Ayush and Kamath, Gautam and Li, Jerry}, journal = {arXiv preprint arXiv:2106.13414}, year = {2021} }

Clément L. Canonne, Ayush Jain, Gautam Kamath, Jerry Li
A Coincidence-Based Test for Uniformity Given Very Sparsely Sampled Discrete Data [ Paper | ]
@article{paninski2008coincidence, title = {A Coincidence-Based Test for Uniformity Given Very Sparsely Sampled Discrete Data}, author = {Paninski, Liam}, journal = {IEEE Transactions on Information Theory}, volume = {54}, number = {10}, pages = {4750--4755}, year = {2008}, publisher = {IEEE} }

Liam Paninski

Contribute

This table is openly maintained. If you know a result that is missing, incorrect, or has been recently published, please comment via a GitHub Issue or submit a pull request directly.

Comment or Report an Error

Open a GitHub Issue to suggest a correction, flag a missing result, or ask a question. Click a table cell first, then use the button below to pre-fill the issue with that cell's details.

Suggest for selected cell View all issues

Submit a Pull Request

Results are stored in data.csv. Each entry specifies the oracle pair, bound type, LaTeX formula, and reference key. Fork the repo, add your entry, and open a PR.

Edit data.csv on GitHub Fork repo

Distribution Testing Zoo

Closeness Distance

Farness Distance

Bound Type

Upper Bound (UB)

Lower Bound (LB)

Notation

Open Problems (?)

Notes