How to Read This Table
In the distribution equivalence (closeness) testing problem, we are given two unknown discrete probability distributions \(P\) and \(Q\) over a common support \(\Sigma\) of size \(N\), and we wish to decide:
either \(d(P,Q) \leq \varepsilon\) (they are close) vs \(d(P,Q) \geq \eta\) (they are far),
where \(d\) is a distance measure (\(d_{\mathrm{TV}}\), etc.), \(\varepsilon\) is the closeness threshold, and \(\eta > \varepsilon\) is the farness threshold. The non-tolerant case sets \(\varepsilon = 0\) (exact equality vs. far), leaving \(\eta\) as the sole parameter.
Since \(P\) and \(Q\) are unknown, we can only interact with them through sampling oracles. This table considers the setting where you have different oracle types for each distribution:
oracle type given by row
\(d(P,Q) \leq \varepsilon\)?
vs
\(d(P,Q) \geq \eta\)?
oracle type given by column
The cell at row A and column B gives the query complexity — the number of oracle calls (to either oracle) required by the best algorithm to solve the problem, up to constant and polylogarithmic factors. A ? means the result is an open problem.
The chain SAMP ⊂ DUAL ⊂ SUBCOND ⊂ COND ⊂ FULL is fully established. PAIRCOND is a separate branch: the only known relationship is PAIRCOND ⊂ COND (and transitively PAIRCOND ⊂ FULL). Whether PAIRCOND can simulate — or be simulated by — SAMP, DUAL, or SUBCOND remains open.
The weakest and most natural access model. Each query returns one independent sample drawn from the distribution. No other information is available.
Each query is either a sample draw from the distribution, or a probability evaluation at any chosen element. Provides access to probability values.
For high-dimensional distributions over \(\{0,1\}^d\), conditions on a subcube: a partial assignment fixing some coordinates. Known to be simulatable by COND; strictly stronger than DUAL.
The strongest oracle that returns samples. Allows conditioning on any subset of the support. Known to simulate SAMP, DUAL, SUBCOND, and PAIRCOND.
Returns a sample from the distribution restricted to a queried pair of elements. Only known to be simulatable by COND (and FULL). Its relationship to SAMP, DUAL, and SUBCOND is open.
The strongest oracle. Essentially provides read access to the full probability table.
Upper Bound (UB)
An upper bound \(f(N,\varepsilon)\) means there exists an algorithm that uses at most \(O(f(N,\varepsilon))\) queries and solves the problem correctly with probability \(\geq 2/3\).
Lower Bound (LB)
A lower bound \(g(N,\varepsilon)\) means that every algorithm for this oracle combination must use at least \(\Omega(g(N,\varepsilon))\) queries in the worst case.
Notation
\(N = |\Sigma|\) is the support size. \(\varepsilon\) is the closeness parameter and \(\eta\) is the farness parameter, with \(\eta > \varepsilon \geq 0\). In the non-tolerant setting \(\varepsilon = 0\), so bounds are stated in terms of \(\eta\) alone (written as \(\varepsilon\) in that column for brevity). \(d\) is the dimension for high-dimensional distributions over \(\{0,1\}^d\).
Open Problems (?)
A ? in a cell means no non-trivial bound is known for that oracle combination and distance setting. These represent open problems in distribution testing!
Notes
- The Total Variation distance \(d_{\mathrm{TV}}\) between two distributions \(P\) and \(Q\) is: \(d_{\mathrm{TV}}(P,Q) = \tfrac{1}{2}\sum_{x}|P(x)-Q(x)|\).
- The support of a distribution is \(\Sigma\) with \(|\Sigma|=N\). For joint distributions on \(d\) dimensions, the support is \(\Sigma^d\) and the support size is \(N^d\).
- Query complexity counts the number of oracle calls needed to distinguish the null hypothesis from the alternative hypothesis, up to the specified distances.
- Notation convention: \(\varepsilon\) denotes the closeness threshold and \(\eta\) the farness threshold (\(\eta > \varepsilon \geq 0\)). In the non-tolerant setting the closeness threshold is \(\varepsilon = 0\), so bounds depend only on the farness threshold, written as \(\varepsilon\) for brevity in that column. In the tolerant setting both parameters appear.
- Cells marked ? represent open problems.
-
Testing Closeness of Discrete Distributions
[ Paper |
]
@article{batu2013testing, title = {Testing closeness of discrete distributions}, author = {Batu, Tu{\u{g}}kan and Fortnow, Lance and Rubinfeld, Ronitt and Smith, Warren D and White, Patrick}, journal = {Journal of the ACM (JACM)}, volume = {60}, number = {1}, pages = {1--25}, year = {2013}, publisher = {ACM} }
-
An Automatic Inequality Prover and Instance Optimal Identity Testing
[ Paper |
]
@article{valiant2017automatic, title = {An Automatic Inequality Prover and Instance Optimal Identity Testing}, author = {Gregory Valiant and Paul Valiant}, journal = {{SIAM} J. Comput.}, volume = {46}, number = {1}, pages = {429--455}, year = {2017} }
-
On Distribution Testing in the Conditional Sampling Model
[ Paper |
]
@article{nar2020cond, author = {Shyam Narayanan}, title = {Tolerant Distribution Testing in the Conditional Sampling Model}, journal = {CoRR}, volume = {abs/2007.09895}, year = {2020} }
-
Property Testing of Joint Distributions using Conditional Samples
[ Paper |
]
@article{bhattacharyya2018property, author = {Rishiraj Bhattacharyya and Sourav Chakraborty}, title = {Property Testing of Joint Distributions using Conditional Samples}, journal = {CoRR}, volume = {abs/1702.01454}, year = {2017} }
-
Testing Bayesian Networks
[ Paper |
]
@inproceedings{canonne2017testing, title = {Testing Bayesian Networks}, author = {Canonne, Cl{\'e}ment L and Diakonikolas, Ilias and Kane, Daniel M and Stewart, Alistair}, booktitle = {Conference on Learning Theory}, pages = {370--448}, year = {2017}, organization = {PMLR} }
-
Testing Probability Distributions using Conditional Samples
[ Paper |
]
@article{canonne2015testing, title = {Testing probability distributions using conditional samples}, author = {Canonne, Cl{\'e}ment L and Ron, Dana and Servedio, Rocco A}, volume = {44}, number = {3}, pages = {540--616}, year = {2015}, publisher = {SIAM} }
-
On the Power of Conditional Samples in Distribution Testing
[ Paper |
]
@inproceedings{chakraborty2013power, title = {On the power of conditional samples in distribution testing}, author = {Chakraborty, Sourav and Fischer, Eldar and Goldhirsh, Yonatan and Matsliah, Arie}, booktitle = {Proceedings of the 4th ITCS}, pages = {561--580}, year = {2013} }
-
On Scalable Testing of Samplers
[ Paper |
]
@inproceedings{potescalable, title = {On Scalable Testing of Samplers}, author = {Pote, Yash and Meel, Kuldeep S}, booktitle = {Advances in Neural Information Processing Systems} }
-
Faster Algorithms for Testing under Conditional Sampling
[ Paper |
]
@inproceedings{falahatgar2015faster, title = {Faster algorithms for testing under conditional sampling}, author = {Falahatgar, Moein and Jafarpour, Ashkan and Orlitsky, Alon and Pichapati, Venkatadheeraj and Suresh, Ananda Theertha}, booktitle = {Conference on Learning Theory}, pages = {607--636}, year = {2015}, organization = {PMLR} }
-
Testing Self-Reducible Samplers
[ Paper |
]
@inproceedings{bhattacharyya2024testing, title = {Testing Self-Reducible Samplers}, author = {Bhattacharyya, Rishiraj and Chakraborty, Sourav and Pote, Yash and Sarkar, Uddalok and Sen, Sayantan}, booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence}, volume = {38}, number = {8}, pages = {7952--7960}, year = {2024} }
-
Tolerant Testing of High-Dimensional Samplers with Subcube Conditioning
[ Paper |
]
@article{kumar2023tolerant, title = {Tolerant Testing of High-Dimensional Samplers with Subcube Conditioning}, author = {Kumar, Gunjan and Meel, Kuldeep S and Pote, Yash}, journal = {arXiv preprint arXiv:2308.04264}, year = {2023} }
-
Testing Symmetric Properties of Distributions
[ Paper |
]
@inproceedings{valiant2008testing, title = {Testing symmetric properties of distributions}, author = {Valiant, Paul}, booktitle = {Proceedings of the 40th Annual ACM STOC}, pages = {383--392}, year = {2008} }
-
Testing Probability Distributions Underlying Aggregated Data
[ Paper |
]
@inproceedings{canonne2014testing, title = {Testing probability distributions underlying aggregated data}, author = {Canonne, Cl{\'e}ment and Rubinfeld, Ronitt}, booktitle = {ICALP}, pages = {283--295}, year = {2014}, organization = {Springer} }
-
Minimax Estimation of the \(L_1\) Distance
[ Paper |
]
@article{jiao2018minimax, title = {Minimax estimation of the $L_1$ distance}, author = {Jiao, Jiantao and Han, Yanjun and Weissman, Tsachy}, journal = {IEEE Transactions on Information Theory}, volume = {64}, number = {10}, pages = {6672--6706}, year = {2018}, publisher = {IEEE} }
-
Tight Lower Bound on Equivalence Testing in Conditional Sampling Model
[ Paper |
]
@inproceedings{chakraborty2024tight, title = {Tight lower bound on equivalence testing in conditional sampling model}, author = {Chakraborty, Diptarka and Chakraborty, Sourav and Kumar, Gunjan}, booktitle = {SODA 2024}, pages = {4371--4394}, year = {2024}, organization = {SIAM} }
-
Improved Bounds for High-Dimensional Equivalence and Product Testing using Subcube Queries
[ Paper |
]
@article{adar2024improved, title = {Improved Bounds for High-Dimensional Equivalence and Product Testing using Subcube Queries}, author = {Adar, Tomer and Fischer, Eldar and Levi, Amit}, journal = {arXiv preprint arXiv:2408.02347}, year = {2024} }
-
The Price of Tolerance in Distribution Testing
[ Paper |
]
@article{canonne2021price, title = {The Price of Tolerance in Distribution Testing}, author = {Canonne, Cl{\'e}ment L. and Jain, Ayush and Kamath, Gautam and Li, Jerry}, journal = {arXiv preprint arXiv:2106.13414}, year = {2021} }
This table is openly maintained. If you know a result that is missing, incorrect, or has been recently published, please comment via a GitHub Issue or submit a pull request directly.
Open a GitHub Issue to suggest a correction, flag a missing result, or ask a question. Click a table cell first, then use the button below to pre-fill the issue with that cell's details.
Results are stored in data.csv. Each entry specifies the oracle pair, bound type, LaTeX formula, and reference key. Fork the repo, add your entry, and open a PR.