Skip to content
This repository has been archived by the owner on Sep 21, 2022. It is now read-only.

Retrieve differentially private quantiles from a set of pre-computed aggregates #331

Open
raprasad opened this issue Dec 2, 2020 · 1 comment
Assignees

Comments

@raprasad
Copy link
Member

raprasad commented Dec 2, 2020

from @joshua-oss:
Is there any way to get differentially private quantiles from a set of pre-computed aggregates? Access to underlying rows is not possible. The underlying SQL engines can return arbitrary exact quantiles, and also can return histograms with even or arbitrary bin width.

@raprasad
Copy link
Member Author

raprasad commented Dec 2, 2020

from slack:

@Shoeboxam:

You could insert the exact quantile statistic into the laplace quantile- but your utility will be so poor you might as well just sample some noise. Sensitivity is (M- m), independent of the record count.
The most efficient algorithm will be based on the exponential mechanism, but the current formulation requires row-level access.
Assuming your record counts are large, you would be better off postprocessing histograms. I think the histogram insertion is pretty well trodden ground at this point. I could add a postprocessing component if you'd like.
There may be a middle ground with the exponential, where we can still give a bound on the sensitivity of utilities derived from partially aggregated data.
After thinking about this some more, I have a utility function to evaluate bins from an exact histogram, along with a bounded sensitivity. Not sure how it compares to post-processing histograms at this point. Aiming for a middle ground between row-level exponential and postprocessed noisy histograms. The mechanism basically releases the bin index most likely to contain the quantile. (edited)

@joshua-oss:

That would work! I could pass in a vector of histogram bin sizes. I assume we wouldn’t be able to report accuracy

@Shoeboxam:

Yeah. The tricky part about the exponential mechanism accuracy is that it is dependent on the dataset itself. So the accuracy itself leaks information about the dataset

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants