Differential Privacy: 6 Key Equations Explained
- Abhishek Tiwari
- Notes
- 10.59350/ntarj-tg210
- December 1, 2024
Differential Privacy is a powerful framework for ensuring privacy in data analysis by adding controlled noise to computations. Its mathematical foundation guarantees that the presence or absence of any individual’s data in a dataset does not significantly affect the outcome of an analysis. Here are six key equations that capture the essence of differential privacy and its mechanisms, along with references to their origins and explanations.
Definition of Differential Privacy
Differential privacy [1] formalizes privacy through the concept of indistinguishability between neighboring datasets. A mechanism $\mathcal{M}$ satisfies $\varepsilon$-differential privacy if, for all neighboring datasets $D$ and $D'$ and every subset of outputs $S$:

$$\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S]$$
Where:
- $\Pr[\cdot]$ denotes the probability of an event.
- $D, D'$: Neighboring datasets differing in one entry.
- $\mathcal{M}$: The randomized mechanism applied to the dataset.
- $S$: Subset of possible outputs.
- $\varepsilon$: Privacy budget controlling the privacy-accuracy tradeoff.
This definition ensures that the outputs of $\mathcal{M}$ are statistically indistinguishable (up to a factor of $e^{\varepsilon}$) for any two neighboring datasets.
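To make the definition concrete, here is a minimal sketch (not from the article) of classic randomized response, a textbook mechanism known to satisfy $\ln(3)$-differential privacy: each respondent reports the truth with probability 3/4 and the opposite with probability 1/4, so the output probabilities for any two true answers differ by at most a factor of $3 = e^{\ln 3}$.

```python
import math
import random

def randomized_response(true_answer: bool) -> bool:
    """Flip a fair coin: on heads report the truth, on tails report the
    outcome of a second fair coin flip."""
    if random.random() < 0.5:
        return true_answer
    return random.random() < 0.5

# For either reported value, the probability ratio between any two true answers
# is at most 3, so the mechanism satisfies epsilon = ln(3) ~ 1.10.
epsilon = math.log(3)
```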
Laplace Mechanism
The Laplace mechanism [1] is a common approach for achieving $\varepsilon$-differential privacy. It adds noise drawn from a Laplace distribution to the output of a function $f$. The amount of noise is determined by the function's sensitivity $\Delta f$ and the privacy budget $\varepsilon$, balancing privacy and accuracy. The noise is defined as:

$$\text{noise} \sim \mathrm{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right)$$
Here:
- $\Delta f$: Sensitivity of the function $f$, measuring the maximum difference in output for neighboring datasets.
- $\varepsilon$: The privacy budget.
The perturbed result is:

$$\mathcal{M}(D) = f(D) + \mathrm{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right)$$
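A minimal sketch of the Laplace mechanism in Python (using NumPy; the counting query and the value of $\varepsilon$ are illustrative assumptions, not from the article):

```python
import numpy as np

def laplace_mechanism(value: float, sensitivity: float, epsilon: float) -> float:
    """Add Laplace noise with scale sensitivity / epsilon to a query result."""
    scale = sensitivity / epsilon
    return value + np.random.laplace(loc=0.0, scale=scale)

# Example: a counting query has sensitivity 1; epsilon = 0.5 is an illustrative budget.
true_count = 42
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
```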
Gaussian Mechanism
For scenarios where a small probability of failure is acceptable, $(\varepsilon, \delta)$-differential privacy can be achieved using the Gaussian mechanism [2]. The noise is sampled from a normal distribution:

$$\text{noise} \sim \mathcal{N}(0, \sigma^2)$$
The standard deviation of the noise is calculated as:

$$\sigma = \frac{\Delta f \, \sqrt{2 \ln(1.25/\delta)}}{\varepsilon}$$
Where:
- $\Delta f$: Sensitivity of the function $f$ (the $\ell_2$ sensitivity for the Gaussian mechanism).
- $\delta$: Probability of a privacy failure.
- $\varepsilon$: The privacy budget.
The perturbed output is:

$$\mathcal{M}(D) = f(D) + \mathcal{N}(0, \sigma^2)$$
When applied iteratively or in compositions, the Gaussian mechanism aligns well with advanced composition theorems (described below), making it practical for repeated queries. It is particularly useful when the function's output has high sensitivity or when $(\varepsilon, \delta)$-differential privacy is acceptable instead of strict $\varepsilon$-differential privacy.
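A minimal sketch of the Gaussian mechanism under the same caveats (the query value, sensitivity, and parameters below are illustrative assumptions):

```python
import numpy as np

def gaussian_mechanism(value: float, sensitivity: float,
                       epsilon: float, delta: float) -> float:
    """Add Gaussian noise calibrated for (epsilon, delta)-differential privacy.
    The standard calibration below is commonly stated for epsilon < 1."""
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return value + np.random.normal(loc=0.0, scale=sigma)

# Illustrative parameters, not taken from the article.
noisy_sum = gaussian_mechanism(1000.0, sensitivity=10.0, epsilon=0.9, delta=1e-5)
```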
Composition Theorems
When multiple differentially private mechanisms are applied, the privacy budget accumulates. Differential privacy provides a framework for measuring and bounding the cumulative privacy loss from multiple analyses of information about the same individuals. Two key composition approaches [3] are:
Sequential Composition
If $k$ mechanisms $\mathcal{M}_1, \dots, \mathcal{M}_k$ are applied, each with privacy guarantee $\varepsilon_i$, the total privacy guarantee is:

$$\varepsilon_{\text{total}} = \sum_{i=1}^{k} \varepsilon_i$$
Suppose a dataset is queried three times using mechanisms with budgets $\varepsilon_1$, $\varepsilon_2$, and $\varepsilon_3$. The total privacy budget consumed is:

$$\varepsilon_{\text{total}} = \varepsilon_1 + \varepsilon_2 + \varepsilon_3$$

This means the combined analysis satisfies $(\varepsilon_1 + \varepsilon_2 + \varepsilon_3)$-differential privacy.
Advanced Composition
A tighter bound is given by advanced composition, which allows a small additional failure probability. If $k$ mechanisms each satisfy $(\varepsilon, \delta)$-differential privacy, then for any $\delta' > 0$ their composition satisfies $(\varepsilon', k\delta + \delta')$-differential privacy, where:

$$\varepsilon' = \varepsilon \sqrt{2k \ln(1/\delta')} + k\varepsilon \left(e^{\varepsilon} - 1\right)$$
Suppose a dataset is queried $k = 100$ times, each query satisfying $\varepsilon$-differential privacy. Using advanced composition with failure probability $\delta'$:

Calculate the first term:

$$\varepsilon \sqrt{2 \cdot 100 \cdot \ln(1/\delta')} \approx 14.1\, \varepsilon \sqrt{\ln(1/\delta')}$$

Calculate the second term:

$$100\, \varepsilon \left(e^{\varepsilon} - 1\right) \approx 100\, \varepsilon^2 \quad \text{for small } \varepsilon$$

Thus, after 100 queries, the total privacy guarantee grows roughly in proportion to $\sqrt{k}\,\varepsilon$ rather than the $100\,\varepsilon$ that sequential composition would give.
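To see the difference numerically, here is a small sketch comparing the two bounds. The specific values ($k = 100$ queries, $\varepsilon = 0.1$, $\delta' = 10^{-5}$) are illustrative assumptions, not figures from the article.

```python
import math

def sequential_epsilon(epsilons):
    """Basic (sequential) composition: privacy budgets simply add up."""
    return sum(epsilons)

def advanced_epsilon(epsilon, k, delta_prime):
    """Advanced composition bound for k mechanisms, each epsilon-DP,
    with an extra failure probability delta_prime."""
    return (epsilon * math.sqrt(2 * k * math.log(1 / delta_prime))
            + k * epsilon * (math.exp(epsilon) - 1))

k, eps, delta_prime = 100, 0.1, 1e-5
print(sequential_epsilon([eps] * k))          # 10.0
print(advanced_epsilon(eps, k, delta_prime))  # ~5.85, a noticeably tighter bound
```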
Comparing Sequential and Advanced Composition
| Aspect | Sequential Composition | Advanced Composition |
|---|---|---|
| Privacy Parameter ($\varepsilon$) | Sum of all individual $\varepsilon_i$ values. | Tighter bound with sublinear growth. |
| Failure Probability ($\delta$) | Assumes $\delta = 0$. | Allows for a small $\delta'$. |
| Scaling with $k$ | Linear scaling: $k\varepsilon$. | Sublinear scaling with $\sqrt{k}$. |
| Accuracy | Less noise is required for individual mechanisms but results in higher cumulative noise. | Supports smaller cumulative noise. |
| Use Cases | Suitable for strict privacy guarantees. | Suitable for practical settings with relaxed privacy. |
Sensitivity
The sensitivity [1] of a function $f$ quantifies its robustness to changes in individual data points. Sensitivity determines the amount of noise required to ensure privacy. Two notions are commonly used in differential privacy: global sensitivity and local sensitivity.
Global Sensitivity
Global sensitivity is the maximum change in the output of a function when applied to any two neighboring datasets $D$ and $D'$ differing by a single element. For a function $f$, the $\ell_1$ (global) sensitivity is defined as:

$$\Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert_1$$
Where:
- $D, D'$: Neighboring datasets differing in one record.
- $\lVert \cdot \rVert_1$: The $\ell_1$-norm, representing the absolute difference in outputs.
Global sensitivity is the maximum difference in output over all possible datasets and therefore depends only on the query, not on the dataset.
Local Sensitivity
Local sensitivity [4] is a finer measure defined for a specific dataset $D$. It measures the maximum change in the output of the function $f$ over all neighboring datasets $D'$ of $D$:

$$LS_f(D) = \max_{D' : \, D' \text{ neighbor of } D} \lVert f(D) - f(D') \rVert_1$$
Local sensitivity is computed with respect to a particular dataset: the possible changes are bounded by the neighbors of that dataset rather than by the universe of all datasets.
While local sensitivity may be smaller than global sensitivity for specific datasets, it is less robust for privacy guarantees since it depends on the specific dataset and not the worst-case scenario.
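For intuition, here is a brute-force sketch (an illustration under assumptions, not a practical algorithm): for a small dataset over a finite domain, local sensitivity can be computed by substituting each entry with every possible domain value and tracking the largest change in the output. The median query and example values are hypothetical.

```python
import statistics

def local_sensitivity(f, dataset, domain):
    """Brute-force local sensitivity of f at a specific dataset under the
    replacement model: substitute every entry with every domain value and
    record the largest change in output. Feasible only for small datasets
    and finite domains."""
    base = f(dataset)
    worst = 0.0
    for i in range(len(dataset)):
        for value in domain:
            neighbor = list(dataset)
            neighbor[i] = value
            worst = max(worst, abs(f(neighbor) - base))
    return worst

# Illustrative example: local sensitivity of the median for one particular
# dataset over the domain {0, 1, ..., 10}.
ls = local_sensitivity(statistics.median, [2, 4, 4, 5, 9], range(11))
```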
Sensitivity for Common Functions
Counting Queries: For functions that count individuals satisfying a condition, $\Delta f = 1$.
Summation Queries: If data values are bounded to a range $[a, b]$ (e.g., incomes capped at a known maximum), the sensitivity is the range of possible values. For summing incomes: $\Delta f = b - a$.
Average Queries: Sensitivity for averages depends on both the range of values $[a, b]$ and the dataset size $n$: $\Delta f = \frac{b - a}{n}$ (see the code sketch after this list).
Maximum or Minimum Queries: Sensitivity is the largest possible change in the maximum or minimum when a single data point is added or removed.
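The sketch below ties these sensitivities to Laplace noise scales; the value ranges and $\varepsilon$ are illustrative assumptions rather than figures from the article.

```python
import numpy as np

def noisy_count(count: int, epsilon: float) -> float:
    # Counting query: sensitivity is 1.
    return count + np.random.laplace(scale=1.0 / epsilon)

def noisy_sum(total: float, lo: float, hi: float, epsilon: float) -> float:
    # Bounded sum: changing one record moves the sum by at most hi - lo.
    return total + np.random.laplace(scale=(hi - lo) / epsilon)

def noisy_mean(mean: float, lo: float, hi: float, n: int, epsilon: float) -> float:
    # Average over a fixed-size dataset: sensitivity is (hi - lo) / n.
    return mean + np.random.laplace(scale=(hi - lo) / (n * epsilon))
```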
Comparing Global Sensitivity vs. Local Sensitivity
| Aspect | Global Sensitivity | Local Sensitivity |
|---|---|---|
| Definition | Measures the maximum change in the output of a function $f$ over all possible neighboring datasets. | Measures the maximum change in the output of $f$ for a specific dataset $D$ and its neighbors. |
| Scope | Evaluates worst-case sensitivity across all possible datasets. | Evaluates sensitivity for a particular dataset $D$. |
| Robustness | Independent of the specific dataset; provides universal guarantees. | Dependent on the specific dataset; may not generalize to other datasets. |
| Noise Requirement | Requires more noise to account for the worst-case scenario. | May allow less noise for datasets with lower local sensitivity. |
| Use Case | Commonly used for general differential privacy guarantees. | Suitable for improving accuracy when local sensitivity is significantly smaller. |
| Accuracy | May result in less accurate outputs due to higher noise. | Can yield more accurate results for specific datasets with low local sensitivity. |
| Computation Complexity | Straightforward to compute for many functions but can be conservative. | Requires evaluating sensitivity for every neighboring dataset, which can be complex. |
| Examples | For a count query: $\Delta f = 1$, as adding/removing one individual changes the count by 1. | For a count query: sensitivity may be less than 1 if the specific dataset structure limits changes. |
Exponential Mechanism
The exponential mechanism [5] is useful for selecting outputs in a way that prioritizes utility while maintaining privacy. It assigns a probability to each potential output $r$ based on a utility function $u(D, r)$:

$$\Pr[\mathcal{M}(D) = r] \propto \exp\!\left(\frac{\varepsilon\, u(D, r)}{2\, \Delta u}\right)$$
Where:
- $u(D, r)$: Utility function representing the quality of output $r$ for dataset $D$.
- $\Delta u$: Sensitivity of the utility function.
- $\varepsilon$: The privacy budget.
Unlike mechanisms that add noise to the output (e.g., Laplace Mechanism), the Exponential Mechanism modifies the selection probability, making it suitable when adding noise would render the output meaningless or when the output space is non-numeric. It extends the concept of differential privacy to non-numeric data types and provides a foundation for private data analysis in more complex settings.
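A minimal sketch of the exponential mechanism over a finite candidate set (the candidates, utility scores, and $\varepsilon$ below are illustrative assumptions):

```python
import numpy as np

def exponential_mechanism(candidates, utilities, sensitivity, epsilon, rng=None):
    """Select one candidate with probability proportional to
    exp(epsilon * utility / (2 * sensitivity))."""
    if rng is None:
        rng = np.random.default_rng()
    scores = np.array(utilities, dtype=float)
    # Shifting utilities by a constant does not change the normalized
    # probabilities; subtracting the max improves numerical stability.
    weights = np.exp(epsilon * (scores - scores.max()) / (2 * sensitivity))
    probabilities = weights / weights.sum()
    return rng.choice(candidates, p=probabilities)

# Illustrative usage: privately pick the highest-utility category.
choice = exponential_mechanism(["A", "B", "C"], [10, 8, 2], sensitivity=1.0, epsilon=1.0)
```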
Laplace Mechanism vs. Gaussian Mechanism vs. Exponential Mechanism
| Aspect | Laplace Mechanism | Gaussian Mechanism | Exponential Mechanism |
|---|---|---|---|
| Purpose | Adds noise to the output of a numeric query to achieve $\varepsilon$-differential privacy. | Adds noise to the output of a numeric query to achieve $(\varepsilon, \delta)$-differential privacy. | Selects an output from a discrete set based on a utility function, ensuring $\varepsilon$-differential privacy. |
| Type of Output | Numeric values (e.g., sums, counts, averages). | Numeric values (e.g., sums, counts, averages). | Categorical or discrete outputs (e.g., choosing the best item or category). |
| Noise Distribution | Noise is drawn from a Laplace distribution: $\mathrm{Lap}(\Delta f / \varepsilon)$. | Noise is drawn from a Gaussian (normal) distribution: $\mathcal{N}(0, \sigma^2)$. | Selection probabilities are proportional to $\exp\left(\frac{\varepsilon\, u(D, r)}{2\, \Delta u}\right)$. |
| Parameters | $\varepsilon$ (privacy budget). | $\varepsilon$ (privacy budget), $\delta$ (failure probability). | $\varepsilon$ (privacy budget), $\Delta u$ (sensitivity of the utility function). |
| Sensitivity Requirement | Requires global sensitivity $\Delta f$ of the function. | Requires global sensitivity $\Delta f$ of the function. | Requires sensitivity $\Delta u$ of the utility function. |
| Privacy Guarantee | Strict $\varepsilon$-differential privacy. | Approximate $(\varepsilon, \delta)$-differential privacy. | Strict $\varepsilon$-differential privacy. |
| Noise Scaling | Noise scales linearly with $\Delta f / \varepsilon$. | Noise scales with $\Delta f \sqrt{2 \ln(1.25/\delta)} / \varepsilon$. | No direct noise; selection probabilities are adjusted based on utility and $\varepsilon$. |
| Failure Probability ($\delta$) | No failure probability; guarantees hold strictly for all cases. | Allows a small failure probability ($\delta$) for more flexibility in noise calibration. | No failure probability; guarantees hold strictly for all cases. |
| Key Strength | Simple to implement and provides strict privacy guarantees. | Handles high-dimensional queries and provides a trade-off with a small failure probability ($\delta$). | Ensures privacy while prioritizing outputs with higher utility, suitable for categorical data. |
| Key Limitation | May add excessive noise for high-dimensional queries. | Requires an additional parameter ($\delta$) and more noise for strict privacy. | Limited to discrete or categorical output spaces; requires well-defined utility functions. |
| Applications | Numeric data analysis, such as counting, summation, and mean estimation. | Machine learning, high-dimensional statistics, and approximate differential privacy settings. | Parameter selection, private feature selection, or any situation with categorical output. |
| Accuracy vs. Privacy Tradeoff | Balances accuracy and privacy by adjusting $\varepsilon$; may degrade accuracy with large sensitivity. | Allows finer trade-offs by adjusting both $\varepsilon$ and $\delta$; suitable for approximate guarantees. | Balances utility and privacy; prioritizes high-utility outputs while maintaining privacy. |
Conclusion
Differential privacy provides a rigorous mathematical guarantee for protecting individual data in computations. These six equations form the core of differential privacy and its mechanisms. By carefully choosing the noise and parameters, analysts can ensure a balance between privacy and accuracy. Understanding these equations empowers practitioners to implement robust privacy-preserving techniques in real-world applications.
References
1. Dwork, Cynthia, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. "Calibrating Noise to Sensitivity in Private Data Analysis." In Proceedings of the Third Conference on Theory of Cryptography (TCC'06), 265–84. Berlin, Heidelberg: Springer-Verlag. https://doi.org/10.1007/11681878_14.
2. Mironov, Ilya. 2012. "On Significance of the Least Significant Bits for Differential Privacy." In Proceedings of the 2012 ACM Conference on Computer and Communications Security, 650–61. Raleigh, North Carolina, USA: ACM. https://doi.org/10.1145/2382196.2382264.
3. Kairouz, Peter, Sewoong Oh, and Pramod Viswanath. 2015. "The Composition Theorem for Differential Privacy." arXiv. https://doi.org/10.48550/arXiv.1311.0776.
4. Nissim, Kobbi, Sofya Raskhodnikova, and Adam Smith. 2007. "Smooth Sensitivity and Sampling in Private Data Analysis." In Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, 75–84. San Diego, California, USA: ACM. https://doi.org/10.1145/1250790.1250803.
5. McSherry, Frank, and Kunal Talwar. 2007. "Mechanism Design via Differential Privacy." In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07), 94–103. https://doi.org/10.1109/FOCS.2007.66.