Closeness: A New Privacy Measure for Data Publishing
The k-anonymity privacy requirement for publishing microdata requires that each equivalence class (i.e., a set of records that are indistinguishable from each other with respect to certain "identifying" attributes) contains at least k records. Recently, several authors have recognized that k-anonymity cannot prevent attribute disclosure. The notion of ℓ-diversity has been proposed to address this; ℓ-diversity requires that each equivalence class has at least ℓ well-represented (see Section 2) values for each sensitive attribute. In this paper, we show that ℓ-diversity has a number of limitations. In particular, it is neither necessary nor sufficient to prevent attribute disclosure. Motivated by these limitations, we propose a new notion of privacy called "closeness." We first present the base model t-closeness, which requires that the distribution of a sensitive attribute in any equivalence class be close to the distribution of the attribute in the overall table (i.e., the distance between the two distributions should be no more than a threshold t). We then propose a more flexible privacy model called (n, t)-closeness that offers higher utility. We describe our desiderata for designing a distance measure between two probability distributions and present two distance measures. We discuss the rationale for using closeness as a privacy measure and illustrate its advantages through examples and experiments.
One problem with ℓ-diversity is that it is limited in its assumption of adversarial knowledge. As we shall explain below, it is possible for an adversary to gain information about a sensitive attribute as long as she has information about the global distribution of this attribute. This assumption generalizes the specific background and homogeneity attacks used to motivate ℓ-diversity. Another problem with privacy-preserving methods in general is that they effectively assume all attributes to be categorical: the adversary either does or does not learn something sensitive. Of course, especially with numerical attributes, being close to the true value is often good enough for the adversary.
In this project, we propose a novel privacy notion called "closeness." We first formalize the idea of global background knowledge and propose the base model t-closeness, which requires that the distribution of a sensitive attribute in any equivalence class be close to the distribution of the attribute in the overall table (i.e., the distance between the two distributions should be no more than a threshold t). This effectively limits the amount of individual-specific information an observer can learn. However, an analysis of data utility shows that t-closeness substantially limits the amount of useful information that can be extracted from the released data. Based on this analysis, we propose a more flexible privacy model called (n, t)-closeness, which requires that the distribution of the sensitive attribute in any equivalence class be close to its distribution in a large-enough group of records (one containing at least n records). This limits the amount of sensitive information disclosed about individuals while preserving features and patterns of large groups. Our analysis shows that (n, t)-closeness achieves a better balance between privacy and utility than existing privacy models such as ℓ-diversity and t-closeness.
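The t-closeness requirement can be sketched in a few lines of code. The following is a minimal illustration, not the paper's implementation: it uses total variation (variational) distance as one simple choice of distance measure between distributions, and a toy table with a hypothetical sensitive attribute "disease" grouped into equivalence classes.

```python
from collections import Counter

def distribution(values):
    """Empirical distribution of a list of sensitive-attribute values."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}

def variational_distance(p, q):
    """Total variation distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

def satisfies_t_closeness(equivalence_classes, t):
    """t-closeness: every equivalence class's sensitive-value distribution
    must be within distance t of the overall table's distribution."""
    overall = distribution([v for cls in equivalence_classes for v in cls])
    return all(variational_distance(distribution(cls), overall) <= t
               for cls in equivalence_classes)

# Toy table: three equivalence classes of the sensitive attribute "disease".
classes = [["flu", "flu", "cancer"],
           ["flu", "cancer", "cancer"],
           ["flu", "flu", "cancer"]]
print(satisfies_t_closeness(classes, 0.25))
```

A distance measure that accounts for semantic closeness between values (such as the Earth Mover's Distance) could be substituted for `variational_distance` without changing the checking logic.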
➢ Privacy measure
➢ Data publishing
• While the released table gives useful information to researchers, it presents disclosure risk to the individuals whose data are in the table.
• Therefore, our objective is to limit the disclosure risk to an acceptable level while maximizing the benefit.
• This is achieved by anonymizing the data before release.
• The first step of anonymization is to remove explicit...
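The anonymization steps above can be sketched as follows. This is a toy example under assumed column names ("name" as the explicit identifier; "age" and "zip" as quasi-identifiers), not a full anonymization algorithm: it removes the explicit identifier and generalizes the quasi-identifiers so that records fall into coarser equivalence classes.

```python
# Hypothetical microdata table with an explicit identifier ("name"),
# quasi-identifiers ("age", "zip"), and a sensitive attribute ("disease").
records = [
    {"name": "Alice", "age": 23, "zip": "47677", "disease": "flu"},
    {"name": "Bob",   "age": 27, "zip": "47602", "disease": "cancer"},
    {"name": "Carol", "age": 35, "zip": "47678", "disease": "flu"},
]

def anonymize(rows, identifiers=("name",), bucket=10):
    out = []
    for r in rows:
        # Step 1: remove explicit identifiers.
        r = {k: v for k, v in r.items() if k not in identifiers}
        # Step 2: generalize quasi-identifiers, e.g. age -> 10-year range,
        # ZIP code -> first three digits.
        lo = (r["age"] // bucket) * bucket
        r["age"] = f"{lo}-{lo + bucket - 1}"
        r["zip"] = r["zip"][:3] + "**"
        out.append(r)
    return out

for row in anonymize(records):
    print(row)
```

Records sharing the same generalized quasi-identifier values form one equivalence class, to which a requirement such as t-closeness can then be applied.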