Abstract - If a distributor has given sensitive data to a set of supposedly trusted agents (third parties) and some of that data is leaked and found in an unauthorized place, the distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means. The techniques used improve the probability of detecting leakage and identifying the guilty agent. These methods do not rely on alterations of the released data; instead, editing requires authentication so that changes to a file can be tracked. In addition, “realistic but fake” data objects are injected to further improve the chances of detecting leakage and identifying the guilty party.
Keywords- data leakage, data privacy, fake objects, leakage model, guilty party.
There has always been a need to transfer sensitive data to supposedly trusted third parties. For example, a company may have a partnership with another company, so their transactions may involve sharing customers’ private data. The supposedly trusted third parties are called the agents, and the owner of the data, who sends his sensitive data to the agents, is called the distributor. Our goal is to detect when the distributor’s sensitive data have been leaked by agents, and to identify the agent that leaked the data. In a technique called perturbation, data is modified and made less sensitive before it is handed to agents. In some applications, however, the original sensitive data cannot be perturbed. After sharing his sensitive data objects, which we consider here in the form of files, the distributor discovers those objects in some unauthorized place. If the distributor finds “enough evidence” that an agent leaked data, he may initiate legal proceedings. The model developed here is useful for assessing the “guilt” of agents. Algorithms are also presented for distributing objects to agents in a way that improves our chances of identifying a leaker. The option of adding “fake” objects to the distributed set is also considered. Such objects do not correspond to real entities but appear realistic to the agents.
The scope of the project is wide. Traditionally, the problem of data leakage has been handled by watermarking. In this technique, a unique code is embedded in each distributed copy of a data object. If that copy is later discovered in the hands of an unauthorized party, the leaker can easily be identified. However, this method is not very effective because it involves alteration of the original data, and if the recipient is malicious the watermark can easily be destroyed. We propose the addition of fake objects. These are known only to the sender; the receiving agent is unaware of them. They appear realistic to the agents. These fake objects act as a type of watermark without modifying the original data objects, which helps in identifying the guilty agent who leaks data.
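The idea of fake objects acting as a per-agent watermark can be illustrated with a minimal Python sketch. It is not the system's actual algorithm; the function names `add_fake_objects` and `find_suspects`, the `fake-<agent>-<i>` naming scheme, and the dictionary-based allocation are all assumptions made for illustration. In practice the fake objects would have to look realistic rather than carry a recognizable prefix.

```python
def add_fake_objects(allocation, fakes_per_agent=1):
    """Give each agent its real objects plus fake objects unique to that agent.

    allocation: dict mapping agent name -> iterable of real object names.
    Returns (distributed, fake_owner), where fake_owner maps each fake
    object to the single agent that received it. Only the distributor
    keeps fake_owner; agents never see it.
    """
    distributed = {}
    fake_owner = {}
    for agent, real_objects in allocation.items():
        # Hypothetical naming scheme; real fakes should be indistinguishable
        # from genuine objects.
        fakes = {f"fake-{agent}-{i}" for i in range(fakes_per_agent)}
        for fake in fakes:
            fake_owner[fake] = agent
        # The real objects are left unmodified; fakes are only added.
        distributed[agent] = set(real_objects) | fakes
    return distributed, fake_owner


def find_suspects(leaked, fake_owner):
    """Return the agents whose fake objects appear in the leaked set."""
    return {fake_owner[obj] for obj in leaked if obj in fake_owner}


# Example: two agents share the real file F2, so F2 alone cannot identify a leaker.
allocation = {"A1": {"F1", "F2"}, "A2": {"F2", "F3"}}
distributed, fake_owner = add_fake_objects(allocation)
leaked = {"F2", "fake-A2-0"}
print(find_suspects(leaked, fake_owner))
```

Because each fake object is given to exactly one agent, its presence in the leaked set points to that agent even when the leaked real objects were shared with several agents. A leak that contains no fake object yields no suspect in this sketch, which is why a probabilistic guilt model is still needed.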
1.3 Paper Organization
The rest of this paper is organized as follows: Section 2 gives a brief introduction to the problem definition, the entities, and their relations; Section 3 gives details of the architecture of the proposed system, its features, and the algorithm used; Section 4 gives detailed information about the mathematical notation; Section 5 discusses the results; Section 6 gives the applications; and finally Section 7 gives the conclusion.
II. PROBLEM DEFINITION AND NOTATION
2.1 Entities and Agents
We consider the scenario of an organization with n trusted agents. The agent who creates data in the form of files and shares it with one or more other agents is called the Distributor. An Unauthorized agent is one with whom the Distributor does not want his sensitive data to end up. A distributor owns a set D of valuable data objects in the form of files, say F = (F1, F2, F3, …, Fn). The distributor wants to share some of the files with a set...