We discussed the importance of, and the techniques for, designing database integration systems in Chapter 4. Similar issues arise in data sharing P2P systems. Due to specific characteristics of P2P systems, e.g., the dynamic and autonomous nature of peers, approaches that rely on centralized global schemas no longer apply. The main problem is to support decentralized schema mapping so that a query expressed on one peer’s schema can be reformulated into a query on another peer’s schema. The approaches used by P2P systems for defining and creating the mappings between peers’ schemas can be classified as follows:
1- Pairwise schema mapping,
2- Mapping based on machine learning techniques,
3- Common agreement mapping,
4- Schema mapping using information retrieval (IR) techniques.
1-Pairwise Schema Mapping:
In this approach, each user defines the mapping between the local schema and the schema of any other peer that contains data that are of interest. Relying on the transitivity of the defined mappings, the system tries to extract mappings between schemas that have no defined mapping.
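The transitivity idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that a mapping is a directed attribute-to-attribute dictionary between two peers; the peer names, attribute names, and the helpers `compose` and `derive_mapping` are hypothetical and are not taken from any of the systems discussed here.

```python
from collections import deque

# Hypothetical user-defined mappings: (source_peer, target_peer) -> {source_attr: target_attr}
MAPPINGS = {
    ("A", "B"): {"title": "name", "author": "writer"},
    ("B", "C"): {"name": "label", "writer": "creator"},
}

def compose(first, second):
    """Compose two attribute mappings; attributes without a counterpart are dropped."""
    return {src: second[mid] for src, mid in first.items() if mid in second}

def derive_mapping(mappings, source, target):
    """Breadth-first search over the mapping graph, composing mappings along the path.

    Returns None when no chain of defined mappings leads from source to target.
    """
    queue = deque([(source, None)])
    visited = {source}
    while queue:
        peer, mapping = queue.popleft()
        if peer == target:
            return mapping
        for (src, dst), step in mappings.items():
            if src == peer and dst not in visited:
                visited.add(dst)
                next_mapping = step if mapping is None else compose(mapping, step)
                queue.append((dst, next_mapping))
    return None

def reformulate(query_attrs, mapping):
    """Rewrite the attributes of a query from the source schema to the target schema."""
    return [mapping[attr] for attr in query_attrs if attr in mapping]
```

Here no user has defined a mapping between peers A and C, yet `derive_mapping(MAPPINGS, "A", "C")` yields one by composing A to B with B to C, and `reformulate` can then translate the attributes of a query posed on A's schema. Real mappings are usually richer than attribute renaming, so composition may lose information at each hop.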
Piazza follows this approach:
An Example of Pairwise Schema Mapping in Piazza
The data are shared as XML documents, and each peer has a schema that defines the terminology and the structural constraints of the peer. When a new peer (with a new schema) joins the system for the first time, it maps its schema to the schema of some other peers in the system. Each mapping definition begins with an XML template that matches some path or subtree of an instance of the target schema. Elements in the template may be annotated with query expressions that bind variables to XML nodes in the source.
Active XML [Abiteboul et al., 2002, 2008b] also relies on XML documents for data sharing. The main innovation is that XML documents are active in the sense that they can include Web service calls. Therefore, data and queries can be seamlessly integrated.
Another example that follows this approach:
The Local Relational Model (LRM):
LRM assumes that the peers hold relational databases, and that each peer knows a set of peers with which it can exchange data and services. This set of peers is called the peer’s acquaintances. Each peer must define semantic dependencies and translation rules between its data and the data shared by each of its acquaintances. The defined mappings form a semantic network, which is used for query reformulation in the P2P system.
Piazza Query Reformulation Example:
Hyperion [Kementsietsidis et al., 2003]:
generalizes this approach to deal with autonomous peers that form acquaintances at run-time, using mapping tables to define value correspondences among heterogeneous databases. Peers perform local querying and update processing, and also propagate queries and updates to their acquainted peers.
Table from Airline ‘A’
Table from Airline ‘B’
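A mapping table can be pictured as a list of value pairs that relate identifiers used by one database to those used by another. The sketch below assumes two airlines that name the same flights differently; the flight codes, the attribute name `flight_code`, and the helper functions are invented for illustration and do not reproduce Hyperion's actual data structures.

```python
# Hypothetical mapping table: (value in airline A's database, value in airline B's database)
MAPPING_TABLE = [
    ("AA100", "BB7001"),
    ("AA100", "BB7002"),  # one value may correspond to several values on the other side
    ("AA200", "BB7100"),
]

def translate_values(mapping_table, values_a):
    """Return every B-side value that corresponds to one of the given A-side values."""
    return {b_val for a_val, b_val in mapping_table if a_val in values_a}

def reformulate_selection(mapping_table, attribute_b, values_a):
    """Rewrite a selection 'attribute IN values' on A into the equivalent selection on B."""
    return {
        "attribute": attribute_b,
        "values": sorted(translate_values(mapping_table, values_a)),
    }
```

A peer that receives a selection on airline A's flight identifiers can call `reformulate_selection` before propagating the query to its acquaintance, so that the acquaintance evaluates it over its own values. Values with no entry in the table simply produce no correspondence in this sketch.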
PGrid [Aberer et al., 2003b]:
also assumes the existence of pairwise mappings between peers, initially constructed by skilled experts. Relying on the transitivity of these mappings and using a gossip algorithm, PGrid extracts new mappings that relate the schemas of the peers between which there is no predefined schema mapping.
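To make the combination of gossip and transitivity concrete, here is a simplified sketch. It is not PGrid's actual algorithm: it assumes that, in each round, two randomly chosen peers exchange all the mappings they know and then compose whatever chains become available. The peer names, attribute names, and functions are hypothetical.

```python
import random

def compose(first, second):
    """Compose two attribute mappings; attributes without a counterpart are dropped."""
    return {src: second[mid] for src, mid in first.items() if mid in second}

def close_under_composition(known):
    """Add every mapping obtainable by chaining known mappings, until nothing new appears."""
    changed = True
    while changed:
        changed = False
        for (a, b), first in list(known.items()):
            for (c, d), second in list(known.items()):
                if b == c and a != d and (a, d) not in known:
                    known[(a, d)] = compose(first, second)
                    changed = True

def gossip(knowledge, rounds, rng):
    """Run gossip rounds; in each round two random peers pool and extend their mappings."""
    peers = sorted(knowledge)
    for _ in range(rounds):
        p, q = rng.sample(peers, 2)
        merged = {**knowledge[p], **knowledge[q]}
        close_under_composition(merged)
        knowledge[p] = dict(merged)
        knowledge[q] = dict(merged)
    return knowledge

# Hypothetical starting point: experts defined A->B and B->C only; peer C knows nothing.
KNOWLEDGE = {
    "A": {("A", "B"): {"title": "name"}},
    "B": {("B", "C"): {"name": "label"}},
    "C": {},
}
gossip(KNOWLEDGE, rounds=200, rng=random.Random(0))
```

After enough rounds, each peer is very likely to hold the derived mapping from A to C even though no expert defined it. The number of rounds needed grows with the number of peers, and a real system would exchange only part of its knowledge per contact rather than everything.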
2-Mapping based on Machine Learning Techniques:
This approach is generally used when the shared data are defined based on ontologies and taxonomies as proposed for the semantic web. It uses machine learning techniques to automatically extract the mappings between the shared schemas. The extracted mappings are stored over the network, in order to be used for processing future queries.
* GLUE [Doan et al., 2003b] uses this approach as follows:
Given two ontologies, for each concept in one, GLUE finds the most similar concept in the other. It gives well-founded probabilistic definitions to several...