Chapter 1: Introduction
• Syllabus
• Data Independence and Distributed Data Processing
• Definition of Distributed Databases
• Promises of Distributed Databases
• Technical Problems to be Studied
• Conclusion
Acknowledgements: I am indebted to Arturas Mazeika for providing me with his slides for this course.
DDB 2008/09 J. Gamper Page 1
Syllabus
• Introduction
• Distributed DBMS Architecture
• Distributed Database Design
• Query Processing
• Transaction Management
• Distributed Concurrency Control
• Distributed DBMS Reliability
• Parallel Database Systems
Data Independence
• Originally, programs stored their data in regular files
• Each program had to maintain its own data
  – huge overhead
  – error-prone
Data Independence . . .
• The development of DBMSs made it possible to fully achieve data independence (transparency)
• DBMSs provide centralized and controlled data maintenance and access
• Applications are immune to changes in the physical and logical file organization
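To make data independence concrete, here is a minimal Python sketch, assuming a hypothetical `EmployeeStore` interface that plays the role of the DBMS's logical view of the data. The application function `count_by_city` is written only against that interface, so switching the physical organization from row-wise storage (`RowListStore`) to column-wise storage (`ColumnStore`) requires no change to application code. All class names and sample rows are illustrative assumptions, not part of the course material.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Tuple

Row = Tuple[int, str, str]  # (emp_id, name, city); hypothetical schema


class EmployeeStore(ABC):
    """Hypothetical logical interface: the only thing applications see."""

    @abstractmethod
    def insert(self, emp_id: int, name: str, city: str) -> None: ...

    @abstractmethod
    def scan(self) -> Iterator[Row]: ...


class RowListStore(EmployeeStore):
    """Physical organization 1: a list of row tuples."""

    def __init__(self) -> None:
        self._rows: List[Row] = []

    def insert(self, emp_id: int, name: str, city: str) -> None:
        self._rows.append((emp_id, name, city))

    def scan(self) -> Iterator[Row]:
        return iter(self._rows)


class ColumnStore(EmployeeStore):
    """Physical organization 2: one list per attribute (column-wise)."""

    def __init__(self) -> None:
        self._ids: List[int] = []
        self._names: List[str] = []
        self._cities: List[str] = []

    def insert(self, emp_id: int, name: str, city: str) -> None:
        self._ids.append(emp_id)
        self._names.append(name)
        self._cities.append(city)

    def scan(self) -> Iterator[Row]:
        # Reassemble rows on the fly; callers cannot tell the difference.
        return zip(self._ids, self._names, self._cities)


def count_by_city(store: EmployeeStore) -> Dict[str, int]:
    """Application code: depends only on the logical interface."""
    counts: Dict[str, int] = {}
    for _, _, city in store.scan():
        counts[city] = counts.get(city, 0) + 1
    return counts


if __name__ == "__main__":
    sample = [(1, "Ann", "Bolzano"), (2, "Bob", "Trento"), (3, "Eve", "Bolzano")]
    for store in (RowListStore(), ColumnStore()):
        for row in sample:
            store.insert(*row)
        print(type(store).__name__, count_by_city(store))
```

A real DBMS achieves this through its schema levels and query language rather than through Python classes; the sketch only shows the principle that applications depend on the logical view, not on how records are laid out.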
Data Independence . . .
• A distributed database system is the union of what appear to be two diametrically opposed approaches to data processing: database systems and computer networks
  – Computer networks promote a mode of work that goes against centralization
• Key points for understanding this combination:
  – The most important objective of DB technology is integration, not centralization
  – Integration is possible without centralization, i.e., integrating databases and networking does not mean centralization (in fact, quite the opposite)
• Goal of distributed database systems: achieve data integration and data distribution transparency
Distributed Computing/Data Processing
• A distributed computing system is a collection of autonomous processing elements that are interconnected by a computer network. The elements cooperate in order to perform the assigned task.
• The term “distributed” is very broadly used. The exact meaning of the word depends on the context.
• Synonymous terms:
  – distributed function
  – distributed data processing
  – multiprocessors/multicomputers
  – satellite processing
  – back-end processing
  – dedicated/special purpose computers
  – timeshared systems
  – functionally modular systems
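The definition of a distributed computing system can be illustrated with a small Python sketch under loudly simplifying assumptions: threads inside one process stand in for autonomous processing elements on separate machines, and a `queue.Queue` stands in for the computer network. Each hypothetical `processing_element` works only on its own local data and sends a single message with its partial result; the hypothetical coordinator `distributed_sum` combines those messages to complete the assigned task.

```python
import queue
import threading
from typing import List


def processing_element(local_data: List[int], network: "queue.Queue[int]") -> None:
    """An autonomous element: computes on its own data, then sends one message."""
    partial = sum(local_data)
    network.put(partial)


def distributed_sum(partitions: List[List[int]]) -> int:
    """Coordinator: starts one element per partition and combines their messages."""
    network: "queue.Queue[int]" = queue.Queue()  # stand-in for the network
    workers = [
        threading.Thread(target=processing_element, args=(part, network))
        for part in partitions
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # Exactly one message per element; arrival order does not matter for a sum.
    return sum(network.get() for _ in partitions)


if __name__ == "__main__":
    print(distributed_sum([[1, 2, 3], [4, 5], [6]]))
```

A real distributed system would also have to cope with message loss, delays, and failed elements, all of which this sketch ignores.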
Distributed Computing/Data Processing . . .
• What can be distributed?
  – Processing logic
  – Functions
  – Data
  – Control
• Classification of distributed systems with respect to various criteria:
  – Degree of coupling, i.e., how closely the processing elements are connected
    ∗ e.g., measured as the ratio of the amount of data exchanged to the amount of local processing
    ∗ weak coupling, strong coupling
  – Interconnection structure
    ∗ point-to-point connection between processing elements
    ∗ common interconnection channel
  – Synchronization
    ∗ synchronous
    ∗ asynchronous
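The coupling measure mentioned above can be written as a simple ratio, coupling = data exchanged / local processing. The Python sketch below computes it and classifies a system as weakly or strongly coupled. It assumes both quantities are expressed in comparable, hypothetical work units, and the cut-off `threshold=1.0` is an arbitrary illustrative choice: the slides give no numeric boundary between weak and strong coupling.

```python
def coupling_ratio(data_exchanged: float, local_processing: float) -> float:
    """Ratio of data exchanged between elements to the amount of local processing."""
    if data_exchanged < 0:
        raise ValueError("data_exchanged must be non-negative")
    if local_processing <= 0:
        raise ValueError("local_processing must be positive")
    return data_exchanged / local_processing


def classify_coupling(ratio: float, threshold: float = 1.0) -> str:
    """Hypothetical rule: at or above the threshold counts as strong coupling."""
    return "strong" if ratio >= threshold else "weak"


if __name__ == "__main__":
    for exchanged, local in [(50, 1000), (800, 400)]:
        r = coupling_ratio(exchanged, local)
        print(exchanged, local, r, classify_coupling(r))
```

With these assumptions, exchanging 50 units while doing 1000 units of local work gives a ratio of 0.05 (weak coupling), whereas exchanging 800 units for 400 units of local work gives 2.0 (strong coupling).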
Definition of DDB and DDBMS
• A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network
• A distributed database management system (DDBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users
• The terms DDBMS and DDBS are often used interchangeably
• Implicit assumptions:
  – Data is stored at a number of sites; each site logically consists of a single processor
  – Processors at different sites are interconnected by a computer network (we do not consider multiprocessors in DDBMS, cf. parallel systems)
  – A DDBS is a database, not a collection of files (cf. relational data model); the placement and querying of data are affected by the users' access patterns
  – A DDBMS is a collection of DBMSs (not a remote file system)
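The definitions above can be sketched in Python using the standard `sqlite3` module, so that each site runs its own local DBMS. The hypothetical `ToyDDBMS` class assumes the global relation `emp` is horizontally fragmented across the sites (same schema, disjoint subsets of rows). Its `query` method runs the user's SQL at every site and unions the partial results, so the user never has to know where a row is stored. Site names, the schema, and the sample rows are illustrative assumptions.

```python
import sqlite3
from typing import Any, Dict, List, Sequence, Tuple


class ToyDDBMS:
    """Hypothetical DDBMS facade over several local DBMSs (one per site).

    Assumes the global table `emp` is horizontally fragmented: every site
    stores a disjoint subset of the rows with the same schema.
    """

    def __init__(self, site_names: Sequence[str]) -> None:
        # Each "site" is an independent in-memory SQLite database.
        self.sites: Dict[str, sqlite3.Connection] = {
            name: sqlite3.connect(":memory:") for name in site_names
        }
        for conn in self.sites.values():
            conn.execute(
                "CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
            )

    def insert(self, site: str, row: Tuple[int, str, float]) -> None:
        # Data placement is decided by the administrator, not by the user.
        self.sites[site].execute("INSERT INTO emp VALUES (?, ?, ?)", row)

    def query(self, sql: str, params: Tuple[Any, ...] = ()) -> List[Tuple[Any, ...]]:
        # The user never names a site: the query runs on every fragment and
        # the partial results are unioned, keeping distribution transparent.
        result: List[Tuple[Any, ...]] = []
        for conn in self.sites.values():
            result.extend(conn.execute(sql, params).fetchall())
        return sorted(result)


if __name__ == "__main__":
    db = ToyDDBMS(["site1", "site2"])
    db.insert("site1", (1, "Ann", 4200.0))
    db.insert("site2", (2, "Bob", 3100.0))
    db.insert("site2", (3, "Eve", 5000.0))
    print(db.query("SELECT name FROM emp WHERE salary > ?", (4000,)))
```

This naive union is only correct for queries that each fragment can answer independently, such as selections and projections. Aggregates like `COUNT(*)` or joins across fragments would need the kind of distributed query processing listed in the syllabus.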
Definition of DDB and DDBMS . . .