Distributed and Parallel Database Systems
M. Tamer Ozsu
Department of Computing Science
University of Alberta
Edmonton, Canada T6G 2H1
78153 LE Chesnay Cedex
The maturation of database management system (DBMS) technology has coincided with significant developments in distributed computing and parallel processing technologies. The end result is the emergence of distributed database management systems and parallel database management systems. These systems have started to become the dominant data management tools for highly data-intensive applications. The integration of workstations in a distributed environment enables a more efficient function distribution in which application programs run on workstations, called application servers, while database functions are handled by dedicated computers, called database servers. This has led to the present trend in distributed system architecture, where sites are organized as specialized servers rather than as general-purpose computers.
A parallel computer, or multiprocessor, is itself a distributed system made of a number of nodes (processors and memories) connected by a fast network within a cabinet. Distributed database technology can be naturally revised and extended to implement parallel database systems, i.e., database systems on parallel computers [DeWitt and Gray, 1992, Valduriez, 1993]. Parallel database systems exploit the parallelism in data management [Boral, 1988] in order to deliver high-performance and high-availability database servers at a much lower price than equivalent mainframe computers [DeWitt and Gray, 1992, Valduriez, 1993]. In this paper, we present an overview of the distributed DBMS and parallel DBMS technologies, highlight the unique characteristics of each, and indicate the similarities between them. This discussion should help establish their unique and complementary roles in data management.
A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network. A distributed database management system (distributed DBMS) is then defined as the software system that permits the management of the distributed database and makes the distribution transparent to the users [Ozsu and Valduriez, 1991a]. These definitions point to two identifying architectural principles. The first is that the system consists of a (possibly empty) set of query sites and a non-empty set of data sites. The data sites have data storage capability while the query sites do not. The latter only run the user interface routines in order to facilitate the data access at data sites. The second is that each site (query or data) is assumed to logically consist of a single, independent computer. Therefore, each site has its own primary and secondary storage, runs its own operating system (which may be the same or different at different sites), and has the capability to execute applications on its own. The sites are interconnected by a computer network rather than a multiprocessor configuration. The important point here is the emphasis on loose interconnection between processors which have their own operating systems and operate independently.
The database is physically distributed across the data sites by fragmenting and replicating the data [Ceri et al., 1987]. Given a relational database schema, fragmentation subdivides each relation into horizontal or vertical partitions. Horizontal fragmentation of a relation is accomplished by a selection operation which places each tuple of the relation in one of several partitions based on a fragmentation predicate (e.g., an Employee relation may be fragmented according to the location of the employees). Vertical fragmentation divides a relation into a number of fragments by projecting over its attributes (e.g., the Employee relation may be fragmented such that the Emp number, Emp name and Address...
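The two fragmentation schemes described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the Employee tuples, the attribute names (emp_no, emp_name, address, location, salary) and the helper functions are all hypothetical, and relations are modelled simply as lists of dictionaries. The sketch assumes that emp_no is the key and that every vertical fragment retains it, so that the original relation can be rebuilt by a join, while horizontal fragments are rebuilt by a union.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]

# Hypothetical Employee relation; names and values are illustrative only.
EMPLOYEE: List[Row] = [
    {"emp_no": 1, "emp_name": "Ann", "address": "12 Elm St",
     "location": "Edmonton", "salary": 50000},
    {"emp_no": 2, "emp_name": "Bob", "address": "9 Oak Ave",
     "location": "Paris", "salary": 60000},
    {"emp_no": 3, "emp_name": "Cho", "address": "4 Pine Rd",
     "location": "Edmonton", "salary": 55000},
]


def horizontal_fragments(
    relation: List[Row], predicates: Dict[str, Callable[[Row], bool]]
) -> Dict[str, List[Row]]:
    """Selection: each fragment holds the tuples satisfying one predicate."""
    return {
        name: [row for row in relation if pred(row)]
        for name, pred in predicates.items()
    }


def vertical_fragment(
    relation: List[Row], attributes: List[str], key: str = "emp_no"
) -> List[Row]:
    """Projection: keep the listed attributes, always retaining the key."""
    columns = [key] + [a for a in attributes if a != key]
    return [{c: row[c] for c in columns} for row in relation]


def union(fragments: Iterable[List[Row]], key: str = "emp_no") -> List[Row]:
    """Rebuild a relation from disjoint horizontal fragments."""
    rows = [row for fragment in fragments for row in fragment]
    return sorted(rows, key=lambda r: r[key])


def join_on_key(left: List[Row], right: List[Row], key: str = "emp_no") -> List[Row]:
    """Rebuild a relation from two vertical fragments that share the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Horizontal fragmentation by employee location (assumed predicates).
by_location = horizontal_fragments(
    EMPLOYEE,
    {
        "edmonton": lambda r: r["location"] == "Edmonton",
        "paris": lambda r: r["location"] == "Paris",
    },
)

# Vertical fragmentation: personal data in one fragment, the rest in another.
personal = vertical_fragment(EMPLOYEE, ["emp_name", "address"])
work = vertical_fragment(EMPLOYEE, ["location", "salary"])
```

In this sketch the predicates are chosen so that every tuple satisfies exactly one of them, which is what makes the union of the horizontal fragments equal to the original relation. Repeating the key in each vertical fragment plays the same role for the join.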
References:
[Abbadi et al., 1985] A. E. Abbadi, D. Skeen, and F. Cristian. “An Efficient, Fault-Tolerant Protocol for Replicated Data Management”, In Proc
[Apers et al., 1992] P. Apers, C. van den Berg, J. Flokstra, P. Grefen, M. Kersten, A. Wilschut. “Prisma/DB:
a Parallel Main-Memory Relational DBMS”, IEEE Trans
[Bell and Grimson, 1992] D. Bell and J. Grimson. Distributed Database Systems, Reading, MA: Addison-Wesley, 1993.
[Bergsten et al., 1991] B. Bergsten, M. Couprie, P. Valduriez. “Prototyping DBS3, a Shared-Memory Parallel Database System”, In Proc. Int. Conf. on Parallel and Distributed Information Systems, Miami,
Florida, December 1991, pp 226–234.
[Bernstein et al., 1987] P. A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency Control and Recovery in Database Systems. Reading, Mass.: Addison-Wesley, 1987.
[Boral, 1988] H. Boral. “Parallelism and Data Management”, In Proc. 3rd Int. Conf. on Data and Knowledge
Bases, Jerusalem, June 1988, pp
[Boral et al., 1990] H. Boral, W. Alexander, L. Clay, G. Copeland, S. Danforth, M. Franklin, B. Hart, M.
Smith and P. Valduriez. “Prototyping Bubba, a Highly Parallel Database System”, IEEE Trans. on
Knowledge and Data Engineering (March 1990), 2(1): 4-24.
[Ceri and Pelagatti, 1984] S. Ceri and G. Pelagatti. Distributed Databases: Principles and Systems. New
York: McGraw-Hill, 1984.
[Ceri et al., 1987] S. Ceri, B. Pernici, and G. Wiederhold. “Distributed Database Design Methodologies”,
[Copeland et al., 1988] G
[DeWitt et al., 1990] D.J. DeWitt, S. Ghandeharizadeh, D.A. Schneider, A. Bricker, H.-I Hsiao, and R.
(March 1990), 2(1): 44–62.
[DeWitt and Gray, 1992] D. DeWitt and J. Gray. “Parallel Database Systems: The Future of High-Performance Database Systems”, Communications of ACM (June 1992), 35(6):85–98.
[Dogac et al., 1994] A
Database Systems, Berlin: Springer-Verlag, 1994.
[EDS, 1990] European Declarative System (EDS) Database Group. EDS-Collaborating for a High-Performance Parallel Relational Database. In Proc. ESPRIT Conf., Brussels, November 1990.
[Elmagarmid, 1992] A.K. Elmagarmid (ed.). Transaction Models for Advanced Database Applications. San
Mateo, CA: Morgan Kaufmann, 1992.
[Freytag et al., 1993] J-C. Freytag, D. Maier, and G. Vossen. Query Processing for Advanced Database
[Freytag, 1987] J-C. Freytag. “A Rule-based View of Query Optimization”, In Proc. ACM SIGMOD Int.
Conf. on Management of Data, San Francisco, 1987, pp 173–180.
[Fushimi et al., 1986] S. Fushimi, M. Kitsuregawa and H. Tanaka. “An Overview of the System Software
of a Parallel Relational Database Machine GRACE”, In Proc
[Garcia-Molina and Lindsay, 1990] H. Garcia-Molina and B. Lindsay. “Research Directions for Distributed
Databases”, IEEE Q
[Ghandeharizadeh et al., 1992] S. Ghandeharizadeh, D. DeWitt, W. Quresh., “A Performance Analysis of
Alternative Multi-Attributed Declustering Strategies”, ACM SIGMOD Int
[Gifford, 1979] D. K. Gifford. “Weighted Voting for Replicated Data”, In Proc. 7th ACM Symp. on Operating System Principles, Pacific Grove, Calif., December 1979, pp. 150–159.
[Graefe, 1990] G. Graefe. “Encapsulation of Parallelism in the Volcano Query Processing Systems”, In
[Gray, 1981] J. Gray. “The Transaction Concept: Virtues and Limitations”, In Proc. 7th Int. Conf. on Very
Large Data Bases, Cannes, France, September 1981, pp
[Gray, 1979] J. N. Gray. “Notes on Data Base Operating Systems”, In Operating Systems: An Advanced
[Gray and Reuter, 1993] J. Gray and A. Reuter. Transaction Processing: Concepts and Techniques. San
Mateo, CA: Morgan Kaufmann, 1993.
[Hsiao and DeWitt, 1991] H.-I
[Ibaraki and Kameda, 1984] T. Ibaraki and T. Kameda. “On the Optimal Nesting Order for Computing
N-Relation Joins”, ACM Trans
[Ioannidis and Wong, 1987] Y. Ioannidis and E. Wong. “Query Optimization by Simulated Annealing”, In
[Ioannidis and Kang, 1990] Y. Ioannidis and Y.C. Kang. “Randomized Algorithms for Optimizing Large
Join Queries”, In Proc
[Lorie et al., 1989] R. Lorie, J-J. Daudenarde, G. Hallmark, J. Stamos, H. Young. “Adding Intra-parallelism
to an Existing DBMS: Early Experience”, IEEE Bull
[Mohan and Lindsay, 1983] C. Mohan and B. Lindsay. “Efficient Commit Protocols for the Tree of Processes Model of Distributed Transactions”, In Proc. 2nd ACM SIGACT-SIGMOD Symp. on Principles of Distributed Computing, 1983, pp
[Orfali et al., 1994] R. Orfali, D. Harkey and J. Edwards. Essential Client/Server Survival Guide, New
York, John Wiley, 1994.
[Ozsu, 1994] M.T. Ozsu. “Transaction Models and Transaction Management in Object-Oriented Database
[Ozsu and Valduriez, 1991a] M.T. Ozsu and P. Valduriez. Principles of Distributed Database Systems,
Englewood Cliffs, NJ: Prentice-Hall, 1991.
[Ozsu and Valduriez, 1991b] M.T. Ozsu and P. Valduriez. “Distributed Database Systems: Where Are We
Now?”, IEEE Computer (August 1991), 24(8): 68–78.
[Ozsu et al., 1994] M.T. Ozsu, U. Dayal and P. Valduriez (eds.). Distributed Object Management, San
Mateo: Morgan Kaufmann, 1994
[Selinger et al., 1979] P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie and T. G. Price.