Parallel And Distributed Databases

Topics: SQL, Database management system, Database transaction Pages: 40 (12087 words) Published: October 3, 2013
Distributed and Parallel Database Systems
M. Tamer Özsu
Department of Computing Science
University of Alberta
Edmonton, Canada T6G 2H1

Patrick Valduriez
INRIA, Rocquencourt
78153 LE Chesnay Cedex

The maturation of database management system (DBMS) technology has coincided with significant developments in distributed computing and parallel processing technologies. The end result is the emergence of distributed database management systems and parallel database management systems. These systems have started to become the dominant data management tools for highly data-intensive applications. The integration of workstations in a distributed environment enables a more efficient function distribution in which application programs run on workstations, called application servers, while database functions are handled by dedicated computers, called database servers. This has led to the present trend in distributed system architecture, where sites are organized as specialized servers rather than as general-purpose computers.

A parallel computer, or multiprocessor, is itself a distributed system made of a number of nodes (processors and memories) connected by a fast network within a cabinet. Distributed database technology can be naturally revised and extended to implement parallel database systems, i.e., database systems on parallel computers [DeWitt and Gray, 1992, Valduriez, 1993]. Parallel database systems exploit the parallelism in data management [Boral, 1988] in order to deliver high-performance and high-availability database servers at a much lower price than equivalent mainframe computers [DeWitt and Gray, 1992, Valduriez, 1993]. In this paper, we present an overview of the distributed DBMS and parallel DBMS technologies, highlight the unique characteristics of each, and indicate the similarities between them. This discussion should help establish their unique and complementary roles in data management.

Underlying Principles
A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network. A distributed database management system (distributed DBMS) is then defined as the software system that permits the management of the distributed database and makes the distribution transparent to the users [Özsu and Valduriez, 1991a]. These definitions point to two identifying architectural principles. The first is that the system consists of a (possibly empty) set of query sites and a non-empty set of data sites. The data sites have data storage capability while the query sites do not. The latter only run the user interface routines in order to facilitate the data access at data sites. The second is that each site (query or data) is assumed to logically consist of a single, independent computer. Therefore, each site has its own primary and secondary storage, runs its own operating system (which may be the same or different at different sites), and has the capability to execute applications on its own. The sites are interconnected by a computer network rather than a multiprocessor configuration. The important point here is the emphasis on loose interconnection between processors which have their own operating systems and operate independently.
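The first architectural principle can be illustrated with a small data model. The following is only a sketch under stated assumptions: the class names `Site` and `DistributedSystem`, their fields, and the validation rule are hypothetical and are not taken from any actual distributed DBMS. It captures just the constraint that the set of data sites must be non-empty while the set of query sites may be empty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """One logically independent computer (hypothetical model)."""
    name: str
    operating_system: str  # may differ from site to site
    stores_data: bool      # True for a data site, False for a query site


@dataclass
class DistributedSystem:
    """A set of sites connected by a network (hypothetical model)."""
    sites: list

    def __post_init__(self):
        # The set of data sites must be non-empty; query sites are optional.
        if not self.data_sites():
            raise ValueError("at least one data site is required")

    def data_sites(self):
        return [s for s in self.sites if s.stores_data]

    def query_sites(self):
        return [s for s in self.sites if not s.stores_data]


# Illustrative configuration: two data sites and one query site,
# running different operating systems.
system = DistributedSystem([
    Site("edmonton", "unix", stores_data=True),
    Site("paris", "vms", stores_data=True),
    Site("desk-1", "dos", stores_data=False),
])
```

A system built from query sites alone is rejected by the check in `__post_init__`, whereas a system with no query sites is accepted.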


The database is physically distributed across the data sites by fragmenting and replicating the data [Ceri et al., 1987]. Given a relational database schema, fragmentation subdivides each relation into horizontal or vertical partitions. Horizontal fragmentation of a relation is accomplished by a selection operation which places each tuple of the relation in a different partition based on a fragmentation predicate (e.g., an Employee relation may be fragmented according to the location of the employees). Vertical fragmentation divides a relation into a number of fragments by projecting over its attributes (e.g., the Employee relation may be fragmented such that the Emp number, Emp name and Address...
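The two fragmentation styles described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular system: the in-memory `EMPLOYEE` relation, its rows, the `salary` attribute, and the way the attributes are split between the `contact` and `payroll` fragments are all assumptions made for the example. Each vertical fragment keeps the key `emp_no` so that the original tuples can be rebuilt by a join.

```python
from collections import defaultdict

# Hypothetical Employee relation; attribute names and rows are illustrative.
EMPLOYEE = [
    {"emp_no": 1, "emp_name": "Ada", "address": "12 Elm St",
     "location": "Edmonton", "salary": 5200},
    {"emp_no": 2, "emp_name": "Luc", "address": "3 Rue Verte",
     "location": "Paris", "salary": 4800},
    {"emp_no": 3, "emp_name": "Mei", "address": "9 Oak Ave",
     "location": "Edmonton", "salary": 5100},
]


def horizontal_fragments(relation, predicate_value):
    """Selection-based fragmentation: each tuple lands in exactly one fragment."""
    fragments = defaultdict(list)
    for row in relation:
        fragments[predicate_value(row)].append(row)
    return dict(fragments)


def vertical_fragment(relation, key, attributes):
    """Projection-based fragmentation: keep the key plus the chosen attributes."""
    columns = [key] + [a for a in attributes if a != key]
    return [{c: row[c] for c in columns} for row in relation]


def rejoin(left, right, key):
    """Rebuild full tuples from two vertical fragments by joining on the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Horizontal: one fragment per employee location.
by_location = horizontal_fragments(EMPLOYEE, lambda row: row["location"])

# Vertical: an assumed split into contact data and payroll data.
contact = vertical_fragment(EMPLOYEE, "emp_no", ["emp_name", "address"])
payroll = vertical_fragment(EMPLOYEE, "emp_no", ["location", "salary"])
```

With this data, the union of the horizontal fragments contains every tuple exactly once, and `rejoin(contact, payroll, "emp_no")` reproduces the original relation, which is why the key is repeated in both vertical fragments.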

