I. Column Oriented Database
A column-oriented DBMS is a database management system (DBMS) that stores data tables as sections of columns of data rather than as rows of data. In comparison, most relational DBMSs store data in rows. This column-oriented DBMS has advantages for data warehouses, customer relationship management (CRM) systems, and library card catalogs, and other ad hoc inquiry systems where aggregates are computed over large numbers of similar data items. It is possible to achieve some of the benefits of column-oriented and row-oriented organization with any DBMSs. By denoting one as column-oriented, we are referring to both the ease of expression of a column-oriented structure and the focus on optimizations for column-oriented workloads. This approach is in contrast to row-oriented or row store databases and with correlation databases, which use a value-based storage structure. II. History of Column Oriented Database
Column stores or transposed files have been implemented from the early days of DBMS development. TAXIR was the first application of a column-oriented database storage system with focus on information-retrieval in biology in 1969. Statistics Canada implemented the RAPID system in 1976 and used it for processing and retrieval of the Canadian Census of Population and Housing as well as several other statistical applications. RAPID was shared with other statistical organizations throughout the world and used widely in the 1980s. It continued to be used by Statistics Canada until the 1990s. KDB was the first commercially available column oriented database developed in 1993 followed in 1995 by Sybase IQ. However, that has changed rapidly since about 2005 with many open source and commercial implementations. III. Working of Column Oriented Database
A relational database management system provides data that represents a two-dimensional table, of columns and rows. For example, a database might have this table: EmpId
This simple table includes an employee identifier (EmpId), name fields (Lastname and Firstname) and a salary (Salary). This two-dimensional format exists only in theory, in practice, storage hardware requires the data to be serialized into one form or another. The most expensive operations involving hard drives are seeks. In order to improve overall performance, related data should be stored in a fashion to minimize the number of seeks. This is known as locality of reference, and the basic concept appears in a number of different contexts. Hard drives are organized into a series of blocks of a fixed size, typically enough to store several rows of the table. By organizing the data so rows fit within the blocks, and related rows are grouped together, the number of blocks that need to be read or sought is minimized. A column-oriented database serializes all of the values of a column together, then the values of the next column, and so on. For our example table, the data would be stored in this fashion; 10:001,12:002,11:003,22:004;Smith:001,Jones:002,Johnson:003,Jones:004;Joe:001,Mary:002,Cathy:003,Bob:004;40000:001,50000:002,44000:003,55000:004; In this layout, any one of the columns more closely matches the structure of an index in a row-based system. This causes confusion about how a column-oriented store "is really just" a row-store with an index on every column. However, it is the mapping of the data that differs dramatically. In a row-oriented indexed system, the primary key is the rowid that is mapped to indexed data. In the column-oriented system primary key is the data, mapping back to rowids. This may seem subtle, but the difference can be seen in this common modification to the same store: …;Smith:001,Jones:002,004,Johnson:003;…
As two of the records store the same value, "Jones", it is possible to store this only once in the column store, along with...
References: C-Store: A column-oriented DBMS, Stonebraker et al., Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005
A decomposition storage model, Copeland, George P
N. Bruno, Teaching an old elephant new tricks, in: CIDR ’09, 2009.
Daniel Lemire, Owen Kaser, Kamel Aouiche, "Sorting improves word-aligned bitmap indexes", Data & Knowledge Engineering, Volume 69, Issue 1 (2010), pp. 3-28.
Daniel Lemire and Owen Kaser, Reordering Columns for Smaller Indexes, Information Sciences 181 (12), 2011
Brighthouse: an analytic data warehouse for ad hoc queries, Slezak et al., Proceedings of the 34th VLDB Conference, Auckland, New Zealand, 2008
Please join StudyMode to read the full document