What is a data warehouse?
A data warehouse is defined in this section as “a pool of data produced to support decision making.” This definition focuses on the essentials, leaving out characteristics that may vary from one DW to another but are not essential to the basic concept. The same paragraph gives another definition: “a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process.” This second definition adds specifics, all of them appropriate: it is hard, if not impossible, to conceive of a data warehouse that would not be subject-oriented, integrated, and so on.
How is a data warehouse different from a database?
Technically, a data warehouse is a database, albeit one with certain characteristics that facilitate its role in decision support. More specifically, it is (see previous question) an “integrated, time-variant, nonvolatile, subject-oriented repository of detail and summary data used for decision support and business analytics within an organization.” These characteristics, which are discussed further in the section just after the definition, are not necessarily true of databases in general, though each could apply individually to a given one. As a practical matter, most operational databases are highly normalized, in part to avoid update anomalies. Data warehouses are typically highly denormalized for performance reasons. This is acceptable because their content is never updated, just added to: historical data are static.
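The contrast between a normalized operational schema and a denormalized warehouse table can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not a recommended warehouse design; the table names (customer, orders, sales_history) and the sample rows are hypothetical and do not come from the text.

```python
import sqlite3

def build_demo():
    """Create a normalized operational schema and a denormalized history table."""
    con = sqlite3.connect(":memory:")
    # Normalized: each customer's attributes are stored exactly once.
    con.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    con.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES customer(id), amount REAL, order_date TEXT)"
    )
    con.execute("INSERT INTO customer VALUES (1, 'Acme', 'Boston')")
    con.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(1, 1, 100.0, "2024-01-05"), (2, 1, 250.0, "2024-02-10")],
    )
    # Denormalized: customer attributes are repeated on every row, so
    # analytical queries need no join. Rows are appended, never updated.
    con.execute(
        "CREATE TABLE sales_history (order_id INTEGER, customer_name TEXT, "
        "customer_city TEXT, amount REAL, order_date TEXT)"
    )
    con.execute(
        "INSERT INTO sales_history "
        "SELECT o.id, c.name, c.city, o.amount, o.order_date "
        "FROM orders o JOIN customer c ON c.id = o.customer_id"
    )
    con.commit()
    return con

def total_by_city(con):
    """Aggregate directly over the denormalized table, with no join."""
    return con.execute(
        "SELECT customer_city, SUM(amount) FROM sales_history "
        "GROUP BY customer_city ORDER BY customer_city"
    ).fetchall()
```

Calling total_by_city(build_demo()) reads only sales_history. The repeated customer columns would invite update anomalies in an operational system, but they are harmless here because history rows are only ever added.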
What is an ODS?
An ODS (Operational Data Store) is the database from which a business operates on an ongoing basis.
Differentiate among a data mart, an ODS, and an EDW.
An ODS (Operational Data Store) is the database from which a business operates on an ongoing basis. Both an EDW and a data mart are data warehouses. An EDW (Enterprise Data Warehouse) is an all-encompassing DW that covers all subject areas of interest to the entire organization. A data mart is a smaller DW designed around one problem, organizational function, topic, or other suitable focus area.
Explain the importance of metadata.
Metadata, “data about data,” are the means through which applications and users access the content of a data warehouse, through which its security is managed, and through which organizational management manages, in the true sense of the word, its information assets. Most database management systems would be unable to function without at least some metadata. Indeed, the use of metadata, which enable data access through names and logical relationships rather than physical locations, is fundamental to the very concept of a DBMS. Metadata are essential to any database, not just a data warehouse. (See answer to Review Question 2 of this section above.)
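As a small illustration of metadata enabling access by name rather than by physical location, the sketch below asks a database to describe itself. It uses Python's built-in sqlite3 module and SQLite's own catalog (the sqlite_master table and the table_info pragma); the sales table is a hypothetical example, and other DBMSs expose their metadata through different catalogs.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

def table_names(con):
    """List user tables by reading SQLite's catalog table, sqlite_master."""
    rows = con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [r[0] for r in rows]

def column_info(con, table):
    """Return (column name, declared type) pairs for one table.

    PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
    The table name is interpolated directly, so pass only trusted names.
    """
    return [(r[1], r[2]) for r in con.execute(f"PRAGMA table_info({table})")]
```

Here table_names(con) and column_info(con, "sales") answer the questions "what data exist?" and "how are they structured?" without any knowledge of how or where the rows are physically stored.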
Section 2.2 Review Questions
Describe the data warehousing process.
The data warehousing process consists of the following steps:
1. Data are imported from various internal and external sources.
2. Data are cleansed and organized consistently with the organization’s needs.
3. a. Data are loaded into the enterprise data warehouse, or
   b. Data are loaded into data marts.
4. a. If desired, data marts are created as subsets of the EDW, or
   b. The data marts are consolidated into the EDW.
5. Analyses are performed as needed.
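The steps above can be sketched in miniature as follows, taking the path in which data are loaded into the EDW first and a data mart is then created as a subset of it. This is a toy illustration using Python's built-in sqlite3 module, not a production ETL design; the source records, the table name edw_sales, the view name mart_east_sales, and the cleansing rules are all assumptions made for the example.

```python
import sqlite3

# Hypothetical source records (import step): internal and external sources.
internal_rows = [
    {"region": " east ", "amount": "100.50"},
    {"region": "West", "amount": "200"},
    {"region": "EAST", "amount": None},  # incomplete record
]
external_rows = [{"region": "west", "amount": "50"}]

def cleanse(rows):
    """Cleansing step: drop incomplete records and standardize formats."""
    clean = []
    for row in rows:
        if row["amount"] is None:
            continue  # skip records with no amount
        clean.append((row["region"].strip().title(), float(row["amount"])))
    return clean

def load_edw(con, rows):
    """Load step: append cleansed rows to the enterprise data warehouse table."""
    con.execute("CREATE TABLE IF NOT EXISTS edw_sales (region TEXT, amount REAL)")
    con.executemany("INSERT INTO edw_sales VALUES (?, ?)", rows)

def create_east_mart(con):
    """Data mart step: expose one subject area as a subset of the EDW."""
    con.execute(
        "CREATE VIEW IF NOT EXISTS mart_east_sales AS "
        "SELECT region, amount FROM edw_sales WHERE region = 'East'"
    )

def run_pipeline():
    con = sqlite3.connect(":memory:")
    load_edw(con, cleanse(internal_rows + external_rows))
    create_east_mart(con)
    return con

def sales_by_region(con):
    """Analysis step: an ad hoc summary query against the EDW."""
    return con.execute(
        "SELECT region, SUM(amount) FROM edw_sales GROUP BY region ORDER BY region"
    ).fetchall()
```

A view is used for the data mart only to keep the sketch short; a real dependent data mart would more likely be a separately stored and refreshed subset.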
Describe the major components of a data warehouse.
Data sources. Data are sourced from operational systems and possibly from external data sources.
Data extraction. Data are extracted using custom-written or commercial software called ETL.
Data loading. Data are loaded into a staging area, where they are transformed and cleansed. The data are then ready to load into the data warehouse.
Comprehensive database. This is the EDW that supports decision analysis by providing relevant summarized and detailed information.
Metadata. Metadata are maintained for access by IT personnel and users. Metadata include rules for organizing data summaries that are easy to index and search.
Middleware...