DATA WAREHOUSES, DECISION SUPPORT AND DATA MINING
Data warehousing and on-line analytical processing (OLAP) are key elements of decision support, which has become a major focus of the database industry. Decision support places rather different requirements on database technology than traditional on-line transaction processing (OLTP) applications do. This paper provides an overview of data warehousing and OLAP technologies: back end tools for extracting, cleaning and loading data into a data warehouse; multidimensional data models typical of OLAP; front end client tools for querying and data analysis; and server extensions for efficient query processing. It emphasizes applications of data warehouses that deliver advanced capabilities, such as Decision Support Systems (DSS), OLAP and Data Mining.
2. Data Warehousing Architecture and End-to-End Process
3. Decision Support Back End Tools and Utilities
4. Conceptual Model and Front End Tools
5. OLTP Database Design Methodology
6. Data Mining
a. Goals of Data Mining
b. Data Mining Applications
c. Standard Data Mining Process
d. CRISP-Data Mining Process
7. Phases in the DM Process: CRISP-DM
Data warehousing is a collection of decision support technologies aimed at enabling knowledge workers (executives, managers, analysts) to make better and faster decisions. Data warehousing technologies have been successfully deployed in many industries: manufacturing (for order shipment and customer support), retail (for user profiling and inventory management), financial services (for claims analysis, risk analysis, credit card analysis, and fraud detection), transportation (for fleet management), telecommunications (for call analysis and fraud detection), utilities (for power usage analysis), and healthcare (for outcomes analysis). This paper presents a roadmap of data warehousing technologies, focusing on the special requirements that data warehouses place on database management systems (DBMSs).
A data warehouse is a “subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making.” Typically, the data warehouse is maintained separately from the organization’s operational databases, and there are many reasons for doing so. The data warehouse supports on-line analytical processing (OLAP), whose functional and performance requirements are quite different from those of the on-line transaction processing (OLTP) applications traditionally supported by the operational databases.
OLTP applications typically automate clerical data processing tasks, such as order entry and banking transactions, that are the essential day-to-day operations of an organization. These tasks are structured and repetitive, and consist of short, atomic, isolated transactions.
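As an illustration of what a short, atomic transaction looks like in practice, the following is a minimal sketch using Python's built-in sqlite3 module. The accounts table, its columns, and the transfer helper are hypothetical names chosen for this example and do not come from any particular banking system. The sketch shows a funds transfer that touches two records by primary key and either commits both updates or rolls both back.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one atomic transaction.

    Hypothetical helper: returns True if both updates were committed,
    False if the transaction was rolled back.
    """
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            cur = conn.execute(
                "UPDATE accounts SET balance = balance - ? "
                "WHERE id = ? AND balance >= ?",
                (amount, src, amount),
            )
            if cur.rowcount != 1:
                raise ValueError("insufficient funds or unknown source account")
            cur = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            if cur.rowcount != 1:
                raise ValueError("unknown destination account")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {account id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


# Assumed toy schema: one row per account, keyed by primary key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

ok = transfer(conn, 1, 2, 30)       # both updates succeed and are committed
failed = transfer(conn, 1, 99, 10)  # account 99 does not exist: debit is undone
```

Each call reads and updates only two rows, located by primary key, and the failed transfer leaves no partial debit behind. This is the access pattern that operational databases are tuned for.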
The transactions require detailed, up-to-date data, and read or update a few (tens of) records, typically accessed by primary key. Operational databases range in size from hundreds of megabytes to gigabytes. Consistency and recoverability of the database are critical, and maximizing transaction throughput is the key performance metric. Consequently, the database is designed to...