Data Warehousing and Data Mining
Submitted in partial fulfilment of the requirements for the award of the MBA degree of GGSIPU, New Delhi
Submitted By: Swati Singhal (12015603911)
Saba Afghan (11415603911) 2011-2013
Northern India Engineering College
(Affiliated to GGSIPU)
FC-26, Shastri Park, Delhi-110053
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. A data warehouse is a computer system designed to give business decision-makers instant access to information. The warehouse copies its data from existing systems such as order entry, general ledger, and human resources, and stores it for use by executives rather than programmers. Data warehouse users work with special software that lets them create and access information when they need it, rather than on a reporting schedule defined by the information systems (IS) department. This paper describes the meaning of data warehousing and data mining, their basic architecture, and the functions and working of data mining. It also presents data mining from a data warehouse.
Modern organizations are under enormous pressure from the rapid development of technology. They need rapid access to all kinds of information, and to support this they must look at past data and identify relevant trends. Any trend analysis requires a database. Most organizations run very large databases for normal daily transactions. These are known as operational databases; in most cases they have not been designed to store historical data or to respond to queries, but simply to support the applications for day-to-day transactions. The second type of database found in organizations is the data warehouse. It is designed for strategic decision support and is largely built up from the operational databases. A basic characteristic of a data warehouse is that it contains a vast amount of data, which can mean billions of records. Smaller, local data warehouses are called data marts. Because a data warehouse is designed especially for decision support queries, only the data needed for decision support is extracted from the operational data and stored in the data warehouse, along with the time at which it was retrieved from the operational databases.
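The extraction step described above can be illustrated with a minimal Python sketch. It is not a real warehouse loader: the order records, the field names, the `DECISION_FIELDS` tuple and the `extract_to_warehouse` function are all hypothetical and chosen only for illustration. The sketch copies just the fields assumed to be needed for decision support and stamps every copied record with the time of extraction, so that repeated extractions build up a history rather than overwriting earlier values.

```python
from datetime import datetime, timezone

# Hypothetical operational rows, e.g. from an order-entry system.
operational_orders = [
    {"order_id": 1, "customer": "C001", "amount": 250.0, "internal_note": "call back"},
    {"order_id": 2, "customer": "C002", "amount": 120.5, "internal_note": ""},
]

# Assumption: only these fields are needed for decision support.
DECISION_FIELDS = ("order_id", "customer", "amount")


def extract_to_warehouse(rows, warehouse, extracted_at=None):
    """Append the decision-support fields of each row to the warehouse,
    stamped with the time at which the row was extracted."""
    if extracted_at is None:
        extracted_at = datetime.now(timezone.utc)
    for row in rows:
        record = {field: row[field] for field in DECISION_FIELDS}
        record["extracted_at"] = extracted_at
        warehouse.append(record)
    return warehouse


warehouse = []

# First extraction run.
extract_to_warehouse(
    operational_orders, warehouse, datetime(2012, 1, 31, tzinfo=timezone.utc)
)

# The operational system later overwrites the amount in place...
operational_orders[0]["amount"] = 300.0

# ...but a second extraction adds new records next to the old ones.
extract_to_warehouse(
    operational_orders, warehouse, datetime(2012, 2, 29, tzinfo=timezone.utc)
)

for record in warehouse:
    print(record["order_id"], record["amount"], record["extracted_at"].date())
```

After both runs the warehouse list holds four records: the operational system keeps only the latest amount for order 1, while the warehouse keeps both the earlier and the later value, each with its extraction time.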
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process. Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example, "sales" can be a particular subject. Integrated: A data warehouse integrates data from multiple data sources. For example, source A and source B may identify a product in different ways, but in the data warehouse there is only a single way of identifying a product. Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months or 12 months ago, or even older data, from a data warehouse. This contrasts with a transaction system, where often only the most recent data is kept. For example, a transaction system may hold only the most recent address of a customer, whereas a data warehouse can hold all addresses associated with that customer. Non-volatile: Once data is in the data warehouse, it will not...