Reading Material on Data Mining
Anas AP & Alex Titty John
• What is Data?
Data is a collection of raw, unprocessed facts. Example: student names, addresses, phone numbers, etc.
• What is a Database?
A structured set of data held in a computer and accessible in various ways. Example: an electronic address book or phone book.
• What is a Data Warehouse?
The electronic storage of a large amount of data by a business.
The concept originated in 1988 with IBM researchers Barry Devlin and Paul Murphy.
It is used in business for data mining and data exploration.
A data warehouse is a decision support database that is maintained separately from the organization's operational database. It supports information processing by providing a solid platform of consolidated, historical data for analysis.
“A process of transforming data into information and making it available to users in a timely enough manner to make a difference” [Forrester Research, April 1996]
• What is Data Mining?
“Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.”
Data mining is the exploration and analysis of large quantities of data in order to discover valid, novel, potentially useful and understandable patterns in data.
Valid: The patterns are true.
Novel: We did not know the pattern beforehand.
Useful: We can devise actions from the patterns.
Understandable: We can interpret and comprehend the patterns.
The relationships and summaries derived through a data mining exercise are often referred to as models or patterns. Examples include linear equations, rules, clusters, graphs, tree structures, and patterns in time series.
• What's the difference between data mining and data warehousing?
Data mining is the process of finding patterns in a given data set. These patterns can often provide meaningful and insightful information to whoever is interested in that data. Data mining is used today in a wide variety of contexts: in fraud detection, as an aid in marketing campaigns, and even by supermarkets to study their consumers.
Data warehousing is the process of centralizing or aggregating data from multiple sources into one common repository.
Example of data mining
If you’ve ever used a credit card, then you may know that credit card companies will alert you when they think that your credit card is being fraudulently used by someone other than you. This is a perfect example of data mining – credit card companies have a history of your purchases from the past and know geographically where those purchases have been made. If all of a sudden some purchases are made in a city far from where you live, the credit card companies are put on alert to a possible fraud since their data mining shows that you don’t normally make purchases in that city. Then, the credit card company can disable your card for that transaction or just put a flag on your card for suspicious activity.
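The idea in this example can be sketched in a few lines of Python. This is only an illustration, not how any real card company works: the function names, the 20% threshold, and the sample purchase history are all assumptions made up for the sketch. It treats a city as "usual" if a reasonable share of past purchases were made there, and flags a purchase made anywhere else.

```python
from collections import Counter

def usual_cities(history, min_share=0.2):
    """Return the cities that hold at least min_share of past purchases.

    history is a list of (city, amount) pairs; min_share is an assumed cutoff.
    """
    counts = Counter(city for city, _amount in history)
    total = sum(counts.values())
    return {city for city, n in counts.items() if n / total >= min_share}

def is_suspicious(purchase_city, history, min_share=0.2):
    """Flag a purchase made in a city the cardholder does not normally buy in."""
    if not history:
        return False  # no history yet, so there is nothing to compare against
    return purchase_city not in usual_cities(history, min_share)

# Hypothetical purchase history: four purchases in Kochi, two in Thrissur.
history = [
    ("Kochi", 40.0), ("Kochi", 12.5), ("Thrissur", 25.0),
    ("Kochi", 80.0), ("Thrissur", 15.0), ("Kochi", 9.9),
]

flag_far_city = is_suspicious("Moscow", history)   # city never seen before
flag_home_city = is_suspicious("Kochi", history)   # city seen often
```

Here the "pattern" mined from the history is simply the set of usual cities. Real systems would also look at amounts, timing, merchant type and much more, but the principle is the same: learn what is normal from past data, then flag what does not fit.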
Example of data warehousing
Consider what Facebook does. Facebook gathers all of your data (your friends, your likes, the profiles you view, and so on) and stores it in one central repository. Even though Facebook most likely stores your friends, your likes, and so on in separate databases, it wants to take the most relevant and important information and put it into one central aggregated database. Why? For many reasons: it wants to make sure that you see the ads you are most likely to click on, that the friends it suggests are the most relevant to you, and so on. Keep in mind that this last part is the data mining phase, in which meaningful data and patterns are extracted from the aggregated data. Underlying all these motives is the main one: to make more money. After all, Facebook is a business.
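The two phases can be shown side by side in a small Python sketch. Everything here is hypothetical: the two source "databases" are plain dictionaries, and the names `build_warehouse` and `most_common_like` are invented for this illustration. The first function is the warehousing step (bringing separate sources into one repository), and the second is a very small mining step run on the result.

```python
from collections import Counter

def build_warehouse(friends_db, likes_db):
    """Warehousing: merge per-user records from two sources into one repository."""
    warehouse = {}
    for user, friends in friends_db.items():
        row = warehouse.setdefault(user, {"friends": [], "likes": []})
        row["friends"] = sorted(friends)
    for user, likes in likes_db.items():
        row = warehouse.setdefault(user, {"friends": [], "likes": []})
        row["likes"] = sorted(likes)
    return warehouse

def most_common_like(warehouse):
    """Mining: find the like that appears most often across all users."""
    counts = Counter(like for row in warehouse.values() for like in row["likes"])
    return counts.most_common(1)[0][0] if counts else None

# Two separate, made-up source databases keyed by user name.
friends_db = {"asha": ["chitra", "ben"], "ben": ["asha"]}
likes_db = {"asha": ["films", "cricket"], "ben": ["cricket"], "dev": ["music"]}

warehouse = build_warehouse(friends_db, likes_db)
top_like = most_common_like(warehouse)
```

Note that the user "dev" appears in only one source but still gets a complete row in the warehouse. Handling such gaps consistently is part of what makes a central repository easier to analyse than the separate sources.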