# Data Mining for Business Intelligence: Data Visualization and Summary Statistics

Chapter 3 – Data Visualization
Chapter 4 – Summary Statistics
Data Mining for Business Intelligence
Shmueli, Patel & Bruce
© Galit Shmueli and Peter Bruce 2010

Data Visualization
• “A picture is worth a thousand words”
• Data visualization and summary statistics help condense data
• Effective presentation
• Supports data cleaning (identifying missing values, outliers, incorrect values, duplicates) and data exploration (e.g., deciding which groups to combine)
• Helps identify suitable variables
• Mandatory initial step for most data mining applications

Graphs for Data Exploration
Basic Plots
• Line Graphs
• Bar Charts
• Scatterplots

Distribution Plots
• Boxplots
• Histograms

Two Examples
Amtrak Ridership:
• Amtrak routinely collects data on ridership
• Goal: To predict future ridership using the series of monthly ridership data between Jan 1991 and March 2004

Boston Housing Data:
• Census tracts in Boston
• Several variables (14) – crime rate, location, etc.
• Goal 1: Predict median value of a home in the tract
• Goal 2: Cluster census tracts

Line Graph for Time Series

Shows how ridership patterns of Amtrak trains change over time
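As a minimal sketch of how such a line graph could be prepared, the Python below builds a monthly date axis and pairs it with a series. The ridership numbers are made-up placeholders, not Amtrak data, and the helper names `monthly_dates` and `plot_line` are hypothetical. The optional plotting function assumes matplotlib is installed.

```python
from datetime import date


def monthly_dates(start_year, start_month, n):
    """Return n consecutive first-of-month dates starting at the given month."""
    out = []
    for i in range(n):
        months = (start_month - 1) + i
        out.append(date(start_year + months // 12, months % 12 + 1, 1))
    return out


# Placeholder values for illustration only (NOT real Amtrak ridership).
ridership = [100, 95, 110, 120, 130, 125]
dates = monthly_dates(1991, 1, len(ridership))


def plot_line(dates, values, path="ridership.png"):
    """Draw a line graph of values over time (assumes matplotlib is available)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(dates, values)
    ax.set_xlabel("Month")
    ax.set_ylabel("Ridership")
    ax.set_title("Monthly ridership over time")
    fig.savefig(path)
```

Calling `monthly_dates(1991, 1, 159)` covers the full span named in the example, since January 1991 through March 2004 is 159 months.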

Bar Chart for Categorical Variable
Determine differences between subgroups
Example: 95% of tracts do not border the Charles River
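The bar heights in a chart like this are just the share of records in each category. A rough sketch in Python, assuming a hypothetical 0/1 indicator column (1 = tract borders the river) filled with toy values chosen to reproduce a 95% / 5% split; `category_shares` and `plot_bar` are illustrative names, and the plotting part assumes matplotlib is installed.

```python
from collections import Counter


def category_shares(values):
    """Return {category: percentage of records} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {cat: 100.0 * n / total for cat, n in counts.items()}


# Toy indicator column (hypothetical): 19 tracts away from the river, 1 on it.
borders_river = [0] * 19 + [1]
shares = category_shares(borders_river)


def plot_bar(shares, path="river_bar.png"):
    """Draw one bar per category (assumes matplotlib is available)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar([str(cat) for cat in shares], list(shares.values()))
    ax.set_xlabel("Borders river (0 = no, 1 = yes)")
    ax.set_ylabel("% of tracts")
    fig.savefig(path)
```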

Scatterplot
Displays relationship between two numerical variables
– Median home value decreases as the percentage of low-status population increases
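One way to put a number on the direction a scatterplot shows is the Pearson correlation coefficient, which is negative when one variable falls as the other rises. The sketch below uses made-up pairs, not the Boston Housing values; `pearson_r` and `plot_scatter` are hypothetical helper names, and the plotting function assumes matplotlib is installed.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up pairs for illustration: % low-status population vs. median home value.
pct_low_status = [5, 10, 15, 20, 30]
median_value = [35, 25, 20, 15, 10]
r = pearson_r(pct_low_status, median_value)  # negative for a downward pattern


def plot_scatter(xs, ys, path="scatter.png"):
    """Draw a scatterplot of ys against xs (assumes matplotlib is available)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel("% low-status population")
    ax.set_ylabel("Median home value")
    fig.savefig(path)
```

The coefficient only captures linear association, so the plot itself is still needed to see curvature or outliers.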

Graphs
• Three most effective plots:
– Bar charts: usually for categorical variables
– Line graphs: time series data
– Scatterplots: relationship between 2 variables
• Used widely in the business world
• Domain knowledge and nature of the task are used to select appropriate chart for data at hand

Distribution Plots
• Display entire distribution of a numerical variable
• Display “how many” of each value occur in a data set or, for continuous data or data with many possible values, “how many” values are in each of a series of ranges or “bins”
• Generally useful for prediction tasks (supervised
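
The "how many values fall in each bin" idea behind a histogram can be sketched as follows. This assumes equal-width bins, which is one common choice rather than the only one. The data are toy values, `bin_counts` and `plot_hist` are hypothetical names, and the plotting function assumes matplotlib is installed.

```python
def bin_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins.

    Returns (edges, counts). The largest value is placed in the last bin.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot form bins")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Toy data for illustration only.
values = [1, 2, 2, 3, 5, 8, 9, 10]
edges, counts = bin_counts(values, 3)


def plot_hist(values, edges, path="hist.png"):
    """Draw a histogram using the given bin edges (assumes matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.hist(values, bins=edges)
    ax.set_xlabel("Value")
    ax.set_ylabel("Count")
    fig.savefig(path)
```

Changing `n_bins` changes the picture: too few bins hide structure, too many make the counts noisy.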
