Business Intelligence and Data Mining - Decision Trees

INDIAN INSTITUTE OF MANAGEMENT, INDORE
Post Graduate Programme – Term IV – AY 2012-13
Business Intelligence And Data Mining
Group Assignment on NGO Donations Maximization

Abstract

The problem concerns devising a strategy to maximize the profit from a direct marketing campaign aimed at a selected group of customers while minimizing costs. The exercise requires the use of Business Intelligence tools and techniques to build a model, trained and tested on the historical data from last year's fundraising campaign. From this model it should be possible to predict the profitability of a prospective donor, allowing a more targeted campaign at lower cost. The difficulty arises from extremely imbalanced data and from the inverse correlation between the probability of response and the dollar amount a response generates. The data set and problem come from the KDD-CUP-98 challenge. The solution would be applicable to any direct marketing campaign for which historical data is available.


Introduction

The KDD-CUP-98 challenge involves building a model, trained and tested on historical data, that predicts which potential donors to contact so as to maximise profit. The model yields a mailing list that targets only valuable customers. Existing models typically predict future response behaviour only. The historical database records past mailing campaigns, the customers' responses, and the dollar amounts collected. The model should identify the current customers who are likely to respond and should maximize net profit (donation amount – mailing cost) over the contacted customers.

The records come from the 1997 Paralyzed Veterans of America fundraising mailing campaign, and only 5% of the records are responders. A classifier that simply labels every record as a non-responder therefore achieves 95% accuracy while identifying no donors, so accuracy is a poor objective here. One approach ranks customers by estimated probability of response and selects the top portion of the list: if the top 5% of the list contains 30% of the responders, the lift is 6. The drawback is that this ignores the customer's donation amount. There is an inverse correlation between the probability of donating and the dollar amount, as donors who give larger amounts are more cautious. Probability-based ranking therefore tends to rank valuable customers low. Another method adapts accuracy to cost-sensitive learning and tries to minimize cost, but because it first builds the list from the probability of response and only then considers profitability, it tends to ignore valuable customers, who usually respond infrequently.

A tweaked use of association rules leads to better results than the methods above. It involves identifying subsets of attributes that are correlated with the “respond” class and then using a small subset of the generated association rules to identify potential customers in the current campaign.
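The lift and net-profit figures discussed above can be checked with a small calculation. The following is a minimal sketch, not part of the original assignment: the function names and the toy list of 200 customers are assumptions chosen to reproduce the 5% / 30% example, and the mailing cost of 0.68 is the per-mail figure the assignment states for this campaign.

```python
def lift_at(scores, responded, fraction):
    """Lift of the top `fraction` of customers ranked by score (highest first)."""
    n = len(scores)
    top_n = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    captured = sum(responded[i] for i in order[:top_n])
    total = sum(responded)
    # (share of responders captured) / (share of list mailed)
    return (captured * n) / (total * top_n)


def net_profit(donations, mailed, mailing_cost):
    """Sum of (donation - mailing cost) over the contacted customers only."""
    return sum(d - mailing_cost for d, m in zip(donations, mailed) if m)


# Hypothetical toy list: 200 customers, 10 responders (5%),
# 3 of them ranked inside the top 10 (top 5% holds 30% of responders).
scores = [200 - i for i in range(200)]
responded = [0] * 200
for i in (0, 1, 2, 50, 51, 52, 53, 54, 55, 56):
    responded[i] = 1

print(lift_at(scores, responded, 0.05))
print(net_profit([20.0, 0.0, 5.0], [True, True, True], 0.68))
```

Note that lift only counts responders, while net profit depends on how much each responder gives, which is why a list with high lift can still be a poor mailing list when large donors are ranked low.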
The solution tries to increase customer value by selecting association rules and to increase profitability over the current customers. Negative association rules may also indicate, given some attributes, the chances of not donating. Association rules on their own do not tell how to maximize an objective function, especially when there is an inverse correlation.

The dataset has 191,799 records of customers contacted in the 1997 mailing campaign. Each record has 479 non-target variables and two target variables: respond / not_respond and the actual donation in dollars. About 5% of the records are respond records, and the dataset is split 50% for learning and 50% for validation. Customers are to be evaluated and predicted on the basis of a mailing cost of $0.68.

The inverse correlation can take two forms. It could exist across offers to the same customer, which can be reduced by avoiding multiple mailings within a time period, or across different customers, meaning many small contributors and few big ones. The second type of inverse correlation has to be addressed. It can be done in two steps: obtain probability estimates from decision trees and then re-rank them using customer value, but this still ignores the value in the first step.

The other problem is high dimensionality: with 481 variables and a small target population, it is difficult to identify the features of the respond class. The one-attribute-at-a-time “gain criterion” does not search for correlated variables; it is good for maximising class probability, but not when the non-maximum class probability is also used for ranking customers. The notion of focussed association rules leads to features that are typical of the respond class and not of the not_respond class, i.e. a subset of variables that occurs in the respond class but infrequently in the not_respond class. This prunes data from the not_respond class, which addresses the scarcity of data in the target class, and also removes variables that are frequent in the not_respond class.
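The idea of focussed association rules can be illustrated with a brute-force search for attribute combinations that are frequent among responders but rare among non-responders. This is only a sketch under stated assumptions: the attribute names, the toy records, and the thresholds `min_pos` and `max_neg` are hypothetical, and exhaustive enumeration of small itemsets stands in for whatever rule-mining algorithm is actually used on the 479 variables.

```python
from itertools import combinations


def support(itemset, records):
    """Fraction of records (sets of items) that contain every item in itemset."""
    if not records:
        return 0.0
    return sum(itemset <= r for r in records) / len(records)


def focused_itemsets(respond, not_respond, min_pos=0.5, max_neg=0.25, max_size=2):
    """Itemsets frequent in the respond class and infrequent in not_respond."""
    items = sorted(set().union(*respond))  # only items seen among responders
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            s = frozenset(combo)
            pos, neg = support(s, respond), support(s, not_respond)
            if pos >= min_pos and neg <= max_neg:
                found.append((combo, pos, neg))
    return found


# Hypothetical toy records; each record is a set of attribute=value items.
respond = [
    {"age=60+", "gave_last_year"},
    {"age=60+", "gave_last_year", "urban"},
    {"gave_last_year", "urban"},
    {"age=60+", "gave_last_year"},
]
not_respond = [
    {"urban"},
    {"age=60+"},
    {"urban", "age=60+"},
    {"gave_last_year", "urban"},
    {"urban"},
]

rules = focused_itemsets(respond, not_respond)
for combo, pos, neg in rules:
    print(combo, pos, neg)
```

In this toy data `age=60+` alone is rejected because it is common among non-responders too, yet the pair (`age=60+`, `gave_last_year`) is kept, which is the kind of correlated combination a one-attribute-at-a-time criterion would not look for.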
The focussed association rules can then be converted into a model for predicting the donation amount for a customer by covering customers with these rules, pruning over-fitting rules, and estimating a donation amount for each rule. The assumption is that current customers follow the same class and donation distribution as the historical records. There are three stages: Rule Generation finds a set of good rules that capture features of responders; Model Building combines the rules into a prediction model for donation amount; and Model Pruning prunes rules that do not generalize to the entire population.

Our Approach
