CUSTOMERS: East –
East-West Airlines (EA) is entering into a partnership with the cellular service provider Telcon to market Telcon’s service through direct mail. To support this, EA has provided a dataset for categorizing its customers and identifying which ones are likely to purchase Telcon’s services through direct mail. Accurate categorization will save the partnership valuable resources, because offers will be sent only to customers who are likely to accept. The EA dataset contains 15 variables that represent spending activity and flight patterns. The task is to use this data to classify existing customers according to whether or not they would buy Telcon’s service, using the Naïve Bayes classification model. If the model is successful, it can be deployed on future customers to categorize their potential acceptance.
The data mining model chosen for this project is the Naïve Bayes classification model. This model assumes that the predictor variables are independent of one another within each class, and it is used primarily for classification rather than for numerical prediction. It works well with large datasets and is simple and computationally efficient to set up.
The dataset contains 15 variables. Given this number, data reduction is undertaken to identify correlated variables and, by extension, to reduce multicollinearity.
From the correlation analysis above, we see that four variables, forming two pairs, are highly correlated. These are:
1). Flight_trans_12mo and Flight_miles_12mo
2). any_cc_miles_12mo and cc1_miles
Data reduction will be undertaken by removing one variable from each correlated pair: Flight_trans_12mo and cc1_miles.
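The reduction step can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure actually used in the project: the 0.8 cutoff for "high" correlation is an assumed value, the helper names pearson and correlated_pairs are hypothetical, and the toy numbers are made up. Only the four variable names come from the report.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.

    The 0.8 default is an assumption; the report does not state a cutoff.
    """
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs

# Made-up toy values, NOT the EA data; only the column names are from the report.
toy = {
    "Flight_trans_12mo": [0, 1, 2, 3, 0, 1, 2, 3],
    "Flight_miles_12mo": [0, 400, 900, 1300, 100, 500, 800, 1200],
    "cc1_miles": [1, 1, 1, 1, 4, 4, 5, 5],
    "any_cc_miles_12mo": [0, 0, 0, 0, 1, 1, 1, 1],
}

pairs = correlated_pairs(toy)

# Drop one variable from each correlated pair, mirroring the report's choice.
to_drop = {"Flight_trans_12mo", "cc1_miles"}
reduced = {name: values for name, values in toy.items() if name not in to_drop}
```

Which member of each pair to drop is a judgment call; the sketch simply mirrors the choice stated in the report.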
The Naïve Bayes classification model will now be applied to the reduced dataset. The first step is to partition the data, using standard partitioning, into training and validation sets in a 60:40 ratio.
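A minimal Python sketch of a 60:40 partition followed by a Naïve Bayes fit on categorical predictors is given below. It is not the tool or the exact procedure used in this project. The function names, the random seed, the field names miles_band and bought_service, and the toy records are all hypothetical, and the add-one (Laplace) smoothing is an assumption that the report does not mention.

```python
import random
from collections import Counter, defaultdict

def partition(records, train_fraction=0.6, seed=12345):
    """Shuffle records and split them into training and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seed value is an arbitrary assumption
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_naive_bayes(rows, target):
    """Count class frequencies and per-class predictor value frequencies."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (predictor, class) -> value counts
    levels = defaultdict(set)            # predictor -> values seen in training
    for row in rows:
        for name, value in row.items():
            if name == target:
                continue
            value_counts[(name, row[target])][value] += 1
            levels[name].add(value)
    return class_counts, value_counts, levels

def predict(model, row, target):
    """Return the class with the highest unnormalized posterior score."""
    class_counts, value_counts, levels = model
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for cls, n_cls in class_counts.items():
        score = n_cls / total  # prior P(class)
        for name, value in row.items():
            if name == target:
                continue
            # Add-one (Laplace) smoothing: an assumption, not from the report.
            count = value_counts[(name, cls)][value]
            score *= (count + 1) / (n_cls + len(levels[name]))
        if score > best_score:
            best_class, best_score = cls, score
    return best_class

# Hypothetical toy records, NOT the EA data.
toy = [
    {"miles_band": "high" if i % 2 else "low",
     "bought_service": "yes" if i % 2 else "no"}
    for i in range(10)
]

train, valid = partition(toy)
model = fit_naive_bayes(train, "bought_service")
hits = sum(predict(model, r, "bought_service") == r["bought_service"] for r in valid)
accuracy = hits / len(valid)
```

The score multiplies the class prior by one conditional probability per predictor, which is where the independence assumption enters. Continuous variables would first have to be binned into categories for this counting approach to apply.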