Business Intelligence

Introduction:
This report applies a data mining approach to predicting human wine taste preferences. It considers a large data set of white and red wine samples (“Vinho Verde” wine from Portugal). The inputs are objective physicochemical tests (e.g. pH values), and the output is based on sensory data (the median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (excellent). Due to privacy and logistic issues, only the physicochemical (input) and sensory (output) variables are available (e.g. there is no data about grape types, wine brand, or wine selling price).
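
As an illustration of how the output variable is formed, the sketch below takes the median of the expert grades for one sample. The function name and the example grades are hypothetical; only the "median of at least 3 evaluations" rule and the 0 to 10 scale come from the description above.

```python
from statistics import median


def quality_score(expert_grades):
    """Return the median of the expert grades for one wine sample."""
    # The report states that each sample has at least 3 evaluations.
    if len(expert_grades) < 3:
        raise ValueError("at least 3 evaluations are expected per sample")
    # Each grade lies between 0 (very bad) and 10 (excellent).
    if any(not 0 <= grade <= 10 for grade in expert_grades):
        raise ValueError("grades must lie between 0 and 10")
    return median(expert_grades)


# Hypothetical grades, for illustration only.
print(quality_score([5, 6, 7]))     # 6
print(quality_score([4, 6, 7, 7]))  # 6.5
```

With an even number of evaluations the median is the mean of the two middle grades, so the score is not always a whole number.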

Datasets Considered:
Each record contains 12 input attributes plus one additional attribute (quality), which is the class. The attributes considered are:
1 - fixed acidity, numeric
2 - volatile acidity, numeric
3 - citric acid, numeric
4 - residual sugar, numeric
5 - chlorides, numeric
6 - free sulfur dioxide, numeric
7 - total sulfur dioxide, numeric
8 - density, numeric
9 - pH, numeric
10 - sulphates, numeric
11 - alcohol, numeric
12 - R/W, nominal (R = red, W = white)

Class: quality (score between 0 and 10)

Software used - WEKA 3.6.9

A data set of 6497 instances was used for training. The entire data set was then used again for cross-validation of the model built from the training data. From the data set, 21 records were chosen for prediction of the class.
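
The cross-validation step can be sketched as a plain k-fold split over the 6497 instances. This is a minimal illustration, not WEKA's implementation: the fold count of 10 is an assumption (the report does not state it), the function name is hypothetical, and the folds are not stratified by class.

```python
import random


def k_fold_indices(n_instances, k=10, seed=1):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_instances))
    # Shuffle once so that each fold is a random sample of the data.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# k=10 is assumed here; 6497 is the instance count given in the report.
splits = list(k_fold_indices(6497, k=10))
print(len(splits))                                  # 10
print(sorted({len(test) for _, test in splits}))    # [649, 650]
```

Each instance appears in exactly one test fold, so every record is used once for testing and k - 1 times for training.
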
Data mining technique used

A classification technique was used for this project. It analyses a training set and a test set to determine the relationship between the attributes and the class, and it measures the accuracy of both the training set analysis and the test set analysis. A random sample is later used for prediction with the model built. A multilayer perceptron model was used to make the predictions.
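
To make the multilayer perceptron concrete, the sketch below shows the forward pass of a network with one sigmoid hidden layer, followed by picking the class with the largest output. It is a minimal illustration under stated assumptions: the function names, the layer sizes, and all weights are made up for the example and are not taken from the WEKA model built in this report.

```python
import math


def sigmoid(x):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """Apply one fully connected sigmoid layer."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Forward pass: inputs -> hidden layer -> one output per class."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)


def predict_class(outputs):
    """Return the index of the class with the largest output."""
    return max(range(len(outputs)), key=lambda i: outputs[i])


# Hypothetical 2-input, 2-hidden-node, 2-class network (illustrative weights).
hidden_w = [[1.0, 0.0], [0.0, 1.0]]
hidden_b = [0.0, 0.0]
output_w = [[2.0, -2.0], [-2.0, 2.0]]
output_b = [0.0, 0.0]

outputs = forward([0.5, -1.0], hidden_w, hidden_b, output_w, output_b)
print(predict_class(outputs))  # 0
```

Training adjusts the weights and biases (typically by backpropagation) so that the predicted class matches the recorded quality on the training set; only the prediction step is shown here.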

Training Data

Log obtained in WEKA software for the training set

=== Run information ===
