Data Preprocessing

3

Today’s real-world databases are highly susceptible to noisy, missing, and inconsistent data because of their typically huge size (often several gigabytes or more) and their likely origin from multiple, heterogeneous sources. Low-quality data will lead to low-quality mining results. “How can the data be preprocessed in order to help improve the quality of the data and, consequently, of the mining results? How can the data be preprocessed so as to improve the efficiency and ease of the mining process?”

There are several data preprocessing techniques. Data cleaning can be applied to remove noise and correct inconsistencies in data. Data integration merges data from multiple sources into a coherent data store such as a data warehouse. Data reduction can reduce data size by, for instance, aggregating, eliminating redundant features, or clustering. Data transformations (e.g., normalization) may be applied, where data are scaled to fall within a smaller range like 0.0 to 1.0. This can improve the accuracy and efficiency of mining algorithms involving distance measurements, because attributes with large ranges no longer dominate the distance computation. These techniques are not mutually exclusive; they may work together. For example, data cleaning can involve transformations to correct wrong data, such as by transforming all entries for a date field to a common format.

In Chapter 2, we learned about the different attribute types and how to use basic statistical descriptions to study data characteristics. These can help identify erroneous values and outliers, which will be useful in the data cleaning and integration steps. Data preprocessing techniques, when applied before mining, can substantially improve the overall quality of the patterns mined and/or the time required for the actual mining.

In this chapter, we introduce the basic concepts of data preprocessing in Section 3.1. The methods for data preprocessing are organized into the following categories: data cleaning (Section 3.2), data integration (Section 3.3), data reduction
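
Two of the transformations mentioned above can be sketched in a few lines of Python: min-max normalization of a numeric attribute into a range such as 0.0 to 1.0, and rewriting a date field into one common format. This is a minimal illustration under stated assumptions, not the chapter's own method: the function names `min_max_normalize` and `standardize_date`, the set of accepted input layouts in `DATE_FORMATS`, and the choice of ISO 8601 (`YYYY-MM-DD`) as the target format are all hypothetical choices made for this example. The `%b` (abbreviated month name) layout assumes Python's default C locale for dates.

```python
from datetime import datetime

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale numeric values so they fall within [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every value is identical: map all of them to new_min to avoid dividing by zero.
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

# Hypothetical input layouts that the integrated sources are assumed to use.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y")

def standardize_date(text, formats=DATE_FORMATS):
    """Return a date string in ISO 8601 (YYYY-MM-DD), or None if no layout matches."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # Try the next known layout.
    # No layout matched: return None so the record can be flagged for manual cleaning.
    return None

if __name__ == "__main__":
    print(min_max_normalize([20, 40, 60, 100]))  # [0.0, 0.25, 0.5, 1.0]
    raw_dates = ["2012-07-19", "19/07/2012", "Jul 19, 2012", "07.19.12"]
    print([standardize_date(d) for d in raw_dates])  # ['2012-07-19', '2012-07-19', '2012-07-19', None]
```

The day-first layout `%d/%m/%Y` is itself an assumption: a value such as `07/08/2012` is ambiguous between day-first and month-first conventions, so in practice the accepted layouts would have to come from knowledge of each source. Returning `None` instead of guessing lets an unparseable value be flagged during data cleaning rather than silently written into the field in the wrong form.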
