Steps Involved in the Data Preparation Process

Steps in the data preparation process.
Editing involves reviewing questionnaires to increase accuracy and precision. It consists of screening questionnaires to identify illegible, incomplete, inconsistent, or ambiguous responses. Responses may be illegible if they have been poorly recorded, as can happen with answers to unstructured or open-ended questions. Questionnaires may also be incomplete to varying degrees: a few or many questions may be unanswered. At this stage, the researcher also makes a preliminary check for consistency. A response is ambiguous if, for example, the respondent has circled both 4 and 5 on a 7-point scale.
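The screening step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data layout (each answer stored as the list of scale points the respondent circled) and the names `screen_questionnaire`, `q1`, `q2`, and `q3` are hypothetical and do not come from any particular survey package.

```python
from typing import Dict, List

# Assumed layout: each answer is the list of scale points the respondent
# circled. An empty list means the question was left unanswered.
RawResponse = Dict[str, List[int]]


def screen_questionnaire(responses: RawResponse) -> Dict[str, str]:
    """Flag each question as 'ok', 'incomplete', or 'ambiguous'."""
    flags = {}
    for question, circled in responses.items():
        if len(circled) == 0:
            flags[question] = "incomplete"   # nothing was circled
        elif len(circled) > 1:
            flags[question] = "ambiguous"    # e.g. both 4 and 5 circled
        else:
            flags[question] = "ok"
    return flags


# Hypothetical questionnaire with one ambiguous and one unanswered item.
example = {"q1": [4, 5], "q2": [], "q3": [6]}
flags = screen_questionnaire(example)
print(flags)
```

Illegible responses are not covered here, because judging legibility usually requires a person to look at the original form.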
Coding means assigning a code, usually a number, to each possible response to each question. The code includes an indication of the column position and data record it will occupy. For example, the gender of respondents may be coded as 1 for females and 2 for males. A field represents a single item of data, such as the gender of the respondent. A record consists of related fields, such as gender, marital status, age, household size, and occupation. Thus, each record can span several columns. Generally, all the data for a respondent are stored on a single record, although several records may be used for one respondent. It is often helpful to prepare a codebook containing the coding instructions and the necessary information about the variables in the data set.
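A codebook of this kind might be represented as a small lookup table. The sketch below is a minimal example, not a standard format: the gender codes follow the text, while the marital status codes, the column numbers, and the function name `code_respondent` are assumptions made for illustration.

```python
# Hypothetical codebook: for each field, the column it occupies in the
# record and the numeric code assigned to each possible response.
CODEBOOK = {
    "gender": {"column": 1, "codes": {"female": 1, "male": 2}},
    # Marital status codes are assumed for this example.
    "marital_status": {
        "column": 2,
        "codes": {"single": 1, "married": 2, "other": 3},
    },
}


def code_respondent(answers):
    """Turn one respondent's verbal answers into a coded record."""
    record = [None] * len(CODEBOOK)
    for field, spec in CODEBOOK.items():
        answer = answers.get(field)
        # Column positions are 1-based in the codebook, 0-based in the list.
        # Unanswered or unrecognized responses are stored as None.
        record[spec["column"] - 1] = spec["codes"].get(answer)
    return record


record = code_respondent({"gender": "female", "marital_status": "married"})
print(record)
```

Keeping the codes and column positions in one structure means the same codebook can drive both data entry and later checks.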
Data cleaning consists of consistency checks and the treatment of missing responses. While preliminary consistency checks have been made during editing, the checks at this stage are more thorough and extensive because they are made by computer. Consistency checks identify data that are out of range, logically inconsistent, or extreme in value. Data with values not defined by the coding scheme are inadmissible. Missing
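A computerized consistency check could look roughly like the following. It is a sketch under assumptions: the field names `satisfaction` and `num_children`, the admissible code sets, and the logical rule (fewer children than household members) are hypothetical examples, not rules taken from the text.

```python
# Assumed coding scheme: the set of admissible codes for each field.
VALID_CODES = {
    "gender": {1, 2},
    "satisfaction": set(range(1, 8)),  # a 7-point scale, codes 1 to 7
}


def check_record(record):
    """Return a list of problems found in one coded record."""
    problems = []
    # Range check: any value not defined by the coding scheme is inadmissible.
    for field, valid in VALID_CODES.items():
        value = record.get(field)
        if value is not None and value not in valid:
            problems.append(f"{field}: inadmissible code {value}")
    # Logical check (hypothetical rule): a household cannot contain
    # as many or more children than total members.
    size = record.get("household_size")
    children = record.get("num_children")
    if size is not None and children is not None and children >= size:
        problems.append("num_children: not smaller than household_size")
    return problems


problems = check_record(
    {"gender": 3, "satisfaction": 5, "household_size": 2, "num_children": 4}
)
print(problems)
```

Values that are missing (`None`) are skipped rather than flagged, since missing responses are handled separately from consistency checks.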
