An Application of Document Clustering for Categorizing Open-ended Survey Responses

Nishantha Medagoda, Ruvan Weerasinghe
University of Colombo School of Computing, Sri Lanka

ABSTRACT
Open-ended questions are an essential part of survey questionnaires. They give researchers an opportunity to discover unanticipated information about the domain of study. However, they are difficult to process because they are unstructured: no possible answers are suggested, and respondents are free to answer in their own words. This paper presents a method of categorizing such open-ended survey responses using a document clustering technique. The algorithm employs several natural language processing techniques to extract a classification of the responses automatically. Two experiments were carried out to evaluate the effectiveness of the proposed algorithm, and the results were promising.

Keywords—Open-ended questions, Clustering
Open-ended questions in survey questionnaires are unstructured questions for which no possible answers are suggested and which the respondent is expected to answer in his or her own words. When building a questionnaire for a statistical survey, it is essential to include such questions in order to gather information the researcher did not anticipate. They typically begin with “how”, “what”, “when”, “where”, or “why” (so-called WH-questions). Because respondents are free to answer in any way, they may write anything related to the question, so there is no fixed format for the answers. Analyzing the responses therefore requires filtering the relevant sentences and words out of them. Often, however, responses to this type of question are neglected because they are difficult to classify into any useful form. The main advantage of including these questions is that they yield more information than ‘closed’ questions. The main difficulties of dealing with open-ended questions in a survey are the complexity of data collection, the unpredictable storage space needed for the responses, and the greater time needed for analysis (Bullington et al., 1998).
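As a minimal illustration of the word-filtering step mentioned above, the sketch below lowercases a free-text response, splits it into word tokens and discards common function words. The paper does not describe its preprocessing in this excerpt, so the function name `content_words` and the deliberately tiny stopword list are hypothetical choices made for the example.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "of", "to", "i",
             "was", "were", "very", "in", "for", "my"}


def content_words(response: str) -> list[str]:
    """Lowercase a free-text answer, split it into word tokens and drop stopwords."""
    tokens = re.findall(r"[a-z']+", response.lower())
    return [t for t in tokens if t not in STOPWORDS]


if __name__ == "__main__":
    answer = "The delivery was very slow, and the staff were rude."
    print(content_words(answer))  # ['delivery', 'slow', 'staff', 'rude']
```

The remaining content words are what a later categorization step would compare across respondents; punctuation and function words carry little information about the topic of an answer.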
Unlike in the past, many statistical surveys today are conducted over the internet for faster replies and higher accuracy. Paper-based surveys incur high administration costs and take a long time to complete, so many researchers now prefer to carry out their surveys online. The wide availability of online panels further encourages researchers to conduct online surveys, especially in market research, product evaluation and service evaluation. In paper-based surveys, open-ended responses are categorized using code books. These predefined codes are prepared before the survey is conducted, or sometimes after a pilot survey on a small sample. It has been observed that these codes are domain specific and that some of them do not correspond to the responses actually given. Only experienced domain experts can compile such codes, and finding such personnel is a severe bottleneck in the industry. This human-intensive process is also time consuming, and the resulting accuracy is usually very low. Since responses to online surveys are available in electronic form, text processing techniques can be applied to them directly. Natural language processing techniques, which are common in text processing, allow some of the cognitive aspects of the answers to be extracted (Giorgetti and Sebastiani, 2000). Some of these techniques are employed in categorizing open-ended responses...
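The paper's own clustering algorithm is not described in this excerpt, so the following is only a minimal sketch of the general idea of grouping electronic responses automatically instead of coding them by hand. It assumes bag-of-words vectors (with a small hypothetical stopword list) and a simple single-pass clustering on cosine similarity; the function names, the stopword list and the 0.3 threshold are illustrative choices, not the authors'.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, kept small for the example.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "of", "to", "i",
             "was", "were", "very", "in", "for", "my", "too"}


def vectorize(response: str) -> Counter:
    """Bag-of-words term counts after lowercasing and stopword removal."""
    tokens = re.findall(r"[a-z']+", response.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def single_pass_cluster(responses: list[str], threshold: float = 0.3) -> list[list[int]]:
    """Put each response into the most similar existing cluster if the cosine
    similarity to that cluster's centroid reaches `threshold`; otherwise start
    a new cluster. Returns clusters as lists of response indices."""
    clusters: list[list[int]] = []
    centroids: list[Counter] = []
    for i, text in enumerate(responses):
        vec = vectorize(text)
        best, best_sim = -1, 0.0
        for k, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = k, sim
        if best >= 0 and best_sim >= threshold:
            clusters[best].append(i)
            # Centroid kept as summed counts; cosine is scale-invariant,
            # so this behaves like the mean vector.
            centroids[best].update(vec)
        else:
            clusters.append([i])
            centroids.append(Counter(vec))
    return clusters


if __name__ == "__main__":
    answers = [
        "Delivery was slow",
        "The delivery took too long and was slow",
        "Staff were rude",
        "Rude and unhelpful staff",
        "Slow delivery",
    ]
    print(single_pass_cluster(answers))  # [[0, 1, 4], [2, 3]]
```

Single-pass clustering is used here only because it is short and deterministic; its result depends on the order of the responses and on the threshold, so a real study would need to tune it or substitute another method such as k-means or agglomerative clustering.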