Sentence Level Semantic Classification of Online Product Reviews of Mixed Opinions Using Naive Bayes Classifier

  • Topic: Naive Bayes classifier, Statistical classification, Document classification
  • Published : January 10, 2013
International Journal of Engineering Trends and Technology, Volume 3, Issue 2, 2012

T. Revathi, L. V. Ramya, M. Tanuja, S. Pavani, M. Swathi
Information Science & Technology Department, KL University, Green Fields, Vaddeswaram, Andhra Pradesh, India

Abstract: Recent years have marked the beginning and rapid expansion of the social web, where people can freely express their opinions on different objects such as products, persons, and topics on blogs, forums, or e-commerce sites, and opinion analysis is an emerging research field. As e-commerce grows quickly, product reviews on the Web have become an important information source for customers’ decision making when they plan to buy products online. Automatically classifying reviews into different semantic orientations has become an important problem, because the reviews are too many for customers to go through. In this paper we propose a different approach, which performs sentence-level classification even when reviews contain mixed opinions. In this approach, a typical feature selection method based on sentence tagging is employed and a naive Bayes classifier is used to create a base classification model, which is then combined with certain heuristic rules for review sentence classification. Experiments show that this approach achieves better results than using general naive Bayes classifiers.

Keywords: Sentence level classification, naive Bayes classifier, sentence tagging

I. INTRODUCTION

The Web contains product reviews in a variety of forms; for example, some sites are dedicated to a specific type of product, such as sites for magazines or sites for movie reviews. Recent years have marked the strong influence of the “participative, social web” on the lives of both consumers and producer companies. This phenomenon encouraged the development of specialized sites, blogs, and forums, as well as the inclusion of a review component in existing e-commerce sites, where people can write and read opinions and comments on their “objects” of interest: products, people, and topics.

Product reviews, which contain customers’ feelings or opinions and are available from many e-commerce web sites and professional product review portals, have become an important information source for a customer’s decision making when he/she plans to purchase products online. However, as the quantity of product reviews is often large, it is difficult for customers to read all of them before they make a decision. Sentiment classification, or semantic orientation classification, automatically classifies product reviews into two classes, recommended and not recommended, which helps customers get through them. This classification approach is usually applied to a customer’s review as a whole to determine its class. However, in reality, a customer often expresses mixed feelings in one review by pointing out that some aspects are excellent but others are not so satisfactory. In this case, it is not reasonable to make an overall classification of the review.

In this paper, we propose an approach to determine customers’ semantic orientations in product reviews at a smaller granularity level (i.e. sentence level). This sentence-level semantic classification (SLSC) approach employs a naive Bayes (NB) classifier, which is used widely in text classification tasks, as its base classification model. It performs part-of-speech tagging on review sentences and uses certain types of words (adjectives, adverbs) as its features, which is different from the common feature selection methods used in building naive Bayes classification models. In the approach proposed, we concentrated on two main problems that had not been addressed so far by research in the field. The first one was that of discovering the features that will be quantified. The second problem we addressed was that of quantifying the...
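As a rough illustration of the sentence-level idea described above, the following Python sketch splits a review into sentences, keeps only adjective/adverb-like words as features, and classifies each sentence with a multinomial naive Bayes model using add-one smoothing. It is a minimal sketch under stated assumptions, not the authors' implementation: the word list `OPINION_WORDS` is a hypothetical stand-in for a real part-of-speech tagger, the training sentences in `TRAIN` are invented toy data, the sentence splitter is deliberately naive, and the heuristic rules mentioned in the abstract are not modelled because the text does not specify them.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a part-of-speech tagger: a tiny hand-made list of
# adjectives and adverbs. A real system would tag each sentence instead.
OPINION_WORDS = {
    "excellent", "great", "good", "sharp", "fast", "really",
    "poor", "bad", "slow", "terrible", "blurry", "very",
}


def extract_features(sentence):
    """Keep only tokens that the stand-in lexicon lists as adjectives/adverbs."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [t for t in tokens if t in OPINION_WORDS]


def split_sentences(review):
    """Naive splitter on '.', '!' and '?' (an assumption, not from the paper)."""
    return [s.strip() for s in re.split(r"[.!?]+", review) if s.strip()]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(extract_features(sentence))
        # Vocabulary = every feature word seen in any class during training.
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, sentence):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for c in sorted(self.class_counts):
            # log prior + sum of smoothed log likelihoods of the feature words
            score = math.log(self.class_counts[c] / total)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for w in extract_features(sentence):
                score += math.log((self.word_counts[c][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def classify_review(model, review):
    """Label every sentence of a review separately."""
    return [(s, model.predict(s)) for s in split_sentences(review)]


# Invented toy training data, for illustration only.
TRAIN = [
    ("The screen is excellent and sharp", "recommended"),
    ("Really great battery", "recommended"),
    ("Autofocus is fast and good", "recommended"),
    ("The speaker is poor", "not recommended"),
    ("Very slow and terrible software", "not recommended"),
    ("Photos look blurry and bad", "not recommended"),
]

model = NaiveBayes().fit([s for s, _ in TRAIN], [l for _, l in TRAIN])

if __name__ == "__main__":
    mixed = "The screen is excellent. The software is terrible and slow."
    for sentence, label in classify_review(model, mixed):
        print(f"{label}: {sentence}")
```

On the mixed review in the usage example, the first sentence is labelled recommended and the second not recommended, which is the kind of per-sentence result that a single whole-review label cannot express.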