Lexical Approach for Sentiment Analysis in Hindi

Topics: Adjective, Adverb, Linguistics Pages: 6 (1427 words) Published: February 18, 2013
Santosh K
IIITH Hyderabad, India

Rahul Sharma
IIITH Hyderabad, India

Chiranjeev Sharma
IIITH Hyderabad, India

This paper presents a study of sentiment analysis and opinion mining on Hindi product reviews. We experimented with several methods, focusing mainly on lexical-based approaches. Different lexicons were used on the same data set to analyse the significance of lexical-based approaches.

2.1 Lexicon
Two different lexicons were used to test the efficiency of the lexical-based approach for sentiment analysis. Each lexicon contains adjectives and adverbs and their corresponding positive and negative scores. The HSL lexicon has positive, negative and objective scores, whereas the HSWN lexicon has only positive and negative scores. The scores are the probabilities of a word being used in a positive, negative or objective (neutral) sense. For any given word in the lexicon, the sum of all the scores is 1. The total score of a word w is given by

$$\text{total score}(w) = P(p) + P(n) + P(o) \qquad (1)$$
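As a minimal sketch of Equation (1), assume a lexicon entry is stored as three probabilities. The class name `LexiconEntry`, the helper `is_normalised`, the sample words and all scores below are hypothetical illustrations and are not taken from HSL or HSWN.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    """Hypothetical HSL-style entry: three probabilities for one word."""
    positive: float   # P(p)
    negative: float   # P(n)
    objective: float  # P(o)

    def total_score(self) -> float:
        # Equation (1): total score(w) = P(p) + P(n) + P(o)
        return self.positive + self.negative + self.objective


# Made-up scores, for illustration only.
lexicon = {
    "अच्छा": LexiconEntry(0.75, 0.0, 0.25),  # "good"
    "बुरा": LexiconEntry(0.0, 0.75, 0.25),   # "bad"
}


def is_normalised(entry: LexiconEntry, tol: float = 1e-9) -> bool:
    """Check the stated property that the scores of a word sum to 1."""
    return math.isclose(entry.total_score(), 1.0, abs_tol=tol)
```

A check such as `all(is_normalised(e) for e in lexicon.values())` could be run once when a lexicon file is loaded, so that malformed entries are caught before any review is scored.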

General Terms
Languages, Unsupervised

Keywords
Opinion Mining, Sentiment Analysis

In view of the growing content on the web in various Indian languages, there is a need to analyse data from sources such as blogs, product reviews and social networking websites. Classifying such data by sentiment can be useful in product analysis, marketing strategies, advertisements and other user-specific recommendation systems. Sentiment analysis has been done in English and other languages, but it is fairly new in Hindi and other Indian languages. In this paper we propose a method to classify reviews as either positive or negative using a lexicon. Two different lexicons, HSL (Hindi Subjective Lexicon)¹ [1] and HSWN (Hindi SentiWordNet)², were used, and each lexicon contains adjectives, adverbs and their corresponding scores.

where P(p), P(n) and P(o) are the probabilities of word w being used in a positive, negative and objective (neutral) sense, respectively. The sizes of the lexicons are given in Table 1.

Lexicon      HSL    HSWN
Adjectives   8108   4861
Adverbs       889    294

Table 1: Size of Lexicons

A lexical-based approach is followed, in which the data set is tested against two different lexicons [2]. Each review in the data set is classified based on the score calculated from the adjectives and adverbs it contains. Two approaches were followed using the lexicon, and both were tested on the two lexicons.
• Using a Hindi parts-of-speech (PoS) tagger³, where only words that are tagged as JJ or RB are scored based on the lexicon.
• Without a PoS tagger, where every word in the review is searched against the adjectives and adverbs in the lexicon and the score is computed.
There is a chance that the scores for the adjectives and adverbs are biased or domain dependent, so the reviews are ranked based on the presence (occurrence) of these words. For each of the above two approaches, the following four methods are followed.

³ http://ltrc.iiit.ac.in/showfile.php?filename=downloads/shallow_parser.php
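The two approaches above, with and without the PoS tagger, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the lexicon contents and scores are made up, the tagged input is assumed to come from an external tagger that is not called here, and the aggregation rule (comparing summed positive and negative scores) is an assumption, since the text does not spell it out. The names `candidate_words` and `classify` are hypothetical.

```python
# Hypothetical lexicon: word -> (P(positive), P(negative)); scores are made up.
LEXICON = {
    "अच्छा": (0.75, 0.0),   # "good"
    "खराब": (0.0, 0.75),    # "bad"
    "बहुत": (0.25, 0.25),   # "very"
}

SCORED_TAGS = {"JJ", "RB"}  # adjective and adverb tags named in the text


def candidate_words(tagged_review, use_pos_tagger):
    """Return the review words that are looked up in the lexicon.

    tagged_review is a list of (word, tag) pairs. With the tagger, only
    JJ/RB words are considered; without it, the tags are ignored and every
    word is searched in the lexicon.
    """
    if use_pos_tagger:
        return [w for w, tag in tagged_review
                if tag in SCORED_TAGS and w in LEXICON]
    return [w for w, _ in tagged_review if w in LEXICON]


def classify(tagged_review, use_pos_tagger):
    """Assumed aggregation: compare summed positive and negative scores."""
    words = candidate_words(tagged_review, use_pos_tagger)
    pos = sum(LEXICON[w][0] for w in words)
    neg = sum(LEXICON[w][1] for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "undecided"


# "The phone is very good": both approaches score the same two words.
review = [("फोन", "NN"), ("बहुत", "RB"), ("अच्छा", "JJ"), ("है", "VM")]

# Hypothetical tagging error: the adjective "खराब" is tagged as a noun.
mistagged = [("कैमरा", "NN"), ("खराब", "NN"), ("है", "VM")]
```

In this sketch, `classify(mistagged, True)` finds no JJ or RB word and returns "undecided", while `classify(mistagged, False)` still finds the word in the lexicon. This shows only one way the two approaches can diverge on the same review.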

The data set consists of product reviews in English that were translated to Hindi and validated manually. It contains 700 product reviews, of which 350 are classified as positive and 350 as negative. The length of each review varies from 2 to 30 words.

¹ HSL (developed at IIIT Hyderabad)
² HSWN (developed at IIT Bombay)


• Adjective presence in the lexicon.
• Adjective and...
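One possible reading of the first method listed above, adjective presence, is to count occurrences instead of summing score magnitudes. The sketch below makes two assumptions: a word's polarity is taken to be whichever of its positive or negative score is larger, and the lexicon entries and scores are made up for illustration. The function names are hypothetical, and the remaining methods are not sketched because their descriptions are cut off in the text.

```python
from collections import Counter

# Hypothetical adjective lexicon: word -> (P(positive), P(negative)).
ADJECTIVES = {
    "अच्छा": (0.75, 0.0),    # "good"
    "सुंदर": (0.625, 0.0),   # "beautiful"
    "खराब": (0.0, 0.75),     # "bad"
}


def polarity(word):
    """Assumed rule: a word leans towards its larger score."""
    pos, neg = ADJECTIVES[word]
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def presence_counts(words):
    """Count review words found in the lexicon, grouped by polarity."""
    return Counter(polarity(w) for w in words if w in ADJECTIVES)


def classify_by_presence(words):
    """Label a review by which polarity occurs more often."""
    counts = presence_counts(words)
    if counts["positive"] > counts["negative"]:
        return "positive"
    if counts["negative"] > counts["positive"]:
        return "negative"
    return "undecided"
```

Because only occurrences are counted, a word with a score of 0.625 contributes exactly as much as one with 0.75, which is the sense in which a presence-based method is less sensitive to biased or domain-dependent score values.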

References
[1] P. Arora, A. Bakliwal, and V. Varma. Hindi subjective lexicon generation using WordNet graph traversal. In CICLing, 2012.
[2] A. Bakliwal, P. Arora, and V. Varma. Hindi subjective lexicon: A lexical resource for Hindi polarity classification. In LREC, 2012.
Analysis of the usage of the PoS tagger
It can be observed from Tables 2 and 3 that the use of the Hindi PoS tagger led to a decrease in performance of 3 to 5% for the HSL lexicon and no significant change in performance for the HSWN lexicon. In the case of the merged lexicon (Table 4), the