
Movie Rating and Review Summarization in Mobile Environment

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART C: APPLICATIONS AND REVIEWS, VOL. 42, NO. 3, MAY 2012

Chien-Liang Liu, Wen-Hoar Hsaio, Chia-Hoang Lee, Gen-Chi Lu, and Emery Jou
Abstract—In this paper, we design and develop a movie-rating and review-summarization system in a mobile environment. The movie-rating information is based on the sentiment-classification result. The condensed descriptions of movie reviews are generated from feature-based summarization. We propose a novel approach based on latent semantic analysis (LSA) to identify product features. Furthermore, we find a way to reduce the size of the summary based on the product features obtained from LSA. We consider both sentiment-classification accuracy and system response time in designing the system. The rating and review-summarization system can easily be extended to other product-review domains.

Index Terms—Feature extraction, natural language processing (NLP), text analysis, text mining.
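The abstract names latent semantic analysis (LSA) as the basis for identifying product features but does not spell out the computation. As a rough illustration only, the sketch below builds a term-document count matrix from a few toy review snippets and extracts its leading left singular vector by power iteration, which is a rank-1 truncation of the singular value decomposition that LSA relies on. The toy reviews, the function names, and the choice of rank 1 are all assumptions made for this example; this is not the authors' implementation, which the abstract does not describe in detail.

```python
import math


def term_document_matrix(docs):
    """Build a terms x documents count matrix from whitespace-tokenized text."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    row_of = {word: i for i, word in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for word in doc.split():
            matrix[row_of[word]][j] += 1.0
    return vocab, matrix


def leading_left_singular_vector(matrix, iterations=100):
    """Power iteration on A * A^T; returns one unit-norm weight per term."""
    n_terms, n_docs = len(matrix), len(matrix[0])
    u = [1.0 / math.sqrt(n_terms)] * n_terms
    for _ in range(iterations):
        # v = A^T u (project term weights onto documents)
        v = [sum(matrix[i][j] * u[i] for i in range(n_terms)) for j in range(n_docs)]
        # u = A v (project back onto terms), then renormalize
        u = [sum(matrix[i][j] * v[j] for j in range(n_docs)) for i in range(n_terms)]
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    return u


def top_terms(docs, k=3):
    """Rank terms by their weight on the leading latent dimension."""
    vocab, matrix = term_document_matrix(docs)
    weights = leading_left_singular_vector(matrix)
    ranked = sorted(zip(vocab, weights), key=lambda pair: -pair[1])
    return [word for word, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical, pre-tokenized review snippets (illustrative only).
    reviews = ["actor story music", "actor story", "actor"]
    print(top_terms(reviews, k=2))  # ['actor', 'story']
```

A practical system would keep several latent dimensions rather than one and would typically weight the counts (for example with TF-IDF) before decomposition; the rank-1 version is used here only because it stays short and dependency-free.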

I. INTRODUCTION

People's opinions have become an extremely important source of information for various services in ever-growing, popular social networks. In particular, online opinions have turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities, and manage their reputations. Meanwhile, cellular phones have become a vital part of our lives. There is no doubt that the mobile platform is currently one of the most popular platforms in the world. However, the digital content that a cellular phone can display is limited in size, since cellular phones are physically small. Hence, a mechanism that can provide users with condensed descriptions of documents will facilitate the delivery of digital content on cellular phones. This paper explores and designs a mobile system for movie rating and review summarization in which semantic orientation of comments, the limitation of small display capability of…



Chia-Hoang Lee received the Ph.D. degree in computer science from the University of Maryland, College Park, in 1983. He is currently a Professor with the Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan. He was a Faculty Member with the University of Maryland and Purdue University, West Lafayette, IN. His current research interests include artificial intelligence, human–machine interface systems, natural-language processing, and opinion mining.

Chien-Liang Liu received the M.S. and Ph.D. degrees in computer science from National Chiao Tung University, Hsinchu, Taiwan, in 2000 and 2005, respectively. He is currently a Postdoctoral Researcher with the Department of Computer Science, National Chiao Tung University. His current research interests include machine learning, natural-language processing, and data mining.

Gen-Chi Lu received the Master’s degree in computer science from National Chiao Tung University, Hsinchu, Taiwan, in 2009. He is currently an Engineer with the Global Legal Division iTEC, Hon Hai Precision Industry Company Ltd., Taipei, Taiwan. His current research interests include natural-language processing, opinion mining, and full-text search.

Wen-Hoar Hsaio received the B.S. degree from the Department of Computer Science and Information Engineering, Chung Cheng Institute of Technology, National Defense University, Taipei, Taiwan, in 1980 and the M.S. degree in 1996 from the Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan, where he is currently working toward the Ph.D. degree with the Department of Computer Science. His current research interests include information retrieval, web mining, and machine learning.

Emery Jou received the B.S. degree in physics from Tsing Hua University, Hsinchu, Taiwan, the M.S. degree in computer science from the University of Texas at Austin, and the Ph.D. degree in computer science from the University of Maryland, College Park. He is currently a Research Scientist with the Institute for Information Industry, Taipei, Taiwan. He was with several Wall Street firms in the United States for more than 12 years (i.e., Morgan Stanley and JPMorganChase) as a System Architect for Security Transaction Processing through Single Sign-on and Public Key Infrastructure. He was also with Thales nCipher, Cambridge, U.K., where he was engaged in Tape Storage Data Encryption and Key Management Systems. In 2009, he was a Visiting Professor with the College of Computer Science, National Chiao Tung University, Hsinchu. He was also a consultant for the Industrial Technology Research Institute, Hsinchu.
