Deriving Marketing Intelligence from Online Discussion
Natalie Glance nglance@intelliseek.com
Matthew Hurst mhurst@intelliseek.com
Kamal Nigam knigam@intelliseek.com
Matthew Siegler msiegler@intelliseek.com
Robert Stockton rstockton@intelliseek.com
Takashi Tomokiyo ttomokiyo@intelliseek.com
Intelliseek Applied Research Center, Pittsburgh, PA 15217

ABSTRACT
Weblogs and message boards provide online forums for discussion that record the voice of the public. Woven into this mass of discussion is a wide range of opinion and commentary about consumer products. This presents an opportunity for companies to understand and respond to the consumer by analyzing this unsolicited feedback. Given the volume, format and content of the data, the appropriate approach to understand this data is to use large-scale web and text data mining technologies. This paper argues that applications for mining large volumes of textual data for marketing intelligence should provide two key elements: a suite of powerful mining and visualization technologies and an interactive analysis environment which allows for rapid generation and testing of hypotheses. This paper presents such a system that gathers and annotates online discussion relating to consumer products using a wide variety of state-of-the-art techniques, including crawling, wrapping, search, text classification and computational linguistics. Marketing intelligence is derived through an interactive analysis framework uniquely configured to leverage the connectivity and content of annotated online discussion.

Categories and Subject Descriptors: H.3.3: Information Search and Retrieval
General Terms: Algorithms, Experimentation
Keywords: text mining, content systems, computational linguistics, machine learning, information retrieval

from online public communications. For example, there are message boards devoted to a specific gaming platform, newsgroups centered around a particular make and model of motorcycle, and

