Open Domain Event Extraction from Twitter

Topics: Named entity recognition, Twitter, Natural language processing Pages: 26 (7610 words) Published: April 22, 2013
Alan Ritter
University of Washington Computer Sci. & Eng. Seattle, WA

University of Washington Computer Sci. & Eng. Seattle, WA

Oren Etzioni
University of Washington Computer Sci. & Eng. Seattle, WA

Sam Clark∗
Decide, Inc. Seattle, WA

Tweets are the most up-to-date and inclusive stream of information and commentary on current events, but they are also fragmented and noisy, motivating the need for systems that can extract, aggregate and categorize important events. Previous work on extracting structured representations of events has focused largely on newswire text; Twitter's unique characteristics present new challenges and opportunities for open-domain event extraction. This paper describes TwiCal, the first open-domain event-extraction and categorization system for Twitter. We demonstrate that accurately extracting an open-domain calendar of significant events from Twitter is indeed feasible. In addition, we present a novel approach, based on latent variable models, for discovering important event categories and classifying extracted events. By leveraging large volumes of unlabeled data, our approach achieves a 14% increase in maximum F1 over a supervised baseline. A continuously updating demonstration of our system can be viewed online; our NLP tools are available at twitter_nlp.

Entity        Event Phrase    Date       Type
Steve Jobs    died            10/6/11    Death
iPhone        announcement    10/4/11    ProductLaunch
GOP           debate          9/7/11     PoliticalEvent
Amanda Knox   verdict         10/3/11    Trial

Table 1: Examples of events extracted by TwiCal.

The number of tweets posted daily has recently exceeded two hundred million, many of which are either redundant [57] or of limited interest, leading to information overload.1 Clearly, we can benefit from more structured representations of events that are synthesized from individual tweets.

Previous work in event extraction [21, 1, 54, 18, 43, 11, 7] has focused largely on news articles, as historically this genre of text has been the best source of information on current events. In the meantime, social networking sites such as Facebook and Twitter have become an important complementary source of such information. While status messages contain a wealth of useful information, they are very disorganized, motivating the need for automatic extraction, aggregation and categorization. Although there has been much interest in tracking trends or memes in social media [26, 29], little work has addressed the challenges arising from extracting structured representations of events from short or informal texts.

Extracting useful structured representations of events from this disorganized corpus of noisy text is a challenging problem. On the other hand, individual tweets are short and self-contained, and therefore lack the complex discourse structure found in texts containing narratives. In this paper we demonstrate that open-domain event extraction from Twitter is indeed feasible; for example, our highest-confidence extracted future events are 90% accurate, as demonstrated in §8.

Twitter has several characteristics that present unique challenges and opportunities for the task of open-domain event extraction.

Challenges: Twitter users frequently mention mundane events in their daily lives (such as what they ate for lunch) which are only of interest to their immediate social network. In contrast, if an event is mentioned in newswire text, it

1 200-million-tweets-per-day.html
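To make the structured representation in Table 1 concrete, the following is a minimal sketch of how its four rows could be held as records and grouped into a calendar. It is an illustration only, not TwiCal's implementation: the names `Event`, `parse_row`, and `build_calendar` are hypothetical, and the dates are assumed to be written as month/day/two-digit year.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Event:
    """One extracted event: the four columns of Table 1 (hypothetical type)."""
    entity: str
    event_phrase: str
    day: date
    event_type: str


def parse_row(entity, event_phrase, mdy, event_type):
    """Build an Event from one table row; assumes month/day/2-digit-year dates."""
    day = datetime.strptime(mdy, "%m/%d/%y").date()
    return Event(entity, event_phrase, day, event_type)


def build_calendar(events):
    """Group events by calendar day and return the days in chronological order."""
    by_day = defaultdict(list)
    for ev in events:
        by_day[ev.day].append(ev)
    return dict(sorted(by_day.items()))


# The four example rows from Table 1.
ROWS = [
    ("Steve Jobs", "died", "10/6/11", "Death"),
    ("iPhone", "announcement", "10/4/11", "ProductLaunch"),
    ("GOP", "debate", "9/7/11", "PoliticalEvent"),
    ("Amanda Knox", "verdict", "10/3/11", "Trial"),
]

EVENTS = [parse_row(*row) for row in ROWS]
CALENDAR = build_calendar(EVENTS)

if __name__ == "__main__":
    for day, day_events in CALENDAR.items():
        for ev in day_events:
            print(day.isoformat(), ev.entity, ev.event_phrase, ev.event_type)
```

Keying the calendar by date reflects the idea of aggregating many tweets about the same event under a single calendar entry; a real system would also need to merge near-duplicate mentions, which this sketch does not attempt.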

Categories and Subject Descriptors
I.2.7 [Natural Language Processing]: Language parsing and understanding; H.2.8 [Database Management]: Database applications—data mining

General Terms
Algorithms, Experimentation

Social networking sites such as Facebook and...