Database Relationship

Published: April 22, 2013
Linking Named Entities to Any Database
Avirup Sil∗, Temple University, Philadelphia, PA, avi@temple.edu
Yinfei Yang, St. Joseph’s University, Philadelphia, PA, yangyin7@gmail.com
Ernest Cronin∗, St. Joseph’s University, Philadelphia, PA, ernest.cronin@gmail.com
Penghai Nie, St. Joseph’s University, Philadelphia, PA, nph87903@gmail.com
Ana-Maria Popescu, Yahoo! Labs, Sunnyvale, CA, amp@yahoo-inc.com
Alexander Yates, Temple University, Philadelphia, PA, yates@temple.edu

Abstract

Existing techniques for disambiguating named entities in text mostly focus on Wikipedia as a target catalog of entities. Yet for many types of entities, such as restaurants and cult movies, relational databases exist that contain far more extensive information than Wikipedia. This paper introduces a new task, called Open-Database Named-Entity Disambiguation (Open-DB NED), in which a system must be able to resolve named entities to symbols in an arbitrary database, without requiring labeled data for each new database. We introduce two techniques for Open-DB NED, one based on distant supervision and the other based on domain adaptation. In experiments on two domains, one with poor coverage by Wikipedia and the other with near-perfect coverage, our Open-DB NED strategies outperform a state-of-the-art Wikipedia NED system by over 25% in accuracy.
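The abstract defines the task but not how a system resolves a mention against an arbitrary table. The following is a minimal Python sketch of that interface, offered only as a hypothetical baseline and not as the authors' method. The `Record` type, the exact-name candidate lookup, and the attribute-overlap score are all illustrative assumptions. The sketch returns `None` as a NIL answer when no candidate row has any support in the context.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    """One row of an arbitrary relational table (hypothetical schema)."""
    record_id: str
    name: str
    attributes: dict[str, str] = field(default_factory=dict)


def candidates(mention: str, table: list[Record]) -> list[Record]:
    """Candidate referents: rows whose name equals the mention, ignoring case."""
    m = mention.lower()
    return [r for r in table if r.name.lower() == m]


def score(record: Record, context: str) -> int:
    """Count the record's attribute values that occur (as substrings) in the context."""
    ctx = context.lower()
    return sum(1 for v in record.attributes.values() if v.lower() in ctx)


def link(mention: str, context: str, table: list[Record]) -> Optional[str]:
    """Return the id of the best-scoring candidate, or None (NIL) if none scores above 0.

    Ties go to the first candidate in table order.
    """
    best_id, best_score = None, 0
    for r in candidates(mention, table):
        s = score(r, context)
        if s > best_score:
            best_id, best_score = r.record_id, s
    return best_id


# Usage with a made-up restaurant table containing two rows with the same name.
restaurants = [
    Record("r1", "Joe's Diner", {"city": "Philadelphia", "cuisine": "American"}),
    Record("r2", "Joe's Diner", {"city": "Sunnyvale", "cuisine": "Mexican"}),
]
print(link("Joe's Diner", "Great tacos at Joe's Diner in Sunnyvale.", restaurants))  # → r2
```

Substring matching is deliberately crude here. A learned system would replace `score` with a trained classifier over features of this kind, and that learning step is where the two strategies described in the paper come in.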

1 Introduction

Named-entity disambiguation (NED) is the task of linking names mentioned in text to an established catalog of entities (Bunescu and Pasca, 2006; Ratinov et al., 2011). It is a vital first step for semantic understanding of text, as in grounded semantic parsing (Kwiatkowski et al., 2011), and for information retrieval tasks like person name search (Chen and Martin, 2007; Mann and Yarowsky, 2003). NED requires a catalog of symbols, called referents, to which named entities are resolved. Most NED systems today use Wikipedia as the catalog of referents, but focusing exclusively on Wikipedia as a target for NED systems has significant drawbacks: despite its breadth, Wikipedia still does not contain all, or even most, real-world entities mentioned in text. For example, it covers poorly the entities that matter mainly within a small geographic region, such as hotels and restaurants, even though these are widely discussed on the Web. In the Text Analysis Conference’s (TAC) 2009 entity linking task, 57% of the named entities refer to an entity that does not appear in Wikipedia (McNamee et al., 2009). Wikipedia is clearly a highly valuable resource, but it should not be treated as the only one.

Instead of relying solely on Wikipedia, we propose a novel approach to NED, which we call Open-DB NED: the task is to resolve an entity to Wikipedia or to any relational database that meets mild conditions on the format of its data, described below. Leveraging structured, relational data should allow systems to achieve strong accuracy, as domain-specific or database-specific NED techniques do, such as Hoffart et al.’s NED system for YAGO (Hoffart et al., 2011). And because huge numbers of databases, many for specialized domains, are available on the Web, a successful system for this task will cover entities that a Wikipedia NED or database-specific system cannot.

We investigate two complementary learning strategies for Open-DB NED, both of which significantly relax the assumptions of traditional NED systems. The first strategy, a distant supervision approach, uses the relational information in a given database and a large corpus of unlabeled text to learn a database-specific model. The second strategy, a domain adaptation approach, assumes a single source database that has accompanying labeled data. Classifiers in this setting must learn a model that transfers from the source database to any new database, without requiring new training data for the new database. Experiments show that both strategies outperform a state-of-the-art Wikipedia NED system by wide margins without requiring any labeled...
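The distant supervision strategy derives training signal from a database plus unlabeled text rather than from hand annotation. This preview does not spell out the authors' labeling procedure. The Python sketch below therefore uses a generic, hypothetical heuristic: a sentence that mentions an ambiguous name is labeled only when exactly one candidate row has one of its other attribute values in that sentence. The `Row` type, `distant_labels`, and the heuristic itself are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """A database row: its id, its name, and its other attribute values (hypothetical)."""
    row_id: str
    name: str
    values: tuple[str, ...]


def distant_labels(sentences: list[str], table: list[Row]) -> list[tuple[str, str, str]]:
    """Produce (sentence, mention, row_id) training triples without hand labeling.

    A sentence containing a row name is labeled only when exactly one row with
    that name has at least one attribute value appearing in the sentence;
    unsupported or ambiguous sentences are skipped.
    """
    # Group rows by lower-cased name so that same-named entities compete.
    by_name: dict[str, list[Row]] = {}
    for row in table:
        by_name.setdefault(row.name.lower(), []).append(row)

    labels = []
    for sent in sentences:
        low = sent.lower()
        for name, rows in by_name.items():
            if name not in low:
                continue
            supported = [r for r in rows if any(v.lower() in low for v in r.values)]
            if len(supported) == 1:
                labels.append((sent, supported[0].name, supported[0].row_id))
    return labels


# Usage with a made-up table holding two restaurants that share a name.
table = [
    Row("r1", "Blue Moon", ("Philadelphia", "Italian")),
    Row("r2", "Blue Moon", ("Sunnyvale", "Thai")),
]
sentences = [
    "Blue Moon in Philadelphia serves great brunch.",          # one supported row -> labeled
    "Blue Moon was packed tonight.",                           # no evidence -> skipped
    "Blue Moon has branches in Philadelphia and Sunnyvale.",   # ambiguous -> skipped
]
print(distant_labels(sentences, table))
# → [('Blue Moon in Philadelphia serves great brunch.', 'Blue Moon', 'r1')]
```

The resulting triples could then train a database-specific classifier. Skipping ambiguous sentences trades recall for cleaner automatically generated labels.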