Abstract
A great deal of work has been done demonstrating the ability of machine learning algorithms to automatically extract linguistic knowledge from annotated corpora. Very little work has gone into quantifying the difference in ability at this task between a person and a machine. This paper is a first step in that direction.
1 Introduction
Machine learning has been very successful at solving many problems in the field of natural language processing. It has been amply demonstrated that a wide assortment of machine learning algorithms are quite effective at extracting linguistic information from manually annotated corpora. Among the machine learning algorithms studied, rule-based systems have proven effective on many natural language processing tasks, including part-of-speech tagging (Brill, 1995; Ramshaw and Marcus, 1994), spelling correction (Mangu and Brill, 1997), word-sense disambiguation (Gale et al., 1992), message understanding (Day et al., 1997), discourse tagging (Samuel et al., 1998), accent restoration (Yarowsky, 1994), prepositional-phrase attachment (Brill and Resnik, 1994) and base noun phrase identification (Ramshaw and Marcus, In Press; Cardie and Pierce, 1998; Veenstra, 1998; Argamon et al., 1998). Many of these rule-based systems learn a short list of simple rules (typically on the order of 50-300) which are easily understood by humans. That these systems achieve good performance while learning a small list of simple rules raises the question of whether people could also derive an effective rule list manually from an annotated corpus.
In this paper we explore how quickly and effectively relatively untrained people can extract linguistic generalities from a corpus as compared to a machine. There are a number of reasons for doing this. We would like to understand the relative strengths and weaknesses of humans versus machines in hopes of marrying their complementary strengths to create even more accurate systems. Also, since people can use their metaknowledge to generalize from a small number of examples, it is possible that a person could derive effective linguistic knowledge from a much smaller training corpus than that needed by a machine. A person could also potentially learn more powerful representations than a machine, thereby achieving higher accuracy.
In this paper we describe experiments we performed to ascertain how well humans, given an annotated training set, can generate rules for base noun phrase chunking. Much previous work has been done on this problem and many different methods have been used: Church's PARTS (1988) program uses a Markov model; Bourigault (1992) uses heuristics along with a grammar; Voutilainen's NPTool (1993) uses a lexicon combined with a constraint grammar; Justeson and Katz (1995) use repeated phrases; Veenstra (1998), Argamon, Dagan & Krymolowski (1998) and Daelemans, van den Bosch & Zavrel (1999) use memory-based systems; Ramshaw & Marcus (In Press) and Cardie & Pierce (1998) use rule-based systems.
2 Learning Base Noun Phrases by Machine
We used the base noun phrase system of Ramshaw and Marcus (R&M) as the machine learning system with which to compare the human learners. It is difficult to compare different machine learning approaches to base NP annotation, since different definitions of base NP are used in many of the papers, but the R&M system is the best of those that have been tested on the Penn Treebank. To train their system, R&M used a 200k-word chunk of the Penn Treebank Parsed Wall Street Journal (Marcus et al., 1993) tagged using a transformation-based tagger (Brill, 1995) and extracted base noun phrases from its parses by selecting noun phrases...