Mobile Based Application

Manual Annotation of Amharic News Items with Part-of-Speech Tags and its Challenges*

Abstract

Since September 2005, the Ethiopian Languages Research Center of Addis Ababa University has been engaged in a project called "The Annotation of Amharic News Documents". The project aimed to manually tag each Amharic word in its context with the most appropriate part of speech. This paper presents the POS tagset developed for annotating the news documents, the problems encountered during tagging, and the procedures followed to tag the documents manually. The major output of the work is a set of 1065 Amharic news documents (constituting 210,000 prosodic words) manually annotated with parts of speech, together with a new tagset for the language derived from the 1065 news items. The outcome of the POS tagging project is expected to contribute substantially to future work in natural language processing of Amharic, including the development of probabilistic part-of-speech taggers (software that uses a lexicon as a component to automatically assign each word an appropriate part of speech, and that serves as a central component of higher-level NLP tools such as parsers), a noun-phrase chunker (a software tool that identifies noun phrases in a text), and work in speech synthesis, speech recognition, information retrieval, word sense disambiguation, corpus analysis and computational lexicography of Amharic.

1. Introduction

In this paper we present a recently completed project of the Ethiopian Languages Research Center on the part-of-speech (POS) tagging of Amharic news items. The project ran for four months, starting in September 2005. POS tagging is the process of assigning a POS or other lexical class marker to each word in a corpus (Jurafsky 2005). The project stems from the recognized lack of basic Amharic


References:

Alemu, A. and Asker, L. 2005. "Web Mining for an Amharic-English Bilingual Corpus", in Proceedings of the 1st International Conference on Web Information Systems and Technologies (WEBIST 2005), Miami.
Baye Yimam. 1987 E.C. yamariñña säwasiw (Amharic Grammar). Addis Ababa: EMPDA.
Demeke, Girma A. (forthcoming). Amharic Word Classes. WCAL 5, August 2006, Addis Ababa University.
Jurafsky, D. and Martin, J. H. 2000. Speech and Language Processing. Prentice Hall.
Mersehazen Wolde Kirkos. 1935 E.C. Amharic Grammar (text in Amharic). Addis Ababa: Artistic Printing Press.
Yacob, D. 1996. System for Ethiopic Representation in ASCII (SERA). http://www.abyssiniacybergateway.net/fidel/.

Addresses of the authors:

Girma A. Demeke
Ethiopian Languages Research Center, Director
Addis Ababa University
Email: girmaad@gmail.com

Mesfin Getachew
Faculty of Informatics, Department of Information Science
Addis Ababa University
Email: mesgetachew@yahoo.com
