GATE JAPE Grammar Tutorial
Version 1.0
Dhaval Thakker, PA Photos, UK
Taha Osman, Nottingham Trent University, UK
Phil Lakin, PA Photos, UK

February 27, 2009

Table of Contents
Introduction to General Architecture of Text Engineering (GATE)
JAPE Rules
Example 1. A simple example to decide the category of sports
Example 2. Multiple patterns in JAPE grammar
Example 3. Nested Patterns
Example 4. Using Part of Speech (POS) features to extract entities
Example 5. Priority in JAPE rules
Example 6. Handling repetitiveness in patterns using Macro
Example 7. Using negation operator in JAPE
Example 8. Using JAVA in RHS of JAPE Grammar
Example 9. Using a common file as a holder of application specific JAPE grammar files
Example 10. Using JAVA in RHS of JAPE: A complex example
Example 11. Using Split to control the application of a rule to a single sentence
Example 12. Co referencing
Example 13. Creating Temporary annotations and then deleting at the end when it is no longer useful
Example 14. Creating new entities to use in the JAPE grammar
Bibliography

Conventions used in this tutorial:

Meaning of a term

Tips on how to do things
Worth Remembering or referring back

To improve readability, important terms are written in Times New Roman italic, and examples are written in Courier New italic, as in the following:
Phase: firstpass    //1
Input: Lookup       //2

This tutorial includes a number of examples that are referred to in the text. The example files are in the tutorial folder.

To follow this tutorial you need GATE 5.0 or a later version, which is available from the GATE website.

Introduction to General Architecture of Text Engineering (GATE)
This tutorial assumes a basic understanding of the GATE system and the Java programming language. More specifically, readers should be familiar with the GATE interface and with the ANNIE (A Nearly-New IE system) processing resources: the sentence splitter, tokeniser, POS tagger, gazetteer and JAPE transducer. These components are introduced briefly below with respect to their use in this tutorial.

Figure 1 Typical GATE system components (ANNIE: English Tokenizer, English Sentence Splitter, POS Tagger, Gazetteer, JAPE Transducer)

The tokeniser splits the text into very simple tokens such as numbers, punctuation and words of different types. For example, GATE distinguishes between words in uppercase and lowercase, and between certain types of punctuation. The aim is to limit the work of the tokeniser to maximise efficiency, and enable greater flexibility by placing the burden on the grammar rules, which are more adaptable.
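
The idea of keeping tokens "very simple" can be illustrated with a small stand-alone sketch. This is not GATE's tokeniser or its API; it is a toy regular-expression tokeniser written for this illustration, and the function names and kind labels (number, punctuation, lowercase, and so on) are assumptions. It only shows the sort of coarse distinctions described above, leaving anything cleverer to later rules.

```python
import re
from typing import List, Tuple

# Toy token classes, chosen for illustration only (not GATE's own rules).
TOKEN_PATTERN = re.compile(
    r"(?P<number>\d+)"               # runs of digits
    r"|(?P<word>[^\W\d_]+)"          # runs of letters
    r"|(?P<punctuation>[^\w\s]|_)"   # any single non-space symbol
)


def classify_word(word: str) -> str:
    """Label a word by its capitalisation pattern (hypothetical labels)."""
    if word.isupper():
        return "allCaps"
    if word.islower():
        return "lowercase"
    if word[0].isupper() and word[1:].islower():
        return "upperInitial"
    return "mixedCaps"


def tokenise(text: str) -> List[Tuple[str, str]]:
    """Split text into (token, kind) pairs; whitespace is skipped."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(text):
        kind = match.lastgroup  # name of the alternative that matched
        value = match.group()
        if kind == "word":
            kind = classify_word(value)
        tokens.append((value, kind))
    return tokens


if __name__ == "__main__":
    for token, kind in tokenise("Arsenal won 2 games, OK"):
        print(f"{token!r}: {kind}")
```

Note that the sketch makes no attempt to recognise names, dates or scores. In the spirit of the paragraph above, that work would be left to grammar rules operating over these simple tokens.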

The sentence splitter is a cascade of finite-state transducers which segments the text into sentences. This module is required for the POS tagger and other modules.
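
To give a feel for what sentence segmentation produces, the following is a deliberately naive sketch. It is not a cascade of finite-state transducers and not GATE's implementation; it is a single regular expression, assumed here purely for illustration, that ends a sentence at ".", "!" or "?" when followed by whitespace and a capital letter, or by the end of the text. The function name and the (start, end) offset output are likewise assumptions.

```python
import re
from typing import List, Tuple

# Naive rule (an assumption for this sketch): sentence-final punctuation
# followed by whitespace plus a capital letter, or by the end of the text.
SENTENCE_END = re.compile(r"[.!?]+(?=\s+[A-Z]|\s*$)")


def split_sentences(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets, one pair per sentence."""
    spans = []
    start = 0
    for match in SENTENCE_END.finditer(text):
        end = match.end()
        spans.append((start, end))
        # Move past the whitespace separating this sentence from the next.
        start = end
        while start < len(text) and text[start].isspace():
            start += 1
    if start < len(text):
        # Trailing text without final punctuation is kept as a sentence.
        spans.append((start, len(text)))
    return spans


if __name__ == "__main__":
    sample = "Arsenal won. United lost! Who cares"
    for begin, finish in split_sentences(sample):
        print(begin, finish, repr(sample[begin:finish]))
```

A rule this simple will wrongly split after abbreviations such as "Dr." when a capitalised word follows, which is one reason a real splitter needs more than a single pattern.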

In general terms, a transducer is a device that converts one type of energy to another for various purposes...