An Easy Understanding of First and Follow Sets

Published: December 10, 2012
Calculating FIRST and FOLLOW sets
S. Kamin
FIRST and FOLLOW sets are used when constructing recursive descent parsers (when the grammar is too complex to construct the parser by inspection). They are also used in the construction of LR(1) parsers, although we are not covering that construction in class. Simply put, for a non-terminal A, FIRST(A) is the set of tokens that can occur as the first token in a string derivable from A, and FOLLOW(A) is the set of tokens that can occur immediately after A in a string derivable from the start symbol of the grammar. The purpose of this document is to formalize these definitions and illustrate how FIRST and FOLLOW sets are calculated.

Definition. A properly pruned syntax tree is a syntax tree in which some (zero or more) nodes have had all of their children removed. A sentential form is the frontier (the leaves, read left to right) of a properly pruned syntax tree whose root is the start symbol.
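A minimal Python sketch of this definition, assuming a hypothetical `Node` class and `frontier` function (neither comes from the handout). A pruned node is identified by its `id()`, and pruning always removes all of its children at once, which is what keeps the tree properly pruned. The sample tree is the one for x+f(y) in the example grammar below, with x, f and y written in place of their Id tokens.

```python
from __future__ import annotations

from dataclasses import dataclass, field

EPSILON = "ε"  # label for a leaf that stands for the empty string


@dataclass
class Node:
    """A syntax-tree node: a grammar symbol and its children (empty for a leaf)."""
    symbol: str
    children: list[Node] = field(default_factory=list)


def frontier(node: Node, pruned: set[int] | frozenset[int] = frozenset()) -> list[str]:
    """Leaves of the tree, read left to right, after pruning.

    `pruned` holds the id() of every node whose children have *all* been
    removed; such a node becomes a leaf and contributes its own symbol.
    A node either keeps every child or loses every child, never some of them.
    """
    if id(node) in pruned or not node.children:
        # An ε leaf contributes nothing to the frontier.
        return [] if node.symbol == EPSILON else [node.symbol]
    leaves: list[str] = []
    for child in node.children:
        leaves.extend(frontier(child, pruned))
    return leaves


# The tree for x+f(y); the leaves x, f and y stand for Id tokens.
left = Node("Expr", [Node("Term", [Node("Primary", [Node("x")])])])
arg = Node("Expr", [Node("Term", [Node("Primary", [Node("y")])])])
call = Node("FunCall", [Node("f"), Node("("),
                        Node("OptArgs", [Node("Args", [arg])]), Node(")")])
right = Node("Term", [Node("Primary", [call])])
root = Node("Expr", [left, Node("+"), right])

if __name__ == "__main__":
    print("".join(frontier(root)))                         # x+f(y)
    print("".join(frontier(root, {id(left), id(right)})))  # Expr+Term
    print("".join(frontier(root, {id(left), id(call)})))   # Expr+FunCall
```

Because `frontier` can only drop a node's children all together, no choice of `pruned` produces Expr+f( from this tree.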

Example
We’ll be using this grammar for examples in this document:
Expr -> Expr + Term | Term
Term -> Term * Primary | Primary
Primary -> FunCall | Id | ( Expr )
FunCall -> Id ( OptArgs )
OptArgs -> ε | Args
Args -> Expr | Args , Expr
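For example, consider the syntax tree for x+f(y) rooted at Expr, in which x, f and y are Id tokens shown by their lexemes:

    Expr
      Expr
        Term
          Primary
            x
      +
      Term
        Primary
          FunCall
            f
            (
            OptArgs
              Args
                Expr
                  Term
                    Primary
                      y
            )
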
We can see that the following are sentential forms: x+f(y), Expr+Term, Expr+FunCall. On the other hand, Expr+f( can be obtained as the frontier of the tree if we remove some of the children of the FunCall node, but it is not a sentential form, because a properly pruned tree must remove either all of a node's children or none of them.

Definition. FOLLOW(A) is the set of all tokens that can appear immediately after A in some sentential form. By convention, the set of tokens is assumed to include “eof”, and each sentence is assumed to end with that token.
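A brute-force Python sketch of this definition (not the usual fixed-point algorithm for FOLLOW): it encodes the example grammar as a dictionary and enumerates sentential forms breadth-first, starting from Expr eof, recording every token that directly follows a non-terminal. The names `GRAMMAR` and `follow_by_enumeration` and the length bound `max_len` are illustrative assumptions. Since the enumeration has to stop somewhere, it only finds tokens that show up in forms of at most `max_len` symbols; 7 suffices for this grammar, because the hardest case, a comma after Expr, first appears in Id ( Expr , Expr ) eof, which has seven symbols.

```python
from collections import deque

# The example grammar. Keys are non-terminals; any other symbol is a token.
# An empty right-hand side [] is the ε-production.
GRAMMAR: dict[str, list[list[str]]] = {
    "Expr": [["Expr", "+", "Term"], ["Term"]],
    "Term": [["Term", "*", "Primary"], ["Primary"]],
    "Primary": [["FunCall"], ["Id"], ["(", "Expr", ")"]],
    "FunCall": [["Id", "(", "OptArgs", ")"]],
    "OptArgs": [[], ["Args"]],
    "Args": [["Expr"], ["Args", ",", "Expr"]],
}


def follow_by_enumeration(grammar: dict[str, list[list[str]]], start: str,
                          max_len: int) -> dict[str, set[str]]:
    """Under-approximate FOLLOW sets directly from the definition.

    Explores, breadth-first, the sentential forms reachable from `start eof`
    without ever exceeding `max_len` symbols, and records each token that
    sits immediately after a non-terminal. Every token found really is in
    FOLLOW; tokens can be missed if the bound is too small.
    """
    follow: dict[str, set[str]] = {a: set() for a in grammar}
    initial = (start, "eof")  # by convention every sentence ends with eof
    seen = {initial}
    queue = deque([initial])
    while queue:
        form = queue.popleft()
        for i, sym in enumerate(form):
            if sym not in grammar:
                continue  # tokens are never rewritten
            after = form[i + 1]  # safe: the last symbol is always eof
            if after not in grammar:
                follow[sym].add(after)
            # Rewrite this occurrence with each alternative for it.
            for rhs in grammar[sym]:
                new_form = form[:i] + tuple(rhs) + form[i + 1:]
                if len(new_form) <= max_len and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
    return follow


if __name__ == "__main__":
    for nt, tokens in follow_by_enumeration(GRAMMAR, "Expr", max_len=7).items():
        print(nt, sorted(tokens))
```

With max_len=7 this reports FOLLOW(Expr) = {+, ), ",", eof}, FOLLOW(Term) = FOLLOW(Primary) = FOLLOW(FunCall) = {+, *, ), ",", eof}, FOLLOW(OptArgs) = {)} and FOLLOW(Args) = {), ","}. Lowering the bound can only lose tokens: with max_len=4 no form contains OptArgs at all, so its set comes back empty.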

Definition. FIRST(A) is the set of all tokens that can be the first token in a string derived from A (that is, in the frontier of a syntax tree rooted at A). For tokens t, we define FIRST(t) to be {t}. For sequences of grammar symbols X1...Xn (for n >= 0), FIRST(X1...Xn) is the set of all tokens that can be the first token in a string derived from X1...Xn; that is, the...
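A minimal Python sketch of the usual fixed-point computation, assuming a dictionary encoding of the example grammar; `GRAMMAR`, `nullable_nonterminals`, `first_sets` and `first_of_sequence` are hypothetical names, not the handout's own procedure (which is cut off here). FIRST sets hold only tokens, as in the definition above, and the non-terminals that can derive the empty string are tracked in a separate nullable set, because FIRST(X1...Xn) has to look past Xi exactly when Xi can vanish.

```python
# The example grammar. Keys are non-terminals; any other symbol is a token.
# An empty right-hand side [] is the ε-production.
GRAMMAR: dict[str, list[list[str]]] = {
    "Expr": [["Expr", "+", "Term"], ["Term"]],
    "Term": [["Term", "*", "Primary"], ["Primary"]],
    "Primary": [["FunCall"], ["Id"], ["(", "Expr", ")"]],
    "FunCall": [["Id", "(", "OptArgs", ")"]],
    "OptArgs": [[], ["Args"]],
    "Args": [["Expr"], ["Args", ",", "Expr"]],
}


def nullable_nonterminals(grammar: dict[str, list[list[str]]]) -> set[str]:
    """Non-terminals that can derive the empty string (fixed-point iteration)."""
    nullable: set[str] = set()
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            # An alternative is nullable if every symbol in it is; tokens never
            # are, and the empty alternative [] trivially is.
            if a not in nullable and any(
                    all(s in nullable for s in rhs) for rhs in alternatives):
                nullable.add(a)
                changed = True
    return nullable


def first_of_sequence(symbols: list[str], grammar: dict[str, list[list[str]]],
                      first: dict[str, set[str]], nullable: set[str]) -> set[str]:
    """FIRST(X1...Xn): tokens that can begin a string derived from the sequence."""
    result: set[str] = set()
    for sym in symbols:
        if sym not in grammar:   # a token t contributes FIRST(t) = {t}
            result.add(sym)
            break
        result |= first[sym]
        if sym not in nullable:  # look further only if this symbol can vanish
            break
    return result                # for n = 0 this is the empty set


def first_sets(grammar: dict[str, list[list[str]]],
               nullable: set[str]) -> dict[str, set[str]]:
    """FIRST(A) for every non-terminal A, iterating until nothing changes."""
    first: dict[str, set[str]] = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_sequence(rhs, grammar, first, nullable)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


if __name__ == "__main__":
    nullable = nullable_nonterminals(GRAMMAR)
    first = first_sets(GRAMMAR, nullable)
    for nt in GRAMMAR:
        print(nt, sorted(first[nt]))
    print(sorted(first_of_sequence(["OptArgs", ")"], GRAMMAR, first, nullable)))
```

For this grammar OptArgs is the only nullable non-terminal; FIRST(Expr) = FIRST(Term) = FIRST(Primary) = FIRST(Args) = FIRST(OptArgs) = {Id, (} and FIRST(FunCall) = {Id}. The sequence OptArgs ) has FIRST set {Id, (, )}: the ) is included because OptArgs can derive ε.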