Lexical Analyzer

Published: April 20, 2013
Lexical Analysis

Made with OpenOffice.org


The role of the lexical analyzer:
It is the first phase of a compiler. Its main task is to read the input characters & produce as output a sequence of tokens that the parser uses for syntax analysis. Upon receiving a “get next token” command from the parser, the lexical analyzer reads input characters until it can identify the next token. It may also perform certain secondary tasks at the user interface. One such task is stripping comments & white space (blank, tab, & new-line characters) out of the source program. Another is correlating error messages from the compiler with the source program.
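The on-demand interaction between parser and lexical analyzer can be sketched in Python as follows. This is a minimal illustration, not a complete scanner: the class name `Lexer`, the method `next_token`, the three token kinds (`id`, `num`, `punct`) and the use of Pascal-style `{ ... }` comments are all assumptions made for the example. Each call reads only as many characters as it needs to identify one token, and blanks, tabs, new-lines and comments are stripped before any token is formed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    kind: str    # token name handed to the parser: "id", "num" or "punct"
    lexeme: str  # the characters that were matched


class Lexer:
    """Hypothetical on-demand lexer: the parser calls next_token() whenever it needs a token."""

    def __init__(self, source: str) -> None:
        self.src = source
        self.pos = 0

    def _skip_blanks_and_comments(self) -> None:
        # Secondary task: strip blanks, tabs, new-lines and {...} comments
        # (Pascal-style comments are an assumption of this sketch).
        while self.pos < len(self.src):
            c = self.src[self.pos]
            if c in " \t\n":
                self.pos += 1
            elif c == "{":
                end = self.src.find("}", self.pos)
                self.pos = len(self.src) if end == -1 else end + 1
            else:
                break

    def next_token(self) -> Optional[Token]:
        """Read characters until the next token is identified; None at end of input."""
        self._skip_blanks_and_comments()
        if self.pos >= len(self.src):
            return None
        start = self.pos
        c = self.src[start]
        if c.isalpha():
            while self.pos < len(self.src) and self.src[self.pos].isalnum():
                self.pos += 1
            return Token("id", self.src[start:self.pos])
        if c.isdigit():
            while self.pos < len(self.src) and self.src[self.pos].isdigit():
                self.pos += 1
            return Token("num", self.src[start:self.pos])
        self.pos += 1  # any other single character is returned as punctuation
        return Token("punct", c)


if __name__ == "__main__":
    lexer = Lexer("count + 1 { increment }\n\ty")
    for tok in iter(lexer.next_token, None):
        print(tok.kind, tok.lexeme)
```

The parser never sees the comment or the tab and new-line: it only receives `count`, `+`, `1` and `y`, one per request.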



The role of the lexical analyzer (cont…):
It may keep track of the number of new-line characters seen, so that a line number can be associated with an error message. In some compilers, the lexical analyzer is in charge of making a copy of the source program with the error messages marked in it.
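Both tasks can be illustrated with a small Python sketch. The class `LineTrackingScanner`, the function `listing`, the `****  error:` marker format and the rule that `@` is an illegal character are hypothetical choices for the example. The scanner increments a line counter every time it consumes a new-line, records each error together with the current line number, and `listing` produces a copy of the source with each message printed under the line it refers to.

```python
from typing import List, Tuple


class LineTrackingScanner:
    """Sketch of a scanner that counts new-lines so errors carry a line number."""

    def __init__(self, source: str) -> None:
        self.src = source
        self.pos = 0
        self.line = 1                             # current line number, 1-based
        self.errors: List[Tuple[int, str]] = []   # (line, message) pairs

    def advance(self) -> str:
        c = self.src[self.pos]
        self.pos += 1
        if c == "\n":
            self.line += 1
        return c

    def scan(self) -> None:
        # Hypothetical rule for illustration: '@' is not a legal character.
        while self.pos < len(self.src):
            c = self.advance()
            if c == "@":
                self.errors.append((self.line, f"illegal character {c!r}"))


def listing(source: str, errors: List[Tuple[int, str]]) -> str:
    """Copy of the source program with each error message marked under its line."""
    out = []
    for n, text in enumerate(source.split("\n"), start=1):
        out.append(f"{n:4}  {text}")
        out.extend(f"****  error: {msg}" for line, msg in errors if line == n)
    return "\n".join(out)


if __name__ == "__main__":
    src = "a := 1;\nb := @;\nc := 2;"
    scanner = LineTrackingScanner(src)
    scanner.scan()
    print(listing(src, scanner.errors))
```

For the three-line input above, the error is recorded on line 2 and the listing shows the message directly beneath `b := @;`.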



Issues in lexical analysis:
There are several reasons for separating the analysis phase of compiling into linear analysis (lexical analysis) and hierarchical analysis (syntax analysis). Simpler design is perhaps the most important consideration. The separation of lexical analysis from syntax analysis often allows us to simplify one or the other of these phases; for example, a parser that can assume comments & white space have already been removed is simpler than one that must handle them itself. Compiler efficiency is improved. A separate lexical analyzer allows us to construct a specialized & potentially more efficient processor for the task. A large amount of time is spent reading the source program & partitioning it into tokens, so specialized buffering techniques for reading input characters & processing tokens can significantly speed up a compiler. Compiler portability is enhanced. Input alphabet peculiarities & other device-specific anomalies can be restricted to the lexical analyzer. The representation of special or non-standard symbols, such as ↑ in Pascal, can be isolated in the lexical analyzer.
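One well-known buffering technique is the buffer pair with sentinels: the input is read in fixed-size halves, and each half is terminated by a special sentinel character, so that in the common case the scanner makes only one test per character instead of separately checking for the end of the buffer. The Python sketch below is a simplified illustration under stated assumptions: the class name `InputBuffer`, the NUL character as sentinel (assumed never to occur in source text) and the default half size are hypothetical, and it assumes `read(n)` returns fewer than `n` characters only at end of input, as `io.StringIO` does.

```python
import io

SENTINEL = "\0"  # assumption: the NUL character never occurs in source text


class InputBuffer:
    """Simplified buffer pair with sentinels: two halves of n characters,
    each terminated by SENTINEL, reloaded alternately from the stream."""

    def __init__(self, stream: io.TextIOBase, half_size: int = 4096) -> None:
        if half_size < 1:
            raise ValueError("half_size must be positive")
        self.stream = stream
        self.n = half_size
        self.halves = [self._load(), ""]
        self.cur = 0   # which half is being read
        self.pos = 0   # forward pointer within that half

    def _load(self) -> str:
        # Read up to n characters and mark the end of the half with the sentinel.
        return self.stream.read(self.n) + SENTINEL

    def next_char(self) -> str:
        """Return the next input character, or "" at end of input."""
        while True:
            c = self.halves[self.cur][self.pos]
            if c != SENTINEL:          # common case: one test per character
                self.pos += 1
                return c
            if self.pos < self.n:      # sentinel before the end of a half: real end of input
                return ""
            self.cur = 1 - self.cur    # end of a full half: reload the other half
            self.halves[self.cur] = self._load()
            self.pos = 0


if __name__ == "__main__":
    buf = InputBuffer(io.StringIO("const pi = 3.1416"), half_size=4)
    print("".join(iter(buf.next_char, "")))
```

A full scanner would also keep a lexeme-begin pointer that may still lie in the older half while the forward pointer moves into the newer one; that is the reason for having two halves, and this sketch omits it to stay short.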



Tokens, Patterns, Lexemes:
There is a set of strings in the input for which the same token is produced as output. This set of strings is described by a rule called a pattern associated with the token. The pattern is said to match each string in the set. A lexeme is a sequence of characters in the source program that is matched by the pattern for a token.
Ex: In the Pascal declaration const pi = 3.1416, the substring pi is a lexeme for the token “identifier”, & the substring const is a lexeme for the token “const”.
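The relationship between tokens, patterns and lexemes can be made concrete with regular expressions. In this Python sketch, each token name is paired with a pattern, and the function reports which lexeme each pattern matched in the example declaration. The pattern set, the token names and the function name `lexemes` are assumptions for illustration; the sketch reports `=` as `relation` because it is one of Pascal's relational operators, and it treats white space as matched by a pattern but producing no token.

```python
import re
from typing import List, Tuple

# Hypothetical patterns for the example; one named group per token.
# Order matters: Python's re tries alternatives left to right, so the
# keyword "const" is listed before the general identifier pattern.
PATTERNS = [
    ("const", r"const\b"),
    ("identifier", r"[A-Za-z][A-Za-z0-9]*"),
    ("number", r"\d+(?:\.\d+)?"),
    ("relation", r"<=|<>|>=|<|=|>"),
    ("ws", r"[ \t\n]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in PATTERNS))


def lexemes(text: str) -> List[Tuple[str, str]]:
    """Return (token, lexeme) pairs; white space matches a pattern but yields no token."""
    result = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"no pattern matches at position {pos}")
        if m.lastgroup != "ws":
            result.append((m.lastgroup, m.group()))
        pos = m.end()
    return result


if __name__ == "__main__":
    print(lexemes("const pi = 3.1416"))
```

The `\b` after `const` keeps a longer identifier such as `constant` from being split into the keyword plus a remainder; it is matched as a single identifier instead.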



Tokens, Patterns, Lexemes (cont…):
Tokens are treated as terminal symbols in the grammar for the source language; boldface names are used to represent tokens. The lexemes matched by the pattern for a token represent strings of characters in the source program that can be treated as a lexical unit. In most programming languages, the following constructs are treated as tokens: keywords, operators, identifiers, constants, literal strings, & punctuation symbols such as parentheses, commas, & semicolons. A pattern is a rule describing the set of lexemes that can represent a particular token in source programs.
Ex: The pattern for the token relation is the set of all six Pascal relational operators: <, <=, =, <>, >, >=.
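A pattern such as the one for relation can also be recognized by hand, in the style of a transition diagram. The Python sketch below is one possible implementation; the function name `match_relop` and its return convention (the matched lexeme plus the position after it, or `None`) are assumptions. Since `<` is a prefix of `<=` and `<>`, and `>` is a prefix of `>=`, the function looks one character ahead and prefers the longer operator.

```python
from typing import Optional, Tuple

# The six Pascal relational operators: the set of lexemes for the token relation.
RELOPS = ["<", "<=", "=", "<>", ">", ">="]


def match_relop(text: str, pos: int) -> Optional[Tuple[str, int]]:
    """Recognize a relational operator at text[pos] using the longest match.

    Returns (lexeme, next_pos), or None if no relational operator starts there.
    """
    c = text[pos:pos + 1]
    nxt = text[pos + 1:pos + 2]      # one character of lookahead ("" at end of input)
    if c == "<":
        if nxt in ("=", ">"):        # "<=" or "<>"
            return c + nxt, pos + 2
        return "<", pos + 1
    if c == ">":
        if nxt == "=":               # ">="
            return ">=", pos + 2
        return ">", pos + 1
    if c == "=":
        return "=", pos + 1
    return None


if __name__ == "__main__":
    for op in RELOPS:
        print(op, match_relop(op, 0))
```

Slicing rather than indexing keeps the lookahead safe at the end of the input, where it simply yields an empty string.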



Attributes for Tokens:
When more than one lexeme matches a pattern, the lexical analyzer must provide the subsequent phases of the compiler with additional information about the particular lexeme that matched. For example, the pattern num matches both the strings 0 & 1,...
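A token can therefore be represented as a pair: the token name that the parser uses, plus an attribute carrying the extra information. The Python sketch below is illustrative only. The names `Token`, `SymbolTable` and `make_token` are hypothetical, and using the numeric value as the attribute of num and a symbol-table index as the attribute of an identifier is an assumed (though common) convention, not something stated in the slide above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Token:
    name: str                 # what the parser sees: "num", "id", ...
    attribute: object = None  # extra information for later phases


@dataclass
class SymbolTable:
    names: List[str] = field(default_factory=list)
    index: Dict[str, int] = field(default_factory=dict)

    def insert(self, lexeme: str) -> int:
        """Return the entry number for lexeme, creating the entry on first sight."""
        if lexeme not in self.index:
            self.index[lexeme] = len(self.names)
            self.names.append(lexeme)
        return self.index[lexeme]


def make_token(kind: str, lexeme: str, table: SymbolTable) -> Token:
    if kind == "num":
        # "0" and "1" are both token num; the attribute records which value was read.
        return Token("num", int(lexeme))
    if kind == "id":
        # Assumed convention: an identifier's attribute is its symbol-table entry.
        return Token("id", table.insert(lexeme))
    return Token(kind)  # e.g. punctuation: no attribute needed


if __name__ == "__main__":
    table = SymbolTable()
    for kind, lexeme in [("num", "0"), ("num", "1"), ("id", "rate"), ("id", "rate")]:
        print(make_token(kind, lexeme, table))
```

The parser only inspects `name`, so `0` and `1` look identical to it; the code generator reads `attribute` to recover the actual value, and repeated occurrences of the same identifier share one symbol-table entry.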