Programming Languages
Compiler phases
Prof. Xin Yuan
Overview
Compiler phases
Lexical analysis
Syntax analysis
Semantic analysis
Intermediate (machine-independent) code generation
Intermediate code optimization
Target (machine-dependent) code generation
Target code optimization
COP4020 Spring 2013, 10/2/2013
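A typical compilation process
Source program with macros -> Preprocessor -> Source program -> Compiler -> Target assembly program -> Assembler -> Relocatable machine code -> Linker -> Absolute machine code
Try g++ with the -v, -E, and -S flags on linprog: -v prints the commands run at each stage, -E stops after preprocessing, and -S stops after compilation and leaves the assembly file.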
What is a compiler?
A program that reads a program written in one language
(source language) and translates it into an equivalent
program in another language (target language).
Two components
Understand the program (make sure it is correct)
Rewrite the program in the target language.
Traditionally, the source language is a high level language
and the target language is a low level language (machine
code).
Source program -> Compiler -> Target program
The compiler also produces error messages.
Compilation Phases and Passes
Compilation of a program proceeds through a fixed
series of phases
Each phase uses an (intermediate) form of the program produced by an earlier phase
Subsequent phases operate on lower-level code representations
Each phase may consist of a number of passes over the
program representation
Pascal, FORTRAN, and C were designed for one-pass compilation, which explains the need for function prototypes
Single-pass compilers need less memory to operate
Java and Ada are multi-pass
Compiler Front- and Back-end
Front end (analysis)
Source program (character stream) -> Scanner (lexical analysis) -> Tokens -> Parser (syntax analysis) -> Parse tree -> Semantic Analysis and Intermediate Code Generation -> Abstract syntax tree or other intermediate form
Back end (synthesis)
Abstract syntax tree or other intermediate form -> Machine-Independent Code Improvement -> Modified intermediate form -> Target Code Generation -> Assembly or object code -> Machine-Specific Code Improvement -> Modified assembly or object code
Scanner: Lexical Analysis
Lexical analysis breaks up a program into tokens
Grouping characters into inseparable units (tokens)
Changing a stream of characters into a stream of tokens
program gcd (input, output);
var i, j : integer;
begin
  read (i, j);
  while i <> j do
    if i > j then i := i - j else j := j - i;
  writeln (i)
end.
program  gcd  (  input  ,  output  )  ;
var  i  ,  j  :  integer  ;  begin
read  (  i  ,  j  )  ;  while
i  <>  j  do  if  i  >  j
then  i  :=  i  -  j  else  j
:=  j  -  i  ;  writeln  (  i
)  end  .
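The grouping of characters into tokens can be sketched with a small regular-expression scanner. This is only an illustration in Python, not the scanner of any real Pascal compiler: the token classes, the names TOKEN_SPEC and tokenize, and the choice to treat keywords as ordinary identifiers are all assumptions made for the example.

```python
import re

# Hypothetical token classes for the small Pascal subset used by the gcd program.
# Order matters: OP is tried before PUNCT so that ":=" is not split into ":" and "=".
TOKEN_SPEC = [
    ("SKIP",   r"\s+"),                    # whitespace separates tokens but is not one
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z][A-Za-z0-9]*"),   # identifiers and keywords alike
    ("OP",     r":=|<>|<=|>=|[-+*/<>=]"),
    ("PUNCT",  r"[();,:.]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return the token strings found in source, in order."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # No token class can start with this character: a lexical error.
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append(match.group())
        pos = match.end()
    return tokens

GCD_SOURCE = """
program gcd (input, output);
var i, j : integer;
begin
  read (i, j);
  while i <> j do
    if i > j then i := i - j else j := j - i;
  writeln (i)
end.
"""

tokens = tokenize(GCD_SOURCE)
print(" ".join(tokens))
```

Listing the multi-character operators first in the OP pattern gives the usual "longest match" behaviour for this small operator set, so `:=` and `<>` each come out as a single token.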
Scanner: Lexical Analysis
What kinds of errors can be reported by the lexical analyzer?
A = b + @3;
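One way to see the answer is a hand-written scanner that records every character that cannot begin a token. The sketch below assumes a hypothetical C-like language in which `@` is not a legal character; the function name scan and the set of accepted operator characters are choices made for this example only.

```python
def scan(source):
    """Split source into tokens and collect characters that cannot start any token."""
    tokens, errors = [], []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1                                   # skip whitespace
        elif ch.isalpha():
            start = i
            while i < len(source) and source[i].isalnum():
                i += 1
            tokens.append(source[start:i])           # identifier
        elif ch.isdigit():
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(source[start:i])           # integer literal
        elif ch in "=+-*/;()":
            tokens.append(ch)                        # single-character operator
            i += 1
        else:
            errors.append((i, ch))                   # lexical error: illegal character
            i += 1                                   # skip it and keep scanning
    return tokens, errors

tokens, errors = scan("A = b + @3;")
print(tokens)   # ['A', '=', 'b', '+', '3', ';']
print(errors)   # [(8, '@')]
```

A statement such as `A = + b;` passes through this scanner with no errors, because every character belongs to some token; the misplaced `+` is a grammatical problem and is left for the parser to report.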
Parser: Syntax Analysis
Checks whether the token stream meets the
grammatical specification of the language and
generates the syntax tree.
A syntax error is produced by the compiler when the program
does not meet the grammatical specification.
For a grammatically correct program, this phase generates an
internal representation that is easy to manipulate in later phases
Typically a syntax tree (also called a parse tree).
The grammar of a programming language is typically
described by a context-free grammar, which also defines
the structure of the parse tree.
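To make the link between a grammar and the tree it produces concrete, here is a minimal recursive-descent parser in Python. The three-rule expression grammar, the Parser class, and the nested-tuple tree format are all assumptions chosen for illustration; they are not taken from the course material.

```python
import re

# Hypothetical grammar used for this sketch:
#   expr   -> term   { ("+" | "-") term }
#   term   -> factor { ("*" | "/") factor }
#   factor -> NUMBER | IDENT | "(" expr ")"

def tokenize(text):
    # Identifiers, integers, and single-character operators or parentheses.
    # Other characters are silently skipped here to keep the sketch short.
    return re.findall(r"[A-Za-z_]\w*|\d+|[-+*/()]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token[0].isalnum() or token[0] == "_":
            return token                    # a number or an identifier is a leaf
        raise SyntaxError(f"unexpected token {token!r}")

def parse_expression(text):
    return Parser(tokenize(text)).parse()

print(parse_expression("a + b * 2"))   # ('+', 'a', ('*', 'b', '2'))
```

Each grammar rule becomes one method, so the shape of the tree follows the grammar directly: `*` binds tighter than `+` only because term sits below expr. An input such as `a + * b` raises SyntaxError, which is the kind of error this phase reports.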
Context-Free Grammars
A context-free grammar defines the syntax of a programming
language
The syntax defines the syntactic categories for language constructs
Statements
...