Wednesday, July 30, 2008

TRANSLATION RULES

a) INFIX TO PREFIX
  1. If E is a variable or a constant, then Prefix(E) = E
  2. If E is E1 op E2, then Prefix(E) = op Prefix(E1) Prefix(E2)
  3. If E is (E1), then Prefix(E) = Prefix(E1)
example:

Prefix( ( A * B ) + ( C / D ) )
= ( + Prefix( A * B ) Prefix( C / D ) )
= ( + ( * Prefix(A) Prefix(B) ) ( / Prefix(C) Prefix(D) ) )
= ( + ( * A B ) ( / C D ) )


b) PREFIX TO INFIX
  1. If E is a variable or a constant, then Infix(E) = E
  2. If E is op E1 E2, then Infix(E) = Infix(E1) op Infix(E2)
  3. If E is (E1), then Infix(E) = Infix(E1)
example:

Infix( + ( * A B ) ( / C D ) )
= ( Infix( * A B ) + Infix( / C D ) )
= ( ( Infix(A) * Infix(B) ) + ( Infix(C) / Infix(D) ) )
= ( ( A * B ) + ( C / D ) )


c) POSTFIX TO INFIX
  1. If E is a variable or a constant, then Infix(E) = E
  2. If E is E1 E2 op, then Infix(E) = Infix(E1) op Infix(E2)
  3. If E is (E1), then Infix(E) = Infix(E1)
example:

Infix( ( A B * ) ( C D / ) + )
= ( Infix( A B * ) + Infix( C D / ) )
= ( ( Infix(A) * Infix(B) ) + ( Infix(C) / Infix(D) ) )
= ( ( A * B ) + ( C / D ) )


d) POSTFIX TO PREFIX
  1. If E is a variable or a constant, then Prefix(E) = E
  2. If E is E1 E2 op, then Prefix(E) = op Prefix(E1) Prefix(E2)
  3. If E is (E1), then Prefix(E) = Prefix(E1)
example:

Prefix( ( A B * ) ( C D / ) + )
= ( + Prefix( A B * ) Prefix( C D / ) )
= ( + ( * Prefix(A) Prefix(B) ) ( / Prefix(C) Prefix(D) ) )
= ( + ( * A B ) ( / C D ) )
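
Rules (a) through (d) all follow the same stack discipline, so rule (d) can be mechanized directly. The sketch below is our own illustration (the function name and space-separated token format are our choices, not part of the rules):

```python
# Sketch of rule (d): postfix -> prefix using a stack.
# Assumes a space-separated postfix string over single-letter
# operands and the binary operators + - * /.

def postfix_to_prefix(postfix):
    stack = []
    for token in postfix.split():
        if token in "+-*/":
            right = stack.pop()          # E2 is on top of the stack
            left = stack.pop()           # E1 is below it
            # rule 2: Prefix(E1 E2 op) = op Prefix(E1) Prefix(E2)
            stack.append(f"{token} {left} {right}")
        else:
            stack.append(token)          # rule 1: operands pass through
    return stack[0]

print(postfix_to_prefix("A B * C D / +"))  # -> "+ * A B / C D"
```

Note how the output matches the worked example above once the grouping parentheses (kept there only for readability) are dropped.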


Tuesday, July 22, 2008

2.1) Consider the context-free grammar  S -> SS + |  SS * | a

a.) S -> SS*
-> SS + S*
-> aS + S*
-> aa + S*
-> aa + a*

b.) Parse tree of the string aa+a* (diagram not shown)

c.) The grammar generates all arithmetic expressions over the single
operand a, using only the operators + and * in postfix notation
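
To sanity-check that description, here is a small recognizer (our own sketch; the helper name is made up). It uses the fact that a string is a valid postfix expression over a exactly when every binary operator finds two operands and exactly one expression remains at the end:

```python
# Membership check for the language of 2.1:
# postfix expressions over the operand a with binary + and *.

def in_language(s):
    count = 0                    # operands/subexpressions available so far
    for ch in s:
        if ch == "a":
            count += 1
        elif ch in "+*":
            if count < 2:        # a binary operator needs two operands
                return False
            count -= 1           # two operands combine into one expression
        else:
            return False
    return count == 1            # exactly one expression remains

print(in_language("aa+a*"))      # -> True (the string derived above)
print(in_language("aa*+"))       # -> False
```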

2.2) What language can be generated by the following grammars?

a.) S -> 0 S 1 | 0 1

The grammar generates the language { 0^n 1^n | n >= 1 }: an equal number
of zeroes and ones, with all the 0's preceding all the 1's
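
A one-line membership test (our sketch) makes the claim concrete:

```python
# Recognizer for { 0^n 1^n | n >= 1 }, the language of 2.2 a).

def in_0n1n(s):
    n = len(s) // 2
    # a member is exactly n zeroes followed by n ones, n >= 1;
    # odd-length strings fail the comparison automatically
    return n >= 1 and s == "0" * n + "1" * n

print(in_0n1n("0011"))   # -> True
print(in_0n1n("0101"))   # -> False
```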

b.) S-> +SS | -SS | a

The grammar generates all arithmetic expressions over the single
operand a, using only the operators + and - in prefix notation

c.) S -> S ( S ) S | ε

The grammar generates all strings of balanced parentheses, including
nested and adjacent matched pairs (and the empty string).
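
The usual depth-counting test (our sketch, not part of the exercise) decides membership in this language:

```python
# Balanced-parentheses check for the language of 2.2 c).

def balanced(s):
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:        # a ")" with no matching "(" so far
                return False
        else:
            return False         # only parentheses are allowed
    return depth == 0            # every "(" was eventually closed

print(balanced("(()())()"))      # -> True
print(balanced(")("))            # -> False
```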

d.) S -> a S b S | b S a S | ε

The grammar generates all strings over {a, b} (including the empty
string) that contain an equal number of a's and b's.
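
That description reduces to a simple counting check (our sketch):

```python
# Check for the language of 2.2 d): strings over {a, b}
# with equally many a's and b's.

def equal_ab(s):
    return set(s) <= {"a", "b"} and s.count("a") == s.count("b")

print(equal_ab("abba"))   # -> True
print(equal_ab("aab"))    # -> False
```

For example, abba is derived as S => aSbS => ab(bSaS) => abba, with the remaining S's going to the empty string.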

e.) S -> a | S + S | S S | S * | ( S )

An ambiguous grammar that generates expressions over the operand a
built with binary +, juxtaposition, postfix *, and parentheses.

2.3) 2.2.e is clearly an ambiguous grammar, since a string such as
a + a * can be parsed in more than one way: the postfix * may apply
to the whole expression a + a or only to the second a. (Parse trees
not shown.)


2.4) Construct unambiguous context-free grammars for each of the following
languages

a.) Arithmetic expressions in postfix notation

expr -> expr expr +
expr -> expr expr -
expr -> digit
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

b.) Left-associative lists of identifiers separated by commas

expr -> expr, id
expr -> id

c.) Right-associative lists of identifiers separated by commas

expr -> id, expr
expr -> id

d.) Arithmetic expressions of integers and identifiers with the four binary
operators +, - , *, /


expr -> expr + term | expr - term | term
term -> term * factor | term / factor | factor
factor -> digit | identifier | (expr)
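
The productions in (d) are left-recursive, so a literal recursive-descent parser would loop forever. The sketch below (our own, simplified to single digits and omitting identifiers) replaces the left recursion with iteration while keeping the same precedence and left associativity:

```python
# Evaluator for the expression grammar of (d):
#   expr -> expr + term | expr - term | term
#   term -> term * factor | term / factor | factor
#   factor -> digit | ( expr )

def evaluate(s):
    tokens = list(s.replace(" ", ""))
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expr():                       # expr -> term { (+|-) term }
        nonlocal pos
        value = term()
        while peek() in ("+", "-"):
            op = tokens[pos]; pos += 1
            value = value + term() if op == "+" else value - term()
        return value

    def term():                       # term -> factor { (*|/) factor }
        nonlocal pos
        value = factor()
        while peek() in ("*", "/"):
            op = tokens[pos]; pos += 1
            value = value * factor() if op == "*" else value / factor()
        return value

    def factor():                     # factor -> digit | ( expr )
        nonlocal pos
        if peek() == "(":
            pos += 1                  # consume "("
            value = expr()
            pos += 1                  # consume ")"
            return value
        value = int(tokens[pos]); pos += 1
        return value

    return expr()

print(evaluate("9 - 5 + 2"))          # -> 6  (left associative)
print(evaluate("2 + 3 * 4"))          # -> 14 (* binds tighter than +)
```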

e.) Add unary plus and minus to the arithmetic operators of (d)

expr -> expr + term | expr - term | term
term -> term * factor | term / factor | factor
factor -> digit | identifier | (expr) | +factor | -factor

Wednesday, June 25, 2008

As mentioned previously, the first official topic of this site is the Phases of a Compiler. Below is an illustration of the phases (diagram not shown here).



The first three phases form the bulk of the analysis portion of a compiler. Symbol table management and error handling are shown interacting with all six phases.


Symbol table management

An essential function of a compiler is to record the identifiers used in the source program and collect information about various attributes of each identifier. A symbol table is a data structure containing a record for each identifier, with fields for the attributes of the identifier. The data structure allows us to find the record for each identifier quickly and to store or retrieve data from that record quickly. When the lexical analyzer detects an identifier in the source program, the identifier is entered into the symbol table.


Error Detection and Reporting

Each phase can encounter errors. A compiler that stops when it finds the first error is not as helpful as it could be.

The syntax and semantic analysis phases usually handle a large fraction of the errors detectable by the compiler. The lexical phase can detect errors where the characters remaining in the input do not form any token of the language. Errors in which the token stream violates the syntax of the language are detected by the syntax analysis phase. During semantic analysis, the compiler tries to detect constructs that have the right syntactic structure but no meaning for the operation involved.


The Analysis phases

As translation progresses, the compiler’s internal representation of the source program changes. Consider the statement,


position := initial + rate * 10


The lexical analysis phase reads the characters in the source program and groups them into a stream of tokens, in which each token represents a logically cohesive sequence of characters, such as an identifier or a keyword. The character sequence forming a token is called the lexeme for the token. Certain tokens are augmented by a 'lexical value'. For example, for any identifier the lexical analyzer generates not only the token id but also enters the lexeme into the symbol table, if it is not already present there. The lexical value associated with this occurrence of id points to the symbol table entry for this lexeme. The representation of the statement given above after lexical analysis would be:


id1 := id2 + id3 * 10
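
As a rough sketch of this grouping (a toy of our own, not a real compiler's lexer), a few lines of code can produce that token stream and fill the symbol table at the same time:

```python
# Toy lexical analyzer: groups characters into tokens and enters
# identifiers into a symbol table, mirroring the id1 := id2 + id3 * 10
# representation above.

import re

def tokenize(source):
    symbol_table = []                 # one entry per distinct identifier
    tokens = []
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|:=|[+\-*/]", source):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            # the token carries a pointer (index) into the symbol table
            tokens.append(f"id{symbol_table.index(lexeme) + 1}")
        else:
            tokens.append(lexeme)
    return tokens, symbol_table

tokens, table = tokenize("position := initial + rate * 10")
print(" ".join(tokens))               # -> "id1 := id2 + id3 * 10"
print(table)                          # -> ['position', 'initial', 'rate']
```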


Syntax analysis imposes a hierarchical structure on the token stream, which is shown by syntax trees.


Intermediate Code Generation

After syntax and semantic analysis, some compilers generate an explicit intermediate representation of the source program. This intermediate representation can have a variety of forms.

In three-address code, the statement might look like this:


temp1 := inttoreal(10)

temp2 := id3 * temp1

temp3 := id2 + temp2

id1 := temp3


Code Optimization

The code optimization phase attempts to improve the intermediate code so that faster-running machine code will result. Some optimizations are trivial. There is great variation in the amount of code optimization different compilers perform. In those that do the most, called 'optimizing compilers', a significant fraction of the compiler's time is spent on this phase.


Code Generation

The final phase of the compiler is the generation of target code, consisting normally of relocatable machine code or assembly code. Memory locations are selected for each of the variables used by the program. Then, intermediate instructions are each translated into a sequence of machine instructions that perform the same task. A crucial aspect is the assignment of variables to registers.

Tuesday, June 24, 2008

This blog, owned by KAMI, is in partial fulfillment of the requirements in ICS 40. It encompasses reviews, articles, etc. about Compiler Design Principles.

KAMI consists of the following 4th year students:


Agnes P. Domingo
Yiyi Johnson T. Guo
Beverly Jan R. Ucab

Our first most anticipated post will tackle the phases of a compiler. Keep posted! *winks*