CS 43001 Compiler Construction

(Autumn Semester 2006)

Theory
Niloy Ganguly niloy@cse.iitkgp.ernet.in

Laboratory
Chitta Ranjan Mandal chitta@cse.iitkgp.ernet.in
Niloy Ganguly niloy@cse.iitkgp.ernet.in

Teaching Assisstant
Subhankar Mitra subhankar.mitra@gmail.com
Ashish Tiwari mail.ashishtiwari@gmail.com
Monu Kedia monu.kedia@gmail.com
Ankur Saxena ankur.ankursaxena@gmail.com


Notices

Notices

30.11.2006 The Marks are out. You can come to see your answer script. However please avoid it if you have got EX or the chance of getting your grade change is minimum. The marks are abolute and the grades follow accordingly. (89 - EX). Also all grievances regarding lab has been taken into consideration.
12.09.2006 - In scribing you have to submit the doc, pdf and ppt file
25.09.2006 - Assignment 4 last date is extended to 16th October, 2006. Before DP, please submit the parsing table in doc format
25.09.2006 - All scribes before midsem has to be submitted by 16th OCtober, 2006.
10.10.2006 - Midsem and class test marks are out, If you want to see the answer scripts visit my office by Oct 13.

Theory

       Lectures
       Evaluation
       Blog
       Assignments
       Students List

Laboratory

          Marks
          Assignment 1
          Assignment 2
          Tiny-C Specification
          Assignment 3
          Assignment 4
          Assignment 5
          Assignment 6

Theory

  Lectures   : 	Mon - 4, Tue - 1,2, Thu -3 
  Room #     :  CSE 119
  Units      :	4-0-2
  Credits    :  4 (Theory) 
  Instructor :	Niloy Ganguly
  Contact    : 	Room #313  (CSE), Phone 3460

Text Books:

[1]  Aho, A. V., Sethi, R. and Ullman, J. D.
     Compilers - Principles, Techniques and Tools
     Addison-Wesley, 1988 (Indian reprint)
     - aka Dragon book

[2] Santanu Chattopadhyay
    Compiler Design
    PHI, 2005.

[3] Advanced Compiler Design Implementation
    Steven S. Muchnick
    Elsevier, 2003

Lectures

The lecture notes are unedited version of student submission.
01. 24.07.06 - Introduction,. ppt 04CS1024
02. 25.07.06 - Introduction,. ppt 04CS3019
03. 25.07.06 - Introduction,. ppt 04CS1005
04. 07.08.06 - Lexical Analyzer. ppt 04CS1006
05. 08.08.06 - Thompson Construction, Subset Construction,. ppt 04CS1017
06. 08.08.06 - Thompson Construction, Subset Construction,. ppt04CS1007
07. 10.08.06 - Thompson Construction, Subset Construction,. ppt04CS1002
08. 16.08.06 - Lex, ppt04CS1008
09. 17.08.06 - Syntax Analysis, Error Recovery, ppt04CS1004
10. 24.08.06 - Ambiguity in Grammar,ppt 04CS1010
11. 28.08.06 - Left Recursion, ppt , 04CS1011
12. 29.08.06 - Top Down Parsing ppt, 04CS1012
13. 29.08.06 - LL(1) Parser, ppt, 04CS1020
14. 31.08.06 - LL(1) Parser, ppt, 04CS1013
15. 07.09.06 - Bottom-up Parser, ppt1, ppt204CS1014
16. 11.09.06 - Bottom-up Parser, Operator Precedence Parser, ppt, 04CS1015
17. 12.09.06 - SLR Parser,ppt, 04CS1016
18. 12.09.06 - SLR Parser, ppt, 04CS1019
19. 14.09.06 - SLR Parser, ppt, 04CS1021
20. 25.09.06 - LR(1) Parser, ppt,04CS1023
21. 26.09.06 - LR(1) Parser, LALR Parser, ppt, 04CS1026
22. 26.09.06 - LR(1) Parser, LALR Parser, ppt, 04CS1027
23. 28.09.06 - LALR Parser, ppt, 04CS1029
24. 09.10.06 - Ambiguity in Grammar ppt, 04CS1030
25. 10.10.06 - Ambiguity in Grammar, Error Recovery ppt, 04CS1028
26. 10.10.06 - Ambiguity in Grammar ppt, 04CS1032
27. 16.10.06 - Intermediate Code Generation ppt, 04CS3002
28. 17.10.06 - Three Address Code Generation,, ppt, 04CS1033
29. 17.10.06 - Three Address Code Generation, ppt, 04CS1034
30. 18.10.06 - Three Address Code Generation ppt, 04CS1035
31. 23.10.06 - Three Address Code Generation - Array, ppt, 04CS1036
32. 24.10.06 - Three Address Code Generation - Array, ppt, 04CS1037
33. 24.10.06 - Three Address Code Generation -Boolean Functions, ppt, 04CS3001
34. 26.10.06 - Three Address Code Generation - Control Statements, ppt, 04CS3004
35. 30.10.06 - Three Address Code Generation - Control Statements, ppt, 04CS3005
36. 31.10.06 - Three Address Code Generation - Control Statements, ppt, 04CS3008
37. 31.10.06 - Three Address Code Generation - Backpatching, ppt, 04CS3010
38. 02.11.06 - Three Address Code Generation - Backpatching, ppt, 04CS3013
39. 09.11.06 - Three Address Code Generation - Backpatching, ppt, 04CS3014
40. 13.11.06 - Target Code Generation, ppt, 04CS3015
41. 14.11.06 - Target Code Generation - Basic Blocks, ppt, 04CS3020
42. 14.11.06 - Target Code Generation - Next Usage, ppt, 04CS3022
43. 16.11.06 - Target Code Generation - Register Allocation/DAG, ppt, 04CS3023

Evaluation

Teacher's Assessment : 20
        Assignment/Scribing : 6
        Class Test I : 5 - 04CS1022
        Class Test II : 5 - 04CS3024
        Attendance : 4

Mid-sem : 30 - Solution 04CS3017

End-sem : 50 - Solution 04CS3025

Blog

Class Diary before midsem
04CS3011, 04CS1018, 04CS3026, 04CS1031

Class Diary after midsem
04CS3018, 04CS3016, 04CS1025, 04CS3012

Assignments

  • Let G be the following grammar: (04CS3006)
    S-> S op1 S | S op2 S | A
    A-> a | b
    a. Prove that G is ambiguous.
    b. Rewrite G to impose left associativity on both op1 and op2 and to impose higher precedence on op1 than op2. The resulting grammar should be unambiguous, and able to specify an LR(1) parser.

  • Consider the following grammar: (04CS3007)
    R => R | R
    R => R R
    R => R *
    R => ( R )
    R => a | b
    where the terminals are { |, *, (, ), a, b }. This grammar generates regular expressions over a,b involving alternation, concatenation, and repetition.
    | alternation, * repetetion 1 or more time
    a) Compute FIRST and FOLLOW for the only non-terminal in this grammar.
    b) Show the grammar is ambiguous.
    c) construct an equivalent unambiguous grammar that gives the operators star (repetition), concatenation, and alternation the usual precedences :
    repetition > concatenation > alternation.
    d) Build the FIRST and FOLLOW set for this new grammar. Is it LL (1)?

  • We want to construct a simple pocket-calculator program using yacc and lex which can parse strings such as 1+(10-5-3)*5+2 and print the result, 13 in this case. Outline the overall structure of your program components. Give full details of the input to yacc and lex. (Precise syntactic details are not important, but your answer should reflect basic principles involved).
    Make following assumption :
    - Only three binary operator are allowed i.e. plus, minus and multiply
    - Only integer data type is allowed
    - Parentheses are used to override precedence. 04CS3009

    Lab Assignments

    General Guidelines

    Assignment 1: Assignment on Thompson's Construction

    - understand the working of #include, #define, #ifdef, #ifndef and commenting in C preprocessor.
    Write a mini pre-processor for a C-language. The preprocessor should do the following:
    - inclusion of header files
    -- Implement #include preprocessor directive. The content of the argument file should be copied to the output file.

    - textual replacement
    -- Implement #define preprocessor (macro) directive that is replace the macro calls inside the program with the macro contents. Parametrized and recursive #define need not be considered.

    - conditional inclusion/exclusion
    -- Implement #ifdef and #ifndef preprocessor directives. According to the value copy the #ifdef ot #ifndef part to the output file
    - stripping out the comments.
    -- Implement both single line (//) and multiple line comments (/* */)stripping.

    Submission Date - 7th August
    On 31st Jule come to the lab between 2 to 3 and meet your Teaching Assistants. Any doubt about the assignments will be cleared by them

    Assignment 2: Assignment on Thompson's Construction

    You have to implement the following two algorithms
    * Thompson Construction - Constructing NFA from regular expression
    * Constructing DFA from NFA.
    For constructing a lexical analyzer/scanner from a given set of regular expression.
    Assume following specifications for implementation :
    1. Input alphabets
    * All alphabets from a to z character
    - [a..z] denotes all the alphabets from a to z
    * White space, \t, \n

    2. Character corresponding to operators
    * Union (|) Ex: A|B
    * Intersection (.) Ex: A.B
    * Asterate (*) Ex: A* (0 or more occurences of A)
    * Plus (+) Ex: A+ (1 or more occurences)
    3. Input format to the tool
    Assume that the input regular expressions will given in a file (reg.exp). Each line will corresponds to one regular expression of the form
    .
    (a) Assume simple regular expressions only which can be parsed easily without need for complicated parsing.
    (b) Return UNMATCHED for lexeme which donot match any of the regular expression in reg.exp.
    (c) Your scanner should go for longest possible match i.e. a string 'iff' should be identified as 'iff', not as 'if' and 'f'.
    (d) Regular expression defined earlier in the reg.exp file should get priority over the one defined later i.e. 'for' can match as an ID also but if 'for' is defined earlier in the reg.exp as FOR then it should return FOR not ID
    Sample Input-Output (just for illusatration, your tool should work for other test cases also)

    -------------------------------------------------
    reg.exp file :
    < i.n.t,INT>
    < f.l.o.a.t,FLOAT>
    <[a..z]+,ID>
    Input to be scanned :
    int a
    float a
    ag-fgfg
    Output to be generated :
    INT
    ID
    FLOAT
    ID
    UNMATCHED
    ----------------------------------------------------

    Submission Date - 21st August
    On 14th August come to the lab between 2 to 3 and meet your Teaching Assistants. Any doubt about the assignments will be cleared by them

    Tiny-C Specification

    1. Data types: int, char, float
    2. Array Support: As in C
    Ex: int a[10], b[10][5];
    Multiple array declarations are also allowed. The format is as in C.
    Ex: int a[10],b[5];
    3. Structures: As in C language
    4. Composite statement is enclosed within braces { . . .}.
    5. Assignment.
    Ex: a = 10;
    Continued assingment should also be supported
    Ex: a = b = 10;
    6. Binary Arithmetic operators. +, -, *
    7. Unary Arithmetic operator. ++, ** (only postfix)
    8. Boolean operators: ==, >, <
    9. Binary Logical operator : &&, ||
    10. Unary logical operator : !
    11. Unary operator: -
    12. Conditional statement:
    if (condition) composite-statement [else composite-statement]
    13. Loops:
    a. For loop as in C
    b. Do while loop (a bottom tested loop) as in C
    b. While loop as in C
    14. Program entry point defined by keyword "main"
    15. Subroutine identified by keyword "SUB", followed by subroutine name.
    It accepts parameters, they are within paranthesis following the subroutine name. A subroutine is declared wherein the type of the parameters are defined. There is no return value for a subroutine.
    Example :
    // Declaration of subroutine (here the type of parameters are defined)
    SUB routine(float, int);
    // Definition of subroutine (Note that the type of parameters are NOT mentioned here. This is different from C)
    SUB routine(inti_f, iterate)
    {
    // Here the code for subroutine
    ....
    ...
    ..
    }
    // Call
    routine(1.9, 5);
    Programes written in Tiny-C can use identifiers consisting of any alpha-numeric characters, starting with only _ or [a..z] or [A..Z]
    Note : A sample programe written in Tiny-C will be circulated soon.

    Assignment 3: Understanding and working with Lex

    (2) You have to implement a Lexical Analyzer for Tiny-C programming language using LEX tools. The specifications of Tiny-C is given above.
    Specifications of Lexical Analyzer:
    (a) The name of the scanner function should be "int yylex()", it returns the integer as token.
    (b) Each token may have some attributes. For example : If the token corresponds to "integer constant" than yylex() returns an integer corresponding to "integer constant" but the value of "integer constant" is its attribute. Such attributes should be made available to the future stages of compilation through a global variable of C-Union type which holds atleast the following attributes :
    - Value of the integer constant in case of integer token
    - Value of the floating point constant in case of floating point token
    - character string for other cases. Like it will have the name of the identifier in case of identifier token
    (c) Print some meaningful names for the tokens in the output file, rather than the integer value returned by the lexical analyzer to aid the understanding of the output.
    The output should be a sequence of duplets :
    Example :
    Note : This analyzer will be used in the subsequent assingments. So,it should be implemented in a clean and readable way.
    Submission Date - 4th September

    Assignment 4: CFG for Tiny-C and table-driven LL(1) parser

    You have to write a CFG for Tiny-C programming language, and implement a table-driven LL(1) parser.
    Given:
    ______
    1. The Tiny-C langauge is a simplified and modified subset of C programming language. The language specifications are given below.
    2. You can get the CFG for C-programming language at Appendix A13 in Kernighan, B. W. and Ritchie, D. M., The C Programming Language, PHI.
    Reuse:
    ______ - Use the Lexical Analizer developed by you in Expt3 (feel free to modify).
    TO BE DONE - with paper and pencil
    __________________________________ - Write a CFG for the Tiny-C language specifications.
    - Transform your CFG (without changing the language) suitably for a table-driven predictive parser (LL(1)).
    - Compute FIRST, FOLLOW and create the Parsing table (manually).
    - Make a proper document of the above work, and submit a HARD-copy. (Also mail the soft-copy please).
    TO BE DONE - programming assignment
    ___________________________________
    - Write a Non-recursive table-driven LL(1) parser.
    Input - Outputs
    _______________
    - The grammar of the language must not be hard coded in the program, rather should be taken as input using a file. The file name can be fixed in the program or taken as one of the input of the program.
    - The program should also take the test program file as input.
    - The output of the program should be "Accepted" in the case that the input program satisfies the given grammar.
    - In the event of errors, try to do as clear error reporting as possible. Both lexical and syntactic errors should be clearly reported. The error reporting should contain the possible line number of the error and the type of error as well. All errors must be reported in separate lines and the format should be similar to the one used by gcc.
    - The program should try to find multiple errors (if present). The program should be designed in such a way that it does not terminate when an error is detected, but continues to detect any other occurance of error.(This feature is optional, but, will attract bonus marks if implemented.)
    Programming Assignemnet Submission:
    __________________________________
    1. Compile and test your program on one of the dept. machines. Once tested send the report in the following way. Use a 'Makefile' for the whole process(optional). The program must take inputs from the command line and must not be interactive. which then will be tested on 'test-data' (say).
    $make
    .
    .
    $a.out < test-data
    or
    $a.out test-data
    2. For submission prepare a tar-archive consisting of the 'Makefile', README, and all .c as well as all .h files. A .h file should contain Macros, prototype declarations, and type definitions. But it should not contain any C function body or variable declaration. Never '#include' one .c file in another. Also include the Grammar in a text file.
    You may include a few sample test-files. One sample file should be named as 'in'. If you wish to include more sample files, name them as 'in1', 'in2' ...
    3. NAME the tar-archive as "expt4_??.tar" where "??" is a TWO DIGIT Group_Id number. Do not include the directory while creating the archive. For example, use "tar -cvf .tar ." in a directory that contains all the files to be included. Do not include binaries/image files.
    4. Submit the tar file by mailing it to compiler.06@gmail.com.
    5. This experiment is a part of generating a working compiler for Tiny-C language which will continue until the semester-end. So keep a copy of the archive with you, you need to reuse the source-code later.
    Submission Date - 28th September - The submission is to be done through mail

    Assignment 5: Assignment on YACC

    You have to implement a LALR(1) parser for Tiny-C programming language using 'yacc' tools. - Specifications of Tiny-C is the same as circulated earlier
    - Reuse the Lexical Analyzer which you implemented in Assingment 3
    - Reuse the grammar which you wrote in Assingment 4. Feel free to modify it as per the requirements of 'yacc'
    - Try to report some error message in case of syntax error in the input
    Submission Date - 16th October

    Assignment 6: Assignment of Intermediate Code Generation

    The goal of this assingment is to generate machine independent intermediate representation (IR) for a program written in Tiny-C. There can be several different types of IR. We are concerned with generating a simple but widely followed IR i.e. 3-address (TA) code.
    Programming Assingment:
    You have to incorporate semantic actions in the parser implemented in Assingment 5 for generating 3-address code.
    - Reuse the Lexical analyzer for Tiny-C developed in Assingment 3
    - Reuse the parser for Tiny-C developed in Assingment 5
    " Note : Generate 3-address code in such a way so that it can be compiled using native C compiler. For this use the equivalent C statements for 3-address code and include variable declarations of the user defined variables and temporaries. A sample front-end following this philosophy for C language can be looked at http://www.lancecompiler.com/ , we want to have similar front-end for Tiny-C "
    Submission Date - 30th October