CS 358 Assignment 2 – Parsing and AST-Building

due date: Tuesday, 11 February (at 11:55pm)

In this assignment, you will continue using WrangLR. You will produce a parser for the full MiniJava language. I have supplied you (with help from the Dr. Appel and friends) with a directory structure that contains all the support classes that you should need to complete and test the project; it is an extension of the directory structure from Assignment 1. In packages from Assignment 1, it also contains:

·       syntaxtree (adapted from Appel), which contains definitions for all the abstract-syntax-tree classes.

·       treedisplay, which allows the abstract syntax tree to be displayed in a window or printed to an output-stream in a "structured" way.

The file MJGrammar.java (in the parse package) contains an implementation of a small subset of the grammar, which includes non-extended classes, void-returning methods, most types, and a few kinds of expressions. Note that some of the semantic actions are dummied up. You will need to extend the grammar so that it parses the entire language, and creates the proper AST. (Note: the switch statement and related constructs are optional.)

The nodes that should be built for each construct are largely described in Sections 4.2 and 4.3 of Appel. I have made a number of modifications to the AST definitions, which we are discussing in class. (Some modifications are because our Java subset is larger; some are to put more structure into the inheritance hierarchy; some are to make the AST display more nicely in a window.)

You might also remember that certain language constructs do not have explicit AST-nodes, but are converted to equivalent constructs. For example:

·       If a class definition is found without "extends", produce an AST as if extends Object had been written.

·       If an if-statement is found with no "else", produce an AST as if the if-statement had an else { } at the end.

·       If a for-statement is found, produce an equivalent while statement. For example,
   for (a = 4; a <= 10; a++) { bodyPart1(); bodyPart2(); }
would be treated as if
   { a = 4; while (a <= 10) {{ bodyPart1(); bodyPart2(); } a++;}}
had been written.

·       If an empty statement, ;, is encountered, it is treated as if { } had been written.

·       If a statement such as x++; is encountered, it is treated as if x = x + 1; had been written (and similarly for prefix ++ and both forms of --).

·       An expression such as x!=y is treated as !(x==y); similarly x<=y as !(x>y) and x>=y as !(x<y).

·       An expression such as -x is treated as 0-x; +x as 0+x.

·       If a method call does not have an explicit object, this is assumed. Hence, when the expression getVal() is encountered, it is treated as if this.getVal() had been written.

·       When a character literal is encountered, it is treated as if the equivalent integer literal had been typed. Hence, if '2' is encountered, the AST node for 50 would be produced (because the encoding for the character '2' is 50).

Building and running your parser

The setup for building/running the parser is similar to that for the scanner in Assignment 1. As before, I am supplying the starter files in the form in which they can be imported an Eclipse project (see Assignment 1 handout for details on importing). If you prefer to program in another environment and need help setting it up, come see me.

After editing your MJGrammar.java file, you will need to do the following so that the effects of those edits are incorporated into your running project:

 To run the parser on a file, say Test1.java (in the miniJava folder):

Incremental implementation

I’d suggest that you not try to get the entire grammar working all at once. Rather, try:

·       Selecting a few grammar constructs (e.g., simple expressions that contain no operators) and do those. (Initially, don’t do the semantic actions; just "dummy them up"--see below.)

·       Select a few more grammar constructs to implement, again without no semantic actions.

·       Continue on, until the entire grammar is implemented. (Doing this in small steps will make it easier to debug your grammar.

·       Add the semantic actions (possibly doing them incrementally, too).

Doing things incrementally will have several advantages:

·       You can start working on your grammar before we have discussed in class the details of the AST node-types.

·       If you introduce a bug (e.g., a shift-reduce conflict), you'll have a pretty good idea which grammar productions might have caused it.

·       It will give you confidence, as your parser gradually parses a larger and larger grammar.

·       You won't have to clutter up your head with semantic-action issues while you are focusing on the grammar.

When you "dummy up" a semantic action, you will typically produce a result that is a simple-to-create node that is not the correct one, but produces a valid AST. For example, you can have all your expression-producing productions create a dummy "True" node that represents the keyword true.) Once you think you are parsing things correctly, then start working on augmenting your semantic actions to build the appropriate AST nodes.

If you have trouble building the AST properly, you will get partial credit if your parser parses properly.

Extensions

In addition to the basic assignment (which is worth a borderline B+/A-), you may wish to do some or all of the following.

Extension 1 [worth about ½ a grade]. Recognize Java's do-while statement. This would allow, for example, the following to be parsed as a statement:

do {
  x = x * 2;
  y = y - x;
}
while (y < 0);

Because there is no AST node-type for "do-while", you should produce While AST node with the same meaning. In the above example, this would be the AST as if the following had been written:

while (true) {
  {
    x = x * 2;
    y = y - x;
  }
  if (!(y < 0)) break;
}

Extension 2 [worth about a full grade]. Recognize Java's switch statement. For example:

switch(n) {
  case 1: x++; break;
  case 2: case 3: y++; break;
  default: x--; y--; break;
  case 100: x = 0; break;
}

There are Switch, Case and Default classes into the syntaxtree package so that you can produce an AST node when you parse this construct. The Case and Default labels act as normal statements, so the body of the switch statement in the above example would have 14 statements.

Displaying the AST graphically

As mentioned above, the -w switch tells the compiler to graphically display the created AST into a window.

If you want to see the top-level of a subtree, you can click on a node: this will cause that node to turn light blue, and will prevent its subnodes from being shown; clicking a second time will cause the subnodes to again be shown.

If you want to change the font size to, for example, 20, you can append an integer to the -w, as in -w20. You can also change the background color by appending a colon (:) followed by a hexadecimal number that specifies the RGB value. Thus -w:ff7700 would specify a color with full red (ff hex = 255), half green (77 hex is about half of 255) and no blue (00 hex is zero). A font size and color can be combined as well (e.g., -w20:ff7700).

Using the scanner

I have implemented a scanner that you can use with your parser, but it is implemented very oddly—in two stages:

·      The first stage strips comments and “canonicalizes” the tokens into simplified character stream, where the output-tokens have some odd syntax rules.

·      The second stage reads the “canonicalized” tokens—these are the tokens that are used for the starter grammar.

There are two purposes in using this approach:

·      Because the canoncalized tokens are simpler to parse your parser-generation should run much more quickly than if you directly used the lexical rules from Assignment 1.

·      My MiniJava scanner implementation provided as a class-file, so that you won’t be able to use it if you’re late in turning in your assignment 1.

As a result of this two-stage approach, some of the lexical rules will look very odd. For example:

//: `int ::= “#it” => void

You should ignore the details of these rules, knowing if that if the int token is seen in the original source file, it was converted to #it in the intermediate file. Line and character positions are (allegedly) preserved.

The starter-project is set up to use an allegedly correct, and I would encourage you to use it. If you would like to use your own scanner as a base, you will need to make a minor modification to the Main class, and will also need to replace my scanner rules with yours in the MJGrammar.java file. (Come see me if you want help.)

Handin

On (or shortly after) the due date, I will run your program with some MiniJava programs (including syntactically illegal ones). You should hand in your program via learning.up.edu, in the same manner as Assignment 1. Hand in only your MJGrammar.java file. Do not hand in any other files.