Update: You can find some examples with their expected outputs here. Your goal for this assignment is to create an interpreter for the MITScript language. The semantics of the interpreter are described below in terms of the different constructs in the language.

Deadline: March 14 at 7pm.


Values in the program can belong to any of the following five categories. Boolean ;; Integer ;; String ;; Function ::= (frame, code);; Record ::= Map[String, Value]*;; None ;; The value None is a special singleton value. A function is represented as a pair of a pointer to a stack frame (described below) and the code corresponding to the function. In the semantics, only records are modeled explicitly as pointers (to a map from strings to values), but in the actual implementation you will be using objects to represent all the different types of values. Specifically, it is recommended that you define a Value class, and then make each of the different value types subclasses of Value.

Program state

The state of the program is represented by the stack and the heap. Because of closures, however, it is best to model the stack itself as a series of references to heap allocated stack frames.

A stack frame $s=\{x_1:v_1, ~\ldots ~x_k:v_k, ~p:a, ~g:\{x_j\} \}$ contains a set of variables $x_i$, each mapping to a value $v_i$, together with a parent pointer $p$ that maps to an address $a$ of a parent stack frame, and a set of global variable names $g$.

The state of the program includes of a heap $h:a\rightarrow \nu$ that maps addresses $a$ to heap allocated values $\nu$ which can either be stack frames or one of the five types of values listed above. It also includes a stack $\Gamma$ of references to stack frames. We will use the notation $\Gamma = \Gamma';a$ to indicate that the stack $\Gamma$ is equal to another stack $\Gamma'$ with a reference $a$ added at the top. Similarly, we use the notation $\Gamma = a_1;\Gamma';a_2$ to indicate that $\Gamma$ is a stack that has $a_1$ at the bottom, followed by another stack $\Gamma'$ followed by a reference $a_2$ at the top.

The program starts its execution with a Global stackframe $g=\{ \ldots p:NULL \}$ with a parent pointer equal to $NULL$, and with three predefined global functions ( see below). The initial heap contains only this initial global stackframe ($h_0 = \{a_g:g\}$), and the initial stack contains only a pointer to the global stack frame in the heap $\Gamma_0=a_g$.


When a program performs an illegal operation, execution should stop, and your interpreter must throw one of the following exceptions: In the semantics below we indicate exactly when each of these three exceptions should be thrown.

Program semantics without return

We will define the semantics of the language in steps; we will start with a "clean" version that assumes return statements only happen at the end of functions, and then we will enrich the semantics to account for returns in arbitrary program locations. At this stage, the semantics of statements will be defined in terms of rules that take a stack, a heap and a statement and produce an updated heap. \[ (\Gamma, h, S) \rightarrow h' \]
Sequence of Statements (Block) (\Gamma, h, S_0) -> h_1 \ldots (\Gamma, h_{i}, S_i) -> h_{i+1} --- ( \Gamma, h, [S_0,..., S_n] ) -> h_{n+1} The first statement is evaluated under the initial heap, and every subsequent statement is evaluated under the heap produced by the previous statement.
Var Assignment \Gamma = a_g;\Gamma';a_i (\Gamma, h, e)->(h', v) h_{out} = update_1(a_g, a_i, h', x, v) --- ( \Gamma, h, x = e ) -> h_{out} Where the update function $update_1(a_g, a_i, h', x, v)$ is defined in terms of the following cases. First, let $h'(a_i) = \{x_1:v_1, \ldots x_k:v_k, p:a, g:\{x_i\}\}$ be the current stack frame. Then, the two cases to consider are:
  • $x \in g$, so $x$ is a global variable. In that case, we need to update the value of x in the global stack frame $h'(a_g)$.
  • $x \notin g$, so $x$ is a local variable. In that case, we need to update the value of x in the local stack frame $h'(a_i)$.
The result of these updates is a new heap $h_{out}$ which is the result of the assignment operation.
Heap Assignment (\Gamma, h, e_1)->(h_1, a) (\Gamma, h_1, e_2)->(h_2, v) h_{out} = update_2(h_2, a, x, v) --- ( \Gamma, h, e_1.x = e_2 ) -> h_{out} In this case, the assignment is not updating a stack frame; it is updating a record in the heap. In this case, the update function is much simpler. $update_2(h, a, x, v)$ simply checks that $h(a)$ is indeed a Record, and updates field 'x' in that record with the value $v$.
IllegalCastException if $h(a)$ is not a record.
Heap Index Assignment (\Gamma, h, e_1)->(h_1, a) (\Gamma, h_1, e_2)->(h_2, fname) x = toString(fname) (\Gamma, h_2, e_3)->(h_3, v) h_{out} = update_2(h_3, a, x, v) --- ( \Gamma, h, e_1[e_2] = e_3 ) -> h_{out} This rule is similar to the previous one, except the field name is not hardcoded; instead, the index expressin needs to be evaluated and cast to the string according to the toString function defined later.
IllegalCastException if $h(a)$ is not a record.
If True (\Gamma, h, e)->(h', True) ~~~~ (\Gamma, h', S_1)-> h'' --- ( \Gamma, h, if(e)then S_1 else S_2 ) -> h'' If the condition evaluates to true, the result of the if statement is the result of evaluating the then branch. There is a symmetric case for when the branch condition evaluates to false $(\Gamma, h, e)->(h', False)$. In that case, the result is the result of evaluating $S_2$.
IllegalCastException if $e$ does not evaluate to a boolean.
While True ( \Gamma, h, e)->(h_1, True)~~~~ ( \Gamma, h_1, S)-> h_2 ~~~~ ( \Gamma, h_2, while(e)~S ) -> h_3 --- ( \Gamma, h, while(e)~S ) -> h_3
IllegalCastException if $e$ does not evaluate to a boolean.
While False ( \Gamma, h, e)->(h', False) --- ( \Gamma, h, while(e)~S ) -> h'
IllegalCastException if $e$ does not evaluate to a boolean.
Global --- ( \Gamma, h, global ~ x ) -> h The global declaration is a noop; it will only be relevant during function creation.
Return ( \Gamma, h, e)->(h', v) h'' = h'\{ ret:v\} --- ( \Gamma, h, return ~ e ) -> h'' Remember for this case, we assuming return only happens at the end of a function, so the only thing for return to do is evaluate the return expression and store it in a special location 'ret' in the heap.

Expresssion evaluation is more interesting. We now elaborate on the semantics of the main expression types:

Constant --- ( \Gamma, h, n ) -> (h, Integer(n)) There are similar versions of this rule for boolean and String constants, as well as the None constant. The idea is straightforward; the interpreter just generates a value corresponding to the constant.
Record Constructor ( \Gamma, h, e_0 ) -> (h_1, v_0) \ldots ( \Gamma, h_i, e_i ) -> (h_{i+1}, v_i) a = freshAddress(h_{i+1}) h_{out} = h_{i+1}[a : Map\{x_i:v_i\} ] --- ( \Gamma, h, \{x_i:e_i\} ) -> (h_{i+1}, a ) There record constructor just evaluates all its initializer expressions and constructs a Record value (a map) from fields to values. Note that the initializer expressions must be evaluated in order.
Variable \Gamma = a_g;\Gamma';a_i v = read(h, a_g, a_i, x) --- ( \Gamma, h, x ) -> (h, v) The function $read(h, a_g, a_i, x)$ is defined in terms of the following three cases. First, let $h(a_i) = \{x_1:v_1, \ldots x_k:v_k, p:a, g:\{x_i\}\}$ be the current stack frame. Then, the three cases to consider are:
  • $x \in g$, so $x$ is a global variable. In that case, we need to read x from the global stack frame $h(a_g)$.
  • $x \notin g$, but $x$ has a value in the current stack frame $h(a_i)$; then return $h(a_i)[x]$.
  • $x \notin g$, and $x$ does not have a value in the current stack frame; then search recursively in the parent stack frame $read(h, a_g, h(a_i)[p], x)$. If the parent pointer $h(a_i)[p]$ is NULL, throw an error.

UninitializedVariableException the variable has not been defined in any available stack frame.
Field Read ( \Gamma, h, e)->(h', a ) h'(a) = Map\{...,x:v,...\} --- ( \Gamma, h, e.x ) -> (h', v) ( \Gamma, h, e)->(h', a ) h'(a) = Map\{...,...\} x \notin h'(a) --- ( \Gamma, h, e.x ) -> (h', None) Evaluating the expression e produces a Record; a pointer to a map from variables to values, and the variable 'x' is mapped to the value v. If the map does not contain a field x, the value should be None.
IllegalCastException if $h'(a)$ does not evaluate to a map.
Index Read ( \Gamma, h, e_1)->(h', a ) h'(a) = Map\{...,x:v,...\} ( \Gamma, h', e_2)->(h'', fname ) x = toString(fname) --- ( \Gamma, h, e_1[e_2] ) -> (h'', v) When the index evaluates to a string, that string is used as a variable name to index into the map produced by evaluating $e_1$. If $fname$ is not a string, then it should be converted to a string by the casting rules below. Like before, if there is no field x in $h'(a)$, the Index Read should return None (that rule is omitted for brevity).
IllegalCastException if $h'(a)$ does not evaluate to a map.
Function \Gamma = a_g;\Gamma';a_i --- ( \Gamma, h, fun(x) S ) -> (h, Function(a_i, fun(x) S ) ) The construction of a function creates a new function value that captures the current stack pointer.
Function Call \Gamma = a_g;\Gamma';a_i ( \Gamma, h, e_b ) -> (h_1, Function(a_j, fun(x) S ) ) ( \Gamma, h_1, e_p ) -> (h_2, v ) a' = freshAddress(h_2) h_3 = h_2\{a' -> newStackFrame(a_j, fun(x) S)\} h_4 = update_3(h_3, a', x, v) h_5 = h_4[ret:None] \Gamma' = Gamma;a' (\Gamma', h_5, S) -> h_6 --- ( \Gamma, h, e_b(e_p) ) -> (h_6, h(ret) ) There are multiple steps to this rule. First, the base expression for the function call must be evaluated, and it must evaluate to a function value, which consists of a pointer to a frame, and the code for the function. The rule is written assuming a function with one parameter, but you should generalize it to support multiple parameters; when there are multiple parameters, you should evaluate them in order from left to right. If the caller passes a different number of parameters from what the callee expects, you should throw an error. The next step, and the most complex one is to allocate a new stack frame for the function call. The function $newStackFrame(a, f)$ must do all of the following things.
  • It should create a new stack frame with parent pointer equal to $a$ ($a_j$ in the context of the rule).
  • It should traverse the body of the function f and for every global declaration $global~x$ it should add $x$ to the global set of the new stack frame.
  • Also, for every assignment of the form $x=e$ in the body of $f$, it should add variable $x$ to the stack frame and initialize it to None.
  • In searching for assignments and global declarations, it should NOT visit the bodies of any inner functions defined inside the body of f.
After creating the new stack frame, the code should set all the parameters in the new stack frame to their correct values, and then evaluate the body of the function. Note that the function $update_3$ is different from the previous versions of update we have seen earlier. It is most similar to $update_1$, except it ignores the global set and simply writes directly to variable $x$ in the new stack frame $a'$. It is assumed that the return value will be stored in a special location $ret$ in the heap, so upon completion, that value must be retrieved. Note that before executing the body of the function, $ret$ is initialized with the None value.
IllegalCastException if $e_b$ does not evaluate to a Function value.
RuntimeException if the caller passes too many or too few parameters to the function.
Binary and Unary Operator ( \Gamma, h, e_1 ) -> (h_1, v_1 ) ( \Gamma, h_1, e_2 ) -> (h_2, v_2 ) v = compute(v_1, v_2, op) --- ( \Gamma, h, e_1 ~ op ~ e_2 ) -> (h_2, v ) The $compute$ function performs the actual computation on the two values $v_1$ and $v_2$. There is a corresponding rule for the unary operator. The semantics of the individual operators are mostly what you would expect; e.g. $a * b$ computes the product of two integers $a$ and $b$. However, you need to be aware of the following rules:
  • Boolean operators can only be applied to boolean values.
  • The arithmetic operators '-' '*' and '/' can only be applied to integer values.
  • The operator '+' on two integers corresponds to standard integer addition.
  • The operator '+' on two strings corresponds to string concatenation.
  • The operator '+' on a string and a value that is not a string causes the non-string value to be cast to a string.
  • Comparison operators '<', '>', '<=' and '>=' are only defined for integers.
  • Comparison operator '==', between two integers, two booleans, two strings, or between two None values performs a value comparison.
  • Comparison operator '==', between two Records checks for pointer equality.
  • Comparison operator '==', between two Functions checks that you have the same frame pointer and the same function.
  • Comparison operator '==', between two different types of values returns false

IllegalCastException for all cases not given explicit semantics above.
IllegalArithmeticException for division by zero.

Automatic casting

Some of the rules above involve automatic casting from other values into strings. Your implementation should define a toString function that casts values to strings according to the following rules:

Native functions

Your interpreter should support the following native functions:

These three functions should be initially available in the global scope. However, it should be possible for the program to redefine them. For example, the program below updates print to add a prefix to every message.

```java print("Hello"); //this should print 'Hello' to the console. oldprint = print; print = fun(s){ oldprint("OUTPUT: " + s); }; print("Hello"); //this should print 'OUTPUT: Hello' to the console. ```

Dealing with return in the middle of a function

The semantics above assume that return only happens at the end of a function. Your final interpreter, however, should support return anywhere in the body of a function. In particular, One approach to dealing with return is to have a flag in your interpreter that indicates whether 'return' has been executed or not. If the flag gets set to 'returned'. The code to interpret blocks and loops can check this flag before executing the next statement/loop-iteration.

Implementation notes: main

The starter code for this assignment is your Assignment 1. It is recommended that you implement your interpreter using the Visitor interface you created in Assignment 1. The main function should be modified to be like the one shown below. ```java int main(int argc, char** argv) { void* scanner; yylex_init(&scanner); if (argc < 2) { cout << "Expecting file name as argumeent" << endl; return 1; } FILE* infile = fopen(argv[1], "r"); if (infile == NULL) { cout << "Cannot open file " << argv[1] << endl; return 1; } yyset_in(infile, scanner); Statement* output; int rvalue = yyparse(scanner, output); if (rvalue == 1) { cout << "Parsing failed" << endl; return 1; } try { Interpreter interp; output->accept(interp); } catch (InterpreterException& exception) { cout << exception.message() << endl; return 1; } return 0; } ``` Note that the version of main above reads from a file passed as a commandline argument (an earlier version of this read from stdin, but if you do that, you will not be able to use the input function in your programs). We will be testing your interpreter by actually executing some programs and checking their output against that of our reference implementation, so you want to make sure your interpreter does not produce any unnecessary output.

Implementation notes: Mapping inference rules to visitors

You will be using visitors to implement the inference rules shown above. Note, however, that visit functions in the visitor map from an AST node to void, whereas the evaluation relation like $(\Gamma, h, e) \rightarrow (h,v)$ takes a stack and a heap in addition to the expression, and produces a heap and a value.
The way we are going to deal with this is that we will use instance variables to store both additional inputs as well as return values. For example, when visiting expressions, you want your visitor to return the value of the expression, but since the visit methods are void, the solution is to have an instance variable in the interpreter visitor: ```java Value* rval; ``` Then, when visiting expressions, you can enforce the property that every visit method for an expression sets the value of rval to the value of evaluating that expression. So for example, when visiting a binary expression expr, you can have code like: ```java expr->left->accept(*this); Value* leftVal = rval; expr->right->accept(*this); Value* rightVal = rval; rval = computeBinaryOp(expr->op, leftVal, rightVal); return; ``` In fact, I found it useful to define a method: ```java Value* eval(Expression* exp){ exp->accept(*this); return rval; } ``` Then, the code above can be rewritten as ```java Value* leftVal = eval(expr->left); Value* rightVal = eval(expr->right); rval = computeBinaryOp(expr->op, leftVal, rightVal); return; ``` You can also use an instance variable to keep track of the stack. For example, you can do ```java stack < StackFrame*> frameStack; ``` In the case of the heap, there is no nead to define your own map, since there is already a map from addresses to values you can use: the heap of the language itself! Thus, for example, any point where you are allocating addresses in the semantics can be implemented by actually allocating memory in the implementation (e.g. by using new).


In addition to your parser, you should submit 3 tests named interptest1.mit, interptest2.mit and interptest3.mit. Your tests should not use the "input" function, but should use print. As before, submission of your assigment should be accomplished by pushing your code to a branch with the name a2-submission. The last assignment pushed to this branch before the assignment due date is the one we will grade unless you notify us otherwise.