Deadline: March 14 at 7pm.
Values
Values in the program can belong to any of the following five categories.Program state
The state of the program is represented by the stack and the heap. Because of closures, however, it is best to model the stack itself as a series of references to heap allocated stack frames.A stack frame $s=\{x_1:v_1, ~\ldots ~x_k:v_k, ~p:a, ~g:\{x_j\} \}$ contains a set of variables $x_i$, each mapping to a value $v_i$, together with a parent pointer $p$ that maps to an address $a$ of a parent stack frame, and a set of global variable names $g$.
The state of the program includes of a heap $h:a\rightarrow \nu$ that maps addresses $a$ to heap allocated values $\nu$ which can either be stack frames or one of the five types of values listed above. It also includes a stack $\Gamma$ of references to stack frames. We will use the notation $\Gamma = \Gamma';a$ to indicate that the stack $\Gamma$ is equal to another stack $\Gamma'$ with a reference $a$ added at the top. Similarly, we use the notation $\Gamma = a_1;\Gamma';a_2$ to indicate that $\Gamma$ is a stack that has $a_1$ at the bottom, followed by another stack $\Gamma'$ followed by a reference $a_2$ at the top.
The program starts its execution with a Global stackframe $g=\{ \ldots p:NULL \}$ with a parent pointer equal to $NULL$, and with three predefined global functions ( see below). The initial heap contains only this initial global stackframe ($h_0 = \{a_g:g\}$), and the initial stack contains only a pointer to the global stack frame in the heap $\Gamma_0=a_g$.
Errors
When a program performs an illegal operation, execution should stop, and your interpreter must throw one of the following exceptions:-
UninitializedVariableException
when the program attempts to read from a variable in the stackframe that has not been initialized. -
IllegalCastException
when you attempt to apply an operation to value type for which it does not apply. -
IllegalArithmeticException
when you attempt to divide by zero. -
RuntimeException
other error situations, such as passing too many arguments to a function.
Program semantics without return
We will define the semantics of the language in steps; we will start with a "clean" version that assumes return statements only happen at the end of functions, and then we will enrich the semantics to account for returns in arbitrary program locations. At this stage, the semantics of statements will be defined in terms of rules that take a stack, a heap and a statement and produce an updated heap. \[ (\Gamma, h, S) \rightarrow h' \]Sequence of Statements (Block) |
|
The first statement is evaluated under the initial heap, and every subsequent statement is evaluated under the heap produced by the previous statement. |
Var Assignment |
|
Where the update function $update_1(a_g, a_i, h', x, v)$ is defined in terms of the following cases.
First, let $h'(a_i) = \{x_1:v_1, \ldots x_k:v_k, p:a, g:\{x_i\}\}$ be the current stack frame.
Then, the two cases to consider are:
|
Heap Assignment |
|
In this case, the assignment is not updating a stack frame; it is updating a record in the heap.
In this case, the update function is much simpler. $update_2(h, a, x, v)$ simply checks that $h(a)$ is indeed a Record, and
updates field 'x' in that record with the value $v$.
IllegalCastException if $h(a)$ is not a record. |
Heap Index Assignment |
|
This rule is similar to the previous one, except the field name is not hardcoded; instead,
the index expressin needs to be evaluated and cast to the string according to the toString function defined later.
IllegalCastException if $h(a)$ is not a record. |
If True |
|
If the condition evaluates to true, the result of the if statement is the result of evaluating the
then branch.
There is a symmetric case for when the branch condition evaluates to false $(\Gamma, h, e)->(h', False)$.
In that case, the result is the result of evaluating $S_2$.
IllegalCastException if $e$ does not evaluate to a boolean. |
While True |
|
IllegalCastException if $e$ does not evaluate to a boolean. |
While False |
|
IllegalCastException if $e$ does not evaluate to a boolean. |
Global |
|
The global declaration is a noop; it will only be relevant during function creation. |
Return |
|
Remember for this case, we assuming return only happens at the end of a function, so the only thing for return to do is evaluate the return expression and store it in a special location 'ret' in the heap. |
Expresssion evaluation is more interesting. We now elaborate on the semantics of the main expression types:
Constant |
|
There are similar versions of this rule for boolean and String constants, as well as the None constant. The idea is straightforward; the interpreter just generates a value corresponding to the constant. |
Record Constructor |
|
There record constructor just evaluates all its initializer expressions and constructs a Record value (a map) from fields to values. Note that the initializer expressions must be evaluated in order. |
Variable |
|
The function $read(h, a_g, a_i, x)$ is defined in terms of the following three cases.
First, let $h(a_i) = \{x_1:v_1, \ldots x_k:v_k, p:a, g:\{x_i\}\}$ be the current stack frame.
Then, the three cases to consider are:
UninitializedVariableException the variable has not been defined in any available stack frame. |
Field Read |
|
Evaluating the expression e produces a Record; a pointer to a map from variables to values,
and the variable 'x' is mapped to the value v.
If the map does not contain a field x, the value should be None.
IllegalCastException if $h'(a)$ does not evaluate to a map. |
Index Read |
|
When the index evaluates to a string, that string is used as a variable name to index into the map produced by evaluating $e_1$.
If $fname$ is not a string, then it should be converted to a string by the casting rules below.
Like before, if there is no field x in $h'(a)$, the Index Read should return None (that rule is omitted for brevity).
IllegalCastException if $h'(a)$ does not evaluate to a map. |
Function |
|
The construction of a function creates a new function value that captures the current stack pointer. |
Function Call |
|
There are multiple steps to this rule. First, the base expression for the function call must be evaluated,
and it must evaluate to a function value, which consists of a pointer to a frame, and the code for the function.
The rule is written assuming a function with one parameter, but you should generalize it to support multiple parameters;
when there are multiple parameters, you should evaluate them in order from left to right. If the caller passes
a different number of parameters from what the callee expects, you should throw an error.
The next step, and the most complex one is to allocate a new stack frame for the function call. The function
$newStackFrame(a, f)$ must do all of the following things.
IllegalCastException if $e_b$ does not evaluate to a Function value. RuntimeException if the caller passes too many or too few parameters to the function. |
Binary and Unary Operator |
|
The $compute$ function performs the actual computation on the two values $v_1$ and $v_2$. There is a corresponding rule
for the unary operator. The semantics of the individual operators are mostly what you would expect; e.g. $a * b$ computes
the product of two integers $a$ and $b$. However, you need to be aware of the following rules:
IllegalCastException for all cases not given explicit semantics above. IllegalArithmeticException for division by zero. |
Automatic casting
Some of the rules above involve automatic casting from other values into strings. Your implementation should define a toString function that casts values to strings according to the following rules:- toString(String(s)) = String(s)
- toString(Boolean(b)) = b ? "True" : "False"
- toString(Integer(n)) = The string representing the integer n in base 10
- toString(Function(fname, code)) = "FUNCTION"
- toString(Map{x_i:v_i}) = "{" + x_i + ":" + toString(v_i) + " " ... "}"
- toString(None) = "None"
Native functions
Your interpreter should support the following native functions:print(s)
Uses the default casting of s to a string and prints it to the console followed by a newline.input()
Reads a line of input from the console and returns it as a string value.intcast(s)
Expects a string and internally uses the c++ functionatoi
to parse the string and return an integer value. If the string does not represent an integer (e.g., the string "hello"), the function should raise an IllegalCastException
These three functions should be initially available in the global scope. However, it should be possible for the program to redefine them. For example, the program below updates print to add a prefix to every message.
Dealing with return in the middle of a function
The semantics above assume that return only happens at the end of a function. Your final interpreter, however, should support return anywhere in the body of a function. In particular,- return in the middle of a block should prevent statements after the return from executing
- return in the body of a loop should prevent subsequent iterations of the loop
Implementation notes: main
Implementation notes: Mapping inference rules to visitors
You will be using visitors to implement the inference rules shown above. Note, however, that visit functions in the visitor map from an AST node to void, whereas the evaluation relation like $(\Gamma, h, e) \rightarrow (h,v)$ takes a stack and a heap in addition to the expression, and produces a heap and a value.Deliverables
In addition to your parser, you should submit 3 tests named interptest1.mit, interptest2.mit and interptest3.mit. Your tests should not use the "input" function, but should use print. As before, submission of your assigment should be accomplished by pushing your code to a branch with the namea2-submission
. The last assignment pushed to this branch before the assignment due date is the one we will grade unless you notify us otherwise.