Full Name: ________________________________

MIT ID: ________________________________

Athena ID: ________________________________

Run \LaTeX{} again to produce the table
1. Consider a simple machine with one ALU and one memory unit. The machine can execute an ALU operation and a memory operation at the same clock. A memory load operation takes two clocks (i.e. its result will be available in two clocks) and all other operations (including a memory store operation) complete in one clock.

Consider the following code fragment:

\[
\begin{align*}
\text{L1:} & \quad \text{mov} \ 0(\%rsp), \ %r10 \\
\text{L2:} & \quad \text{mov} \ %r10, \ 4(\%rsp) \\
\text{L3:} & \quad \text{mov} \ 8(\%rsp), \ %r11 \\
\text{L4:} & \quad \text{add} \ %r8, \ %r11 \\
\text{L5:} & \quad \text{add} \ %r10, \ %r11 \\
\text{L6:} & \quad \text{mov} \ %r11, \ 12(\%rsp)
\end{align*}
\]

where \text{mov} \ src, \ dst \ copies \ a \ value \ from \ src \ to \ dst \ and \ add \ src, \ dst \ means \ dst \ += \ src.

(a) (5 points) A dependence DAG represents the constraints on the possible schedules of the instructions in a basic block. The nodes in a dependence DAG represent machine instructions and its edges represent dependencies between the instructions. Each edge is labeled with latency \( \ell \) indicating that the destination node must be issued no earlier than \( \ell \) clocks after the source node is issued. Draw the dependence DAG for the above code fragment.
(b) (8 points) How many clocks does it take to execute the code fragment without any instruction scheduling? Fill the following table with the instruction labels (L1, L2, · · · , L6) that describes what instructions are executing in the memory unit and the ALU at each clock. Leave blank any clocks that are not needed.

<table>
<thead>
<tr>
<th>Clock</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
<th>8</th>
<th>9</th>
<th>10</th>
</tr>
</thead>
<tbody>
<tr>
<td>Memory Unit</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>ALU</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

(c) (8 points) How are the instructions in the code fragment reordered when we apply the list-scheduling algorithm with the heuristics for selection presented in Lecture 13? Write your answer by listing the instruction labels (L1, L2, · · · , L6) in the new order.

(d) (8 points) How many clocks does it take to execute the reordered code fragment after applying the list-scheduling algorithm? Fill the following table with the instruction labels (L1, L2, · · · , L6) that describes what instructions are executing in the memory unit and the ALU at each clock. Leave blank any clocks that are not needed.

<table>
<thead>
<tr>
<th>Clock</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
<th>8</th>
<th>9</th>
<th>10</th>
</tr>
</thead>
<tbody>
<tr>
<td>Memory Unit</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>ALU</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
2. (a) Consider the following control flow graph with the assumption that all variables are live on exit from the loop:

- (1) $A = B + C$
- (2) $E = 3$
- (3) $D = A + 1$

i. (5 points) Which statements are loop-invariant?

ii. (5 points) Which statements can we move out of the loop?
(b) Consider the following control flow graph with the assumption that all variables are live on exit from the loop.

\[\text{(1) } A = B + C\]
\[\text{(2) } E = 2\]
\[\text{(3) } E = 3\]
\[\text{(4) } D = A + 1\]
\[\text{(5) } F = E + 2\]

i. (5 points) Which statements are loop-invariant?

ii. (5 points) Which statements can we move out of the loop?
3. Consider the following control flow graph:

(a) (8 points) Identify all the webs in the graph. Leave blank any rows or cells that are not needed.
(b) (8 points) Draw the interference graph for the webs you identified in (a). Label each node with its web number and variable.

(c) (5 points) How many colors are required to color the nodes of the graph such that no two adjacent nodes share the same color?

(d) (5 points) The target machine has less registers than required to allocate a register to every web. Which web will be chosen to be spilled first when we adopt the interference-degree heuristic?

(e) (5 points) Which web will be chosen to be spilled first when we adopt the spill-cost heuristic?
4. (a)

\begin{verbatim}
FOR I = 1 TO N
    FOR J = 1 TO N
    }
}
\end{verbatim}

i. (3 points) What is the distance vector between the source and sink of a dependence in the iteration space of the loop nest?

ii. (3 points) What kinds of dependencies exist in the loop nest?

iii. (3 points) Which loop(s) can be fully parallelized?

(b)

\begin{verbatim}
FOR I = 1 TO N
    FOR J = 1 TO N
        A = A + A;
    }
}
\end{verbatim}

i. (4 points) What is the distance vector between the source and sink of a dependence in the iteration space of the loop nest?

ii. (4 points) What kinds of dependencies exist in the loop nest?

iii. (3 points) Which loop(s) can be fully parallelized?