Instruction and Control

Introduction

The way I was taught how computers work was via the “fetch, decode, execute” loop. I have no idea if this is standard pedagogy, but I imagine it is. The idea is that any computer must have a process to figure out the next operation to perform (fetch), it needs to be able to turn that operation into the correct set of control signals (decode), and of course once all of those control signals are set up it needs to actually perform the operation (execute).

I’m sure there are computers out there that think of these three steps as independent, and it’s probably the case that most are like that, but in the Theoputer I decided to simplify things a bit. Instead of three steps, I decided to combine the (fetch and decode) steps into a single step.

The reason for this is really about the clock! The clock has two distinct phases: the down phase and the up phase. Trying to split things into threes or spending three clock cycles for each instruction just seemed like a waste.

The circuit responsible for this combined fetch + decode stage I have dubbed the Instruction and Control board. Its job is to fetch instructions and decode those instructions into the set of control signals.

And it is a board. One of the few (down to three as of writing) circuits/boards that are plugged in to the Daughter Board. It’s also one of the more complex parts of the Theoputer apart from the Daughter Board, and it is the largest board that’s plugged into the Daughter Board. This post will be about the current board, which doesn’t suffer from the many issues that the prior boards plagued me with, but I will try to walk through some of the stickier parts along the way.

Two Clock Phases, One Set of Lines

Recall that when we designed our clock we made a point about how there are two phases:

The Instruction and Control (I&C) circuit is only responsible for that first phase, labeled the “fetch” phase. The I&C board is also the only circuit that is allowed to operate in this fetch phase because it is uniquely responsible for setting the control signals in the Theoputer. We discussed this a bit in the overview of how the Theoputer computes, but let’s do a slightly deeper dive.

Simplified Model

Let’s avoid looking at the schematic for now; it’s a bit complicated for our purposes. Instead, consider the following blocks:

Instruction and control, decoder, and register blocks

We see the I&C board here, which takes an instruction address, fetches that instruction, and decodes it into a set of control lines. In this simplified case we’re looking at just two of those control signals, a read and a write signal for one of the registers. You’ll also note that the register is connected to the databus and that connection is two-way. The register both can read and write to the databus depending on which control signal is active.

Let’s look at a specific instruction that will load the value $1$ into register A. In Theoputer Assembly that is the instruction LAI 0x01. For now let’s forget about where the $1$ comes from and just focus on the register logic:

Same image as above showing the write signal on the register active

Note: I’ve colored the first bit of the databus lines red to indicate that the value $1$ is present on them.

We can see the clock in the drawing now, because something has to tell the decoder and the register to do their respective actions and that is the job of the clock. But we have a problem. We have something called a “race condition” here if we take the diagram at face value.

Off to the Races

Imagine the clock pulses. That pulse takes some amount of time to get from the clock circuit to the decoder. It appears instantaneous, but nothing is instantaneous (even quantum leaps, contrary to what you may have been told). So that pulse takes some time to get to the decoder, and then the decoder takes some time to set up the ~~WRITE~~ control signal, which in turn takes some time to propagate to the register to tell it to accept a write operation.

In parallel that original clock pulse is also sent to the register to tell it to latch the current contents of the databus if and only if the ~~WRITE~~ signal is active.

But what if the signal to the register’s latch action happens first?! These two paths are racing each other, so we don’t know which one will happen first! There could be thermal noise or capacitances or whatever around that actually delays one of the signals and causes a different path to win seemingly randomly.
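To make the race concrete, here is a minimal sketch that models each path as nothing more than a delay. The delay numbers and the function name are made up for illustration; they are not measurements from the Theoputer.

```python
def register_latches(decode_path_ns: float, latch_path_ns: float) -> bool:
    """Return True if WRITE is already active when the latch pulse reaches the register."""
    # Path 1: clock -> decoder -> WRITE line -> register.
    write_active_at = decode_path_ns
    # Path 2: clock -> register latch input.
    latch_pulse_at = latch_path_ns
    return write_active_at <= latch_pulse_at


# Decoder path wins the race: the databus value is latched.
print(register_latches(decode_path_ns=8.0, latch_path_ns=10.0))   # True
# Latch pulse wins the race: the write is silently missed.
print(register_latches(decode_path_ns=12.0, latch_path_ns=10.0))  # False
```

The same instruction gives two different outcomes depending only on which path happens to be slower, which is exactly the problem.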

At Last, Two Phases

This race condition is the reason for separating out the (fetch + decode) steps from the execute step. We avoid all data race conditions like the one described if we ensure that the control lines get set and then execution happens. The picture really looks like this:

Correctly separating fetch and decode from execute to avoid race conditions

You can see the two phases of the clock now. By convention the (fetch + decode) actions happen during the ~~↓CLK~~ phase and (execute) actions happen during the ~~↑CLK~~ phase.

Thus for the rest of the discussion about the Instruction and Control circuit, we will only discuss things that happen during the (fetch + decode) / ~~↓CLK~~ phase.
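A rough way to picture the two phases in software is below. This is only a sketch: the `decode` function, the `write_a` control name, and the string instructions are hypothetical stand-ins, not the real decoder.

```python
class Register:
    """A register that latches the databus on the rising edge if its write line is set."""

    def __init__(self) -> None:
        self.value = 0

    def on_rising_edge(self, write_active: bool, databus: int) -> None:
        if write_active:
            self.value = databus & 0xFF


def decode(instruction: str) -> dict:
    # Hypothetical decoder: only LAI asserts the write line for register A.
    return {"write_a": instruction == "LAI"}


def run_instruction(instruction: str, databus: int, reg_a: Register) -> None:
    # ↓CLK phase: fetch + decode. Control lines settle here and nothing latches.
    control = decode(instruction)
    # ↑CLK phase: execute. Control lines are already stable, so there is no race.
    reg_a.on_rising_edge(control["write_a"], databus)


reg_a = Register()
run_instruction("LAI", 0x01, reg_a)
print(reg_a.value)  # 1
```

Because the control lines are computed strictly before the register is asked to latch, the ordering no longer depends on propagation delays.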

And You Get a Register!

Everyone gets a register in the Theoputer! In this case the register is going to be critical to several actions. First, let’s consider the complete abstract picture of the LAI 0x01 instruction we looked at before:

Animated image showing the fetch + decode phase and execute phase of the LAI 0x01 instruction

If you’ve got your deep engineering hat on you may have a question about that second execute phase. How are the control lines still set in that second phase? If the Instruction and Control (I&C) circuit is intended to only operate during the (fetch + decode) phase, then why are those control signals not reset as soon as the ~~CLK~~ signal goes high?

The answer, of course, is that the I&C circuit has some memory inside it so that the control lines are valid throughout the entirety of the (fetch + decode + execute) instruction. If those control lines were somehow allowed to change during the execute phase then we would have all of the same race conditions previously discussed.

The deeper truth here is that the program ROMs will maintain their data outputs until the ROM address changes, which only happens when the Program Counter is changed. So in theory we don’t need this memory. But! It’s good that we added this memory, because when RAM execution entered the picture things got messy enough that the guarantee of instruction data being constant changed.

Of course the word memory should immediately evoke the word register since that’s effectively what memory is: a bank of registers. In this case we give this register a special name: The Instruction Register.

Inside the I&C circuit, the ~~↓CLK~~ signal actually latches the Instruction Register, effectively saving the current instruction for the lifetime of that instruction’s execution phase. Before we look at the schematic, let’s continue with an abstract view of the LAI 0x01 instruction:

Animated image showing the fetch + decode phase and execute phase of the LAI 0x01 instruction with the instruction register

Now we can see that the instruction is saved in the Instruction Register between the two clock phases, ensuring that the control signals remain valid through the entire instruction.
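In code form the idea is roughly the following. It is a simplified sketch that treats the Instruction Register as a single 16-bit value; the class and variable names are my own, and the words `0x6001` and `0x5100` are simply the LAI 0x01 and ADD bit patterns from this post written in hex.

```python
class InstructionRegister:
    """Captures the instruction lines on the falling clock edge and holds them."""

    def __init__(self) -> None:
        self.value = 0

    def on_falling_edge(self, instruction_lines: int) -> None:
        self.value = instruction_lines & 0xFFFF


ir = InstructionRegister()
instruction_lines = 0x6001   # LAI 0x01 arriving from the fetch
ir.on_falling_edge(instruction_lines)

# Suppose the incoming lines change during the execute phase.
instruction_lines = 0x5100   # ADD shows up early
print(hex(ir.value))         # 0x6001
```

The decoder reads from the latched value rather than the live lines, so the control signals stay put until the next falling edge.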

Fetch

Let’s dissect the (fetch + decode) phase by first looking at the fetch. For most of the Theoputer’s life, all instructions were fetched from the program ROMs. That changed recently with a new feature that allows instructions to be executed from RAM, but we will avoid that topic here.

Recall that the Program Counter is 16 bits wide. Also recall that all instructions are 16 bits long. These two lengths are unrelated. They just happen to both be 16 bits. Before we dig deeper into the details we first need to define a useful term: Opcode. An opcode is the actual bit sequence of an instruction.

Let’s consider the instruction ADD. In the ISA the ADD operation does the following:

$$ \textrm{REG}_X = \textrm{REG}_A + \textrm{REG}_B $$

The opcode for this instruction is:

Bit 16	Bit 15	Bit 14	Bit 13	Bit 12	Bit 11	Bit 10	Bit 9	Bit 8	Bit 7	Bit 6	Bit 5	Bit 4	Bit 3	Bit 2	Bit 1
0	1	0	1	0	0	0	1	0	0	0	0	0	0	0	0

Notice that the bottom 8 bits are all zero. This is not coincidental. All of the Theoputer opcodes are actually 8-bits long, with the bottom 8 bits reserved for instruction data (see below).
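As a quick check of that split, here is a small sketch that pulls the two halves out of the ADD bit pattern from the table above. The function name is just for illustration.

```python
ADD = 0b0101_0001_0000_0000  # bits 16..1 from the table above


def split_instruction(word: int) -> tuple:
    """Split a 16-bit instruction into (top 8 bits, bottom 8 bits)."""
    return (word >> 8) & 0xFF, word & 0xFF


opcode, data = split_instruction(ADD)
print(f"{opcode:08b} {data:08b}")  # 01010001 00000000
```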

We could have created variable-length instructions, but there’s not much gain in doing so. This is because of the way ROM works in the Instruction and Control board. The ROM chips in the Theoputer are these awesome ICs from Microchip: SST39SF010. While they are awesome, they only have 8 data lines. In order to store 16-bit opcodes, we need two of them!

Sure, we could use one and try to load the first half of the opcode and then load the second one. But that is far more complicated to orchestrate. Throw chips at the problem!

This is a case of good engineering. We don’t need a perfectly designed system right now. There’s enough complexity in the Theoputer as-is. We don’t gain enough by adding in the extra complexity of a system that needs to store parts of opcodes and all of the timing that comes with that. The juice is not worth the squeeze, and if it really ended up mattering we could switch to something more complicated only after we got the simpler system working.

The fetch part of the Instruction and Control board isn’t much more than just two of those SST39SF010 ROM chips wired in parallel so they provide the 16-bit opcode at the input address:

You can ignore the ~~^ROM_WE~~ and ~~DISABLE_ROM~~ signals for now. The ~~ADDR{0-15}~~ lines are connected to the Program Counter output in the Daughter Board. This happens via 16 pins attached to the Instruction and Control board that plug into 16 pins on the Daughter Board that connect to the 16 output signals of the Program Counter.
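A software analogy for the two-chip arrangement is sketched below. Both ROMs see the same address and each contributes one byte. Which chip holds the upper byte, and the tiny example contents, are assumptions made for illustration.

```python
def fetch(rom_high: bytes, rom_low: bytes, address: int) -> int:
    """Read one 16-bit instruction from two 8-bit ROMs that share an address."""
    # Assumption: rom_high supplies bits 16..9 and rom_low supplies bits 8..1.
    return (rom_high[address] << 8) | rom_low[address]


# Two-instruction program: LAI 0x01 at address 0, ADD at address 1.
rom_high = bytes([0x60, 0x51])
rom_low = bytes([0x01, 0x00])

print(hex(fetch(rom_high, rom_low, 0)))  # 0x6001
print(hex(fetch(rom_high, rom_low, 1)))  # 0x5100
```

The point is that a single address produces the whole instruction in one go, with no sequencing needed to stitch the halves together.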

Instruction Data

Let’s think a bit more critically about the beloved LAI 0x01 instruction. We glossed over where the $1$ comes from before, but now is a good time to reconsider that glossiness.

Without going through the details of all of the opcodes in the Theoputer (discussed in the post about Theoputer Assembly), let’s look at the opcode for LAI ${value}:

Bit 16	Bit 15	Bit 14	Bit 13	Bit 12	Bit 11	Bit 10	Bit 9	Bit 8	Bit 7	Bit 6	Bit 5	Bit 4	Bit 3	Bit 2	Bit 1
0	1	1	0	0	0	0	0	I08	I07	I06	I05	I04	I03	I02	I01

Those last 8 bits are not fixed values. They are arguments to the opcode $01100000$ and they are intended to be output to the databus from the instruction itself. This enables the programmer to set register A to an arbitrary immediate value as long as it is only 8 bits.

Note: The I in LAI stands for immediate
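Here is a small sketch of how an assembler might pack an LAI instruction from the layout above. The `encode_lai` helper is hypothetical and is not taken from the actual Theoputer assembler.

```python
LAI_OPCODE = 0b0110_0000  # top 8 bits from the table above


def encode_lai(value: int) -> int:
    """Pack an 8-bit immediate into the bottom 8 bits of an LAI instruction."""
    if not 0 <= value <= 0xFF:
        raise ValueError("LAI immediate must fit in 8 bits")
    return (LAI_OPCODE << 8) | value


print(f"{encode_lai(0x01):016b}")  # 0110000000000001
```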

Recall that in a write-like operation, the register latches the value on the databus. So if we want the immediate value in the instruction itself to wind up in the register, we need to output the last 8 bits of the instruction to the databus.

This is exactly what happens in the Theoputer. This is the schematic for this particular mechanism in the Instruction and Control circuit:

Note: The registers in the Theoputer are all 8 bit and the instruction opcodes are all 16 bit, so there are actually two Instruction Registers. The one pictured above is just the bottom 8 bits.

Here you see the bottom 8 bits of the instruction (~~INSTR{0-7}~~) coming in to the bottom 8 bits of the Instruction Register. You can also see there is a buffer that will output those bottom 8 bits to a set of lines ~~IR_OUT{0-7}~~ if the signal ~~^IO~~ is active. As long as we connect ~~IR_OUT{0-7}~~ to ~~DBUS{0-7}~~ on the Daughter Board and connect up the correct control signal to ~~^IO~~ we will get the desired behavior (spoiler alert: that’s how it’s connected in the Theoputer).
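The buffer behavior can be sketched like this. I am assuming the `^` prefix means the signal is active-low, and I use `None` to stand in for a buffer that is not driving its outputs; both are modeling choices rather than details taken from the schematic.

```python
from typing import Optional


def ir_out(instruction_register_low: int, io_n: int) -> Optional[int]:
    """Value driven onto IR_OUT{0-7}, or None when the buffer is not driving."""
    # Assumption: ^IO is active-low, so a logic 0 enables the buffer.
    if io_n == 0:
        return instruction_register_low & 0xFF
    return None  # not driving: something else is free to use the databus


print(ir_out(0x01, io_n=0))  # 1
print(ir_out(0x01, io_n=1))  # None
```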

There are several types of instructions that take these kinds of 8-bit parameters:

Instructions that put an immediate value into a register
Instructions that copy data to memory locations
Jump instructions

Most of those instructions also come in flavors that take their parameters from other registers, or memory, or anything that can write the databus in lieu of the parameters. But that’s starting to get into too many of the details of the ISA.

If you prefer the more abstract view of things, this is equivalent to ensuring one of the control lines is connected to the instruction register’s output buffer, which in turn is connected to the databus:

Animated image showing the fetch + decode phase and execute phase of the LAI 0x01 instruction with the instruction register also with data coming out of the instruction register

I’ve added in the ROM chip here as well just for some extra clarity.

Long-Argument Instructions

We are dipping our toe slightly into the territory of the decoding part of the Instruction and Control circuit (covered in a separate post), but the Instruction Register is important to understand in the context of the fetch, and thus instruction data is also relevant. Also also it makes sense to discuss the only other feature of instruction data: long argument instructions.

This is a feature that’s been around since the first real ISA, which was ISA V2.0. You probably, rightly, thought that having jumps and memory addresses that can only be 8-bits long is rather limiting. After all, both RAM and ROM have 16-bits of address space.

But we only have 16-bits total for opcode, and the top 8 bits are already claimed to decode the opcode to the control signals. What to do?

The solution in the Theoputer is to have a special bit to indicate that the opcode is a short opcode, and thus the argument is longer. This bit is the most significant bit (MSB) in the opcode (bit 16). If it is $1$, then only the top 4 bits are used for the actual instruction, allowing for 12 bits of data.

Take, for example, the instruction LAM 0x${address}. This is the instruction to load register A with the contents of RAM at address ${address}. Here is the full 16-bit opcode:

Bit 16	Bit 15	Bit 14	Bit 13	Bit 12	Bit 11	Bit 10	Bit 9	Bit 8	Bit 7	Bit 6	Bit 5	Bit 4	Bit 3	Bit 2	Bit 1
1	1	0	1	A12	A11	A10	A09	A08	A07	A06	A05	A04	A03	A02	A01
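The field split described above can be sketched as follows. The function name is illustrative, and this models only the bit layout, not the actual decoding circuit.

```python
def decode_fields(word: int) -> tuple:
    """Split a 16-bit instruction into (opcode, argument) based on bit 16."""
    if word & 0x8000:
        # MSB set: short 4-bit opcode, 12-bit argument.
        return (word >> 12) & 0xF, word & 0x0FFF
    # MSB clear: 8-bit opcode, 8-bit argument.
    return (word >> 8) & 0xFF, word & 0x00FF


opcode, argument = decode_fields(0b1101_0001_0010_0011)  # LAM 0x123
print(f"{opcode:04b} {argument:#05x}")  # 1101 0x123

opcode, argument = decode_fields(0b0110_0000_0000_0001)  # LAI 0x01
print(f"{opcode:08b} {argument:#04x}")  # 01100000 0x01
```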

And in the circuit for the Instruction Register you can see how this MSB is used to control whether the bottom 8 bits or 12 bits are output to the databus:

You’ll note that the path through the bottom buffer there is not actually from a register! Until RAM execution was added, the instructions coming from the program ROMs were valid throughout the lifetime of the instruction. Thus we didn’t strictly need any instruction registers.

Writing this blog post actually helped me think through this and fix this problem before production! I’ll take that as a win.

Final Note

All of this business with variable-length arguments seems a little too clever. And that’s a violation of the motto Never Be Clever. In fact, in more modern versions of the Theoputer, there are additional instructions (e.g. JPP) that use the values in the A and B CPU registers to perform jumps and memory address operations. Since each register is 8-bits long, together A and B allow for the full 16-bit parameters needed to address the entirety of the system’s ROM or RAM.

This is especially true for the assembly output of the Cish compiler, which can’t reliably know if 8 or 12 bits is enough. Still, the 8-bit and 12-bit parameterized jumps and memory accessors are faster and thus stick around for the true clock-cycle cutters out there.
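For a sense of scale, here is a sketch comparing how much address space each argument width can reach, along with one way two 8-bit registers could form a 16-bit address. Which of A and B supplies the high byte is not stated in this post, so the ordering below is purely an assumption.

```python
def pair_to_address(reg_a: int, reg_b: int) -> int:
    """Combine two 8-bit registers into one 16-bit address."""
    # Assumption: A is the high byte and B is the low byte.
    return ((reg_a & 0xFF) << 8) | (reg_b & 0xFF)


for bits in (8, 12, 16):
    print(f"{bits}-bit argument reaches {2 ** bits} addresses")

print(hex(pair_to_address(0x12, 0x34)))  # 0x1234
```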

What’s Left: Decode

This post is getting quite long and the decode portion of the Instruction and Control circuit/board is quite complicated because the Theoputer uses microinstructions. In an effort to make things more readable, there is a second post dedicated to the decode step.

If for some reason you’re here to see the Instruction and Control board in action, enjoy this fairly long video of the first ever instruction to be fetched and decoded, before the Daughter Board was even built:

There’s a short dedicated post about this video if you prefer to read versus watch.