My TTL CPU
Last update: 2023-04-13
What is it?
This is a 32-bit processor made only of TTL ICs. I'm busy building three different TTL CPUs and this is the smallest one. The others:
- Mini6502, the middle one
- Big6502, the biggest one
All three projects are still under construction.
Some history: the first processors
If you are Dutch, you should be familiar with a "Draaiorgel" / "Barrel organ". The music of the bigger barrel organs is directed by their Draaiorgelboek / Book music: the holes in the paper make the various instruments produce their sounds. There are barrel organs with more than 100 instruments on board.
But what do barrel organs have to do with processors? The various registers, counters, adders and whatever else you find inside a processor can be considered as its instruments, and the program, more or less, as its book music. ("More or less" because modern processors have an "Instruction decoder", see later.)
The fact is, the first computers operated this way: the bits of every byte of a program directly manipulated the various registers, counters etc. The more registers, counters or whatever such a computer had, the more bits a byte had. Zuse's Z3, the first operational Turing-complete computer in the world, was a 22-bitter. Its successor, the Z4, was a 32-bitter.
Remark: nowadays we are used to the fact that bytes are only 8 bits. But in the early computer days a byte could hold any number of bits. Due to the overwhelming presence of 8-bitters from the 1970s on, "byte" became a synonym for 8 bits. From now on, when I use the word "byte" here, I mean 8 bits as well.
But there is one big difference between the organ and this type of processor: each line of the book music only contains code that plays the instruments, whereas each line of code for the processor, besides "playing" the registers etc., also includes a number of data bits. And the designer of the computer decided how many data bits would be used.
But in time the computer guys ran into trouble. As soon as one computer was built, people already wanted a better one. This meant more data bits, more counters, more registers, etc., so the bus became wider and wider. A bad side effect was that practically any improvement made a computer incompatible with all earlier models. More importantly, software that had already been developed could not be used on the changed models. The Z4 mentioned above was an improved version of the Z3 and also ran into this problem: software developed for the Z3 had to be rewritten to run on the Z4.
Then some guys invented the Instruction Decoder and a lot of problems were solved. For example, it is the instruction decoder that enables an AMD 80386 to run code originally written for an Intel 8088. But that is not discussed here, see the Mini6502.
My idea
For quite some time I wanted to build my own processor, but so far the designs were too complex, mainly because of the Instruction Decoder. I had thought about building such a "barrel organ processor" as well, but I didn't like the idea of an n-bit-wide data bus because it meant that I would need a lot of parallel ROMs and RAMs to be able to run any program.
Then I had a brainwave: instead of reading the n bits in parallel, I would read them from memory one byte (= 8 bits) at a time, store each byte in a latch and, once all bytes had been loaded, activate the latches so that all needed functions would be performed at that moment. The big advantage: just one ROM is needed for (in this case) a 24-bit instruction code.
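The load-then-execute idea fits in a few lines of Python. This is only a behavioural sketch: the ROM contents and the helper names fetch_control_word and control_line are hypothetical, and I assume that the first byte read ends up on control lines I0..I7, the second on I8..I15 and the third on I16..I23 (consistent with I4 being bit 4 of the first opcode byte, as described later).

```python
def fetch_control_word(rom, address):
    """Read three bytes one at a time, keep each in its own 8-bit latch and
    only then combine them into one 24-bit control word.

    Assumption: byte 1 drives I0..I7, byte 2 I8..I15 and byte 3 I16..I23.
    """
    latches = []
    for offset in range(3):
        # one ROM read per step; each byte is stored in a separate latch
        latches.append(rom[address + offset] & 0xFF)
    # all 24 control lines become active at the same moment
    return latches[0] | (latches[1] << 8) | (latches[2] << 16)


def control_line(word, n):
    """Level (0 or 1) of control line In in a 24-bit control word."""
    return (word >> n) & 1


rom = [0x10, 0x00, 0xC0, 0x42]       # hypothetical ROM contents
word = fetch_control_word(rom, 0)
print(hex(word))                      # 0xc00010
print(control_line(word, 4))          # 1
```

A single 8-bit-wide ROM is enough; the price is three read steps before anything actually happens.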
But now the weird fact: how many bits is this processor? The actual data it can handle is only 8 bits, but the size of its opcode is 24 bits. Then I considered this: if I had designed it like the Z3, I would have needed a byte of 32 bits (24 control bits plus 8 data bits). So, a 32-bitter it is.
What I said above about changing/improving this type of processor is also valid for this one. Once I have built it, I have to stick with it.
What processor will it emulate?
The answer is simple: none. To be more precise: it is a new processor with its own opcode set. By the way, do you know of any processor that only uses 24-bit opcodes? Again, this has to do with the strong relation between the bytes to be loaded and the hardware of the TTL-CPU. In this case I have to load 24 bits to tell the TTL-CPU what to do, whereas many processors, like the 6502, make do with just eight bits. And even then it will probably only do part of what the 6502 can do. So, in short: this will be a brand new, one-of-a-kind processor.
Version 1 and 2
At this moment I am at version 4, which is the production version. Version 1, in fact two versions (one with and one without on-board RAM), contained errors. Version 2 is an improved version 1 without RAM and is the version primarily described here. Version 3 is version 2 plus some add-ons: RAM, ROM and I/O. It is described later. In version 4 a GAL replaces some glue logic.
Picture (of version 1)
The schematics
This schematic is not a complete leap into the unknown: I use ideas gathered while designing previous TTL-CPUs.
The processor port: the interface to the outside world
The idea is to use a female 64-pin AC DIN connector as the port to the outside world, one that will be used on all my TTL-CPU boards. A card with a male DIN connector is attached to the TTL-CPU. This card should contain at least a connector to connect the board to the target system. In some cases some extra hardware is needed.
Remark: as I am a 6502 fan, the comments and the design are a bit 6502-flavored.
An explanation of the various pins:
- A0..23: 24 address lines but only 16 are used in this case. These 16 address lines enable the TTL-CPU to address up to 64 KB of RAM, ROM or I/O. I planned to use all lines but being only human, I made a mistake and had to reduce the number to 16 lines.
- D0..D7: the eight data lines
- AEC is meant for tri-stating the various busses. Not used here.
- NMI, IRQ: To keep the design simple, I don't use these signals. But to be honest, I also had no idea (yet) how to implement them in this design.
- RDY, RESET will be used, see later.
- SO, a typical 6502 signal, won't be used here either.
- SYNC is also a 6502 signal and is used here to mark the actual execution of an instruction.
- HALT, M1: the well-known Z80 signals, not used here.
- PHI0, PHI1 and PHI2: the well-known 6502 clock signals.
- IORD, IOWR, MEMR, MEMW: those who are familiar with 80x86 systems will recognize the names of these lines; they are used to control memory and I/O operations. But forget about these names: here the lines are called I0..I3, and of these signals only I1..I3 are used.
PHI0, PHI1 and PHI2
The three clock signals as used by the 6502. I need these signals inside the circuit anyway, so I connected them to the processor port. PHI1 is generated by inverting PHI0 using IC35A and PHI2 is generated by inverting PHI1 using IC24B.
Reset
The Reset signal, active (L), is used for two tasks:
- resetting the Program Counter (four 161 4-bit counters)
- resetting a 74LS393 counter, IC25A.
The last is done using an AND gate, IC23D, and an inverter, IC24D. The inverter creates the needed active (H) CLR signal for the 393.
SYNC
SYNC is a 6502 signal that tells the outside world that the first cycle of an opcode is being processed. I use it to tell the outside world that the instruction is being executed at that moment. And to be honest, I didn't need to create it; it just happened to be there (more or less).
ReaDY signal
RDY is a signal that tells the 6502, and in this case the TTL-CPU, to halt all activity as long as RDY is (L). The basic idea is that an active RDY prevents PHI2 from reaching the 393 counter and the Program Counter and so stops our TTL-CPU from doing anything.
RDY is fed to the D input of IC36B, a 74LS74 D-flipflop. This flipflop captures the state of RDY at the end of PHI0 by sampling it at the rising edge of PHI1 (= the falling edge of PHI0). If RDY is (L), output /Q becomes (H). /Q is fed to OR gate IC34D (through AND gate IC23A, see later) together with PHI2. The output of the OR gate is fed to the clock input of IC25A, a 393 4-bit counter, whose outputs are incremented at every falling edge of PHI2. The moment the second input of the OR gate becomes (H) because of the flipflop, the OR gate output stays (H), it ceases to pass the PHI2 pulses to the 393 counter and the processor stops.
As you can see, the CLR input of the D-flipflop has been connected to control line I16. The moment I16 is negated, output /Q is pulled (H) and this will stop the 393. See it as an equivalent of the 80x86 instruction HLT (= HaLT) or even better, the 65816 instruction STP (= SToP).
FYI: I only added this feature for the simple reason that I16 was left over. But unlike an 80x86, which can be woken up again by an interrupt, this processor cannot, simply because it doesn't have any means to release the flipflop again. In this case halt really is HALT.
After I finished the first schematic, I got another idea. When the board is connected to my Debugger, in single-step mode my CPU stops at every odd step. So the question arose whether I could stop it only during the execution step, thus skipping the first six steps. It turned out to need only one AND gate (IC23A), and that one just happened to be left over. It is placed between the RDY flipflop and IC34D, the OR gate. Its second input is connected to SYNC.
The function of the AND gate is simple: it blocks the /Q signal coming from the flipflop towards OR gate IC34D during all steps except steps 0110 and 0111, when SYNC is (H). In case you want to see all steps, just open jumper J1 and my CPU will stop at every step again.
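The clock gating around IC36B, IC23A and IC34D can be checked with a small level-only model (no propagation delays). The function names are made up for this sketch, and I assume that opening J1 pulls the second input of IC23A high; the text only says that the CPU then stops at every step again.

```python
def rdy_flipflop_qn(rdy, clr_n=1):
    """/Q of IC36B after a rising edge of PHI1 (D = RDY, CLR = I16)."""
    if clr_n == 0:
        return 1              # I16 low clears the flipflop: /Q high, the CPU halts
    return 1 - rdy            # RDY low -> /Q high


def counter_clock(phi2, q_n, sync, j1_closed=True):
    """Clock level seen by the 393 step counter and the 161 Program Counter."""
    # AND gate IC23A; assumption: with J1 open its second input is pulled high
    gate = q_n & (sync if j1_closed else 1)
    return phi2 | gate        # OR gate IC34D


def falling_edges(levels):
    """Number of high-to-low transitions; the 393 counts on these."""
    return sum(1 for a, b in zip(levels, levels[1:]) if a == 1 and b == 0)


phi2_wave = [0, 1, 0, 1, 0, 1]
q_n = rdy_flipflop_qn(rdy=0)     # RDY held low, e.g. by a debugger
fetching = [counter_clock(p, q_n, sync=0) for p in phi2_wave]
executing = [counter_clock(p, q_n, sync=1) for p in phi2_wave]
print(falling_edges(fetching), falling_edges(executing))   # 2 0
```

With RDY low the counter keeps running through the fetch steps and only freezes once SYNC is high; with J1 open it freezes right away.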
The Counter circuit and the instruction latches
The TTL-CPU has 24 inputs that need to be controlled. These controls can, for example, tell a latch to clock in the data on its inputs or enable the outputs of a buffer. 24 inputs mean three 74ALS573 8-bit latches, the so-called "code latches". They have to be loaded with data first, and once that has been done, the instruction must be executed. That means I need at least four cycles: three for loading the code latches and one for executing. The 74LS393 4-bit counter mentioned above, IC25A, takes care of this.
As said before, the 393 is clocked by PHI2. PHI2 and three outputs of the 393 are fed to IC21, a 154 4-to-16 demultiplexer. The outputs of the 154 represent the various steps in the process. One idea was to read a byte at every step, i.e. at every half cycle of PHI2, but then I realized that would not work because PHI2 would then have to be connected to the address lines of the ROM in one way or another. So I only used every odd step, i.e. when PHI2 is (H).
At step 0001 the clock input of IC01, a 573 latch, is activated and the byte read by IC17 is stored inside IC01. At steps 0011 and 0101 the same is done for IC02 and IC06 respectively.
A 7-segment LED display with internal decoder, DIS3, is used to make the cycles visible.
During the first three cycles the three instruction 573s are tri-stated and all their outputs are kept (H) by pull-up resistors. The idea behind this is to make sure that all controls are in a neutral state. An example: assume that an instruction took care of writing data into one of the ALU buffers. To disable the clock signal for this latch after the instruction, a byte has to be read and stored into the corresponding instruction latch. But as the ALU latch is still open, this byte would overwrite the one just written into the ALU buffer as well! So by disabling all outputs we make sure that nothing can be changed, overwritten or output by accident.
An exception is the output of IC17, the 573 latch that outputs the data coming from the processor port into the internal data bus. An inverter, IC35F, inverts control signal I4 and in this way takes care of negating IC17's OC input, thus allowing the data to reach the instruction latches.
If the instruction is to read data from the outside world, bit 4 of the first opcode byte has to be set (H) so that during the actual execution, IC17 keeps on transferring data from the outside world into the processor.
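The effect of the pull-up resistors and of inverter IC35F on IC17 can be summarised in a tiny model. control_lines and ic17_drives_bus are hypothetical names; bit n of the 24-bit word stands for control line In.

```python
PULLED_UP = 0xFFFFFF   # all 24 control lines high while the 573 outputs are tri-stated


def control_lines(latched_word, outputs_enabled):
    """Levels of I0..I23: driven by the instruction latches or pulled high."""
    return latched_word & 0xFFFFFF if outputs_enabled else PULLED_UP


def ic17_drives_bus(lines):
    """IC35F inverts I4 and feeds /OC of IC17, so IC17 drives the bus while I4 is high."""
    oc_n = 1 - ((lines >> 4) & 1)
    return oc_n == 0


print(ic17_drives_bus(control_lines(0x000000, outputs_enabled=False)))  # True: fetch steps
print(ic17_drives_bus(control_lines(0x000000, outputs_enabled=True)))   # False
print(ic17_drives_bus(control_lines(0x000010, outputs_enabled=True)))   # True: read from port
```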
At steps 0110 and 0111 all outputs of the instruction latches are enabled and, in turn, activate the needed controls. For example, the content of the data bus can be read and written into buffer A of the ALU. Notice that two steps are involved (AND gate IC23B combines them), so the actual output mimics the behavior of a 6502. For example, in the case of a 6522 VIA the address must be present before the rising edge of PHI2.
This combined signal is also used to create SYNC, after it has been inverted by IC24F.
If the program wants to read data from the bus, this data is only valid during step 0111. This data, coming from RAM, ROM or I/O, is clocked into IC17 when PHI0 is (H). And that is during step 0111. During step 0110 the third byte of the opcode is placed on the internal bus by IC17. Can this jeopardise things? I don't think so because, if data, that is read from the external bus, needs to be stored, it can only be done at the end of step 0111. And then the real data is present.
At step 1000, output pin 9 of the 154 goes (L). This output is fed to the second input of AND gate IC23D, and the result is inverted by IC24D so it can reset the 393 counter. This causes the 393 to go to step 0000 at that moment, which in turn automatically pulls pin 9 (H) again and thus releases the reset of the 393 counter.
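The whole step sequence can be simulated to check the numbering. In this sketch I assume that PHI2 is the least significant address bit of the 154 and that three 393 outputs form the upper three bits, which matches the odd steps being the PHI2-high halves. The function names are hypothetical.

```python
def clear_393(reset_n, y8_n):
    """CLR of the 393 (active high): AND gate IC23D followed by inverter IC24D."""
    return int(not (reset_n and y8_n))


def sync(step):
    """SYNC is high during the two execution steps 0110 and 0111."""
    return int(step in (0b0110, 0b0111))


def run_sequencer(half_cycles, reset_n=1):
    """Return the 154 step numbers seen during a number of PHI2 half cycles."""
    counter, phi2, steps = 0, 0, []
    for _ in range(half_cycles):
        step = (counter << 1) | phi2
        # step 1000 pulls 154 output pin 9 low, which clears the 393 at once
        if clear_393(reset_n, 0 if step == 0b1000 else 1):
            if step == 0b1000:
                steps.append(step)    # 1000 is only visible for a moment
            counter = 0
            step = phi2               # counter cleared: step 0000 or 0001
        steps.append(step)
        if phi2 == 1 and reset_n:
            counter += 1              # the 393 advances on the falling edge of PHI2
        phi2 ^= 1
    return steps


print(run_sequencer(10))   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 1]
```

Bytes are latched at steps 1, 3 and 5, SYNC covers steps 6 and 7, and step 8 immediately restarts the sequence.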
The use of the various latches
The outputs perform, or control, various functions:
- Reading data from various sources.
- Controlling the selection of the various functions of the ALU.
- Writing data to various latches.
In the first case there are only two sources that can be read: data from the ALU and data coming from the outside world. With more sources I could have used a demultiplexer like the 74LS138 or 139 to save control lines, as only one source can be read at a time anyway.
In the third case there are eight registers I can write to. The whole design would need 29 control lines, which originally meant that I needed at least four 573 latches and four clock cycles for reading an instruction. But eight of those control lines clock a 573 latch. Using a 138 3-to-8 demultiplexer reduces the number of control lines to 24, thus needing exactly three 573 latches.
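Written out, the saving works like this: a 138 turns a 3-bit code into one of 2^3 = 8 outputs, so the eight write lines shrink to three, and the number of 8-bit code latches follows from rounding up.

```latex
\begin{aligned}
29 - 8 + 3 &= 24 \text{ control lines} \\
\left\lceil 29/8 \right\rceil = 4 \text{ latches} &\quad\longrightarrow\quad \left\lceil 24/8 \right\rceil = 3 \text{ latches}
\end{aligned}
```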
But a problem is that the outputs of a 138 are active (L) while the clock inputs of the 573 are active (H), so inverters are needed. That's why IC07, an 8-bit inverting 540 buffer, is needed. Hey, but this means an extra 20-pin IC, so why not use an extra 573 instead?
No. Remember that during the first four steps the 573s are disabled and their outputs are pulled (H) by resistors? So the inverters would be needed anyway. OK, I have thought about using resistors to pull the disabled outputs (L), but I have had bad experiences with this method: slow rising edges. And don't forget the extra step that is needed with an extra 573; that would certainly slow down the system by 25%.
Only seven latches are clocked via the 138; the ALU output latch has its own control line, I5, which is inverted. There are two reasons for this construction:
- During the first three steps all control lines are (H), so the 138 would activate output Y7. In version 1, control line I5 disabled the 138, thus disabling output Y7 as well. In version 1, Y7, after inversion, led to the ALU output latch.
- When working on the opcodes, I found out that to store data directly into the ALU output latch, I first needed to store it in, for example, the A register, and then needed another instruction to copy the data from the A register through the ALU into the ALU output latch. Using I5 in this way I can activate the clock input of register A or B (C2 or C4) and that of the ALU output latch (C7) at the same time.
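A level-only sketch of the write-latch decoding: the 138 selects one active-low output, the inverting 540 (IC07) turns it into an active-high clock, and the ALU output latch gets its clock from inverted I5. Which control lines drive the 138 select inputs is not stated above, so the arguments c, b and a are placeholders, and I assume that clock Cn comes from output Yn.

```python
def ls138(c, b, a, enabled=True):
    """74LS138 3-to-8 decoder: returns the active-low outputs Y0..Y7."""
    outputs = [1] * 8
    if enabled:
        outputs[(c << 2) | (b << 1) | a] = 0
    return outputs


def write_clocks(c, b, a, i5):
    """Active-high latch clocks C0..C7 as they leave the inverting buffers.

    Assumptions: Cn comes from Yn for n = 0..6, Y7 is left unused and C7
    (ALU output latch) is the inverted I5 line.
    """
    y = ls138(c, b, a)
    clocks = [1 - level for level in y[:7]]   # 540 inverts the active-low outputs
    clocks.append(1 - i5)                      # C7 from inverted I5
    return clocks


# Fetch steps: every control line is pulled high, so the 138 selects unused Y7.
print(write_clocks(1, 1, 1, i5=1))   # [0, 0, 0, 0, 0, 0, 0, 0]
# Store into a register (C2 here) and into the ALU output latch at once.
print(write_clocks(0, 1, 0, i5=0))   # [0, 0, 1, 0, 0, 0, 0, 1]
```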
The Program Counter
My first idea was to use the same program counter as the one used in Build your own Mini6502: the address is kept in the SRAM, copied to the ALU, incremented, output to the address bus and saved back to the SRAM at the same time. But the instruction bytes needed to perform these steps on this CPU have to be read from the ROM, which means that the program counter has to be incremented at every step to be able to read these bytes: a kind of chicken-and-egg problem. Conclusion: hardware is a must.
So I decided to use an automatic counter based on the one in Build your own 6502, but with 161 counters instead of 191s. The 161 can be reset, the 191 cannot, and this enables, or rather forces, me to start at address $0000.
The 161s are clocked by the same clock as the 393 counter. The major error I made in version 1 was that I overlooked the fact that the 161s count at the rising edge of the clock, whereas the whole design assumed that the Program Counter counts at the falling edge. So I had to shuffle things around to free an inverter, IC35C in this case.
In case of a jump, two 573 latches, IC08 and IC14, have to be filled with the new address. By negating the /LOAD inputs of the four 161s, the new address is copied into them. How the /LOAD signal is generated is explained later.
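The behaviour of the four cascaded 161s can be sketched as a 16-bit counter with a synchronous load. The class name is hypothetical, one call of clock stands for one rising edge at the 161 clock inputs, and I assume that IC08 holds the high byte and IC14 the low byte of the jump address; the text does not say which is which.

```python
class ProgramCounter:
    """Behavioural sketch of four cascaded 74161s forming a 16-bit counter."""

    def __init__(self):
        self.value = 0x0000

    def reset(self):
        # /CLR of the 74161 is asynchronous: the counter goes to $0000 at once
        self.value = 0x0000

    def clock(self, load_n, jump_high, jump_low):
        """One rising clock edge. load_n is the level on the /LOAD inputs."""
        if load_n == 0:
            # synchronous load of the address held in the two jump latches
            self.value = ((jump_high & 0xFF) << 8) | (jump_low & 0xFF)
        else:
            # count up; RCO/ENT chaining makes four 4-bit counters act as one 16-bit counter
            self.value = (self.value + 1) & 0xFFFF


pc = ProgramCounter()
for _ in range(4):
    pc.clock(load_n=1, jump_high=0x00, jump_low=0x00)
print(hex(pc.value))                                  # 0x4
pc.clock(load_n=0, jump_high=0x12, jump_low=0x34)
print(hex(pc.value))                                  # 0x1234
```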
In this design the Program Counter counts up at every clock cycle, thus also at the fourth cycle. Remember, this is the cycle where the actual action takes place, like reading a byte from somewhere in RAM or setting an I/O register. So with a non-operand instruction that doesn't use this byte at all, the byte can be wasted. "Can", because this fourth byte can be accessed indirectly, so it is not a complete loss. But that requires some creative programming, see later.
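Because the counter advances on every cycle, each instruction occupies four consecutive ROM addresses: three code bytes and a fourth byte that is only useful when the instruction needs it. A small helper makes the layout explicit; assemble and instruction_address are hypothetical names, and the byte order (I0..I7 first) is an assumption.

```python
def assemble(control_word, operand=0x00):
    """Split one instruction into the four bytes the Program Counter steps through.

    Assumption: I0..I7 first, then I8..I15, then I16..I23, then the fourth
    byte (operand or unused).
    """
    return [control_word & 0xFF,
            (control_word >> 8) & 0xFF,
            (control_word >> 16) & 0xFF,
            operand & 0xFF]


def instruction_address(n, origin=0x0000):
    """ROM address of the n-th instruction (counting from 0): four bytes each."""
    return (origin + 4 * n) & 0xFFFF


print([hex(b) for b in assemble(0xC00010, 0x42)])   # ['0x10', '0x0', '0xc0', '0x42']
print(hex(instruction_address(3)))                   # 0xc
```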
The 161s cannot be tri-stated, so two 541 buffers, IC22 and IC31, take care of that. Control signal I7 enables or disables the 541s. While the code latches are being filled, their outputs are disabled and all control signals are (H). To make sure that the 541 buffers are enabled at that time, I7 is inverted first.
The temporary address lines A0..15
Thinking things over, I soon found out that I could not use the Program Counter for temporary accesses to memory or I/O. The reason is simple: if I set the Program Counter to read a byte, the program would continue at the address after the one used for accessing the memory or I/O, because I have no means to restore the Program Counter to its original address automatically after that action. So I need some buffers that contain the needed address for that moment: IC19 and IC20. Control line I7 is used here as well: it selects whether the temporary address buffers (I7 not inverted) or the Program Counter (I7 inverted) are active.
The ALU
For the ALU I decided to use two 74181s, the world's best-known ALU IC. I was tempted to use EEPROMs here as well, but then it wouldn't be an all-TTL CPU.
The data needed as inputs for the ALU is first stored in two 573 latches, IC27 and IC28. The advantage of this design is that the flag information from the ALU stays available when the data on the data bus has been changed by other operations. IC03, a 573 latch, takes care of storing the data produced by the ALU and outputting it to the internal data bus.
If needed, these three latches can be used as temporary internal registers.
The Flags and the use of them
So far I only use three flags: Carry, Zero and Minus. The 181 outputs a zero flag, but this one is only valid for four bits. OR gate IC34C combines the signals of both 181s to create a zero flag that is valid for a whole byte.
Both 181s also output a Carry flag. The one coming from IC26, the first 181, is fed into IC29, the second 181. The one coming from IC29 is fed into IC36A, a 74 D-flipflop. The Q output of the D-flipflop is fed into the not-Carry input of the first 181, IC26. IC36A has two functions:
- It enables the CPU to remember earlier states of Carry. Useful for ADC (ADd with Carry) equivalent instructions.
- A program can set (I10) or reset (I8) the Carry on demand.
Remark: the 181 uses an inverted Carry for whatever reason, and I keep it that way in this CPU. So in the hardware used here, an active Carry is (L). That's why I10, although it resets output Q of the flipflop, sets Carry.
If the Carry from the ALU has to be clocked into IC36A, signal I9 has to be made (L). At step 1000 all outputs of the instruction latches are tri-stated. The pull-up resistors pull the various Ixx signals (H), including I9, and this rising edge makes the D-flipflop latch the bit at its D input. But because ALL Ixx signals are pulled (H), the function inputs of the ALU ICs are pulled (H) as well, and this can change the level of the Carry at the D input of IC36A. To make sure that the Carry is clocked before a possible change, I added an extra resistor, R2, to line I9 to make its rising edge steeper. I also rely a bit on the internal delays of the ALU ICs.
The Minus flag is derived from bit 7 of the ALU output, i.e. pin 13 of IC29.
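The carry chain and the flags can be modelled for the "A plus B" function of the 74181 (S3..S0 = HLLH, M = L, active-high data). In that mode both the carry input Cn and the carry output Cn+4 are active low, as described above. This is a sketch, not a full 181 model: only this one function is implemented, the function names are hypothetical, and the zero flag is reduced to its logical meaning (the 8-bit result is zero) rather than the exact levels on the 181 outputs and OR gate IC34C.

```python
def alu181_add(a, b, cn_n):
    """One 74181 in 'A plus B' mode; carry in and carry out are active low."""
    total = (a & 0xF) + (b & 0xF) + (1 - cn_n)
    carry_out_n = 0 if total > 0xF else 1
    return total & 0xF, carry_out_n


def add8(a, b, carry_ff_q):
    """IC26 handles the low nibble, IC29 the high nibble.

    carry_ff_q is output Q of IC36A, wired to the Cn input of IC26, so
    0 means 'carry set'. Returns (result, carry_out_n, zero, minus).
    """
    low, mid_carry_n = alu181_add(a & 0xF, b & 0xF, carry_ff_q)
    high, carry_out_n = alu181_add((a >> 4) & 0xF, (b >> 4) & 0xF, mid_carry_n)
    result = (high << 4) | low
    zero = int(result == 0)
    minus = result >> 7                 # bit 7 = pin 13 of IC29
    return result, carry_out_n, zero, minus


# 16-bit addition $00FF + $0001: clear Carry first (Q = 1), then chain it.
lo, c_n, _, _ = add8(0xFF, 0x01, carry_ff_q=1)
hi, _, _, _ = add8(0x00, 0x00, carry_ff_q=c_n)   # c_n clocked into IC36A via I9
print(hex((hi << 8) | lo))                       # 0x100
```

Feeding the stored carry back into the low nibble is what makes ADC-style multi-byte arithmetic possible.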
The flags can be used for conditional jumps or branches. The advantage of branches is that an executable can be relocated within memory, which is impossible for an executable using jumps. But using branches means that the new address has to be calculated at run time, something that is built into the 6502 but will cost a lot of instructions on this CPU. It is up to the programmer to decide what to use.
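To illustrate why a branch costs several instructions on an 8-bit ALU: a 6502-style branch adds a signed 8-bit offset to a 16-bit address, which has to be split into a low-byte add and a high-byte add with carry and sign extension. Both functions are hypothetical illustrations; the base address of the branch is simply a parameter here.

```python
def branch_target(base, offset):
    """Reference: base plus a signed 8-bit offset (two's complement)."""
    signed = offset - 0x100 if offset & 0x80 else offset
    return (base + signed) & 0xFFFF


def branch_target_bytewise(base, offset):
    """The same calculation done one byte at a time, as an 8-bit ALU would."""
    low = (base & 0xFF) + offset                      # first add: low byte
    carry = low >> 8
    extension = 0xFF if offset & 0x80 else 0x00       # sign extension of the offset
    high = ((base >> 8) + extension + carry) & 0xFF   # second add: high byte with carry
    return (high << 8) | (low & 0xFF)


print(hex(branch_target_bytewise(0x1234, 0xFE)))   # 0x1232 (offset -2)
print(hex(branch_target_bytewise(0x12F0, 0x20)))   # 0x1310 (page crossing)
```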
How do we select the flag needed for the condition? IC09, a 74LS153 dual 4-to-1 multiplexer, makes it possible to choose from four flags:
- bit 7 from the output of the ALU as the Minus flag
- the output of OR gate IC34C as the Zero flag
- the Carry output of ALU IC29
- the Q output of D-flipflop IC36A as the Carry flag
Controls I20 and I21 determine what flag is chosen.
The next step is to feed output Y of the 153 into OR gate IC34A and, through inverter IC24C, into OR gate IC34B. The outputs of these two OR gates are fed into an AND gate, IC23C, whose output is connected to the /LOAD inputs of the 161s mentioned above.
I22 and I23 control the behavior of the two OR gates. If both controls are (H), both OR gate outputs are (H) as well, and therefore the output of the AND gate and the /LOAD inputs are (H) too. This means no active /LOAD, and the 161s just keep on counting. During the first six steps the pull-up resistors make sure these lines are (H) anyway.
When both I22 and I23 are negated, one of the OR gate outputs will always be (L), because the flag reaches one OR gate directly and the other one inverted. This causes the AND gate to output an (L), which in turn makes the 161s copy the address saved in the 573 latches IC08 and IC14. In short: when both controls are (L), the 161s perform an unconditional jump.
The last two possible situations are the ones where one of the controls is (L) and the other (H). Let's have a look at the following table:
I23 I22 Flg | /LOAD
------------+-----------
 0   0   0  | 0 = jump
 0   0   1  | 0 = jump
            |
 0   1   0  | 1 = count
 0   1   1  | 0 = jump
 1   0   0  | 0 = jump
 1   0   1  | 1 = count
            |
 1   1   0  | 1 = count
 1   1   1  | 1 = count

The first two and the last two rows have been explained already. Rows 3, 4, 5 and 6 handle the case where a conditional jump is needed. In words:
- 'I22 = (L)' and 'I23 = (H)' handle the situation when a jump is needed when the chosen flag is not set.
- 'I22 = (H)' and 'I23 = (L)' handle the situation when a jump is needed when the chosen flag is set.
The above circuit enables instructions (more or less) like the Z80's "jp z,xxxx" or "jp nc, $YYYY".
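The /LOAD logic can be checked against the table with a few lines of Python. The wiring in load_n (flag and I22 into IC34A, inverted flag and I23 into IC34B) is the wiring that reproduces the table above. select_flag assumes that I21 is the high select bit of the 153 and that its inputs are wired in the order of the flag list above; both are assumptions. All names are hypothetical.

```python
# Order of the 153 inputs, assumed to follow the flag list above.
FLAG_INPUTS = ("minus", "zero", "alu_carry", "carry_ff")


def select_flag(i21, i20, flags):
    """IC09 (74LS153): I21 and I20 choose one of four flag levels."""
    return flags[FLAG_INPUTS[(i21 << 1) | i20]]


def load_n(i23, i22, flg):
    """/LOAD for the 161s: IC34A, IC34B (via inverter IC24C) and AND gate IC23C."""
    or_a = flg | i22            # IC34A
    or_b = (1 - flg) | i23      # IC34B, fed with the flag inverted by IC24C
    return or_a & or_b          # IC23C; 0 = load the jump address


for i23 in (0, 1):
    for i22 in (0, 1):
        for flg in (0, 1):
            print(i23, i22, flg, "jump" if load_n(i23, i22, flg) == 0 else "count")
```

The loop prints the same eight rows as the table.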