What is it?
This document describes the ins and outs of how to build a 6502-like processor.
Is it possible?
Building your own processor, is that possible? Yes, it is. First of all, forget that a processor has to be one single IC. Computers were being built long before the IC was invented. Intel's 4004 is considered to be the first one-chip CPU in the world and its production started in 1971.
In the early days all parts were built using simple components like relays, tubes and transistors. And if you don't believe me, please have a look at this site, a ring of self built processors. Very interesting indeed! It will show you, for example, some processors using relays as logic gates.
And if you don't like the 6502?
But what about those amongst you that don't like the 6502 and prefer a 6800, 6809, Z80 or Pentium? Even then this design can be useful. First of all: I can explain the design in such a way that it can serve as a base for many processors. But if I only give you theory, you will surely lose me at some point. So I want to help you by giving a real example, an example you can build.
But why a 6502? My very first reason: I'm most familiar with this CPU. But if I wanted to tell you how a car works, it wouldn't do any good to give you the newest Mercedes or Porsche as an example to build yourself; you wouldn't find all the needed parts. But you will find enough parts on a scrapyard to build yourself an equivalent of the VW Beetle or Fiat 500.
But to keep it simple, very simple in this case, some sacrifices have to be made. And the biggest one of them all will be speed.
Second (if you don't like the 6502): the Instruction Decoder is responsible for handling the opcodes. Changing its contents will change the behavior of the opcodes. The code of NOP for a 6502 is $EA. Nothing stops you from using $00 instead, the code used by the Z80 (the 6800 uses $01)!
But at some point hardware has to be built and I made the decision to make it 6502 compatible. A possible problem: the pin out of the 6502 is different from that of the 6800, 6809 and Z80. Now you can do three things:
- Use a 6502 based system, for example an Apple II, as base and run Z80 code on it.
- If you don't have such a system, build your own computer around it. Some RAM, ROM and an RS-232 interface will do. Now use a PC as terminal to control your baby.
- Adjust the pin out. Should be easy for the 6800 but in case of the Z80 I'm afraid you have to alter some of the design as well.
I already mentioned the reallocation of the code for NOP. One step further: you are absolutely free to create your own opcode set! But there is one huge disadvantage: you have to write every bit of software yourself, including an assembler so you can write it at all.
Using a 6502 as base has two advantages:
- You will find a lot of software for it. But most important: you will find a lot of assemblers.
- You still can create your own opcodes by replacing the KILL opcodes (and other illegal opcodes) by your own ones.
General description
The hardware is kept as simple as possible. The whole can be divided into five parts:
- the interface to the outside world
- the registers
- the ALU
- the branch part
- the Instruction Decoder
Instruction Decoder: the output
The Instruction Decoder (ID) is the heart of the processor because it is the ID that decides how the processor behaves. The heart of the ID is, what I called, the super ROM. In this case I use FlashRAMs in combination with a 74ALS573 or 74ALS574 latch (which one you use doesn't matter).
Why the latch? The moment you change the input of the address bus of the FlashRAM, the output will change as well. But if all these input bits don't change at the same moment, the FlashRAM can output the wrong data. Even if it is a matter of a few nanoseconds, bad things can happen. Then there is the matter of the access time of the FlashRAMs: it takes some time before the output reacts to the changed input. For FlashRAMs this is between 15 and 25 nanoseconds. But it can differ for the same type of FlashRAM. So we run the risk that one FlashRAM enables an output it controls while another FlashRAM hasn't disabled its controlled output yet.
All latches copy the data of their FlashRAM and output it all at the same moment, when it is sure that the data is stable.
As said, the latches have to be triggered by a pulse the moment we are sure that the signals outputted by the FlashRAMs are stable. First we have to deal with the access time itself. My idea: use a delay line IC (IC15) from a scrapped PC. This IC outputs the input signal at different timing intervals at several output pins.
IC15 is fed by CLK0. The needed pulse is created by feeding the outputs of two taps of the delay IC to an EXOR gate. This gate outputs the difference in time between these outputs as a pulse:
          ____________________                      ________________
PHI0 ____|                    |____________________|
         .    .
         .    .____________________                      ___________
TAP1 ____.____|                    |____________________|
         .<-->. = access time
                  ____________________                      ________
TAP2 ____________|                    |____________________|
               __                   __                   __
EXOR _________|  |_________________|  |_________________|  |________
The 574's only need the rising edge of this pulse.
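The way the EXOR gate turns two delayed copies of the clock into a short pulse can be sketched in Python. This is only an illustration: the 1000 ns clock period and the tap delays of 20 ns and 35 ns are assumed values, not taken from the design.

```python
def square_wave(t_ns, period_ns=1000):
    """Return 1 during the first half of each period, else 0."""
    return 1 if (t_ns % period_ns) < period_ns // 2 else 0

def delayed(t_ns, delay_ns, period_ns=1000):
    """The same wave as seen on a delay-line tap, delay_ns later."""
    return square_wave(t_ns - delay_ns, period_ns)

def exor_pulse(t_ns, tap1_ns=20, tap2_ns=35):
    """EXOR of two taps: high only while the two taps disagree."""
    return delayed(t_ns, tap1_ns) ^ delayed(t_ns, tap2_ns)

# Times (in ns) within one clock period at which the EXOR output is high.
high_times = [t for t in range(1000) if exor_pulse(t)]
```

With these assumed delays the output is high for 15 ns after each edge of the clock, once the first tap has switched and the second has not yet followed. The pulse width equals the difference between the two tap delays.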
Instruction Decoder: the input
What input do the FlashRAMs need?
- a counter
- the opcode
- a signal that tells there is a reset or interrupt
- a branch signal
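Because the ID is a ROM, these inputs together simply form its address. A minimal Python sketch of that idea follows, assuming a five-bit step counter, an eight-bit opcode and one bit each for the reset/interrupt signal and the branch signal. The bit order and the names `id_address`, `rind` and `brad` as parameters are assumptions made for this illustration only.

```python
def id_address(step, opcode, rind, brad):
    """Pack the ID inputs into one 15-bit ROM address (assumed bit order)."""
    assert 0 <= step < 32 and 0 <= opcode < 256
    assert rind in (0, 1) and brad in (0, 1)
    return (brad << 14) | (rind << 13) | (opcode << 5) | step

NOP = 0xEA  # 6502 opcode for NOP

# Address of the micro instructions for step 1 of NOP, no interrupt, no branch.
addr = id_address(step=1, opcode=NOP, rind=0, brad=0)
```

Every combination of step, opcode and status bits selects its own location, so each location can hold exactly the control signals needed at that moment.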
The counter circuit
The opcode itself is a collection of several micro instructions. Some of these micro instructions can be executed in parallel, some have to be executed in a certain order, read: at the right step. The counter tells the ID what step has to be executed.
The base signal for the counter is the clock signal provided by the motherboard through pin PHI0. But it can't be used as is; some things have to be done with it before we can use it.
Because PHI0 must serve as input for many other gates in my design and not being sure if this will stress the original system, I decided to buffer it first with IC13d, a left over AND gate.
The 6502 outputs PHI1, nothing more than an inverted PHI0 and it is created by inverting PHI0 using IC07b. The PHI2 needed for this card and the motherboard is generated by inverter IC07c. Two EXOR gates delay the result (see later).
The actual counter is IC10a (393: 4-bit binary counter with clear). Together with PHI2 we now have a five-bit counter in which PHI2, now called CLK0, is the Least Significant Bit (LSB). These five bits, good for 15 clock cycles/31 steps, are directly fed to the backplane and serve as inputs for the FlashRAMs of the ID.
The outputs of the 393 are also fed to a 4-input NOR gate, a gate created out of an OR gate (IC06c) and a 3-input NOR gate (IC11b). At the end of an instruction the 393 is cleared and all its outputs become (L). At that moment the output of this 4-input NOR gate becomes (H). This output is AND-ed with PHI0' using AND gate IC13d. The output of IC13d is used as trigger to latch the data into two 573s, IC09 and IC12 (see later).
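That a 2-input OR gate followed by a 3-input NOR gate behaves as one 4-input NOR gate can be checked exhaustively. The sketch below does so in Python; which two counter outputs go to the OR gate is an assumption and does not change the result.

```python
from itertools import product

def or2(a, b):
    """2-input OR gate."""
    return a | b

def nor3(a, b, c):
    """3-input NOR gate."""
    return 0 if (a | b | c) else 1

def counter_is_zero(q0, q1, q2, q3):
    """4-input NOR built from a 2-input OR feeding a 3-input NOR."""
    return nor3(or2(q0, q1), q2, q3)

# Evaluate the combined gate for all sixteen counter values.
truth_table = {bits: counter_is_zero(*bits) for bits in product((0, 1), repeat=4)}
```

Only the entry for all four outputs (L) yields (H), which is the condition used to detect a cleared counter.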
Resetting the counter
The idea is that two things have to be done at the last cycle of an instruction:
- prepare the Mini6502 to read the opcode of the next instruction
- reset the counter
The last is done by setting ID output I05 (H). After some processing I05 is used to clear the 393. What kind of processing has to be done?
First, the Reset signal must be able to reset the counter as well. So before the signal reaches the 393, it has to be OR-ed (IC06b) with the inverted (IC07e) Reset signal. The effect is that an active Reset keeps the 393 in reset mode, it will output 0000 as long as the reset signal is active.
Second, once the counter has been reset by I05, the clear signal must be disabled again before the falling edge of PHI2, otherwise the 393 won't be able to count at all. We just happen to have a signal in our system that can do the trick: the pulse for the FlashRAM latches. The AND gate IC13b reduces the actual clear signal to the size of the pulse for the FlashRAMs.
Setting I05 can be done any time. If done on count xxxx0 it won't do anything because it is mixed with PHI0 first, see later. If done on count xxxx1, the counter turns into count 00001. But, as the FlashRAMs already have been latched, nothing will happen any more. And that is the reason the last cycle of the previous instruction has to prepare the decoder to read the opcode of the next instruction.
Now I05 has to be cleared again. This is done at the latch of count 00010. But that won't happen immediately at the moment the FlashRAM pulse becomes (H) and so we run the risk that the 393 is reset again by another, now unwanted pulse. By AND-ing (IC13a) I05 with PHI0' we make sure it is already disabled beforehand.
Latching the Opcode
IC09, a 573 8-bit latch, is the so-called Instruction Register: it latches the opcode. The opcode is only present at the first step. The Instruction Register makes sure that the opcode is available for the other steps as well.
Latching SO, Reset, NMI and IRQ
First a bit of history. The first computers didn't have a reset or interrupt. When a program had been loaded in memory, the Program Counter was set by hand, the clock was fed to the processor and the computer started to compute. When the program was finished, the computer was halted by stopping the clock.
But nowadays processors have one or more of the inputs mentioned in the title. It should be obvious now that I have to feed them to the Instruction Decoder, but how?
I first thought about feeding them directly to the inputs of the super ROM. But when I started to design TTL6502, FlashRAMs were very expensive and I had to use EPROMs. But the reasonably priced EPROMs had too few inputs, thus another idea was needed.
Then it occurred to me that during a reset or interrupt, the opcode part of the IR wasn't used. So I decided to feed the IR with these signals at the opcode part. Now I only needed one signal to tell the ID whether it was dealing with an opcode or a reset or interrupt: RIND. IC12, a 573, latches Reset, IRQ and NMI and feeds them to the ID when needed.
FYI: the 'D' of RIND stands for 'data'.
NMI, a negative edge triggered interrupt, is inverted first (IC07f) and fed to the CLK input of a 74 D-flipflop (IC08a). Why not tie NMI directly to PRE and save a gate? At the end of the process the ID has to reset the flipflop. Using PRE as input would immediately set the flipflop again and thus force the decoder to repeat the whole process.
After handling NMI, the flipflop is reset by I06.
IRQ is a level triggered interrupt and it is only checked at the end of PHI0. IRQ can also be disabled. NOR gate (IC11c) serves all these demands in one go. If the IRQ is still active (L) at the end of PHI0 and the disable bit is inactive (read: low) as well, then the moment PHI0 becomes (L) the rising edge of the output of the NOR gate will trigger the D-flipflop.
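The behaviour of this NOR gate can be sketched in a few lines of Python. The function and parameter names are made up for this illustration; the levels follow the description above, with IRQ active (L) and the disable bit inactive when low.

```python
def nor3(a, b, c):
    """3-input NOR gate: (H) only when all three inputs are (L)."""
    return 0 if (a or b or c) else 1

def irq_trigger(irq_n, irq_disable, phi0):
    """Output of the NOR gate that clocks the IRQ flipflop."""
    return nor3(irq_n, irq_disable, phi0)

# IRQ active and not disabled: the output rises the moment PHI0 goes low.
before = irq_trigger(irq_n=0, irq_disable=0, phi0=1)
after = irq_trigger(irq_n=0, irq_disable=0, phi0=0)
rising_edge = (before, after) == (0, 1)

# With the disable bit set, the output stays low and nothing is triggered.
masked = irq_trigger(irq_n=0, irq_disable=1, phi0=0)
```

One gate thus combines the level check, the interrupt mask and the timing at the end of PHI0.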
After handling IRQ, the flipflop is reset by I07.
Reset doesn't suffer the above problems because the counter cannot start until the Reset signal is inactive again. Therefore it can be connected directly to the PRE input.
After handling Reset, the flipflop is reset by I07. Yes, this is the same signal as for resetting the IRQ flipflop. The idea behind it: resetting the Reset flipflop at any other point in time won't have any effect, so why not use an existing signal? I could just as well have used I06, the NMI one.
If for one reason or another the system is reset the moment I07 is (L), there is a possibility that the flipflop will toggle its outputs and thus present wrong information to the latch and other parts. But that is no problem because during a normal run count 00000 can normally only occur during a reset. In that case the ID has been programmed to reset all outputs to default values: I07 will be set (H).
SO is an input that is typical for the 6502. SO stands for 'Set Overflow' and does nothing else than set the Overflow flag the moment the SO pin is negated.
The Overflow flag is a part of the Flag register, so why do I describe the SO input already here? This Mini6502 is a simplified design of my TTL6502. When I described the Flag Register (see later), I noticed I had simplified the design a bit too much: I had no means to set the Overflow flag anymore! I have really thought about dropping this input completely because the only circuit that I know of that makes use of this input is the Commodore 1541 drive. But a bit later I realized I could set the flag in another way.
The Instruction Decoder of the TTL6502 has a circuit to detect the ABORT signal of the 65816 processor. I later dropped the idea of emulating the 65816 for several reasons but kept this little piece of circuit for the simple reason that it was made out of left over gates anyway. I think you already guessed it: instead of the ABORT signal the circuit now detects the presence of the SO signal for the Mini6502.
For the rest I can be very short about the circuit: SO is treated the same way NMI is. IC07c is the inverter, IC05b the flipflop and I04 the reset signal.
The further processing of SO, Reset, IRQ and NMI
The four \Q-outputs of the flipflops are latched by a 573 (IC12) at step 00001 and NOR-ed by IC06a and IC11a. If one or more of these signals are (H), IC11a's output becomes (L). The output of AND gate IC13d, which is used to latch the data into IC12, is also used to clock D-flipflop IC05a. The flipflop latches the output of the NOR gate.
The outputs of this flipflop now either activate the outputs of IC09 (= opcode) or those of IC12. One of these signals is RIND, the signal that tells the ID whether an IRQ, SO, NMI or Reset is detected or not.
Why is D-flipflop IC05a needed? Just like the opcode or the above signals the output of IC11a can change in the middle of an instruction and therefore has to be preserved as well. We cannot use a free pin of IC12 because it isn't available all the time, so we have to use a separate latch; IC05a in this case.
SYNC
SYNC is a 6502 signal that tells the outside world the first cycle of an opcode is being processed. Just by coincidence the output of IC13a is exactly what we need.
ReaDY signal
RDY is a signal that tells the 6502 to halt as long as RDY is (L). The outputs of the 393 are increased at every falling edge of PHI2. The basic idea is that RDY prevents PHI2 from reaching the counter. Before PHI2 is fed to the 393, it goes through an OR gate, IC06d. This gate ORs PHI2 with the \Q-output of IC08b, a 74 D-flipflop. This D-flipflop represents the state of RDY at the end of PHI0 and does this by saving the state of RDY at the rising edge of PHI1. If RDY is (L), output \Q becomes (H) and will block all pulses from PHI2 towards the 393 and ID by keeping CLK0 (H).
I mentioned before that I use two obsolete 86 EXOR gates to delay PHI2. Why? There are several delays in the circuit and I just want to be sure that the output of \Q reaches the OR gate IC06d before PHI2 does. If it didn't, there is a chance that the OR gate could still output a (L) pulse long enough to make the 393 count.
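How the OR gate keeps the counter from counting while RDY is (L) can be sketched in Python. This is a simplified model under stated assumptions: the \Q output simply follows RDY sample by sample instead of being clocked by PHI1, RDY only changes while PHI2 is (H), and the function name `run` is made up for this illustration.

```python
def run(phi2_samples, rdy_samples):
    """Count falling edges of CLK0 = PHI2 OR /Q, where /Q is (H) while RDY is (L)."""
    count = 0
    prev_clk0 = 1
    for phi2, rdy in zip(phi2_samples, rdy_samples):
        not_q = 0 if rdy else 1        # simplified: /Q follows RDY directly
        clk0 = phi2 | not_q            # OR gate in front of the counter
        if prev_clk0 == 1 and clk0 == 0:
            count = (count + 1) & 0xF  # 4-bit counter wraps around at 16
        prev_clk0 = clk0
    return count

phi2 = [1, 0] * 4                      # four PHI2 cycles, two samples each

always_ready = run(phi2, [1] * 8)
halted_two = run(phi2, [1, 1, 0, 0, 0, 0, 1, 1])  # RDY (L) during two cycles
```

With RDY (H) all the time the counter sees four falling edges. With RDY (L) during the two middle cycles it sees only two, because CLK0 is held (H) in between.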
Extra opcodes
So far I only need 15 inputs for my Instruction Decoder. Using 29F040s or equivalents means I have 4 left over inputs that I can use for something else. The idea arose to use (part of) them to create extra opcode sets and/or processors. A 175, a 4-bit D-flipflop, takes care of this.
Extra opcode sets can be created by using two- or even three-byte commands. How does it work? The extra ID bits are set at step 00010. At the rising edge of the negative ID latch signal the data is copied from the D-inputs to the corresponding Q-outputs. These outputs in turn control the extra ID inputs. The next stage acts as if this was just a one-byte opcode, thus preparing the ID for reading the next opcode from the data bus and resetting the counter.
The read byte will be treated as just another opcode. Assuming that at least one of these just set ID bits isn't (L) anymore, the according instructions for this opcode are read from another location inside the FlashRAM and therefore can be completely different from those of the original 6502.
Although the schematics say to use 29F040 FlashRAMs for the ID, I don't have enough of them and will use 29F020s. This is still enough for seven extra instruction sets. But how to use them? The 6502 has some so called KILL opcodes; when executed, these opcodes will freeze the 6502 completely. We can use these opcodes to create two-byte opcodes.
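The number of extra instruction sets follows directly from the number of spare address lines. A small Python sketch of the arithmetic, using the fact that a 29F040 (512K x 8) has 19 address lines and a 29F020 (256K x 8) has 18; placing the set-select bits directly above the 15 regular inputs is an assumption made for this illustration.

```python
ID_INPUTS_USED = 15  # counter (5) + opcode (8) + reset/interrupt (1) + branch (1)

def extra_sets(address_lines):
    """Number of instruction sets available beyond the original one."""
    spare = address_lines - ID_INPUTS_USED
    return 2 ** spare - 1

def id_address(set_select, base_address):
    """Place the set-select bits above the 15 regular ID inputs (assumed layout)."""
    return (set_select << ID_INPUTS_USED) | base_address

sets_29f020 = extra_sets(18)
sets_29f040 = extra_sets(19)
```

Set 0 is the original 6502 set. Any other value of the select bits moves the decoder to a different 32 KB block of the same FlashRAM.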
If compatibility is a must, then we can use one of those KILL opcodes and a combination of two- and three-byte opcodes. But which one? The 65816 doesn't have any illegal opcodes anymore but there is one unused opcode: WDM. WDM stands for "William D. Mensch", one of the designers of the 6502 and the designer of the 65816. This code was meant to be used in future 65xxx CPUs. My idea is to use this opcode as base for the multi-byte instructions.
When emulating other processors using the same set of FlashRAMs, I already need to point the ID to another location inside the FlashRAM at Reset. This can only be done by setting (a part) of the extra opcode bits O8..O11 by hand. This is done by lifting the according output pins of the 175 out of their sockets. Resistors R1..R4 pull the lines (H). Jumpers J1..J4 enable you to set the right combination to select the wanted processor.
The interface to the outside world and Registers
CPU in/output
This board is the actual interface to the outside world. Connector CON3 represents the 6502. Why a 40-pin header? I already have a cable lying around with a 40-pin header on one end and an IC-like connector on the other end. Using a header has another advantage: it is easier to connect the card directly to other boards, in my case my own test card.
PHI0 comes from the host system and goes to the card with the Instruction Decoder through connector CON1. This card in turn generates PHI1 and PHI2.
- Two 573s, IC02 and IC03, take care of the address bus by latching and buffering the data coming from the internal data bus.
- A 573, IC04, takes care of reading and latching the data coming from outside. Why is latching needed? That is to make sure that the data is also available to the Mini6502 after PHI0 has become (L).
- A 573, IC05, takes care of latching the internal data and presenting it to the outside world.
Registers
Every processor has several registers. The Program Counter of a 6502 and Z80 is 16 bits wide, as well as the Stack Pointer of the Z80. The one of the 6502 is only 8 bits wide. The 6502 has only six registers, the Z80 twelve 16-bit ones and four 8-bit ones. I could use 74ALS573s, an 8-bit latch, to create the registers but then I would blow my design out of proportion. More practical would be to use a standard memory IC. The smallest 8-bit one I know is the already mentioned 6116, IC7, a 2 KB SRAM.
Another practical decision is the number of bits for this internal data bus. For the Mini6502 this is eight bits. This means that I need two steps to fill the address latches instead of the single step in the original 6502. But had I chosen a 16-bit data bus, I would also have had to provide circuits that let the low byte bus exchange data with the high byte one and vice versa.
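The two-step filling of the address latches over an 8-bit bus can be pictured with a small Python model. The class and method names are invented for this sketch and do not refer to anything in the schematics.

```python
class AddressLatches:
    """Two 8-bit latches that together drive a 16-bit address bus."""

    def __init__(self):
        self.low = 0
        self.high = 0

    def latch_low(self, data_bus):
        """Step 1: copy the internal data bus into the low address latch."""
        self.low = data_bus & 0xFF

    def latch_high(self, data_bus):
        """Step 2: copy the internal data bus into the high address latch."""
        self.high = data_bus & 0xFF

    @property
    def address(self):
        """The 16-bit address presented to the outside world."""
        return (self.high << 8) | self.low

latches = AddressLatches()
latches.latch_low(0xFC)
latches.latch_high(0xFF)
```

After both steps `latches.address` holds the full 16-bit address. Each byte costs one step of the counter, which is part of the speed sacrificed for simplicity.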
The ALU and Flag Register
The ALU, short for Arithmetic Logic Unit, is the calculator of the processor. But one with limited functions. The 74181 is an ALU IC and is used by many other designers. The problem is that the ALU of the 6502 also has a so-called BCD mode (Binary Coded Decimal): it is able to calculate in decimal mode. And so far I haven't seen any ALU IC capable of it.
I have thought about using 181s plus an extra circuit for the BCD mode but that would expand the design too much. So I decided to use FlashRAMs here as well. The idea is to program every possible situation into the FlashRAMs.
Needed inputs for the ALU FlashRAMs
As you can see I use two cascaded FlashRAMs, each handling four data bits. Handling eight bits would mean I would need a FlashRAM with at least 24 inputs. It is possible that it exists but I don't have or know one (that is, not in +5V). By cascading two FlashRAMs I get the same result.
The FlashRAMs need to be able to handle at least the next commands:
- CMP / CPX / CPY / SUB
- DEC / DEX / DEY
- INC / INX / INY
Four selection bits will cover the above 14 commands.
An extra bit is needed to deal with the decimal mode. It has to be an extra bit because it is an external input coming from the Flag Register.
After every operation the processor may want to know if the result was zero. In this case the second FlashRAM has to know whether the result of the first FlashRAM was zero or not.
In case of an addition, subtraction or a shift the FlashRAM may need the Carry of the Flag Register or the previous FlashRAM. In case of a shift to the right, the FlashRAM may need the Carry of the Flag Register or the next FlashRAM. This means two inputs for the Carry.
The result (for the moment):
- Zero flag
- Carry flag (2*)
- Decimal mode flag
- 4 bits 1st operand
- 4 bits 2nd operand
- 4 command bits
Needing only 16 inputs means that a 29F020 is more than adequate for the job. The two extra input bits make sure I don't have to worry about implementing extra commands.
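The idea of programming every possible situation into a ROM and cascading two 4-bit slices can be sketched in Python for a single command, addition. The table layout and the names `build_add_table` and `add8` are assumptions made for this illustration, and the Zero flag chaining between the slices is left out. The decimal correction shown is the usual "add 6 when a digit exceeds 9" rule for BCD digits.

```python
def build_add_table():
    """Table for one 4-bit slice: (decimal, carry_in, a, b) -> (result, carry_out)."""
    table = {}
    for decimal in (0, 1):
        for carry_in in (0, 1):
            for a in range(16):
                for b in range(16):
                    total = a + b + carry_in
                    if decimal and total > 9:
                        total += 6  # BCD correction for one digit
                    table[(decimal, carry_in, a, b)] = (total & 0xF, 1 if total > 15 else 0)
    return table

ADD_TABLE = build_add_table()

def add8(a, b, carry_in=0, decimal=0):
    """Cascade two slices: the carry of the low slice feeds the high slice."""
    lo, carry = ADD_TABLE[(decimal, carry_in, a & 0xF, b & 0xF)]
    hi, carry = ADD_TABLE[(decimal, carry, a >> 4, b >> 4)]
    return (hi << 4) | lo, carry
```

For example, `add8(0x19, 0x28, decimal=1)` looks up the low digits first and passes the resulting carry on to the high digits, just as the carry line between the two FlashRAMs would.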
The Flag Register
In case of a conditional branch, the processor needs the data of the Flag Register to decide whether it has to make a jump or not. But how is the Instruction Decoder notified of the state of each flag? When needed, the contents of the Flag Register inside the SRAM are copied to a 74ALS573 data latch. The processor only needs to know the state of one flag at a time. A 74LS151 8-to-1 multiplexer takes care of that by selecting the needed flag. It outputs the state of this flag as the signal BRAD (BRAnch Data) towards the Instruction Decoder.
The 74ALS573 is not only needed for branching but it also makes sure that the state of the Interrupt Disable bit is outputted all the time.
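Selecting one flag out of the latched flag byte can be sketched as follows in Python. The bit positions are those of the 6502 status register. That the multiplexer inputs are wired in exactly this order is an assumption, as are the function names.

```python
# Bit positions of the flags in the 6502 status register P.
FLAG_BITS = {"C": 0, "Z": 1, "I": 2, "D": 3, "B": 4, "V": 6, "N": 7}

def brad(flag_latch, select):
    """8-to-1 multiplexer: pick one bit of the latched flag byte."""
    return (flag_latch >> select) & 1

def branch_taken(flag_latch, flag, wanted):
    """True when the selected flag has the state the branch tests for."""
    return brad(flag_latch, FLAG_BITS[flag]) == wanted

flags = 0b00000011                   # Zero and Carry set, everything else clear
beq = branch_taken(flags, "Z", 1)    # BEQ branches when Z is set
bcc = branch_taken(flags, "C", 0)    # BCC branches when C is clear
bmi = branch_taken(flags, "N", 1)    # BMI branches when N is set
```

The Instruction Decoder only ever sees the single selected bit, so one ROM input is enough for all conditional branches.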
In this simplified design the ALU does ALL the calculations. If the Program Counter needs to be incremented, the ALU will do that. Most of the time only the low byte of the address needs to be increased. But if a page boundary is crossed, that is the low byte goes from $FF to $00, the ALU will generate a Carry. This will be a sign for the processor to increment the high byte of the address as well. The processor has to be notified of this in one way or another, but we cannot use the Flag Register as this isn't a regular Carry. A 74LS74 D-flipflop stores the state of this Carry and the multiplexer mentioned above sends it on request to the Instruction Decoder.
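Incrementing a 16-bit Program Counter with an 8-bit ALU can be sketched like this in Python. The function name and the variable `page_carry`, standing in for the D-flipflop, are made up for this illustration.

```python
def increment_pc(pcl, pch):
    """Increment a 16-bit PC with an 8-bit ALU in one or two passes."""
    pcl = (pcl + 1) & 0xFF
    page_carry = 1 if pcl == 0x00 else 0  # the Carry stored in the D-flipflop
    if page_carry:                        # the ID requests a second ALU pass
        pch = (pch + 1) & 0xFF
    return pcl, pch, page_carry

same_page = increment_pc(0x34, 0x12)
next_page = increment_pc(0xFF, 0x12)
```

In most cases only the first pass is needed. The second pass is the price paid when the low byte wraps from $FF to $00.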
The ALU also does all the calculations in case of a branch. But in this case a branch can be either positive or negative. In case of a negative branch, the high byte of the Program Counter may have to be decreased. Again the ID has to be informed of this. The multiplexer does this by selecting the MSB of the second operand as BRAD signal.
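The decision whether the high byte has to be increased or decreased depends on two bits only: the Carry of the low byte addition and the MSB of the offset. A Python sketch under that reading, with invented names:

```python
def branch_target(pcl, pch, offset):
    """Add a signed 8-bit offset to the PC using 8-bit additions only."""
    total = pcl + offset
    new_pcl = total & 0xFF
    carry = total >> 8             # Carry out of the low byte addition
    negative = (offset >> 7) & 1   # MSB of the second operand
    if negative and not carry:
        pch = (pch - 1) & 0xFF     # crossed a page boundary downwards
    elif not negative and carry:
        pch = (pch + 1) & 0xFF     # crossed a page boundary upwards
    return new_pcl, pch

def reference(pc, offset):
    """The same calculation done with ordinary 16-bit signed arithmetic."""
    signed = offset - 256 if offset >= 128 else offset
    return (pc + signed) & 0xFFFF
```

A backward branch of 2 bytes from $8001, for instance, gives no Carry while the offset MSB is set, so the high byte drops from $80 to $7F. In the two remaining combinations the high byte stays as it is.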
Static addresses
After a reset, the address bus has to output the addresses $FFFC and $FFFD. And when serving an interrupt or accessing the stack, other addresses have to be output. What circuit is going to take care of all that? TTL6502 uses two extra FlashRAMs connected to the internal address bus for this purpose. Having no internal address bus anymore, I decided to let the ALU perform this function as well. And now the two extra inputs are more than welcome! Extra commands cause the ALU to output the needed data.
Testing everything
I use an I/O ISA card to test everything. It is quite possible that you don't have this card but the next part can give you ideas to create your own test device.
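The fixed addresses in question are the standard 6502 vectors and the stack page. The Python sketch below lists them and splits them into bytes, as would be needed for an 8-bit internal bus. The function names are invented for this illustration and say nothing about how the ALU commands are actually encoded.

```python
# Standard 6502 vector locations (low byte address, high byte address).
VECTORS = {
    "NMI": (0xFFFA, 0xFFFB),
    "RESET": (0xFFFC, 0xFFFD),
    "IRQ": (0xFFFE, 0xFFFF),
}

def stack_address(sp):
    """The 6502 stack lives in page 1; the Stack Pointer supplies the low byte."""
    return 0x0100 | (sp & 0xFF)

def vector_bytes(name):
    """Split both vector addresses into (high, low) bytes for an 8-bit bus."""
    return [(addr >> 8, addr & 0xFF) for addr in VECTORS[name]]

reset_bytes = vector_bytes("RESET")
```

Only a handful of constant bytes is needed ($FF, $FA to $FF and $01), which is why a few extra ALU commands are enough to produce them.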
One question to myself was: "How do I test everything?". Not just the whole design but also every single card. I just happened to have some PC ISA cards with four 8255s on them. These four 8255s are good for 96 in- and outputs, presented to the outside world by two 50-pins headers, CON1 and CON2.
In my design the two 8255s behind CON1 are used to simulate the host system. This is done by connecting it to the 6502 connector on card #1 through connector CON3.
As you can see I use two ports for the data bus: one to write data, the other to read data. A 541 buffer, IC2, buffers the data towards the TTL6502. The idea is quite simple: when the TTL6502 performs a write operation, the R/W line will be (L) which in turn, with the help of inverter IC1d, will disable the 541. During a read operation the output of IC2 will be enabled and the TTL6502 can read the data.
People who are familiar with the 8255 may wonder why I did it this way; a port can also be programmed to be either input or output. In most cases the TTL6502 is in read mode and therefore the PC will be in write mode. The problem is that the PC only then will notice that the TTL6502 has gone into write mode when the R/W line becomes (L). The detection has to be done by polling and that will take time. And once the PC has found out about the new situation, it has to reprogram the control port of the 8255. And this will also take time. And all that time long two ICs are outputting data on the same bus.....
The two 8255s behind CON2 and one port of CON1 are used to either monitor the signals on the backplane or to generate them if the responsible card isn't present. Whether a port is in read or write mode depends partly on what card is tested and partly on what action is needed. If the Instruction Decoder is not amongst the cards to be tested, CON2 will generate them, otherwise it only has to read the signals.
A small problem is BRAD, the signal that helps the ID to decide whether a branch has to be taken or not. If the FR isn't present, the computer must generate it, in this case using output pin Con1 2C3. If the FR is present, it only has to be read. The problem is that, in contrast to the 6522 or 6526, you cannot select a single bit of a port to be in- or output. This can only be done in blocks of four or eight bits.
So if the FR is present, we use Con1 port 1C as an input. Jumper J1 then takes care of making sure that both signals won't clash.
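The block-wise direction control of the 8255 is visible in its mode-set control word: there is one direction bit each for port A, port B, the upper half of port C and the lower half of port C. A Python sketch for mode 0 follows; the function name is made up, and which port or nibble is actually switched on the test card is not implied by this example.

```python
def mode0_control_word(a_in, b_in, c_upper_in, c_lower_in):
    """8255 mode-set control word with both groups in mode 0 (True = input)."""
    word = 0x80                          # D7 = 1 selects "mode set"
    word |= 0x10 if a_in else 0          # D4: direction of port A
    word |= 0x08 if c_upper_in else 0    # D3: direction of port C, bits 4..7
    word |= 0x02 if b_in else 0          # D1: direction of port B
    word |= 0x01 if c_lower_in else 0    # D0: direction of port C, bits 0..3
    return word

all_outputs = mode0_control_word(False, False, False, False)
c_low_input = mode0_control_word(False, False, False, True)
```

There is no bit for a single pin, so turning one line into an input always drags at least three neighbouring lines along with it.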
The backplane
The backplane is the connection between all boards. As base I'm using 64-pin AB-DIN connectors. The pin out:
- 2 * +5 Volt
- 2 * GND
- 8 * internal data bus: D0..D7
- 5 * counter for the ID
- 8 * opcode for the ID
- 1 * branch signal for the ID
- 1 * Reset/IRQ/NMI signal for the ID
- 1 * latch signal for the ID
- 4 * future expansion for the ID
- 1 * external SO signal for the flag register board
- 1 * disable interrupt signal from the ALU board
- 30 pins for future use
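As a quick check, the list above can be added up in Python to confirm that it fills the 64-pin connector and that the regular ID inputs add up to 15. The short labels are paraphrased from the list.

```python
BACKPLANE = [
    ("+5 Volt", 2),
    ("GND", 2),
    ("internal data bus", 8),
    ("ID counter", 5),
    ("ID opcode", 8),
    ("ID branch signal", 1),
    ("ID Reset/IRQ/NMI signal", 1),
    ("ID latch signal", 1),
    ("ID future expansion", 4),
    ("external SO", 1),
    ("disable interrupt", 1),
    ("future use", 30),
]

total_pins = sum(count for _, count in BACKPLANE)

# Counter, opcode, branch and Reset/IRQ/NMI lines form the regular ID address.
id_inputs = sum(count for name, count in BACKPLANE[3:7])
```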
Afterword
I think that it is clear that the ALU does most of the work. Not just the calculations like those of a calculator but also things like increasing the Program Counter and in- or decreasing the Stack Pointer.
Did I oversimplify things? No, I didn't. It is in fact the way the first computers worked. I read somewhere that some Americans managed to create a working computer in the early 50's using 'only' 3000 radio tubes. That was only possible by using circuits over and over again within the same design.
Then what about speed? The only 'computers' that were around were humans with mechanical calculators. The first American computers were used to calculate artillery trajectories. It took some humans several hours to do that, the ENIAC did it in 20 seconds.
And compared to these computers, the Mini6502 is huge as well. But in the meantime I found out that with the TTL6502, I did too good a job. When I started to write the Pascal program that should calculate the contents of the FlashRAMs, I noticed that I needed fewer steps than the 6502 to get the same result. So I had to add dummy steps to remain compatible. And that was already before I turned it into a 16-bitter!
You can email me here.