
Hardware course




What is it?

This document describes the ins and outs of digital electronics and how they are used in computers, especially in RAM and processors. At the end we build a 6502-like processor.


Some words about reading this document.

I am a Dutchman and English is not my native language. This is the very first version and you will certainly find errors. You would do me a favor by letting me know where to find them and, more importantly, how to correct them.

If you are not familiar with electronics and don't understand something or want more precise information, let me know.
If you are familiar with electronics and see errors, let me know as well.


Foreword.

For a lot of people a computer is just a black box that appears to be very intelligent and can perform a lot of magic tricks. I even know people who are more or less afraid of computers and don't want to have anything to do with them. Explaining that their beloved cell phone is just another type of computer doesn't help a bit. But I can assure you that computers aren't dangerous at all. At least I never heard of someone being bitten by one :)
OK, I personally know a case where a man got a static shock from a computer (or was it the other way round?). Anyway, the man survived the shock but the computer was dead :(

There is also a group of people who work with computers but have no idea how they actually function. And they don't care about it. But part of this group would like to obtain this knowledge. Once they know how the hardware works, there is a good chance they can make better use of computers. And this document is meant to give this group some of the needed knowledge.
People will also see that the computer is just a very dumb machine with no intelligence whatsoever. And what about the miracles a computer performs? Find a monkey that does exactly what you tell it to do and it can build a space shuttle. Find a few hundred more and they can build the same shuttle in 10 days. But these hundreds of monkeys will never invent one!

Some basic knowledge is needed: the meaning of "voltage" and "current", some knowledge of mathematics, and "binary-decimal" conversion.


The operator: the human processor

Computers perform programs. Programs in turn contain a collection of instructions. An instruction for a machine operator could be like this: if valve A is in this position, switch B is in that position and scale C reads this value, then do this, this and that. With valve A having more than one position, the same for switch B, and knowing that scale C can show you a lot of different readings, you can understand you need a lot of different actions for the various situations.
In every case one thing must be sure: for a given situation the operator must always perform the same action. It is completely out of the question that one time he does this for a given situation and the other time that.
An operator can perform more than one instruction in a row. Where does he get these instructions from? A good example could be that his boss gives him a task (= program) and that all instructions needed to perform this task can be found in a manual. Computer programs can be stored in ROM and RAM. The instructions are the bytes in the ROM and RAM. So let's start from the very beginning and execute a program.

Before an operator can start his work at all, he has to do some initial work first, like entering the building and turning on the main power switch. How does he know? Because it is stored (programmed) in his memory (RAM), because someone taught him so. But let's go back, way back to the moment he was "powered up", the moment of his conception: what program started him up? Very simple: his ROM, or in biological terms, his genes.

Now let's find out how we can translate the above story into electronics....


ICs

If you have ever seen the inside of a computer, you probably noticed little black boxes with many shiny legs on green boards. If the board was not green, don't worry. Most are. Why? I don't know.
Those little black boxes are the so-called ICs or chips. IC stands for "Integrated Circuit". Most ICs have little codes printed on them. It is very rare to see ICs without codes. In that case there is a good chance that, if you look more closely, the original numbers have been erased. Why? Just to make it hard to reverse-engineer and copy the product.
In most cases one of the codes is the type of the IC. Another possible code is the year and week of production.

What do these ICs contain? Electronic circuits, mostly in the form of transistors, diodes and resistors. Their function? That can vary from simple gates to circuits as complicated as processors. There is a good chance that you use a computer to read this document and therefore I assume that you at least know that there is a processor inside your computer.
"Gates" are the smallest units we know in the world of digital electronics. You can compare them with atoms. As atoms are needed to create molecules, so gates are needed to create processors, counters and other digital logic. We can even go a step further: as atoms are made of protons, neutrons and electrons, gates are made of the already mentioned transistors, diodes and resistors.


Resistors

Resistors can have various functions in a circuit. But the main function inside computer ICs is to limit the current through a part of a circuit.


Diodes

You can consider diodes as one-way doors for currents: the current can only flow in one direction. The voltage drop across a silicon diode in that case is about 0.7 Volt. The voltage a diode can handle when blocking the current is not unlimited. Once this limit is reached, in most cases the diode will be destroyed.

Some remarks about the 0.7 V voltage drop:
With a 0.7 V voltage drop and a current flowing through the diode, it dissipates energy in the form of heat. So the current through the diode is limited by the heat it can dissipate.
Before using silicon, diodes were made of germanium. The voltage drop across a germanium diode is only 0.2 V. But compared to a silicon diode, a germanium diode leaks about a factor of a hundred more current when in blocking mode.

A diode is made of two pieces of silicon, each doped with a different material. For one piece, most of the time boron is used. This boron-doped silicon is referred to as P-silicon. The other piece is doped with phosphorus and is referred to as N-silicon.
    |  
    |              
   PPP
   NNN             
    |               
    |   
When the voltage applied to the P-part is higher than the one applied to the N-part, a current will flow from the P-part to the N-part. If it is the other way around, no current will flow. The 'why' goes a bit too far to explain here.


Transistors

Then someone added a third layer:
   ++ ------+     collector
            |              
           NNN             
    + -----PPP    base          
           NNN             
            |               
    - ------+     emitter   


The above picture shows a transistor of the NPN type. PNP types exist as well and were, in fact, the first types to be made. The symbol is about the same.

The transistor is in fact a current amplifier. But in digital electronics it is only used as a switch. And this is all you have to know for the moment. Have a look at this circuit:

When the switch is closed, a current can flow through the switch, through R1 and then from the base, the pin marked with 'b', to the emitter (e) of the transistor. This causes a current to flow through the transistor from the collector (c) to the emitter (e). The current flowing from the collector to the emitter (Ice) is a multiple of Ibe, the current flowing from the base to the emitter. The amplification factor can vary from 4 to 400, depending on the type of transistor.
In the above schematic Ibe is about 0.4 mA. In the case of a BC547 the amplification factor is about 200, so Ice could become 80 mA. But in this case the current is limited by the resistor and LED to about 11 mA. If we measure the voltage between the collector and GROUND, it will be nearly 0 V.
When the switch is open, Ibe is 0 mA. In turn this means Ice is 0 mA as well. As no current flows through the resistor and LED, there is no voltage drop across them. In turn this means we measure 5 V between the collector and GROUND.
Resistors R1 and R2 limit the current to acceptable values. Resistor R3 is only needed to make sure that no static electricity can activate the transistor by accident.

Now let's have a digital look at the circuit: applying 5 V to the input outputs 0 V, applying 0 V outputs 5 V. Translate 5 V as "True" and 0 V as "False": we have an inverter!

As said, I used 5 V to feed the above circuit. That is no coincidence. Over time several technologies were developed to be used in computers. From the 70's on, by far the most used technology was the so-called TTL technology. TTL stands for "Transistor Transistor Logic". At some stage in the development of this technology it was decided to feed TTL circuits with 5 V. Why 5 Volt, I don't know. Maybe because it was a nice round number.
Is it possible to feed TTL circuitry with another voltage? Yes, it is. I learned to build TTL circuits using a big flat 4.5 V battery and things worked fine as well.
But in time things have changed. Nowadays processors can run at voltages as low as 1.7 V.


Delay time

The speed of signals travelling through a line is limited to 300,000 km/s, the speed of light. Although this is very fast, signals need some time to travel from A to B. Doped silicon is a conductor but, compared with copper, a poor one. This means that every part of a circuit has resistance. Two lines running along each other form a capacitance together. The combination of resistance and capacitance forms the problem: it leads to "delay". It simply means that the effect of an action at the input side of a circuit will take some time before showing up at the output side. This "some time" will only be some billionths of a second, but that's enough to cause trouble sometimes.

One means of decreasing this delay is replacing the resistor by a PNP transistor. But even then we will face problems. The lines between the transistor and the switch, and even the resistors and the interior of the transistor, form a capacitance with Ground as well. After closing the switch, the voltage across the base-emitter junction is not 0.7 V instantly; it takes some time.
Combining these things means that, when changing the level of an input of a circuit, it takes some time before the outputs reflect the combination of inputs and the function of the circuit. The time one has to wait to be sure that the output is valid after the last change of an input is called 'delay time'.


Creating a gate output

Above we used a transistor to pull the output to 'False' and a resistor to push it to 'True'. But with the delay time caused by the combination of resistance and capacitance in mind, it is better to use a PNP transistor to take care of pulling the output High (= to 5 V). This combination of a PNP and an NPN transistor is also known as a 'totem pole output'.

Another type of output is the "open collector" output. In this case there is only an NPN transistor which can pull the output Low, as we saw in our own example. Nothing else, not even the resistor. This construction enables a designer to tie several outputs together without the need for extra gates. Of course at least one pull-up resistor is needed so the circuit sees 5 V in case no output is activated.
The Commodore computers and peripherals using the serial IEC bus are coupled together through open collector inverters (mostly of the type 7406).

Open collector style outputs that pull the line High instead of Low have, as far as I know, never been developed.


Digital electronics

An inverter is one of the many gates that exist. Another one is the AND gate. The 2-input AND gate functions by the following logic: if input 1 is High and input 2 is High, then its output is High as well; otherwise the output is Low. How does an AND gate work internally? To be honest, I don't know. In fact, the above circuit does not even represent the internals of a real inverter. If you really want to know, the data books of Texas Instruments often show you the circuits of the various gates.

This and other gates:
 In1 In2 | Out		 In1 In2 | Out
---------+----		---------+----
  L   L  |  L		  0   0  |  0
  L   H  |  L	  AND	  0   1  |  0
  H   L  |  L		  1   0  |  0
  H   H  |  H		  1   1  |  1

 In1 In2 | Out		 In1 In2 | Out
---------+----		---------+----
  L   L  |  L		  0   0  |  0
  L   H  |  H	  OR	  0   1  |  1
  H   L  |  H		  1   0  |  1
  H   H  |  H		  1   1  |  1

 In1 In2 | Out		 In1 In2 | Out
---------+----		---------+----
  L   L  |  L		  0   0  |  0
  L   H  |  H	  EXOR	  0   1  |  1
  H   L  |  H		  1   0  |  1
  H   H  |  L		  1   1  |  0		

Inverted versions of the AND and OR gates exist as well: the NAND and NOR gates:

 In1 In2 | Out		 In1 In2 | Out
---------+----		---------+----
  L   L  |  H		  0   0  |  1
  L   H  |  H	  NAND	  0   1  |  1
  H   L  |  H		  1   0  |  1
  H   H  |  L		  1   1  |  0

 In1 In2 | Out		 In1 In2 | Out
---------+----		---------+----
  L   L  |  H		  0   0  |  1
  L   H  |  L	  NOR	  0   1  |  0
  H   L  |  L		  1   0  |  0
  H   H  |  L		  1   1  |  0

And here is an old friend, the inverter:

 In  | Out
-----+----
  1  |  0 		INVERTER
  0  |  1
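
If you like to experiment, these tables can be imitated in a few lines of C. This is only an illustration of the logic, of course, not of how a real gate is built:

  #include <stdio.h>

  /* Every 'gate' takes bits (0 or 1) and returns a bit. */
  int and_gate (int in1, int in2) { return in1 & in2; }
  int or_gate  (int in1, int in2) { return in1 | in2; }
  int exor_gate(int in1, int in2) { return in1 ^ in2; }
  int nand_gate(int in1, int in2) { return !(in1 & in2); }
  int nor_gate (int in1, int in2) { return !(in1 | in2); }
  int inverter (int in)           { return !in; }

  /* Print the truth tables of all gates at once. */
  int main(void)
  {
      int in1, in2;
      printf("In1 In2 | AND OR EXOR NAND NOR\n");
      for (in1 = 0; in1 <= 1; in1++)
          for (in2 = 0; in2 <= 1; in2++)
              printf("  %d   %d |  %d   %d   %d    %d   %d\n",
                     in1, in2,
                     and_gate(in1, in2), or_gate(in1, in2),
                     exor_gate(in1, in2), nand_gate(in1, in2),
                     nor_gate(in1, in2));
      return 0;
  }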


Crash course binary-decimal conversion

As you can see, gates accept Highs and Lows. We can go a step further by translating a High as a "1" and a Low as a "0". And now we are at the communication level of a computer: it talks a 'language' using only ones and zeros. "1" and "0" are the only two digits of the so-called 'binary system'.
Unfortunately the majority of humans are only used to the decimal system, which uses the digits 0 to 9. So let's have a look at the decimal number 37652:
  exponents:      4     3     2     1     0
                10^4  10^3  10^2  10^1  10^0
              ------------------------------
                  3     7     6     5     2

  37652 = 3 times 10^4 = 3 times 10000
        + 7 times 10^3 = 7 times 1000
        + 6 times 10^2 = 6 times 100
        + 5 times 10^1 = 5 times 10
        + 2 times 10^0 = 2 times 1
Now let's have a look at the binary number 11011:
  exponents:     4    3    2    1    0
                2^4  2^3  2^2  2^1  2^0
              --------------------------
                 1    1    0    1    1

  11011 = 1 times 2^4 = 1 times 16
        + 1 times 2^3 = 1 times  8
        + 0 times 2^2 = 0 times  4
        + 1 times 2^1 = 1 times  2
        + 1 times 2^0 = 1 times  1
                        -------- +
                      = 27
So 11011 is the binary representation of 27. Do I know the binary representation of 37652? Here it is: 1001001100010100. If it is not correct, you can blame Microsoft's Calculator.
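
If you don't trust Microsoft's Calculator, you can check it with a small C program. The routine below simply applies the 'times 2 to the power n' rule explained above (a sketch of mine, nothing more):

  #include <stdio.h>

  /* Print the 16-bit binary representation of a number by testing
     every bit from the MSB (2 to the power 15) down to the LSB. */
  void print_binary(unsigned int number)
  {
      int bit;
      for (bit = 15; bit >= 0; bit--)
          printf("%d", (number >> bit) & 1);
      printf("\n");
  }

  int main(void)
  {
      print_binary(27);     /* prints 0000000000011011 */
      print_binary(37652);  /* prints 1001001100010100 */
      return 0;
  }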


Hexadecimal numbers, nibbles, bytes, words and double-words

Using only zeros and ones just because that is what the computer speaks has one major disadvantage: you already need a lot of them to write a relatively small decimal number. This makes them unmanageable, nearly unreadable and, worse, unspeakable. So they started to group them in packages of four bits. (Actually three in the beginning, but that system was abandoned later.) With four bits, called a nibble, we can make the numbers 0 to 15. And a new system, the hexadecimal system, was developed to write down these 16 numbers:
0000 - 0
0001 - 1
0010 - 2
0011 - 3
0100 - 4
0101 - 5
0110 - 6
0111 - 7
1000 - 8
1001 - 9
1010 - A
1011 - B
1100 - C
1101 - D
1110 - E
1111 - F
Numbers written down in the hexadecimal system start with a "$" or "0x", or end with an "H" or "h". Pure binary numbers start with a "%". For example:

0xA3 = 0A3h = $A3 = %10100011 = 163

The first computers known to the general public, like the Commodore and the Apple, used processors that could handle 8-bit wide numbers. Eight bits form a byte (*). In the beginning of the 80's the first 16-bit processors appeared: the 8086, 68000 and Z8000. 16 bits were called a word. Then the 32-bitters (double-words) appeared: the 68020, 80386 and Z80000. In the early 2000's the 64-bitters (quadruple-words) appeared, like the IBM mainframe z-Series and, a few years later, the first 64-bit x86 processors.

(*) In the old days a byte could be any number of bits long. With the domination of the 8-bit processors in the 70's (8080, Z80, 6800, 6502), people started to associate a byte with 8 bits and that has remained so until today.


Adding numbers

Let's do some adding:
   134  
   379  
  ---- +
   513  
How is it done? 9 plus 4 makes 13. Write down 3, remember 1. 7 plus 3 makes 10. Add the previously remembered 1, this makes 11. Write down 1, remember 1. 3 plus 1 plus the remembered 1 makes 5.

Now some basic rules for binary adding:
     0         1         0         1  
     0         0         1         1  
   --- +     --- +     --- +     --- +
     0         1         1        10  
Let's do a more difficult addition:
    110110100110          3494  
    100010110110          2230  
   ------------- +       ----- +
   1011001011100          5724  
Adding binary 1 and 1 gives a 0 and we have to remember a 1. This particular 1 is called the "Carry". Now let's have a look at the basic additions in another way:
  Bit 1:      0         1         0         1  
  Bit 2:      0         0         1         1  
            --- +     --- +     --- +     --- +
  Result:     0         1         1         0  
  Carry:      0         0         0         1  
Now let's make so called 'truth tables' of these findings:
  Bit 1  Bit 2   |   Result        Bit 1  Bit 2   |  Carry
  ---------------+---------        ---------------+-------
    0      0     |     0             0      0     |    0  
    0      1     |     1             0      1     |    0  
    1      0     |     1             1      0     |    0  
    1      1     |     0             1      1     |    1  
Hey, don't these tables look familiar??? It seems that adding two bits is the same as EXORing them to get the Result, and that ANDing them gives us the Carry. See the next circuit, a one-bit adder:



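In C this one-bit adder could be modelled like this; "^" is the EXOR operator and "&" the AND operator (again only a model, not hardware):

  /* One-bit adder: the Result is the EXOR of the two bits,
     the Carry is the AND of the two bits. */
  void add_1bit(int bit1, int bit2, int *result, int *carry)
  {
      *result = bit1 ^ bit2;   /* EXOR gate */
      *carry  = bit1 & bit2;   /* AND gate  */
  }
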
Let's make a two-bit adder. After adding all individual bits, we have to do something with the Results and Carries of each separate addition:
  Carry 1   Result 2   Carry 2   |   Result 2b   Carry 2b
  -------------------------------+-----------------------
     0         0          0      |      0           0    
     0         0          1      |      0           1    
     0         1          0      |      1           0    
     0         1          1      |       Impossible      
     1         0          0      |      1           0    
     1         0          1      |      1           1    
     1         1          0      |      0           1    
     1         1          1      |       Impossible      
Adding two bits can never result in Result=1 and Carry=1 at the same moment. Therefore two lines of the above table are marked as "impossible".

This results in:
   Result 2b  =  Carry 1  EXOR  Result 2                    

   Carry 2b   =  Carry 2   OR   ( Carry 1   AND   Result 2 )
Cascading two more adders will result in a four-bit adder:
For a multi-bit adder we can just cascade as many "adding cells" as needed. Can we now build a 64-bit adder as used in an Intel Pentium 4? Yes... and no. Remember the 'delay time' I mentioned before? Cascading all these adders means that we have to wait 64 times the delay time of a single adder before we have the final result. And that is unacceptable nowadays.
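
Together, the equations for Result 2b and Carry 2b form one complete "adding cell" (a 'full adder' in most books). A sketch in C of a cascade of such cells, one per bit; note how every cell has to wait for the carry of its predecessor, which is exactly the delay problem just described:

  /* One adding cell, built from the equations above:
     Result 2b = Carry 1 EXOR Result 2
     Carry 2b  = Carry 2 OR (Carry 1 AND Result 2)          */
  int adding_cell(int bit1, int bit2, int *carry)
  {
      int result2  = bit1 ^ bit2;          /* Result of the two bits */
      int carry2   = bit1 & bit2;          /* Carry of the two bits  */
      int result2b = result2 ^ *carry;
      *carry = carry2 | (*carry & result2);
      return result2b;
  }

  /* A 16-bit adder made of 16 cascaded adding cells. */
  unsigned int add_16bit(unsigned int a, unsigned int b)
  {
      unsigned int sum = 0;
      int carry = 0, bit;
      for (bit = 0; bit < 16; bit++)
          sum |= (unsigned int)
                 adding_cell((a >> bit) & 1, (b >> bit) & 1, &carry) << bit;
      return sum;   /* add_16bit(3494, 2230) gives 5724 */
  }
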
Texas Instruments is kind enough to display the working of their "7483 4-bit adder" in their data books. It took me quite some time to figure out how it worked because of its complexity. TI's circuit may look illogical, but the whole circuit is only a few gates "deep". Adding more bits to this design would not affect the delay time of this circuit, just its complexity. It was in fact this circuit that focused my attention on the subject of "delay time" and, more important, how it can be circumvented.

But how was this circuit designed? In fact TI used the same techniques as I used: the truth tables mentioned above. But using four bits meant they got more complicated ones, resulting in the more complicated circuits seen above.


Subtracting numbers.

The next operation we will try to perform is a subtraction, using a complete 16-bit subtractor:
    0000110110100110           3494  
    0000100010110110           2230  
   ----------------- -        ----- -
    0000010011110000           1264  
Now let's do it in another way:
    0000110110100110           3494  
    1111011101001010        -  2230  
   ----------------- +     -------- +
    0000010011110000           1264  
Instead of subtracting two positive numbers, we add a negative one to a positive one.

What does a negative number look like? The Most Significant Bit (MSB) of a negative number is always "1". The MSB of a byte, word or whatever is the leftmost bit of a number. The Least Significant Bit (LSB) is the rightmost one, or the bit representing '2 to the power 0'.

But be aware: if the MSB of a byte, word or whatever is 1, it doesn't always mean that we are dealing with a negative number! So then, how do we know that we are dealing with negative numbers at all? A variable of the type "word" in the programming language Turbo Pascal can have a decimal value from 0 up to 65535. But a variable of the type "integer" can have a decimal value from -32768 to 32767. Yet both values will be stored as 16-bit values in memory. Programming languages like C and Turbo Pascal can handle characters. 'a', 'Z', ':' and even ' ', the space, are examples of characters. But all these characters are stored as bytes in memory. In other words: it is the program, not the memory, that decides how a stored pattern of bits is to be interpreted.
In the programming language C the equivalent 16-bit types are 'short int' (signed) and 'unsigned short int'.

Unfortunately it isn't just a matter of inverting the MSB of a number to create its negative counterpart. For example, adding 1 and -1 results in 0. Adding %10000001 and %00000001 doesn't. But adding %11111111 and %00000001 does! (forget the Carry). The same goes for %11111000 and %00001000, and for %11111001 and %00000111.
This negative representation of a number is called "two's complement" and that is the way computers store negative numbers. (That is, all the computers I'm familiar with.)

How do I create this two's complement? Just invert every bit from left to right, but stop just before the last (rightmost) "1". But this is not an algorithm that can be translated easily into a circuit using gates. Another algorithm we can use is: invert every bit and add 1 to the result. For a human this is more work, but the advantage is that this algorithm can be translated into logic: an inverter for every bit and an adder that just adds one to what comes out of the inverters will do.
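
The 'invert every bit and add 1' algorithm translated into C (a model as before; the & 0xFF keeps the result within one byte, in other words: "forget the Carry"):

  /* Two's complement of an 8-bit value: invert all bits, add 1. */
  unsigned int twos_complement(unsigned int number)
  {
      return (~number + 1) & 0xFF;
  }

  /* twos_complement(1) gives %11111111,
     twos_complement(8) gives %11111000. */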

So to subtract two numbers we need a bunch of inverters and two adders. Isn't there an alternative hardware way to do the subtraction? There probably is but, again, the main goal of this document is to show that it can be done, just to make things understandable. As with the adding circuit, it is quite possible that it is indeed done in another way. And to be honest, I hadn't even thought about a subtraction circuit until writing this article.

Let's have a second look at the question "Isn't there an alternative hardware way to do the subtraction?". The very first computers only used one adder. They first inverted one of the operands and used the adder to add one to it. And then they used the same adder again to add the result to the other operand.
Fast? No. But in the days when people only had relays and tubes, it certainly was efficient.


Multiplier

     111     number A          111        
     111     number B          101        
 ------- *                 ------- *      
     111                       111        
    1110                      0000     [1]
   11100                     11100        
 ------- +                 ------- +      
  110001                    100011        
These two examples should show you that multiplying two binary numbers can be done in the same way as multiplying two decimal numbers. I know, normally one would not write down the row marked with [1], but I did it to make things more understandable later.
As you can see, just like when multiplying decimal numbers, each next row is shifted one position to the left. How that is done will be explained later.
       111     number A
       111     number B
   ------- *           
       111             
      1110             
   ------- +           
     10101             
     11100             
   ------- +           
    110001             
As you can see, multiplying is nothing more than adding some numbers. In the above case we only need 2 adders that have been cascaded. The first adder is fed with number A if bit 0 of number B is 1, or with all zeros if the bit is 0; and then with number A shifted once if bit 1 of number B is 1, etc.
The second adder is fed with the result of the first one, plus number A shifted twice if bit 2 of number B is 1, etc.
Multiplying two 8-bit numbers could be done by cascading seven adders.

The above way is the quick way: the whole multiplication can be done in one go. This is the way it is done in the Intel 80286 and its successors. But for a price: 15 or 31 adders (for 16- and 32-bit numbers respectively) plus registers are needed, and that is a lot of transistors.
The older Intel 8088 and 8086 processors do it in a different way. Here the intermediate result is stored and the same adder is used to add the next shifted number. But this means that the processor needs several clock cycles to obtain a result.
You probably noticed I didn't mention the 6502 and Z80. The reason is simple: these processors cannot multiply. All multiplications have to be done in software.
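
This shift-and-add scheme is exactly what the programmer of a 6502 or Z80 has to write himself. A sketch in C of such a software multiplication (a real 6502 routine does the same thing with shift and add-with-carry instructions):

  /* Multiply two 8-bit numbers by shifting and adding, the way a
     processor without a multiply instruction has to do it. */
  unsigned int multiply(unsigned int a, unsigned int b)
  {
      unsigned int result = 0;
      int bit;
      for (bit = 0; bit < 8; bit++)
          if ((b >> bit) & 1)       /* if this bit of number B is 1...       */
              result += a << bit;   /* ...add number A, shifted 'bit' times  */
      return result;   /* multiply(7, 7) gives 49 = %110001 */
  }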

If one can multiply by cascading adders, it seems logical to assume that one can divide numbers by cascading subtractors. But to be honest, I haven't researched this subject.


Designing circuits

I can imagine that you are starting to wonder about the fact that I first explain how something works, and then throw everything overboard by saying that in reality it is done differently. First remember the saying: "Many roads lead to Rome". And second: it simply is a fact that adders have actually been built in the way I told you, certainly in the time when only relays and/or tubes were used. But then the number of components was more important than delay times. Improved production processes simply gave us the possibility to disregard the number of needed components in order to improve the overall performance of a processor.

Now take the 4-bit adder for example. One way to create it is to cascade several adders, as has been shown above. Another way to create it is to make a truth table for every single output. With two times four bits and a carry this means a table with 512 rows. Then it is a matter of eliminating the rows and bits which have no influence on the output. It can be done by hand using so-called Karnaugh diagrams, but it is a hell of a job. I myself faced a similar problem in 1986 and wrote a Pascal program to solve it. Once solved by the program, it is just a matter of translating the resulting equations into hardware.

Nowadays one can buy CPLDs and FPGAs, ICs that can be programmed to behave like complete electronic circuits. There exist FPGAs that can be programmed to behave like a Commodore 64, Amiga 500, CPC 464 etc., and yet they only have a size of about one by one inch. These CPLDs and FPGAs can be programmed using various languages, of which VHDL and Verilog are the most well known. Check Wikipedia to find out more about them.


Registers and memory

Let's have a look at the following circuit:

Pressing switch S1 causes the upper input of the upper NAND gate to become (L). FYI: 7400 is the code for an IC containing four NAND gates. Looking at the table for a NAND gate, we see that whenever one or more inputs are (L), the output becomes (H). This means that both inputs of the lower NAND gate become (H), causing its output to become (L), and the LED will light up. This in turn means that the second input of the upper NAND gate becomes (L) as well. So when we release S1, causing the first input to become (H) again, the output remains (H) because the second input is still (L).
Pressing S2 causes the output of the lower NAND gate to become (H) and, using the above explanation, this will result in the output of the upper NAND gate becoming (L) now.

This special combination of two NAND (or NOR) gates is called a flipflop.
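
The feedback in this circuit can even be imitated in software, by recomputing both NAND outputs until nothing changes any more. A toy model of mine in C, with the two switches as active low inputs (0 = pressed):

  /* NAND flipflop: s1_ and s2_ are the switches, active low.
     The outputs feed back into the gates, so we iterate until
     the circuit settles. Start from a known state, e.g. q = 0
     and q_ = 1. */
  void flipflop(int s1_, int s2_, int *q, int *q_)
  {
      int old_q, old_q_;
      do {
          old_q  = *q;
          old_q_ = *q_;
          *q  = !(s1_ && *q_);   /* upper NAND gate */
          *q_ = !(s2_ && *q);    /* lower NAND gate */
      } while (*q != old_q || *q_ != old_q_);
  }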

This flipflop can remember whether S1 or S2 was pressed in the past. But in this case we need two signals to tell the flipflop whether to go into this state or that state. So let's add some more gates to the circuit: two OR gates (7432) and an inverter (7404).

One input of each OR gate, I12 and I22, are tied together and connected to switch S2 and to resistor R2. This means that, as long as S2 is not pressed, R2 pulls these inputs of the two OR gates (H). This means that the outputs of these OR gates are (H) as well, regardless of the state of the other inputs. Because of that, both inputs of the flipflop are (H) as well, which simply means that nothing can happen to the state of the outputs.
One of the so far unmentioned inputs, I11, is connected directly to switch S1 and pull-up resistor R1. The last input, I21, is connected to this point as well, but through an inverter. This means that the state of this input is always the opposite of the state of the other input.

Pressing S2 means negating inputs I12 and I22, causing the OR gates to follow their other inputs, I11 and I21. Because one of these inputs will always be (L), one of the outputs of the OR gates will become (L) as well. In turn this will negate one of the inputs of the flipflop and cause it to react accordingly. In a few words: the flipflop will store the state of S1 the moment S2 is pressed.

The use of the inverter should be clear: now only one signal is sufficient to control the output of the flipflop. Then why these OR gates? Most likely a computer wants to store more than one data bit, so it must have some means to choose in which flipflop it wants to store this bit. These OR gates are the last part of a circuit that chooses (in digital terms: addresses) a particular bit within the memory (or register set) to store data. Most likely the I12 and I22 inputs are connected to the output of an address decoder.

So at this moment we can store a bit. Now it is just a matter of placing 8, 16, 32 or 64 pieces of the above circuit parallel to each other to handle a byte, word, double-word or quadruple-word.
Remark: when I mean more bits in parallel, I will use the byte as example. So from now on, when I use the word 'byte', you can read 'word', 'double-word' or 'quadruple-word' as well.

A set of flipflops can either be a register or a part of memory. What is the difference between these two?
Memory is used to store data, like the individual characters of a text or the variables of a program. Registers can contain data as well, but are mainly used to control the I/O (= Input/Output) of a computer. The printer port of a PC has a register where a program should store the character that has to be printed. A user can write to or read from this register as if it were a piece of memory. But connect some LEDs to the printer port and you will see the LEDs light up (or not) according to the bit pattern of the byte written to this specific register.
Some registers can only be read, like the register of the printer port that reports the state of the printer. Some registers of the SID, the Commodore sound chip, can only be written. But don't ask me why the designers chose to do so. (I have only one idea: fewer transistors = cheaper.)

Processors have registers as well. Some of them behave like I/O registers, others like memory. But to keep things simple, they are all called registers.


Tristate buffers

Now we know how data is stored in a flipflop. The next step is to find out how it can be read again. So far I said that the output of a gate can be either 0 or 1. To make a long technical story short, this is done by powering the corresponding transistor. But what happens if we power neither of the two transistors? Then the output turns into the third state, called "tristate". Simply see it as if the output isn't connected to anything anymore.
But this means that we can connect another, active, output to this now deactivated output with no fear of a data collision (read: short circuit). It only needs an extra line to tell the output to go into tristate (or not).

You may have noticed the 74125 gate in the above schematic. The 74125 is an IC containing four tristate buffers. The little 'o' at the 'output control' input of this gate marks it as 'active low'. This simply means that an (L) on this input activates the function of the gate.
In the future I will refer to the combination of OR gates, inverter, flipflop and tristate output as a memory cell or, in short, cell.

Tristate buffers exist in stand-alone form, like the 74125. The 74541, for example, contains eight tristate buffers with a common 'output control' input. It enables a designer to disconnect a circuit bytewise from the data bus. The register reading the state of the printer is in fact nothing more than such a tristate buffer.
In the future I will refer to tristate buffers just as "buffers".


Address decoders

The 6116 is a 2K*8 memory IC ("K" means kilo, but in this case not 1000 but 1024 = 2 to the power 10). This means it contains 2048 sets of memory cells where each set is eight bits wide. In short: 2 KB of RAM (KB = KiloByte). FYI: RAM stands for "Random Access Memory".

But how is each byte addressed? The one at address %000 0000 0000 could be addressed by using a 12-input OR gate: 11 inputs for the address lines and one input telling the byte that this particular IC has been selected, the "Chip Select" input (CS). The moment all its inputs are (L), the output can negate the inputs I12 and I22 of each of the eight cells of the byte.
But where do these address bits come from? For this purpose the IC is equipped with pins, one for every address bit. The 6116 has 11 dedicated pins for addressing, one for each individual address bit, labelled A0 to A10. Other pins are used for the power supply, chip select, data and other needed functions.

The byte at address %10 0100 0001 can be addressed by using again a 12-input OR gate, but with three extra inverters, one for every '1' in the above address.
But using a 12-input OR gate and a handful of inverters for every byte would become a bit costly, so reduction is needed. One solution is tying 11 inverters directly to the 11 address lines. Now connect each input of every 12-input OR gate either to an inverter output or directly to an address line.
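
For the curious, this decoding trick is easy to model in C: inverting exactly the address bits that must be '1' means that, for the wanted address only, all inputs of the OR gate are Low (an illustration of mine, not the real 6116 interior):

  /* Address decoder for one byte. The EXOR inverts exactly the
     address bits that must be '1', so the result is 0 (all OR
     inputs Low) only for the wanted address. Returns 0 (active
     Low) when the byte is addressed and Chip Select is Low. */
  int decode(unsigned int address, unsigned int wanted, int cs_)
  {
      unsigned int or_inputs = (address ^ wanted) & 0x7FF;  /* 11 lines */
      return (or_inputs != 0) || (cs_ != 0);  /* the giant OR gate */
  }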

This 12-input OR gate sounds quite complicated. But in fact it is just a giant transistor with 12 inputs. Seen from a technical point of view, the above memory cell is more complicated to build than this giant OR gate.


Why do we need this "Chip Select" line?

The Timex 1000, an American clone of the well known Sinclair ZX81, is equipped with this 6116. Expanding it with another 2 KB of memory means one has to add another 6116 to the board, "parallel" to the original one. But how can I make sure that, when I read the byte at address $0123, it originates from the first IC and not the second? This is where the "Chip Select" input comes in. Negating this pin informs the IC that the address on the address pins is meant for this particular IC and not for another one.


Reading from or writing to a memory or I/O IC

How does the 6116 know what type of action is required? This is signaled by using another pin. AFAIK all memory ICs use the same protocol: an (H) on this pin means a read action, an (L) a write action. In most cases this pin is labeled either
      __        _
      WE  or  R/W

WE stands for 'Write Enable', R/W for 'Read/Write'. The line above the characters tells us we can write to the IC when the signal is (L).
AFAIK all MOS/Rockwell 65xx and Motorola 68xx I/O ICs use this one-pin mechanism. Besides this R/W pin and one or more CS pins, these ICs also have a clock input. This pin has to be connected to a specific pin of the processor: PHI2 for the 65xx range, E for the 68xx range. The 65xx processors, like the well known 6502, output an (H) on this pin to tell the rest of the board that the outputted address is stable and valid.

Intel and Zilog I/O ICs use another mechanism: they use two pins, labelled
      __       __
      RD  and  WR

Obvious but for the record: RD stands for ReaD and WR stands for WRite. And again, the lines above the characters tell us that the action is only valid when the signal is (L). So such a line is a way of telling that an input or output is 'active Low'.
Why not use only one pin, you may ask? These pins also have a second function: they may only be negated when the address is stable and valid. When an Intel or Zilog processor is used, the inputs of these I/O ICs are directly connected to the processor in most cases and the designer has nothing further to worry about.

But what about memory ICs, why don't they have a clock input or an equivalent? Indeed, all memory ICs I know of nowadays lack this pin. But in most designs I know, this clock signal has been incorporated into the circuit used to select the RAM ICs.


Control bus

We already mentioned the data and address bus. We refer to all signals needed to control ICs, like the R/W line(s) and clock lines, as the control bus.


Memory ICs

Memory can be bought in different sizes. In 1985 an 8K*8 IC, the 6164, was considered big. In 1987 the first 1 MB modules showed up. In this case a module is a small printed circuit board equipped with two or more RAM ICs. In 1998 one could buy 32 MB modules. Nowadays one can buy 64 GB modules.

1 MegaByte (MB) = 1024 KiloBytes (KB) = 1024 * 1024 bytes.
1 TeraByte (TB) = 1024 GigaBytes (GB) = 1024 * 1024 MegaBytes.

The first commercial computers only had a few KB onboard. For example, the Commodore KIM-1 (1976) and Sinclair ZX81 (1980) had only 1 KB. At my work I handle HP ProLiant servers with more than 300 GB of RAM!


Static and dynamic RAM

The type of memory using flipflops is called "Static RAM" (SRAM). Another type of memory is "Dynamic RAM" (DRAM). DRAM uses a capacitor to store a bit. This capacitor can be either charged or empty. Unfortunately the insulators used are not ideal, and after the capacitor is charged, the charge will start to leak away. Therefore a DRAM contains a circuit that checks the charge and restores the original level if needed. This process is called "refresh". Such a refresh is done for a complete group of cells at a time, generally the square root of the total amount of memory.


Multiplexed address lines

The above may sound a little bit strange, but DRAMs have another strange feature: they have a multiplexed address bus. This means that each pin is used for two address lines.
The Commodore 64 has 64 KB of RAM onboard in the form of eight 4164 ICs (64K*1). The 4164 is of the DRAM type. For 64 K we would normally need 16 address lines; with multiplexing only eight. Instead of a 'Chip Select' pin, the DRAM has a CAS and a RAS pin. CAS stands for "Column Address Strobe", RAS stands for "Row Address Strobe". First the system presents one half of the address and negates RAS, then it presents the second half and negates CAS.

The obvious question: why? An obvious answer is that an IC needs fewer pins for the same function. But then have a look at this:
- The 4164 (and other types as well) has a separate pin for "data in" and "data out". The only circuit I have seen using this feature is the parity-error circuit in PCs.
- The 4164 is a 64K*1 configuration in a 16-pin IC. It sounds very logical to develop a 64K*8 configuration. Using a combined data in/out you would only need 7 extra pins. Even then you end up with fewer pins than the 24 needed for the 6116, the 2K*8 SRAM. But I only know of the 4-bit version, the 41464 / 4464.

In general: the developers must have had their reasons, although these are not clear to me.


Disadvantages and advantages of the DRAM

In most systems designers used DRAM. Why? Technically the design of a DRAM is simpler than that of an SRAM, even including the refresh mechanism. And simpler means smaller and cheaper.

The SRAM has two advantages over the DRAM:
- SRAM is faster;
- with a battery and a few extra parts you can retain the data even when the system has been powered down.


Refresh

Negating RAS doesn't only load the first part of the address, it also triggers the refresh circuit. Very important: every row has to be refreshed within a given time interval. For several reasons this has to be an independent circuit. In Commodore's PETs and CBMs a simple counter took care of this job. The Z80, an 8-bit processor used in the Sinclair ZX81 and Spectrum, has an onboard refresh register especially for this purpose.
Later DRAM designs had a counter on board. In this case the refresh was triggered by negating CAS before negating RAS.


ROM, PROM, EPROM and EEPROM

We now know we can store data in Random Access Memory. But what happens if we turn off the power? Sorry for you, but all the data will disappear. I mentioned using SRAM and a battery, but this is unworkable for huge amounts of memory and won't work for DRAM at all. And if you don't use the computer regularly, you also run the risk of an empty battery. And also, what about computers that have never been started before?

So parallel to RAM, manufacturers developed ROM: Read Only Memory. In simplified form: take a RAM as base, remove the flipflop and OR gates, and tie the inputs of the output buffers to (H) or (L) according to the wishes of the customer. This last sentence means that the factory only produced ROMs on demand. That made ROMs quite expensive. Oh boy, if you, being the programmer, made an error :(

The next step was the PROM: the Programmable ROM. Here, instead of being hardwired in the factory, each bit was connected to its output buffer through a mini fuse. So one could program his own ROM by blowing the right fuses. Costs dropped, as factories could now mass-produce PROMs instead of producing only small batches. Errors in programming became much cheaper now.

The next step was the invention of the EPROM: the Erasable PROM. EPROMs can be erased using ultraviolet light. You can recognize EPROMs by their little round quartz glass window. Erasing an EPROM takes quite some time, about 20 minutes. But as it could be reprogrammed several thousand times, it was much, much cheaper than buying PROMs. The EPROM became so popular that the production of PROMs was halted.
With circuits being mass-produced and their bugs removed, the manufacturers of these circuits started looking for a cheaper PROM-like equivalent of the EPROM again. The manufacturers answered by producing OTP ROMs, One Time Programmable ROMs, which were nothing more than EPROMs without the relatively expensive quartz glass window.

The last development: the EEPROM, the Electrically Erasable PROM. These can be erased and programmed again by the system itself, as they only need the standard 5 Volt (+ 12 Volt) power supply.


More advanced flipflops

The flipflop above, made of two NAND gates, is a very simple one. In time manufacturers have made more sophisticated ones. An example is the 7474, a so-called D-flipflop (D stands for Data).

D (for Data) is an input. The moment the level on input CLK (for CLocK) goes from (L) to (H) (">" means "positive edge"), the value of Data is copied to output Q. Output Q\ (another way to write Q with a "_" on top) becomes the opposite of Q. Changing Data has no further influence on the outputs as long as Clock does not go from (L) to (H) again.
Negating Preset causes Q to go (H) and Q\ to go (L) at any time. Negating Reset causes Q to go (L) and Q\ to go (H) at any time. Negating both these inputs causes unpredictable results.


Latch

Take 8 D-flipflops and connect the D inputs to the data bus. Tie the Preset and Reset inputs to (H). Tie all clock inputs together to one pin (CLK). We now have a so-called 8-bit latch. CLK can be used to copy data from the data bus. The outputs can be used, for example, to send data to a printer.
If I mention "latch" in this document, I mainly mean this type of use of D-flipflops.


Dividers and Counters

Take a D-flipflop and connect output Q\ to input D. Now feed the clock input with a square wave (through an inverter). The result is that the value of output Q toggles at every negative edge of the original square wave. The signal output by Q is a square wave as well, but with half the frequency of the original signal. Now take some more D-flipflops, connect Q\ with D, and cascade the D-flipflops by connecting each clock input to output Q\ of the previous flipflop.

The result is that every flipflop outputs a square wave at half the frequency of the previous one: a frequency divider.
    _   _   _   _   _   _   _   _   _   _   _   _   _   _   _   _           
  _| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |  original
      ___     ___     ___     ___     ___     ___     ___     ___            
  ___|   |___|   |___|   |___|   |___|   |___|   |___|   |___|   |  1st FF  
          _______         _______         _______         _______           
  _______|       |_______|       |_______|       |_______|       |  2nd FF  
                  _______________                 _______________            
  _______________|               |_______________|               |  3rd FF  
Now let's convert this picture into bits:
  0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1  original    
  0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1  FF 1    
  0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1  FF 2    
  0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1  FF 3    
Now read each column from bottom to top: it seems we have a counter in our hands! We only have to use as many D-flipflops as we need output bits.

Now imagine the original clock connected directly to the clock input of the first D-flipflop, and cascade the D-flipflops by connecting each clock input to output Q (notice: Q, not Q\) of the previous D-flipflop. Result:
    _   _   _   _   _   _   _   _   _   _   _   _   _   _   _   _           
  _| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |  original
    ___     ___     ___     ___     ___     ___     ___     ___            
  _|   |___|   |___|   |___|   |___|   |___|   |___|   |___|   |__  1st FF  
    _______         _______         _______         _______           
  _|       |_______|       |_______|       |_______|       |______  2nd FF  
    _______________                 _______________            
  _|               |_______________|               |______________  3rd FF  
Now let's convert this picture into bits as well:
  0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1  original    
  0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0  FF 1    
  0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0  FF 2    
  0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0  FF 3    
We have created another counter, but one that counts down!
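
Both arrangements are easy to simulate. Here is a sketch in C of the up-counter: the first flipflop toggles on every pulse of the original clock, and each next flipflop toggles when the output of its predecessor falls from 1 to 0 (a model only, of course):

  #include <stdio.h>

  int main(void)
  {
      int q[4] = {0, 0, 0, 0};   /* the four flipflop outputs */
      int tick, ff;
      for (tick = 1; tick <= 16; tick++) {
          ff = 0;
          while (ff < 4) {
              q[ff] = !q[ff];          /* this flipflop toggles       */
              if (q[ff] != 0) break;   /* went to 1? no edge for next */
              ff++;                    /* went to 0: clock the next   */
          }
          printf("%d%d%d%d\n", q[3], q[2], q[1], q[0]);
      }
      return 0;   /* prints 0001, 0010, 0011, ... just like the table */
  }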


A multiplexer

A multiplexer enables us to choose which input signal we want to feed to an output.

 S   D1  D2  |  O
 ------------+---
 0   0   0   |  0
 0   0   1   |  0
 0   1   0   |  1
 0   1   1   |  1
 1   0   0   |  0
 1   0   1   |  1
 1   1   0   |  0
 1   1   1   |  1

Result: O = (S\ * D1) + (S * D2)
Translated into plain English: output O will output the value of D1 as long as input S is (L), and will output the value of D2 as long as S is (H).
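
The equation translates one-to-one into C:

  /* 2-to-1 multiplexer: O = (S\ * D1) + (S * D2) */
  int mux(int s, int d1, int d2)
  {
      return (!s && d1) || (s && d2);
  }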

If we combine this multiplexer with one of the counters above, we can create one that is capable of either counting up or counting down, by connecting each clock input to either Q\ / the inverted clock, or Q / the original clock.

It is also possible to combine multiplexers and D-flipflops in such a way that the circuit is either a latch or an up-counter, by connecting each clock input to either Q\ / the inverted clock or a Chip Select signal, and by connecting each data input to either its own Q\ or a line of the data bus. This "pre-loadable up/down-counter" can be used as the Stack Pointer of a processor. The Stack Pointer points to a place in memory where the processor stores some special information.
If we only use the 'count up' mode, the circuit can be used as a Program Counter. The Program Counter is the register that is mainly responsible for outputting an address on the address bus of the processor.


Shifters

Imagine a bunch of D-flipflops where each data input is connected to the Q output of the previous one. So a clock signal forces each D-flipflop to copy the data of its predecessor. The result is that the last D-flipflop in the row outputs, one after another, the data of all previous ones.
At the first flipflop in the row we find a so far unused D input. It can be used to input serialized data, like data coming out of a modem. After eight clocks we can read it as a byte directly from the Q outputs.
So now we can read serialized data. But how can we create it? This is done by adding the multiplexers mentioned above, but now between each Q output and the next data input. Now it is possible to load data directly into the flipflops and then shift it out bit by bit.
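
A C model of the serial-in direction described above: at every clock all bits move one place and the serial input enters at the first flipflop; after eight clocks a whole byte has come in:

  /* 8-bit serial-in shift register: every clock each flipflop copies
     its predecessor; 'serial_in' enters at the first flipflop. */
  unsigned int shift_clock(unsigned int reg, int serial_in)
  {
      return ((reg << 1) | (serial_in & 1)) & 0xFF;
  }

  /* Clocking in the bits of %10100011 one by one, most significant
     bit first, leaves $A3 in the register. */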


The first processors

If you are Dutch, you will be familiar with the "draaiorgel" / barrel organ. The music the draaiorgel produces is directed by its "draaiorgelboek" / music roll. The holes in the paper of the roll make the various instruments produce their sounds. There are draaiorgels having up to 128 instruments on board.

But what do draaiorgels have to do with processors? The various registers, counters, adders and whatever else you find inside a processor can be considered its instruments, and the program is its draaiorgelboek. If you already have some knowledge of how processors work this may sound weird, but the first computers/processors were operated this way: the program directly manipulated the various registers, counters etc. One part of the 'opcode' manipulated the registers etc. and another part contained the data.

In the early days computers were hand-built and could have the size of a room. If two were built in a row, there was a big chance that the second one was improved in one way or another. This meant more or different registers, counters, etc. But this also immediately meant that programs written for the first computer could not be used on the second one. In other words: incompatibility.
Another problem was the number of registers, counters, etc. The more of them, the wider the instruction word would become. And as already said above, data was a part of this x-bit wide word as well. The wider the word, the more memory was needed. And in those days, memory was really expensive.
Some engineers noticed that the first processors needed quite a lot of instructions to perform certain tasks and that all those tasks quite often had a lot of instructions in common. So they decided to combine groups of instructions into new mega-instructions, nowadays known as opcodes.


Instruction Decoder (1)

Asking a processor to execute one of these new mega-instructions meant that you in fact asked it to execute a bunch of the original instructions. The original instructions were placed in ROM (read: hardwired in the system) and the new mega-instructions were stored in memory.
You can compare it with the relationship between machine language and a programming language like BASIC: you give your instructions in BASIC words and the interpreter translates them into machine language. In this case it is the instruction decoder that translates the machine language instructions into so-called microcode. In the case of the Commodore 128 and earlier machines, the BASIC code was stored in RAM and the interpreter was stored in ROM.

So, as said above, you can compare the instruction decoder with a ROM: as inputs it expects the code of the instruction plus the output of a counter. The counter tells the Instruction Decoder what part, read: which micro-instruction, it should execute. The outputs of this ROM control the various registers, counters, etc.
This technique is used in various ways. The one I'm going to explain is the one used in most modern processors, from the old 4004 and 6502 up to the newest Pentium.


Break: IBM Mainframe

The company I work for used to have IBM mainframes, starting in the '60s. The CPUs of the very first mainframes didn't have an Instruction Decoder and used the old 'barrel organ' method. But the user didn't have access to the original opcodes. IBM grouped opcodes into routines, the so-called "microcode", and these could be accessed by the user.
Information provided by Ray Young: the microcode of the 360/30 was updated by a special kind of punch card. It looked like a punch card, but one with silver stripes. The silver meant 'conducting', a hole 'non-conducting'. The card was read by the computer and the code on the card was used to update the microcode.


Instruction Decoder (2)

As we already know, a processor works with numbers. The bit size of these numbers doesn't matter at all; all the processors I mentioned above work the same: the first number they read is treated as a command. Depending on the type of command, the next number they read can be data belonging to this command or a new command. Even the number after that can be data: the amount of data the processor has to read isn't limited to one number but can be anything the instruction demands. But one thing is sure: the first number the processor reads is always a command (from now on called an 'opcode').

What are the advantages of using an Instruction Decoder?
- Much less memory is needed for storing instructions. We can also say: one can store bigger programs in the same amount of memory.
- Compatibility can be maintained. The best example: nowadays Pentiums can still run programs made for the first x86 processor, the 8086.


An example

As an example I use the instruction decoder of the 6502. That is, how we think it works: the schematic has never been published, yet some people have designed a 6502 in an FPGA that behaves exactly like the original one.
My design is based on a super ROM and a counter: the ROM contains all the micro-instructions and the counter makes sure that they are executed in the right order. The outputs of this ROM are used to control the registers, counters and, for example, the ALU (Arithmetic Logic Unit), the "calculator" inside the processor.
As mentioned above, the first part of the input for this ROM is the opcode. Because the data bus, the internal one as well as the external one, could be needed for other operations, the opcode is stored in a register, the Instruction Register, whose outputs serve as inputs for this ROM. Now the counter kicks in and starts counting up. At every tick of the external clock the ROM outputs another pattern (= micro-instruction). The last micro-instruction is the same for every opcode: it resets the counter and we are back at the start: the processor is ready to read the next instruction.
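
To make this ROM-plus-counter idea concrete, here is a much simplified model in C. The opcode value, the number of phases and the control signals are inventions of mine for this illustration only; they are not the real (never published) 6502 microcode:

  #include <stdio.h>

  #define CTL_LOAD_IR   0x01  /* copy the data bus into the Instruction Register */
  #define CTL_OUT_A     0x02  /* enable the output of register A                 */
  #define CTL_IN_X      0x04  /* enable the input of register X                  */
  #define CTL_INC_PC    0x08  /* clock the Program Counter                       */
  #define CTL_RESET_CNT 0x80  /* reset the Instruction Decoder counter           */

  /* Invented microcode for one TAX-like opcode: phase 0 fetches the
     opcode, phase 1 copies A to X and steps the PC, phase 2 resets
     the counter so the next opcode can be fetched. */
  static const unsigned char microcode[1][3] = {
      { CTL_LOAD_IR, CTL_OUT_A | CTL_IN_X | CTL_INC_PC, CTL_RESET_CNT },
  };

  int main(void)
  {
      int opcode = 0, phase;
      for (phase = 0; phase < 3; phase++)
          printf("phase %d: control lines %02X\n",
                 phase, (unsigned)microcode[opcode][phase]);
      return 0;
  }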

Many processors have branch instructions. These instructions enable the processor to jump, or not, to another part of the program, depending on whether a certain condition is met. This means that the instruction decoder has to be informed of all these conditions as well, through extra inputs.
Most processors also have special inputs, like Reset or Interrupt. When one of these inputs is triggered, the instruction decoder has to act on it as well. So again more inputs are needed.


Clock signal

I mentioned the counter several times, but what makes it tick? That is the famous clock signal. Was it only 1 MHz for the Commodore 64 and 4.77 MHz for the first IBM PC, nowadays we are talking about 3-4-5 GHz.
But a remark is in place here: a processor with a faster clock doesn't have to be faster than a processor with a slower clock. The 8088 needs about 4-5 clock cycles to read a byte, the 6502 about one. This all has to do with the internal organization and the use of the instruction decoder. I will explain this later.


Some examples

I will try to explain the working of the instruction decoder of the 6502 by using some instructions as examples. The first picture is a template that I will use to explain some of the parts that you will find inside the 6502.


The brown boxes are the various registers, counters, etc. The blue lines show the address and data busses. The green lines are the ones that are active at a given moment, the grey lines are the inactive ones. Lines not needed for the examples aren't drawn.

I first have to explain some abbreviations:
PC - Program Counter, a 16-bit pre-loadable up-counter. Used to drive the address bus. One input, 'E', is for enabling/disabling its outputs, the other, 'C', is for clocking the counter.
TAR - Temporary Address Register, also a pre-loadable up-counter. Used to drive the address bus when doing a direct access (= reading from/writing to memory/IO). One of the examples will show its use. The counting function is only needed for indirect accesses (not shown here), therefore the clock signal isn't drawn either. Input 'E' is for enabling/disabling its outputs, 'L' for loading a byte into the lower part of the counter, 'H' for loading a byte into the higher part.
DATA - A bidirectional buffer that separates the internal and external data bus. Input 'E' is for enabling/disabling its outputs.
A - Register A, an eight-bit latch. In the 6502 this is the main register. Input 'I' is for loading a byte, 'O' for outputting it.
X - Register X, also an eight-bit latch. In the 6502 this register is mainly used for indexed addressing.
IR - Instruction Register, an eight-bit latch used to store the instruction during the whole processing. Input 'L' is for loading a byte.
C - The Instruction Decoder counter, a three-bit counter continuously clocked by the external clock signal. Input 'RES' is for resetting the counter.


Loading the opcode
The first phase is the same for all instructions: loading the opcode. The DATA buffer is enabled so the processor can read the opcode. The 'L' input of the IR is activated so it can store the opcode. IMHO it should be obvious that the decoder completely ignores the opcode in this phase; it only reacts to the counter and the clock.





TAX
TAX stands for "Transfer the content of register A to register X", where "transfer" should be read as "copy". TAX is a one-byte instruction. In this phase the output of register A and the input of register X are enabled. The Program Counter also receives the signal to increase its count by one.




This phase resets the Instruction Decoder counter. As this happens in an instant, we are back in the very first phase: loading the opcode.





LDA #5
"LDA #5" stands for "LoaD register A with the value 5" and is a two-bytes instruction. After loading the opcode, the only thing that has to be done in this phase is increasing the Program Counter.




During the third phase the data byte is read from the outside world by enabling the outputs of the DATA buffer, and is copied into register A by activating its 'I' input.




Again, this phase is nothing more than increasing the Program Counter.




The last phase resets the Instruction Decoder counter and we are back in the loading phase.





STA $3174
"STA $3174" stands for "STore the content from register A into memory or a register at address $3174". This is a three-bytes instruction. Again the only thing that has to be done in this phase is increasing the Program Counter.




During the third phase the first data byte is read from the outside world by enabling the outputs of the DATA buffer, and is copied into the low-byte part of the Temporary Address Register by activating its 'L' input.




Again, this phase is nothing more than increasing the Program Counter.




During the fifth phase the second data byte is read from the outside world by enabling the outputs of the DATA buffer, and is copied into the high-byte part of the Temporary Address Register by activating its 'H' input.




In this phase the Program Counter is increased once more. But because we need to store a byte in the next phase using the Temporary Address Register, the outputs of the Program Counter have to be disabled and those of the Temporary Address Register have to be enabled. And because we are going to write now, the R/W pin of the processor has to be negated.




In this phase register A is told to output its byte by activating its 'O' input. As the outputs of the DATA buffer towards the outside world are enabled as well, the byte shows up on the data pins of the processor.




In this phase the address bus is handed back to the Program Counter. Notice that the Program Counter is not increased; quite obvious, because that has already been done two phases ago. But the point is: it could also have been done in this phase instead of that one. This is one example of having no idea what the original 6502 Instruction Decoder does in this situation.




The last phase resets the Instruction Decoder counter once more and we are back in the loading phase.
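
Summarized, and remember that this is my interpretation of what happens inside the 6502, the three examples step through the following microinstructions (the signal names are the ones of the template above):

  TAX:       fetch opcode | A.O + X.I + PC.C | C.RES
  LDA #n:    fetch opcode | PC.C | DATA.E + A.I | PC.C | C.RES
  STA $hhll: fetch opcode | PC.C | DATA.E + TAR.L | PC.C | DATA.E + TAR.H |
             PC.C, then PC.E off + TAR.E on + R/W low | A.O + DATA.E |
             TAR.E off + PC.E on | C.RES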





The Instruction Decoder is not a ROM as we know it

So far I said the part that stores the microinstructions could be seen as a ROM. A ROM implies that it has two-to-the-power-n memory cells. In the very first phase the Instruction Decoder doesn't need any info regarding the opcode, which means that the 256 cells for this phase are largely wasted. Another fact is that the 6502 only has 151 valid opcodes, so all the cells belonging to the remaining 105 opcodes are unused as well.
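
With the three-bit counter of my example the numbers look like this:

  ROM inputs  : 8 opcode lines + 3 counter lines = 11 address lines
  ROM size    : 2^11 = 2048 words (microinstructions)
  really used : 151 valid opcodes * at most 8 phases = at most 1208 words

So at least 840 of the 2048 words would never be read, and because the fetch phase is identical for every opcode, the real waste is even bigger.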

In reality the Instruction Decoder of the original 6502 is a circuit that contains the sum of all the algorithms for every single instruction. A result of this is that some unused opcodes unintentionally became the combination of two or more used opcodes. A famous one is LAX #n, "Load register A and register X with the value n". The code for LDA #n is $A9, the code for LDX #n is $A2, and the code for LAX #n is $AB. And $AB happens to be $A9 OR $A2, which can be translated as: both opcodes are executed at the same time. This and other examples lead to the conclusion that the official opcodes were only decoded to the point needed to make them work, just to save transistors.
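
In binary the OR is easy to see:

  $A9 = 1010 1001   (LDA #n)
  $A2 = 1010 0010   (LDX #n)
  ----------------- OR
  $AB = 1010 1011   (LAX #n)
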
These unofficial codes are named "illegal opcodes". The first persons who discovered them immediately started to hunt for others. They found illegal codes that do nothing at all but still behave like a one-, two- or three-byte instruction. Others do something, useful or not, and some even make the processor crash completely (the so-called KILL opcodes); only a reset can put it back in action.

A warning is in its place: illegal opcodes can change or disappear without any warning. The 8500 in the Commodore 64-II is a newer version of the 6510 found in the older Commodore 64s, and one could expect that, for compatibility reasons, they act the same. They do, that is, for the legal opcodes. But there have been reports that the illegal opcodes of the 8500 behave flaky; it seems that some instructions sometimes lose bits.
In the 65C02 and its successors all illegal opcodes have been replaced by new instructions or by "NOP"s (No OPeration).

The Zilog Z80, known from the various Sinclair and MSX computers, has illegal opcodes as well, but it has been produced by various companies. And it seems that the illegal opcodes of the NEC versions behave differently from those of the Zilog ones.


Special inputs

As said before, most processors have one or more special input pins. I will describe the three most common ones. These signals operate completely independently of the opcodes.

Reset
A processor cannot start up out of the blue, it needs a starting point. For this reason (AFAIK) all processors have a RESET input pin. For the 6502, 6809 and Z80 its active level is (L); the 80x86 requires a (H) on this pin. The actual initialization starts the moment the level returns to its normal state.

The way a processor behaves after a reset is quite different for the various brands. After (re)setting some internal registers, the Z80 starts executing the program found at address $0000.
The 8088 (IBM PC/XT) and its successors expect a program starting at address $FFFF0. In most cases you'll find there a long jump to a lower part of the ROM.
The 6502 and its successors perform an indirect jump: JMP ($FFFC). That is, the 6502 expects to find a program at the address formed by the two bytes found at the consecutive addresses $FFFC and $FFFD.
The 6800 and 6809 behave the same way but look at the addresses $FFFE and $FFFF.
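
In code, the indirect jump of the 6502 boils down to this little Python sketch. The 6502 stores addresses low byte first (little-endian); the start address $E000 is just an example:

  def reset_vector(memory):
      lo = memory[0xFFFC]
      hi = memory[0xFFFD]
      return hi << 8 | lo              # execution starts at this address

  # example: $FFFC contains $00 and $FFFD contains $E0
  memory = {0xFFFC: 0x00, 0xFFFD: 0xE0}
  print(hex(reset_vector(memory)))     # prints 0xe000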

Interrupt / Non Maskable Interrupt
Imagine you are reading a book and the telephone rings. Most likely you will memorize the sentence you are at, answer the phone and, after finishing the conversation, resume reading the book at the point where you were interrupted by the telephone. "Most likely", because you are free to ignore the telephone. In case of a fire alarm you can't ignore it, you have to act on it.
The two equivalent pins are the Interrupt input (IRQ / INT) and the Non-Maskable Interrupt input (NMI). The 6502 can mask (read: ignore) an IRQ if ordered to do so by the appropriate opcode. As the name already says, an NMI cannot be ignored.

How does the 6502 deal with interrupts? The first thing it has to do is "remember" where it is by copying the momentary contents of the PC to the so-called Stack: an area of memory especially reserved for this purpose. In case of the 6502 this is the range $0100-$01FF. Besides the address, the Flag register is saved as well.
Once the momentary address is stored, the processor has to execute the program belonging to the specific interrupt. The 6502 performs an indirect jump: JMP ($FFFE) for an IRQ, JMP ($FFFA) for an NMI.
Once the interrupt routine is finished, this is signaled to the processor with the instruction RTI (ReTurn from Interrupt). The processor restores the contents of the Program Counter and Flag register with the data previously stored on the Stack, loads a new opcode and continues its operation at this point.
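
As a sketch, again in Python, the whole IRQ round trip described above looks roughly like this. The push/pop order (high byte of the PC first) is the one the real 6502 uses; the setting of the interrupt-disable flag is left out for simplicity, and the routine address $FD00 is just an example:

  class CPU:
      def __init__(self):
          self.pc, self.sp, self.flags = 0, 0xFF, 0

  def push(cpu, mem, byte):
      mem[0x0100 + cpu.sp] = byte              # the Stack lives in $0100-$01FF
      cpu.sp = (cpu.sp - 1) & 0xFF

  def pop(cpu, mem):
      cpu.sp = (cpu.sp + 1) & 0xFF
      return mem[0x0100 + cpu.sp]

  def irq(cpu, mem):
      push(cpu, mem, cpu.pc >> 8)              # save the momentary address...
      push(cpu, mem, cpu.pc & 0xFF)
      push(cpu, mem, cpu.flags)                # ...and the Flag register
      cpu.pc = mem[0xFFFE] | mem[0xFFFF] << 8  # JMP ($FFFE)

  def rti(cpu, mem):
      cpu.flags = pop(cpu, mem)                # restore the flags...
      cpu.pc = pop(cpu, mem) | pop(cpu, mem) << 8  # ...and the address

  cpu, mem = CPU(), {0xFFFE: 0x00, 0xFFFF: 0xFD}   # IRQ routine at $FD00
  irq(cpu, mem); print(hex(cpu.pc))                # prints 0xfd00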


Video

Nowadays we don't know any better than that a big video display is attached to our computer. That used to be different. The first Commodore computer, the KIM-1, had a 'screen' made of six 7-segment LED displays to show an address and the corresponding data byte. The next model, the PET 2001, had a real CRT monitor.
But how does a computer know how to display text or images on a screen? Simple: it doesn't. In short, the video circuit sends a stream of bits to the monitor, which translates it into dots on the screen which we, in our turn, interpret as characters or images. In newer systems the processor has nothing to do with this; it only stores the data in the piece of memory where the video circuit reads its data from.

But how does the video card know what to send to the monitor? First we have to know how the 'ancient' Cathode Ray Tube monitor works. A beam of electrons is sent from the back of the device to the screen you are looking at. Where an electron hits the screen, the phosphor on the inside lights up. The beam moves from left to right (from your point of view) and from top to bottom, just like you would read a book. Arrived at the bottom-right point of the screen, it moves to the upper-left point again. The combination of pixels that have been "drawn" in this way on the screen appears to us as characters or pictures.

There are generally four systems to create a screen:
- The processor does it itself.
- A static circuit.
- A programmable circuit.
- A processor controlled circuit.

The processor does it itself
I told you above that a video IC generates the picture. In the case of the Sinclair ZX80, ZX81 and Jupiter Ace this is done by the processor itself. The disadvantage: the processor loses a lot of time just generating the screen, in the case of the ZX81 about 75%! The advantage: fewer parts needed, thus a cheaper product.

A static circuit
A circuit based on normal TTL ICs generates the various signals, including the video signal. This circuit gets its data from RAM that it shares with the processor. In the PET 2001 this is 1024 bytes of RAM found at address $8000. This amount of RAM covers a screen that can display 25 lines of 40 characters each. In other words, each byte in the RAM (except the last 24) covers one character on the screen.

Hey, wait a moment, is one byte, eight bits, enough to draw all the pixels of a character? The byte itself, no. But this byte and some outputs of a counter are used as inputs for a ROM, the so-called "character ROM". This character ROM contains all the data needed to create the character on the screen.
These are the first eight characters of the character ROM of the PET:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
   000     00    00000     000   0000    000000  000000    000  
  0   0   0  0    0   0   0   0   0  0   0       0        0   0 
 0  0 0  0    0   0   0  0        0   0  0       0       0      
 0 0 00  000000   0000   0        0   0  0000    0000    0  000 
 0  00   0    0   0   0  0        0   0  0       0       0    0 
  0      0    0   0   0   0   0   0  0   0       0        0   0 
   0000  0    0  00000     000   0000    000000  0         000  
                                                                
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
Each '0' represents a dot that is written to the screen. A character of the PET is 7 dots wide and 8 dots high. A character is stored as eight consecutive bytes. So the first row of '@' is found at address 0, the first row of 'A' is found at address 8, etc. How is the character ROM read? The output of the video RAM is connected to the address lines A3..A10 of the ROM. The lines A0..A2 are connected to a three-bit counter (0..7).
Now assume the circuit has to display the above characters, starting in the top-left corner of the screen. First the circuit reads the byte at address 0 and sends all its bits, starting with bit 7, to the monitor. Then it reads the byte at address 8 and repeats the action. After having read 40 characters, it increases the counter and starts to send the second row of dots of the first 40 characters. Having sent all rows of the first 40 characters, it starts with the first row of the second 40 characters. After having displayed all 25 lines, it starts at the top again.
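
The address juggling of this circuit can be sketched in a few lines of Python (screen_ram holds the 1000 screen codes, char_rom the bytes of the picture above):

  # yields the bytes the circuit sends to the monitor, in scan order
  def scan_screen(screen_ram, char_rom):
      for line in range(25):               # 25 text lines
          for row in range(8):             # 8 dot rows per character
              for col in range(40):        # 40 characters per line
                  code = screen_ram[line * 40 + col]
                  # screen code -> address lines A3..A10 of the ROM,
                  # dot-row counter -> A0..A2
                  yield char_rom[code * 8 + row]

  # every byte is then shifted out to the monitor, bit 7 first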

A remark about the ROM: the above picture shows the character set we are familiar with. Just replace the ROM with one containing the Cyrillic character set and we can use the computer in Russia, Bulgaria or Serbia! (Yes, I know, the keyboard has to be changed as well.)

A programmable circuit
The above circuit has at least two disadvantages:
- you are limited to a fixed number of lines and a fixed number of pixels/line
- the number of ICs needed
So some companies developed special video ICs that contain most of the needed hardware. The most well-known is the Motorola 6845. IBM's Color Graphics Adapter and Monochrome Display Adapter, Hercules' Monochrome Graphics Card and ATI's Graphics Solution card are all based on this IC.
But the 6845 has another advantage: you can program the number of lines, the number of dots per line and the number of vertical and horizontal dots per character. This enables one to show text on the screen varying from 40 to 132 characters per line and from 24 to 50 lines per screen.
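
Programming such a chip is a matter of writing values into its numbered registers: you first write the register number to an address port and then the value to a data port. A minimal sketch in Python, assuming the port addresses of IBM's CGA; the values are only meant as an illustration, not as a complete, working mode table:

  ADDRESS_PORT = 0x3D4   # the 6845 of IBM's CGA (the MDA uses $3B4/$3B5)
  DATA_PORT    = 0x3D5

  def io_write(port, value):          # stand-in for a real I/O port write
      print('port %04X <- %d' % (port, value))

  def write_crtc(register, value):
      io_write(ADDRESS_PORT, register)   # first select one of the registers...
      io_write(DATA_PORT, value)         # ...then write its new value

  # a few of the registers, with example values for a 40*25 text screen:
  write_crtc(1, 40)    # R1: characters displayed per line
  write_crtc(6, 25)    # R6: character lines displayed per screen
  write_crtc(9, 7)     # R9: dot rows per character, minus one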

A processor controlled video circuit
Until the beginning of the '80s most computer systems had text-only video, for the simple reason that RAM was expensive. Where 1 KB was sufficient for a 40*25 text screen, the graphical representation of that same screen already needed 8 KB. Not much by today's standards, but in those days it could make a computer too expensive to sell well.
Anyway, prices dropped and we got better and bigger video cards. But a VGA card capable of 1024*768 pixels in 256 colours already needs 1 MB of RAM. So in 1990 I had an 80286 with only 1 MB of main RAM and this 1 MB video card. But drawing lines, circles, ellipses or whatever started to become quite a stress for my computer. This is one of the reasons I had to upgrade it to an 80486.
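
The arithmetic behind these numbers (one bit per pixel for the monochrome graphics screen, one byte per pixel for 256 colours):

  text     : 40 * 25 characters         = 1000 bytes, roughly 1 KB
  graphics : (40*8) * (25*8) pixels / 8 = 8000 bytes, roughly 8 KB
  VGA      : 1024 * 768 * 1 byte        = 786432 bytes

786432 bytes is 768 KB; in practice that meant fitting the card with 1 MB, the next standard size.
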
Engineers realized this as well, and soon the question arose: why does the computer itself have to calculate all the points of a circle, line or whatever? Why not let the video card do it? In that case the only thing the computer has to do is supply the video card with information: what type of object is wanted, its coordinates, orientation, colour and whatever else is needed. The GPU, the Graphics Processing Unit, was born.


Build your own processor

I'm busy building my own processor as well, the TTL6502. It is relatively huge, nine Eurocards (10 * 16 cm), compared to other projects I have seen. But not all cards are fully populated, and it has one big advantage: the TTL6502 should be able to replace a real 6502!
But I decided to design and build a smaller version first: the Mini6502, only three cards. OK, much smaller, but at a price: it is two to three times slower than the original 6502. Still, it should be able to replace a real 6502.
So please have a visit:

The Mini6502






Do you have questions or comments? Do you want more information?
You can email me here.