Ruud's Commodore Site

Hardware course




What is it?

This document describes the ins and outs of digital electronics and how it is used in computers, especially RAM and processors.


Some words about reading this document.

I am a Dutchman and English is not my native language. This is the very first version and you certainly will find errors. You do me a favour by letting me know where to find them and, more important, how to correct them.

If you are not familiar with electronics and don't understand something, or want more detailed information, let me know.
If you are familiar with it and see errors, let me know as well.


Foreword.

For many people a computer is a black box that appears to be very intelligent and can perform a lot of magic tricks. I even know people who are more or less afraid of computers and don't want to have anything to do with them. Explaining that their GSM phone contains a computer as well doesn't help. But I can assure you that computers aren't dangerous at all. At least I never heard of someone being bitten by one :)
Ok, I personally know a case where a man got a static shock from a computer (or was it the other way round?). Anyway, the man survived the shock but the computer was dead :(

There is also a group of people who work with computers but have no idea exactly how they function. And a part of this group would like to obtain this knowledge. Once they know how the hardware works, there is a good chance they can make better use of computers. This document is meant to give this group some of the needed knowledge.
People will also see that the computer is just a very dumb machine with no intelligence whatsoever. And what about the miracles a computer performs? Find a monkey which does exactly what you tell it to do and it can build a space shuttle. Find a few hundred more and they can build the same shuttle in 10 days. But these hundreds of monkeys will never invent one!

Some small knowledge is needed: the meaning of "voltage", "current", some knowledge of mathematics and "binary-decimal" conversion.


The operator: the human processor

Computers execute programs. Programs in turn contain a collection of instructions. An instruction for a machine operator could be like this: if valve A is in this position, switch B is in that position and scale C reads this value, then do this, this and that. With valve A having more than one position, the same for switch B, and knowing that scale C can show a lot of different readings, you can understand that a lot of different actions are needed for the various situations.
In every case one thing must be certain: for a given situation the operator must always perform the same action. It is completely out of the question that he does this for a given situation one time and that the other time.
An operator can perform more than one instruction in a row. Where does he get these instructions from? A good example could be that his boss gives him a task (= program) and that all instructions needed to perform this task can be found in a manual. Computer programs can be stored in ROM and RAM. The instructions are the bytes in the ROM and RAM. So let's start from the very beginning and execute a program.

Before an operator can start his work at all, he has to do some initial work first, like entering the building and turning on the main power switch. How does he know? Because it is stored (programmed) in his memory (RAM). But let's go way back to the moment he was "powered up", the moment of his conception: what program started him up? Very simple: his ROM or, in biological terms, his genes.

Now let's find out how we can translate the above story into electronics....


ICs

If you have ever seen the inside of a computer, you probably noticed little black boxes with many shiny legs on green boards. If the board was not green, don't worry. Most are. Why? I don't know.
Those little black boxes are the so-called ICs or chips. IC stands for "Integrated Circuit". Most ICs have little numbers printed on them. It is very rare to see ICs without numbers. In that case there is a good chance that, if you look more closely, the original numbers have been erased. Why? To make it hard to reverse-engineer and copy the product.
Some of these numbers have a meaning, others don't (at least to me). These numbers enable someone to see what IC it is, what version and, sometimes, when it was produced.

What do these ICs contain? Electronic circuits, mostly in the form of transistors, diodes and resistors. Their function? That can vary from simple amplifiers and gates to circuits as complicated as processors. You are probably familiar with the word "amplifier". There is a good chance that you use a computer to read this document and therefore I assume that you at least know that there is a processor inside your computer.
"Gates" are the smallest units we know in the world of digital electronics. You can compare them with atoms. As atoms are needed to create molecules, so gates are needed to create processors, counters and other digital logic. We can even go a step further. As atoms are made of protons, neutrons and electrons, gates are made of the already mentioned transistors, diodes and resistors. So first I will explain a little bit about these parts.


The basics and the law of Ohm

Although the computer is "digital", it is based on techniques coming from the "analogue" world. So we first start with some basics. As said in the foreword, I presume you already have some knowledge of electronics. In this case I presume some knowledge of the following items:

Voltage
- Voltage can be compared with pressure: the more pressure you have, the more air or water you can pump through a pipe. Measured in Volt.

Current
- Current is the quantity of electrons that flows through a line per second. Measured in Ampere.

Resistance
- I skip this one :) Measured in Ohm. In most cases resistance is inconvenient; it only wastes power. But sometimes you need it to limit the current and/or voltage. This is done by using so-called 'resistors'.

Power / Dissipation
- When a current flows through a resistor, energy in the form of heat is generated. Measured in Watt.

Given the resistance of a circuit and the voltage over it, how large is the current through that circuit? That is answered by the famous law of Ohm:


U / R = I

Voltage divided by resistance = current

Knowing two of the three items, one can calculate the third one.

The 'power' mentioned above can be calculated with the formulas:

P = U * I or P = R * I * I
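
As a minimal sketch, the formulas above can be tried out in a few lines of Python. The function names and the example values (5 V over a 1000 Ohm resistor) are assumptions chosen only for illustration.

```python
def current(voltage, resistance):
    """Law of Ohm: I = U / R (Volt and Ohm give Ampere)."""
    return voltage / resistance


def power(voltage, amps):
    """P = U * I (Volt and Ampere give Watt)."""
    return voltage * amps


def power_from_resistance(resistance, amps):
    """P = R * I * I, the second form of the same formula."""
    return resistance * amps * amps


# Hypothetical example: 5 V over a 1000 Ohm resistor.
u = 5.0
r = 1000.0
i = current(u, r)
print("current in Ampere:", i)
print("power in Watt (U * I):", power(u, i))
print("power in Watt (R * I * I):", power_from_resistance(r, i))
```

Both power functions should print the same value, which shows that the two formulas are just different ways of writing the same thing.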

So far the parts with presumed knowledge.


More basics

Capacitance
Imagine a two inch water pipe with a closed valve. To that valve you connect a 0.25 inch pipe, a T-piece and again a 0.25 inch pipe. We place a balloon on the third opening of the T-piece. If we open the first valve, water starts to flow through the two pipes but also starts to fill the balloon. Because the balloon is filled, the amount of water that comes out of the second pipe is not the amount that comes out of the valve.
But the rubber of the balloon wants to push the water back. The more water gets into the balloon, the stronger the rubber pushes back. You can imagine that there is a moment that the pressure behind the water is equal to the pressure of the rubber on the water: no water flows into the balloon anymore.

If we close the valve, the pressure is gone as well and the rubber starts to squeeze the water out of the balloon. In the beginning there is a lot of pressure, so a lot of water will come out of the second pipe. But during this process the pressure of the rubber will decrease, and so will the flow. After some time the flow of water stops.

See here the electronic equivalent:

The battery is the 2" pipe with the water supply, the switch is the valve, the resistors are the 0.25" pipes and the capacitor is the balloon. The schematic symbol is a good indication of how a capacitor is built: two metal sheets as thin as rice paper, separated by an insulator. By rolling up the plates you decrease the volume and even increase the capacitance.
The LED shows the amount of water/electrons flowing through the pipes/resistors. From the moment you close the switch, it takes some time before the LED lights up at its peak. When you open the switch, it takes some time before the LED has dimmed completely.
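
The "it takes some time" behaviour can be put into numbers. The sketch below uses the textbook formula for a capacitor charged through a single resistor, U(t) = U * (1 - e^(-t / (R * C))). This formula and the component values (5 V, 1000 Ohm, 1000 microfarad) are assumptions for illustration; they are not taken from the schematic described above.

```python
import math


def capacitor_voltage(u_supply, r_ohm, c_farad, t_seconds):
    """Voltage over a capacitor that is charged through a resistor.

    Textbook RC charging curve; R * C is the so-called time constant.
    """
    tau = r_ohm * c_farad
    return u_supply * (1.0 - math.exp(-t_seconds / tau))


# Assumed example values: 5 V supply, 1000 Ohm, 1000 microfarad.
for t in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(t, "s:", round(capacitor_voltage(5.0, 1000.0, 0.001, t), 3), "V")
```

The voltage rises quickly at first and then ever more slowly, just like the balloon that becomes harder to fill the fuller it gets.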

Capacitance can be good, otherwise we wouldn't use capacitors, but it also can be bad. See later.


Diodes.

You can consider diodes as one-way doors for currents: the current can only flow in one direction. The voltage drop over a silicon diode in that case is about 0.7 Volt. The voltage a diode can handle when blocking the current is not unlimited. Once this limit is exceeded, in most cases the diode will be destroyed.

Some remarks about the 0.7 V voltage drop:
With a 0.7 V voltage drop and a current flowing through the diode, it dissipates energy in the form of heat. So the current through the diode is limited by the heat it can dissipate.
Before silicon was used, diodes were made of germanium. The voltage drop over a germanium diode is only 0.2 V. But when in blocking mode, a germanium diode leaks about a hundred times more current than a silicon diode.

A diode is made of two pieces of silicon, each doped with a different material. For one piece boron is used most of the time. This boron-doped silicon is referred to as P-silicon. The other piece is doped with phosphorus and is referred to as N-silicon.
    |  
    |              
   PPP
   NNN             
    |               
    |   
When the voltage applied to the P-part is higher than the voltage applied to the N-part, current will flow. Explaining the 'why' goes a bit too far for this document.


Transistors
Then someone added a third layer:

   ++ ------+     collector
            |              
           NNN             
    + -----PPP    base          
           NNN             
            |               
    - ------+     emitter   



Sending a current from the base to the emitter (Ibe) enables a current to flow from the collector to the emitter (Ice). Ice is a multiple of Ibe. This factor can vary from 4 to 400, depending on the type of transistor.

The above picture shows an NPN-type transistor. PNP-types exist as well and, in fact, were the first types to be made. The symbol is about the same.

The transistor is a current amplifier. But in case of digital electronics, it is used as a switch. And this is all you have to know for the moment. Have a look at this circuit:

When the switch is closed, a current can flow through the switch, R1 and then from the base to the emitter of the transistor. The amplification factor of the BC547 is about 200. Ibe is about 0.42 mA. This means that Ice could be up to about 80 mA. But R3 limits this to 5 mA. This means we measure 5 V over R3, which in turn means that we measure nearly 0 V between points Y and Z. When the switch is open, Ibe is 0 mA. In turn this means Ice is 0 mA as well. As no current flows through R3, there is no voltage drop over it, which means we measure 5 V between Y and Z.
A remark regarding R2: this resistor is only needed to make sure that no static electricity activates the transistor.
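
The reasoning above can be followed step by step in a small Python sketch. The resistor values are assumptions: R3 = 1000 Ohm follows from "5 V and 5 mA", and R1 = 10000 Ohm is a guess that gives a base current close to the 0.42 mA mentioned. The model is idealised: it ignores R2 and the small voltage that remains over a real transistor when it conducts.

```python
def inverter_output(switch_closed, u_supply=5.0, r1=10_000.0, r3=1_000.0,
                    gain=200.0, u_be=0.7):
    """Voltage between Y and Z for the one-transistor circuit (idealised)."""
    if not switch_closed:
        # No base current, so no collector current and no drop over R3.
        return u_supply
    i_be = (u_supply - u_be) / r1            # base current through R1
    i_ce = min(gain * i_be, u_supply / r3)   # R3 limits the collector current
    return u_supply - i_ce * r3              # what is left after the drop over R3


print("switch closed:", inverter_output(True), "V")
print("switch open:  ", inverter_output(False), "V")
```

With these assumed values the transistor could pass far more current than R3 allows, so the output goes all the way down when the switch is closed.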

Now let's have a digital look at the circuit: applying 5V to the input outputs 0V, applying 0V outputs 5V. Translate 5V as "True" and 0V as "False": we have an inverter!

I used 5V to feed the above circuit. That is no coincidence. Over time several techniques were developed for use in computers. The one mostly used in the 80's was the so-called TTL technique. TTL stands for "Transistor Transistor Logic". At some stage in the development of this technique, it was decided to feed TTL circuits with 5V +/- 5%. Why 5 Volt, I don't know. Maybe because it was a nice round number.
Is it possible to feed TTL circuitry with another voltage? Yes, it is. I learned to build TTL circuits using a big flat 4.5 V battery and things worked fine.
But things have changed. Nowadays processors run at voltages as low as 1.7V.


Delay time

The speed of signals travelling through a line is limited to the speed of light, about 300,000 km per second. Although this is very fast, signals need some time to travel from A to B. Doped silicon is a conductor but, compared with copper, a worse one. This means that every part of a circuit has resistance. Every line running alongside another line forms a capacitance together with it. The combination of resistance and capacitance forms the problem. Let's take the above circuit with the transistor as an example. The little piece of metal that connects the resistor, transistor and output to each other is a capacitor in itself. And, as said, a combination of resistance and capacitance causes delay.

One way to decrease this delay is replacing the resistor by a PNP-transistor. But even then we will face problems. The lines between the transistor and the switch, and even the resistors and the interior of the transistor, form a capacitance with Ground as well. After closing the switch, the base-emitter voltage is not 0.7 V instantly; it takes some time.
Combining these things means that, when changing the level of an input of a circuit, it takes some time before the outputs reflect the combination of inputs and the function of the circuit. The time one has to wait to be sure that the output is valid after the last change of an input is called 'delay time'.


Creating a gate output

Above we used a transistor to pull the output to 'False' and a resistor to push it to 'True'. But with the delay time caused by the combination of resistance and capacitance in mind, it is better to use a PNP-transistor that takes care of pulling the output High (= to 5V). This combination of PNP- and NPN-transistor is also known as a 'totem pole output'.

Another type of output is the "open collector" output. In this case there is only an NPN-transistor which can pull the output Low, as we saw in our own example. Nothing more, not even the resistor. This construction enables a designer to tie several outputs together without the need for extra gates. Of course at least one pull-up resistor is needed so the circuit sees 5V in case no output is activated.
The Commodore computers and peripherals using the serial IEC-bus are coupled together through open collector inverters (mostly of the type 7406).
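
A minimal sketch of such a shared line in Python, assuming one pull-up resistor and any number of open collector outputs tied together. The function name and the True/False representation are made up for this illustration.

```python
def open_collector_line(transistors_conducting):
    """Level of a line shared by several open collector outputs.

    Each entry is True when that output's transistor conducts (pulls Low).
    The pull-up resistor makes the line High (1) when nobody pulls it Low.
    """
    return 0 if any(transistors_conducting) else 1


print(open_collector_line([False, False, False]))  # nobody pulls: line is High
print(open_collector_line([False, True, False]))   # one device pulls: line is Low
```

One conducting transistor is enough to make the whole line Low, and no output can ever fight against another one.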

As far as I know, open collector outputs that pull the output High (H) have never been developed.


Digital electronics

An inverter is one of the many gates we have. Another one is the AND gate. The 2-input AND-gate functions by the following logic: if input 1 is High and input 2 is High, then its output is High as well; in all other cases its output is Low. How does an AND-gate work internally? To be honest, I don't know. In fact, the transistor circuit shown earlier does not represent the internals of a real inverter either. If you really want to know, the data books of Texas Instruments often show you the circuits of the various gates.

This and other gates:
 In1 In2 | Out		 In1 In2 | Out
---------+----		---------+----
  L   L  |  L		  0   0  |  0
  L   H  |  L	  AND	  0   1  |  0
  H   L  |  L		  1   0  |  0
  H   H  |  H		  1   1  |  1

 In1 In2 | Out		 In1 In2 | Out
---------+----		---------+----
  L   L  |  L		  0   0  |  0
  L   H  |  H	  OR	  0   1  |  1
  H   L  |  H		  1   0  |  1
  H   H  |  H		  1   1  |  1

 In1 In2 | Out		 In1 In2 | Out
---------+----		---------+----
  L   L  |  L		  0   0  |  0
  L   H  |  H	  EXOR	  0   1  |  1
  H   L  |  H		  1   0  |  1
  H   H  |  L		  1   1  |  0		

Inverted versions of the AND- and OR-gate exist as well: the NAND- and NOR-gate:

 In1 In2 | Out		 In1 In2 | Out
---------+----		---------+----
  L   L  |  H		  0   0  |  1
  L   H  |  H	  NAND	  0   1  |  1
  H   L  |  H		  1   0  |  1
  H   H  |  L		  1   1  |  0

 In1 In2 | Out		 In1 In2 | Out
---------+----		---------+----
  L   L  |  H		  0   0  |  1
  L   H  |  L	  NOR	  0   1  |  0
  H   L  |  L		  1   0  |  0
  H   H  |  L		  1   1  |  0
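
The tables above can be reproduced with a few lines of Python, using 1 for High and 0 for Low. This is only a sketch of the logic, not of the electronics; the function names are chosen for this illustration.

```python
def NOT(a):
    return 1 - a


def AND(a, b):
    return a & b


def OR(a, b):
    return a | b


def EXOR(a, b):
    return a ^ b


def NAND(a, b):
    return NOT(AND(a, b))


def NOR(a, b):
    return NOT(OR(a, b))


def truth_table(gate):
    """All four input combinations with the resulting output."""
    return [(in1, in2, gate(in1, in2)) for in1 in (0, 1) for in2 in (0, 1)]


for gate in (AND, OR, EXOR, NAND, NOR):
    print(gate.__name__)
    for in1, in2, out in truth_table(gate):
        print("  ", in1, in2, "|", out)
```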

Crash course binary-decimal conversion

As you can see, gates accept Highs and Lows. We can go a step further by translating a High as a "1" and a Low as a "0". And now we are on the communication level of a computer: it talks a language using only ones and zeros. Because "1" and "0" are only two digits, this is the so-called 'binary language'.
Unfortunately the majority of humans are only used to the decimal system, which uses the digits 0 to 9. So let's have a look at the decimal number 37652:
   digit:       3      7      6      5      2
   weight:   10^4   10^3   10^2   10^1   10^0

37652 = 3 times 10^4 = 3 times 10000
      + 7 times 10^3 = 7 times  1000
      + 6 times 10^2 = 6 times   100
      + 5 times 10^1 = 5 times    10
      + 2 times 10^0 = 2 times     1
Now let's have a look at the binary number 11011:
   digit:      1     1     0     1     1
   weight:   2^4   2^3   2^2   2^1   2^0

11011 = 1 times 2^4 = 1 times 16
      + 1 times 2^3 = 1 times  8
      + 0 times 2^2 = 0 times  4
      + 1 times 2^1 = 1 times  2
      + 1 times 2^0 = 1 times  1
                      ----------- +
                      = 27
So 11011 is the binary representation of 27. Do I know the representation of 37652? Here it is: 1001001100010100. If it is not correct, you can blame Microsoft's Calculator.
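
For those who don't want to trust a calculator, here is a small Python sketch of both directions of the conversion. The function names are made up for this example; the first function follows the "digit times weight" method shown above, the second one uses repeated division by 2, which is a common method not described in the text above.

```python
def binary_to_decimal(bits):
    """Add up digit * 2^position, starting with position 0 at the right."""
    total = 0
    for position, digit in enumerate(reversed(bits)):
        total += int(digit) * 2 ** position
    return total


def decimal_to_binary(number):
    """Repeatedly divide by 2; the remainders are the bits, right to left."""
    if number == 0:
        return "0"
    digits = ""
    while number > 0:
        digits = str(number % 2) + digits
        number //= 2
    return digits


print(binary_to_decimal("11011"))   # 27
print(decimal_to_binary(37652))     # 1001001100010100
```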


Hexa-decimal numbers, nibbles, bytes, words and double-words

Using only zeros and ones just because that is what the computer speaks has one major disadvantage: you need a lot of them to form a relatively small decimal number. This makes them unwieldy and, worse, unpronounceable. So people started to group them in packages of four bits. (Actually three at first, but that system was abandoned later.) With four bits, called a nibble, we can make numbers from 0 to 15. And a new system, the hexadecimal system, was developed to write these 16 numbers down:
0000 - 0
0001 - 1
0010 - 2
0011 - 3
0100 - 4
0101 - 5
0110 - 6
0111 - 7
1000 - 8
1001 - 9
1010 - A
1011 - B
1100 - C
1101 - D
1110 - E
1111 - F
Numbers written down in the hexadecimal system start with a "$" or end with an "H". Pure binary numbers start with a "%". For example:

$A3 = %10100011 = 163
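
The same example in Python, as a small sketch. Note that Python writes hexadecimal numbers with "0x" and binary numbers with "0b" in front instead of "$" and "%"; the variable names are chosen for this illustration.

```python
value = 0xA3                        # $A3

binary_text = format(value, "08b")  # eight binary digits
hex_text = format(value, "02X")     # two hexadecimal digits

high_nibble = value >> 4            # the left four bits:  %1010 = $A
low_nibble = value & 0x0F           # the right four bits: %0011 = $3

print(value, binary_text, hex_text, high_nibble, low_nibble)
# prints: 163 10100011 A3 10 3
```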

The first computers known to the public, like those from Commodore, Sinclair and Apple, used processors that could handle 8-bit wide numbers. Eight bits form a byte. Around 1980 the first 16-bit processors appeared: 8086, 68000, Z8000. 16 bits were called a word. Then the 32-bitters (double-words) appeared: 68020, 80386, Z80000. Now, in 2001, the 64-bitters appear, such as the Intel Itanium and the IBM mainframe z-series, but I have no idea yet what a 64-bit number will be called.


Adding numbers

Let's do some adding:
   134  
   379  
  ---- +
   513  
How is it done: 9 plus 4 makes 13. Write down 3, remember 1. 7 plus 3 makes 10. Add the previous remembered 1, this makes 11. Write down 1, remember 1. 3 plus 1 plus remembered 1 makes 5.

Now some basic rules for binary adding:
     0         1         0         1  
     0         0         1         1  
   --- +     --- +     --- +     --- +
     0         1         1        10  
Let's do a more difficult addition:
    110110100110          3494  
    100010110110          2230  
   ------------- +       ----- +
   1011001011100          5724  
Adding binary 1 and 1 gives a 0 and we have to remember a 1. This particular 1 is called "Carry". Let's have a look at the basic additions in another way:
  Bit 1:      0         1         0         1  
  Bit 2:      0         0         1         1  
            --- +     --- +     --- +     --- +
  Result:     0         1         1         0  
  Carry:      0         0         0         1  
Now we make truth tables of these findings:
  Bit 1  Bit 2   |   Result        Bit 1  Bit 2   |  Carry
  ---------------+---------        ---------------+-------
    0      0     |     0             0      0     |    0  
    0      1     |     1             0      1     |    0  
    1      0     |     1             1      0     |    0  
    1      1     |     0             1      1     |    1  
Hey, don't these tables look familiar ??? It seems that adding two bits is the same as EXORing them to get the Result and ANDing them to get the Carry. See the next circuit:
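
The same finding as a small Python sketch; a circuit like this is commonly called a "half adder". The function name is chosen for this illustration.

```python
def half_adder(bit_1, bit_2):
    """Add two bits: EXOR gives the Result, AND gives the Carry."""
    result = bit_1 ^ bit_2
    carry = bit_1 & bit_2
    return result, carry


for bit_1 in (0, 1):
    for bit_2 in (0, 1):
        print(bit_1, "+", bit_2, "->", half_adder(bit_1, bit_2))
```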



Let's make a two-bit-adder. After adding all individual bits, we have to do something with the Results and Carries of each separate addition:
  Carry 1   Result 2   Carry 2   |   Result 2b   Carry 2b
  -------------------------------+-----------------------
     0         0          0      |      0           0    
     0         0          1      |      0           1    
     0         1          0      |      1           0    
     0         1          1      |       Impossible      
     1         0          0      |      1           0    
     1         0          1      |      1           1    
     1         1          0      |      0           1    
     1         1          1      |       Impossible      
Adding two bits never results in Result=1 and Carry=1 together. Therefore two lines of the above table are marked as "impossible".

This results in:
   Result 2b  =  Carry 1  EXOR  Result 2                    

   Carry 2b   =  Carry 2   OR   ( Carry 1   AND   Result 2 )


For a multi-bit adder we just cascade as many "adding cells" as needed. Can we now build a 64-bit adder as used in a 64-bit processor? Yes.... and No. Remember the delay time? Cascading all these adders means that we have to wait 64 times the delay time of a single adder before we have the final result. And that is unacceptable.
Texas Instruments is kind enough to show the working of their "7483 4-bit adder" in their data books. It took me quite some time to figure out how it worked because of its complexity. TI's circuit may look illogical but it is only a few gates "deep". Adding more bits to this design would not affect the delay time of this circuit, just its complexity. It was in fact this circuit that drew my attention to the subject "Delay time".
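
The simple cascade of adding cells can be sketched in Python as follows. This is the slow, cascaded version and explicitly not TI's 7483 design; the function names and the `width` parameter are assumptions for this illustration.

```python
def adding_cell(bit_a, bit_b, carry_1):
    """One adding cell, using the two equations given earlier."""
    result_2 = bit_a ^ bit_b                    # Result 2
    carry_2 = bit_a & bit_b                     # Carry 2
    result_2b = carry_1 ^ result_2              # Carry 1 EXOR Result 2
    carry_2b = carry_2 | (carry_1 & result_2)   # Carry 2 OR (Carry 1 AND Result 2)
    return result_2b, carry_2b


def cascaded_adder(a, b, width):
    """Add two numbers bit by bit, starting at bit 0."""
    carry = 0
    total = 0
    for position in range(width):
        bit_a = (a >> position) & 1
        bit_b = (b >> position) & 1
        # Each cell has to wait for the Carry of the previous cell.
        result, carry = adding_cell(bit_a, bit_b, carry)
        total |= result << position
    return total, carry


print(cascaded_adder(3494, 2230, 16))   # (5724, 0)
```

The loop makes the problem visible: the cell for bit 63 cannot finish before all 63 cells below it have produced their Carry.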


Subtracting numbers.

The next operation we try to perform is a subtraction using a complete 16-bit subtracter:
    0000110110100110           3494  
    0000100010110110           2230  
   ----------------- -        ----- -
    0000010011110000           1264  
Now let's do it in another way:
    0000110110100110           3494  
    1111011101001010        -  2230  
   ----------------- +     -------- +
    0000010011110000           1264  
Instead of subtracting two positive numbers, we add a negative one to a positive one.

What does a negative number look like? In that case the Most Significant Bit (MSB) is always "1". The MSB of a byte, word or whatever is the leftmost bit of a number. The Least Significant Bit (LSB) is the rightmost one, the bit representing '2 to the power 0'.

How do we know whether we are dealing with negative numbers at all? For that we add the words 'signed' or 'unsigned'. Using 'unsigned word' means we are dealing with numbers ranging from 0 to 65535. Using 'signed word' means we are dealing with numbers ranging from -32768 to 32767.
Various programming languages, like Pascal, use the terms 'integer' and 'long integer' instead of 'signed word' and 'signed double-word'.

Unfortunately it isn't just a matter of inverting the MSB of a number to create its counterpart. For example, adding 1 and -1 results in 0. Adding %10000001 and %00000001 doesn't. But adding %11111111 and %00000001 does! (Forget the Carry.) The same goes for %11111000 and %00001000, and for %11111001 and %00000111.
This negative representation of a number is called "2-complement" and that is the way computers store negative numbers (that is, all the computers I'm familiar with).

How do I create this "2-complement"? Just invert every bit from left to right, but stop just before the last (rightmost) "1"; that "1" and any zeros to the right of it stay as they are. But this is not an algorithm that can be translated easily into a circuit using gates. Another algorithm we can use is: invert every bit and add 1 to the result. For a human this is more work, but the advantage is that this algorithm can be translated into logic: an inverter for every bit and an adder that just adds one to what comes out of the inverters will do.
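
Both algorithms can be sketched in Python to check that they give the same answer. The function names and the `width` parameter are assumptions for this illustration; the first function works on numbers, the second one on a string of bits so the "stop before the last 1" rule stays visible.

```python
def twos_complement(value, width=8):
    """Invert every bit, then add 1 (the version that fits in hardware)."""
    mask = (1 << width) - 1        # width ones, e.g. %11111111 for 8 bits
    inverted = value ^ mask        # one inverter for every bit
    return (inverted + 1) & mask   # the adder; a Carry out of the top is dropped


def twos_complement_by_hand(bits):
    """Invert every bit left of the rightmost '1'; keep the rest as it is."""
    last_one = bits.rfind("1")
    if last_one == -1:
        return bits                # all zeros: the negative of 0 is 0
    flipped = "".join("1" if b == "0" else "0" for b in bits[:last_one])
    return flipped + bits[last_one:]


print(format(twos_complement(0b00000111), "08b"))    # 11111001
print(twos_complement_by_hand("0000100010110110"))   # 1111011101001010
```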

So to subtract two numbers we need a bunch of inverters and two adders. Isn't there an alternative way to do the subtraction in hardware? There probably is but, again, the main goal of this document is to show that it can be done and to make things understandable. As with the adding circuit, it is quite possible that in reality it is done in another way. And to be honest, I hadn't even thought about a subtraction circuit until writing this article.
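
A minimal Python sketch of this "inverters and two adders" subtraction, using the 16-bit example from above. The function name and the `width` parameter are chosen for this illustration.

```python
def subtract(a, b, width=16):
    """Compute a - b by adding the 2-complement of b."""
    mask = (1 << width) - 1
    negative_b = ((b ^ mask) + 1) & mask   # inverters plus the first adder
    return (a + negative_b) & mask         # second adder; the Carry is forgotten


print(subtract(3494, 2230))                  # 1264
print(format(subtract(3494, 2230), "016b"))  # 0000010011110000
```

Subtracting a larger number from a smaller one gives a result with the MSB set, which is the 2-complement form of the negative answer.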


Multiplier

     111     number A          111        
     111     number B          101        
 ------- *                 ------- *      
     111                       111        
    1110                      0000     [1]
   11100                     11100        
 ------- +                 ------- +      
  110001                    100011        
These two examples should show you that multiplying two binary numbers can be done in the same way as multiplying two decimal numbers. I know, normally someone would not write down the row marked with [1] but I did it to make things more understandable later.
As you can see, just like with multiplying decimal numbers, the next row is shifted one position to the left. How that is done will be explained later.
       111     number A
       111     number B
   ------- *           
       111             
      1110             
   ------- +           
     10101             
     11100             
   ------- +           
    110001             
As you can see, multiplying is nothing more than adding some numbers. In the above case we only need two cascaded adders. The first adder is fed with number A if bit 0 of number B is 1, or with all zeros if that bit is 0. Its other input is fed with number A shifted once if bit 1 of number B is 1, or with all zeros otherwise.
The second adder is fed with the result of the first one plus number A shifted twice if bit 2 of number B is 1, or all zeros otherwise.
Multiplying two 8-bit numbers could be done by cascading seven adders.
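
The "shift and add" idea can be sketched in Python like this. The function names and the `width` parameter are assumptions for this illustration; the list of rows corresponds to the rows written under the line in the examples above.

```python
def partial_products(a, b, width):
    """One row per bit of B: A shifted left, or all zeros if that bit is 0."""
    return [(a << n) if (b >> n) & 1 else 0 for n in range(width)]


def multiply(a, b, width):
    """Add the rows one after another, like the cascaded adders do."""
    total = 0
    for row in partial_products(a, b, width):
        total = total + row
    return total


print([format(row, "b") for row in partial_products(0b111, 0b101, 3)])
print(format(multiply(0b111, 0b111, 3), "b"))   # 110001
print(format(multiply(0b111, 0b101, 3), "b"))   # 100011
```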

The above way is the quick way: the whole multiplication can be done in one go. This is the way it is done in the Intel 80386 and its successors. But it comes at a price: seven adders and registers are needed and that is a lot of transistors.
The older Intel 8088 and 8086 processors do it in a different way. Here the intermediate result is stored and the same adder is used to add the next shifted number. But this means that the processor needs several cycles to obtain a result.

If one can multiply by cascading adders, it seems logical to assume that one can divide numbers by cascading subtractors. But to be honest, I haven't researched this subject.


Designing circuits

I can imagine that you start to wonder why I first explain how something works and then throw everything overboard by saying that in reality it is done differently. Remember the saying: "Many roads lead to Rome". Adders actually have been built in the way I told you, certainly in the days when only relays and/or tubes were used. But back then the number of components was more important than delay times. Improved production processes simply gave us the possibility to disregard the number of components needed in order to improve the overall performance of a processor.

Now take the 4-bit adder for example. One way to create it is to cascade several adders as shown above. Another way is to make a truth table for every single output. With two times four bits and a carry this means a table with 512 rows. Then it is a matter of eliminating the rows and bits which have no influence on the output. It can be done by hand using so-called Karnaugh diagrams, but it is a hell of a job. I myself faced a similar problem in 1986 and wrote a BASIC program to solve it. Once solved by the program, it was just a matter of translating the resulting equations into hardware.
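
To give an idea of the size of the job, the sketch below generates that 512-row truth table in Python. It only builds the table; it does not do the elimination step, and it is not the BASIC program mentioned above. The function name and the order of the columns are assumptions for this illustration.

```python
from itertools import product


def adder_truth_table():
    """All input combinations of a 4-bit adder with the five output bits.

    Inputs per row:  A3 A2 A1 A0  B3 B2 B1 B0  Carry-in
    Outputs per row: Carry-out S3 S2 S1 S0
    """
    rows = []
    for bits in product((0, 1), repeat=9):
        a = int("".join(str(bit) for bit in bits[0:4]), 2)
        b = int("".join(str(bit) for bit in bits[4:8]), 2)
        carry_in = bits[8]
        total = a + b + carry_in
        outputs = tuple((total >> n) & 1 for n in (4, 3, 2, 1, 0))
        rows.append((bits, outputs))
    return rows


table = adder_truth_table()
print(len(table))   # 512
print(table[-1])    # all inputs 1: 15 + 15 + 1 = 31, so all outputs are 1
```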

Nowadays one can buy CPLDs and FPGAs, ICs that can be programmed to behave like complete electronic circuits. There exist FPGAs that can be programmed to behave like a Commodore 64, Amiga 500, CPC-64 etc. etc. and yet they only have a size of one by one inch. These CPLDs and FPGAs can be programmed in various languages, of which VHDL and Verilog are the best known. Check Wikipedia to find out more about them.


Registers and memory

Let's have a look at the following circuit:

Pressing switch S1 causes the upper input of the upper NAND-gate to become (L). FYI: 7400 is the code for an IC containing four NAND gates. Looking in the table for a NAND-gate, we see that whenever one or more inputs are (L), the output becomes (H). This means that both inputs of the lower NAND-gate become (H), causing its output to become (L), and the LED will light up. This in turn means that the second input of the upper NAND becomes (L) as well. So when we release S1, causing the first input to become (H) again, the output remains (H) because the second input is still (L).
Pressing S2 causes the output of the lower NAND-gate to become (H) and, following the above explanation, the output of the upper NAND-gate now becomes (L).

This special combination of two NAND- (or NOR-) gates is called a Flip-flop.
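
The behaviour described above can be played through in a small Python simulation. It is a sketch of the logic only: a switch line is 0 (L) while the switch is pressed and 1 (H) otherwise, and the function and parameter names are made up for this illustration.

```python
def nand(a, b):
    return 1 - (a & b)


def flip_flop(s1_line, s2_line, upper, lower):
    """Let two cross-coupled NAND-gates settle; returns both outputs.

    upper and lower are the outputs of the two gates before the change.
    """
    for _ in range(4):                       # a few rounds are enough to settle
        new_upper = nand(s1_line, lower)
        new_lower = nand(s2_line, new_upper)
        if (new_upper, new_lower) == (upper, lower):
            break
        upper, lower = new_upper, new_lower
    return upper, lower


state = (0, 1)                    # assumed starting state
state = flip_flop(0, 1, *state)   # press S1
print("S1 pressed: ", state)
state = flip_flop(1, 1, *state)   # release S1: the state is remembered
print("S1 released:", state)
state = flip_flop(1, 0, *state)   # press S2
print("S2 pressed: ", state)
state = flip_flop(1, 1, *state)   # release S2
print("S2 released:", state)
```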

This Flip-flop can remember whether we have pressed S1 or S2 in the past. But in this case we need two signals to tell the Flip-flop whether to go into this state or to go into that state. So let's add some more gates to the circuit: two OR-gates (7432) and an inverter (7404).

One input of each OR-gate, I12 and I22, are tied together and connected to switch S2 and resistor R2. This means that, as long as S2 is not pressed, R2 pulls these inputs of the two OR-gates (H). The outputs of these OR-gates are then (H) as well, regardless of the state of the other inputs. Because of that both inputs of the Flip-flop are (H) as well, which simply means that the state of the outputs cannot change.
One of the so far unmentioned inputs, I11, is connected directly to switch S1 and pull-up resistor R1. The last input, I21, is connected to this point as well but through an inverter. This means that the state of this input is always the opposite of the state of the other input.

Pressing S2 means negating inputs I12 and I22, causing the OR-gates to follow the inputs I11 and I21. Because one of these two inputs will always be (L), the output of one of the OR-gates will become (L) as well. In turn this will negate one of the inputs of the Flip-flop and cause it to react accordingly. In a few words: the Flip-flop will represent the state of S1 at the moment S2 is pressed.
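
A Python sketch of this extended circuit follows. The schematic itself is not reproduced here, so the wiring is an assumption: the OR-gate with input I11 is taken to drive the upper NAND-gate and the OR-gate with input I21 the lower one. A line is 0 (L) while its switch is pressed; all names are made up for this illustration.

```python
def nand(a, b):
    return 1 - (a & b)


def memory_bit(s1_line, s2_line, upper, lower):
    """Flip-flop with two OR-gates and an inverter in front of it."""
    i11 = s1_line
    i21 = 1 - s1_line          # the inverter
    i12 = i22 = s2_line        # tied together
    or_1 = i11 | i12           # assumed to feed the upper NAND-gate
    or_2 = i21 | i22           # assumed to feed the lower NAND-gate
    for _ in range(4):         # let the Flip-flop settle
        new_upper = nand(or_1, lower)
        new_lower = nand(or_2, new_upper)
        if (new_upper, new_lower) == (upper, lower):
            break
        upper, lower = new_upper, new_lower
    return upper, lower


state = (0, 1)
state = memory_bit(0, 1, *state)   # S1 pressed, S2 not: nothing happens
print(state)
state = memory_bit(0, 0, *state)   # S2 pressed as well: state of S1 is stored
print(state)
state = memory_bit(1, 1, *state)   # both released: the stored state remains
print(state)
```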

The use of the inverter should be clear: now only one signal is sufficient to control the output of the Flip-flop. Then why these OR-gates? Most likely a computer wants to store more than one data bit, so it must have some means to choose in which Flip-flop it wants to store this bit. These OR-gates are the last part of a circuit to choose (in digital terms: to address) a particular bit within the memory (or register set) to store data. Most likely the I12 and I22 inputs are connected to the output of an address decoder.

So at this moment we can store a bit. Now it is just a matter of placing 8, 16, 32 or 64 pieces of the above circuit parallel to each other to handle a byte, word, double-word or quadruple-word.
Remark: when I mean more bits in parallel mode, I will use the byte as example. When using the word 'byte', you can read 'word', 'double-word' or 'quadruple-word' as well.

A set of Flip-flops can either be a register or a part of memory. What is the difference between these two?
Memory is used to store data like individual characters of a text or variables of a program. Registers can contain data as well but are mainly used to control the I/O (= Input/Output) of a computer. The printer port of a PC has a register to which a program should write the character that has to be printed. A user can write to or read from this register as if it was a piece of memory. But connect some LEDs to the printer port and you will see the LEDs light up (or not) according to the bit pattern of the byte written to this specific register.
Some registers can only be read, like the register of the printer port that reports the state of the printer. Some registers of the SID, the Commodore sound chip, can only be written. But don't ask me why the designers chose to do so. (I have only one idea: fewer transistors = cheaper)

Processors have registers as well. Some of them behave like I/O registers, others like memory. But to keep things simple, they are all called registers.


Tri-state buffers

Now we know how data is stored in a Flip-flop. The next step is to find out how it can be read again. Until now I have told you about two types of outputs: the totem pole (TP) and Open-Collector (OC) output. The third one is a variation on the TP. You may remember that a TP basically is made out of two transistors but only one is activated at a time. But what happens if you disable that one as well? From that moment on that output does not exist anymore; it is in the so-called 'tri-state'. Technically you only see two diodes placed in the blocking direction. This means that we can connect another, active, output to this inactivated output with no fear of a data collision (read: short circuit). It only needs an extra line to tell the electronics to go in tri-state (or not).

You may have noticed the 74125 gate in the above schematic. The 74125 is an IC containing 4 tri-state buffers. The little 'o' at the 'output control' input of this gate marks it as 'active Low'. This simply means that a (L) on this input activates the function of the gate.
In the future I will refer to the combination of OR-gates, inverter, Flip-flop and tri-state output as memory cell or, in short, cell.

Tri-state buffers exist in stand-alone form, like the 74125. The 74541 for example contains 8 tri-state buffers but with their 'output control' inputs tied together. It enables a designer to disconnect a circuit byte-wise from the data bus. The register reading the state of the printer is in fact nothing more than such a tri-state buffer.
In the future I will refer to tri-state buffers just as "buffers".
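
To illustrate why tri-state outputs can share one line, here is a small Python model. It is a sketch only: the function name and the (enabled, level) pairs are my own invention, 1 stands for (H) and 0 for (L), and "None" stands for a line that nobody drives.

```python
def bus_level(outputs):
    """outputs: list of (enabled, level) pairs that share one bus line.

    Returns the level on the line, or None if every output is in tri-state.
    """
    active = [level for enabled, level in outputs if enabled]
    if len(set(active)) > 1:
        # Two enabled outputs that disagree: the short circuit we want to avoid.
        raise ValueError("data collision on the bus")
    return active[0] if active else None

# Three buffers on one line, only the second one is enabled.
print(bus_level([(False, 1), (True, 0), (False, 1)]))
```

The address decoder discussed further on has to make sure that only one buffer is enabled at a time.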


Address decoders

The 6116 is a 2K*8 memory IC (K means Kilo, but in this case not 1000 but 1024 = 2 to the power 10). This means it contains 2048 sets of memory cells where each set is eight bits wide. In short: 2 KB of RAM (KB = Kilo Byte). FYI: RAM stands for "Random Access Memory".

But how is each byte addressed? The one at address %000 0000 0000 could be addressed by using a 12-input OR-gate: 11 inputs for the address lines and one input, the "Chip Select"-input (CS), telling the byte that this particular IC has been selected. The moment all its inputs are (L), the output can negate the inputs I12 and I22 of each of the eight cells of the byte.
The byte at address %010 0100 0001 can be addressed by using again a 12-input OR-gate but with three extra inverters, one for every '1' in the above address.

But where do these address bits come from? The IC itself is equipped with pins, one for every address bit. The 6116 has 11 dedicated pins for addressing each individual byte, labelled A0 to A10.

But using a 12-input OR-gate and a handful of inverters for every byte could become a bit costly, so reduction is needed. One solution is tying 11 inverters directly to the 11 address lines. Now connect each address input of every 12-input OR-gate to either an inverter or directly to an address line.

This 12-input OR-gate sounds quite complicated. But in fact it is a giant transistor with 12 inputs. Technically the above cell is more complicated to build than this giant OR-gate.
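
The decoding described above can be sketched in Python. The names are made up for this illustration, 1 stands for (H) and 0 for (L), and CS is assumed to be active Low like the other inputs of the OR-gate.

```python
ADDRESS_BITS = 11   # the address lines of a 2K*8 IC

def byte_select(own_address, address, cs):
    """Output of the 12-input OR-gate that belongs to the byte at own_address.

    A result of 0 (L) means: this byte is selected.
    """
    inputs = [cs]
    for bit in range(ADDRESS_BITS):
        line = (address >> bit) & 1
        inverted = 1 - line                      # output of the shared inverter
        wants_one = (own_address >> bit) & 1
        # For every '1' in its own address the gate uses the inverted line.
        inputs.append(inverted if wants_one else line)
    return 1 if any(inputs) else 0

# With CS low, exactly one of the 2048 gates goes low for a given address.
selected = [a for a in range(2048) if byte_select(a, 0b01001000001, 0) == 0]
print(selected == [0b01001000001])
```

With CS at (H) none of the 2048 gates can go (L), so the complete IC is deselected.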


Why do we need this "Chip Select"-line?

The Timex 1000, an American clone of the well known Sinclair ZX81, is equipped with a 6116, a 2 KB RAM. Expanding it with 2 KB of memory means one has to add another 6116 to the board, "parallel" to the original one. But how can I make sure that when I read the byte at address $0123, it comes from the first IC and not the second? This is where the "Chip Select"-input comes in. Negating this pin informs the IC that the address on the address pins is meant for this particular IC and not for another one.
A 74LS138, a "3-to-8 demultiplexer", enables one to expand the Timex up to 16 KB.
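
How such a 3-to-8 decoder divides the address range over eight 6116s can be sketched as follows. This is a simplified model, not the real Timex memory map: I assume the RAM starts at address 0, that A0-A10 go to every 6116 and that the next three address lines go to the decoder. The function names are made up.

```python
def decode(address):
    """Split a 14-bit address into (IC number, address inside that IC)."""
    chip = (address >> 11) & 0b111   # three lines feed the 3-to-8 decoder
    offset = address & 0x7FF         # eleven lines go to every 6116
    return chip, offset

def chip_selects(address):
    """The eight active-low CS lines: exactly one of them is (L)."""
    chip, _ = decode(address)
    return [0 if n == chip else 1 for n in range(8)]

print(decode(0x0123))        # first IC
print(decode(0x0923))        # same offset, second IC
print(chip_selects(0x0923))
```

Eight ICs of 2 KB each give the 16 KB mentioned above.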


Reading from or writing to a memory- or I/O-IC

How does the 6116 know what type of action is required? This can be signalled by using another pin. AFAIK all memory-ICs use the same protocol: a (H) on this pin means a read-action, a (L) a write-action. In most cases this pin is labelled either
__        _
WE  or  R/W
WE stands for 'Write Enable', R/W for Read/Write. The line above the characters tells us we can write to the IC when the signal is (L).
AFAIK all MOS/Rockwell 65xx and Motorola 68xx I/O-ICs use this one-pin mechanism. Besides this R/W-pin and one or more (!) CS-pins, these ICs also have a clock-input. This pin has to be connected to a specific pin of the processor: PHI2 for the 65xx range, E for the 680x range. The 65xx processors, like the well known 6502, output a (H) on this pin to tell the rest of the board that the outputted address is stable and valid.

Intel and Zilog I/O-ICs use another mechanism: they use two pins, labelled
__       __
RD  and  WR
Obvious, but to be sure: RD stands for ReaD and WR stands for WRite. The lines above the characters tell us that the action is only valid when the signal is (L). So the line above a name is another way of telling that an in- or output is 'active Low'.
Why not use only one pin, you may ask? These pins also have a second function: they may only be negated when the address is stable and valid. When an Intel or Zilog processor is used, the inputs of these I/O-ICs are in most cases directly connected to the processor so the designer has nothing to worry about.

But what about memory-ICs, why don't they have a clock-input or an equivalent? Indeed, all memory-ICs I know nowadays lack this pin. But in most designs I know, this clock signal has been incorporated in the circuit used to select the RAM-ICs.


Control bus

We already mentioned the data and address bus. We refer to all signals needed to control ICs, like the R/W-line(s) and clock-lines, as the control bus.


Memory-ICs

Memory can be bought in different sizes. In 1985 an 8K*8 IC, the 6164, was considered big. In 1987 the first 1 MB module showed up. In this case a module is a small printed circuit board equipped with 2 or more RAM-ICs. In 1998 one could buy 32 MB modules. Nowadays one can buy 2 GB modules.
I have seen 2 GB modules equipped with 16 ICs. That means that each IC is capable of handling 128 MB.

1 MegaByte (MB) = 1024 KiloBytes (KB) = 1024 * 1024 Bytes.
1 TeraByte (TB) = 1024 GigaByte (GB) = 1024 * 1024 MegaByte

The first commercial computers only had a few KB onboard. For example the Commodore KIM-1 (1976) and Sinclair ZX81 (1981) had only 1 KB. The KIM-1 used eight 1K*1 ICs, the ZX81 two 1K*4 ICs. The famous Commodore 64 had eight 64K*1 ICs. I have seen PC boards with the 80286 processor that could be equipped with two or four 1 MB modules. I also have seen PC boards with the 80486 processor that could be equipped with four or eight 1 MB modules. Later 80486 boards could be equipped with two or four 8 MB modules.

These few lines may sound puzzling. Most processors in the '70s were so-called '8-bitters' and therefore used 8 bits wide data busses. So it seems logical to produce ICs with an 8 bits wide data bus. So why for example these 1K*1 and 1K*4 configurations and not 1K*8?
I can only think of two reasons:
- the dropout rate was so high that it was better to have small ICs.
- the actual IC would become too big.

Anyway, from the 6116 on most RAM-ICs had an 8 bits wide data bus.

The 1 MB module I mentioned above, was equipped with either nine 1M*1 ICs or one 1M*1 plus two 1M*4 ICs. But this makes 9 bits? This 9th bit was used for parity control. It tells you something about the faith the producers had in their own products :)
As the 80286 was a 16-bitter, it obviously needed 2 or 4 modules. And as the 80486 is a 32-bitter....

The 1 MB module was a 30-pins module, only capable of supporting up to 4 MB. Because of the various 16- and 32-bitters engineers developed the 72-pins module. This module had a 32-bits data bus and was capable of supporting up to 32 MB (AFAIK). Then the 64-bits 168-pins module was developed, supporting up to 2 GB. And then I lost track....


Static and dynamic RAM

The type of memory using Flip-flops is called "Static RAM" (SRAM). Another type of memory is "Dynamic RAM" (DRAM). DRAM uses a capacitor to store a bit. This capacitor can be either full or empty. Unfortunately the insulators used are not ideal and after the capacitor is charged, the charge will start to leak away. Therefore a DRAM contains a circuit that checks the charge and restores the original level if needed. This process is called "refresh". Such a "refresh" is done for a complete group of cells at a time, generally the square root of the total number of cells. This sounds a little bit strange but AFAIK all DRAMs have another strange feature: they have a multiplexed address bus. This means that each pin is used for two address lines.
As already mentioned above, the Commodore 64 has 64 KB RAM onboard in the form of eight 4164 ICs (64K*1). The 4164 is of the DRAM type. For 64 K we normally would need 16 address lines, with multiplexing only eight. Instead of a 'Chip Select'-pin, the DRAM has a CAS- and a RAS-pin. CAS stands for "Column Address Strobe", RAS stands for "Row Address Strobe". First the system presents one half of the address and negates RAS, then presents the second half and negates CAS.
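
The multiplexing can be sketched in Python. Which half of the address is used as row and which as column depends on the board design; here I simply assume that the low byte goes first. The function name is made up.

```python
def multiplex(address):
    """Split a 16-bit address into the two bytes put on the 8 address pins."""
    row = address & 0xFF             # assumed: presented first, latched by RAS
    column = (address >> 8) & 0xFF   # assumed: presented second, latched by CAS
    return row, column

row, column = multiplex(0x1234)
print(f"row ${row:02X}, column ${column:02X}")

# 64 K cells arranged as a square: 256 rows of 256 cells each.
print(256 * 256 == 64 * 1024)
```

This also shows where the square root mentioned above comes from: one refresh handles one complete row of 256 cells.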


Why multiplexed address lines?

An obvious answer to this question is that an IC needs fewer pins for the same function. But then have a look at this:
- The 4164 (and other types as well) has a separate pin for "data in" and "data out". The only circuit I have seen using this feature is the parity-error-circuit in PCs.
- The 4164 is a 64K*1 configuration in a 16-pin IC. It sounds very logical to develop a 64K*8 configuration. Using a combined data in/out you only need 7 extra pins. Even then you end up with fewer pins than the 24 needed for the 6116, the 2K*8 SRAM. But I only know the 4-bit version, the 41464 / 4464.

In general: the developers must have had their reasons although they are not clear to me.


Dis- and advantages of the DRAM

In most systems designers used DRAM. Why? Technically the design of a DRAM is simpler than that of an SRAM, even including the refresh mechanism. And simpler means smaller and cheaper.

The SRAM has two advantages over the DRAM:
- SRAM is faster
- with a battery and a few extra parts you can save the data even if the system has been powered down.


Refresh

Negating RAS also triggers the refresh circuit. But every row has to be refreshed within a given time interval. For several reasons this has to be an independent circuit. In the PETs and CBMs of Commodore, a simple counter took care of this job. The Z80, an 8-bit processor used in the Sinclair ZX81 and Spectrum, has an onboard refresh register especially for this purpose.
Later DRAM designs had a counter on board. The refresh was triggered by negating CAS before negating RAS.


ROM, PROM, EPROM and EEPROM

We now know we can store data in Random Access Memory. But what happens if we turn off the power? Sorry for you, but all the data will disappear. I mentioned using SRAM and a battery but this is unworkable for huge amounts of memory. And what about computers that never have been started before? And if you don't use the computer regularly, you also run the risk of an empty battery.

So parallel to RAM, manufacturers developed ROM: Read Only Memory. In simplified form: take a RAM as base, remove the Flip-flop and OR-gates, and tie the input of the output buffers to (H) or (L) according to the wishes of the customer. This last sentence means that the factory only produced ROMs on demand. That made the ROMs expensive. Oh boy, if you made an error as programmer :(

The next step was the PROM: Programmable ROM. Instead of connecting the OR-gates to a Flip-flop, one OR-gate was connected to a mini-fuse. So one could program his own ROM by blowing the right fuses. Costs dropped as factories could now mass produce PROMs instead of producing only small batches. Errors in programming became much cheaper as well.

The next step was the invention of EPROMs: Erasable PROMs. These EPROMs could be erased using ultraviolet light. Erasing an EPROM took quite some time, about 20 minutes. But as it could be reprogrammed several thousand times, it was much, much cheaper than buying a new PROM for every change. The drawback: the EPROM has to be taken out of the system to erase it.

So they developed EEPROMs: Electrically Erasable PROMs. These can be erased and programmed again by the system itself as they only need the standard 5 Volt (+ 12 Volt) power supply.


More advanced Flip-flops

The Flip-flop above made of two NAND-gates is a very simple one. In time manufacturers have made more sophisticated ones. An example is the 7474, a so called D-Flip-flop (D for Data).

D (for Data) is an input. The moment the level on input CLK (for CLocK) goes from (L) to (H), the value of Data is copied to output Q; the ">" at this input means "positive edge". Output Q\ (another way to write Q with a "_" on top) becomes the opposite of Q. Changing the Data has no further influence on the outputs as long as Clock does not go from (L) to (H).
Negating Preset causes Q to go (H) and Q\ to go (L) at any time. Negating Reset causes Q to go (L) and Q\ to go (H) at any time. Negating both these inputs causes unpredictable results.
Beside the D-Flip-flop, there is also the JK-Flip-flop. I won't discuss it here.
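
The behaviour of such a D-Flip-flop can be modelled in Python. This is only a sketch of the description above, not of the internals of the 7474; the class and method names are made up, 1 stands for (H) and 0 for (L).

```python
class DFlipFlop:
    """Positive edge triggered D-Flip-flop with active-low Preset and Reset."""

    def __init__(self):
        self.q = 0
        self._clk = 0          # previous level on CLK, needed to detect an edge

    @property
    def q_inv(self):
        """Output Q\\, always the opposite of Q."""
        return 1 - self.q

    def update(self, d, clk, preset=1, reset=1):
        if preset == 0 and reset == 0:
            raise ValueError("Preset and Reset both (L): unpredictable")
        if preset == 0:
            self.q = 1
        elif reset == 0:
            self.q = 0
        elif self._clk == 0 and clk == 1:   # (L) to (H): copy Data
            self.q = d
        self._clk = clk
        return self.q

ff = DFlipFlop()
ff.update(d=1, clk=0)          # no edge yet, Q stays (L)
print(ff.update(d=1, clk=1))   # positive edge: Q becomes (H)
print(ff.update(d=0, clk=1))   # Data changes, no edge: Q stays (H)
```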


Latch

Take 8 D-Flipflops and connect the D-inputs to the data bus. Tie the Preset and Reset inputs to (H). Tie all clock inputs together to one pin (CLK). We now have a so called 8-bits latch. CLK can be used to copy data from the data bus. The outputs can be used to send data to a printer. To be able to check what data we copied, just add an 8-bits buffer.
The above combination describes the well known 74373 and 74573. The only difference between these two ICs is the way the pins have been arranged.

If I mention "latch" in this document, I mainly mean the use of the D-Flip-flops. A buffer is not needed in all circumstances.


Dividers and Counters

Take a D-Flipflop and connect output Q\ to input D. Now feed the clock input with a square wave (through an inverter). The result is that the value of output Q toggles at every negative edge of the original square wave. The signal outputted by Q is a square wave as well, but with half of the frequency of the original signal. Now take some more D-Flipflops, connect Q\ with D and cascade the D-Flipflops by connecting the clock-input to output Q\ of the previous Flip-flop.

The result is that every Flip-flop outputs a square wave at half of the frequency of the previous one, a frequency divider.
    _   _   _   _   _   _   _   _   _   _   _   _   _   _   _   _           
  _| |_| |_| |_| | | |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |  original
      ___     ___     ___     ___     ___     ___     ___     ___            
  ___|   |___|   |___|   |___|   |___|   |___|   |___|   |___|   |  1st FF  
          _______         _______         _______         _______           
  _______|       |_______|       |_______|       |_______|       |  2nd FF  
                  _______________                 _______________            
  _______________|               |_______________|               |  3rd FF  
Now let's convert this picture into bits:
  0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1  original    
  0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1  FF 1    
  0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1  FF 2    
  0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1  FF 3    
It seems we have a counter in our hands! We only have to use as many D-Flipflops as we need to output bits.

Now imagine the original clock is connected directly to the clock input of the first D-Flipflop and the D-Flipflops are cascaded by connecting each clock-input to output Q (notice: Q and not Q\) of the previous D-Flipflop. Result:
    _   _   _   _   _   _   _   _   _   _   _   _   _   _   _   _           
  _| |_| |_| |_| | | |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |  original
    ___     ___     ___     ___     ___     ___     ___     ___            
  _|   |___|   |___|   |___|   |___|   |___|   |___|   |___|   |__  1st FF  
    _______         _______         _______         _______           
  _|       |_______|       |_______|       |_______|       |______  2nd FF  
    _______________                 _______________            
  _|               |_______________|               |______________  3rd FF  
Now let us also convert this picture into bits:
  0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1  original    
  0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0  FF 1    
  0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0  FF 2    
  0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0  FF 3    
We have created another counter, but one that counts down!
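
Both counters can be checked with a small simulation in Python. It is a sketch with a made-up function name: every stage is a D-Flipflop with Q\ tied to D, so it toggles on a positive edge at its clock input, and delays inside the gates are ignored.

```python
def ripple(clock_samples, stages=3, up=True):
    """Return the counter value after every sample of the clock signal.

    up=True : clock inputs see the inverted clock / Q\\ of the previous stage.
    up=False: clock inputs see the clock / Q of the previous stage.
    """
    q = [0] * stages
    # Level on every clock input before the first sample (clock and all Q low).
    prev_clk = [1 if up else 0] * stages
    values = []
    for level in clock_samples:
        signal = level
        for i in range(stages):
            clk = (1 - signal) if up else signal
            if prev_clk[i] == 0 and clk == 1:   # positive edge: toggle
                q[i] = 1 - q[i]
            prev_clk[i] = clk
            signal = q[i]                       # feeds the next stage
        values.append(sum(bit << i for i, bit in enumerate(q)))
    return values

clock = [0, 1] * 8
print(ripple(clock, up=True))
print(ripple(clock, up=False))
```

The first list counts up from 0 to 7, the second one jumps to 7 at the first positive edge and counts down from there, just like the bit patterns above.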


A selector

A selector enables us to choose which signal we want to feed to an input.
 S   D1  D2  |  O
 ------------+---
 0   0   0   |  0
 0   0   1   |  0
 0   1   0   |  1
 0   1   1   |  1
 1   0   0   |  0
 1   0   1   |  1
 1   1   0   |  0
 1   1   1   |  1

Result: O = (S\ * D1) + (S * D2)
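
The formula can be checked against the truth table with a few lines of Python. The function name is made up; "*" is written as AND (&), "+" as OR (|) and the inverted S as 1 - s.

```python
def selector(s, d1, d2):
    """O = (not S AND D1) OR (S AND D2), with 1 = true and 0 = false."""
    return ((1 - s) & d1) | (s & d2)

# Print the complete truth table in the same order as above.
for s in (0, 1):
    for d1 in (0, 1):
        for d2 in (0, 1):
            print(s, d1, d2, "|", selector(s, d1, d2))
```

In words: with S = 0 the output follows D1, with S = 1 it follows D2.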

If we combine this selector with one of the counters above, we can create one that is capable of either counting up or counting down by connecting the clock input to either Q\ / inverted clock or Q / original clock.

It is also possible to combine selectors and D-Flipflops in such a way that the result is either a latch or an up-counter, by connecting the clock input to either Q\ / inverted clock or a Chip Select signal and by connecting the data input to either its own Q\ output or a data bus. This "pre-loadable up-counter" can be used as the Program Counter of a processor. The Program Counter is the register that is mainly responsible for outputting an address at the address bus of the processor.


Shifters

Imagine a bunch of D-Flipflops where the data inputs are connected to the Q-output of the previous one. So a clock signal forces a D-Flipflop to copy the data of its predecessor. The result is that the last D-Flipflop in the row starts to output the data of all previous ones.
On the other side we find a so far unused D-input. It can be used to input serialised data like data coming out of a modem. After eight clocks we can read it byte wise directly from the Q-outputs.
So far we can read serialised data. But how can we create it? This is done by adding the selectors mentioned above. They make it possible to load bytes, words or whatever is wanted.
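
A shifter is easy to model in Python. The sketch below uses made-up function names and assumes that the first bit on the serial line is the least significant one; a real design can just as well do it the other way round.

```python
def shift_in(register, bit):
    """One clock pulse: every D-Flipflop copies the data of its predecessor."""
    return [bit] + register[:-1]

def receive_byte(serial_bits):
    """Clock eight serial bits in and read the byte from the Q-outputs."""
    register = [0] * 8
    for bit in serial_bits:
        register = shift_in(register, bit)
    # The first received bit has travelled to the last Flip-flop.
    return sum(bit << (7 - i) for i, bit in enumerate(register))

def send_byte(value):
    """Load a byte in parallel (via the selectors) and clock it out."""
    register = [(value >> (7 - i)) & 1 for i in range(8)]
    out = []
    for _ in range(8):
        out.append(register[-1])        # output of the last Flip-flop
        register = shift_in(register, 0)
    return out

bits = send_byte(0x41)
print(bits)
print(hex(receive_byte(bits)))
```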


The first processors

If you are Dutch, you should be familiar with the "Draaiorgel" / "Barrel organ". The music the Draaiorgel produces is directed by its "Draaiorgelboek" / "Music roll". The holes in the paper make the various instruments produce their sounds. The various registers, counters, adders and whatever else you find inside a processor are its instruments and the program is its Draaiorgelboek. But there is at least one major difference: the processor can jump and branch inside its program, the organ cannot.
I have no idea how many "instruments" one of these old processors had and I also have no idea how many bits they supported. Bear in mind, having an eight bits wide data bus doesn't mean that these old processors could only support up to eight registers etc.; three bits and a decoder can select up to seven registers (one state is reserved for 'none'). So eight bits, split in groups of three, three and two bits, could support up to 7 + 7 + 3 = 17 registers or equivalents and could activate three at a time.


What does a program look like?

As already said, a processor works with numbers. Whatever bit size they use, they all work the same: the first number they read is treated as a command. Depending on the type of command the next number they read can be another command or data. Remark: the amount of data the processor has to read doesn't have to be limited to one number. But to emphasize: the first number is always a command (from now on called 'opcode').


Instruction decoder

Some engineers noticed that the first processors needed quite a lot of instructions to perform certain tasks and that all those tasks had a lot of instructions in common. So they decided to combine groups of instructions into new mega-instructions. Asking a processor to execute one of these mega-instructions means that you in fact ask it to execute all these micro-instructions itself.

For the moment I use the instruction decoder of the 6502 as example. That is, how we think it works. The schematic has never been published so people simply designed several ones that act exactly as the original one.
My design is based on a super ROM and a counter: the ROM contains all the micro-instructions and the counter makes sure that they are executed in the right order. But how?
The outputs of this super ROM are used to control all kinds of buffers, registers, counters and the ALU (Arithmetic Logic Unit), the "calculator" inside the processor.
Regarding the inputs of this super ROM: the first input is the instruction itself, or better, the opcode. As the data bus, the internal as well as external one, could be needed for other operations, the opcode is stored in a latch whose outputs serve as inputs for the super ROM. Now the counter kicks in and starts counting up. At every count of the external clock and the counter the ROM outputs another pattern (= micro-instruction). The last micro-instruction is for every opcode the same: it resets the counter and we are back at the start: the processor is ready to read the next instruction.

Most processors have extra special inputs, like Reset or Interrupt. When one of these inputs is triggered, the instruction decoder has to act on it as well. So in one way or another these signals have to be fed to it.

Many processors have branch instructions. These instructions enable the processor to jump, or not, to another part of the program depending on whether a certain condition is set or not. This means that the instruction decoder has to be aware of all these conditions as well in one way or another.


Clock signal

By the way, what makes the counter tick? That is the famous clock signal. While it was only 1 MHz for the Commodore 64 and 4.77 MHz for the first IBM PC, nowadays we are talking about 3-4 GigaHertz.
But a remark is in order: the 8088 needs about 4-5 clock cycles to read a byte, the 6502 about one. This has to do with the fact that the 6502 uses a hard-coded instruction decoder while the 8088 and successors use micro-code. See it as a mini processor running a program inside the 8088 where the instruction is the program.


Some examples

I will try to explain the working of the instruction decoder using some instructions as examples. The first picture is a template that I will use to explain some parts you will find inside the processor.


The brown boxes are the various registers, counters, etc. The blue lines show the address and data busses. The green lines are the ones that are active at a given moment, the grey lines are the inactive ones. Lines not needed for the examples aren't drawn.

I first have to explain some abbreviations:
PC - Program Counter, a 16 bits pre-loadable up-counter. Used to drive the address bus. One input, 'E', is for en/disabling its output, the other, 'C', is for clocking the counter.
TAR - Temporary Address Register, also a pre-loadable up-counter. Used to drive the address bus when doing a direct access (= reading from/writing to memory/IO). One of the examples will show its use. The counter is only needed for indirect accesses (not shown here). Therefore the clock signal isn't drawn either. Input 'E' is for en/disabling its output, 'L' for loading a byte in the lower part of the counter, 'H' for loading a byte in the higher part.
DATA - A bi-directional buffer to separate the internal and external data bus. Input 'E' is for en/disabling its outputs.
A - Register A, an eight bits latch. In the 6502 this is the main register. Input 'I' is for loading a byte, 'O' for outputting it.
X - Register X, also an eight bits latch. In the 6502 this is a register mainly used for indexed addressing.
LATCH - An eight bits latch used to store the instruction during the whole processing. Input 'L' is for loading a byte.
C - The Instruction Decoder counter, a three bits counter continuously clocked by the external clock signal. Input 'RES' is for resetting the counter.


Loading the opcode
The first phase is the same for all instructions: loading the opcode. The DATA buffer is enabled so the processor can read the opcode. The L input of the LATCH is activated so that it can store the opcode. IMHO it should be obvious that the decoder completely ignores the opcode in this phase; it only reacts to the counter and clock.





TAX
TAX stands for "Transfer the content of register A to register X" where "transfer" should be read as "copy". TAX is a one-byte instruction. In this phase the output of register A and the input of register X are enabled. The Program Counter also receives the signal to increase the counter by one.




This phase is resetting the Instruction Decoder counter. As this happens in an instant, we are back in the very first phase: loading the opcode.





LDA #5
"LDA #5" stands for "LoaD register A with the value 5" and is a two-bytes instruction. After loading the opcode, the only thing that has to be done in this phase is increasing the Program Counter.




During the third phase the data byte is read from the outside world by enabling the outputs of the DATA buffer and is copied into register A by activating its 'I' input.




Again this phase is nothing more than increasing the Program Counter.




The last phase is resetting the Instruction Decoder counter and we are back in the loading phase.





STA $3174
"STA $3174" stands for "STore the content from register A into memory or a register at address $3174". This is a three-bytes instruction. Again the only thing that has to be done in this phase is increasing the Program Counter.




During the third phase the first data byte is read from the outside world by enabling the outputs of the DATA buffer and is copied into the low-byte part of the Temporary Address Register by activating its 'L' input.




Again this phase is nothing more than increasing the Program Counter.




During the fifth phase the second data byte is read from the outside world by enabling the outputs of the DATA buffer and is copied into the high-byte part of the Temporary Address Register by activating its 'H' input.




In this phase the Program Counter is increased once more. But because we need to store a byte in the next phase using the Temporary Address Register, the outputs of the Program Counter have to be disabled and the ones of the Temporary Address Register to be enabled. Because we store a byte, the R/W pin of the processor has to be negated.




In this phase register A is told to output the byte by activating its 'O' input. As the outputs of the DATA buffer towards the outside world are enabled as well, the byte will show up on the data pins of the processor.




In this phase the address bus is handed back to the Program Counter again. Notice that the Program Counter is not increased. Quite obvious because that has already been done two phases ago. But the point is, it could also have been done in this phase instead of that one. This is one example where we have no idea what the original 6502 Instruction Decoder does.




The last phase is resetting the Instruction Decoder counter once more and we are back in the loading phase.
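
The three examples can be put together in a small Python model of my "super ROM and counter" design. It is a sketch only: the signal names (A_O, X_I, PC_C, TAR_L and so on) are made up after the inputs described above, every set in a list is one phase, and it says nothing about how the real 6502 does it. The opcodes $AA (TAX), $A9 (LDA #n) and $8D (STA nnnn) are the real ones.

```python
# Micro-instructions per opcode; loading the opcode and resetting the
# counter are the same for every instruction and handled in run().
MICROCODE = {
    0xAA: [{"A_O", "X_I", "PC_C"}],                     # TAX
    0xA9: [{"PC_C"}, {"DATA_IN", "A_I"}, {"PC_C"}],     # LDA #n
    0x8D: [{"PC_C"}, {"DATA_IN", "TAR_L"}, {"PC_C"},    # STA nnnn
           {"DATA_IN", "TAR_H"}, {"PC_C", "TAR_E"},
           {"TAR_E", "A_O", "DATA_OUT"}, set()],
}

def run(memory, instructions):
    """Execute a number of instructions, starting at address 0."""
    pc = tar = a = x = 0
    for _ in range(instructions):
        opcode = memory[pc]                  # first phase: load the opcode
        for signals in MICROCODE[opcode]:    # the counter selects the phase
            bus = None                       # internal data bus
            if "DATA_IN" in signals:
                bus = memory[pc]             # PC drives the address bus
            if "A_O" in signals:
                bus = a
            if "A_I" in signals:
                a = bus
            if "X_I" in signals:
                x = bus
            if "TAR_L" in signals:
                tar = (tar & 0xFF00) | bus
            if "TAR_H" in signals:
                tar = (tar & 0x00FF) | (bus << 8)
            if "DATA_OUT" in signals:
                memory[tar] = bus            # TAR drives the address bus
            if "PC_C" in signals:
                pc = (pc + 1) & 0xFFFF
        # last phase: the counter is reset, back to loading an opcode
    return a, x, pc

memory = bytearray(0x10000)
memory[0:6] = bytes([0xA9, 0x05, 0xAA, 0x8D, 0x74, 0x31])  # LDA #5, TAX, STA $3174
print(run(memory, 3), memory[0x3174])
```

After these three instructions both A and X contain 5, the Program Counter points at address 6 and the byte at $3174 is 5 as well.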





The Instruction Decoder is not a ROM

So far I said the part that stores the micro-instruction could be seen as a ROM. A ROM implies that it has two-to-the power-n number of memory cells. In the very first phase the Instruction Decoder doesn't need any info regarding the opcode. This means that at least 256 memory cells aren't needed at all. Another fact is that the 6502 only has 151 valid opcodes. This means another 105 * 256 * 16 unused cells.

In reality the Instruction Decoder in the original 6502 is the sum of all algorithms for every single instruction. As a result some unused opcodes unintentionally became the combination of two or more used opcodes. A famous one is LAX #n, "Load register A and register X with the value n". The code for LDA #n is $A9, the code for LDX #n is $A2, the code for LAX #n is $AB. And $AB happens to be $A9 OR $A2, which can be translated as both opcodes being executed at the same time. This and other examples lead to the conclusion that all official opcodes have only been decoded to the point needed to make them work, just to save transistors.
Nowadays the non-official codes are named "illegal codes". The first persons who discovered them immediately started to hunt for others. They found illegal codes that did nothing at all but yet behaved like a one-, two- or three-byte instruction. Others did something, useful or not, and others made the processor crash completely; only a reset could put it back in action.
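
The OR relation between the three opcodes mentioned above is easy to check. A few lines of Python, with made-up variable names, show the bit patterns:

```python
first, second, combined = 0xA9, 0xA2, 0xAB

for code in (first, second, combined):
    print(f"${code:02X} = %{code:08b}")

# Every bit that is set in one of the two opcodes is set in the third one.
print(first | second == combined)
```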

A warning is in order: illegal opcodes can change/disappear without any warning. The 8500 in the Commodore 64-II is the HMOS version of the 6510 in the older Commodores and one could expect that, for compatibility reasons, they should act the same. They do, that is, for the legal opcodes. But there have been reports that the illegal opcodes in the 8500 behave flaky; it seems that some instructions sometimes lose bits.
In the 65C02 and its successors all illegal codes have been replaced by new instructions or by "NOP" (No OPeration).

The Zilog Z80, known from the various Sinclair and MSX computers, has illegal opcodes as well but is produced by various companies. It seems that the illegal opcodes of the NEC versions behave differently from the Zilog ones.


Special inputs

Most processors have one or more special input pins. I will describe the three most common ones. Those signals operate completely independently of the opcode. So I decided to use an extra latch to store the signals and an extra input for the Instruction Decoder that tells it to use the extra latch as input instead of the latch with the opcode.

Reset
A processor cannot start up out of the blue; it needs a starting point. For this reason (AFAIK) all processors have a RESET-input pin. For a 6502, 6809 or Z80 its active level is (L). An 80x86 requires a (H) on this pin. The actual initialization starts the moment the level is returned to the normal state.

The way a processor behaves on a reset is quite different for the various brands. After (re)setting some internal registers, the Z80 starts executing a program found at address $0000.
The 8088 (IBM-XT/PC) and successors expect a program at address $FFFF0. In most cases you'll find there a long jump to a lower part of the ROM.
The 6502 and successors perform an indirect jump: JMP ($FFFC). The 6502 expects to find a program at the address that in turn is found at the consecutive addresses $FFFC and $FFFD.
The 6800 and 6809 behave the same but look at the addresses $FFFE and $FFFF.
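
What the indirect jump of the 6502 comes down to can be sketched in Python. The function name and the example start address are made up; the 6502 stores the low byte of an address first.

```python
def reset_address_6502(memory):
    """Start address: low byte at $FFFC, high byte at $FFFD."""
    return memory[0xFFFC] | (memory[0xFFFD] << 8)

memory = bytearray(0x10000)
memory[0xFFFC] = 0x00      # low byte of a hypothetical start address
memory[0xFFFD] = 0xE0      # high byte
print(f"${reset_address_6502(memory):04X}")
```

So in this example the 6502 would start executing the program at $E000.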

Interrupt / Non Maskable Interrupt
Imagine you are reading a book and the telephone rings. Most likely you will mark the sentence in memory, answer the phone, and, after finishing the conversation, resume reading the book at the point where you stopped due to the interruption of the telephone. "Most likely" because you are free to ignore it. In case of a fire alarm, you most likely won't ignore it.
The two equivalent pins are the Interrupt-input (IRQ / INT) and the Non-Maskable-Interrupt-input (NMI). The 6502 can mask (read: ignore) an IRQ if ordered to do so by the appropriate opcode. As the name already says, an NMI cannot be ignored.

How does the 6502 deal with interrupts? The first thing it has to do is to "remember" where it is by copying the current address in the PC (Program Counter) to the so-called Stack: an area of memory especially reserved for this purpose. In case of the 6502 this is the range $0100-$01FF. Besides the address, the Flag register is saved as well.
Once the current address is stored, the processor has to execute the program belonging to the specific interrupt. The 6502 will perform an indirect jump: JMP ($FFFE) for an IRQ, JMP ($FFFA) for an NMI.

Once the interrupt routine has finished, this is signalled to the processor with the instruction RTI (ReTurn from Interrupt). The processor restores the contents of the Program Counter and Flag register with the data previously stored on the Stack, loads a new opcode and continues its operation at this point.
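
The whole sequence (save PC and flags on the Stack, jump through the vector, restore with RTI) can be imitated with a small Python class. Take it as a rough sketch: the class `Mini6502` and its method names are invented for this example, and details of the real chip, such as the exact bits written to the saved Flag register, are left out. The vectors used are $FFFE/$FFFF for an IRQ and $FFFA/$FFFB for an NMI.

```python
class Mini6502:
    """Only the parts of a 6502 needed to show interrupt entry and RTI."""

    IRQ_VECTOR = 0xFFFE
    NMI_VECTOR = 0xFFFA
    FLAG_I = 0x04              # "interrupt disable" bit in the Flag register

    def __init__(self, memory):
        self.mem = memory      # bytearray of 65536 bytes
        self.pc = 0            # Program Counter
        self.sp = 0xFF         # stack pointer, an offset into $0100-$01FF
        self.flags = 0x20      # Flag register, IRQs enabled

    def push(self, value):
        self.mem[0x0100 + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[0x0100 + self.sp]

    def _interrupt(self, vector):
        self.push(self.pc >> 8)        # high byte of the return address
        self.push(self.pc & 0xFF)      # low byte of the return address
        self.push(self.flags)          # Flag register
        self.flags |= self.FLAG_I      # ignore further IRQs for now
        self.pc = self.mem[vector] | (self.mem[vector + 1] << 8)

    def irq(self):
        if self.flags & self.FLAG_I:
            return False               # masked: the IRQ is ignored
        self._interrupt(self.IRQ_VECTOR)
        return True

    def nmi(self):
        self._interrupt(self.NMI_VECTOR)   # cannot be masked

    def rti(self):
        self.flags = self.pull()
        low = self.pull()
        high = self.pull()
        self.pc = low | (high << 8)


mem = bytearray(65536)
mem[0xFFFE], mem[0xFFFF] = 0x00, 0x90  # made-up IRQ routine at $9000
cpu = Mini6502(mem)
cpu.pc = 0x1234
cpu.irq()
print(hex(cpu.pc))                     # 0x9000
cpu.rti()
print(hex(cpu.pc))                     # 0x1234
```

Note that RTI also restores the old Flag register, so IRQs are automatically enabled again after the routine.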


Video

Nowadays we take it for granted that a big video display is attached to our computer. That used to be different. The first Commodore computer used six 7-segment LED displays to show an address and the corresponding data byte. The next model, the PET 2001, had a real monitor. But how does a computer know how to display text or images on a screen? Simply put, it doesn't. In short, it sends a stream of bits to the monitor, which translates them into dots on the screen, which we in turn interpret as characters or images.

But how does the computer know what to send to the monitor? First we have to know how the 'ancient' Cathode Ray Tube monitor works. A beam of electrons is sent from the back of the device to the screen you are looking at. Where an electron hits the screen, the phosphor on the inside lights up. The beam moves from left to right (from your point of view) and from top to bottom, just like you would read a book. Having arrived at the bottom-right point of the screen, it moves to the upper-left point again. To produce smoother pictures, the so-called interlaced monitors first write the odd lines and then the even ones.

To make sure that the text/picture looks right, the computer has to tell the monitor when to start a horizontal line or the vertical movement. These signals are called the horizontal and vertical synchronization.
Older television sets have a so-called "composite video input", in most cases in the form of a CINCH (RCA) plug. Here the video and synchronization signals travel over one single line, combined according to a fixed scheme. But because the signals can interfere with each other this way, separate signals are preferred.

There are generally four systems to create a screen:
- The processor does it itself.
- A static circuit.
- A programmable circuit.
- A processor controlled circuit.

The processor does it itself
This is the way it is done by the Sinclair ZX80 and ZX81 and the Jupiter Ace. The disadvantage: the processor loses a lot of time just generating the screen. In case of the ZX81 about 75%! The advantage: fewer parts needed, thus a cheaper product.

A static circuit
A circuit based on normal TTL IC's generates the various signals, including the video signal. This circuit gets its data from RAM that is shared with the processor. On the PET 2001 this is 1024 bytes of RAM found at address $8000. This amount of RAM covers the screen, which can display 25 lines of 40 characters each. In other words, each byte in the RAM (except the last 24) covers one character on the screen.
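
The relation between a position on the screen and its byte in the video RAM can be sketched as follows. The function name is made up, and the sketch assumes that the characters are stored line after line, starting with the top-left one at $8000.

```python
SCREEN_BASE = 0x8000   # start of the PET 2001 video RAM
COLUMNS = 40
ROWS = 25

def screen_address(row, column):
    """Address of the video RAM byte for a character position (both 0-based)."""
    if not (0 <= row < ROWS and 0 <= column < COLUMNS):
        raise ValueError("position outside the 40x25 screen")
    return SCREEN_BASE + row * COLUMNS + column


print(hex(screen_address(0, 0)))     # 0x8000, top-left character
print(hex(screen_address(24, 39)))   # 0x83e7, bottom-right character
print(1024 - ROWS * COLUMNS)         # 24 bytes of the 1024 are not displayed
```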

These are the first eight characters of the character ROM of the PET:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
   000     00    00000     000   0000    000000  000000    000  
  0   0   0  0    0   0   0   0   0  0   0       0        0   0 
 0  0 0  0    0   0   0  0        0   0  0       0       0      
 0 0 00  000000   0000   0        0   0  0000    0000    0  000 
 0  00   0    0   0   0  0        0   0  0       0       0    0 
  0      0    0   0   0   0   0   0  0   0       0        0   0 
   0000  0    0  00000     000   0000    000000  0         000  
                                                                
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
Each '0' represents a dot that is written to the screen. A character is 7 dots wide, 8 dots high. A character is stored as eight consecutive bytes. So the first row of '@' is found at address 0, the first row of 'A' is found at address 8, etc., etc. How is the character ROM read? The output of the video RAM is connected with the address lines A3..A10. The lines A0..A2 are connected to a three bit counter (0..7).
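
In other words, the address presented to the character ROM is simply the byte from the video RAM shifted three bits to the left, with the value of the counter in the lowest three bits. A minimal sketch (the function name is made up):

```python
def char_rom_address(screen_code, row):
    """Combine the video RAM byte (A3..A10) with the row counter (A0..A2)."""
    return ((screen_code & 0xFF) << 3) | (row & 0x07)


print(char_rom_address(0, 0))   # 0  : first row of '@'
print(char_rom_address(1, 0))   # 8  : first row of 'A'
print(char_rom_address(1, 3))   # 11 : fourth row of 'A'
```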
Now assume the circuit has to display the above characters starting from the top-left corner of the screen. First the circuit reads the byte at address 0 and sends all the bits, starting with bit 7, to the monitor. Then it reads the byte at address 8 and repeats the action. After having read 40 characters, it increases the counter and starts to send the second line of the first 40 characters. Having sent all lines of the first 40 characters, it starts with sending the first line of the second 40 characters. After having read all 25 lines, it starts with the first line again.
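
The order in which the bytes are read and shifted out can be imitated in Python. This is a sketch only: the function names are invented, the ROM contains just the rows of '@' and 'A' as I transcribed them from the picture above, and the "monitor" is a string in which '0' is a lit dot.

```python
# Rows of '@' (code 0) and 'A' (code 1), transcribed from the picture above.
CHAR_ROM = bytes([
    0x1C, 0x22, 0x4A, 0x56, 0x4C, 0x20, 0x1E, 0x00,   # '@'
    0x18, 0x24, 0x42, 0x7E, 0x42, 0x42, 0x42, 0x00,   # 'A'
])

def shift_out(byte):
    """Serialize one byte, bit 7 first, like the shifter does."""
    return "".join("0" if byte & (0x80 >> bit) else " " for bit in range(8))

def scan_text_line(screen_codes):
    """Return the 8 dot lines that together form one line of text."""
    dot_lines = []
    for row in range(8):                    # the three bit counter (0..7)
        line = ""
        for code in screen_codes:           # one video RAM byte per character
            line += shift_out(CHAR_ROM[code * 8 + row])
        dot_lines.append(line)
    return dot_lines


for line in scan_text_line([0, 1]):         # video RAM contains '@', 'A'
    print(line)
```

The inner loop runs over all characters of the text line before the counter is increased, which is exactly why the circuit has to read the same 40 bytes of video RAM eight times in a row.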

A remark about the ROM: the above picture shows the character set we are familiar with. Just replace it with a ROM containing the Cyrillic character set and we can use the computer in Russia, Bulgaria or Serbia! (Yes, I know, the keyboard....)

A programmable circuit
The above circuit had at least two disadvantages:
- you are limited to a fixed number of lines and a fixed number of pixels/line
- the number of IC's needed
So some companies developed a special Video-IC that contained most of the needed hardware. The best known is the Motorola 6845. The Color Graphics Adapter and Monochrome Display Adapter from IBM, the Monochrome Graphics Card from Hercules and the Graphics Solution Card from ATI were based on this IC.
But the 6845 has another advantage: you can adjust the number of lines, the number of dots/line, the number of vertical dots/character and the number of horizontal dots/character. This enabled one to show text on the screen varying from 40 to 132 characters/line and 25 to 40 lines/screen.

A processor controlled circuit
Until the beginning of the 80's most computer systems were text only for the simple reason that RAM was expensive. While 1 KB was sufficient for a 40*25 text screen, the same screen in graphical form (320*200 dots) already needed 8 KB. Not much by today's standards, but in those days it could make a computer too expensive to sell well.
Anyway, prices dropped and we got better and bigger video cards. But a VGA card capable of 1024*768 pixels with 256 colours already needs 1 MByte of RAM. So in my case in 1990 I had an 80286 with only 1 MB of RAM and this 1 MB video card. Drawing lines, circles, ellipses or whatever started to become quite a burden for computers. This is one of the reasons I had to upgrade my computer to an 80486.
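
The amounts of RAM mentioned above are easy to check. In the sketch below the function names are made up, and I assume that a character is 8*8 dots, so that a 40*25 text screen corresponds to 320*200 dots.

```python
def text_screen_bytes(columns, lines):
    """One byte of video RAM per character."""
    return columns * lines

def bitmap_bytes(width, height, colours):
    """RAM needed when every pixel is stored individually."""
    bits_per_pixel = (colours - 1).bit_length()   # 2 colours -> 1, 256 -> 8
    return (width * height * bits_per_pixel + 7) // 8


print(text_screen_bytes(40, 25))        # 1000   -> fits in 1 KB
print(bitmap_bytes(320, 200, 2))        # 8000   -> needs 8 KB
print(bitmap_bytes(1024, 768, 256))     # 786432 -> 768 KB, so a 1 MB card
```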
Engineers realised this as well and soon the question arose why the computer itself had to calculate all points of a circle and why not the video card. In that case the only thing the computer had to do was to supply the video card with information like what type of object was wanted, coordinates, orientation, colour and whatever else was needed. The GPU, Graphics Processing Unit, was born.

The electronics
What electronics is needed to make the video card run? It may surprise you but all parts needed have already been discussed before:
- RAM.
- A counter to select the character and one to select the line where the character is found; the latter can be used to select the row within the character as well.
- Some logic, connected to the counters, to generate the horizontal and vertical synchronization signal.
- In case of the 6845 (and other video IC's) you need pre-loadable counters plus latches where the counters can load their data from.
- A (EP)ROM.
- A shifter to serialize the data that is sent to the monitor.
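
How two counters plus some logic can produce the synchronization signals is sketched below. All timing numbers and names are invented for this illustration and do not belong to any real monitor or video IC. In a static circuit such numbers are fixed by the wiring; in a programmable IC they would come from the latches.

```python
# All timing numbers are invented for illustration only.
H_DISPLAYED, H_SYNC_START, H_SYNC_END, H_TOTAL = 40, 48, 52, 64      # in characters
V_DISPLAYED, V_SYNC_START, V_SYNC_END, V_TOTAL = 200, 220, 223, 260  # in dot lines

def video_signals(h, v):
    """The logic: derive the control signals from the two counter values."""
    return {
        "display": h < H_DISPLAYED and v < V_DISPLAYED,
        "hsync": H_SYNC_START <= h < H_SYNC_END,
        "vsync": V_SYNC_START <= v < V_SYNC_END,
    }

def tick(h, v):
    """The counters: advance by one character time."""
    h += 1
    if h == H_TOTAL:            # end of a line: next dot line
        h = 0
        v = (v + 1) % V_TOTAL   # end of the frame: back to the top
    return h, v


# Count the visible character positions in one complete frame.
h, v, visible = 0, 0, 0
for _ in range(H_TOTAL * V_TOTAL):
    if video_signals(h, v)["display"]:
        visible += 1
    h, v = tick(h, v)
print(visible)                  # 8000 = 40 characters * 200 dot lines
```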





Do you have questions or comments? Do you want more info?
You can email me here.