Auto Disassembler

What is it?

This disassembler converts 6502, 680x and Z80 binaries into source files. It does so in a reasonably intelligent way: it analyzes the complete program to determine which parts are code and which parts are data.

The story

The problem with normal disassemblers is that they disassemble everything starting from the address you supply. Data is therefore disassembled as well, which can confuse anyone reading the resulting file. It gets even worse if the bogus code produced at the end of a data field overlaps the real code following this field. In that case you have to start the disassembling process all over again, starting at this specific address. So I got the idea of developing a disassembler that determines (with a little bit of help) which part of the binary is code and which part is data.
The idea behind it is simple: give it the start address of the PRG and it will proceed from that address on, analyzing the opcodes it encounters. In case we are dealing with a Kernal or Boot ROM, we can use the RESET, IRQ and NMI vectors as start addresses. If the disassembler encounters a Branch (BEQ, BCS etc.) or a Subroutine call (JSR), it saves the target address and carries on. At a jump (JMP) it actually goes to the new address, but first marks the start and end address of the area it has just examined. It stops disassembling when it encounters a Return (RTS, RTI), an unsolvable (indirect) jump/branch, or when it reaches an area that has already been examined. A jump/branch is unsolvable when the address the PRG is supposed to go to is not in the area covered by the PRG itself. Example: JMP ($0314), where $0314/5 is RAM (Commodore 64). As we don't know the contents of $0314/5 (analyzing does not mean emulating!), we don't know where the jump actually leads.
After being forced to stop, the disassembler looks up the last subroutine call or branch it encountered and proceeds at the address that call or branch could have gone to. If this address is in an area not covered yet, the disassembler continues disassembling from there. If the area has already been examined, the disassembler goes to the next branch or subroutine in the list. This process goes on until all subroutines and branches it encountered have been investigated.
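
The tracing process described above can be sketched in a few lines of Python. This is only an illustration, not the actual Pascal source of AD: the opcode table covers just a handful of 6502 instructions, and the names OPCODES and find_code are made up for this example.

```python
# Hypothetical, deliberately tiny 6502 opcode table:
# opcode -> (mnemonic, instruction length, kind)
OPCODES = {
    0xA9: ("LDA #", 2, "plain"),
    0xEA: ("NOP", 1, "plain"),
    0xF0: ("BEQ", 2, "branch"),
    0xD0: ("BNE", 2, "branch"),
    0x20: ("JSR", 3, "call"),
    0x4C: ("JMP", 3, "jump"),
    0x6C: ("JMP ()", 3, "stop"),  # indirect jump: target unknown without emulating
    0x60: ("RTS", 1, "stop"),
    0x40: ("RTI", 1, "stop"),
}

def find_code(memory, load_addr, entry_points):
    """Return the set of addresses occupied by reachable instructions."""
    end_addr = load_addr + len(memory)
    code = set()
    pending = list(entry_points)      # saved branch/JSR targets still to visit
    while pending:
        pc = pending.pop()            # take the most recently saved address
        # Stop when leaving the file or reaching an already examined area.
        while load_addr <= pc < end_addr and pc not in code:
            entry = OPCODES.get(memory[pc - load_addr])
            if entry is None:
                break                 # opcode not in the table: give up here
            _, length, kind = entry
            if pc + length > end_addr:
                break                 # instruction would run past the file
            code.update(range(pc, pc + length))
            operand = memory[pc - load_addr + 1 : pc - load_addr + length]
            if kind == "branch":
                offset = operand[0] - 256 if operand[0] >= 128 else operand[0]
                pending.append(pc + 2 + offset)   # save target, carry on
            elif kind == "call":
                pending.append(operand[0] | operand[1] << 8)
            elif kind == "jump":
                pc = operand[0] | operand[1] << 8  # follow the jump right away
                continue
            elif kind == "stop":
                break
            pc += length
    return code

# Made-up test program loaded at $1000; the two $FF bytes are data.
program = bytes.fromhex("A900F0034C0A1060FFFF20071060")
code = find_code(program, 0x1000, [0x1000])
data = sorted(set(range(0x1000, 0x1000 + len(program))) - code)
print([f"${a:04X}" for a in data])  # ['$1008', '$1009']
```

Everything that is never reached ends up as data, which is exactly why unsolvable jumps leave real code marked as data.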

At this point the whole PRG should, in theory, have been split up into areas of code and data. Unfortunately most of the time this is not the case, due to the already mentioned unsolvable jumps and branches. Encountering an unsolvable jump or branch simply means that we will probably be stuck with an area marked as data that in reality is code. A good example is the IRQ routine of the C64, which starts at $FF48. Due to a branch it will go on with "JMP ($0316)" at $FF55 or with "JMP ($0314)" at $FF58. In either case the investigation ends, as both addresses are in RAM and their contents are unknown to us. This means that the area starting from $EA31, where the IRQ routine actually continues, won't be covered and will be regarded as data (for the moment).

There are three ways of using the RAM area to tell the original program where to go. The first one is the one used by the IRQ routine: fill $0314 with $31 and $0315 with $EA, and the result is that the PRG resumes the IRQ routine at $EA31. The second way is using the stack, like the SYS command does when starting at $E130: it pushes the values $E1 and $46 onto the stack, which means that the RTS at the end of the user's program forces the C64 to resume its task at $E147. The third way is copying a part of the ROM to RAM, as is done with the CHRGET routine. This routine is copied from the area $E3A2/E3B9 to $0073.
What can we do about it? If no listing is available, one thing we can do is go step by step through the already disassembled code to find out where it fills the RAM areas in question, note down the contents of these areas, and then tell the disassembler to use this info when disassembling the PRG again. Another thing we can do is tell the disassembler to disassemble all areas that haven't been covered so far anyway.

Starting from $FD30 you'll find a list of addresses that are copied to RAM. One of these addresses is copied to $0314/5. Tell my disassembler about the existence of this list and it will do the rest.
At $A00C/$A051 we find another list but these are the actual addresses minus 1. These addresses are used in combination with RTS. This type of list my disassembler can handle as well.
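
To make the difference between the two kinds of list concrete, here is a small Python sketch that reads a table of 16-bit little-endian addresses and optionally adds 1 to each entry, as is needed for addresses meant to be used with RTS. The function name read_vector_table, the load address $2000 and the four table bytes are assumptions for this example only; the two resulting addresses are the ones mentioned in the text above.

```python
def read_vector_table(memory, load_addr, start, end, plus_one=False):
    """Decode the 16-bit little-endian addresses stored in start..end (inclusive).

    With plus_one=True each entry is treated as 'address minus 1',
    the form used when the address is pushed on the stack for an RTS.
    """
    targets = []
    for addr in range(start, end, 2):
        lo = memory[addr - load_addr]
        hi = memory[addr + 1 - load_addr]
        target = lo | (hi << 8)
        if plus_one:
            target = (target + 1) & 0xFFFF   # RTS resumes at stored address + 1
        targets.append(target)
    return targets

# Made-up four-byte table: one plain vector and one RTS-style vector.
table = bytes([0x31, 0xEA, 0x46, 0xE1])
plain = read_vector_table(table, 0x2000, 0x2000, 0x2001)
via_rts = read_vector_table(table, 0x2000, 0x2002, 0x2003, plus_one=True)
print([f"${a:04X}" for a in plain + via_rts])  # ['$EA31', '$E147']
```

Each decoded address can then be treated as one more start address for the analysis.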

I found two situations which can produce errors:
- From $E394 on in the C64 ROM you'll find the 'BASIC cold start routine'. This routine ends at $E3A0 with a branch to the 'warm start routine'. In this specific case the branch will always be taken. But my disassembler assumes that a program can execute the instructions behind that branch as well.
- The 1541 uses JSRs to create an error message. But at some point the Stack Pointer is reset and the routine jumps to the main loop, so the expected RTS is never executed. This means the stored return address is not valid in this situation.
In both cases the disassembler will disassemble the code behind such a branch or JSR. When it does, four things can happen:

- The succeeding code is real code: in this case no harm is done at all.
- The succeeding code is data but the disassembled code fits neatly between the original blocks of code: in this case no real harm is done either, but it looks a little bit silly to find code where you expect data.
- The succeeding code is data and the disassembled code does not fit neatly between the original blocks of code: the resulting code overlaps the original one. My disassembler will notice this, because it WILL encounter the starting label of the next block of code at an invalid address (in the middle of an instruction), and it will give a warning. Declaring the first byte behind that branch as data can solve the problem. This stops the disassembler from disassembling the code behind that branch.
- The succeeding code is data and the disassembler runs into an illegal opcode before the end of the block. A warning will be issued and the result will be opcodes followed by a block of data.
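
The overlap check from the third case can be pictured as follows. This is a rough Python sketch under my own assumptions, not the check AD actually performs: instructions are given as (address, length) pairs, labels as plain addresses, and the name misaligned_labels is invented for the example.

```python
def misaligned_labels(instructions, labels):
    """Return the labels that fall inside an instruction instead of at its start.

    instructions: list of (address, length) pairs
    labels:       iterable of label addresses
    """
    starts = {addr for addr, _ in instructions}
    covered = {a for addr, length in instructions
               for a in range(addr, addr + length)}
    return sorted(l for l in labels if l in covered and l not in starts)

# Data at $1000 wrongly decoded as a 3-byte instruction swallows the
# first byte of the real routine that is labelled at $1002.
decoded = [(0x1000, 3), (0x1003, 1)]
print([f"${a:04X}" for a in misaligned_labels(decoded, [0x1002])])  # ['$1002']
```

A non-empty result corresponds to the warning described above.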

I found out about this type of branch because I expected that the CHRGET routine would appear as 'unexamined area' but it didn't. Being code, this routine was interpreted as valid code as well.

How to use it

You start the disassembler by supplying it with a file that in turn holds at least three parameters. One parameter must contain the name of the file to be disassembled, one the start address and one the type of processor:

ROM     name
LOAD    $xxxx
CPU     6502

If these are the only parameters, AD assumes it should start disassembling from the reset, NMI and IRQ vectors.

The (extra) parameters are:

BEGIN
 Same as LOAD, address where to store the file.

CPU
 Type of CPU: 6502, 6800, 6809 or Z80.

DATA $aaaa $bbbb
 Known block of valid data, no need to disassemble.

EQUI Label name = $aa
EQUI Label name = $aaaa
 Instead of using its own generated label names, the disassembler will use the
 names supplied by you.
 
EXAM $aaaa
 Examine the code starting from $aaaa.

FILE filename $aaaa
 Load file at address $aaaa. 
 Example: to disassemble the Kernal ($E000-FFFF) of the C64, you need the
 BASIC ROM ($A000-BFFF) as well.

JUMP $aaaa $bbbb
 Known block of JMP commands starting at $aaaa and ending at $bbbb. 
 Example: the vector tables at the end of the ROM of various Commodores.
 
MARK $aaaa $bbbb
 Mark block with special remark statement.
 
MOVE $aaaa $bbbb $cccc
 Used for parts in the original code, which will be moved or copied to other
 areas in memory by the program like the CHRGET routine mentioned earlier.
 $aaaa $bbbb = range of block to move/copy
 $cccc       = new address
 
NAME filename
 Name of output file(s). 

NOIRQ
 Don't check the IRQ vector
 
NONMI
 Don't check the NMI vector
 
NORES
 Don't check the Reset vector
 
NOSWI
 Don't check the SWI vector
 
PROG $aaaa $bbbb
 Known block of valid program code. Warning: jumps/branches from this area to
 other areas and only used by this area won't be examined! Can be used for
 'straight on' disassembling by declaring (a part of) the whole file as code.

START
 Start address

TEXT $aaaa $bbbb
 Treat a block of data as text.
 
VECP $aaaa $bbbb
 As VECT, but used for RTS jumps, so add 1 to address to get the real address.

VECT $aaaa $bbbb
 Known block of program vectors starting at $aaaa and ending at $bbbb. Mainly
 used for the addresses of the BASIC commands of the Commodores.

You MUST use four digits for all addresses. You can add a comment, but you have to place at least two spaces before it, like: "EXCL $C000 $DFFF  RAM and I/O area". Have a look at the directive file for the C64, 'C64.DIR', which I used to disassemble the complete ROM of the Commodore 64.
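
As an illustration of these two rules, here is a minimal Python sketch of how a single directive line could be taken apart. It is not the parser used by AD: the name parse_directive is made up, the arguments are assumed to be separated by single spaces, and the EQUI form with its two-digit value is not handled.

```python
import re

ADDRESS = re.compile(r"^\$[0-9A-Fa-f]{4}$")   # exactly four hex digits

def parse_directive(line):
    """Split a directive line into (keyword, arguments, comment)."""
    parts = line.strip().split(None, 1)
    if not parts:
        return None                            # empty line
    keyword = parts[0].upper()
    rest = parts[1] if len(parts) > 1 else ""
    # Two or more spaces after the arguments start the comment.
    pieces = re.split(r" {2,}", rest, maxsplit=1)
    args = []
    for token in pieces[0].split():
        if token.startswith("$"):
            if not ADDRESS.match(token):
                raise ValueError(f"address needs four hex digits: {token}")
            args.append(int(token[1:], 16))
        else:
            args.append(token)                 # file name, CPU type, ...
    comment = pieces[1].strip() if len(pieces) > 1 else ""
    return keyword, args, comment

print(parse_directive("EXCL $C000 $DFFF  RAM and I/O area"))
# ('EXCL', [49152, 57343], 'RAM and I/O area')
```

A line such as "LOAD $E00" is rejected by this sketch with a ValueError because the address has only three digits.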

Source code

AD is written in Pascal for the Free Pascal Compiler. Because I use the disassembler almost daily, I regularly run into bugs or think of improvements. So if you want the source code, simply ask for it and you will get the newest version!

Have questions or comments? Want more information?
You can email me here.