Ruud's Commodore Site

Auto Disassembler




What is it?

This disassembler converts 6502 binaries into source files. It does so in a more or less intelligent way: by analyzing the complete program, it tries to determine which parts are code and which parts are data.


The story

The problem with normal disassemblers is that they disassemble everything starting from the address you supply. So data will be disassembled as well, which can confuse anyone reading the resulting file. It gets even worse if the bogus instructions decoded at the end of a data field overlap the real code following this field. In that case you have to start the disassembling process all over again, starting at this specific address. So I got the idea of developing a disassembler that determines (with a little bit of help) which part is code and which part is data.
The idea behind it is simple: give it the start address of the PRG and it will proceed from that address on, analyzing the opcodes it encounters. If we are dealing with a Kernal or boot ROM, we can use the RESET, IRQ and NMI vectors as start addresses. If the disassembler encounters a branch (BEQ, BCS etc.) or a subroutine call (JSR), it saves the address of this location and carries on. At a jump (JMP) it actually goes to the new address, but first marks the begin and end address of the area it has just examined. It stops disassembling when it encounters a return (RTS, RTI), an unsolvable (indirect) jump/branch, or an area that has already been examined. A jump/branch is unsolvable when the address the PRG is supposed to go to is not in the area covered by the PRG itself. Example: JMP ($0314), where $0314/5 is RAM. As we don't know the contents of $0314/5 (analyzing does not mean emulating!), we don't know where the program will actually jump to.
After being forced to stop, the disassembler looks up the last subroutine call or branch it encountered and jumps to the address that call or branch could have gone to. If this address is in an area not covered yet, the disassembler starts examining again. If the area has already been examined, the disassembler goes to the next branch or subroutine call in the list. This process goes on until all the subroutine calls and branches it encountered have been investigated.
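The procedure described above can be sketched in Python as a small work-list loop. This is only an illustration, not the disassembler's actual source: the function name `trace`, the tiny opcode table (a handful of real 6502 opcodes, far from complete) and the sample bytes are all assumptions made for this example. Unknown opcodes simply end the current run.

```python
# Hypothetical sketch: only a few real 6502 opcodes are listed here.
BRANCHES = {0x10, 0x30, 0x50, 0x70, 0x90, 0xB0, 0xD0, 0xF0}  # BPL..BEQ
JSR, JMP_ABS, JMP_IND, RTS, RTI = 0x20, 0x4C, 0x6C, 0x60, 0x40
LENGTHS = {0xA9: 2, 0x8D: 3, 0xEA: 1,          # LDA #imm, STA abs, NOP
           JSR: 3, JMP_ABS: 3, JMP_IND: 3, RTS: 1, RTI: 1}
LENGTHS.update({opcode: 2 for opcode in BRANCHES})


def trace(image, base, entry_points):
    """Return the set of addresses in `image` that were reached as code."""
    end = base + len(image)
    code = set()
    pending = list(entry_points)      # saved branch/JSR targets
    while pending:
        pc = pending.pop()            # last one saved is examined first
        # Stop when leaving the image or reaching an examined area.
        while base <= pc < end and pc not in code:
            opcode = image[pc - base]
            length = LENGTHS.get(opcode)
            if length is None or pc + length > end:
                break                 # not in our small table: give up here
            code.update(range(pc, pc + length))
            if opcode in BRANCHES:
                offset = image[pc + 1 - base]
                if offset >= 0x80:    # signed 8-bit displacement
                    offset -= 0x100
                pending.append(pc + 2 + offset)
                pc += 2               # carry on behind the branch
            elif opcode == JSR:
                pending.append(image[pc + 1 - base] | (image[pc + 2 - base] << 8))
                pc += 3               # carry on behind the JSR
            elif opcode == JMP_ABS:
                pc = image[pc + 1 - base] | (image[pc + 2 - base] << 8)
            elif opcode in (RTS, RTI, JMP_IND):
                break                 # return or unsolvable indirect jump
            else:
                pc += length
    return code


# Made-up program at $C000: 12 bytes of code followed by 3 bytes of data.
image = bytes([0xA9, 0x00,            # C000 LDA #$00
               0xF0, 0x03,            # C002 BEQ $C007
               0x4C, 0x0A, 0xC0,      # C004 JMP $C00A
               0x20, 0x0B, 0xC0,      # C007 JSR $C00B
               0x60,                  # C00A RTS
               0x60,                  # C00B RTS
               0x01, 0x02, 0x03])     # C00C data
code = trace(image, 0xC000, [0xC000])
data = [a for a in range(0xC000, 0xC000 + len(image)) if a not in code]
print([f"${a:04X}" for a in data])    # ['$C00C', '$C00D', '$C00E']
```

Everything never reached from an entry point ends up in `data`, which is exactly why an unsolvable jump leaves real code classified as data.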

At this point, theoretically, the whole PRG should have been split up into areas of code and data. Unfortunately, most of the time this is not the case, due to the unsolvable jumps and branches already mentioned. Encountering an unsolvable jump or branch simply means that we will probably be stuck with an area marked as data that in reality is code. A good example is the IRQ routine of the C64, which starts at $FF48. Depending on a branch, it continues with "JMP ($0316)" at $FF55 or with "JMP ($0314)" at $FF58. In either case the investigation ends, as both addresses are in RAM and their contents are unknown to us. This means that the area starting at $EA31 won't be covered and will be regarded as data. (But how do I know that the area starting at $EA31 is code? Because I have the complete listing of the ROMs of the C64 on paper!)

There are three ways of using the RAM area to tell the original program where to go. The first one is the one used by the IRQ routine: fill $0314 with $31 and $0315 with $EA, and the PRG will resume the IRQ routine at $EA31. The second way is using the stack, as the SYS command does starting at $E130: it pushes the values $E1 and $46 onto the stack, which means that the RTS at the end of the user's program forces the C64 to resume its task at $E147. The third way is copying a part of the ROM to RAM, as with the CHRGET routine. This routine is copied from the area $E3A2/$E3B9 to $0073.
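The address arithmetic behind these three mechanisms can be written out in a few lines of Python. The helper names below are made up for this illustration; only the addresses and values come from the text above.

```python
def read_vector(memory, address):
    """Read a 16-bit pointer stored low byte first, as JMP ($xxxx) does."""
    return memory[address] | (memory[address + 1] << 8)


def rts_target(high, low):
    """RTS pulls the pushed address from the stack and continues one byte after it."""
    return (((high << 8) | low) + 1) & 0xFFFF


def rom_origin(address, src_start, src_end, dest):
    """Map an address inside a block copied to `dest` back to its ROM source."""
    length = src_end - src_start + 1
    if not dest <= address < dest + length:
        raise ValueError("address is outside the copied block")
    return src_start + (address - dest)


ram = {0x0314: 0x31, 0x0315: 0xEA}              # the IRQ vector contents
print(hex(read_vector(ram, 0x0314)))            # 0xea31
print(hex(rts_target(0xE1, 0x46)))              # 0xe147
print(hex(rom_origin(0x0073, 0xE3A2, 0xE3B9, 0x0073)))  # 0xe3a2
```

A disassembler that is told these facts can treat $EA31 and $E147 as extra start addresses, and can disassemble the ROM copy of CHRGET as if it were located at $0073.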
What can we do about it? If no listing is available, we have to go step by step through the already disassembled code to find out where it fills the RAM areas in question with data, note down the contents of those areas, and then tell the disassembler to use this info when disassembling the PRG again. The advantages of my disassembler over a common 'straight-on' disassembler are marginal when no listing is available. (Remark: my disassembler can be used as a 'straight-on' disassembler as well!)

In the C64 the area $A052/$A07F contains a list with the addresses of several BASIC-functions. Tell my disassembler of the existence of this list and it will do the rest. At $A00C/$A051 we find another list but these are the actual addresses minus 1. This type of list my disassembler can handle as well.
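How such address lists turn into start addresses can be sketched as follows. The function `read_table` and the two sample entries are assumptions for illustration only; the `minus_one` flag covers the second kind of list, where each entry is the real address minus 1.

```python
def read_table(image, base, start, end, minus_one=False):
    """Decode the 16-bit little-endian entries stored in start..end (inclusive)."""
    targets = []
    for address in range(start, end, 2):
        low = image[address - base]
        high = image[address + 1 - base]
        target = low | (high << 8)
        if minus_one:
            # Entry holds the address minus 1, so add 1 to get the real one.
            target = (target + 1) & 0xFFFF
        targets.append(target)
    return targets


# Two illustrative entries, pretending they are loaded at $A00C.
table = bytes([0x30, 0xA8, 0x41, 0xA7])
plain = read_table(table, 0xA00C, 0xA00C, 0xA00F)
adjusted = read_table(table, 0xA00C, 0xA00C, 0xA00F, minus_one=True)
print([f"${t:04X}" for t in plain])      # ['$A830', '$A741']
print([f"${t:04X}" for t in adjusted])   # ['$A831', '$A742']
```

Each decoded target can then be handed to the code tracer as one more start address.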
With a complete listing, the real advantages start to show. With a normal disassembler you have to tell it every time which area to disassemble and which not. With my disassembler you only have to give it a list of start addresses and compare the 'unexamined areas' with the listing. That is much less work than flipping through the listing looking for every area yourself.

I found two situations which can produce errors:
- From $E394 on, the C64 has the 'BASIC cold start routine'. This routine ends at $E3A0 with a branch to the 'warm start routine'. In this specific case the branch will always be taken, but my disassembler assumes that a program can execute the instructions behind that branch as well.
- The 1541 uses JSRs to create an error message, but part of the called routine resets the Stack Pointer, which means the return address will never be used.
In both cases the disassembler will disassemble the code behind such a branch or JSR. In such a case four things can happen:
I found out about this type of branch because I expected that the CHRGET-routine would appear as 'unexamined area' but it didn't. Being code, this routine was interpreted as valid code as well.


How to use it.

You start the disassembler by supplying two or more parameters. One parameter must contain the name of the file to be disassembled and another one must contain the start address: "/Ffilename" and "/B$xxxx".
The extra parameters are:
/A      = Check only the areas, do not disassemble. This will save some
          time when you first want to sort out all areas.
/D=file = file with disassembling directives 
/I      = do NOT search for an IRQ routine
/M128   = program is C128 module (not implemented yet)
/M64    = program is C64 module
/N      = do NOT search for a NMI routine
/Sxxxx  = address where program starts (else RESET routine)
          /S is ignored when /M is used.

How to use the disassembling directives:
DATA $aaaa $bbbb
 Known block of valid data code.
DISA $aaaa $bbbb
 Block to be disassembled, ignore the rest.
EXAM $aaaa
 Examine the code starting from $aaaa.
 
 
EQUI Label name = $aa
EQUI Label name = $aaaa
 Instead of using its own generated label names, the disassembler will use the
 names supplied by you.
 
 Remark: this only works for byte-sized operands, not for addresses found inside
 the range to be disassembled. Sorry :(
 
 
EXCL $aaaa $bbbb
 Exclude data within this block from disassembling.
FILE filename $aaaa
 Load file at address $aaaa, used for code needed to disassemble the original
 file. Example: to disassemble the Kernal ($E000-FFFF), you need the BASIC-ROM
 ($A000-BFFF) as well.
JUMP $aaaa $bbbb
 Known block of JMP commands starting at $aaaa and ending at $bbbb. Mainly used
 for Commodores: the vector tables at the end of the ROM.
MARK $aaaa $bbbb
 Mark block with special remark statement.
MOVE $aaaa $bbbb $cccc
 Used for parts in the original code, which will be moved or copied to other
 areas in memory by the program like the CHRGET-routine mentioned earlier.
 $aaaa $bbbb = range of block to move/copy
 $cccc       = new address
NAME filename
 Name of output file(s). 
PROG $aaaa $bbbb
 Known block of valid program code. Warning: jumps/branches from this area to
 other areas that are only used by this area won't be examined! Can be used for
 'straight-on' disassembling by declaring (a part of) the whole file as code.
VECT $aaaa $bbbb
 Known block of program vectors starting at $aaaa and ending at $bbbb. Mainly
 used for the addresses of the BASIC commands of the Commodores.
VECP $aaaa $bbbb
 As above, used for RTS-jumps, so add 1 to address to get the real address.

You MUST use four digits for all addresses. You can add a comment, but you have to place at least two spaces before it, like: "EXCL $C000 $DFFF  RAM and I/O area". Have a look at the directive file for the C64, 'C64.DIR' (found in the ZIP), which I used to disassemble the complete ROM.
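
To make the two formatting rules concrete (four hex digits per address, at least two spaces before a comment), here is a small Python sketch that checks a directive line. It reflects only one reading of the rules stated above, not the disassembler's real parser. The name `parse_directive` is hypothetical, and the sketch handles only the directives that take plain addresses, not EQUI, FILE or NAME.

```python
import re

# Number of addresses each address-only directive expects.
ADDRESS_COUNTS = {"DATA": 2, "DISA": 2, "EXAM": 1, "EXCL": 2, "JUMP": 2,
                  "MARK": 2, "MOVE": 3, "PROG": 2, "VECT": 2, "VECP": 2}

# Keyword, one or more $xxxx addresses, then an optional comment that
# must be separated from the last address by at least two spaces.
LINE = re.compile(r"\s*([A-Za-z]{4})((?:\s+\$[0-9A-Fa-f]{4})+)(?:\s{2,}(.*))?\s*")


def parse_directive(line):
    """Return (keyword, addresses, comment) or raise ValueError."""
    match = LINE.fullmatch(line)
    if match is None:
        raise ValueError(f"malformed directive: {line!r}")
    keyword = match.group(1).upper()
    addresses = [int(field[1:], 16) for field in match.group(2).split()]
    if ADDRESS_COUNTS.get(keyword) != len(addresses):
        raise ValueError(f"unexpected keyword or address count: {line!r}")
    comment = (match.group(3) or "").strip()
    return keyword, addresses, comment


print(parse_directive("EXCL $C000 $DFFF  RAM and I/O area"))
# ('EXCL', [49152, 57343], 'RAM and I/O area')
print(parse_directive("MOVE $E3A2 $E3B9 $0073"))
# ('MOVE', [58274, 58297, 115], '')
```

A line such as "EXAM $E0" is rejected because the address does not have four digits, and a comment preceded by only one space is rejected as well.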




