6502 Auto-disassembler
This disassembler not only disassembles your file but also tries to determine which parts of the program are code and which parts are data by analysing the complete program.
DISCLAIMER
Copyrights
At this point the whole PRG should, in theory, have been split up into areas of code and data. Unfortunately this is usually not the case, due to the already mentioned unsolvable jumps and branches. Encountering an unsolvable subroutine or branch simply means that we will probably be stuck with an area marked as data that in reality is code. A good example is the IRQ-routine of the C64, which starts at $FF48. Depending on a branch, it continues with "JMP ($0316)" at $FF55 or with "JMP ($0314)" at $FF58. In either case the investigation ends, as both addresses are in RAM and their contents are unknown to us. This means that the area starting at $EA31 won't be covered and will be regarded as data. (But how do I know that the area starting at $EA31 is code? Because I have the complete listing of the ROMs of the C64 on paper!)
There are three ways of using the RAM-area to tell the original program where to go. The first one is the one used by the IRQ-routine: fill $0314 with $31 and $0315 with $EA and the PRG will resume the IRQ-routine at $EA31. The second way is using the stack, like the SYS-command does starting at $E130: it pushes the values $E1 and $46 onto the stack, which means that the RTS at the end of the user's program forces the C64 to resume its task at $E147. The third way is copying a part of the ROM to RAM, as is done with the CHRGET-routine. This routine is copied from the area $E3A2/E3B9 to $0073.
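The first two mechanisms come down to simple address arithmetic, which can be sketched as follows. This is only an illustration in Python, not part of the disassembler; the function names and the dictionary standing in for RAM are assumptions made for the example.

```python
def indirect_jmp_target(memory, vector):
    """Target of JMP (vector): low byte at vector, high byte at vector + 1.

    Assumption: the vector does not sit on a page boundary ($xxFF).
    """
    return memory[vector] | (memory[vector + 1] << 8)


def rts_target(pushed_high, pushed_low):
    """Address where execution resumes after RTS: the pulled address plus 1."""
    return (((pushed_high << 8) | pushed_low) + 1) & 0xFFFF


# Hypothetical RAM contents, using the values mentioned in the text.
ram = {0x0314: 0x31, 0x0315: 0xEA}
print(f"${indirect_jmp_target(ram, 0x0314):04X}")  # $EA31
print(f"${rts_target(0xE1, 0x46):04X}")            # $E147
```

A disassembler that does not emulate the program cannot know these RAM or stack contents by itself, which is why they have to be supplied by hand.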
I found a situation that can produce errors. On a C64 you'll find the 'BASIC cold start routine' from $E394 on. This routine ends at $E3A0 with a branch to the 'warm start routine'. In this specific case the branch is always taken. My disassembler assumes that a program can take the branch, but it interprets the instructions behind that branch as valid instructions as well. In this particular case I found out about this branch because I expected the CHRGET-routine to appear as an 'unexamined area', but it didn't. By accident this routine was interpreted as valid code as well.
The story
The problem with normal disassemblers is that they disassemble everything starting from the address you supply. So data will be disassembled as well, which can confuse anyone reading the resulting file. It becomes even worse if the bogus instruction at the end of a data field overlaps the code following this field. In that case you have to start the disassembling process all over again, starting at that specific address. So I had the idea of developing a disassembler that determines (with a little bit of help) which part is code and which part is data.
The idea behind it is simple: give it the start address of the PRG and it will proceed from that address on, analysing the opcodes it encounters. In case we are dealing with a Kernal, we can use the RESET-, IRQ- and NMI-vector as start addresses. If the disassembler encounters a Branch (BEQ, BCS etc.) or a Subroutine (JSR), it saves the address of this location and carries on. At a jump (JMP) it actually goes to the new address, but marks the begin and end address of the area it has just examined. It stops disassembling when it encounters a Return (RTS, RTI), an unsolvable (indirect) jump/branch, or when it reaches an area that has already been examined. A jump/branch is unsolvable when the address where the PRG is supposed to go is not in the area covered by the PRG itself. Example: JMP ($0314) where $0314/5 is RAM. As we don't know the contents of $0314/5, we don't know where the jump actually goes. (Analysing does not mean emulating!)
After being forced to stop, the disassembler looks up the last subroutine or branch it encountered and jumps to the address that subroutine or branch could have gone to. If this address is in an area not covered yet, the disassembler starts examining again. If the area has already been examined, the disassembler goes to the next branch or subroutine in the list. This process goes on until all subroutines and branches it encountered have been investigated.
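The tracing described above can be sketched roughly as follows. This is a minimal illustration in Python, not the disassembler's actual source: the function name trace, the OPCODES table (a tiny subset of the 6502 instruction set) and the sample bytes are all assumptions made for the example.

```python
# Tiny opcode subset for illustration only: opcode -> (length, kind)
OPCODES = {
    0xA9: (2, "plain"),   # LDA #imm
    0x8D: (3, "plain"),   # STA abs
    0xEA: (1, "plain"),   # NOP
    0xF0: (2, "branch"),  # BEQ rel
    0xD0: (2, "branch"),  # BNE rel
    0x20: (3, "call"),    # JSR abs
    0x4C: (3, "jump"),    # JMP abs
    0x6C: (3, "stop"),    # JMP (ind): target unknown without emulation
    0x60: (1, "stop"),    # RTS
    0x40: (1, "stop"),    # RTI
}


def trace(image, base, entry_points):
    """Return the sorted addresses of all bytes found to be code."""
    end = base + len(image)
    code = set()
    pending = list(entry_points)          # saved branch/JSR targets
    while pending:
        pc = pending.pop()                # last saved address first
        # Stop when leaving the image or reaching an examined area.
        while base <= pc < end and pc not in code:
            opcode = image[pc - base]
            if opcode not in OPCODES:
                break                     # not in this sketch's table
            length, kind = OPCODES[opcode]
            if pc + length > end:
                break
            code.update(range(pc, pc + length))
            operand = image[pc - base + 1 : pc - base + length]
            if kind == "branch":
                offset = operand[0] - 256 if operand[0] >= 128 else operand[0]
                pending.append((pc + 2 + offset) & 0xFFFF)
            elif kind == "call":
                pending.append(operand[0] | (operand[1] << 8))
            elif kind == "jump":
                pc = operand[0] | (operand[1] << 8)
                continue                  # follow the JMP right away
            elif kind == "stop":
                break
            pc += length
    return sorted(code)


# Hypothetical program at $C000:
#   LDA #$00 / BEQ $C007 / JMP ($0314) / RTS / two data bytes
image = bytes([0xA9, 0x00, 0xF0, 0x03, 0x6C, 0x14, 0x03, 0x60, 0xFF, 0xFF])
code = trace(image, 0xC000, [0xC000])
data = [a for a in range(0xC000, 0xC000 + len(image)) if a not in code]
print(" ".join(f"${a:04X}" for a in data))  # $C008 $C009
```

In this example the RTS at $C007 is only reached because the BEQ target was saved; the indirect JMP ends the first pass, just like the IRQ example in the text.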
What can we do about it? If no listing is available, we have to go step by step through the already disassembled code to find out where it fills the used RAM areas with data, note down the contents of these areas and then tell the disassembler to use this info when disassembling the PRG again. The advantages of my disassembler over a common 'straight-on' disassembler are marginal in case no listing is available. (Remark: my disassembler can be used as a 'straight-on' disassembler as well!)
In the C64 the area $A052/$A07F contains a list with the addresses of several BASIC-functions. Tell my disassembler of the existence of this list and it will do the rest. At $A00C/$A051 we find another list but these are the actual addresses minus 1. This type of list my disassembler can handle as well.
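Reading such a list can be sketched as follows. This is a small Python illustration under stated assumptions, not the disassembler's own code: the function name read_vector_table and its parameters are made up for the example, each entry is taken to be a two-byte address stored low byte first, and the four sample bytes merely stand in for ROM contents.

```python
def read_vector_table(data, base, start, end, plus_one=False):
    """Return the addresses stored in the table from start to end (inclusive).

    data is the loaded file, base the address of its first byte.
    With plus_one=True each entry is treated as 'address minus 1'.
    """
    targets = []
    for addr in range(start, end, 2):
        offset = addr - base
        word = data[offset] | (data[offset + 1] << 8)
        if plus_one:
            word = (word + 1) & 0xFFFF
        targets.append(word)
    return targets


# Hypothetical two-entry table located at $A00C/$A00F.
sample = bytes([0x30, 0xA8, 0x41, 0xA7])
plain = read_vector_table(sample, 0xA00C, 0xA00C, 0xA00F)
fixed = read_vector_table(sample, 0xA00C, 0xA00C, 0xA00F, plus_one=True)
print([f"${a:04X}" for a in plain])  # ['$A830', '$A741']
print([f"${a:04X}" for a in fixed])  # ['$A831', '$A742']
```

Every address obtained this way can then serve as an extra start address for the examination.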
With a complete listing, the real advantages start to show. With a normal disassembler you have to tell it every time which area to disassemble and which one not. With my disassembler you only have to give it a list of start addresses and compare the 'unexamined areas' with the listing. That is much less work than flipping through the listing looking for every area yourself.
In an 'always branch' situation three things can happen:
How to use it.
You start the disassembler by supplying two or more parameters. One parameter must contain the name of the file to be disassembled and another one must contain the start address: "/F=filename" and "/B=$xxxx".
Extra parameters are:
/A       = Check only the areas, do not disassemble. This will save some
           time when you first want to sort out all areas.
/D=file  = file with disassembling directives
/I       = do NOT search for IRQ routine
/M128    = program is C128 module (not implemented yet)
/M64     = program is C64 module
/N       = do NOT search for NMI routine
/Sxxxx   = address where program starts (else RESET routine)
           /S is ignored when /M is used.
How to use the disassembling directives:
DATA $aaaa $bbbb
Known block of valid data.
DISA $aaaa $bbbb
Block to be disassembled, ignore rest.
EXAM $aaaa
Examine the code starting from $aaaa.
EQUI Labelname = $aa
EQUI Labelname = $aaaa
Instead of using its own generated label names, the disassembler will use the
names supplied by you.
EXCL $aaaa $bbbb
Exclude data within this block from disassembling.
FILE filename $aaaa
Load file at address $aaaa, used for code needed to disassemble the original
file. Example: to disassemble the Kernal ($E000-FFFF), you need the BASIC-ROM
($A000-BFFF) as well.
JUMP $aaaa $bbbb
Known block of JMP commands starting at $aaaa and ending at $bbbb. Mainly used
for the Commodores: the vector-tables at the end of the ROM.
MARK $aaaa $bbbb
Mark block with special remark statement.
MOVE $aaaa $bbbb $cccc
Used for parts in the original code, which will be moved or copied to other
areas in memory by the program like the CHRGET-routine mentioned earlier.
$aaaa $bbbb = range of block to move/copy
$cccc = new address
NAME filename
Name of outputfile(s).
PROG $aaaa $bbbb
Known block of valid program code. Warning: jumps/branches from this area to
other areas that are only used by this area won't be examined! Can be used for
'straight-on' disassembling by declaring (a part of) the whole file as code.
VECT $aaaa $bbbb
Known block of program vectors starting at $aaaa and ending at $bbbb. Mainly
used for the addresses of the BASIC commands of the Commodores.
VECP $aaaa $bbbb
As above, but used for RTS-jumps: the table holds the real addresses minus 1,
so 1 is added to each entry to get the real address.
You MUST use four digits for all addresses. You can add a comment, but you have to place at least two spaces before it, like: "EXCL $C000 $DFFF  RAM and I/O area". Have a look at 'C64.DIR', the directive file for the C64 that I used to disassemble the complete ROM. It can be found in the ZIP.
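The rules above can be illustrated with a small line parser. This is only a sketch in Python under stated assumptions, not the way the disassembler itself reads the file: the function name parse_directive is made up, fields are assumed to be separated by single spaces, and EQUI is assumed to be the only directive that also accepts a two-digit value.

```python
import re

FOUR_DIGITS = re.compile(r"\$[0-9A-Fa-f]{4}")
TWO_OR_FOUR = re.compile(r"\$[0-9A-Fa-f]{2}(?:[0-9A-Fa-f]{2})?")


def parse_directive(line):
    """Split a directive line into (keyword, arguments, comment).

    Everything after the first run of two spaces is treated as comment.
    """
    body, _, comment = line.strip().partition("  ")
    fields = body.split()
    if not fields:
        return None                      # empty line
    keyword = fields[0]
    pattern = TWO_OR_FOUR if keyword == "EQUI" else FOUR_DIGITS
    args = []
    for field in fields[1:]:
        if field.startswith("$"):
            if not pattern.fullmatch(field):
                raise ValueError(f"bad address {field!r} in {line!r}")
            args.append(int(field[1:], 16))
        else:
            args.append(field)           # filename, label name or '='
    return keyword, args, comment.strip()


print(parse_directive("EXCL $C000 $DFFF  RAM and I/O area"))
print(parse_directive("MOVE $E3A2 $E3B9 $0073"))
```

The first call yields the keyword EXCL, the two addresses as numbers and the comment text; the second shows the MOVE directive for the CHRGET-routine mentioned earlier.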
This ZIP contains the source code, the executable and some directive files.