What is it
Preferring Pascal to C, I decided to write a Pascal compiler for my C64. Yes, there are various compilers available, like Oxford Pascal and the Data-Becker version, but I wanted one with the capabilities of Borland's Turbo Pascal, written using Borland's Turbo Pascal 7.0 (TP7).
The first idea was to write it in machine language (ML), but that changed into writing it in Pascal, with the goal that, in the end, it should be capable of compiling itself. The main idea was that it should generate a binary and should be able to do so for the various Commodores: C64, PET, CBM, Plus/4, etc. I even had some Z80 machines in mind. This meant that I had to reserve program lines for every type of machine for every component, and that was a bit too much.
Then I got the great idea not to generate binaries but only macros! Macros can be completely independent of the machine and/or processor used. The only thing I had to do was tell the assembler which macro file to use when translating the macros into ML. From this arose the idea that I also wanted it to run on a PC.
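A minimal sketch of that idea, in Python for brevity. The macro names (`LOAD_CONST`, `PRINT_CHAR`) and the two expansion tables are invented for illustration and are not my actual macro files; the point is only that the compiler output stays the same, while the table chosen at assembly time decides the machine code.

```python
# Hypothetical macro tables, one per target: macro name -> assembler templates.
# All names here are illustrative assumptions.
MACROS_6502 = {
    "LOAD_CONST": ["lda #{0}"],
    "PRINT_CHAR": ["jsr $ffd2"],               # CHROUT in the C64 KERNAL
}
MACROS_8088 = {
    "LOAD_CONST": ["mov al,{0}"],
    "PRINT_CHAR": ["mov ah,0eh", "int 10h"],   # BIOS teletype output
}

def expand(macro_lines, table):
    """Expand 'NAME arg1,arg2' macro calls using the given target table."""
    out = []
    for line in macro_lines:
        name, _, rest = line.partition(" ")
        args = [a.strip() for a in rest.split(",")] if rest else []
        for template in table[name]:
            out.append(template.format(*args))
    return out

# The compiler emits this once, independent of the target machine.
program = ["LOAD_CONST 65", "PRINT_CHAR"]
```

Calling `expand(program, MACROS_6502)` and `expand(program, MACROS_8088)` turns the same two macro lines into 6502 and 8088 instructions respectively.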
At this moment I'm writing the macro files for the 8088. I started with writing them for the C64, but testing the resulting executable was much simpler for the PC version. Still a lot of work to do.
Another version
Although it did not work perfectly, I had already proved that the idea worked. Unfortunately there were several challenges:
- The EXE of the compiler generated by the Free Pascal Compiler (FPC) was 291 KB in size. I'm quite sure that even a wizard cannot turn this source code into an executable that can run on a plain C64.
- I just mentioned FPC. I started with TP7, but I found that I needed more and more memory as I developed the compiler to the point where it could compile itself. TP7 could no longer provide that in an easy way; FPC could. A plain C64 cannot provide this amount of memory at all.
- And a bit frustrating: TURBO.COM is only 34 KB. OK, it isn't TP7 and it lacks quite a few features supported by TP7, but still.
So I decided to start on an ML version as well, but one that should run on an 8088 PC. I was already working on my own operating system and BASIC for the 8088 PC, so I already had a good source of useful ML routines. The common thread of the whole design would be the same as in the version written in Pascal.
Some time ago I was able to lay my hands on a program that disassembles Borland's Turbo Pascal 3.0 (TP3), including comments. The resulting source code was meant for MASM/TASM but, preferring NASM, I changed it into one that could be assembled by NASM. In the end I was able to produce compatible code of the same size as the original TURBO.COM, and it ran fine. "Compatible" because there are assembly instructions that can be encoded in two different ways. So a piece of code assembled by MASM or TASM will do exactly the same as when it is assembled by NASM; only the generated bytes will partly look different to a human. A disassembler would produce the same source code from either.
Unfortunately I wasn't able to change the code very much; even adding some innocent NOPs caused the program not to run at all. To make a long story short, I found out that the author of the disassembly program had made a mistake, and I fixed that.
As said, the resulting source code included comments, but that did not mean it was easy to understand. So while starting to understand the code, I not only added my own comments but also replaced the sometimes cryptic labels with more understandable ones.
And now the funny part: it seems that the way my FPC program interprets the source code is more or less the same as the way TP3 does it! But there are also differences, and the main one is that TP3 generates the binary code more or less right on the spot. The way TP3 does it also has its disadvantages. The main one is that all possibly needed subroutines have to be present, even if they are not used. This is the reason that a source file containing just "begin" and "end." generates a COM file that is still 10 KB in size. I solved this in my FPC program by adding the related subroutines to the macro file only when a routine that needs them is called.
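The on-demand approach can be sketched as follows, again in Python. The routine names and the dependency table are hypothetical placeholders, not the real runtime library; the sketch only shows how a routine and the helpers it depends on are pulled in when, and only when, the program calls it.

```python
# Hypothetical runtime library: routine name -> (assembler body, helpers it calls).
RUNTIME = {
    "WRITE_STR": ("write_str:   ; body omitted", ["WRITE_CHAR"]),
    "WRITE_CHAR": ("write_char:  ; body omitted", []),
    "READ_LINE": ("read_line:   ; body omitted", []),
}

def collect_needed(called):
    """Return the runtime routines reachable from the ones the program calls."""
    needed, todo = [], list(called)
    while todo:
        name = todo.pop()
        if name not in needed:
            needed.append(name)
            todo.extend(RUNTIME[name][1])   # helpers must be emitted as well
    return sorted(needed)
```

An empty program yields an empty list, so nothing from the runtime ends up in the output; a program that only calls `WRITE_STR` gets `WRITE_STR` and `WRITE_CHAR`, but not `READ_LINE`.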
Is it possible to use such a mechanism in the ML version? The problem is that in this case we are looking at two growing heaps of code: the one with subroutines and the one with main code. The Pascal version doesn't mind, because it is the assembler that does the final job and it only sees one finished file of source code. Assume the final code starts with the block of subroutines, followed by the main code. Adding another subroutine means that all jumps and calls generated earlier for the main code have to be recalculated. As you may have guessed, using a fixed block of routines, as TP3 does, avoids this problem.
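A small numeric illustration of the recalculation problem, assuming absolute addresses and made-up routine sizes. The load address 0x0100 is where a COM file starts; everything else is hypothetical.

```python
def main_code_start(load_address, routine_sizes):
    """Main code follows the subroutine block, so it starts after all routines."""
    return load_address + sum(routine_sizes)

def absolute_target(load_address, routine_sizes, offset_in_main):
    """Absolute address of a jump target located inside the main code."""
    return main_code_start(load_address, routine_sizes) + offset_in_main

# The same jump target, 10 bytes into the main code:
before = absolute_target(0x0100, [40, 25], 10)       # two routines so far
after = absolute_target(0x0100, [40, 25, 30], 10)    # a third routine is added
```

The target moves by exactly the size of the added routine (30 bytes here), so every absolute address already written into the main code would be stale. With a fixed-size block of routines, `main_code_start` never changes.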
Yet another version: Turbo Pascal based
One thought, or rather two thoughts, would not leave my mind:
- Can the FPC version be compiled by itself under CBM-OS or PascalOS?
- Using the same mechanism in the ML version as used in the FPC version, is it still possible to compile it under CBM-OS or PascalOS?
So the idea arose to start a new version from scratch again, but one that can be compiled by Turbo Pascal. Why from scratch? First: it won't be completely from scratch; I will certainly reuse the interpreter. I only want to use some other ideas for how to store the results. At present everything is stored in memory. I figured out that when the compiler runs into a line like "writeln('Hello!');", the "Hello!" part can be written directly into the resulting assembler file as a constant, which means that it doesn't need to be stored in memory at all. Declared constants still have to be kept, because the compiler has to know them when they are used by other items.
Thoughts/ideas that crossed my mind:
- Turbo and Free Pascal store parameters on the stack. The 6502 only has a small stack, just 256 bytes, and can never support this mechanism. So I decided to use a separate stack. I will maintain this idea because IMHO it should simplify programming.
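A rough model of such a separate parameter stack, written in Python rather than 6502 code. The class name, the downward growth direction and the start address 0xC000 are assumptions made for this sketch; the real implementation may differ.

```python
class ParamStack:
    """Software stack in ordinary memory, addressed by a 16-bit pointer."""

    def __init__(self, memory, top):
        self.memory = memory      # bytearray standing in for the machine's RAM
        self.sp = top             # points just above the most recent byte

    def push_word(self, value):
        # Low byte goes to the lower address (little-endian, as on the 6502).
        self.sp -= 2
        self.memory[self.sp] = value & 0xFF
        self.memory[self.sp + 1] = (value >> 8) & 0xFF

    def pop_word(self):
        value = self.memory[self.sp] | (self.memory[self.sp + 1] << 8)
        self.sp += 2
        return value

ram = bytearray(0x10000)              # 64 KB of simulated memory
stack = ParamStack(ram, 0xC000)       # hypothetical location for the stack
stack.push_word(0x1234)
stack.push_word(7)
```

Because the pointer is 16 bits wide, this stack is limited only by the free memory below its start address, not by the 256 bytes of the hardware stack in page 1.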
The original compiler does its work in one pass. The idea is to do things in two passes.
- The first pass looks where all declared constants, variables, functions and procedures can be found. At the end of that pass it should be clear which of these items are really used. This information is stored in memory, but if it turns out that this consumes too much memory, I will store it in a file. Directly used constants, like the one in "writeln('Hello!');", are not handled yet.
- In the second pass I will store all variables used by the main program and units in one of the assembler files in one go. All constants, whether used by the program, units, functions or procedures, will be stored here as well. Variables used by units and functions are stored on the artificial stack, and I think I can use the mechanism from the Free Pascal compiler here as well. An ID is given to the program and to each procedure, function or unit when running into one. When storing the information we take care of the following things:
- each unit has a unique name
- the main file or a unit uses units
- procedures and functions are a part of the main file or a unit.
- procedures and functions can be a part of another procedure or function.
- each function or procedure is unique within its program, unit, function or procedure
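The bookkeeping for the first pass could look roughly like the sketch below. The `Scope` class and its method names are hypothetical; it only illustrates nested name spaces (program, unit, procedure, function) in which each name is unique and gets a "used" flag. Case-insensitive name matching, which Pascal requires, is left out for brevity.

```python
class Scope:
    """A program, unit, procedure or function with its own name space."""

    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent
        self.items = {}           # declared name -> used flag

    def declare(self, name):
        if name in self.items:
            raise ValueError(f"{name} already declared in {self.name}")
        self.items[name] = False

    def mark_used(self, name):
        # Search this scope first, then the enclosing ones.
        scope = self
        while scope is not None:
            if name in scope.items:
                scope.items[name] = True
                return scope
            scope = scope.parent
        raise KeyError(name)

    def used(self):
        return sorted(n for n, flag in self.items.items() if flag)
```

After the first pass, `used()` tells the second pass which items of a scope actually need to be written to the assembler file.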
A simple TYPE can be stored by its name and total length in memory. But what about RECORDs? MP-ASM and ASM86 don't support structures (yet). A possible solution is to calculate the offset of each part of a record, as in "mov ax,2REC+12".
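Calculating the offsets could be sketched like this. The size table covers only a few simple types (with their Turbo Pascal 7 sizes), and the function and label names are made up for the example; fields are assumed to be packed without padding.

```python
# Sizes in bytes of a few simple types, as in Turbo Pascal 7.
SIZES = {"byte": 1, "char": 1, "integer": 2, "word": 2}

def field_offsets(fields):
    """Return ({field: offset}, total size) for a record without padding."""
    offsets, position = {}, 0
    for name, type_name in fields:
        offsets[name] = position
        position += SIZES[type_name]
    return offsets, position

def operand(label, offsets, field):
    """Assembler operand for a field, e.g. 'REC+4' (label is hypothetical)."""
    return f"{label}+{offsets[field]}"

point = [("x", "integer"), ("y", "integer"), ("flag", "byte")]
offsets, total = field_offsets(point)
```

The compiler would then emit `operand("REC", offsets, "flag")` wherever the source refers to that field, so the assembler never needs to know about structures.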
The next question is how to store RECORDs and CASE inside a type. My idea: create a separate type inside the found type for these. And what about the use of CASE inside a type? My idea: reserve the maximum amount of memory for a given variable, thus covering any possible length caused by the CASE in other parts of the program.
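Both ideas, nested types and reserving the maximum for a CASE, can be expressed in one recursive size calculation. The representation used here (strings for simple types, lists for records, a `("case", variants)` tuple for the variant part) is an assumption made for this sketch only.

```python
SIZES = {"byte": 1, "integer": 2}   # simple types, sizes in bytes

def size_of(type_def):
    """Size of a type given as a name, a field list, or a ('case', variants) tuple."""
    if isinstance(type_def, str):
        return SIZES[type_def]
    if isinstance(type_def, tuple) and type_def[0] == "case":
        # Variants overlap in memory, so reserve room for the largest one.
        return max(size_of(v) for v in type_def[1])
    # A list is a record: its fields follow each other in memory.
    return sum(size_of(field_type) for _, field_type in type_def)

# A record with a tag field and two variants of different length.
shape = [
    ("kind", "byte"),
    ("body", ("case", [
        [("radius", "integer")],
        [("width", "integer"), ("height", "integer")],
    ])),
]
```

Here `size_of(shape)` is 1 byte for the tag plus 4 bytes for the larger variant, so 5 bytes are reserved regardless of which variant the program uses.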