1541: The floppy disk

What is it?

Almost nobody thinks about how data is stored on a floppy disk. Most people only start paying attention to the matter the moment they see the very unpopular message "LOADING ERROR" on the screen. This document gives you the ins and outs of how data is stored on a disk. Part of what you will read here is also covered by "1541: Transferring data".

First there were bits....

All the data is stored as bits on the floppy. For this purpose magnetic particles on the surface of the floppy are magnetised. As long as the little magnets point in the same direction, a "0" is read. The moment the direction is reversed, a "1" is read.
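
The rule above can be sketched in a few lines of Python. This is only a simplified model: it assumes one direction sample per bit position, and the labels "N" and "S" for the two directions are my own invention for the example.

```python
def flux_to_bits(directions):
    """Turn a sequence of magnetisation directions into bits.

    A bit is 1 where the direction differs from the previous position,
    and 0 where it stays the same.
    """
    bits = []
    previous = directions[0]
    for current in directions[1:]:
        bits.append(1 if current != previous else 0)
        previous = current
    return bits

# "N" and "S" are illustrative labels for the two directions
print(flux_to_bits("NNNSSNSS"))
```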

And then there were bytes.

OK, we have the bits but, as you probably know, most of the time we work with bytes. Bytes are made by combining 8 bits in a row. So, where should we start combining them?
The 1541 uses a counter to check at regular intervals if there is a "0" or a "1" under the head. Another counter checks how many bits have been read. The hardware is designed in such a way that reading ten "1"s in a row causes both the counters to be reset. At that moment the drive also stops reading bytes. So the first "0" after ten (or more) "1"s causes both counters to start counting again. From that moment on bytes are read again as well.
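
As a rough software picture of this mechanism, here is a minimal sketch that looks for ten or more "1"s in a row and starts framing bytes at the first "0" that follows. The function name is hypothetical, the real work is done by hardware, and the sketch assumes that the first bit read is the most significant bit of each byte. It only handles the first sync mark it finds.

```python
def read_bytes_after_sync(bits):
    """Frame bytes after the first run of ten or more 1 bits.

    bits is a list of 0/1 values; bytes are assembled most significant
    bit first (an assumption of this sketch).
    """
    ones = 0
    start = None
    for i, bit in enumerate(bits):
        if bit == 1:
            ones += 1
        elif ones >= 10:
            start = i          # this 0 is the first bit of the first byte
            break
        else:
            ones = 0           # run of 1s was too short, start over
    if start is None:
        return []              # no sync mark found, nothing is read
    result = []
    for j in range(start, len(bits) - 7, 8):
        value = 0
        for bit in bits[j:j + 8]:
            value = (value << 1) | bit
        result.append(value)
    return result

stream = [1, 0, 1] + [1] * 10 + [0, 1, 0, 1, 0, 0, 1, 0] + [0, 1, 0, 1, 0, 1, 0, 1]
print([hex(b) for b in read_bytes_after_sync(stream)])
```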

Two or more $FFs in a row will cause the drive to stop reading. I can hear you thinking now: "But if I store 5 $FF bytes in a row on the disk, I can read them again afterwards. How about that?"
The fact is that data that is to be stored on the disk is not stored "as it is". It first goes through an encoding scheme called "binary to GCR (Group Code Recording) conversion". Every nibble (= 4 bits) is turned into a block of 5 bits using the following scheme:

0000 - 01010
0001 - 01011
0010 - 10010
0011 - 10011
0100 - 01110
0101 - 01111
0110 - 10110
0111 - 10111
1000 - 01001
1001 - 11001
1010 - 11010
1011 - 11011
1100 - 01101
1101 - 11101
1110 - 11110
1111 - 10101

After the conversion, eight blocks of five bits each are combined to make five bytes. You can also notice two things:

No combination of two nibbles results in a bit pattern containing more than eight "1"s in a row ($5E = 0111111110). With a maximum of eight "1"s in a row, data can never reset the counters.
No combination of two nibbles results in a bit pattern containing more than two "0"s in a row.
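
To make the conversion and the two observations concrete, here is a minimal Python sketch. The table is copied from the scheme above; the function names are my own, and packing the first nibble into the highest bits is an assumption of the sketch. It encodes four bytes into five and then checks the longest runs of "1"s and "0"s over all pairs of nibbles.

```python
# GCR code for each nibble value 0..15, copied from the table above
GCR = [
    0b01010, 0b01011, 0b10010, 0b10011,
    0b01110, 0b01111, 0b10110, 0b10111,
    0b01001, 0b11001, 0b11010, 0b11011,
    0b01101, 0b11101, 0b11110, 0b10101,
]

def gcr_encode(data):
    """Encode bytes (length a multiple of 4) into GCR bytes, 5 per 4."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(data), 4):
        acc = 0
        for byte in data[i:i + 4]:
            acc = (acc << 5) | GCR[byte >> 4]    # high nibble first
            acc = (acc << 5) | GCR[byte & 0x0F]  # then low nibble
        out += acc.to_bytes(5, "big")            # 8 x 5 bits = 40 bits
    return bytes(out)

def longest_run(bits, symbol):
    """Length of the longest run of symbol in a string of '0'/'1'."""
    best = run = 0
    for ch in bits:
        run = run + 1 if ch == symbol else 0
        best = max(best, run)
    return best

# All pairs of nibbles, written out as 10-character bit strings
pairs = [format(GCR[a], "05b") + format(GCR[b], "05b")
         for a in range(16) for b in range(16)]
print(max(longest_run(p, "1") for p in pairs))
print(max(longest_run(p, "0") for p in pairs))
print(gcr_encode(b"\xff\xff\xff\xff").hex())
```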

Why not more than two "0"s in a row? The reason is that a "1" is also used to synchronise the counters. (synchronise, NOT reset)
If you wrote 100 "0"s in a row followed by a "1" to a disk and then read these bits again with a drive which is 1% faster, you can imagine that you would probably read 99 "0"s and a "1" instead of the original 100 "0"s. I have no idea about the variation in drive speeds but I can imagine that it is more than 1%. More important is that this trick works.
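
The arithmetic behind this example can be written down as a tiny sketch. It is a deliberately simple model of my own: the drive's clock is assumed to be exact and only the rotation speed differs between writing and reading.

```python
def cells_counted(cells_written, speed_ratio):
    """Bit positions the drive counts when the disk turns speed_ratio
    times as fast as it did during writing (simplified model)."""
    return round(cells_written / speed_ratio)

print(cells_counted(100, 1.01))  # the 100-zero example from the text
print(cells_counted(2, 1.01))    # the longest run of zeros GCR allows
```

With at most two "0"s between two "1"s the error never gets the chance to grow to a whole bit.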

Bytes make sectors.

From now on it looks simple: take some bytes together and call this group a sector. Unfortunately this is not the case. For starters: how does the drive know which sector it is dealing with? So the C= designers developed a sector made out of two blocks: the header block and the data block.

The header block
The header block actually contains eight bytes of data:

Header Block ID
This is always $08.
Header Block Checksum
This byte is found by EORing the next four bytes.
Sector Number
Track Number
At first it looks illogical to store the number of the track. But remember that the 1541 has no "track 0" detector (exception: 1541C). The first thing a drive does after powering on and getting a command concerning the floppy is reading a sector header to find out where the head is positioned.
ID Character #2
This is the second character of the ID that you specify when creating a new disk. The drive compares this and the next byte with the bytes in memory to ensure the disk has not been swapped in the meantime.
ID Character #1
Two $0F bytes
These bytes pad the block to eight bytes, as we need (a multiple of) four bytes to create (a multiple of) five GCR bytes.

Once the above eight bytes have been GCRed, another five $55 bytes are added. These are the so-called "Header Gap". These bytes give the drive some time to set up for reading the data block that follows.
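
The eight raw bytes of the header block (before GCR conversion) can be put together like this. It is a minimal sketch; the function name and the way the ID is passed in are my own choices.

```python
def build_header(track, sector, disk_id):
    """Return the eight raw header bytes, before GCR conversion.

    disk_id holds the two ID characters in the order they were given
    when the disk was created.
    """
    id1, id2 = disk_id
    checksum = sector ^ track ^ id2 ^ id1   # EOR of the next four bytes
    return bytes([0x08, checksum, sector, track, id2, id1, 0x0F, 0x0F])

header = build_header(track=18, sector=0, disk_id=b"AB")
print(header.hex())
```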

NOTE: The header block is written ONLY during the formatting process.

The data block
The data block contains 260 bytes of data:

Data Block ID
This is always $07.
256 Data Bytes
Data Block Checksum
This byte is found by EORing the above 256 bytes.
Two $00 bytes
These bytes are needed to complete the required multiple of four bytes.
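
In the same way the 260 raw bytes of a data block can be sketched as follows (again before GCR conversion; the function name is hypothetical).

```python
def build_data_block(data):
    """Return the 260 raw bytes of a data block for 256 data bytes."""
    if len(data) != 256:
        raise ValueError("a data block holds exactly 256 data bytes")
    checksum = 0
    for byte in data:
        checksum ^= byte                    # EOR of all 256 data bytes
    return bytes([0x07]) + bytes(data) + bytes([checksum, 0x00, 0x00])

block = build_data_block(bytes([1, 2, 4]) + bytes(253))
print(len(block), hex(block[257]))
```

Since 260 is a multiple of four, the GCR conversion turns this block into 325 bytes on the disk.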

Between the header block and the data block we'll find five $FF bytes as synchronisation markers. Between sectors you'll find these synchronisation markers as well, but there the number depends on the number of the track and the speed of the drive.

Organising the sectors

Having created the sectors does not mean that the floppy disk is now ready to be used. If we write data to the floppy, we must record somewhere and somehow that there is data on the floppy at all. For this reason some sectors are reserved for special purposes. All others can be used to store data.

The BAM

BAM means "Block Availability Map" and is found at track 18, sector 0. It records which sectors are still free to be used and which are not. The following list shows the meaning of each byte:

Bytes   Content  Meaning
-------  -------  ---------------------------------------------------
  0        $12    Track where first directory sector can be found
  1        $01    Sector where first directory sector can be found
  2        "A"    Indication of drive format; 1541/4040 in this case
  3         0     Unused
  4-143           Block Availability Map
144-159           Diskette name padded with shifted spaces (= $A0)
160-161    $A0    Shifted spaces
162-163           Diskette ID
164        $A0    Shifted space
165        $32    DOS version: 2
166        "A"    Format type
167-170    $A0    Shifted spaces
171-255     ?     Unused
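
Reading the fixed fields from this sector could look like the following minimal sketch. The function and key names are my own, and the sample sector is built by hand just for the example. The name is kept as raw bytes because the disk does not store plain ASCII.

```python
def parse_bam_header(sector):
    """Pick the fixed fields out of the 256 bytes of track 18, sector 0."""
    return {
        "dir_track": sector[0],
        "dir_sector": sector[1],
        "format": chr(sector[2]),
        "name": bytes(sector[144:160]).rstrip(b"\xa0"),  # drop the padding
        "disk_id": bytes(sector[162:164]),
        "dos_version": chr(sector[165]),
        "format_type": chr(sector[166]),
    }

# A hand-made sample sector, not taken from a real disk
sample = bytearray(256)
sample[0:3] = bytes([0x12, 0x01, 0x41])
sample[144:160] = b"TEST".ljust(16, b"\xa0")
sample[162:164] = b"AB"
sample[165:167] = b"2A"
print(parse_bam_header(sample))
```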

Block Availability Map - entries
For every track four bytes are reserved. The first byte indicates the number of free sectors on this track. Bit 0 to bit 7 of the second byte represent the state of the first 8 sectors of the track. Bit 0 to bit 7 of the third byte represent the state of sectors 8 to 15. Finally the lowest bits of the fourth byte represent the state of the last 1 to 5 sectors of the track, starting with bit 0 for sector 16. If the bit is "1" it means the sector is still free. A "0" means allocated or non-existent.
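
A minimal sketch of decoding one of these four-byte entries follows. The function name is hypothetical. The offset 4 * track follows from the map starting at byte 4 with the entry for track 1. The sample values describe a track with 19 sectors of which sectors 0 and 1 are in use.

```python
def free_sectors(bam_sector, track):
    """Return (free count, list of free sector numbers) for a track."""
    offset = 4 * track                      # track 1 starts at byte 4
    count = bam_sector[offset]
    bitmap = bam_sector[offset + 1:offset + 4]
    free = [s for s in range(24) if bitmap[s // 8] & (1 << (s % 8))]
    return count, free

sample = bytearray(256)
sample[72:76] = bytes([17, 0xFC, 0xFF, 0x07])   # entry for track 18
print(free_sectors(sample, 18))
```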

The directory

The general pattern for a directory sector is:

Bytes   Content  Meaning
-------  -------  ---------------------------------------------------
  0        $12    Track where next sector can be found
  1               Next sector
  2- 31           File entry #1
 32- 33     0     Unused
  ...
224-225     0     Unused
226-255           File entry #8

As the directory must end somewhere, byte 0 of the last sector is filled with a 0. Byte 1 is filled with $FF. This byte informs the system how many bytes of this sector are actually used. In case of a directory sector all bytes are used.
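
Cutting a directory sector into its link bytes and eight file entries can be sketched like this (the function name is my own). Every entry starts 32 bytes after the previous one and is 30 bytes long.

```python
def directory_entries(sector):
    """Split a 256-byte directory sector into link bytes and 8 entries."""
    link = (sector[0], sector[1])
    entries = [bytes(sector[2 + 32 * i:32 + 32 * i]) for i in range(8)]
    return link, entries

sample = bytearray(256)
sample[1] = 0xFF          # last directory sector: byte 0 is 0, byte 1 is $FF
sample[226] = 0x82        # file type byte of entry #8
link, entries = directory_entries(sample)
print(link, len(entries), hex(entries[7][0]))
```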

File entry
The entry consists of 30 bytes. The first one is the file type byte. Bits 0 to 2 determine the type of file. The use of bits 3 and 4 is unknown to me. Bit 5 determines if it is a replacement file (default 0). Bit 6 determines if it is locked (= write protected) or not (default 0 = unlocked). Bit 7 determines if the file is closed (= free to use) or not (default 1 = closed).

HEX      File type                 Directory shows
---      ---------                 ----------------------
$00      Scratched                 Does not show
$01      Unclosed sequential       *SEQ
$02      Unclosed program          *PRG
$03      Unclosed user             *USR
$04      Unclosed relative         Cannot occur
$80      Deleted                   DEL
$81      Sequential                SEQ
$82      Program                   PRG
$83      User                      USR
$84      Relative                  REL
$A0      Deleted @ replacement     DEL
$A1      Sequential @ replacement  SEQ
$A2      Program @ replacement     PRG
$A3      User @ replacement        USR
$A4      Relative @ replacement    Cannot occur
$C0      Locked deleted            DEL<
$C1      Locked sequential         SEQ<
$C2      Locked program            PRG<
$C3      Locked user               USR<
$C4      Locked relative           REL<

The next two bytes show the track and sector of the first sector of the file.
The next 16 bytes are used for storing the filename. If the name is shorter than 16 bytes, the rest is filled with shifted spaces (= $A0).
The next three bytes have only a meaning for relative files. The first two give you the information where to find the track and sector of the "Side sector information". The third byte is the record size of each entry.
The next four bytes are unused and filled with $00.
The next two bytes are only used while saving the file using the replace option (@). During the actual saving these two bytes point to the track/sector of the first sector of the replacement. When the saving is finished, they are copied over the track/sector bytes at the start of the entry. After this replacement the two bytes are set to zero.
The last two bytes represent the size of the file in sectors.
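
Putting the whole entry together, a minimal decoding sketch could look like this. The function and key names are my own. The offsets simply follow the order described above, counting the file type byte as offset 0. That the size is stored low byte first is an assumption the text above does not spell out.

```python
TYPE_NAMES = ["DEL", "SEQ", "PRG", "USR", "REL"]

def parse_entry(entry):
    """Decode one 30-byte directory entry into a dictionary."""
    type_byte = entry[0]
    kind = type_byte & 0x07                   # bits 0 to 2: file type
    return {
        "type": TYPE_NAMES[kind] if kind < len(TYPE_NAMES) else "???",
        "replacement": bool(type_byte & 0x20),   # bit 5
        "locked": bool(type_byte & 0x40),        # bit 6
        "closed": bool(type_byte & 0x80),        # bit 7
        "first_track": entry[1],
        "first_sector": entry[2],
        "name": bytes(entry[3:19]).rstrip(b"\xa0"),
        "side_track": entry[19],                 # REL files only
        "side_sector": entry[20],
        "record_size": entry[21],
        "replace_track": entry[26],              # offsets 22-25 are unused
        "replace_sector": entry[27],
        "size": entry[28] | (entry[29] << 8),    # assumed: low byte first
    }

# A hand-made entry: locked program "HELLO" starting at track 17, sector 0
sample = (bytes([0xC2, 17, 0]) + b"HELLO".ljust(16, b"\xa0")
          + bytes(9) + bytes([5, 1]))
print(parse_entry(sample))
```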

The files and their file types

A normal file is made up of one or more sectors. Every sector has, as you already know, 256 bytes. The first two bytes of every sector point to the track/sector of the next sector. In the last sector of the file (which is also the first one if the file is just one sector long), the first byte is $00. As with a directory sector, the second byte then informs the system how many bytes of the last sector are actually used by the file.
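
Following such a chain of sectors can be sketched as below. The dictionary that maps (track, sector) to 256 bytes is a stand-in of my own for whatever way you have of reading sectors. The sketch also assumes that the second byte of the last sector gives the position of the last used byte, which fits the $FF found in a full directory sector.

```python
def read_file(sectors, track, sector):
    """Collect the data bytes of a file by following the two-byte links.

    sectors is a dictionary mapping (track, sector) to 256 bytes,
    a hypothetical stand-in for a real disk or disk image.
    """
    data = bytearray()
    seen = set()
    while track != 0:
        if (track, sector) in seen:
            raise ValueError("circular link")
        seen.add((track, sector))
        block = sectors[(track, sector)]
        next_track, next_sector = block[0], block[1]
        if next_track == 0:
            data += block[2:next_sector + 1]   # last sector: partly used
        else:
            data += block[2:]                  # all 254 data bytes
        track, sector = next_track, next_sector
    return bytes(data)

first = bytes([17, 1]) + bytes([0xAA]) * 254
last = bytes([0, 4]) + b"ABC" + bytes(251)
image = {(17, 0): first, (17, 1): last}
print(len(read_file(image, 17, 0)))
```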

Nothing stops you from setting up a completely different scheme for linking sectors together to make a file, so that you can use all 256 bytes. But then do NOT validate such a disk: the drive operating system relies on this two-byte linking system. Without valid link bytes, the DOS will probably free and allocate the wrong sectors.

SEQ - sequential file
A SEQ file can contain all kinds of information, varying from text to database records to binary data. The normal way to approach this file is reading the data starting at the very first byte. The normal procedure to alter data is rewriting the file completely.

PRG - program
You could say that a PRG file is a SEQ file containing machine code and/or BASIC statements. The big difference is that the third and fourth byte of the first sector store the memory address where the rest of the data has to be loaded. This is the address used when performing a LOAD"",,1.
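
Picking the load address out of the first sector takes one line. The function name is my own, and the sketch assumes the usual 6502 order of low byte first.

```python
def prg_load_address(first_sector):
    """Return the load address stored in bytes 2 and 3 of the first sector."""
    return first_sector[2] | (first_sector[3] << 8)   # low byte first

sample = bytes([0, 0xFF, 0x01, 0x08]) + bytes(252)
print(hex(prg_load_address(sample)))
```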

USR - user file
The only program I know that uses USR files is GEOS. And that is the only info I can give you. The books I read about this subject only said that the user is free to do whatever he likes with the contents of this file as long as he uses the two-byte link system.

DEL - deleted file
Even less is mentioned about this file type. My guess is that it is meant to be used as an extra phase before deleting the file when using the "replace option". I have only seen people using this file type for inserting headers, footers, separators and other nice features in their directory structure.

REL - Relative file
This is the toughest one and so I kept it for last.
Technically a REL file consists of two files: a sequential file containing the records, and a "side sector" file containing the pointers to the sectors that hold the records. As already said, bytes 20 and 21 of a directory entry (counting the file type byte as byte 1) point to this "side sector" file. The track/sector bytes directly after the file type byte point to the sequential part.

A side sector is built according to the following scheme:

 Bytes    Meaning
-------   -----------------------------------------------------------
  0       Track where next sector can be found
  1       Next sector
  2       Number of this side sector
  3       Record size
  4- 15   T/S bytes of up to 6 side sectors (both 0 when unused)
 16-255   T/S bytes of 120 data blocks (both 0 when unused)
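
Reading the track/sector pairs out of a side sector can be sketched like this. The function and key names are my own, and pairs that are both 0 are skipped as unused.

```python
def parse_side_sector(sector):
    """Return the link and the T/S pairs stored in a side sector."""
    def pairs(start, end):
        found = []
        for i in range(start, end, 2):
            track, sec = sector[i], sector[i + 1]
            if track != 0 or sec != 0:       # both 0 means unused
                found.append((track, sec))
        return found

    return {
        "next": (sector[0], sector[1]),
        "side_sectors": pairs(4, 16),        # up to 6 pairs
        "data_blocks": pairs(16, 256),       # up to 120 pairs
    }

sample = bytearray(256)
sample[4:6] = bytes([17, 3])
sample[16:20] = bytes([17, 4, 17, 5])
print(parse_side_sector(sample))
```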
