Everyone who had once owned a ZX Spectrum, surely had at least once seen an amazing loading effect and wondered: How are they doing this? So, come and have a read… It will be a long story and perhaps you might even learn something.
Because ZX Spectrum didn’t have any dedicated circuit for storing and retrieving programs to and from a cassette tape (such as PMD), it had to do it purely by software. At the highest level, programs were loaded as a pair of “header + data”. In the header file there was information about the file (name, type, length, etc.) and the data file contained actual custom data.
The header file has a length of 17 bytes and a following structure:
|0||1||Type (0=program, 1=number array, 2=character array, 3=code)|
|1||10||Name (right padded with spaces to 10 characters)|
|11||2||Length of data block|
|13||2||Parameter 1. Eg. for programs it is a parameter LINE, for ‘code’ it is the address of beginning of the code block|
|15||2||Parameter 2. Eg. for programs it is the beginning of variables|
Data block doesn’t have any such structure, it just contains data.
Let us descend one level: Each block was preceded by one flag byte ($00 for a header, $ff for data) and terminated with a checksum (all bytes, including the flag byte, XORed together). Header was therefore actually 19 bytes long – it began with 0 and ended with the checksum, but these bytes were “consumed” by the system and you couldn’t get your hands on them.
This is where the level at which you could get by calling routines from the ROM ends. The rest of it was just
Ones and zeroes
Data on the tape was written as a sequence of pulses. The processor took care of generating of these pulses, including the proper timing.
Each entry was preceded by a so-called pilot tone. It was created by a rectangular signal (where ON and OFF states were regularly alternated), which means that MIC output in the ON state was 2168 T long (T is the processor clock), and in the OFF state, 2168 T long as well. Boot tone prepared input circuits (if there were any) in a tape recorder to the correct volume level. This boot tone before the header lasted 5 seconds and before the data block two seconds. The duration of the phase tone was determined by a flag byte, all values less than 128 meant a “long” phase tone and more than 128 meant a short one.
After the pilot tone a sync pulse was generated. Its purpose was to let the loading routine know that the pilot tone ends and to begin to process the data. Sync pulse was 667 T ON and 735 T OFF long.
The sync pulse was then followed by actual bytes of data. First was the flag byte, then the contents, and eventually checksum. Bytes were sent bit by bit from the highest one. Every bit was represented by a pulse (state switching to ON and OFF), but differed in length. For logical 0 it was 855T in the ON state and 855T in the OFF state, for logical 1 it was doubled, thus 1710T and 1710T, respectively.
When reading data (which is what we are after here) processor then measured the time between each change of the input signal (ie. time between transitions, or the so-called “Edges”). The recording was therefore not sensitive to polarity (such as for aforementioned PMD in the first version), however the processor did not have much time to do anything else, since it had to listen to EAR input, count loops until the signal changed, and check for SPACE key being pressed (which interrupted the loading, as you surely remember).
Now this is the moment to extract the ROM contents (taken from ‘The Complete Spectrum ROM Disassembly’ by Dr. Ian Logan and Dr. Frank O’Hara, as published by Melbourne House in 1983, available at WOS):
THE 'LD-BYTES' SUBROUTINE
This subroutine is called to LOAD the header information (from 076E) and later
LOAD, or VERIFY, an actual block of data (from 0802).
0556 LD-BYTES INC D This resets the zero flag (D cannot
EX AF,AF' The A register holds +00 for a header
and +FF for a block of data
The carry flag is reset for VERIFYing
and set for LOADing
DEC D Restore D to its original value
DI The maskable interrupt is now disabled
LD A,+0F The border is made WHITE
LD HL,+053F Pre-load the machine stack with the
PUSH HL address - SA/LD-RET
IN A,(+FE) Make an initial read of port 254
RRA Rotate the byte obtained
AND +20 but keep only the EAR bit
OR +02 Signal RED border
LD C,A Store the value in the C register
(+22 for 'off' and +02 for 'on' - the
present EAR state)
CP A Set the zero flag
The first stage of reading a tape involves showing that a pulsing signal
actually exists. (i.e. 'On/off' or 'off/on' edges.)
056B LD-BREAK RET NZ Return if the BREAK key is being pressed
056C LD-START CALL 05E7,LD-EDGE-1 Return with the carry flag reset if
JR NC,056B,LD-BREAK there is no 'edge' within approx.
14,000 T states. But if an 'edge' is
found the border will go CYAN
The next stage involves waiting a while and then showing that the signal is
LD HL,+0415 The length of this waiting period will
0574 LD-WAIT DJNZ 0574,LD-WAIT be almost one second in duration.
CALL 05E3,LD-EDGE-2 Continue only if two edges are found
JR NC,056B,LD-BREAK within the allowed time period.
Now accept only a 'leader signal'.
0580 LD-LEADER LD B,+9C The timing constant
CALL 05E3,LD-EDGE-2 Continue only if two edges are found
JR NC,056B,LD-BREAK within the allowed time period
LD A,+C6 However the edges must have been found
CP B within about 3,000 T states of each
JR NC,056C,LD-START other
INC H Count the pair of edges in the H
JR NZ,0580,LD-LEADER register until 256 pairs have been found
After the leader come the 'off' and 'on' parts of the sync pulse.
058F LD-SYNC LD B,+C9 The timing constant
CALL 05E7,LD-EDGE-1 Every edge is considered until two edges
JR NC,056B,LD-BREAK are found close together - these will be
LD A,B the start and finishing edges of the
CP +D4 'off' sync pulse
CALL 05E7,LD-EDGE-1 The finishing edge of the 'on' pulse
RET NC must exist
(Return carry flag reset)
The bytes of the header or the program/data block can now be LOADed or VERIFied.
But the first byte is the flag byte.
LD A,C The border colours from now on will be
XOR +03 BLUE & YELLOW
LD H,+00 Initialize the 'parity matching' byte
LD B,+B0 Set the timing constant for the flag
JR 05C8,LD-MARKER Jump forward into the byte LOADing loop
The byte LOADing loop is used to fetch the bytes one at a time. The flag byte is
first. This is followed by the data bytes and the last byte is the 'parity'
05A9 LD-LOOP EX AF,AF' Fetch the flags
JR NZ,05B3,LD-FLAG Jump forward only when handling the
JR NC,05BD,LD-VERIFY Jump forward is VERIFYing a tape
LD (IX+00),L Make the actual LOAD when required
JR 05C2,LD-NEXT Jump forward to LOAD the next byte
05B3 LD-FLAG RL C Keep the carry flag in a safe place
XOR L Return now if the flag byte does not
RET NZ match the first byte on the tape
(Carry flag reset)
LD A,C Restore the carry flag now
INC DE Increase the counter to compensate for
JR 05C4,LD-DEC its decrease after the jump
If a data block is being verified then the freshly loaded byte is tested against
the original byte.
05BD LD-VERIFY LD A,(IX+00) Fetch the original byte
XOR L Match it against the new byte
RET NZ Return if 'no match' (Carry flag reset)
A new byte can now be collected from the tape.
05C2 LD-NEXT INC IX Increase the 'destination'
05C4 LD-DEC DEC DE Decrease the 'counter'
EX AF,AF' Save the flags
LD B,+B2 Set the timing constant
05C8 LD-MARKER LD L,+01 Clear the 'object' register apart from
a 'marker' bit
The 'LD-8-BITS' loop is used to build up a byte in the L register.
05CA LD-8-BITS CALL 05E3,LD-EDGE-2 Find the length of the 'off' and 'on'
pulses of the next bit
RET NC Return if the time period is exceeded
(Carry flag reset)
LD A,+CB Compare the length against approx.
CP B 2,400 T states; resetting the carry flag
for a '0' and setting it for a '1'
RL L Include the new bit in the L register
LD B,+B0 Set the timing constant for the next bit
JP NC,05CA,LD-8-BITS Jump back whilst there are still bits to
The 'parity matching' byte has to be updated with each new byte.
LD A,H Fetch the 'parity matching' byte and
XOR L include the new byte
LD H,A Save it once again
Passes round the loop are made until the 'counter' reaches zero. At that point
the 'parity matching' byte should be holding zero.
LD A,D Make a furter pass if the DE register
OR E pair does not hold zero
LD A,H Fetch the 'parity matching' byte
CP +01 Return with the carry flag set if the
RET value is zero (Carry flag reset if in
THE 'LD-EDGE-2' and 'LD-EDGE-1' SUBROUTINES
These two subroutines form the most important part of the LOAD/VERIFY operation.
The subroutines are entered with a timing constant in the B register, and the
previous border colour and 'edge-type' in the C register.
The subroutines return with the carry flag set if the required number of 'edges'
have been found in the time allowed; and the change to the value in the B
register shows just how long it took to find the 'edge(s)'.
The carry flag will be reset if there is an error. The zero flag then signals
'BREAK pressed' by being reset, or 'time-up' by being set.
The entry point LD-EDGE-2 is used when the length of a complete pulse is
required and LD-EDGE-1 is used to find the time before the next 'edge'.
05E3 LD-EDGE-2 CALL 05E7,LD-EDGE-1 In effect call LD-EDGE-1 twice;
RET NC returning in between in there is an
05E7 LD-EDGE-1 LD A,+16 Wait 358 T states before entering the
05E9 LD-DELAY DEC A sampling loop
The sampling loop is now entered. The value in the B register is incremented for
each pass; 'time-up' is given when B reaches zero.
05ED LD-SAMPLE INC B Count each pass
RET Z Return carry reset & zero set if
LD A,+7F Read from port +7FFE
IN A,(+FE) i.e. BREAK and EAR
RRA Shift the byte
RET NC Return carry reset & zero reset if BREAK
XOR C Now test the byte against the 'last
AND +20 edge-type'
JR Z,05ED,LD-SAMPLE Jump back unless it has changed
A new 'edge' has been found within the time period allowed for the search.
So change the border colour and set the carry flag.
LD A,C Change the 'last edge-type' and border
AND +07 Keep only the border colour
OR +08 Signal 'MIC off'
OUT (+FE),A Change the border colour (RED/CYAN or
SCF Signal the successful search before
Note: The LD-EDGE-1 subroutine takes 464 T states, plus an additional 59 T
states for each unsuccessful pass around the sampling loop.
For example, therefore, when awaiting the sync pulse (see LD-SYNC at 058F)
allowance is made for ten additional passes through the sampling loop.
The search is thereby for the next edge to be found within, roughly, 1,100 T
states (464 + 10 * 59 overhead).
This will prove successful for the sync 'off' pulse that comes after the long
In this article there is no space for detailed description of how the algorithm works, but let’s quickly go over some facts.
Routine disables interrupts. This is logical, since it is dependent on exact timing. That’s also the reason why loaders stored in slow RAM ($4000- $7FFF) will not work.
The last state of EAR input is stored in the C register, but it is shifted by 1 bit to the right, and the lowest three bits contains border color. For the leader it is 02 (red) / 05 (cyan), for data it is 01 (blue) / 06 (yellow). Why the right shift? During loading, a sequence LD A, $7F and IN A, ($FE) is processed. The value $7F is sent to the upper 8 bits of the address, therefore when reading a keyboard input it selects a line of keys B – N – M – Symbol Shift – Space. The status of these keys is in the lowest five bits (0-4), bit 6 contains an EAR state. Using the RRC instruction the lowest bit (status of the SPACE key) moves to the flag CY and EAR state moves to the 5th bit position. Therefore, if CY is zero, it means that a SPACE was pressed and loading ends.
The routine begins by detecting a signal that corresponds to the pilot tone (LD-LEADER). If 255 of such pulses were read, it was then believed that this was a pilot tone and the routine waits for a shorter synchronization pulse. When it arrives (LD-SYNC), you can retrieve data.
LD-MARKER reads 1 byte to register L. It begins with a value of 01, to serve as a counter. Gradually it fills with bits from the right by instruction RL L, the highest bits then passes into CY. If CY = 0, they keep loading further, but once CY = 1, it means that a complete set of eight bits was retrieved.
Key routines are LD-EDGE-n (wherein n is 1 or 2). LD-EDGE-1 first waits for a certain period of time (465T) and then determines whether the value of the EAR input has changed and compares it against the last stored value (in register C, see above). If it has not changed, the loop is repeated. For each loop the value in the registry B is increased. Once it gets to zero, it means “timeout” – the edge did not come in the expected time limit.
If the edge is found, the content of the register C is negated. This results both in a change of stored EAR value, but also in a change of the color border.
LD-EDGE-2 actually performs two LD-EDGE-1 in a sequence.
LD-EDGE output is as follows:
- CY = 0, Z = 1 – during a time interval EAR change did not come (“timeout”)
- CY = 0, Z = 0 – SPACE was pressed (BREAK)
- CY = 1 – edge was found, the current value of the counter is in B
The counter in B counts, as I wrote, upwards. During every loop pass, which takes 58 processor ticks, B is incremented by one. For example, when reading a bit, routine LD-EDGE-2 is called, thus seeking two edges. The counter in B is set to $B0. This means that the timeout comes after $4F cycles ($FF- $B0). This represents a 2 x 465T of wait loop + 79 * 58T = 5512T. After all this time, the routine reports a signal failure.
The resulting counter value is compared to $CB. If it is smaller, it is then evaluated as two short pulses of log. 0, if it is larger, then log. 1. The value of $CB means that the loop was done 27 times ($CB-$B0), specificaly that the two edges came at a time of less than 2496T. Let’s recall: For log. 0 the last two pulses take 1710T, for log. 1 they take 3420T. Therefore the difference between these two times is 2565 and this is roughly what I came up with. It is a bit less because of overhead (subroutine calls, evaluation etc., See LD-8-BITS).
This was not so difficult, was it? So now, let’s make some of those tricks …
First, reading routine is not completely “T-pedantic”, so few Ts here or there do not pose any major problem. If you want a simple effect, we can add it without any complicated adjustments.
We can for example change color of the stripes, if we do not like the default two-tone ones. How about a rainbow? Simply rewrite the end of the routine LD-EDGE:
What’s going on here? Instead of negating of the contents of register C, we increase its value by 1 and negate the value of the fifth bit. Thanks to masking of the value of $27, we avoid the overflow which would affect the EAR bit. Therefore it will vary in the range 0-7 and create a rainbow effect border.
Note: If you intend to try this, be sure to place the loading routine in the upper 32 Kbytes of RAM!
If we add a pair of instructions XOR A; OUT ($FE), A to the end and before the instruction SCF, stripes in the border will change into short lines on a black background.
Here we can put any effects that affect a color border either by amending it, or by using it. What about the effect that Busy used with loader for Song In Lines 3 (rounded corners), it looks impressive, huh?
Yet it is not very hard… Four squares in the corners contain a simple pattern (“rounding”). Pixels that are equal to 1 (the color INK) will look as if they were part of the border and will show streaks. I did not examine how Busy does it, but I’d bet that in principle it’s done somehow like this:
LD A,C ;Change of the last "edge type"
CPL ;as well as BORDER color.
AND $07 ;Taking just only the BORDER color
OR $08 ;MIC off
OUT ($FE),A ; Change border color
OR $30 ; A contains: 0 0 1 1 1 b b b (b = border color)
; so PAPER=7, INK=border color,
; BRIGHT 0, FLASH 0
LD ($5800),A ;left top corner attribute
LD ($581F),A ;right top corner
LD ($5AE0),A ;left bottom
LD ($5AFF),A ;right bottom
SCF ;Set CY=1 as a "success"
RET ;before return
Border effects are mostly simple and fast enough, so we can squash them here and not worry about the timing too much. Usually they fit within tolerance.
Simple effects with loaded content
During the loading we can certainly manage simpler operation with loaded data, either at the bit or byte level. Digisynth demos loader was able to perform real-time data extracting using Huffman decompression algorithm (Huffman suits this purpose quite well, you just need to have a decompression tree saved and pass through it according to the loaded bit). For those interested, I have prepared a reconstruction of the loader. But we will have a look at another case, and this will be a well known Mad Load – a routine that loads square images in a certain order. The video shows its improved version.
Mad Load used a very simple data format. After the flag byte, data for each square followed. Each took 11 bytes – lower and higher byte of a screen address where the square should be stored, then 8 bytes of video memory and a 1 byte of attribute. This was followed by another square …
This loading routine was slightly modified by Frantisek Fuka (FUXOFT) – he made LD-8-BITS into a subroutine which reads 1 byte to register L. This subroutine is then used in another subroutine to retrieve one of the squares (MAD_SQUARE). Squares loading subroutine is then called over and over again until there is data to be loaded from the tape, and when there isn’t, it stops and returns back. There is no checksum performed or anything.
Here is the Mad Loader source code. I only commented on the parts that differ from the standard code.
LD_SYNC: LD B,$C9
; Mad Load itself
CALL LD_ONE_BYTE ; The first one is a flag byte. Drop it!
CALL LD_ONE_BYTE ; Lower byte of address
EX AF,AF' ; save into AF'
CALL LD_ONE_BYTE ; Upper address byte
LD H,L ; to the H register
LD L,A ; and the first one to the L, so I have a full address in HL
LD B,$08 ;8 bitmap bytes for each square
PUSH HL ;save the address
PUSH BC ;and the counter
CALL LD_ONE_BYTE ; read 1 byte
POP BC ; restore the counter
LD A,L ; byte to accumulator
POP HL ; restore address for this byte
RET NC ; If some error occured, return
LD (HL),A ;Else store the byte into screen memory
INC H ; addr + 256 - it means "next screen microline"
DJNZ MAD_SCRN ; repeat for all 8 bytes
LD A,H ; Convert address from screen memory to attribute memory
SUB $08 ; First, sub 8 to get the original value
RRA ; H div 8
AND $03 ; lowest 2 bits
OR $58 ;$58, $59 or $5a - attribute memory address
LD H,A ; So now I have an attribute address in HL
PUSH HL ; save it
CALL LD_ONE_BYTE ; read one byte
LD A,L ; save it to accumulator
POP HL ; restore the address
LD (HL),A ; and put the attribute byte to a proper place
RET ; Finished, one square is done
FF78: RET NC
FF7C: JR NZ,LD_DELAY
FF7E: AND A
… And at this point I would to finish for now.
Next time we take a look at more complex effects, for example various counters of bytes or time remainging and other horseplay, for which we need more processor ticks, and so we will have to adjust the timing loop and chop our algorithms so that they fit into the time we have available. Meanwhile, please accept this brief “introductory to the mystery of loaders”, try to experiment yourself, and if you create some nice loader, show it off!
(Translated by Ondřej Ficek)