Mar 212015

Everyone who had once owned a ZX Spectrum, surely had at least once seen an amazing loading effect and wondered: How are they doing this? So, come and have a read… It will be a long story and perhaps you might even learn something.

Because ZX Spectrum didn’t have any dedicated circuit for storing and retrieving programs to and from a cassette tape (such as PMD), it had to do it purely by software. At the highest level, programs were loaded as a pair of “header + data”. In the header file there was information about the file (name, type, length, etc.) and the data file contained actual custom data.

The header file has a length of 17 bytes and a following structure:

Position Length Description
0 1 Type (0=program, 1=number array, 2=character array, 3=code)
1 10 Name (right padded with spaces to 10 characters)
11 2 Length of data block
13 2 Parameter 1. Eg. for programs it is a parameter LINE, for ‘code’ it is the address of beginning of the code block
15 2 Parameter 2. Eg. for programs it is the beginning of variables

Data block doesn’t have any such structure, it just contains data.

Let us descend one level: Each block was preceded by one flag byte ($00 for a header, $ff for data) and terminated with a checksum (all bytes, including the flag byte, XORed together). Header was therefore actually 19 bytes long – it began with 0 and ended with the checksum, but these bytes were “consumed” by the system and you couldn’t get your hands on them.

This is where the level at which you could get by calling routines from the ROM ends. The rest of it was just

Ones and zeroes

Data on the tape was written as a sequence of pulses. The processor took care of generating of these pulses, including the proper timing.

Each entry was preceded by a so-called pilot tone. It was created by a rectangular signal (where ON and OFF states were regularly alternated), which means that MIC output in the ON state was 2168 T long (T is the processor clock), and in the OFF state, 2168 T long as well. Boot tone prepared input circuits (if there were any) in a tape recorder to the correct volume level. This boot tone before the header lasted 5 seconds and before the data block two seconds. The duration of the phase tone was determined by a flag byte, all values less than 128 meant a “long” phase tone and more than 128 meant a short one.

After the pilot tone a sync pulse was generated. Its purpose was to let the loading routine know that the pilot tone ends and to begin to process the data. Sync pulse was 667 T ON and 735 T OFF long.

The sync pulse was then followed by actual bytes of data. First was the flag byte, then the contents, and eventually checksum. Bytes were sent bit by bit from the highest one. Every bit was represented by a pulse (state switching to ON and OFF), but differed in length. For logical 0 it was 855T in the ON state and 855T in the OFF state, for logical 1 it was doubled, thus 1710T and 1710T, respectively.

When reading data (which is what we are after here) processor then measured the time between each change of the input signal (ie. time between transitions, or the so-called “Edges”). The recording was therefore not sensitive to polarity (such as for aforementioned PMD in the first version), however the processor did not have much time to do anything else, since it had to listen to EAR input, count loops until the signal changed, and check for SPACE key being pressed (which interrupted the loading, as you surely remember).

Assembler time

Now this is the moment to extract the ROM contents (taken from ‘The Complete Spectrum ROM Disassembly’ by Dr. Ian Logan and Dr. Frank O’Hara, as published by Melbourne House in 1983, available at WOS):

In this article there is no space for detailed description of how the algorithm works, but let’s quickly go over some facts.

Routine disables interrupts. This is logical, since it is dependent on exact timing. That’s also the reason why loaders stored in slow RAM ($4000- $7FFF) will not work.

The last state of EAR input is stored in the C register, but it is shifted by 1 bit to the right, and the lowest three bits contains border color. For the leader it is 02 (red) / 05 (cyan), for data it is 01 (blue) / 06 (yellow). Why the right shift? During loading, a sequence LD A, $7F and IN A, ($FE) is processed. The value $7F is sent to the upper 8 bits of the address, therefore when reading a keyboard input it selects a line of keys B – N – M – Symbol Shift – Space. The status of these keys is in the lowest five bits (0-4), bit 6 contains an EAR state. Using the RRC instruction the lowest bit (status of the SPACE key) moves to the flag CY and EAR state moves to the 5th bit position. Therefore, if CY is zero, it means that a SPACE was pressed and loading ends.

The routine begins by detecting a signal that corresponds to the pilot tone (LD-LEADER). If 255 of such pulses were read, it was then believed that this was a pilot tone and the routine waits for a shorter synchronization pulse. When it arrives (LD-SYNC), you can retrieve data.

LD-MARKER reads 1 byte to register L. It begins with a value of 01, to serve as a counter. Gradually it fills with bits from the right by instruction RL L, the highest bits then passes into CY. If CY = 0, they keep loading further, but once CY = 1, it means that a complete set of eight bits was retrieved.

Key routines are LD-EDGE-n (wherein n is 1 or 2). LD-EDGE-1 first waits for a certain period of time (465T) and then determines whether the value of the EAR input has changed and compares it against the last stored value (in register C, see above). If it has not changed, the loop is repeated. For each loop the value in the registry B is increased. Once it gets to zero, it means “timeout” – the edge did not come in the expected time limit.

If the edge is found, the content of the register C is negated. This results both in a change of stored EAR value, but also in a change of the color border.

LD-EDGE-2 actually performs two LD-EDGE-1 in a sequence.

LD-EDGE output is as follows:

  • CY = 0, Z = 1 – during a time interval EAR change did not come (“timeout”)
  • CY = 0, Z = 0 – SPACE was pressed (BREAK)
  • CY = 1 – edge was found, the current value of the counter is in B

The counter in B counts, as I wrote, upwards. During every loop pass, which takes 58 processor ticks, B is incremented by one. For example, when reading a bit, routine LD-EDGE-2 is called, thus seeking two edges. The counter in B is set to $B0. This means that the timeout comes after $4F cycles ($FF- $B0). This represents a 2 x 465T of wait loop + 79 * 58T = 5512T. After all this time, the routine reports a signal failure.

The resulting counter value is compared to $CB. If it is smaller, it is then evaluated as two short pulses of log. 0, if it is larger, then log. 1. The value of $CB means that the loop was done 27 times ($CB-$B0), specificaly that the two edges came at a time of less than 2496T. Let’s recall: For log. 0 the last two pulses take 1710T, for log. 1 they take 3420T. Therefore the difference between these two times is 2565 and this is roughly what I came up with. It is a bit less because of overhead (subroutine calls, evaluation etc., See LD-8-BITS).

Loader hacking

This was not so difficult, was it? So now, let’s make some of those tricks …

First, reading routine is not completely “T-pedantic”, so few Ts here or there do not pose any major problem. If you want a simple effect, we can add it without any complicated adjustments.

Border effects

We can for example change color of the stripes, if we do not like the default two-tone ones. How about a rainbow? Simply rewrite the end of the routine LD-EDGE:

What’s going on here? Instead of negating of the contents of register C, we increase its value by 1 and negate the value of the fifth bit. Thanks to masking of the value of $27, we avoid the overflow which would affect the EAR bit. Therefore it will vary in the range 0-7 and create a rainbow effect border.

Note: If you intend to try this, be sure to place the loading routine in the upper 32 Kbytes of RAM!

If we add a pair of instructions XOR A; OUT ($FE), A to the end and before the instruction SCF, stripes in the border will change into short lines on a black background.

Here we can put any effects that affect a color border either by amending it, or by using it. What about the effect that Busy used with loader for Song In Lines 3 (rounded corners), it looks impressive, huh?

Yet it is not very hard… Four squares in the corners contain a simple pattern (“rounding”). Pixels that are equal to 1 (the color INK) will look as if they were part of the border and will show streaks. I did not examine how Busy does it, but I’d bet that in principle it’s done somehow like this:

Border effects are mostly simple and fast enough, so we can squash them here and not worry about the timing too much. Usually they fit within tolerance.

Simple effects with loaded content

During the loading we can certainly manage simpler operation with loaded data, either at the bit or byte level. Digisynth demos loader was able to perform real-time data extracting using Huffman decompression algorithm (Huffman suits this purpose quite well, you just need to have a decompression tree saved and pass through it according to the loaded bit). For those interested, I have prepared a reconstruction of the loader. But we will have a look at another case, and this will be a well known Mad Load – a routine that loads square images in a certain order. The video shows its improved version.

Mad Load used a very simple data format. After the flag byte, data for each square followed. Each took 11 bytes – lower and higher byte of a screen address where the square should be stored, then 8 bytes of video memory and a 1 byte of attribute. This was followed by another square …

This loading routine was slightly modified by Frantisek Fuka (FUXOFT) – he made LD-8-BITS into a subroutine which reads 1 byte to register L. This subroutine is then used in another subroutine to retrieve one of the squares (MAD_SQUARE). Squares loading subroutine is then called over and over again until there is data to be loaded from the tape, and when there isn’t, it stops and returns back. There is no checksum performed or anything.

Here is the Mad Loader source code. I only commented on the parts that differ from the standard code.

… And at this point I would to finish for now.

Next time we take a look at more complex effects, for example various counters of bytes or time remainging and other horseplay, for which we need more processor ticks, and so we will have to adjust the timing loop and chop our algorithms so that they fit into the time we have available. Meanwhile, please accept this brief “introductory to the mystery of loaders”, try to experiment yourself, and if you create some nice loader, show it off!

(Translated by Ondřej Ficek)

  4 Responses to “ZX Spectrum and loaders – part one”

  1. Great read, thank you.
    When can we expect part two 2?

  2. ASAP, I hope… maybe this friday, maybe monday… It’s all up to my translator. 🙂

  3. […] the previous article I dandily claimed that DigiSynth was unpacking data during loading. Wow, really? Wasn’t I […]

 Leave a Reply

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">