Intense: Aline: Colors in sync

Summary

Traditional challenges associated with using multicolor in ZX Spectrum programs include:

1. Writing precisely timed display code.
2. Adjusting the synchronization timings by hand to execute it at the right moment.
3. Keeping the rest of your program from influencing the overall timing.

Aline, the technique presented in this article, aims to get rid of problems 2 and 3 altogether.

Explanation

Aline is the idea of synchronization by floating bus reads taken to its logical conclusion. A number of ZX Spectrum games have read from an idle 'ULA' port instead of relying on interrupts, mainly to allow more time for drawing within the frame. However, we have found that the same mechanism can deliver stable sync with single T state precision.

This article primarily discusses synchronization with the top of the screen area, which is what is sought in most cases. Algorithms that synchronize to an arbitrary character line appear feasible as well, but are more complicated.

Let's take a look at the rundown of the floating bus fetch cycle pattern over at the Sinclair FAQ Wiki. It can be seen that the pattern isn't random and there is a correlation between screen addresses and their contents, and values returned at particular T states. Therefore if we fill some bitmap/attribute lines with uniquely identifiable sync marker codes, it becomes possible to unambiguously tell where we were at the time of reading.

The idea behind aline is reinterpreting these sync codes as actual complementary T state delay times from the current raster location to the end of the line. When one of these is read back, execution is delayed for this many T states, therefore resuming at the same constant T state at the end of the line.
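As a minimal sketch of this idea, written in Python purely as a timing model and not as the Z80 implementation, the marker stored at each raster position can be thought of as the number of T states left until a fixed resume point. The resume point of 128 T states and the sample read positions are hypothetical values chosen for illustration.

```python
# Hypothetical model: positions are T states counted from the start of the
# paper area of the sync line; RESUME_AT is an assumed fixed resume point.
RESUME_AT = 128

def marker_code(read_position):
    """Delay, in T states, stored as the sync marker at this position."""
    return RESUME_AT - read_position

def resume_position(read_position):
    """Position reached after waiting for the delay that was read back."""
    return read_position + marker_code(read_position)

# Different read positions carry different markers but resume at one point.
for position in (0, 9, 58, 107):
    print(position, marker_code(position), resume_position(position))
```

Whatever position the read happens to land on, the delay it carries brings execution to the same place, which is what makes the result independent of when the routine was entered.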

In order to be able to do this, we need a way to make sure that we always have at least one bitmap/attribute read off the very first scanline of the screen. This can be accomplished by making the input loop take an odd number of T states that is close to a multiple of 8, while being reasonably fast. In this way, the execution would constantly 'drift' relative to the floating bus pattern and eventually arrive at either the bitmap or the attribute area in the pattern. Before that happens, the idle values of 255 from either the border or empty parts of the pattern that are encountered in the loop are simply ignored. A single loop iteration of 25 T states (8*3 + 1) can repeat 4-5 times over the course of the active screen area on a line (128 T states), which is enough for this purpose.

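    ld e,2                 ;E = lowest value accepted as a sync marker
    call .sync
    ...
.sync
    ld hl,.sync_lp         ;HL = loop address for the 4 T state JP (HL)
    ld bc,#FFFF            ;BC = port to read the floating bus from
.sync_lp
    in a,(c)               ;12 T: read the floating bus
    cp e                   ;4 T: A - E, sign set for idle 255 and values below E
    ret p                  ;5 T when not taken: return once a marker is read
    jp (hl)                ;4 T: loop, 25 T states per iteration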
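The drift argument can be checked with a small model. The Python sketch below is a simplification under stated assumptions: the fetch pattern repeats every 8 T states with bitmap/attribute values in the first four and idle values in the last four, memory contention is ignored, and the read is treated as happening at a single instant. The function names are hypothetical.

```python
PATTERN_PERIOD = 8            # T states per fetch cycle in this model
USEFUL_SLOTS = {0, 1, 2, 3}   # bitmap, attribute, bitmap, attribute
PAPER_T_STATES = 128          # active screen area of one line

def marker_hits(first_read, loop_t_states):
    """Read times on one line that land on a bitmap/attribute slot."""
    times = range(first_read, PAPER_T_STATES, loop_t_states)
    return [t for t in times if t % PATTERN_PERIOD in USEFUL_SLOTS]

def worst_case_hits(loop_t_states):
    """Fewest useful reads over every possible time of the first read."""
    return min(len(marker_hits(start, loop_t_states))
               for start in range(loop_t_states))

# A 25 T state loop shifts by one slot per iteration; a 24 T state loop
# stays on the same slot and can read idle values for the whole line.
print(worst_case_hits(25), worst_case_hits(24))
```

In this model the 25 T state loop always catches at least one marker on the line, while a 24 T state loop that starts on an idle slot never does.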

Regarding initializing the top line with sync marker values, keep in mind that it is not necessary to fill every attribute and bitmap byte. The only requirement here is making sure that the 'pattern distance' between two marker values does not exceed 4. The rest of the bytes must be set to zero or any other value that would be ignored by the loop.

Notice that within this scheme, it is expected that the aline algorithm would be running exclusively during the lower/top border time, which might not always be the case. Therefore, it would need to be supplemented by another piece of code that would delay execution until the raster is on the lower/top border. The following or similar code is included with all aline versions.

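SyncBorder
    ld hl,#4000
    ld (hl),#FF            ;+2A/+3 workaround (see below)
    ld d,4                 ;D = number of consecutive idle reads required
    ld bc,(Aline.LOC_Port)
.lp1
    ld e,d                 ;E = idle reads still required
.lp2
    ld a,(hl)              ;+2A/+3 workaround
    in a,(c)
    inc a                  ;255 becomes 0 and sets Z
    jr nz,.lp1             ;not idle: restart the count
    dec e
    jr nz,.lp2             ;idle: read again until E reaches 0
    ret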
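The logic of this routine, stripped of timing, is a run counter that restarts on any non-idle value. A minimal Python model follows; the function name and the sample values are hypothetical and only mirror the structure of the listing above.

```python
def reads_until_border(samples, needed=4, idle=0xFF):
    """Number of reads taken before `needed` idle values occur in a row.

    Returns None if the samples run out first.
    """
    run = 0
    for index, value in enumerate(samples):
        run = run + 1 if value == idle else 0   # non-idle restarts the count
        if run == needed:
            return index + 1
    return None

# Two idle reads, an interruption, then an unbroken idle run.
samples = [0x38, 0xFF, 0xFF, 0x12, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF]
print(reads_until_border(samples))
```

Requiring several idle reads in a row guards against a single idle value read from between the fetches on a paper line being mistaken for the border.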

Finally, due to the nature of this technique, some attribute artifacts might be left visible in the upper scanlines. The standard implementation of aline attempts to mask this for the first two scanlines, visually leaving them black. If the calling program begins filling the screen with full-width multicolor data immediately afterwards, no artifacts would be left displayed on the screen. If the width of the multicolor area is less than that, additional work might be necessary in order to clear the edge attributes.

The 'color' variant of the standard aline implementation allows changing the color of the sync marker area. The sync marker codes are assigned in such a way that the resulting attribute artifacts are kept to a minimum, only affecting the BRIGHT setting of every other attribute cell over half of the sync marker area horizontally. If the color is set to black, no artifacts are left visible, as in the standard edition.

The 'attribute-only' floating bus pattern variants of aline discussed further below utilize a narrower sync area, requiring significantly less work to mask the attribute artifacts. Additionally, they make it possible to change the sync marker area color without limitations.

Note: In some cases, the emphasis might be on reducing the width of the sync marker area to a minimum, even at the cost of partially visible onscreen artifacts, and on fully restoring the attribute data overwritten by the sync area as soon as possible. These requirements are met by a separate algorithm referred to as 'aline-special'. In particular, it uses a sync area that is only 10 characters wide. However, it is less universal than the others, imposing additional conditions on the placement of the routine in memory, and as such is not included in the standard package.

Advantages

Multicolor-compatible synchronization in a simple manner similar to issuing a HALT
Free CPU time can be used efficiently and the restriction on timing-uncompensated branching is lifted
A multicolor program can be written like any other with comparatively few special considerations expected of such productions, outside of tuning the display code portion itself
As a result, the potential complexity of multicolor software is greatly increased

Disadvantages

Requires ZX Spectrum models that implement some form of floating bus functionality
The available screen space is reduced by up to 3 scanlines depending on the algorithm
Alining to somewhere other than the top of the screen in this manner is more complicated
Upper scanline artifacts that may or may not be masked depending on the algorithm

Compatibility

Sinclair and Amstrad ZX Spectrum models

So far, aline is confirmed to work as expected under emulation on all Sinclair ZX Spectrum configurations up to and including the +2 (SpecEmu, Spectaculator, Fuse, Spectramine), as well as on the Amstrad ZX Spectrum +2A/+3 configurations (SpecEmu, Spectramine) with some limitations.

Regarding the +2A/+3 models, it was discovered recently that it is possible to access the floating bus functionality on these. Specifically, the fetch pattern was found to be similar to that of the Sinclair models, with a few notable differences:

It responds to port addresses with the mask 0000XXXXXXXXXX01b in 128K mode
The values returned from the floating bus port have bit 0 set
The value of an attribute preceding an idle portion in the pattern is returned instead of 255
The border idle value is changed after a contended memory access

A single iteration of the +2A/+3 fetch pattern over the paper area might therefore look like this (Bitmap, Attribute):

B0 A0 B1 A1 A1 A1 A1 A1
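The first two points in the list above can be expressed as simple bit operations. The sketch below uses Python as a model rather than Z80 code; the function names are hypothetical and the sample port addresses are chosen only to exercise the mask.

```python
# Mask 0000XXXXXXXXXX01b: bits 15-12 must be 0, bit 1 must be 0, bit 0 must be 1.
FIXED_BITS = 0b1111000000000011
REQUIRED = 0b0000000000000001

def is_floating_bus_port(port):
    """True if a 16-bit port address matches the 0000XXXXXXXXXX01b mask."""
    return (port & FIXED_BITS) == REQUIRED

def value_as_read(stored):
    """Model of the second point: the value comes back with bit 0 set."""
    return stored | 0x01

for port in (0x0FFD, 0x0001, 0x0FFF, 0x7FFD, 0xFFFF):
    print(hex(port), is_floating_bus_port(port))
print(hex(value_as_read(0x46)))
```

Because bit 0 is always set on these models, a marker scheme meant to work there cannot rely on that bit to tell two codes apart.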

The first two points listed above are accounted for in the standard implementation of aline. The third point, on the other hand, diminishes the usefulness of the attributes in the fourth position of the fetch pattern, effectively reducing the synchronization precision from 1 to 5 T states. It must be noted, however, that this reduced precision is normally sufficient in practice given the nature of the applications for such a method, and factors such as memory contention. Also, a special algorithm to get around this issue appears likely to emerge at some point.
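The loss of precision can be read directly off the two patterns. In the Python sketch below, each list entry stands for one T state of the fetch cycle and None marks an idle slot; the Sinclair pattern is written out under the assumption that it is two bitmap/attribute pairs followed by four idle slots. The longest run of the same source is the number of T states that a single read cannot tell apart.

```python
from itertools import groupby

# One entry per T state of the fetch cycle; None marks an idle slot.
SINCLAIR = ["B0", "A0", "B1", "A1", None, None, None, None]
PLUS_2A_3 = ["B0", "A0", "B1", "A1", "A1", "A1", "A1", "A1"]

def worst_ambiguity(pattern):
    """Longest run of T states that return the same marker-bearing byte."""
    runs = [len(list(group))
            for source, group in groupby(pattern)
            if source is not None]
    return max(runs)

print(worst_ambiguity(SINCLAIR), worst_ambiguity(PLUS_2A_3))
```

The repeated A1 on the +2A/+3 accounts for the 5 T state figure: a read that returns A1 may have happened at any of five consecutive T states.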

Another implication here is that there is no default idle value returned during border time on the +2A/+3. What would normally be read in this case is the rightmost attribute of each individual character line. However, if a contended memory address is accessed by the CPU during border time, the idle value returned from the floating bus port is set to the contents of that address. This new idle value remains in effect until the raster passes over the screen area again. The standard implementation of aline accounts for this as well.

Unofficial 'attribute-only' floating bus mods

By modifying the aline algorithm, it is possible to support unofficial Spectrum-compatible machines that implement a simplified variation of the floating bus functionality. On these, reading the floating bus port produces the value of the attribute at the current position of the raster beam, changing every 4 T states.

At a glance, it might seem that with this pattern, the synchronization precision must be reduced to 4 T states. However, this is not necessarily the case, and the limit can be overcome by using a specialized aline algorithm. Furthermore, the 'attribute-only' floating bus pattern in fact provides several advantages over that of the Spectrum models. First off, there are no 'idle' values in the pattern. As long as the beam is moving over the PAPER portion of the screen, the values read from the port correctly reflect its position over a particular attribute. This makes it unnecessary to use the full width of the line as the sync marker area. If the sync marker area is located at the top left corner of the screen, it only has to be wide enough to cover the duration of a single run of the port reading loop.

.sync_lp
    in a,(c)               ;12 T: read the floating bus
    jp m,.sync_lp          ;10 T: loop while bit 7 of the value is set

The duration of this loop is 22 T states, which corresponds to a sync area that is 22 / 4 = 5.5, rounded up to 6, characters wide. Therefore, not only does it require significantly less work to mask the resulting attribute artifacts, but the sync marker codes, none of which exceeds 7, can also be contained entirely within the INK portion of the values, allowing unrestricted changes to the color of the sync marker area using the PAPER and BRIGHT settings. This scheme, referred to as 'attr4', works exactly the same way as the original aline algorithm. Although it can only reach 4 T state precision, this is often sufficient in practice for most applications that involve timing-reliant video effects.
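The arithmetic above can be reproduced in a few lines. This Python sketch assumes the standard attribute layout (INK in bits 0-2, PAPER in bits 3-5, BRIGHT in bit 6); the marker codes 1 to 6 are hypothetical placeholders, since the actual assignment used by 'attr4' is not given here.

```python
import math

LOOP_T_STATES = 22            # IN A,(C) + JP M
T_STATES_PER_ATTRIBUTE = 4    # one attribute value per 4 T states

# Width of the sync area, in characters, needed to cover one loop run.
sync_width = math.ceil(LOOP_T_STATES / T_STATES_PER_ATTRIBUTE)

def attribute(ink, paper, bright):
    """Pack INK (bits 0-2), PAPER (bits 3-5) and BRIGHT (bit 6)."""
    return (bright << 6) | (paper << 3) | ink

def marker_from(value):
    """Recover a marker code kept in the INK bits."""
    return value & 0x07

print(sync_width)
# Hypothetical codes 1..6 survive any PAPER and BRIGHT choice.
print(all(marker_from(attribute(code, paper, bright)) == code
          for code in range(1, 7)
          for paper in range(8)
          for bright in (0, 1)))
```

Since the marker lives entirely in the INK bits, the visible color of a sync cell filled with solid PAPER is free to change without disturbing the code that the loop reads back.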

There exists another, more complex algorithm that makes it possible to obtain 1 T state precise synchronization. This scheme, referred to as 'attr1', works in the following way. The 6-character-wide sync area is duplicated three times over the first attribute line of the screen. When the raster beam reaches the sync area, the routine samples the port three times in sequence and aligns the program execution according to the returned values using a special algorithm. This method is suited for applications that require higher synchronization precision, but it uses a wider sync area than the 'attr4' algorithm. In both cases, only the attributes need to be set up as sync marker values.

Due to significant differences in the workings of the two 'attr' variants, they return to the caller program at different moments within a frame on the same machine. In particular, 'attr1' returns somewhat later than 'attr4'. Also, the routines do not attempt to balance their return timing with that of the original aline schemes, because the timing parameters of unofficial machines are known to deviate substantially even across the same model (strictly speaking, they should be presumed unknown).

The 'attr' variants of aline are confirmed to work under emulation on the Pentagon and Scorpion configurations (UnrealSpeccy versions 0.35b2 and 0.38.3, 'Even M1' delays disabled). However, because a number of floating bus mod schemes are known to be unstable, there is no guarantee that the 'attr' variants of aline will work reliably on real hardware in all cases.

Resources

The source code for aline is available here. Both standard and 'attribute-only' implementations are included, with versions of a multicolor test program tailored for each.

A demo of the CATS Mint engine by Intense is available for an example of using aline.
Download archive (.TAP)
Watch demonstration

Special thanks

Chernandezba, Ast A. Moore, Your Spec-chum, Woody, Weiv and everyone else of the +2A/+3 floating bus testing effort over at the World of Spectrum forums.

Revisions

20181130

Added the 'attr1' aline variant
Rewrote the 'attribute-only' pattern section
Reorganized the Resources section

20180721

Corrected the information regarding the 'attribute-only' pattern handling
Improved the 'attr4' aline variant
Added the 'attr4' test program

20171124

Added the 'color' aline variant
Expanded the Resources section
Added .TAP version of the CATS Mint example

20170922

Initial publication