20170922

Aline: Colors in sync

Summary

Traditional challenges associated with using multicolor in ZX Spectrum programs include:
  1. Writing precisely timed display code.
  2. Adjusting the synchronization timings by hand to execute it at the right moment.
  3. Keeping the rest of your program from influencing the overall timing.
Aline, the technique presented in this article, aims to get rid of problems 2 and 3 altogether.

 

Explanation

Aline is the idea of synchronization by floating bus reads taken to its logical conclusion. A number of ZX Spectrum games have read from an idle 'ULA' port instead of relying on interrupts, mainly to allow more time for drawing in the frame. However, we have found that this mechanism can also provide stable sync with single T state precision.

This article primarily discusses synchronization with the top of the screen area, which is what is needed in most cases. Algorithms that synchronize to an arbitrary character line appear feasible as well, but are more complicated.

Let's take a look at the rundown of the floating bus fetch cycle pattern over at the Sinclair FAQ Wiki. The pattern isn't random: the values returned at particular T states correlate with screen addresses and their contents. Therefore, if we fill some bitmap/attribute lines with uniquely identifiable sync marker codes, it becomes possible to tell unambiguously where the raster was at the time of reading.

The idea behind aline is to reinterpret these sync codes as complementary T state delay times from the current raster location to the end of the line. When one of these is read back, execution is delayed for that many T states, and therefore always resumes at the same T state at the end of the line.
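
The arithmetic behind this can be sketched in a few lines of Python. This is only a model: the 128 T state width of the paper area is taken from this article, while `LINE_END`, `marker_code` and `resume_time` are hypothetical names, and the actual aline sources may well encode their markers differently.

```python
# Model only: names and the exact encoding are assumptions, not aline's code.
LINE_END = 128  # T state (relative to the start of the paper area) to resume at


def marker_code(t):
    """Sync code stored where the raster is at T state t: time left until LINE_END."""
    return LINE_END - t


def resume_time(t_read):
    """Read the marker at t_read, then delay for as many T states as it says."""
    return t_read + marker_code(t_read)


# Whatever T state the read happened at, execution resumes at the same point.
resume_points = {resume_time(t) for t in range(LINE_END)}
```

In this model every possible read position collapses to a single resume point, which is the property the sync codes are chosen for.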

In order to be able to do this, we need to make sure that at least one bitmap/attribute byte is always read off the very first scanline of the screen. This can be accomplished by making the input loop take an odd number of T states that is close to a multiple of 8, while being reasonably fast. This way, execution constantly 'drifts' relative to the floating bus pattern and eventually arrives at either the bitmap or the attribute area in the pattern. Until that happens, the idle values of 255 read in the loop, from either the border or the empty parts of the pattern, are simply ignored. A single loop iteration of 25 T states (8*3 + 1) can repeat 4-5 times over the course of the active screen area on a line (128 T states), which is enough for this purpose.

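    ld e,2                 ;E = lowest value treated as a marker
    call .sync
    ...
.sync
    ld hl,.sync_lp         ;HL = loop address for the 4T jump
    ld bc,#FFFF            ;BC = idle port to read the floating bus from
.sync_lp
    in a,(c)               ;12T: read the floating bus
    cp e                   ;4T:  compare with the threshold in E
    ret p                  ;5T if not taken: return when bit 7 of A-E is clear
    jp (hl)                ;4T:  loop again, 25T per iteration in total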
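
The drift argument can be checked with a small Python model. It is a sketch under stated assumptions rather than a description of the real hardware timing: the 8 T state pattern is assumed to consist of 4 bitmap/attribute slots followed by 4 idle slots, slot 0 is assumed to coincide with T state 0 of the paper area, and every data slot is assumed to hold a marker. The names are hypothetical.

```python
# Model only: slot layout and names are assumptions made for illustration.
LOOP_T = 25      # T states per iteration of the reading loop (8*3 + 1)
PATTERN_T = 8    # the fetch pattern repeats every 8 T states
DATA_SLOTS = 4   # assumed: 4 bitmap/attribute slots, then 4 idle slots
PAPER_T = 128    # T states of active screen area on one line


def first_hit(first_read):
    """Return the T state of the first non-idle read on the line, or None.

    first_read is the T state (0 = start of the paper area) of the first
    loop iteration whose read lands inside the paper area.
    """
    for t in range(first_read, PAPER_T, LOOP_T):
        if t % PATTERN_T < DATA_SLOTS:
            return t
    return None


# The loop can enter the paper area at any of LOOP_T different offsets.
hits = [first_hit(start) for start in range(LOOP_T)]
```

Because each iteration moves the read by one slot relative to the pattern, every entry offset in this model reaches a data slot before the paper area of the first line ends.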


Regarding initializing the top line with sync marker values, keep in mind that it is not necessary to fill every attribute and bitmap byte. The only requirement here is making sure that the 'pattern distance' between two marker values does not exceed 4. The rest of the bytes must be set to zero or any other value that would be ignored by the loop.

Note that this scheme expects the aline algorithm to run exclusively during the lower/top border time, which might not always be the case. It therefore needs to be supplemented by another piece of code that delays execution until the raster is on the lower/top border. The following or similar code is included with all aline versions.

SyncBorder
    ld hl,#4000
    ld (hl),#FF            ;+2A/+3 workaround (see below)
    ld d,4                 ;D = number of consecutive reads
    ld bc,(Aline.LOC_Port)
.lp1
    ld e,d
.lp2
    ld a,(hl)              ;+2A/+3 workaround
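    in a,(c)               ;read the floating bus
    inc a                  ;255 (idle) becomes 0
    jr nz,.lp1             ;not idle: restart the count
    dec e
    jr nz,.lp2             ;repeat until D idle reads in a row
    ret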
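
The counting logic of this routine can be modelled in Python as follows. The sketch only covers the "D consecutive idle reads" part; the +2A/+3 workaround is not modelled, and `wait_for_border` is a hypothetical name.

```python
IDLE = 0xFF  # value the routine treats as "raster is on the border"


def wait_for_border(reads, needed=4):
    """Return the index of the read that completes `needed` idle reads in a row.

    `reads` is any iterable of port values; None is returned if no such run occurs.
    """
    run = 0
    for i, value in enumerate(reads):
        run = run + 1 if value == IDLE else 0  # any other value restarts the count
        if run == needed:
            return i
    return None
```

For example, `wait_for_border([0x12, 0xFF, 0xFF, 0x34, 0xFF, 0xFF, 0xFF, 0xFF])` only succeeds on the last read, since the run of two idle values is interrupted by `0x34`.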

Finally, due to the nature of this technique, some attribute artifacts might be left visible in the upper scanlines. The standard implementation of aline attempts to mask this for the first two scanlines, visually leaving them black. If the calling program begins filling the screen with full-width multicolor data immediately afterwards, no artifacts would be left displayed on the screen. If the width of the multicolor area is less than that, additional work might be necessary in order to clear the edge attributes.

The 'Color' variant of the standard aline implementation allows changing the color of the sync marker area. The sync marker code assignment method is changed such that the resulting attribute artifacts are kept to a minimum, only affecting the BRIGHT setting of every other attribute cell over one half of the sync marker area horizontally. If the color is set to black, no artifacts would be left visible, as in the standard edition.

The 'attribute-only' floating bus pattern variants of aline discussed further below utilize a narrower sync area, requiring significantly less work to mask the attribute artifacts. Additionally, they should make it possible to change the sync marker area color without limitations.


Advantages

  • Multicolor-compatible synchronization in a simple manner similar to issuing a HALT
  • Free CPU time can be used efficiently and the restriction on timing-uncompensated branching is lifted
  • A multicolor program can be written like any other with comparatively few special considerations expected of such productions, outside of tuning the display code portion itself
  • As a result, the potential complexity of multicolor software is greatly increased


Disadvantages

  • Requires ZX Spectrum models that implement some form of floating bus functionality
  • The available screen space is reduced by up to 3 scanlines depending on the algorithm
  • Alining to somewhere other than the top of the screen in this manner is more complicated
  • Upper scanline artifacts that may or may not be masked depending on the algorithm

Compatibility

 

Sinclair and Amstrad ZX Spectrum models

So far, aline is confirmed to work as expected under emulation on all Sinclair ZX Spectrum configurations up to and including the +2 (SpecEmu, Spectaculator, Fuse, Spectramine), as well as on the Amstrad ZX Spectrum +2A/+3 configurations (SpecEmu, Spectramine) with some limitations.

Regarding the +2A/+3 models, it was discovered recently that it is possible to access the floating bus functionality on these. Specifically, the fetch pattern was found to be similar to that of the Sinclair models, with a few notable differences:
  • It responds to port addresses with the mask 0000XXXXXXXXXX01b in 128K mode
  • The values returned from the floating bus port have the bit 0 set
  • The value of an attribute preceding an idle portion in the pattern is returned instead of 255
  • The border idle value is changed after a contended memory access
A single iteration of the +2A/+3 fetch pattern over the paper area might therefore look like this (Bitmap, Attribute):

B0 A0 B1 A1  A1 A1 A1 A1

The first two points listed above are accounted for in the standard implementation of aline. The third point, on the other hand, diminishes the usefulness of the fourth value in the fetch pattern: since the same attribute is returned 5 times in a row, the synchronization precision is effectively reduced from 1 to 5 T states. It must be noted, however, that this reduced precision is normally sufficient in practice, given the nature of application for such a method and factors such as memory contention. A special algorithm to get around this issue also appears likely to emerge at some point.

Another implication here is that there's no default idle value that is returned during border time on the +2A/+3. What would normally be read in this case is the rightmost attribute of each individual character line. However, if a contended memory address is accessed by the CPU during border time, the idle value that is returned from the floating bus port is set to the contents of that address. This new idle value remains in effect until the raster is passing over the screen area again. The standard implementation of aline accounts for this as well.
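
The port mask and the bit 0 behaviour from the list of differences can be expressed as a short Python check. This is an illustrative sketch only; the function names are hypothetical and the example port numbers are simply values that do or do not fit the mask.

```python
# Mask from the list of differences: 0000XXXXXXXXXX01b (X = don't care).
FIXED_BITS = 0b1111000000000011   # bit positions the mask pins down
FIXED_VALUE = 0b0000000000000001  # values those bits must have


def matches_plus3_floating_bus(port):
    """True if a 16-bit port address fits the 0000XXXXXXXXXX01b mask."""
    return (port & FIXED_BITS) == FIXED_VALUE


def as_read_on_plus3(value):
    """Model of the second point: returned values have bit 0 set."""
    return value | 0x01
```

Under this mask a port such as `0x0FFD` qualifies, while `0xFFFF` does not, which is why the port address has to be configurable per model.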


Unofficial 'attribute-only' floating bus mods

A modification of the algorithm would allow for a variant of aline for Spectrum-compatibles with simplified-function floating bus mods that only return attribute values every 4 T states. At most, 3 reads are required in order to obtain 1T precise synchronization; this scheme will be referred to as 'attr-1t'. It must be noted that this form of aline isn't interchangeable as-is with the algorithm for the original machines, since the latter only requires a single read.

The scheme is as follows. Keep reading the floating bus port until we have a non-zero read, A. Delay execution for 'TStatesPerLine + 2' T states, essentially moving 2 T states further relative to the pattern, and do a second read of the port, B. At this point, there are two branches depending on whether A=B:
  1. If A=B, we are currently at T states 1 or 2 out of 4. To tell which, a third read C must be made after 'TStatesPerLine - 3' T states, which places it 1 T state before A relative to the pattern. If C≠A, the third read fell on the previous attribute and we are at T state 1/4. If C=A, we are at T state 2/4.
  2. If A≠B, we are currently at T states 3 or 4 out of 4. To tell which, a third read C must be made after 'TStatesPerLine - 1' T states, which places it 1 T state after A relative to the pattern. If C=A, we are at T state 3/4. If C≠A, we are at T state 4/4.

A=(+0)
B=(+2), 'T states per line + 2' later
A=B: case 1: 1/4 or 2/4

---CA-B-----
----CA-B----
000011112222
    C=(-1), 'T states per line - 3' later
    C≠A: 1/4, else 2/4


A≠B: case 2: 3/4 or 4/4

------ACB---
-------ACB--
000011112222
    C=(+1), 'T states per line - 1' later
    C=A: 3/4, else 4/4
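
A small Python model makes it possible to check the decision table against the net offsets in the diagram (+0, +2, then -1 or +1). It is a sketch only: whole-line delays are assumed to cancel out because the attributes are the same on every scanline of the character row, neighbouring cells are assumed to hold different marker values, and `attr_at` and `classify` are hypothetical names.

```python
CELL_T = 4  # T states the beam spends over one attribute cell


def attr_at(t):
    """Index of the attribute cell under the beam at T state t within a line."""
    return t // CELL_T


def classify(t_a):
    """Phase (1..4) of read A within its attribute cell, from up to three reads.

    Only the net offsets relative to A matter: B is taken at +2,
    C at -1 (when A=B) or at +1 (when A differs from B).
    """
    a = attr_at(t_a)
    b = attr_at(t_a + 2)
    if a == b:                  # case 1: phase 1 or 2
        c = attr_at(t_a - 1)
        return 2 if c == a else 1
    c = attr_at(t_a + 1)        # case 2: phase 3 or 4
    return 3 if c == a else 4


# Classify every starting position across a few attribute cells.
phases = [classify(t) for t in range(4, 16)]
```

Once the phase is known, the remaining delay to the end of the line can be adjusted by the corresponding number of T states.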

As stated before, however, even reduced precision (4T in this case) is often sufficient in practice, and it can be achieved with a single read exactly as in the original algorithm; this scheme will be referred to as 'attr-4t'. In both cases, only the attributes need to be set up as sync marker values.

Another notable characteristic of the 'attribute-only' floating bus pattern, as opposed to that of the Spectrum models, is that it does not include 'idle' values. In other words, as long as the beam is moving over the PAPER portion of the screen, the values read from the port correctly reflect its position over a particular attribute. This is favorable for a couple of reasons. First, it makes it unnecessary to use the full width of the line as the sync marker area. If the sync marker area is located at the top left corner of the screen, it only has to be wide enough to cover the duration of a single run of the port reading loop. Assuming a 25 T state reading loop, and given that it takes 4 T states for the beam to move over a single attribute, a sync area 7 characters wide (equivalent to 28 T states) is enough for this purpose. As a result, not only does this require significantly less work to mask the resulting attribute artifacts, but the reduced number of sync codes can also be contained entirely within the INK portion of the values, allowing unrestricted changes to the color of the sync marker area using the PAPER and BRIGHT settings.
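
The width estimate can be reproduced with a few lines of Python. The constants come from this article and from the standard attribute byte layout (INK in bits 0-2); treating INK value 0 as "no marker" is an assumption made for this sketch.

```python
import math

LOOP_T = 25   # T states per iteration of the reading loop
CELL_T = 4    # T states the beam spends over one attribute cell
INK_BITS = 3  # INK occupies bits 0-2 of an attribute byte

cells_needed = math.ceil(LOOP_T / CELL_T)   # attribute cells one loop pass spans
covered_t = cells_needed * CELL_T           # T states those cells cover
nonzero_ink_codes = 2 ** INK_BITS - 1       # INK values 1..7 (0 assumed unused)
fits_in_ink = cells_needed <= nonzero_ink_codes
```

With a 25 T state loop this gives 7 cells covering 28 T states, and 7 distinct codes is exactly what the non-zero INK values can hold.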

The current 'attr-4t' aline implementation accommodates these points, which are also expected to apply to the 'attr-1t' algorithm. It is confirmed to work under emulation on the Pentagon and Scorpion configurations (UnrealSpeccy versions 0.35b2 and 0.38.3, 'Even M1' delays disabled).



Resources

The source code for aline is available here. The following files are included:
  • aline.asm: standard implementation
  • aline-color.asm: allows changing the color of the sync area
  • aline-fast.asm: uses unrolled (speed-optimized) setup code
  • aline-attronly-4t.asm: unofficial 'attribute-only' floating bus mod edition (4T precision)
  • test-aline.tap: a simple test program for aline using the 8x1 multicolor mode
  • test-aline-attr-4t.tap: as above, 'attr-4t' algorithm, Pentagon/Scorpion timings

A demo of the CATS Mint engine by Intense is available for an example of using aline.
Download archive (.TAP)
Watch demonstration


Special thanks

Chernandezba, Ast A. Moore, Your Spec-chum, Woody, Weiv and everyone else involved in the +2A/+3 floating bus testing effort over at the World of Spectrum forums.


Revisions

20180721
  • Corrected the information regarding the 'attribute-only' pattern handling
  • Improved the 'attr-4t' aline variant
  • Added the 'attr-4t' test program
20171124
  • Added the 'Color' aline variant
  • Expanded the Resources section
  • Added .TAP version of the CATS Mint example
20170922
  • Initial publication