Investigation: The Apple IIgs Memory Manager

Thomas Harte edited this page Mar 3, 2021 · 34 revisions

As of the first draft of this document, all parts are present for Apple IIgs emulation but bugs remain. At least one of those manifests as the in-ROM memory manager not responding as expected; in particular:

  • titles including Airball, Bouncing Bluster and Crystal Quest meet an unexpected BRK during startup, somewhere in the 00:BB or 00:BC page, dropping the user into the monitor; and
  • titles such as Defender of the Crown, Block Out and Bubble Ghost stop completely on an error of 'Unable to reserve memory segment $0201'.

The second set would appear overtly to be memory manager related; the former also seems to be.

Unexpected BRK

Digging down on Airball, the immediate progenitor of the error is this routine in the ROM's MoveUp (from mm.asm):

lda Count+2
adc Dest+2	;A = 00bank
xba	;A = bank00
ora #$44	;MVP opcode
pha
bra muFirst

i.e. the ROM intends to assemble the sequence MVP, RTS on the stack, to which it later jumps. An ORA is used because a single-byte load isn't available. In my emulator the value at Dest+2 originates from whatever RAM contained at power-up rather than necessarily being 00; as a result the generated opcode isn't MVP, and the on-stack sequence that is jumped to runs off the rails.
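As a rough illustration of why a non-zero high byte breaks the generated opcode, the following Python sketch mirrors the four instructions above in 16-bit accumulator mode. It assumes Count+2 is zero, carry is clear on entry, and that the jump lands on the most recently pushed byte; the function names are invented for this sketch and do not come from the ROM.

```python
MVP_OPCODE = 0x44  # 65816 MVP opcode, as in the ora #$44 above


def xba(a: int) -> int:
    """Swap the two bytes of a 16-bit accumulator value."""
    return ((a << 8) | (a >> 8)) & 0xFFFF


def pushed_word(count_hi: int, dest_hi: int, carry: int = 0) -> int:
    """Mirror lda Count+2 / adc Dest+2 / xba / ora #$44 in 16-bit mode."""
    a = (count_hi + dest_hi + carry) & 0xFFFF
    return xba(a) | MVP_OPCODE


def first_executed_byte(word: int) -> int:
    """A 16-bit PHA leaves the low byte at the lower address, so it runs first."""
    return word & 0xFF


# Expected case: high byte of Dest+2 is 00, so the low byte after XBA is 00.
print(hex(first_executed_byte(pushed_word(0x0000, 0x0015))))
# Observed case: Dest+2 holds 7f15, so the ORA merges $44 into $7f.
print(hex(first_executed_byte(pushed_word(0x0000, 0x7F15))))
```

With a clean 0015 the first byte is $44 (MVP); with 7f15 the ORA cannot clear the stray bits and the first byte stays $7f.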

Specifically the value at Dest+2 at that point in the code is 7f15.

Origin of the 7f15

By the time MoveUp is reached, the value is in Dest+2 (i.e. 00bcb1 and 00bcb2).

(Aside on cycle times below: they are relative to the 14MHz bus and depend on things like the exact disk rotation speeds I currently have, so they should not necessarily be expected to be accurate in the future.)

It was placed there by a STY at fc0d5f, 502081910 cycles since the machine launched. That immediately follows a load from 00bcc5/6.

The value was put into 00bcc5/6 by a STY at fc0d5f, 502078270 cycles since the machine launched. The value had been picked up by a LDY [$21] at fc0cb2, which read the value from 00bcc9/a.

It had been placed into 00bcc9/a by a STA at fc0fcb, 502077180 cycles since the machine launched. It picked the value up from a LDA [$13],y with y=2 at fc0fc9, which read the value from e11806/7.

In context, that is likely to be the upper word of the block of memory pointed to by a memory handle; see précis on memory manager handles below.

The value was written to e11806/7 by a STA [$17],y at fc0dac with y=2, 497930810 cycles since machine launch. Its origin then was one instruction earlier, a LDA $03,x with x=2 and d=$bca4 at fc0daa, which accessed bca9/a.

7f15 had been stored to bca9/a at fc0da4 by a STA [$05], 497929740 cycles since machine launch, having been read from 00bcc1/2 one instruction earlier by a LDA [$1d].

It had been stored there by a STY [$1d] at fc0d5f, 497928970 cycles since machine launch, having been fetched by a LDY [$21] at fc0cb2, 497928450 cycles since machine launch, which fetched from 00bcc5/6.

That value was left by a STA [$21] at fc0fcb, 497927880 cycles since machine launch. It was fetched one instruction earlier by a LDA [$13], y with y=2, and d=$bca4 yielding a base for the long address of bcb7 which gave e119a8; that plus y leads to a two-byte fetch from e119aa/b.
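The address arithmetic used throughout this trace can be sketched in Python as below. This is a minimal model of the two 65816 addressing modes that appear above (direct page indexed by X, and direct page indirect long indexed by Y), assuming the direct page lives in bank 00; the function names and the dictionary standing in for memory are inventions of this sketch, not emulator code.

```python
def direct_indexed_x(d: int, offset: int, x: int) -> int:
    """Effective bank-00 address of e.g. LDA $03,x."""
    return (d + offset + x) & 0xFFFF


def direct_indirect_long_y(memory: dict, d: int, offset: int, y: int):
    """Resolve e.g. LDA [$13],y.

    Returns (where the 3-byte pointer lives, the pointer, pointer + y).
    """
    pointer_at = (d + offset) & 0xFFFF
    pointer = (
        memory[pointer_at]
        | (memory[(pointer_at + 1) & 0xFFFF] << 8)
        | (memory[(pointer_at + 2) & 0xFFFF] << 16)
    )
    return pointer_at, pointer, (pointer + y) & 0xFFFFFF


# The fetch described above: d=$bca4, operand $13, y=2.
memory = {0xBCB7: 0xA8, 0xBCB8: 0x19, 0xBCB9: 0xE1}
pointer_at, pointer, effective = direct_indirect_long_y(memory, 0xBCA4, 0x13, 2)
print(f"{pointer_at:04x} {pointer:06x} {effective:06x}")

# The earlier LDA $03,x with x=2 and d=$bca4.
print(f"{direct_indexed_x(0xBCA4, 0x03, 2):04x}")
```

The first call reproduces the bcb7, e119a8, e119aa chain; the second reproduces the access to bca9.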

Digging down on e119ab, logging reads and writes to it since launch until its relevant access at 497927875 cycles since launch shows:

e119ab <- 00 [15775430]
e119ab -> 00 [141578780]
e119ab -> 00 [141579715]
e119ab <- 00 [469272795]
e119ab -> 00 [469274435]
e119ab -> 00 [469274490]
e119ab -> 00 [473880845]
e119ab -> 00 [473881875]
e119ab -> 00 [473920325]
e119ab <- 00 [480022440]
e119ab <- 7f [495809345]
e119ab -> 7f [495811040]
e119ab -> 7f [495811095]
e119ab -> 7f [495820535]
e119ab -> 7f [495820590]
e119ab -> 7f [497927875]

Therefore: it looks like the handle map is properly cleared at startup, which at least rules out guesses about faulty cold-start detection or incorrect loop lengths.
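For reference, picking the relevant write out of a log like the one above can be sketched as follows. This is a minimal Python model rather than the emulator's actual logging code; it assumes that `<-` marks a write and `->` a read (inferred from the log, not stated by it), and the `Access` type and helper names are hypothetical. Only a subset of the logged accesses is reproduced.

```python
from typing import NamedTuple, Optional


class Access(NamedTuple):
    is_write: bool
    value: int
    cycle: int


def last_write_before(log: list, cycle: int) -> Optional[Access]:
    """Return the most recent write at or before the given cycle, if any."""
    writes = [a for a in log if a.is_write and a.cycle <= cycle]
    return max(writes, key=lambda a: a.cycle) if writes else None


def format_access(address: int, access: Access) -> str:
    """Render one access in the same style as the log above."""
    arrow = "<-" if access.is_write else "->"
    return f"{address:06x} {arrow} {access.value:02x} [{access.cycle}]"


# A subset of the e119ab accesses listed above.
log = [
    Access(True, 0x00, 15775430),
    Access(False, 0x00, 141578780),
    Access(True, 0x00, 469272795),
    Access(True, 0x00, 480022440),
    Access(True, 0x7F, 495809345),
    Access(False, 0x7F, 497927875),
]

culprit = last_write_before(log, 497927875)
print(format_access(0xE119AB, culprit))
```

The write that the sketch selects is the one at cycle 495809345.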

Digging down on cycle 495809345 after machine launch returns us to fc0dac, the same location that wrote to e11806/7 above; that instruction begins 495809310 cycles after machine launch.

This time the LDA $03,x at fc0daa with x=2 has d=$bc4a and therefore loads from 00bc4f and 00bc50.

The store at fc0da4 this time stores a value previously loaded from bc67.

bc67 was stored to by fc0d5f at 495807470, with the offending 7f15 having been picked up from bc6b/c by the LDY [$21] at fc0cb2. That instruction began at 495806950.

The fc0fc9/fc0fcb load and store which stored 7f15 to bc6b/bc6c occurred from time 495806380; specifically, the LDA [$13],y at fc0fc9 occurred with d=$bc4a and y=2, so it read its pointer from bc5d, bc5e and bc5f, which contained $755b4a. That address is not in the memory manager handle set.

I also don't currently believe that 755b4c has been initialised. Therefore it's likely to be the pointer to there that has been calculated or written incorrectly somehow.

755b4a was written in two steps, at times 495802640 and 495805825 for 00bc5d/00bc5e and at 495805880 for 00bc5f. These timings possibly aren't yet fixed, which might be a clue.

The former occurs at fc0fa2 from a STA [$13] with d=$bc4a, following a LDA [$0f] at fc0fa0; that is followed by a LDA [$11] at fc0fa4 which obtains $5575 and then writes that at fc0fa6 via STA [$15]. Therefore the 4-byte address at bc5d is copied from 00bc59.

That address ended up at bc59 from the instruction at fc0fb2 at time 495804590, which is a STA [$11] that follows the LDA [$13], y at fc0fb0, with y = $e following a direct LDY #$0e at fc0fae and d=$bc4a.

That LDA reads from $594816 which in context is likely part of the handle traversing based on its rough ROM location, possibly implying that something has broken a previous pointer somewhere in the handle chain.

Memory Manager Notes

The memory manager uses the region between e11700 and e11aff for a list of handles. Each handle is 20 bytes long and has the format:

  • 0–3: address of held memory;
  • 4–5: attributes;
  • 6–7: segment owner ID;
  • 8–11: size of held memory;
  • 12–15: pointer to previous;
  • 16–19: pointer to next.

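e119aa/b is 682 bytes into the full list of handles, i.e. 2 bytes into the handle that starts at offset 680 (e119a8); it is therefore the high two bytes of the first field (the address of the memory held) of handle number 34 counting from zero, i.e. the 35th handle.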
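The arithmetic behind that statement can be sketched in Python as below. The field table simply restates the layout listed in this section (including its ordering of the previous and next pointers, which a later section finds to be transposed); `locate` and the constant names are hypothetical helpers for this sketch only.

```python
HANDLE_BASE = 0xE11700  # start of the handle region
HANDLE_SIZE = 20        # bytes per handle

# (first byte, length, name), following the layout as listed in this section.
FIELDS = [
    (0, 4, "address of held memory"),
    (4, 2, "attributes"),
    (6, 2, "segment owner ID"),
    (8, 4, "size of held memory"),
    (12, 4, "pointer to previous"),
    (16, 4, "pointer to next"),
]


def locate(address: int):
    """Map an address to (zero-based handle index, field name, byte in field)."""
    index, offset = divmod(address - HANDLE_BASE, HANDLE_SIZE)
    for start, length, name in FIELDS:
        if start <= offset < start + length:
            return index, name, offset - start
    raise ValueError("offset falls outside the handle layout")


print(0xE119AA - HANDLE_BASE)   # byte offset into the region
print(locate(0xE119AA))         # which handle and field that offset hits
```

The offset is 682, and `divmod(682, 20)` gives handle index 34 with 2 bytes left over, which falls in the upper half of the address field.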

Attempt 1: Verifying the Memory Manager Handle List

Adding code to sanity check the memory manager handle linked list whenever a write occurs to that section of memory, and attempting manually to filter for false positives, the first breakage appears to occur at time 495717310 when an earlier instance of the STA [$17],y at fc0dac stores a value of 1f31, indicating a most significant address byte of 1f.

d=$bc59, y=$e so the base handle address is read from bc70 and is e11994, causing the back pointer at e11728 to be updated with an out-of-bounds address.

That address was read from bc6a/bc6b in the preceding instruction — a LDA $03,x with x=$e and d as just stated.

The corresponding offending fc0fb0/fc0fb2 start at 495715150 and reference bc6c which is pointing to 797c3b.

Skipping back a few steps: watching the history of values written to 00bc6e by fc0fa6, the first one that isn't a=00e1 occurs at time 492357210, where a=0000; it is followed by a write at 492358530 of a=6431.

So am I failing to apply the z flag properly somewhere perhaps?

Attempt 2: Verifying the Memory Manager Handle List

Believing the above to be flawed, I rewrote the list verification. The only thing initially unexpected seems to be that the first item in a list contains a back pointer to a not-yet-populated handle, which is always the next handle in linear memory after the first one in the list.

E.g. this is the initial state of the bank e1 handle list:

Bank e1: e11750 (<- e11764 | e1173c ->) [000000:000000] -> 
e1173c (<- e11750 | e11728 ->) [000000:000000] -> 
e11728 (<- e1173c | e11714 ->) [000000:000000] -> 
e11714 (<- e11728 | e11700 ->) [000000:000000] -> 
e11700 (<- e11714 | 000000 ->) [000000:000000] -> 
nil

i.e. the only surprising thing there is that the handle at e11750, which is the first in the list for bank e1, contains a back pointer to e11764, which is the next handle in memory after e11750 but is not initially populated.

Once all four linked lists are initially populated the full set looks like:

Bank 00: e11714 (<- e11728 | e11700 ->) [000000:000000] -> 
e11700 (<- e11714 | 000000 ->) [000000:000000] -> 
nil
Bank 01: e11728 (<- e1173c | e11714 ->) [000000:000000] -> 
e11714 (<- e11728 | e11700 ->) [000000:000000] -> 
e11700 (<- e11714 | 000000 ->) [000000:000000] -> 
nil
Bank e0: e1173c (<- e11750 | e11728 ->) [000000:000000] -> 
e11728 (<- e1173c | e11714 ->) [000000:000000] -> 
e11714 (<- e11728 | e11700 ->) [000000:000000] -> 
e11700 (<- e11714 | 000000 ->) [000000:000000] -> 
nil
Bank e1: e11750 (<- e11764 | e1173c ->) [000000:000000] -> 
e1173c (<- e11750 | e11728 ->) [000000:000000] -> 
e11728 (<- e1173c | e11714 ->) [000000:000000] -> 
e11714 (<- e11728 | e11700 ->) [000000:000000] -> 
e11700 (<- e11714 | 000000 ->) [000000:000000] -> 
nil

So the back pointer from the first handle on bank e1 remains invalid; for the preceding banks each points to the first in the list for the bank that was populated one before. But that's possibly a coincidental effect of the way that lists are first constructed.

Digging in further, the first writes to e11760 onwards (i.e. the previous pointer associated with the handle at e11750) occur from time 15820330; it looks like e11760 and e11761 are likely part of the same instruction, being written only 5 cycles apart. e11762 and e11763 occur at 15820375 and 15820380.

Digging into the ROM

Believing the initial list to be incorrectly formed, I determined which routine is running and looked up its expected results. Key takeaway: the documentation I had for the handle layout had the next and previous pointers transposed. Therefore my traversal was still incorrect.

This puts me back at square 1 in terms of trying to find the actual issue. I can find no immediate persistent issue in handling of memory handles.

Redoubling

A new resource has been provided: an old comp.sys.apple2.programmer post on the handle list. New knowledge:

  • only the used, purge and free list are supposed to be complete;
  • i.e., implicitly, the per-bank lists just point into the used list.

That provides for two new tests: (i) do those all properly begin and end with nil pointers; and (ii) does the total number of handles remain correct (i.e. at 51 — floor(1024/20))?
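A minimal sketch of those two tests, in Python, might look like the following. It assumes a simplified model in which each handle is a record with `prev` and `next` addresses and nil is 0; how the three list heads are obtained from the memory manager's state is not modelled, and all names here are hypothetical.

```python
TOTAL_HANDLES = 1024 // 20  # 51 handles fit in the 1024-byte region
NIL = 0


def link(addresses):
    """Build prev/next records for one nil-terminated list (demo helper)."""
    records = {}
    for i, address in enumerate(addresses):
        records[address] = {
            "prev": addresses[i - 1] if i > 0 else NIL,
            "next": addresses[i + 1] if i + 1 < len(addresses) else NIL,
        }
    return records


def walk(handles, head):
    """Return the handles reachable from head, checking both nil ends."""
    if head == NIL:
        return []
    if handles[head]["prev"] != NIL:
        raise ValueError("list does not begin with a nil back pointer")
    visited, seen, current = [], set(), head
    while current != NIL:
        if current in seen:
            raise ValueError("cycle in list")
        seen.add(current)
        visited.append(current)
        current = handles[current]["next"]  # KeyError if it leaves the table
    return visited


def lists_are_sound(handles, heads):
    """Test (i): every list is nil-terminated; test (ii): 51 handles in total."""
    try:
        total = sum(len(walk(handles, head)) for head in heads)
    except (ValueError, KeyError):
        return False
    return total == TOTAL_HANDLES


# Demo with an arbitrary split of the 51 slots across used, purge and free.
addresses = [0xE11700 + 20 * i for i in range(TOTAL_HANDLES)]
used, purge, free = addresses[:10], addresses[10:15], addresses[15:]
handles = {}
for part in (used, purge, free):
    handles.update(link(part))
heads = [used[0], purge[0], free[0]]
print(lists_are_sound(handles, heads))
```

Running a check of this kind after every write to the handle region is what allows a last-known-good time to be identified.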

At first blush, this appears to uncover the point at which the error occurs.

The final time that both tests pass is at 492332040. Never again after that are the full 51 handles accounted for. Assuming that 492351475 marks the completion of the next list modification, which begins at 492351410 (a guess based on the number of steps and their proximity), one handle is lost.

Specifically, e119e4 never appears again (and other losses subsequently occur; only 42 handles remain at the point that BRK is reached).

Differencing

For the next route in: determine the first time that the program counter is in user program and the handle list looks incorrect. Backtrack from there to find the breaking call. Difference that with other instances of the same call to try to find the failing path.
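The differencing step itself can be sketched in Python as a search for the first point at which two traces of the same call disagree. This is only an illustration of the idea: the traces are assumed to be lists of program counter values captured per instruction, and the values in the demo are made-up placeholders rather than real ROM addresses.

```python
from itertools import zip_longest


def first_divergence(good, bad):
    """Return (step, good value, bad value) at the first mismatch, or None."""
    for step, (g, b) in enumerate(zip_longest(good, bad)):
        if g != b:
            return step, g, b
    return None


# Placeholder traces: the two calls share three steps and then part ways.
good_trace = [0x1000, 0x1002, 0x1005, 0x1008, 0x100A]
bad_trace = [0x1000, 0x1002, 0x1005, 0x1010, 0x1012]
print(first_divergence(good_trace, bad_trace))
```

`zip_longest` is used so that a trace that merely ends early is still reported as a divergence, with `None` standing in for the missing step.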

That appears to suggest a first failure in the ROM call that terminates at 491037040. Backtracking from there, the problem seems to occur in NewHandle.

... and, specifically, the first bad dereference possibly traces to 491030230, when a first call to GetHandle has dropped back into the loop from SearchDown:LoopDF at its attempt to lda addr1+2 (i.e. inspect the top word of addr1 via stack-relative addressing) which loads $9e09 — an invalid top word in IIgs addressing.

That value is loaded from bc78/bc79; the 09 in bc78 is actually the 09 from the NewHandle function code, $0902, and the 9e in bc79 was left by a phd at time 491026400, which appears to come from a previous stack frame pointer. Both are left over from the tool locator dispatcher from fe00c5 that user code reached via the indirection at e10000.
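To make the byte-level picture concrete, this Python sketch composes the two stale bytes into the word that gets loaded, and splits the tool call code into its function and tool set numbers. The validity check assumes that a pointer's top word should have a zero high byte because addresses are 24 bits wide; the helper names are hypothetical.

```python
def word_at(memory: dict, address: int) -> int:
    """Read a little-endian 16-bit word."""
    return memory[address] | (memory[address + 1] << 8)


def split_tool_call(code: int):
    """Split a toolbox call code into (function number, tool set number)."""
    return code >> 8, code & 0xFF


def plausible_top_word(word: int) -> bool:
    """Top word of a 32-bit pointer holding a 24-bit address: high byte is 00."""
    return (word & 0xFF00) == 0


# The stale bytes described above: 09 at bc78 and 9e at bc79.
stack = {0xBC78: 0x09, 0xBC79: 0x9E}
top = word_at(stack, 0xBC78)
print(hex(top), plausible_top_word(top))

# NewHandle's call code, whose function number is the source of the 09.
print(split_tool_call(0x0902))
```

The composed word is $9e09, which fails the check, and $0902 splits into function $09 of tool set $02.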

So: stack corruption?

Value of the direct register is bc5b, having been established at 491027685, following some arithmetic to create a new stack frame. bc78/bc79 aren't touched after that, so option #2 is bad control flow failing to initialise something?

Some places that potentially write to addr+2:

  • XCheckFit ('addr2 destroyed');
  • GetNext;
  • GetNextFree; and
  • GetPrevFree.

There is a call to GetNextFree from 491029890, but upon finding a nil pointer to hndl1, it doesn't ever write to addr2.