Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lcc: overflow of relational operators #64

Open
kervinck opened this issue May 9, 2019 · 6 comments
Open

lcc: overflow of relational operators #64

kervinck opened this issue May 9, 2019 · 6 comments
Assignees

Comments

@kervinck
Copy link
Owner

kervinck commented May 9, 2019

The result of SUBW doesn't indicate if an overflow has occurred. Consequently, vCPU offers only signed comparisons against 0 for branching decisions. With that, the relational operators a<b, a<=b, a a>=b a>b only work when the difference between a and b fits in 15 bits. Ultimately this idiosyncrasy stems from the lack of status register in the 8-bit hardware architecture, and the underlying design idea that software can always compensate for missing hardware features...

TinyBASIC_v2.gt1 uses the following sequence to get correct comparisons over the full range:

0461  ee 02                    LDLW  $02                |..|
0463  fc 3a                    XORW  $3a                |.:|
0465  35 53 6a                 BGE   $046c              |5Sj|
0468  ee 02                    LDLW  $02                |..|
046a  90 6e                    BRA   $0470              |.n|
046c  ee 02                    LDLW  $02                |..|
046e  b8 3a                    SUBW  $3a                |.:|
0470  35 56 73                 BLE   $0475              |5Vs|

So prior to the standard subtract, it first checks if the operand highest bits are equal. If equal, it perform the normal SUBW for comparison. If not equal, the first operand is loaded for BLE/BLT (or the second operand in the case of BGE/BGT). This sequence costs 8 vCPU instructions or 18 bytes here, compared to 3/7 for the naive sequence. This is the reason that Tiny BASIC uses this idiom selectively.

For a compliant C compiler we need something similar. For example, now 0xffffu compares as smaller than 0. And we also can't really say that ints are just 15-bits wide instead.

My plan:

  1. First bite the bullet and implement it correctly for < <= > and >=. Try hard if we can get away with at most one new helper function in rt.py ("@prepcmp"?). For example, implement only in one order and reverse the order of operands if necessary.
  2. Provide macros/instrinsics/builtins to give access to the the naive relational operators when so desired by the programmer. Something like LT(a,b), GT(a,b), LE(a,b) and GE(a,b).
  3. Optimise away the prelude when comparing against a zero constant.
  4. Think deep about further optimisation options. For example, try to deduce the high bit of both sides by static analysis...?
  5. Consider some trivial rewrites. Eg. comparisons of an unsigned integer against 0 can be always be replaced by one of true/false/NE/EQ.
@kervinck kervinck added the bug label May 9, 2019
@Cwiiis
Copy link
Contributor

Cwiiis commented May 9, 2019 via email

@kervinck
Copy link
Owner Author

kervinck commented May 9, 2019

Yes, this concept needs to evolve, hence the issue to track the thought process

My idea of prioritising things in LCC are:

  1. Correctness
  2. Usability (e.g. error messages, 64K support, essential library functions, ...)
  3. Optimisations
    ...
  4. Floating point :-)

@kervinck kervinck self-assigned this May 13, 2019
@kervinck
Copy link
Owner Author

kervinck commented May 13, 2019

This has the potential to make a lot of code very inefficient, and I like to know the impact. For example:

if (putc(…) < 0)… can become nasty, whereas

if (putc(…) == EOF) … has potentially an efficient translation (using ADDI 1 + BNE)

kervinck added a commit that referenced this issue May 16, 2019
- No overflow cases included yet
@kervinck
Copy link
Owner Author

Perhaps we can squeeze in another new vCPU instruction that helps for these: "CMPW $DD".

@kervinck kervinck added compliancy and removed bug labels Jul 15, 2019
kervinck added a commit that referenced this issue Oct 5, 2019
Addresses: "New vCPU instructions? (#85)"
  #85

See also: "lcc: overflow of relational operators (#64)"
  #64

  See also this thread: https://forum.gigatron.io/viewtopic.php?f=4&t=136

Mnem. Encoding  #C Description
----- --------- -- -----------
CALLI $85 LL HH 28 Goto immediate address and remember vPC (vLR,vPC=vPC+3,$HHLL)
CMPHS $1f DD    28 Adjust high byte for signed compare (vACH=XXX)
CMPHU $97 DD    28 Adjust high byte for unsigned compare (vACH=XXX)

Changed cycle times
-------------------
LD    $1A DD    22 (was 18)
INC   $93 DD    22 (was 16)
ANDI  $82 DD    22 (was 16)

Regression test with Mandelbrot:
 1144.551 seconds -> 1144.751 seconds

----------------------------------------------------------------
On vCPU instructions for comparisons between two 16-bit operands
----------------------------------------------------------------

vCPU's conditional branching (BCC) always compares vAC against 0,
treating vAC as a two's complement 16-bit number. When we need to
compare two arbitrary numnbers we normally first take their difference
with SUBW.  However, when this difference is too large, the subtraction
overflows and we get the wrong outcome. To get it right over the
entire range, an elaborate sequence is needed. TinyBASIC uses this
blurp for its relational operators. (It compares stack variable $02
with zero page variable $3a.)

      0461  ee 02            LDLW  $02
      0463  fc 3a            XORW  $3a
      0465  35 53 6a         BGE   $046c
      0468  ee 02            LDLW  $02
      046a  90 6e            BRA   $0470
      046c  ee 02            LDLW  $02
      046e  b8 3a            SUBW  $3a
      0470  35 56 73         BLE   $0475

The CMPHS and CMPHU instructions were introduced to simplify this.
They inspect both operands to see if there is an overflow risk. If
so, they modify vAC such that their difference gets smaller, while
preserving the relation between the two operands. After that, the
SUBW instruction can't overflow and we achieve a correct comparison.
Use CMPHS for signed comparisons and CMPHU for unsigned. With these,
the sequence above becomes:

      0461  ee 02            LDLW  $02
      0463  1f 3b            CMPHS $3b        Note: high byte of operand
      0465  b8 3a            SUBW  $3a
      0467  35 56 73         BLE   $0475

CMPHS/CMPHU don't make much sense other than in combination with
SUBW. These modify vACH, if needed, as given in the following table:

      vACH  varH  |     vACH
      bit7  bit7  | CMPHS  CMPHU
      ---------------------------
        0     0   |  vACH   vACH      no change needed
        0     1   | varH+1 varH-1     narrowing the range
        1     0   | varH-1 varH+1     narrowing the range
        1     1   |  vACH   vACH      no change needed
      ---------------------------
@kervinck
Copy link
Owner Author

kervinck commented Apr 5, 2020

ROM v5 will have CMPHS and CMPHU instructions that are designed to solve this. See also f584a2a

@lb3361
Copy link
Contributor

lb3361 commented Aug 25, 2022

Suggesting to close this issue since glcc does this correctly already (on both roms v4 --with code-- and v5a+ --with cmphi/cmphs).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants