Skip to content

A fast Forth Standard system written in Z80 assembly for SHARP PC-G850 pocket computers

License

Notifications You must be signed in to change notification settings

Robert-van-Engelen/Forth850

PC-G850

A modern Forth 2012 standard compliant system for the vintage SHARP PC-G850(V)(S) pocket computer or any Z80 system (with a few tweaks to port).

Forth850 is under 8K and has 295 words.

A more complete 11K version forth850-full is also included with:

Forth850 includes stack under/overflow checks, dictionary overflow checks and can be interrupted by pressing BREAK.

You can write Forth source code in the PC-G850(V)(S) built-in TEXT editor and compile it into Forth850 with the TEXT word included in the full version.

You can extend Forth850 as you wish, including assembly code written on the machine itself in the PC-G850(V)(S) TEXT editor and assembled with its Z80 Assembler. See ASMDEMO1.FTH for an example with an explanation. You can also use the Monitor to set breakpoints and run Forth850 from the Monitor with G100 to trigger them.

If you want to rebuild Forth850 from source code, you will need to install the asz80 assembler part of the ASxxxx Cross Assemblers.

If you plan to use parts of Forth850 and/or the optimized Z80 code in a project that you plan to share or redistribute, then please give me credit for my work as per BCD-3 license.

Discuss: HP Forum and reddit

Performance

I've implemented Forth850 as efficiently as possible in direct threaded code (DTC) with new Z80 code written from scratch, including faster Z80 integer and float math routines compared to other Z80 Forth implementations. See the technical implementation sections why Forth850 is fast for a DTC implementation. The Forth850 source code is included and extensively documented.

The n-queens benchmark is solved in 0.865 seconds, the fastest Forth implementation of the benchmarks. Forth850 n-queens runs 5 times faster than the C n-queens benchmark on the Sharp PC-G850VS.

Installation

In RUN MODE enter MON to enter the Monitor, then enter USER3FFF to reserve 16K memory space:

*USER3FFF
FREE:0100-3FFF

Loading via SIO (serial) requires a serial adapter. See my post on the HP Forum how to construct one as a DIY project. After reserving memory in the Monitor as described above, use the R command to read the forth850.ihx file or the forth850-full.ihx full version sent from your PC to your PC-G850(V)(S):

*R100

The R command is used to transmit/receive data in Intel hex format over SIO. This command is for receiving machine code from a personal computer or other device. See the Sharp PC-G850(V)(S) manual.

To load via the cassette interface, press BASIC to return to RUN MODE. Load forth850.wav using a cassette interface CE-126P or a CE-124:

BLOADM

Load the forth850-full.wav "full version" to include many additional words and floating point words. The full version will continue to evolve with new features.

How to switch between Forth and BASIC

To return to Forth, enter CALL256 in RUN MODE.

To return to BASIC from Forth, press the BASIC key. The TEXT key takes you to the TEXT editor.

To turn the machine off, press the OFF key. The machine will also power off automatically after about 10 minutes waiting for user input at the prompt.

How to increase or decrease memory allocation for Forth850

Memory allocation can be adjusted without affecting the Forth dictionary.

In RUN MODE enter MON to enter the Monitor, then enter USERaddr with an upper address addr larger than 23ff (9K bytes.) If words are added to Forth850, you must make sure that addr is large enough, i.e. equal or larger than the hex value displayed with:

HERE #708 + HEX . DECIMAL
23FF

In the Monitor specify USERaddr with the address displayed. This leaves about 200 bytes free dictionary space plus 40 bytes for the "hold area" to run Forth850. The largest size is USER75FF which gives about 21K free dictionary space (but there won't be space left on the machine for files, BASIC or TEXT.)

Forth850 manual

Forth850 is 2012 standard compliant. For help, see the manual included with Forth for the Sharp PC-E500(S) and Forth 2012 Standard.

Forth850 implements a subset of the standard Forth words. A list of Forth850 words with an explanation is given below.

Forth850 words

List of Forth850 built-in words. Reference implementations in Forth are included when applicable, although many words are implemented in Z80 code for speed rather than in Forth.

(:)

-- ; R: -- ip call colon definition; runtime of the : compile-only word

(;)

-- ; R: ip -- return to caller from colon definition; runtime of the ; compile-only word

(EXIT)

-- ; R: ip -- return to caller from colon definition; runtime of the EXIT compile-only word

(;CODE)

-- ; R: ip -- set LASTXT cfa to ip and return from colon definition; a runtime word compiled by the DOES> compile-only word

(DOES)

addr -- addr ; R: -- ip calls the DOES> definition with pfa addr; a runtime word compiled by the DOES> compile-only word coded as call dodoes

(VAR)

-- addr leave parameter field address (pfa) of variable; runtime word of a VARIABLE coded as call dovar

(VAL)

-- x fetch value; runtime word of a VALUE coded as call doval

(2VAL)

-- dx fetch double value; runtime word of a 2VALUE coded as call dotwoval

(CON)

-- x fetch constant; runtime word of a CONSTANT coded as call docon

(2CON)

-- x fetch double constant; runtime word of a 2CONSTANT coded as call dotwocon

(DEF)

-- execute deferred word; runtime word of a DEFER coded as call dodef

(LIT)

-- x fetch literal; runtime word compiled by EVALUATE, INTERPRET and NUMBER

(2LIT)

-- x1 x2 fetch double literal; runtime word compiled by EVALUATE, INTERPRET and NUMBER

(SLIT)

-- c-addr u fetch literal string; runtime word compiled by S" and ."

0

-- 0 leave constant 0

0 CONSTANT 0

1

-- 1 leave constant 1

1 CONSTANT 1

-1

-- -1 leave constant -1

-1 CONSTANT -1

BL

-- 32 leave constant 32 (space)

#32 CONSTANT BL

PAD

-- c-addr leave address of the PAD; the PAD is a free buffer space of 256 bytes not used by Forth850

TIB

-- c-addr leave address of TIB; the terminal input buffer used by Forth850

TMP

-- c-addr leave address of the next temp string buffer; switches between two string buffers of 256 free bytes each; used by S" to store a string when interpreting

DROP

x -- drop TOS

DUP

x -- x x duplicate TOS

?DUP

x -- x x or 0 -- 0 duplicate TOS if nonzero

SWAP

x1 x2 -- x2 x1 swap TOS with 2OS

OVER

x1 x2 -- x1 x2 x1 copy 2OS over TOS

ROT

x1 x2 x3 -- x2 x3 x1 rotate cells

: ROT >R SWAP R> SWAP ;

-ROT

x1 x2 x3 -- x3 x1 x2 undo (or back, or left) rotate cells

: -ROT ROT ROT ;

NIP

x1 x2 -- x2 nip 2OS

: NIP SWAP DROP ;

TUCK

x1 x2 -- x2 x1 x2 tuck TOS under 2OS

: TUCK SWAP OVER ;

2DROP

xd1 xd2 -- xd1 drop double TOS

: 2DROP DROP DROP ;

2DUP

xd -- xd xd duplicate double TOS

: 2DUP OVER OVER ;

2SWAP

xd1 xd2 -- xd2 xd1 swap double TOS with double 2OS

: 2SWAP ROT >R ROT R> ;
: 2SWAP 3 ROLL 3 ROLL ;

2OVER

xd1 xd2 -- xd1 xd2 xd1 copy double 2OS over double TOS

: 2OVER >R >R 2DUP R> R> 2SWAP ;
: 2OVER 3 PICK 3 PICK ;

DEPTH

-- u parameter stack depth

: DEPTH sp0 @ SP@ - 2- 2/ ;

CLEAR

... -- purge parameter stack

: CLEAR sp0 @ SP! ;

.S

-- display parameter stack

: .S DEPTH 0 ?DO sp0 @ I 2+ CELLS - ? LOOP ;

SP@

-- addr fetch stack pointer

SP!

addr -- store stack pointer

>R

x -- ; R: -- x move TOS to the return stack

DUP>R

x -- x ; R: -- x duplicate TOS to the return stack, a single word for DUP >R

R>

R: x -- ; -- x move cell from the return stack

RDROP

R: x -- ; -- drop cell from the return stack, a single word for R> DROP

R@

R: x -- x ; -- x fetch cell from the return stack

2>R

x1 x2 -- ; R: -- x1 x2 move double TOS to the return stack, a single word for SWAP >R >R

2R>

R: x1 x2 -- ; -- x1 x2 move double cell from the return stack, a single word for R> R> SWAP

2R@

R: x1 x2 -- x1 x2 ; -- x1 x2 fetch double cell from the return stack

RP@

-- addr fetch return stack pointer

RP!

addr -- store return stack pointer

PICK

xu ... x0 u -- xu ... x0 xu pick u'th cell from the parameter stack; 0 PICK is the same as DUP; 1 PICK is the same as OVER

: PICK 1+ CELLS SP@ + @ ;

@

addr -- x fetch from cell

C@

c-addr -- char fetch char

2@

addr -- x1 x2 fetch from double cell

: 2@ DUP CELL+ @ SWAP @ ;

!

x addr -- store in cell

(TO)

x -- store in value; runtime of the TO compile-only word

C!

char c-addr -- store char

2!

x1 x2 addr -- store in double cell

: 2! TUCK ! CELL+ ! ;

(2TO)

dx -- store in double value; runtime of the TO compile-only word

+!

n addr -- increment cell

(+TO)

n -- increment value; runtime of the +TO compile-only word

ON

addr -- store TRUE (-1) in cell

: ON -1 SWAP ! ;

OFF

addr -- store FALSE (0) in cell

: OFF 0 SWAP ! ;

+

n1 n2 -- n3 sum n1+n2

M+

d1 n -- d2 double sum d1+n

D+

d1 d2 -- d3 double sum d1+d2

: D+ >R M+ R> + ;

-

n1 n2 -- n3 difference n1-n2

D-

d1 d2 -- d3 double difference d1-d2

: D- DNEGATE D+ ;

UM*

u1 u2 -- ud unsigned double product u1*u2

M*

n1 n2 -- d signed double product n1*n2

: M*
  2DUP XOR >R
  ABS SWAP ABS UM*
  R> 0< IF DNEGATE THEN ;

*

n1|u1 n2|u2 -- n3|u3 signed and unsigned product n1*n2

: * UM* DROP ;

UMD*

ud1 u -- ud2 unsigned double product ud1*u

: UMD* DUP>R UM* DROP SWAP R> UM* ROT + ;

MD*

d1 n -- d2 signed double product d1*n

: MD*
  2DUP XOR >R
  ABS -ROT DABS ROT
  UMD*
  R> 0< IF DNEGATE THEN ;

UM/MOD

ud u1 -- u2 u3 unsigned remainder and quotient ud/u1; the result is undefined when u1 = 0

SM/REM

d1 n1 -- n2 n3 symmetric remainder and quotient d1/n1 rounded towards zero; the result is undefined when n1 = 0

: SM/REM
  2DUP XOR >R
  OVER >R
  ABS -ROT DABS ROT
  UM/MOD
  R> 0< IF SWAP NEGATE SWAP THEN
  R> 0< IF NEGATE THEN ;

FM/MOD

d1 n1 -- n2 n3 floored signed modulus and quotient d1/n1 rounded towards negative (floored); the result is undefined when n1 = 0

: FM/MOD
  DUP>R
  SM/REM
  DUP 0< IF
    SWAP R> + SWAP 1-
  ELSE
    RDROP
  THEN ;

/MOD

n1 n2 -- n3 n4 symmetric remainder and quotient n1/n2; the result is undefined when n2 = 0

: /MOD SWAP S>D ROT SM/REM ;

MOD

n1 n2 -- n3 symmetric remainder of n1/n2; the result is undefined when n2 = 0

: MOD /MOD DROP ;

/

n1 n2 -- n3 quotient n1/n2; the result is undefined when n2 = 0

: / /MOD NIP ;

*/MOD

n1 n2 n3 -- n4 n5 product with symmetric remainder and quotient n1*n2/n3; the result is undefined when n3 = 0

: */MOD -ROT M* ROT SM/REM ;

*/

n1 n2 n3 -- n4 product with quotient n1*n2/n3; the result is undefined when n3 = 0

: */ */MOD NIP ;

M*/

d1 n1 n2 -- d2 double product with quotient d1*n1/n2; the result is undefined when n2 = 0

: M*/ >R MD* R> SM/REM NIP ;

AND

x1 x2 -- x1&x2 bitwise and x1 with x2

OR

x1 x2 -- x1|x2 bitwise or x1 with x2

XOR

x1 x2 -- x1^x2 bitwise xor x1 with x2

=

x1 x2 -- flag true if x1 = x2

<>

x1 x2 -- flag true if x1 <> x2

<

n1 n2 -- flag true if n1 < n2 signed

: <
  2DUP XOR 0< IF
    DROP 0<
    EXIT
  THEN
  - 0< ;

>

n1 n2 -- flag true if n1 > n2 signed

: > SWAP < ;

U<

u1 u2 -- flag true if u1 < u2 unsigned

: U<
  2DUP XOR 0< IF
    NIP 0<
    EXIT
  THEN
  - 0< ;

U>

u1 u2 -- flag true if u1 > u2 unsigned

: U> SWAP U< ;

0=

x -- flag true if x = 0

0<

n -- flag true if n < 0

D0=

dx -- flag true if dx = 0

: D0= OR 0= ;

D0<

d -- flag true if d < 0

: D0< NIP 0< ;

S>D

n -- d widen single to double

D>S

d -- n narrow double to single; may throw -11 "result out of range" valid range is -32768 to 65535

MAX

n1 n2 -- n3 signed max of n1 and n2

: MAX
  2DUP < IF SWAP THEN
  DROP ;

MIN

n1 n2 -- n3 signed min of n1 and n2

: MIN
  2DUP > IF SWAP THEN
  DROP ;

UMAX

u1 u2 -- u3 unsigned max of u1 and u2

: UMAX
  2DUP U< IF SWAP THEN
  DROP ;

UMIN

u1 u2 -- u3 unsigned min of u1 and u2

: UMIN
  2DUP U> IF SWAP THEN
  DROP ;

WITHIN

x1 x2 x3 -- flag true if x1 is within x2 up to x3 exclusive

: WITHIN OVER - >R - R> U< ;

INVERT

x1 -- x2 one's complement ~x1

: INVERT 1+ NEGATE ;
: INVERT -1 XOR ;

NEGATE

n1 -- n2 two's complement -n1

: NEGATE 0 SWAP - ;
: NEGATE INVERT 1+ ;

ABS

n1 -- n2 absolute value |n1|

: ABS DUP 0< IF NEGATE THEN ;

DNEGATE

d1 -- d2 two's complement -d1

: DNEGATE SWAP INVERT SWAP INVERT 1 M+ ;

DABS

d1 -- d2 absolute value |d1|

: DABS DUP 0< IF DNEGATE THEN ;

LSHIFT

x1 u -- x2 logical shift left x1 << u

RSHIFT

x1 u -- x2 logical shift right x1 >> u

1+

n1 -- n2 increment n1+1

: 1+ 1 + ;

2+

n1 -- n2 increment n1+2

: 2+ 2 + ;

1-

n1 -- n2 decrement n1-1

: 1- 1 - ;

2-

n1 -- n2 decrement n1-2

: 2- 2 - ;

2*

n1 -- n2 arithmetic shift left n1 << 1

: 2* 2 * ;

2/

n1 -- n2 arithmetic shift right n1 >> 1

: 2/ 2 / ;

COUNT

c-addr1 -- c-addr2 u convert counted string to string

: COUNT DUP 1+ SWAP C@ ;

COMPARE

c-addr1 u1 c-addr2 u2 -- -1|0|1 compare strings, leaves -1 = less or 0 = equal or 1 = greater

S=

c-addr1 u1 c-addr2 u2 -- flag true if strings match

: S= COMPARE 0= ;

SEARCH

c-addr1 u1 c-addr2 u2 -- c-addr3 u3 flag true if the second string is in the first; leaves matching address, remaining length and true; or leaves the first string and false

CMOVE

c-addr1 c-addr2 u -- move u bytes from c-addr1 to c-addr2 (from begin)

: CMOVE
  SWAP >R
  BEGIN DUP WHILE
    NEXT-CHAR R@ C!
    R> 1+ >R
  REPEAT
  RDROP
  2DROP ;

CMOVE>

c-addr1 c-addr2 u -- move u bytes from c-addr1 to c-addr2 up (from end)

MOVE

c-addr1 c-addr2 u -- move u bytes from c-addr1 to c-addr2

: MOVE
  -ROT
  2DUP U< IF
    ROT CMOVE>
  ELSE
    ROT CMOVE
  THEN ;

FILL

c-addr u char -- fill memory with char

ERASE

c-addr u -- fill memory with zeros

: ERASE 0 FILL ;

BLANK

c-addr u -- fill memory with 0x20 (BL) chars

: ERASE BL FILL ;

CHOP

c-addr u1 char -- c-addr u2 truncate a string up to a matching char; leaves the string if char not found; char = 0x20 (BL) chops 0x00 to 0x20 (white space and control)

TRIM

c-addr1 u1 char -- c-addr2 u2 trim initial chars from a string; char = 0x20 (BL) trims 0x00 to 0x20 (white space and control)

-TRIM

c-addr u1 char -- c-addr u2 trim trailing chars from a string; char = 0x20 (BL) trims 0x00 to 0x20 (white space and control)

-TRAILING

c-addr u1 -- c-addr u2 trim trailing white space and control characters from a string

: -TRAILING BL -TRIM ;

/STRING

c-addr1 u1 n -- c-addr2 u2 slice n characters off the start of a string

: /STRING ROT OVER + -ROT - ;

NEXT-CHAR

c-addr1 u1 -- c-addr2 u2 char get next char from a string; increments the string address and decrements its length by one

: NEXT-CHAR OVER C@ >R 1- SWAP 1+ SWAP R> ;
: NEXT-CHAR OVER C@ -ROT 1- SWAP 1+ SWAP ROT ;

X!

u -- set cursor column 0 to 23

Y!

u -- set cursor row 0 to 5

X@

-- u fetch cursor column 0 to 23, or 24 when beyond the right window edge

Y@

-- u fetch cursor row 0 to 5

AT-XY

u1 u2 -- set column x to u1 (0 to 23) and row y to u2 (0 to 5)

: AT-XY Y! X! ;

EMIT

char -- emit char to screen; supports the following control codes: 8 (BS backspace, cursor left), 9 (TAB), 10 (LF line feed), 11 (VT scroll), 12 (FF clear screen), 13 (CR carriage return), 28 (cursor right), 29 (cursor left), 30 (cursor up), 31 (cursor down)

TYPE

c-addr u -- type string to output; string may contain control codes, see EMIT

: TYPE
  BEGIN DUP WHILE
    NEXT-CHAR EMIT
  REPEAT
  2DROP ;

CR

-- carriage return and line feed

: CR $A EMIT ;

SPACE

-- emit a space (BL)

: SPACE BL EMIT ;

SPACES

n -- emit n spaces (zero or negative n does nothing)

: SPACES
  DUP 0< IF
    DROP
    EXIT
  THEN
  0 ?DO SPACE LOOP ;

PAGE

-- clear screen

: PAGE $C EMIT ;

BASE

-- addr variable with numeric base for conversion

VARIABLE BASE

DECIMAL

-- set BASE to 10

: DECIMAL #10 BASE ! ;

HEX

-- set BASE to 16

: HEX #16 BASE ! ;

HP

-- addr hold pointer

0 VALUE HP

<#

-- begin pictured numeric output

: <# HERE h_size + TO HP ;

HOLD

char -- hold char for pictured numeric output

: HOLD HP 1- DUP TO HP C! ;

ud1 -- ud2 hold digit

: #
  0 BASE @ UM/MOD >R
  BASE @ UM/MOD
  SWAP DUP #9 > IF
    #7 +
  THEN
  '0 + HOLD
  R> ;

#S

ud -- 0 0 hold all remaining digits

: #S BEGIN # 2DUP D0= UNTIL ;

SIGN

n -- hold minus sign if n < 0

: SIGN 0< IF '- HOLD THEN ;

#>

ud -- c-addr u end pictured numeric output, leave string

: #> 2DROP HP HERE h_size + OVER - ;

D.R

d +n -- output signed double d right aligned in field of +n chars wide

: D.R -ROT TUCK DABS <# #S ROT SIGN #> ROT OVER - SPACES TYPE ;

D.

d -- output signed double d with a trailing space

: D. 0 D.R SPACE ;

U.R

u +n -- output unsigned u right aligned in field of +n chars wide

: U.R 0 SWAP D.R ;

U.

u -- output unsigned u with a trailing space

: U. 0 D. ;

.R

n +n -- output signed n right aligned in field of +n chars wide

: .R SWAP S>D ROT D.R ;

.

n -- output signed n with a trailing space

: . S>D D. ;

?

addr -- output signed cell stored at addr

: ? @ . ;

OUT

u1 u2 -- output byte u1 to port u2

INP

u1 -- u2 input from port u1

DRAW

c-addr u -- draw pixel patterns on screen at xy; writes string c-addr u of pixel patterns at xy; specify xy with AT-XY, xy not changed after DRAW

VIEW

c-addr u -- view screen pixels at xy; read string of screen pixel patterns at xy into buffer c-addr u specify xy with AT-XY, xy not changed after VIEW

REVERSE

+n -- reverse video of the +n characters displayed at xy; specify xy with AT-XY

INKEY

-- x check for key press and read key code of a key is pressed; 0x00 = no key pressed and 0x52 = multiple keys pressed

GETKEY

-- char wait and read key; leaves ASCII char or special key code: ON =$05, BS =$08, DEL =$09, CA =$0b, CLS =$0c, ENTER =$0d, DIGIT =$0e, F-E =$0f, INS =$12, ANS =$15, CONST =$17, RCM =$19, M+ =$1a, M- =$1b, right =$1c, left =$1d, up =$1e, down =$1f; a space is produced for the TAB key by the GETCHR system call, calc keys and BASIC keys produce BASIC tokens as key code $fe: SIN =$fe register B = $95 BASIC token for SIN (ignored)

KEY

-- char display cursor and wait to read key; same as GETKEY leaves ASCII char or special key code

EDIT

c-addr +n1 n2 n3 n4 -- c-addr +n5 edit buffer c-addr; buffer size +n1; string in buffer has length n2; place cursor at n3; non-editable left margin n4; leaves c-addr and length +n5

ACCEPT

c-addr +n1 -- +n2 accept user input into buffer c-addr +n1; leaves length +n2

: ACCEPT 0 0 0 EDIT NIP ;

>IN

-- addr variable with offset into input buffer (TIB)

VARIABLE >IN

SOURCE-ID

-- 0|-1 value with 0 = source input or -1 = string input

0 VALUE SOURCE-ID

SOURCE

-- c-addr u double value with input source

TIB 0 2VALUE SOURCE

REFILL

-- flag attempt to refill the input buffer; leaves false when end of input

SKIPS

char "" -- skips chars in input when present, 0x20 (BL) skips 0x00 to 0x20 (white space and control)

: SKIPS SOURCE >IN @ /STRING ROT TRIM DROP SOURCE DROP - >IN ! ;

PARSE

char "ccc" -- c-addr u parse "ccc" up to char when present

: PARSE SOURCE >IN @ /STRING ROT CHOP DUP 1+ >IN @ + SOURCE NIP UMIN >IN ! ;

PARSE-WORD

char "ccc" -- c-addr u parse char-delimited word; may throw -18 "parsed string overflow"

: PARSE-WORD
  DUP SKIPS PARSE
  DUP tmp_size-1 U> IF -18 THROW THEN ;

CHECK-NAME

c-addr u -- c-addr u check if name is valid; may throw -16 "attempt to use a zero-length string as a name"; may throw -19 "definition name too long"

: CHECK-NAME
  DUP 0= IF -16 THROW THEN
  DUP length_bits U> IF -19 THROW THEN ;

PARSE-NAME

"name" -- c-addr u parse space-delimited name; check if name length is valid

: PARSE-NAME BL PARSE-WORD CHECK-NAME ;

(

"ccc" -- start a comment block; parse and skip input up to the closing )

: (
  ') PARSE
  BEGIN
    + DROP
    SOURCE + = IF
      DROP REFILL
    ELSE
      C@ ') <> IF
        REFILL
      ELSE
        FALSE
      THEN
    THEN
  0= UNTIL ; IMMEDIATE

\

"ccc" -- start a comment line; parse and skip input up to the end of line; note that the PC-G850 symbol for \ is ÂĄ

: \ $A PARSE 2SROP ;

.(

"ccc" -- emit CR then type "ccc" up to the closing )

: .( ') PARSE CR TYPE ; IMMEDIATE

>DIGIT

char -- n convert char digit to numeric digit when within BASE; leaves -1 if char is invalid

>NUMBER

ud1 c-addr1 u1 -- ud2 c-addr2 u2 convert string to number; updates accumulated double ud1 to ud2; leaves string with the remaining unconvertable chars or empty

: >NUMBER
  BEGIN DUP WHILE
    NEXT-CHAR >DIGIT
    DUP 0< IF
      DROP -1 /STRING
      EXIT
    THEN
    >R
    2SWAP
    BASE @ UMD*
    R> M+
    2SWAP
  REPEAT ;

DBL

-- flag true if >DOUBLE or NUMBER produced a double

0 VALUE DBL

>DOUBLE

c-addr u -- d true | false convert string to signed double; leaves the double and true if string is converted; leaves false if string is unconvertable

L>NAME

lfa -- nt convert link field address to name token (nfa)

NAME>STRING

nt -- c-addr u convert name token (nfa) to string

NAME>

nt -- xt convert name token (nfa) to execution token (cfa)

>NAME

xt -- nt convert execution token (cfa) to name token (lfa); may throw -24 "invalid numeric argument"

>BODY

xt -- pfa convert execution token to parameter field address

FIND-WORD

c-addr u -- c-addr 0 | xt 1 | xt -1 search dictionary for matching word; leaves execution token and 1 = immediate or -1 = not immediate; leaves c-addr and 0 when not found

'

"name" -- xt parse name and get execution token; may throw -13 "undefined word"

: ' PARSE-NAME FIND-WORD 0= IF -13 THROW THEN ;

WORDS

-- display context vocabulary words

HERE

-- addr address of free memory after the dictionary; new definitions are added here; note that numeric output words use HERE for conversion

LASTXT

-- xt leaves the last execution token defined

0 VALUE LASTXT

STATE

-- addr compilation state; STATE @ leaves TRUE when compiling; STATE @ leaves FALSE when interpreting

VARIABLE STATE

[

-- switch state to interpreting

: [ STATE OFF ;

]

-- switch state to compiling

: ] STATE ON ;

HIDE

-- hide the last definition

: HIDE CURRENT @ L>NAME DUP C@ smudge_bits OR SWAP C! ;

REVEAL

-- reveal the last definition

: REVEAL CURRENT @ L>NAME DUP C@ ~smudge_bits AND SWAP C! ;

IMMEDIATE

-- make the last definition immediate

: IMMEDIATE CURRENT @ L>NAME DUP C@ immediate_bits OR SWAP C! ;

?COMP

-- check if compiling; may throw -14 "interpreting a compile-only word"

?SYS

-- ; C: x -- check if compiled control structure matches x; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"

UNUSED

-- u unused dictionary space

: UNUSED top @ HERE - ;

ALLOT

n -- allocate n bytes starting from HERE in the dictionary; undo the last ALLOT with negative n to reclaim memory (only do this when no new words are defined); may throw -8 "dictionary overflow"

COMPILE,

xt -- append execution token to dictionary; may throw -8 "dictionary overflow"

: COMPILE, , ;

,

x -- append cell to dictionary; may throw -8 "dictionary overflow"

C,

char -- append char to dictionary; may throw -8 "dictionary overflow"

2,

x1 x2 -- append double cell to dictionary; may throw -8 "dictionary overflow"

: 2, , , ;

NFA,

"name" -- parse name and append dictionary entry with name; set LASTXT to HERE; may throw -8 "dictionary overflow"

: NFA, PARSE-NAME HERE CURRENT @ , CURRENT ! DUP C, HERE SWAP DUP ALLOT CMOVE HERE TO LASTXT ;

CFA,

addr -- append cfa call addr to dictionary; may throw -8 "dictionary overflow"

CFA:,

-- addr colon_sys append cfa colon definition to dictionary; make CONTEXT the CURRENT vocabulary; start compiling; may throw -8 "dictionary overflow"

: CFA:, ] HERE colon_sys ['] (:) CFA, CURRENT TO CONTEXT ;

POSTPONE

"name" -- postpone compile action of name; if name is immediate, then compile name instead of executing it; otherwise compile name into the current colon definition; can be used to create macros, e.g. : TRUE POSTPONE -1 ; IMMEDIATE; may throw -13 "undefined word"; may throw -14 "interpreting a compile-only word"

:

-- ; C: "name" -- addr colon_sys define name and start compiling

: : NFA, HIDE CFA:, ;

;

-- ; C: addr colon_sys -- end colon definition and stop compiling; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"

: ; ?COMP colon_sys <> IF -22 THROW THEN DROP POSTPONE (;) REVEAL [ ; IMMEDIATE

EXIT

-- exit colon definition

: EXIT ?COMP POSTPONE (EXIT) ; IMMEDIATE

CREATE

"name" -- ; -- addr create name; executing name leaves address (HERE addr after CREATE)

: NFA, ['] (VAR) CFA, ;

DOES>

-- ; ... -- ... change CREATE name behavior to execute code after DOES>

: DOES> ?COMP POSTPONE (;CODE) ['] (DOES) CFA, ; IMMEDIATE

VARIABLE

"name" -- ; -- addr define a variable; executing name leaves address of value (initialized to zero)

: VARIABLE CREATE 0 , ;

2VARIABLE

"name" -- ; -- addr define a double variable; executing name leaves address of double value (initialized to zero)

: 2VARIABLE CREATE 0 0 2, ;

CONSTANT

x "name" -- ; -- x define a constant; executing name leaves x

: CONSTANT NFA, ['] (CON) CFA, , ;
: CONSTANT CREATE , DOES> @ ;

2CONSTANT

x1 x2 "name" -- ; -- x1 x2 define a double constant; executing name leaves x1 x2

: 2CONSTANT NFA, ['] (2CON) CFA, 2, ;
: 2CONSTANT CREATE 2, DOES> 2@ ;

VALUE

x "name" -- ; -- x define a value; executing name leaves x

: VALUE NFA, ['] (VAL) CFA, , ;

2VALUE

dx "name" -- ; -- dx define a double value; executing name leaves dx

: 2VALUE NFA, ['] (2VAL) CFA, 2, ;

TO

"name" -- ; x -- assign value name; may throw -32 "invalid name argument"

: TO
  '
  DUP VALUE? IF
    >BODY
    STATE @ IF
      POSTPONE (TO)
      ,
      EXIT
    THEN
    !
    EXIT
  THEN
  DUP 2VALUE? IF
    >BODY
    STATE @ IF
      POSTPONE (2TO)
      ,
      EXIT
    THEN
    2!
    EXIT
  THEN
  #-32 THROW ; IMMEDIATE

+TO

"name" -- ; n -- increment value name; may throw -32 "invalid name argument"

: +TO
  '
  DUP VALUE? IF
    >BODY
    STATE @ IF
      POSTPONE (+TO)
      ,
      EXIT
      THEN
    +!
    EXIT
  THEN
  #-32 THROW ; IMMEDIATE

DEFER

"name" -- ; ... -- ... define a deferred name

: DEFER NFA, ['] (DEF) CFA, ['] UNDEF , ;

UNDEF

-- throw -256 "execution of an uninitialized deferred word"

: UNDEF -256 THROW ;

DEFER!

xt1 xt2 -- store xt1 in deferred xt2

: DEFER! >BODY ! ;

DEFER@

xt1 -- xt2 fetch execution token from deferred xt1

: DEFER@ >BODY @ ;

IS

xt "name" -- assign execution token to deferred name; may throw -32 "invalid name argument"

: IS
  '
  DUP DEFER? IF
    STATE @ IF
      LITERAL
      POSTPONE DEFER!
      EXIT
    THEN
    DEFER!
    EXIT
  THEN
  #-32 THROW ; IMMEDIATE

ACTION-OF

"name" -- xt fetch execution token of deferred name; may throw -32 "invalid name argument"

: ACTION-OF
  '
  DUP DEFER? IF
    STATE @ IF
      LITERAL
      POSTPONE DEFER@
      EXIT
    THEN
    DEFER@
    EXIT
  THEN
  #-32 THROW ; IMMEDIATE

LITERAL

x -- ; -- x compile a literal

: LITERAL ?COMP POSTPONE (LIT) , ; IMMEDIATE

2LITERAL

x1 x2 -- ; -- x1 x2 compile a double literal

: 2LITERAL ?COMP POSTPONE (2LIT) 2, ; IMMEDIATE

SLITERAL

c-addr u -- ; -- c-addr u compile a string literal; max literal string length is 255

: SLITERAL
  ?COMP
  DUP 255 U> IF -18 THROW THEN
  POSTPONE (SLIT)
  DUP C,
  HERE OVER ALLOT SWAP CMOVE ; IMMEDIATE

."

"ccc" -- ; -- type "ccc" (compiled)

: ." '" PARSE SLITERAL POSTPONE TYPE ; IMMEDIATE

S"

"ccc" -- ; -- c-addr u leave string "ccc" (compiled and interpreted)

: S"
  '" PARSE
  STATE @ IF
    SLITERAL
    EXIT
  THEN
  TMP SWAP
  2DUP 2>R
  CMOVE
  2R> ; IMMEDIATE

VALUE?

xt -- flag true if xt is a VALUE

: VALUE? DUP C@ $CD = SWAP 1+ @ ['] (VAL) = AND ;

2VALUE?

xt -- flag true if xt is a 2VALUE

: 2VALUE? DUP C@ $CD = SWAP 1+ @ ['] (2VAL) = AND ;

DEFER?

xt -- flag true if xt is a DEFER word

: DEFER? DUP C@ $CD = SWAP 1+ @ ['] (DEF) = AND ;

FENCE

-- addr only permit FORGET past the dictionary FENCE address

0 VALUE FENCE

FORGET

"name" -- delete name and all following definitions; beware of vocabulary definitions crossings; may throw -15 "invalid FORGET"

[']

"name" -- ; -- xt compile xt of name as literal; may throw -14 "interpreting a compile-only word"

: ['] ?COMP ' LITERAL ; IMMEDIATE

RECURSE

... -- ... recursively call the currently defined word; may throw -14 "interpreting a compile-only word"

: RECURSE ?COMP LASTXT COMPILE, ; IMMEDIATE

?STACK

-- check parameter stack bounds; may throw -3 "stack overflow"; may throw -4 "stack underflow"

(UNTIL)

x -- branch if x = 0; runtime of the UNTIL compile-only word

(IF)

x -- branch if x = 0; runtime of the IF and WHILE compile-only words

(AGAIN)

-- branch; runtime of the AGAIN and REPEAT compile-only words

(AHEAD)

-- branch; runtime of the AHEAD, ELSE and ENDOF compile-only words

(OF)

x1 x2 -- x1 or x1 x2 -- branch if x1 <> x2; runtime of the OF compile-only word

(LOOP)

-- repeat loop unless loop counter crosses the limit; runtime of the LOOP compile-only word

(+LOOP)

-- increment counter and repeat loop unless counter crosses the limit; runtime of the +LOOP compile-only word

(?DO)

n1|u1 n2|u2 -- begin loop with limit n1|u1 and initial value n2|u2; skip loop when zero trip loop; runtime of the ?DO compile-only word

(DO)

n1|u1 n2|u2 -- begin loop with limit n1|u1 and initial value n2|u2; loop at least once; runtime of the DO compile-only word

(UNLOOP)

R: ... -- remove loop parameters; runtime of the UNLOOP compile-only word

(LEAVE)

-- discard the loop parameters and exit the innermost do-loop; runtime of the LEAVE compile-only word

AHEAD

-- ; C: -- addr orig branch ahead to THEN; may throw -14 "interpreting a compile-only word"

BEGIN

-- ; C: -- addr dest begin WHILE REPEAT; may throw -14 "interpreting a compile-only word"

AGAIN

-- ; C: addr dest -- branch back to BEGIN; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"

UNTIL

x -- ; C: addr dest -- branch back to BEGIN if x = 0; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"

IF

x -- ; C: -- addr orig branch to closest ELSE or THEN if x = 0; may throw -14 "interpreting a compile-only word"

THEN

-- ; C: addr orig -- close AHEAD, IF, ELSE; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"

ELSE

-- ; C: addr orig -- addr orig close IF and branch to THEN; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"

WHILE

x -- ; C: addr sys -- addr orig addr sys branch to exit REPEAT if x = 0; may throw -14 "interpreting a compile-only word"

REPEAT

-- ; C: addr orig addr dest -- branch back to BEGIN after WHILE; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"

DO

n1|u1 n2|u2 -- ; C: -- addr do_sys begin loop from initial value n2|u2 to the limit n1|u1; loop at least once; may throw -14 "interpreting a compile-only word"

?DO

n1|u1 n2|u2 -- ; C: -- addr do_sys begin loop from initial value n2|u2 to the limit n1|u1; skip loop when zero trip loop; may throw -14 "interpreting a compile-only word"

LOOP

-- ; C: addr do_sys -- repeat loop unless loop counter crosses the limit; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"

+LOOP

n|u -- ; C: addr do_sys -- increment counter and repeat loop unless counter crosses the limit; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"

UNLOOP

-- remove loop parameters; may throw -14 "interpreting a compile-only word"

LEAVE

-- exit the innermost do-loop; may throw -14 "interpreting a compile-only word"

I

-- n counter of innermost do loop

J

-- n counter of outer (second) do loop

CASE

x -- ; C: -- 0 begin CASE ENDCASE switch; may throw -14 "interpreting a compile-only word"

OF

x1 x2 -- x1 or x1 x2 -- ; C: n1 -- orig n2 take CASE arm if x1 = x2; otherwise branch to next OF; may throw -14 "interpreting a compile-only word"

ENDOF

-- ; C: n -- orig n branch to ENDCASE; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"

ENDCASE

x -- ; C: n*orig n -- close CASE; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"

HANDLER

-- addr variable with saved return stack pointer

VARIABLE HANDLER

EXECUTE

... xt -- ... execute execution token xt

CATCH

... xt -- ... 0 or xt -- n execute xt leaving nonzero exception code n or 0 when no exception occurred; when an exception was caught, the parameter and return stacks are restored to their state before execution of xt

: CATCH
  SP@ >R
  HANDLER @ >R
  RP@ HANDLER !
  EXECUTE
  R> HANDLER !
  RDROP
  0 ;

THROW

0 -- or ... n -- ... n throw exception n if nonzero

: THROW
  ?DUP IF
    HANDLER @ ?DUP IF
      RP!
      R> HANDLER !
      R> SWAP >R
      SP!
      DROP
      R>
      EXIT
    THEN
    >R CLEAR R>
    ERROR
    REPL
  THEN ;

QUIT

... -- ; R: ... -- throw -56 "QUIT"; no exception error is displayed; unlike ABORT, the parameter stack is not cleared

: QUIT -56 THROW ;

(ABORT")

... flag c-addr u -- ; R: ... -- if flag then abort with string message unless an active catch is present; runtime of the ABORT" compile-only word; throw -2 "ABORT""

: (ABORT")
  ROT IF
    HANDLER @ IF
      2DROP
    ELSE
      TYPE
    THEN
    -2 THROW
  THEN
  2DROP ;

ABORT"

... flag -- ; C: "ccc" -- ; R: ... -- if flag then abort with string message unless an active catch is present; throw -2 "ABORT""; clears the parameter stack unless caught with CATCH; may throw -14 "interpreting a compile-only word"

: ABORT" ?COMP POSTPONE S" POSTPONE (ABORT") ; IMMEDIATE

ABORT

... -- ; R: ... -- throw -1 "ABORT"; clears the parameter stack unless caught with CATCH

: ABORT -1 THROW ;

ERROR

n -- display exception n at the offending location in the input; n = -1 ABORT and n = -2 ABORT" clear the stack; n = -56 QUIT stays silent; List of Forth850 errors:

code error
-1 ABORT
-2 ABORT"
-3 stack overflow
-4 stack underflow
-8 dictionary overflow
-10 division by zero
-11 result out of range
-13 undefined word
-14 interpreting a compile-only word
-15 invalid FORGET
-16 attempt to use zero-length string as a name
-18 parsed string overflow
-19 definition name too long
-22 control structure mismatch
-24 invalid numeric argument
-28 user interrupt (BREAK was pressed)
-32 invalid name argument (invalid TO name)
-42 floating-point divide by zero
-43 floating-point result out of range
-46 floating-point invalid argument
-56 QUIT
-256 execution of an uninitialized deferred word

NUMBER

c-addr u -- n|u|d|ud convert string to number; value DBL is set to -1 when the number is a double; may throw -13 "undefined word" when string is not numeric

INTERPRET

-- interpret input while input is available

EVALUATE

... c-addr u -- ... evaluate string

REPL

-- read-evaluate-print loop

: REPL
  rp0 @ RP!
  HANDLER OFF
  0 TO SOURCE-ID
  CR
  [
  BEGIN
    BEGIN ['] REFILL CATCH ?DUP WHILE
      ERROR CR
    REPEAT
  WHILE
    SPACE
    ['] INTERPRET CATCH ?DUP IF
      ERROR
      REPL
    THEN
    STATE @ INVERT IF
      ." OK["
      DEPTH 0 U.R
      '] EMIT
    THEN
    CR
  REPEAT
  BYE ;

BYE

-- return to BASIC

CONTEXT

-- addr leaves address of link of the last vocabulary context definition

' FORTH VALUE CONTEXT

CURRENT

-- addr leaves address of link of the last current vocabulary definition

' FORTH VALUE CURRENT

DEFINITIONS

-- make CURRENT the CONTEXT vocabulary

: DEFINITIONS CONTEXT TO CURRENT ;

VOCABULARY

"name" -- define a new vocabulary

: VOCABULARY CREATE , fig_kludge , DOES> TO CONTEXT ;

FORTH

-- make FORTH the CONTEXT vocabulary

VOCABULARY FORTH

Additional words included with the full version

FALSE

-- 0 leave 0

0 CONSTANT FALSE

TRUE

-- -1 leave -1

-1 CONSTANT TRUE

2ROT

xd1 xd2 xd3 -- xd2 xd3 xd1 rotate double cells

: 2ROT 5 ROLL 5 ROLL ;

ROLL

xu x(u+1) ... x1 x0 u -- x(u+1) ... x1 x0 xu roll u cells on the parameter stack; 0 ROLL does nothing; 1 ROLL is the same as SWAP; 2 ROLL is the same as ROT

D*

d1|ud1 d2|ud2 -- d3|ud3 signed and unsigned double product d1*d2

: D* >R ROT DUP>R -ROT MD* 2R> * 0 SWAP D+ ;

UD/MOD

ud1 ud2 -- ud3 ud4 unsigned double remainder and quotient ud1/ud2; the result is undefined when ud2 = 0

D/MOD

d1 d2 -- d3 d4 double symmetric remainder and quotient d1/d2; the result is undefined when d2 = 0

: D/MOD
  DUP 3 PICK DUP>R XOR >R
  DABS 2SWAP DABS 2SWAP
  UD/MOD
  R> 0< IF DNEGATE THEN
  R> 0< IF 2SWAP DNEGATE 2SWAP THEN ;

DMOD

d1 d2 -- d3 double symmetric remainder of d1/d2; the result is undefined when d2 = 0

: DMOD D/MOD 2DROP ;

D/

d1 d2 -- d3 double quotient d1/d2; the result is undefined when d2 = 0

: D/ D/MOD 2SWAP 2DROP ;

D=

d1 d2 -- flag true if d1 = d2

: D= D- D0= ;

D<

d1 d2 -- flag true if d1 < d2

: D<
  DUP 3 PICK XOR 0< IF
    2DROP D0<
    EXIT
  THEN
  D- D0< ;

DU<

du1 du2 -- flag true if ud1 < ud2

: DU<
  DUP 3 PICK XOR 0< IF
    2SWAP 2DROP D0<
    EXIT
  THEN
  D- D0< ;

DMAX

d1 d2 -- d3 signed double max of d1 and d2

: DMAX
  2OVER 2OVER D< IF 2SWAP THEN
  2DROP ;

DMIN

d1 d2 -- d3 signed double min of d1 and d2

: DMIN
  2OVER 2OVER D< INVERT IF 2SWAP THEN
  2DROP ;

CELL+

addr -- addr increment to next cell

: CELL+ 2+ ;

CELLS

n1 -- n2 convert to cell unit

: CELLS 2* ;

CHAR+

n1 -- n1 increment to next char

: CHAR+ 1+ ;

CHARS

n1 -- n2 convert to char unit

: CHARS ;

DUMP

c-addr u -- dump memory in hex

: DUMP
  BASE @ >R
  HEX
  BEGIN DUP WHILE
    NEXT-CHAR .
  REPEAT
  2DROP
  R> BASE ! ;

HOLDS

c-addr u -- hold string for pictured numeric output

: HOLDS
  BEGIN DUP WHILE
    1- 2DUP + C@ HOLD
  REPEAT
  2DROP ;

BEEP

-- sound the speaker for a short ~2KHz beep

KEY-CLEAR

-- wait until no keys are pressed

: KEY-CLEAR BEGIN INKEY 0= UNTIL ;

KEY?

-- flag true if a key is pressed

: KEY? INKEY 0= 0= ;

WORD

char "ccc" -- c-addr parse word as a counted string

: WORD TMP DUP ROT PARSE-WORD ROT 2DUP C! 1+ SWAP CMOVE ;

CHAR

"name" -- char parse char; note that the syntax 'char is preferred instead of this legacy word

: CHAR PARSE-NAME DROP C@ ;

FIND

c-addr -- c-addr 0 | xt 1 | xt -1 search dictionary for counted string; see WORD, COUNT and FIND-WORD

BUFFER:

n "name" -- ; -- addr define buffer with n bytes; executing name leaves address of n bytes

: BUFFER: CREATE ALLOT ;

:NONAME

-- xt colon definition without name; leaves execution token of definition to be used or saved

C"

"ccc" -- ; -- c-addr leave counted string "ccc" (compiled); may throw -18 "parsed string overflow"

: C" ?COMP POSTPONE S" POSTPONE DROP POSTPONE 1- ;

MARKER?

xt -- flag true if xt is a MARKER word

MARKER

"name" -- ; -- define a dictionary marker; executing name deletes marker and all definitions made after; beware of vocabulary definitions crossings

: MARKER
  CURRENT
  DUP @
  HERE
  CREATE
    , 2,
  DOES>
    DUP CELL+ 2@
    SWAP TO CONTEXT
    DUP CONTEXT !
    DEFINITIONS
    L>NAME NAME> TO LASTXT
    @ HERE - ALLOT ;

ANEW

"name" -- ; -- define a dictionary marker; deletes previously defined name and all following definitions; beware of vocabulary definitions crossings

: ANEW
  >IN @ >R
  PARSE-NAME FIND-WORD
  OVER MARKER?
  AND IF
    EXECUTE
  ELSE
    DROP
  R> >IN !
  MARKER ;

[CHAR]

"char" -- ; -- char compile char as literal; note that the syntax 'char is preferred instead of this legacy word; may throw -14 "interpreting a compile-only word"

: [CHAR] ?COMP CHAR LITERAL ; IMMEDIATE

[COMPILE]

"name" -- ; ... -- ... compile name; note that POSTPONE is preferred instead of this legacy word; may throw -14 "interpreting a compile-only word"

: [COMPILE] ?COMP ' COMPILE, ; IMMEDIATE

K

-- n counter of outer (third) do loop

TEXT

-- read and evaluate TEXT editor area with Forth source code; caveat: .( and ( in TEXT cannot span more than one line, they end at EOL

: TEXT
  $7973 @ 1+ >R
  BEGIN
    R>                  \ -- addr
  DUP C@ $FF <> WHILE
    2+ DUP C@ SWAP 1+   \ -- len addr
    2DUP + >R
    SWAP 1- EVALUATE
  REPEAT
  DROP ;

Floating point math words included with the full version

Floating point values are doubles on the stack. Double words, such as 2DUP, can be used to manipulate floats. Floats can be stored in 2CONSTANT, 2VARIABLE and 2VALUE assigned with TO (but not with +TO.)

Beware that HEX prevents inputting floats and garbles the output of floats.

F+

r1 r2 -- r3 sum r1+r2; may throw -43 "floating-point result out of range"

F-

r1 r2 -- r3 difference r1-r2; may throw -43 "floating-point result out of range"

F*

r1 r2 -- r3 product r1*r2; may throw -43 "floating-point result out of range"

F/

r1 r2 -- r3 quotient r1/r2 may throw -42 "floating-point divide by zero"; may throw -43 "floating-point result out of range"

FTRUNC

r1 -- r2 truncate float towards zero

FLOOR

r1 -- r2 floor float towards negative infinity may throw -43 "floating-point result out of range"

FROUND

r1 -- r2 round float to nearest; may throw -43 "floating-point result out of range"

FNEGATE

r1 -- r2 negate float

FABS

r1 -- r2 absolute value |r1|

: FABS 2DUP F0< IF FNEGATE THEN ;

F=

r1 r2 -- flag true if r1 = r2

: F= D= ; ( works for IEEE 754 floating point without negative zero and inf/nan )

F<

r1 r2 -- flag true if r1 < r2

: F<
  DUP 3 PICK AND 0< IF
    2SWAP
  D< ; ( works for IEEE 754 floating point without negative zero and inf/nan )

F0=

r -- flag true if r = 0.0e0

: F0= D0= ; ( works for IEEE 754 floating point without negative zero and inf/nan )

F0<

r -- flag true if r < 0.0e0

: F0< D0< ; ( works for IEEE 754 floating point without negative zero and inf/nan )

FMAX

r1 r2 -- r3

max of r1 and r2

: FMAX
  2OVER 2OVER F< IF 2SWAP THEN
  2DROP ;

FMIN

r1 r2 -- r3

min of r1 and r2

: FMIN
  2OVER 2OVER F< INVERT IF 2SWAP THEN
  2DROP ;

D>F

d -- r widen signed double to float

S>F

n -- r widen signed single to float

F>D

r -- d narrow float to a signed double; may throw -11 "result out of range"

F>S

r -- n narrow float to a signed single; may throw -11 "result out of range"

>FLOAT

c-addr u -- r true | false convert string to float; leaves the float and true if string is converted; leaves false if string is unconvertable

REPRESENT

r c-addr u -- n flag true convert float to string; store decimal digits of the float in the non-empty buffer c-addr u; leaves decimal exponent n+1 and flag = true if negative

PRECISION

-- +n floating point output precision, the number of decimal digits displayed is 7 by default

7 VALUE PRECISION

F.

r -- output float with a trailing space; output fixed notation when 1e-1 <= |r| < 1e+7, otherwise output scientific notation

: F.
  HERE PRECISION REPRESENT DROP IF
    '- EMIT
  THEN
  DUP 0 PRECISION 1+ WITHIN IF
    HERE OVER TYPE
    '. EMIT
    HERE OVER +
    PRECISION ROT - '0 -TRIM TYPE SPACE
    EXIT
  THEN
  HERE C@ EMIT
  '. HERE C!
  HERE PRECISION '0 -TRIM TYPE
  'E EMIT
  1- . ;

Dictionary structure

The Forth850 dictionary is organized as follows:

     low address
      _________
+--->| $0000   |     last link is zero (2 bytes)
^    |---------|
|    | 3       |     length of "(:)" (1 byte)
|    |---------|
|    | (:)     |     "(:)" word characters (3 bytes)
|    |---------|
|    | code    |     machine code
|    |=========|
+<==>+ link    |     link to previous entry (2 bytes)
^    |---------|
:    :         :
:    :         :
:    :         :
|    |=========|
+<==>| link    |     link to previous entry (2 bytes)
^    |---------|
|    | $80+5   |     length of "aword" (1 byte) with IMMEDIATE bit set
|    |---------|
|    | aword   |     "aword" word characters (5 bytes)
|    |---------|
|    | code    |     Forth code and/or data
|    |=========|
+<---| link    |<--- last link to previous entry (2 bytes)
     |---------|
     | 7       |     length of "my-word" (1 byte)
     |---------|
     | my-word |     "my-word" word characters (7 bytes)
     |---------|
     | code    |<--- LASTXT points to code (last xt)
     |=========|<--- HERE pointer
     | hold    |     hold area for numerical output (40 bytes)
     |---------|
     |         |
     | free    |     unused dictionary space
     | space   |
     |         |
     |=========|<--- dictionary limit
     |         |
     | data    |     stack of 256 bytes (128 cells)
     | stack   |     grows toward lower addresses
     |         |<--- SP stack pointer
     |=========|
     |         |
     | return  |     return stack of 256 bytes (128 cells/calls)
     | stack   |     grows toward lower addresses
     |         |<--- RP return stack pointer
     |_________|<--- USER address
                <--- USER+1 address

     high address set with USER in Monitor MON

A link field points to the previous link field. The last link field at the lowest address of the dictionary is zero.

LASTXT returns the execution token of the last definition, which is the location where the machine code of the last word starts.

Forth850 is a Direct Threaded Code Forth implementation. Code is either machine code or starts with a jump or call machine code instruction of 3 bytes, followed by Forth code (a sequence of execution tokens in a colon definition) or data (constants, variables, values and other words created with CREATE.)

Immediate words are marked with the length byte high bit 7 set ($80). Hidden words have the "smudge" bit 6 ($40) set. A word is hidden until successfully compiled. HIDE hides the last defined word by setting the smudge bit. REVEAL reveals it. Incomplete colon definitions with compilation errors should never be revealed.

Implementation

The following sections explain parts of the technical implementation of Forth850. I will explain the new Forth system routines, the new Z80 math routines and the string routines.

Forth850 is built with the asz80 assembler and aslink linker.

Z80 Forth system routines

Forth850 uses direct threaded code (DTC). Faster would be to use subroutine threaded code (STC), but this would significantly increase the overall code size and Forth compilation complexity, which are less desirable for a small Z80-based system.

The following Z80 Forth routines are inspired by the article Moving Forth. However, I've decided to use a different Z80 register mapping that is more efficient:

  • BC: instruction pointer (IP)
  • DE: top of stack (TOS)
  • IY: address of the "next routine", for jp (iy)

By contrast to the article, having the TOS in DE makes it quicker to perform address arithmetic with the TOS, because we can exchange DE with HL with ex de,hl in just 4 CPU cycles. Moving BC to HL takes 8 CPU cycles.

I've placed the return stack pointer (RP) in RAM. There is no advantage to use register IX for RP as the article suggests. In fact, the colon call and return have the same cycle counts, but almost all of the return stack operations, such as >R, require more cycles with the RP in IX compared to the RP in RAM.

A jump to the "next routine" is with jp (iy) takes 8 CPU cycles, compared to a jp next that takes 10 cycles. Inlining the "next routine" eliminates this overhead, but increases the code size. Inlining should only be applied to performance-critical words that are frequently used. See macros NEXT and JP_NEXT defined in the section below.

Next fetch and execute

Fetching an execution token (xt) from the instruction pointer (IP) address, incrementing IP and executing the token takes 38 CPU cycles in the "next routine":

.macro          NEXT
                ld a,(bc)       ;  7    ;
                ld l,a          ;  4    ;
                inc bc          ;  6    ;
                ld a,(bc)       ;  7    ;
                ld h,a          ;  4    ;
                inc bc          ;  6    ; [ip++] -> hl with xt
                jp (hl)         ;  4(38); jump to hl
.endif

The "next routine" cycles contribute to the overhead of DTC, which cannot be removed to speed up DTC execution. To improve speed by 10% on average, the fetch and execute routine is inlined with the NEXT macro for performance-critical words. When performance is not critical, a JP_NEXT macro is used, which simply expands into jp (iy) with the IY register pointing to the "next routine":

.macro          JP_NEXT
                jp (iy)         ;  8(46); jump to next routine
.endm

Colon call and return

Each colon definition in memory starts with a call docol. The docol routine associated with the (:) word saves the instruction pointer in BC on the return stack and pops the new instruction pointer from the parameter stack (since call docol leaves the address after the call on the stack.) The routine checks for ON/BREAK key and begins executing the colon definition with the "next routine":

docol:          ld hl,(rp)      ; 16    ; [rp] -> hl
                dec hl          ;  6    ;
                ld (hl),b       ;  7    ;
                dec hl          ;  6    ;
                ld (hl),c       ;  7    ; save bc -> [--rp] with caller ip on the return stack
                ld (rp),hl      ; 16    ; ip - 2 -> rp
                pop bc          ; 10(68); pop ip saved by call docol
;               continue with ON/BREAK key check
cont:           in a,(0x1f)     ; 11    ; port 0x1f bit 7 is set if ON/BREAK is depressed
                add a           ;  4    ; test ON/BREAK key
                jr c,break      ;  7(22); if ON/BREAK pressed then break
;               next
next:           ld a,(bc)       ;  7    ;
                ld l,a          ;  4    ;
                inc bc          ;  6    ;
                ld a,(bc)       ;  7    ;
                ld h,a          ;  4    ;
                inc bc          ;  6    ; [bc++] -> hl with xt
                jp (hl)         ;  4(38); jump to hl

A return from a colon definition with the (;) word pops the return instruction pointer off the return stack to continue executing the caller's next instruction.

doret:          ld hl,(rp)      ; 16    ; [rp] -> hl
                ld c,(hl)       ;  7    ;
                inc hl          ;  6    ;
                ld b,(hl)       ;  7    ;
                inc hl          ;  6    ;
                ld (rp),hl      ; 16(58); restore [rp++] -> bc with ip of the caller
                NEXT            ; 38    ; continue

A colon call takes 145 cycles (17 + 68 + 22 + 38 cycles) and a colon return takes 96 cycles (58 + 38 cycles.) This includes the 38 cycle overhead of the "next routine" to fetch and execute the next token.

Variables

A Forth variable leaves its address on the parameter stack. A call dovar is used to push the address on the stack, which is then retrieved to set the new TOS:

dovar:          pop hl          ; 10    ; pop hl with pfa addr saved by call dovar
                push de         ; 11    ; save TOS
                ex de,hl        ;  4(25); set new TOS to hl with pfa addr
                NEXT            ; 38    ; continue

Executing a word defined as a variable takes 80 cycles (17 + 25 + 38 cycles), which includes the "next routine" overhead.

Constants and values

A Forth constant or value leaves its value on the parameter stack. A call doval is used to push the address of the constant/value on the stack. The constant/value is then fetched:

doval:          pop hl          ; 10    ; pop hl with pfa addr saved by call doval
                push de         ; 11    ; save TOS
                ld e,(hl)       ;  7    ;
                inc hl          ;  6    ;
                ld d,(hl)       ;  7(41); set [hl] -> de as new TOS
                NEXT            ; 38    ; continue

Executing a word defined as a constant or value takes 96 cycles (17 + 41 + 38 cycles), which includes the "next routine" overhead.

Fetch and store

The @ fetch and ! store words make good use of ex de,hl:

fetch:          ex de,hl        ;  4    ; addr -> hl
                ld e,(hl)       ;  7    ;
                inc hl          ;  6    ;
                ld d,(hl)       ;  7(24); set [hl] -> de as new TOS
                NEXT            ; 38    ; continue

store:          pop hl          ; 10    ; pop addr -> hl
                ex de,hl        ;  4    ; x -> de, addr -> hl
                ld (hl),e       ;  7    ;
                inc hl          ;  6    ;
                ld (hl),d       ;  7    ; de -> [hl] with x
                pop de          ; 10(44); pop new TOS
                NEXT            ; 38    ; continue

CREATE with DOES>

A Forth definer word that uses CREATE with DOES> to define new words is compiled to execute the (;CODE) token with label doscode, followed by a call dodoes to start interpreting the DOES> code:

doscode:        ld hl,(lastxt+3)        ; LASTXT -> hl with last defined word xt
                inc hl                  ;
                ld (hl),c               ;
                inc hl                  ;
                ld (hl),b               ; ip -> [LASTXT+1] overwrite call address
                jr doret                ; (;) return to caller

dodoes:         ld hl,(rp)      ; 16    ; [rp] -> hl
                dec hl          ;  6    ;
                ld (hl),b       ;  7    ;
                dec hl          ;  6    ;
                ld (hl),c       ;  7    ;
                ld (rp),hl      ; 16    ; save bc -> [--rp] with old ip on the return stack
                pop bc          ; 10    ; pop bc with new ip of the DOES> routine saved by call dodoes
                pop hl          ; 10    ; pop pfa addr
                push de         ; 11    ; save TOS
                ex de,hl        ;  4(93); set new TOS to hl with pfa addr
                NEXT            ; 38    ; continue

A word defined by a CREATE/DOES> definer makes a call to the call dodoes routine. For example, suppose we define CONSTANT as follows:

: CONSTANT CREATE , DOES> @ ;
123 CONSTANT X

then CONSTANT and X are compiled as:

CONSTANT:       call docol
                .dw create
                .dw comma
                .dw doscode
CONSTANT_does:  call dodoes
                .dw fetch
                .dw doret

X:              call CONSTANT_does
                .dw 123

Executing X takes 192 cycles (17 + 17 + 24 + 38 + 96 cycles.) When more optimally defined as a CONSTANT in code, this takes 96 cycles.

Parsing

Forth words are parsed with my new CHOP and TRIM words that efficiently parse and extract white-space-delimited words from the input.

Entry:

  • DE with TOS: a char to truncate the string with
  • 2OS: string length u1
  • 3OS: string address c-addr

Exit:

  • DE with TOS: truncated string length u2
  • 2OS: string address c-addr

Performance: 21 cycles per character for non-BL char to chop, 40 cycles per character for BL to chop white space

chop:           ld a,e                  ; char -> a
                exx                     ; save bc with ip
                ex af,af'               ; save a with char
                pop bc                  ; pop u1 -> bc
                ld e,c                  ;
                ld d,b                  ; u1 -> de
                ld a,c                  ;
                or b                    ; test bc = 0, 0 -> cf
                jr z,2$                 ; if bc = 0 then not found
                pop hl                  ;
                push hl                 ; c-addr -> hl
                ex af,af'               ; restore a with char
                cp 0x20                 ;
                jr z,3$                 ; if a = 0x20 then find white space
                or a                    ; 0 -> cf not found
;               find char in string
                cpir            ; 21/16 ; repeat until a = [hl++] or --bc = 0
                jr nz,2$                ; if match then 
1$:             ccf                     ;   complement to correct cpi bc--
2$:             ex de,hl                ; u1 -> hl
                sbc hl,bc               ; u1 - bc - cf -> hl
                push hl                 ; save hl as TOS
                exx                     ; restore bc with ip
                pop de                  ; pop new TOS
                JP_NEXT                 ; continue
;               find white space in string
3$:             cp (hl)         ;  7    ; loop to compare a to [hl]
                cpi             ; 16    ;   hl++, bc--
                jr nc,1$        ;  7    ;   if [hl]<a then found
                jp pe,3$        ; 10    ; until bc = 0
                jr 1$                   ; not found

Entry:

  • DE with TOS: char to trim the string by removing them from its beginning
  • 2OS: string length u1
  • 3OS: string address c-addr1

Exit:

  • DE with TOS: updated string length u2
  • 2OS: updated string address c-addr2

Performance: 33 cycles to trim non-BL char, 106 cycles to trim white space with BL char

trim:           ld a,e                  ; char -> a
                exx                     ; save bc with ip
                pop bc                  ; u1 -> bc
                pop hl                  ; c-addr1 -> hl
1$:             ex af,af'       ;  4    ; save a
                ld a,c          ;  4    ;
                or b            ;  4    ;
                jr z,3$         ;  7    ; if bc <> 0 then
                ex af,af'       ;  4    ;   restore a
2$:             cpi             ; 16    ;   loop
                jr nz,4$        ;  7/12 ;     while a = [hl++], --bc
                jp pe,2$        ; 10    ;   until b = 0
3$:             push hl                 ; save hl as 2OS
                push bc                 ; save bc as TOS
                exx                     ; restore bc with ip
                pop de                  ; pop new TOS
                JP_NEXT                 ; continue
4$:             cp 0x20         ;  7    ;
                jr nz,5$        ;  7    ; if char = 0x20 then
                dec hl          ;  6    ;
                cp (hl)         ;  7    ;
                inc hl          ;  6    ;
                jr nc,1$        ; 12    ;   if [hl-1] <= 0x20 then keep trimming
5$:             inc bc                  ; correct bc++ for cpi match
                dec hl                  ; correct hl-- for cpi match
                jr 3$                   ; finalize trimming

To parse a white-space-delimited word is efficiently performed with BL PARSE where the PARSE word is defined as:

: PARSE     ( char "ccc<char>" -- c-addr u )
  SOURCE
  >IN @ /STRING
  ROT CHOP
  DUP 1+ >IN @ +
  SOURCE NIP UMIN >IN ! ;

To skip input until the next non-white-space character is efficiently performed with BL SKIPS, where SKIPS is defined as:

: SKIPS     ( char "<chars>" -- )
  SOURCE >IN @ /STRING
  ROT TRIM
  DROP
  SOURCE DROP - >IN ! ;

Dictionary search with case insensitive string matching

The FIND-WORD word searches the dictionary starting with CONTEXT for a matching word. The search is case insensitive. Smudged words are skipped.

Entry:

  • DE with TOS: size of the string to search u
  • 2OS: address of the string to search c-addr

Exit:

  • DE with TOS: 0 = not found, 1 = found immediate, -1 = found (not immediate)
  • 2OS: string address if not found or execution token when found

Performance: 95 cycles per dictionary entry, 51 or 102 cycles per character comparison when characters match

findword:       ld a,d                  ;
                or a                    ; test d = 0 high order byte of u
                jp nz,zero_next         ; if u is too large then set new TOS to 0
                sla e                   ; shift u to compare w/o immediate bit
                jp c,zero_next          ; if u is too large then set new TOS to 0
                jp z,zero_next          ; if u = 0 then set new TOS to 0
                push de                 ; save de with 2*u
                exx                     ; save bc with ip
                pop bc                  ; pop 2 * u -> bc
                pop de                  ; pop c-addr -> de
                ld hl,(context+3)       ; CONTEXT -> hl
                jr 3$                   ; start searching
;               loop over dictionary
1$:             pop de                  ; restore de with c-addr
2$:             pop hl          ; 10    ; loop, restore hl with lfa
3$:             ld a,(hl)       ;  7    ;
                inc hl          ;  6    ;
                ld h,(hl)       ;  7    ;
                ld l,a          ;  4    ;   [hl] -> hl follow link at hl = lfa
                or h            ;  4    ;
                jr z,6$         ;  7    ;   if hl = 0 then not found
                push hl         ; 11    ;   save hl with lfa
                inc hl          ;  6    ;
                inc hl          ;  6    ;   hl + 2 -> hl with nt (nfa)
                ld a,(hl)       ;  7    ;   word length
                add a           ;  4    ;   shift away immediate bit
                cp c            ;  4    ;   test a = c word length match (both shifted)
                jr nz,2$        ; 12(95);   if lengths differ then continue searching
;               compare string to word
                push de                 ;   save de with c-addr
                inc hl                  ;   hl++ point to nfa chars
                ld b,c                  ;   2 * u -> b
                srl b                   ;   u -> b word length (nonzero)
;               loop over word chars
4$:             ld a,(de)       ;  7    ;   loop
                cp (hl)         ;  7    ;     compare [de] = [hl]
                jr z,5$         ; 12/7  ;     if mismatch then
                and 0xdf        ;    7  ;       make upper case
                cp 'A           ;    7  ;
                jr c,1$         ;    7  ;       if a<'A' then continue search
                cp 'Z+1         ;    7  ;
                jr nc,1$        ;    7  ;       if a>'Z' then continue search
                xor (hl)        ;    7  ;
                and 0xdf        ;    7  ;       case insensitive compare [de] = [hl]
                jr nz,1$        ;    7  ;       if mismatch then continue search
5$:             inc de          ;  6    ;     de++ point to next char of c-addr
                inc hl          ;  6    ;     hl++ point to next char of word
                djnz 4$         ; 13(51/102);until --b = 0
;               found a matching word
                pop de                  ;   discard saved c-addr
                ex (sp),hl              ;   save hl with xt as 2OS, restore hl with lfa
                inc hl                  ;
                inc hl                  ;   hl + 2 -> hl with nt (nfa)
                bit immediate_bit,(hl)  ;   test immediate bit of [hl] word length
                exx                     ;   restore bc with ip
                jp nz,one_next          ;   set new TOS to 1 if word is immediate
                jp mone_next            ;   set new TOS to -1
;               not found
6$:             push de                 ; save de with c-addr as 2OS
                exx                     ; restore bc with ip
                jp zero_next            ; set new TOS to 0
                JP_NEXT                 ; continue

Z80 integer math routines

I've written the following Z80 math routines. My objective was to make them as efficient as possible. The second objective was to keep the code size small by using tricks with CPU arithmetic and flags.

Fast signed/unsigned 16x16->16 bit multiplication

Entry:

  • BC: signed or unsigned multiplier n1
  • DE: signed or unsigned multiplicand n2

Exit:

  • HL: signed product or unsigned product n3

Perfomance: max 51 cycles x 16 iterations = 816 cycles or max 51 cycles x 8 iterations + 45 x 8 = 768 cycles, excluding entry/exit overhead

mult1616:       ld hl,0                 ; 0 -> hl
                ld a,c                  ; c -> a low order byte of n1
                ld c,b                  ; b -> c save high order byte of n1
                ld b,8                  ; 8 -> b loop counter
1$:             rra             ;  4    ; loop, a >> 1 -> a set cf
                jr nc,2$        ;  7    ;   if cf = 1 then
                add hl,de       ; 11    ;     hl + de -> hl
2$:             sla e           ;  8    ;
                rl d            ;  8    ;   de << 1 -> de
                djnz 1$         ; 13(51); until --b = 0
                ld a,c                  ; c -> a high order byte of n1
                ld b,8                  ; 8 -> b loop counter
3$:             rra             ;  4    ; loop, a >> 1 -> a set cf
                jr nc,4$        ;  7    ;   if cf = 1 then
                add hl,de       ; 11    ;     hl + de -> hl
4$:             sla e           ;  8    ;
                rl d            ;  8    ;   de << 1 -> de
                djnz 3$         ; 13(51); until --b = 0
                ret                     ; done

We can make an additional speed improvement, which only costs us one more instruction byte. To calculate the high order byte we do not need to iterate over all 8 bits of the high order multiplier stored in register c, but only over the nonzero bits. We also can ignore the lower order result stored in register e. This reduces the max loop iteration cycle time to 32 and 33 per bit. Furthermore, the second loop only runs until the last bit of register c is shifted out. If register c is zero, the second loop does not execute thereby saving hundreds of cycles. We also use jp instead of jr to improve and balance the cycle time per bit:

mult1616:       ld hl,0                 ; 0 -> hl
                ld a,c                  ; c -> a low order byte of n1
                ld c,b                  ; b -> c save high order byte of n1
                ld b,8                  ; 8 -> b loop counter
1$:             rra             ;  4    ; loop, a >> 1 -> a set cf
                jr nc,2$        ;  7    ;   if cf = 1 then
                add hl,de       ; 11    ;     hl + de -> hl
2$:             sla e           ;  8    ;
                rl d            ;  8    ;   de << 1 -> de
                djnz 1$         ; 13(51); until --b = 0
                ld a,h                  ; h -> a do high order, low order is done
                jr 5$                   ; jump to shift c and loop
3$:             add d           ;  4    ; loop, a + d -> d
4$:             sla d           ;  8    ;   d << 1 -> d
5$:             srl c           ;  8    ;   c >> 1 -> c set cf and z if no bits left
                jr c,3$         ; 12/7(32); until cf = 0 repeat with addition
                jp nz,4$        ;   10(33); until c = 0 repeat without addition
                ret                     ; done

Note: unrolling the loops would improve the speed at the cost of a significant code size increase, which is undesirable for small memory devices.

Fast unsigned 16x16->32 bit multiplication

Entry:

  • DE: unsigned multiplicand u1
  • BC: unsigned multiplier u2

Exit:

  • DE: low order unsigned product u3
  • HL: high order unsigned product u3

Perfomance: max 64 cycles x 17 iterations = 1088 cycles, excluding entry/exit overhead

umult1632:      xor a                   ; 0 -> cf
                ld l,a                  ;
                ld h,a                  ; 0 -> hl
                ld a,17                 ; 17 -> a loop counter
1$:             rr h            ;  8    ; loop
                rr l            ;  8    ;
                rr d            ;  8    ;
                rr e            ;  8    ;   hlde + cf >> 1 -> hlde
                jr nc,2$        ;  7    ;   if cf = 1 then
                add hl,bc       ; 11    ;     hl + bc -> hl
2$:             dec a           ;  4    ;
                jp nz,1$        ; 10(64); until --a = 0
                ret                     ; done

Note: unrolling the loop would improve the speed at the cost of a significant code size increase, which is undesirable for small memory devices.

Fast signed/unsigned 32x32->32 bit multiplication

Entry:

  • BC': low order signed or unsigned multiplicand d1
  • DE: high order signed or unsigned multiplicand d1
  • DE': low order signed or unsigned multiplier d2
  • HL': high order signed or unsigned multiplier d2

Exit:

  • HL': low order signed product or unsigned product d3
  • HL: high order signed product or unsigned product d3

Perfomance: max 98 cycles x 32 iterations = 3136 cycles, excluding entry/exit overhead

mult3232:       ld hl,0                 ; 0 -> hl high order d3, de with d2 high order
                exx                     ; save bc with ip
                ld a,h                  ;
                push af                 ; save d1 high order byte 3
                ld a,l                  ;
                push af                 ; save d1 high order byte 2
                ld a,b                  ;
                push af                 ; save d1 low order byte 1
                ld a,c                  ;
                push af                 ; save d1 low order byte 0
                ld hl,0                 ; 0 -> hl' low order d3
                ld c,4                  ; 4 -> c outer loop counter
1$:             pop af                  ; loop, [sp++] -> a next d1 byte
                ld b,8                  ;   8 -> b inner loop counter
2$:             rra             ;  4    ;   loop, a >> 1 -> a set cf
                jr nc,3$        ;  7    ;     if cf = 1 then
                add hl,de       ; 11    ;       hl' + de' -> hl add low order
                exx             ;  4    ;
                adc hl,de       ; 15    ;       hl + de + cf -> hl add high order
                exx             ;  4    ;
3$:             sla e           ;  8    ;
                rl d            ;  8    ;     de' << 1 -> de' shift low order
                exx             ;  4    ;
                rl e            ;  8    ;
                rl d            ;  8    ;     de << 1 + cf -> de shift high order
                exx             ;  4    ;
                djnz 2$         ; 13(98);   until --b = 0
                dec c                   ;
                jr nz,1$                ; until --c = 0
                ret                     ; done

The same tricks as the 16x16->16 multiplication method can be used to reduce the cycle time, but at a cost of increased code size. We also assign different registers to the low and high order parts:

Entry:

  • BC': low order signed or unsigned multiplicand d1
  • DE: high order signed or unsigned multiplicand d1
  • HL': low order signed or unsigned multiplier d2
  • DE': high order signed or unsigned multiplier d2

Exit:

  • HL: low order signed product or unsigned product d3
  • HL': high order signed product or unsigned product d3

Perfomance: max 8 x (98+87+59+33) = 2216 cycles, excluding entry/exit overhead

mult3232:       ld hl,0                 ; 0 -> hl low order d3, de with d2 low order
                exx                     ; save bc with ip
                ld a,h                  ;
                push af                 ; save d1 high order byte 3
                ld a,l                  ;
                push af                 ; save d1 high order byte 2
                ld a,b                  ;
                push af                 ; save d1 low order byte 1
                ld a,c                  ; d1 -> a low order byte 0
                ld hl,0                 ; 0 -> hl' high order d3
                ld b,8                  ; 8 -> b loop counter
1$:             rra             ;  4    ; loop, a >> 1 -> a set cf
                jr nc,2$        ;  7    ;   if cf = 1 then
                exx             ;  4    ;
                add hl,de       ; 11    ;     hl + de -> hl add low order
                exx             ;  4    ;
                adc hl,de       ; 15    ;     hl' + de' + cf -> hl add high order
2$:             exx             ;  4    ;
                sla e           ;  8    ;
                rl d            ;  8    ;   de << 1 -> de shift low order
                exx             ;  4    ;
                rl e            ;  8    ;
                rl d            ;  8    ;   de' << 1 + cf -> de' shift high order
                djnz 1$         ; 13(98); until --b = 0
                pop af                  ;
                ld c,a                  ; d1 -> c low order byte1
                exx                     ;
                ld a,h                  ; h -> a low order d3
                exx                     ;
                ld b,8                  ; 8 -> b loop counter
3$:             rr c            ;  8    ; loop, c >> 1 -> c set cf
                jr nc,4$        ;  7    ;   if cf = 1 then
                exx             ;  4    ;
                add d           ;  4    ;     a + d -> a add low order
                exx             ;  4    ;
                adc hl,de       ; 15    ;     hl' + de' + cf -> hl add high order
4$:             exx             ;  4    ;
                sla d           ;  8    ;   d << 1 -> d shift low order
                exx             ;  4    ;
                rl e            ;  8    ;
                rl d            ;  8    ;   de' << 1 + cf -> de' shift high order
                djnz 3$         ; 13(87); until --b = 0
                exx                     ;
                ld h,a                  ; a -> h low order d3
                exx                     ;
                pop af                  ; d1 -> a high order byte 2
                ld b,8                  ; 8 -> b loop counter
5$:             rra             ;  8    ; loop, c >> 1 -> c set cf
                jr nc,6$        ;  7    ;   if cf = 1 then
                add hl,de       ; 15    ;     hl' + de' + cf -> hl add high order
6$:             rl e            ;  8    ;
                rl d            ;  8    ;   de' << 1 + cf -> de' shift high order
                djnz 5$         ; 13(59); until --b = 0
                pop af                  ;
                ld c,a                  ; d1 -> c high order byte 3
                ld a,h                  ; h -> a high order
                jr 9$                   ; jump to shift c and loop
7$:             add d           ;  4    ; loop, a + d -> a
8$:             sla d           ;  8    ;   d << 1 -> d
9$:             srl c           ;  8    ;   c >> 1 -> c set cf and z if no bits left
                jr c,7$         ; 12/7(32); until cf = 0 repeat with addition
                jp nz,8$        ;   10(33); until c = 0 repeat without addition
                ld h,a                  ; a -> h high order
                exx                     ;
                ret                     ; done

Note: unrolling the inner loop would improve the speed at the cost of a significant code size increase, which is undesirable for small memory devices.

Fast unsigned 32/16->16 bit division and remainder

This implementation is used by all division and remainder (modulo) Forth words by calling UM/MOD. As such, it is an important and versatile algorithm.

Entry:

  • HL: high order dividend ud
  • BC: low order dividend ud
  • DE: divisor u1

Exit:

  • HL: remainder u2
  • BC: quotient u3

Performance: max 85 cycles x 16 iterations = 1360 cycles, excluding entry/exit overhead

udiv3216:       xor a                   ;
                sub e                   ;
                ld e,a                  ;
                sbc a                   ;
                sub d                   ;
                ld d,a                  ; -de -> de with -u1
                ld a,b                  ; b -> a low order dividend in ac
                ld b,16                 ; 16 -> b loop counter
                sla c                   ;
                rla                     ; ac << 1 -> ac
1$:             adc hl,hl       ; 15    ; loop, hl << 1 + cf -> hl
                jr nc,2$        ; 12/ 7 ;   if cf = 1 then
                add hl,de       ;    11 ;     hl + -u1 -> hl
                scf             ;     4 ;     1 -> cf
                jr 3$           ;    12 ;   else
2$:             add hl,de       ; 11    ;     hl + -u1 -> hl
                jr c,3$         ; 12/ 7 ;     if cf = 0 then
                sbc hl,de       ;    15 ;       hl - -u1 -> hl to undo, no carry
3$:             rl c            ;  8    ;
                rla             ;  4    ;   ac << 1 + cf -> ac
                djnz 1$         ; 13(85); until --b = 0
                ld b,a                  ; a -> b quotient bc, remainder in hl
                ret                     ; done

The algorithm negates the divisor first to speed up subtraction by adding the negative of the divisor instead. Another benefit of using the negated divisor is that this produces the right carry value to shift into the quotient, otherwise the carry should be inverted or the resulting quotient must be inverted.

By moving the first conditional block out of the loop, we can save 5 CPU cycles on the critical path (the most expensive path through the loop) to reduce to max 80 cycles per iteration at the cost of making the code more cluttered.

Entry:

  • HL: high order dividend ud
  • BC: low order dividend ud
  • DE: divisor u1

Exit:

  • HL: remainder u2
  • BC: quotient u3

Performance: max 80 cycles x 16 iterations = 1280 cycles, excluding entry/exit overhead

udiv3216:       xor a                   ;
                sub e                   ;
                ld e,a                  ;
                sbc a                   ;
                sub d                   ;
                ld d,a                  ; -de -> de with -u1
                ld a,b                  ; b -> a low order dividend in ac
                ld b,16                 ; 16 -> b loop counter
                sla c                   ;
                rla                     ; ac << 1 -> ac
1$:             adc hl,hl       ; 15    ; loop, hl << 1 + cf -> hl
                jr c,3$         ;  7/12 ;   if cf = 1 then hl + -u1 -> hl, 1 -> cf else
                add hl,de       ; 11    ;     hl + -u1 -> hl
                jr c,2$         ; 12/ 7 ;     if cf = 0 then
                sbc hl,de       ;    15 ;       hl - -u1 -> hl to undo, no carry
2$:             rl c            ;  8    ;
                rla             ;  4    ;   ac << 1 + cf -> ac
                djnz 1$         ; 13(80); until --b = 0
                ld b,a                  ; a -> b quotient bc, remainder in hl
                ret                     ; done

3$:             add hl,de       ;    11 ; hl + -u1 -> hl
                scf             ;     4 ; 1 -> cf
                jr 2$           ;    12 ;

By comparison, the CamelForth Z80 code is also fast, but slower than my implemenation with 90 cycles x 16 iterations = 1440 cycles, excluding entry/exit overhead:

udiv3216:       ld a,16                 ; 16 -> a loop counter
                sla e                   ;
                rl d                    ; de << 1 -> de
1$:             adc hl,hl       ; 15    ; loop, hl << 1 + cf -> hl
                jr nc,2$        ; 12/ 7 ;   if cf = 1 then
                or a            ;     4 ;     0 -> cf
                sbc hl,bc       ;    15 ;     hl - u1 -> hl
                or a            ;     4 ;     0 -> cf
                jp 3$           ;    10 ;   else
2$:             sbc hl,bc       ; 15    ;     hl - u1 -> hl
                jr nc,3$        ; 12/ 7 ;     if cf = 1 then
                add hl,bc       ;    11 ;       hl + u1 -> hl to undo sbc, sets cf
3$:             rl e            ;  8    ; 
                rl d            ;  8    ;   de << 1 + cf -> de with inverse cf we'll need
                dec a           ;  4    ;
                jp nz,1$        ; 10(90); until --a = 0
                ld a,e                  ;
                cpl                     ;
                ld e,a                  ;
                ld a,d                  ;
                cpl                     ;
                ld d,a                  ; complement de, faster than ccf in loop
                ret                     ; done

Note: unrolling the loop would improve the speed at the cost of a significant code size increase, which is undesirable for small memory devices.

Fast unsigned 32/32->32 bit division and remainder

This algorithm uses the shadow registers BC', DE' and HL'. Because of this register pressure, there is little room for further optimization. Registers IX and IY cannot be used since they lack the necessary adc and sbc instructions.

Entry:

  • BC: high order dividend ud1
  • BC': low order dividend ud1
  • DE': high order divisor ud2
  • DE: low order divisor ud2

Exit:

  • HL': high order remainder ud3
  • HL: low order remainder ud3
  • BC: high order quotient ud4
  • BC': low order quotient ud4

Performance: max 162 cycles x 32 iterations = 5184 cycles, excluding entry/exit overhead

udiv3232:       exx                     ;
                xor a                   ;
                ld h,a                  ;
                ld l,a                  ; 0 -> hl'
                rl c                    ;
                rl b                    ;
                exx                     ;
                ld h,a                  ;
                ld l,a                  ; 0 -> hl
                ld a,b                  ; b -> a
                rl c                    ;
                rla                     ; ac << 1 -> ac
                ld b,32                 ; 32 -> b loop counter
1$:             adc hl,hl       ; 15    ;
                exx             ;  4    ;
                adc hl,hl       ; 15    ;
                exx             ;  4    ;   hl'.hl << 1 + cf -> hl'.hl no carry
                sbc hl,de       ; 15    ;
                exx             ;  4    ;
                sbc hl,de       ; 15    ;   hl'.hl - de'.de -> hl'.hl
                jr nc,2$        ; 12/ 7 ;   if cf = 1 then
                exx             ;     4 ;
                add hl,de       ;    11 ;
                exx             ;     4 ;
                adc hl,de       ;    15 ;     hl'.hl + de'.de -> hl'.hl to undo, sets carry
2$:             ccf             ;  4    ;   complement cf
                rl c            ;  8    ;
                rl b            ;  8    ;
                exx             ;  4    ;
                rl c            ;  8    ;
                rla             ;  4    ;   ac.bc' << 1 + cf
                djnz 1$         ; 13(162); until --b = 0
                ld b,a                  ;
                ld e,c                  ;
                ret                     ;

Z80 floating point math routines

I've written a collection of Z80 IEEE 754 single precision floating point math routines:

  • math.asm (960 bytes of code) a simple version with truncation
  • mathr.asm (1085 bytes of code) includes three IEEE 754 rounding modes, where the default rounding mode is to round to nearest, ties to even;
  • mathri.asm (1296 bytes of code) includes the three IEEE 754 rounding modes, and inf/nan and signed zero. This version is not intended for Forth850, because Forth850 raises floating point exceptions.

My objective was to make the floating point routines as efficient as possible, such as by using the shadow registers instead of memory. No memory is used, except at most one push-pop pair to move a value between the (shadow) registers. The second objective was to keep the code size small by using tricks with CPU arithmetic and flags. The floating point library is about 1KB.

Single precision floating point values are stored in registers BC (high order) and DE (low order) to form a 32 bit float bcde and shadow float bcde'.

  • fadd: float bcde + bcde' -> bcde; cf set on overflow
  • fsubx: float bcde - bcde' -> bcde; cf set on overflow
  • fsuby: float bcde' - bcde -> bcde; cf set on overflow
  • fneg: float - bcde -> bcde; no errors (cf reset)
  • fabs: float |bcde| -> bcde; no errors (cf reset)
  • fmul: float bcde * bcde' -> bcde; cf set on overflow
  • fdivx: float bcde / bcde' -> bcde; cf set on overflow or when dividing by zero
  • fdivy: float bcde' / bcde -> bcde; cf set on overflow or when dividing by zero
  • ftoi: float bcde -> signed 32 bit integer bcde truncated towards zero; cf set when out of range
  • itof: signed 32 bit integer bcde -> float bcde; no errors (cf reset)
  • ftrunc: float trunc(bcde) -> bcde; no errors (cf reset)
  • ffloor: float floor(bcde) -> bcde; cf set on overflow
  • fround: float round(bcde) -> bcde; cf set on overflow
  • fpow10: 10^a * bcde -> bcde for -128 <= a < 39; cf set on overflow
  • atof: string [hl..hl+a-1] -> float bcde; cf set on parsing error and hl points after the char
  • ftoa: float bcde -> [hl...hl+a-1] string of digits, exponent e and sign d bit 7; no errors (flags undefined)
  • fzero: set bcde to 0.0

mathri.asm includes inf/nan and signed zero. In this version the routines listed above may return signed inf or nan with cf set to indicate overflow and errors. In addition, this version includes the following routines:

  • ftype: float bcde -> bcde unchanged; cf set if bcde is nan, cf reset and z set if bcde is +/-inf
  • fnan: set bcde to nan; cf set
  • finf: set bcde to inf with sign in register A bit 7 (negative when set); cf set

Z80 string routines

I've written the following Z80 string routines. My objective was to make them as efficient as possible, such as by making the obvious choice to use the cpi and cpir Z80 instructions to minimize cycle count. The second objective was to keep the code size small by using tricks with CPU arithmetic and flags.

Fast string comparison

Entry:

  • IX: address of the first string c-addr1
  • HL: size of the first string u1
  • DE: address of the second string c-addr2
  • BC: size of the second string u2

Exit:

  • A: -1 (less), 0 (equal), 1 (greater)
  • F: zero flag set when equal, sign flag set when less

Performance: 46 cycles per character comparison when characters match

compare:        push ix                 ; save c-addr1
                push hl                 ; save u1
                xor a                   ; 0 -> a flags u1 = u2, 0 -> cf
                sbc hl,bc               ;
                jr z,1$                 ; if u1 <> u2 then
                inc a                   ;   1 -> a flags u1 > u2
                jr nc,1$                ;   if u1 < u2 then
                pop bc                  ;     pop u1 -> bc
                push bc                 ;     rebalance stack
                ld a,-1                 ;   -1 -> a flags u1 < u2
1$:             pop hl                  ; pop to discard u1
                pop hl                  ; pop c-addr1 -> hl
                ex af,af'               ; save a with -1|0|1 flag
                ld a,c                  ;
                or b                    ;
                jr z,3$                 ; if bc <> 0 then
;               compare chars
2$:             ld a,(de)       ;  7    ;   loop
                cpi             ; 16    ;     compare [hl++] to [de], --bc
                jr nz,4$        ;  7    ;     while characters [hl] and [de] are equal
                inc de          ;  6    ;     de++
                jp pe,2$        ; 10(46);   until bc = 0
;               chars match, check lengths
3$:             ex af,af'               ; restore a with -1|0|1 flag
                ret                     ;
;               strings differ
4$:             dec hl                  ; hl-- to correct cpi overshoot
                cp (hl)                 ; test a<[hl]
                ccf                     ; complement cf, cf = 1 if [hl]<a
                sbc a                   ; a = -1 if cf = 1 else 0
                add a                   ; a = -2 if cf = 1 else 0
                inc a                   ; a = -1 if cf = 1 else 1
                ret                     ; done

Fast string search

Naive string search, i.e. not Knuth-Morris-Pratt which is faster but would require a table and more code.

Entry:

  • HL: address of the string searched c-addr1
  • IX: size of the string searched u1
  • DE: address of the string to search c-addr2
  • BC: size of the string to search u2

Exit:

  • F: carry set when no match found
  • HL: address of the string position found c-addr3
  • BC: size of the remaining characters after the match

Performance: 21 cycles per character to search the first or next character match and 46 cycles per character comparison when characters match

search:         or a                    ; 0 -> cf
                sbc ix,bc               ; u1 - u2 -> ix
                ret c                   ; if u2>u1 then impossible search, cf = 1
                ld a,c                  ;
                or b                    ;
                ret z                   ; if u2 = 0 then done (found), cf = 0
                push ix                 ;
                push bc                 ;
                pop ix                  ; u2 -> ix
                pop bc                  ; u1 - u2 -> bc
                inc bc                  ; u1 - u2 + 1 -> bc correct for cpir
                push hl                 ; save c-addr1 on the stack
;               find char match
1$:             push de                 ; loop, save de with c-addr2
                ld a,(de)               ;   [de] -> a
                cpir            ; 21/16 ;   repeat until a = [hl++] or --bc = 0
                jr nz,4$                ;   if no match then not found
                pop de                  ;   restore de with c-addr2
                push bc                 ;
                push de                 ;
                push hl                 ;   save bc,de,hl
                push ix                 ;
                pop bc                  ;   u2 -> bc
;               compare substrings
                dec bc                  ;   u2 - 1 -> bc since u2 > 0
                ld a,c                  ;
                or b                    ;
                jr z,3$                 ;   if bc<> 0 then
                inc de                  ;     de++ to start matching at c-addr2+1
2$:             ld a,(de)       ;  7    ;     loop
                cpi             ; 16    ;       compare [hl++] to [de], --bc
                jr nz,3$        ;  7    ;       while characters [hl] and [de] are equal
                inc de          ;  6    ;       de++
                jp pe,2$        ; 10(46);     until bc = 0
3$:             pop hl                  ;
                pop de                  ;
                pop bc                  ;   restore bc,de,hl
                jr nz,1$                ; repeat
;               substrings match
                dec hl                  ; hl-- to correct cpir overshoot
                ret                     ; done, cf = 0
;               not found
4$:             scf                     ; 1 -> cf
                ret                     ; done, cf = 1

Sharp PC-G850(V)(S)

Forth resources

Forth850 benefits from the work done by many others to offer inspiration, but the system does not include licensed code of the following implementations or any other implementation not listed here. Some parts of Forth850 are derived from freely available Forth resources listed above and the Z80 resources listed further below:

Z80 resources

Z80 maths