After a while, I was introduced to the sod32 fort system written by Lennart Benschop . For this system fort, the virtual machine is written in C, and the binary image that is loaded into the virtual machine does not depend on the byte order in the word host. sod32 has 32 primitives, here they are:
NOOP
( --- )
Do nothing
SWAP
( x1 x2 --- x2 x1 )
Swap the two top items on the stack.
ROT
( x1 x2 x3 --- x2 x3 x1 )
Rotate the three top items on the stack.
0=
( x --- f)
f is true if and only if x is 0.
NEGATE
( n1 --- -n1)
Negate top number on the stack.
UM*
( u1 u2 --- ud )
Multiply two unsigned numbers, giving double result.
C@
( c-addr --- c)
Fetch character c at c-addr.
@
( a-addr --- x)
Fetch cell x at a-addr.
+
( w1 w2 --- w3)
Add the top two numbers on the stack.
AND
( x1 x2 --- x3)
Bitwise and of the top two cells on the stack.
OR
( x1 x2 --- x3)
Bitwise or of the top two cells on the stack.
XOR
( x1 x2 --- x3)
Bitwise exclusive or of the top two cells on the stack.
U<
( u1 u2 ---- f)
f is true if and only if unsigned number u1 is less than u2.
<
( n1 n2 --- f)
f is true if and only if signed number n1 is less than n2.
LSHIFT
( x1 u --- x2)
Shift x1 left by u bits, zeros are added to the right.
RSHIFT
( x1 u --- x2)
Shift x1 right by u bits, zeros are added to the left.
UM/MOD
( ud u1 --- urem uquot)
Divide the unsigned double number ud by u1, giving unsigned quotient and remainder.
+CY
( n1 n2 cy1 --- n3 cy2)
Add n1 and n2 and the carry flag cy1 (must be 0 or 1) giving sum n3 and carry flag cy2. (n3 cy2 can be seen as a double number)
SCAN1
( x d --- u)
Scan x for the first 1 bit. u is the position of that bit (counted from the scan direction) and 32 if x=0. d is the scan direction, 0 is left-to-right, 1 is right-to-left.
SPECIAL
( x ---)
Any of a large number of special instructions, indicated by x.
DROP
( x ---)
Discard the top item on the stack.
>R
( x ---)
Push x on the return stack.
C!A
( c c-addr --- c-addr)
Store character c at c-addr, keep address.
!A
( x a-addr --- a-addr)
Store cell x at a-addr, keep address.
DUP
( x --- x x )
Duplicate the top cell on the stack.
OVER
( x1 x2 --- x1 x2 x1)
Copy the second cell of the stack.
R@
( --- x)
x is a copy of the top of the return stack.
R>
( --- x)
Pop the top of the return stack and place it on the stack.
0
( --- 0)
The constant number 0.
1
( --- 1)
The constant number 1.
4
( --- 4)
The constant number 4.
LIT
( --- lit)
Push literal on the stack (literal number is in-line).
I realized that I had a chance to find an answer to my question and I started to turn primitives into high-level definitions. I want to point out right away that all this activity has a purely academic meaning. It is unlikely that it will be possible to apply the results obtained in practice due to the loss of performance. As I experimented, I tracked the size of the fort system binary and the runtime of the John Hayes test suite. I built a new binary image with the command
echo 'S" extend-cross.4" INCLUDED '|./sod32 kernel.img
and ran the tests like this:
time ./sod32 kernel.img <<< $(echo -e 'S" tester.fr" INCLUDED\n12345\nBYE')
In the table below you can see how each change affected size and performance. Links from the "change" column lead to the corresponding commit in the github.
the change | kernel.img size | execution time tester.fr |
original sod32 | 10164 | 0m0.059s |
lshift, rshift | 10312 | 0m0.071s |
+, um *, um / mod | 11552 | 0m0.123s |
c @, c! a | 11952 | 0m0.795s |
0 =, negate <, u < | 12340 | 0m2.800s |
drop | 12784 | 0m5.022s |
swap, rot, over | 13436 | 0m5.860s |
sp @, sp !, rp @, rp !, dup | 13680 | 0m8.696s |
r @, r>,> r | 14160 | 0m15.463s |
and, or, xor | 14336 | 0m21.198s |
0, 1, 4 | 15236 | 0m21.671s |
0branch | 15912 | 0m41.765s |
As a result, the size of the binary image of the fort system increased from 10164 to 15912 (+ 57%), the performance dropped 708 times (almost 3 orders of magnitude). Perhaps performance could be improved by profiling the code and optimizing bottlenecks, but I'm sure the result will still be very slow compared to the original sod32.
From 32 primitives plus additional logic in the internal interpreter, I ended up with 7: nop, nand,!, @, Um +, special, lit. The internal interpreter retains the logic for executing primitives and high-level definitions (call), as well as logic for completing the high-level definitions (next). I found the answer to my question: a fort system can be built on the basis of 9 primitives (or 8, if nop does not have to be a primitive). I am confused by the fact that there are as many as 3 primitives for memory access: @ ,! and lit, but I haven't figured out how to avoid it. I could well have missed something, so if you know how you can get rid of a large number of primitives, please write in the comments.