How many primitives are needed to implement a fort system?

In 1992, another competition for obfuscated programming in the C language was held . One of the projects presented was a small fort system . It struck me that the virtual machine was implemented in just 794 bytes of C code. The rest of the fort system was loaded from the source on the fort. After studying the project, the initial enthusiasm gave way to disappointment, as the author used a not entirely "honest" trick: he used the scanf () function to parse the fortified source. From that moment on, I was tormented by the question - how many primitives are needed to implement a fort system without such tricks?



After a while, I was introduced to the sod32 fort system written by Lennart Benschop . For this system fort, the virtual machine is written in C, and the binary image that is loaded into the virtual machine does not depend on the byte order in the word host. sod32 has 32 primitives, here they are:

NOOP ( --- ) Do nothing
SWAP ( x1 x2 --- x2 x1 ) Swap the two top items on the stack.
ROT ( x1 x2 x3 --- x2 x3 x1 ) Rotate the three top items on the stack.
0= ( x --- f) f is true if and only if x is 0.
NEGATE ( n1 --- -n1) Negate top number on the stack.
UM* ( u1 u2 --- ud ) Multiply two unsigned numbers, giving double result.
C@ ( c-addr --- c) Fetch character c at c-addr.
@ ( a-addr --- x) Fetch cell x at a-addr.
+ ( w1 w2 --- w3) Add the top two numbers on the stack.
AND ( x1 x2 --- x3) Bitwise and of the top two cells on the stack.
OR ( x1 x2 --- x3) Bitwise or of the top two cells on the stack.
XOR ( x1 x2 --- x3) Bitwise exclusive or of the top two cells on the stack.
U< ( u1 u2 ---- f) f is true if and only if unsigned number u1 is less than u2.
< ( n1 n2 --- f) f is true if and only if signed number n1 is less than n2.
LSHIFT ( x1 u --- x2) Shift x1 left by u bits, zeros are added to the right.
RSHIFT ( x1 u --- x2) Shift x1 right by u bits, zeros are added to the left.
UM/MOD ( ud u1 --- urem uquot) Divide the unsigned double number ud by u1, giving unsigned quotient and remainder.
+CY ( n1 n2 cy1 --- n3 cy2) Add n1 and n2 and the carry flag cy1 (must be 0 or 1) giving sum n3 and carry flag cy2. (n3 cy2 can be seen as a double number)
SCAN1 ( x d --- u) Scan x for the first 1 bit. u is the position of that bit (counted from the scan direction) and 32 if x=0. d is the scan direction, 0 is left-to-right, 1 is right-to-left.
SPECIAL ( x ---) Any of a large number of special instructions, indicated by x.
DROP ( x ---) Discard the top item on the stack.
>R ( x ---) Push x on the return stack.
C!A ( c c-addr --- c-addr) Store character c at c-addr, keep address.
!A ( x a-addr --- a-addr) Store cell x at a-addr, keep address.
DUP ( x --- x x ) Duplicate the top cell on the stack.
OVER ( x1 x2 --- x1 x2 x1) Copy the second cell of the stack.
R@ ( --- x) x is a copy of the top of the return stack.
R> ( --- x) Pop the top of the return stack and place it on the stack.
0 ( --- 0) The constant number 0.
1 ( --- 1) The constant number 1.
4 ( --- 4) The constant number 4.
LIT ( --- lit) Push literal on the stack (literal number is in-line).






I realized that I had a chance to find an answer to my question and I started to turn primitives into high-level definitions. I want to point out right away that all this activity has a purely academic meaning. It is unlikely that it will be possible to apply the results obtained in practice due to the loss of performance. As I experimented, I tracked the size of the fort system binary and the runtime of the John Hayes test suite. I built a new binary image with the command

echo 'S" extend-cross.4" INCLUDED '|./sod32 kernel.img
      
      





and ran the tests like this:

time ./sod32 kernel.img <<< $(echo -e 'S" tester.fr" INCLUDED\n12345\nBYE')
      
      





In the table below you can see how each change affected size and performance. Links from the "change" column lead to the corresponding commit in the github.

the change kernel.img size execution time tester.fr
original sod32 10164 0m0.059s
lshift, rshift 10312 0m0.071s
+, um *, um / mod 11552 0m0.123s
c @, c! a 11952 0m0.795s
0 =, negate <, u < 12340 0m2.800s
drop 12784 0m5.022s
swap, rot, over 13436 0m5.860s
sp @, sp !, rp @, rp !, dup 13680 0m8.696s
r @, r>,> r 14160 0m15.463s
and, or, xor 14336 0m21.198s
0, 1, 4 15236 0m21.671s
0branch 15912 0m41.765s


As a result, the size of the binary image of the fort system increased from 10164 to 15912 (+ 57%), the performance dropped 708 times (almost 3 orders of magnitude). Perhaps performance could be improved by profiling the code and optimizing bottlenecks, but I'm sure the result will still be very slow compared to the original sod32.



From 32 primitives plus additional logic in the internal interpreter, I ended up with 7: nop, nand,!, @, Um +, special, lit. The internal interpreter retains the logic for executing primitives and high-level definitions (call), as well as logic for completing the high-level definitions (next). I found the answer to my question: a fort system can be built on the basis of 9 primitives (or 8, if nop does not have to be a primitive). I am confused by the fact that there are as many as 3 primitives for memory access: @ ,! and lit, but I haven't figured out how to avoid it. I could well have missed something, so if you know how you can get rid of a large number of primitives, please write in the comments.



All Articles