Cheney on the MTA: a compiler in which the stack also serves as a heap

 

Did he ever return? No, he never returned,

And his fate is still unlearned,

He may ride forever 'neath the streets of Boston,

He's the man who never returned.



“Charlie on the MTA”, 1949


1. Closures



One of the handy features of modern programming languages is nested functions:



def bubble(arr, comp):

    def swap(i, j):
        temp = arr[i]
        arr[i] = arr[j]
        arr[j] = temp

    flag = True
    while flag:
        flag = False
        for i in range(len(arr) - 1):
            if comp(arr[i], arr[i+1]) > 0:
                swap(i, i+1)
                flag = True






This feature itself is not new: it was already present in Algol (1958) and is familiar to many from Pascal (1970). Compiling nested functions is easy: for example, the stack frame of the inner function can store a pointer to the stack frame of the outer function, so that the inner function can reach the parameters and local variables of the outer one. Some may recall that the enter and leave instructions, which appeared in the 80186 (1982), implement exactly this kind of support for nested functions (although I have never seen a compiler that actually used it).
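To make the frame-pointer idea concrete, here is roughly how the bubble/swap pair above could be lowered by hand to C; this is an illustrative sketch with invented names, not the output of any real compiler.

#include <stddef.h>

/* The outer frame of bubble(), laid out explicitly. */
struct bubble_frame {
    int *arr;
    size_t len;
    int (*comp)(int, int);
};

/* The inner function receives a pointer to the enclosing frame
   and reaches the captured variables through it. */
void swap(struct bubble_frame *outer, size_t i, size_t j) {
    int temp = outer->arr[i];
    outer->arr[i] = outer->arr[j];
    outer->arr[j] = temp;
}

void bubble(int *arr, size_t len, int (*comp)(int, int)) {
    struct bubble_frame frame = { arr, len, comp };
    int flag = 1;
    while (flag) {
        flag = 0;
        for (size_t i = 0; i + 1 < len; i++)
            if (comp(arr[i], arr[i + 1]) > 0) {
                swap(&frame, i, i + 1);   /* pass the frame pointer */
                flag = 1;
            }
    }
}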



Difficulties begin if the language allows an inner function to be passed outside of the outer one:



def by_field(name):

    def comp(x, y):
        return x[name] - y[name]

    return comp

bubble(my_records, by_field("year"))






How can the inner function access the parameters and local variables of the outer one after the return from the outer function has destroyed its stack frame? Somehow the inner function has to take the variables it uses along with it; a function together with the variables it captures from outside is called a "closure". Pascal simply does not support this; but, for example, C# 7 (2017) creates an object on the heap containing all the parameters and local variables that the inner function refers to, and both the inner and the outer function then access fields of that heap object instead of values on the stack. Is it possible to do without this complication and keep working with the stack, to avoid the extra indirect addressing, preserve the locality of accesses, and not clog the data cache with jumps all over the heap?
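For concreteness, here is a hand-translation of that heap-based scheme to C; a minimal sketch with invented names, comparing records by an integer field index instead of a string key.

#include <stdlib.h>

/* The environment object: everything by_field's inner function captures. */
struct env { int field; };

/* A closure is a code pointer paired with an environment pointer. */
struct closure {
    int (*fn)(struct env *, const int *, const int *);
    struct env *env;
};

/* The inner function reads the captured variable from the heap. */
int comp(struct env *e, const int *x, const int *y) {
    return x[e->field] - y[e->field];
}

/* The environment lives on the heap, so the closure remains valid
   after by_field returns and its stack frame is gone. */
struct closure by_field(int field) {
    struct env *e = malloc(sizeof *e);
    e->field = field;
    return (struct closure){ comp, e };
}

A caller invokes the result as c.fn(c.env, x, y); this extra indirection through the heap is exactly the cost the rest of the article tries to avoid.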



2. Continuation Passing



When compiling functional programming languages, where capturing local variables by a returned function is a very common technique, the first step is to translate the entire program into “continuation-passing style” (CPS): function returns are replaced with an explicit callback passed to each function as an additional argument. For example, in this simple function, which computes the product of all primes from 1 to n inclusive:



def prodprimes(n):  # isprime(n) -> bool is assumed defined elsewhere
    if n == 1:
        return 1
    elif isprime(n):
        return n * prodprimes(n-1)
    else:
        return prodprimes(n-1)






the two recursive calls to prodprimes implicitly pass it two different return addresses. If these addresses are named j and h, and the return address is passed as an explicit argument c, then the entire transfer of control can be made explicit:



def prodprimes(n, c):
    if n == 1:
        c(1)
    else:
        def k(b):
            if b:
                def j(p):
                    c(n * p)
                prodprimes(n-1, j)
            else:
                def h(q):
                    c(q)
                prodprimes(n-1, h)
        isprime(n, k)






In CPS there is no difference between calling a function and returning a value from a function: both are abstracted as "pass a value to a given address". This technique has been used by most Scheme compilers since the very first one (1975); an entire book, "Compiling with Continuations" (1992), is devoted to it, and the above example is taken from there. More recently, a similar style of programming has become popular under the name "promises".



The reason why CPS was popular as compilers' internal intermediate representation before SSA (invented in 1988) took over is its simplicity: it is not another language with its own rules, but a sublanguage of the source language, with the restriction that a call to a function or continuation is allowed only as the last statement of a function or continuation. It is easy to translate code into CPS by a set of formal transformations, and it is easy to apply simplifying transformations to CPS code: for example, to discover that the continuation h is trivial and to replace its uses with c. An important feature of CPS for our purposes is that closures are used in the same scope in which they are declared, and can therefore access variables captured from outside in the same way as on the 80186, through pointers to the outer stack frames:



def by_field(name, c):

    def comp(x, y, c):
        c(x[name] - y[name])

    c(comp)

def by_year(comp):
    bubble(my_records, comp, halt)

by_field("year", by_year)






After conversion to CPS, comp knows that it is a function at nesting depth 2 and that the value name lies in the frame at depth 1, so compiling the access to name causes no difficulty.



But CPS has a drawback: since continuations never execute a return, their stack frames are never destroyed, and the stack overflows very quickly. On the other hand, a continuation may still need a given stack frame, either if it refers to it itself, or if it receives as a parameter another continuation that refers to that frame. In addition, the jump to the next continuation must be the last action inside a continuation, so that the address and parameters of the next continuation can be treated as the result of the current one. (The "promises" model returns this result explicitly.) Classic Scheme compilers therefore used a dispatcher running the following infinite loop:



  1. Execute the current continuation and get the address and parameters of the next continuation as its result;
  2. Check which stack frames can still be reached by the next continuation and by the continuations passed to it as parameters;
  3. Destroy the unreachable frames and go back to step 1 (a C sketch of such a loop follows below).
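In C, such a dispatcher is usually implemented as a trampoline: instead of calling the next continuation, each continuation returns its address and arguments to a driver loop. A minimal sketch with invented types; the frame scanning of step 2 is left out.

/* What every continuation hands back to the dispatcher. */
struct bounce {
    struct bounce (*fn)(void *env, void *arg);  /* next continuation   */
    void *env;                                  /* its captured frame  */
    void *arg;                                  /* the value passed on */
    int done;                                   /* set by the final continuation */
};

void drive(struct bounce b) {
    while (!b.done) {
        /* Step 2 goes here: find the frames still reachable from
           b.env and b.arg, and destroy the rest. */
        b = b.fn(b.env, b.arg);  /* an ordinary call that returns,
                                    so the C stack never grows */
    }
}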


With this implementation the system call stack is not used at all, and transfers between the dispatcher and the continuations are ordinary jmp instructions. Continuation frames are decoupled from the call stack and are destroyed in arbitrary order instead of LIFO, so the area where they live can be regarded as a heap just as much as a stack.



As you might guess, with a bounce through the dispatcher on every transfer between continuations, the performance of such programs left a lot to be desired. One possible optimization is to check the current stack size before leaving a continuation, and if it does not exceed a given threshold, jump to the next continuation directly; otherwise, transfer control to the dispatcher so that it collects all the garbage from the stack. MIT graduate Henry Baker commented on this approach: "instead of bouncing on the trampoline all the time, we jump off the Empire State Building, but much less often."



3. On the MTA



In 1948, the Boston subway operator (Metropolitan Transit Authority) raised the fare from 10 cents to 15. Rather than refit all the fare gates at station entrances, conductors were instructed to collect a five-cent surcharge upon exit from the train. Making fun of this scheme, a candidate for mayor of Boston commissioned a song about a certain Charlie who lacked the nickel to pay for his exit and was doomed to ride the subway forever. The candidate earned a reputation as a communist, won only 1.2% of the vote, and left politics for good; but the song about the passenger who never returns to the surface turned out to be far more popular: even the Boston transit card introduced in 2006 is called the CharlieCard in honor of that very passenger.



Charlie's story inspired Henry Baker to describe, in 1994, a Scheme compiler that turns every continuation into a C function whose execution never reaches a return: for example,



(define (revappend old new)
  (if (null? old)
      new
      (revappend (cdr old) (cons (car old) new))))






turns into



void revappend(clos_type *cont, cons_type *old, cons_type *new) {
  if (old == NIL) {
    /* Call continuation with new as result. */
    return (cont->fn)(cont->env, new);
  }
  cons_type newer; /* Should check stack here. */
  /* Code for (revappend (cdr old) (cons (car old) new)). */
  newer.tag = cons_tag; newer.car = old->car; newer.cdr = new;
  return revappend(cont, old->cdr, &newer);
}






The point of the return operator after such a conversion is to tell the C compiler that there is no need to save registers before calling the continuation: a function called as the operand of return will never actually return. At the place marked /* Should check stack here. */, a stack size check is inserted, and, if necessary, the dispatcher is called to collect the garbage.
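The check itself can be as cheap as comparing the address of a local variable with a stack base recorded at startup. A minimal sketch, assuming the stack grows downward (as on x86) and that GC() evacuates the live arguments and never returns; the threshold is arbitrary.

static char *stack_base;          /* set in main(): char base; stack_base = &base; */
#define STACK_SLACK (64 * 1024)   /* arbitrary threshold */

/* True once the continuation frames have piled up past the threshold. */
int stack_exhausted(void) {
    char probe;                   /* its address marks the current stack depth */
    return stack_base - &probe > STACK_SLACK;
}

/* In revappend, the marked comment then becomes:
       if (stack_exhausted())
           GC(cont, old, new);    // never returns
*/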



This approach has several advantages over the classic one:



  1. Much of the implementation of Scheme comes for free from the C compiler: functions with a variable number of arguments map onto C varargs; frames of variable size map onto VLAs. C compilers exist for virtually every platform of the 21st century, including LLVM-based ones, which optimize the generated code.
  2. Compatibility with the C ABI, something classic Scheme implementations lack. "Foreign" stack frames, which know nothing about CPS, can lie on the stack in between "our" frames without disturbing anything. A "foreign" callback can invoke "our" closure, and conversely "our" code can call a "foreign" function. Scheme higher-order functions such as map and fold can be applied directly to functions written in C.


On the other hand, C does not support nested functions, which means that all pointers to variables captured from outside must be passed along with the closure explicitly. In addition, placing continuation frames on the system stack instead of a hand-rolled one raises a difficulty: how do you implement garbage collection for the system stack without tying yourself to the internals of a specific C compiler on a specific architecture?
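For example, the clos_type used in revappend above could plausibly be laid out like this; a guess at the representation, since real compilers generate a specialized environment structure per closure.

/* The generic closure header, as used via cont->fn and cont->env. */
typedef struct clos {
    void (*fn)(void *env, void *result);  /* the continuation's code */
    void *env;                            /* its captured variables  */
} clos_type;

/* A specialized environment for a continuation that captured
   n and c from the enclosing prodprimes: */
struct j_env {
    int n;
    struct clos *c;
};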



4. Cheney's garbage collector



The very first and simplest garbage collector was implemented in 1969 for LISP: when one half of the heap filled up, the program was stopped, and all "live" data was moved to the second half of the heap recursively, by a depth-first traversal. In 1970, Chris Cheney, also in Cambridge, but on the other side of the ocean from MIT, noticed that by traversing the live data breadth-first, the collector needs no additional memory of its own during the collection; since then, garbage collection that stops the program and moves live objects into the second half of the heap has been called "Cheney's algorithm". Its main drawback is that live data can occupy only half of the available memory, while the other half serves as the copying buffer.
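Cheney's trick is that the queue for the breadth-first traversal is the destination half-space itself: a scan pointer chases the allocation pointer, so the collector needs no memory of its own. A minimal sketch for a heap of two-pointer cells, with an invented representation; a real collector would tag the forwarding pointer rather than spend a whole field on the flag.

typedef struct cell {
    struct cell *head, *tail;  /* both fields are pointers or NULL */
    int forwarded;             /* set once the cell has been copied */
} cell;

static cell *to_space, *alloc, *scan;

/* Copy one cell into to-space, or follow its forwarding pointer. */
static cell *evacuate(cell *p) {
    if (p == NULL) return NULL;
    if (p->forwarded) return p->head;  /* head holds the new address */
    cell *copy = alloc++;              /* bump allocation in to-space */
    *copy = *p;
    p->forwarded = 1;
    p->head = copy;                    /* leave a forwarding pointer */
    return copy;
}

void cheney(cell **roots, int nroots) {
    scan = alloc = to_space;
    for (int i = 0; i < nroots; i++)
        roots[i] = evacuate(roots[i]);
    while (scan < alloc) {                   /* cells between scan and alloc */
        scan->head = evacuate(scan->head);   /* are copied but not yet scanned */
        scan->tail = evacuate(scan->tail);
        scan++;
    }
}

When scan catches up with alloc, every reachable cell has been copied; no recursion and no mark stack are needed.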



The efficiency of garbage collection can be improved by observing that the data in a typical program divides into "very short-lived" and "very long-lived": if an object has survived one garbage collection, it is very likely to survive the next one too. The heap is therefore divided into a "nursery" for newly created objects and an "adult heap" of two halves. When the nursery fills up, the live data in it is moved into the adult heap; when one half of the adult heap overflows, the live data in it is carried over to the other half. This saves both memory (no copy buffer is reserved for the short-lived data) and collection time (long-lived data is not copied back and forth at every collection of the nursery).



When using “Cheney on the MTA”, the stack acts as the nursery. The dispatcher called on stack overflow explicitly receives pointers to all the live data as its parameters: these are all the parameters and local variables of the calling continuation, including the captured variables passed to the continuation as an implicit parameter. Unlike traditional garbage collectors, “Cheney on the MTA” does not need to search for live data inside the stack: CPS guarantees that all data below the last stack frame is dead unless it is reachable from the last frame. This means the garbage collector does not care about the layout of the stack frames, or about "foreign" frames possibly left on the stack by functions written in other languages.
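Concretely, the collector's entry point can therefore take the live roots as ordinary C parameters. A sketch for revappend, where the copy_* helpers stand in for Cheney-style evacuation into the adult heap and everything else is invented:

#include <setjmp.h>

typedef struct clos clos_type;         /* as sketched above */
typedef struct cons cons_type;
clos_type *copy_closure(clos_type *);  /* hypothetical: evacuate into the heap */
cons_type *copy_cons(cons_type *);

static jmp_buf trampoline;             /* set up in main() around the driver loop */
static clos_type *saved_cont;          /* the roots handed back to the driver */
static cons_type *saved_old, *saved_new;

/* The live data is exactly the arguments, so evacuate them
   and then discard the entire C stack in one jump. */
void GC(clos_type *cont, cons_type *old, cons_type *new_) {
    saved_cont = copy_closure(cont);
    saved_old  = copy_cons(old);
    saved_new  = copy_cons(new_);
    longjmp(trampoline, 1);            /* back to the dispatcher */
}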







The “Cheney on the MTA” approach is used by the Scheme-to-C compilers Chicken Scheme (2000) and Cyclone Scheme (2016). Cyclone adds to Baker's long-standing idea support for concurrent garbage collection: during a collection, only the thread whose stack is being processed stops, while the rest keep running.





