- backpressure is cool
- backpressure is only available in libraries that implement the reactive streams specification
- this specification is so complex that you shouldn't even try to implement it yourself
In this article I will try to show that:
- backpressure is very simple
- to implement an asynchronous backpressure, it is enough to make an asynchronous version of the semaphore
- if there is an asynchronous semaphore implementation, the org.reactivestreams.Publisher interface is implemented in a few dozen lines of code
Backpressure is a feedback that adjusts the data producer's speed to match the consumer's speed. In the absence of such a connection, the faster producer can overflow the consumer's buffer or, if the buffer is dimensionless, exhaust all the RAM.
In multithreaded programming, this problem was solved by Dijkstroy, who proposed a new synchronization mechanism - the semaphore. A semaphore can be thought of as a permission counter. It is assumed that the producer requests permission from the semaphore before committing a resource-intensive action. If the semaphore is empty, then the producer thread is blocked.
Asynchronous programs cannot block threads, so they cannot access an empty semaphore for permission (but they can do all other semaphore operations). They must block their execution in another way. This other way is that they simply leave the worker thread they were running on, but before that they arrange to return to work as soon as the semaphore is full.
The most elegant way to pause and resume an asynchronous program is to structure it as a dataflow actor with ports :
A dataflow model - actors with ports, the directed connections between their ports, and initial tokens. Taken from: A Structured Description Of Dataflow Actors And Its Application
There are input and output ports. The input ports receive tokens (messages and signals) from the output ports of other actors. If the input port contains tokens, and the output port has a place for placing tokens, then it is considered active. If all ports of the actor are active, it is sent for execution. Thus, when resuming its work, the actor program can safely read tokens from input ports and write to the weekend. This simple mechanism contains all the wisdom of asynchronous programming. Allocating ports as separate actor subobjects greatly simplifies coding of asynchronous programs and allows you to increase their diversity by combining ports of different types.
The classic Hewitt actor contains 2 ports - one is visible, with a buffer for incoming messages, the other is a hidden binary that blocks when the actor is sent for execution and thus prevents the actor from restarting until the end of the initial launch. The desired asynchronous semaphore is a cross between these two ports. Like the buffer for messages, it can store many tokens, and like a hidden port, these tokens are black, that is, indistinguishable, like in Petri nets, and a token counter is enough to store them.
At the first level of the hierarchy, we have a class
AbstractActor
with three nested classes - base Port
and derivatives AsyncSemaPort
and InPort
, as well as with a mechanism for launching an actor for execution in the absence of blocked ports. In short, it looks like this:
public abstract class AbstractActor {
/** */
private int blocked = 0;
protected synchronized void restart() {
controlPort.unBlock();
}
private synchronized void incBlockCount() {
blocked++;
}
private synchronized void decBlockCount() {
blocked--;
if (blocked == 0) {
controlPort.block();
excecutor.execute(this::run);
}
}
protected abstract void turn() throws Throwable;
/** */
private void run() {
try {
turn();
restart();
} catch (Throwable throwable) {
whenError(throwable);
}
}
}
It contains a minimal set of port classes:
Port
- base class of all ports
protected class Port {
private boolean isBlocked = true;
public Port() {
incBlockCount();
}
protected synchronized void block() {
if (isBlocked) {
return;
}
isBlocked = true;
incBlockCount();
}
protected synchronized void unBlock() {
if (!isBlocked) {
return;
}
isBlocked = false;
decBlockCount();
}
}
Asynchronous semaphore:
public class AsyncSemaPort extends Port {
private long permissions = 0;
public synchronized void release(long n) {
permissions += n;
if (permissions > 0) {
unBlock();
}
}
public synchronized void aquire(long delta) {
permissions -= delta;
if (permissions <= 0) {
//
// ,
//
block();
}
}
}
InPort
- minimum buffer for one incoming message:
public class InPort<T> extends Port implements OutMessagePort<T> {
private T item;
@Override
public void onNext(T item) {
this.item = item;
unBlock();
}
public synchronized T poll() {
T res = item;
item = null;
return res;
}
}
The full version of the class
AbstractActor
can be viewed here.
At the next level of the hierarchy, we have three abstract actors with specific ports, but with undefined processing routines:
- a class
AbstractProducer
is an actor with one port of the asynchronous semaphore type (and an internal control port, present in all actors by default). - the class
AbstractTransformer
is a regular Hewitt actor, with a reference to the input port of the next actor in the chain, where it sends the converted tokens. - the class
AbstractConsumer
is also an ordinary actor, but it does not send the converted tokens anywhere, while it has a link to the producer semaphore, and opens this semaphore after absorbing the input token. In this way, the number of tokens in process is kept constant and no buffer overflow occurs.
At the last level, already in the test directory, specific actors used in tests are defined :
- the class
ProducerActor
generates a finite stream of integers. - the class
TransformerActor
takes the next number from the stream and sends it down the chain. - class
ConsumerActor
- accepts and prints the resulting numbers
Now we can build a chain of asynchronous, parallel working handlers as follows: producer - any number of transformers - consumer
Thus, we have implemented a backpressure, and even in a more general form than in the reactive streams specification - the feedback can span an arbitrary number of processing cascades, and not only adjacent ones, as in the specification.
To implement the specification, you need to define an output port that is sensitive to the number of permissions passed to it using the request () method - this will be
Publisher
, and supplement the existing one with a InPort
call to this method - this will be Subscriber
. That is, we assume that interfaces Publisher
andSubscriber
describe the behavior of ports, not actors. But judging by the fact that in the list of interfaces there is also Processor
, which can in no way be a port interface, the authors of the specification consider their interfaces to be actor interfaces. Well, we can make actors that implement all these interfaces by delegating the execution of interface functions to the corresponding ports.
For simplicity, let ours
Publisher
do not have its own buffer and will write directly to the buffer Subscriber
. To do this, you need someone to Subscriber
subscribe and fulfill request()
, that is, we have 2 conditions and, accordingly, we need 2 ports - InPort<Subscriber>
and AsyncSemaPort
. None of them are suitable as a base for implementationPublisher
'a, since it contains unnecessary methods, so we will make these ports internal variables:
public class ReactiveOutPort<T> implements Publisher<T>, Subscription, OutMessagePort<T> {
protected AbstractActor.InPort<Subscriber<? super T>> subscriber;
protected AbstractActor.AsyncSemaPort sema;
public ReactiveOutPort(AbstractActor actor) {
subscriber = actor.new InPort<>();
sema = actor.new AsyncSemaPort();
}
}
This time, we
ReactiveOutPort
did not define the class as nested, so it needed a constructor parameter - a reference to the enclosing actor to instantiate the ports defined as nested classes.
The method
subscribe(Subscriber subscriber)
boils down to saving the subscriber and calling subscriber.onSubscribe()
:
public synchronized void subscribe(Subscriber<? super T> subscriber) {
if (subscriber == null) {
throw new NullPointerException();
}
if (this.subscriber.isFull()) {
subscriber.onError(new IllegalStateException());
return;
}
this.subscriber.onNext(subscriber);
subscriber.onSubscribe(this);
}
which usually results in a call
Publisher.request()
that boils down to raising the semaphore with a call AsyncSemaPort.release()
:
public synchronized void request(long n) {
if (subscriber.isEmpty()) {
return; // this spec requirement
}
if (n <= 0) {
subscriber.current().onError(new IllegalArgumentException());
return;
}
sema.release(n);
}
And now it remains for us not to forget to lower the semaphore using a call
AsyncSemaPort.aquire()
at the time of resource use:
public synchronized void onNext(T item) {
Subscriber<? super T> subscriber = this.subscriber.current();
if (subscriber == null) {
throw new IllegalStateException();
}
sema.aquire();
subscriber.onNext(item);
}
The AsyncSemaphore project was specially designed for this article. It is intentionally made as compact as possible so as not to tire the reader. As a result, it contains significant limitations:
-
Publisher
'Subscriber
' -
Subscriber
' 1
In addition,
AsyncSemaPort
it is not a complete analogue of a synchronous semaphore - only one client can perform the operation aquire()
y AsyncSemaPort
(meaning the enclosing actor). But this is not a disadvantage - AsyncSemaPort
it fulfills its role well. In principle, you can do it differently - take java.util.concurrent.Semaphore
and supplement it with an asynchronous subscription interface (see AsyncSemaphore.java from the DF4J project ). Such a semaphore can bind actors and threads of execution in any order.
In general, each type of synchronous (blocking) interaction has its own asynchronous (non-blocking) counterpart. So, in the same DF4J project there is an implementation
BlockingQueue
, supplemented by an asynchronous interface. This opens up the possibility of a step-by-step transformation of a multithreaded program into an asynchronous one, partly replacing threads with actors.