I was wrong. CRDTs are the future

A few weeks ago, I watched Martin Kleppmann's talk on his approach to real-time editing with CRDTs, and I felt despair. His approach is so good that it surpasses all my work from the past decade, and that work will have no place in the future.



But let's start over.



In 2010, I worked on Google Wave, where we tried to build collaboratively editable spaces to replace email, Google Docs, forums, instant messaging, and a hundred other single-purpose applications. What I miss most is the idea of a general-purpose medium; nowhere else has that functionality come together the way it did in Wave. Unlike most tools, a general-purpose medium doesn't impose its own workflow, so you can use it to plan holidays, build wikis, play board games with friends, schedule work meetings, and much more.



Internally, Wave's co-editing was built on Operational Transform (OT), which was far from new technology even then: our algorithm was based on the 1995 Jupiter paper. For each document, the algorithm keeps a chronological log of changes: "type H at position 0", "type i at position 1", and so on. Most of the time users edit the latest version of the document, so the log is a simple sequence of changes; but with co-authoring we have to deal with concurrent edits.



In that case, the first edit to reach the server is recorded as usual. Any later edit that turns out to be based on an outdated version is compared against the event log to infer what the user originally intended. (Most of the time this just means updating character positions.) The algorithm then applies a new, transformed operation, saying in effect "this is the result the user wanted", like a git rebase running in real time.



When Google Wave was shut down, I ported the OT model to ShareJS. At the time node felt new and strange, and if I remember correctly, I started ShareJS before npm was even released. A simple collaborative editor took only about a thousand lines of code, and in the demo I co-edited a document from a browser and a native app at the same time.



At its core, OT is a glorified for() loop with a few helper functions for updating character offsets. In practice, OT is simple, easy to understand, quick to get running, and it works well (10,000-100,000 operations per second in unoptimized JavaScript, 1-20 million per second in optimized C). The event log can consume more memory than you'd like, but it can be trimmed if needed, at the cost of no longer being able to merge in especially old edits. Globally ordering operations requires a centralized server, but most systems already have a server or a database anyway, don't they?
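To make that concrete, here is a minimal sketch of the idea in TypeScript. The types and names are hypothetical, not ShareJS's actual API, and a real transform function also needs careful tie-breaking for edits at the exact same position:

```typescript
// A toy OT transform: adjust `op` so it still does what its author intended
// after `other` (a concurrent operation) has already been applied.
type Op =
  | { kind: 'insert'; pos: number; text: string }
  | { kind: 'delete'; pos: number; len: number };

function transform(op: Op, other: Op): Op {
  // How much the concurrent op shifted later character positions.
  const shift = other.kind === 'insert' ? other.text.length : -other.len;
  if (other.pos <= op.pos) {
    // Clamp so a delete spanning our position doesn't move us before it.
    return { ...op, pos: Math.max(other.pos, op.pos + shift) };
  }
  return op; // The concurrent op was entirely after us; nothing to do.
}

// The glorified for() loop: rebase an outdated op over everything
// in the log that its author hadn't seen yet.
function rebase(op: Op, missed: Op[]): Op {
  for (const other of missed) op = transform(op, other);
  return op;
}
```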



Centralized servers



A significant problem with OT is its dependence on a centralized server. Have you ever shared a Google Docs document through social media and hit a strange message like "This document is overloaded and editing is disabled"? The reason (I believe) is this: when you open a document, one specific server is chosen to process all of its edits, and when a crowd of users descends on the document, the system has to try very hard not to overwhelm that server.



There are several ways to work around this problem: besides subdocument sharding (as in Google Docs), you can submit edits through a retry loop around database transactions, so the serialization problem is shouldered by the database itself. (This is how Firepad and ShareDB work.)
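The retry loop itself is short. Here is a hypothetical sketch (the `db.tryAppend` interface is invented for illustration; Firepad and ShareDB differ in the details), reusing the `Op` and `transform` from the sketch above:

```typescript
// Hypothetical storage interface: atomically append an op at `version`,
// failing if other operations were committed there first.
type AppendResult =
  | { ok: true }
  | { ok: false; missedOps: Op[]; currentVersion: number };

declare const db: {
  tryAppend(docId: string, version: number, op: Op): Promise<AppendResult>;
};

// Submit an op optimistically; on conflict, transform past the operations
// we missed and retry. The database transaction enforces the global order.
async function submitOp(docId: string, op: Op, version: number): Promise<number> {
  for (;;) {
    const res = await db.tryAppend(docId, version, op);
    if (res.ok) return version + 1;
    for (const missed of res.missedOps) op = transform(op, missed);
    version = res.currentVersion;
  }
}
```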



However, OT is not perfect. We wanted Wave to replace email, and email is federated: a single thread can stretch across many companies, and somehow it all works. Also, unlike Facebook messages, you can loop a company into an email just by putting it in the CC field. If Wave was going to replace email, it needed the same kind of functionality, including passing messages without touching the external network, for example when I email a colleague at the next desk. But how do you build all of that on top of OT? We managed to stand up such a process, but it turned out too complex and full of bugs: we designed a scheme in which each wave set up a tree of wave servers to pass operations in both directions, but it never fully worked. A little under ten years ago, at the Wave Protocol Summit, I gave a live demonstration of how to create and configure such a network, and despite all my preparation and all the dry runs, following my own steps exactly on stage failed, and the network never came up. I still don't know why. Whatever the bugs were, I doubt they were ever fixed in the public version; it was all just too difficult.



The rise of CRDTs



As I already mentioned, the core Wave algorithm dates back to 1995, and I'm not sure I even had the internet at home back then. Since then, researchers have worked tirelessly to improve on OT, and the most promising line of work is CRDTs (Conflict-free Replicated Data Types). CRDTs approach the problem somewhat differently and allow real-time editing without the need for a central server. Martin's talk describes this work better than I could, so I'll skip the details.
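Still, for flavor, here is the core trick behind these sequence CRDTs as a toy TypeScript sketch: every character gets a globally unique ID, and each insert names the ID it comes after, so replicas can apply concurrent edits in any order and converge without a server assigning positions. This is a drastically simplified take on the RGA family; automerge and Y.js are enormously more subtle and faster:

```typescript
// Toy RGA-flavoured sequence CRDT (illustrative only). Characters are
// identified by (agent, seq) pairs instead of by index, so IDs never shift.
type Id = { agent: string; seq: number };
type Item = { id: Id; char: string; deleted: boolean };

const sameId = (a: Id, b: Id) => a.agent === b.agent && a.seq === b.seq;
const compareIds = (a: Id, b: Id) =>
  a.agent === b.agent ? a.seq - b.seq : a.agent < b.agent ? -1 : 1;

class ToyDoc {
  private items: Item[] = [];

  // Insert `char` after the character `afterId` (null = head of document).
  // Concurrent inserts at the same spot are ordered by ID so that every
  // replica converges (a simplified tiebreak; real RGA is more careful).
  insert(id: Id, char: string, afterId: Id | null): void {
    let i = afterId === null
      ? 0
      : this.items.findIndex(it => sameId(it.id, afterId)) + 1;
    while (i < this.items.length && compareIds(this.items[i].id, id) > 0) i++;
    this.items.splice(i, 0, { id, char, deleted: false });
  }

  delete(id: Id): void {
    const it = this.items.find(x => sameId(x.id, id));
    if (it) it.deleted = true; // tombstone; never physically removed
  }

  text(): string {
    return this.items.filter(it => !it.deleted).map(it => it.char).join('');
  }
}
```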



For years, people have been asking my opinion on CRDTs, and my answer has always gone something like this:



They're neat and I'm glad people are working on them, but:


  • They're slow. Really slow. One academic paper, for example, benchmarks a Delta-CRDT taking an impractically long time to process an editing trace of 100k operations. (Note: since fixed — see B4.)
  • And they're big: because of how CRDTs work, they keep metadata about every edit ever made, and the same kind of 100k-operation trace took the automerge master branch 83 seconds to process, with memory use, file size, and load time ballooning to match. (Note: fixed in automerge 1.1.)
  • For years, CRDTs have lacked functionality that OT has. For example, no one has yet built a CRDT with support for object move (transferring something from one part of a JSON tree to another). Operations like that are required by applications like Workflowy, and OT handles them well.
  • CRDTs are complex in and of themselves and hard to reason about.
  • You most likely already have a centralized server / database.


For all my criticism, I had stopped paying attention to CRDTs, and with them the relevant literature, so to my surprise I missed how quietly and steadily CRDTs have been improving. In his talk (which is more than worth your time), Martin addresses the main points:



  • Speed: modern CRDTs (Automerge/RGA and Y.js/YATA) have efficient implementations, with O(log n) operations where older designs did linear scans. (More on this below.)
  • Size: clever encodings cut the metadata overhead down to a few bytes per edit, so in both automerge and Y.js a document with 100k edits, full history included, stays compact on disk.
  • Complexity: the modern algorithms are simpler to understand and to implement than their predecessors.
  • Features: the functionality gap with OT is closing, and CRDTs can now do things OT cannot; automerge, for example, keeps gaining capabilities.


The speed argument didn't convince me, so to test the idea I independently implemented and benchmarked a CRDT in Rust, using a B-tree and ideas from automerge. It lacked functionality (deleting characters, conflicts), but it could process 6 million edits per second. (Each iteration performed 2000 edits to a blank document by two alternating users, which took 330 microseconds in total, or 6.06 million edits per second.) So CRDTs really have improved, and the speed difference between them and OT is now smaller than the difference between Rust and JavaScript.
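For reference, the shape of that benchmark, written here against the toy document sketched earlier (my actual test used a Rust B-tree, not this naive array, which is where the speed comes from):

```typescript
// Each iteration: two alternating users type 2000 characters into a
// blank document, always inserting after the previous character.
function benchIteration(): ToyDoc {
  const doc = new ToyDoc();
  let last: Id | null = null;
  for (let i = 0; i < 2000; i++) {
    const id: Id = { agent: i % 2 === 0 ? 'alice' : 'bob', seq: i };
    doc.insert(id, 'x', last);
    last = id;
  }
  return doc; // the Rust version ran this loop in ~330 microseconds
}
```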



All of these improvements have sat in the "coming soon" performance branch of automerge for a long time, but automerge isn't the only CRDT. Y.js holds its own and easily beats the current version of automerge in its benchmarks. It lacks some functionality I'm interested in, but in general it's certainly easier to fix an existing implementation than to invent a new algorithm.



Inventing the future



I care a lot about making progress. What would it be strange not to have a hundred years from now? Obviously we'll have real-time editing, but I'm no longer sure it will be built on OT and all the work I've done in that direction, and that can't help but sadden me.



JSON and REST are ubiquitous these days. Suppose that in 15 years real-time co-editing is just as ubiquitous: what would be its equivalent of JSON, the thing you can easily drop into any project? In that glorious future we'll need a high-quality CRDT implementation, because for some applications OT simply won't do: you can't build a real-time version of Git on it, or even a simple variant of Google Wave. But if we already have a good CRDT implementation, do we also need an OT implementation? I think not: porting all of OT's functionality to CRDTs (including, by the way, operation pruning) won't be difficult, while the reverse is not true. Smart people disagree with me, but in my view, once we have a good, fast CRDT for every language, the need for OT will disappear completely.



One advantage of OT is that it's easy to implement in centralized systems, which is what most modern applications are; but distributed algorithms run just as well on centralized systems. (Look at git and GitHub, for example.) In my opinion, a high-quality CRDT compiled to wasm would be faster than an OT implementation in JS. And if you only care about centralized systems, remember: OT's restrictions are what gave Google its scaling problems in Google Docs.



So it seems to me that now is the time to switch to a small, fast CRDT. The academic work is mostly done; what remains is to build solid implementations.



What's next



I'm less and less interested in the world of centralized applications. Applications interact with my data, on my devices, and it's high time they treated those connections accordingly. I want my laptop and my phone to exchange data over wifi, not by uploading my data to servers in another country, especially when those servers are funded by advertising giants competing for my attention.



Philosophically, when I edit a document in Google Docs, my computer is asking Google for permission to edit the file (because if the server says no for any reason, I lose my edits). By contrast, when I git push to GitHub, I'm merely informing GitHub of edits to my code. My repository still belongs to me, as does every bit of data and hardware it lives on, and that's how my applications should work. Thanks to people like Martin, we now know how to make good CRDTs. But before local-first applications become the norm, a lot more lines of code will have to be written.



All in all, it's time to say goodbye to operational transformation. We had a great time together; of everything I've written, the OT code was some of the most difficult and interesting. You are smart and amazing, OT, but CRDTs can do things you never will, and I need that. I think with a few good implementations we can achieve something really special.



I mourn all the work I've put into OT over the years, but OT no longer fits my vision of the future. CRDTs will let us rebuild Wave simpler and faster, and build apps that treat users as digital citizens rather than digital peasants. And that matters.



It's time to create.


