This database is on fire ...





Let me tell you a technical story.



Many years ago I was developing an application with collaboration features built into it. It was a handy experimental stack that took advantage of the full potential of early React and CouchDB. It synchronized data over JSON OT in real time. It was used internally at the company, but its wide applicability and potential in other areas were evident.



While trying to sell this technology to potential customers, we hit an unexpected obstacle. In the demo video, our technology looked and worked great, no problem. The video showed exactly how it worked, and nothing in it was faked. We had come up with a realistic usage scenario for the program and coded it.



Two users interacting through a mobile app


In fact, that became the problem. Our demo worked exactly the way everyone else faked theirs. More specifically, information was transferred instantly from A to B, even large media files. After logging in, each user saw new entries. Different users could collaborate on exactly the same projects, even if the Internet connection dropped out somewhere in a village. All of this is implicitly assumed in any product video cut together in After Effects.



Even though everyone knew what the Refresh button was for, nobody really understood that the web applications they were asking us to build were usually bound by its limitations. And that once it is no longer needed, the user experience becomes completely different. Mostly they noticed that you could "chat" by leaving notes for other people, so they wondered how this differed from, say, Slack. Oof!



Everyday sync design



If you already have experience in software development, it must be grating to remember that most people cannot just look at a picture of an interface and understand what it will do when they interact with it. Not to mention what is going on inside the program itself. Knowing what can happen is largely the result of knowing what cannot and should not happen. That requires a mental model not only of what the software does, but also of how its individual parts are coordinated and communicate with each other.



A classic example of this is a user staring at spinner.gif for twenty minutes, wondering when the work will finally finish. A developer would understand that the process has probably hung and that the gif will never disappear from the screen. The animation simulates work being done, but it is not tied to the state of that work. In cases like this, some techies like to roll their eyes, marveling at how confused users can be. But notice: who among them would point at a spinning clock and say that it is actually standing still?



An animated activity spinner


This is the essence of the value of real time. Real-time databases are still used very little these days, and many people are suspicious of them. Most of these databases lean heavily towards the NoSQL style, which is why they tend to be Mongo-based solutions that are best forgotten. For me, though, it meant getting comfortable with CouchDB and learning to design structures that are more than forms for some bureaucrat to fill in. I think I spent my time better.



But the real topic of this post is what I am using today. Not by choice, but because of indifferently and blindly applied corporate policy. So I will give you a Perfectly Honest and Unbiased comparison of two closely related Google real-time database products.





Both have the word Fire in their names. One I remember fondly. The other is, for me, a different kind of fire. I am in no hurry to say their names, because as soon as I do, we run into the first big problem: the names.



The first is called Firebase Real-Time Database and the second is Firebase Cloud Firestore. Both are products from Google's Firebase suite. Their APIs are named, respectively, firebase.database(…) and firebase.firestore(…).



This is because Real-Time Database is simply the original Firebase from before Google bought it in 2014. Google then decided to build a copy of Firebase on top of the company's big data infrastructure, and called it Firestore with a Cloud. I hope you are not confused yet. If you are, don't worry: I rewrote this part of the article ten times myself.



Because you have to put Firebase in quotes in a Firebase question, and Firestore in quotes in a Firestore question, at least if you want to be understood by answers written a few years ago on Stack Overflow.



If there were an award for the worst naming of software products, this case would definitely be one of the contenders. The Hamming distance between these names is so small that it confuses even experienced engineers, whose fingers type one name while their head is thinking of the other. It is the kind of well-intentioned plan that fails miserably, fulfilling the prophecy that the database would be on fire. And I am not kidding. Whoever came up with this naming scheme has caused blood, sweat and tears.





Pyrrhic victory



You might think that Firestore is a replacement for Firebase, its next-generation descendant, but that would be a misconception. Firestore is guaranteed not to be a replacement for Firebase. It looks as if someone cut everything interesting out of it and muddled most of the rest in various ways.



However, a quick look at the two products can be confusing: they seem to do the same thing, through mostly the same APIs, and even in the same database session. The differences are subtle and only come to light through a careful comparative study of the extensive documentation. Or when you try to port code that works perfectly on Firebase over to Firestore. Even then, you only find out that your database interface bursts into flames as soon as you try to drag and drop in real time. Again, I am not joking.



The Firebase client is polite in the sense that it buffers changes and automatically retries updates, giving priority to the last write. Firestore, however, has a limit of 1 write operation per document per user per second, and this limit is enforced by the server. When working with it, you have to find your own way around the limit and implement an update rate limiter, even when you are just trying to build your application. That is, Firestore is a real-time database without a real-time client, which masquerades as one through its API.
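
To make the workaround concrete, here is a minimal sketch of such a limiter. It is an illustration under stated assumptions, not the Firebase client's behavior or any SDK API: the class name `ThrottledWriter`, the `send` callback (standing in for whatever call actually writes the document) and the one-second default are all hypothetical. It buffers field updates, lets the last write win, and sends at most one write per interval.

```python
import time
from typing import Any, Callable, Dict, Optional


class ThrottledWriter:
    """Coalesce rapid updates to one document into at most one write per interval.

    Hypothetical helper, not part of any Firebase SDK. `send` stands in for
    whatever function actually writes the document.
    """

    def __init__(
        self,
        send: Callable[[Dict[str, Any]], None],
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send
        self._min_interval = min_interval
        self._clock = clock
        self._pending: Dict[str, Any] = {}
        self._last_sent: Optional[float] = None

    def update(self, fields: Dict[str, Any]) -> None:
        """Merge new field values into the pending write (last write wins)."""
        self._pending.update(fields)
        self.flush()

    def flush(self) -> bool:
        """Send the pending fields if the interval has elapsed. Returns True if sent."""
        if not self._pending:
            return False
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self._min_interval:
            return False  # too soon: keep buffering
        payload, self._pending = self._pending, {}
        self._last_sent = now
        self._send(payload)
        return True
```

In a real application something would also have to call `flush()` from a timer, so that the final position of a drag is not left sitting in the buffer.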



Here we begin to see the first hints of why Firestore exists. I may be wrong, but I suspect that someone high up in Google's leadership looked at Firebase after the acquisition and simply said, "No, my God, no. This is unacceptable. Not on my watch."





He emerged from his chambers and proclaimed,



“One big JSON document? No. You will split the data into separate documents, each of which will be no more than 1 megabyte in size.”



This looks like a limit that will not survive first contact with any reasonably motivated user base. You know it's true. At work, for example, we have presentations with over fifteen hundred slides, and this is Perfectly Normal.



With this limitation, you will have to accept that a single “document” in the database will not resemble anything a user would call a document.



“Arrays of arrays that can recursively contain other elements? No. Arrays will only contain fixed-length objects or numbers, as the Lord intended.”



So if you were hoping to put GeoJSON in your Firestore, you will find that this is not possible. Nothing non-uniform is allowed. I hope you love Base64 and/or JSON inside JSON.
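
For illustration, the "JSON inside JSON" workaround might look like the sketch below. It assumes the only rule being dodged is that an array may not directly contain another array. The helper names and the `"$json"` wrapper key are made up for this example and are not part of any Firestore API.

```python
import json
from typing import Any

WRAPPER_KEY = "$json"  # hypothetical marker for a string-encoded array


def pack_nested_arrays(value: Any) -> Any:
    """Replace every array that directly contains an array with a JSON string wrapper."""
    if isinstance(value, dict):
        return {k: pack_nested_arrays(v) for k, v in value.items()}
    if isinstance(value, list):
        if any(isinstance(item, list) for item in value):
            # The whole nested array is stored as one opaque string.
            return {WRAPPER_KEY: json.dumps(value)}
        return [pack_nested_arrays(item) for item in value]
    return value


def unpack_nested_arrays(value: Any) -> Any:
    """Reverse pack_nested_arrays after reading the data back."""
    if isinstance(value, dict):
        if set(value) == {WRAPPER_KEY}:
            return json.loads(value[WRAPPER_KEY])
        return {k: unpack_nested_arrays(v) for k, v in value.items()}
    if isinstance(value, list):
        return [unpack_nested_arrays(item) for item in value]
    return value


polygon = {
    "type": "Polygon",
    "coordinates": [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]],
}
packed = pack_nested_arrays(polygon)
restored = unpack_nested_arrays(packed)
```

The price is plain to see: once the coordinates are a string, the database can no longer index or query anything inside them.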



“JSON import and export over HTTP, command line tools or the admin panel? No. You will only be able to export and import data to Google Cloud Storage. That seems to be what it is called now. And when I say "you", I mean only those who have Project Owner permissions. Everyone else can go and file tickets.”



As you can see, the Firebase data model is easy to describe. It consists of one huge JSON document that maps JSON keys to URL paths. If you HTTP PUT the following to / in Firebase:



{
  "hello": "world"
}


Then GET /hello will return "world". This basically works exactly as you would expect. A collection of Firebase objects at /my-collection/:id is equivalent to a JSON dictionary {"my-collection": {...}} at the root, the contents of which are available at /my-collection:



{
  "id1": {...object},
  "id2": {...object},
  "id3": {...object},
  // ...
}


This works fine as long as every insert gets a collision-free ID, and the system has a standard solution for that.
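
The mapping from URL paths to JSON keys described above can be sketched in a few lines. This is a toy in-memory model for illustration only, not Firebase's implementation; `get_path` and `put_path` are hypothetical names.

```python
from typing import Any, Dict, List


def _segments(path: str) -> List[str]:
    """Split "/a/b" into ["a", "b"]; "/" yields an empty list."""
    return [part for part in path.split("/") if part]


def get_path(tree: Any, path: str) -> Any:
    """Return the value stored at `path`, or None if nothing is there."""
    node = tree
    for key in _segments(path):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def put_path(tree: Dict[str, Any], path: str, value: Any) -> Any:
    """Store `value` at `path`, creating intermediate objects. Returns the root."""
    keys = _segments(path)
    if not keys:
        return value  # a PUT to "/" replaces the whole document
    node = tree
    for key in keys[:-1]:
        child = node.get(key)
        if not isinstance(child, dict):
            child = {}
            node[key] = child
        node = child
    node[keys[-1]] = value
    return tree


root = put_path({}, "/", {"hello": "world"})
root = put_path(root, "/my-collection/id1", {"name": "a"})
root = put_path(root, "/my-collection/id2", {"name": "b"})
```

Reading /my-collection from this tree gives back the dictionary of objects keyed by ID, which is the whole idea of a collection in this model.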



In other words, the database is 100% JSON (*) compliant and works great over HTTP, like CouchDB. But mostly you use it through a real-time API that abstracts away websockets, authorization, and subscriptions. The admin panel offers both, allowing real-time editing as well as JSON import/export. If you take the same approach in your own code, you will be surprised how much custom code becomes unnecessary once you realize that JSON patch and diff solve 90% of routine persistent-state tasks.
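
As a rough illustration of the patch-and-diff idea, here is a minimal sketch for JSON objects. It uses a made-up change format (a dictionary from key paths to new values, with a `REMOVED` sentinel for deletions) rather than a standard such as RFC 6902 JSON Patch, and it treats arrays as opaque values.

```python
import copy
from typing import Any, Dict, Tuple

Path = Tuple[str, ...]
REMOVED = object()  # sentinel marking a deleted key (hypothetical convention)


def diff(old: Any, new: Any, prefix: Path = ()) -> Dict[Path, Any]:
    """Return the changes that turn `old` into `new`, keyed by path."""
    if isinstance(old, dict) and isinstance(new, dict):
        changes: Dict[Path, Any] = {}
        for key in old.keys() | new.keys():
            if key not in new:
                changes[prefix + (key,)] = REMOVED
            elif key not in old:
                changes[prefix + (key,)] = new[key]
            else:
                changes.update(diff(old[key], new[key], prefix + (key,)))
        return changes
    return {} if old == new else {prefix: new}


def patch(doc: Any, changes: Dict[Path, Any]) -> Any:
    """Apply `changes` to a copy of `doc` and return the copy."""
    result = copy.deepcopy(doc)
    for path, value in changes.items():
        if not path:  # the root itself was replaced
            result = copy.deepcopy(value)
            continue
        node = result
        for key in path[:-1]:
            node = node[key]
        if value is REMOVED:
            del node[path[-1]]
        else:
            node[path[-1]] = copy.deepcopy(value)
    return result


old = {"title": "Draft", "slides": {"s1": {"text": "Hi"}, "s2": {"text": "Bye"}}}
new = {"title": "Final", "slides": {"s1": {"text": "Hi"}, "s3": {"text": "New"}}}
changes = diff(old, new)
```

Here `changes` has three entries: the new title, the removal of `s2` and the addition of `s3`. Saving, syncing and undo can all be built on a change set like that.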



The Firestore data model is similar to JSON, but differs from it in several critical respects. I have already mentioned the lack of arrays inside arrays. Sub-collections are modelled as first-class concepts, separate from the JSON document that contains them. Since there is no out-of-the-box serialization for this, a specialized code path is required to read and write data. To process your own collections, you need to write your own scripts and tools. The admin panel only lets you make small changes one field at a time, and has no import/export capabilities.



They took a real-time NoSQL database and turned it into a slow, auto-joining non-SQL with a separate non-JSON column. Something in the spirit of GraftQL.





Hot Java



If Firestore was meant to be more reliable and scalable, the irony is that the average developer ends up with a less reliable solution than by choosing Firebase out of the box. The kind of software that the grumpy Database Administrator wants requires a level of effort and a caliber of specialist that is simply unrealistic for the niche in which the product is supposed to be good. This is similar to how HTML5 Canvas is not at all a replacement for Flash if there are no development tools and no player. Moreover, Firestore is bogged down in a quest for data cleanliness and sterile validation, which simply does not match the way the average business user likes to work: for them everything is optional, because everything is a draft until the very end.



The main disadvantage of Firebase is that its client was created several years ahead of its time, before most web developers knew about immutability. Because of this, Firebase assumes that you will be mutating data, and therefore does not take advantage of user-provided immutability. In addition, it does not reuse data between the snapshots it sends to the user, which makes diffing much harder. For large documents, its mutable, diff-based transaction mechanism is simply inadequate. Folks, we already have WeakMap in JavaScript. It's convenient.
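
To show why reusing data between snapshots makes diffing cheaper, here is a minimal sketch of immutable updates with structural sharing. It is an illustration of the general technique, not of how any Firebase client works. The names `set_in` and `changed_paths` are hypothetical, and the trees are assumed to be nested dictionaries that are never mutated in place.

```python
from typing import Any, Dict, List, Tuple

Path = Tuple[str, ...]


def set_in(tree: Dict[str, Any], path: Path, value: Any) -> Any:
    """Return a new tree with `value` at `path`.

    Only the dictionaries along `path` are copied; every other branch is
    shared with the old tree.
    """
    if not path:
        return value
    head, rest = path[0], path[1:]
    updated = dict(tree)  # shallow copy of this level only
    updated[head] = set_in(tree.get(head, {}), rest, value)
    return updated


def changed_paths(old: Any, new: Any, prefix: Path = ()) -> List[Path]:
    """List the paths that differ, skipping any subtree shared by both trees."""
    if old is new:
        return []  # same object: nothing below this point can have changed
    if isinstance(old, dict) and isinstance(new, dict):
        paths: List[Path] = []
        for key in sorted(old.keys() | new.keys()):
            if key not in old or key not in new:
                paths.append(prefix + (key,))
            else:
                paths.extend(changed_paths(old[key], new[key], prefix + (key,)))
        return paths
    return [] if old == new else [prefix]


v1 = {
    "slides": {"s1": {"text": "Hi"}, "s2": {"text": "Bye"}},
    "meta": {"owner": "ann"},
}
v2 = set_in(v1, ("slides", "s1", "text"), "Hello")
```

Because `v2["meta"]` and `v2["slides"]["s2"]` are the very same objects as in `v1`, the comparison never descends into them, so the cost of a diff follows the size of the change rather than the size of the document.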



By shaping the data as needed and not letting the trees get too bulky, this problem can be worked around. But I wonder whether Firebase would have been far more interesting if its developers had shipped a really good client API that uses immutability, combined with some solid, practical advice on database design. Instead, they seem to have tried to fix what was not broken, and that made it worse.



I don't know all the reasoning behind the creation of Firestore. Speculating about the motives inside the black box is part of the fun. Such a juxtaposition of two extremely similar but incomparable databases is quite rare. It is as if someone thought, "Firebase is just a feature that we can emulate in Google Cloud," but had not yet discovered the concept of identifying real-world requirements or building useful solutions that satisfy all of them. "Let the developers worry about it. Just make the UI pretty... Can you add more fire?"



I know a thing or two about data structures. I can clearly see that the concept of "everything in one big JSON tree" is an attempt to abstract away any sense of large-scale structure from the database. Expecting software to simply cope with any questionable fractal of data structures is crazy. I don't even need to imagine how bad things can get: I have conducted rigorous code audits and seen things you people never dreamed of. But I also know what good structures look like, how to use them and why it should be done. I can imagine a world in which Firestore would seem logical and the people who built it would feel they had done a good job. But we do not live in that world.



Query support in Firebase is bad by any standard; it practically does not exist. It definitely needs improvement or at least revision. But Firestore is not much better, as it is limited to the same one-dimensional indexes found in plain SQL. If you want the kind of queries people run on messy data, you need full-text search, filters on multiple ranges, and arbitrary user-defined ordering. On closer inspection, the features of plain SQL are too limited on their own. Also, the only SQL queries people can run in production are fast queries. You will need a specialized indexing solution with sophisticated data structures. For everything else, there should at least be an incremental map-reduce or something similar.



If you search Google's docs for information about this, you will hopefully be pointed towards something like BigTable and BigQuery. However, all of these solutions come wrapped in such a volume of thick corporate sales jargon that you will quickly turn back and start looking for something else.



The last thing you need in a real-time database is something made by, and for, people on a management pay scale.



(*) This is a joke; there is no such thing as 100% JSON compatibility.





