The second article will cover three more systems, and the third will be devoted to conclusions.
Note: we will focus solely on the underlying technology; things like enterprise feature sets will be largely ignored.
TileDB
TileDB is a database built around multidimensional arrays. It simplifies working with data types that don't fit neatly into existing relational database management systems (RDBMSs), such as dense and sparse arrays and data frames. TileDB is specifically tailored to use cases like genomics and geospatial data.
Notable features
- Switchable data backends, with support for Amazon S3, MinIO, and more.
- Storage integrations for HDFS, PrestoDB, Apache Spark, Dask, and more.
- Language bindings for C/C++, Python, R, Java, and Go.
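To make the dense/sparse distinction concrete, here is a minimal pure-Python sketch (plain data structures, not TileDB's actual API): a dense array materializes every cell of its domain, while a sparse array stores only the non-empty cells as coordinate/value pairs (the classic "COO" layout).

```python
# A minimal sketch (plain Python, not TileDB's API) of the two storage
# models TileDB is built around: dense vs. sparse multidimensional arrays.

# Dense layout: every cell of the 3x4 domain is physically stored.
dense = [[0.0 for _ in range(4)] for _ in range(3)]
dense[1][2] = 42.0

# Sparse layout: only non-empty cells are stored, as
# (coordinates -> value) pairs -- the coordinate ("COO") representation.
sparse = {(1, 2): 42.0, (2, 3): 7.5}

def cells_stored(dense_arr, sparse_arr):
    """Compare how many cells each layout physically stores."""
    return sum(len(row) for row in dense_arr), len(sparse_arr)

print(cells_stored(dense, sparse))  # -> (12, 2)
```

The gap widens dramatically at real-world scale: a genomics matrix with billions of mostly-empty cells is hopeless to store densely, which is why a dedicated sparse-array engine is appealing.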
What we especially liked
We like such "highly specialized" databases, tailored to a specific set of data types and tasks. Traditional RDBMSs are, of course, valuable precisely for their versatility, covering an extremely wide range of use cases. But sometimes you hit edge cases where (a) the capabilities of general-purpose systems are no longer enough, and (b) the task is critical for your business.
We expect more systems like this to emerge as database use cases become more specialized and new problem domains appear. The good old RDBMS isn't going anywhere, of course, but we'd like to see how TileDB and similar systems develop and push the boundaries of what's possible. We hope to see truly "hackable", non-monolithic databases with plugin interfaces, so that you can work with data types highly specific to particular use cases. But more on that later.
Questions to the project
- What other domains could benefit from a database like this? TileDB has found traction in areas like genomics and geospatial data, but where else might multidimensional arrays shine? It would be interesting to find out.
- Could TileDB serve as a storage engine for other systems? One can imagine, say, a "TileDB backend" for existing databases. Are there plans in that direction?
Materialize
Materialize is marketed as "the first true streaming SQL database," and that may be no exaggeration. At its core, it is a relational database that is "wire-compatible" with PostgreSQL, but with one important difference: it offers materialized views that are updated in real time.
Materialize offers its own definition of a "streaming database," which seems apt.
In standard Postgres, for example, you have to manually update materialized views:
CREATE MATERIALIZED VIEW my_view AS SELECT /* ... */;
REFRESH MATERIALIZED VIEW my_view;
/* The underlying tables change */
REFRESH MATERIALIZED VIEW my_view;
/* More stuff happens */
REFRESH MATERIALIZED VIEW my_view;
You can run this at any frequency, for example from a script or a scheduled job. What we hadn't seen before (though we had long wanted to) is a database with native support for incrementally updated materialized views. That is exactly what Materialize does: it tracks changes in the data sources you specify and updates your views as those sources change.
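The difference between REFRESH-style recomputation and incremental maintenance can be sketched in a few lines of Python (a toy model, not Materialize's implementation): instead of rescanning the base table on every refresh, an incrementally maintained view folds each individual change into the current result.

```python
# Toy model (not Materialize's implementation) of incremental view
# maintenance: a "view" keeping a per-key row count that is updated as
# each change arrives, instead of rescanning the base table.

from collections import Counter

base_table = []      # the underlying rows
view = Counter()     # materialized view: count of rows per key

def insert(key):
    """Apply one change and fold it into the view incrementally."""
    base_table.append(key)
    view[key] += 1   # O(1) update; no full recomputation needed

def refresh_from_scratch():
    """What REFRESH MATERIALIZED VIEW effectively does: a full rescan."""
    return Counter(base_table)

for k in ["clicks", "clicks", "views"]:
    insert(k)

# Both approaches yield the same answer; only the cost differs.
assert view == refresh_from_scratch()
print(dict(view))  # -> {'clicks': 2, 'views': 1}
```

The rescan costs O(table size) on every refresh, while the incremental path pays only for the changes that actually happened, which is what makes real-time views feasible.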
Even if Materialize itself doesn't win out or stay on the market for long, the capabilities it offers should persist: they will almost certainly be replicated in other databases.
Notable features
- Sources for materialized views can be other tables (as in standard Postgres), but also JSON and CSV files, as well as Kafka and Kinesis streams, and more.
- Materialize is built on two relatively recent frameworks: "timely dataflow" and "differential dataflow". They do the hard work of incremental computation over changing inputs, and Materialize builds on top of them, so in the best sense Materialize is "standing on the shoulders of giants" rather than reinventing everything from scratch.
- Materialize is "wire-compatible" with Postgres, so you can interact with it using psql and other standard Postgres tools.
What we especially liked
Materialize has the potential to replace a lot. The most obvious win is that it lets you retire whatever processes you currently use to periodically refresh your materialized views. But that's not even the biggest win.
Much more important for us is that Materialize lets you decommission the parts of your data stack dedicated to tracking changes in data sources. This can be done natively:
CREATE SOURCE click_events
FROM KAFKA BROKER 'localhost:9092' TOPIC 'click_events'
FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY 'http://localhost:8081';
Your database now "knows" about a data source it can use to build automatically updated materialized views. This native "pipelining" seems even more magical to us than the automatic view updates. If you currently run, say, serverless functions, Heron jobs, or Flink pipelines that do nothing but track changes and issue INSERT statements, Materialize may let you strip that part of the stack out entirely.
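The kind of component that becomes unnecessary looks roughly like this (a hypothetical sketch: sqlite3 stands in for the target database, and a plain list stands in for a Kafka topic like `click_events`): a worker whose entire job is to watch a stream and issue INSERTs.

```python
# Hypothetical sketch of the stack component Materialize can obviate:
# a worker that does nothing but consume events and issue INSERTs.
# sqlite3 stands in for the target database; the list stands in for a
# Kafka topic such as 'click_events'.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE click_events (user_id TEXT, url TEXT)")

def tracking_worker(stream):
    """Consume events and INSERT them -- this worker's entire purpose."""
    for event in stream:
        conn.execute(
            "INSERT INTO click_events (user_id, url) VALUES (?, ?)",
            (event["user_id"], event["url"]),
        )
    conn.commit()

fake_topic = [
    {"user_id": "u1", "url": "/home"},
    {"user_id": "u2", "url": "/pricing"},
]
tracking_worker(fake_topic)

count = conn.execute("SELECT COUNT(*) FROM click_events").fetchone()[0]
print(count)  # -> 2
```

With a native CREATE SOURCE, the database subscribes to the stream itself, so there is no separate worker to deploy, monitor, or keep in sync.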
Questions to the project
- Why a separate database and not a Postgres extension? We're sure there are good architectural reasons why an extension wouldn't work here, but we'd like to know what they are.
- How easy is it to build integrations for other data sources? Could one write, for example, a connector for Apache Pulsar? Are there plans to open up APIs to developers for this purpose?
Prisma
Prisma is less a database than a set of tools for abstracting your database away as much as possible. It currently supports PostgreSQL, MySQL, and SQLite on the database side, and JavaScript and TypeScript on the language side (with more databases and languages planned). It is billed as the "data layer for modern applications," which is broadly accurate.
The "golden path" for using Prisma looks something like this:
- Define your data types at the application level using Prisma's schema definition language (SDL).
- From that schema, generate highly idiomatic code for the language of your choice.
- Get busy building REST APIs, GraphQL APIs, and whatever else you want to build.
We understand why some people are skeptical of tools like Prisma. A large group of developers opposes database abstraction: they don't want clever DSLs and GUIs, they want to write plain SQL and craft all their database interaction code by hand. We understand the desire for that degree of control, but we'd still recommend spending 20-30 minutes trying Prisma. It seems to strike a happy medium between heavyweight abstraction and raw SQL.
Notable features
- Prisma SDL, a declarative schema language for defining your application's data types.
- Prisma Migrate, a tool for declarative data migrations (you describe the desired changes instead of writing SQL by hand): you declare state B, and Migrate figures out how to get there from state A.
- Prisma Studio, a visual interface for working with Prisma.
What we especially liked
Prisma's schema DSL lets you specify not only the data types your application needs, but also which code generator to use and the connection details for the database you're targeting.
In addition, it provides enumerated types (enums) and convenient annotations such as @id, @relation, and @default. The DSL is reminiscent of the database migration DSLs in ActiveRecord and Ecto, as well as IDLs like those used in Protocol Buffers and Thrift.
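For illustration, here is a small Prisma schema showing those annotations and an enum in context (the model and field names are our own invention, not from any real project):

```prisma
datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL")
}

generator client {
  provider = "prisma-client-js"
}

enum Role {
  USER
  ADMIN
}

model User {
  id    Int    @id @default(autoincrement())
  email String @unique
  role  Role   @default(USER)
  posts Post[]
}

model Post {
  id       Int    @id @default(autoincrement())
  title    String
  author   User   @relation(fields: [authorId], references: [id])
  authorId Int
}
```

From a schema like this, Prisma generates a typed client, and the @relation annotation is what lets that client traverse from a User to their Posts without hand-written joins.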
We'd love to see Prisma's SDL, or something like it, adapted as a language-independent standard (with language-specific libraries built on top). The status quo has every programming language reinventing the wheel. We think it would be useful to have a language-agnostic way of defining the types that govern the interaction between application and database.
By the way, special thanks to Prisma for its excellent documentation. We consider this an important differentiator: if your documentation doesn't make it into the "what we especially liked" section of a blog post, you should invest more resources in it.
Questions to the project
- Will Prisma also be useful in languages that already have widely used object-relational mappers (ORMs)? If you're already using ActiveRecord in Ruby or Ecto in Elixir, what's the incentive to switch?