“Well this is necessary,” she said loudly without addressing anyone. - Well this is necessary! So it is written directly - the main task of the company is to make profit in the interests of shareholders. Well, you think! They are not afraid of anything!
Julius Dubov, "The Lesser Evil"
Seeing such a headline, you probably have already decided that the article is either stupidity or a provocation. But do not rush to conclusions: employees of large corporations, especially corporations with state participation, quite often have to compare different platforms, including completely different ones, for example, like those in the title.
Of course, no one compares DBMS like that, because their strengths and weaknesses are well known. As a rule, platforms that solve an applied problem are subject to comparison. In the article, I will show the technique that is used at the same time, using the example of databases as a subject that is not by hearsay familiar to Habr's readers. So,
Motivation
When you start an educational project or a hobby project, the motivation for choosing a platform can be very diverse: “I know this platform best”, “I am interested in understanding this one”, “here is the best documentation” ... In the case of a commercial company, the selection criterion is one: how much will I have to pay and what will I get for this money.
Naturally, you want to pay less and get more. However, it is necessary to decide which is more important - pay less or receive more, and assign a weight to each node. Let's assume that a high-quality solution is more important to us than a cheap one, and we assign the “Cost” node a weight of 40%, and the “Opportunities” node - 60%.
In large corporations, the opposite is usually the case - the weight of the value does not fall below 50%, and maybe more than 60%. In the model example, it is only important that the total weight of the child nodes of any parent node must be 100%.
Cutoff conditions
Site db-engines.com known about 500 database management systems. Naturally, if you choose a target platform from so many options, you may end up with a review article, but not a commercial project. In order to reduce the choice space, cutoff criteria are formulated, and if the platform does not meet these criteria, then it is not considered.
Cutoff criteria can relate to technological features, for example:
- ACID guarantees;
- relational data model;
- SQL language support (note, this is not the same as "relational model");
- the possibility of horizontal scaling.
There may be general criteria:
- availability of commercial support in Russia;
- open source;
- availability of the platform in the Register of the Ministry of Telecom and Mass Communications;
- the presence of the platform in some rating (for example, in the first hundred of the db-engines.com rating);
- availability of experts on the market (for example, based on the results of searching for the platform name in the resume on the hh.ru website).
After all, there may be enterprise-specific criteria:
- availability of specialists in the staff;
- compatibility with the monitoring system X or with the backup system Y, on which all maintenance is tied ...
The most important thing is to have a list of cut-off criteria. Otherwise, there will definitely be some expert (or "expert") who enjoys the special trust of the management, who will say "why didn't you choose platform Z, I know it's the best."
Cost estimate
The cost of the solution obviously consists of the cost of licenses, the cost of maintenance and the cost of equipment.
If the systems are of approximately the same class (for example, Microsoft SQL Server and PostgreSQL), then for simplicity, we can assume that the amount of equipment for both solutions will be approximately the same. This will allow you not to evaluate the equipment, thereby saving a lot of time and effort. If you have to compare completely different systems (say, Oracle vs. Redis), then it is obvious that for a correct assessment it is necessary to do sizing (calculation of the amount of equipment). Sizing a non-existent system is a very thankless task, so they still try to avoid such a comparison. It is easy to do this: zero data loss and a relational model are written in clipping conditions, or vice versa - a load of 50 thousand transactions per second.
To evaluate licenses, it is enough to ask the vendor or its partners for the cost of a license for a fixed number of cores and support for a fixed period. As a rule, companies already have strong relationships with software vendors, and if the database operation department cannot answer the cost question on its own, then one letter is enough to receive this information.
Different vendors may have different licensing metrics: by the number of cores, the amount of data, or the number of nodes. The standby database can be free, or it can be licensed in the same way as the main one. If only some differences in metrics are found, you will have to describe in detail the model stand and calculate the cost of licenses for the stand.
An important point for a correct comparison is the same support conditions. For example, Oracle support costs 22% of the license price per year, and PostgreSQL support costs nothing. Is it correct to compare? No, because an error that cannot be eliminated on our own has completely different consequences: in the first case, support specialists will quickly help fix it, and in the second case, there is a risk of project delay or downtime of the finished system for an indefinite period.
There are three ways to equalize the calculation conditions:
- Use Oracle without support (in reality, this does not happen).
- Buy PostgreSQL support - for example, from Postgres Professional.
- Include risks associated with lack of support.
For example, the calculation of risks may look like this: in the event of a fatal database failure, the downtime of the system will be 1 business day. The planned profit from the use of the system is 40 billion Mongolian tugriks per year, the frequency of accidents is estimated at 1/400, thus, the risk of lack of support is estimated at about 100 million Mongolian tugriks per year. Obviously, “planned profit” and “estimated accident rate” are virtual quantities, but it is much better to have such a model than not to have none.
In reality, the system may be too important, and the reputation losses from prolonged downtime will be unacceptable, so support will be required. If downtime is allowed, then dropping support can sometimes be a good way to save money.
Suppose that after all the calculations, the cost of operating platform A for 5 years turned out to be 800 million Mongolian tugriks, the cost of operating platform B - 650 million tugriks, and the cost of operating platform C - 600 million tugriks. Platform C, as a winner, receives a full-weight point for the cost, and platforms A and B - slightly less, in proportion to how many times they are more expensive. In this case - 0.75 and 0.92 points, respectively.
Opportunity assessment
The assessment of opportunities is divided into many groups, the number of which is limited only by the imagination of the person making the assessment. The best option seems to be the division of capabilities into teams that will use these capabilities; in our example, these are developers, administrators, and information security officers. Let's assume that the weights of these functions are distributed as 40:40:20.
Development functions include:
- ease of data manipulation;
- scaling;
- the presence of secondary indexes.
The list of criteria, like their weights, are very subjective. Even when solving the same problem, these lists, item weights and answers will differ significantly depending on the composition of your team. For example, Facebook uses MySQL to store data, and Instagram is built on top of Cassandra. It is unlikely that the developers of these applications filled out such tables. One can only guess that Mark Zuckerberg chose a full-fledged relational model, paying for it with the need for applied sharding, while Kevin Systrom laid down scaling by means of the platform, sacrificing the convenience of data access.
Administration functions include:
- backup system capabilities;
- ease of monitoring;
- convenience of capacity management - disks and nodes;
- data replication capabilities.
Please note that the wording of the questions must be quantifiable. You can even agree on how to evaluate a particular function. Let's, for example, try to rate the backup tools using the example of tools supplied with the Oracle DBMS:
Tool | A comment | Rating |
---|---|---|
imp / exp | Upload and download data | 0.1 |
begin / end backup | Copying files | 0.3 |
RMAN | Incremental copy capability | 0.7 |
ZDLRA | Incremental copy only, fastest recovery to point | 1.0 |
If there are no clear assessment criteria, it makes sense to ask several experts to provide marks and then average them.
Finally, let's just list the information security features:
- availability of password management policies;
- the ability to connect external authentication tools (LDAP, Kerberos);
- role model of access;
- ;
- ;
- (TLS);
- .
Separately, I would like to warn against using the results of any load tests that were not done by you as arguments.
First, the data structure and load profile of the applications under test may differ significantly from the task that you are going to solve. About 10-15 years ago, database vendors loved to flaunt the TPC benchmark results, but now nobody seems to take these results seriously.
Secondly, the performance of the system depends quite strongly on the platform for which the code was originally written and on what hardware the test was carried out. I've seen many benchmarks compare Oracle to PostgreSQL. The results range from the unconditional superiority of one system to the equally unconditional superiority of another.
And finally, thirdly, you do not know anything about who conducted the test. Both qualifications, which affect the quality of OS and platform customization, are important, as well as motivation, which affects the test results more than all other factors combined.
If performance is a critical factor, conduct the test yourself, preferably with the help of specialists who will configure and maintain the production system.
Result
Finally, the result of all the work done should be a spreadsheet, where all the estimates are brought together, multiplied and summed up:
As you understand, by changing the weights and adjusting the estimates, you can achieve any desired result, but this is a completely different story ...