Another hardware refresh cycle has begun in our data center. In today's article, Oleg Fedorov, Linxdatacenter Product and Solutions Manager, explains why and how we did it and, most importantly, how exactly the Linxdatacenter cloud platform will improve after the upgrade, in specific numbers.
The time is now
Over the past year, customers have increasingly asked for high-frequency processors to deploy systems that are demanding on computation speed and CPU availability, a clear sign that the next hardware upgrade cycle is due. This is how our own Linxdatacenter cloud infrastructure upgrade project was born.
After lengthy calculations and negotiations between our technical team and the manufacturers, we decided to use the validated Cisco VersaStack design. This design combines Cisco Fabric Interconnect, UCS servers, and storage systems of the IBM FlashSystem family.
We decided to take the cloud platform's CPU performance to a new level: for this, we bought new high-frequency blade servers from the Cisco UCS B200 M5 series. This is the newest line, with latest-generation processors clocked at 3.4 GHz that did not reach the market until Q1 2020.
A solution with such "brains" at its core makes it possible to speed up products that demand performance. First of all, these are cloud platforms for 1C and SAP, ERP solutions, applications that process large data sets, and various solutions for software development and testing.
Another significant update is the new backup storage system. To meet the SLA for the BaaS service, we decided to install a Cisco S3260 in our data center: a large 4U machine that holds 54 disks. This step was taken to unify the platforms in St. Petersburg and Moscow. Also, unlike the old hardware, it will allow us to use Direct Storage Access technology.
Veeam software is installed on the Cisco S3260, with a view to offering a Direct Access model through integration with the new IBM storage system. With the latest versions of the storage system, snapshots are taken and backups are collected without a request to the virtualization servers, so we get rid of an extra command from VMware.
This scheme removes excess load from the servers and improves performance.
Traditional VM backup systems take a snapshot before the backup starts, then copy the VM data and delete the snapshot. For large, write-intensive VMs, the snapshot can grow dramatically during the backup, and when the snapshot is deleted, it is merged back into the main disk of the VM. At this point, the VM may be unavailable for a few seconds. Using snapshot technology at the storage level avoids such problems.
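The snapshot growth described above can be estimated with simple arithmetic. The sketch below is only an illustration: the function name, write rates, and backup duration are hypothetical values, not measurements from our platform, and it assumes that every write made during the backup lands in the snapshot delta.

```python
def snapshot_delta_gb(write_rate_mb_s: float, backup_minutes: float) -> float:
    """Rough upper bound on snapshot delta growth during a backup window.

    Assumption (hypothetical): every write made while the hypervisor-level
    snapshot is open is stored in the delta, with no overwrites.
    """
    seconds = backup_minutes * 60
    return write_rate_mb_s * seconds / 1024  # MB -> GB


# Hypothetical VMs: a quiet one and a write-intensive one, 90-minute backup.
quiet = snapshot_delta_gb(write_rate_mb_s=1, backup_minutes=90)
busy = snapshot_delta_gb(write_rate_mb_s=64, backup_minutes=90)
print(f"quiet VM: {quiet:.1f} GB, busy VM: {busy:.1f} GB")
```

The larger the delta, the more data has to be merged into the main disk when the snapshot is deleted, which is where the pause of a few seconds comes from.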
A little more detail
Let's take a look at how the Linxdatacenter cloud platform will improve after the upgrade.
The key point is that we are among the first trying to bring end-to-end NVMe technology to the market. It is distinguished by high IOPS and low latency (the delay before a request is executed), an order of magnitude lower than that of SSDs. However, this technology requires further infrastructure improvements and also affects the network, which we plan to update as well.
Let's move on to the CPU. Traditionally, it is the fastest-growing area in IT equipment. For example, as of the second quarter of 2019, a processor on the market with a clock frequency of 3.3 GHz offered only 8 cores.
Our new Intel Xeon Gold 6246R processors are clocked at 3.4 GHz and have 16 cores. In just over a year, the frequency has edged up and the number of available cores has doubled. In terms of virtualization, the upgrade will give more customers a higher-performing IT system.
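To put the two data points above side by side, here is a minimal sketch that multiplies cores by clock frequency. This "core-GHz" figure is a deliberately crude metric of our own choosing for illustration: it ignores instructions per clock, turbo modes, cache, and memory bandwidth, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    name: str
    cores: int
    base_ghz: float

    @property
    def aggregate_ghz(self) -> float:
        # Crude metric: cores x clock. Ignores IPC, turbo, cache, and memory.
        return self.cores * self.base_ghz


# Figures taken from the text above.
old = Cpu("3.3 GHz processor, Q2 2019", cores=8, base_ghz=3.3)
new = Cpu("Intel 6246R", cores=16, base_ghz=3.4)

ratio = new.aggregate_ghz / old.aggregate_ghz
print(f"{old.aggregate_ghz:.1f} -> {new.aggregate_ghz:.1f} core-GHz, x{ratio:.2f}")
```

By this rough measure, one socket offers about twice the high-frequency compute, which is what lets more customers share the same hardware without giving up clock speed.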
As for storage systems, they have always been the slowest-developing part of any information system. In fact, RAM, as an element of any IT system, consumer or professional, emerged as a way to work around low storage performance.
But today it is technologically possible to bring the speed of the storage system close to the speed of RAM, which will make it possible to execute transactions and retrieve their results from storage tens of times faster.
Suppose one operation, for example processing a request to a heavily loaded database, used to take 1 minute; on modern storage systems it will take only a couple of seconds.
And last but not least, the IBM FlashSystem theoretically allows maximum disk latency to be reduced to well under 1 millisecond: not even 0.1, but 0.01 milliseconds. We have now taken one more step towards these figures: they will become available in our cloud after the next stage of the upgrade.
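To show what these latency figures mean in practice, the sketch below converts latency into the number of requests a single client can complete per second when it issues them strictly one after another (queue depth 1). The function name is hypothetical, and real devices serve many requests in parallel, so this is a lower bound per outstanding request rather than a device rating.

```python
def iops_per_outstanding_request(latency_us: int) -> float:
    """At queue depth 1, one request completes per latency period."""
    return 1_000_000 / latency_us  # microseconds in one second / latency


# Latency values from the text, expressed in microseconds.
for label, latency_us in [("1 ms", 1000), ("0.1 ms", 100), ("0.01 ms", 10)]:
    iops = iops_per_outstanding_request(latency_us)
    print(f"{label:>7}: {iops:>9,.0f} IOPS")
```

Each tenfold drop in latency gives a tenfold increase in the work a single-threaded transaction stream can push through the storage system.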
An investment that pays off in business growth
To describe accurately what a hardware upgrade does for a cloud, the following analogy is apt.
Imagine that you are typing text in Word. You type a sentence, look up, and the program has managed to display only its first word on the screen. You thought the sentence over, formulated it, and typed it; it has already gone into the computer, but it has not yet appeared on the screen.
After a comprehensive infrastructure upgrade, this lag disappears and becomes impossible even in theory.
Of course, all these "pumped-up" elements must be properly assembled into a final solution that will provide a high level of economic efficiency and business benefit.
For our clients' businesses in Russia, the new capabilities will, first of all, significantly speed up 1C software.
If 1C is needed for only 10-15 users, it will work fine even “on a calculator”, that is, modest or standard IT resources will be enough. However, as soon as a business starts providing real-time services on top of 1C, or the company has sufficiently large-scale operations and many customizations, all of this “eats up” processor time and power.
Accordingly, the more customizations and the wider the scale of operations in 1C, the higher the requirements for CPU resources. This is how the 1C software architecture is built. And then the following happens: the higher the processor frequency, the fewer cores it can offer, and its price rises at the same time.
Therefore, if you are using mid-level or high-level business applications, you cannot do without high-performance processors at the heart of a modern IT solution.
From a business point of view, this means that an accountant who presses a button in 1C to get, say, an annual report will receive the result on a high-performance system not in 2 minutes, but instantly. Accordingly, budgets and totals can be summed up and the financial period closed across the whole company not three days later, as now, but right on the last day of the reporting period.
As for backup tasks, it should be understood that any snapshot "freezes" the virtual machine for at least a split second, and sometimes longer. Removing the snapshot afterwards can also freeze the VM for a couple of seconds. This is a standard effect.
Moving the procedure to the storage level with Direct Storage Access technology completely eliminates such delays, however small.
Suppose a company runs backup tasks on a hyper-converged platform using a distributed Ceph cluster (a storage system shared by multiple virtual machines). With this approach, any VM delays are unacceptable.
Or take this scenario: a transaction in a bank's database that lasts, say, 30 seconds coincides with the "freeze" of the VM involved in it while a snapshot is being created.
As a result, the client deposited money at the ATM, but the money was not credited to the account. The client is dissatisfied and shares a negative opinion of the bank by word of mouth. The result is reputational damage for the business.
First users
There are already companies in our data center that are interested in the capabilities of the updated platform and are testing it free of charge to see what practical results will be obtained.
So far, we are seeing interest from the financial services and construction segments, as well as from companies running business applications for which any downtime, however small, is critical. The less downtime, the higher the availability of the service, the lower the cost of maintaining the application, and the better the service that the end user receives.
Most likely, all participants in these tests will be interested in the updated platform, because the economic choice between buying a single high-performance server with no upgrade path and renting one by the month is virtually obvious, whether the focus is on short-term projects or on long-term business development based on advanced IT solutions.