Free and high quality: how a presale can set the tone for a project

A team usually takes pride in the implementation, while the most important work, designing the solution, is unfairly left in the shadows. In our project to modernize a bank's backup system, this hidden part of the iceberg turned out to be more labor-intensive than the implementation itself. And it was worth it: the in-depth study helped us find a balanced solution to a real business problem and justify the choice of a more expensive but better-suited product, the Dell EMC Data Domain 6800.





The bank's backup system has been running on Veritas NetBackup software for a long time. But the hardware it ran on could no longer cope with the load.



The first sign of trouble was that the backup window had become too short. The backup system could no longer copy data from servers and workstations overnight, so some jobs spilled over into working hours. As a result, some resources were not backed up every day, which created a risk of data loss and SLA violations in the event of a real failure.



Another problem was the lack of space on the disk library. To compensate, the customer cut the retention period for backups on the disk library from 14 days to 7. This put additional load on the tape library drives, which were already almost fully utilized.



The disk library was used for operational storage of backups and could run up to 25 backup job streams in parallel.



The tape library was used for long-term storage. The regulator requires various documents to be kept for 1 to 5 years. If the load kept growing, the customer would have trouble complying with Central Bank regulations. Not a very positive outlook.



Change? Repair? Expand? Upgrade?



When the backup system began to "choke" as it approached its performance limit, the customer asked: where is the system's bottleneck? Since we maintain the software part of the backup system, the bank's IT department asked us to analyze how the system was operating.



The solution at that time included the following components:



  • 1 x NetBackup Solaris x86 master / media server;
  • 1 x VMware Backup Media Server;
  • 45 x AIX Media Servers;
  • 10 x SPARC Solaris Media Servers;
  • 1 x Dell EMC Data Domain 4200 Disk Library in VTL mode;
  • 1 x Oracle SL3000 Tape Library with 8 LTO6 Drives.


To store operational backups, several backup streams from media servers and Enterprise clients were written to the disk library simultaneously over the FC protocol. The copies were then duplicated to tape in the Oracle SL3000 library through the NetBackup master / media server, also over FC.



The bank has 830 backup clients, including about 730 VMware virtual machines, Enterprise clients on AIX and Solaris, and physical x86 servers. A full copy of the protected data originally amounted to 115 TB.



To find the bottleneck, we looked at NetBackup job execution statistics, media server I/O configuration, SAN configuration, tape library drive utilization, and disk library performance. For this, the customer provided us with the following diagnostic reports:



  • nbsu (Veritas NetBackup Support Utility);
  • NetBackup DeployUtil and the software license specification;
  • Brocade SAN Health on the SAN configuration;
  • AutoSupport from the Data Domain disk library.


The nbsu (Veritas NetBackup Support Utility) report provides comprehensive information on the NetBackup configuration, including performance data for backup jobs. This data is contained in the output of bpdbjobs -most_columns. But you need to be able to parse it and to convert dates and times from epoch format to a human-readable form.
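The parsing and epoch conversion described above could be sketched roughly as follows. This is a minimal illustration, not the tooling we used: the field positions in FIELDS and the sample line are assumptions made for the example and should be checked against the command reference for your NetBackup version.

```python
import csv
from datetime import datetime, timezone

# ASSUMPTION: zero-based positions of fields in one comma-separated line of
# `bpdbjobs -most_columns` output. Verify them for your NetBackup version.
FIELDS = {"jobid": 0, "policy": 4, "client": 6,
          "started": 8, "elapsed": 9, "kbytes": 14}


def epoch_to_text(value):
    """Convert epoch seconds to a readable UTC timestamp."""
    moment = datetime.fromtimestamp(int(value), tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


def parse_jobs(lines):
    """Yield one dict per job line, skipping short or not-yet-started entries."""
    for row in csv.reader(lines):
        if len(row) <= max(FIELDS.values()) or not row[FIELDS["started"]]:
            continue
        elapsed = int(row[FIELDS["elapsed"]] or 0)
        kbytes = int(row[FIELDS["kbytes"]] or 0)
        yield {
            "jobid": row[FIELDS["jobid"]],
            "policy": row[FIELDS["policy"]],
            "client": row[FIELDS["client"]],
            "started": epoch_to_text(row[FIELDS["started"]]),
            "elapsed_s": elapsed,
            # Average throughput of the job in MB/s (0.0 if it has no duration).
            "mb_per_s": round(kbytes / 1024 / elapsed, 1) if elapsed else 0.0,
        }


# Hypothetical sample line (made up for illustration, not real bank data).
SAMPLE = ["1001,0,3,0,vmware_prod,full,vm-app-01,media01,"
          "1500000000,3600,1500003600,dd4200_vtl,1,,368640000,1200"]

jobs = list(parse_jobs(SAMPLE))
print(jobs[0])
```

Keeping the field positions in one dictionary makes it easy to adjust the sketch if the column layout in a given report differs from the one assumed here.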





The output of the bpdbjobs command helps you evaluate the throughput and duration of each job. This is how you can get a picture of tape library drive utilization over time:
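One way such a utilization picture might be derived from job start and end times is sketched below. It is only an illustration under a simplifying assumption: each tape job is taken to occupy exactly one drive for its whole duration. The function name and the sample intervals are hypothetical.

```python
from collections import defaultdict

HOUR = 3600  # seconds


def drive_utilization(jobs, drives):
    """Return {hour_start_epoch: share of total drive time busy in that hour}.

    jobs is an iterable of (start_epoch, end_epoch) pairs, one per tape job.
    ASSUMPTION: every job keeps exactly one drive busy from start to end.
    """
    busy_seconds = defaultdict(int)
    for start, end in jobs:
        t = start
        while t < end:
            bucket = t - t % HOUR                # start of the current hour
            step = min(end, bucket + HOUR) - t   # seconds spent in this hour
            busy_seconds[bucket] += step
            t += step
    return {hour: busy_seconds[hour] / (drives * HOUR)
            for hour in sorted(busy_seconds)}


# Made-up intervals (seconds from an arbitrary origin), two drives.
sample_jobs = [(0, 5400), (1800, 7200)]
utilization = drive_utilization(sample_jobs, drives=2)
print(utilization)
```

For the library described in this article, drives would be 8, and values close to 1.0 across most hours of the day would indicate drives that are almost fully utilized.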





The nbsu report also contains data on the media in use, retention periods, and their distribution across pools. Below is a summary of media retention periods based on NBU_available_media.txt from nbsu.





The NetBackup DeployUtil report estimates the actual consumption of backup software licenses under different licensing models, traditional and capacity-based. It is generated as an MS Excel file and contains a complete list of backup clients, the platforms of the backed-up servers, the NetBackup version in use, and the amount of data being backed up.



The Brocade SAN Health report describes the SAN topology and zoning configuration and shows the utilization of ISL links.



Data Domain AutoSupport "talks" about the disk library's configuration, storage efficiency, and performance. After parsing it, we identified the patterns and built a heatmap of the load:





It turned out that the "weak link" was the Dell EMC Data Domain 4200 disk library, which was operating in VTL mode.



We compared the actual parameters with the formal requirements for backup volume and frequency. It turned out that the current capacity and performance of the disk library could not keep operational copies for the required period. Moreover, precisely because of the limited read speed of the Dell EMC DD4200, duplication to tape was running close to its limit. The lower read performance of the DD is due to the resource-intensive process of rehydration: restoring the sequence of blocks to the form they had before deduplication.
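To make the idea of rehydration more concrete, here is a toy model of deduplication and the reverse process. It is a deliberately simplified sketch and not a description of how Data Domain works internally: fixed-size blocks, SHA-256 fingerprints, and an in-memory dictionary are all assumptions chosen for brevity.

```python
import hashlib


def deduplicate(data, block_size=4):
    """Split data into fixed-size blocks and store each unique block once."""
    store = {}    # fingerprint -> block content, one entry per unique block
    recipe = []   # ordered fingerprints that describe the original stream
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)
        recipe.append(fingerprint)
    return store, recipe


def rehydrate(store, recipe):
    """Rebuild the original stream: one lookup per block in the recipe."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


original = b"AAAABBBBAAAACCCCAAAA"
store, recipe = deduplicate(original)
restored = rehydrate(store, recipe)
print(len(recipe), "blocks in the stream,", len(store), "unique blocks stored")
```

Even in this toy version, reading back requires a lookup for every block of the stream. On a real system those lookups land on blocks scattered across the disks, which is why reads from a deduplicated store tend to cost more than sequential writes.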



Everything pointed to the need to replace the outdated disk library. The customer needed hardware that could support 5-6 TB of data per hour, with additional controllers for fault tolerance, and increased capacity.



Three candidates to choose from



The most obvious proposal was to replace the Dell EMC Data Domain with a newer model. The Veritas NetBackup Appliance could be an alternative (it is largely analogous to Data Domain and in the same price category). But both options raised budget concerns.



The third option was a solution based on standard server architecture using Veritas NetBackup's native deduplication, Media Server Deduplication Pool (MSDP).



When we came to the customer with a proposal, it turned out that the bank had already considered solutions based on both Veritas NetBackup Appliance and Dell EMC Data Domain from other suppliers, but was not sure how optimal they were in terms of price/performance. In other words, our option on standard servers came at the right time.



While the bank was testing configurations based on Veritas NetBackup Appliance, we advised the customer's IT team on the specifics of Veritas deduplication, the nuances of Fibre Transport technology for carrying backup traffic over the SAN, and the mechanisms for creating synthetic copies with NetBackup Accelerator, and proposed including verification of these technologies in the test program. Based on the test results, the customer approved our solution based on two standard x86 servers with block storage, since it implemented the entire stack of tested technologies.



We also prepared a proposal to replace the Dell EMC Data Domain 4200 with a newer library. For this proposal we chose the Dell EMC Data Domain 6800 HA, a more powerful, higher-capacity, and faster model. An advantage of this solution was the library's high availability in a dual-controller configuration: the disk library is no longer a single point of failure. If a controller is lost, the library remains available thanks to NPIV technology, and backup jobs continue automatically.



With a Data Domain-based solution, the customer did not need to replace the backup system's Enterprise client software with the SAN client, and the work needed to fit the library into the IT landscape was minimal. This was another plus for the Dell EMC Data Domain 6800 HA.



More power + DD BOOST



The Dell EMC Data Domain 6800 disk library supports dual-controller (High Availability) mode and can work not only over the VTL protocol but also with DD Boost. The new library has a usable capacity of 174 TB before deduplication and compression, while the Dell EMC DD4200 was limited to 130 TB. We also estimated the expected throughput of the disk library and showed the customer that it should be 5.3 to 8 TB per hour with simultaneous writes and reads, fully covering the bank's needs for backup and for transferring data to tape.
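As a rough order-of-magnitude check, the figures quoted in this article can be related to each other with simple arithmetic. The sketch below is only an illustration: the bank's actual schedule of full and incremental copies is not described here, so the time needed to move one complete 115 TB copy is a reference point rather than a real backup window.

```python
FULL_COPY_TB = 115                   # volume of a full copy, from the article
ESTIMATED_TB_PER_HOUR = (5.3, 8.0)   # estimated DD6800 throughput range
USABLE_TB_NEW, USABLE_TB_OLD = 174, 130  # usable capacity, DD6800 vs DD4200


def hours_to_move(volume_tb, tb_per_hour):
    """Time in hours to move a given volume at a sustained rate."""
    return volume_tb / tb_per_hour


hours = [round(hours_to_move(FULL_COPY_TB, rate), 1)
         for rate in ESTIMATED_TB_PER_HOUR]
capacity_ratio = round(USABLE_TB_NEW / USABLE_TB_OLD, 2)

print("Hours to move one full copy at the low/high estimate:", hours)
print("Usable capacity of the new library relative to the old:", capacity_ratio)
```

With these inputs the sketch gives roughly 21.7 and 14.4 hours for the low and high throughput estimates, and a usable-capacity ratio of about 1.34.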



Simultaneous support for DD Boost and VTL proved useful, since the two technologies could be combined if compatibility issues arose. The benefits of DD Boost are obvious:



  • ;
  • ( ) (image);
  • DD Boost , - NetBackup;
  • NetBackup ;
  • .


Since the bank's ecosystem is built on VMware virtualization, NetBackup Accelerator for VMware is another useful capability available with DD Boost. This technology tracks changed blocks using VMware CBT (Changed Block Tracking) and, relying on deduplication, creates a synthetic full backup in the course of an incremental one. At the same time, granular recovery of files and Microsoft applications (AD, SQL, Exchange, SharePoint) from virtual machine backups is preserved.
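The principle of building a synthetic full copy from an incremental pass can be illustrated with a toy model. This is a conceptual sketch only and does not reflect the internals of NetBackup Accelerator or VMware CBT: the dictionaries of block numbers and the function name are hypothetical, and a real system would reference already stored deduplicated segments rather than copy block contents.

```python
def synthesize_full(previous_full, changed_blocks):
    """Build a new full image from the last full copy plus changed blocks only.

    previous_full:  dict of block number -> content from the last full copy
    changed_blocks: dict of block number -> new content reported as changed
    """
    new_full = dict(previous_full)    # unchanged blocks are reused as they are
    new_full.update(changed_blocks)   # changed blocks replace the old versions
    return new_full


# Made-up disk image of four blocks; only blocks 2 and 3 changed since the
# last full copy, so only they have to be read from the virtual machine.
previous_full = {0: "boot", 1: "os", 2: "app-v1", 3: "data-v1"}
changed = {2: "app-v2", 3: "data-v2"}

new_full = synthesize_full(previous_full, changed)
print(new_full)
```

The result is a complete image, although only the changed blocks were transferred, which is what allows an incremental run to produce a copy that restores like a full one.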



More affordable, but not better



Our team calculated the cost of switching to each of the options for a new library. It turned out that rebuilding the backup system on standard servers would require more integration work. But the most unpleasant part was the additional risk for the business: replacing client backup software, reconfiguring policies and, as a result, possible downtime for the most critical servers (more than 50 AIX / Solaris servers).



As a result, the customer chose to migrate to the Dell EMC Data Domain 6800.



The Dell EMC Data Domain 6800 was the more expensive alternative. But using it reduced the overall cost of the modernization: there was no need to change the backup infrastructure, the risk of data loss and service unavailability was minimized, and the old library did not have to be abandoned. Adding another DD to the system more than doubled the storage capacity without changing anything in the already debugged processes. Keeping VTL support meant no additional configuration on NetBackup media servers and Enterprise clients. There was also no need to change the client backup software, and backup jobs were easily redistributed between the two disk libraries, the already installed DD4200 and the new DD6800. The SLP policies for transferring backups to tape also remained the same as before; the only difference is that data now comes from two disk libraries.



Transition to a new system



Below is a diagram of the target solution:





By the time the new disk library was delivered, the need for more backup capacity was so urgent that the bank was ready to start backing up production systems to it before all tests were completed. We managed to dissuade the customer from this step and performed all the checks in the test program, including destructive failover tests.



The implementation went quickly: two weeks later, the bank was running on the new disk library. The customer received a system with greater capacity and a sufficient performance margin for the next few years. Actual performance even exceeded the estimates: the DD6800 delivers 8-9 TB per hour (the estimate started at 5.3 TB per hour), and its capacity, taking deduplication and compression into account, is about 1 PB.



Because we simply expanded the disk storage capacity and did not change the architecture, the cost of NetBackup licenses for the bank stayed the same: nothing changed in terms of the volume of data backed up or the number of clients. The new library now works in parallel with the Dell EMC DD4200, but its capacity is sufficient to decommission the old library painlessly if required.



In terms of labor, the in-depth study at the start of the project "outweighed" the deployment of the new library. In effect, we completed a small consulting project, with calculations for the possible options, for 0 rubles. But as it turned out, it was not in vain: it gave the customer a justification for the modernization, minimized risks, and allowed an informed decision.



Author: Alexey Polyakov, Design Engineer of Data Storage Systems, Jet Infosystems


