What's New in NetEngine's High Performance Router Line

The time has come to share details about Huawei's new NetEngine 8000 carrier-class routers: the hardware and software that make it possible to build end-to-end connections with 400 Gbps throughput and to monitor the quality of network services at sub-second granularity.











What determines which technologies network solutions need



The requirements for the latest networking equipment are now driven by four pivotal trends:



  • the proliferation of 5G mobile broadband;
  • the growth of cloud workloads in both private and public data centers;
  • the expansion of the IoT world;
  • increasing demand for artificial intelligence.




During the pandemic, another general trend emerged: scenarios that replace physical presence with virtual presence wherever possible are becoming more attractive. These include, among others, virtual and augmented reality services and solutions based on Wi-Fi 6 networks. All of these applications require a high-quality channel, and the NetEngine 8000 is designed to provide it.







NetEngine 8000 family



The NetEngine 8000 family is divided into three main series. The X series comprises high-performance flagship models for telecom operators and heavily loaded data centers. The M series is designed for a variety of metro scenarios. The F series is intended primarily for common DCI (Data Center Interconnect) scenarios. Most NetEngine 8000 models can be part of end-to-end tunnels with a bandwidth of 400 Gbps while maintaining a guaranteed service level (Service Level Agreement, SLA).







Fact: today only Huawei manufactures a full range of equipment for building 400GE-class networks. The illustration above shows a scenario for building a network for a large enterprise customer or a large operator. In the latter case, high-performance NetEngine 9000 core routers are used, along with the new NetEngine 8000 F2A routers, which can aggregate a large number of 100, 200, or 400 Gbps connections.



Metro fabrics are built on M-series devices. Such solutions can absorb the tenfold growth in traffic volume expected over the next decade without a change of platform.







Huawei manufactures its own optical modules with a bandwidth of 400 Gbps. Solutions built on them are 10-15% cheaper than solutions of the same capacity built on 100-gigabit channels. Testing of the modules began in 2017, and the first deployment of equipment based on them followed in 2019; the African telecommunications operator Safaricom is currently running such a system commercially.







The huge bandwidth of the NetEngine 8000 may seem excessive in 2020, but it will certainly be needed in the not too distant future. In addition, the router is suitable for use as a large exchange point, which will be useful for second-tier operators, for fast-growing large enterprises, and for builders of e-government solutions.







Huawei also promotes a number of new technologies, including SRv6 (Segment Routing over IPv6), which greatly simplifies the delivery of carrier VPN traffic. FlexE (Flexible Ethernet) provides guaranteed bandwidth at Layer 2 of the OSI model, and iFIT (In-situ Flow Information Telemetry) makes it possible to track SLA compliance parameters accurately.







From a provider's point of view, SRv6 can be applied everywhere from the container layer in a data center built on NFV (Network Functions Virtualization) to, for example, a wireless broadband environment. Enterprise customers will need end-to-end use of the new protocol when building backbone networks. We emphasize that the technology is not proprietary and is used by different vendors, which eliminates the risk of incompatibility.







This is what the timeline for commercializing SRv6 in support of 5G solutions looks like. A practical case: during its transition to 5G, the Middle Eastern operator Zain Group modernized its network, increasing the capacity of its backbone links, and also improved the manageability of its infrastructure by introducing SRv6.







How to apply these technologies



Three dissimilar products used to serve as the “technology umbrella” covering the solutions above. U2000 was the NMS for the transport domain and the IP domain. For SDN, uTraffic and the much better known Agile Controller were used as well. This combination turned out to be inconvenient for carrier-grade routers, so these products have now been merged into a single tool, CloudSoP.







First of all, it lets you manage the entire infrastructure life cycle, starting with building the network, whether optical or IP. It is also responsible for managing resources, both standard (MPLS) and new (SRv6). Finally, CloudSoP makes it possible to manage all services at a fine level of granularity.







Let's take a closer look at the classic management approach. Here, services can be delivered using L3VPN or SR-TE, which gives additional options for creating tunnels. More than a hundred parameters, together with segment routing, are used to allocate resources to the various service tasks.







What does deploying such a service look like? First, you set the primary policy for a specific level (plane). In the diagram above, SRv6 has been selected and is used to configure traffic delivery from point A to point E. The system calculates the possible paths, taking bandwidth and delay into account, and also creates the parameters used for subsequent monitoring.
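The path calculation step can be illustrated with a minimal sketch. It is not Huawei's algorithm: it assumes a simple approach in which links that cannot carry the requested bandwidth are pruned and the lowest-delay path is then found with Dijkstra's algorithm. The topology, the intermediate node names, and all figures are invented for illustration.

```python
import heapq

def constrained_path(links, src, dst, min_bw_gbps):
    """Return (total_delay_ms, path) for the lowest-delay path whose links
    all offer at least min_bw_gbps, or None if no such path exists."""
    # Build an adjacency list, dropping links that cannot carry the demand.
    graph = {}
    for a, b, bw_gbps, delay_ms in links:
        if bw_gbps >= min_bw_gbps:
            graph.setdefault(a, []).append((b, delay_ms))
            graph.setdefault(b, []).append((a, delay_ms))

    # Plain Dijkstra on delay over the pruned graph.
    best = {src: 0.0}
    heap = [(0.0, src, [src])]
    while heap:
        delay, node, path = heapq.heappop(heap)
        if node == dst:
            return delay, path
        if delay > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, link_delay in graph.get(node, []):
            new_delay = delay + link_delay
            if new_delay < best.get(nxt, float("inf")):
                best[nxt] = new_delay
                heapq.heappush(heap, (new_delay, nxt, path + [nxt]))
    return None

# Hypothetical topology: (node, node, available Gbps, delay in ms).
LINKS = [
    ("A", "B", 400, 2.0),
    ("B", "E", 100, 1.0),
    ("A", "C", 400, 3.0),
    ("C", "D", 400, 2.0),
    ("D", "E", 400, 2.0),
]

if __name__ == "__main__":
    print(constrained_path(LINKS, "A", "E", 100))  # short path via B
    print(constrained_path(LINKS, "A", "E", 200))  # B-E is too small, detour via C and D
```

In this toy topology a 100 Gbps demand takes the short route through B, while a 200 Gbps demand is forced onto the longer route because the B-E link is pruned.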







Once this is set up, we create and launch additional VPN services. A serious advantage of Huawei's solution is that, unlike standard MPLS Traffic Engineering, it lets you synchronize tunnel paths without any add-ons.







The diagram above shows the general process of collecting information. SNMP is often used for this, but it is slow and yields only averaged results. Telemetry, which we previously used in data center and campus solutions, has now come to carrier networks. It adds some load, but it lets you understand what is happening on the network at sub-second rather than minute granularity.
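A small numeric sketch shows why the sampling interval matters. The figures are invented: one minute of link utilisation sampled every 100 ms, steady at 40 Gbps apart from a 500 ms burst at 100 Gbps. A single per-minute average stands in for an SNMP-style poll.

```python
def utilisation_per_window(samples_gbps, window):
    """Average consecutive samples in groups of `window` samples."""
    averages = []
    for i in range(0, len(samples_gbps), window):
        chunk = samples_gbps[i:i + window]
        averages.append(sum(chunk) / len(chunk))
    return averages

# 600 hypothetical samples at 100 ms spacing = one minute of traffic.
samples = [40.0] * 600
for i in range(300, 305):          # a 500 ms burst to line rate
    samples[i] = 100.0

minute_view = utilisation_per_window(samples, 600)   # one averaged value per minute
subsecond_view = utilisation_per_window(samples, 1)  # one value per 100 ms sample

if __name__ == "__main__":
    print("per-minute average:", minute_view[0])
    print("sub-second peak:   ", max(subsecond_view))
```

The per-minute figure barely moves from the 40 Gbps baseline, while the sub-second view exposes the burst that could have filled a buffer.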







Of course, the volume of data received has to be “digested” somehow. Machine learning is used for this. Based on preloaded patterns of the most common network faults, the monitoring system can predict the likelihood of incidents, for example the failure of an SFP (Small Form-factor Pluggable) module or a sudden burst of traffic on the network.
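The article does not describe the models used, so the following is only a stand-in: a simple statistical detector (a z-score against a sliding baseline), not machine learning and not Huawei's method. It illustrates how a sudden traffic burst can be picked out of a telemetry stream. The function name, window length, threshold, and traffic values are all assumptions.

```python
from statistics import mean, pstdev

def find_bursts(samples, baseline_len=20, threshold=4.0):
    """Return indices whose value deviates from the preceding baseline
    window by more than `threshold` standard deviations."""
    flagged = []
    for i in range(baseline_len, len(samples)):
        window = samples[i - baseline_len:i]
        mu = mean(window)
        sigma = pstdev(window) or 1e-9  # avoid division by zero on flat traffic
        if abs(samples[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Hypothetical telemetry: about 40 Gbps with small ripple, one burst at sample 45.
traffic = [40.0 + (i % 3) * 0.5 for i in range(60)]
traffic[45] = 95.0

if __name__ == "__main__":
    print(find_bursts(traffic))
```

A production system would match against learned fault patterns rather than a fixed threshold, but the input is the same: a dense, sub-second series of measurements.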







And this is what a scale-out control system based on TaiShan ARM servers and the GaussDB database looks like. Individual nodes of the analytics system are assigned "roles", which lets you expand diagnostic services granularly as traffic or the number of network nodes grows.

In other words, everything that worked well in the storage world is gradually coming to network management.




A striking example of our new technologies in use is the Industrial and Commercial Bank of China (ICBC). It has deployed a backbone network of high-performance routers that are assigned specific roles. Because of the NDA, the diagram gives only a general idea of the network structure. It includes three large data centers connected by end-to-end tunnels and 35 additional sites (second-tier data centers). Both standard connections and SR-TE are used.







IP WAN 3-Layer Intelligent Architecture



Huawei solutions are based on a three-layer architecture. The bottom layer consists of equipment of varying performance. The second layer contains the equipment management environment and additional services that extend network analysis and control. The top layer is, loosely speaking, the application layer. The most common application scenarios are networks for telecom operators, financial institutions, energy companies, and government agencies.



Here is a short video describing the capabilities of the NetEngine 8000 and the technical solutions used in it:





Of course, the equipment should be designed to accommodate traffic growth and infrastructure expansion, taking into account proper power supply and adequate cooling. When the flagship router model is equipped with 20 PSUs of 3 kW each, the use of carbon nanotubes in the heat removal system no longer seems redundant.







What is it all for? It sounds like fantasy, but 14.4 Tbit/s per slot is now quite achievable for us, and this mind-boggling bandwidth is in demand, in particular from those same financial and energy companies, many of which today run backbone networks built on DWDM (Dense Wavelength Division Multiplexing). In addition, the number of applications requiring ever higher speeds keeps growing.



One of our scenarios, building a machine learning network between two Atlas 900 clusters, also requires terabit-class bandwidth. There are many similar tasks, including nuclear physics computations, meteorological calculations, and so on.











Hardware base and its requirements



The diagrams show the currently available LPUI router modules with integrated cards and their characteristics.







And this is what the roadmap looks like, with the new module options that will become available over the next two years. When developing solutions based on them, it is important to consider energy consumption. Standard data centers are currently built for 7-10 kW per rack, while terabit-class routers draw several times more power (up to 30-40 kW at peak). This means either designing a specialized site or creating a separate high-load zone in an existing data center.







A general look at the chassis reveals that the switch fabrics are hidden behind the middle fan assembly. They can be hot-swapped thanks to 2N or N+1 redundancy. In essence, this is a standard high-reliability orthogonal architecture.







Not only flagships



No matter how impressive the flagship models are, most installations are box-type devices of the M and F series.



The most popular service routers at present are the M8 and M14. They support both low-speed interfaces, such as E1, and high-speed ones (100 Gbps now and 400 Gbps in the near future) within a single platform.







The performance of the M14 is sufficient to meet all the needs of ordinary enterprise customers. It can be used to build standard L3VPN solutions for connecting to providers, and it also works well as an additional tool, for example for collecting telemetry or using SRv6.







A large number of cards are available for this model. There are no separate switch fabrics; the supervisor modules provide connectivity. This is how the per-port performance distribution shown in the diagram is achieved.







In the future, the supervisor can be replaced with a newer one, raising performance on the same ports.







The M8 is slightly smaller than the M14 and its performance is lower, but the use cases of the two models are very similar.







The set of M8-compatible physical cards makes it possible, for example, to connect to P devices over a 100 Gbps interface, use FlexE, and encrypt all of this traffic.







Generally speaking, the M6 is the model with which you can start working in a carrier-grade environment. It is small and not suited to providers, but it works well as a traffic aggregation point for connecting regional data centers, for example in a bank. Moreover, it runs the same software set as the larger models.







Fewer cards are available for the M6, and the maximum performance is 50 Gbps, which is nevertheless noticeably higher than the 40 Gbps solutions standard in the industry.







The smallest model, the M1A, deserves a separate mention. This compact device may come in handy where an extended operating temperature range is required (-40 to +65 °C).








A few words about the F line. The NetEngine 8000 F1A became one of the most popular Huawei products in 2019, not least because it is equipped with ports ranging from 1 to 100 Gbps (up to 1.2 Tbps in total).







More about SRv6



Why did SRv6 support need to be added to our products right now?



Currently, organizing VPN tunnels can require ten or more protocols, which causes serious management problems and points to the need to simplify the process radically.







The industry's response to this challenge was SRv6, a technology that Huawei and Cisco both had a hand in creating.







One of the limitations that had to be removed was the need to use per-hop behavior (PHB) for routing standard packets. Setting up inter-operator interaction through Inter-AS MP-BGP with additional services (VPNv4) is quite difficult, so there are very few such solutions. SRv6 lets you route a packet through the entire segment from the outset, without configuring special tunnels. Programming the processes themselves is also simpler, which makes large deployments much easier.







The diagram shows an SRv6 implementation case. The two WANs were linked by several different protocols. Reaching a service on any virtual or hardware server required a large number of handoffs between VXLAN, VLAN, L3VPN, and so on.

After SRv6 was introduced, the operator had an end-to-end tunnel that reached not just the hardware server but the Docker container itself.
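How a single packet can carry its whole path is easiest to see in the Segment Routing Header (SRH) defined in RFC 8754. The sketch below is a simplified model rather than a packet parser: the ingress writes the segment list in reverse order, sets Segments Left to the index of the first segment, and each segment endpoint decrements that index and copies the next segment into the IPv6 destination address. The class and function names are illustrative, and the SIDs use the 2001:db8::/32 documentation prefix.

```python
from dataclasses import dataclass
from ipaddress import IPv6Address

@dataclass
class SRv6Packet:
    dst: IPv6Address      # current IPv6 destination address
    segment_list: list    # SRH Segment List, stored in reverse path order
    segments_left: int    # index of the active segment in segment_list

def encapsulate(path_sids):
    """Build the SRH state at the ingress node for an ordered list of SIDs."""
    sids = [IPv6Address(s) for s in path_sids]
    return SRv6Packet(
        dst=sids[0],                        # first segment to visit
        segment_list=list(reversed(sids)),  # Segment List[0] is the last segment
        segments_left=len(sids) - 1,
    )

def process_at_endpoint(pkt):
    """Simplified endpoint processing: advance to the next segment.
    Returns False when the packet has reached its final segment."""
    if pkt.segments_left == 0:
        return False
    pkt.segments_left -= 1
    pkt.dst = pkt.segment_list[pkt.segments_left]
    return True

if __name__ == "__main__":
    # Hypothetical path: two transit routers, then a SID bound to a container.
    pkt = encapsulate(["2001:db8:b::1", "2001:db8:c::1", "2001:db8:e::100"])
    print(pkt.dst, pkt.segments_left)
    while process_at_endpoint(pkt):
        print(pkt.dst, pkt.segments_left)
```

Because the last SID is just an IPv6 address, it can identify a container as easily as a router, which is what makes the end-to-end tunnel described above possible without protocol handoffs along the way.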




More about FlexE



The weakness of Layer 2 of the OSI model is that it does not provide the services and SLA levels that providers need. Providers, for their part, would like something analogous to TDM (time-division multiplexing), but over Ethernet. Many approaches to the problem have been tried, with only very limited results.







Flex Ethernet serves precisely to guarantee SDH (Synchronous Digital Hierarchy) and TDM-grade quality in IP networks. This is made possible by work in the forwarding plane: the L2 environment is modified so that it becomes as efficient as possible.







How does a standard physical port work? There is a certain number of queues and a tx ring. A packet waits in the buffer to be processed, which is not always convenient, especially when elephant and mice flows are mixed.



Additional inserts and another layer of abstraction help to provide guaranteed bandwidth at the level of the physical environment.







An additional MAC layer is allocated at the level of information transfer, which allows you to create hard physical queues that can be assigned specific SLAs.







This is how it looks at the implementation level. The additional layer actually implements TDM framing. With this meta-insertion, it is possible to granularly queue and shape TDM services over Ethernet.







One of the use cases for FlexE involves very strict adherence to SLA by creating time slots to equalize bandwidth or provide resources for critical services.
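The time-slot idea can be sketched as a calendar. The OIF FlexE specification divides each 100G PHY into 20 calendar slots of 5 Gbps each; the allocator below is a deliberately naive first-fit illustration, and the client names and rates are invented.

```python
SLOT_GBPS = 5             # FlexE calendar granularity
SLOTS_PER_100G_PHY = 20   # 20 x 5 Gbps = 100 Gbps

def build_calendar(clients, phys=1):
    """Assign consecutive calendar slots to each (name, gbps) client.
    Unused slots stay None. Raises ValueError if the group is oversubscribed."""
    total_slots = SLOTS_PER_100G_PHY * phys
    calendar = [None] * total_slots
    next_free = 0
    for name, gbps in clients:
        needed = -(-gbps // SLOT_GBPS)  # ceiling division
        if next_free + needed > total_slots:
            raise ValueError(f"not enough slots for {name}")
        for slot in range(next_free, next_free + needed):
            calendar[slot] = name
        next_free += needed
    return calendar

# Hypothetical clients sharing one 100G PHY.
CLIENTS = [("critical-services", 10), ("mobile-backhaul", 25), ("enterprise-vpn", 50)]

if __name__ == "__main__":
    for slot, owner in enumerate(build_calendar(CLIENTS)):
        print(slot, owner or "(unused)")
```

Since each client owns its slots outright, a burst from one client cannot borrow capacity from another, which is the difference from queue-based QoS sharing.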







Another scenario allows you to work with defects. Instead of simply hashing the transmission of information, we form separate channels practically at the physical level, in contrast to the virtual ones created by QoS (Quality of Service).







More about iFIT



Like FlexE, iFIT is a licensed technology from Huawei. It allows very granular SLA checks. Unlike the standard IP SLA and NQA mechanisms, iFIT works with “live” traffic rather than synthetic traffic.







iFIT is available on all devices that support telemetry. It uses an additional field that is not occupied by the standard Option Data. Information recorded there lets you understand what is happening in the channel.
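The article does not spell out the measurement method. The sketch below assumes an alternate-marking scheme of the kind described in RFC 8321, in which live packets carry a colour bit that flips every measurement period so that ingress and egress counters for the same period can be compared. The packet representation and the figures are invented.

```python
from collections import Counter

def count_by_period(packets):
    """Count packets per marking period. Each packet is (period_id, colour_bit)."""
    return Counter(period for period, _colour in packets)

def loss_per_period(ingress_packets, egress_packets):
    """Packets lost in each period = counted at ingress minus counted at egress."""
    sent = count_by_period(ingress_packets)
    received = count_by_period(egress_packets)
    return {period: sent[period] - received[period] for period in sorted(sent)}

# Hypothetical live flow: the colour bit alternates 0, 1, 0, ... per period.
ingress = [(0, 0)] * 1000 + [(1, 1)] * 1000 + [(2, 0)] * 1000
egress = [(0, 0)] * 1000 + [(1, 1)] * 997 + [(2, 0)] * 1000

if __name__ == "__main__":
    print(loss_per_period(ingress, egress))
```

No probe packets are injected: the counters describe the customer's own traffic, which is the distinction from synthetic IP SLA or NQA tests.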



***



To summarize, the functionality of the NetEngine 8000 and the technologies built into it make these devices a reasonable and well-justified choice for building and developing carrier-class networks, backbone networks for energy and financial companies, and "e-government"-grade systems.


