👨🏻‍🔬 ⚛️ 👩🏽‍🤝‍👨🏻 Brief Best Practices for Building F5 Cluster Solutions ♻️ ↔️ 📸

Continuity of service, constant availability, a consistent SLA, no single point of failure: we have run into these requirements many times when planning the high availability of a website or application.

The main goal of a fault-tolerant scheme is to eliminate application downtime. No incident, whether external tampering or an internal malfunction, should be noticeable to the user. To provide this "invisible" and continuous operation, the F5 application publishing and protection platform is built with all the necessary mechanisms from the start, using redundant physical and logical infrastructure.

In F5 BIG-IP, the high availability function is provided by Device Service Clustering (DSC), which makes it possible to:

add software and hardware resources without stopping the system or making large-scale architectural changes.
ensure uninterrupted operation of the system if one or more devices fail.
synchronize data between devices.
distribute client requests efficiently across devices.
perform routine maintenance (for example, software updates) without downtime.
preserve session state when traffic switches between devices in a fault-tolerant scheme.
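One way to picture the "synchronize data between devices" item is drift detection: each member's configuration is reduced to a digest, and members whose digest differs from a reference member are flagged as out of sync. The sketch below is a conceptual illustration in plain Python, not how BIG-IP DSC implements config sync; the function names, device names, and the dictionary-shaped configuration are all hypothetical.

```python
import hashlib
import json


def config_digest(config: dict) -> str:
    """SHA-256 of a configuration serialized with sorted keys, so key order is irrelevant."""
    canonical = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def out_of_sync(members: dict) -> list:
    """Names of members whose configuration differs from the first (reference) member.

    `members` maps a hypothetical device name to its configuration dictionary.
    """
    names = list(members)
    reference = config_digest(members[names[0]])
    return [name for name in names[1:] if config_digest(members[name]) != reference]


if __name__ == "__main__":
    cluster = {
        "bigip-a": {"waf_policy": "v2", "pool": ["app1", "app2"]},
        "bigip-b": {"pool": ["app1", "app2"], "waf_policy": "v2"},  # same config, different key order
        "bigip-c": {"waf_policy": "v1", "pool": ["app1", "app2"]},  # stale policy
    }
    print(out_of_sync(cluster))  # ['bigip-c']
```

Sorting keys before hashing means two members with the same settings in a different order still count as in sync. A real cluster also has to decide which member is authoritative; this sketch simply assumes it is the first one.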

F5 solutions are designed from the ground up to run on redundant F5 devices, so the failure of any single component does not interrupt the system. All of this works at any time and is not tied to a particular vendor of network or server equipment:

session mirroring (MAC, TCP, SSL, and session persistence based on various criteria). Sessions on the active F5 device are duplicated on the standby system, so if the active device fails, the standby can start processing connections immediately, without interruption.
configuration synchronization (security policies, access policies), which keeps the current configuration on all cluster members at all times.
correct failover handling without network reconfiguration (the MAC and IP addresses do not change), plus redundant network interfaces that keep the cluster working correctly if a network link fails.
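Session mirroring can be pictured with a small conceptual model: every session opened on the active unit is also written to the standby, so after a failover the promoted unit already knows where each existing client was going. This is not F5 code and uses no BIG-IP API; the `Device` and `MirroredPair` classes, their methods, and the addresses are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical cluster member with its own session table."""
    name: str
    sessions: dict = field(default_factory=dict)  # client address -> backend server
    healthy: bool = True


class MirroredPair:
    """Conceptual active/standby pair with session mirroring (not an F5 API)."""

    def __init__(self, active: Device, standby: Device):
        self.active = active
        self.standby = standby

    def open_session(self, client: str, server: str) -> None:
        # Record the session on the active unit...
        self.active.sessions[client] = server
        # ...and mirror it to the standby while the standby is healthy.
        if self.standby.healthy:
            self.standby.sessions[client] = server

    def fail_over(self) -> None:
        # The active unit is lost: mark it failed and promote the standby,
        # which already holds a copy of every mirrored session.
        self.active.healthy = False
        self.active, self.standby = self.standby, self.active

    def route(self, client: str) -> str:
        # An existing client keeps the same backend after a failover.
        return self.active.sessions[client]


if __name__ == "__main__":
    pair = MirroredPair(Device("bigip-a"), Device("bigip-b"))
    pair.open_session("10.0.0.5:51000", "app-server-2")
    pair.fail_over()
    print(pair.active.name, pair.route("10.0.0.5:51000"))  # bigip-b app-server-2
```

The point of the model is the ordering: the copy to the standby happens while the session is being set up, not after a failure, which is why no session state has to be rebuilt at switchover time.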

There are two scenarios for building an F5 cluster:

Active / Standby

In this mode, one system is active and processes all traffic, while the second is in standby mode and does not process traffic. If a failure is detected on the active F5 device, all traffic moves to the standby. Because the standby already holds the full configuration and the mirrored sessions, it takes over as the active system.
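The failover decision itself can be sketched as a timeout on heartbeats from the active peer. This is a simplified, hypothetical model: the `StandbyMonitor` class, the explicit timestamps, and the 3-second timeout are assumptions, and BIG-IP configures its own failover detection rather than relying on code like this.

```python
class StandbyMonitor:
    """Hypothetical standby-side failure detector driven by peer heartbeats."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s      # assumed silence period before declaring the peer dead
        self.last_heartbeat_s = 0.0     # time the last heartbeat from the active peer arrived
        self.role = "standby"

    def on_heartbeat(self, now_s: float) -> None:
        # Called whenever a heartbeat from the active peer is received.
        self.last_heartbeat_s = now_s

    def tick(self, now_s: float) -> str:
        # Promote to active once the peer has been silent longer than the timeout.
        if self.role == "standby" and now_s - self.last_heartbeat_s > self.timeout_s:
            self.role = "active"
        return self.role


if __name__ == "__main__":
    monitor = StandbyMonitor(timeout_s=3.0)
    monitor.on_heartbeat(0.0)
    monitor.on_heartbeat(1.0)
    print(monitor.tick(2.0))  # standby: last heartbeat was 1 s ago
    print(monitor.tick(4.5))  # active: 3.5 s of silence exceeds the 3 s timeout
```

Passing the current time in explicitly keeps the sketch deterministic and easy to test. A production design would also have to guard against split brain, where both units lose sight of each other and both promote themselves.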

Active / Active

In this mode, both systems are active at the same time, so the existing equipment is used to the full. This setup is mainly used where F5 hardware is limited and load requirements are high. The trade-off is that if one of the devices fails, the surviving device must carry the whole load, and if it lacks the capacity, some services will become unavailable.
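The risk in active/active mode comes down to simple arithmetic: after one device fails, the survivors must carry the total load. The helper below is a hypothetical capacity check under stated assumptions (identical devices, traffic that can be freely redistributed); the numbers are illustrative, not F5 sizing data.

```python
def survives_single_failure(loads: list, capacity_per_device: float) -> bool:
    """True if the remaining devices can absorb the full load after any one device fails.

    Assumes all devices have the same capacity and traffic can be freely
    redistributed; `loads` and the capacity share one unit (e.g. Gbit/s).
    """
    if len(loads) < 2:
        raise ValueError("need at least two devices for failover")
    total_load = sum(loads)
    capacity_after_failure = capacity_per_device * (len(loads) - 1)
    return total_load <= capacity_after_failure


if __name__ == "__main__":
    print(survives_single_failure([40, 45], 100))  # True: 85 fits on one device
    print(survives_single_failure([60, 55], 100))  # False: 115 exceeds one device
```

For a two-device pair this means the combined load must fit on a single device, so with an even split each unit should stay at or below roughly 50% of its capacity.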

The operating mode of the F5 devices and the number of devices in each data center are chosen based on the application's characteristics, the SLA required for its publication, and the expected load.

Conclusion. To guarantee fault tolerance backed by an SLA, F5 recommends fault-tolerant configurations of at least two devices. Two main options are used, each with its own advantages and disadvantages:

Approach 1. Build a cluster across two or three data centers and distribute traffic between them using DNS. The advantage of this option is the small number of F5 devices: one device per data center. The disadvantage is slow switching between data centers, which can take anywhere from minutes to several hours depending on the settings. This delay comes from how DNS works (resolvers cache a record until its TTL expires), and it is the price paid for using a small number of F5 devices.
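Why DNS-based switching takes minutes or hours can be estimated with simple arithmetic: clients keep using the old answer until health checks mark the failed site down and cached records expire. The function below is a rough, hypothetical model; the probe interval, probe count, and TTL values are assumptions, not F5 defaults.

```python
def worst_case_dns_switch_s(probe_interval_s: float, failed_probes: int, record_ttl_s: float) -> float:
    """Rough worst-case time until clients reach the surviving data center.

    detection = time for health probes to mark the failed site down;
    record_ttl_s = how long resolvers may keep serving the old DNS answer.
    Assumes resolvers honour the TTL; clients that cache longer make it worse.
    """
    detection_s = probe_interval_s * failed_probes
    return detection_s + record_ttl_s


if __name__ == "__main__":
    for ttl in (30, 300, 3600):
        total = worst_case_dns_switch_s(probe_interval_s=5, failed_probes=3, record_ttl_s=ttl)
        print(f"TTL {ttl:>4} s -> up to {total / 60:.1f} min")
```

With these assumed values, a 300-second TTL gives a worst case of a little over five minutes, while a one-hour TTL pushes it past an hour, which matches the range described above. Lowering the TTL shortens switching at the cost of more DNS queries.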

Approach 2. Build a cluster of at least two virtual or hardware F5 devices in each data center. The advantage of this approach is instant application failover without interrupting user sessions; the disadvantage is that it requires at least two F5 devices in every data center.

Choose between Approach 1 and Approach 2 based on the application's features and its availability requirements, weighing the trade-offs of each. When F5 publishes and protects applications that require an SLA of 99.9% (almost 9 hours of downtime per year) or higher, the two approaches should be combined. When selecting and implementing an F5 solution, also consider whether to use the active/active or active/standby (active/passive) operating mode. These modes can be implemented both within one data center (different F5 devices serving different applications) to make maximum use of the F5 devices, and between data centers, so that either both data centers process application traffic (active-active DC) or only one does (Disaster DC) ...
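The "almost 9 hours" figure for a 99.9% SLA follows from a one-line calculation over a 365-day year. The helper below is a small sketch of that arithmetic; it ignores leap years and planned maintenance windows, which real SLA contracts may treat differently.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Maximum yearly downtime allowed by an availability target (e.g. 99.9)."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # The first line printed is: 99.9% -> 8.76 h (525.6 min) per year
    for sla in (99.9, 99.99, 99.999):
        hours = downtime_hours_per_year(sla)
        print(f"{sla}% -> {hours:.2f} h ({hours * 60:.1f} min) per year")
```

Each extra "nine" cuts the downtime budget tenfold: 99.99% leaves under an hour per year, which is far less than a DNS-only switchover with a long TTL can guarantee.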
