Why Traditional High Availability (HA) for Security Devices is Not Enough
It’s funny: sometimes the first way we do something is the right way, yet we keep trying to improve it so it looks shinier. Eventually we realize that the most obvious answer, our original tactic, was the right one all along.
For those of you reading this who were in the firewall business in the mid-90s, you should remember a time when firewall vendors used an external device to provide high availability (HA) functionality. In the case of Checkpoint, for example, the relationship with Stonesoft was exemplary: they sold as a team and were very successful together. Their joint solution offered best-in-class, fast, efficient, easy-to-manage firewalling and a great HA experience. For that brief point in time, the HA design and functionality of security devices was 15 years ahead of their networking counterparts. Firewalls could be set up in an active/active configuration, N+X redundancy was possible (if not needed at the time), and horizontal scaling was a distinct option.
Then events conspired to change the way the world designed HA security, and not for the better. What happened, you ask? Stonesoft made a monumental business blunder: it built a firewall. For future reference, when you make most of your sales through a strategic partner who happens to make firewalls, you should probably not make a firewall. The blunder led Checkpoint to move quickly away from the load-balanced HA model, embrace an emerging protocol called VRRP and sever its partnership with Stonesoft, and the rest, as they say, is history. Given Checkpoint’s dominance in the firewall space and its marketing strength, the load-balanced firewall design all but disappeared over the next 20 years. Today, more than ever, security device load balancing should be at the forefront of our minds when we think about HA, because high availability designs now need to offer more than uptime: they also have to offer peace of mind, stability and predictable behavior in every environment.
Here are the advantages of the load-balanced design, which I’ll then cover in more detail: active/active configuration (high-end firewalls can cluster, but there is a difference), lateral scale (in both price and throughput), N+X redundancy, gateway translation functionality and intelligent traffic steering.
Today most security device manufacturers (firewalls, IPS, anti-malware, etc.) recommend HA designs based on the active/passive model. In most cases these vendors have an active/active option but seldom recommend it because of the likely problem with asymmetric routing (user sessions coming in through one firewall and leaving through the other, which creates a visibility problem). At the higher end, a few vendors can offer HA by clustering up to three devices together. This gets around the asymmetric routing problem, but does it answer the availability concerns of large enterprise, e-commerce or other commercial customers? In other words, if a company decides it needs four active clustered core switches and active/active data centers, is N+2 redundancy enough for an in-line device that is a single point of failure? Let’s illustrate the limitations of both models offered by security device manufacturers with a theoretical example.
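A load-balanced design has to avoid the same asymmetric routing problem. One common approach is to pick the firewall per flow with a symmetric hash, so that a session’s inbound and return traffic land on the same unit. The sketch below is a minimal illustration under that assumption; the function name pick_firewall, the unit names and the addresses are hypothetical, and real ADCs implement this in their own ways (often adding ports and connection tables).

```python
import hashlib


def pick_firewall(src_ip: str, dst_ip: str, firewalls: list[str]) -> str:
    """Choose a firewall for a flow so both directions map to the same unit.

    Sorting the address pair makes the key direction-independent:
    client->server and server->client produce the same key.
    """
    a, b = sorted((src_ip, dst_ip))
    key = f"{a}|{b}".encode()
    # Use a stable hash; Python's built-in hash() is randomized per process.
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(firewalls)
    return firewalls[index]


fws = ["fw-a", "fw-b", "fw-c", "fw-d"]  # hypothetical active firewalls
inbound = pick_firewall("203.0.113.10", "10.0.0.5", fws)
outbound = pick_firewall("10.0.0.5", "203.0.113.10", fws)
print(inbound == outbound)  # True: the return path uses the same firewall
```

Hashing only the address pair sends every flow between the same two hosts to one unit; including the port pair in the sorted endpoints would spread load more finely while keeping the mapping symmetric.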
Let’s say a new security bug comes out tomorrow, a really bad one. All the security vendors scramble to make a fix. Two days from now, they release a new beta version of their code (with no testing in the field) containing a new feature to counter this big threat. Management comes to you and says, “We need this fix today; we are at risk.” How comfortable are you upgrading your firewalls and testing the stability of this code in production? The answer is: you’re not. Why? Because HA in the security device world requires all devices to run the same code. I can’t run the new code on one device and simply fail over if it doesn’t work, because the HA configuration requires both devices to be running the same version. There are exceptions (Forcepoint and Fortinet can run multiple versions of code in a clustered or HA configuration), but only the lowest common feature set will be active, so the problem persists. Rolling out new code for testing in production means upgrading every unit at once, which introduces an unacceptable risk.
The load-balanced architecture solves this problem. You can run as many code versions on as many active firewalls as you like, and if there is a bug, you can use traffic steering to move traffic away from the affected units while the code is fixed. To take it one step further, you could run devices from multiple vendors in parallel while still offering an HA architecture. A common grumble I hear today is, “We would like to change manufacturers, but the cutover would be too difficult because we have too many rules,” and so on. With the load-balanced architecture, that change can become a migration over time instead of a single point-in-time cutover.
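The traffic-steering idea can be sketched as a weighted pool in which one unit runs the emergency build as a canary and is drained if it misbehaves. This is a minimal illustration, not any ADC’s actual API: the Member class, the choose and drain_version helpers, the unit names and the version strings are all hypothetical, and treating weight 0 as “drained” is an assumption of this sketch. It only decides where new flows go; a real deployment would normally let existing sessions on a drained unit finish.

```python
import random
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    version: str
    weight: int  # relative share of new flows; 0 means drained (assumption)


def choose(members: list[Member], rng: random.Random) -> Member:
    """Weighted random choice among members that are not drained."""
    active = [m for m in members if m.weight > 0]
    if not active:
        raise RuntimeError("no active firewalls in pool")
    return rng.choices(active, weights=[m.weight for m in active], k=1)[0]


def drain_version(members: list[Member], version: str) -> None:
    """Steer new flows away from every unit running the given code version."""
    for m in members:
        if m.version == version:
            m.weight = 0


pool = [
    Member("fw-1", "9.0.3", 45),
    Member("fw-2", "9.0.3", 45),
    Member("fw-3", "9.1.0-hotfix", 10),  # canary running the emergency build
]
rng = random.Random(42)
sample = [choose(pool, rng).name for _ in range(1000)]
print(sample.count("fw-3") / len(sample))  # expected share is about 0.10

# The new build misbehaves: pull it out of rotation without an outage.
drain_version(pool, "9.1.0-hotfix")
print([m.weight for m in pool])
```

The same pool could just as easily hold units from two different vendors, which is what makes the gradual migration described in the text possible.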
Now let’s address lateral scale in today’s security device world. When I need more throughput, for example because I get larger internet pipes and go from requiring 10 gig of firewall throughput to requiring 20 gig, I throw out the 10 gig firewalls and buy 20 gig firewalls. Why, you ask? Because native HA has to be set up with two similar devices. With the load-balanced architecture, you could instead scale by adding another 10 gig firewall in an active configuration. You would have the required throughput and more redundancy, because you would have an extra unit. Should the need arise, you could also run disparate units, e.g., one 10 gig unit and one 20 gig unit. This is not only a more economical solution; it once again offers better redundancy.
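The capacity arithmetic behind this argument can be worked through in a few lines. The sketch assumes every unit in a load-balanced pool forwards traffic, that the worst case is losing the largest units first, and that flows are weighted in proportion to each unit’s throughput; the helper names are hypothetical and the figures are the text’s example sizes, not vendor data.

```python
def surviving_capacity(units_gbps: list[float], failures: int) -> float:
    """Worst-case throughput left after `failures` units go down.

    Worst case = the largest units are the ones that fail.
    """
    remaining = sorted(units_gbps)[: max(len(units_gbps) - failures, 0)]
    return sum(remaining)


def weights_by_capacity(units_gbps: list[float]) -> list[float]:
    """Share of new flows each unit should get, proportional to its capacity."""
    total = sum(units_gbps)
    return [c / total for c in units_gbps]


# Native active/passive pair of 10 gig units: only the active unit forwards.
active_passive_usable = 10
print(active_passive_usable)                # 10

# Load-balanced: add a third 10 gig unit; all three forward.
print(surviving_capacity([10, 10, 10], 1))  # 20 -> meets a 20 gig need with one unit down

# Mixed sizes: keep the two 10s, add a 20, and weight flows by capacity.
print(weights_by_capacity([10, 10, 20]))    # [0.25, 0.25, 0.5]
print(surviving_capacity([10, 10, 20], 1))  # 20
```

Checking the worst-case failure rather than the average is the conservative choice: with disparate units, losing the biggest one is what determines whether the remaining pool still covers demand.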
SSL inspection is one of the leading drivers for this architectural change today, and in many cases ADCs are being bought solely for this function. In my opinion, that is because ADC vendors aren’t explaining the value of the rest of the ADC feature set.
Roughly 50% of all internet traffic today is SSL-encrypted, yet most security device manufacturers can deliver only a fraction of their inspection throughput on SSL-encrypted traffic. For example, Palo Alto offers only 2 gig of aggregate SSL inspection on their 30 gig 5060 firewall. On their newer 5260, they offer roughly 5 gig aggregate, which seems impressive, but that is a 73 gig platform. Radware’s smallest ADC can offer more than 9 gig at a fraction of the cost. The ADC SSL inspection architecture not only lets you size smaller security devices, saving significant capex, it also reduces latency. If each security device has to perform its own SSL negotiation for inspection, that adds a lot of latency. In a typical environment, traffic passes through a firewall, an IPS and an anti-malware device on the way in, and a firewall, an IPS and a DLP device on the way out: six SSL negotiations, compared with two (one in and one out) using the ADC architecture.
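The negotiation count can be written out as simple arithmetic. This sketch follows the text’s counting (one SSL negotiation per inspecting device versus one in and one out at the ADC) and assumes a fixed latency cost per negotiation; the 5 ms figure and the helper names are hypothetical placeholders, not measurements.

```python
# Illustrative inspection chains, following the typical environment in the text.
INBOUND_CHAIN = ["firewall", "ips", "anti-malware"]
OUTBOUND_CHAIN = ["firewall", "ips", "dlp"]


def negotiations_per_device(inbound: list[str], outbound: list[str]) -> int:
    """Every inspecting device performs its own SSL negotiation."""
    return len(inbound) + len(outbound)


def negotiations_with_adc() -> int:
    """The ADC decrypts once inbound and once outbound; devices in between see cleartext."""
    return 2


def added_latency_ms(negotiations: int, per_negotiation_ms: float) -> float:
    """Latency added by SSL negotiations, assuming a fixed cost per negotiation."""
    return negotiations * per_negotiation_ms


ASSUMED_MS = 5.0  # hypothetical per-negotiation cost, not a measurement
per_device = negotiations_per_device(INBOUND_CHAIN, OUTBOUND_CHAIN)
adc = negotiations_with_adc()
print(per_device, adc)  # 6 2
print(added_latency_ms(per_device, ASSUMED_MS), added_latency_ms(adc, ASSUMED_MS))  # 30.0 10.0
```

Whatever the real per-negotiation cost is in a given environment, the ratio stays the same: decrypting once at the ADC cuts the negotiation overhead to a third for this six-device chain.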
Additional gateway functionality can also be useful, such as having the load balancer/ADC act as a reverse HTTP/2 gateway. If older security devices cannot understand HTTP/2, the load balancer is in the right place to downgrade HTTP/2 to HTTP/1.1, pass it through the security device, then re-encapsulate it as HTTP/2 before sending it on its way. HTTP/2 is just one example, but I think you will see this benefit used more often in the future.
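The header side of that downgrade and re-encapsulation can be sketched as a mapping between HTTP/2 request fields and an HTTP/1.1 request head: the :method and :path pseudo-headers become the request line, :authority becomes Host, and connection-specific headers, which HTTP/2 does not allow, are dropped on the way back. This is a minimal sketch with hypothetical function names; it assumes a well-formed request that includes :authority and ignores framing, HPACK compression, request bodies and streams, all of which a real gateway must handle.

```python
# Connection-specific headers are not allowed in HTTP/2 (RFC 9113, section 8.2.2).
CONNECTION_SPECIFIC = {"connection", "keep-alive", "proxy-connection",
                       "transfer-encoding", "upgrade"}


def h2_to_h1_request(headers: list[tuple[str, str]]) -> str:
    """Build an HTTP/1.1 request head from decoded HTTP/2 request header fields."""
    pseudo = {k: v for k, v in headers if k.startswith(":")}
    lines = [f"{pseudo[':method']} {pseudo[':path']} HTTP/1.1",
             f"Host: {pseudo[':authority']}"]
    lines += [f"{k}: {v}" for k, v in headers if not k.startswith(":")]
    return "\r\n".join(lines) + "\r\n\r\n"


def h1_to_h2_request(head: str, scheme: str = "https") -> list[tuple[str, str]]:
    """Turn an HTTP/1.1 request head back into HTTP/2-style header fields."""
    request_line, *header_lines = head.rstrip("\r\n").split("\r\n")
    method, path, _version = request_line.split(" ", 2)
    fields: list[tuple[str, str]] = []
    authority = None
    for line in header_lines:
        name, value = line.split(":", 1)
        name, value = name.strip().lower(), value.strip()  # HTTP/2 names are lowercase
        if name == "host":
            authority = value
        elif name not in CONNECTION_SPECIFIC:
            fields.append((name, value))
    pseudo = [(":method", method), (":scheme", scheme), (":path", path)]
    if authority is not None:
        pseudo.append((":authority", authority))
    return pseudo + fields


h2 = [(":method", "GET"), (":scheme", "https"), (":authority", "example.com"),
      (":path", "/index.html"), ("accept", "text/html")]
h1 = h2_to_h1_request(h2)  # what the older security device would inspect
print(h1)
print(h1_to_h2_request(h1))  # re-encapsulated fields for the HTTP/2 side
```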
Taken together, these functions are why the load-balanced architecture made sense before and why we should remember that it still makes sense today. Sometimes the most obvious solution is the best solution.