When the internet fails, do your profits fail along with it? Does your security still work?
Another day, another "internet outage." We all know the drill: A key provider fails, and a swath of the web goes dark because of it. It feels like the internet is broken, but what’s actually busted is one single vendor that many other businesses depend on.
But wait. As IT experts, we learned early that you must avoid single points of failure. So, why does it seem as if so many businesses rely on a single vendor of internet infrastructure services?
The answer is both simple and complex: Resilience is expensive. True multicloud and multivendor strategies are complex operational challenges. For many businesses, the cost of building redundant, fail-safe systems (even simple multivendor failover) is deemed higher than the cost of occasional downtime.
Risks involved in internet outages
However, there are not-so-obvious risks involved in outages that go beyond temporary business interruptions. One of these risks is the impact that the web infrastructure vendor’s service degradation has on the security services that the same vendor provides; these services potentially run on the very same infrastructure that experiences the outage.
One good example (but not the only one!) is the availability of distributed denial-of-service (DDoS) defense systems.
Full disclosure: Akamai is an internet services vendor that also offers DDoS (and other) security solutions. 🙂
In this blog post, we’ll lay out how we approach the outage conundrum that lies in running multiple defense technologies in parallel and, partially, on shared infrastructure. We believe that this information can be useful when assessing security posture questions, evaluating your options, and identifying the right partners with which to build a truly resilient and available security posture.
If the internet were down — would a DDoS attack still reach me?
Although the answer depends on several factors, it is certainly possible. Far-reaching, or even global, outages are rarely truly full outages. Instead, they are often a result of system degradations at a large provider of internet infrastructure services that include content delivery network (CDN), load balancing, DNS, and more.
These incidents can differ greatly from case to case, and they rarely affect all regions, networks, or services simultaneously. A DDoS attack that’s launched during an outage, either on the provider or on one of its customers, is highly likely to have an impact, even when routing is broken and not all attack traffic finds its way through the network.
If my website is already down, why worry about a DDoS attack?
Even if your website is unavailable because of a vendor back-end issue, a DDoS attack piling on at the same time is one of the worst scenarios you can face.
That situation would be particularly dangerous if you, as a customer, decide to temporarily remove your infrastructure service provider from your stack to stay online while they are down. Threat actors could detect the DNS change and would know that, with that switch, the target has also eliminated most or all of its protection layers.
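To illustrate how easily such a switch can be spotted, the sketch below compares a hostname’s resolved IP addresses against a protection provider’s known address ranges. This is a hedged, minimal example: the function name is ours, and the CIDR ranges shown are IETF documentation addresses standing in for a real provider’s published ranges.

```python
# Sketch: flag when a hostname's resolved records no longer point at the
# protection provider's ranges. The CIDRs here are hypothetical
# placeholders (IETF documentation ranges), not real provider ranges.
import ipaddress

def protection_dropped(resolved_ips, provider_cidrs):
    """Return True if none of the resolved IPs fall inside the
    protection provider's advertised ranges."""
    nets = [ipaddress.ip_network(c) for c in provider_cidrs]
    return not any(
        any(ipaddress.ip_address(ip) in net for net in nets)
        for ip in resolved_ips
    )

cidrs = ["203.0.113.0/24"]              # hypothetical provider ranges
before = ["203.0.113.10"]               # still behind the provider's edge
after = ["198.51.100.7"]                # switched direct-to-origin
print(protection_dropped(before, cidrs))  # False: protection still in place
print(protection_dropped(after, cidrs))   # True: protection layer removed
```

The same comparison that lets a defender alert on an accidental exposure lets an attacker time an assault, which is exactly why the switch is risky.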
Despite the risk, this is something that businesses frequently do during outages. Being down on two fronts at once — simultaneously offline from a system failure and under active attack — introduces several additional risks, including:
You (or your vendor) lose visibility into the real root cause. A DDoS attack during an outage can mask the underlying technical problem, distort or stall telemetry, and slow down your troubleshooting. Teams can waste valuable time untangling overlapping symptoms instead of fixing the real issue quickly.
Recovery becomes significantly harder and slower. As soon as you or your vendor start to bring systems back, the attack traffic can then flood recovery paths, overwhelm warm-up phases, stall autoscaling, or block key dependencies. What should have been a straightforward restart can turn into an uphill battle against hostile traffic. This can intensify if the recovery of the DDoS defense system itself is hindered.
It increases operational and financial damage. Two major incidents at once means that more people are pulled into situation rooms, and there are more service-level agreement (SLA) penalties and higher incident-response costs. It can also result in more reputational impact when users notice not just downtime, but chaos.
It creates an opening for future attacks. If the DDoS threat actors see that an outage leaves you or your vendor blind or slow to react, they’ll come back the next time you’re weak. Protecting yourself even during downtime signals resilience, not vulnerability.
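The recovery risk above is often mitigated with slow-start admission: rather than reopening at full capacity, a recovering system admits a trickle of traffic and grows it only while health checks stay green, so a backlog (or an attack) cannot flood the warm-up phase. The sketch below is illustrative; the class name, rates, and growth factor are our assumptions, not taken from any specific product.

```python
# Sketch: slow-start admission during recovery. Admitted capacity grows
# multiplicatively while the system stays healthy and halves on trouble.
# All parameters are illustrative assumptions.
class SlowStartAdmitter:
    def __init__(self, start_rate=10.0, max_rate=1000.0, growth=2.0):
        self.rate = start_rate      # requests admitted per interval
        self.max_rate = max_rate
        self.growth = growth

    def tick(self, healthy):
        """Grow capacity after a healthy interval; back off otherwise."""
        if healthy:
            self.rate = min(self.max_rate, self.rate * self.growth)
        else:
            self.rate = max(1.0, self.rate / 2)
        return self.rate

admitter = SlowStartAdmitter()
print(admitter.tick(True))    # 20.0  (healthy interval: capacity doubles)
print(admitter.tick(True))    # 40.0
print(admitter.tick(False))   # 20.0  (trouble: capacity halves)
```

The design choice mirrors TCP slow start: multiplicative increase while things look good, sharp backoff at the first sign of overload, so hostile traffic during recovery cannot ride a full-capacity restart.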
Adding insult to injury
Internet infrastructure providers typically run multiple services on the same platform — for example, CDN, DDoS protection, and DNS. In a well-designed architecture, these systems typically run in separate, decoupled virtual instances with no cross-dependencies on one another, so that a malfunction in one of them doesn’t bring down the others.
A look at media headlines and incident reports, of course, tells a different, sometimes embarrassing story.
Cross-dependencies that cause devastating chain reactions are fairly common. A bug in one component causes a problem in another, sometimes even circularly: Component 1 causes component 2 to fail, but component 1 also depends on component 2 to keep running. Because such failures are difficult to debug, the consequences can be devastating.
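Circular dependencies of this kind are cheap to detect ahead of time with a simple graph check over the service-dependency map. A minimal sketch (the component names are illustrative, not a description of any vendor’s actual architecture):

```python
# Sketch: find a circular dependency in a service graph via depth-first
# search. `deps` maps each component to the components it depends on.
def find_cycle(deps):
    """Return one cycle as a list of components, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2      # unvisited / in progress / done
    color = {node: WHITE for node in deps}

    def visit(node, path):
        color[node] = GRAY
        for dep in deps.get(node, []):
            if color.get(dep, WHITE) == GRAY:        # back edge: a cycle
                return path[path.index(dep):] + [dep]
            if color.get(dep, WHITE) == WHITE:
                found = visit(dep, path + [dep])
                if found:
                    return found
        color[node] = BLACK
        return None

    for node in deps:
        if color[node] == WHITE:
            cycle = visit(node, [node])
            if cycle:
                return cycle
    return None

# Illustrative graph: CDN needs DNS, DNS needs DDoS defense, which needs CDN.
graph = {"cdn": ["dns"], "dns": ["ddos-defense"], "ddos-defense": ["cdn"]}
print(find_cycle(graph))  # ['cdn', 'dns', 'ddos-defense', 'cdn']
```

Running a check like this against a declared dependency manifest in CI is one way to catch a chain-reaction risk before it ships, rather than during an outage.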
A DDoS defense system that’s caught in such a scenario might fail entirely or partially when simultaneously confronted with an attack.
This is not a simple question of capacity degradation (i.e., how much attack traffic the system can absorb). A degradation of the system’s computational ability to intelligently and flexibly respond to a sophisticated DDoS attack that cycles through an array of tricky DDoS attack vectors and methods is just as dangerous.
The DDoS threat landscape has evolved. Are you prepared?
Although record-breaking DDoS attacks have grabbed media headlines in recent weeks, it is a disturbing fact that the most impactful DDoS attacks we observed over the past 18 months were “smart” attacks that didn’t simply use massive traffic volume.
Instead, these sophisticated attacks aimed to overcome a DDoS defense system’s ability to properly detect and mitigate the malicious traffic. Once successful, these attacks didn’t need massive traffic because they basically managed to unlock and open the floodgate first.
If there is one key takeaway for network security teams, then it is that the old approach to DDoS needs to evolve. Until very recently, DDoS attacks and DDoS defense were seen as a one-upmanship game of capacity.
Defense capacity is still highly relevant in light of the recent record-breaking attacks reported by various companies. However, it is no longer the sole foundation of a successful network security strategy.
Build architectures with enterprise-grade resiliency and availability in mind
Sharing infrastructure and resources with other infrastructure services can be problematic for DDoS protection systems. Effective security requires a clean, sophisticated architecture built from the start with enterprise-grade resiliency and availability in mind.
A dedicated, separate, and single-purpose DDoS infrastructure is an alternative that avoids these problems. This is the approach we take at Akamai.
This approach comes with a few challenges — for instance, it is more difficult to rapidly scale up dedicated defense capacity. However, our focus is (and always has been) on protecting the most critical workloads and brands on the planet, and this architecture provides excellent availability and robustness.
Any system that regularly experiences outages, goes offline, or can be easily overwhelmed by a sophisticated attack is pointless. We aim to achieve 99.99% network connectivity uptime and 100% platform availability for our DDoS solution, both on paper as part of our SLAs and in real-world system uptime. This goal drives our design and architecture.
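To put those availability figures in perspective, a quick back-of-the-envelope calculation of what a given SLA percentage permits in downtime per year:

```python
# Worked example: yearly downtime budget implied by an availability SLA.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (ignoring leap years)

def downtime_budget_minutes(availability_pct):
    """Minutes of allowed downtime per year at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

print(round(downtime_budget_minutes(99.99), 1))  # 52.6 minutes per year
print(downtime_budget_minutes(100.0))            # 0.0 -- no budget at all
```

In other words, 99.99% connectivity uptime leaves less than an hour of slack per year, and a 100% platform availability target leaves none, which is why the architecture, not heroic operations, has to carry the goal.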
As we continue to evolve our platform, we are moving to new, hybrid approaches that we believe promise even better resiliency and uptimes. Stay tuned for more on this.
Nobody is perfect
It’s a universal truth: Every system can and will fail. Outages can hit even the most reliable infrastructure. We know it’s also true for us — we’ve been there.
Although we architect our solutions and services for the highest possible reliability and resiliency, they are also designed to support multivendor, multicloud, on-prem, and hybrid scenarios, so that failover and backup configurations can be realized in almost any combination, including failover across, and to, other vendors.
Our design goal is to provide technology that supports the highest resiliency levels for the most critical applications and workloads, suited for the largest global players in the industry.