The Fastly Incident Shows Why Classic Load Balancing Systems Could Benefit From A Distributed Peer-To-Peer Approach
Why were a large portion of internet sites unavailable to web users?
On Tuesday, June 8th, 2021, internet users experienced something they are not used to: an ‘HTTP Error 503 Service Unavailable’ response. The error appeared globally on a large share of the sites that rely on Fastly, one of the world’s major Content Delivery Networks (CDNs). People trying to reach news sites, marketplaces, and other sites simply could not get through, and quickly took to Twitter in frustration (‘Was this another cybersecurity attack or just a technical failure?’, they wondered publicly). The company’s technical team resolved the incident within the hour, but it left the world wondering what had caused it, and how this sort of incident could be prevented going forward.
But before we dive into this, let’s go through a few quick definitions so that you can better understand the context of what happened, and ways in which load-bearing networks can be improved.
First things first – defining ‘Error 503 responses’, and explaining CDNs
What exactly is an ‘HTTP Error 503 Service Unavailable’ response?
Simply put, it is a server-side HyperText Transfer Protocol (HTTP) error indicating that the server a user is trying to reach is currently unable to handle the request. Users often see this status code when a server is down for maintenance or is overloaded with requests.
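To make this concrete, here is a minimal sketch of how a client might react to a 503. It assumes the server may send the standard Retry-After header with a number of seconds; the function name retry_delay_seconds and the 5-second fallback are illustrative choices, not taken from any particular library.

```python
from typing import Mapping, Optional

SERVICE_UNAVAILABLE = 503  # HTTP status code for "Service Unavailable"


def retry_delay_seconds(
    status: int,
    headers: Mapping[str, str],
    default_delay: float = 5.0,  # assumed fallback, chosen for illustration
) -> Optional[float]:
    """Return how long to wait before retrying, or None if no retry applies.

    Only the "number of seconds" form of Retry-After is parsed here; the
    HTTP-date form simply falls back to default_delay.
    """
    if status != SERVICE_UNAVAILABLE:
        return None
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get("retry-after", "").strip()
    if raw.isascii() and raw.isdigit():
        return float(raw)
    return default_delay


print(retry_delay_seconds(503, {"Retry-After": "120"}))  # 120.0
print(retry_delay_seconds(200, {}))                      # None
```

A client that honors this delay avoids piling more requests onto a server that has just said it is overloaded.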
What are Content Delivery Networks (CDNs), and what went wrong?
Say you are located in the U.S. and are trying to access a Britain-based website. Most people assume they are connecting directly to the server in Britain, but in practice that round trip would create an uncomfortable time lag. For this reason, CDNs keep up-to-date copies of websites in datacenters close to their target audiences, in this case in the U.S.
As for what went wrong, Fastly tweeted that it had “identified a service configuration that triggered disruptions” across its proprietary machine clusters globally.
CDN technology is built around distributing internet traffic across nodes in order to balance load evenly and keep content delivery quick, i.e. to optimize latency.
In short, algorithms direct traffic to the best ‘node candidate’ (either from an availability perspective or a performance perspective) so that end users can reach the site they want as quickly as possible.
But if that always worked as intended, the world would not have experienced a ‘service unavailable’ response in the first place. Let’s explore some options that companies dealing with load distribution challenges can use to improve their network stability.
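The idea of picking the best ‘node candidate’ can be sketched in a few lines. This is a simplified illustration, not Fastly's actual routing logic: the Node fields, the 0.9 load threshold, and the node names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Node:
    name: str
    healthy: bool      # availability perspective
    latency_ms: float  # performance perspective
    load: float        # fraction of capacity in use, from 0.0 to 1.0


def pick_node(nodes: Iterable[Node], max_load: float = 0.9) -> Optional[Node]:
    """Pick the healthy, non-saturated node with the lowest latency."""
    candidates = [n for n in nodes if n.healthy and n.load < max_load]
    if not candidates:
        return None  # nothing can serve the request
    # Prefer the lowest latency; break ties by the lighter load.
    return min(candidates, key=lambda n: (n.latency_ms, n.load))


nodes = [
    Node("new-york", healthy=True, latency_ms=18.0, load=0.95),  # saturated
    Node("chicago", healthy=True, latency_ms=25.0, load=0.40),
    Node("london", healthy=True, latency_ms=80.0, load=0.10),
    Node("dallas", healthy=False, latency_ms=20.0, load=0.20),   # down
]
best = pick_node(nodes)
print(best.name if best else "no node available")  # chicago
```

When no candidate passes the filter, pick_node returns None, which corresponds to the situation where an end user would be shown a 503.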
What lessons can be learned from peer-to-peer data technology?
Much like blockchain technology, peer-to-peer data technology takes a decentralized approach: it enables companies to use real devices provided by real people who actively opt in and are compensated. Millions of user devices effectively become the critical mass of the network, instead of a limited subnet of IPs from a datacenter.
By dividing the ‘load’ among millions of peers, companies can create a network that does not depend on any specific server, because traffic can cross the grid along many different paths. This type of traffic distribution helps keep networks operational, as there is no dependency on any single node. It also eases potential latency issues caused by a target site’s location, by making use of peers in nearby locations.
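The ‘no dependency on any specific node’ property can be illustrated with a simple failover loop. This is a sketch under stated assumptions: the peer names are made up, and demo_fetch is a hypothetical stand-in for a real network call, used only so the example runs on its own.

```python
from typing import Callable, Iterable, Optional, Tuple


def fetch_with_failover(
    peers: Iterable[str],
    fetch: Callable[[str], bytes],
) -> Tuple[Optional[str], Optional[bytes]]:
    """Try each peer in order and return the first successful response."""
    for peer in peers:
        try:
            return peer, fetch(peer)
        except ConnectionError:
            # This peer is unreachable; move on to the next one.
            continue
    # Every peer failed, so the content is unavailable.
    return None, None


def demo_fetch(peer: str) -> bytes:
    """Hypothetical transport: pretend 'peer-a' is offline."""
    if peer == "peer-a":
        raise ConnectionError(f"{peer} is unreachable")
    return f"cached copy from {peer}".encode("utf-8")


served_by, body = fetch_with_failover(["peer-a", "peer-b", "peer-c"], demo_fetch)
print(served_by)  # peer-b
```

The more peers there are in the list, the less likely it is that all of them fail at once, which is the core of the argument for a larger and more distributed pool.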
Let’s further unpack these ideas using two practical components of a network/traffic-based business:
By utilizing a global P2P network, companies can improve their operational maintenance. This can be achieved by performing:
Preemptive cybersecurity: With attacks such as the Colonial Pipeline ransomware incident in mind, companies can have their red teams use global data networks to search for vulnerabilities. For example, an in-house cybersecurity team can use a global network of IPs to map and monitor internet-exposed attack vectors (outside of their firewall).
Operational load-bearing testing: Much like with DDoS preventative testing, teams can use large quantities of IPs from different locations to test for, and find, malfunctions in algorithmic load-bearing capabilities.
On the same note, and more importantly in the context of Fastly, companies looking to decentralize their network systems further may want to consider moving towards a peer-based model. If CDN copies of sites were kept on, and routed through, millions of peer devices in close proximity to end users, it would help relieve pressure on ‘node routing’ by offering far more readily available options.
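As a rough illustration of the load-bearing test described above, the sketch below sends many synthetic client IPs through a routing function and measures how unevenly they land on the nodes. Everything here is an assumption for demonstration: the hash-based route function is a stand-in for whatever algorithm is actually under test, and the IPs are generated from a private range rather than coming from real locations.

```python
import hashlib
from collections import Counter
from typing import List, Sequence


def route(client_ip: str, nodes: Sequence[str]) -> str:
    """Stand-in load balancer: map a client IP to a node by hashing it."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def imbalance(client_ips: Sequence[str], nodes: Sequence[str]) -> float:
    """Ratio of the busiest node's traffic to a perfectly even share.

    1.0 means a perfectly even split; len(nodes) means a single node
    received every request.
    """
    counts = Counter(route(ip, nodes) for ip in client_ips)
    even_share = len(client_ips) / len(nodes)
    busiest = max(counts.get(node, 0) for node in nodes)
    return busiest / even_share


def synthetic_ips(count: int) -> List[str]:
    """Generate distinct placeholder IPs in the private 10.0.0.0/8 range."""
    return [f"10.{(i >> 16) & 255}.{(i >> 8) & 255}.{i & 255}" for i in range(count)]


nodes = ["node-1", "node-2", "node-3", "node-4"]
print(f"varied clients: {imbalance(synthetic_ips(10_000), nodes):.2f}")
print(f"single client:  {imbalance(['10.0.0.1'] * 100, nodes):.2f}")
```

A real test would issue actual requests from many source locations, but the same measurement applies: a ratio that drifts well above 1.0 suggests the balancing algorithm is concentrating traffic on too few nodes.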
The bottom line
Even a one-hour ‘service unavailable’ response can be a scary thing for users and companies alike, and can cause the latter considerable monetary losses. In the name of enhancing current systems, companies stand to benefit from a more distributed peer-to-peer approach.