Cloudflare has revealed the cause behind an outage that took down much of the internet: one bad file.
On Tuesday, an incident at the web infrastructure company meant that many of the world’s biggest websites were knocked offline. Visitors to products such as ChatGPT, X and many other sites instead saw confusing error messages.
It quickly became clear that the problem was at Cloudflare, which provides technologies intended to ensure that websites are able to serve their pages to visitors. A technical error meant that those websites instead stopped working.
Some – including those working at Cloudflare – had speculated that the issue could be the result of a major cyber attack. But the company has now revealed that the problem was internal, and relatively simple.
Cloudflare has a “feature file” that keeps its systems updated on the possible threats they need to defend against. But a change in the company’s systems meant that its database was writing duplicate entries into that file, which rapidly doubled in size and was then sent to all of the machines in Cloudflare’s network.
The software that runs on those machines tried to read that file. But the software has a limit on how large the file can be, which meant that it failed, and so did a host of apparently unconnected websites.
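The failure described above can be sketched in a few lines of Python. This is only an illustration of the general pattern, not Cloudflare's actual code: the function names, the entry format and the limit of 200 entries are all assumptions made for the example.

```python
# Hypothetical limit, chosen only for illustration.
MAX_FEATURES = 200


class FeatureFileTooLarge(Exception):
    """Raised when a feature file exceeds the loader's fixed capacity."""


def build_feature_file(rows):
    """Write one line per database row, with no check for duplicates."""
    return "\n".join(rows)


def load_feature_file(contents, limit=MAX_FEATURES):
    """Parse the file, refusing anything beyond the fixed limit."""
    features = [line for line in contents.splitlines() if line]
    if len(features) > limit:
        raise FeatureFileTooLarge(
            f"{len(features)} entries exceeds limit of {limit}"
        )
    return features


# A normal file of 120 entries loads without trouble.
rows = [f"feature_{i}" for i in range(120)]
normal_file = build_feature_file(rows)
loaded = load_feature_file(normal_file)

# If the database returns every row twice, the file doubles to 240
# entries, crosses the limit, and the loader fails.
doubled_file = build_feature_file(rows + rows)
try:
    load_feature_file(doubled_file)
    loader_failed = False
except FeatureFileTooLarge:
    loader_failed = True
```

In this sketch nothing is wrong with any individual entry. The loader fails purely because the file is larger than it was built to accept, which is the kind of failure the company described.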
Cloudflare had initially thought that it was under a cyber attack but eventually identified the problem. It was then able to correct the file, and the issue was fully fixed around six hours after it began.
Matthew Prince, Cloudflare’s chief executive, said that the company was “sorry for the impact to our customers and to the Internet in general”. “Given Cloudflare’s importance in the Internet ecosystem any outage of any of our systems is unacceptable,” he wrote in a post explaining the problems.
“That there was a period of time where our network was not able to route traffic is deeply painful to every member of our team. We know we let you down today.”
© Independent Digital News & Media Ltd