Cloudflare went down

So Alexa shat the bed putting an audiobook on for my kids.

It worked in that it started playing, but I got no status in the app, so I had to go and turn it off manually. (You know, because the piece of **** doesn’t support timers.)

And now I get to my Kindle and I'm met with a blank white screen until I reboot it.
 
Sooo. Apparently.

Cloudflare's outage was caused by a single oversized configuration file used for bot and threat-related blocks.

The file, which is auto-generated from live threat intel, grew far beyond its expected size. When the system reloaded it during routine ops, the bot management service crashed, triggering 500 errors across the global CDN.

Surprisingly, it apparently was not a DNS issue (color me surprised).
Just one config file that got too big and took down the interwebs for hours.
eepyware on X
 
The file, which is auto-generated from live threat intel, grew far beyond its expected size. When the system reloaded it during routine ops, the bot management service crashed, triggering 500 errors across the global CDN.

They mentioned a "latent bug" as well, presumably referring to the bad handling of the oversize file.

But the deeper problem is the architecture that makes it possible for this to take the whole service down.

If this does turn out to be the root cause it won't be all that dissimilar to the CrowdStrike incident. Automated global updates couple the system together and create long-tail black swan type fragility.
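
To make the "bad handling of the oversize file" point concrete, here's a rough sketch of the kind of guard I mean: validate a freshly generated file before swapping it in, and keep serving the last known good config if it fails. This is not Cloudflare's code. The names (`BotConfig`, `parse_features`, `MAX_FEATURES`), the JSON format and the limit of 200 are all made up for illustration, and I'm using an entry count as a stand-in for "size".

```python
import json

# Hypothetical limit; the real limit is not stated anywhere in this thread.
MAX_FEATURES = 200


class ConfigRejected(Exception):
    """Raised when a freshly generated config fails validation."""


def parse_features(raw, max_features=MAX_FEATURES):
    """Parse a JSON list of entries and reject it if it is too large."""
    features = json.loads(raw)
    if not isinstance(features, list):
        raise ConfigRejected("expected a JSON list of entries")
    if len(features) > max_features:
        raise ConfigRejected(
            f"{len(features)} entries exceeds limit of {max_features}"
        )
    return features


class BotConfig:
    """Holds the active config and only replaces it with a validated one."""

    def __init__(self, initial):
        self.active = list(initial)
        self.last_error = None

    def reload(self, raw, max_features=MAX_FEATURES):
        """Try to load a new config; on failure keep the old one and report it."""
        try:
            candidate = parse_features(raw, max_features)
        except (ConfigRejected, json.JSONDecodeError) as exc:
            # Do not crash the service: record the error, keep last known good.
            self.last_error = str(exc)
            return False
        self.active = candidate
        self.last_error = None
        return True
```

With something like this, a bad file results in `reload()` returning `False` and an error to alert on, instead of a crashed service. It doesn't fix the deeper coupling problem of pushing the same file everywhere at once, but it limits the blast radius of one bad file.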
 
Bro, disabling proxying, WTF!? It's exactly because of what clients are willing to pay us that we don't have the resources to handle the bulk of rogue internet requests, bots and DDoS attacks, which is why we resort to Cloudflare in the first place. I handle many high-profile sites and they all use CF for some sort of bot and DDoS protection. I asked what you would recommend in place of CF for redundancy, but you don't seem to comprehend why people would resort to CF in the first place. Facepalm.
If Cloudflare is broken, then what do you have to lose? Would it be acceptable to your clients to just do nothing and blame Cloudflare?

CrowdSec could help as a redundant system that you could fall back on.
 
The following sites all appear to be down:

Shopify (and all Shopify sites)
Yoco
Vodacom
Downdetector
Takealot
News24
Clicks
Woolworths
FirstShop
Sage (Some services)
Crocs
Netwerk24
ENCA
EWN
LinkedIn
 