Cloudflare, Microsoft 365 suffer major outages

Two major IT providers suffered service problems this morning, causing CIOs and CISOs hours of grief.

A huge outage hit more than a dozen data centers belonging to content delivery network provider Cloudflare, disrupting a large number of major websites. It began around 2:34 a.m. Eastern time, and the company reported it resolved about an hour and a half later. Ironically, the problem was caused by a change Cloudflare made to increase its resiliency.

Meanwhile, the cloud-based Microsoft 365 service also reported outages. Around 6 a.m. Eastern the company tweeted that it was investigating complaints that some users were experiencing delays or connection issues when accessing the Exchange Online service. That expanded to the realization that multiple Microsoft 365 services were experiencing delays, connection and search issues. The fault was in the traffic management infrastructure “not working as expected,” the company said around 8 a.m. Eastern. “We’ve successfully rerouted traffic, and we’re seeing an improvement in service availability.”

In a blog post this morning, Cloudflare officials said traffic in 19 of its data centers was affected. Unfortunately, those 19 locations handle a significant proportion of its global traffic.

“This outage was caused by a change that was part of a long-running project to increase resilience in our busiest locations,” officials said. “A change to the network configuration in those locations caused an outage which started at 06:27 UTC. At 06:58 UTC the first data center was brought back online and by 07:42 UTC all data centers were online and working correctly.”

“We are very sorry for this outage. This was our error and not the result of an attack or malicious activity.”

Over the last 18 months Cloudflare has been trying to convert all of its busiest locations to a more flexible and resilient architecture, the company said.
A critical part of this new architecture, which is designed as a Clos network, is an added layer of routing that creates a mesh of connections. This mesh allows Cloudflare to easily disable and enable parts of the internal network in a data center for maintenance or to deal with a problem.

Like other IT networks, Cloudflare uses the Border Gateway Protocol (BGP). As part of this protocol, operators define policies that decide which prefixes (collections of adjacent IP addresses) are advertised to peers (the other networks they connect to), or accepted from peers. These policies have individual components, called terms, which are evaluated sequentially. The end result is that any given prefix will either be advertised or not advertised. A change in policy can mean a previously advertised prefix is no longer advertised, known as being “withdrawn,” and those IP addresses will no longer be reachable on the Internet.

While Cloudflare was deploying a change to its prefix advertisement policies, a re-ordering of terms caused the withdrawal of a critical subset of prefixes, which triggered the outage.

The post Cloudflare, Microsoft 365 suffer major outages first appeared on IT World Canada.
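
To illustrate why the order of policy terms matters, here is a minimal Python sketch of sequential, first-match policy evaluation. It is not Cloudflare's configuration or any router vendor's syntax: the `Term` class, the term names, the default-reject behavior and the prefixes (taken from address ranges reserved for documentation) are all assumptions made for the example.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network, ip_network
from typing import Callable, List


@dataclass(frozen=True)
class Term:
    """One hypothetical policy term: a match test plus an accept/reject action."""
    name: str
    matches: Callable[[IPv4Network], bool]
    accept: bool


def is_advertised(policy: List[Term], prefix: IPv4Network) -> bool:
    """Evaluate terms in order; the first term that matches decides the outcome."""
    for term in policy:
        if term.matches(prefix):
            return term.accept
    return False  # assumption: a prefix matching no term is not advertised


def withdrawn(old: List[Term], new: List[Term],
              prefixes: List[IPv4Network]) -> List[IPv4Network]:
    """Prefixes advertised under the old policy but not under the new one."""
    return [p for p in prefixes
            if is_advertised(old, p) and not is_advertised(new, p)]


# Hypothetical prefixes from documentation-only ranges.
CRITICAL = ip_network("192.0.2.0/24")
PREFIXES = [CRITICAL, ip_network("198.51.100.0/24")]

accept_critical = Term("accept-critical", lambda p: p.subnet_of(CRITICAL), True)
reject_rest = Term("reject-the-rest", lambda p: True, False)

before = [accept_critical, reject_rest]  # critical prefix matches first: advertised
after = [reject_rest, accept_critical]   # catch-all reject now matches first

print(withdrawn(before, after, PREFIXES))
```

Both policies contain exactly the same terms. Only their order differs, yet under the second ordering the catch-all reject term matches before the accept term is ever reached, so the critical prefix stops being advertised.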
Howard Solomon
Currently a freelance writer, I'm the former editor of ITWorldCanada.com and Computing Canada. An IT journalist since 1997, I've written for several of ITWC's sister publications including ITBusiness.ca and Computer Dealer News. Before that I was a staff reporter at the Calgary Herald and the Brampton (Ont.) Daily Times.
