Integrity360 - Commentary on CrowdStrike outage

  • CrowdStrike has had a catastrophic error that has taken a large share of the world's IT systems offline. On the one hand, it has shown how large CrowdStrike’s market share is; on the other, it has shown how fragile the interconnected world we live in can be.

    This issue has grounded airlines, halted broadcasters and taken channels offline, and, at the most critical end, severely impacted emergency services. In this instance a small change has led to a huge global impact, and the questions will be how and why it happened. CrowdStrike were very bullish in their mission statement: "We Stop Breaches". Unfortunately, this time, they've created the outage.

    The CrowdStrike ecosystem revolves around a single agent that delivers their portfolio of security solutions and operates permanently online, connected to their SaaS-based management platform. In a world where threats are constantly evolving, and we need to move quickly and often to counter them, this approach really works and has become the industry norm. Updates are delivered directly to the endpoint agents as they become available, ensuring systems have the real-time protection they need. The downside, and what has happened with CrowdStrike today, is that a bad update can have wide-ranging ramifications.

    In this specific case, CrowdStrike pushed what it refers to as a Channel File to all CrowdStrike agents, as it does all the time. The file would likely have contained updates to their threat detection definitions. When processed by the CrowdStrike agent running on a Windows device, the file causes the agent to crash, triggering a Blue Screen of Death (BSOD) and restarting the machine. Unfortunately, the agent processes this file during system boot, so the system crashes again and the process repeats, creating a restart loop.

    The fix itself is relatively trivial: once the agent is online, it will simply download the corrected Channel File. The challenge is getting it online, and this is no small feat. Because the system crashes and reboots before it gets online, in many cases the fix requires manual intervention: users will need to enter a special administrative mode before the system boots and use the command line to search for and delete the file. As many users aren't IT experts, or aren't old enough to remember the days of MS-DOS, this will be entirely new to them, and a nightmare for IT teams to orchestrate.
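
    The "search for and delete the file" step can be sketched roughly as follows. This is a minimal illustration and not a remediation tool. The directory and file name pattern are not given in this commentary; they are assumptions taken from CrowdStrike's publicly issued workaround for the incident, and the function names are hypothetical. In practice the step is carried out by hand at a recovery command prompt, where Python is unlikely to be available.

```python
from pathlib import Path

# Assumptions (not stated in the article): directory and file name pattern
# as given in CrowdStrike's publicly issued workaround for this incident.
DEFAULT_DRIVER_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
FAULTY_PATTERN = "C-00000291*.sys"


def find_faulty_channel_files(driver_dir: Path, pattern: str = FAULTY_PATTERN) -> list[Path]:
    """Return the files in driver_dir whose names match the pattern, sorted."""
    if not driver_dir.is_dir():
        return []
    return sorted(p for p in driver_dir.glob(pattern) if p.is_file())


def remove_faulty_channel_files(driver_dir: Path, dry_run: bool = True) -> list[Path]:
    """Report the matching files; delete them only when dry_run is False."""
    matches = find_faulty_channel_files(driver_dir)
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Dry run by default: only report what would be removed.
    for path in remove_faulty_channel_files(DEFAULT_DRIVER_DIR, dry_run=True):
        print(f"would delete: {path}")
```

    The dry-run default is a deliberate choice for the sketch: anything that deletes files from a driver directory should show what it matched before it removes anything.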

    But, unfortunately, it gets worse. Best security practice is to encrypt the data stored on a system at the hard drive level. This prevents data being extracted directly from the drive should it be stolen but, importantly, it also protects the boot process. The knock-on effect is that, on systems using the drive encryption provided as part of Microsoft Windows, reaching the administrative mode means putting the system into recovery mode, which may require a recovery key unique to that system before the fix can be applied. The recovery process is going to be long, and it will really test IT teams and the resilience of organisations.

    Questions need to be asked about how this has happened. Is this the product of agile, CI/CD (Continuous Integration/Continuous Delivery) software development? If you're introducing an update, even to an external file, should it not have been thoroughly tested through a QA process? The widespread impact calls this into question. Or, if it did go through testing and QA, was the file subverted further along in the process by a threat actor? There's absolutely no evidence currently that this is the case, but we only need to look at the SolarWinds breach to see that this has happened in the past. These are all questions CrowdStrike will need to answer over the coming days and weeks.

    Lastly, we should also think about the current approach to security, and the unification of technologies and vendors. The move to XDR, for example, is putting a lot of eggs in a single basket. Will the impact of this incident cause the more risk-averse organisations to distribute their security controls and risk across more vendors and to segment areas of the business? Potentially, but all organisations will now be acutely aware of the trust they put in vendors, and of their own recovery plans. CrowdStrike haven't been alone in this, with Microsoft only this week confirming a major outage for Microsoft 365 caused by a configuration change. It is a not so gentle reminder that Availability is just as important as Confidentiality and Integrity in the CIA security triad.
