Win back lost trust by working smarter!

In a typical enterprise, a division of responsibilities is codified: an IT team runs IT systems and a security team operates security systems. Security systems pose little risk to IT systems until security tools run on end-user devices and servers, or act as active elements in the network. (Firewall admins will agree with me: they get lots of unwarranted grief from IT teams claiming that “the firewall is slowing things down”.)

Among the security tools that can affect IT-managed systems are anti-malware kernel-hooked drivers. As cyber threat actors improve their attacks, so do the capabilities of anti-malware tools. To perform their function efficiently, these tools are allowed privileged access to the deeper levels of operating systems and applications. That is where the technical, responsibility and incident management issues arise. To resolve these, IT and security teams must work together, not against each other.

Take a security tool that requires a piece of software (agent/service/kernel driver) to run on IT-managed systems, be they end-user computers or servers. The security team cannot and should not demand that the IT team install this software on their systems, blindly trusting the security team that “this software is safe”.

Instead, the IT team should insist on a sound justification and on performance impact testing. An assessment should also be made of how these tools, managed by the security team, affect the Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs) contracted between the IT team and the rest of the business.

Unfortunately, based on my experience and on the analysis of the biggest IT incident caused by a security company to date, many enterprises, even in regulated industries, have failed to do just that.

You might recall the businesses that were unable to resume normal operations even days after CrowdStrike distributed a faulty channel update ¹ and released a fix a few hours later.

Take Delta Air Lines as an example. While all other US airlines restored their operations within two days of the fix being made available, Delta was unable to operate for five days. As per the CrowdStrike blog post, the blame for not restoring in due time is shared between CrowdStrike and Delta’s IT and security teams.

While I am not advocating for a reduction of CrowdStrike’s share of the blame, I argue that the failure to resume operations once the fix was available represents a failure of the IT and security teams in the affected organisations.

IT teams’ primary objective is to deliver business value by making sure the necessary IT systems are available and performing within agreed parameters. Security teams’ primary objective is to reduce the probability of material impact due to a cyber event. The CrowdStrike incident was not a cyber event. It was an IT event caused by a security vendor. Similar events happen due to Microsoft blunders every year.

Inevitably, the lack of preparedness to restore normal operations within agreed RTOs and RPOs tarnishes the reputation of both IT and security teams in the eyes of executives in other business functions.

The lost trust and reputation are difficult to regain. As an industry ² we need to learn from this and work smarter.

The following are three lessons learned from this era-defining incident:

1. Focus on testing recovery against the agreed RTOs and RPOs. Security teams should insist that the IT team perform recovery testing that covers scenarios where a security tool makes the operating system non-bootable.
2. CIOs and CISOs should jointly talk to the rest of the business executives, explaining the need for specialised security tooling and giving assurance that tested recovery falls within the agreed parameters (e.g. RTOs and RPOs).
3. Engage with the company’s legal counsel and procurement to review the security vendors’ contracts and identify unfair advantages that vendors have embedded in them regarding compensation for faults in their service delivery.
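
To make the first lesson concrete, here is a minimal sketch of how the result of a recovery drill could be checked against agreed objectives. Everything in it is an assumption for illustration: the class and function names, the system name and the figures are hypothetical, not taken from any real contract or tool.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjective:
    """Agreed targets for one system (hypothetical structure)."""
    system: str
    rto: timedelta  # maximum tolerated time to restore service
    rpo: timedelta  # maximum tolerated window of lost data


@dataclass(frozen=True)
class DrillResult:
    """What was measured during one recovery drill."""
    system: str
    scenario: str
    time_to_restore: timedelta
    data_loss_window: timedelta


def evaluate_drill(objective: RecoveryObjective, result: DrillResult) -> list[str]:
    """Return a list of breached objectives; an empty list means the drill passed."""
    if objective.system != result.system:
        raise ValueError("objective and drill result refer to different systems")
    breaches = []
    if result.time_to_restore > objective.rto:
        breaches.append(
            f"RTO breached: restored in {result.time_to_restore}, agreed {objective.rto}"
        )
    if result.data_loss_window > objective.rpo:
        breaches.append(
            f"RPO breached: lost {result.data_loss_window} of data, agreed {objective.rpo}"
        )
    return breaches


# Illustrative figures only: a drill where a security agent leaves the OS non-bootable.
objective = RecoveryObjective("booking-servers", rto=timedelta(hours=4), rpo=timedelta(minutes=15))
drill = DrillResult(
    system="booking-servers",
    scenario="security agent update makes the OS non-bootable",
    time_to_restore=timedelta(hours=9),
    data_loss_window=timedelta(minutes=5),
)
for breach in evaluate_drill(objective, drill):
    print(breach)
```

A report like this, produced after every drill, gives the CIO and CISO something factual to bring to the joint conversation described in the second lesson.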


¹ https://en.wikipedia.org/wiki/2024_CrowdStrike_incident
² Even though cyber security is not officially recognised as an industry: https://commonslibrary.parliament.uk/research-briefings/cbp-8353/ & https://www.ibisworld.com/united-kingdom/list-of-industries/