August 1, 2024

Jason Keogh

The Largest IT Outage in History: Lessons Learned

On July 19th, 2024, an update by CrowdStrike triggered what some have called “The largest IT outage in history.” The CrowdStrike IT outage will long be remembered for the impact it had across global enterprises, airports, emergency services, and more.

However, updates that cause issues happen every day. And while this CrowdStrike IT outage was perhaps the largest in history, I think “to date” should be appended to that moniker. Why? Because there’s always a chance there could be an even worse outage just around the corner!

During the outage, I spoke with several customers who used the 1E Client to safeguard operations and ensure employees could work. After all, there’s no larger negative impact on DEX and productivity than a blue screen of death. Other customers I spoke to needed assistance getting machines out of blue screen.

Remember, you can hope for the best, but should always be prepared for the worst. There are some simple preparatory steps to make this much easier to do.

Here are my top 3 lessons from this outage: things that all 1E customers can do to make sure they're ready for the next outage, no matter how big or small it is.

Always have centralized backups of the BitLocker recovery key for each device
Ensure you can use the 1E Client, even in a safe mode boot
Prepare to equip users to fix problems, for when you can’t remote into systems

Let’s dive into each of these in turn.

Always have centralized backups of the BitLocker recovery key for each device

This is a piece of IT hygiene that any company using BitLocker should already be on top of. However, as the outage showed, many weren't prepared.

There's a simple 1E Automation, which:

Generates the BitLocker recovery key,
Stores it in the 1E Platform as a persistent device tag, so that it is available and easy to find even if a machine is offline, and
Stores the same string in Active Directory, in case that’s where people look for it during a crisis. There’s no harm in being over-prepared: two backup locations are better than one!
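As a rough illustration of the steps above, here is a minimal Python sketch. It is not the 1E Automation itself, which this post does not show. It assumes the recovery password is read from the text output of Windows' built-in `manage-bde -protectors -get C:` command, and it relies on the fact that a BitLocker recovery password is 48 digits in eight hyphen-separated groups of six. The `sinks` callables are hypothetical stand-ins for "write a device tag" and "write to Active Directory".

```python
import re

# A BitLocker recovery password is 48 digits: eight groups of six, hyphen-separated.
RECOVERY_PASSWORD = re.compile(r"\b\d{6}(?:-\d{6}){7}\b")
# Each key protector is listed with a GUID such as "ID: {XXXXXXXX-XXXX-...}".
PROTECTOR_ID = re.compile(r"ID:\s*(\{[0-9A-Fa-f-]{36}\})")


def extract_recovery_passwords(manage_bde_output: str) -> dict[str, str]:
    """Map protector ID -> recovery password found in the command output.

    Assumption: each password appears after the ID line of its own protector,
    which is how `manage-bde -protectors -get` normally lays out its output.
    """
    found = {}
    last_id = None
    for line in manage_bde_output.splitlines():
        id_match = PROTECTOR_ID.search(line)
        if id_match:
            last_id = id_match.group(1)
            continue
        pw_match = RECOVERY_PASSWORD.search(line)
        if pw_match and last_id is not None:
            found[last_id] = pw_match.group(0)
    return found


def back_up_everywhere(device: str, passwords: dict[str, str], sinks: dict) -> list[str]:
    """Send every password to every sink and return a list of failures.

    `sinks` maps a label to a callable(device, protector_id, password).
    These callables are hypothetical placeholders, not real 1E or AD APIs.
    """
    failures = []
    for name, store in sinks.items():
        try:
            for protector_id, password in passwords.items():
                store(device, protector_id, password)
        except Exception as exc:  # keep going: the other location may still succeed
            failures.append(f"{name}: {exc}")
    return failures
```

Writing to both locations independently means a failure in one does not stop the other, which is the point of having two backups.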

Ensure you can use the 1E Client, even in a safe mode boot

Safe mode is a fantastic tool. It loads only the essential services and software when Windows boots. This means that when, for example, a service installed on your machine receives a bad update, you can still boot the machine to repair the issue.

There's a 1E Automation that allows you to set which version of Safe Mode you want the 1E Client to work in. We default this to Safe Mode with Networking and suggest that’s the one you use.

It’s critical that you do not enable the 1E Client in both versions of safe mode (with networking and minimal). To guarantee this, the automation will disable one when you enable the other. If you set Windows to run the 1E Client in Safe Mode with Networking, it'll make sure it isn't enabled for Safe Mode minimal, and vice versa. Allowing startup in only one type of safe mode means that if there's an issue with the 1E Client, you can get to a safe mode that doesn't start the 1E Client.
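The "enable one, disable the other" rule can be sketched in a few lines of Python. This is an illustration under assumptions, not the 1E Automation: it relies on the Windows convention that services allowed to start in safe mode are registered under the `SafeBoot\Minimal` and `SafeBoot\Network` registry keys, and `ExampleAgentService` is a hypothetical service name. The function only plans the changes; it does not touch the registry.

```python
SAFEBOOT_ROOT = r"HKLM\SYSTEM\CurrentControlSet\Control\SafeBoot"
MODES = {"Minimal", "Network"}


def plan_safe_mode_change(service: str, enable_in: str) -> list[tuple[str, str]]:
    """Return the registry actions that enable `service` in exactly one safe mode.

    Enabling one mode always pairs with removing the other, so there is
    always a safe mode in which the service does not start.
    """
    if enable_in not in MODES:
        raise ValueError(f"enable_in must be one of {sorted(MODES)}")
    (other,) = MODES - {enable_in}
    return [
        ("create", "\\".join((SAFEBOOT_ROOT, enable_in, service))),
        ("delete", "\\".join((SAFEBOOT_ROOT, other, service))),
    ]


if __name__ == "__main__":
    for action, key in plan_safe_mode_change("ExampleAgentService", "Network"):
        print(action, key)
```

Because the two actions are always produced together, no caller can enable both modes by accident.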

During the CrowdStrike IT outage, you could boot to Safe Mode with Networking. If you had set the 1E Client to work in this mode, you could then issue an instruction to rename the offending update files. This resolves the issue across all machines in seconds.
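To make the "rename the offending update files" step concrete, here is a minimal Python sketch of what such an instruction might do on each device. The function name, the `.disabled` suffix, and the parameters are illustrative assumptions; the actual 1E instruction is not shown in this post.

```python
from pathlib import Path


def rename_matching_files(directory: Path, pattern: str, suffix: str = ".disabled") -> list[Path]:
    """Rename every file in `directory` that matches `pattern` by appending `suffix`.

    Renaming rather than deleting keeps the files available for later analysis.
    Returns the new paths so the caller can report what changed.
    """
    renamed = []
    # sorted() collects all matches before any rename happens.
    for path in sorted(directory.glob(pattern)):
        if not path.is_file():
            continue
        target = path.with_name(path.name + suffix)
        path.rename(target)
        renamed.append(target)
    return renamed
```

For the July 2024 incident, CrowdStrike's public remediation guidance (not this post) pointed at files matching `C-00000291*.sys` in `C:\Windows\System32\drivers\CrowdStrike`, so a call might look like `rename_matching_files(Path(r"C:\Windows\System32\drivers\CrowdStrike"), "C-00000291*.sys")`. Running the function a second time is harmless, because the renamed files no longer match the pattern.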

If you did not have this setting already in place, your users would have to go into the right folder to delete the files by themselves. Regardless of their comfort level in doing so, they may not have had administrative rights that would enable them to take the required action.

This was part of the reason why fully resolving the issue took so long: many days in some organizations.

Having the 1E Client set to be available in Safe Mode with Networking gives you real-time control of remote devices while they are in that state.

Prepare to equip users to fix problems, for when you can’t remote into systems

It’s always the unknown unknowns that get you, so it’s important to prepare for what we can predict. The 1E Client is great at making sure devices are kept in a compliant state, which will deliver the best end-user experience, starting with being available and stable.

If you can't communicate with devices directly, you can still equip users with instructions to "heal" their machines themselves, provided the 1E Client is running. This works even when the devices aren't connected to the network: the 1E Client can run automations in reaction to “Triggers”, even when the device is offline.

Our goal is to ensure that you and your users have the tools available when needed to react to unforeseen circumstances.

For example:

1. Enable users to self-elevate their account to Local Administrator in Windows.

You can make this available in different ways depending on who the user is (persona-based). For technical users (e.g. those in your IT team) you can allow self-elevation on demand with a desktop icon. For non-technical users, you can leave the same resources in place but unadvertised, requiring a password to access. This allows the user to gain access, under guidance, to an elevated command line or to add their account to the local admin group. This is much safer than, for example, having a common local admin user with a common password on all machines. The elevation can be revoked automatically, for example after a set period of time or on log-out, whichever comes sooner.
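The "whichever comes sooner" revocation rule is easy to get subtly wrong, so here is a small Python sketch of the logic. The `ElevationGrant` class and its fields are hypothetical and only model the timing decision; they are not part of the 1E Platform.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ElevationGrant:
    """A temporary local-admin grant that ends at time-out or log-out."""
    user: str
    granted_at: datetime
    max_duration: timedelta
    logged_out_at: Optional[datetime] = None  # set when the user logs out

    def ends_at(self) -> datetime:
        """The grant ends at the earlier of the time limit and the log-out."""
        expiry = self.granted_at + self.max_duration
        if self.logged_out_at is None:
            return expiry
        return min(expiry, self.logged_out_at)

    def is_active(self, now: datetime) -> bool:
        """True only between the grant time and the moment it ends."""
        return self.granted_at <= now < self.ends_at()
```

A scheduled check could call `is_active` and remove the account from the local admin group as soon as it returns `False`.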

2. Have an instruction that reboots a device into safe mode with networking.

Often the challenge is in getting the user started. There are several steps to reboot a machine into safe mode or safe mode with networking. You can simplify this by having an instruction that lets you start the process from anywhere. This will bring one, some, or all devices into safe mode or safe mode with networking. This should be controlled so that approval is needed from someone else before it can be run. All of this is possible in the 1E Platform (by default, such actions need approval by someone else!).
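As a sketch of what such an instruction might run on a device, the Python below builds the standard Windows commands for a one-step reboot into safe mode and refuses to do so without a second person's approval. The `bcdedit` and `shutdown` invocations are standard Windows commands; the approval check is a simplified stand-in for the 1E Platform's own approval workflow, which this post does not detail.

```python
# Run this after repairs, or the device will keep booting into safe mode.
UNDO_SAFE_BOOT = ["bcdedit", "/deletevalue", "{current}", "safeboot"]


def build_safe_mode_reboot(requested_by: str, approved_by: str,
                           with_networking: bool = True) -> list[list[str]]:
    """Return the commands that reboot a Windows device into safe mode.

    Raises PermissionError unless someone other than the requester approved.
    The commands are only built here, not executed.
    """
    if not approved_by or approved_by == requested_by:
        raise PermissionError("needs approval by someone other than the requester")
    mode = "network" if with_networking else "minimal"
    return [
        ["bcdedit", "/set", "{current}", "safeboot", mode],  # flag the next boot
        ["shutdown", "/r", "/t", "0"],                       # restart immediately
    ]
```

Each command list could be passed to `subprocess.run` from an elevated process on the device. Keeping `UNDO_SAFE_BOOT` alongside is deliberate: the safe boot flag persists across restarts until it is removed.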

If you follow these steps, the next time an outage, large or small, comes along, you and your devices will be best prepared to handle whatever may occur.

Jason Keogh

Jason is the VP of Solutions at 1E, where he helps IT leaders leverage the 1E Platform to solve the problems that plague them, whether that’s relating to DEX, the digital workplace, or cyber security. Since joining 1E in 2014, Jason has held leadership roles, including VP of Product and Field CTO. He now leads a team focused on delivering innovative solutions to strategic partners and customers.
A recognized authority in IT Asset Management (ITAM) and Information Security, Jason represents Ireland on several ISO subcommittees and serves as an editor for key ISO standards, including IT Asset Management (19770) and Information Security (27000). He regularly speaks at international conferences on DEX.
Jason’s key areas of interest include AI and Low-Code solutions, IT Asset Management, and the evolving intersection of digital workplace technologies and cybersecurity.
