Avoiding Outages and Preventing Widespread System Failures

Widespread software outages can often be prevented. Resilience – a software program's ability to handle bugs or other unexpected conditions while maintaining critical functionality – is essential given the significant role software plays in the infrastructure of the economy. As with previously discussed security principles, many common types of software flaws can be preemptively addressed through systematic and known processes that prevent or minimize the likelihood of outages. Systematic processes for software resilience include rigorous testing of both code and configuration, incremental rollout procedures, and the development and use of APIs that minimize risk.

Agency staff have acknowledged that resilience can be threatened by lack of competition in critical inputs. Dominant firms pushing maximum adoption of their digital products and services can create single points of failure for entire industries, within and beyond the digital economy. As the number of users relying on the same enterprise software increases, so too can the scale of disruption caused by outages of that software. Every software outage is an opportunity to take a critical look at systems in place to achieve resilience and assess these systems for appropriateness and sufficiency.

Beyond code, configuration and data changes can lead to failures. Failing to treat these changes with the same care and robust processes as code can lead to outages. Software is made up of code: instructions that tell the computer how to operate. But these instructions may reference configuration files or other data, which can also lead to failures. Consider a local ice cream store with a mobile app. The app may contain code for displaying a menu based on a set of items defined in a configuration file (or received from a server). Even if the code for displaying the menu doesn't change, a change to the configuration file could trigger a previously unknown bug (for example, an apostrophe appearing for the first time in "Cookies 'n Cream"), causing the app to crash.
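As a rough illustration of treating configuration with the same care as code, the sketch below validates a menu configuration before the app uses it and falls back to the last known-good menu if validation fails. The JSON layout, the "items" key, and the function names load_menu and apply_config_update are assumptions made up for this example; they do not describe any particular app.

```python
import json


def load_menu(raw_config: str) -> list[str]:
    """Parse a menu config and reject anything the display code cannot handle."""
    data = json.loads(raw_config)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("config must be a JSON object")
    items = data.get("items")
    if not isinstance(items, list) or not items:
        raise ValueError("config must contain a non-empty 'items' list")
    for item in items:
        if not isinstance(item, str) or not item.strip():
            raise ValueError(f"invalid menu item: {item!r}")
    return items


def apply_config_update(current_items: list[str], raw_config: str) -> list[str]:
    """Return the new menu if it validates, otherwise keep the last known-good menu."""
    try:
        return load_menu(raw_config)
    except ValueError:
        # A bad config should degrade to the old menu, not crash the app.
        return current_items
```

Running the same validation in the vendor's test pipeline, against the exact configuration file about to be shipped, would catch many such problems before any customer sees them.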

The principle of treating code, configuration files, and data changes with the same care is especially critical when building software run by banks, hospitals, and transportation services. Software vendors must use appropriate software architecture and processes to guard against unexpected failures from both code and non-code changes.

Everything Everywhere All at Once may be a great movie – but it's a poor strategy for deploying software changes. Even the most rigorous testing environment can't mirror the diversity of real-world computers. When deploying changes to automatically updating software, one common strategy to mitigate this risk is to deploy initially to a small subset of machines, and then roll out to more users once it's confirmed that the smaller subset has continued to function without interruption. It's important to consider this strategy not just for code changes, but also for updates to configuration and data.
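One way to pick the "small subset" is to assign every machine to a stable bucket and raise the cutoff over time. The following is a minimal sketch under assumptions of our own: machines have a string identifier, and the names rollout_bucket and should_receive_update are hypothetical.

```python
import hashlib


def rollout_bucket(machine_id: str) -> int:
    """Map a machine to a stable bucket from 0 to 99 based on a hash of its ID."""
    digest = hashlib.sha256(machine_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def should_receive_update(machine_id: str, rollout_percent: int) -> bool:
    """True if this machine falls inside the current rollout percentage."""
    return rollout_bucket(machine_id) < rollout_percent
```

Because the bucket depends only on the machine's ID, raising rollout_percent from 1 to 10 to 100 only ever adds machines; a machine that already received the update is never dropped from the cohort. The same check can gate configuration and data updates, not just code.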

Auto-updating software can be an important mechanism to ensure that critical security or stability updates get deployed in a timely manner. For security software, it can help ensure companies using the software are running a version able to catch the latest threats. Yet software vendors may unnecessarily risk causing widespread outages by deploying these updates broadly without a mechanism to detect and reverse changes that cause major issues for a high percentage of machines before the software is widely deployed.
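A detect-and-reverse mechanism can be sketched as a loop that widens the deployment one stage at a time and rolls back as soon as the observed failure rate crosses a threshold. Everything here is illustrative: staged_deploy, its callback parameters, and the 1% default threshold are assumptions for this example, and a real system would also wait and collect telemetry between stages.

```python
from typing import Callable, Sequence


def staged_deploy(
    stages: Sequence[int],
    deploy_to: Callable[[int], None],
    failure_rate: Callable[[], float],
    rollback: Callable[[], None],
    max_failure_rate: float = 0.01,
) -> bool:
    """Deploy stage by stage; roll back and stop if failures exceed the threshold."""
    for percent in stages:
        deploy_to(percent)  # push the update to this share of machines
        if failure_rate() > max_failure_rate:
            rollback()  # reverse the change before it spreads further
            return False
    return True
```

With stages such as [1, 10, 50, 100], a faulty update that breaks machines at the 1% stage would be reversed before reaching the remaining 99%.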

Platforms and operating systems that fail to provide resilient APIs risk the stability of the whole system. An API, or application programming interface, is a way for two pieces of software to communicate in a structured way. For example, the local ice cream store's app may use the phone's location API to tell a customer how far they are from the store. A poor API design could mean that a mistake by the app's programmer causes the customer's whole phone to crash, rather than just the app itself. Building APIs that account for human error can help programmers by limiting the damage from mistakes and preventing small issues from becoming larger ones.
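To make the idea of limiting damage concrete, here is a toy sketch of a platform service that delivers location updates to apps through callbacks. The LocationService class and its methods are hypothetical and are not modeled on any real operating system API. The point is only that an error raised inside one app's callback is contained at the API boundary rather than taking down the platform or other apps.

```python
from typing import Callable

Listener = Callable[[float, float], None]


class LocationService:
    """Toy platform API that isolates faults in app-supplied callbacks."""

    def __init__(self) -> None:
        self._listeners: dict[str, Listener] = {}
        self.disabled: set[str] = set()

    def register(self, app_name: str, callback: Listener) -> None:
        # Reject obviously wrong input up front instead of failing later.
        if not callable(callback):
            raise TypeError("callback must be callable")
        self._listeners[app_name] = callback

    def publish(self, lat: float, lon: float) -> None:
        # Iterate over a copy so faulty listeners can be removed safely.
        for app_name, callback in list(self._listeners.items()):
            try:
                callback(lat, lon)
            except Exception:
                # Only the faulty app is cut off; everything else keeps running.
                del self._listeners[app_name]
                self.disabled.add(app_name)
```

In this sketch, an app whose callback raises an error stops receiving updates and is recorded in disabled, while other registered apps continue to receive every location update.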

However, creating resilient APIs is not an excuse to lock competitors out of the access they need to build effective alternatives. In order to be competitive with a platform or operating system vendor's own offerings, some types of software require broad, low-level access (the ability to read detailed information and control aspects of the inner workings of a system). Some might argue that allowing such access to third parties requires companies to sacrifice resiliency, and that they can only have one or the other. But just as competition through interoperability can coexist with privacy and security, so too can interoperability coexist with reliability. When platforms or operating systems build safer versions of these interfaces, they must do so in a way that, to the fullest extent possible, supports, rather than suppresses, the ecosystem of existing and emerging products. Incumbent companies shouldn't use API changes in the name of resilience as an excuse to cut off broad access for competitors, especially not while continuing to retain broad access for their own offerings.

Widespread software outages can lead to substantial consumer injury, including the inability to access critical services and financial loss. Given the enormous impacts on consumers, small businesses, and the public that arise from failures in computer systems, FTC staff will continue to monitor such developments and investigate when warranted. The FTC has long examined companies' computer systems to determine whether proper steps were taken to secure the systems and the consumer data they hold. The FTC’s record investigating computer systems has shown that there are steps companies can take to minimize the likelihood of bad outcomes. While the state of the art in software continues to improve, there are known approaches that systematically and preemptively address many potential issues, either entirely preventing them or dramatically reducing the likelihood of them occurring.

Thank you to the staff who contributed to this post: Simon Fondrie-Teitler, Hannah Garden-Monheit, Wells Harrell, Amritha Jayanti, Erik Jones, Stephanie T. Nguyen, Shaoul Sussman, Grady Ward, Ben Wiseman.
