At the end of last year, I published a blog post on test automation best practice – a topic that has suddenly popped back into the spotlight thanks to the Federal Aviation Administration’s recent weekend disaster. Newly implemented software that was meant to enhance the en-route tracking system and get jetliners to their destinations more quickly instead malfunctioned, causing nearly 1,000 flight cancellations and delays in the space of a few hours on Saturday.
So, apart from several thousand ruined vacations and a lot of public embarrassment, what happened? In situations of this kind, there is generally no single identifiable cause – it's often a series of failures. Every little compromise at each step adds up. Many factors affect the quality of software; one is failing to pay attention to non-functional testing, which covers crucial areas such as performance and resource leakage over time, to mention just two. One important aspect of performance testing is that it can help verify how quickly and gracefully software recovers from a crash or breakdown. In the case of the FAA, it took five miserable hours.
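To make the idea of verifying recovery time concrete, here is a minimal sketch in Python. It is purely illustrative: FlakyService is a hypothetical stand-in for a real system under test, and measure_recovery_seconds is a helper name invented for this example. A real check would crash and poll an actual deployment rather than a toy class.

```python
import time


class FlakyService:
    """Hypothetical stand-in for a system that needs time to restart."""

    def __init__(self, restart_seconds: float) -> None:
        self._restart_seconds = restart_seconds
        self._down_until = 0.0

    def crash(self) -> None:
        # Simulate a crash: the service stays down for restart_seconds.
        self._down_until = time.monotonic() + self._restart_seconds

    def is_healthy(self) -> bool:
        return time.monotonic() >= self._down_until


def measure_recovery_seconds(service, timeout: float,
                             poll_interval: float = 0.01) -> float:
    """Crash the service, then poll until it is healthy again.

    Returns the elapsed seconds, or raises TimeoutError if the service
    has not recovered within `timeout` seconds.
    """
    start = time.monotonic()
    service.crash()
    while time.monotonic() - start < timeout:
        if service.is_healthy():
            return time.monotonic() - start
        time.sleep(poll_interval)
    raise TimeoutError(f"service did not recover within {timeout} seconds")
```

A test built on this could assert that recovery stays under an agreed budget, so that a change which slows recovery fails the build instead of surfacing in production.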

Errors happen – so you must test

The lesson here is that errors happen. Error-free software is a myth. That said, most possible errors can be mitigated through rigorous testing. Automated testing, provided it has been done to a high standard, is a relatively cheap and efficient safeguard for continuously verifying whether a bug has been introduced at any point. Manually retesting everything all the time is too costly to contemplate and is error-prone in itself.
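As a small illustration of an automated check that can run on every change, here is a sketch using Python's built-in unittest module. The function parse_flight_level is a hypothetical example invented for this post, not part of any real system; the point is only that once a behaviour is pinned down in a test, a later change that breaks it is caught automatically.

```python
import unittest


def parse_flight_level(text: str) -> int:
    """Hypothetical helper: convert a flight level such as 'FL350' to feet."""
    if not text.startswith("FL") or not text[2:].isdigit():
        raise ValueError(f"not a flight level: {text!r}")
    return int(text[2:]) * 100


class ParseFlightLevelRegressionTest(unittest.TestCase):
    def test_typical_value(self):
        # FL350 means 350 hundreds of feet.
        self.assertEqual(parse_flight_level("FL350"), 35000)

    def test_rejects_malformed_input(self):
        # Malformed input must fail loudly rather than return a wrong number.
        with self.assertRaises(ValueError):
            parse_flight_level("35000ft")
```

Saved in a file, these tests can be run with "python -m unittest", typically as part of a continuous integration job.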
Best practice dictates that you implement risk-based testing and, most importantly, exploratory testing.
Exploratory testing operates on the basis that the more we explore software as a user, the more likely we are to uncover these types of scenarios – and on this basis, one doesn't even have to be a tester to do exploratory testing. Testing in a simulated live environment is one of the most important safety nets – typically, we do this by "dogfooding" our software.

Why clean code matters

Although it may sound a bit strange, problems can also arise because developers fail to adhere to good coding standards or to write clean (human-readable) code; this can lead to a lot of confusion within a development team when they have to work on each other's code or when someone new joins. Developers may unknowingly (re)introduce bugs at any point, causing a snowball effect and increasing the amount of testing required with every change. Ultimately, testing is key, but none of these steps is a replacement for what we should all be aiming for: good quality code.
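To show what "human-readable" means in practice, here is a small hypothetical Python example. Both functions behave identically; the names, the data shape (pairs of flight ID and delay in seconds) and the 15-minute threshold are assumptions made up for illustration.

```python
# Hard to read: terse names and an unexplained magic number.
def f(d):
    return [x[0] for x in d if x[1] > 900]


# Same behaviour, but the intent is visible at a glance.
DELAY_THRESHOLD_SECONDS = 15 * 60


def delayed_flight_ids(flights):
    """Return the IDs of flights delayed by more than the threshold.

    `flights` is an iterable of (flight_id, delay_seconds) pairs.
    """
    return [
        flight_id
        for flight_id, delay_seconds in flights
        if delay_seconds > DELAY_THRESHOLD_SECONDS
    ]
```

A newcomer asked to change the threshold in the first version has to guess what 900 means; in the second, the constant's name answers the question, which makes an accidental bug less likely.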