If you’ve been reading our content about the importance of processes, you know the kinds of things that happen when standard procedure isn’t followed. Satellites crash, nuclear reactors melt down, and information security disasters ruin huge corporations.
For software companies, failure doesn’t always have such wide implications, but it can mean carelessly shipping a product that creates catastrophic problems for your customers and destroys their trust in you. And of course, it’s a pain in the arse for your development team to fix.
In this article, I’m going to go over some famous software disasters where Quality Control dropped the ball, and look over some common quality control methods.
Quality control mistakes you definitely shouldn’t replicate
It’s always good to learn from the past, and it’s thanks to well-known software disasters that we’ve got tight frameworks for software quality control today.
Let’s take a look at some examples where quality control fell short:
Intel thought a mathematical bug was too rare to bother fixing in 1994
In 1994, a Lynchburg College math professor named Dr. Thomas R. Nicely reported that Intel’s early Pentium processors were returning incorrect values for some division calculations.
“It appears that there is a bug in the floating point unit (numeric coprocessor) of many, and perhaps all, Pentium processors. For example, 1/824633702441.0 is calculated incorrectly (all digits beyond the eighth significant digit are in error).” — Dr. Nicely
For most users, this would never be a noticeable issue. That doesn’t make it any less of an oversight on Intel’s part, though. While trying to triple processing speed, Intel changed the old floating point calculation algorithm to a new version with a lookup table of 1,066 entries. In the first generation Pentiums, however, only 1,061 entries were present…
This error means nothing in all but about one in 9 billion cases, so it passed quality control and went into production. However, when Thomas Nicely publicly complained about the error, Intel had to replace $475 million worth of processors for a bug they claimed they’d noticed months earlier. It was a PR nightmare, and costly to clean up.
The mistake in this case? Thinking that a bug affecting a core part of the software’s functionality is too small to be dealt with.
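A check that exposes this kind of flaw can be sketched in a few lines of Python. This is only a minimal sketch, not Intel’s or Dr. Nicely’s actual test: the function name `division_relative_error` and the tolerance `MAX_RELATIVE_ERROR` are assumptions chosen for illustration. It compares the processor’s floating point result with exact rational arithmetic, using the division from Dr. Nicely’s report.

```python
from fractions import Fraction

# Assumed tolerance: machine epsilon for double precision (2**-52).
MAX_RELATIVE_ERROR = Fraction(1, 2**52)


def division_relative_error(numerator: float, denominator: float) -> Fraction:
    """Relative error of float division compared with the exact quotient."""
    # Fraction(float) captures the exact value the hardware returned.
    computed = Fraction(numerator / denominator)
    # Dividing two Fractions is exact, so this is the true quotient.
    exact = Fraction(numerator) / Fraction(denominator)
    return abs(computed - exact) / abs(exact)


# The division quoted in Dr. Nicely's report.
error = division_relative_error(1.0, 824633702441.0)
division_is_accurate = error <= MAX_RELATIVE_ERROR
```

On a correctly working processor `division_is_accurate` should be true. Going by Dr. Nicely’s report, an affected Pentium was wrong beyond the eighth significant digit for this input, which is far outside the tolerance.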
Microsoft branded paying customers as pirates and locked them out of features
Windows Genuine Advantage (WGA) — better known as ‘that irritating pop-up that tells you your OS is illegal’ — is famously frustrating and inaccurate, but at no point in time did it cause more frustration than on Friday, August 24, 2007.
A bug-infested, untested version of WGA was loaded onto the Windows update server and installed on every PC that updated its operating system during the 19 hours the patch was live.
“The problem began around 8 p.m. Friday, causing users to begin posting messages about it on Microsoft’s forums. Vista’s Aero graphical interface was among the features disabled for users accused of running pirated software, and user frustration reached significant levels before the problem was fixed.” — Katherine Noyes
Such an obvious flaw is bound to make customers question Microsoft’s quality control process, and even wonder why they bother buying Windows at all when they’re labelled as pirates anyway.
This mistake was part ordinary programming error, part lack of quality control, and part pure human error by whoever uploaded the dodgy update file to the servers.
Quality control is the set of processes you have in place to avoid bugs
The simple answer to how you can avoid disasters like these is to have a set of processes in place for rigorously testing software before you ship it.
By examining the different systems, you can work out which would be the best for your company to adopt so you can ship better software faster.
The ‘zero defects’ methodology
Perhaps the simplest high-level way to approach quality control is to implement a ‘zero defects’ methodology. Used by Microsoft after the development of Word turned out to be a nightmare, the methodology doesn’t expect there to be nothing wrong with releases, but puts fixing bugs first above all else.
This works because it eliminates issues later on down the line, and makes sure developers aren’t building new features on a faulty code base. That means some kind of freak software butterfly effect is far less likely to happen. Additionally, code written yesterday is a lot easier for a developer to understand than code written months back, just because it’s top of mind.
Schedule software development around fixing bugs, not building new features. Microsoft’s philosophy has evolved to put bug hunting as the #1 priority. (But somehow something will always go wrong.)
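To make the “bugs before features” rule concrete, here is a minimal sketch of how a backlog could be ordered. It is an illustration only, not Microsoft’s actual tooling: the `WorkItem` class, the `schedule` function and the backlog entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkItem:
    title: str
    is_bug: bool


def schedule(backlog: list[WorkItem]) -> list[WorkItem]:
    """Put every open bug ahead of every feature."""
    # sorted() is stable and False sorts before True, so bugs (key False)
    # come first and items keep their original order within each group.
    return sorted(backlog, key=lambda item: not item.is_bug)


# Hypothetical backlog, in the order the items were filed.
backlog = [
    WorkItem("Add dark mode", is_bug=False),
    WorkItem("Crash on save", is_bug=True),
    WorkItem("Export to PDF", is_bug=False),
    WorkItem("Wrong total on invoice", is_bug=True),
]
ordered_titles = [item.title for item in schedule(backlog)]
```

Here `ordered_titles` lists both bugs before either feature, so no new feature work starts while a known bug is still open.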
Total Quality Management (TQM)
Unlike Microsoft’s specific methods, TQM is a vague framework for development teams that emphasizes the importance of testing, improvement and communication while leaving the exact testing process up to you to decide.
The elements are:
- Root cause analysis
- Active employee participation
- Internal and external self-assessment
- Continuous improvement
- Making well-informed decisions
- Effective communication
Unlike Agile, TQM isn’t a development methodology; it is concerned purely with testing and quality. As well as offering guidelines for process execution, it also has a framework for software process improvement.
For a specific example of a problem solved with TQM, check here to find out how an IT team drastically reduced user downtime.
Plan, Do, Study, Act
When solving any problem, the Plan, Do, Study, Act (PDSA) framework helps you break the issue down into steps, find the root cause, and implement a fix.
It seems simple, but PDSA really is at the heart of not making a fool of yourself in the software world. In fact, you’d be hard pressed to find any framework that doesn’t draw on PDSA.
Here’s the process:
- Plan. Recognize an opportunity and plan a change.
- Do. Test the change. Carry out a small-scale study.
- Study. Review the test, analyze the results and identify what you’ve learned.
- Act. Take action based on what you learned in the study step: If the change did not work, go through the cycle again with a different plan. If you were successful, incorporate what you learned from the test into wider changes. Use what you learned to plan new improvements, beginning the cycle again.
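The cycle above can be sketched as a simple loop. This is a rough sketch under stated assumptions, not a standard implementation: the names `Cycle` and `run_pdsa`, the rule that a lower measurement is better, and the trial numbers are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Cycle:
    plan: str
    measurement: float
    adopted: bool


def run_pdsa(
    plans: Iterable[str],
    trial: Callable[[str], float],
    baseline: float,
) -> list[Cycle]:
    """Run one PDSA cycle per candidate plan and record what happened."""
    history = []
    for plan in plans:                     # Plan: pick a change to try
        measurement = trial(plan)          # Do: run a small-scale test
        improved = measurement < baseline  # Study: compare with the baseline
        if improved:                       # Act: adopt it as the new baseline
            baseline = measurement
        history.append(Cycle(plan, measurement, improved))
    return history


# Made-up numbers: bugs found per release while trialling each change.
trial_results = {
    "add code review": 12.0,
    "add unit tests": 8.0,
    "weekly bug bash": 9.0,
}
history = run_pdsa(trial_results, trial_results.__getitem__, baseline=10.0)
adopted_plans = [cycle.plan for cycle in history if cycle.adopted]
```

With these numbers only "add unit tests" beats the baseline at the moment it is tried, so it is the only plan adopted. The other two are recorded in `history` as lessons for the next round of planning.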
Check here for a guide on applying PDSA specifically to software development.
Quality control processes for software development
I’ve spoken a lot about the general methods of quality control, but now it’s time for a specific process you can adopt.
Later on in this series, I’ll be diving deeper into the methods and processes you need to get started with a proper quality control system for your software company. Anything in particular you’d like to see covered? Let us know in the comments.