Saturday, November 9, 2013

A Bit Flip That Killed?

During my bitsquatting research, I was amazed at how many critical RAM chips in a typical PC lack error correction.

It turns out that ECC memory is missing from something even more critical: cars.

Details from the recent Toyota civil settlement show that the drive-by-wire controls in Toyota cars lacked error detecting and correcting RAM.

Although the investigation focused almost entirely on software, there is at least one HW factor: Toyota claimed the 2005 Camry's main CPU had error detecting and correcting (EDAC) RAM. It didn't. EDAC, or at least parity RAM, is relatively easy and low-cost insurance for safety-critical systems.

I can't fathom why that would ever be the case. The amount of RAM required is relatively small, and the extra cost is inconsequential to the total cost of a car. Oh, and the software runs next to a car engine.
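To make the "low-cost insurance" point concrete, here is a minimal sketch of the two ideas in plain Python: a single parity bit that detects a one-bit flip, and a textbook Hamming(7,4) code that corrects one. The function names are my own, and this is only an illustration of the principle. It is not how Toyota's hardware, or any particular EDAC memory controller, is implemented.

```python
def parity(word: int) -> int:
    """Even-parity bit: 1 if the word has an odd number of 1 bits."""
    return bin(word).count("1") & 1


def flip_bit(word: int, pos: int) -> int:
    """Simulate a single-bit upset at bit position pos (0 = LSB)."""
    return word ^ (1 << pos)


def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit Hamming codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # covers codeword positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers codeword positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # covers codeword positions 4, 5, 6, 7
    # 1-based positions 1..7 hold: p1 p2 d0 p4 d1 d2 d3
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    return sum(b << i for i, b in enumerate(bits))


def hamming74_decode(code: int) -> int:
    """Correct up to one flipped bit, then return the 4 data bits."""
    bits = [(code >> i) & 1 for i in range(7)]
    # XOR of the 1-based positions of all set bits is 0 for a valid
    # codeword, and equals the position of the flipped bit otherwise.
    syndrome = 0
    for i, b in enumerate(bits):
        if b:
            syndrome ^= i + 1
    if syndrome:
        bits[syndrome - 1] ^= 1
    data = [bits[2], bits[4], bits[5], bits[6]]
    return sum(b << i for i, b in enumerate(data))


# Parity: one extra bit per word is enough to notice a single flip.
stored = 0x5A
stored_parity = parity(stored)
corrupted = flip_bit(stored, 3)
flip_detected = parity(corrupted) != stored_parity

# Hamming(7,4): three extra bits per nibble are enough to undo it.
codeword = hamming74_encode(0b1011)
recovered = hamming74_decode(flip_bit(codeword, 5))
```

Parity can only tell you that something went wrong, and it misses any even number of flips. The Hamming code repairs a single flip on its own, but two flips in the same codeword will be silently "corrected" to the wrong value unless an additional overall parity bit is added.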

"We've demonstrated how as little as a single bit flip can cause the driver to lose control of the engine speed in real cars due to software malfunction that is not reliably detected by any fail-safe," Michael Barr, CTO and co-founder of Barr Group, told us in an exclusive interview. Barr served as an expert witness in this case.

Drive-by-wire systems aren't the only critical control systems susceptible to bit errors. There is some speculation that a bit error caused a sudden altitude drop in a Qantas A330. Amazingly, until 2010, airplane software systems did not have to consider single or multiple bit errors to achieve certification (see page 222).