“Why Plan B Often Works Out Badly.” An interesting explanation of risk management from an MSNBC commentator, March 18, 2011. Two quotes follow:
Engineers used to talk about guarding against the “single point of failure” when designing critical systems like aircraft control systems or nuclear power plants. But rarely does one mistake or event cause a catastrophe. As we’ve seen in Japan, disaster is usually a function of multiple mistakes and a string of bad luck, often called an “event cascade” or “propagating failures.”
Defending against and preparing for such event cascades is a problem that vexes all kinds of systems designers, from airplane engineers to anti-terrorism planners. There’s a simple reason, according to Peter Neumann, principal scientist at the Computer Science Lab at SRI International, a not-for-profit research institute. Emergency drills and stress tests aside, Neumann said, there is no good way to simulate a real emergency and its unpredictable consequences. Making matters worse is the ever-increasing interconnectedness of systems, which leads to cascading failures, and the fact that preventative maintenance is a dying art.
Thanks to Anne Strauss for calling it to my attention.
____________________________
Of general interest: today's posting (March 20) by blogger Phil Palin on the Japan disasters.
What a singularly misinformed commentary from Neumann. We are not talking about completeness or computability here. This is a matter of water inundating the EMERGENCY generators, which have always been understood to be the SPOF for a meltdown at this type of plant–if all else fails, the generators must run.
This is a simple problem to solve, not a complicated one. No matter what other complications, connections, or incomprehensible jargon and job-securitizing babble come from the PhDs, you put the generators up high if you want them to make it. Of course, a more robust plan would put some high, some low.
Complicated systems, even those with incompletely understood or implied connections, are best secured one subsystem at a time. Call it encapsulation, or graceful degradation, or fault tolerance (so to speak), but the one thing it is not is complicated.
On a connected note, in my opinion, scenario-based drills and so forth too often are substituted for unit testing. Both are necessary, but only one is visible and satisfying on an executive summary. That’s a more pernicious problem than connectedness.
Claire! An interesting article on long-term recovery in Japan in today’s NY Times.