Japan Disasters — risk management explained

“Why Plan B Often Works Out Badly.” An interesting explanation of risk management from an MSNBC commentator, March 18, 2011. Two quotes follow:

Engineers used to talk about guarding against the “single point of failure” when designing critical systems like aircraft control systems or nuclear power plants. But rarely does one mistake or event cause a catastrophe. As we’ve seen in Japan, disaster is usually a function of multiple mistakes and a string of bad luck, often called an “event cascade” or “propagating failures.”

Defending against and preparing for such event cascades is a problem that vexes all kinds of systems designers, from airplane engineers to anti-terrorism planners.  There’s a simple reason, according to Peter Neumann, principal scientist at the Computer Science Lab at SRI International, a not-for-profit research institute. Emergency drills and stress tests aside, Neumann said, there is no good way to simulate a real emergency and its unpredictable consequences. Making matters worse is the ever-increasing interconnectedness of systems, which leads to cascading failures, and the fact that preventative maintenance is a dying art.

Thanks to Anne Strauss for calling it to my attention.

____________________________

Of general interest: the posting today (March 20) by blogger Phil Palin on the Japan disasters.

2 thoughts on “Japan Disasters — risk management explained”

  1. What a singularly misinformed commentary from Neumann. We are not talking about completeness or computability here. This is a matter of water inundating the EMERGENCY generators, which have always been understood to be the SPOF for a meltdown at this type of plant–if all else fails, the generators must run.
    This is a simple problem to solve, not a complicated one. No matter what other complications, connections, or incomprehensible jargon and job-securitizing babble comes from the PhDs, you put the generators up high if you want them to make it. Of course, a more robust plan would put some high, some low.
    Complicated systems, even those with incompletely understood or implied connections, are best secured one subsystem at a time. Call it encapsulation, or graceful degradation, or fault tolerance (so to speak), but the one thing it is not is complicated.
    On a connected note, in my opinion, scenario-based drills and the like are too often substituted for unit testing. Both are necessary, but only one is visible and satisfying in an executive summary. That’s a more pernicious problem than connectedness.
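The commenter’s two points — diversify the placement of the emergency generators, and test each subsystem in isolation rather than relying only on full-scenario drills — can be sketched in a few lines of code. This is a hypothetical illustration, not anyone’s actual plant model; all class and function names are invented:

```python
class Generator:
    """A hypothetical emergency generator at a given elevation."""
    def __init__(self, elevation_m: float):
        self.elevation_m = elevation_m
        self.running = True

    def flood(self, water_level_m: float) -> None:
        # A generator below the water line fails; one above keeps running.
        if self.elevation_m < water_level_m:
            self.running = False


class BackupPower:
    """Degrades gracefully: power stays available while ANY generator runs."""
    def __init__(self, generators):
        self.generators = list(generators)

    def flood(self, water_level_m: float) -> None:
        for g in self.generators:
            g.flood(water_level_m)

    def available(self) -> bool:
        return any(g.running for g in self.generators)


# Unit tests for just this subsystem -- no full-plant scenario drill needed.
def test_diversified_elevation_survives_flood():
    power = BackupPower([Generator(2.0), Generator(15.0)])  # some low, some high
    power.flood(water_level_m=10.0)
    assert power.available()  # the high generator still runs

def test_all_low_is_a_single_point_of_failure():
    power = BackupPower([Generator(2.0), Generator(3.0)])
    power.flood(water_level_m=10.0)
    assert not power.available()  # every generator was inundated
```

The second test makes the commenter’s SPOF point directly: with all generators sited low, one flood event takes out the entire backup subsystem, and no amount of drilling on the rest of the plant would have caught it.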
