Big disasters will always happen unless we expect them to

[Image: the last flight of Atlantis. Credit: NASA]

Can We Do Better at Managing Rare, Big Risks?
[Via Dot Earth]

Can humans overcome traits that lead to well blowouts and other foreseen disasters?

[More]

No matter how great the process is, humans will act to increase the failure rates unless there are strong incentives not to.

It seems to be a basic part of human nature to slowly become more and more lax about processes that have a very low probability of producing a disaster, even when that disaster could be horrific in extent. When nothing bad happens, people ask why they have to follow procedure. Since the process has worked to prevent a disaster, and since the disaster has a low probability of happening at any one time, the answer is usually unsatisfying, especially when the question is asked by someone who wants to cut costs or is under other pressures.

Basically, people just do not understand probability. They do not understand how their behavior can take a process with a very small probability of failure and substantially increase the chance that something catastrophic will happen.

Good biologists almost intrinsically understand how probabilities can be altered. They know all about the effects that time and large numbers can have on low-probability processes. And not only with evolution.

Almost every biological process is built on stochastic processes, that is, random ones. Some important biological events depend on incredibly small probabilities, but when such an event happens, a whole cascade of actions follows. With enough cells and enough time, even an extremely low-probability event, even a one-in-a-million one, will happen. Sometimes the cascade is beneficial, allowing us to fight off infectious disease. Sometimes it is harmful, creating cancer.

We can do things to reduce the likelihood of harmful biological events, often by reducing the probability that the cascade ever gets started. But when people make decisions that alter the odds, a low-probability event becomes much more likely to happen. Thus smoking so substantially increases the odds of lung cancer that it becomes the single largest cause.

Let’s take a look at some numbers, using well-known mathematics, to understand why some of this happens.

Take solid fuel boosters. According to Wikipedia, these have a 1% failure rate. While a 99% success rate sounds great, doing it enough times produces a very different result. Simple math shows that the chance of having 50 successful launches in a row is 0.99 × 0.99 × 0.99 … fifty times over, or 0.99^50. Using a calculator, we find an overall success rate of about 61%, that is, a failure rate of 1 − 0.61, or 39%. It is even money that there will be at least one failure after 70 launches. One hundred launches drops the chance of all of them succeeding to 37%. The chance of getting through roughly 230 launches without a single failure is less than 10%.
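Here is a quick Python sketch of that arithmetic, for anyone who wants to check it. The only input is the 1% per-launch failure rate quoted from Wikipedia, and the calculation assumes each launch is independent of the others:

```python
def prob_all_succeed(per_launch_success: float, launches: int) -> float:
    """Chance that every one of `launches` independent launches succeeds."""
    return per_launch_success ** launches

for n in (50, 70, 100, 230):
    p = prob_all_succeed(0.99, n)
    print(f"{n:>3} launches: {p:.1%} all succeed, {1 - p:.1%} at least one failure")

#  50 launches: 60.5% all succeed, 39.5% at least one failure
#  70 launches: 49.5% all succeed, 50.5% at least one failure
# 100 launches: 36.6% all succeed, 63.4% at least one failure
# 230 launches:  9.9% all succeed, 90.1% at least one failure
```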

The more times you do an event, the greater the chance that at least one event will result in failure.

The Challenger tragedy happened on the 25th mission of the shuttle, and each mission flies two solid rocket boosters. The chance of a catastrophic solid rocket booster failure occurring somewhere in the first 25 missions of the Space Shuttle (50 solid rocket boosters) is thus 39%. Unfortunately, that particular shuttle mission did not beat the odds. But the odds were that some mission was going to fail catastrophically, even with a 1% failure rate.

Yes, the chance of any one particular launch failing might be 1%, but the chance of all the missions succeeding drops quite rapidly. The goal is not making sure that one specific mission succeeds, but that all the launches do. If the loss of even one launch would be catastrophic, in personnel or in material, then the best thing to do is really work at pushing the failure rate below 1%. Make it 0.1% and it would take about 2,300 launches before the chance of at least one failure reached 90%.

A success rate of 99.99% would mean that it would take about 23,000 launches before the chance of at least one failure was greater than 90%.
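Turning the formula around gives the number of launches it takes to reach a given cumulative chance of failure. A sketch, using the same illustrative failure rates as above rather than any real engineering data:

```python
import math

def launches_until(failure_rate: float, cumulative: float = 0.90) -> float:
    """Launches needed before P(at least one failure) reaches `cumulative`."""
    return math.log(1 - cumulative) / math.log(1 - failure_rate)

for rate in (0.01, 0.001, 0.0001):
    print(f"failure rate {rate:.2%}: ~{launches_until(rate):,.0f} launches "
          f"to a 90% chance of at least one failure")

# failure rate 1.00%: ~229 launches to a 90% chance of at least one failure
# failure rate 0.10%: ~2,301 launches to a 90% chance of at least one failure
# failure rate 0.01%: ~23,023 launches to a 90% chance of at least one failure
```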

So, even with just a 1% failure rate, it is almost certain that if you fly enough missions, there will be a failure. And while reducing the failure rate can have huge effects, given enough time and a large enough enterprise, disasters will still occur.

Let’s do some more math. I’m making up some of these numbers just to give you an idea of how large numbers and time can change the odds. Say there are 6,000 offshore wells around the world, and that the chance any one of them has a catastrophic blowout, like the one we are seeing in the Gulf, in any one year is 0.000001, a failure rate of one in a million. Then the chance of at least one failure worldwide in any given year works out to about 0.6%. Not quite as thrilling as one in a million, but still pretty low. I mean, we ran the entire Space Shuttle program for almost 30 years with worse odds than that.

But that is only over one year. Doing the math, there is only about a 55% chance of complete success over a 100-year period. That means that even with one-in-a-million odds of a catastrophic failure on any one offshore rig in any one year, given enough rigs and enough time, a catastrophic failure becomes all but certain.
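The same compounding, applied to the offshore numbers. Remember that both the 6,000-well count and the one-in-a-million rate are the made-up figures from above, not industry data:

```python
WELLS = 6_000                    # assumed number of offshore wells worldwide
P_BLOWOUT_PER_WELL_YEAR = 1e-6   # assumed one-in-a-million yearly blowout odds

p_clean_year = (1 - P_BLOWOUT_PER_WELL_YEAR) ** WELLS
print(f"P(at least one blowout in a given year): {1 - p_clean_year:.2%}")
print(f"P(no blowouts anywhere over 100 years):  {p_clean_year ** 100:.1%}")

# P(at least one blowout in a given year): 0.60%
# P(no blowouts anywhere over 100 years):  54.9%
```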

And that is for a one in a million chance. I don’t know what the odds really are, but I have the feeling that the processes used on the Deepwater Horizon platform were not designed to produce the highest success rates.

In my experience, when a process is developed with a certain failure rate, human beings, unless strictly supervised, will act to cut corners or alter the process in ways that increase that failure rate. As time goes on, they wonder why they have to keep spending so much time on things that do not appear to be really useful. They take shortcuts that create suboptimal conditions. People not directly involved want to know why things cannot be sped up.

With the Shuttle, the failure rate may have been 1%, but some of the decisions made before the Challenger launch most likely increased that percentage. That is, if there is a 1% failure rate under optimal conditions, what does the failure rate become under suboptimal conditions? Well, obviously more than 1%.

From the data we have been seeing on the well in the Gulf, there seems to have been ample evidence that it was not a normal well and that normal odds would not apply. They should therefore have taken even more precautions to prevent the catastrophe that, mathematically, will eventually happen.

But human nature being what it is, they did the exact opposite. They cut corners and tried to bring an expensive well under control while doing even less than the minimal safety tests.

So, in order to reduce catastrophic disasters, we need not only to develop processes that greatly reduce the intrinsic failure rate; we also need to build in all sorts of safeguards against the human pressures that push the failure rate back up.

This is not an area in which to decrease regulation. At least not if we want to prevent catastrophes.