Getting to Yes: The Flight Readiness Review
As its name suggests, a Flight Readiness Review, or FRR, gives teams responsible for various elements of a NASA flight mission an opportunity to ensure technical questions raised at earlier reviews have been adequately dealt with and to raise concerns about anything else that might affect mission success. Typically held about two weeks before a scheduled launch, the reviews gather team members in one meeting room, where they report on their areas of responsibility and, at the end of the session, express their judgment in a go or no-go flight decision. Most often, technical issues that could affect the flight are studied and resolved by engineers before the meeting; their work is reviewed and discussed and the session usually ends in a unanimous go decision.
STS-119, the March 2009 Discovery flight to the International Space Station (ISS), was an exception. Getting to a positive launch decision took three FRRs, including a marathon second session, where frustratingly incomplete technical data led to uncertainty, some disagreement, and, finally, the decision that STS-119 would not be declared ready for flight. This unusual experience vividly demonstrated that the FRR process worked as intended, providing an open forum for voicing and examining concerns about flight safety and success and a focal point for rigorous technical work.
A Broken Valve
On November 14, 2008, as Endeavour rocketed skyward on STS-126, flight controllers monitoring data noted an unexpected hydrogen-flow increase from one of the shuttle’s main engines. Since three flow-control valves (one per engine) work in concert to maintain proper pressure in the hydrogen tank, one of the other valves reduced flow to compensate for the valve that malfunctioned.
Understanding the causes and implications of the failure was essential to the safety of future shuttle missions. Management would have to promote and ensure open communication among the multiple organizations involved in the shuttle program so that all relevant information would be available to decision makers with the responsibility to approve or delay future shuttle flights.
“We knew at least on paper the consequences could be really, really bad, and this could have significant implications for the orbiter fleet and, most urgently, the next vehicle in line. Depending on where the vehicle landed, we wanted to get these inspections done and some X-rays done as quickly as we could,” said John McManamen, chief engineer of the Space Shuttle Program.
Shuttle and ISS program managers preferred launching STS-119 prior to mid-March so it would not interfere with the March 26 mission of the Russian Soyuz to transport the Expedition 19 crew to the ISS. If the launch was delayed until after the Soyuz flight, interdependencies in the schedule would require a reevaluation of other future launches.
STS-126 touched down at Edwards Air Force Base on November 30 after unfavorable weather conditions at Kennedy Space Center led flight controllers to divert the landing to California. This delayed work until December 12, when the shuttle was ferried back to Kennedy aboard a specially equipped 747.
A December 19 X-ray showed evidence of a problem with a poppet, a kind of tapered plug that moves up and down in the valve to regulate flow. Inspection determined that a fragment had broken off, the first time such a problem had occurred during flight, although there had been two similar failures in the early 1990s during testing of a new set of flow-control valves for Endeavour.
There were a total of twelve flight-certified valves in existence: three in each shuttle, and three spares. Simply buying more was not an option—these custom parts had not been manufactured in years, and NASA had shut down its flow-control valve acceptance-testing capability.
With the launch scheduled for February 19, the program scheduled a Flight Readiness Review for February 3. At that review, it quickly became clear that the engineering and safety organizations felt that significant work needed to be done before a sound flight rationale could be established. Steve Altemus, director of Engineering at Johnson Space Center, summarized the knowledge gap from the Johnson engineering community’s point of view: “We showed up at the first FRR and we’re saying, ‘We don’t have a clear understanding of the flow environment; therefore, we can’t tell you what the likelihood of having this poppet piece come off will be. We have to get a better handle on the consequences of a particle release.’” The most important outcome of the meeting was the establishment of new lines of inquiry that could lead to better understanding.
On February 6, the launch was delayed until February 22.
Analysis of the cracked valve showed that the failure resulted from high-cycle fatigue (in which a material is damaged by numerous cycles of stress). This raised several questions. Had STS-126 presented an unusual environment, or was another valve likely to break in normal flight? What would be the worst-case consequences of a break? Engineers needed to determine the probable size and the maximum size of a loose particle, understand how it would move through the propulsion system, and establish what the system could tolerate without experiencing a potentially catastrophic rupture in its lines.
Teams worked on the problem from multiple angles, including materials, structural dynamics, computational fluid dynamics (CFD), and fracture mechanics. Initial efforts relied on visual inspection and nondestructive evaluation (NDE) techniques, including scanning electron microscopy. The microscopes could see small cracks only after the poppet was polished, however, and polishing invalidated the flight certification of the hardware. “A polished poppet could upset the flow balance of the valve, rendering it unusable for flow management. In this case the valve could get stuck in the high- or low-flow positions, which could cause a serious issue in flight,” said Steve Stich, the orbiter project manager. “In order to ensure that a polished poppet was properly balanced required testing using the system that had been shut down at the White Sands Test Facility in the late nineties. So we were in a bit of Catch-22 situation with respect to performing the best possible NDE.”
The Orbiter Project authorized impact testing at Glenn Research Center, Stennis Space Center, and the White Sands Test Facility to learn more about whether a fragment of a broken poppet would puncture the pressurization lines downstream of the valve. The data from these tests and other analyses contributed to a probabilistic risk assessment of the entire flow-control valve hydrogen-repress system. At the same time, the CFD analysts figured out the velocity and spin of a given-sized particle as well as the probable path it would travel through the elbow-joint turns in the pipe.
As data began to come in from these tests, the program decided to convene a second FRR on February 20, although some members of the engineering and safety organizations expressed doubts about the timing of the review.
One NDE technique, eddy-current inspection, was initially dismissed because the probe head was too large for the valve.
The Marathon FRR
The second FRR for STS-119 lasted nearly fourteen long hours, and the outcome was not clear until the end. “It was much more of a technical review than typical Flight Readiness Reviews. There was a lot of new data placed on the table that hadn’t been fully vetted through the entire system. That made for the long meeting,” said FRR Chairman Bill Gerstenmaier.
Well over a hundred people were in the Operations Support Building II at Kennedy Space Center, seated around the room in groups with their respective organizations as technical teams made presentations to the senior leaders on the FRR board. Some participants believed that the analysis done on the potential risk of a valve fragment puncturing the tubing that flowed hydrogen from the external tank to the shuttle main engines showed that the risk was low enough to justify a decision to fly. Others remained concerned throughout that long day about the fidelity of the data, and that they didn’t know enough about the causes of the valve failure and the likelihood and risk of its occurring again.
Despite the tremendous amount of analysis and testing that had been done, technical presentations on the causes of the broken valve on STS-126 and the likelihood of recurrence were incomplete and inconclusive. Unlike at most FRRs, new data, such as computations of loads margins that couldn’t be completed in advance, streamed in during the review and informed the conversation. A chart reporting margins of safety included TBD (to be determined) notations.
Doubts about some test data arose when Gene Grush received a phone call from Stennis informing him that the test program there had used the wrong material. “I had to stand up in front of that huge room and say, ‘Well there’s a little problem with our testing. Yes, we did very well, but the hardness of the particle wasn’t as hard as it should have been.’ That was very critical because that means that your test is no longer conservative. You’ve got good results, but you didn’t test with the right particle,” he said.
NASA Chief Safety and Mission Assurance Officer Bryan O’Connor remarked, “Gerst [Gerstenmaier] was absolutely open. He never tried to shut them [the participants] down. Even though he could probably tell this was going to take a long time, he never let the clock appear to be something that he was worried about.”
Toward the end of the meeting, Gerstenmaier spoke about the risks to the ISS program and to the shuttle schedule of not approving Discovery’s launch. A few participants perceived his comments as pressure to approve the flight. Others saw it as appropriate context-setting, making clear the broader issues that affect a launch decision. After he spoke, he gave the groups forty minutes to caucus, to discuss what they had heard during the day and decide on their recommendations. When they came back, he polled the groups. The engineering and safety organizations and some center directors in attendance made it clear that they did not find adequate flight rationale.
Bill McArthur, safety and mission assurance manager for the Space Shuttle at the time, said, “The fact that people were willing to stand up and say, ‘We just aren’t ready yet,’ is a real testament to the fact that our culture has evolved so that we weren’t overwhelmed with launch fever, and people were willing to tell Bill Gerstenmaier, ‘No, we’re no-go for launch.’”
As the participants filed out of the meeting, Joyce Seriale-Grush said to Mike Ryschkewitsch, “This was really hard and I’m disappointed that we didn’t have the data today, but it feels so much better than it used to feel, because we had to say that we weren’t ready and people listened to us. It didn’t always used to be that way.”
Charles Bryson, an engineer at Marshall Space Flight Center, used his eddy-current equipment, which had a relatively large probe head, to inspect a poppet. His inspection, confirmed by other analysis, indicated that the eddy-current technique showed promise in finding flaws. Rene Ortega, chief engineer for Propulsion Systems Engineering and Integration at Marshall, told colleagues from the Materials and Processes Problem Resolution Team about Bryson’s results and helped arrange for Bryson to examine several poppets at Boeing’s Huntington Beach facility. Bryson then worked with a team from Johnson led by Ajay Koshti, an NDE specialist with expertise in eddy-current investigations. Koshti brought an eddy-current setup with a better response than Bryson’s, and together they arrived at a consistent inspection technique.
“Once we were able to screen flaws with the eddy current and there wasn’t a need to polish poppets with the process,” Ortega explained, “we had a method by which we could say that we … thought we’re pretty good at screening for non-polished poppets.”
Engineers had found that some of the smaller flaws identified in the poppets didn’t seem to be growing very fast. “Through that exercise, we came up with the suggestion that, ‘Hey, it doesn’t look like these flaws are growing out very rapidly in the flight program, and with the screening of the eddy current we can probably arrive at a flight rationale that would seem to indicate that those flaws being screened by the eddy current wouldn’t grow to failure in one flight,’” Ortega said. The eddy-current technique was not a silver bullet, but in conjunction with the other techniques and test data, it provided critical information that would form the basis for a sound flight rationale.
The Final FRR
With the results from the test programs all now supporting a shared understanding of the technical problem, there was wide consensus among the community that the third Flight Readiness Review, on March 6, would result in a go vote.
“By the time we eventually all got together on the last FRR the comfort level was very high,” said O’Connor. “For one thing, everybody understood this topic so well. You couldn’t say, ‘I’m uncomfortable because I don’t understand.’ We had a great deal of understanding of not only what we knew about, but what we didn’t know about. We had a good understanding of the limits of our knowledge as much as possible, whereas before we didn’t know what those were.”
The FRR board agreed and STS-119 was approved for launch on March 11. After delays due to an unrelated leak in a liquid hydrogen vent line, Discovery lifted off on March 15, 2009, and safely and successfully completed its mission.