Apollo 12: A Detective Story
Almost forty years ago, when I worked for Fairchild Semiconductor, I received an unusual telephone call from Andy Procassini, head of Fairchild Quality Assurance. Andy asked me, first, could I keep the topic he was calling about confidential and, second, could I immediately come to the Mountain View facility and meet with him and a few other Fairchild folk? Of course, the only possible answer to such a request is, “Yes, sir. Be there in thirty minutes.”
I drove to the Mountain View plant and was ushered into a conference room with six or seven others. Andy immediately came to the point. “You know,” he said, “that Apollo 11 successfully landed on the moon on July 20, and all three astronauts are now safely home. Well, they [NASA] are planning on fueling up Apollo 12 for an early November launch, but there’s a problem. That’s why you people are here.”
While all systems tested go on the Apollo 12 Saturn rocket, Command Module (CM), and Lunar Module (LM), a problem had been detected in a later version of the LM radar transponder being put together at the Grumman facility in New Mexico. The failure was in a Fairchild linear amplifier identical to the one already installed and deeply buried in the electronics of the Apollo 12 LM. If that one failed, docking would be impossible. For obvious reasons, this was not an acceptable risk.
The questions put to the head of quality and reliability at Fairchild were, considering these devices had been tested umpteen times before installation and were operating properly, what went wrong with the failed amplifier? And, given that fueling of Apollo 12 was scheduled within the next ten days or so, what was the likelihood that the installed device, currently operating properly, would fail during the mission?
Andy asked us for an answer within days and told us to be ready to go to Johnson Space Center in Houston to discuss the results of our investigation and analysis.
Building a Team
The five of us hardly knew each other; Mike was from marketing, Frank and Charlie from manufacturing, and I was from research and development. We had little in common, but here we were with a major problem that had to be resolved in days. We immediately got together and analyzed the data Fairchild had received. We had no physical evidence yet; the device that had failed was part of a disassembled LM in New Mexico. Other linear amps from the same batch were also being retested, including those in the lunar modules for Apollo 13 and 14, but the devices were not available.
On the other hand, we had all the test data; the devices supplied to NASA met the highest test standard available, Mil-Standard-883, and all devices in this batch had been tested and retested. So the first and most obvious question was how had our highest test standards passed devices that so quickly failed? Either the test procedures were at fault and we had passed bad devices, or the device failed because of something that occurred after the tests. Since the people who assembled the radar unit were not part of Fairchild and had obviously tested the device and unit subsequent to our selling the devices, we immediately suspected some sort of failure that occurred after device assembly into the radar module. We could therefore set aside our test procedures as the cause (even though we did look into them) and recognize that the device had somehow failed after assembly.
We were indeed fortunate that our hastily assembled team got along; we did so because we all recognized we were becoming part of history, and we did not want history to record that Fairchild caused a delay of the second Apollo lunar landing. Furthermore, we all knew of each other, at least by reputation, and respected each other’s technical abilities, so there were no serious ego problems. Finally, we had a deadline to meet. There is nothing like a hard deadline to promote cooperation among dedicated technologists.
The next set of data to reach us was disheartening; other devices in the same batch, including another Apollo LM device, had also failed in NASA tests as they concurrently tried to trace the nature of the problem. The NASA problem escalated into a major field problem, as this device had been sold to a number of other customers, including the Department of Defense. If Fairchild had a batch of faulty devices incorporated into many sensitive applications, there could be enormous consequences beyond delaying a scheduled Apollo liftoff.
Charlie Gray and Frank Durand were responsible for manufacturing quality control, so they immediately got to work looking at the manufacturing records of these devices. In anticipation of exactly this kind of situation, Mil-Standard devices had extensive traceability back to the sources of all parts used. My role was to analyze failed devices and come up with a plausible story of how and why they failed. I was also to make some sort of recommendation about the future of the specific device that was still functioning properly and installed on the Apollo 12 LM, already on the launchpad and being readied for fueling.
Our first act was to gather a number of other high-reliability devices manufactured at the same time and retest them. Ordinarily, this should be unnecessary, since the high-reliability testing was extensive and redundant, and 100 percent of the devices should pass rescreening. Imagine our surprise when a number of our stored devices failed this test, on test characteristics similar to those that had failed in the NASA tests. At about the same time, we received and retested some of the failed NASA devices. (Of course, they failed, too!)
The next step was obvious: open the hermetically sealed devices and see if we could identify the cause of failure. This part of the failure analysis was trivial; the cause was as obvious as it was astonishing. Basically, no bond wires connected the chip to the outside world. None! So solving why the devices failed was indeed trivial, but how had the wires disappeared?
The assembled devices were encased in a ceramic package sealed with a high-temperature sealing glass in a special furnace, a process called hot cap sealing, prior to final testing. First, the completed chip was attached to a cavity in one part of the package, using conventional die-attach processes. A metal lead frame was embedded in a thin layer of a high-melting-temperature sealing glass; this lead frame was the conduit of current and voltage to the external world from the embedded chip. The chip was connected to the lead frame through aluminum wires wire bonded to the aluminum-coated lead frame, again using conventional semiconductor assembly processes.
The wire-bonded bottom half of the ceramic package was then sealed to an upper cavity. In the hot cap sealing process this upper part of the package, which contained a layer of high-temperature sealing glass, was heated to a temperature sufficient to melt the layer of glass, and the top part of the package was pressed onto the bottom part of the package, also heated to melt its glass layer. The two layers of molten glass would join and weld the parts of the package together. The chip was hermetically embedded in the sealed cavity, and the electrical signals would pass through the glass seal by way of the embedded metal lead frames. It turned out that the temperature to which the ceramic parts were heated needed to be controlled to within a few degrees centigrade. This process had failed. The devices were sealed at too high a temperature; this excessive temperature was the most important cause of subsequent device failure.
Talking to NASA
Fortunately, the failure analysis took only a few days, so we had time to go to Houston to discuss the issue before a forced delay in fuel loading of Apollo 12 was to begin. Andy Procassini suggested we as a team go to Houston to tell them of our findings.
We arrived the day before our review with a host of NASA decision makers. We spent the night at our motel rehearsing our message. We discussed our strategy for the meeting and decided on the answer we knew we must be prepared to give and defend at the end of our presentation. Mike was chosen to talk about the device, the architect to talk about its characteristics, and I would talk about the nature of the failure and its implications for Apollo 12.
We were ushered into the meeting at 8:30 a.m. We listened to NASA and Grumman engineers define the exact nature of their problem: a potential failure of the radar transponder after the LM left the surface of the moon, which would make docking with the CM impossible. The Grumman engineer gave an impressive talk about the device, stating what would happen if this or that particular pin failed for just about every possible combination of pin failures. This guy knew his radar system!
As he was talking, I looked around the room. Ten or twelve NASA officials, including Jim McDivitt and George Low, the ultimate decision maker, sat at a long table along with engineers responsible for the CM, the LM, the radar system, the fueling operation, and other elements. The hanging lights illuminating the table left the rest of the room in gloom; in this gloom were the attendees from Fairchild, from Grumman, from other Apollo spacecraft manufacturers, scientists, and engineers—perhaps another dozen people.
My presentation was quite simple. The hot glass sealer had exceeded its temperature for a brief time, heating the glass beyond its normal sealing temperature. As a result, the glass seal was porous and allowed moisture to diffuse into this otherwise hermetic package. High levels of moisture combined with contaminants infused at the same time corroded the aluminum bond wires, leaving the appearance of no bond wires. Jim McDivitt asked if that meant aluminum would always dissolve in the presence of moisture, implying that, if so, the devices on Apollo 12 and 13 were time bombs ready to fail at any time. I said no, people always boiled water in aluminum containers. It took more than moisture or even contaminants; only specific contaminants attacked the thin aluminum oxide layer that protected all aluminum from instant corrosion. George Low suggested that since the contaminants present in the failed devices were likely present in the unfailed devices, they were still time bombs. In my view, the failed devices had failed months or years ago when the non-hermetic packages had been exposed to sufficient moisture and contaminants; currently operating devices were not likely to fail in the future, especially devices embedded in protective plastic as part of the lunar module assemblies.
George Low then asked the question I will always remember: “Dr. Meieran, would you fly this bird?” This was at 11:25 a.m., according to the clock, which looked like Big Ben to me, on the wall behind the long conference table. My response was, “Yes, I think it is safe to fuel Apollo 12, as the probability of this device failing is very, very small.” I knew that moisture diffused into a badly sealed package, and aluminum dissolved, at measurable rates that were quite fast compared to the time between assembly of the device and its encapsulation in the LM radar system. It seemed reasonable to believe that any corrosion that would occur had occurred already. This hypothesis was confirmed by examination of a large number of devices with different date codes.
For the next half hour, the NASA engineers discussed the implications of our findings. As the minute hand on the clock approached twelve, George Low announced, “It’s a go.” Looking back, I think my comment about being able to boil water in aluminum containers made the difference. While all these people were highly intelligent engineers, they were not corrosion scientists. Using a practical example they could relate to helped them understand my recommendation and trust it.