The Mars Pathfinder mission was widely proclaimed as "flawless" in the early days after its July 4th, 1997 landing on the Martian surface. Successes included its unconventional "landing" (bouncing onto the Martian surface surrounded by airbags), deploying the Sojourner rover, and gathering and transmitting voluminous data back to Earth, including the panoramic pictures that were such a hit on the Web. But a few days into the mission, not long after Pathfinder started gathering meteorological data, the spacecraft began experiencing system resets. The press reported these failures in terms such as "software glitches" and "the computer was trying to do too many things at once".
On December 3rd, 1997 I attended a fascinating talk by David Wilner, Chief Technical Officer of Wind River Systems, maker of VxWorks, the real-time embedded systems kernel used in the Mars Pathfinder mission, who explained the software flaw. I sent a description of his talk entitled "What really happened on Mars?" to a few friends in the systems community, after which it was widely circulated. Among other places, it appeared in Peter G. Neumann's moderated Risks Forum (comp.risks) on Tuesday, 9 December 1997 in issue RISKS-19.49.
On December 15th I was honored to receive this detailed first-hand account of "What really happened on Mars?" from Glenn Reeves of JPL, who led the software team for the Mars Pathfinder spacecraft. It contains both a far more detailed and accurate description of the problem than I sent and many valuable insights into what factors, both at design time and mission time, enabled Pathfinder to be such a stunning success, overcoming this one flaw, and continuing with its mission. I highly recommend his account.
One other footnote. Since I sent my original message, a number of people have replied saying that the priority inversion problem and solutions to it were known well before Sha et al.'s 1990 article. More history on priority inversion is available here.
Finally, I'd like to state publicly that I greatly admire what the Pathfinder team accomplished. While my account starts by describing a software flaw, it also describes just a few of the many, many things that were done right, enabling the flaw to be tolerated, identified, and fixed in the field. Pathfinder, including its software systems, is a tremendous engineering and scientific success story. I know we'll be learning things from it for many years to come. My hope is that these accounts will help others to learn from and emulate the team's successes.
Michael B. Jones
Redmond, Washington -- December 16th, 1997
Last modified December 16, 1997.