Lessons Learned Applied To Space System Development - ASQ


Space System Development: Lessons Learned
(Excerpts)

Conference on Quality in the Space and Defense Industries
March 14-15, 2011

Joe Nieberding

© 2006 All Rights Reserved. Aerospace Engineering Associates LLC

Presenter

Joe Nieberding:

Mr. Nieberding has over 40 years of management and technical experience in leading and participating in NASA independent review teams, and in evaluating NASA advanced space mission planning. During his 35 years at NASA GRC, before retiring in 2000, he directed numerous studies to help select transportation, propulsion, power, and communications systems for advanced NASA mission applications. His Advanced Space Analysis Office led all exploration advanced concept studies for GRC. In addition, he was a launch team member on over 65 NASA Atlas/Centaur and Titan/Centaur launches, and is a widely recognized expert in launch vehicles and advanced transportation architecture planning for space missions. Mr. Nieberding is co-founder and President of Aerospace Engineering Associates.

Introduction

- Excerpted from a two-day presentation aimed at assisting today’s space system developers
- Explore overarching fundamental lessons derived from:
  - Many specific mishap case histories from multiple programs
  - “Root” causes not unique to times/programs
- Will cover some material from the two-day presentation:
  - A few of the detailed case histories
  - A summary of causes for all case histories
  - Example countermeasure “Rules of Practice”
- References given for all resource information
- Lessons learned charts (yellow background) were either developed independently by Aerospace Engineering Associates (AEA) or extracted from resource information

It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.
Mark Twain

2 Day Outline

- Introduction
- The Practice of Failure Analysis
- Space Mission Record of Success
- General Management Lessons
- Lessons Learned from Specific Case Histories
  - Screening Out Design Errors
  - Impact of Weak Testing Practices
  - Screening Out Procedural Errors
  - System Engineering Lapses
  - Mishaps Associated With Software
  - When Processes Break Down
  - Adverse Program Management Factors Can Produce Bad Outcomes
  - A Piece Part Failure
  - Not Everyone May Want the Project to Succeed
  - Experienced Teams Make Mistakes
  - Normalizing Deviance
  - When Advanced Warnings Are Missed
  - The Perils of Heritage

2 Day Outline (concluded)

- Summary of Causes for the Foregoing Case Histories
- The Unsuccessful Failure Investigation of Atlas Centaur 70
- Common Cause Failures
- The Human Element
- Applying the Lessons: Sample “Rules of Practice”
- One Strike and You’re Out!
- Flight Termination
- Conclusions

Politicians are like diapers: they need to be changed often, and for the same reason.
Mark Twain

Historical Perspective


The Practice of Failure Analysis

Case                                Event
The Milan Cathedral                 Wall collapse
The Tay Rail Bridge                 Bridge collapse (75 fatalities)
Kansas City Hyatt Regency Skyway    Skyway collapse (114 fatalities)
American Airlines Flight 96         Separation of DC-10 aft cargo door (no fatalities)
Turkish Air Flight 981              Separation of DC-10 aft cargo door (346 fatalities)
Tacoma Narrows Bridge               Bridge collapse
Russian R-16 ICBM                   Pad explosion (>120 fatalities)


Historical Perspective: Prominent Failures from Across the Spectrum of Engineering Endeavors

- Baikonur Cosmodrome, Soviet Union (present-day Kazakhstan), 10/24/1960
- Preps for first test flight of R-16 ICBM
- Program rushed to launch on anniversary of Bolshevik revolution (as a present for Premier Khrushchev)
- Led by head of the Soviet Ballistic Missile Forces, Marshal Mitrofan Nedelin
- 250 people on and around pad
- Viewing stand for visiting dignitaries
- Unsafe design and undisciplined procedures caused 2nd stage ignition
- More than 120 people were killed, including Nedelin

[Images: Mitrofan Nedelin; R-16 ICBM; destroyed pad and memorial at Baikonur (Tyuratam)]

Possibly The Largest Disaster in the History of Rocketry!

For additional information see “Rockets and People: Creating a Rocket Industry, Volume II”, Boris Chertok, NASA History Series SP-2006-4110

Design Screens


A Quick Aside About Design Error “Screens”

[Diagram labels: Design Error; Design Error “Screens” (Design Review, Test); Unexpected Behavior]

GIVEN: Our design “machine” (humans) WILL produce errors at some >0 rate

“Engineers today, like Galileo three and a half centuries ago, are not superhuman. They make mistakes in their assumptions, in their calculations, in their conclusions. That they make mistakes is forgivable; that they catch them is imperative.” (1)

(1) “To Engineer is Human”; Henry Petroski, Vintage Books, 1992
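
One rough way to put numbers on the screen idea, offered as an illustrative sketch and not as anything taken from the presentation: if each screen catches some fraction of the errors that reach it, the fraction that escapes is the product of the miss rates. The function name, the assumption that screens act independently, and every number below are hypothetical.

```python
def escaped_error_rate(error_rate, catch_probabilities):
    """Fraction of design items that still carry an error after every screen.

    Assumes (hypothetically) that each screen catches errors independently
    of the others, which real reviews and tests only approximate.
    """
    remaining = error_rate
    for catch_probability in catch_probabilities:
        remaining *= 1.0 - catch_probability
    return remaining


# Hypothetical numbers: 10% of design items contain an error; design review
# catches 50% of them and test catches 80% of what the review missed.
with_both_screens = escaped_error_rate(0.10, [0.50, 0.80])
review_only = escaped_error_rate(0.10, [0.50])
print(f"both screens: {with_both_screens:.3f}, review only: {review_only:.3f}")
```

With these made-up rates, dropping the test screen lets five times as many errors through, which is the intuition behind treating each omitted screen as a risk decision.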


Selected Mishaps



Genesis

- Underlying Issue: Omitted test combined with flawed adaptation of heritage design
- Problem: Spacecraft failed to properly deploy drogue chute (9/8/2004)
- Impact: Loss of some scientific data

Source: http://www.nasa.gov/pdf/149414main_Genesis_MIB.pdf; Genesis Mishap Report, Dr. M. Ryschkewitsch, Chairperson, 11/30/2005; Presentation: Genesis Mishap Investigation and Stardust Entry, Dr. Mike Ryschkewitsch and Pete Spidaliere

Genesis G-Switch Orientation

[Diagram labels: Acceleration to Activate Switch; Aerobraking Acceleration; As Installed; Velocity; Heatshield; Pyros]

Genesis (cont’d)

WHY: Improperly oriented (inverted) gravity switch sensors. Deficiencies in the following processes resulted in the mishap:
- Design that inverted the G-switch sensor (a heritage design)
- Design reviews did not detect the error
- Verification processes did not detect the design error
- No tests were conducted that would reveal the problem
- Red Team review did not uncover the failure in the verification process

Genesis (cont’d)

The Board further identified ineffective systems engineering as a root cause:
- Inadequate project and systems engineering management
- Inadequate systems engineering processes
- Inadequate review process
- Unfounded confidence in heritage designs
- Failure to “Test like you fly”
- Better/Faster/Cheaper philosophy; quote from MIB Report:

  “Root Cause 6.1: Faster, Better, Cheaper (FBC) philosophy: Cost-capped mission with threat of cancellation if overrun…
  Findings:
  - The project maintained the cost-cap, in part at the expense of adequate technical oversight by JPL into LMSS Flight System and at the expense of a complete and robust Systems Engineering function.
  - The Agency was at fault for encouraging and accepting the FBC philosophy as described above.”

Genesis (concluded)

LESSONS:
- Imposition of a concept (Better/Faster/Cheaper) absent sensible, practical, and reliable implementation guidance is a recipe for serious trouble
- Treat changed heritage designs as new designs
- Make it very difficult to change baselined* test plans
- Test like you fly, and pay attention to when you don’t
- Don’t let system reviews get superficial (checking the block)

*Those adopted after appropriate vetting activities

CONTOUR

- Underlying Issue: Erroneous prediction of spacecraft thermal environment
- Problem: Spacecraft broke up following solid rocket motor (SRM) firing (8/15/2002)
- Impact: Loss of mission

CONTOUR (cont’d)

Why: Spacecraft overheating caused by improper installation of a “heritage” SRM
- Inadequate systems engineering process
- Inappropriate reliance on analysis by similarity
- Inadequate review function
- Dubious decision to omit telemetry coverage of motor firing event
- Inadequate oversight, insight, and review of subcontractors
- Inadequate communications between APL and ATK
- ATK models not specific to CONTOUR
- Limited understanding of the SRM plume heating environments in space
- Limited understanding of CONTOUR SRM operating conditions

Source: Contour Mishap Investigation Board Report, May 31, 2003; http://klabs.org/richcontent/Reports/Failure_Reports/contour/contour.pdf

CONTOUR (concluded)

LESSONS:
- Heritage designs must be re-qualified for new applications
- Systems engineering is absolutely vital to mission success; in this case it should have:
  - Challenged the flawed heritage assumption
  - Objected to the use of invalid models
  - Insisted on a more complete understanding of SRM plume heating
- Involve subcontractors early in the design process
  - They need to understand and “buy in” to how their product is integrated

Ariane 5

- Underlying Issue: Unwarranted reliance on heritage software
- Problem: Forty seconds into maiden Ariane-5 flight (6/4/1996), vehicle veered off course and broke up
- Impact: Loss of mission
- Why: Flight software error
  - The flight software was programmed for Ariane-4 launch and trajectory conditions
  - Didn’t account for higher horizontal velocity of Ariane-5
  - Caused IRU software overflow error resulting in loss of guidance information
  - Never tested in conditions that simulated the Ariane-5 trajectory
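
The overflow mechanism can be illustrated with a range check on a narrowing conversion. The slide does not name the variable or the word sizes involved, so the 16-bit signed target and the sample peak values below are assumptions chosen for illustration, and the helper names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def fits_int16(value):
    """True if the value can be stored in a 16-bit signed integer."""
    return INT16_MIN <= value <= INT16_MAX


def to_int16_checked(value):
    """Convert to a 16-bit signed integer, refusing values that do not fit."""
    if not fits_int16(value):
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return int(value)


# Hypothetical peak values of a velocity-related quantity on the old and new trajectories.
heritage_peak = 20000.0
new_vehicle_peak = 40000.0

print(fits_int16(heritage_peak))      # heritage envelope fits the representation
print(fits_int16(new_vehicle_peak))   # new envelope does not
```

Running the heritage code against the new vehicle's expected envelope, even in a check this simple, is one form of the trajectory-representative testing the slide says was never done.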




Source: I-Shih Chang, Space Launch Reliability; http://sunnyday.mit.edu/accidents/Ariane5accidentreport.html

Ariane 5 (concluded)

LESSONS:
- Technical experts need to push back against baseless management directives
- Be very thorough in justifying dependence on previous “heritage” hardware or software development/testing
- Have the decision to accept “heritage” verifications examined in an independent verification and validation (IV&V) mode
- Test like you fly and fly like you test

Lewis Spacecraft

- Underlying Issue: Misapplication of heritage system
- Problem: Spacecraft tumbled out of control 8/26/1997
- Impact: Loss of spacecraft

Lewis Spacecraft (cont’d)

Why: Proximate Cause: Inoperable ACS safe mode
- Spacecraft had multiple anomalies during initial operations
- Contact lost for two orbits
- Reappeared in uncontrolled attitude mode
- Commanded to “safe mode”
- “Safe mode” adopted from the Total Ozone Mapping Spectrometer (TOMS) spacecraft
- Inherently unstable in Lewis application (no X-axis gyro)
- In spite of serious “cause unknown” anomalies, operations crew entered rest period
- X-axis rates due to thruster imbalances
- Rates transferred to Y and Z axes (Polhode Motion)
- Computer shut down excessive thruster firings
- Spacecraft rates transferred to principal moment of inertia axis
- Edge on to Sun; battery discharged ~72%
- Attempt to recover was flawed and failed
- Spacecraft went out of contact and was never reacquired
- Only one crew conducted all on-orbit operations (one 12-hour shift/day)
- No crew on duty during significant periods when spacecraft in view of ground station

Source: Lewis Spacecraft Mission Failure Investigation Board Final Report, February 12, 1998
http://www.lr.tudelft.nl/live/pagina.jsp?id=a8b6dca2-92dc-4965-a64c-298189e5b58e&lang=en&binary=/doc/lewis_document.pdf

[Diagram labels: Safe Mode; X Axis Spin; Polhode Motion]

Lewis Spacecraft (cont’d)

Root Causes:
- No mutual contractor/government understanding as to what is meant by “Better/Faster/Cheaper” leading to:
- Requirements changes without adequate resource adjustment
- Undue cost and schedule pressures
- Inadequate ground station availability for initial operations
- Frequent key personnel changes
- Inadequate engineering discipline
- Inadequate management discipline
- Active NASA oversight and management absent
- Senior management imposition of an ill-defined concept (Better/Faster/Cheaper)*

*While the BFC thrust was abandoned after multiple disappointing outcomes, vestiges (both good and bad) remain.

Lewis Spacecraft (cont’d)

LESSONS:
With respect to the proximate cause:
- “Heritage” hardware/software is often a trap
- Flag any proposed use of heritage designs for special attention
- Challenge applicability and understand its qualification history
- Make certain that the true heritage (especially the limitations) is fully understood
- Even presumably qualified heritage items need to be functionally tested in the way they will fly!

Lewis Spacecraft (concluded)

LESSONS (concluded):
With respect to the root causes:
- Imposition of a concept (Better/Faster/Cheaper) absent sensible, practical, and reliable implementation guidance is a recipe for serious trouble
- Take great care to select qualified people to run a program; when it’s clear they’re not right for the job, replace them

Causation Summary


Causation Analysis: Breakdown by Category

[Chart: Distribution of Proximate Causes; categories: Design, Prod/Ops, Pgm Mgt; values in extraction order: 23%, 69%, 8%]
[Chart: 25 Design Proximate Causes, Nature of Deficiencies; labels: Dev Test, Analysis, Qual Test, Sim, Heritage, Engineering; values in extraction order: 1, 7, 7, 10, 12, 27]
[Chart: Distribution of Root Causes; categories: Sys Engr, Prod/Ops, Pgm Mgt; values in extraction order: 51%, 41%, 8%]

Observations

- Only one of the 39 cases analyzed (Atlas Centaur 24) had failure of a proper part as the cause!
  - Programs are doing a good job of acceptance testing
- The other 38 were associated with human error: management weaknesses, systems engineering shortcomings, etc.
- Therefore, it is necessary that risk assessments be based on data that somehow reflects human error
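
As a worked illustration of the proportions involved, the counts quoted in this deck can be turned into shares of the 39 cases. The counts come from the slides (including the causal counts on the “Rules of Practice” slides); the dictionary keys and the helper name are hypothetical. A case can appear under more than one factor, so the shares are not expected to sum to 100%.

```python
TOTAL_CASES = 39

# Counts quoted in the deck; a single case may appear under several factors.
cases_by_factor = {
    "part failure": 1,
    "human error": 38,
    "missed advance warning": 17,
    "analytical modeling": 12,
    "heritage items (contributing)": 12,
    "software": 6,
}


def percent_of_cases(count, total=TOTAL_CASES):
    """Share of all analyzed cases, in percent, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


shares = {factor: percent_of_cases(count) for factor, count in cases_by_factor.items()}
for factor, share in shares.items():
    print(f"{factor}: {share}%")
```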

Facts are stubborn things, but statistics are pliable.

Mark Twain


Observations (concluded)

- Programs that adopt a zero-based approach to testing are betting on the ability of the engineering community to foresee all aspects of system performance under all conditions
  - This is a very risky bet!
- History demonstrates that tests frequently, if not usually, produce unexpected (and unwanted) results


Applying the Lessons: A Sample Set of “Rules of Practice”

- Issue: Many lessons learned have common themes. The issue is to systematically infuse this knowledge into programs so they’re not lessons forgotten
- One approach: For large and complex programs, impose a Program-specific set of overarching “Rules of Practice” that govern how certain things are to be done (i.e., to codify some of the lessons)
  - Any deviation from these “Rules” would be cause for special attention (risk management) by Program Management
  - These ad hoc “Rules” would not take the place of existing design standards or similar tools, but rather provide an additional mechanism to flag when special action is warranted

Applying the Lessons: A Sample Set of “Rules of Practice” (cont’d)

Advance Warning: (Causal in 17 of 39 cases)
- An effective system for facilitating communication between those concerned about a potential safety-of-flight problem and those in a position to reconcile it is to be designed and embedded in the Program culture (easier said than done, but surely it’s doable!). It must be:
  - Formal and visible.
  - Reliable (if not foolproof).
  - Simple to use with quick feedback.
  - Plugged into real authority to stop the action.
  - Culturally valued and respected.

Applying the Lessons: A Sample Set of “Rules of Practice” (cont’d)

Analytical Modeling: (Causal in 12 of 39 cases)
- All analytical modeling on which designs are based will be test-validated and acquired from at least two independent sources.
  - An independently validated plume heating analysis is required of all systems employing a new propulsion arrangement.
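
One way to read the modeling rule is as an acceptance gate: a prediction is used for design only if two independently sourced models both agree with a test measurement. The sketch below assumes a simple relative tolerance; the function names, the 10% tolerance, and the heating-rate numbers are hypothetical.

```python
def within_tolerance(predicted, reference, rel_tol):
    """True if the prediction is within rel_tol (a fraction) of the reference value."""
    return abs(predicted - reference) <= rel_tol * abs(reference)


def model_accepted(prediction_a, prediction_b, test_measurement, rel_tol=0.10):
    """Accept only if both independent predictions match the test measurement."""
    return (within_tolerance(prediction_a, test_measurement, rel_tol)
            and within_tolerance(prediction_b, test_measurement, rel_tol))


# Hypothetical plume heating rates (kW/m^2) from two sources and one test.
both_agree = model_accepted(48.0, 52.0, 50.0)
one_disagrees = model_accepted(48.0, 30.0, 50.0)
print(both_agree, one_disagrees)
```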






Heritage Items: (Contributing cause in 12 of 39 cases)
- Any item adopted for use based on successful flight performance in another program will be deemed unqualified in the adopting application until a thorough analysis has been performed to confirm that the adopting application is identical (or less demanding) in all relevant features to the prior successful application.
  - Any deviations must be qualified by test.
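
The heritage rule amounts to an envelope comparison. As a minimal sketch, assuming each relevant feature can be reduced to a single number where a larger value is more demanding (a simplification), the check below lists every feature on which the new application exceeds, or falls outside, what the heritage item was qualified for. The feature names and values are hypothetical.

```python
def heritage_deviations(qualified_envelope, new_requirements):
    """Features where the new application is more demanding than, or not covered by,
    the envelope in which the heritage item was previously qualified."""
    deviations = []
    for feature, required in new_requirements.items():
        qualified = qualified_envelope.get(feature)
        if qualified is None or required > qualified:
            deviations.append(feature)
    return sorted(deviations)


# Hypothetical envelopes; larger numbers are assumed to be more demanding.
prior_program = {"peak_temperature_C": 120.0, "vibration_grms": 10.0}
adopting_program = {
    "peak_temperature_C": 150.0,      # hotter than previously qualified
    "vibration_grms": 8.0,            # within the prior envelope
    "plume_heating_kW_m2": 5.0,       # never assessed on the prior program
}

needs_test = heritage_deviations(prior_program, adopting_program)
print(needs_test)
```

Under the rule, every feature returned here would have to be qualified by test before the item could be treated as qualified in the adopting application.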




Applying the Lessons: A Sample Set of “Rules of Practice” (cont’d)

Software: (Causal in 6 of 39 cases: Ariane 501, Titan IVB-32, SOHO, MCO, MPL, DART)
- All software development, testing, and application processes will be controlled by a single, formal, configuration-managed Software Management Plan for which a single individual is responsible.
- Testing provided for in this plan will specifically include:
  - Demonstration of proper flight software operation in nominal and off-nominal flight simulation functional testing; this will be done with flight hardware to the greatest extent possible.
  - Formal “qualification” and “acceptance” testing of flight-critical software “end items” prior to controlled “release” for use.
- The plan will also provide for periodic, independent verification that the original requirements remain valid.

Applying the Lessons: A Sample Set of “Rules of Practice” (concluded)

General Engineering Management Practices: Certain practices will constitute required standard operating procedures:
- Rationale Documentation: It will be mandatory to systematically record the rationale associated with all engineering products such as design and operational requirements, procedures, test parameters, processes, design choices, specifications, etc., and to place the rationale as close to the item it relates to as possible.
- Assumptions: All assumptions that form the foundation for engineering activities (analyses, test or not-to-test decisions, trade studies, design approaches, etc.) will be explicitly stated and documented. A process for validating, and periodically revalidating, the assumptions will be initiated.
- Etc. (This is a sampling, not an all-inclusive list. Certainly, Project-specific “Rules” are also appropriate.)
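
The assumptions practice could be supported by something as simple as a register that keeps each assumption next to its rationale and flags the ones due for revalidation. This is a sketch under stated assumptions, not a process described in the presentation: the class, field names, 90-day interval, and sample entries are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Assumption:
    statement: str
    rationale: str
    last_validated: date
    revalidate_every_days: int

    def is_due(self, today):
        """True once the revalidation interval has elapsed since the last validation."""
        return today >= self.last_validated + timedelta(days=self.revalidate_every_days)


def due_for_revalidation(assumptions, today):
    """Statements of all assumptions whose revalidation is due on the given date."""
    return [a.statement for a in assumptions if a.is_due(today)]


# Hypothetical register entries.
register = [
    Assumption("Heritage safe mode is stable in this application",
               "Flew successfully on a prior program", date(2011, 1, 10), 90),
    Assumption("Plume heating model applies to this motor installation",
               "Vendor analysis by similarity", date(2011, 3, 1), 90),
]

overdue = due_for_revalidation(register, date(2011, 5, 1))
print(overdue)
```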


The Message

- Some may say that the foregoing rules are rather boring: nothing earthshaking, all pretty routine
- Rigorous implementation and infusion of quality into all aspects of routine, common sense practices will prevent most mission failures
  - It’s really not rocket science!

But that’s exactly the point!


Conclusions

- Stuff Happens
- Most mishaps can be broadly attributed to human error, not rocket science
  - Lack of complete understanding of how complex systems interact with each other
  - Inadequate attention to every detail
  - Flawed analyses or tests
  - Improper use of “heritage” systems
  - Flawed processes
  - Flawed understanding of how software fails
  - Reaction to budget or schedule pressure
  - Imperfect management
- Often, a complex, subtle sequence of events is needed
  - If just one event in the chain were prevented, the failure would not have happened
- Must ensure quality in all the above areas
  - Essential for mission success
- Over decades, the same root causes of failures appear repeatedly
  - There are few new ones!

Conclusions

About Learning From Past Incidents
- Sometimes we do, but the process is haphazard
- Those involved learn what to do and/or what not to do
- But eventually they disappear, taking with them:
  - The nuances of causation
  - Factors omitted from the official record
  - The lessons themselves (often) and their underlying rationale
- Mishap Reports and Lessons Learned Data Bases (which have come a long way) are what’s left, but:
  - Relevant information may be missing
  - They lack the live element (the passion), and
  - Nothing beats talking to those who “were there”

Conclusions (cont’d)

- Basically, there is no universally successful approach to learning the lessons from the past
- What’s needed is a dependable process that:
  - Uncovers root causation from those involved and/or the documentation
  - Develops and promulgates “Rules of Practice” as countermeasures
- Organizations desiring to profit from applying lessons previously learned should develop their own tailored approaches
  - Should be included in the Project Plan
- In the end, lessons are still best learned as a “contact sport”

MISSION

AEA’s mission is to leverage the vital lessons learned by NASA’s spacefaring pioneers to strengthen the skills of today’s aerospace explorers.

P. O. Box 40448
Bay Village OH 44140
www.aea-llc.com

Joe Nieberding, President
Email: joenieber@sbcglobal.net
Cell: 440-503-4758

Larry Ross, CEO
Email: ljross1@att.net
Cell: 440-227-7240