1. Requirements
5.5.4 The project manager shall implement process assessments for all high-severity software non-conformances (closed loop process).
1.1 Notes
NPR 7150.2, NASA Software Engineering Requirements, does not include any notes for this requirement.
1.2 History
1.3 Applicability Across Classes
Class A B C D E F Applicable?
Key: - Applicable | - Not Applicable
2. Rationale
The purpose of this requirement is to understand why a high severity software non-conformance or defect occurred and to make process changes that avoid additional high severity software non-conformances or defects. The overall aim is to reduce software defects.
3. Guidance
To reduce the occurrence of defects, we have to understand why the defect or software non-conformance occurred. A technique such as Root Cause Analysis (RCA) will help you address the requirement. Root Cause Analysis is a structured evaluation method that identifies the root causes of an undesired outcome and the actions adequate to prevent a recurrence. In science and engineering, it is a problem-solving method used to identify the root causes of faults or problems. Root cause analysis can be decomposed into four steps:
- Clearly identify and describe the problem.
- Establish a timeline from the normal situation up to the time the problem occurred.
- Distinguish between the root cause and other causal factors (e.g., using event correlation).
- Establish a causal graph between the root cause and the problem.
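The last two steps can be illustrated with a small sketch. It assumes the causal graph is recorded as a list of (cause, effect) pairs and treats any cause with no recorded cause of its own as a candidate root cause. The edge contents and the function name `candidate_root_causes` are hypothetical and are not part of the requirement.

```python
from collections import defaultdict

# Hypothetical causal graph: each pair points from a cause to its effect.
EDGES = [
    ("no peer review of unit conversion code", "unit mismatch in interface"),
    ("ambiguous interface spec", "unit mismatch in interface"),
    ("unit mismatch in interface", "wrong thrust command"),
    ("wrong thrust command", "high-severity non-conformance"),
]


def candidate_root_causes(edges, problem):
    """Walk the graph backward from `problem` and return the causes
    that have no recorded cause of their own."""
    parents = defaultdict(set)
    for cause, effect in edges:
        parents[effect].add(cause)

    roots, seen, stack = set(), set(), [problem]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if not parents[node] and node != problem:
            roots.add(node)  # nothing upstream: candidate root cause
        stack.extend(parents[node])
    return sorted(roots)


print(candidate_root_causes(EDGES, "high-severity non-conformance"))
```

Intermediate nodes such as "unit mismatch in interface" are causal factors on the path. Whether a candidate is a true root cause still has to be confirmed by the analysis team.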
It is up to the project, engineering, and assurance to decide on the definition of high severity for their project. This requirement intends to assess any critical or high severity software defect or non-conformance to find out why the defect happened and what could be done to avoid generating these types of defects in the future on the project.
Using the project definition of high severity defects, the project performs a root cause analysis to determine why the defect occurred and what could have been done to prevent it. This information generally serves as input to a remediation process whereby corrective actions are determined and taken to prevent the problem from reoccurring in the future.
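As a minimal sketch of applying a project-defined severity threshold, the snippet below selects the non-conformances that would need a root cause analysis. The severity scale, the threshold value, and all names (`Severity`, `NonConformance`, `needs_process_assessment`) are assumptions for illustration. Each project defines its own scale and its own meaning of "high severity".

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical four-level scale; a project may use a different one.
    MINOR = 1
    MODERATE = 2
    MAJOR = 3
    CRITICAL = 4


# Assumption: this project treats MAJOR and above as "high severity".
HIGH_SEVERITY_THRESHOLD = Severity.MAJOR


@dataclass
class NonConformance:
    nc_id: str
    severity: Severity
    summary: str


def needs_process_assessment(ncs, threshold=HIGH_SEVERITY_THRESHOLD):
    """Return the IDs of non-conformances at or above the threshold."""
    return [nc.nc_id for nc in ncs if nc.severity >= threshold]


ncs = [
    NonConformance("NC-1", Severity.CRITICAL, "software crash on startup"),
    NonConformance("NC-2", Severity.MINOR, "typo in log message"),
    NonConformance("NC-3", Severity.MAJOR, "erroneous critical result"),
]
print(needs_process_assessment(ncs))
```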
Proactive management consists of preventing problems from occurring. Many techniques can be used for this purpose, ranging from good design practices to detailed analysis of problems that have already occurred, followed by actions to make sure they never reoccur. Speed is not as important here as the accuracy and precision of the diagnosis. The focus is on addressing the real cause of the problem rather than its effects.
A factor is considered the root cause of a problem if removing it prevents the problem from recurring. A causal factor, conversely, affects an event's outcome but is not the root cause. Although removing a causal factor can benefit an outcome, it does not prevent its recurrence with certainty.
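The distinction can be expressed as a simple counterfactual test: remove one factor at a time and check whether the problem would still occur. The sketch below assumes a deliberately simplified yes/no model of a single defect. The factor names and the `problem_occurs` rule are hypothetical, and real analyses rarely reduce to such a clean rule.

```python
def problem_occurs(factors):
    """Hypothetical model: the defect escapes when input validation is
    missing and at least one of the other two conditions is present."""
    return "missing input validation" in factors and (
        "schedule pressure" in factors or "skipped code review" in factors
    )


def classify(factors):
    """Label each factor by removing it and re-evaluating the model."""
    labels = {}
    for factor in factors:
        remaining = factors - {factor}
        if problem_occurs(remaining):
            labels[factor] = "causal factor"  # problem still recurs
        else:
            labels[factor] = "root cause"     # removal prevents recurrence
    return labels


OBSERVED = frozenset(
    {"missing input validation", "schedule pressure", "skipped code review"}
)
for factor, label in sorted(classify(OBSERVED).items()):
    print(f"{factor}: {label}")
```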
The goal of the requirement is to identify the root cause of the software problem, defect, or non-conformance. The next step is to trigger long-term corrective actions to address the root cause identified during root cause analysis and make sure that the problem does not resurface.
Definitions Pertaining to Root Cause Analysis
Cause (Causal Factor) | An event or condition that results in an effect. Anything that shapes or influences the outcome. |
Proximate Cause(s) | The event(s) that occurred, including any condition(s) that existed immediately before the undesired outcome, directly resulted in its occurrence and, if eliminated or modified, would have prevented the undesired outcome. Also known as the direct cause(s). |
Root Cause(s) | One of the multiple factors (events, conditions, or organizational factors) that contributed to or created the proximate cause and subsequent undesired outcome and, if eliminated or modified, would have prevented the undesired outcome. Typically multiple root causes contribute to an undesired outcome. |
Root Cause Analysis (RCA) | A structured evaluation method that identifies the root causes of an undesired outcome and the actions adequate to prevent a recurrence. Root cause analysis should continue until organizational factors have been identified, or until data are exhausted. |
Event | A real-time occurrence describing one discrete action, typically an error, failure, or malfunction. Examples: pipe broke, power lost, lightning struck, the person opened a valve, etc. |
Condition | Any as-found state, whether or not resulting from an event, that may have safety, health, quality, security, operational, or environmental implications. |
Organizational Factors | Any operational or management structural entity that exerts control over the system at any stage in its life cycle, including but not limited to the system’s concept development, design, fabrication, test, maintenance, operation, and disposal. Examples: resource management (budget, staff, training); policy (content, implementation, verification); and management decisions. |
Contributing Factor | An event or condition that may have contributed to the occurrence of an undesired outcome but, if eliminated or modified, would not by itself have prevented the occurrence. |
Barrier | A physical device or administrative control used to reduce the risk of the undesired outcome to an acceptable level. Barriers can provide physical intervention (e.g., a guardrail) or procedural separation in time and space (e.g., lock-out-tag-out procedure). |
Severity is defined as the degree of impact a defect has on the development or operation of the component or application being tested.
The greater the effect on system functionality, the higher the severity assigned to the defect. The software assurance engineer usually works with the engineering group to determine the severity level of a defect.
Priority indicates how soon a defect should be resolved: the higher the priority, the sooner the defect should be resolved.
Defects that leave the software system unusable should be given a higher priority than defects that cause a minor function of the software to fail.
A business goal of all NASA software development organizations is to reduce software defects and non-conformances. The only way to succeed is to look at why the software defects and non-conformances occurred and fix the process or step that caused the software defects and non-conformances.
Additional guidance related to process assessments may be found in the following related requirements in this Handbook:
4. Small Projects
No additional guidance is available for small projects.
5. Resources
5.1 References
- (SWEREF-027) SMA-002-14. This course provides training on Root Cause Analysis (RCA) methodology that can be used in both general problem solving and mishap and close call investigations. NOTE: This course is one of five needed to fulfill the requirements for introductory training on NASA mishap investigations in accordance with NPR 8621.1B. Users need an account to access SATERN courses. This NASA-specific resource is available at the System for Administration, Training, and Educational Resources for NASA (SATERN), accessible to NASA users at https://satern.nasa.gov/.
- (SWEREF-052) Introduction to Root Cause Analysis (SMA-002-10). NASA Root Cause Analysis, combined with the four prerequisite courses, meets the Root Cause Analysis training requirements in NPR 8621.1B. Users need an account to access SATERN courses. This NASA-specific resource is available at the System for Administration, Training, and Educational Resources for NASA (SATERN), accessible to NASA users at https://satern.nasa.gov/.
- (SWEREF-053) NASA Root Cause Analysis (SMA-SAFE-OSMA-4003). This course, combined with the five prerequisite courses, meets the Root Cause Analysis training requirements in NPR 8621.1B. Users need an account to access SATERN courses. This NASA-specific resource is available at the System for Administration, Training, and Educational Resources for NASA (SATERN), accessible to NASA users at https://satern.nasa.gov/.
- (SWEREF-054) NPR 8621.1C, Office of Safety and Mission Assurance, Effective Date: May 19, 2016, Expiration Date: May 19, 2021. Also see the Mishap Investigation web site at https://sma.nasa.gov/sma-disciplines/mishap-investigation
- (SWEREF-058) SATERN is NASA's Learning Management System (LMS) that provides web-based access to training and career development resources. Generic reference to SATERN. Users need an account to access SATERN courses. This NASA-specific resource is available at the System for Administration, Training, and Educational Resources for NASA (SATERN), accessible to NASA users at https://satern.nasa.gov/.
- (SWEREF-197) Software Processes Across NASA (SPAN) web site in NEN. SPAN is a compendium of Processes, Procedures, Job Aids, Examples, and other recommended best practices.
5.2 Tools
NASA users find this in the Tools Library in the Software Processes Across NASA (SPAN) site of the Software Engineering Community in NEN.
The list is informational only and does not represent an “approved tool list”, nor does it represent an endorsement of any particular tool. The purpose is to provide examples of tools being used across the Agency and to help projects and centers decide what tools to consider.
5.3 Training resources for NASA
6. Lessons Learned
6.1 NASA Lessons Learned
No Lessons Learned have currently been identified for this requirement.
6.2 Other Lessons Learned
No other Lessons Learned have currently been identified for this requirement.
7. Software Assurance
7.1 Tasking for Software Assurance
- Perform or confirm that a root cause analysis has been completed on all identified high severity software nonconformances, the results are recorded, and that the results have been assessed for adequacy.
- Confirm that the project analyzed the processes identified in the root cause analysis associated with the high severity software nonconformances.
- Assess opportunities for process improvement on the processes identified in the root cause analysis associated with the high severity software nonconformances.
- Perform or confirm tracking of corrective actions to closure on high severity software non-conformances.
7.2 Software Assurance Products
- Root Cause Analysis (Includes results, and any problem reports or findings recorded from Analysis)
- Record of Corrective Action Closures (Confirmation or trending showing closure status of findings/problem reports.)
- SA assessment of process improvement opportunities.
Objective Evidence
- Root Cause Analysis reports (Includes results, and any problem reports or findings recorded from Analysis)
- Record of Corrective Action Closures (Confirmation or trending showing closure status of findings/problem reports.)
- Software assurance audit reports
- Process improvement status reports
7.3 Metrics
- # of Root Cause Analyses performed
- # of Non-Conformances identified by each root cause analysis
- # of Corrective Actions (CAs) raised by SA vs. total #
- Attributes (Type, Severity, # of days Open, Life-cycle Phase Found)
- State (Open, In work, Closed)
- Trends of CA closures over time
- Trend the # of inconsistencies or corrective actions identified, and # closed
- Total # of Non-Conformances over time (Open, Closed, # of days Open, and Severity of Open)
- # of Non-Conformances in the current reporting period (Open, Closed, Severity)
- # of software process Non-Conformances by life-cycle phase over time
- # of software work product Non-Conformances identified by life-cycle phase over time
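Several of these metrics can be derived from a simple list of corrective action records. The sketch below is one possible way to compute state counts, days open, closures per month, and the number raised by SA. The record layout (`CorrectiveAction`) and the sample data are hypothetical, and a real project would normally pull these fields from its problem reporting tool.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CorrectiveAction:
    ca_id: str
    severity: str
    opened: date
    closed: Optional[date] = None  # None means still open
    raised_by_sa: bool = False


def ca_metrics(cas, today):
    """Summarize corrective actions for one reporting date."""
    return {
        "total": len(cas),
        "raised_by_sa": sum(ca.raised_by_sa for ca in cas),
        "state": dict(Counter("Closed" if ca.closed else "Open" for ca in cas)),
        # Days open is reported only for actions that are still open.
        "days_open": {
            ca.ca_id: (today - ca.opened).days for ca in cas if ca.closed is None
        },
        # Closure trend: number of actions closed in each calendar month.
        "closures_by_month": dict(
            Counter(ca.closed.strftime("%Y-%m") for ca in cas if ca.closed)
        ),
    }


CAS = [
    CorrectiveAction("CA-1", "High", date(2024, 1, 10), date(2024, 2, 1), True),
    CorrectiveAction("CA-2", "High", date(2024, 1, 20)),
    CorrectiveAction("CA-3", "Low", date(2024, 2, 5), date(2024, 2, 20), True),
]
metrics = ca_metrics(CAS, today=date(2024, 3, 1))
print(metrics)
```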
7.4 Guidance
Task 1: Perform a root cause analysis on all identified high severity software non-conformances and record the results.
Software assurance personnel review the list of non-conformances and choose all those non-conformances that are marked as “high priority.” Typically, those will be the non-conformances that cause a complete crash of the software, those that cause a major problem such as preventing a primary software function from performing, allowing a hazard to occur, or producing erroneous critical results. The high priority non-conformances all fall into a category where the software could not be released for use until these non-conformances are fixed or a work-around is identified.
Examine each of these non-conformances to determine the underlying cause of the failure. If the non-conformances seem to be related, it might be possible to do the root cause analysis as a group, rather than individually. In that case, make sure that the analysis delves into the problem deeply enough either to confirm that the root cause is the same issue or to identify the individual root cause of each non-conformance.
Follow a typical root cause analysis process which usually consists of these basic steps:
- Define the problem
- Collect the data relating to the problem
- Identify what is causing the problem
- Prioritize the causes
- Identify solutions to the underlying problem
- Implement the change
- Monitor and sustain
To “define the problem,” identify the high priority non-conformances that need a root cause analysis and determine whether some of them can be analyzed as a group or whether each needs an individual analysis.
For “Collect the data relating to the problem,” get a good understanding of what happened when the non-conformance occurred. Think about: When does the problem occur? What is the software doing when the problem occurs? Is there a particular activity that seems to cause the problem to occur? How was the problem discovered? Collect any available details about the failure.
For “Identify what is causing the problem,” think about all the possible causes of the problem. Several techniques can be used to help with this process. One of the most popular is the Fishbone process where many possible causes are identified and then sorted into useful categories. The “bones” of the fish are each labeled with a broad category of possible causes and then populated using brainstorming to identify potential causes for this case.
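A fishbone chart can be captured as a simple mapping from category to brainstormed causes. The sketch below is one way to hold that information so it can be reviewed and prioritized later. The category names and example causes are hypothetical, and teams typically choose categories that fit their own project.

```python
from collections import defaultdict


def build_fishbone(problem, brainstormed):
    """Group (category, cause) pairs into the "bones" of a fishbone chart."""
    bones = defaultdict(list)
    for category, cause in brainstormed:
        bones[category].append(cause)
    return {"problem": problem, "bones": dict(bones)}


def render(fishbone):
    """Return a plain-text outline of the chart."""
    lines = [f"Problem: {fishbone['problem']}"]
    for category, causes in fishbone["bones"].items():
        lines.append(f"  {category}")
        lines.extend(f"    - {cause}" for cause in causes)
    return "\n".join(lines)


# Hypothetical brainstorming output for one non-conformance.
ideas = [
    ("Requirements", "timing requirement not stated"),
    ("Process", "code review skipped for late change"),
    ("Test", "no test for sensor dropout"),
    ("Process", "no regression run before release"),
]
chart = build_fishbone("software crash during mode change", ideas)
print(render(chart))
```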
For “Prioritize the causes,” select the most likely causes from the list of potential causes or from something like a fishbone chart.
For “Identify solutions to the underlying problem,” explore or test the most likely causes further until they are narrowed down to the actual cause or causes. A solution can then be found to correct the problem (non-conformance).
For “Implement the change,” the developers perform this step, but software assurance assures that the change is correctly implemented and fixes the non-conformance. It is also important to review other areas in the code where a similar non-conformance might exist and assure that those areas are also fixed.
For “Monitor and sustain,” in addition to assuring that similar non-conformances in other areas of the code are also fixed, software assurance should check for other problems with similar causes when reviewing other parts of the system or when checking on changes to the system.
Task 2: Confirm that the project analyzed the processes identified in the root cause analysis associated with the high severity software non-conformances. If any of the processes used were determined to be root causes of the non-conformance, think about possible changes and improvements to the process that could prevent future similar non-conformances.
Task 3: Assess opportunities for process improvement on the processes identified as a causal factor in the root cause analysis associated with the high severity software non-conformances. Decide whether any changes should be made to the processes in use, what to change, and how to determine whether the change was effective in preventing a similar high severity non-conformance.
Task 4: Perform tracking of corrective actions to closure on high severity software non-conformances.
As part of completing the root cause analysis, software assurance will track all the corrective actions identified to closure.
Record the number of root cause analyses that software assurance has done on the project and make a list of the root causes found on each analysis. This list of root causes can be used in future reviews of the system to improve the quality of the software by paying close attention to these root causes and preventing similar issues.
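The closed-loop aspect of Tasks 1 through 4 can be checked mechanically once the records exist. The sketch below flags high severity non-conformances whose loop is not yet closed. The record fields (`rca_recorded`, `process_assessed`, `corrective_actions`) and the state names are assumptions for illustration and are not a prescribed format.

```python
def open_loops(records):
    """Return {nc_id: [reasons]} for non-conformances not yet closed out."""
    findings = {}
    for rec in records:
        reasons = []
        if not rec["rca_recorded"]:
            reasons.append("root cause analysis not recorded")
        if not rec["process_assessed"]:
            reasons.append("process assessment not performed")
        open_cas = [
            ca_id for ca_id, state in rec["corrective_actions"] if state != "Closed"
        ]
        if open_cas:
            reasons.append("open corrective actions: " + ", ".join(open_cas))
        if reasons:
            findings[rec["nc_id"]] = reasons
    return findings


# Hypothetical tracking records for two high severity non-conformances.
RECORDS = [
    {
        "nc_id": "NC-1",
        "rca_recorded": True,
        "process_assessed": True,
        "corrective_actions": [("CA-1", "Closed"), ("CA-2", "Closed")],
    },
    {
        "nc_id": "NC-3",
        "rca_recorded": True,
        "process_assessed": False,
        "corrective_actions": [("CA-4", "In work")],
    },
]
print(open_loops(RECORDS))
```

An empty result would indicate that every listed non-conformance has a recorded analysis, an assessed process, and all corrective actions closed.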