- 1. The Requirement
- 2. Rationale
- 3. Guidance
- 4. Small Projects
- 5. Resources
- 6. Lessons Learned
- 7. Software Assurance
1. Requirements
3.7.2 If a project has safety-critical software, the project manager shall implement the safety-critical software requirements contained in NASA-STD-8739.8.
1.1 Notes
NPR 7150.2, NASA Software Engineering Requirements, does not include any notes for this requirement.
1.2 History
1.3 Applicability Across Classes
Class A B C D E F Applicable?
Key: - Applicable | - Not Applicable
2. Rationale
The implementation of the safety-critical software requirements and processes helps ensure that a safe product is produced.
3. Guidance
3.1 Software Safety
Software safety is defined as “the aspects of software engineering and software assurance that provide a systematic approach to identifying, analyzing, tracking, mitigating, and controlling hazards and hazardous functions of a system where software may contribute either to the hazard or to its mitigation or control, to ensure safe operation of the system.”
It is important to have a systematic, planned approach for ensuring that safety is designed into developed or acquired software and that safety is maintained throughout the software and system life cycle. NASA-STD-8739.8B 278 specifies the software safety activities, data, and documentation necessary for the acquisition and development of software in a safety-critical system. Safety-critical systems that include software are evaluated for the software's contribution to the safety of the system during the concept phase, and the evaluation should be repeated at each major milestone as the design matures. See also Topic 7.03 - Acquisition Guidance and Topic 7.04 - Flow Down of NPR Requirements on Contracts and to Other Centers in Multi-Center Projects.
3.2 Software Safety Requirements
After the project has determined that the project has safety-critical software, the project manager should implement the safety-critical software requirements contained in NASA-STD-8739.8B in the project's software plans and the project's software requirements specification(s). The safety-critical software requirements contained in NASA-STD-8739.8B are listed below:
Derived from NPR 7150.2D para 3.7.3 SWE 134: Table 1, SA Tasks 1 - 6
1. Analyze the software requirements and the software design and work with the project to implement NPR 7150.2 requirement items "a" through "l."
2. Assess that the source code satisfies the conditions in the NPR 7150.2 requirement "a" through "l" for safety-critical and mission-critical software at each code inspection, test review, safety review, and project review milestone.
a. Use of partitioning or isolation methods in the design and code,
b. That the design logically isolates the safety-critical design elements and data from those that are non-safety-critical.
6. Ensure the SWE-134 implementation supports and is consistent with the system hazard analysis.
3.3 Determining If Software Is Safety Critical
The Engineering Technical Authority and S&MA Technical Authority shall jointly determine if the software is designated as “safety-critical.” The “safety-critical” designation defines additional requirements mapping within this NPR. Software Safety-Critical Assessment Tool guidance is provided in NASA-HDBK-2203 as well as the software safety-critical determination process defined in NASA-STD-8739.8. Allocation of system safety requirements, hardware, and risk need to be considered in the assessment. The Engineering Technical Authority and S&MA Technical Authority must reach an agreement on the safety-critical designation of software. Disagreements are elevated via both the Engineering Technical Authority and Safety and Mission Assurance Technical Authority chains. Engineering and software assurance initially determine software safety criticality in the formulation phase per NASA-STD-8739.8, Software Assurance Standard; the results are compared and any differences are resolved. As the software is developed or changed and the software components, software models, and software simulations are identified, the safety-critical software determination can be reassessed and applied at lower levels.
3.4 Requirement Coverage
Software safety requirements must cover “both action (must work) and inaction (must not work). There are two kinds of software safety requirements: process and technical. Both need to be addressed and properly documented within a program, project, or facility.” The Standard required in this requirement was “developed by the NASA Office of Safety and Mission Assurance (OSMA) to provide the requirements for ensuring software safety across all NASA Centers, programs, projects, and facilities. It describes the activities necessary to ensure that safety is designed into the software. The magnitude and depth of software safety activities should be commensurate with ... the risk posed by the software.” See also Topic 8.10 - Facility Software Safety Considerations.
3.4.1 Cyclomatic complexity
Cyclomatic complexity is a software metric used to indicate the complexity of a program. It is a quantitative measure of the number of linearly independent paths through a program's source code.
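As a rough illustration of the metric, the sketch below approximates cyclomatic complexity for Python source by counting decision points and adding one. The set of node types counted and the sample function `check_pressure` are assumptions made for this example; they do not reproduce the counting rules of any particular analysis tool or of the NASA standards.

```python
import ast

# Node types treated as decision points in this simplified sketch
# (an assumption for illustration, not a tool-specific rule set).
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                   ast.IfExp, ast.comprehension)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity as (decision points + 1)."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, _DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" contributes two additional branches.
            decisions += len(node.values) - 1
    return decisions + 1


# Hypothetical function used only to exercise the counter.
EXAMPLE = '''
def check_pressure(p, limit, armed):
    if armed and p > limit:
        return "vent"
    for _ in range(3):
        if p < 0:
            return "sensor fault"
    return "nominal"
'''

print(cyclomatic_complexity(EXAMPLE))
```

The sample contains two `if` statements, one `for` loop, and one `and` operator, so the sketch counts four decision points and reports a complexity of 5.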
Software Safety Analysis supplements the system hazard analysis by assessing the software performing critical functions serving as a hazard cause or control. The review assures compliance with the levied functional software requirements, including SWE-134 - Safety-Critical Software Design Requirements, and assures that the software doesn’t violate the independence of hazard inhibits or the independence of hardware redundancy. The Software Safety Analysis should follow the phased hazard analysis process. A typical Software Safety Analysis process begins by identifying the must work and must not work functions in Phase 1 hazard reports. The system hazard analysis and software safety analysis process should assess each function, between Phase 1 and 2 hazard analysis, for compliance with the levied functional software requirements, including SWE-134. For example, Solar Array deployment (must work function) software should place deployment effectors in the powered-off state when it boots up and should be required to initialize and execute commands in the correct order within 4 CPU cycles before removing a deployment inhibit. The analysis also assesses the channelization of the communication paths between the inputs/sensors and the effectors to assure there is no violation of fault tolerance by routing a redundant communication path through a single component. The system hazard analysis and software safety analysis also assure the redundancy management performed by the software supports fault tolerance requirements. For example, software cannot trigger a critical sequence in a single-fault-tolerant manner using a single sensor input. This consideration applies to triggering events such as payload separation, tripping FDIR responses that turn off critical subsystems, failing over to redundant components, and providing closed-loop control of critical functions such as propellant tank pressurization.
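The point about not triggering a critical sequence from a single sensor input can be sketched as a simple majority vote across redundant sensors. The 2-of-3 scheme, the function names, and the `inhibit_removed` flag below are assumptions chosen for illustration; the actual redundancy management approach is determined by the project's fault tolerance requirements.

```python
from typing import Sequence


def vote_two_of_three(readings: Sequence[bool]) -> bool:
    """Return True only if at least two of three independent sensors agree."""
    if len(readings) != 3:
        raise ValueError("expected exactly three independent sensor readings")
    return sum(bool(r) for r in readings) >= 2


def may_trigger_critical_sequence(readings: Sequence[bool],
                                  inhibit_removed: bool) -> bool:
    """Allow the sequence only when the inhibit is removed AND the vote passes."""
    return inhibit_removed and vote_two_of_three(readings)


# A single sensor indication, which could be a failed sensor, is not enough.
print(may_trigger_critical_sequence([True, False, False], inhibit_removed=True))
# Two agreeing sensors with the inhibit removed allow the sequence.
print(may_trigger_critical_sequence([True, True, False], inhibit_removed=True))
```

With this structure, one failed or stuck sensor can neither start the sequence on its own nor block it when the other two sensors agree.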
See also, SWE-220 - Cyclomatic Complexity for Safety-Critical Software.
3.5 Design Analysis
The design analysis portion of software safety analysis should be completed by Phase 2 safety reviews. At this point, the software safety analysis supports a requirements gap analysis to identify any gaps (SWE-184 - Software-related Constraints and Assumptions) and ensure the risk and control strategy documented in hazard reports are correct as stated. Between Phase 2 and 3 safety reviews, the system hazard analysis and software safety analysis support the analysis of test plans to assure adequate off-nominal scenarios (SWE-062 - Unit Test, SWE-065 - Test Plan, Procedures, Reports - a). Finally, in Phase 3, the system hazard analysis must verify that the final implementation and verification uphold the analysis by ensuring that test results permit closure of hazard verifications (SWE-068 - Evaluate Test Results), that the final hazardous commands support the single command and multi-step command needs, and that finalized pre-requisite checks are in place.
Additional specific clarifications for the NPR 7150.2 SWE-134 requirement items "a" through "l":
Item a: Aspects to consider when establishing a known safe state include the state of the hardware and software, operational phase, device capability, configuration, file allocation tables, and boot code in memory.
Assess that the source code satisfies the conditions in the NPR 7150.2 requirement "a" through "l" for safety-critical software at each code inspection, test review, safety review, and project review milestone.
Item d: Multiple independent actions by the operator help to reduce potential operator mistakes.
Item f: Memory modifications may occur due to radiation-induced errors, uplink errors, configuration errors, or other causes so the computing system must be able to detect the problem and recover to a safe state. As an example, computing systems may implement error detection and correction, software executable and data load authentication, periodic memory scrub, and space partitioning to protect against inadvertent memory modification. Features of the processor and/or operating system can be utilized to protect against incorrect memory use.
Item g: Software needs to accommodate both nominal inputs (within specifications) and off-nominal inputs, from which recovery may be required.
Item h: The requirement is intended to preclude the inappropriate sequencing of commands. Appropriateness is determined by the project and conditions designed into the safety-critical system. Safety-critical software commands are commands that can cause or contribute to a hazardous event or operation. One must consider not only the inappropriate sequencing of commands (as described in the original note) but also the execution of a command in the wrong mode or state. Safety-critical software commands must perform when needed (must work) or be prevented from performing when the system is not in a proper mode or state (must-not work).
Item j: The intent is to establish a safe state following the detection of an off-nominal indication. The safety mitigation must complete between the time that the off-nominal condition is detected and the time the hazard would occur without the mitigation. The safe state can either be an alternate state from normal operations or can be accomplished by detecting and correcting the fault or failure within the timeframe necessary to prevent a hazard and continuing with normal operations. The intent is to design in the ability of software to detect and respond to a fault or failure before it causes the system or subsystem to fail. If failure cannot be prevented, then design in the ability for the software to place the system into a safe state from which it can later recover. In this safe state, the system may not have full functionality but will operate with this reduced functionality.
Item k: Error handling is an implementation mechanism or design technique by which software faults and/or failures are detected, isolated, and recovered to allow for correct run-time program execution. The software error handling features that support safety-critical functions must detect and respond to hardware and operational faults and/or failures as well as faults in software data and commands from within a program or from other software programs.
Item l: The design of the system must provide sufficient sensors and effectors, as well as self-checks within the software, to enable the software to detect and respond to potential system hazards.
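Items d and h above can be illustrated with a small command gate that requires two separate operator actions (arm, then execute) and rejects commands issued out of sequence or in the wrong mode. The class, mode names, and method names are hypothetical and serve only as a sketch of the idea; a real implementation would be driven by the project's hazard analysis.

```python
from enum import Enum, auto


class Mode(Enum):
    SAFE = auto()
    OPERATIONAL = auto()


class HazardousCommandGate:
    """Two-step (arm, then execute) gate for a single hazardous command."""

    def __init__(self) -> None:
        self.mode = Mode.SAFE
        self.armed = False

    def arm(self) -> bool:
        # Arming is accepted only in the mode where the command is appropriate.
        if self.mode is not Mode.OPERATIONAL:
            return False
        self.armed = True
        return True

    def execute(self) -> bool:
        # Reject execution without a prior arm (wrong sequence) or in the wrong mode.
        allowed = self.armed and self.mode is Mode.OPERATIONAL
        self.armed = False  # one arm authorizes at most one execution attempt
        return allowed


gate = HazardousCommandGate()
print(gate.execute())        # rejected: never armed
gate.mode = Mode.OPERATIONAL
print(gate.arm(), gate.execute())  # accepted: armed, then executed
```

Clearing the armed flag on every execution attempt is a design choice in this sketch: it forces a fresh, deliberate arm action before each hazardous command.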
See also Topic 7.23 - Software Fault Prevention and Tolerance.
3.6 Training and Acquisition Guidance
For additional considerations when acquiring safety-critical software, see Topic 7.03 - Acquisition Guidance.
Training in software safety is available in the NASA SMA Technical Excellence Program (STEP).
These topics and more are expanded in NASA-GB-8719.13 276. Consult the guidebook for additional guidance, techniques, analysis, references, resources, and more for software developers creating safety-critical software as well as guidance for project managers, software assurance personnel, system engineers, and safety engineers. Knowledge of the software safety tasks performed by persons in roles outside of software engineering will help engineering personnel understand requests from these persons for software engineering products and processes.
3.7 Additional Guidance
Additional guidance related to this requirement may be found in the following materials in this Handbook:
3.8 Center Process Asset Libraries
SPAN - Software Processes Across NASA
SPAN contains links to Center managed Process Asset Libraries. Consult these Process Asset Libraries (PALs) for Center-specific guidance including processes, forms, checklists, training, and templates related to Software Development. See SPAN in the Software Engineering Community of NEN. Available to NASA only. https://nen.nasa.gov/web/software/wiki 197
See the following link(s) in SPAN for process assets from contributing Centers (NASA Only).
4. Small Projects
The specific activities and depth of analyses needed to meet the requirements can, and should, be tailored to the software safety risk. In other words, while the requirements must be met, the implementation and approach to meeting those requirements may and should vary to reflect the system to which they are applied. Substantial differences may exist when the same software safety requirements are applied to dissimilar projects.
For projects designated as a small project based on personnel or budget, the following options may be considered to assist in the fulfillment of this requirement:
- Utilize existing tools already validated and approved for use in the development of safety-critical software.
  - If a standard set of validated and approved tools does not exist, consider establishing them for future projects.
- Use an existing safety plan specifically developed for small projects.
  - If such a plan does not exist, consider creating one so future projects do not have to create a new one.
- Use one person to fill multiple roles.
  - The software safety engineer may have other project roles or fill similar roles for other projects.
  - Keep in mind that safety, quality, and reliability analyses and activities must be either performed or assessed, verified, and validated by a party independent of those developing the product.
5. Resources
5.1 References
- (SWEREF-001) Software Development Process Description Document, EI32-OI-001, Revision R, Flight and Ground Software Division, Marshall Space Flight Center (MSFC), 2010. See Chapter 8. This NASA-specific information and resource is available in Software Processes Across NASA (SPAN), accessible to NASA users from the SPAN tab in this Handbook.
- (SWEREF-034) NASA-HDBK 8739.23A, Approved: 02-02-2016, Superseding: NASA-HDBK-8739.23 With Change 1
- (SWEREF-197) Software Processes Across NASA (SPAN) web site in NEN SPAN is a compendium of Processes, Procedures, Job Aids, Examples and other recommended best practices.
- (SWEREF-271) NASA-STD-8719.13 (Rev C), Document Date: 2013-05-07
- (SWEREF-276) NASA-GB-8719.13, NASA, 2004. Access NASA-GB-8719.13 directly: https://swehb-pri.msfc.nasa.gov/download/attachments/16450020/nasa-gb-871913.pdf?api=v2
- (SWEREF-278) NASA-STD-8739.8B, NASA TECHNICAL STANDARD, Approved 2022-09-08, Superseding NASA-STD-8739.8A
- (SWEREF-294) The Safety and Mission Assurance (SMA) Technical Excellence Program (STEP) is a career-oriented, professional development roadmap for SMA professionals.
- (SWEREF-342) SMA-SA-WBT-230 SATERN (need user account to access SATERN courses). This NASA-specific information and resource is available at the System for Administration, Training, and Educational Resources for NASA (SATERN), accessible to NASA users at https://saterninfo.nasa.gov/.
- (SWEREF-344) SATERN (need user account to access SATERN courses). This NASA-specific information and resource is available at the System for Administration, Training, and Educational Resources for NASA (SATERN), accessible to NASA users at https://saterninfo.nasa.gov/.
- (SWEREF-350) U.S. Department of Defense (1993). MIL-STD-882C, 1993. Note that MIL-STD-882D exists, but that the NASA Software Safety Guidebook recommends using MIL-STD-882C.
- (SWEREF-504) Public Lessons Learned Entry: 343.
- (SWEREF-517) Public Lessons Learned Entry: 707.
- (SWEREF-522) Public Lessons Learned Entry: 772.
- (SWEREF-527) Public Lessons Learned Entry: 839.
- (SWEREF-539) Public Lessons Learned Entry: 1122.
5.2 Tools
NASA users find this in the Tools Library in the Software Processes Across NASA (SPAN) site of the Software Engineering Community in NEN.
The list is informational only and does not represent an “approved tool list”, nor does it represent an endorsement of any particular tool. The purpose is to provide examples of tools being used across the Agency and to help projects and centers decide what tools to consider.
6. Lessons Learned
6.1 NASA Lessons Learned
The NASA Lessons Learned database contains the following lessons learned related to software safety:
- Fault-Detection, Fault-Isolation, and Recovery (FDIR) Techniques. Lesson Number 0839 527: "Apply techniques such as Built-in Test (BIT), strategic placing of sensors, centralized architecture, and fault isolation and recovery to optimize system availability... Operating in such a critical environment as outer space, astronauts' lives and mission success are dependent on the integrity of a system. Since time and resources are limited, the sooner failures can be accurately detected and a failed system repaired and recovered, the more likely crew survival rate and mission success are to be improved."
- Fault-Tolerant Design. Lesson Number 0707 517: "Incorporate hardware and software features in the design of spacecraft equipment which tolerates the effects of minor failures and minimizes switching from the primary to the second string. This increases the potential availability and reliability of the primary string."
- Fault Protection. Lesson Number 0772 522: "Fault protection is the use of the cooperative design of flight and ground elements (including hardware, software, procedures, etc.) to detect and respond to perceived spacecraft faults. Its purpose is to eliminate single point failures or their effects and to ensure spacecraft system integrity under anomalous conditions."
- Mars Observer Inappropriate Fault Protection Response Following Contingency Mode Entry due to a Postulated Propulsion Subsystem Breach. Lesson Number 0343 504: The Recommendations are: "(1) spacecraft designers must consider the consequences of anomalies at all mission phases and ensure that fault protection takes proper action regardless of spacecraft state. (2) Fault responses should not be allowed to interrupt critical activities unless they can assure the completion of these activities. Finally, stable fault protection modes (such as contingency mode) should autonomously assure communications."
- Aero-Space Technology/X-34 In-Flight Separation from L-1011 Carrier. Lesson Number 1122 539: "The X-34 technology demonstrator program faces safety risks related to the vehicle's separation from the L-1011 carrier aircraft and to the validation of flight software. Moreover, safety functions seem to be distributed among the numerous contractors, subcontractors, and NASA without a clear definition of roles and responsibilities" The Recommendation is that "NASA should review and assure that adequate attention is focused on the potentially dangerous flight separation maneuver, the thorough and proper validation of flight software, and the pinpointing and integration of safety responsibilities in the X-34 program."
6.2 Other Lessons Learned
No other Lessons Learned have currently been identified for this requirement.
7. Software Assurance
7.1 Tasking for Software Assurance
1. Confirm that the identified safety-critical software components and data have implemented the safety-critical software assurance requirements listed in this standard.
7.2 Software Assurance Products
- Software safety requirements mapping table for the SASS standard requirements.
Objective Evidence
- Evidence that confirms that the identified software safety-critical components and data have implemented the safety-critical requirements in NASA-STD-8739.8.
- NPR 7150.2 and NASA-STD-8739.8 requirements mapping matrices signed by the engineering and SMA technical authorities for each development organization.
7.3 Metrics
- # of safety-related requirement issues (Open, Closed) over time
- # of safety-related non-conformances identified by life-cycle phase over time
See also Topic 8.18 - SA Suggested Metrics
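As one possible way to produce the first metric, the sketch below tallies open and closed safety-related requirement issues as of a given date. The record layout (`id`, `opened`, `closed`) and the sample data are invented for illustration and do not correspond to any NASA tool or tracking system.

```python
from collections import Counter
from datetime import date

# Hypothetical issue records; a real project would pull these from its tracker.
issues = [
    {"id": "SR-1", "opened": date(2024, 1, 10), "closed": date(2024, 2, 1)},
    {"id": "SR-2", "opened": date(2024, 1, 20), "closed": None},
    {"id": "SR-3", "opened": date(2024, 2, 5), "closed": None},
]


def open_closed_as_of(records, as_of):
    """Count issues that were Open or Closed on the given date."""
    counts = Counter()
    for record in records:
        if record["opened"] > as_of:
            continue  # not yet reported on this date
        if record["closed"] is not None and record["closed"] <= as_of:
            counts["Closed"] += 1
        else:
            counts["Open"] += 1
    return dict(counts)


# Sampling at successive dates gives the trend over time.
for snapshot in (date(2024, 1, 31), date(2024, 2, 28)):
    print(snapshot, open_closed_as_of(issues, snapshot))
```

Sampling the same function at each reporting date yields the "over time" trend; the same pattern could be grouped by life-cycle phase for the non-conformance metric.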
7.4 Guidance
Step 1 Confirm that the identified safety-critical software components have implemented the safety-critical software assurance requirements listed in this standard. See SWE-205 - Determination of Safety-Critical Software for guidance on determining if a software component is safety-critical.
The safety-critical software assurance requirements listed in NASA-STD-8739.8 278 are:
Derived from NPR 7150.2D para 3.7.3 SWE 134: Table 1, SA Tasks 1 - 6
1. Analyze the software requirements and the software design and work with the project to implement NPR 7150.2 requirement items "a" through "l."
2. Assess that the source code satisfies the conditions in the NPR 7150.2 requirement "a" through "l" for safety-critical and mission-critical software at each code inspection, test review, safety review, and project review milestone.
a. Use of partitioning or isolation methods in the design and code,
b. That the design logically isolates the safety-critical design elements and data from those that are non-safety-critical.
6. Ensure the SWE-134 implementation supports and is consistent with the system hazard analysis.
For a list of Safety-Specific Activities by general life-cycle phases, refer to Topic 8.20 - Safety Specific Activities in Each Phase.
The project and engineering have responsibilities to implement an approach that minimizes the risk associated with safety-critical software. The panel below defines what engineering should do when a project has determined that the software is safety-critical.
Additional specific clarifications for the NPR 7150.2 SWE-134 - Safety-Critical Software Design Requirements requirement items "a" through "l":
- Item a: Aspects to consider when establishing a known safe state include the state of the hardware and software, operational phase, device capability, configuration, file allocation tables, and boot code in memory.
- Item d: Multiple independent actions by the operator help to reduce potential operator mistakes.
- Item f: Memory modifications may occur due to radiation-induced errors, uplink errors, configuration errors, or other causes so the computing system must be able to detect the problem and recover to a safe state. As an example, computing systems may implement error detection and correction, software executable and data load authentication, periodic memory scrub, and space partitioning to protect against inadvertent memory modification. Features of the processor and/or operating system can be utilized to protect against incorrect memory use.
- Item g: Software needs to accommodate both nominal inputs (within specifications) and off-nominal inputs, from which recovery may be required.
- Item h: The requirement is intended to preclude the inappropriate sequencing of commands. Appropriateness is determined by the project and conditions designed into the safety-critical system. Safety-critical software commands are commands that can cause or contribute to a hazardous event or operation. One must consider not only the inappropriate sequencing of commands (as described in the original note) but also the execution of a command in the wrong mode or state. Safety-critical software commands must perform when needed (must work) or be prevented from performing when the system is not in a proper mode or state (must-not work).
- Item j: The intent is to establish a safe state following the detection of an off-nominal indication. The safety mitigation must complete between the time that the off-nominal condition is detected and the time the hazard would occur without the mitigation. The safe state can either be an alternate state from normal operations or can be accomplished by detecting and correcting the fault or failure within the timeframe necessary to prevent a hazard and continuing with normal operations. The intent is to design in the ability of software to detect and respond to a fault or failure before it causes the system or subsystem to fail. If failure cannot be prevented, then design in the ability for the software to place the system into a safe state from which it can later recover. In this safe state, the system may not have full functionality but will operate with this reduced functionality.
- Item k: Error handling is an implementation mechanism or design technique by which software faults and/or failures are detected, isolated, and recovered to allow for correct run-time program execution. The software error handling features that support safety-critical functions must detect and respond to hardware and operational faults and/or failures as well as faults in software data and commands from within a program or from other software programs.
- Item l: The design of the system must provide sufficient sensors and effectors, as well as self-checks within the software, to enable the software to detect and respond to potential system hazards.
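When assessing Item f, it can help to picture what a periodic memory scrub does. The sketch below detects an inadvertent modification with a CRC and restores the region from a reference copy. The `ProtectedRegion` class is hypothetical, and keeping a software reference copy is an assumption made for illustration; flight systems may instead rely on hardware error detection and correction or other mechanisms.

```python
import zlib


class ProtectedRegion:
    """Memory region guarded by a CRC and a reference copy (illustrative only)."""

    def __init__(self, data: bytes) -> None:
        self._reference = bytes(data)        # stand-in for a protected golden copy
        self._crc = zlib.crc32(self._reference)
        self.working = bytearray(data)       # the copy the software actually uses

    def scrub(self) -> bool:
        """Check the working copy; return True if corruption was found and repaired."""
        if zlib.crc32(self.working) == self._crc:
            return False
        self.working[:] = self._reference    # recover to the known-good contents
        return True


region = ProtectedRegion(b"\x01\x02\x03\x04")
region.working[2] ^= 0x10                    # simulate a single-bit upset
print(region.scrub())                        # corruption detected and repaired
print(bytes(region.working))
```

An assessment would look for both halves of this behavior in the real design: detection of the modification and recovery to a known safe state.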
See also Topic 8.01 - Off Nominal Testing.
Assess that the source code satisfies the conditions in the NPR 7150.2 requirement "a" through "l" for safety-critical software at each code inspection, test review, safety review, and project review milestone.
Software safety requirements must cover “both action (must work) and inaction (must not work). There are two kinds of software safety requirements: process and technical. Both need to be addressed and properly documented within a program, project, or facility.” The Standard required in this requirement was “developed by the NASA Office of Safety and Mission Assurance (OSMA) to provide the requirements for ensuring software safety across all NASA Centers, programs, projects, and facilities. It describes the activities necessary to ensure that safety is designed into the software. The magnitude and depth of software safety activities should be commensurate with ... the risk posed by the software.”
Software safety is defined as “the aspects of software engineering and software assurance that provide a systematic approach to identifying, analyzing, tracking, mitigating, and controlling hazards and hazardous functions of a system where software may contribute either to the hazard or to its mitigation or control, to ensure safe operation of the system.”
It is important to have a systematic, planned approach for ensuring that safety is designed into developed or acquired software and that safety is maintained throughout the software and system life cycle. NASA-STD-8739.8 specifies the software safety activities, data, and documentation necessary for the acquisition and development of software in a safety-critical system... Safety-critical systems that include software are evaluated for the software's contribution to the safety of the system during the concept phase, and the evaluation should be repeated at each major milestone as the design matures.
Engineering and software assurance initially determine software safety criticality in the formulation phase per NASA-STD-8739.8, Software Assurance Standard; the results are compared and any differences are resolved. As the software is developed or changed and the software components, software models, and software simulations are identified, the safety-critical software determination can be reassessed and applied at lower levels.
The Engineering Technical Authority and S&MA Technical Authority shall jointly determine if the software is designated as “safety-critical.” The “safety-critical” designation defines additional requirements mapping within this NPR. Software Safety-Critical Assessment Tool guidance is provided in NASA-HDBK-2203 as well as the software safety-critical determination process defined in NASA-STD-8739.8. Allocation of system safety requirements, hardware, and risk need to be considered in the assessment. The Engineering Technical Authority and S&MA Technical Authority must reach an agreement on the safety-critical designation of software. Disagreements are elevated via both the Engineering Technical Authority and Safety and Mission Assurance Technical Authority chains.
Cyclomatic complexity is a software metric used to indicate the complexity of a program. It is a quantitative measure of the number of linearly independent paths through a program's source code.
See the software assurance tab in SWE-134 - Safety-Critical Software Design Requirements for an explanation of cyclomatic complexity and code coverage guidance.
Software Safety Analysis supplements the system hazard analysis by assessing the software performing critical functions serving as a hazard cause or control. The review assures compliance with the levied functional software requirements, including SWE-134, and assures that the software doesn’t violate the independence of hazard inhibits or the independence of hardware redundancy. The Software Safety Analysis should follow the phased hazard analysis process. A typical Software Safety Analysis process begins by identifying the must work and must not work functions in Phase 1 hazard reports. The system hazard analysis and software safety analysis process should assess each function, between Phase 1 and 2 hazard analysis, for compliance with the levied functional software requirements, including SWE-134. For example, Solar Array deployment (must work function) software should place deployment effectors in the powered-off state when it boots up and should be required to initialize and execute commands in the correct order within 4 CPU cycles before removing a deployment inhibit. The analysis also assesses the channelization of the communication paths between the inputs/sensors and the effectors to assure there is no violation of fault tolerance by routing a redundant communication path through a single component. The system hazard analysis and software safety analysis also assure the redundancy management performed by the software supports fault tolerance requirements. For example, software cannot trigger a critical sequence in a single-fault-tolerant manner using a single sensor input. This consideration applies to triggering events such as payload separation, tripping FDIR responses that turn off critical subsystems, failing over to redundant components, and providing closed-loop control of critical functions such as propellant tank pressurization.
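The solar array example can be pictured as a controller that boots with its effectors powered off and refuses to remove the deployment inhibit until a required command sequence has been received in order. The command names in `REQUIRED_SEQUENCE` and the class itself are hypothetical, and the timing constraint mentioned in the text is deliberately not modeled in this sketch.

```python
class DeploymentController:
    """Illustrative must-work function with a boot-safe state and ordered commands."""

    # Hypothetical command names; the real sequence is project-defined.
    REQUIRED_SEQUENCE = ("INIT", "SELF_TEST", "ARM")

    def __init__(self) -> None:
        # Boot state: effectors powered off and the inhibit still in place.
        self.effectors_powered = False
        self.inhibit_in_place = True
        self._progress = 0

    def command(self, name: str) -> bool:
        """Accept a command only if it is the next one in the required order."""
        done = self._progress >= len(self.REQUIRED_SEQUENCE)
        if done or name != self.REQUIRED_SEQUENCE[self._progress]:
            self._progress = 0  # any unexpected command restarts the sequence
            return False
        self._progress += 1
        return True

    def remove_inhibit(self) -> bool:
        """Remove the inhibit only after the full sequence was accepted in order."""
        if self._progress != len(self.REQUIRED_SEQUENCE):
            return False
        self.inhibit_in_place = False
        return True


ctrl = DeploymentController()
print(ctrl.remove_inhibit())                  # refused: sequence not completed
for cmd in DeploymentController.REQUIRED_SEQUENCE:
    ctrl.command(cmd)
print(ctrl.remove_inhibit())                  # accepted after the ordered sequence
```

An analyst reviewing a real design would check the same two properties: the state the software establishes at boot, and the conditions that must hold before an inhibit is removed.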
See also, SWE-220 - Cyclomatic Complexity for Safety-Critical Software.
The design analysis portion of software safety analysis should be completed by Phase 2 safety reviews. At this point, the software safety analysis supports a requirements gap analysis to identify any gaps (SWE-184 - Software-related Constraints and Assumptions) and ensure the risk and control strategy documented in hazard reports are correct as stated. Between Phase 2 and 3 safety reviews, the system hazard analysis and software safety analysis support the analysis of test plans to assure adequate off-nominal scenarios (SWE-062 - Unit Test, SWE-065 - Test Plan, Procedures, Reports - a). Finally, in Phase 3, the system hazard analysis must verify that the final implementation and verification uphold the analysis by ensuring that test results permit closure of hazard verifications (SWE-068 - Evaluate Test Results), that the final hazardous commands support the single command and multi-step command needs, and that finalized pre-requisite checks are in place.
The requirements specified in this Standard obligate the program, project, and facility, and safety and mission assurance organizations to:
- Identify when software plays a part in system safety and generate appropriate requirements to ensure the safe operation of the system.
- Ensure that software is considered within the context of system safety, and that appropriate measures are taken to create safe software.
- Ensure that software safety is addressed in project acquisition, planning, management, and control activities.
- Ensure that software safety is considered throughout the system life-cycle, including mission concept, generation of requirements, design, coding, test, maintenance, and operation of the software.
- Ensure that the acquisition of software, whether off-the-shelf or contracted, includes evaluation, assessment, and planning for addressing and mitigating risks due to the software’s contribution to safety and any limitations of the software.
- Ensure that software verification and validation activities include software safety verifications and validations.
- Ensure that the proper certification requirements are in place and accomplished before the actual operational use of the software.
- Ensure that changes and reconfigurations of the software, during development, testing, and operational use of the software, are analyzed for their impacts on system safety.
See also Topic 8.10 - Facility Software Safety Considerations.
The Engineering Technical Authority and S&MA Technical Authority shall jointly determine if the software is designated as “safety-critical.”
Basic Steps for Implementing NASA-STD-8739.8
When implementing the requirements of NASA-STD-8739.8, follow the basic steps summarized below:
- Identify safety-critical software.
- Document identification efforts and results.
- If no safety-critical software is found, stop.
- Determine the software safety criticality.
- Determine the safety effort and oversight required.
Development Activities
The appropriate project personnel perform the following development activities to fulfill the software safety requirements:
- Analyze, or work with system safety to analyze, software control of critical functions and identify software that causes, controls, mitigates, or contributes to hazards.
- Identify software safety design features and methods in design documents.
- Follow proper coding standards (which may include safety features) (See SWE-061 - Coding Standards).
- Use hazards analysis to identify failures and failure combinations to be tested.
Safety and Risk
When identifying software safety requirements applicable to a project, consult existing lists of software safety requirements to identify generic safety requirements. In addition, use techniques such as hazard analysis and design analysis to identify safety requirements specific to a particular project. NASA-GB-8719.13 276 provides a list of sources for generic requirements. Appendix H of that guidebook includes a checklist of generic software safety requirements from the Marshall Space Flight Center (MSFC).
Remember to include risk as a factor when determining which requirements are more critical than others.
When developing safety-critical software, the project needs to:
- Design in a degree of fault tolerance, since not all faults can be prevented.
- Choose a "safe" programming language: one that enforces good programming practices, finds errors at compile time, has strict data types, performs bounds checking on arrays, discourages the use of pointers, etc.
- Use coding standards that enforce "safe and secure" programming practices.
- Implement defensive programming.
- Look specifically for unexpected interactions among units during integration testing.
- Evaluate the complexity of software components and interfaces.
- Design for maintainability and reliability.
- Use software peer reviews.
- Use design data analysis, design interface analysis, and design traceability analysis.
- Develop safety tests for safety-critical software units that cannot be fully tested once the units are integrated.
- Use code logic analysis, code data analysis, code interface analysis, and unused code analysis.
- Use interrupt analysis.
- Use test coverage analysis.
- Use stress testing, stability testing, resistance-to-failure tests, and disaster testing.
- Evaluate operating systems for safety before choosing one for the project.
- Review the Design for Safety checklist in Appendix H of the NASA Software Safety Guidebook. 276
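The "implement defensive programming" item in the list above can be sketched as follows. This is a minimal example, assuming a hypothetical valve command for tank pressurization; the function name, the pressure limits, and the choice of a closed valve as the safe output are illustrative assumptions only.

```python
import math

SAFE_VALVE_COMMAND = 0.0  # hypothetical safe output: valve closed


def valve_command(pressure_kpa: float, setpoint_kpa: float,
                  min_kpa: float = 0.0, max_kpa: float = 500.0) -> float:
    """Return a valve opening in [0, 1], or the safe value on bad input."""
    for value in (pressure_kpa, setpoint_kpa):
        # Reject NaN, infinity, and out-of-range values instead of using them.
        if not math.isfinite(value) or not (min_kpa <= value <= max_kpa):
            return SAFE_VALVE_COMMAND
    error = setpoint_kpa - pressure_kpa
    # Simple proportional term, clamped so the output can never leave [0, 1].
    return max(0.0, min(1.0, error / (max_kpa - min_kpa)))
```

The point of the sketch is the ordering: inputs are validated before use, invalid inputs lead to a defined safe output, and the computed output is clamped to its legal range.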
See also Topic 8.02 - Software Reliability.
Programmable Logic Devices, Tools, and Off-the-Shelf (OTS) Software
If the project involves programmable logic devices, consult NASA-HDBK-8739.23, NASA Complex Electronics Handbook for Assurance Professionals. 034
For tools that are used in the development of safety-critical software, including compilers, linkers, debuggers, test environments, simulators, code generators, etc., consider the following:
- Use tools previously validated for use in the development of safety-critical software, but consider the differences in how those tools were used on the projects for which they were validated and their use on the new project to determine if re-validation is required.
- Tools previously validated for use in the development of safety-critical software and which have been in use for many years in the same environment for the same purposes may not require re-validation.
- For tools not yet approved or for which re-validation is being considered:
  - Consider the tool's maturity.
  - Try to obtain any known bug lists for the tool.
  - Try to obtain any existing tests, analyses, and results for the tool.
  - Obtain an understanding of how the tool could fail and determine if those failures could negatively affect the safety of the software or system for which they are used.
  - Perform safety testing and analysis to ensure that the tools do not influence known hazards or adversely affect the residual risk of the software.
  - Consider independent validation for the tool.
If the project involves off-the-shelf (OTS) or reused software, the project needs to:
- Evaluate system differences that could affect safety.
- Look at interfaces needed to incorporate it into the system or isolate it from critical or non-critical software, as appropriate.
- Perform analysis of the impacts of this software on the overall project, such as:
  - Identifying extra functions that could cause safety hazards.
  - Determining the effects of extra functionality needed to integrate the software with the rest of the system.
- Evaluate the cost of extra analysis and tests needed to ensure system safety due to the use of OTS or reused software.
- Seek insight into the practices used to develop the software.
- Evaluate the V&V results of OTS software to make sure that it is consistent with the level of V&V of the developed software.
For contractor-developed software, the project:
- Includes in the contract:
  - Surveillance or insight activities for the contractor development process.
  - Identification of responsibility for preparing and presenting the Safety Compliance Data Package to the Safety Review Panel.
  - Safety analysis and test requirements.
  - Requirements for delivery of software safety deliverables, including the software safety plan, all hazard analyses, audit reports, verification reports, etc.
- Evaluates the contractor's or provider's track record, skills, capabilities, and stability.
- Considers performing additional software testing beyond that conducted by the provider.
Training and Additional Guidance
For additional considerations when acquiring safety-critical software, see 7.03 - Acquisition Guidance.
Training in software safety is available in the NASA SMA Technical Excellence Program (STEP).
Step 2. Analyze the software design to ensure that partitioning or isolation methods are used in the design to logically isolate the safety-critical design elements from those that are non-safety-critical.
- Methods to separate the safety-critical software from software that is not safety-critical, such as partitioning, may be used.
7.5 Additional Guidance
Additional guidance related to this requirement may be found in the following materials in this Handbook: