- 1. Introduction
- 2. Design Analysis Guidance
- 3. Safety Analysis During Design
- 4. Analysis Report Content
- 5. Resources
The Software Design Analysis product focuses on analyzing the software design that has been developed from the requirements (software, system, and/or interface). This topic describes some of the methods and techniques Software Assurance and Software Safety personnel may use to evaluate the quality of the architecture and design elements that were developed.
The software design process begins with a good understanding of the requirements and the system architecture and system design. The architectural design begins with the development of a basic architecture and a high-level preliminary design. The architectural design is then expanded into a low-level detailed design. By the time the detailed design is complete, software engineering should be able to implement it into the code of the desired software system or application.
Since the design primarily guides the code implementation, it is important to ensure that the architecture and design are correct, safe, secure, complete, and understandable, and that they capture the intent of the requirements. The detailed design captures the low-level component-based approach to implementing the software requirements, including the requirements associated with fault management, security, and safety. When the detailed design is complete, the analysis of the requirements traceability documents should show the relationship between the software design components and the software requirements and provide evidence that all requirements are accounted for. The information in this topic is divided into several tabs as follows:
- Tab 1 – Introduction
- Tab 2 – Software Design Analysis Guidance – provides general guidance for doing software design analysis
- Tab 3 – Safety Analysis During Design – provides additional guidance when safety critical software is involved with analysis emphasis on safety features
- Tab 4 – Analysis Reporting Content – provides guidance on the analysis report product content
- Tab 5 – Resources for this topic
The following is a list of the applicable SWE requirements that relate to the generation of the software design analysis product:
NPR 7150.2 Requirement
NASA-STD-8739.8 Software Assurance and Software Safety Tasks
The project manager shall define and document the acceptance criteria for the software.
1. Confirm software acceptance criteria are defined and assess the criteria based on guidance in the NASA Software Engineering Handbook, NASA-HDBK-2203.
If a project has safety-critical software or mission-critical software, the project manager shall implement the following items in the software:
6. Analyze the software design to ensure:
1. Assess that the software architecture addresses or contains the software structure, qualities, interfaces, and external/internal components.
2. Analyze the software architecture to assess whether software safety and mission assurance requirements are met.
The project manager shall perform a software architecture review on the following categories of projects:
1. Assess the results of or participate in software architecture review activities held by the project.
1. Assess the software design against the hardware and software requirements, and identify any gaps.
2. Assess the software design to verify that the design is consistent with the software architectural design concepts and that the software design describes the lower-level units to be coded, compiled, and tested.
3. Assess that the design does not introduce undesirable behaviors or unnecessary capabilities.
4. Confirm that the software design implements all of the required safety-critical functions and requirements.
5. Perform a software assurance design analysis.
The project manager shall track and evaluate changes to software products.
1. Analyze proposed software and hardware changes to software products for impacts, particularly to safety and security.
The project manager shall identify the software configuration items (e.g., software records, code, data, tools, models, scripts) and their versions to be controlled for the project.
2. Assess that the software safety-critical items are configuration managed, including hazard reports and safety analysis.
The project manager shall implement mandatory assessments of reported non-conformances for all COTS, GOTS, MOTS, OSS, or reused software components.
2. Assess the impact of non-conformances on the safety, quality, and reliability of the project software.
2. Software Design Analysis Guidance
In software design, software requirements are transformed into the architectural design with a software architecture and a high-level preliminary design followed by the more specific detailed software design. The architecture establishes the interfaces, overall layout/structure, and data flow of the software. The high-level preliminary design identifies the specific individual components (e.g., files, functions, subroutines, classes, modules) for each software program/application along with a description of what that piece does. In addition, it should include items such as the inputs, outputs, units, and data types along with databases and interfaces (e.g., hardware, operator/user, software program/applications, system and subsystems).
The detailed design takes the high-level components, files, functions, subroutines, classes, etc. and breaks them down to the point where they become pseudo-code with variable names and associated descriptions identified and the logic flow stubbed out. As project budgets tighten, more and more software organizations are embedding the detailed design in the source code and extracting it with tools like Javadoc and Doxygen. (Note: This is not an endorsement of these tools.) So, Software Assurance and Software Safety personnel should be aware they may receive the detailed design documentation in a less traditional manner. For small software systems, the architectural and detailed design may be combined.
The design addresses the software architectural design and software detailed design. The objective of doing design analysis is to ensure that the design:
- is a correct, accurate, and complete transformation of the software requirements that will meet the operational needs under nominal and off-nominal conditions,
- is safe,
- is secure with known weaknesses and vulnerabilities mitigated,
- introduces no unintended features, and
- does not result in unacceptable operational risk.
The design should also be created considering portability, performance, and maintainability so future changes can be made quickly without the need for significant redesign.
There are several design techniques described below that help with the analysis of the design. Each of these may be used by Software Assurance and Software Safety personnel to help ensure a more robust design. Additionally, these personnel should be aware of the Topic section – Software Design Principles – that addresses specific aspects of the design.
Tab 3 (Safety Analysis During Design) contains a more extensive list of analysis techniques that may be used by the Software Safety personnel.
Software Assurance and Software Safety tasks in NASA-STD-8739.8 that relate to design analysis are found in SWE-052, SWE-058, SWE-060, SWE-087, SWE-134, and SWE-157.
2.1 Use of Checklists and Known Best Practices
As part of the design analysis, Software Assurance and Software Safety personnel review the design to ensure that the general design best practices listed below have been implemented. The SADESIGN Checklist (see below) is important when evaluating the software design because it highlights many best practices. Two other aids in this Handbook may also be used for evaluating the design: the 6.1 - Design for Safety Checklist in the Programming Checklists Topic, and the Software Design Principles Topic. This information should be considered during the analysis for both safety-critical and non-safety-critical software. Teams may decide to formulate some of this information into a checklist that is applicable to their project.
General Design Best Practices:
Some general design best practices to consider are:
- Break the design into smaller chunks. Don’t try to design it all at once.
- Keep the design simple.
- Keep the design modular so it will be easier to test and maintain.
- Keep boundaries, interfaces, and constraints in mind.
- Strive for maximum cohesion and minimum coupling. (Cohesion is the degree to which the elements grouped in a module belong together; coupling is the degree of dependence between modules.)
- Use abstraction to increase the reusability of modules. (Abstraction is the reduction of a body of data to a simplified representation of the whole.)
- Consider how the users will use and interact with the system. Keep the user interface design user friendly.
- Include error handling in the designs.
- Don’t duplicate sections of code – if the sections of code need to be used repeatedly, put them into a function, a package, or subroutine that can be called.
- Prototype new approaches or designs for difficult requirements.
- Peer review designs, particularly interfaces, data flows, and logic flows.
- Use design practices such as documentation review, pseudo code, process diagrams, and logic diagrams to aid in evaluating the design.
Additional guidance and some key design practices may also be found in SWE-058, tab 7.
2.2 Use of Peer Reviews or Inspections
Design items designated in the software management/development plans are peer reviewed or inspected. Some of the items to look for during these meetings are:
- Assess the software design against the hardware and identify any gaps.
- Assess the software design against the system requirements and design and identify any gaps.
- Confirm that the detailed design is consistent with the architectural design and describes the program’s or application’s components at a low enough level for coding.
- Confirm the design does not contain undesirable functionality.
- Confirm the safety-related requirements (e.g., SWE-134) have been taken into account for safety-critical software.
- Confirm the design addresses possible unauthorized access, vulnerabilities, and weaknesses.
2.3 Review of Traceability Matrices
Review the traces from requirements to design and design to requirements to ensure all requirements are completely accounted for. As the project moves into implementation, the bi-directional traceability matrices between design and code should also be checked.
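The bi-directional check described above can be sketched in a few lines. This is a minimal illustration, not a prescribed tool: the function name `check_traceability`, the requirement IDs, and the component names are all hypothetical, and a real project would load the trace data from its own requirements management system.

```python
from typing import Dict, List, Set


def check_traceability(
    req_to_design: Dict[str, Set[str]],
    design_elements: Set[str],
) -> Dict[str, List[str]]:
    """Report gaps in a requirements <-> design trace.

    req_to_design maps each requirement ID to the design elements that
    claim to implement it; design_elements is every element in the design.
    """
    # Forward direction: requirements with no design element assigned.
    untraced = sorted(r for r, elems in req_to_design.items() if not elems)

    # Backward direction: design elements that no requirement points to.
    # These may indicate unintended or unnecessary capability.
    traced: Set[str] = set().union(*req_to_design.values())
    orphans = sorted(design_elements - traced)

    # Trace entries that name an element missing from the design.
    unknown = sorted(traced - design_elements)

    return {
        "untraced_requirements": untraced,
        "orphan_design_elements": orphans,
        "unknown_design_elements": unknown,
    }


# Hypothetical trace data for illustration only.
trace = {
    "SRS-001": {"CmdParser"},
    "SRS-002": {"CmdParser", "FaultMonitor"},
    "SRS-003": set(),
}
design = {"CmdParser", "FaultMonitor", "DebugShell"}

gaps = check_traceability(trace, design)
# gaps["untraced_requirements"]  -> ["SRS-003"]
# gaps["orphan_design_elements"] -> ["DebugShell"]
```

An orphan design element is not automatically a defect, but each one is worth raising as a question, since the design should introduce no unintended features.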
2.4 Software Architecture Review Board (SARB) Analysis - applies to NASA projects only
The Software Architecture Review Board (SARB) is a NASA-wide board that engages with flight projects in the formative stages of the software architecture. The objectives of the SARB are to manage and/or reduce flight software complexity through better software architecture and help improve mission software reliability and save costs. NASA projects that meet certain criteria (for example, large projects, ones with safety critical concerns, projects destined for considerable reuse, etc.) may request the SARB to do a review and assessment of their architecture. For more guidance on the focus areas of the SARB, see the SWE-143 – Tab 3 in this Handbook. For more information on the SARB or to request a review, please visit the SARB site on the NASA Engineering Network (NEN).
2.5 Problem/Issue Tracking System
3. Safety Analysis During Design
The Safety Design Analysis is a portion of the overall Software Safety Analysis that is performed on all safety critical software, as defined in NASA-STD-8739.8. A full Software Safety Analysis encompasses all the aspects of the development life cycle (requirements, design, implementation, and test) for safety-critical software and focuses on the safety features (safety requirements, controls, mitigations, fault identification, isolation and recovery, etc.). During the Design phase, Software Safety personnel analyze the design to ensure that it will not adversely impact the safety of the system/software. This tab discusses the Software Safety Analysis activities during design.
3.1 Review Software Design Analysis Information
To begin the Safety Design Analysis, the Software Safety and SA personnel should collaborate on the activities on Tab 2 – Software Design Analysis Guidance. However, Software Safety personnel should perform an independent analysis to become familiar with the design. Both teams should review each other’s Software Design Analysis results to ensure that all safety aspects have been adequately considered and addressed in the software design. In addition to the techniques and activities on Tab 2, it may be useful to use any of the following information for analyzing safety-critical software:
- Topic 1 - Design for Safety Checklist found in this Handbook, under the “Programming Checklists” Tab of Topics and
- “Software Design Principles” Tab of Topics in this Handbook.
After reviewing the analysis work done to date and the applicable checklists, examine the various operational scenarios (nominal and off-nominal) to determine what could go wrong with the mission and what happens if the software (or hardware) fails. (This scenario information may be in a preliminary Hazard Report; however, some of the scenarios may not have been identified yet and are a product of this exercise.) Review the Software Design to see if the mishaps or failures are accounted for. It may be necessary to reverse engineer the scenarios to ensure that the software design accounts for them and has the proper hooks in place to deal with any faults or failures.
3.2 Design Peer Reviews or Walk-throughs
Peer reviews or walk-throughs for safety-critical components are recommended techniques to aid in identifying software design problems or issues in safety-critical components early. These meetings allow problems and issues to be revealed and worked prior to design rollout at Milestone Reviews (e.g., Preliminary, Critical). Software Safety personnel participate in these meetings to monitor and analyze the safety aspects of the software design including any changes, and to continue updating their hazard analysis (see Software Safety and Hazard Analysis product).
One of the most important aspects of a design for safety critical software is to design for minimum risk. “Minimum risk” includes the hazard risks (including loss of life, mission, and space assets), security risks, design choice risks, human errors, and other types of risk such as programmatic, cost, schedule, etc. When possible, the design should eliminate or mitigate identified hazards and risks or reduce the associated risk through design (e.g., redundancy, isolating safety critical software). Listed below are some ways to mitigate or reduce risks through software design. This list may be used by meeting attendees to help evaluate the design with respect to safety and risk considerations.
Safety Considerations during Design Peer Reviews/Walk-throughs:
Does the design:
- Reduce the complexity of the software and interfaces?
- Favor user safety over user friendliness?
- Support testability during development and integration?
- Include separation of commands, functions, files, and ports?
- Include design for Shutdown/Recovery/Safing?
- Plan for monitoring of system/software/hardware performance and detection (for faults, malfunctions, exceeding limits, etc.)?
- Isolate the components containing safety-critical requirements as much as possible?
- Keep interaction across the interfaces between safety-critical components to a minimum?
- Document the positions and functions of safety-critical components in the design hierarchy?
- Specify safety-related design and implementation constraints?
- Specify any error detection or recovery schemes for safety-critical components?
- Consider hazardous operations scenarios which may require additional software constraints, such as executing commanding operations in a two-step process (arm and fire)?
- Fully consider safing and recovery actions for real-world conditions and the corresponding time to criticality? Automatic safing is often required if the time to criticality is shorter than the realistic human operator response time, or if there is no human in the loop. This may be performed by either hardware or software or a combination depending on the best system design to achieve safing.
- Follow a strategy for handling faults and failures? A consistent strategy for handling faults and failures should be used. Some of the techniques that may be used in fault management are:
- Prevent Fault Propagation: To prevent fault propagation (cascading of a software error from one component to another), safety-critical components must be fully independent of non-safety-critical components and be able to detect an error and not pass it along.
- Built-in Test: Fault/Failure Detection, Isolation and Recovery (FDIR) can be based on self-test such as Built-in-Test (BIT) of lower tier processors where the lower level units test themselves and report their status to the higher processor. The higher level processor switches out units reporting a failed or bad status.
- Majority voting: Some redundancy schemes are based on this technique. It is especially useful when the criteria for diagnosing failures are complicated (e.g., when an unsafe condition is defined by exceeding an analog value rather than simply a binary value). An odd number of parallel units is required to achieve majority voting.
- Fault Containment Regions: Establish a Fault Containment Region (FCR) to prevent fault propagation such as from non-safety critical software to safety-critical components; from one redundant software unit to another, or from one safety-critical component to another. Techniques such as firewalling or “come from” checks should be used to provide sufficient isolation of FCRs to prevent hazardous fault propagation. FCRs can be best partitioned or firewalled by hardware. A typical method of obtaining independence between FCRs is to host them on different and independent hardware processors.
- Redundant architecture: In redundant architecture, there are two versions of the operational code which do not need to operate identically unless required. The primary version is a high performance version with all required functionality and performance requirements implemented. If problems occur with the primary version, control will be passed to the secondary version (called a safety kernel) during failover. Depending on the requirements, the secondary version may have the same or reduced functionality.
- Recovery blocks: These use multiple software versions to find and recover from faults. Output from a block is checked against an acceptance test. If it fails, then another version computes the output and the process continues. Each successive version is more reliable but less efficient. If the last block fails, the program must determine some way to fail safe.
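Two of the fault management techniques listed above, majority voting and recovery blocks, can be sketched as follows. This is an illustrative sketch under stated assumptions, not a flight-ready pattern: the function names, the tolerance-based agreement rule for analog readings, and the square-root example versions are all hypothetical choices made for the example.

```python
import math
from statistics import median
from typing import Callable, Optional, Sequence


def majority_vote(readings: Sequence[float], tolerance: float) -> Optional[float]:
    """Return the agreed value if a strict majority of units agree, else None.

    Agreement is judged as being within `tolerance` of the median reading,
    which suits analog values where exact equality is not expected.
    """
    if len(readings) % 2 == 0:
        raise ValueError("majority voting needs an odd number of units")
    mid = median(readings)
    agreeing = [r for r in readings if abs(r - mid) <= tolerance]
    if 2 * len(agreeing) > len(readings):
        return median(agreeing)
    return None  # no majority: caller must treat this as a fault


def run_recovery_block(
    versions: Sequence[Callable[[float], float]],
    acceptance_test: Callable[[float, float], bool],
    value: float,
) -> float:
    """Try each version in order until one passes the acceptance test."""
    for version in versions:
        try:
            result = version(value)
        except Exception:
            continue  # a crash counts as a failed version
        if acceptance_test(value, result):
            return result
    # Last block failed: the surrounding design must fail safe.
    raise RuntimeError("all versions failed the acceptance test")


# Hypothetical versions: a fast but wrong primary and a slower alternate.
def fast_sqrt(x: float) -> float:
    return x / 2.0  # only correct for x == 4 (and 0)


def safe_sqrt(x: float) -> float:
    return math.sqrt(x)


def sqrt_ok(x: float, result: float) -> bool:
    return result >= 0 and abs(result * result - x) <= 1e-6


voted = majority_vote([10.0, 10.1, 55.0], tolerance=0.5)  # about 10.05
root = run_recovery_block([fast_sqrt, safe_sqrt], sqrt_ok, 16.0)  # 4.0
```

When reviewing a design that uses either technique, it is worth checking what happens on the failure path: what the system does when no majority exists, and how it fails safe when every version is rejected.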
Does the software design:
- Consider any potential issues with the use of COTS, Open Source, reused or inherited code?
- Consider sampling rate selection for noise levels and expected variations of control system and physical parameters?
- Identify and document tests and/or verification methods for each safety-critical design feature?
- Consider maintainability in the design? For example: anticipate potential changes in the software, use a modular design, object-oriented design, uniform conventions, and naming conventions, use coding standards that support safety practices, use documentation standards, common tool sets.
Some additional safety-specific design considerations are:
- Are the design and its safety features appropriately flowed from the requirements and the evolving hazard analyses?
- Has the design been reviewed to ensure that the software design’s correct implementation of safety controls or processes does not compromise other system/software safety features or the functionality of the software?
- Have additional system hazards, causes, or contributions discovered during the software design analysis been documented in the required system safety documentation (e.g., Safety Data Package and/or Hazard Reports)?
- Have the controls, mitigations, inhibits, and safety design features to be incorporated into the design been approved by the Safety Review team?
- Are any needed or identified safety conditions, constraints, parameters, trigger points, boundary conditions, environments, and other software circumstances for safe operation defined for the appropriate modes and states? Are they all flowed from the software requirements and incorporated into the design?
- Does the design maintain the system in a safe state during all modes of operation or can it transition to a safe state when and if necessary? Can the system recover from the safe state?
- Are the partitioning or isolation methods used in the design to logically isolate the safety-critical design elements from the non-safety-critical ones effective? This is particularly important with the incorporation of COTS or integration of legacy, heritage, and reuse software. Any software that can write or provide data to safety-critical software will also be considered safety critical unless isolation is built in, and then the isolation design is considered safety critical.
- Does the software design include appropriate fault or failure tolerance as planned?
- If heritage code is being used, is there a clear understanding of the design and constraints associated with any fault management in the heritage code? Are they appropriate for the current system being developed?
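The isolation consideration above states that any software able to write or provide data to safety-critical software is itself treated as safety critical unless isolation is built in. One way to check a design against that rule is to walk the data flows backward from the safety-critical components. The sketch below assumes the design's data flows are available as (producer, consumer) pairs; the function name and all component names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Flow = Tuple[str, str]  # (producer, consumer)


def inherited_criticality(
    data_flows: Iterable[Flow],
    critical: Set[str],
    isolated_flows: Iterable[Flow] = (),
) -> Set[str]:
    """Return components that inherit safety criticality through data flow.

    A component inherits criticality if it feeds data, directly or through
    other components, into a safety-critical component over flows that are
    not protected by an isolation mechanism.
    """
    isolated = set(isolated_flows)
    producers_of: Dict[str, Set[str]] = {}
    for producer, consumer in data_flows:
        if (producer, consumer) in isolated:
            continue  # isolation breaks the propagation path
        producers_of.setdefault(consumer, set()).add(producer)

    inherited: Set[str] = set()
    seen = set(critical)
    queue = deque(critical)
    while queue:  # breadth-first walk against the direction of data flow
        component = queue.popleft()
        for producer in producers_of.get(component, ()):
            if producer not in seen:
                seen.add(producer)
                inherited.add(producer)
                queue.append(producer)
    return inherited


# Hypothetical design data for illustration only.
flows = [
    ("SensorDriver", "ThrusterControl"),
    ("Calibration", "SensorDriver"),
    ("PayloadApp", "ThrusterControl"),
    ("ThrusterControl", "Telemetry"),
]
result = inherited_criticality(
    flows,
    critical={"ThrusterControl"},
    isolated_flows=[("PayloadApp", "ThrusterControl")],
)
# result == {"SensorDriver", "Calibration"}
```

In this example `PayloadApp` does not inherit criticality only because its flow is marked as isolated, so the mechanism providing that isolation would itself need to be analyzed as safety critical.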
3.3 Other Types of Design Analysis
There are other types of analyses that may be useful during design but require more time and effort to perform. The Safety Team should consider them and choose those they feel would provide the most value, depending on the areas where risk is highest in the design. Some of these design analysis methods are:
- Acceptable Level of Safety: Once the design is fairly mature, a design safety analysis may be done to determine whether an acceptable level of safety will be attained by the designed system. This analysis involves analyzing the design of the safety components to ensure that all safety requirements are specified correctly. Check to assure the requirements are updated once the design has determined exactly what safety features will be included in the system/software. Review the design looking for places and conditions that lead to unacceptable hazards. Consider the credible faults or failures that could occur and evaluate their effects on the designed system. Does the designed system produce the desired result with respect to the hazards? Think about what the system will do for all the “what if” cases and trace through how the system would respond. Does it respond in a safe manner?
- Prototyping or simulating: Prototyping or simulating parts of the design may show where the software can fail. In addition, this can demonstrate whether the software can meet the constraints it might have, such as response time, or data conversion speed. This could also be used to provide the operator’s inputs on the user interface. If the prototypes show that a requirement cannot be met, the requirement must be modified or the design revised.
- Independence Analysis: To perform this analysis, map the safety-critical functions to the software components, and then map the software components to the hardware hosts and FCRs. All the input and output of each safety-critical component should be inspected. Consider global or shared variables, as well as the parameters directly passed. Consider “side effects” that may be included when a component is run. The goal is to verify there is separation between safety-critical and non-safety critical functions.
- Design Logic Analysis: Logic analysis examines the safety-critical areas of a software component by analyzing each function performed by that component. If a function responds to or has the potential to violate one of the safety requirements, it should be considered critical and undergo logic analysis. Design Logic Analysis (DLA) evaluates the equations, algorithms, and control logic in the software design of these safety-critical components. A technique for performing design logic analysis is to compare design descriptions and logic flows and then note the discrepancies. The most rigorous form of this analysis may be performed using Formal Methods. Formal Methods are the use of mathematical modelling for the specification, development, and verification of systems in both software and electronic hardware. They are used to help ensure these systems are developed without error. Less formal DLA involves reviewing a relatively small quantity of critical software products (e.g., PDL, prototype code) and manually tracing the logic. Safety-critical logic can include failure detection and diagnosis, redundancy management, variable alarm limits, and command inhibit logical preconditions.
- Design Data Analysis: Data analysis ensures that the structure and intended use of data will not violate a safety requirement by comparing the description to the use of each data item in the design logic. The Design Data Analysis evaluates the description and intended use of each data item in the software design. Interrupts and their effect on data must receive special attention in safety-critical areas. Analysis should verify that interrupts and interrupt handling routines do not alter critical data items used by other routines. The integrity of each data item should be evaluated with respect to its environment and host. Shared memory and dynamic memory allocation can affect data integrity. Data items should also be protected from being overwritten by unauthorized applications.
- Design Interface Analysis: The Design Interface Analysis verifies the proper design of a software component's interfaces with other components of the software, system, or even hardware. This analysis will verify that the software component's interfaces, especially the control and data linkages, have been properly designed. Interface requirements specifications (which may be part of the requirements or design documents, or a separate document) are the sources against which the interfaces are evaluated. Interface characteristics to be addressed should include inter-process communication methods, data encoding, error checking (e.g., data entry validity, value/range, type checks), and synchronization.
The analysis should consider the validity and effectiveness of checksums, cyclic redundancy checks (CRCs), and error-correcting codes. A CRC is a type of error-detecting code used in digital networks and storage devices to detect unintentional changes to raw data. Blocks of data entering these systems get a short check value attached, based on the remainder of a polynomial division of their contents. When the data is retrieved, the calculation is repeated; if the check values do not match, the data is corrupt and corrective action can be taken.
The sophistication of error checking or correction that is implemented should be appropriate for the predicted bit error rate of the interface. An overall system error rate should be defined and budgeted to each interface.
- Design Traceability Analysis: This analysis ensures that each safety critical software requirement is included in the design. Tracing the safety requirements throughout the design (and eventually into the source code and test cases) is vital to making sure that no requirements are lost, that safety is “designed in”, that extra care is taken during the coding phase, and that all safety requirements are tested. A safety requirement traceability matrix is one way to implement this analysis.
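The CRC mechanism described under Design Interface Analysis (attach a check value on the way in, recompute and compare on the way out) can be sketched with the CRC-32 function in Python's standard `zlib` module. The framing used here, a 4-byte big-endian check value appended to the payload, and the function names are assumptions made for the example, not a format taken from any interface specification.

```python
import zlib

CRC_SIZE = 4  # CRC-32 check value length in bytes


def attach_crc(payload: bytes) -> bytes:
    """Append the CRC-32 of the payload as a 4-byte big-endian trailer."""
    check_value = zlib.crc32(payload)
    return payload + check_value.to_bytes(CRC_SIZE, "big")


def verify_crc(frame: bytes) -> bool:
    """Recompute the CRC over the payload and compare with the trailer."""
    if len(frame) < CRC_SIZE:
        return False  # too short to contain a check value
    payload = frame[:-CRC_SIZE]
    received = int.from_bytes(frame[-CRC_SIZE:], "big")
    return zlib.crc32(payload) == received


# Hypothetical command payload for illustration only.
frame = attach_crc(b"VALVE_OPEN")

# Simulate a single-bit error in transit by flipping one bit.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]

intact_ok = verify_crc(frame)          # True
corrupted_ok = verify_crc(corrupted)   # False
```

A check like this only detects corruption. The design still has to state what corrective action follows a mismatch (for example, discard and request retransmission), and whether the chosen check is strong enough for the predicted bit error rate of the interface.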
3.4 Problem/Issue Tracking System
4. Analysis Reporting Content
Documenting and Reporting of Analysis Results.
When the design is analyzed, the Software Design Analysis work product is generated to document the results. It should include a detailed report of the design analysis results. Analysis results should also be reported in a high-level summary and conveyed as part of weekly or monthly SA Status Reports. The high-level summary should provide an overall evaluation of the analysis, any issues/concerns, and any associated risks. If a time-critical issue is uncovered, it should be reported to management immediately so that the affected organization may begin addressing it at once.
When a project has safety-critical software, analysis results should be shared with the Software Safety personnel. The results of analysis conducted by Software Assurance personnel and those done by Software Safety personnel may be combined into one analysis report, if desired.
4.1 High-Level Analysis Content for SA Status Report
Any design analysis performed since the last SA Status Report or project management meeting should be reported to project management and the rest of the Software Assurance team. When a project has safety-critical software, any analysis done by Software Assurance should be shared with the Software Safety personnel.
When reporting the results of an analysis in a SA Status Report, the following defines the minimum recommended contents:
- Identification of what was analyzed: Mission/Project/Application
- Period/Timeframe/Phase during which the analysis was performed
- Summary of analysis techniques used
- Overall assessment of design, based on analysis
- Major findings and associated risk
- Current status of findings: open/closed; projection for closure timeframe
4.2 Detailed Content for Analysis Product
The detailed results of all software design analysis activities are captured in the Software Design Analysis product. This document is placed under configuration management and delivered to the project management team as the Software Assurance record for the activity. When a project has safety-critical software, this product should be shared with the Software Safety personnel.
When reporting the detailed results of the software design analysis, the following defines the minimum recommended content:
- Identification of what was analyzed: Mission/Project/Application
- Person(s) or group performing the analysis
- Period/Timeframe/Phase during which the analysis was performed
- Documents used in analysis (e.g., versions of the system and software requirements, interfaces document, architectural and detailed design)
- Description or identification of analysis techniques used. Include an evaluation of the techniques used.
- Overall assessment of design, based on analysis results
- Major findings and associated risk – The detailed reporting should include where the finding, issue, or concern was discovered and an assessment of the amount of risk involved with the finding.
- Minor findings
- Current status of findings: open/closed; projection for closure timeframe
- Include counts for those discovered by SA and Software Safety
- Include overall counts from the Project’s problem/issue tracking system.
No references have currently been identified for this Topic.
NASA users find this in the Tools Library in the Software Processes Across NASA (SPAN) site of the Software Engineering Community in NEN.
The list is informational only and does not represent an “approved tool list”, nor does it represent an endorsement of any particular tool. The purpose is to provide examples of tools being used across the Agency and to help projects and centers decide what tools to consider.