9.01 Software Design Principles
Multithreaded software, typical of mission-critical embedded applications, is vulnerable to incorrect or unpredictable behavior if the interaction between threads has not been adequately designed to prevent inappropriate interference.
Multitasking operating systems enable flight software developers to partition functionality into separate domains that operate semi-autonomously from one another. This simplifies the design problem but creates a new set of issues for the developer. Most threads of execution supply other threads with information and services, receive data and services, or both. All threads require system resources, which may be scarce under certain circumstances. These areas of dependence, and the resulting potential for resource contention, are the main source of problems in multithreaded applications. Thus, it is important to minimize the dependencies between threads wherever possible.
Where different threads must share data, employ guard mechanisms to ensure that shared data structures and arrays are modified atomically (i.e., in a single, uninterrupted operation), so that one thread can never modify data while another is using it. Interrupt lockout and mutual exclusion semaphores are two common mechanisms. Other solutions may be more appropriate depending on the application and operating environment.
Priority-based task scheduling is a common feature of commercial real-time operating systems. This approach is sufficient to ensure that many software functions are executed within a specified time and in a particular order, but it may not work in all cases. Priority inversion (where a low-priority task holds a resource that a higher-priority task depends on) can be a problem in some situations, and care must be taken to avoid it.
Some operations require tighter time constraints or ordering rules that can be difficult or impossible to guarantee with a pure priority-based scheme. In these situations, additional design features, such as a master task that controls the execution order of other tasks, may be needed. Note that a thread of execution as discussed here need not correspond to an operating system-supported thread or task; it may be a related group of operations controlled by mechanisms designed into the software that do not rely on operating system facilities.
Handling for conditions that could prevent a thread of execution from completing its work within the anticipated amount of time needs to be designed into the software. Design threads that operate with looping constructs to limit the number of iterations of that loop in any given invocation. Use timeouts on threads that perform operations that may block, such as I/O or the taking of a semaphore, and use appropriate error handling to prevent the thread from becoming "hung" indefinitely. A heartbeat scheme that lets other parts of the software know when a task they depend on has stopped functioning allows the dependent parts of flight software to take appropriate action rather than becoming hung themselves.
Reference: Power of 10 (document used as a reference in the JPL coding standards).
No Lessons Learned have currently been identified for this principle.
1. Principle
1.1 Rationale
2. Examples and Discussion
3. Inputs
3.1 ARC
a. Being sensitive to identified uncertainties.
b. Precluding an undesired response to mathematical singularities or limitations.
c. Responding predictably to possible events that exceed capabilities.
Note: Examples of these situations include buffers overflowing, exceeding a rate group time boundary, and excessive inputs or interrupts. There are several common methods for tolerating these situations, most of which relate to reducing demand from non-essential items, especially if they are the source of oversubscription:
3.2 GSFC
3.3 JPL
Note: Examples of these situations include buffers overflowing, exceeding a rate group time boundary, and excessive inputs or interrupts. There are several common methods for tolerating these situations, most of which relate to reducing demand from non-essential items, especially if they are the source of oversubscription:
Note: Watchdog timers are commonly used for this purpose. Upon completion of a defined processing path, the software resets a watchdog timer. If the processing gets lost or fails to make progress, the timer times out and directs the software to a known point from which processing is restored.
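The watchdog pattern described in this note can be sketched in software as follows. This is a minimal illustration in Python rather than flight code, and it assumes a monitor thread standing in for a hardware watchdog timer. The names `Watchdog`, `kick`, and `on_timeout` are hypothetical and do not come from the handbook.

```python
import threading
import time


class Watchdog:
    """Software watchdog: calls on_timeout if kick() is not called in time."""

    def __init__(self, timeout_s, on_timeout):
        self._timeout_s = timeout_s
        self._on_timeout = on_timeout      # hypothetical recovery callback
        self._kicked = threading.Event()
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def kick(self):
        """Called by the monitored task at the end of each processing path."""
        self._kicked.set()

    def stop(self):
        self._stopped.set()
        self._kicked.set()  # wake the monitor so it can exit promptly
        self._thread.join()

    def _run(self):
        while not self._stopped.is_set():
            # wait() returns True if kick() arrived before the timeout.
            kicked = self._kicked.wait(self._timeout_s)
            if self._stopped.is_set():
                return
            if kicked:
                self._kicked.clear()   # restart the timing window
            else:
                self._on_timeout()     # no progress was reported in time
```

A monitored task would call `kick()` each time it completes its processing path, and `on_timeout` would hold whatever recovery action returns processing to a known point.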
Note: For example, software should not be interrupted in a manner that permits it to use both old and new components of a vector.
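One way to keep a reader from seeing a mix of old and new vector components is to guard both the update and the read with the same mutual exclusion lock. The sketch below is illustrative only: `SharedVector` and its methods are hypothetical names, and Python's `threading.Lock` stands in for whatever guard mechanism (mutex semaphore, interrupt lockout) the target environment provides.

```python
import threading


class SharedVector:
    """Three-component vector that readers always see as one consistent set."""

    def __init__(self, x=0.0, y=0.0, z=0.0):
        self._lock = threading.Lock()
        self._x = x
        self._y = y
        self._z = z

    def write(self, x, y, z):
        # All three components change together while the lock is held.
        with self._lock:
            self._x = x
            self._y = y
            self._z = z

    def read(self):
        # The snapshot is taken under the same lock, so it cannot combine
        # components from two different writes.
        with self._lock:
            return (self._x, self._y, self._z)
```

Callers work only with the tuple returned by `read()`, never with the individual fields, so every consumer operates on a snapshot from a single write.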
Note: A deadlock is the condition where two processes cannot proceed because each is waiting to use a shared resource held by the other.
A race condition is anomalous behavior due to unexpected critical dependence on the relative timing of events.
Non-progress cycles exist if a potentially infinite execution cycle does not include a state indicating that progress is being made.
Thread-safe code is code that functions correctly during simultaneous execution by multiple threads.
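Two of the hazards defined above, deadlock and non-progress cycles, have common defensive patterns that can be sketched briefly. The example below is an assumption-laden illustration, not a prescribed method: `acquire_in_order`, `release_all`, and `drain_bounded` are hypothetical helper names, and sorting locks by `id()` is only a stand-in for an explicit resource ranking that a real design would define.

```python
import threading
from collections import deque


def acquire_in_order(locks, timeout_s):
    """Acquire all locks in one fixed order, or none of them.

    Taking locks in the same order everywhere prevents the circular wait
    behind a deadlock; the timeout keeps the caller from blocking forever.
    Returns the list of held locks, or None if any acquire timed out.
    """
    held = []
    for lock in sorted(locks, key=id):  # illustrative ordering only
        if not lock.acquire(timeout=timeout_s):
            # Back out rather than keep waiting while holding other locks.
            for h in reversed(held):
                h.release()
            return None
        held.append(lock)
    return held


def release_all(held):
    """Release locks in the reverse of the order they were acquired."""
    for lock in reversed(held):
        lock.release()


def drain_bounded(queue, handle, max_items):
    """Process at most max_items entries per invocation.

    The iteration cap bounds the loop's run time; the returned count lets
    the caller observe whether progress was made.
    """
    handled = 0
    while queue and handled < max_items:
        handle(queue.popleft())
        handled += 1
    return handled
```

A caller that receives `None` from `acquire_in_order` would run its error handling (for example, retry later or report a fault) instead of waiting indefinitely.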
Model-based techniques are recommended wherever possible as a means of demonstrating compliance with this requirement.
3.4 MSFC
4. Resources
4.1 References
5. Lessons Learned
9.16 Thread Safety
Web Resources