RTOS for Safety Critical Systems

Jan 6, 2018

Safety-critical systems are computer systems that operate functions in which failure is not acceptable. The slightest deviation in the system's behavior or its environment, as well as any system error or failure, can cause an operational catastrophe or a hazardous situation. Safety-critical systems must therefore guarantee not only strict real-time behavior but also the availability and dependability of the required system services. A real-time operating system (RTOS) serves as the underlying platform that supports these safety and real-time features, saving application developers the hassle of implementing real-time and safety mechanisms in every application.

Additionally, development and design standards must be adhered to in order to prevent system failures and their critical consequences. These standards define the techniques and methods that should be applied to build high-quality safety-critical applications. Two common standards are IEC 61508 and DO-178B.

IEC 61508: a functional safety standard for electrical, electronic, and programmable electronic safety-related systems. It is a core safety standard that serves as the basis for other domain-specific standards. It defines the lifecycle of safety-related systems as well as their development and operational requirements. It recommends techniques and measures for preventing system failures and for controlling failures if they occur.

DO-178B: titled "Software Considerations in Airborne Systems and Equipment Certification", it specifies the development guidelines for avionics software. It sets a stringent, application-dependent safety standard.

Many systems need to integrate different applications with varying criticality levels on one platform. A safety-critical operating system is therefore challenged with guaranteeing the availability of resources and processor time to its applications. The answer to these challenges must be built into the RTOS architecture through time and space domain protection.

Time Domain Protection

Running multiple applications of varying criticality on a single processor can compromise the guarantee that critical applications receive the processor time they require. For instance, consider two applications of different criticality running on one system, each with a single thread, and both threads with the same priority. Thread 1, in application 1, is non-critical, while thread 2, in application 2, is critical. Suppose thread 2 requires at least 45% of the processor's time to process its workload. Because the two threads have equal priority, the scheduler assigns 50% of the processor time to each, and critical thread 2 handles its workload effectively. However, if thread 1 spawns a new thread with the same priority, the scheduler must divide the time equally among three threads, so each receives about 33% of the processor's time. Thread 2 can then no longer handle its workload. Hence the need for protection in the time domain.
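The arithmetic of this example can be checked with a short sketch. It is only an illustration: the function names are hypothetical, the equal-priority scheduler is modeled as a plain 1/n split, and the fixed 45% partition budget is an assumed configuration rather than anything a particular RTOS prescribes.

```python
# Minimum CPU fraction that critical thread 2 needs, taken from the example.
REQUIRED_SHARE_THREAD_2 = 0.45


def equal_priority_share(thread_count):
    """CPU fraction per thread when all threads share one priority.

    Assumption: the scheduler splits time evenly (simple round-robin).
    """
    if thread_count < 1:
        raise ValueError("thread_count must be at least 1")
    return 1.0 / thread_count


def partitioned_share(partition_budget, threads_in_partition):
    """CPU fraction per thread inside a partition with a fixed time budget.

    Assumption: threads in other partitions cannot consume this budget.
    """
    if not 0.0 <= partition_budget <= 1.0:
        raise ValueError("partition_budget must be between 0 and 1")
    if threads_in_partition < 1:
        raise ValueError("threads_in_partition must be at least 1")
    return partition_budget / threads_in_partition


# Without time protection: two threads, then three threads.
share_with_two = equal_priority_share(2)
share_with_three = equal_priority_share(3)

# With time protection: application 2 owns a 45% budget no matter how
# many threads application 1 creates in its own partition.
protected_share = partitioned_share(0.45, 1)

for label, share in [
    ("2 equal-priority threads", share_with_two),
    ("3 equal-priority threads", share_with_three),
    ("thread 2 in its own partition", protected_share),
]:
    verdict = "meets" if share >= REQUIRED_SHARE_THREAD_2 else "misses"
    print(f"{label}: {share:.0%} of CPU, {verdict} the 45% requirement")
```

With the fixed budget, the share available to thread 2 no longer depends on how many threads the non-critical application creates, which is the property time domain protection is meant to provide.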

Space Domain Protection

Because of predictability concerns, RTOS designers often prefer not to use virtual memory management. When applications of varying criticality run on a single processor, their processes therefore share one memory space. As a result, a process can easily corrupt the data, code, or stack of another process, intentionally or unintentionally. It can also corrupt the operating system's kernel code or data and end up compromising reliability and safety. Such corruption leads to unexpected system behavior, undermines predictability, and can easily bring down the entire system. Hence, memory protection is a major issue in safety-critical RTOS design.
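The difference between a shared memory space and a protected one can be illustrated with a small simulation. This is a sketch under stated assumptions, not how any real RTOS or memory protection unit is programmed: the class names, the owner labels, and the 32-byte memory with two 16-byte regions are all hypothetical.

```python
class MemoryFault(Exception):
    """Raised when a process writes outside the region assigned to it."""


class FlatMemory:
    """One shared memory space: any process may write anywhere."""

    def __init__(self, size):
        self.cells = bytearray(size)

    def write(self, owner, address, value):
        self.cells[address] = value


class ProtectedMemory(FlatMemory):
    """Same memory, but every write is checked against per-owner regions."""

    def __init__(self, size, regions):
        super().__init__(size)
        self.regions = regions  # maps owner name -> range of allowed addresses

    def write(self, owner, address, value):
        allowed = self.regions.get(owner, range(0))
        if address not in allowed:
            raise MemoryFault(f"{owner} may not write to address {address}")
        super().write(owner, address, value)


# Hypothetical layout: the critical process owns 0..15, the other 16..31.
regions = {"critical": range(0, 16), "noncritical": range(16, 32)}

flat = FlatMemory(32)
flat.write("noncritical", 4, 0xFF)  # silently corrupts critical data

protected = ProtectedMemory(32, regions)
fault_raised = False
try:
    protected.write("noncritical", 4, 0xFF)
except MemoryFault:
    fault_raised = True  # the stray write is stopped before it lands

print("flat memory cell 4:", flat.cells[4])
print("protected memory cell 4:", protected.cells[4], "fault:", fault_raised)
```

In the flat case the stray write succeeds and the critical process's data is changed without any indication; in the protected case the write is rejected and the cell keeps its original value.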

Secure Operating System Architecture

The solution to the time and space domain protection requirements is an operating system architecture that defines a fully secured and partitioned RTOS. There are two types of partitioning: spatial and temporal. Such operating systems still follow the basic RTOS design. The difference lies above the OS core layer, in the application layer, which is divided into separate partitions. Every partition is assigned an integrity level and runs only applications that comply with that level. Each partition also contains a small partition OS that provides RTOS services according to the safety features required for its safety integrity level. Hardware-dependent functions, the scheduler, and device drivers, however, remain in the operating system's core layer.
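The partitioned layout described above can be modeled as a small data structure. The sketch below is illustrative only: the `Partition` and `Application` classes, the numeric integrity levels, and the rule that "compliant" means "certified to at least the partition's level" are all assumptions made for the example, not definitions taken from a standard or a specific RTOS.

```python
from dataclasses import dataclass, field


@dataclass
class Application:
    name: str
    certified_level: int  # hypothetical numeric integrity level


@dataclass
class Partition:
    name: str
    integrity_level: int  # level required of applications in this partition
    time_budget: float    # temporal partitioning: fraction of CPU time
    memory: range         # spatial partitioning: addresses owned
    applications: list = field(default_factory=list)

    def admit(self, app):
        """Accept the application only if it complies with the partition.

        Assumption: compliance means certified_level >= integrity_level.
        """
        if app.certified_level < self.integrity_level:
            return False
        self.applications.append(app)
        return True


def validate(partitions):
    """Check that time budgets fit in one CPU and memory regions are disjoint."""
    if sum(p.time_budget for p in partitions) > 1.0 + 1e-9:
        return False
    ordered = sorted(partitions, key=lambda p: p.memory.start)
    for lower, upper in zip(ordered, ordered[1:]):
        if lower.memory.stop > upper.memory.start:
            return False
    return True


critical = Partition("critical", integrity_level=3, time_budget=0.45,
                     memory=range(0, 16))
general = Partition("general", integrity_level=1, time_budget=0.55,
                    memory=range(16, 32))

layout_ok = validate([critical, general])
admitted = critical.admit(Application("control_loop", certified_level=3))
rejected = critical.admit(Application("logger", certified_level=1))

print("layout valid:", layout_ok)
print("control_loop admitted to critical partition:", admitted)
print("logger admitted to critical partition:", rejected)
```

Keeping the time budget and the memory range on the partition rather than on individual threads mirrors the idea that the core layer enforces the limits while each partition manages its own applications within them.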

Summary

Designing the right RTOS architecture requires careful attention and decisions. Basic real-time operating system services such as inter-process communication, process synchronization, process management, and interrupt handling must all be carried out efficiently.