Fault Tolerant Scheduler preliminary API

In this post I want to explain the API of the Fault Tolerant Scheduler (FTS) and its implementation structure.

In RTEMS, the classical API implementation is located in /rtems/kernel/rtems/cpukit/rtems/src. For example, semrelease.c implements the function rtems_semaphore_release( rtems_id id ), which lets the user release a semaphore. The header files sem.h and semimpl.h declare the directives; they are located in /rtems/kernel/rtems/cpukit/rtems/include/rtems/rtems. The declaration rtems_status_code rtems_semaphore_release( rtems_id id ) can be found in sem.h, and the function is implemented in semrelease.c. The low-level directives used in semrelease.c are located in rtems/kernel/rtems/cpukit/score/src/.
The FTS API will be implemented in the same way as the existing functions in the classical API.


FTS API Description
void fts_rtems_task_register(rtems_id id, TECHNIQUE)
From the next activation on, the task with the given id will be protected.

void fts_init_versions(taskversions)
Informs the FTS about the available task versions.

MODE fts_get_mode(rtems_id id)
Returns the execution mode the protected task has to execute next.

void fts_off(rtems_id id)
Removes a task from protection.

int fts_task_status(rtems_id id)
Reports whether the task with the given id is registered for protection.

TECHNIQUE fts_change_tech(rtems_id id, TECHNIQUE)
Changes the fault tolerance technique of a specific task to the one specified in the parameter.

Tasklist fts_tasklist()
Returns the list of all registered tasks.
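
To make the intended usage of these directives a bit more concrete, here is a minimal sketch of four of them backed by a toy in-memory table. It is not the planned implementation. Everything beyond the function names and return types listed above is an assumption: rtems_id is a stand-in typedef so the code compiles without the RTEMS headers, the TECHNIQUE enumerators, the table size and the helper fts_find are made up, and I assume that fts_change_tech returns the technique that is in effect afterwards. Unlike the description above, the sketch registers a task immediately instead of waiting for the next activation. fts_init_versions, fts_get_mode and fts_tasklist are left out because their parameter and return types are not fixed yet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the RTEMS object id type (assumption, avoids <rtems.h>). */
typedef uint32_t rtems_id;

/* Hypothetical enumerators: the post names the type TECHNIQUE only. */
typedef enum { FTS_TECH_NONE, FTS_TECH_SRE, FTS_TECH_OTHER } TECHNIQUE;

#define FTS_MAX_TASKS 8 /* arbitrary capacity for this sketch */

typedef struct {
  rtems_id  id;
  TECHNIQUE tech;
  int       in_use;
} fts_entry;

static fts_entry fts_table[FTS_MAX_TASKS];

/* Hypothetical helper: look up the table entry of a registered task. */
static fts_entry *fts_find(rtems_id id)
{
  for (size_t i = 0; i < FTS_MAX_TASKS; ++i) {
    if (fts_table[i].in_use && fts_table[i].id == id) {
      return &fts_table[i];
    }
  }
  return NULL;
}

/* Register a task for protection (here: immediately). */
void fts_rtems_task_register(rtems_id id, TECHNIQUE tech)
{
  fts_entry *e = fts_find(id);
  if (e != NULL) {          /* already registered: just update it */
    e->tech = tech;
    return;
  }
  for (size_t i = 0; i < FTS_MAX_TASKS; ++i) {
    if (!fts_table[i].in_use) {
      fts_table[i].id = id;
      fts_table[i].tech = tech;
      fts_table[i].in_use = 1;
      return;
    }
  }
  /* Table full: ignored in this sketch. */
}

/* 1 if the task is registered for protection, 0 otherwise. */
int fts_task_status(rtems_id id)
{
  return fts_find(id) != NULL;
}

/* Remove a task from protection. */
void fts_off(rtems_id id)
{
  fts_entry *e = fts_find(id);
  if (e != NULL) {
    e->in_use = 0;
  }
}

/* Assumption: returns the technique in effect after the call,
 * or FTS_TECH_NONE if the task is not registered. */
TECHNIQUE fts_change_tech(rtems_id id, TECHNIQUE tech)
{
  fts_entry *e = fts_find(id);
  if (e == NULL) {
    return FTS_TECH_NONE;
  }
  e->tech = tech;
  return e->tech;
}

/* Usage example with a made-up task id. */
void fts_api_demo(void)
{
  rtems_id control_task = 0x0A010001;

  assert(fts_task_status(control_task) == 0);
  fts_rtems_task_register(control_task, FTS_TECH_SRE);
  assert(fts_task_status(control_task) == 1);
  assert(fts_change_tech(control_task, FTS_TECH_OTHER) == FTS_TECH_OTHER);
  fts_off(control_task);
  assert(fts_task_status(control_task) == 0);
}
```

In the real FTS the table would live in the score layer and the functions above would only be thin wrappers, in the same way that semrelease.c wraps the low-level semaphore directives.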


Data structures, macros, and a detailed description will follow once the basic implementation is finished.

The data structures and low-level functions will be implemented separately in score/src. Other fault tolerance techniques could be implemented on top of the FTS as well. The techniques I will implement first are from this source by Chen et al., but the FTS is not limited to them. The only requirement is that a task comes in different versions with different levels of protection; the FTS then makes sure that the proper version is scheduled at all times.

Some notes on the protection technique:

The function staticc(rtems_id id, tec) will ensure that Static Reliable Execution (S-RE) is used to protect the control task. It uses the concept of (m,k) robustness requirements. The other techniques will follow later.
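
As a small illustration of what an (m,k) requirement means, the following sketch checks a recorded history of task instances. It assumes the common reading of (m,k): in every window of k consecutive instances, at least m must have produced a correct result. The function name mk_satisfied and the representation of the history as a bool array are my own choices for this example; this is not the S-RE algorithm itself, only a check of the requirement it has to meet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: ok[i] is true if instance i was correct.
 * Returns true if every complete window of k consecutive instances
 * contains at least m correct ones. */
bool mk_satisfied(const bool *ok, size_t n, unsigned m, unsigned k)
{
  assert(ok != NULL && k > 0 && m <= k);

  unsigned correct = 0; /* correct instances in the current window */

  for (size_t i = 0; i < n; ++i) {
    correct += ok[i] ? 1u : 0u;
    if (i >= k) {
      /* Instance i - k has just left the window. */
      correct -= ok[i - k] ? 1u : 0u;
    }
    if (i + 1 >= k && correct < m) {
      return false; /* the window ending at i violates (m,k) */
    }
  }
  return true; /* also true while no complete window exists yet */
}
```

For the history correct, faulty, correct, correct, faulty, faulty, correct, a (1,3) requirement is satisfied, while (2,3) is violated by the window correct, faulty, faulty.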

We need a data structure for every task, containing:
- m and k
- number of executions
- number of detected errors
- number of corrected errors
- replenishment counters
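
A first idea of how such a per-task record could look in C is sketched below. The field names, the integer widths, the single replenishment counter and the two helper functions are assumptions made for this example (the list above only fixes which values are stored), and rtems_id is again a stand-in typedef so that the code compiles without the RTEMS headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the RTEMS object id type (assumption, avoids <rtems.h>). */
typedef uint32_t rtems_id;

/* Hypothetical per-task record kept by the FTS. */
typedef struct {
  rtems_id id;
  uint8_t  m;          /* (m,k) requirement of the task */
  uint8_t  k;
  uint32_t executions; /* number of executed instances */
  uint32_t detects;    /* number of detected errors */
  uint32_t corrects;   /* number of corrected errors */
  uint32_t replenish;  /* one replenishment counter as a placeholder;
                          how many are needed is still open */
} fts_task_info;

/* Reset all counters and store the (m,k) parameters. */
void fts_task_info_init(fts_task_info *t, rtems_id id, uint8_t m, uint8_t k)
{
  assert(t != NULL && k > 0 && m <= k);
  t->id = id;
  t->m = m;
  t->k = k;
  t->executions = 0;
  t->detects = 0;
  t->corrects = 0;
  t->replenish = 0;
}

/* Account for one finished instance of the task. */
void fts_task_info_record(fts_task_info *t, bool detected, bool corrected)
{
  assert(t != NULL);
  t->executions++;
  if (detected) {
    t->detects++;
    if (corrected) {
      t->corrects++; /* only a detected error can count as corrected */
    }
  }
}
```

After one clean instance, one instance with a corrected error and one with an uncorrected error, the record holds 3 executions, 2 detected errors and 1 corrected error.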

This could be difficult:
How can we detect errors and send a signal to the FTS, which will then trigger a response? Create an interrupt? Trigger the protected version?

A quick look at the file structure:

For classical API header files:
rtems/kernel/rtems/cpukit/rtems/include/rtems/rtems

Classical API:
rtems/kernel/rtems/cpukit/rtems/src

Kernel implementation:
rtems/kernel/rtems/cpukit/score/src

-------

Today's video is about LISA Pathfinder, a mission launched in December 2015 to test technology for detecting gravitational waves.

