In my previous blog post, I wrote about bare metal versus an RTOS and the advantages and disadvantages of these two architectural options. A simple definition of bare metal is firmware developed directly on your hardware with no layers in between. But does this mean that only firmware written in assembly counts as bare metal? What about development in C, using memory-mapped registers directly with the help of header files that define macros for those addresses? Is that bare metal? C is a high-level language, and we can’t always be sure how a C compiler translates our code to machine code. What about vendor-provided HALs in C? That’s another abstraction layer. What about sequencers or schedulers?
Many developers consider firmware development in C using a HAL to be bare metal, but the definition is often blurred. The most significant difference between an RTOS and everything that’s not an RTOS is task preemption: the ability to interrupt a running task and give the CPU to a task with higher priority.
Because the definition of bare metal gets blurred so quickly, in this post we will discuss several architectures that may or may not be bare metal, but are definitely not an RTOS. The simplest one, which every embedded developer starts with, is the superloop.
1. Superloop
The superloop is the most basic bare-metal approach. It’s a never-ending loop that executes the same code on each iteration. The most famous example is the Arduino “loop” function. A superloop executes instructions sequentially, and the most common way of controlling the execution of individual functions is with flags that are usually set from interrupts. Timing is often controlled by checking the time elapsed since the last iteration or by setting a flag from a timer ISR.
As an example, consider a simple application that reads data from a sensor when an interrupt is raised on a GPIO, processes it, and sends it to a server every 10 seconds.
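A minimal sketch of such a superloop is shown here. The ISR names (gpio_isr, timer_1ms_isr), the 1 ms time base, and the stubbed sensor and network functions are assumptions made for illustration; on real hardware they would be replaced by the vendor's vector table entries and drivers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEND_PERIOD_MS 10000u

/* Shared between ISRs and the loop, so both must be volatile. */
static volatile bool     sensor_data_ready = false;
static volatile uint32_t system_time_ms    = 0;

/* Counters that exist only so this sketch can be observed on a PC. */
static unsigned samples_read = 0;
static unsigned packets_sent = 0;

/* Hypothetical ISRs: on a target, the vector table would call these. */
void gpio_isr(void)      { sensor_data_ready = true; }
void timer_1ms_isr(void) { system_time_ms++; }

/* Stand-ins for the real sensor driver and network code. */
static void sensor_data_read(void)             { samples_read++; }
static void sensor_data_process_and_send(void) { packets_sent++; }

/* One pass through the superloop body. */
void superloop_iteration(void)
{
    static uint32_t last_send_ms = 0;

    if (sensor_data_ready) {
        /* Clear the flag before reading so an interrupt that arrives
         * during the read is not lost. */
        sensor_data_ready = false;
        sensor_data_read();
    }

    /* Unsigned subtraction stays correct when the counter wraps around. */
    if ((uint32_t)(system_time_ms - last_send_ms) >= SEND_PERIOD_MS) {
        last_send_ms += SEND_PERIOD_MS;
        sensor_data_process_and_send();
    }
}

/* On a target, main() would reduce to: for (;;) { superloop_iteration(); } */
```

Even in this small sketch, every new feature means another flag, another timestamp, and another branch in the loop body, which is where the maintenance problems start.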
A superloop turns into spaghetti code very quickly. Adding new functionality, adjusting the existing application logic, and finding and fixing bugs all become challenging. Simple as it is, the superloop is still often used to build quick proof-of-concept solutions with limited functionality. It is not a good choice for time-critical systems, because it is hard to guarantee the execution time of an individual task, and there is no framework to guarantee that higher-priority tasks will be executed on time.
If you are anything like me, you will start looking for more elegant solutions that are both more flexible for adding new features and easier to maintain. The natural evolution of the superloop brings us to a sequencer. Instead of writing logic that controls the execution of individual functions using flags set in an ISR, we will add tasks (function pointers) to a sequencer, which will, in turn, execute those tasks.
2. Sequencer
A sequencer is a simple design pattern: a module to which you add the tasks that need to be executed. The sequencer runs forever and executes whatever tasks are pending. Tasks can be added to the sequencer from external or internal (timer) interrupts. Tasks can also be assigned a priority so that the sequencer executes the highest-priority pending task first. To keep behavior deterministic, a watchdog can be used to enforce an upper bound on task duration, and time-critical tasks must themselves ensure their execution time or call frequency using timers.
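One possible shape for such a sequencer is sketched here, using a bitmask of pending tasks in which the task index doubles as the priority. All names (seq_task_register, seq_task_set, seq_run_once, and the two ISRs) are hypothetical and not taken from any particular library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEQ_MAX_TASKS 8u

typedef void (*seq_task_fn)(void);

static seq_task_fn       seq_tasks[SEQ_MAX_TASKS];
static volatile uint32_t seq_pending;   /* bit n set: task n is waiting to run */

/* Register a task; the index doubles as the priority (0 is highest). */
bool seq_task_register(uint8_t id, seq_task_fn fn)
{
    if (id >= SEQ_MAX_TASKS || fn == NULL) return false;
    seq_tasks[id] = fn;
    return true;
}

/* Mark a task as pending. Meant to be called from ISRs. On a real MCU this
 * read-modify-write needs a critical section or an atomic operation. */
void seq_task_set(uint8_t id)
{
    if (id < SEQ_MAX_TASKS) seq_pending |= (1u << id);
}

/* Run the highest-priority pending task, if any. Returns true if one ran. */
bool seq_run_once(void)
{
    for (uint8_t id = 0; id < SEQ_MAX_TASKS; id++) {
        uint32_t mask = 1u << id;
        if ((seq_pending & mask) && seq_tasks[id] != NULL) {
            seq_pending &= ~mask;   /* clear first so the task can re-arm itself */
            seq_tasks[id]();
            return true;
        }
    }
    return false;   /* nothing pending; a real system could sleep here */
}

/* Application side: two tasks and the (hypothetical) ISRs that trigger them. */
enum { TASK_SENSOR_READ = 0, TASK_PROCESS_SEND = 1 };

static char     trace[8];       /* records execution order for this sketch */
static unsigned trace_len;

static void sensor_data_read(void)             { trace[trace_len++] = 'R'; }
static void sensor_data_process_and_send(void) { trace[trace_len++] = 'S'; }

void gpio_isr(void)      { seq_task_set(TASK_SENSOR_READ); }
void timer_10s_isr(void) { seq_task_set(TASK_PROCESS_SEND); }

void app_init(void)
{
    seq_task_register(TASK_SENSOR_READ, sensor_data_read);
    seq_task_register(TASK_PROCESS_SEND, sensor_data_process_and_send);
}

/* On a target, main() would be: app_init(); for (;;) { seq_run_once(); } */
```

Returning to the top of the priority scan after every task means a high-priority task that becomes pending while a low-priority one runs is picked up next, although it still has to wait for the running task to finish.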
The first thing we notice in the sequencer version of the example is that the code is cleaner and simpler. There are no flags that we need to set in an ISR and reset after execution. The sequencer is an excellent example of bare-metal event-driven design.
There are different ways to implement a sequencer. For example, tasks can be defined at compile time with predefined indexes and priorities and then marked for execution at run time as needed.
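The compile-time variant could look roughly like this. The enum, table, and function names are made up for illustration, and the sketch relies on C11 for _Static_assert.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*seq_task_fn)(void);

/* Task indexes are fixed at compile time; a lower index means higher priority. */
enum seq_task_id {
    SEQ_TASK_SENSOR_READ,
    SEQ_TASK_PROCESS_SEND,
    SEQ_TASK_COUNT
};

static unsigned reads, sends;   /* observation counters for this sketch */

static void sensor_data_read(void)             { reads++; }
static void sensor_data_process_and_send(void) { sends++; }

/* A const table can typically be placed in flash rather than RAM. */
static const seq_task_fn seq_task_table[SEQ_TASK_COUNT] = {
    [SEQ_TASK_SENSOR_READ]  = sensor_data_read,
    [SEQ_TASK_PROCESS_SEND] = sensor_data_process_and_send,
};

_Static_assert(SEQ_TASK_COUNT <= 32, "pending mask is 32 bits wide");

static volatile uint32_t seq_pending;

/* Run time only decides which of the predefined tasks is pending. */
void seq_task_set(enum seq_task_id id)
{
    seq_pending |= (1u << id);
}

/* Run every pending task once, highest priority first. */
void seq_run_pending(void)
{
    for (unsigned id = 0; id < SEQ_TASK_COUNT; id++) {
        if (seq_pending & (1u << id)) {
            seq_pending &= ~(1u << id);
            seq_task_table[id]();
        }
    }
}
```

With this layout there is no registration step that can fail at run time, at the cost of having to recompile to change the task set.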
3. Cooperative scheduler
A cooperative scheduler is a pattern that builds upon the sequencer. It is a combination of a sequencer and a virtual timer. It is more restrictive than the sequencer: to keep timing predictable, it allows only a single interrupt, the timer interrupt that provides the tick for the whole system. Tasks are scheduled to run at a set interval after an initial delay, both of which are chosen by the developer.
The cooperative scheduler makes sure that tasks are executed at the set intervals. However, it’s still up to the firmware developer to guarantee task duration and ensure tasks do not overlap. A timer generates the ticks, and on each tick the sch_update function is called to update the data structures that describe the individual tasks. Each structure contains a function pointer for the task and the data members used to count ticks. The tasks themselves are run by sch_dispatch, which is called from the superloop rather than from the timer ISR; this keeps the ISR short, so the task data structures are always updated on time.
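The sketch below shows one way sch_update and sch_dispatch could fit together. It is loosely modeled on the description above rather than copied from any existing implementation; sch_add_task, the sch_task_t layout, and the 1 ms tick are assumptions, and only periodic tasks are supported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCH_MAX_TASKS 4u

typedef struct {
    void (*task)(void);        /* NULL marks a free slot */
    uint32_t delay;            /* ticks to skip before the next run */
    uint32_t period;           /* ticks between runs, must be >= 1 */
    volatile uint8_t run_me;   /* set by sch_update, consumed by sch_dispatch */
} sch_task_t;

static sch_task_t sch_tasks[SCH_MAX_TASKS];

/* Returns the slot index, or -1 if the table is full or an argument is invalid. */
int sch_add_task(void (*task)(void), uint32_t delay, uint32_t period)
{
    if (task == NULL || period == 0) return -1;
    for (unsigned i = 0; i < SCH_MAX_TASKS; i++) {
        if (sch_tasks[i].task == NULL) {
            sch_tasks[i].task   = task;
            sch_tasks[i].delay  = delay;
            sch_tasks[i].period = period;
            sch_tasks[i].run_me = 0;
            return (int)i;
        }
    }
    return -1;
}

/* Called once per tick from the timer ISR: bookkeeping only, no task code. */
void sch_update(void)
{
    for (unsigned i = 0; i < SCH_MAX_TASKS; i++) {
        sch_task_t *t = &sch_tasks[i];
        if (t->task == NULL) continue;
        if (t->delay == 0) {
            /* Due now. The counter remembers runs if dispatch falls behind. */
            if (t->run_me < UINT8_MAX) t->run_me++;
            t->delay = t->period - 1;
        } else {
            t->delay--;
        }
    }
}

/* Called from the superloop: runs every task that is due. */
void sch_dispatch(void)
{
    for (unsigned i = 0; i < SCH_MAX_TASKS; i++) {
        sch_task_t *t = &sch_tasks[i];
        if (t->task != NULL && t->run_me > 0) {
            t->run_me--;   /* on real hardware, guard this against the tick ISR */
            t->task();
        }
    }
}

/* Example tasks, with counters so the sketch can be observed on a PC. */
static unsigned reads, sends;

static void sensor_data_read_if_available(void) { reads++; }
static void sensor_data_process_and_send(void)  { sends++; }

void app_init(void)
{
    /* Assuming a 1 ms tick: every 10 ms and every 10 s, both with delay 0. */
    sch_add_task(sensor_data_read_if_available, 0, 10);
    sch_add_task(sensor_data_process_and_send, 0, 10000);
}

/* On a target: app_init(); start the tick timer; for (;;) { sch_dispatch(); } */
```

Keeping sch_update free of task code means its execution time is bounded by the table size, which is what keeps the tick itself regular.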
In the example, sensor_data_read_if_available is set to run every 10 ms and sensor_data_process_and_send every 10 s. The initial delay for both tasks is 0, which means that the two tasks will become due on the same tick, because 10 s is a multiple of 10 ms. This introduces jitter for one of the tasks. As with the sequencer, a watchdog can be employed to enforce a limit on the duration of individual tasks. Another caveat of the cooperative scheduler is that it allows only a single interrupt: the one provided by the timer that generates the tick for the system. This means that, in order to use it safely, external interrupts can’t be used, and the status of GPIOs needs to be polled periodically in a task.
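The effect of the initial delay can be checked with a few lines of arithmetic. The helper below is a hypothetical illustration, assuming a 1 ms tick, that counts how many ticks find both tasks due at once.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if a task with this initial delay and period is due on the tick. */
static int task_due(uint32_t tick, uint32_t delay, uint32_t period)
{
    return tick >= delay && (tick - delay) % period == 0;
}

/* Count the ticks in [0, horizon) on which both tasks are due together. */
uint32_t count_overlaps(uint32_t horizon,
                        uint32_t delay_a, uint32_t period_a,
                        uint32_t delay_b, uint32_t period_b)
{
    uint32_t overlaps = 0;
    for (uint32_t tick = 0; tick < horizon; tick++) {
        if (task_due(tick, delay_a, period_a) &&
            task_due(tick, delay_b, period_b)) {
            overlaps++;
        }
    }
    return overlaps;
}
```

Over 30 s, count_overlaps(30000, 0, 10, 0, 10000) reports three coinciding ticks (at 0, 10 and 20 s), while giving the slow task an initial delay of 5 ticks, count_overlaps(30000, 0, 10, 5, 10000), reports none. The offset only removes the jitter if the slow task also finishes before the next tick on which the fast task is due.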
There is a simple way to make the scheduler preemptive for one of the tasks. If a task is declared preemptive, it is run inside the timer ISR, directly from the sch_update function instead of from sch_dispatch, so it executes as soon as its time comes, even if another task is still running.
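A hedged sketch of this hybrid variant follows. The preemptive field, the extra sch_add_task parameter, and the sch_in_isr marker are assumptions introduced for illustration; the marker exists only so the sketch can show which context each task ran in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SCH_MAX_TASKS 4u

typedef struct {
    void (*task)(void);        /* NULL marks a free slot */
    uint32_t delay;            /* ticks to skip before the next run */
    uint32_t period;           /* ticks between runs, must be >= 1 */
    bool preemptive;           /* true: run inside sch_update (ISR context) */
    volatile uint8_t run_me;   /* used by cooperative tasks only */
} sch_task_t;

static sch_task_t    sch_tasks[SCH_MAX_TASKS];
static volatile bool sch_in_isr;   /* sketch-only marker of the current context */

int sch_add_task(void (*task)(void), uint32_t delay, uint32_t period,
                 bool preemptive)
{
    if (task == NULL || period == 0) return -1;
    for (unsigned i = 0; i < SCH_MAX_TASKS; i++) {
        if (sch_tasks[i].task == NULL) {
            sch_tasks[i] = (sch_task_t){ task, delay, period, preemptive, 0 };
            return (int)i;
        }
    }
    return -1;
}

/* Called once per tick from the timer ISR. */
void sch_update(void)
{
    sch_in_isr = true;
    for (unsigned i = 0; i < SCH_MAX_TASKS; i++) {
        sch_task_t *t = &sch_tasks[i];
        if (t->task == NULL) continue;
        if (t->delay > 0) { t->delay--; continue; }
        t->delay = t->period - 1;
        if (t->preemptive) {
            t->task();   /* runs now, interrupting whatever the loop was doing */
        } else if (t->run_me < UINT8_MAX) {
            t->run_me++; /* left for sch_dispatch in the superloop */
        }
    }
    sch_in_isr = false;
}

/* Called from the superloop: runs the cooperative tasks that are due. */
void sch_dispatch(void)
{
    for (unsigned i = 0; i < SCH_MAX_TASKS; i++) {
        sch_task_t *t = &sch_tasks[i];
        if (t->task != NULL && t->run_me > 0) {
            t->run_me--;
            t->task();
        }
    }
}

/* Demo tasks that count only the runs made in the expected context. */
static unsigned fast_runs_in_isr, slow_runs_in_loop;

static void fast_sample(void) { if (sch_in_isr)  fast_runs_in_isr++; }
static void slow_report(void) { if (!sch_in_isr) slow_runs_in_loop++; }

void app_init(void)
{
    sch_add_task(fast_sample, 0, 1, true);    /* every tick, in the ISR */
    sch_add_task(slow_report, 0, 5, false);   /* every 5 ticks, in the loop */
}
```

Because the preemptive task runs in interrupt context, it has to be short and must not share data with the cooperative tasks without protection.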
Michael J. Pont describes this architecture in more detail in his book “Patterns for Time-Triggered Embedded Systems”. The content in the book is well presented with code implementation in C. Code is written for 8051 microcontrollers but can be easily applied to other architectures. The book can be downloaded for free.
4. When to choose bare-metal architectures and when to go with RTOS?
In my previous blog post, I briefly described what an RTOS is. One of the main components of an RTOS is a preemptive scheduler, the part responsible for managing the execution of tasks. The scheduler determines which task should be executed next based on task states and priority levels. It is designed to provide deterministic execution, which means that tasks are executed in a predictable, timely manner.
A preemptive scheduler solves the problem of low-priority tasks with long execution times, such as writing a file to a file system. When a higher-priority task becomes ready, the scheduler performs a context switch and lets it run. The low-priority task resumes after the high-priority task has finished. This takes the mental burden off the developer when it comes to low-priority tasks that can take a while to execute. With a sequencer or a cooperative scheduler, meeting hard real-time requirements means that such tasks have to be split into multiple smaller tasks whose duration can be guaranteed, which is additional effort for the developer.
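As an illustration of that splitting, the sketch below turns a long file write into a step function that handles one bounded chunk per call. The job structure, the chunk size, and the storage_write stand-in are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 64u

/* State that survives between calls, so the write can be resumed later. */
typedef struct {
    const uint8_t *src;
    size_t len;
    size_t offset;
} file_write_job_t;

/* Stand-in for a real storage driver. */
static uint8_t fake_storage[256];

static void storage_write(size_t offset, const uint8_t *data, size_t n)
{
    assert(offset + n <= sizeof fake_storage);
    memcpy(&fake_storage[offset], data, n);
}

/* Writes at most CHUNK_SIZE bytes per call, so its duration is bounded.
 * Returns true once the whole job has been written. */
bool file_write_step(file_write_job_t *job)
{
    size_t remaining = job->len - job->offset;
    size_t n = remaining < CHUNK_SIZE ? remaining : CHUNK_SIZE;

    if (n > 0) {
        storage_write(job->offset, job->src + job->offset, n);
        job->offset += n;
    }
    return job->offset == job->len;
}
```

A sequencer or cooperative scheduler task would call file_write_step once per run and re-arm itself until it returns true, which lets higher-priority tasks run between chunks.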
IoT applications that use some of the networking protocols are usually good candidates for an RTOS. Running networking stacks and application logic in bare metal event-driven architectures is possible. However, creating an RTOS task and pretending it will run forever is easier than writing discrete actions based on a ticker event.
With all that said, an RTOS seems like the preferred architecture for embedded systems, but that’s not always the case. An RTOS provides infrastructure that supports both deterministic behavior and scalability, but it comes at the cost of additional resources. It requires some RAM to keep things running and some flash to store its code.
If you are writing a simple application with limited features that doesn’t have hard real-time requirements, then bare metal may make more sense. Hardware is also an important factor in the selection of an architecture. To keep costs down, a product may use a low-cost MCU with a limited amount of RAM and flash. Even then, a sequencer with an event-driven design or a cooperative scheduler can provide a good level of flexibility while meeting real-time requirements.
A simple digital clock, thermometer, thermostat, BLE IMU data collecting device, or a simple medical current regulating device can be implemented using a bare-metal approach. A smartwatch, an IoT gateway, or a gaming console with a display, haptic motor, and IMU data collecting and streaming over BLE may be candidates for an RTOS.
Lines are often blurred, and there are no definitive guidelines for selecting appropriate architecture. There are many factors that play a role in the process, and they all need to be taken into account.