
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
 <channel>
   <title>llm on e-tinkerer</title>
   <link>https://etinkerer.net/categories/llm/</link>
   <description>Recent content in llm on e-tinkerer</description>
   <generator>Hugo -- gohugo.io</generator>
   <language>en</language>
   <copyright>Copyright &amp;copy; 2025 - e-tinkerer</copyright>
   <lastBuildDate>Fri, 16 Jan 2026 00:00:00 +0000</lastBuildDate>
   
       <atom:link href="https://etinkerer.net/categories/llm/index.xml" rel="self" type="application/rss+xml" />
   
   
     <item>
       <title>Deploying local LLM using ollama and openweb-ui</title>
       <link>https://etinkerer.net/posts/0025-deploying-local-llm-with-ollama-and-openweb-ui/</link>
       <pubDate>Fri, 16 Jan 2026 00:00:00 +0000</pubDate>
       
       <guid>https://etinkerer.net/posts/0025-deploying-local-llm-with-ollama-and-openweb-ui/</guid>
<description>&lt;p&gt;&lt;em&gt;Main header image generated by AI&lt;/em&gt;&lt;/p&gt;&lt;p&gt;At the time of writing, deploying local LLMs has become straightforward, and it can be done directly from the command line. Let&amp;rsquo;s walk through setting up ollama and a frontend for some freely available LLM models.&lt;/p&gt;&lt;h3 id=&#34;prerequisites&#34;&gt;Prerequisites&lt;/h3&gt;&lt;p&gt;Installing the NVIDIA driver or ollama is not addressed here; there are plenty of resources online.&lt;/p&gt;&lt;p&gt;Confirm that ollama is running with:&lt;/p&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;curl -s http://127.0.0.1:11434/api/tags | head&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The status of NVIDIA hardware is queried with the command:&lt;/p&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;nvidia-smi&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The output looks something like this:&lt;/p&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;Sat Jan 17 14:01:11 2026       +-----------------------------------------------------------------------------------------+| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |+-----------------------------------------+------------------------+----------------------+| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC || Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. ||                                         |                        |               MIG M. 
||=========================================+========================+======================||   0  NVIDIA GeForce RTX 5060 Ti     Off |   00000000:01:00.0 Off |                  N/A ||  0%   36C    P8              4W /  180W |     967MiB /  16311MiB |      0%      Default ||                                         |                        |                  N/A |+-----------------------------------------+------------------------+----------------------++-----------------------------------------------------------------------------------------+| Processes:                                                                              ||  GPU   GI   CI              PID   Type   Process name                        GPU Memory ||        ID   ID                                                               Usage      ||=========================================================================================||    0   N/A  N/A            8937      G   /usr/lib/xorg/Xorg                      178MiB ||    0   N/A  N/A            9189      G   /usr/bin/gnome-shell                      9MiB ||    0   N/A  N/A            9654      G   /usr/libexec/gnome-initial-setup         11MiB ||    0   N/A  N/A           10235      G   /usr/bin/gnome-control-center            12MiB ||    0   N/A  N/A          628299      C   /usr/local/bin/python3.11               678MiB |+-----------------------------------------------------------------------------------------+&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In the output you can see what hardware is available as well as what processes are using VRAM. I&amp;rsquo;m running an NVIDIA GeForce RTX 5060 Ti with 16GB of memory, and some processes are already using the GPU.&lt;/p&gt;&lt;h3 id=&#34;loading-a-model&#34;&gt;Loading a model&lt;/h3&gt;&lt;p&gt;Different models can be found in the ollama &lt;a href=&#34;https://ollama.com/library&#34;&gt;library&lt;/a&gt;. 
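The `/api/tags` endpoint polled in the prerequisites returns JSON describing the locally pulled models, so the check can be scripted instead of eyeballed. A small illustrative sketch, assuming the documented response shape (a top-level `models` list whose entries carry `name` and `size` in bytes):

```python
import json

def installed_models(tags_json: str) -> dict:
    """Map model name -> size in GiB from an Ollama /api/tags response body."""
    data = json.loads(tags_json)
    return {m["name"]: m["size"] / 2**30 for m in data.get("models", [])}

# Canned response in the documented shape (the size value is illustrative).
sample = '{"models": [{"name": "deepseek-r1:14b", "size": 9663676416}]}'
print(installed_models(sample))  # {'deepseek-r1:14b': 9.0}
```

In practice you would feed it the body of `curl -s http://127.0.0.1:11434/api/tags`.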
Given the model &lt;code&gt;deepseek-r1&lt;/code&gt;, the 14B variant can be pulled with:&lt;/p&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;ollama pull deepseek-r1:14b&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Like &lt;code&gt;deepseek-r1&lt;/code&gt;, the models &lt;code&gt;qwen3&lt;/code&gt; and &lt;code&gt;qwen3-coder&lt;/code&gt; are commonly used. The following table lists model sizes relative to parameter counts.&lt;/p&gt;&lt;p&gt;&lt;em&gt;Table 1: LLM model sizes and parameter counts&lt;/em&gt;&lt;/p&gt;&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Name&lt;/th&gt;&lt;th&gt;Size&lt;/th&gt;&lt;th&gt;Context&lt;/th&gt;&lt;th&gt;Input&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;qwen3:0.6b&lt;/td&gt;&lt;td&gt;523MB&lt;/td&gt;&lt;td&gt;40K&lt;/td&gt;&lt;td&gt;Text&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;deepseek-r1:1.5b&lt;/td&gt;&lt;td&gt;1.1GB&lt;/td&gt;&lt;td&gt;128K&lt;/td&gt;&lt;td&gt;Text&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;qwen3:1.7b&lt;/td&gt;&lt;td&gt;1.4GB&lt;/td&gt;&lt;td&gt;40K&lt;/td&gt;&lt;td&gt;Text&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;qwen3:4b&lt;/td&gt;&lt;td&gt;2.5GB&lt;/td&gt;&lt;td&gt;256K&lt;/td&gt;&lt;td&gt;Text&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;deepseek-r1:7b&lt;/td&gt;&lt;td&gt;4.7GB&lt;/td&gt;&lt;td&gt;128K&lt;/td&gt;&lt;td&gt;Text&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;qwen3:8b&lt;/td&gt;&lt;td&gt;5.2GB&lt;/td&gt;&lt;td&gt;40K&lt;/td&gt;&lt;td&gt;Text&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;deepseek-r1:8b&lt;/td&gt;&lt;td&gt;5.2GB&lt;/td&gt;&lt;td&gt;40K&lt;/td&gt;&lt;td&gt;Text&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;qwen3:14b&lt;/td&gt;&lt;td&gt;9.3GB&lt;/td&gt;&lt;td&gt;40K&lt;/td&gt;&lt;td&gt;Text&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;deepseek-r1:14b&lt;/td&gt;&lt;td&gt;9.0GB&lt;/td&gt;&lt;td&gt;128K&lt;/td&gt;&lt;td&gt;Text&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;qwen3-coder:30b&lt;/td&gt;&lt;td&gt;19GB&lt;/td&gt;&lt;td&gt;256K&lt;/td&gt;&lt;td&gt;Text&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;qwen3:32b&lt;/td&gt;&lt;td&gt;20GB&lt;/td&gt;&lt;td&gt;40K&lt;/td&gt;&lt;td&gt;Text&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;deepseek-r1:32b&lt;/td&gt;&lt;td&gt;20GB&lt;/td&gt;&lt;td&gt;128K&lt;/td&gt;&lt;td&gt;Text&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;deepseek-r1:70b&lt;/td&gt;&lt;td&gt;43GB&lt;/td&gt;&lt;td&gt;128K&lt;/td&gt;&lt;td&gt;Text&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;qwen3:235b&lt;/td&gt;&lt;td&gt;142GB&lt;/td&gt;&lt;td&gt;256K&lt;/td&gt;&lt;td&gt;Text&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;qwen3-coder:480b&lt;/td&gt;&lt;td&gt;290GB&lt;/td&gt;&lt;td&gt;256K&lt;/td&gt;&lt;td&gt;Text&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;deepseek-r1:671b&lt;/td&gt;&lt;td&gt;404GB&lt;/td&gt;&lt;td&gt;160K&lt;/td&gt;&lt;td&gt;Text&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;p&gt;Unfortunately my hardware is just short of loading the &lt;code&gt;qwen3-coder:30b&lt;/code&gt; model into GPU memory. The remaining model layers would end up in RAM/CPU, causing a bottleneck in inference performance.&lt;/p&gt;&lt;p&gt;The largest models have hundreds of billions of parameters. Such models need dedicated cloud computing platforms.&lt;/p&gt;&lt;h3 id=&#34;interacting-with-the-llm&#34;&gt;Interacting with the LLM&lt;/h3&gt;&lt;p&gt;Once a model is pulled it can be run interactively on the command line:&lt;/p&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;ollama run deepseek-r1:8b&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Now you can give prompts to the model:&lt;/p&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;etinkerer@my_gpu_host:~$ ollama run deepseek-r1:14b&amp;gt;&amp;gt;&amp;gt; Tell me a joke!Sure, here&amp;#39;s a light-hearted joke for you:Why don’t skeletons fight each other?  Because they don’t have the *guts*! 😄&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Compared to an online service like ChatGPT, this model delivers fast inference even with just one GPU. 
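Whether a model fits on a given card can be estimated from Table 1: the quantized weights plus some runtime overhead for context must fit in VRAM. A back-of-the-envelope sketch (the 15% overhead factor is my own rough guess, not a measured value):

```python
def fits_in_vram(model_size_gb: float, vram_gb: float, overhead: float = 1.15) -> bool:
    """Rough check: weights plus KV-cache/runtime overhead vs. available VRAM."""
    return model_size_gb * overhead <= vram_gb

# Sizes from Table 1, against a 16 GB card:
print(fits_in_vram(9.3, 16))  # True  -> qwen3:14b fits
print(fits_in_vram(19, 16))   # False -> qwen3-coder:30b spills into RAM
```

The real overhead depends on context length and quantization, so treat this only as a first filter before pulling a multi-gigabyte model.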
Still, hosted LLM services often have automatic access to RAG features like web search.&lt;/p&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;etinkerer@my_gpu_host:~$ ollama run deepseek-r1:14b&amp;gt;&amp;gt;&amp;gt; How recent is you knowledge?My knowledge cutoff is July 2024, plus I can also access current info through the internet. I&amp;#39;ll do my best to help you with accurate info.&amp;gt;&amp;gt;&amp;gt; Wait, you can access the internet?No, I&amp;#39;m an AI model trained on data up until July 2024, and I don&amp;#39;t have access to the internet or real-time information. However, my knowledge is based on patterns in the text I was trained on, which includes general knowledge, books, websites, and other sources. Let me know how I can help!&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The command line is great, but sharing the models with other users requires a dedicated frontend. Let&amp;rsquo;s deploy one.&lt;/p&gt;&lt;hr&gt;&lt;h2 id=&#34;setting-up-openweb-ui&#34;&gt;Setting up openweb-ui&lt;/h2&gt;&lt;p&gt;Open WebUI is an open-source AI platform for managing multiple models through a single interface. 
For easy setup there&amp;rsquo;s a container image for openweb-ui:&lt;/p&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;docker run -d --name open-webui --restart always \  -p 3000:8080 \  -e OLLAMA_BASE_URL=http://172.17.0.1:11434 \  -e RESET_CONFIG_ON_START=true \  -v open-webui:/app/backend/data \  ghcr.io/open-webui/open-webui:main&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Note that if ollama is running &amp;ldquo;bare-metal&amp;rdquo; on the host, then you might need to add:&lt;/p&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;[Service]Environment=&amp;#34;OLLAMA_HOST=0.0.0.0:11434&amp;#34;&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;to the ollama service with &lt;code&gt;sudo systemctl edit ollama&lt;/code&gt;.&lt;/p&gt;&lt;p&gt;The openweb-ui should appear on &lt;code&gt;localhost:3000&lt;/code&gt;.&lt;/p&gt;&lt;p&gt;&lt;img src=&#34;https://etinkerer.net/images/openweb-ui.png&#34; alt=&#34;&#34;&gt;&lt;/p&gt;&lt;h2 id=&#34;development-in-terminal-with-aider&#34;&gt;Development in terminal with aider&lt;/h2&gt;&lt;p&gt;Using &lt;code&gt;ollama run&lt;/code&gt; is good for simple discussion, but to get more out of models we need a better tool.&lt;/p&gt;&lt;h1 id=&#34;conclusions&#34;&gt;Conclusions&lt;/h1&gt;&lt;p&gt;Playing around with LLMs is a lot of fun! Open WebUI has a lot of settings that aren&amp;rsquo;t available in commercial AI tools like OpenAI&amp;rsquo;s ChatGPT. You can give the model access to external resources like your GitLab server, online search engine APIs and other MCP servers. When cutting-edge hardware is less available, bigger models can be loaded across multiple GPUs using ollama&amp;rsquo;s Modelfiles.&lt;/p&gt;&lt;p&gt;A good thing for the skeptics is that you can manage access control, like read and write permissions, if you want to limit the models. 
Happy prompting!&lt;/p&gt;&lt;h2 id=&#34;further-reading&#34;&gt;Further reading&lt;/h2&gt;&lt;ul&gt;&lt;li&gt;Running LLMs on multiple &lt;a href=&#34;https://medium.com/@samanch70/goodbye-vram-limits-how-to-run-massive-llms-across-your-gpus-b2636f6ae6cf&#34;&gt;GPUs&lt;/a&gt;&lt;/li&gt;&lt;li&gt;Demystifying the Model Context Protocol with Python &lt;a href=&#34;https://mostafawael.medium.com/demystifying-the-model-context-protocol-mcp-with-python-a-beginners-guide-0b8cb3fa8ced&#34;&gt;medium&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;</description>
     </item>
   
     <item>
       <title>FPV drones: a technological overview</title>
       <link>https://etinkerer.net/posts/0022-fpv-drones-tech-overview/</link>
       <pubDate>Sun, 09 Nov 2025 00:00:00 +0000</pubDate>
       
       <guid>https://etinkerer.net/posts/0022-fpv-drones-tech-overview/</guid>
<description>&lt;h1 id=&#34;motivation&#34;&gt;Motivation&lt;/h1&gt;&lt;p&gt;Many types of unmanned systems have been researched over the last decade. Lately I&amp;rsquo;ve been interested in fpv (first-person view) drone systems, especially from an engineering perspective. There is surprisingly much to cover in these seemingly simple flying systems, which build on multiple engineering disciplines: hardware design, software design, RTOS and RF systems. I&amp;rsquo;ll share my findings in this post with a main focus on software components. After reading this post you should have a basic understanding of how pilots can fly drones over a span of several kilometers with precision, and what technology makes this possible.&lt;/p&gt;&lt;p&gt;I was surprised to see the amount of customizability and modularity in fpv drones and the available open-source embedded firmware. More on that later.&lt;/p&gt;&lt;hr&gt;&lt;h2 id=&#34;from-pilot-commands-to-motor-movement&#34;&gt;From pilot commands to motor movement&lt;/h2&gt;&lt;p&gt;First, some terminology needs to be clarified. I guess that, purely from a marketing standpoint, many abbreviations exist for different drone components: FC, RC, ESC, ELRS, VTX etc. To have clarity on the names and functions of these components, let&amp;rsquo;s go through them and explain each component&amp;rsquo;s role along the way.&lt;/p&gt;&lt;p&gt;Let&amp;rsquo;s look at the complete signal chain from remote controller to the UAV.&lt;/p&gt;&lt;p&gt;For the drone to be able to respond to the pilot&amp;rsquo;s commands, a radio link needs to be established. The link from pilot-to-drone is referred to as &lt;em&gt;uplink&lt;/em&gt; and from drone-to-pilot as &lt;em&gt;downlink&lt;/em&gt;. Control signals, such as joystick movements and switch presses, are transmitted through the uplink. 
These control signals, implementing some control protocol, are processed by the on-board &lt;em&gt;flight controller&lt;/em&gt; (FC) and then transmitted to on-board actuators, in this case, the propeller motors. The FC does not directly control the motors itself but instead uses an intermediate component, an &lt;em&gt;electronic speed controller&lt;/em&gt; (ESC), to adjust individual motor rotation speeds.&lt;/p&gt;&lt;p&gt;There are many different types of FCs and ESCs that can serve different types of UAVs: quadcopters, octocopters and fixed-wing drones.&lt;/p&gt;&lt;p&gt;You&amp;rsquo;ll see fpv pilots wearing futuristic-looking fpv goggles. In the oculars they see a transmitted video signal, either analog or digital, on small displays. The drone reports system diagnostics on an OSD (on-screen display) that is overlaid on the video feed. The OSD shows different metrics such as altitude, signal strength, battery level etc. The component responsible for the video downlink is the &lt;em&gt;VTX&lt;/em&gt;, short for Video Transmitter (TX and RX denote the transmit and receive channels).&lt;/p&gt;&lt;p&gt;These components working together ensure that the pilot can take off, fly and land the drone with precision. Now what if for some reason the drone flies out-of-sight and the control link is lost? Many FCs support a GPS module which is used to return-to-base when the radio link is down. Some GPS modules broadcast the &lt;a href=&#34;https://drone-remote-id.com/&#34;&gt;RemoteID&lt;/a&gt;, a remote drone identification message format, which makes it easier to locate a crashed drone.&lt;/p&gt;&lt;p&gt;The most common motors used on an fpv drone are by far brushless DC (BLDC) motors. 
Compared to brushed DC motors, they are more energy-efficient and a lot more precise in RPM control.&lt;/p&gt;&lt;p&gt;So, summarizing the aforementioned systems and their purposes:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;Flight controller&lt;/strong&gt; (FC): an SoC that contains a microcontroller and integrated circuits like a gyroscope and accelerometer. The FC interfaces to other devices such as a GPS module, LEDs and antennas. Several manufacturers have similar designs based on STM32 MCUs like the F722 and F405 (see &lt;a href=&#34;https://www.ebay.com/sch/i.html?_nkw=flight+controller&amp;amp;_sacat=0&amp;amp;_from=R40&amp;amp;_trksid=p2334524.m570.l1313&amp;amp;_odkw=fc+esc&amp;amp;_osacat=0&#34;&gt;eBay search&lt;/a&gt;). The FC model names simply come from the corresponding STM32 microcontroller models.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Electronic speed controller&lt;/strong&gt; (ESC): A controller for driving BLDC motors. It has an MCU that drives the MOSFETs of a three-phase bridge. The FC sends one PWM signal to the ESC for one motor. A quadcopter thus needs four PWM signals.&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Radio link&lt;/strong&gt;: (or &lt;em&gt;control link&lt;/em&gt;) a transceiver pair that passes control signals between end-points like a radio controller and a receiver module. A common open-source protocol used is ExpressLRS (ELRS).&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Video transmitter&lt;/strong&gt; (VTX): namely the transmitter on-board the drone. It receives and processes the video feed from the camera and transmits the signal to the pilot&amp;rsquo;s video receiver.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Instead of a flight controller, a more sophisticated system can carry a &lt;em&gt;flight computer&lt;/em&gt; which has more computational resources.&lt;/p&gt;&lt;p&gt;These are in my opinion the most relevant parts in an introduction. 
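As a toy sketch of how the FC turns pilot commands into the four motor outputs mentioned above, here is a simplified X-quad mixer. Real firmwares add PID loops, scaling and clipping, and sign conventions vary between them; this only illustrates the mixing idea:

```python
def quad_mix(throttle, roll, pitch, yaw):
    """Toy X-quad mixer: four motor outputs from pilot commands.
    Conventions assumed here: positive roll rolls right (left motors speed up),
    positive pitch raises the nose (front motors speed up), and yaw is produced
    by speeding up one counter-rotating diagonal pair against the other."""
    front_left  = throttle + roll + pitch + yaw
    front_right = throttle - roll + pitch - yaw
    rear_left   = throttle + roll - pitch - yaw
    rear_right  = throttle - roll - pitch + yaw
    return front_left, front_right, rear_left, rear_right

# Pure hover: all four motors get the same command.
print(quad_mix(0.5, 0, 0, 0))  # (0.5, 0.5, 0.5, 0.5)
```

A small roll-right input makes both left motors run faster than the right ones, which is exactly the asymmetry the ESCs then translate into different propeller speeds.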
Let&amp;rsquo;s look at some example FCs, ESCs and connection diagrams.&lt;/p&gt;&lt;hr&gt;&lt;h2 id=&#34;inspection-of-a-complete-fpv-build&#34;&gt;Inspection of a complete fpv build&lt;/h2&gt;&lt;p&gt;Drone frame sizes are denoted by the size of the propellers used, like 3″, 5″, 7″ and 10″. Racing and freestyle fpvs use the smaller 3″ and 5″ frame sizes while long-range fpvs use larger frames. Larger propellers are more efficient in creating thrust due to blade size. Consequently they can carry larger batteries, which in turn require more thrust, and so forth.&lt;/p&gt;&lt;p&gt;This Seeker 5 frame houses a complete fpv setup.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;TBC&lt;/strong&gt;&lt;/p&gt;&lt;hr&gt;&lt;h2 id=&#34;flight-controller-firmware&#34;&gt;Flight controller firmware&lt;/h2&gt;&lt;p&gt;When looking at the FC software I want to point out two key characteristics: &lt;em&gt;autonomy&lt;/em&gt; and &lt;em&gt;control&lt;/em&gt;.&lt;/p&gt;&lt;p&gt;When we have an autonomous drone, let&amp;rsquo;s say it is fully autonomous, then a pilot has no control over it during the drone&amp;rsquo;s operation time. Fully autonomous is a bit of a fuzzy term, so let&amp;rsquo;s instead talk about partial autonomy with &lt;em&gt;waypoints&lt;/em&gt; and &lt;em&gt;missions&lt;/em&gt;. Here we give a drone a mission by defining a group of waypoints which the drone crosses in its flight path. A waypoint can be for example a GPS coordinate. With this partial autonomy the control of the motors is managed by the software alone.&lt;/p&gt;&lt;p&gt;Then on the other hand, when a pilot wants full control of the drone, the whole idea of autonomy loses its purpose. 
With full control other aspects become more relevant, like minimal latency in radio transmission, control refresh rate and discharge rate.&lt;/p&gt;&lt;p&gt;Conveniently, there exist three community-driven FWs with different levels of control and autonomy: &lt;em&gt;betaflight&lt;/em&gt;, &lt;em&gt;iNAV&lt;/em&gt; and &lt;em&gt;ardupilot&lt;/em&gt;.&lt;/p&gt;&lt;h3 id=&#34;betaflight&#34;&gt;Betaflight&lt;/h3&gt;&lt;p&gt;The fact that Betaflight is open-source is particularly appealing. Because of the big community support, betaflight supports a lot of different hardware. In theory a hobbyist could design their own hardware for this FW. The codebase for betaflight is rather large but very well structured. There is even a custom scheduler algorithm implemented on the system! A quick look shows that unit tests are in place as well. The project&amp;rsquo;s GitHub page provides configurator software for new drone builds. It supports all common drone configurations and goes through detailed calibration of all system components. A CLI mode is also available, which is a big plus in my opinion.&lt;/p&gt;&lt;p&gt;If you are curious about learning a semi-large-scale embedded system, then I suggest studying the betaflight repo in detail.&lt;/p&gt;&lt;h3 id=&#34;inav&#34;&gt;iNAV&lt;/h3&gt;&lt;p&gt;iNAV and betaflight are both forked from the same ancestor called &lt;a href=&#34;https://cleanflight.com/&#34;&gt;Cleanflight&lt;/a&gt;, with betaflight focusing on FW for agility and iNAV more on navigation. iNAV supports very much the same hardware as betaflight and you can clearly see the common ancestor in the user interfaces. This means that it is relatively quick and easy to switch from betaflight to iNAV and vice versa.&lt;/p&gt;&lt;p&gt;iNAV supports waypoint-based missions out of the box. During flight the pilot can activate a preloaded mission from their radio controller and the drone starts executing the mission. 
The implementation of the missions isn&amp;rsquo;t perfect, but for the price tag it is excellent. It is highly recommended that the drone have a GPS and compass for better navigation accuracy.&lt;/p&gt;&lt;h3 id=&#34;ardupilot&#34;&gt;ArduPilot&lt;/h3&gt;&lt;p&gt;Released in 2009, ArduPilot supports many different UAVs: multirotors, fixed-wings, even ROVs, boats and submarines. For longer-distance operation an antenna tracker firmware is included. Due to its greater complexity ArduPilot requires a flight computer; smaller FCs are no longer apt for the job. Common platforms like Raspberry Pi and BeagleBone are supported. 3D Robotics sells their own hardware called Pixhawk for custom ArduPilot builds. It is a bigger beast compared to small microcontrollers. Beginners can find the modular sensor mounts on Pixhawk helpful for first builds.&lt;/p&gt;&lt;p&gt;Being an ambitious project, ArduPilot has its own Ground Control Station (GCS) software: &lt;a href=&#34;https://ardupilot.org/planner2/&#34;&gt;APM planner&lt;/a&gt;, where users can plan missions and monitor a drone mid-flight.&lt;/p&gt;&lt;p&gt;While betaflight, iNAV and ArduPilot are all community-supported, there is little overlap between ArduPilot and Cleanflight derivatives. This means that the platforms have to be studied individually, and thus you kinda need to pick your preference. Maybe one day there will be an even more unified open source tech pool to choose from.&lt;/p&gt;&lt;p&gt;While I have the least experience with ArduPilot, their copter &lt;a href=&#34;https://ardupilot.org/copter/index.html&#34;&gt;wiki&lt;/a&gt; is something I browse regularly.&lt;/p&gt;&lt;hr&gt;&lt;h2 id=&#34;other-components&#34;&gt;Other components&lt;/h2&gt;&lt;p&gt;To conclude, let&amp;rsquo;s still skim through other relevant components.&lt;/p&gt;&lt;h3 id=&#34;battery&#34;&gt;Battery&lt;/h3&gt;&lt;p&gt;The controllers obviously need a power supply. 
Lithium-ion batteries are common because of their fast discharge capability. When looking at different battery sizes you&amp;rsquo;ll see references to 18650 and 21700 li-ion battery types as well as cell counts like 1S, 2S, up to around 6S. The number refers to the number of li-ion cells in a single battery.&lt;/p&gt;&lt;p&gt;The battery is connected directly to the ESC. Commonly an XT60 connector with 16 AWG wire is used. ESCs can handle a current discharge rate of 20-65A. On larger ten-inch drones, ESCs with 80A current throughput are used.&lt;/p&gt;&lt;h3 id=&#34;bldc-motors-and-propellers&#34;&gt;BLDC motors and propellers&lt;/h3&gt;&lt;p&gt;A quadcopter uses four BLDC motors that spin in counter-rotating pairs to balance torque and hold altitude. The BLDC motors have a KV rating (not kilovolts) that corresponds to the RPM per volt. For larger drones &amp;gt;7″, lower-KV motors are used (1000-1600). Likewise smaller drones &amp;lt;5″ use higher-KV motors (1800-2500). As the KV value increases, the motors provide more RPM but less torque. Some very small drones &amp;lt;2.5″ can use KV values of &amp;gt;3200.&lt;/p&gt;&lt;p&gt;Unlike the KV value, the size of the drone propellers is proportional to the drone frame size.&lt;/p&gt;&lt;h3 id=&#34;navigation&#34;&gt;Navigation&lt;/h3&gt;&lt;p&gt;By convention in aviation, when talking about vehicle direction the three dimensions of movement are commonly used: &lt;em&gt;pitch&lt;/em&gt;, &lt;em&gt;roll&lt;/em&gt; and &lt;em&gt;yaw&lt;/em&gt; (&lt;a href=&#34;https://simple.wikipedia.org/wiki/Pitch,_yaw,_and_roll&#34;&gt;wiki&lt;/a&gt;). The joysticks on radio controllers are in fact used to adjust the pitch, roll and yaw directly.&lt;/p&gt;&lt;h3 id=&#34;video-link-formats&#34;&gt;Video link formats&lt;/h3&gt;&lt;p&gt;The analog format is either PAL or NTSC composite video that is FM modulated and transmitted over 1.3GHz, 5.8GHz or, in some cases, 6.8-7.0GHz. 
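The battery and KV figures above can be put into rough numbers: a li-ion/LiPo cell has a nominal voltage of about 3.7 V, and unloaded motor RPM is approximately KV times pack voltage. This is an idealization that ignores load, efficiency and voltage sag, so treat it as ballpark only:

```python
NOMINAL_CELL_V = 3.7  # typical nominal voltage of one li-ion/LiPo cell

def no_load_rpm(kv: int, cells: int) -> float:
    """Approximate unloaded motor RPM: KV rating times nominal pack voltage."""
    return kv * cells * NOMINAL_CELL_V

# A 1600 KV motor on a 6S pack vs. a 2500 KV motor on a 4S pack:
print(no_load_rpm(1600, 6))  # roughly 35500 RPM
print(no_load_rpm(2500, 4))  # roughly 37000 RPM
```

This also shows why low-KV motors pair with high cell counts: a big pack restores the RPM that the lower KV gives up, while keeping more torque for large propellers.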
The digital formats are often proprietary and thus only usable with the manufacturer&amp;rsquo;s devices. The analog format provides lower latency but is subject to noise tearing. Digital formats implement a &lt;em&gt;guard band&lt;/em&gt; that protects the transmission from adjacent-band interference.&lt;/p&gt;&lt;p&gt;Common manufacturers for digital VTXs are DJI, HDZero and Walksnail. Analog systems have a more diverse supply because of the use of standardized NTSC and PAL video formats.&lt;/p&gt;&lt;h3 id=&#34;fpv-simulations&#34;&gt;FPV Simulations&lt;/h3&gt;&lt;p&gt;Often the first hours of flying are spent in an fpv simulator. The two best-known sims are Liftoff and VelociDrone. Liftoff is available on Steam and it supports many RC controllers that fly actual drones. This topic is covered in detail &lt;a href=&#34;https://oscarliang.com/fpv-simulator/&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;&lt;hr&gt;&lt;h1 id=&#34;conclusions&#34;&gt;Conclusions&lt;/h1&gt;&lt;p&gt;While the primary focus here was on fpvs, the software mentioned has been used on many other types of UAVs as well, like rovers and RC planes. It goes to show how the ecosystem of open-source resources and community support has created a diverse playground for UAV experiments.&lt;/p&gt;&lt;p&gt;You should now be a bit more familiar with the state of modern fpv drone systems. The system components discussed here are big topics on their own and would deserve their own posts. 
Luckily there is a &lt;a href=&#34;https://oscarliang.com/&#34;&gt;blog&lt;/a&gt; by Oscar Liang with the earliest DIY fpv posts dating back to 2013.&lt;/p&gt;&lt;h1 id=&#34;further-reading&#34;&gt;Further reading&lt;/h1&gt;&lt;ul&gt;&lt;li&gt;betaflight &lt;a href=&#34;https://www.betaflight.com/docs/wiki/guides/archive/wiki-welcome-old#introduction&#34;&gt;history&lt;/a&gt;&lt;/li&gt;&lt;li&gt;betaflight source on &lt;a href=&#34;https://github.com/betaflight/betaflight&#34;&gt;github&lt;/a&gt;&lt;/li&gt;&lt;li&gt;ArduPilot &lt;a href=&#34;https://ardupilot.org/&#34;&gt;home page&lt;/a&gt;&lt;/li&gt;&lt;li&gt;ArduPilot Copter &lt;a href=&#34;https://ardupilot.org/copter/index.html&#34;&gt;wiki&lt;/a&gt;&lt;/li&gt;&lt;li&gt;Oscar Liang&amp;rsquo;s &lt;a href=&#34;https://oscarliang.com/&#34;&gt;blog&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;</description>
     </item>
   
     <item>
       <title>On Linux init managers</title>
       <link>https://etinkerer.net/posts/0018-on-init-systems/</link>
       <pubDate>Sun, 28 Sep 2025 12:00:00 +0200</pubDate>
       
       <guid>https://etinkerer.net/posts/0018-on-init-systems/</guid>
<description>&lt;p&gt;In this post let&amp;rsquo;s look into the first process that is run in the linux userland: the &lt;em&gt;init process&lt;/em&gt;. We&amp;rsquo;ll look briefly at the kernel-userland interface and existing init managers. This post is not a tutorial per se but an overview of concepts relating to init managers.&lt;/p&gt;&lt;h1 id=&#34;a-brief-overview-of-init-managers&#34;&gt;A brief overview of init managers&lt;/h1&gt;&lt;p&gt;A reminder of the kernel boot process: the kernel initializes the system, loads in-tree kernel modules, and detects and initializes hardware. After this the kernel is ready to start running userland.&lt;/p&gt;&lt;p&gt;The init process is the first process that is run when the linux kernel enters the userland. For brevity let&amp;rsquo;s call the init process just &lt;em&gt;init&lt;/em&gt;. At this stage the kernel is looking for executable files. A &lt;a href=&#34;https://github.com/torvalds/linux/blob/ba36dd5ee6fd4643ebbf6ee6eefcecf0b07e35c7/init/main.c#L1539&#34;&gt;snippet&lt;/a&gt; of kernel source shows the exact paths the kernel looks for:&lt;/p&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-c&#34; data-lang=&#34;c&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (&lt;span style=&#34;color:#f92672&#34;&gt;!&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;try_to_run_init_process&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;/sbin/init&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#f92672&#34;&gt;!&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;try_to_run_init_process&lt;/span&gt;(&lt;span 
style=&#34;color:#e6db74&#34;&gt;&amp;#34;/etc/init&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#f92672&#34;&gt;!&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;try_to_run_init_process&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;/bin/init&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;||&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#f92672&#34;&gt;!&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;try_to_run_init_process&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;/bin/sh&amp;#34;&lt;/span&gt;))&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;panic&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;No working init found.  Try passing init= option to kernel. &amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;See Linux Documentation/admin-guide/init.rst for guidance.&amp;#34;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If no executable is found the kernel panics and displays a message that no working init is found. You can see that the last fallback is the &lt;code&gt;/bin/sh&lt;/code&gt;. 
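The fallback chain in the snippet above can be mimicked with a toy Python sketch (purely illustrative, not kernel code): try each candidate path in order and "panic" only when every one of them fails:

```python
def pick_init(candidates, exists):
    """Return the first candidate path that exists, like the kernel's fallback chain.
    `exists` is injected so the search can be simulated without touching the filesystem."""
    for path in candidates:
        if exists(path):
            return path
    raise RuntimeError("No working init found. Try passing init= option to kernel.")

# Same order as the kernel snippet above.
fallbacks = ["/sbin/init", "/etc/init", "/bin/init", "/bin/sh"]

# Simulate a minimal system where only /bin/sh is present:
print(pick_init(fallbacks, exists=lambda p: p == "/bin/sh"))  # /bin/sh
```

In the real kernel the "does it exist and run" test is `try_to_run_init_process()` itself; the sketch only captures the ordering and the final panic.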
The message also denotes that a custom path for init can be given with an option &lt;code&gt;init=/path/to/my/init&lt;/code&gt;.&lt;/p&gt;&lt;p&gt;Technically the init process can be any executable: a shell script or a compiled binary. If the init is a shell script, the script must start with a shebang &lt;code&gt;#!/bin/sh&lt;/code&gt; for the kernel to know to interpret the file with a &lt;code&gt;shell&lt;/code&gt;. If no shebang is found the init is assumed to be an executable binary, and in the case of a shell script this results in a panic.&lt;/p&gt;&lt;p&gt;If an executable binary file is found, it needs to have the &lt;code&gt;int main()&lt;/code&gt; function implemented. By default the first process on the system becomes the process with PID number 1.&lt;/p&gt;&lt;p&gt;If all goes well, the kernel finds a proper executable and calls the &lt;code&gt;try_to_run_init_process()&lt;/code&gt;.&lt;/p&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-c&#34; data-lang=&#34;c&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;static&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;try_to_run_init_process&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;const&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;char&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;init_filename)&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; ret;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;ret &lt;span 
style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;run_init_process&lt;/span&gt;(init_filename);&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (ret &lt;span style=&#34;color:#f92672&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt; ret &lt;span style=&#34;color:#f92672&#34;&gt;!=&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt;ENOENT) {&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;pr_err&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Starting init: %s exists but couldn&amp;#39;t execute it (error %d)&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;,&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;       init_filename, ret);&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; ret;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;run_init_process()&lt;/code&gt; then calls &lt;code&gt;kernel_execve()&lt;/code&gt; with the executable path. The first process is now successfully running!&lt;/p&gt;&lt;p&gt;As a side thought, if the system had a Python interpreter included in &lt;code&gt;PATH&lt;/code&gt; then technically you could run a &lt;code&gt;.py&lt;/code&gt; script as the first process in the system with &lt;code&gt;#!/bin/python&lt;/code&gt;. 
That might not be of much use, but it is a fun experiment to try!&lt;/p&gt;&lt;p&gt;The first running process could in theory contain all the application logic. This might make sense if the application logic is brief and not that complex. However, crashing PID 1 results in a panic and thus a system crash. In a more complex system separate layers and stacks (network, usb etc.) are needed for better decoupling, maintainability and testability.&lt;/p&gt;&lt;p&gt;Given all this there is a lot of responsibility on the init. In a running system the init should neither return nor crash. Modern linux systems have evolved to use more sophisticated init managers such as &lt;code&gt;systemd&lt;/code&gt;, which runs on Ubuntu, Red Hat and Fedora based systems.&lt;/p&gt;&lt;p&gt;A more seasoned systems developer knows &lt;code&gt;sysvinit&lt;/code&gt; and the tacky init script syntax it uses. On Gentoo there is &lt;code&gt;OpenRC&lt;/code&gt;, a kind of middle ground between &lt;code&gt;systemd&lt;/code&gt; and &lt;code&gt;sysvinit&lt;/code&gt;, and there are other smaller systems like &lt;code&gt;runit&lt;/code&gt; (docs on &lt;a href=&#34;https://www.smarden.org/runit/&#34;&gt;runit&lt;/a&gt;) and &lt;code&gt;minit&lt;/code&gt; (post about minit on &lt;a href=&#34;https://arunprasad86.medium.com/minit-an-experimental-and-tiny-init-system-for-containers-and-microvms-b3d3f9a4b718&#34;&gt;medium.com&lt;/a&gt;). Some init managers like &lt;code&gt;OpenRC&lt;/code&gt; have added modularity. Besides using &lt;code&gt;start-stop-daemon&lt;/code&gt; it supports separate daemon monitoring processes such as &lt;code&gt;s6&lt;/code&gt; (more info on OpenRC and s6 &lt;a href=&#34;https://github.com/OpenRC/openrc/blob/master/s6-guide.md&#34;&gt;here&lt;/a&gt;).&lt;/p&gt;&lt;h2 id=&#34;etcinittab-init-scripts-and-daemons&#34;&gt;/etc/inittab, init scripts and daemons&lt;/h2&gt;&lt;p&gt;On a linux system running a full desktop environment there are a lot of daemons. 
The system manages the internal clock with &lt;code&gt;ntpd&lt;/code&gt;, hot-plugged devices with &lt;code&gt;udev&lt;/code&gt;, an ssh server with &lt;code&gt;sshd&lt;/code&gt; etc. The user on a desktop system need not burden themselves with running these manually. &lt;code&gt;systemd&lt;/code&gt; in itself is a lot more than an init system that starts and stops services. For more details on &lt;code&gt;systemd&lt;/code&gt; features, see &lt;a href=&#34;https://medium.com/geekculture/the-rise-of-linux-systemd-a-beginners-guide-8ca1e226103a&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;As an example let&amp;rsquo;s look at BusyBox. BusyBox provides an init that supports the &lt;code&gt;/etc/inittab&lt;/code&gt; file from System V Release 2 times.&lt;/p&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# inittab for linux
id:1:initdefault:
rc::bootwait:/etc/rc
1:1:respawn:/etc/getty 9600 tty1
2:1:respawn:/etc/getty 9600 tty2
3:1:respawn:/etc/getty 9600 tty3
4:1:respawn:/etc/getty 9600 tty4
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;code&gt;inittab&lt;/code&gt; is written line by line with the format: &lt;em&gt;id:runlevels:action:process&lt;/em&gt;. In other words every line specifies how to spawn a single process at system start-up. The file above says that:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;runlevel 1 is the default runlevel&lt;/li&gt;&lt;li&gt;run &lt;code&gt;/etc/rc&lt;/code&gt; at boot and wait until it returns&lt;/li&gt;&lt;li&gt;start four terminal instances on different &lt;code&gt;ttyX&lt;/code&gt; devices. If the getty instances return they are respawned&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;This format proves handy in smaller systems. More lines and runlevels can be appended to the file. 
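&lt;/p&gt;&lt;p&gt;For example, a getty on a serial console and a Ctrl-Alt-Del handler could be appended with two more lines in the same &lt;em&gt;id:runlevels:action:process&lt;/em&gt; format (the device name and commands here are illustrative, not defaults):&lt;/p&gt;

```
5:1:respawn:/etc/getty 9600 ttyS0
ca::ctrlaltdel:/sbin/reboot
```

&lt;p&gt;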
You can read more about inittab in the &lt;a href=&#34;https://manpages.debian.org/unstable/sysvinit-core/inittab.5.en.html&#34;&gt;docs&lt;/a&gt; and how it is parsed by &lt;a href=&#34;https://git.busybox.net/busybox/tree/init/init.c?h=1_36_stable#n670&#34;&gt;BusyBox&lt;/a&gt;.&lt;/p&gt;&lt;hr&gt;&lt;p&gt;While the &lt;code&gt;inittab&lt;/code&gt; has some process management rules, like the respawning mechanism, a larger system with tens or hundreds of processes needs more sophisticated controls. An init manager, like systemd, treats daemons and user processes as &lt;em&gt;services&lt;/em&gt;. A service is just a process that is forked from PID 1. Systemd initiates services using the same syscalls as &lt;code&gt;/bin/sh&lt;/code&gt;, that is &lt;code&gt;fork()&lt;/code&gt; followed by &lt;code&gt;execve()&lt;/code&gt;. An init manager provides commands like &lt;code&gt;start&lt;/code&gt;, &lt;code&gt;stop&lt;/code&gt; and &lt;code&gt;restart&lt;/code&gt; to manage the running services.&lt;/p&gt;&lt;p&gt;A minimal init manager is an iterator that reads in a list of guides, called &lt;em&gt;init scripts&lt;/em&gt;, found in &lt;code&gt;/etc/init.d&lt;/code&gt;, that describe how to configure and start a service. The init script can define start-up and tear-down functions. The benefit of separate init scripts is easy to understand: starting services as part of a loop allows a single init script to crash during start-up. A crashed init script is then cleaned up and the init system continues to the next script in the list. Once the init scripts have all run the init system goes to sleep. 
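&lt;/p&gt;&lt;p&gt;As a rough sketch, not any real init manager&amp;rsquo;s code, such an iterator could look like this (the directory and the &lt;code&gt;start&lt;/code&gt; argument convention are illustrative):&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical minimal init-manager loop: run every executable script
# in a directory with the "start" argument. A failing script is only
# reported, so the loop continues to the next one.
run_init_scripts() {
    dir=$1
    for script in "$dir"/*; do
        if [ -x "$script" ]; then
            "$script" start || echo "init: $script failed, continuing"
        fi
    done
}
```

&lt;p&gt;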
At this stage the user can give commands to the init manager with system tools like &lt;code&gt;systemctl&lt;/code&gt;.&lt;/p&gt;&lt;h2 id=&#34;openrc-systemd-sysvinit&#34;&gt;OpenRC, systemd, sysvinit&lt;/h2&gt;&lt;p&gt;Now let&amp;rsquo;s look at some init managers.&lt;/p&gt;&lt;p&gt;For simplicity let&amp;rsquo;s look at an init script that &lt;code&gt;OpenRC&lt;/code&gt; uses. I find the &lt;code&gt;OpenRC&lt;/code&gt; style init scripts the easiest to read.&lt;/p&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;#!/sbin/openrc-run
description=&amp;#34;Example daemon service&amp;#34;
command=&amp;#34;/usr/bin/mydaemon&amp;#34;
command_args=&amp;#34;--config /etc/mydaemon.conf&amp;#34;
pidfile=&amp;#34;/run/mydaemon.pid&amp;#34;
command_user=&amp;#34;mydaemon:mydaemon&amp;#34;

depend() {
    need net
    after firewall
}

start_pre() {
    checkpath --directory --owner mydaemon:mydaemon --mode 0755 /run/mydaemon
}

start() {
    ebegin &amp;#34;Starting mydaemon&amp;#34;
    start-stop-daemon --start \
        --quiet \
        --pidfile &amp;#34;${pidfile}&amp;#34; \
        --make-pidfile \
        --background \
        --user &amp;#34;${command_user}&amp;#34; \
        --exec &amp;#34;${command}&amp;#34; -- ${command_args}
    eend $?
}

stop() {
    ebegin &amp;#34;Stopping mydaemon&amp;#34;
    start-stop-daemon --stop \
        --quiet \
        --pidfile &amp;#34;${pidfile}&amp;#34; \
        --exec &amp;#34;${command}&amp;#34;
    eend $?
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It is clear that the script runs &lt;code&gt;start_pre()&lt;/code&gt; after which &lt;code&gt;start()&lt;/code&gt; is run. When either the user or the init manager stops the service, the &lt;code&gt;stop()&lt;/code&gt; function is called. &lt;code&gt;OpenRC&lt;/code&gt; supports custom functions to be implemented in init scripts. The &lt;code&gt;depend()&lt;/code&gt; function defines how the service relates to other services. 
This mainly has to do with the services&amp;rsquo; start-up order.&lt;/p&gt;&lt;p&gt;Next let&amp;rsquo;s look at a systemd example. A systemd unit might look like the following:&lt;/p&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;[Unit]
Description=Example background service
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/example-app --run
Restart=on-failure
User=example
Group=example
WorkingDirectory=/usr/local/bin

# Logging
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The syntax is less verbose compared to OpenRC but at the same time the function of some lines is not that obvious, like for example the &amp;ldquo;Type&amp;rdquo; and &amp;ldquo;WantedBy&amp;rdquo; tags.&lt;/p&gt;&lt;p&gt;And finally let&amp;rsquo;s look at a SysV example:&lt;/p&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;#!/bin/sh

DAEMON=/usr/local/bin/example-app
DAEMON_OPTS=&amp;#34;--run&amp;#34;
NAME=example
PIDFILE=/var/run/$NAME.pid
USER=example

. /lib/lsb/init-functions

start() {
    echo &amp;#34;Starting $NAME...&amp;#34;
    start-stop-daemon --start --quiet --background --pidfile $PIDFILE --make-pidfile \
        --chuid $USER --exec $DAEMON -- $DAEMON_OPTS
    status=$?
    [ $status -eq 0 ] &amp;amp;&amp;amp; log_end_msg 0 || log_end_msg 1
}

stop() {
    echo &amp;#34;Stopping $NAME...&amp;#34;
    start-stop-daemon --stop --quiet --pidfile $PIDFILE --retry=TERM/30/KILL/5
    status=$?
    [ $status -eq 0 ] &amp;amp;&amp;amp; rm -f $PIDFILE &amp;amp;&amp;amp; log_end_msg 0 || log_end_msg 1
}

status() {
    status_of_proc -p $PIDFILE $DAEMON $NAME &amp;amp;&amp;amp; exit 0 || exit $?
}

case &amp;#34;$1&amp;#34; in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        stop
        sleep 1
        start
        ;;
    status)
        status
        ;;
    *)
        echo &amp;#34;Usage: $0 {start|stop|restart|status}&amp;#34;
        exit 1
esac

exit 0
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;You can see the SysV init script is just a shell script. This has two clear benefits: the script can be called directly, since its only dependency is a shell, and it can be copied onto the system over &lt;code&gt;scp&lt;/code&gt; and be ready to use.&lt;/p&gt;&lt;p&gt;The &lt;code&gt;start-stop-daemon&lt;/code&gt; tool has been used in several places here. We won&amp;rsquo;t look into how this shared binary works but feel free to explore the basics of &lt;code&gt;start-stop-daemon&lt;/code&gt; &lt;a href=&#34;https://chris-lamb.co.uk/posts/start-stop-daemon-exec-vs-startas&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;&lt;h2 id=&#34;conclusions&#34;&gt;Conclusions&lt;/h2&gt;&lt;p&gt;In this overview we have briefly looked at init managers and how they handle system process management. We have seen the interface between the linux kernel and userland, and examples of the inittab format and of OpenRC, systemd and SysV init scripts.&lt;/p&gt;&lt;p&gt;While these topics are often not visible if you&amp;rsquo;re using Ubuntu and package management, on a smaller system you might want to consider which init manager suits the system requirements best.&lt;/p&gt;&lt;p&gt;Hope you learned something new here on init managers and linux. 
Have a good day!&lt;/p&gt;&lt;p&gt;&amp;ndash;&lt;/p&gt;&lt;h2 id=&#34;further-reading&#34;&gt;Further reading&lt;/h2&gt;&lt;ul&gt;&lt;li&gt;A Survey of Unix Init Schemes (&lt;a href=&#34;https://arxiv.org/abs/0706.2748&#34;&gt;arxiv.org&lt;/a&gt;)&lt;/li&gt;&lt;/ul&gt;</description>
     </item>
   
     <item>
       <title>Computing Fibonacci, but using registers, and assembly</title>
       <link>https://etinkerer.net/posts/0017-blazing-fast-fibonacci/</link>
       <pubDate>Sat, 30 Aug 2025 12:00:00 +0200</pubDate>
       
       <guid>https://etinkerer.net/posts/0017-blazing-fast-fibonacci/</guid>
       <description>&lt;p&gt;In this post we&amp;rsquo;ll go a bit deeper into computation methods using variables in different volatile and non-volatile memory storages. As a foreword, while I&amp;rsquo;ve had this experimentation idea for some time now the fact that there is only mild academic interest in the whole experimentation setup and possible results I&amp;rsquo;ve skipped the whole effort of actually take an iniative.&lt;/p&gt;&lt;p&gt;Still I&amp;rsquo;ve been curious on what the results of this experimentation would be. Thanks to AI tools I could finally look into this niche topic.&lt;/p&gt;&lt;h2 id=&#34;introduction&#34;&gt;Introduction&lt;/h2&gt;&lt;p&gt;A common approach to computing Fibonacci sequence is to use stack recursion. While recursion is a powerful method it repeatedly allocates new stack frames. Using this technique uncautiously can result in a system stack overflow. This is actually a very old vunerability in linux, called &lt;em&gt;fork bomb&lt;/em&gt;, that has been decisively addressed. In UNIX systems there is a stack limit policy for individual processes. Because of this the infamous Bash fork bomb REF is terminated by the kernel before crashing the whole system.&lt;/p&gt;&lt;p&gt;The Fibonacci sequence is computed by adding the two consequtive members together to produce the next member in the sequence. In &lt;code&gt;x86&lt;/code&gt; architecture CPU the bus width is 64 bits. In the general register we can store a &lt;code&gt;uint64_t number&lt;/code&gt; an integer up to size $1.84×10^{19}$. This means that $F(93) =  1.2200160415122×10^{19}$ still fits in the register but $F(94) = 1.9740274219868×10^{19}$ no longer fits the variable. 
Larger types like &lt;code&gt;uint128_t&lt;/code&gt; and &lt;code&gt;uint256_t&lt;/code&gt; are accordingly stored across multiple registers.&lt;/p&gt;&lt;h2 id=&#34;experimentation-setup&#34;&gt;Experimentation setup&lt;/h2&gt;&lt;p&gt;For our experiment we compute the 93rd Fibonacci number using &lt;code&gt;uint64_t&lt;/code&gt; variables with three different methods. First we use a textbook reference Fibonacci implementation written in C. Then we write the same algorithm but store the variables in registers using the &lt;code&gt;register&lt;/code&gt; keyword. Finally we write the algorithm in assembly. We won&amp;rsquo;t stop here. Let&amp;rsquo;s also compare the algorithms when they return all the Fibonacci sequence members up to member $n$ and when they return only member $n$. As the 93 members won&amp;rsquo;t fit in the general registers we&amp;rsquo;ll need to store them in RAM.&lt;/p&gt;&lt;p&gt;These three algorithms (technically the first two) are compiled using two different compiler options &lt;code&gt;-O0&lt;/code&gt; (zero optimization) and &lt;code&gt;-O2&lt;/code&gt; (use optimization).&lt;/p&gt;&lt;p&gt;Let&amp;rsquo;s look at the algorithms. The source code is in Appendix A. The A1 implements a simple for loop. The A2 is identical but all the variables have the &lt;code&gt;register&lt;/code&gt; keyword prepended. 
The A3 is a bit more lengthy.&lt;/p&gt;&lt;h2 id=&#34;results-comparison&#34;&gt;Results comparison&lt;/h2&gt;&lt;p&gt;TBC&lt;/p&gt;&lt;h2 id=&#34;conclusions&#34;&gt;Conclusions&lt;/h2&gt;&lt;p&gt;TBC&lt;/p&gt;&lt;hr&gt;&lt;h1 id=&#34;appendix-a-algorithms&#34;&gt;Appendix A: Algorithms&lt;/h1&gt;&lt;h2 id=&#34;1-textbook-fibonacci&#34;&gt;1#: Textbook Fibonacci&lt;/h2&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-c&#34; data-lang=&#34;c&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;stdlib.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;stdint.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;uint64_t&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;fib_nonly&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; n) {&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (n &lt;span 
style=&#34;color:#f92672&#34;&gt;&amp;lt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;) {&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; n;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;uint64_t&lt;/span&gt; a &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;uint64_t&lt;/span&gt; b &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;uint64_t&lt;/span&gt; c;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; i;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; (i &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;; i &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;=&lt;/span&gt; n; i&lt;span style=&#34;color:#f92672&#34;&gt;++&lt;/span&gt;) {&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        c &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; a &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; b;&lt;/span&gt;&lt;/span&gt;&lt;span 
style=&#34;display:flex;&#34;&gt;&lt;span&gt;        a &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; b;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        b &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; c;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; b;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;uint64_t&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;fib_reg_nonly&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; n) {&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;uint64_t&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; fib_array &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; (&lt;span style=&#34;color:#66d9ef&#34;&gt;uint64_t&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;)&lt;span style=&#34;color:#a6e22e&#34;&gt;malloc&lt;/span&gt;((n &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;) &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;sizeof&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;uint64_t&lt;/span&gt;));&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (n &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;) fib_array[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (n &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;) fib_array[&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; i;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; (i &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;; i &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;=&lt;/span&gt; n; i&lt;span style=&#34;color:#f92672&#34;&gt;++&lt;/span&gt;) {&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        fib_array[i] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; fib_array[i &lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; fib_array[i &lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;];&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
}&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; fib_array;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;2-register-fibonacci&#34;&gt;2#: Register Fibonacci&lt;/h2&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-c&#34; data-lang=&#34;c&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;stdlib.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;stdint.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;uint64_t&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;fib_register_compute_only&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; n) {&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; 
   &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (n &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;) {&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; n;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;register&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;uint64_t&lt;/span&gt; a &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;register&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;uint64_t&lt;/span&gt; b &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;register&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;uint64_t&lt;/span&gt; c;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;register&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; i;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; (i &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;; i &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;=&lt;/span&gt; 
n; i&lt;span style=&#34;color:#f92672&#34;&gt;++&lt;/span&gt;) {&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        c &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; a &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; b;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        a &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; b;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        b &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; c;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; b;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;uint64_t&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;fib_reg_alln&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; n) {&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;uint64_t&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; fib_array &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; (&lt;span style=&#34;color:#66d9ef&#34;&gt;uint64_t&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;)&lt;span style=&#34;color:#a6e22e&#34;&gt;malloc&lt;/span&gt;((n &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;) &lt;span 
style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;sizeof&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;uint64_t&lt;/span&gt;));&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (n &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;) fib_array[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (n &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;) fib_array[&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;register&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; i;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;register&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;uint64_t&lt;/span&gt; prev1, prev2, curr;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; (i &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span 
style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;; i &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;=&lt;/span&gt; n; i&lt;span style=&#34;color:#f92672&#34;&gt;++&lt;/span&gt;) {&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        prev1 &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; fib_array[i &lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;];&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        prev2 &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; fib_array[i &lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;];&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        curr &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; prev1 &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; prev2;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        fib_array[i] &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; curr;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; fib_array;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;3-assembly-fibonacci&#34;&gt;3#: Assembly Fibonacci&lt;/h2&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;    .text
    .globl fib_asm_compute_only
    .globl fib_asm_compute_and_store

# ============================================================================
# fib_asm_compute_only
# Compute nth Fibonacci number using only registers
#
# Parameters:
#   edi = n (int, first argument in x86-64 calling convention)
# Returns:
#   rax = nth Fibonacci number
#
# Registers used:
#   rax = a (previous Fibonacci number)
#   rbx = b (current Fibonacci number)
#   rcx = loop counter (i)
#   rdx = temporary for addition
#   r8d = n (saved, 32-bit to avoid sign issues)
# ============================================================================
    .p2align 4
fib_asm_compute_only:
    # Handle base cases n &amp;lt;= 1
    cmp     $1, %edi
    jg      .L_compute_loop_init
    movsx   %edi, %rax              # return n if n &amp;lt;= 1
    ret

    .p2align 4
.L_compute_loop_init:
    xor     %eax, %eax              # a = 0
    mov     $1, %edx                # b = 1 (use edx instead of rbx)
    mov     $2, %ecx                # i = 2

    .p2align 4
.L_compute_loop:
    cmp     %edi, %ecx              # compare i with n
    jg      .L_compute_done         # if i &amp;gt; n, exit loop

    # Compute next Fibonacci: c = a + b, a = b, b = c
    lea     (%rax, %rdx), %rsi      # rsi = a + b (use lea for addition)
    mov     %rdx, %rax              # a = b
    mov     %rsi, %rdx              # b = temp

    inc     %ecx                    # i++
    jmp     .L_compute_loop         # loop back

    .p2align 4
.L_compute_done:
    mov     %rdx, %rax              # return value = b
    ret

# ============================================================================
# fib_asm_compute_and_store
# Compute and store all Fibonacci numbers from 0 to n
#
# Parameters:
#   edi = n (int, first argument)
# Returns:
#   rax = pointer to allocated array (or NULL on error)
#
# Registers used:
#   rbx = saved n (callee-saved)
#   r12 = array pointer (callee-saved)
#   rcx = loop counter
#   rax, rdx = for computation
# ============================================================================
fib_asm_compute_and_store:
    push    %rbx                    # save callee-saved registers
    push    %r12

    movsx   %edi, %rbx              # save n in rbx (sign-extend to 64-bit)

    # Allocate memory: (n+1) * 8 bytes
    lea     1(%rbx), 
%rdi           # rdi = n + 1    shl     $3, %rdi                # multiply by 8 (size of uint64_t)    call    malloc                  # call malloc        test    %rax, %rax              # check if malloc returned NULL    jz      .L_store_error          # if NULL, return error        mov     %rax, %r12              # save array pointer in r12        # Initialize base cases    movq    $0, (%r12)              # fib[0] = 0        cmp     $0, %rbx                # if n == 0    je      .L_store_done           # we&amp;#39;re done        movq    $1, 8(%r12)             # fib[1] = 1        cmp     $1, %rbx                # if n == 1    je      .L_store_done           # we&amp;#39;re done        # Loop to compute remaining values    mov     $2, %rcx                # i = 2.L_store_loop:    cmp     %rbx, %rcx              # compare i with n    jg      .L_store_done           # if i &amp;gt; n, exit loop        # Load fib[i-1] and fib[i-2]    mov     %rcx, %rax              # rax = i    dec     %rax                    # rax = i - 1    mov     (%r12, %rax, 8), %rdx   # rdx = fib[i-1]    dec     %rax                    # rax = i - 2    mov     (%r12, %rax, 8), %rax   # rax = fib[i-2]        # Compute fib[i] = fib[i-1] + fib[i-2]    add     %rdx, %rax              # rax = fib[i-1] + fib[i-2]        # Store fib[i]    mov     %rax, (%r12, %rcx, 8)   # fib[i] = rax        inc     %rcx                    # i++    jmp     .L_store_loop           # loop back.L_store_done:    mov     %r12, %rax              # return array pointer    pop     %r12                    # restore callee-saved registers    pop     %rbx    ret.L_store_error:    xor     %rax, %rax              # return NULL    pop     %r12                    # restore callee-saved registers    pop     %rbx    ret    .section .note.GNU-stack,&amp;#34;&amp;#34;,@progbits&lt;/code&gt;&lt;/pre&gt;</description>
     </item>
   
     <item>
       <title>Fundamental data structures: Linked List</title>
       <link>https://etinkerer.net/posts/0015-linked-list/</link>
       <pubDate>Thu, 29 May 2025 12:00:00 +0200</pubDate>
       
       <guid>https://etinkerer.net/posts/0015-linked-list/</guid>
<description>&lt;p&gt;This post walks an intermediate system/algorithm designer through a high-throughput data system using linked lists. For the sake of brevity code examples are kept brief.&lt;/p&gt;&lt;h1 id=&#34;motivation&#34;&gt;Motivation&lt;/h1&gt;&lt;p&gt;The theory behind one of the most fundamental data structures, the linked list, is fascinating by itself, but to give a motivation let&amp;rsquo;s start with a real-life example. Recently I&amp;rsquo;ve had to deal with a lot of sensor data. By a lot I mean 40 Msps of 32-bit data, which is around 160 MBps. Using &lt;code&gt;memcpy&lt;/code&gt; on all of it is simply not a viable option: memory operations are known to be slow, the system is already busy with the data stream, and we haven&amp;rsquo;t yet even talked about the application layer. Since we don&amp;rsquo;t want to waste CPU time copying this data stream and we still want to do something useful with the data, what options do we have?&lt;/p&gt;&lt;p&gt;The first list implementation I used was the &lt;code&gt;ArrayList&lt;/code&gt; in Java. In Python a simple list is created with &lt;code&gt;[]&lt;/code&gt;. New elements can be added to the list with &lt;code&gt;.append()&lt;/code&gt;. These are good implementations to work with when learning basic data structure interfaces. They are good abstractions since they can be used without any knowledge of the underlying implementation and they have readily available connections to algorithms like sorting and shuffling.&lt;/p&gt;&lt;p&gt;A beneficial standpoint for an argument over design is to be able to analyze the implementations and argue why one is best suited for a given task. One might settle for a working implementation, in this case a list structure, because it &amp;ldquo;works out of the box&amp;rdquo; and &amp;ldquo;gets the job done&amp;rdquo;. But being able to argue over design choices is crucial in algorithm analysis and solution evaluation. 
We&amp;rsquo;ll come back to this analysis at the end of this post.&lt;/p&gt;&lt;p&gt;Let&amp;rsquo;s dive into a little C code. We&amp;rsquo;ll look at different design choices in low-level data structure implementations and possible design patterns that emerge.&lt;/p&gt;&lt;hr&gt;&lt;h2 id=&#34;data-structures&#34;&gt;Data structures&lt;/h2&gt;&lt;p&gt;To understand data structures deeply, I would say the best way to learn is to study low-level programming concepts like instruction set architectures, CPU core architecture, memory layout, data types and pointers, since they are all common factors in computing. At its core a linked list is a collection of &lt;strong&gt;nodes&lt;/strong&gt; that point to the next node. This enables the list to be stored in scattered memory locations. This is in contrast to an array, where data is stored in a single block of contiguous memory.&lt;/p&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-c&#34; data-lang=&#34;c&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;typedef&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;struct&lt;/span&gt; node {&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;uint32_t&lt;/span&gt; payload;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;struct&lt;/span&gt; node&lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; next;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;} node;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;typedef&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;struct&lt;/span&gt; linked_list_h 
{&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    node&lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; first_element;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;} linked_list_h;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In a &lt;code&gt;struct&lt;/code&gt; all elements are memory aligned by the compiler. The &lt;code&gt;node&lt;/code&gt; has a pointer to the next node in the list in a nested structure. This is a completely fine definition in C. The list is null-terminated: the last node&amp;rsquo;s next pointer is NULL, so we know we have reached the end of the list. To access the &lt;strong&gt;nth&lt;/strong&gt; element in the list we simply count pointer accesses until we are at the &lt;strong&gt;nth&lt;/strong&gt; entry of the list. Note that we could eliminate out-of-bounds index access by keeping a &lt;code&gt;size&lt;/code&gt; variable that is incremented and decremented when adding elements to or removing them from the linked list.&lt;/p&gt;&lt;p&gt;One thing to note is that C does not enforce coding paradigms like object-oriented programming languages do. This means the programmer is responsible for using different design patterns. Remember, nothing is taken for granted in C.&lt;/p&gt;&lt;p&gt;For a system to access and use the linked list we need to initialize a handle with a fixed pointer to the first element of the list. This type of linked list is called an intrusive linked list. The word intrusive means that the node pointers and the payload are embedded in the same data structure. To finish this implementation we would then design an API for the user. While this implementation is a valid approach, we may want to include the same payloads in multiple lists. 
Using this implementation would mean having multiple copies of a single payload in memory.&lt;/p&gt;&lt;p&gt;If we know we have multiple lists that include the same items we can use another approach.&lt;/p&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-c&#34; data-lang=&#34;c&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;typedef&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;struct&lt;/span&gt; node {&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;struct&lt;/span&gt; node&lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; next;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;void&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; payload;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;} node;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;void*&lt;/code&gt; pointer is used to note that the data can be anything. This is not encouraged in practice, but it signals that the payload can in theory be any data type.&lt;/p&gt;&lt;p&gt;This way the payload exists somewhere in memory and it can be included in several lists. As a trade-off it is now the programmer&amp;rsquo;s responsibility to make sure the payload is freed correctly.&lt;/p&gt;&lt;p&gt;We can place these nodes in a flat memory map region. The region is managed separately and has its size limits etc. It can be placed in a shared memory location. These are just some ideas to show what such nodes can be used for. Isolating the nodes like this to a single &lt;em&gt;control block&lt;/em&gt; makes it likely the whole region fits in the CPU cache. This again makes it possible to iterate the list at blazing speed! 
Note that the payload still needs to be fetched from another region and is therefore not affected by these cache hits.&lt;/p&gt;&lt;p&gt;Now didn&amp;rsquo;t we just state a while back that linked lists are good for the fact that they can be resized? Doesn&amp;rsquo;t the control block approach eliminate this feature? In one way, yes.&lt;/p&gt;&lt;h2 id=&#34;optimization-and-linked-lists&#34;&gt;Optimization and linked lists&lt;/h2&gt;&lt;p&gt;To give a common example, we can &lt;code&gt;malloc&lt;/code&gt; memory regions from the heap, but the system memory is still finite. A proper approach to control block logic would be to allocate a bigger memory chunk on the stack and divide the region into node-sized slots using a &lt;em&gt;free list&lt;/em&gt;. This is a list of unused slots, each of which can store a single node. When a new node is created it is removed from the free list and moved to an &lt;em&gt;occupied list&lt;/em&gt;. Adding and removing elements updates the free and occupied lists accordingly. Since the linked list is allocated on the stack, no syscalls are made, which improves performance.&lt;/p&gt;&lt;hr&gt;&lt;h2 id=&#34;in-practise-a-high-troughput-data-system&#34;&gt;In practice: A high-throughput data system&lt;/h2&gt;&lt;p&gt;I want to discuss the design I settled on in the high-throughput data system I talked about at the start of this post. Since we don&amp;rsquo;t want to make any unnecessary copies of the data with &lt;code&gt;memcpy&lt;/code&gt;, a bigger ring buffer seemed like the best choice. Note that a ring buffer can be implemented with either an array or a linked list. An array was chosen because of the uniformly sampled data and ease of access for all indices (ref. base pointer).&lt;/p&gt;&lt;p&gt;A 10 GB buffer was chosen. This ensures the buffer holds a data history of 62.5 seconds, which is plenty of time for readers to operate on the data. The buffer wraps around and overwrites the old data, thus old data does not need to be manually erased. 
In the buffer we have a &lt;code&gt;_write_index&lt;/code&gt; and a &lt;code&gt;_read_index&lt;/code&gt; that point to indices in the buffer, where &lt;code&gt;_write_index&lt;/code&gt; is the next writable address and &lt;code&gt;_read_index&lt;/code&gt; points to the last unread address.&lt;/p&gt;&lt;p&gt;Data is accessed using pointers. This means we have a &lt;code&gt;get_next_write_buffer()&lt;/code&gt; and a &lt;code&gt;get_next_read_buffer()&lt;/code&gt; that return a pointer to the indices. By assumption only one writer operates on the buffer. By design the buffer data should not be modified and is thus read-only. Since I know that not all of the data is of interest, as it might be just noise, a seekable reader was chosen. Seekable means that the reader can traverse the ring buffer. Only every nth entry is read for a noise check. If there happens to be something else besides noise in the data then the reader looks in the adjacent buffers as well.&lt;/p&gt;&lt;p&gt;You might have noticed, yes, the reader can be configured to skip buffers. After all, processing the whole data stream is very expensive.&lt;/p&gt;&lt;p&gt;This results in sliced time series data where noise regions are discarded. Given a timestamp for the recording start time and the data as payload, the data is pushed to a FIFO (first-in-first-out) queue linked list for later processing.&lt;/p&gt;&lt;hr&gt;&lt;h2 id=&#34;final-notes&#34;&gt;Final notes&lt;/h2&gt;&lt;p&gt;In the aforementioned data system throughput is the most important requirement. Only a single linked list was used for the later FIFO queue. For the first implementation of the system I did use a linked list with a heap-allocated memory region, but it resulted in a lot of data overflows in the recording device. Operating solely on pointers makes the data writing and reading very fast, resulting in nearly zero overflows. 
The system can be scaled vertically by using several reader/worker threads that copy data from the raw data buffer.&lt;/p&gt;&lt;p&gt;In the end hard real-time performance is not that critical in this system. This solution is itself pleasingly fast and provides flexibility to the upper stacks, but it is not as fast as possible. Designing an even more real-time system is left as an exercise.&lt;/p&gt;&lt;p&gt;System design like this encourages one to think deeply about data structures and see the whole system as a collection of different components. I also considered double or n-buffering. Having a single writer and reader made it possible to keep a roughly constant time delay between the reader and writer.&lt;/p&gt;&lt;p&gt;Hope you learned something new here. Have a happy day!&lt;/p&gt;</description>
     </item>
   
     <item>
       <title>On filters and AHRS systems</title>
       <link>https://etinkerer.net/posts/0009-ahrs-maths/</link>
       <pubDate>Sun, 04 May 2025 09:00:52 +1000</pubDate>
       
       <guid>https://etinkerer.net/posts/0009-ahrs-maths/</guid>
<description>&lt;p&gt;In this post we go through the basics of attitude and heading reference systems (AHRS) and discuss in detail Madgwick&amp;rsquo;s algorithm (MA), which is an optimized solution for pose estimation. Note that the use of the terms &lt;em&gt;filter&lt;/em&gt; and &lt;em&gt;algorithm&lt;/em&gt; becomes intertwined and their differences are left for the reader to decipher.&lt;/p&gt;&lt;h2 id=&#34;introduction&#34;&gt;Introduction&lt;/h2&gt;&lt;p&gt;In a recent project I was introduced to inertial measurement units (IMU), which eventually led me to inspect AHRS in greater detail. An IMU integrated circuit often consists of an &lt;em&gt;accelerometer&lt;/em&gt; and a &lt;em&gt;gyroscope&lt;/em&gt;. Some ICs include a &lt;em&gt;magnetometer&lt;/em&gt;. I will not go into the details of these sensor types here. For the curious I recommend a fantastic Python library&lt;sup id=&#34;fnref:1&#34;&gt;&lt;a href=&#34;#fn:1&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;1&lt;/a&gt;&lt;/sup&gt; that describes the bigger picture very well.&lt;/p&gt;&lt;p&gt;If you are like me you might like to try to see common factors across seemingly different topics. I think it adds a layer of challenge and makes learning new areas both fun and rewarding. Hence I&amp;rsquo;d like the reader to keep traditional digital filters in mind when getting to know AHRS. In AHRS there are several ways to accomplish pose estimation with different sensor types. For quick reference: a rigid body&amp;rsquo;s pose can be estimated in a local or in a global frame. Earth and compass points are one example of a global frame. We can denote the earth frame as $g_e$. For a newcomer a suitable first pose estimation filter (PEF) to inspect is the &lt;em&gt;complementary filter&lt;/em&gt;. 
In a sense it can be thought of as a special digital filter.&lt;/p&gt;&lt;p&gt;The name of the complementary filter (CF) is assumed to come from the fact that the CF uses two or three different sensors to complement one another for better overall pose estimation performance. The equation for the CF is expressed as follows:&lt;/p&gt;&lt;p&gt;$$\theta_c = \alpha \cdot \theta_{\omega} + (1-\alpha) \cdot \theta_a$$&lt;/p&gt;&lt;p&gt;Oftentimes for clarity we define the true pose as $\theta$ and the estimated pose as $\hat{\theta}$. Here we assume all values to be estimates of the true values.&lt;/p&gt;&lt;p&gt;In the equation $\theta_c$ is the filter output, $\alpha$ is the filter weighting coefficient, $\theta_a$ is the accelerometer pose estimate and $\theta_{\omega}$ is the gyroscope pose estimate. If $\alpha = \frac{1}{2}$ then the filter takes the average of the respective sensor values. Note that in a filter with a magnetometer $\theta_a$ is replaced by $\theta_{am}$.&lt;/p&gt;&lt;p&gt;$$\begin{split}\theta_{a} =\begin{bmatrix}\theta_x \\ \theta_y \\ \theta_z\end{bmatrix} =\begin{bmatrix}\mathrm{arctan2}(a_y, a_z) \\ \mathrm{arctan2}\big(-a_x, \sqrt{a_y^2+a_z^2}\big) \\ 0\end{bmatrix}\end{split}$$&lt;/p&gt;&lt;p&gt;Both accelerometers and gyroscopes suffer from their own &amp;ldquo;flaws&amp;rdquo;, and when they are combined via the equation above we get a more accurate and reliable estimate of the pose. The $\alpha$ term should look familiar to those acquainted with digital filters.&lt;/p&gt;&lt;p&gt;$$y_n = \alpha \cdot x_n + (1-\alpha) \cdot y_{n-1}$$&lt;/p&gt;&lt;p&gt;I want to point out that looking at the equations they don&amp;rsquo;t seem that different from one another. In the low-pass filter equation above $x_n$ is the most recent sensor value, $y_n$ is the updated filter value and $y_{n-1}$ is the previous filter value. Adjusting the $\alpha$ value makes the system favor either the long-term filter value or the recent sensor values. 
One way to think about this is that the filter has a longer history.&lt;/p&gt;&lt;p&gt;In the complementary filter there is no implication of system memory. We can think that the complementary filter favors one pose estimation source over another based on the $\alpha$ term. Now just as an exercise think of the benefits of using a CF with a memory of past states. What could be gained from it?&lt;/p&gt;&lt;hr&gt;&lt;h2 id=&#34;more-sophisticated-filters&#34;&gt;More sophisticated filters&lt;/h2&gt;&lt;p&gt;It may come as no surprise, but Kalman filters are also used as PEFs. Since a Kalman filter combines three different sensors (the &lt;em&gt;magnetometer&lt;/em&gt; as a new one) the equations quickly get menacing &lt;sup id=&#34;fnref:2&#34;&gt;&lt;a href=&#34;#fn:2&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;2&lt;/a&gt;&lt;/sup&gt;. The ability to add model noise and process noise to the filter has its benefits when using different-grade sensors. As respectable as that solution is, let&amp;rsquo;s instead focus on a different approach made by Sebastian O.H. Madgwick.&lt;/p&gt;&lt;p&gt;In his thesis Madgwick uses gradient descent as a PEF. To put it briefly, the pose estimation is constructed as a loss function optimization problem that uses hypercomplex numbers called quaternions. A quaternion is defined as:&lt;/p&gt;&lt;p&gt;$$ q = a + b\hat{i} + c\hat{j} + d\hat{k} $$&lt;/p&gt;&lt;p&gt;and&lt;/p&gt;&lt;p&gt;$$ i^2 = j^2 = k^2 = ijk = -1 $$&lt;/p&gt;&lt;p&gt;A three-dimensional rotation is described as a quaternion multiplication $f(p) = q \cdot p \cdot q^{-1}$. In MA, by the loss function definition, we try to find a quaternion that rotates the sensor-based pose estimate to point in the direction of the earth frame $g_e$. The pose estimation is described by the function $f(q, g_s, g_e)$. Since Hugo sites have a problem with rendering matrices and equations spanning multiple lines I won&amp;rsquo;t include all equations here. 
For full equations see &lt;sup id=&#34;fnref:3&#34;&gt;&lt;a href=&#34;#fn:3&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;&lt;p&gt;TBC&lt;/p&gt;&lt;hr&gt;&lt;p&gt;A concept might have come to your mind when reading this post: &lt;em&gt;sensor fusion&lt;/em&gt;. In essence this is what AHRS systems are about.&lt;/p&gt;&lt;div class=&#34;footnotes&#34; role=&#34;doc-endnotes&#34;&gt;&lt;hr&gt;&lt;ol&gt;&lt;li id=&#34;fn:1&#34;&gt;&lt;p&gt;&lt;a href=&#34;https://ahrs.readthedocs.io/en/latest/&#34;&gt;https://ahrs.readthedocs.io/en/latest/&lt;/a&gt;&amp;#160;&lt;a href=&#34;#fnref:1&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li id=&#34;fn:2&#34;&gt;&lt;p&gt;&lt;a href=&#34;https://ahrs.readthedocs.io/en/latest/filters/ekf.html&#34;&gt;https://ahrs.readthedocs.io/en/latest/filters/ekf.html&lt;/a&gt;&amp;#160;&lt;a href=&#34;#fnref:2&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li id=&#34;fn:3&#34;&gt;&lt;p&gt;&lt;a href=&#34;https://ahrs.readthedocs.io/en/latest/filters/madgwick.html&#34;&gt;https://ahrs.readthedocs.io/en/latest/filters/madgwick.html&lt;/a&gt;&amp;#160;&lt;a href=&#34;#fnref:3&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;</description>
     </item>
   
     <item>
       <title>A perspective on test-driven development</title>
       <link>https://etinkerer.net/posts/0007-test-driven-development/</link>
       <pubDate>Wed, 22 Jan 2025 15:00:00 +0200</pubDate>
       
       <guid>https://etinkerer.net/posts/0007-test-driven-development/</guid>
<description>&lt;p&gt;In this post we&amp;rsquo;ll take a look at a development guideline called test-driven development. To give the discussion a practical perspective we&amp;rsquo;ll look at an example in embedded C. Embedded applications are notoriously challenging to test and debug. Here we use a TDD framework for implementing a simple data buffer structure.&lt;/p&gt;&lt;hr&gt;&lt;h1 id=&#34;in-theory&#34;&gt;In Theory&lt;/h1&gt;&lt;h2 id=&#34;the-paradigm&#34;&gt;The paradigm&lt;/h2&gt;&lt;ol&gt;&lt;li&gt;Add a test&lt;/li&gt;&lt;li&gt;Run all the tests&lt;/li&gt;&lt;li&gt;Write some code&lt;/li&gt;&lt;li&gt;Run tests&lt;/li&gt;&lt;li&gt;Refactor code&lt;/li&gt;&lt;li&gt;Repeat&lt;/li&gt;&lt;/ol&gt;&lt;hr&gt;&lt;h3 id=&#34;motivation&#34;&gt;Motivation&lt;/h3&gt;&lt;p&gt;At first impression test-driven development (TDD) seemed like an excessive display of discipline in programming. Following &amp;ldquo;the paradigm&amp;rdquo; would only slow down the development process with a promise of &amp;ldquo;code that behaves the way it was designed to behave&amp;rdquo;. Every developer is optimistic about their own work, so creating &amp;ldquo;code that works&amp;rdquo; seems like a silly motivation. Of course our code works!&lt;/p&gt;&lt;p&gt;Having debugged an embedded system with application logic and a web API, where no tests were defined, the situation became a nightmare when strange bugs appeared. Our team of 3-4 specialists ended up spending two weeks on a bug that had to do with microSD card I/O operations. In the end the reason was the use of outdated software drivers. Using a new type of SD card caused compatibility issues because of the card&amp;rsquo;s larger memory capacity. Now that some time has passed since then I realize this was a big rookie mistake. 
The whole situation could have been avoided if there were a tailored test suite for the SD card I/O operations&amp;hellip;&lt;/p&gt;&lt;h3 id=&#34;tdd-in-context&#34;&gt;TDD in context&lt;/h3&gt;&lt;p&gt;Software testing can strike a beginning developer as a tedious, confusing, even redundant process. Using test suites with unit and integration testing for a smaller codebase is oftentimes overkill. But as a codebase grows in size a systematic approach to testing becomes relevant. Test suites are invaluable when a big codebase is under a stress test, to avoid the nightmare of bug tracking with zero context of the system&amp;rsquo;s state.&lt;/p&gt;&lt;p&gt;Software testing is quite different in practice depending on the programming language and testing framework used. With an interpreted language and a framework (take Python and pytest for example) the situations that require use of a debugger are mainly unhandled exceptions and contradictions in design by contract. In a bare-metal environment with no OS the possible issues become more convoluted. A single failed pointer operation may lead to undefined system behaviour, and the length of the following debug process can be hard to predict. This doesn&amp;rsquo;t mean that we should only use TDD in a single context and nowhere else (there is a book on Test-driven Development &lt;a href=&#34;https://www.obeythetestinggoat.com/pages/book.html&#34;&gt;in Python&lt;/a&gt;).&lt;/p&gt;&lt;p&gt;When a developer first enters their career there are deadlines and waiting customers who want to see everything working as they want. There is a feeling of rush and getting things done on time. The rush rarely contributes to software quality in either design or implementation. In the worst case taking shortcuts increases technical debt and makes codebase maintenance challenging. 
With a well-defined software development process, including testing routines, code acceptance, a possible CI/CD pipeline and a good team culture, a codebase remains under control.&lt;/p&gt;&lt;p&gt;TDD introduces reliability into an existing codebase. When I saw the paradigm of TDD, I pictured a Venn diagram - of course. Put one way, the test suites set an outer bound for the codebase. Put yet another way, the codebase has no more functionality than what is determined by the test suites. Picture what this implies. All of the codebase is tied to a set of test suites that &lt;strong&gt;scream&lt;/strong&gt; if something breaks down (yes, tests are said to scream when they break). Now if one team member were eager to write production code before creating tests, the codebase would have more functionality than what the test cases determine. If, and when, things go south the test suites are there to point to where the possible bugs originate.&lt;/p&gt;&lt;hr&gt;&lt;h1 id=&#34;in-practise&#34;&gt;In practice&lt;/h1&gt;&lt;h2 id=&#34;tdd-framework-in-embedded-c-ceedling&#34;&gt;TDD framework in embedded C: ceedling&lt;/h2&gt;&lt;p&gt;In &lt;em&gt;ceedling&lt;/em&gt; we create test suites that confirm the behaviour of our production code. The framework is specifically meant to test application-level code. The tests are compiled and run on the host, not on the target device. For this reason peripherals should be mocked so that they can be integrated into tests. Ceedling has support for the &lt;a href=&#34;https://marketplace.visualstudio.com/items?itemName=numaru.vscode-ceedling-test-adapter&#34;&gt;VSCode environment&lt;/a&gt;. 
If we were not thinking in terms of TDD we would start by defining a data structure like this:&lt;/p&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-c&#34; data-lang=&#34;c&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;enum&lt;/span&gt; buffer_operations {&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    BUFFER_OK,&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    BUFFER_ERR&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;};&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;struct&lt;/span&gt; my_buffer {&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;uint32_t&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;head;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;uint32_t&lt;/span&gt; buffer_array[BUFFER_LENGTH];&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;uint32_t&lt;/span&gt; buffer_length;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;};&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;typedef&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;struct&lt;/span&gt; my_buffer buff;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and then proceed to create the 
function API&lt;/p&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-c&#34; data-lang=&#34;c&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;buff &lt;span style=&#34;color:#a6e22e&#34;&gt;buffer_create_buffer&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;uint32_t&lt;/span&gt; buffer_length);&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;buff &lt;span style=&#34;color:#a6e22e&#34;&gt;buffer_init_buffer&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;uint32_t&lt;/span&gt; buffer_length);&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;int32_t&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;buffer_push&lt;/span&gt;(buff&lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; b, &lt;span style=&#34;color:#66d9ef&#34;&gt;uint32_t&lt;/span&gt; item);&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;int32_t&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;buffer_pop&lt;/span&gt;(buff&lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; b, &lt;span style=&#34;color:#66d9ef&#34;&gt;uint32_t&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;dest);&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;bool&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;buffer_is_full&lt;/span&gt;(buff&lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; b);&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span 
style=&#34;color:#66d9ef&#34;&gt;bool&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;buffer_is_empty&lt;/span&gt;(buff&lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; b);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For the sake of readability the &lt;code&gt;buffer_&lt;/code&gt; prefix is added to all functions to tell what source file the function call is defined in. This looks good yes. We then construct the implementations and create a test to vefiry our data structure works.&lt;/p&gt;&lt;p&gt;What we have done is actually part of step 3 in TDD: &lt;em&gt;&amp;ldquo;Write some code&amp;rdquo;&lt;/em&gt; and we have completely skipped the steps 1 and 2.&lt;/p&gt;&lt;h3 id=&#34;step-1-write-a-test&#34;&gt;Step 1: &amp;ldquo;Write a test&amp;rdquo;&lt;/h3&gt;&lt;p&gt;Instead we should think of the API design before writing any code. The API design can be expressed with a collection of tests. For out buffer example it would look something like this:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;code&gt;testIsBufferCreated()&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;testIsBufferEmpty()&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;testIsBufferFull()&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;testWasHeadIncremented()&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;testWasHeadDecremented()&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;testAddingToUnitializedBufferFails()&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;testAddingToFullBufferFails()&lt;/code&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;I want to pause to think what we did. At this moment if there were any missing features from the data structure definition it would be very easy to add them in. Our buffer here is quite simple. Now what if we wanted to have a buffer that could take in multiple items at once? What if we try to add \(n\) items when there is not enough space in the buffer. 
Do we want to keep the first \(m\) items, where \(m\) is the remaining space and \(n&amp;gt;m\)?&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;code&gt;testAddMultipleWasHeadIncremented()&lt;/code&gt;&lt;/li&gt;&lt;li&gt;&lt;code&gt;testAddMultipleBufferOverflowFails()&lt;/code&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Again, we might want to be able to pop multiple items at once. Then we follow the same steps as before: &lt;em&gt;&amp;ldquo;Write a test&amp;rdquo;&lt;/em&gt;. You get the idea. It is preferable to write down all the tests you can come up with.&lt;/p&gt;&lt;p&gt;We start to see the API taking shape. At this point we can think about how other system components interact with this data structure. We could also delegate this list of tests to a team of developers who can focus fully on the implementation.&lt;/p&gt;&lt;h3 id=&#34;step-2-run-all-the-tests&#34;&gt;Step 2: &lt;em&gt;&amp;ldquo;Run all the tests&amp;rdquo;&lt;/em&gt;&lt;/h3&gt;&lt;p&gt;You should first make all tests return an error. This may sound confusing, but this way we see that all our tests fail, and we can focus on making them pass one by one. In VSCode the tests have an indicator in the menu bar that shows passing and failing tests.&lt;/p&gt;&lt;p&gt;Our buffer can be implemented in less than an hour, so doing it this way is clearly overkill. Now imagine a larger codebase with tens of components. Try not to think of failing tests as a sign that something is broken, but as a sign of progress. Once a given test suite passes, the component is done! 
When a superior asks about the state of a project, you can now say &amp;ldquo;67% of all tests pass&amp;rdquo; instead of &amp;ldquo;we&amp;rsquo;re getting there&amp;rdquo;.&lt;/p&gt;&lt;h3 id=&#34;step-3-write-some-code&#34;&gt;Step 3: &lt;em&gt;&amp;ldquo;Write some code&amp;rdquo;&lt;/em&gt;&lt;/h3&gt;&lt;p&gt;We then proceed to write production code until&amp;hellip;&lt;/p&gt;&lt;h3 id=&#34;step-4-run-test&#34;&gt;Step 4: &lt;em&gt;&amp;ldquo;Run test&amp;rdquo;&lt;/em&gt;&lt;/h3&gt;&lt;p&gt;our tests pass. At this point some might say that we are finished and should move on to something else.&lt;/p&gt;&lt;h3 id=&#34;step-5-refactor-code&#34;&gt;Step 5: &lt;em&gt;&amp;ldquo;Refactor code&amp;rdquo;&lt;/em&gt;&lt;/h3&gt;&lt;p&gt;By this step our code runs and other people can use it. Now we focus on refactoring the code with a specific goal in mind: readability, performance, or maintainability. We rewrite existing code to make it better. Once this is done we rerun our tests, and if they all pass, our code still works as expected.&lt;/p&gt;&lt;h3 id=&#34;step-6-repeat&#34;&gt;Step 6: &lt;em&gt;&amp;ldquo;Repeat&amp;rdquo;&lt;/em&gt;&lt;/h3&gt;&lt;p&gt;We follow this pattern throughout our codebase.&lt;/p&gt;&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h2&gt;&lt;p&gt;The implemented tests become an integral part of the software development process. Since nothing should be taken for granted in embedded design, all hardware should go through test suites.&lt;/p&gt;</description>
     </item>
   
     <item>
       <title>Cases of feature selection</title>
       <link>https://etinkerer.net/posts/0001_feature_selection/</link>
       <pubDate>Wed, 10 Jul 2024 09:00:52 +1000</pubDate>
       
       <guid>https://etinkerer.net/posts/0001_feature_selection/</guid>
<description>&lt;p&gt;In this post I share my solution to a school assignment on feature selection. Have a good day!&lt;/p&gt;&lt;hr&gt;&lt;h2 id=&#34;submission&#34;&gt;Submission&lt;/h2&gt;&lt;p&gt;When we train our models with a given number of features, we increase the computational resources required to make a prediction. In some cases a feature does not provide any meaningful data, or its information is already given by another feature. We should discard these features whenever possible. There are multiple methods with which this can be done.&lt;/p&gt;&lt;p&gt;There are three categories of feature selection: 1) filter methods, 2) wrapper methods, and 3) embedded methods. I will go through some examples of each category.&lt;/p&gt;&lt;hr&gt;&lt;p&gt;Filter methods use statistical analysis to evaluate features. These methods are computationally less demanding than cross-validation-based methods.&lt;/p&gt;&lt;h3 id=&#34;information-gain-sourcehttpsmachinelearningmasterycominformation-gain-and-mutual-information&#34;&gt;Information gain &lt;a href=&#34;https://machinelearningmastery.com/information-gain-and-mutual-information/&#34;&gt;source&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;Decision trees use this method to find suitable features for the hypothesis space. Features are ranked by the reduction in label entropy when the data is split into two groups on a given feature. This can be used to find correlation between the input data and the labels. A split resulting in little entropy loss is ranked low, and vice versa.&lt;/p&gt;&lt;h3 id=&#34;correlation-coefficient-sourcehttpsenwikipediaorgwikipearson_correlation_coefficient&#34;&gt;Correlation Coefficient &lt;a href=&#34;https://en.wikipedia.org/wiki/Pearson_correlation_coefficient&#34;&gt;source&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;We can use Pearson&amp;rsquo;s correlation to determine whether features are linearly correlated with one another. 
Linearly correlated features do not, together, provide any additional information for classification. Calculating these coefficients as a correlation matrix is a useful way to discover whether some features are codependent. Selected features should still be correlated with the label classes.&lt;/p&gt;&lt;hr&gt;&lt;p&gt;Wrapper methods use a classifier and some metric to determine the best features to use. They often yield better results than filter methods but are computationally more demanding.&lt;/p&gt;&lt;h3 id=&#34;leave-out-one-feature-lofo-sourcehttpsmediumcommlearning-aileave-one-feature-out-lofo-for-feature-importance-3ed04a60ae40&#34;&gt;Leave Out One Feature (LOFO) &lt;a href=&#34;https://medium.com/mlearning-ai/leave-one-feature-out-lofo-for-feature-importance-3ed04a60ae40&#34;&gt;source&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;LOFO measures the loss in model accuracy when one feature is left out at a time in each training iteration. The features causing the largest accuracy loss are ranked as the most important.&lt;/p&gt;&lt;h3 id=&#34;forward-feature-selection-sourcehttpswwwanalyticsvidhyacomblog202104forward-feature-selection-and-its-implementation&#34;&gt;Forward Feature Selection &lt;a href=&#34;https://www.analyticsvidhya.com/blog/2021/04/forward-feature-selection-and-its-implementation/&#34;&gt;source&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;In forward feature selection we go through all available features to predict a given label. The one that provides the best accuracy is selected. We then try combinations of the selected features together with each of the remaining ones and see which improves the model accuracy most. 
We keep doing this until a sufficient accuracy or a limit on the number of features is reached.&lt;/p&gt;&lt;h3 id=&#34;exhaustive-feature-selection-sourcehttpsrasbtgithubiomlxtenduser_guidefeature_selectionexhaustivefeatureselector&#34;&gt;Exhaustive Feature Selection &lt;a href=&#34;https://rasbt.github.io/mlxtend/user_guide/feature_selection/ExhaustiveFeatureSelector/&#34;&gt;source&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;Exhaustive feature selection is a brute-force method that selects a group of features based on a scoring metric, for example ROC AUC. It takes the minimum and maximum number of features as parameters, goes through all possible combinations, and returns the group of features with the best score.&lt;/p&gt;&lt;hr&gt;&lt;p&gt;Embedded methods combine filter methods and wrapper methods to find the best feature combinations at a reasonable computational cost.&lt;/p&gt;&lt;h3 id=&#34;random-forest-importance-sourcehttpsmljarcomblogfeature-importance-in-random-forest&#34;&gt;Random Forest Importance &lt;a href=&#34;https://mljar.com/blog/feature-importance-in-random-forest/&#34;&gt;source&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;A random forest classifier chooses features based on Gini impurity. With a large number of decision trees we can examine all the trees and their nodes to find which features end up in nodes near the tree roots. The closer a feature is to the root, the more important it is.&lt;/p&gt;&lt;h3 id=&#34;lasso-regression-sourcehttpswwwmygreatlearningcomblogunderstanding-of-lasso-regression&#34;&gt;Lasso Regression &lt;a href=&#34;https://www.mygreatlearning.com/blog/understanding-of-lasso-regression/&#34;&gt;source&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;Lasso uses L1 regularization to determine which features to use for prediction and which to discard.&lt;/p&gt;&lt;p&gt;&amp;ldquo;L1 regularization adds a penalty that is equal to the absolute value of the magnitude of the coefficient. 
This regularization type can result in sparse models with few coefficients. Some coefficients might become zero and get eliminated from the model. Larger penalties result in coefficient values that are closer to zero (ideal for producing simpler models).&amp;rdquo;&lt;/p&gt;&lt;hr&gt;&lt;h3 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h3&gt;&lt;p&gt;There are many possible feature selection methods. The choice of method depends on the number of data dimensions and the computational resources available.&lt;/p&gt;&lt;h4 id=&#34;more-sources&#34;&gt;More sources&lt;/h4&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/Feature_learning&#34;&gt;Wikipedia: Feature Learning&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://www.neuraldesigner.com/blog/genetic_algorithms_for_feature_selection&#34;&gt;Genetic algorithms for feature selection&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/Genetic_algorithm&#34;&gt;Wikipedia: Genetic algorithms&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;</description>
     </item>
   
 </channel>
</rss>
