Linux runs 5000 times slower than our virtual machines
We applied place-transition nets (PTNs) defined by System V semaphores (https://doi.org/10.1080/17445760.2026.2615010) to benchmark Linux (Ubuntu 24.04.4 LTS, kernel 6.17.0-35). We used PTNs for matrix multiplication and for an array of concurrent multiplications, comparing Linux kernel performance with that of our virtual machines (https://doi.org/10.1080/17445760.2025.2490148). A PTN for 1024 parallel multiplications of 6-bit data runs on Linux more than 5000 times slower than on our VM, namely 5673.597 s vs 0.912 s on the same node (AMD Ryzen 7 6800H, 4.8 GHz, 32 GB RAM); the application contains 9216 semaphores (places) and 8192 processes (transitions).
We conclude that this gap is too large to be explained by system call and context switching overhead alone; it points to the efficiency of the System V semaphore implementation in the kernel source file sem.c. We are looking for projects to support rewriting semaphores with wait-for-all semantics in Linux, both in the kernel for processes and as runtime code for fast, futex-like synchronization of threads. Note that futex_waitv implements wait-for-any semantics, while wait-for-all is a good remedy for many deadlocks caused by sequential acquisition of resources.
We draw PTNs in Tina (the TIme petri Net Analyzer toolbox by LAAS/CNRS), using it as an IDE; we generate big models using our toolchains. Then we export a model using our plugin NDRtoALL as an .h file for our PVZ machine, which we recompile and run as a Linux application. Basic tools are uploaded to GitHub (dimazaitsev/SNCtools).
Polished
Peak-load benchmarks for operating systems: Linux runs over 5,000× slower than our virtual machines
We applied place-transition nets (PTNs) defined by System V semaphores (https://doi.org/10.1080/17445760.2026.2615010) to benchmark Linux (Ubuntu 24.04.4 LTS, kernel 6.17.0-35).
Using PTNs for matrix multiplication and arrays of concurrent multiplications, we compared Linux kernel performance with that of our virtual machines (https://doi.org/10.1080/17445760.2025.2490148).
A PTN executing 1,024 parallel multiplications of 6-bit data completed in 0.912 seconds on our VM, compared with 5,673.597 seconds on Linux running on the same hardware (AMD Ryzen 7 6800H @ 4.8 GHz, 32 GB RAM). The application contains 9,216 semaphores (places) and 8,192 processes (transitions).
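For readers unfamiliar with PTNs, the firing rule behind the places/transitions mapping can be sketched in a few lines. This is only a minimal sequential illustration in Python, not the semaphore-based implementation or the PVZ machine from the papers; the names `Transition`, `enabled`, `fire`, and `run` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    """A transition with weighted arcs: place name -> arc weight (hypothetical type)."""
    name: str
    inputs: dict
    outputs: dict


def enabled(marking, t):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(p, 0) >= w for p, w in t.inputs.items())


def fire(marking, t):
    """Consume tokens from all input places and produce tokens in all output places."""
    if not enabled(marking, t):
        raise ValueError(f"transition {t.name} is not enabled")
    new = dict(marking)
    for p, w in t.inputs.items():
        new[p] = new.get(p, 0) - w
    for p, w in t.outputs.items():
        new[p] = new.get(p, 0) + w
    return new


def run(marking, transitions, max_steps=1000):
    """Fire the first enabled transition repeatedly until none is enabled."""
    steps = 0
    while steps < max_steps:
        ready = [t for t in transitions if enabled(marking, t)]
        if not ready:
            break
        marking = fire(marking, ready[0])
        steps += 1
    return marking, steps


if __name__ == "__main__":
    t1 = Transition("t1", {"a": 1, "b": 2}, {"c": 1})
    print(run({"a": 2, "b": 4, "c": 0}, [t1]))
```

In the benchmark described above, each place is a semaphore and each transition is a separate process, so the "enabled" check and the token update have to happen as one atomic step across several semaphores.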
Linux is thus more than 5,000 times slower than our VM on this workload. We believe this gap cannot be explained solely by system-call and context-switching overhead. Instead, it points to the efficiency of the System V semaphore implementation in the Linux kernel source file sem.c.
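As a quick check of the headline figure, the slowdown can be recomputed from the two reported wall-clock times. The helper name `slowdown` is hypothetical; the inputs are the numbers quoted above.

```python
def slowdown(linux_seconds, vm_seconds):
    """Ratio of Linux run time to VM run time for the same PTN."""
    return linux_seconds / vm_seconds


if __name__ == "__main__":
    ratio = slowdown(5673.597, 0.912)
    print(f"{ratio:.0f}x")  # prints "6221x"
```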
We are interested in collaborating on projects aimed at implementing semaphores with wait-for-all semantics in Linux, both at the kernel level for processes and as a runtime mechanism for fast, futex-like thread synchronization. While futex_waitv provides wait-for-any semantics, wait-for-all semantics could help eliminate many deadlocks caused by sequential resource acquisition.
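To make the wait-for-all idea concrete, here is a minimal user-space sketch in Python built on `threading.Condition`. It is an illustration only, not the kernel or futex-based design proposed here, and it says nothing about performance; the class `MultiSemaphore` and its methods are hypothetical names. The point it shows is that a caller either takes every requested unit at once or takes nothing, so two threads requesting the same resources in opposite order cannot each hold one and wait for the other.

```python
import threading


class MultiSemaphore:
    """Counting semaphores behind one lock, with atomic wait-for-all (hypothetical)."""

    def __init__(self, initial):
        self._counts = dict(initial)
        self._cond = threading.Condition()

    def acquire_all(self, needs, timeout=None):
        """Block until every semaphore in `needs` has enough units, then take them all.

        Returns False on timeout; in that case nothing is taken.
        """
        with self._cond:
            ok = self._cond.wait_for(
                lambda: all(self._counts[k] >= n for k, n in needs.items()),
                timeout=timeout,
            )
            if not ok:
                return False
            for k, n in needs.items():
                self._counts[k] -= n
            return True

    def release(self, gives):
        """Return units and wake all waiters so they can re-check their conditions."""
        with self._cond:
            for k, n in gives.items():
                self._counts[k] = self._counts.get(k, 0) + n
            self._cond.notify_all()

    def value(self, key):
        with self._cond:
            return self._counts[key]


def demo():
    """Two threads request A and B in opposite order; neither holds a partial set."""
    sems = MultiSemaphore({"A": 1, "B": 1})
    log = []

    def worker(name, needs):
        sems.acquire_all(needs)
        log.append(name)
        sems.release(needs)

    threads = [
        threading.Thread(target=worker, args=("t1", {"A": 1, "B": 1})),
        threading.Thread(target=worker, args=("t2", {"B": 1, "A": 1})),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(log)


if __name__ == "__main__":
    print(demo())
```

For comparison, the System V `semop` call already applies its whole array of operations atomically, which is what lets a PTN transition with several input places be expressed as a single call; the sketch above mirrors that all-or-nothing behaviour for threads.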
For modeling, we use Tina (the TIme petri Net Analyzer toolbox by LAAS/CNRS) as an IDE and generate large PTN models with our own toolchains. Models are exported through our NDRtoALL plugin as .h files for the PVZ machine, then recompiled and executed as Linux applications.
Our basic tools are available on GitHub (dimazaitsev/SNCtools).
