Unix Parallelism and Concurrency: Processes & Signalling

In this era of threads and asynchronous abstractions, applications and processes have become almost synonymous. A process is widely seen as the operating system’s underlying representation of a whole running application. However, by limiting ourselves to this model, we cut ourselves off from an elegant set of tools for parallelism and concurrency.

In case you thought this blog’s design looked prehistoric enough, it’s starting with a post about concurrency patterns rooted in the era in which the mouse was a keen invention.


The key construct behind process-based Unix concurrency is the fork system call. It’s practically a paradox: one program calls it, yet two programs finish the call and move on. This appears as quite the oddity to programmers of contemporary environments like Java and JavaScript; how can a mere function call violate the fundamental laws of how a written program executes?

Most environments, despite having a set of rules their programmers can depend on, have the occasional strange artefact on the surface that violates those rules, exposed by the underlying system.

A Java program holding a reference to an object ensures that object remains alive, but the system exposes a different kind of reference tracking with the java.lang.ref.WeakReference type, which allows an object to be deleted while the programmer is still holding onto it. Likewise, storing a string into an object property will never block execution in JavaScript, but there’s an exception that exposes the underlying browser environment: window.location. Assigning a string to that cancels the current flow of execution by redirecting the page, halting the current JavaScript environment and throwing it away.

In C or Rust, or even higher-level languages like Ruby or Python, the environment is not a language-specific world unto itself like Java or a web browser; it is instead our underlying operating system. To understand that operating system, we can look at specifications such as POSIX, with which most Unix variants mostly align.

On top of the operating system, language runtimes such as those of Ruby and Python provide extra gizmos like stack-unwinding, garbage collection, and some introspection capabilities. System calls from these languages, like the previous examples, provide capabilities that step outside the normal rules of the language you are using. Unix doesn’t care about the Rust drop method that your language runtime makes “guarantees” about running at the end of a scope; if you exec, the whole process’s runtime image is swapped out and execution jumps elsewhere. Those method calls will be deep-sixed.
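To make that concrete, here’s a minimal sketch using the nix crate (set up in the next paragraph). The Drop implementation below never runs, because execv swaps in /bin/echo’s process image before the scope ends:

use {nix::unistd::execv, std::ffi::CString};

struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        // Never reached: exec replaces the process image first.
        println!("Dropped the guard");
    }
}

fn main() {
    let _guard = Guard;
    let path = CString::new("/bin/echo").unwrap();
    let args = [path.clone(), CString::new("the old program is gone").unwrap()];
    // On success, execv never returns; this process is now /bin/echo.
    execv(&path, &args).unwrap();
}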

For the demonstrations in this article, Rust is used. After installing it, set up a new project by changing into a new project directory and running cargo init --bin. Three packages outside of its standard library are used: num_cpus for counting the number of processors on the machine, anyhow to simplify error-handling, and nix to give expanded access to Unix’s concurrency APIs. Add all of these with cargo add anyhow nix num_cpus. Replace the generated src/main.rs file with the content of each example before running it with cargo run.

Invoking the fork system call in Rust is as follows:

use {
    anyhow::Result,
    nix::unistd::{
        fork,
        ForkResult::{Child, Parent},
    },
};

fn main() -> Result<()> {
    match unsafe { fork() }? {
        Child => println!("In child process"),
        Parent { child } => println!("In parent process; this child is {child}"),
    }
    println!("Finished a process");
    Ok(())
}

Rust supports returning Result types from the main function; in this case, it’ll automatically report errors as a message for the user. This is used consistently across the examples when errors are possible. As fork violates the expectations of most languages, Rust expects calls to it to be wrapped in unsafe blocks. The ? operator propagates errors back up to the caller, and is used throughout the examples to return early from functions when an error occurs. () is the “unit type”, which essentially means “nothing”. Put together, Ok(()) means “a successful operation that yielded nothing useful”.

This program demonstrates that paradox of one program entering fork and two leaving it; how else could both branches of a single match run? This allows for a form of concurrency and parallelism:

$ cargo run
In parent process; this child is 906
Finished a process
In child process
Finished a process

Running operations side-by-side is hardly an elusive trick. Anyone spinning up a thread in Java or interleaving two setTimeouts in JavaScript could replicate something similar. Unlike the latter, however, the application can still continue while one of them hangs for ten seconds:

use {
    anyhow::Result,
    nix::unistd::{
        fork,
        ForkResult::{Child, Parent},
    },
    std::{thread::sleep, time::Duration},
};

fn main() -> Result<()> {
    match unsafe { fork() }? {
        Child => sleep(Duration::from_secs(10)),
        Parent { .. } => println!("Finished"),
    }
    Ok(())
}
$ cargo run
Finished

These clearly aren’t events being dispatched into an event loop behind the scenes, otherwise the ten-second hang would temporarily block the event loop, stopping the second event, println!("Finished"), from running. Single events blocking the entire event loop and grinding large applications to a halt is not a theoretical problem.

Threads sidestep this problem by utilising the operating system’s scheduler, which slices up time on the computer between competing tasks to avoid any one of them starving the whole system of resources. They come with their own arsenal of footguns in many shapes and sizes, though:

use {
    num_cpus,
    std::{
        thread::{sleep, spawn, JoinHandle},
        time::Duration,
    },
};

struct UserCounter {
    count: usize,
}

impl UserCounter {
    pub fn new() -> Self {
        Self { count: 0 }
    }

    pub fn increment(&mut self) {
        let count = self.count;
        println!("User {count} visited");
        self.count = count + 1;
    }
}

fn main() {
    let mut counter = UserCounter::new();
    let counter_ptr = ((&mut counter) as *mut UserCounter) as usize;

    let handle_requests = move || loop {
        sleep(Duration::from_millis(300));
        let p = counter_ptr as *mut UserCounter;
        unsafe { &mut *p }.increment();
    };

    let threads: Vec<JoinHandle<()>> = (1..num_cpus::get())
        .map(|_| spawn(handle_requests))
        .collect();
    for thread in threads {
        thread.join().unwrap();
    }
}

A counter struct is created, but its incrementing is split across multiple operations, which are then interleaved across many threads. This is where disaster strikes. Rust advertises itself as having “fearless concurrency”, and that is demonstrated here: doing it the obvious way – passing a reference to counter into the thread’s closure – causes Rust to reject the code. It knows that the counter cannot safely be modified from multiple threads. To demonstrate this failure at runtime, we bypass Rust’s safety panopticon by smuggling a reference to the counter into a simple numeric type (usize) and then materialising it back into a pointer inside each thread. Dereferencing raw pointers conjured out of arbitrary numbers is not memory-safe, so we pull out unsafe again.

The bug Rust tried to prevent strikes; the loading and incrementing is interleaved with other concurrent threads, causing the counter to be wrong most of the time.

$ cargo run
User 0 visited
User 0 visited
User 1 visited
User 2 visited
User 0 visited
User 1 visited
User 0 visited
User 0 visited
User 1 visited
User 2 visited
User 3 visited
User 4 visited
User 5 visited
User 6 visited
User 7 visited
User 8 visited
User 9 visited
User 10 visited
User 8 visited
User 8 visited
User 9 visited
User 8 visited

We must remember quite a few rules to avoid shooting ourselves in the foot in multithreaded systems. Just a few.

Between dealing with data races, deadlocks, stale reads, and other threading esoterica, programming with threads is playing Russian roulette with a fully loaded Uzi. An Uzi that jams a lot, too: adding too many critical regions to synchronise threaded code bottlenecks your otherwise concurrent program into single-threaded hotspots that can end up throttling your application’s performance.
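For contrast, here’s a minimal sketch of the counter written the way Rust will actually accept it: shared ownership through Arc, with every access funnelled through a Mutex. It’s correct, but each increment now serialises on one lock – exactly the kind of critical region just described:

use std::{
    sync::{Arc, Mutex},
    thread::{sleep, spawn, JoinHandle},
    time::Duration,
};

fn main() {
    let counter = Arc::new(Mutex::new(0usize));

    let threads: Vec<JoinHandle<()>> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            spawn(move || {
                for _ in 0..3 {
                    sleep(Duration::from_millis(300));
                    // Holding the lock makes the load, print, and store
                    // one indivisible step as seen by the other threads.
                    let mut count = counter.lock().unwrap();
                    println!("User {count} visited");
                    *count += 1;
                }
            })
        })
        .collect();

    for thread in threads {
        thread.join().unwrap();
    }
}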

Have you managed to get the locking fine-grained enough to get good performance while avoiding those pitfalls? Well, hopefully you’re not building any more abstractions on top of it, as locks do not compose.

Operating system processes are bulky and cannot be spun up as fast as threads, but they are more isolated from one another. Each has its own memory space, meaning one buggy process can’t corrupt the in-memory data of another. Unlike threads, misbehaving processes can actually be killed without causing strange, hard-to-trace bugs in the underlying system.

Processes also encourage communication via message-passing mechanisms like signalling, domain sockets, and networking connections. It turns out that these solutions are easier to scale across multiple physical machines than shared memory communication, as others also discovered quite a long time ago.

Signalling is the easiest way to get your feet wet with Unix IPC (Inter-Process Communication):

use {
    anyhow::Result,
    nix::{
        sys::{
            signal::{kill, SigSet, Signal, SIGCONT},
            wait::waitpid,
        },
        unistd::{fork, ForkResult::Parent, Pid},
    },
    std::{process::exit, thread::sleep, time::Duration},
};

fn sigwait(signals: &[Signal]) -> Result<Signal> {
    let mut set = SigSet::empty();
    for signal in signals {
        set.add(*signal);
    }
    set.thread_block()?;
    let signal = set.wait()?;
    Ok(signal)
}

const fn new_operation<'a>(name: &'a str, wait_secs: u64) -> impl Fn() -> Result<()> + 'a {
    move || {
        println!("Starting {name} operation...");
        sleep(Duration::from_secs(wait_secs));

        kill(Pid::parent(), SIGCONT)?;
        sigwait(&[SIGCONT])?;

        println!("{name} operation complete.");
        Ok(())
    }
}

fn fork_new_operation(name: &str, wait_secs: u64) -> Result<Pid> {
    if let Parent { child } = unsafe { fork() }? {
        Ok(child)
    } else {
        let op = new_operation(name, wait_secs);
        op().unwrap();
        exit(0)
    }
}

fn start_children() -> Result<Vec<Pid>> {
    let child_1 = fork_new_operation("very expensive", 5)?;
    let child_2 = fork_new_operation("slightly expensive", 2)?;
    Ok(vec![child_1, child_2])
}

fn wait_for_all(children: &[Pid]) -> Result<()> {
    for _ in children {
        sigwait(&[SIGCONT])?;
    }
    Ok(())
}

fn display_in_order(children: &[Pid]) -> Result<()> {
    for &child in children {
        kill(child, SIGCONT)?;
        waitpid(child, None)?;
    }
    Ok(())
}

fn main() -> Result<()> {
    let children = start_children()?;
    println!("All children started.");
    wait_for_all(&children)?;
    println!(
        "All children finished main tasks; asking them to display results in \
        order."
    );
    display_in_order(&children)?;
    Ok(())
}

This is a more complex example, so let’s go through each component piece-by-piece.

Firstly, we teach the program how to wait for a signal. It requires a few steps, and as it’s done multiple times, the steps are encapsulated into a function called sigwait. It creates a “signal set”, adds the signals we wish to wait for, masks those signals on the current thread so they queue up rather than trigger their default actions (masking per-thread is a necessity for multithreaded code), and finally waits for a matching signal to come in.

Secondly, we have some operations that take time. Rather than duplicating them, we have a function that returns new operation functions with the name and time taken provided. Rust, being a systems language without a garbage collector, has concerns that must be allayed: 'a is a lifetime specified on new_operation. It asserts to Rust that the name of the operation we pass in will not outlive the returned function, as that function prints it out as it runs. After waiting and displaying a message, the operation waits for the parent process to send it a “continuation” signal to carry on and finish.

Finally, the parent spins off two operations with different timeouts. Each child sends a SIGCONT continuation signal to the parent. When the parent gets as many of them as there are child processes, it knows they’re all complete. It then sends back SIGCONT in a specific order, meaning each process displays its final message in an order coordinated by the parent.

Running this yields:

$ cargo run
Starting very expensive operation...
All children started.
Starting slightly expensive operation...
All children finished main tasks; asking them to display results in order.
very expensive operation complete.
slightly expensive operation complete.

The slightly expensive operation, despite being quicker, displays its output after the longer-running one. Both of them ran at the same time: the whole run waited for 5 seconds, not 7. To recap, combining fork with Unix process signalling, the following was organised:

  • Multiple tasks are forked in separate processes. This not only allows IO interleaving like event systems, but also utilisation of multiple processor cores if we assume the mock sleeps are actually computationally expensive operations.
  • The parent process waits for a signal, specifically SIGCONT, to indicate a child process finished its main task. It waits for the same number of signals as there are child processes, meaning it does not move on until they all declare having finished.
  • It iterates over the processes in the order they were defined, sends a message to each to display their results, and waits for them to complete.

The whole process not only parallelises the computation, but linearises the results. Notice the lack of locks, shared queues, and polling.

Running man 7 signal on a Unix device tells us what signals exist. Choosing SIGCONT was an arbitrary decision, as most of the signals here could have been used. SIGCONT just so happens to best describe what it was doing: continuing after the tasks had finished waiting for something else.

This relevant section of Linux’s manpage for signal is illustrative:

Signal      Standard   Action   Comment
────────────────────────────────────────────────────────────────────────
SIGABRT      P1990      Core    Abort signal from abort(3)
SIGALRM      P1990      Term    Timer signal from alarm(2)
SIGBUS       P2001      Core    Bus error (bad memory access)
SIGCHLD      P1990      Ign     Child stopped or terminated
SIGCLD         -        Ign     A synonym for SIGCHLD
SIGCONT      P1990      Cont    Continue if stopped
SIGEMT         -        Term    Emulator trap
SIGFPE       P1990      Core    Floating-point exception
SIGHUP       P1990      Term    Hangup detected on controlling terminal
                                or death of controlling process
SIGILL       P1990      Core    Illegal Instruction
SIGINFO        -                A synonym for SIGPWR
SIGINT       P1990      Term    Interrupt from keyboard
SIGIO          -        Term    I/O now possible (4.2BSD)
SIGIOT         -        Core    IOT trap. A synonym for SIGABRT
SIGKILL      P1990      Term    Kill signal
SIGLOST        -        Term    File lock lost (unused)
SIGPIPE      P1990      Term    Broken pipe: write to pipe with no
                                readers; see pipe(7)
SIGPOLL      P2001      Term    Pollable event (Sys V);
                                synonym for SIGIO
SIGPROF      P2001      Term    Profiling timer expired
SIGPWR         -        Term    Power failure (System V)
SIGQUIT      P1990      Core    Quit from keyboard
SIGSEGV      P1990      Core    Invalid memory reference
SIGSTKFLT      -        Term    Stack fault on coprocessor (unused)
SIGSTOP      P1990      Stop    Stop process
SIGTSTP      P1990      Stop    Stop typed at terminal
SIGSYS       P2001      Core    Bad system call (SVr4);
                                see also seccomp(2)
SIGTERM      P1990      Term    Termination signal
SIGTRAP      P2001      Core    Trace/breakpoint trap
SIGTTIN      P1990      Stop    Terminal input for background process
SIGTTOU      P1990      Stop    Terminal output for background process
SIGUNUSED      -        Core    Synonymous with SIGSYS
SIGURG       P2001      Ign     Urgent condition on socket (4.2BSD)
SIGUSR1      P1990      Term    User-defined signal 1
SIGUSR2      P1990      Term    User-defined signal 2
SIGVTALRM    P2001      Term    Virtual alarm clock (4.2BSD)
SIGXCPU      P2001      Core    CPU time limit exceeded (4.2BSD);
                                see setrlimit(2)
SIGXFSZ      P2001      Core    File size limit exceeded (4.2BSD);
                                see setrlimit(2)
SIGWINCH       -        Ign     Window resize signal (4.3BSD, Sun)

The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored.

Up to and including Linux 2.2, the default behavior for SIGSYS, SIGXCPU, SIGXFSZ, and (on architectures other than SPARC and MIPS) SIGBUS was to terminate the process (without a core dump). (On some other UNIX systems the default action for SIGXCPU and SIGXFSZ is to terminate the process without a core dump.) Linux 2.4 conforms to the POSIX.1-2001 requirements for these signals, terminating the process with a core dump.

SIGEMT  is  not  specified  in  POSIX.1-2001,  but nevertheless appears on most other UNIX systems, where its default action is typically to terminate the
process with a core dump.

SIGPWR (which is not specified in POSIX.1-2001) is typically ignored by default on those other UNIX systems where it appears.

SIGIO (which is not specified in POSIX.1-2001) is ignored by default on several other UNIX systems.

Some of those signals have special behaviour: some, like SIGKILL, are impossible to handle, while others are handled by language runtimes for us, like Ruby and Python translating SIGINT into rescuable errors. SIGUSR1 and SIGUSR2 are set aside as non-standard signals for application-specific events.
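As a quick illustration of that last point, here’s a minimal sketch that uses SIGUSR1 as an application-defined event, reusing the mask-then-wait approach from sigwait. Blocking the signal before forking means the child cannot miss an event sent before it starts waiting:

use {
    anyhow::Result,
    nix::{
        sys::{
            signal::{kill, SigSet, SIGUSR1},
            wait::waitpid,
        },
        unistd::{fork, ForkResult::Parent},
    },
    std::process::exit,
};

fn main() -> Result<()> {
    // Block SIGUSR1 before forking; the mask is inherited by the child.
    let mut set = SigSet::empty();
    set.add(SIGUSR1);
    set.thread_block()?;

    if let Parent { child } = unsafe { fork() }? {
        // Parent: deliver the application-defined event, then reap.
        kill(child, SIGUSR1)?;
        waitpid(child, None)?;
    } else {
        // Child: interpret SIGUSR1 however the application likes.
        set.wait()?;
        println!("Child received the application-defined event");
        exit(0);
    }
    Ok(())
}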

We might want to wait for a signal, but not if it takes too long:

use {
    anyhow::Result,
    nix::{
        sys::signal::{kill, SigSet, Signal, SIGALRM, SIGCHLD, SIGKILL},
        unistd::{
            alarm, fork,
            ForkResult::{Child, Parent},
            Pid,
        },
    },
    std::{process::exit, thread::sleep, time::Duration},
};

struct Alarm;

impl Alarm {
    fn set(seconds: u32) -> Self {
        alarm::set(seconds);
        Self
    }
}

impl Drop for Alarm {
    fn drop(&mut self) {
        alarm::cancel();
    }
}

fn run_slow_operation() {
    sleep(Duration::from_secs(10));
    println!("Finished slow operation");
}

fn sigwait(signals: &[Signal]) -> Result<Signal> {
    let mut set = SigSet::empty();
    for signal in signals {
        set.add(*signal);
    }
    set.thread_block()?;
    let signal = set.wait()?;
    Ok(signal)
}

fn wait_for(pid: Pid) -> Result<()> {
    let _alarm = Alarm::set(5);

    if sigwait(&[SIGCHLD, SIGALRM])? == SIGALRM {
        println!("Too late; killing child process");
        kill(pid, SIGKILL)?;
    } else {
        println!("Child process finished");
    }

    Ok(())
}

fn main() -> Result<()> {
    match unsafe { fork() }? {
        Child => {
            run_slow_operation();
            exit(0);
        }
        Parent { child } => wait_for(child)?,
    }
    Ok(())
}
$ cargo run
Too late; killing child process

sigwait was borrowed from the previous example. We take advantage of Rust’s destructors to create an alarm that is guaranteed to be disarmed even if an error occurs during its existence; this is done in Rust by implementing the Drop trait.

Alarms allow timeouts to be interleaved with other events. SIGCHLD is the signal delivered when a child process stops or terminates; note from the table above that some signals, SIGCHLD among them, do nothing by default.

If we wanted, we could put all of our event-handling logic directly in signal handlers rather than waiting with sigwait. However, handling them inline like that opens us up to the gnarly world of async-signal-safety. It’s better to do the bare minimum in the handler, just dispatching an event that can be handled by normal code in the usual execution flow. A lot of libraries dispatch signals as events via a wrapper to avoid this exact footgun.
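Here’s a minimal sketch of that pattern, assuming we’re handling SIGINT: the handler only flips an atomic flag – one of the few things that is async-signal-safe – and the usual execution flow picks the event up later:

use {
    anyhow::Result,
    nix::sys::signal::{sigaction, SaFlags, SigAction, SigHandler, SigSet, SIGINT},
    std::{
        sync::atomic::{AtomicBool, Ordering},
        thread::sleep,
        time::Duration,
    },
};

// The handler does one thing: flip a flag. Storing to an atomic is
// async-signal-safe; allocating or locking inside a handler is not.
static INTERRUPTED: AtomicBool = AtomicBool::new(false);

extern "C" fn on_sigint(_signal: i32) {
    INTERRUPTED.store(true, Ordering::Relaxed);
}

fn main() -> Result<()> {
    let action = SigAction::new(
        SigHandler::Handler(on_sigint),
        SaFlags::empty(),
        SigSet::empty(),
    );
    unsafe { sigaction(SIGINT, &action) }?;

    // The usual execution flow: poll the flag and handle the event
    // with ordinary, unrestricted code.
    while !INTERRUPTED.load(Ordering::Relaxed) {
        sleep(Duration::from_millis(100));
    }
    println!("Caught SIGINT; shutting down cleanly");
    Ok(())
}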

Be careful of default signal behaviour. If your program is being run by a parent process, it can pass down a non-default signal mask. As we demonstrated above, a process registers its interest in signals using the mechanisms in our sigwait function; underneath, this uses the POSIX notion of “signal masking”. If a signal must be picked up by your program, even just for sigwait, add it to the current process’s signal mask to be sure. Remember that threads must also be considered.
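A short sketch of that defensive step: inspect the mask that was handed down, then explicitly block what you depend on (nix’s thread_get_mask and thread_block wrap pthread_sigmask underneath):

use {
    anyhow::Result,
    nix::sys::signal::{SigSet, SIGUSR1},
};

fn main() -> Result<()> {
    // Inspect the signal mask this process inherited from its parent.
    let inherited = SigSet::thread_get_mask()?;
    println!("SIGUSR1 already blocked: {}", inherited.contains(SIGUSR1));

    // Explicitly block the signals we depend on rather than trusting
    // whatever mask was handed down.
    let mut needed = SigSet::empty();
    needed.add(SIGUSR1);
    needed.thread_block()?;
    Ok(())
}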

Signalling is one of the simplest forms of Unix IPC. It’s enough to coordinate processes, but does not allow sending messages with payloads. Domain sockets, networking connections, and other IPC systems allow a programmer to go a lot further.
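As a taste of that next step, here’s a sketch pairing fork with a Unix domain socket from Rust’s standard library. Unlike a signal, the message carries an actual payload:

use {
    anyhow::Result,
    nix::{
        sys::wait::waitpid,
        unistd::{fork, ForkResult::Parent},
    },
    std::{
        io::{Read, Write},
        os::unix::net::UnixStream,
        process::exit,
    },
};

fn main() -> Result<()> {
    // A connected pair of Unix domain sockets; one end per process.
    let (mut parent_end, mut child_end) = UnixStream::pair()?;

    if let Parent { child } = unsafe { fork() }? {
        // Close our copy of the child's end so reads see EOF when the
        // child exits.
        drop(child_end);
        let mut message = String::new();
        parent_end.read_to_string(&mut message)?;
        println!("Parent received: {message}");
        waitpid(child, None)?;
    } else {
        drop(parent_end);
        child_end.write_all(b"results: 42")?;
        exit(0);
    }
    Ok(())
}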


If Unix IPC is such a powerful and battle-tested standard for parallelisation and concurrency, why isn’t it the primary port of call for solving such problems today? Well, for starters it’s slow and bulky even with modern optimisations like copy-on-write for copying processes’ memory spaces to forked children. Although shared memory has problems, it is sometimes the best way of solving certain problems and it’s easier with threads. Event-driven systems avoid many of the problems with threads and handle the majority of concurrency use cases in modern webservices, so managing processes manually becomes unnecessary.

Many applications written in the likes of Node.js use processes just to utilise as many processor cores as possible, but hide process management behind modules like cluster. Processes are used to parallelise, but the concurrency is handled by abstractions built atop an event loop. Using a process per request would destroy performance, as processes are too coarse for that level of fine-grained concurrency. In fact, that’s how web applications were handled many years ago with CGI scripts. There’s a reason it isn’t done that way anymore.

Despite these problems, Unix IPC mechanisms are sometimes still the best way of tackling certain concurrency and parallelism problems, so it’s worth keeping those dusty old ’70s techniques in the toolbox.