Juggling software interrupts and realtime tasks

By Jonathan Corbet
December 2, 2022

The software-interrupt mechanism is one of the oldest parts in the kernel; arguably, the basic design behind it predates Linux itself. Software interrupts can get in the way of other work so, for almost as long as they have existed, developers have wished that they could be made to go away. That has never happened, though, and doesn't look imminent. Instead, Android systems have long carried a patch that tries to minimize the impact of software interrupts, at least in some situations. John Stultz is now posting that work, which contains contributions from a number of authors, in the hope of getting it into the mainline kernel.

Hardware interrupts (or just "interrupts") are initiated when a physical component in the system wants the kernel's attention; they will usually cause an immediate trap into a special handler function. Since interrupts take the system away from whatever else it was doing, interrupt handlers have to do their work quickly; there is not time for any sort of extended processing. This is not a new problem; pre-Linux Unix systems often included the concept of a "bottom half" as a way of deferring work that could not be done in an interrupt handler.

The Linux kernel, too, has had to develop mechanisms to defer processing until a more convenient time. One of those mechanisms is software interrupts (or "softirqs"). It was first introduced in the 0.99 kernel under the familiar "bottom half" name; the term "softirq" doesn't appear until the 1.1.77 development release. The abbreviation "bh" ("bottom half") can still be found in the names of kernel functions related to software interrupts.

In the current implementation, there is a handful of software-interrupt "vectors" assigned to specific subsystems. When one of those subsystems has to defer some work, it sets a bit in a per-CPU mask that will cause its software-interrupt handler to be called at a later time. Often, the software interrupt will run immediately after the completion of hardware-interrupt handling, meaning that it runs at a high priority, before just about any other work in the kernel. The software-interrupt mechanism is reserved for a handful of core-kernel subsystems, including networking and the block layer, with one big exception: the kernel's tasklet mechanism, which is available to any subsystem, executes tasklets in a software-interrupt handler.
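As a rough illustration of the pending-mask idea described above, here is a toy model in Python (not kernel code). The vector names follow the kernel's softirq list and `raise_softirq()`/`open_softirq()` borrow the kernel's function names, but the `Cpu` class and everything inside it are assumptions made for this sketch.

```python
from enum import IntEnum


class Softirq(IntEnum):
    # Vector names as found in the kernel; lower numbers are serviced first.
    HI = 0
    TIMER = 1
    NET_TX = 2
    NET_RX = 3
    BLOCK = 4
    IRQ_POLL = 5
    TASKLET = 6
    SCHED = 7
    HRTIMER = 8
    RCU = 9


class Cpu:
    """Hypothetical stand-in for one CPU's software-interrupt state."""

    def __init__(self):
        self.pending = 0      # one bit per vector, like the per-CPU mask
        self.handlers = {}    # vector -> callable

    def open_softirq(self, vec, handler):
        # Register the handler that a subsystem wants run for its vector.
        self.handlers[vec] = handler

    def raise_softirq(self, vec):
        # Deferring work is just setting a bit; nothing runs yet.
        self.pending |= 1 << vec

    def do_softirq(self):
        # Snapshot and clear the mask, then run handlers in vector order.
        pending, self.pending = self.pending, 0
        ran = []
        for vec in Softirq:
            if pending & (1 << vec) and vec in self.handlers:
                self.handlers[vec]()
                ran.append(vec)
        return ran
```

Raising `NET_RX` and then `TIMER` leaves two bits set in `pending`; a later `do_softirq()` call runs the timer handler first because it has the lower vector number.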

Software interrupts can create latency in an otherwise well-functioning system, so they have long drawn the attention of developers interested in response times. In kernels configured for realtime preemption, software interrupts are handled in a process like any other, so their priority can be set relative to other important tasks. Most of us do not run realtime preemption kernels, though, so software-interrupt handling will usually steal time from other tasks that would like to be running. Even a non-realtime kernel, however, will eventually defer software interrupts to the per-CPU ksoftirqd kernel thread if the system is generating them in large numbers.

Normally, one expects realtime tasks to be the highest-priority work on the machine, even one that is not running a realtime kernel. But software-interrupt handling trumps even realtime tasks and can prevent them from meeting their deadlines. The latency that software interrupts can add has, evidently, been seen to cause audio glitches on Android systems, inspiring the work being proposed by Stultz now.

There are two components to the suggested change, the first of which is a scheduler tweak to modify its placement of realtime tasks. Currently, when a realtime task becomes runnable, the scheduler will search for a CPU that appears to have sufficient available capacity to run that task; among other things, that check will prevent the placement of a resource-hungry task on a slow CPU. Stultz's patch adds a new check to see whether a candidate CPU is handling (or is about to handle) a "slow" software interrupt that is expected to take a fair amount of processing time; for the purposes of this decision, "slow" is defined as being software interrupts from the networking or block subsystems. If the CPU is indeed busy in that way, the scheduler will try to place the realtime task elsewhere so that, with luck, it will gain access to the CPU more quickly.
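The placement check can be sketched as a small Python model. This is not the scheduler's code: `CpuState`, `pick_cpu_for_rt_task()`, the capacity numbers, and the fallback to a busy CPU when no quiet one fits are all assumptions made for illustration. The "slow" set follows the article's description (networking and block).

```python
from dataclasses import dataclass

# Assumption: "slow" means the networking and block vectors, per the article.
SLOW_SOFTIRQS = frozenset({"NET_TX", "NET_RX", "BLOCK"})


@dataclass
class CpuState:
    cpu_id: int
    capacity: int                               # abstract capacity units
    active_softirqs: frozenset = frozenset()    # being handled right now
    pending_softirqs: frozenset = frozenset()   # raised, not yet handled

    def busy_with_slow_softirq(self):
        # True if the CPU is handling, or is about to handle, a slow softirq.
        return bool((self.active_softirqs | self.pending_softirqs) & SLOW_SOFTIRQS)


def pick_cpu_for_rt_task(cpus, needed_capacity):
    # Existing check: only consider CPUs with enough capacity for the task.
    fits = [c for c in cpus if c.capacity >= needed_capacity]
    # New check: prefer CPUs that are not tied up in slow softirq work.
    quiet = [c for c in fits if not c.busy_with_slow_softirq()]
    # Hypothetical fallback: accept a busy CPU rather than refuse placement.
    candidates = quiet or fits
    return candidates[0].cpu_id if candidates else None
```

With a big CPU that is processing `NET_RX`, a small idle CPU, and a big CPU with only a timer softirq pending, a task that needs a big CPU lands on the third one.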

The second piece changes how the kernel chooses to run software-interrupt handlers. Normally, those handlers are run as soon as possible on the current CPU unless the ksoftirqd thread is already running, in which case they will be queued for handling there instead. The patch series causes the kernel to check, before running a software-interrupt handler, whether there is currently a realtime task running on the current CPU; if so, and if the software interrupt is of the "slow" variety, it will be deferred to ksoftirqd so that the realtime task can continue running. Faster software interrupts (such as those for timers) will still execute immediately on the current CPU.

To effect that split, a subtle change was required. In current kernels, if ksoftirqd is running on a CPU, all software interrupts will be deferred to it. If these changes are active (there is a new kernel configuration option to control that), "fast" software interrupts will be handled immediately whether ksoftirqd is running or not. This change was necessary because the deferral of slower software interrupt handlers, as added by this patch series, may cause ksoftirqd to be running at times when the overall software-interrupt load is not high. With the old test, faster interrupt handlers could be delayed by an unnecessary deferral to ksoftirqd. The new code, though, could have the potential to not defer software interrupts at times when they are loading the system.
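The difference between the old and new tests can be written down as two small predicates. This is a simplified Python sketch rather than the patch itself; the function names are hypothetical, the "slow" set follows the article's definition, and it assumes that a slow software interrupt is still deferred whenever ksoftirqd is already running.

```python
# Assumption: "slow" means the networking and block vectors, per the article.
SLOW = frozenset({"NET_TX", "NET_RX", "BLOCK"})


def runs_inline_old(vec, ksoftirqd_running):
    # Current kernels: if ksoftirqd is running, everything is deferred to it.
    return not ksoftirqd_running


def runs_inline_new(vec, ksoftirqd_running, rt_task_running):
    # Fast vectors (timers, for example) always run immediately.
    if vec not in SLOW:
        return True
    # Slow vectors are deferred if a realtime task is running on this CPU
    # or (an assumption of this sketch) if ksoftirqd is already active.
    return not (ksoftirqd_running or rt_task_running)
```

Under the old test a timer softirq waits for ksoftirqd whenever that thread is active; under the new one it runs right away, which is the behavior change (and the potential loss of load-based deferral) described above.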

The end result of this work is much better latency in realtime tests like cyclictest. The patches have been running on Android kernels "for a number of years" and presumably do not cause problems, at least in that context. Even so, Peter Zijlstra is not entirely happy with this work; it would be better, he suggested, to simply stop software-interrupt processing when there is a high-priority task to run. He had previously posted a series of patches making that kind of change, but admitted that the work "fell on its face due to regressions" at that time.

Stultz gave Zijlstra's series a try and concluded that, while it improves the situation, it does not fully address the problem. Specifically, it doesn't help if a single software-interrupt handler runs for a long time, and that is indeed something that happens. Thus, he said: "I'm not sure if it will let us move away from the softirq-rt placement optimization patches". Hopefully, some sort of solution that is acceptable to all of the developers involved will eventually be worked out. Otherwise, the outcome may end up being that Android continues to carry this patch series for several more years.

Posted Dec 2, 2022 17:58 UTC (Fri) by jhoblitt (subscriber, #77733)

It sure would be nice if my Fedora desktops/laptops had audio playback as smooth as my Android phones... It seems like every Fedora kernel update is a gamble on this front (although the source of the glitches is no doubt spread across a number of subsystems).

Posted Dec 3, 2022 0:45 UTC (Sat) by kaali1 (guest, #144803)

Are the hardware buffers so small on laptops/desktops that the kernel misses deadlines (underruns) due to software interrupts?
While Android has low-latency audio with 2/4ms deadlines, for music playback (especially on a laptop) a much deeper buffer should be used (for both power and performance reasons).
I don't know much about the x86 audio subsystem; is the audio ring buffer between the north bridge and the ADSP in the south bridge not shared memory? That would allow for a buffer big enough (100ms+) to not be impacted by kernel scheduling (at least for non-latency-sensitive use cases).

Posted Dec 3, 2022 14:33 UTC (Sat) by mss (subscriber, #138799)

I don't remember ever having a problem with smooth audio playback on Linux desktop or laptop in the last decade.
Even though most of these systems have (had) "spinning rust" HDDs, which are known for their high latency.
But I only use raw ALSA, not stuff like PulseAudio.

Something has to be seriously broken in Fedora then.
Or your system is running out of memory, has crazy high CPU load or is forced to run at a very low power state for some reason.

Posted Dec 4, 2022 4:21 UTC (Sun) by roc (subscriber, #30627)

For many years I've had glitch-free audio playback on Fedora except when the system is under very heavy load.

Posted Dec 5, 2022 4:21 UTC (Mon) by error27 (subscriber, #8346)

For my workload it was the I/O scheduler that led to skips. Switching to BFQ solved the problem.

It might not fix your problem but it's at least easy to try:
echo bfq | sudo tee /sys/block/sda/queue/scheduler

Posted Dec 5, 2022 7:55 UTC (Mon) by pabs (subscriber, #43278)

I wonder if this Linux patchset would help with the PipeWire issue I had on Debian; choppy audio under high CPU load. I never had this with PulseAudio though.

https://wiki.debian.org/PipeWire#choppy_audio_on_systems_...

Posted Jan 8, 2023 15:40 UTC (Sun) by rep_movsd (guest, #100040)

Speaking of Android and audio latency, Bluetooth, which most people are opting for now, completely ruins latency.

Try using a piano/keyboard app with Bluetooth headphones: it is literally unusable.

I don't see why Bluetooth has to be so bad.

Posted Jan 8, 2023 23:40 UTC (Sun) by Cyberax (✭ supporter ✭, #52523)

Bluetooth can be low-latency. Just check the gamer-oriented peripherals.

Posted Dec 4, 2022 21:25 UTC (Sun) by klossner (subscriber, #30046)

> pre-Linux Unix systems often included the concept of a "bottom half" as a way of deferring work that could not be done in an interrupt handler.

Actually, pre-Linux systems called this the "top half"; the "bottom half" was the interrupt handler. See e.g. UNIX PROGRAMMER’S MANUAL Seventh Edition. This reflected the classic stack model: hardware at the bottom, user code at the top. When I first worked on Linux mumble years ago, this was quite a source of confusion.

Posted Dec 5, 2022 23:46 UTC (Mon) by willy (subscriber, #9762)

Ouch! That's a huge PDF; some warning would have been nice!

Relevant paragraph:

> A number of subroutines are available which are useful to character device drivers. Most of these handlers, for example, need a place to buffer characters in the internal interface between their ‘‘top half’’ (read/write) and ‘‘bottom half’’ (interrupt) routines. For relatively low data-rate devices, the best mechanism is the character queue maintained by the routines getc and putc.

The top half is the bit called in process context. The bottom half is the bit called in interrupt context. Linux's use of bottom half is entirely consistent with that.

Posted Dec 6, 2022 3:48 UTC (Tue) by willy (subscriber, #9762)

Oh, what I meant to say is that traditional Unix didn't need to distinguish between hard and soft interrupt context. That very document talks about the priority level of interrupts (spl4() through spl7()). That meant that higher-priority interrupts could interrupt lower-priority interrupts. As far as I know, x86 didn't (and maybe still doesn't) support that. So Linux disables all interrupts while processing any interrupt, which means that all interrupt routines must be fast and defer anything time-consuming to softirq context (which can be interrupted by hardirqs).

Posted Dec 19, 2022 18:48 UTC (Mon) by calumapplepie (subscriber, #143655)

Interrupt priority levels aren't supported in x86??
That's an interesting choice, given that I've been working with the decade-old PIC32MX processor, which supports 7 levels of interrupts-interrupting-interrupts (i.e., a handler for priority-1 interrupts can be interrupted by one for priority-2, and so on).

On reflection, the answer to the issue seems clear: while some PIC32 processors have a second set of registers for holding the interrupted execution context, most don't. That means all interrupts need to save and load every CPU register to memory when starting and stopping, which is a pretty significant chunk of overhead. Having multiple interrupt levels isn't much of a cost on top of that, then.

But I'll bet money that x86 doesn't make that 'shadow register set' optional. And while the simplest AMD64 processor is miles ahead of the most sophisticated PIC32, a complete set of registers is still not a small addition; and why bother having multiple extra sets if one will do?

Posted Dec 20, 2022 11:32 UTC (Tue) by Wol (subscriber, #4433)

Sounds like a feature I believe the 68000 had would not go amiss here ...

I seem to remember it stored its registers in main ram, with a register that specified the base address. Surely it must have cached them in the cpu for speed, but changing that base register would have done a "flush and restore", so you could simply assign a block of ram as a register stack.

Cheers,
Wol

Posted Dec 20, 2022 20:00 UTC (Tue) by mpr22 (subscriber, #60784)

That's very definitely not the Motorola 68000 (the A24D16-bussed 32-bit microprocessor from 1979 with three zeroes in its model number).

It sounds like the Motorola 6809, an enhanced version of the Motorola 6800 (the A16D8-bussed 8-bit microprocessor from 1974 with two zeroes in its model number).

The 6800's actual registers (8-bit accumulators A and B; 16-bit index register X; 16-bit stack pointer SP; 16-bit program counter PC; 8-bit status register with the two high bits hard-wired to 1) were on-chip. However, it had a "zero page" addressing mode for memory access, which took an 8-bit address and zero-extended it to 16 bits. The speed penalty for going to RAM was a lot lower then than it would be today, so you could use the zero page as "pseudoregisters".

Among other enhancements, the 6809 relabelled the "Zero Page" addressing mode as "Direct Page" and added an 8-bit Direct Page register that controlled the high eight bits of the address presented on the bus when using that addressing mode.
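The address arithmetic behind the two modes is small enough to sketch in Python. The function names are made up for this illustration; the only facts used are the ones above (an 8-bit operand, zero-extended on the 6800, combined with the 8-bit Direct Page register on the 6809).

```python
def zero_page_address(offset):
    # 6800: the 8-bit operand is zero-extended to a 16-bit address.
    return offset & 0xFF


def direct_page_address(dp, offset):
    # 6809: the Direct Page register supplies the high eight address bits.
    return ((dp & 0xFF) << 8) | (offset & 0xFF)
```

With the Direct Page register at zero, the 6809 form gives the same addresses as the 6800's zero page; pointing it elsewhere moves the whole 256-byte "pseudoregister" window.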

Posted Dec 21, 2022 8:15 UTC (Wed) by jem (subscriber, #24231)

The Motorola 68k has the MOVEM instruction which takes a list of registers (a bitmap) to copy to/from consecutive memory locations. The same mnemonic is used for both directions; the assembler determines the variant based on the operands. The instruction does not use a dedicated base register for this, but you can choose from a range of addressing modes for the memory operand. The 68000 is a CISC processor, after all.

Posted Dec 21, 2022 10:30 UTC (Wed) by farnz (subscriber, #17727)

The TMS9900 has only three on-chip registers: Program Counter (PC), Status Register (SR), and Workspace Pointer (WP). It then has instructions that operate on 16-bit registers, which are memory relative to the address in WP - so R0 is actually the two bytes at WP and WP + 1, while R12 is the two bytes at WP + 24 and WP + 25. All ALU operations work on numbered registers, not on the three internal registers, and indirect addressing is also relative to the WP relative registers, not one of the internal three. PC and SR work like in other processors; WP has a dedicated load-immediate and store to register instruction pair, plus can be modified by BLWP (branch and load workspace pointer) and RTWP (return with workspace pointer)
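A minimal Python sketch of that register mapping, for illustration only: the `Tms9900Workspace` class and its methods are hypothetical, and the big-endian byte order used for the 16-bit values is an assumption of the sketch rather than something stated above.

```python
class Tms9900Workspace:
    """Toy model: registers R0..R15 live in memory, located via WP."""

    def __init__(self, memory, wp):
        self.memory = memory   # a bytearray standing in for main RAM
        self.wp = wp           # Workspace Pointer

    def reg_address(self, n):
        # Register Rn occupies the two bytes at WP + 2n and WP + 2n + 1.
        return (self.wp + 2 * n) & 0xFFFF

    def read(self, n):
        addr = self.reg_address(n)
        # Assumption: most significant byte first.
        return (self.memory[addr] << 8) | self.memory[addr + 1]

    def write(self, n, value):
        addr = self.reg_address(n)
        self.memory[addr] = (value >> 8) & 0xFF
        self.memory[addr + 1] = value & 0xFF
```

Changing `wp` switches to a completely different register set without copying anything, which is what makes BLWP/RTWP-style context switches cheap in this design.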

An equivalent behaviour on a modified 6809 would be if changing the Direct Page register also changed D, X, Y, U and S - i.e. changing DP changes all but PC and condition code registers in the 6809.

Posted Dec 21, 2022 16:54 UTC (Wed) by excors (subscriber, #95769)

> Interrupt priority levels aren't supported in x86??

I'm not an expert but after reading a bit, I think it's roughly:

They are supported by the hardware, but Linux doesn't make much use of them.

On x86-64, there is TPR (task-priority register; writeable) and PPR (processor-priority register; read-only; basically the max of TPR and the currently-active interrupt's priority). An interrupt's 8-bit vector number is used as its priority. The top 4 bits are the priority class. The CPU will only handle an interrupt with a priority class higher than PPR. That means an interrupt can only be preempted by one of higher priority than itself, and TPR can be used to mask all interrupts up to a given priority (e.g. a kernel critical section can disable most interrupts, but still allow preemption by the highest-priority ones to meet certain real-time requirements).
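That rule can be sketched in a few lines of Python. This only models the priority-class comparison described above; the function names are invented for the sketch, and details such as the low four bits of TPR/PPR are deliberately ignored.

```python
def priority_class(value):
    # The top four bits of an 8-bit vector (or of TPR) are the priority class.
    return (value >> 4) & 0xF


def ppr_class(tpr, in_service_vector=None):
    # PPR is roughly the max of TPR and the interrupt currently being handled.
    in_service = 0 if in_service_vector is None else priority_class(in_service_vector)
    return max(priority_class(tpr), in_service)


def can_deliver(vector, tpr, in_service_vector=None):
    # An interrupt is taken only if its class is strictly higher than PPR's.
    return priority_class(vector) > ppr_class(tpr, in_service_vector)
```

So while vector 0x41 is in service, vector 0x51 can preempt it but 0x4F (same class) cannot, and setting TPR to 0x60 masks everything in classes 6 and below.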

As far as I can tell, x86-64 made TPR more easily accessible (through the CR8 register) but the basic functionality is the same as any x86 since Pentium (specifically ones with APIC).

When an interrupt occurs, the CPU will push EIP/EFLAGS/etc to the stack and clear the IF flag (to disable all interrupts) before running the handler. The handler obviously needs to preserve any other registers on the stack before clobbering them. The handler may do some work and then choose to set IF, to allow itself to be preempted by a higher-priority interrupt. Or it may choose not to. It sounds like Linux used to support nested interrupts but removed it in 2010, because of the complexity of preventing stack overflows: https://lwn.net/Articles/380931/ . (...except the posts in that article make it sound like interrupts on the same vector can nest, and I thought PPR would prevent that? I'm probably confused about something.)

For comparison, I think ARMv7-A/ARMv8-A are similar to x86, except the CPU does not push anything onto the stack - it just copies PC and status register (CPSR/PSTATE) into some special registers. The handler must preserve those on the stack before allowing preemption.

ARMv6/7/8-M are different: they don't disable interrupts before running the handler, so all handlers are preemptible by default. The CPU pushes several registers onto the stack on exception entry, and it can even be preempted in the middle of that entry sequence. That allows lower-latency handling of high-priority interrupts.

Posted Dec 9, 2022 1:09 UTC (Fri) by opalmirror (subscriber, #23465)

Takes me back. My first post-college job in the late 1980s was helping port V7 UNIX (designed to run on a 68000 board with a segment MMU) to a 68010 (with support for restarting instructions on bus errors - so that sbrk() and brk() could be handled automatically and gracefully without special coordination with the C compiler). We ended up having some complex scheduling and input needs so we ported in bits of BSD into it including a bespoke implementation of select(), the full BSD socket API and network stack. I suppose we should have just tried porting BSD in the first place. It was lots of fun though. Not any hard real-time tasks for that project, but it controlled devices that were hard real-time in hardware.