Juggling software interrupts and realtime tasks
The software-interrupt mechanism is one of the oldest parts in the kernel; arguably, the basic design behind it predates Linux itself. Software interrupts can get in the way of other work so, for almost as long as they have existed, developers have wished that they could be made to go away. That has never happened, though, and doesn't look imminent. Instead, Android systems have long carried a patch that tries to minimize the impact of software interrupts, at least in some situations. John Stultz is now posting that work, which contains contributions from a number of authors, in the hope of getting it into the mainline kernel.
Hardware interrupts (or just "interrupts") are initiated when a physical component in the system wants the kernel's attention; they will usually cause an immediate trap into a special handler function. Since interrupts take the system away from whatever else it was doing, interrupt handlers have to do their work quickly; there is not time for any sort of extended processing. This is not a new problem; pre-Linux Unix systems often included the concept of a "bottom half" as a way of deferring work that could not be done in an interrupt handler.
The Linux kernel, too, has had to develop mechanisms to defer processing until a more convenient time. One of those mechanisms is software interrupts (or "softirqs"). It was first introduced in the 0.99 kernel under the familiar "bottom half" name; the term "softirq" doesn't appear until the 1.1.77 development release. The abbreviation "bh" ("bottom half") can still be found in the names of kernel functions related to software interrupts.
In the current implementation, there is a handful of software-interrupt "vectors" assigned to specific subsystems. When one of those subsystems has to defer some work, it sets a bit in a per-CPU mask that will cause its software-interrupt handler to be called at a later time. Often, the software interrupt will run immediately after the completion of hardware-interrupt handling, meaning that it runs at a high priority, before just about any other work in the kernel. The software-interrupt mechanism is reserved for a handful of core-kernel subsystems, including networking and the block layer, with one big exception: the kernel's tasklet mechanism, which is available to any subsystem, executes tasklets in a software-interrupt handler.
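The pending-mask idea described above can be illustrated with a toy model. This is only a sketch in Python, not kernel code: the `Vector` names and their ordering, the `CpuSoftirqState` class, and its methods are all assumptions made up for illustration. Each vector owns one bit in a per-CPU mask; raising a software interrupt sets the bit, and a later pass runs the handler for every set bit.

```python
from enum import IntEnum


class Vector(IntEnum):
    # Illustrative subset only; names and ordering are assumptions.
    TIMER = 0
    NET_RX = 1
    BLOCK = 2
    TASKLET = 3


class CpuSoftirqState:
    """Toy per-CPU state: a pending bitmask plus one handler per vector."""

    def __init__(self):
        self.pending = 0
        self.handlers = {}

    def register(self, vec, handler):
        self.handlers[vec] = handler

    def raise_softirq(self, vec):
        # Deferring work is just setting a bit; nothing runs yet.
        self.pending |= 1 << vec

    def run_pending(self):
        """Run handlers for all set bits, lowest vector first."""
        pending, self.pending = self.pending, 0
        ran = []
        for vec in Vector:
            if pending & (1 << vec):
                handler = self.handlers.get(vec)
                if handler is not None:
                    handler()
                ran.append(vec)
        return ran
```

In this model, `run_pending()` stands in for the point "immediately after the completion of hardware-interrupt handling" at which the real kernel would normally process the mask.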
Software interrupts can create latency in an otherwise well-functioning system, so they have long drawn the attention of developers interested in response times. In kernels configured for realtime preemption, software interrupts are handled in a process like any other, so their priority can be set relative to other important tasks. Most of us do not run realtime-preemption kernels, though, so software-interrupt handling will usually steal time from other tasks that would like to be running. Even a normal kernel, however, will eventually defer software interrupts to the per-CPU ksoftirqd kernel thread if the system is generating them in large numbers.
Normally, one expects realtime tasks to be the highest-priority work on the machine, even one that is not running a realtime kernel. But software-interrupt handling trumps even realtime tasks and can prevent them from meeting their deadlines. The latency that software interrupts can add has, evidently, been seen to cause audio glitches on Android systems, inspiring the work being proposed by Stultz now.
There are two components to the suggested change, the first of which is a scheduler tweak to modify its placement of realtime tasks. Currently, when a realtime task becomes runnable, the scheduler will search for a CPU that appears to have sufficient available capacity to run that task; among other things, that check will prevent the placement of a resource-hungry task on a slow CPU. Stultz's patch adds a new check to see whether a candidate CPU is handling (or is about to handle) a "slow" software interrupt that is expected to take a fair amount of processing time; for the purposes of this decision, "slow" is defined as being software interrupts from the networking or block subsystems. If the CPU is indeed busy in that way, the scheduler will try to place the realtime task elsewhere so that, with luck, it will gain access to the CPU more quickly.
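The placement check can be sketched roughly as follows. This is a simplified Python model, not the scheduler's actual logic: the `Cpu` class, the softirq names, `pick_cpu_for_rt_task()`, and the fallback to a busy CPU when no quiet one fits are all assumptions for illustration.

```python
from dataclasses import dataclass

# "Slow" softirqs per the article: networking and block. Names are assumed.
SLOW_SOFTIRQS = frozenset({"net_tx", "net_rx", "block"})


@dataclass
class Cpu:
    cpu_id: int
    capacity: int
    active_softirqs: frozenset = frozenset()
    pending_softirqs: frozenset = frozenset()

    def busy_with_slow_softirq(self):
        # True if a slow softirq is being handled or is about to be.
        return bool((self.active_softirqs | self.pending_softirqs) & SLOW_SOFTIRQS)


def pick_cpu_for_rt_task(cpus, needed_capacity):
    """Return the id of a suitable CPU, or None if none has enough capacity."""
    fits = [c for c in cpus if c.capacity >= needed_capacity]
    quiet = [c for c in fits if not c.busy_with_slow_softirq()]
    # Assumed fallback: if every fitting CPU is busy, take one anyway.
    candidates = quiet or fits
    return candidates[0].cpu_id if candidates else None
```

The capacity filter comes first, mirroring the existing check; the slow-softirq test only narrows the set of CPUs that already fit.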
The second piece changes how the kernel chooses to run software-interrupt handlers. Normally, those handlers are run as soon as possible on the current CPU unless the ksoftirqd thread is already running, in which case they will be queued for handling there instead. The patch series causes the kernel to check, before running a software-interrupt handler, whether there is currently a realtime task running on the current CPU; if so, and if the software interrupt is of the "slow" variety, it will be deferred to ksoftirqd so that the realtime task can continue running. Faster software interrupts (such as those for timers) will still execute immediately on the current CPU.
To effect that split, a subtle change was required. In current kernels, if ksoftirqd is running on a CPU, all software interrupts will be deferred to it. If these changes are active (there is a new kernel configuration option to control that), "fast" software interrupts will be handled immediately whether ksoftirqd is running or not. This change was necessary because the deferral of slower software interrupt handlers, as added by this patch series, may cause ksoftirqd to be running at times when the overall software-interrupt load is not high. With the old test, faster interrupt handlers could be delayed by an unnecessary deferral to ksoftirqd. The new code, though, could have the potential to not defer software interrupts at times when they are loading the system.
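The difference between the old and new tests can be written out as two small predicates. This is a Python sketch of the behavior as described in the text, not the patch itself; the function names and softirq names are hypothetical, and the treatment of a slow softirq while ksoftirqd is running with no realtime task present (still deferred) is an assumption.

```python
# "Slow" softirqs per the article: networking and block. Names are assumed.
SLOW = frozenset({"net_tx", "net_rx", "block"})


def handle_inline_old(vec, ksoftirqd_running):
    """Current behavior as described: once ksoftirqd runs, defer everything."""
    return not ksoftirqd_running


def handle_inline_new(vec, ksoftirqd_running, rt_task_running):
    """Behavior with the patches enabled, as described in the text."""
    if vec not in SLOW:
        # Fast softirqs run immediately whether ksoftirqd is running or not.
        return True
    # Slow softirqs are deferred if a realtime task is running here, or
    # (assumed unchanged from before) if ksoftirqd is already running.
    return not (ksoftirqd_running or rt_task_running)
```

For example, a timer softirq arriving while ksoftirqd is busy with deferred network work is deferred by the old test but handled inline by the new one.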
The end result of this work is much better latency results with realtime tests like cyclictest. The patches have been running on Android kernels "for a number of years" and presumably do not cause problems, at least in that context. Even so, Peter Zijlstra is not entirely happy with this work; it would be better, he suggested, to simply stop software-interrupt processing when there is a high-priority task to run. He has a series of patches that make that kind of change that he had posted previously, but admitted that the work "fell on its face due to regressions" at that time.
Stultz gave Zijlstra's series a try and concluded that, while it improves the situation, it does not fully address the problem. Specifically, it doesn't help if a single software-interrupt handler runs for a long time, and that is indeed something that happens. Thus, he said: "I'm not sure if it will let us move away from the softirq-rt placement optimization patches". Hopefully, some sort of solution that is acceptable to all of the developers involved will eventually be worked out. Otherwise, the outcome may end up being that Android continues to carry this patch series for several more years.
Juggling software interrupts and realtime tasks
Posted Dec 2, 2022 17:58 UTC (Fri) by jhoblitt (subscriber, #77733) [Link]
Juggling software interrupts and realtime tasks
Posted Dec 3, 2022 0:45 UTC (Sat) by kaali1 (guest, #144803) [Link]
While Android has low-latency audio with 2/4ms deadlines, for music playback (especially on a laptop) a much deeper buffer should be used (for both power and performance reasons).
I don't know much about the x86 audio subsystem; is the audio ring buffer between the north bridge and the ADSP in the south bridge not shared memory? That would allow for a buffer big enough (100ms+) to not be impacted by kernel scheduling (at least for use cases that are not latency-sensitive).
Juggling software interrupts and realtime tasks
Posted Dec 3, 2022 14:33 UTC (Sat) by mss (subscriber, #138799) [Link]
Even though most of these systems have (had) "spinning rust" HDDs, which are known for their high latency.
But I only use raw ALSA, not stuff like PulseAudio.
Something has to be seriously broken in Fedora then.
Or your system is running out of memory, has crazy high CPU load or is forced to run at a very low power state for some reason.
Juggling software interrupts and realtime tasks
Posted Dec 4, 2022 4:21 UTC (Sun) by roc (subscriber, #30627) [Link]
Juggling software interrupts and realtime tasks
Posted Dec 5, 2022 4:21 UTC (Mon) by error27 (subscriber, #8346) [Link]
It might not fix your problem but it's at least easy to try:
echo bfq | sudo tee /sys/block/sda/queue/scheduler
Juggling software interrupts and realtime tasks
Posted Dec 5, 2022 7:55 UTC (Mon) by pabs (subscriber, #43278) [Link]
https://wiki.debian.org/PipeWire#choppy_audio_on_systems_...
Juggling software interrupts and realtime tasks
Posted Jan 8, 2023 15:40 UTC (Sun) by rep_movsd (guest, #100040) [Link]
Try using some piano/keyboard app with a Bluetooth headphone; it is literally unusable.
I don't see why Bluetooth has to be so bad.
Juggling software interrupts and realtime tasks
Posted Jan 8, 2023 23:40 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]
Juggling software interrupts and realtime tasks
Posted Dec 4, 2022 21:25 UTC (Sun) by klossner (subscriber, #30046) [Link]
> pre-Linux Unix systems often included the concept of a "bottom half" as a way of deferring work that could not be done in an interrupt handler.
Actually, pre-Linux systems called this the "top half"; the "bottom half" was the interrupt handler. See e.g. UNIX PROGRAMMER’S MANUAL, Seventh Edition. This reflected the classic stack model: hardware at the bottom, user code at the top. When I first worked on Linux mumble years ago, this was quite a source of confusion.
Juggling software interrupts and realtime tasks
Posted Dec 5, 2022 23:46 UTC (Mon) by willy (subscriber, #9762) [Link]
Relevant paragraph:
> A number of subroutines are available which are useful to character device drivers. Most of these handlers, for example, need a place to buffer characters in the internal interface between their ‘‘top half’’ (read/write) and ‘‘bottom half’’ (interrupt) routines. For relatively low data-rate devices, the best mechanism is the character queue maintained by the routines getc and putc.
The top half is the bit called in process context. The bottom half is the bit called in interrupt context. Linux's use of bottom half is entirely consistent with that.
Juggling software interrupts and realtime tasks
Posted Dec 6, 2022 3:48 UTC (Tue) by willy (subscriber, #9762) [Link]
Juggling software interrupts and realtime tasks
Posted Dec 19, 2022 18:48 UTC (Mon) by calumapplepie (subscriber, #143655) [Link]
That's an interesting choice, given that I've been working with the decade-old PIC32MX processor, which supports 7 levels of interrupts-interrupting-interrupts (i.e., a handler for priority-1 interrupts can be interrupted by one for priority-2, and so on).
On reflection, the answer to the issue seems clear; while some PIC32 processors have a second set of registers for holding the interrupted execution context, most don't. That means all interrupts need to save and load every CPU register to memory when starting and stopping, which is a pretty significant chunk of overhead. Having multiple interrupt levels isn't much of a cost on top of that, then.
But I'll bet money that x86 doesn't make that 'shadow register set' optional. And while the simplest AMD64 processor is miles ahead of the most sophisticated PIC32, a complete set of registers is still not a small addition; and why bother having multiple extra sets if one will do?
Juggling software interrupts and realtime tasks
Posted Dec 20, 2022 11:32 UTC (Tue) by Wol (subscriber, #4433) [Link]
I seem to remember it stored its registers in main ram, with a register that specified the base address. Surely it must have cached them in the cpu for speed, but changing that base register would have done a "flush and restore", so you could simply assign a block of ram as a register stack.
Cheers,
Wol
Juggling software interrupts and realtime tasks
Posted Dec 20, 2022 20:00 UTC (Tue) by mpr22 (subscriber, #60784) [Link]
It sounds like the Motorola 6809, an enhanced version of the Motorola 6800 (the A16D8-bussed 8-bit microprocessor from 1974 with two zeroes in its model number).
The 6800's actual registers (8-bit accumulators A and B; 16-bit index register X; 16-bit stack pointer SP; 16-bit program counter PC; 8-bit status register with the two high bits hard-wired to 1) were on-chip. However, it had a "zero page" addressing mode for memory access, which took an 8-bit address and zero-extended it to 16 bits. The speed penalty for going to RAM was a lot lower then than it would be today, so you could use the zero page as "pseudoregisters".
Among other enhancements, the 6809 relabelled the "Zero Page" addressing mode as "Direct Page" and added an 8-bit Direct Page register that controlled the high eight bits of the address presented on the bus when using that addressing mode.
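The address calculation for the two addressing modes described above can be sketched in a few lines of Python. The function names are made up for illustration; only the arithmetic (an 8-bit offset, zero-extended on the 6800 or combined with the Direct Page register on the 6809) comes from the description.

```python
def zero_page_address(offset):
    """6800 zero-page mode: the 8-bit offset is zero-extended to 16 bits."""
    return offset & 0xFF


def direct_page_address(dp, offset):
    """6809 direct-page mode: DP supplies the high eight address bits."""
    return ((dp & 0xFF) << 8) | (offset & 0xFF)
```

With the Direct Page register set to zero, the 6809 mode yields the same addresses as the 6800's zero-page mode.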
Juggling software interrupts and realtime tasks
Posted Dec 21, 2022 8:15 UTC (Wed) by jem (subscriber, #24231) [Link]
Juggling software interrupts and realtime tasks
Posted Dec 21, 2022 10:30 UTC (Wed) by farnz (subscriber, #17727) [Link]
The TMS9900 has only three on-chip registers: Program Counter (PC), Status Register (SR), and Workspace Pointer (WP). It then has instructions that operate on 16-bit registers, which are memory relative to the address in WP - so R0 is actually the two bytes at WP and WP + 1, while R12 is the two bytes at WP + 24 and WP + 25. All ALU operations work on numbered registers, not on the three internal registers, and indirect addressing is also relative to the WP-relative registers, not one of the internal three. PC and SR work like in other processors; WP has a dedicated load-immediate and store-to-register instruction pair, and can also be modified by BLWP (branch and load workspace pointer) and RTWP (return with workspace pointer).
An equivalent behaviour on a modified 6809 would be if changing the Direct Page register also changed D, X, Y, U and S - i.e. changing DP changes all but PC and condition code registers in the 6809.
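The workspace-pointer scheme described above can be modeled in a few lines. This is a toy Python sketch, not an emulator: the class and function names are invented, and storing the high byte at the lower address is an assumption of this model. Register Rn lives at WP + 2n, so changing WP swaps the whole register file at once.

```python
NUM_REGISTERS = 16


def register_address(wp, n):
    """Address of the first byte of register Rn for a given workspace pointer."""
    if not 0 <= n < NUM_REGISTERS:
        raise ValueError("register number must be in 0..15")
    return wp + 2 * n


class WorkspaceModel:
    """Toy machine whose general registers are just memory at WP."""

    def __init__(self, memory_size=0x10000):
        self.mem = bytearray(memory_size)
        self.wp = 0

    def write_reg(self, n, value):
        addr = register_address(self.wp, n)
        # Assumed byte order: high byte first.
        self.mem[addr] = (value >> 8) & 0xFF
        self.mem[addr + 1] = value & 0xFF

    def read_reg(self, n):
        addr = register_address(self.wp, n)
        return (self.mem[addr] << 8) | self.mem[addr + 1]
```

Setting `wp` to a different block of memory gives a fresh set of registers without copying anything, which is why a context switch on such a design amounts to changing one pointer.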
Juggling software interrupts and realtime tasks
Posted Dec 21, 2022 16:54 UTC (Wed) by excors (subscriber, #95769) [Link]
I'm not an expert but after reading a bit, I think it's roughly:
They are supported by the hardware, but Linux doesn't make much use of them.
On x86-64, there is TPR (task-priority register; writeable) and PPR (processor-priority register; read-only; basically the max of TPR and the currently-active interrupt's priority). An interrupt's 8-bit vector number is used as its priority. The top 4 bits are the priority class. The CPU will only handle an interrupt with a priority class higher than PPR. That means an interrupt can only be preempted by one of higher priority than itself, and TPR can be used to mask all interrupts up to a given priority (e.g. a kernel critical section can disable most interrupts, but still allow preemption by the highest-priority ones to meet certain real-time requirements).
As far as I can tell, x86-64 made TPR more easily accessible (through the CR8 register) but the basic functionality is the same as any x86 since Pentium (specifically ones with APIC).
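The priority rule described above can be written down as a small Python sketch. It models only what the description says (the top four bits of the vector form the priority class, PPR is roughly the maximum of TPR and the in-service interrupt's priority, and delivery requires a strictly higher class); the function names are hypothetical and real APIC behavior has more detail than this.

```python
def priority_class(vector):
    """Top four bits of an 8-bit vector number (or of TPR)."""
    return (vector >> 4) & 0xF


def ppr_class(tpr, in_service_vector=None):
    """Class of PPR: the max of TPR's class and the in-service interrupt's."""
    in_service = 0 if in_service_vector is None else priority_class(in_service_vector)
    return max(priority_class(tpr), in_service)


def can_deliver(vector, tpr, in_service_vector=None):
    """An interrupt is handled only if its class is strictly above PPR's."""
    return priority_class(vector) > ppr_class(tpr, in_service_vector)
```

Under this model, vector 0x51 cannot preempt a handler running for vector 0x50 because both are in class 5, while vector 0x61 can.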
When an interrupt occurs, the CPU will push EIP/EFLAGS/etc to the stack and clear the IF flag (to disable all interrupts) before running the handler. The handler obviously needs to preserve any other registers on the stack before clobbering them. The handler may do some work and then choose to set IF, to allow itself to be preempted by a higher-priority interrupt. Or it may choose not to. It sounds like Linux used to support nested interrupts but removed it in 2010, because of the complexity of preventing stack overflows: https://lwn.net/Articles/380931/ . (...except the posts in that article make it sound like interrupts on the same vector can nest, and I thought PPR would prevent that? I'm probably confused about something.)
For comparison, I think ARMv7-A/ARMv8-A are similar to x86, except the CPU does not push anything onto the stack - it just copies PC and status register (CPSR/PSTATE) into some special registers. The handler must preserve those on the stack before allowing preemption.
ARMv6/7/8-M are different: they don't disable interrupts before running the handler, so all handlers are preemptible by default. The CPU pushes several registers onto the stack on exception entry, and it can even be preempted in the middle of that entry sequence. That allows lower-latency handling of high-priority interrupts.
Juggling software interrupts and realtime tasks
Posted Dec 9, 2022 1:09 UTC (Fri) by opalmirror (subscriber, #23465) [Link]
Takes me back. My first post-college job in the late 1980s was helping port V7 UNIX (designed to run on a 68000 board with a segment MMU) to a 68010 (with support for restarting instructions on bus errors - so that sbrk() and brk() could be handled automatically and gracefully without special coordination with the C compiler). We ended up having some complex scheduling and input needs, so we ported bits of BSD into it, including a bespoke implementation of select(), the full BSD socket API, and the network stack. I suppose we should have just tried porting BSD in the first place. It was lots of fun though. There were no hard real-time tasks in that project, but it controlled devices that were hard real-time in hardware.