
Identifying dependencies used via dlopen()


By Daroc Alden
April 16, 2024

The recent XZ backdoor has sparked a lot of discussion about how the open-source community links and packages software. One possible security improvement being discussed is changing how projects like systemd link to dynamic libraries that are only used for optional functionality: using dlopen() to load those libraries only when required. This could shrink the attack surface exposed by dependencies, but the approach is not without downsides — most prominently, it makes discovering which dynamic libraries a program depends on harder. On April 11, Lennart Poettering proposed one way to eliminate that problem in a systemd RFC on GitHub.

The systemd project had actually already been moving away from directly linking optional dependencies — but not for security reasons. In Poettering's explanation of his proposal on Mastodon he noted: "The primary reason for [using dlopen()] was to make it easier to build small disk images without optional components, in particular for the purpose of initrds or container deployments." Some people have speculated that this change is what pushed "Jia Tan" to launch their attack at the beginning of April, instead of waiting until it was more robust.

There are several problems with using dlopen() for dependencies, however. One is that, unlike normal dynamic linking, using dlopen() exposes the functions provided by the dependency as void pointers, which must be cast to the correct type. If the type in the dependency does not match the type in the dependent program, this can open a potential avenue for type-confusion attacks. Several respondents to Poettering's explanation on Mastodon worried that promoting the use of dlopen() would be a detriment to security for this reason. James Henstridge said: "I imagine you could hide some interesting bugs via not-quite-compatible function signatures (e.g. cause an argument to be truncated at 32 bits)." Poettering replied:

In current systemd git we systematically use some typeof() macro magic that ensures we always cast the func ptrs returned by dlopen() to the actual prototype listed in the headers of the library in question. Thus we should get the exact same type safety guarantees as you'd get when doing regular dynamic lib linking. Took us a bit to come up with the idea that typeof() can be used for this, but it's amazing, as we don't have to repeat other libraries' prototypes in our code at all anymore.

Henstridge agreed after looking at the code that it was "quite elegant. It also neatly solves the problem of assigning a symbol to the wrong function pointer." Not all of the problems are so easily dismissed, however. The real problem, according to Poettering's announcement, is that using dlopen() removes information from the program's ELF headers about what its dependencies are.

Now, I think there are many pros of this approach, but there are cons too. I personally think the pros by far outweigh the cons, but the cons *do* exist. The most prominent one is that turning shared library dependencies into dlopen() dependencies somewhat hides them from the user's and tools view, as the aforementioned tools won't show them anymore. Tools that care about this information are package managers such as rpm/dpkg (which like to generate automatic package dependencies based on ELF dependencies), as well initrd generators such as dracut.

His proposed solution is to adopt a new convention for explicitly listing optional dependencies as part of the program itself. In the systemd RFC, he gave an example of a macro that could be used to embed the name of optional dependencies in a special section of the binary called ".note.uapi.dlopen". "UAPI" stands for Userspace API — referring to the Linux Userspace API Group, a relatively recent collaboration between distributions, package managers, and large software projects to define standardized interfaces for user-space software. The initial proposal for what to encode in the note section was fairly bare-bones — just a type field, the string "uapi" denoting the ELF section "vendor", and the name of the dependency in question.

Poettering was also clear that it wouldn't be useful to implement this for systemd on its own; the note would only be useful if other tooling decided to read it, and other projects choose to implement it. Mike Yuan was quick to comment positively about the possibility of adding support to mkinitcpio, Arch Linux's initramfs generation tool, and pacman, the Arch package manager.

Luca Boccassi agreed that he could "look into adding a debhelper addon for this", but wondered if there should be some way to indicate whether a dependency is truly optional or that the program will fail if the dependency is missing. Poettering responded: "If it is a hard dep, then it should not be a dlopen() one. The whole reason for using dlopen() over regular ELF shared library deps is after all that they can be weak", although he did point out that the type field means that "the door is open to extend this later."

Antonio Álvarez Feijoo raised another concern, pointing out: "Some people are very picky about the size of the initrd and don't like to include things that aren't really necessary. [...] So yes, it's great to know which libraries are necessary, but how to know what systemd component requires them?" Boccassi replied that this was an example of a situation where information on whether a dependency is required or recommended could be useful. Poettering disagreed, asserting that "which libraries to actually include in an initrd is up to local configuration or distro policy." Ultimately, consumers of the new note section can do whatever they would like with the information, including automatically generating dependencies, or merely using them as a "linter" to complain about new weak dependencies that are not already known.

I think all such approaches are better than the status quo though: we'll add a weak dep or turn a regular dep into a weak dep, and unless downstream actually read NEWS closely (which well, they aren't necessarily good at) they'll just rebuild their packages/initrd and now everything is hosed.

This appealed to Feijoo, who agreed that using the information as a sanity-check on package definitions made sense.

Carlos O'Donell asked whether Poettering cared about exposing the specific symbols and symbol versions that systemd uses, pointing out that existing ELF headers include this information. He asserted that RPM uses this information when packaging a program. Poettering said that was a good question, but replied:

To speak for the systemd usecase: even though we dlopen() quite a number of libraries these days (21 actually), and we actually do provide symbol versioning for our own libraries, we do not bother with listing symbol versions for the stuff we dlopen(). We use plain dlsym() for all of them, not dlvsym().

He went on to point out that requiring people to pin down symbol versions would be "a major additional ask".

Poettering did seem to think that there was some benefit to integrating this new proposal into the existing implementation of dynamic linking in the GNU C library (glibc). He asked O'Donell and Florian Weimer — who are both involved in the glibc project — "should we proceed with this being some independent spec somewhere that just says '.note.uapi.dlopen contains this data, use it or don't, bla bla bla'. Or did the concept of weak linking interest you enough that you think this should be done natively in glibc, binutils and so on?" Some other operating systems — notably macOS — have a native concept of "weak linking" for optional dependencies, so the idea of incorporating this information into the build system and standard library is not new.

Zbigniew Jędrzejewski-Szmek brought up an additional question about the formatting of the new section, asking whether it would make sense to use "a compact JSON representation". Jędrzejewski-Szmek said that this could make it easy to add a human-meaningful description of what the dependency is used for. With that addition, "it should be fairly easy to integrate this in the rpm build system." Boccassi agreed that the payload should be JSON. Poettering replied: "I have nothing against using JSON for this, but it's key we can reasonably generate this from a simple no-deps C macro I'd say."
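A payload of the kind Jędrzejewski-Szmek suggests might look something like the fragment below; the field names are purely illustrative, since the format was still undecided at the time of the discussion:

```json
[
  {
    "soname": ["liblzma.so.5"],
    "description": "support for reading compressed journal files",
    "priority": "recommended"
  }
]
```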

Ultimately, the idea of having a standard encoding for optional dependencies seems to have been well-received, with several package managers potentially interested in adding support. With discussion still ongoing and the final format of the added information up in the air, however, it's too soon to say exactly what form the information will take. Anything that helps ameliorate the pain of moving away from traditional dynamically linked dependencies seems like a good idea, though, since doing so shrinks the surface open to XZ-backdoor-like attacks.




Identifying dependencies used via dlopen()

Posted Apr 16, 2024 21:16 UTC (Tue) by andresfreund (subscriber, #69562) [Link]

Hi,

It seems the medium to long term solution ought to be to properly support "lazily loaded dependencies" at the ELF level. Implementing this via dlopen() in various places doesn't scale all that well and makes it harder to centrally improve performance and security. The dlopen() approach pretty much requires indirect function calls via modifiable pointers, which isn't great. And as the article points out, it makes stuff like symbol versioning much more onerous.

With proper support by the dynamic linker, each lazy-loaded library could get a distinct "range" in an SO's GOT that could separately get remapped. That still would increase exposure over the non-lazy -z now -z relro binaries, but not to the degree that dlopen() ends up doing (and we just have seen -z now has its own risks).

Greetings,

Andres

Identifying dependencies used via dlopen()

Posted Apr 16, 2024 22:06 UTC (Tue) by ibukanov (subscriber, #3942) [Link]

The main problem with using normally linked libraries is that the loader can run arbitrary code coming from the library on application startup even if the library is never used. Stop that, and the problem is solved. Of course, that requires quite a lot of work to ensure that the library initialisation code is properly run on first use, before any library function is called, but long term this should be doable.

Identifying dependencies used via dlopen()

Posted Apr 22, 2024 15:57 UTC (Mon) by ScottMinster (subscriber, #67541) [Link]

Some of the software I work on makes use of that initialization phase of the library to have the library tell other parts of the software what it can do. The very act of calling dlopen() on these plugin libraries lets them register themselves. If the library initialization code didn't run until later, the library would end up never being called at all, even if it was needed.

Identifying dependencies used via dlopen()

Posted Apr 22, 2024 16:53 UTC (Mon) by rschroev (subscriber, #4164) [Link]

Can't the code that calls dlopen() to load the library also call a function like init() or register() or something like that in the library?

Identifying dependencies used via dlopen()

Posted Apr 23, 2024 21:10 UTC (Tue) by ScottMinster (subscriber, #67541) [Link]

The code in question is fairly generic and just loads any .so "plugin" files found in the right location. It doesn't know what functionality any of them have, just that it has been configured to load them. In addition, there are a lot of different ways the plugin libraries themselves can extend functionality. Maybe they add support for reading a new file format. Or writing a file format. Or doing both. Or reading and writing several similar formats. The point is, it would be hard for the code doing the loading to know what registrations to call on behalf of the library. It is easier for the library to make those calls itself based on what it knows it can do.

As a further detail, the code is C++, so the initialization is actually being done through static initializers in the library (constructors on static objects). The registration method is a constructor on a base class that records the pointer to itself (this) in a list of pointers. Later, when asked to exploit the loaded functionality, it can call a pure virtual method (implemented in the derived class in the plugin library) to do the work.

If static initializers were delayed until something in the loaded library was explicitly called, I would probably have to change the logic to make some sort of dummy call into each plugin library loaded. That is a bit annoying, though on reflection, if I set it up right, it would provide some kind of check that the library that was loaded really was meant to be loaded; that it really was a plugin to this software and not some mistake. Presumably, if the right function couldn't be found with dlsym(), then the library could be safely unloaded and ignored.

So delaying is an interesting idea, but would probably cause too many problems before software was fixed to expect it. But maybe it could be an opt-in flag to dlopen()?

Identifying dependencies used via dlopen()

Posted Apr 24, 2024 9:29 UTC (Wed) by farnz (subscriber, #17727) [Link]

The code doing the loading can call a void * plugin_init(void) function, which does whatever your deferred dynamic initializers do today; call back to the opener to register a file format reader, a new renderer, a writer, whatever else they currently do at the moment they're loaded. At exit, it can call a void plugin_destroy(void * init_return) function, which does the destruction before the plugin is unloaded.

There's nothing that can be done via code running at load time that can't also be done via calling a function inside the library; the reason most psABIs run code at load time, rather than simply expecting that the start point runs constructors etc, is to make it easy to implement C++ deferred dynamic initialization.

If I were interested in playing with psABIs, with a view to producing an alternative to current ELF mechanisms, I'd be tempted to experiment with indirect symbol resolution as an alternative to running code on load; the idea is that when a symbol is resolved, instead of the loader looking up the symbol value directly (as it does normally), it would run code that provides the symbol value or a reference to another symbol to look up. You can then implement deferred dynamic initialization by having indirect lookup functions that check a flag, run static initializers if needed, and return the "real" symbol to the loader for it to look up - slightly slowing down symbol lookup if static initializers exist, but deferring it to the last moment that C++ permits.

Identifying dependencies used via dlopen()

Posted Apr 25, 2024 14:15 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

> Presumably, if the right function couldn't be found with dlsym(), then the library could be safely unloaded and ignored.

Once the library is loaded, you've already lost if it does anything that doesn't allow unloading. Touching thread-local storage on macOS will do this. As will any static initializers that *do* exist. One nice thing on Windows is that you can interrogate a library without loading it. One can use `libelf` for that with ELF platforms, but that is Another Dependency and you're left figuring out what the runtime loader will do with it on your own.

Identifying dependencies used via dlopen()

Posted Apr 22, 2024 18:44 UTC (Mon) by ibukanov (subscriber, #3942) [Link]

What I suggest is to have an option for a linked library to delay the initialization until the first call to a library function.

With such an option, all library functions point to code that first ensures initialization of the library and then calls the function itself. From the library's point of view this looks as if the library was loaded with dlopen() followed by a call to the function.

Identifying dependencies used via dlopen()

Posted Apr 22, 2024 19:50 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

This is a bad idea. It can inject unexpected failures into functions that "can not" fail, and which the application doesn't expect to fail. This can even be caused by things like sandboxing, or unexpected environment conditions like file-descriptor exhaustion.

It's really much better to just do initialization explicitly by calling something like "init()" after dlopen(), or just link libraries eagerly and incorporate initialization into the overall application initialization.

Identifying dependencies used via dlopen()

Posted Apr 23, 2024 9:29 UTC (Tue) by farnz (subscriber, #17727) [Link]

This is how iOS and macOS work; the initializer is run on library load, and library load is delayed until the first symbol from that library is referenced. It's also explicitly permitted in the C++ standard - deferred dynamic initialization takes place no later than just before the first use of a symbol from the translation unit that contains the static initializer.

Now, I'd agree with the notion that deferred dynamic initialization (runtime constructors for static objects) is a mistake. But it's a mistake baked into C++, and from there into psABIs.

Identifying dependencies used via dlopen()

Posted Apr 23, 2024 17:56 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

I tried: https://gist.github.com/Cyberax/a60dce723d4e80a5d182d6067...

It's definitely not _that_ lazy by default. The initializers run at dlopen() time.

Identifying dependencies used via dlopen()

Posted Apr 24, 2024 9:11 UTC (Wed) by farnz (subscriber, #17727) [Link]

I was referring not to dlopen, but to libraries that are lazily linked in to the binary via Mach-O dynamic linking; at the first use of a symbol from the library, the library's deferred dynamic initialization code is run.

Identifying dependencies used via dlopen()

Posted Apr 24, 2024 11:00 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

I tried it for a while and couldn't get it to work. I went down the rabbit hole of reading the LDD code before finding this:

https://developer.apple.com/forums/thread/131252

> That feature has been removed from the linker. The only work around is to not link with library and instead dlopen() it at runtime and then use dlsym() to find the functions you want from it.

So yes, it was a horrible feature that was barely used. So Apple did the right thing and removed it.

Identifying dependencies used via dlopen()

Posted Apr 23, 2024 16:36 UTC (Tue) by ibukanov (subscriber, #3942) [Link]

Well, then give an option not to run the init code from .so until the app explicitly asks for it. This will be sort-of dlopen for linked libraries.

Identifying dependencies used via dlopen()

Posted Apr 23, 2024 18:04 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

This is basically replacing automatic initialization with an explicit init() function. Which is a great idea in general, because it makes initialization predictable.

Identifying dependencies used via dlopen()

Posted Apr 17, 2024 1:49 UTC (Wed) by azumanga (subscriber, #90158) [Link]

While this is a longer-time goal, I really hope we can get to full sandboxing, maybe via something like WASM?

There really is no need for something like xz, when used as a library dependency, to have any more access than "input data goes here, output data goes here".

Identifying dependencies used via dlopen()

Posted Apr 17, 2024 8:46 UTC (Wed) by gmatht (subscriber, #58961) [Link]

Forking XZ into its own process would probably be faster than WASM. I recall that there was a proposed patch set to limit code to a particular segment of memory in the Kernel. Not sure if that could be used in userspace, or for this purpose.

Identifying dependencies used via dlopen()

Posted Apr 17, 2024 9:29 UTC (Wed) by smcv (subscriber, #53363) [Link]

Compression and decompression seems like a perfect use for seccomp: fork a subprocess, give it a fd for reading and a fd for writing, forbid syscalls other than read(), write(), close(), _exit() and memory allocation, and let it run until it exits.

(Of course, if your (de)compression library contains a back door, there's no guarantee that it is faithfully decompressing what it was given as input...)

Identifying dependencies used via dlopen()

Posted Apr 17, 2024 10:30 UTC (Wed) by jreiser (subscriber, #11027) [Link]

That list of syscalls comes quite close to restricting de-compression to one-pass online single-threaded streams. Already there are multi-threaded and/or multi-pass methods which use threads, general clone, *stat, open/creat, readlink, mmap/mprotect/munmap, memfd_create, pipe/socket, etc. For example, JPEG/MPEG can use such services. Even de-compression of zlib streams can be parallelized by using a compatible extension to the protocol.

Identifying dependencies used via dlopen()

Posted Apr 18, 2024 10:32 UTC (Thu) by LtWorf (subscriber, #124958) [Link]

Do you see how many vulnerabilities browsers have all the time?

I don't think WASM is so secure as you think it is.

Identifying dependencies used via dlopen()

Posted Apr 17, 2024 8:14 UTC (Wed) by farnz (subscriber, #17727) [Link]

There's the Mach-O approach (macOS, iOS): each symbol import identifies both the symbol and the library it comes from. A library is not loaded until the first use of a symbol it supplies; thus, merely dynamically linking a library this way is inert up until you use a symbol from that library.

This then lets you get both behaviours; with -z now, the library is loaded (and thus its .init_array and similar executed) as soon as you start the program. With -z lazy, the library is loaded (and thus its .init_array and similar executed) the first time you reference a symbol from the library.

I suspect that you could do similar in ELF without requiring the symbol imports to identify the library they come from, by making more of the link process lazy; I have not looked in depth, but I believe that you could implement this by tracking symbols against sources inside the dynamic linker instead of having to have the importer tell you which library to import for this symbol.

Identifying dependencies used via dlopen()

Posted Apr 17, 2024 10:12 UTC (Wed) by bluca (subscriber, #118303) [Link]

Yeah having those features available on Linux would be best of both worlds, I hope the companies funding glibc development can be convinced to invest in this

Identifying dependencies used via dlopen()

Posted Apr 17, 2024 11:41 UTC (Wed) by smoogen (subscriber, #97) [Link]

It is larger than glibc to do this. ELF is an international standard with all kinds of committees to set what it does, how it operates, and so on. As a change to the standard, it has to be dealt with in a committee with a lot of people from different fields (academia, various industries). Once a standard is written, it then needs to be implemented in a backward-compatible way (since you will have applications using both the new and old method), and then compilers, linkers, and libraries have to implement the change.

Method 2 would be to come up with a new standard (ORC, ELF, and DWARF are all taken so maybe ENT?) and recompile the universe to make that work. Sometimes it is easier to boil the ocean than deal with academic committees :)

Identifying dependencies used via dlopen()

Posted Apr 17, 2024 19:37 UTC (Wed) by lgerbarg (guest, #57988) [Link]

You can absolutely add platform specific extensions without the ceremony of going through the whole standardization process... Solaris actually has extended ELF in this way, they call it direct binding: https://en.wikipedia.org/wiki/Direct_binding

Identifying dependencies used via dlopen()

Posted Apr 24, 2024 18:00 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

JFYI, Apple removed that anti-feature: https://developer.apple.com/forums/thread/131252

100% support that. Lazy loading of libraries is actively bad and leads to crazy designs.

Identifying dependencies used via dlopen()

Posted Apr 17, 2024 9:47 UTC (Wed) by Tobu (subscriber, #24111) [Link]

I remember when the --as-needed linker flag was introduced, reducing the dependency footprint of binaries. Now that distributions use it by default, binaries and shared libraries only link against other libraries if they can be used to resolve a non-weak symbol. The limitation of that is that the "object needs library" relationship only describes direct relationships between any symbol of the first object towards any symbol of the second.

I think a more inspectable alternative to lazy loading (although you could do both) would be to compute the transitive closure of needed libraries on executable files at link time. This would be a subset of libraries reachable via DT_NEEDED relationships, but only take into account symbols reachable from the executable. Possibly the analysis would need something like lto to be enabled as well. Then, the dynamic loader would not map libraries that aren't needed by symbols of the top-level executable.

This way, assuming sshd links to libsystemd for one symbol, and libsystemd links to libxz but only for other symbols, the sshd executable will not have libxz included in its manifest of transitively required libraries, and the dynamic loader will not look for it either.

Identifying dependencies used via dlopen()

Posted Apr 16, 2024 22:30 UTC (Tue) by flussence (subscriber, #85566) [Link]

Yes please. This looks like a good start. Long-term I'd like to see something shaped like WebExtensions manifests but for ELF files to declare what they want to do up front.

Few ideas:
- A fixed list of dlopen patterns like this needs basic shell globs and a subset of variables if possible (so you can have desktop applications declare intent to load from ~/.local/<foo>/plugins/ or $XDG_RUNTIME_FOOBAR instead of just giving up).
- A capability bounding set stored in the binary would be good for system services instead of leaving that as a sysadmin guessing game. I can count on one hand the number of daemons I've encountered that bother to fill in that part of their systemd .service files, it needs to be ubiquitous and frictionless to get any adoption. This would be complementary to xattr caps since those are about increasing privileges (and also they don't work over NFS).
- It's probably unreasonable to try to cram seccomp or network rules in here, given that understanding either of those seems to be a full time job. Unless it's something *extremely* simple like "disallow all network and socket access" or "give me seccomp mode 1"... but then that'd break "/bin/true --help" on a GNU system.

Identifying dependencies used via dlopen()

Posted Apr 17, 2024 2:36 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

Blergh. Bad ideas on top of bad ideas. dlopen() should NOT be used at all normally.

OK, you screwed up with libsystemd exploding it in size until it became a problem. So live with it, and create another separate library that only contains lean stuff ("liblightsystemd" or whatever). Split the heavy dependencies into a separate "libjournald".

Instead we're now getting hacks upon hacks that further complicate the system.

Identifying dependencies used via dlopen()

Posted Apr 17, 2024 6:56 UTC (Wed) by Wol (subscriber, #4433) [Link]

> OK, you screwed up with libsystemd exploding it in size until it became a problem. So live with it, and create another separate library that only contains lean stuff ("liblightsystemd" or whatever). Split the heavy dependencies into a separate "libjournald".

> Instead we're now getting hacks upon hacks that further complicate the system.

Except your solution doesn't solve the underlying problem, which is "if using unsafe techniques is easy, programmers will use them". Splitting the library doesn't get rid of the problem, it just makes it harder to exploit, which for a paid black-hat isn't a problem.

Cheers,
Wol

Identifying dependencies used via dlopen()

Posted Apr 17, 2024 8:52 UTC (Wed) by pbonzini (subscriber, #60935) [Link]

True but you can do both.

Rather than splitting the library, I'd turn the utilities for daemons into a small copylib, but that's just a spin on the split idea.

Identifying dependencies used via dlopen()

Posted Apr 17, 2024 13:21 UTC (Wed) by kazer (subscriber, #134462) [Link]

You are only looking at the systemd-related aspect, but this actually affects basically every piece of software in user space: getting unexpected things loaded along with the actual dependencies you want to use.

The automatic linking system is great to reduce workload on programmers, but has the mentioned downsides. Better support for optionals that would not increase bugs or programmer workload significantly sounds great.

Identifying dependencies used via dlopen()

Posted Apr 17, 2024 13:34 UTC (Wed) by farnz (subscriber, #17727) [Link]

There's also a deeper issue; anything that is loaded has code run for you by the dynamic linker at load time, regardless of whether symbols from that object are used. As a result, if an unexpected thing is loaded, it can do anything, even if it was supposed to be inert in this situation.

Identifying dependencies used via dlopen()

Posted Apr 17, 2024 9:38 UTC (Wed) by donald.buczek (subscriber, #112892) [Link]

Transitioning to dynamic library loading via dlopen() poses significant challenges for scientific computing across multi-user systems. In environments where system updates are synchronized across a network of computers, the conventional approach involves "pushing" updates and avoiding immediate reboots. This method accommodates long-running processes and minimizes work disruption, as systems can continue operating with older software versions in memory until a suitable reboot time is determined.

However, this strategy becomes problematic when dealing with runtime-loaded plugins and shared objects. The complexity escalates with components like PAM, NSSwitch, and Apache modules, particularly when updates involve critical libraries like glibc. The issue arises from shared objects within the same package that depend on each other but lack backward compatibility due to internal ABI changes. This discrepancy becomes problematic when an updated shared object is dynamically loaded by an older version still in memory.

For instance, updating glibc can lead to conflicts where long-running services, such as sshd, attempt to initiate new sessions with mixed versions of glibc's shared objects, leading to potential segmentation faults or worse. This scenario is exemplified by the interaction between libc, libnsl, and libnss_files and pam_unix within the glibc and Linux PAM packages.

While currently limited to specific use cases involving plugins, the issue represents a significant concern. The adoption of dlopen() for shared libraries, as seen with systemd, amplifies these challenges, raising apprehensions about the sustainability of this approach in critical computing environments.

Identifying dependencies used via dlopen()

Posted Apr 17, 2024 10:09 UTC (Wed) by bluca (subscriber, #118303) [Link]

This is already an existing issue as you noted, due to PAM and other projects making heavy use of plugins. It seems to me such environments would be better served by moving to an image-based deployment model, so that updates take effect atomically at the next reboot/soft-reboot.

Identifying dependencies used via dlopen()

Posted Apr 17, 2024 10:43 UTC (Wed) by donald.buczek (subscriber, #112892) [Link]

Well, currently it is limited to the glibc (nsswitch and such) and pam areas. These are the only packages we've had problems upgrading so far. I've mentioned Apache httpd, but while we have many individual installations using httpd and friends, these are independent installations and there are no Apache modules in the system which would be touched by a system update or even a central httpd installation, which would be updated under the feet of the running daemons. Of course, these do use glibc, and after a glibc update, you might have web servers failing after hours or days, when new worker threads are forked, because the existing ones reached their request limit. So Apache httpd is more a victim than a cause of the problem.

Image-based deployment models wouldn't suit us, because generally we profit from the fact that, after an update, all systems have the same userspace and all tools are on the same version. While long-running daemons and user jobs might use old versions of shared libraries until they are restarted or the system reboots, these are only a few processes compared to the dynamic workload. Our users profit from the fact that something they run on their workstations runs without any modification on any system, any public compute server, or as a cluster job, and it eases our centralized management and monitoring. Plus, most of the time updates have a reason: we want the new versions, otherwise we wouldn't do the update. It might be just commands and tools, but if it is important for specific lingering daemons, we can just restart those.

Identifying dependencies used via dlopen()

Posted Apr 17, 2024 11:10 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

> This is already an existing issue as you noted, due to PAM and other projects making heavy use of plugins.

musl libc manages to do just fine without PAM or NSS. And what "other projects" that are equally fundamental in Linux?

Identifying dependencies used via dlopen()

Posted Apr 17, 2024 16:09 UTC (Wed) by jem (subscriber, #24231) [Link]

>musl libc manages to do just fine without PAM

Are you proposing to drop PAM because "musl libc manages to do just fine without PAM" (i.e., doesn't support it)? Talk about putting the cart before the horse!

Identifying dependencies used via dlopen()

Posted Apr 18, 2024 0:47 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

Yes, I do. Both PAM and NSS need to die, there should be no dynamic dependencies in the basic stack.

NSS can be replaced by a daemon talking the NSCD protocol, and PAM can be segregated into a separate authentication daemon (like SSSD) or replaced by something else (like ephemeral certificates for SSH).

Identifying dependencies used via dlopen()

Posted Apr 18, 2024 7:14 UTC (Thu) by gioele (subscriber, #61675) [Link]

> Yes, I do. Both PAM and NSS need to die, there should be no dynamic dependencies in the basic stack.
>
> NSS can be replaced by a daemon talking the NSCD protocol, and PAM can be segregated into a separate authentication daemon (like SSSD) or replaced by something else (like ephemeral certificates for SSH).

Are we sure that swapping out dynamic linking for IPC is a net positive move?

(Sincere question from somebody that welcomes every reduction of complexity in the basic stack and TCB.)

Identifying dependencies used via dlopen()

Posted Apr 18, 2024 7:38 UTC (Thu) by donald.buczek (subscriber, #112892) [Link]

> Are we sure that swapping out dynamic linking for IPC is a net positive move?

Maybe not generally, but it would be w.r.t. the specific problems I have described. Also, if well done, it could be a more secure design. And to me, the pam code and API design doesn't look very nice and a replacement would have a chance to have better coding style.

On the other hand, we've used both PAM and NSS with our own modules in the past, and are currently considering creating another PAM module for authentication. And while the authentication/authorization part of PAM could well be done with very minimal static code in the client talking to a server that does the complicated or privileged things, there are several PAM modules that need to run complicated code in the client process, for example pam_limits or pam_env. So if you want to keep this functionality, you won't profit much from a split between processes. These difficulties don't exist for NSS, so I think NSCD might be a better design.

Identifying dependencies used via dlopen()

Posted Apr 18, 2024 7:47 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

> Are we sure that swapping out dynamic linking for IPC is a net positive move?

Unquestionably, yes.

PAM and NSS modules have always been awkward, they depend too much on the environment that is not under their control. This can be kind of an issue for PAM modules, especially if you want them to access devices like hardware tokens or write/read files.

You also need to write them in a very careful manner to not interfere with the application that _uses_ them. You can't just add a dependency on libcurl if you're writing an NSS/PAM module, you can't easily launch threads or mess with signal handlers.

Even tasks that _should_ be straightforward like logging and auditing become non-trivial inside PAMs. And this is just not a good situation to be in, for such a critical part of the system.

We now have a very capable daemon management solution (systemd), with socket activation, proper support for logging and so on. It solves nearly all the issues with PAM/NSS.

The only major missing piece is a standard high-level IPC protocol, although even that role is somewhat adequately fulfilled by dbus these days.

Identifying dependencies used via dlopen()

Posted Apr 18, 2024 10:06 UTC (Thu) by farnz (subscriber, #17727) [Link]

Definitely a positive for security; it might be a slight negative for performance in the case where your PAM or NSS stack is very fast in-process, but that should be an extremely rare case - the moment you use a network service for auth, it becomes slow enough that IPC is not the bottleneck.

With IPC to an authentication and authorization daemon (SSSD, for example), you can run the daemon in a known-good environment (so no surprises from LD_PRELOAD, ptrace or similar), with access to files that the process is blocked from accessing (can read /etc/shadow, even if the process isn't authorized to do so). In turn, this would let you lock down access to the configuration for your auth stacks, which you can't do today (/etc/nsswitch.conf has to be readable by any process that wants to ask about authorization, for example).

Identifying dependencies used via dlopen()

Posted Apr 18, 2024 11:29 UTC (Thu) by gioele (subscriber, #61675) [Link]

> Definitely a positive for security; it might be a slight negative for performance

What I worry about are failure modes. IPC has very different failure modes compared to dynamically loaded libraries (timeouts, partial responses, duplicate responses, DOSes) and security problems (MITM).

Daemonized PAM/NSS will be hot targets for malicious actors. Will users of these PAM/NSS daemons correctly handle these novel failures?

Identifying dependencies used via dlopen()

Posted Apr 18, 2024 11:41 UTC (Thu) by dezgeg (subscriber, #92243) [Link]

Complex commonly used NSS modules like sssd already use IPC with a daemon.

Identifying dependencies used via dlopen()

Posted Apr 18, 2024 11:41 UTC (Thu) by farnz (subscriber, #17727) [Link]

Anyone running SSSD, NSCD, or NSLCD has these failure modes already, along with the failure modes of dynamically loaded libraries. Those three all supply nsswitch plugins that simply do IPC to a daemon to get authorization details. Additionally, SSSD supplies a PAM plugin that does authentication over a socket.

And users of NIS, NIS+, LDAP or Kerberos have this problem but worse, since the IPC is done over a network, not over a Unix Domain Socket.

There's also no reason to insist that everything goes via IPC; you can have a built-in handler for "files" and "pam_unix" file-based methods (optionally disabled, of course), with IPC for the cases where the built-in handlers don't work for your use case. Basically, have a hard-wired set of options, with IPC instead of dlopen for cases where the hard-wired set isn't enough.

Identifying dependencies used via dlopen()

Posted Apr 18, 2024 13:06 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

> and security problems (MITM)

Others replied about other issues, but I just want to note that MITM doesn't really apply for the local IPC. You can reliably verify the caller's identity using good old Unix sockets (SCM_CREDENTIALS and others).

Identifying dependencies used via dlopen()

Posted Apr 18, 2024 21:03 UTC (Thu) by dkg (subscriber, #55359) [Link]

> IPC has very different failure modes compared to dynamically loaded libraries (timeouts, partial responses, duplicate responses, DOSes) and security problems (MITM).

I agree with you about the different kinds of failure modes, but the discussions in this thread suggesting lazy dynamically loaded libraries (whether done via dlopen() or by the dynamic linker itself) also introduce some novel failure modes.

Figuring out how to wrap any of these scenarios in reasonable error-handling sections, so that arbitrary tooling that relies on them can be updated smoothly, is the real trick.

Great idea

Posted Apr 17, 2024 10:09 UTC (Wed) by vadim (subscriber, #35271) [Link]

This would be great. Currently doing something like a chroot or an AppImage can be incredibly painful for large applications due to this. You never know when something is going to break, because something is only loaded at runtime, and only during some particular conditions.

While we're at it, it'd be nice if there could be a similar feature for binaries. As in, you can easily find out that this program calls something from /usr/libexec.

Great idea

Posted Apr 17, 2024 17:34 UTC (Wed) by matthias (subscriber, #94967) [Link]

This is meant for ELF binaries. It would be ridiculous if it only works for libraries and not for executables. Both of them are binaries and there is really not much difference between the two.

Great idea

Posted Apr 17, 2024 17:40 UTC (Wed) by matthias (subscriber, #94967) [Link]

Sorry, I completely missed your point. You mean if someone calls exec to call into other executables. Yes, this could be interesting, too.

But where do we stop? There are not only libraries and executables, but also data. In theory the application can depend on arbitrary files to be available. But I do not know how much of a problem this is.

Identifying dependencies used via dlopen()

Posted Apr 18, 2024 6:27 UTC (Thu) by guillemj (subscriber, #49706) [Link]

It is rather unfortunate that this is trying to promote and "standardize" the anti-pattern of dlopen()ing over an externally defined ABI boundary.

Using dlopen() like this might appear easier (for specific upstreams) than either properly defining a plugin framework (which should not amount to much code anyway) or moving the linking into independent programs that can then be optionally called. This is a hack and a workaround that goes behind the toolchain's back, one that pretty much just shifts the complexity elsewhere, with reduced safety nets and functionality (for example, with dpkg we have dependency granularity down to versioned symbols). If the proposal were to improve the toolchain to support optional linking, that would be a welcome change, but this is rather disappointing.

I've covered that in the past (with further references):

https://lists.debian.org/debian-mentors/2017/11/msg00196....
https://github.com/systemd/systemd/pull/17416#issuecommen...
https://github.com/systemd/systemd/pull/17416#issuecommen...

Identifying dependencies used via dlopen()

Posted Apr 18, 2024 10:15 UTC (Thu) by bluca (subscriber, #118303) [Link]

"plugin frameworks" make absolutely no sense whatsoever for any of this, and would be just dirty and silly workarounds for outdated and limited package managers

Identifying dependencies used via dlopen()

Posted Apr 18, 2024 11:05 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

Plugin managers are bad! Plugins are bad!

dlopen() to load optional dependencies, on the other hand, is great.

Right?

Identifying dependencies used via dlopen()

Posted Apr 18, 2024 14:42 UTC (Thu) by bluca (subscriber, #118303) [Link]

dlopen is good but not great - great is what OSX/Mach-o provides, that was already mentioned elsewhere, where the linker/loader can do it in a much nicer way

Identifying dependencies used via dlopen()

Posted Apr 18, 2024 14:54 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

Not really. Optional dependencies inside foundational libraries are bad, however you wrap them.

When your feature requires reinventing something fundamental as ELF, then you're probably on a wrong track.

Identifying dependencies used via dlopen()

Posted Apr 18, 2024 18:55 UTC (Thu) by bluca (subscriber, #118303) [Link]

Nope, they are good and exactly what we need, and that's why we are implementing it everywhere

Identifying dependencies used via dlopen()

Posted Apr 19, 2024 1:02 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

I'll bite. Why are they good? What good do they provide to application writers and packagers?

Identifying dependencies used via dlopen()

Posted Apr 19, 2024 8:45 UTC (Fri) by bluca (subscriber, #118303) [Link]

They provide the ability to have optional dependencies that can gracefully degrade to the feature being disabled, without requiring separate builds, with minimal overhead for developers

Identifying dependencies used via dlopen()

Posted Apr 19, 2024 8:58 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

> They provide the ability to have optional dependencies that can gracefully degrade to the feature being disabled, without requiring separate builds, with minimal overhead for developers

First, "optional dependency" is kind of an oxymoron. It's either a dependency or not.

A more correct way to put it: it papers over a bloated dependency set, by moving dependency resolution from compile-time to runtime. You can compare it with dynamic languages and static languages. Static languages check stuff at compile time, but dynamic languages allow users to just write whatever and delay checking until the code is run.

In this particular case, this also very much applies. dlopen() will interfere with sandboxing, or with mseal()/mimmutable(). It will also introduce failure modes if your code enters a namespace that doesn't have the dependencies.

And all of that, what, to avoid admitting the mistake and splitting libsystemd into libjournald and a lighter libsystemd?

Identifying dependencies used via dlopen()

Posted Apr 19, 2024 10:09 UTC (Fri) by bluca (subscriber, #118303) [Link]

> First, "optional dependency" is kind of an oxymoron. It's either a dependency or not.

Optional features are obviously a thing, as the dependencies needed for an optional feature are obviously and clearly optional, this is just basic logic. Not only that, you are literally arguing against reality, given this is how it actually works, right now, there's nothing theoretical about this.

It's quite plainly obvious given this rant that you have a gigantic chip on your shoulder, so there's not much point in going on. It's very simple: we will use dlopen for most of our dependencies, libsystemd will stay exactly as it is other than that as it has the right design for a number of reasons, and we will work around outdated package managers with something like the elf note discussed in the article. If one day glibc+gcc give us the same feature mach-o implements with on-demand-resolved weak dt_needed, we'll switch to that.

Identifying dependencies used via dlopen()

Posted Apr 19, 2024 11:25 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

> Optional features are obviously a thing, as the dependencies needed for an optional feature are obviously and clearly optional, this is just basic logic. Not only that, you are literally arguing against reality, given this is how it actually works, right now, there's nothing theoretical about this.

What "actually works"? I can have a completely static system with musl, or with iOS where apps are statically linked.

dlopen() hacks and "optional dependencies" are just sloppiness, nothing more. In your case it's to cover up the screwups that happened with journald.

> It's quite plainly obvious given this rant that you have a gigantic chip on your shoulder, so there's not much point in going on. It's very simple: we will use dlopen for most of our dependencies, libsystemd will stay exactly as it is other than that as it has the right design for a number of reasons, and we will work around outdated package managers with something like the elf note discussed in the article

Are you going to support mseal()/mimmutable() or are you going to give a middle finger to people who want security?

Identifying dependencies used via dlopen()

Posted Apr 19, 2024 12:40 UTC (Fri) by pizza (subscriber, #46) [Link]

> First, "optional dependency" is kind of an oxymoron. It's either a dependency or not.

One of my printer drivers has an _optional_ dependency on an image processing library.

It is optional unless you have one of two specific models, then that library becomes mandatory if you want the printer to produce anything considered passable.

That library started out as proprietary, non-redistributable, and x86-only. So I reverse-engineered it and wrote an F/OSS replacement. Problem is that due to patent concerns, the F/OSS library isn't distributed in binary form -- or even as source in the upstream project.

Doing things this way lets distributions handle most of the support burden -- ie a single binary covering everyone, and that works for everyone without those two specific models. Folks with those models who get a license to use the proprietary library (that's guaranteed to work) can plop it in /usr/local/lib; or they can do a `git pull && make && sudo make install` and get (so far) bit-for-bit identical results.

A dlopen() based approach is by far the simplest [1] way to accomplish the goals in a portable [2] manner.

[1] ie least burdensome to *me*
[2] Linux, MacOS, and even Windows. (ie probably every platform with libusb support)

Identifying dependencies used via dlopen()

Posted Apr 19, 2024 13:02 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

That's fine. dlopen() to work around legal or legacy requirements is not a big deal. It's like using LD_PRELOAD to work around a bug in one of the libraries for an old proprietary app. Using LD_PRELOAD for all systems would not be anything close to being acceptable.


Copyright © 2024, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds