A tale of two troublesome drivers

By Jonathan Corbet
April 12, 2024
The kernel project merges dozens of drivers with every development cycle, and almost every one of those drivers is entirely uncontroversial. Occasionally, though, a driver submission raises wider questions, leading to lengthy discussion and, perhaps, opposition. That is currently the case with two separate drivers, both with ties to the networking subsystem. One of them is hung up on questions of whether (and how) all device functionality should be made available to user space, while the other has run into turbulence because it drives a device that is unobtainable outside of a single company.

mlx5ctl and fwctl

The mlx5ctl driver is not a new problem; it was covered here in December 2023. In short: this driver implements a transport channel allowing user space to query and manipulate parameters on an mlx5 device (which provides a range of networking, RDMA, and InfiniBand functionality), crucially without any understanding of those parameters on the kernel's part. Proponents say that this driver is needed to provide users with the access needed to configure and debug their hardware, especially on locked-down systems where other methods of talking directly to the hardware are unavailable. Opponents see it as a way of circumventing the normal development process that governs how device parameters are exported to user space.

Saeed Mahameed posted a new version of the mlx5ctl patch series at the beginning of February, saying: "We continue to think that mlx5ctl is reasonable and aligned with the greater kernel community values". Christoph Hellwig responded with an ack and a complaint about the "subsystem maintainer overreach" that has blocked the merging of this driver. Networking maintainer Jakub Kicinski agreed that "overreach is unfortunate", but also maintained the position that this driver should not be merged: "We have a clear rule against opaque user space to FW [firmware] interfaces". Beyond that, there was not a lot of other discussion on the submission at that time.

At the beginning of March, though, Jason Gunthorpe posted a proposal for a new subsystem called "fwctl" that would be the home for drivers like mlx5ctl. Modern devices, he wrote, tend to come with a large set of tunable parameters controlling many aspects of their functionality; these parameters need to be made accessible by user space if users are to be able to use their hardware.

fwctl's purpose is to define a common set of limited rules, described below, that allow user space to securely construct and execute RPCs inside device FW. The rules serve as an agreement between the operating system and FW on how to correctly design the RPC interface. As a uAPI the subsystem provides a thin layer of discovery and a generic uAPI to deliver the RPCs and collect the response. It supports a system of user space libraries and tools which will use this interface to control the device.

The proposal goes into some detail on the types of functionality that will be made available via fwctl interfaces. It also covers the functionality that cannot be provided, including the ability to DMA to arbitrary memory, manipulate kernel memory or subsystems outside of the driver itself, or provide functionality, such as sending a network packet, that should be handled by another subsystem.
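The division described above, where the kernel forwards firmware RPCs without interpreting them but refuses whole classes of operation, can be illustrated with a toy model. This is only a sketch under stated assumptions: the scope names, the `Rpc` type, and `fake_firmware` are hypothetical and are not taken from the fwctl proposal, and the sketch simply assumes that an RPC's class can be determined from its framing.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Scope(Enum):
    """Hypothetical RPC classes; the names are illustrative, not fwctl's."""
    CONFIGURATION = auto()   # read or write device settings
    DEBUG_READ = auto()      # read-only diagnostics
    DMA_ARBITRARY = auto()   # excluded: DMA to arbitrary memory
    SEND_PACKET = auto()     # excluded: belongs to the networking stack


# Only these classes are ever forwarded to the device.
ALLOWED = {Scope.CONFIGURATION, Scope.DEBUG_READ}


@dataclass(frozen=True)
class Rpc:
    scope: Scope     # assumed to be derivable from the RPC framing
    payload: bytes   # opaque to this layer; never parsed here


def submit(rpc: Rpc, firmware) -> bytes:
    """Forward an RPC to firmware only if its scope is permitted."""
    if rpc.scope not in ALLOWED:
        raise PermissionError(f"scope {rpc.scope.name} is not allowed")
    return firmware(rpc.payload)


def fake_firmware(payload: bytes) -> bytes:
    """Stand-in for device firmware: returns the payload reversed."""
    return payload[::-1]
```

In this model a configuration RPC passes through untouched, while `submit(Rpc(Scope.SEND_PACKET, b"..."), fake_firmware)` raises `PermissionError`. Whether such restrictions would really be enforced is one of the points under dispute.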

As before, the primary opposition (to both mlx5ctl and fwctl) came from Kicinski. He described the justification for this work as "smoke and mirrors", saying it was a way for manufacturers to "hide as much as possible of what you consider your proprietary advantage in the 'AI gold rush'". Complex hardware, he said, does not need a backdoor to talk to the firmware without the kernel's mediation; he cited the network interfaces used at Meta (his employer) as an example. He questioned whether the restrictions on fwctl drivers would be enforced, and said that the conversation did not appear to be going anywhere useful:

Or should we go for another loop of me talking about openness and building common abstractions, and vendors saying how their way of doing basic configuration is so very special, and this is just for debug and security and because others.

There's absolutely no willingness to try and build a common interface here.

Kicinski has repeatedly said that this functionality should be provided via an API like devlink, where parameters are exposed after a community review that is, among other things, intended to force consistency between hardware from different manufacturers. He complained that his offer to quickly review proposed devlink knobs had been ignored by the vendors looking for interfaces like fwctl.
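The review-first model can be pictured as a shared registry of parameter names that every driver must draw from. The following is a rough sketch, not devlink's actual API: `ParamRegistry`, `unconfigurable`, and the parameter names used with them are placeholders chosen for illustration.

```python
class ParamRegistry:
    """Toy model of a reviewed, vendor-neutral parameter namespace."""

    def __init__(self, reviewed):
        self._reviewed = set(reviewed)   # names accepted after review
        self._values = {}                # (device, name) -> value

    def set(self, device, name, value):
        # A knob that has not been through review simply cannot be set.
        if name not in self._reviewed:
            raise KeyError(f"{name!r} is not a reviewed parameter")
        self._values[(device, name)] = value

    def get(self, device, name):
        return self._values[(device, name)]


def unconfigurable(required, reviewed):
    """Return the knobs a site needs that the registry does not offer."""
    return sorted(set(required) - set(reviewed))
```

With a registry holding only `enable_sriov` and `max_macs`, a site that also needs a hypothetical `vendor_tuning_x` gets `["vendor_tuning_x"]` back from `unconfigurable()`: the same names work on every device, but anything outside the reviewed set is out of reach.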

On the other side, David Ahern asserted that fwctl is the common interface that Kicinski is looking for. Gunthorpe said that all complex devices require hardware-specific tooling to configure them to the customer's needs. The only reason Meta does not need such tools is that, as a large customer, it is able to receive its hardware preconfigured from the vendor; smaller customers do not receive that level of service. Vendors have been providing these tools for years, he said; fwctl is just a way to provide a common interface for them.

The problem with the devlink approach, Gunthorpe added, is that, beyond the slow and painful nature of the process, it is guaranteed to fail. To be useful, an interface must be able to work with all of the parameters provided by the device:

As far as configuration/provisioning goes, it is really all or nothing.

If a specific site can configure only 90% of the stuff required because you will NAK the missing 10% it then it is still not usable and is a wasted effort for everyone.

You have never shown that there is a path to 100% with your approach to devlink. In fact I believe you've said flat out that 100% is not achievable.

Kicinski was not receptive to this argument, though, calling many of the knobs "hacks and lazy workarounds".

As of this writing, this discussion does not appear to be any closer to a resolution than it was in December. The positions taken have only hardened over time. In the end, the fate of this driver (or of a future fwctl subsystem) may well depend on whether Linus Torvalds is willing to allow a networking maintainer to block the merging of a driver that is, by most accounts, independent of the networking subsystem.

A network interface for one

At the beginning of April, Alexander Duyck posted a driver called "fbnic" for a custom network interface card that is used only within Meta. That prompted an immediate question from Jiri Pirko, who wondered why the community needs a driver for a device that nobody is able to acquire. Duyck responded that upstreaming the driver would make maintenance easier, that it would make it easier to introduce new networking features implemented in the driver, and that the company might someday open some of the hardware information as well. Pirko was unimpressed and said that the driver should not be merged.

Duyck called this reasoning "arbitrary and capricious". The driver will have a lot of users at Meta, he said. There have been other proprietary devices added to the kernel in the past; the Intel IDPF driver was mentioned as an example elsewhere in the conversation. Drivers also often show up for devices that are not yet for sale, and may never make it to the market. To reject the driver, he said, is an accusation of "some lack of good faith" from Meta.

Kicinski tried to redirect the discussion somewhat, saying that he did not want to be in the position of judging the "good faith" of companies. The community, he said, had to make its decision based on the interest of the project and the broader user base. He did not say, then, whether he thought the driver should be merged or not. Others, though, such as John Fastabend and Paolo Abeni, argued that fbnic appeared to be good code, and that in any case it is only a network-interface driver with no potential to harm the rest of the kernel, so there is no reason to keep it out.

Gunthorpe, while not arguing against the merging of fbnic, raised some concerns. There is a strong feeling that code should not be merged solely for the purpose of supporting proprietary user-space code, he said, and "this submission is clearly blurring that line". That could, he said, lead to problems in the future as more features are added to the driver.

There was a brief turn in the conversation when Andrew Lunn referred to the mlx5ctl discussion and asked Duyck to show that a separate firmware-tuning driver would not be required for this device. Kicinski said that showing that would not change anybody's mind. Ahern suggested that, in the future, when "the inevitable production problems" show up, a separate, mlx5ctl-like driver may well become necessary.

Perhaps the biggest concern, though, was expressed by Kicinski: what happens if changes elsewhere in the kernel break the driver, creating a regression for its users? Since the community as a whole cannot test the driver, such breaks could be hard to avoid and even harder to fix; that could lead to kernel changes being reverted. In such a situation, a private driver like fbnic could impede kernel development in general.

For that reason, though Kicinski eventually concluded that "there's broad support for merging the driver", he also said that there needs to be a slightly different set of rules governing drivers for private devices. These would include "weaker 'no regression' guarantees" and an expectation that the driver maintainers will participate actively in efforts to refactor subsystem interfaces. In the absence of such participation, a driver for a private device could be removed from the kernel. Pirko eventually agreed that, if the driver were to be marked as belonging to this new regime (which would have to be documented), it "would be ok to let this in".

So the fbnic driver seems likely to be merged in the end. The same may eventually be true of mlx5ctl in some form as well. The Linux kernel did not get to the position it is in by refusing to let users access the full capabilities of their hardware, and it seems unlikely to adopt such a policy now. A more difficult prospect, though, is to guess how many more lengthy discussions will be required to reach that decision.

Index entries for this article
Kernel: Development model/Driver merging
Kernel: Device drivers


A tale of two troublesome drivers

Posted Apr 12, 2024 16:53 UTC (Fri) by NYKevin (subscriber, #129325) [Link]

> If a specific site can configure only 90% of the stuff required because you will NAK the missing 10% it then it is still not usable and is a wasted effort for everyone.

Am I understanding this argument correctly? It sounds as if they're saying "devlink satisfies all of our technical requirements, but we can't use it for political reasons." If so, that seems like a very poor argument for adding a second configuration interface. What makes you think you won't run into exactly the same political issues with the new interface?

A tale of two troublesome drivers

Posted Apr 12, 2024 17:20 UTC (Fri) by pizza (subscriber, #46) [Link]

> Am I understanding this argument correctly? It sounds as if they're saying "devlink satisfies all of our technical requirements, but we can't use it for political reasons."

It's more like "devlink's politics prevent it from satisfying all of our technical requirements"

("technical requirements" meaning "users can configure all necessary features of the device")

A tale of two troublesome drivers

Posted Apr 12, 2024 17:24 UTC (Fri) by NYKevin (subscriber, #129325) [Link]

That's really not much better. devlink's politics are LKML's politics, and fwctl's politics will also be LKML's politics. You cannot specify the politics of an interface by writing code.

A tale of two troublesome drivers

Posted Apr 12, 2024 20:00 UTC (Fri) by pizza (subscriber, #46) [Link]

The point being, if what gets merged doesn't allow for all necessary features of the device to be configured, it _still_ needs a secondary configuration mechanism/path for the rest, or the result is not fit for purpose and won't actually be used by anyone.

devlink v. fwctl

Posted Apr 12, 2024 20:03 UTC (Fri) by corbet (editor, #1) [Link]

The difference is that, while devlink parameters are subject to review with the intent of having them work the same way across devices, fwctl parameters are whatever the firmware provides and aren't discussed on the mailing list at all. They probably do not appear in the driver code. So I would not expect fwctl to have the same difficulties (and neither do its proponents).

devlink v. fwctl

Posted Apr 12, 2024 21:15 UTC (Fri) by tialaramex (subscriber, #21167) [Link]

> So I would not expect fwctl to have the same difficulties (and neither do its proponents).

I expect it to have a whole different set of difficulties and to eventually get kicked out by Linus. Not because this couldn't work, but because this is exactly the sort of "give an inch" which idiots at hardware companies abuse until it's intolerable.

devlink v. fwctl

Posted Apr 12, 2024 23:24 UTC (Fri) by jgg (subscriber, #55211) [Link]

I wonder what intolerable would be? Lots of examples of this kind of interface in the kernel now..

devlink v. fwctl

Posted Apr 14, 2024 3:45 UTC (Sun) by lutchann (subscriber, #8872) [Link]

I assume "abuse" here means implementing functionality in a proprietary fashion through fwctl that clearly should be done with an existing generic interface, solely for the purpose of lock-in.

devlink v. fwctl

Posted Apr 14, 2024 10:15 UTC (Sun) by intelfx (subscriber, #130118) [Link]

> solely for the purpose of lock-in

Might also be for the purpose of "ship now, think later" (aka cost-cutting). The rest of your comment is on point.

devlink v. fwctl

Posted Apr 12, 2024 21:46 UTC (Fri) by Binary-Eater (subscriber, #159553) [Link]

I think the problem is that devlink is specific for configuring netdev-class devices.

https://www.kernel.org/doc/html/latest/networking/devlink/

I think the problem the fwctl subsystem is trying to solve is that for multi-class devices, such as a chip that can do both Infiniband and netdev work, devlink cannot contain Infiniband or other class device related parameters. For example, you could not propose common GPU parameters to devlink. It is more subject to the unification of network type devices than commonality among devices of various applications. I think fwctl would likely audit FW parameters being supported in the transport rather than being an arbitrary transport but use a model of acceptance and then unify rather than trying to unify upfront. Just my personal interpretation.

devlink v. fwctl

Posted Apr 12, 2024 23:41 UTC (Fri) by jgg (subscriber, #55211) [Link]

Yes, the configuration stuff that has made such a ruckus is much like, say, EFI variables stored in BIOS flash. Do a "ls /sys/firmware/efi/efivars/" for example. How fully accessing a similar set of variables in a PCI device flash deserves such a discussion is really :(

You can see examples of what "works across all devices" looks like - eg look at the discussion about max_io_eqs and how it doesn't converge because it isn't clear if the mlx5 HW notion of EQ really translates to other devices. Another different kind of device, idpf, wanting to do SFs needs a different set of knobs. So "common" becomes a set of 3 configurables that maybe no device will implement all of, or nothing is agreed and nobody gets anything.

It is a common story of trying to tease out commonality in micro details from devices that were never designed to be common at all, sort of a Rorschach test.

Two driver rule

Posted Apr 12, 2024 16:54 UTC (Fri) by nickodell (subscriber, #125165) [Link]

>Others are more cautious focusing on blast radius and referring to the "two driver rule" (Daniel, Paolo)?

What is the two driver rule?

Two driver rule

Posted Apr 12, 2024 17:42 UTC (Fri) by moorray (subscriber, #54145) [Link]

The "two driver rule" requires two independent drivers to implement any major core stack feature before the kernel merges it. It is unclear, though, whether such a rule exists in practice or what its origins are. To some extent it's covered here: https://lore.kernel.org/all/20240408080420.7a6dad61@kerne...

Two driver rule

Posted Apr 12, 2024 18:50 UTC (Fri) by ballombe (subscriber, #9523) [Link]

It is common in the industry to require two independent implementations of a specification before making it a standard.

Two driver rule

Posted Apr 12, 2024 20:45 UTC (Fri) by WolfWings (subscriber, #56790) [Link]

See also: why "Web SQL" was discontinued mid-standards-process. Every single implementation ended up just picking SQLite because it worked well enough, and the W3C dropped it like a hot potato because that's not a standard, that's just a shim layer for SQLite embedding.

SQLite is the only logical choice

Posted Apr 13, 2024 13:50 UTC (Sat) by DemiMarie (subscriber, #164188) [Link]

There’s no good reason to not use SQLite in an application like that, unless one has extreme performance needs.

SQLite is the only logical choice

Posted Apr 13, 2024 14:29 UTC (Sat) by geofft (subscriber, #59789) [Link]

The specific problem was that it gave you a fairly raw interface to SQLite, to the point that upgrading SQLite would have broken existing applications by parsing queries differently or even fixing bugs. I think one of the browser vendors was shipping a decade-old copy of SQLite, which is not something browser vendors feel particularly great about doing.

There is a polyfill which uses a Wasm build of SQLite, which allows individual websites to upgrade or accept the risk of old versions as they prefer.

Two driver rule

Posted Apr 13, 2024 14:07 UTC (Sat) by pbonzini (subscriber, #60935) [Link]

So the replacement is Wasm SQLite?

Two driver rule

Posted Apr 14, 2024 5:57 UTC (Sun) by WolfWings (subscriber, #56790) [Link]

If you explicitly need to parse an SQLite database? Yes, you'd need a WASM compile of SQLite, but that doesn't help you with storing data in the browser.

If you mean "What standard did they go with?" then that's IndexedDB, whose basic model they changed so that folks would build their own abstractions instead of using raw SQL, which had resulted in everyone just shipping SQLite.

fbnic Hardware Programming Interface Documentation

Posted Apr 13, 2024 9:04 UTC (Sat) by kalvdans (subscriber, #82065) [Link]

> Since the community as a whole cannot test the driver, such breaks could be hard to avoid and even harder to fix

If Meta could link to the documentation / register-map of their device, anyone could evaluate whether the code is correct or not. One could even write a simulated model of the device to test the kernel without access to the hardware.

I would like if all drivers had a link to the hardware documentation on the top of the source code file.

fbnic Hardware Programming Interface Documentation

Posted Apr 25, 2024 17:33 UTC (Thu) by sammythesnake (guest, #17693) [Link]

It would seem fair to me that Meta could (IMHO, *should*) be required, as a condition of including this driver, to provide this documentation, simulation, and appropriate tests.

Armed with this, any kernel changes that would require changes to this code could be done without access to the hardware and validated against the simulation+tests. In any case where the simulation+tests did not catch problems, then the only ones who miss out are Meta, who are thereby given incentive to ensure quality/coverage of what they provide.

I think that would seem like a win-win for everyone concerned, no...?

TCP Offload engines

Posted Apr 13, 2024 9:05 UTC (Sat) by garloff (subscriber, #319) [Link]

The discussion reminds me a bit of the controversy over TCP offload engines, which were not allowed into the kernel.
They would have circumvented the kernel's networking stack to a significant degree, which was undesired, as networking would have become very much dependent on specific hardware and proprietary firmware implementations for users of that functionality.
It was rejected in the end. Instead, common ground was built with checksumming offload, tcp segmentation offload and some of the other things that `ethtool -k` exposes.
I believe that was a win for everyone except for hardware vendors that would have preferred the lock-in of course.

Applying that to mlx5ctl: If it exposes things that will be and need be hardware specific, we should probably allow it.
If it exposes things that should be done generically (affecting different hardware of the same class), we should force a generic approach. Generic debugging settings, generic filtering capabilities, ...

This is an arms race: We have devices these days that have the functionality and complexity of operating systems. They might control more of your computer's operation than you might realize. (I'm thinking of SmartNICs and DPUs here.)
Them being proprietary does remove a lot of the transparency and control that open source operating systems brought us.

TCP Offload engines

Posted Apr 13, 2024 14:45 UTC (Sat) by marcH (subscriber, #57642) [Link]

> This is an arms race: We have devices these days that have the functionality and complexity of operating systems. They might control more of your computer's operation than you might realize. (I'm thinking of SmartNICs and DPUs here.)
> Them being proprietary does remove a lot of the transparency and control that open source operating systems brought us.

Not sure what "arms race" means exactly in this context but yeah, "devices" have been running operating systems for a long time now. I put "devices" in quotes because there have been micro-controllers running operating systems on the CPU and GPU chips themselves for a long time too. I think that "race" is pretty much over by now, we have been in the "Distributed System on Chip" era for a very long time already.

TCP Offload engines

Posted Apr 30, 2024 3:35 UTC (Tue) by fest3er (guest, #60379) [Link]

Kind-of like the DEC-20 that had a PDP-11 handling I/O?

TCP Offload engines

Posted Apr 13, 2024 19:25 UTC (Sat) by jgg (subscriber, #55211) [Link]

TCP offload was not rejected from the kernel, it was rejected from netdev and rejected to use the standard sockets API.

TCP offload hardware is operated through the RDMA stack instead.


Copyright © 2024, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds