A tale of two troublesome drivers
The kernel project merges dozens of drivers with every development cycle, and almost every one of those drivers is entirely uncontroversial. Occasionally, though, a driver submission raises wider questions, leading to lengthy discussion and, perhaps, opposition. That is currently the case with two separate drivers, both with ties to the networking subsystem. One of them is hung up on questions of whether (and how) all device functionality should be made available to user space, while the other has run into turbulence because it drives a device that is unobtainable outside of a single company.
mlx5ctl and fwctl
The mlx5ctl driver is not a new problem; it was covered here in December 2023. In short: this driver implements a transport channel allowing user space to query and manipulate parameters on an mlx5 device (which provides a range of networking, RDMA, and InfiniBand functionality), crucially without any understanding of those parameters on the kernel's part. Proponents say that this driver is needed to provide users with the access needed to configure and debug their hardware, especially on locked-down systems where other methods of talking directly to the hardware are unavailable. Opponents see it as a way of circumventing the normal development process that governs how device parameters are exported to user space.
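To make the point about the kernel having no understanding of the parameters concrete, here is a toy Python model of an opaque transport channel. It is not mlx5ctl's actual interface. The class, the opcode encoding, and toy_firmware are all hypothetical. The kernel-side layer can only check generic properties such as message size. The meaning of the bytes lives entirely in the device and the vendor's tools.

```python
from typing import Callable

# Hypothetical firmware entry point: raw request bytes in, raw reply bytes out.
FirmwareHandler = Callable[[bytes], bytes]


class OpaqueChannel:
    """Toy pass-through channel: the "kernel" side never interprets payloads."""

    def __init__(self, firmware: FirmwareHandler, max_len: int = 4096) -> None:
        self._firmware = firmware
        self._max_len = max_len

    def transact(self, request: bytes) -> bytes:
        # Only generic checks (here, size) are possible; what the bytes mean
        # is known to the device and its vendor tools, not to this layer.
        if len(request) > self._max_len:
            raise ValueError("request too large")
        return self._firmware(request)


def toy_firmware(request: bytes) -> bytes:
    """Hypothetical vendor protocol: first byte is an opcode, reply starts with a status byte."""
    if not request:
        return b"\xff"
    opcode, body = request[0], request[1:]
    if opcode == 0x01:  # made-up "query parameter" operation
        return b"\x00fw-param=42"
    if opcode == 0x02:  # made-up "set parameter" operation: echo what was set
        return b"\x00" + body
    return b"\xff"      # unknown opcode
```

For example, `OpaqueChannel(toy_firmware).transact(b"\x01")` returns `b"\x00fw-param=42"`. Nothing in `OpaqueChannel` would change if the vendor redefined every opcode, which is exactly what opponents object to.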
Saeed Mahameed posted a new version of the mlx5ctl patch series at the beginning of February, saying: "We continue to think that mlx5ctl is reasonable and aligned with the greater kernel community values". Christoph Hellwig responded with an ack and a complaint about the "subsystem maintainer overreach" that has blocked the merging of this driver. Networking maintainer Jakub Kicinski agreed that "overreach is unfortunate", but also maintained the position that this driver should not be merged: "We have a clear rule against opaque user space to FW [firmware] interfaces". Beyond that, there was not a lot of other discussion on the submission at that time.
At the beginning of March, though, Jason Gunthorpe posted a proposal for a new subsystem called "fwctl" that would be the home for drivers like mlx5ctl. Modern devices, he wrote, tend to come with a large set of tunable parameters controlling many aspects of their functionality; these parameters need to be made accessible to user space if users are to be able to use their hardware.
fwctl's purpose is to define a common set of limited rules, described below, that allow user space to securely construct and execute RPCs inside device FW. The rules serve as an agreement between the operating system and FW on how to correctly design the RPC interface. As a uAPI the subsystem provides a thin layer of discovery and a generic uAPI to deliver the RPCs and collect the response. It supports a system of user space libraries and tools which will use this interface to control the device.
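The quoted description separates two jobs: discovery, and delivery of RPCs with collection of the response. The Python model below sketches that split under stated assumptions. ToyFwctl, DeviceInfo, RpcResult, and the error convention are invented for illustration and do not reproduce the proposed uAPI.

```python
import errno
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class DeviceInfo:
    device_type: str      # lets user-space tools pick the right vendor library
    driver_version: int


@dataclass
class RpcResult:
    status: int           # 0 on success, negative errno-style value on failure
    response: bytes


class ToyFwctl:
    """Hypothetical model of "discovery plus generic RPC delivery"; not the real uAPI."""

    def __init__(self) -> None:
        self._devices: Dict[str, Tuple[DeviceInfo, Callable[[bytes], bytes]]] = {}

    def register(self, name: str, info: DeviceInfo, rpc: Callable[[bytes], bytes]) -> None:
        self._devices[name] = (info, rpc)

    def info(self, name: str) -> DeviceInfo:
        # Discovery: report what kind of device this is, and nothing more.
        return self._devices[name][0]

    def rpc(self, name: str, request: bytes) -> RpcResult:
        # Delivery: hand the request to the device's firmware, collect the reply.
        if name not in self._devices:
            return RpcResult(status=-errno.ENODEV, response=b"")
        _, handler = self._devices[name]
        return RpcResult(status=0, response=handler(request))
```

A user-space tool would first call `info()` to decide which vendor library understands the device, then send that library's requests through `rpc()`. The generic layer stays thin because it never decodes either the request or the response.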
The proposal goes into some detail on the types of functionality that will be made available via fwctl interfaces. It also covers the functionality that cannot be provided, including the ability to DMA to arbitrary memory, manipulate kernel memory or subsystems outside of the driver itself, or provide functionality, such as sending a network packet, that should be handled by another subsystem.
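One way to picture those exclusions is as a category check applied before an RPC is forwarded. This is a hypothetical sketch. The categories mirror the list above, but the opcode table is invented. The article does not say that fwctl filters RPCs this way: the proposal frames the rules as an agreement between the operating system and firmware.

```python
from enum import Enum, auto


class Category(Enum):
    DEVICE_CONFIG = auto()   # tunable device parameters: allowed
    DEBUG_READ = auto()      # reading device state: allowed
    ARBITRARY_DMA = auto()   # DMA to arbitrary host memory: forbidden
    KERNEL_MEMORY = auto()   # touching kernel memory or other subsystems: forbidden
    DATAPATH = auto()        # e.g. sending a packet, which belongs to netdev: forbidden


FORBIDDEN = {Category.ARBITRARY_DMA, Category.KERNEL_MEMORY, Category.DATAPATH}

# Hypothetical opcode table; a real device would define its own.
OPCODE_CATEGORY = {
    0x10: Category.DEVICE_CONFIG,
    0x20: Category.DEBUG_READ,
    0x30: Category.ARBITRARY_DMA,
    0x40: Category.DATAPATH,
}


def check_rpc(opcode: int) -> bool:
    """Return True if this (hypothetical) RPC may be forwarded to firmware."""
    category = OPCODE_CATEGORY.get(opcode)
    if category is None:
        return False         # unknown opcodes are refused rather than trusted
    return category not in FORBIDDEN
```

The deny-by-default handling of unknown opcodes is a design choice made for this sketch. Whether such a check can be trusted depends on the firmware classifying its own commands honestly, which is the enforcement question raised in the discussion.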
As before, the primary opposition (to both mlx5ctl and fwctl) came from Kicinski. He described the justification for this work as "smoke and mirrors", saying it was a way for manufacturers to "hide as much as possible of what you consider your proprietary advantage in the 'AI gold rush'". Complex hardware, he said, does not need a backdoor to talk to the firmware without the kernel's mediation; he cited the network interfaces used at Meta (his employer) as an example. He questioned whether the restrictions on fwctl drivers would be enforced, and said that the conversation did not appear to be going anywhere useful:
Or should we go for another loop of me talking about openness and building common abstractions, and vendors saying how their way of doing basic configuration is so very special, and this is just for debug and security and because others.
There's absolutely no willingness to try and build a common interface here.
Kicinski has repeatedly said that this functionality should be provided via an API like devlink, where parameters are exposed after a community review that is, among other things, intended to force consistency between hardware from different manufacturers. He complained that his offer to quickly review proposed devlink knobs had been ignored by the vendors looking for interfaces like fwctl.
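A devlink-style interface works differently from an opaque channel. It exposes parameters by name, with types and semantics agreed during review, so the same knob means the same thing on every driver that implements it. The Python sketch below models that idea. ParamSpec, Driver, and the parameter names are hypothetical and are not devlink's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass(frozen=True)
class ParamSpec:
    kind: type                           # agreed value type
    validate: Callable[[object], bool]   # agreed range/meaning check


# Hypothetical "reviewed" parameter set: one definition shared by all drivers.
GENERIC_PARAMS: Dict[str, ParamSpec] = {
    "enable_feature_x": ParamSpec(bool, lambda v: True),
    "max_queues": ParamSpec(int, lambda v: 1 <= v <= 1024),
}


@dataclass
class Driver:
    vendor: str
    supported: Set[str]                  # which generic knobs this driver implements
    values: Dict[str, object] = field(default_factory=dict)

    def set_param(self, name: str, value: object) -> None:
        spec = GENERIC_PARAMS.get(name)
        if spec is None:
            raise KeyError(f"{name} is not a reviewed parameter")
        if name not in self.supported:
            raise NotImplementedError(f"{self.vendor} does not implement {name}")
        # Exact type match, so that True is not silently accepted as an int.
        if type(value) is not spec.kind or not spec.validate(value):
            raise ValueError(f"bad value for {name}: {value!r}")
        self.values[name] = value
```

Two drivers from different vendors that both implement `max_queues` accept and validate it identically. A knob that is not in `GENERIC_PARAMS` cannot be set at all until it has been reviewed and added, and that review step is the one fwctl's proponents describe as slow.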
On the other side, David Ahern asserted that fwctl is the common interface that Kicinski is looking for. Gunthorpe said that all complex devices require hardware-specific tooling to configure them to the customer's needs. The only reason Meta does not need such tools is that, as a large customer, it is able to receive its hardware preconfigured from the vendor; smaller customers do not receive that level of service. Vendors have been providing these tools for years, he said; fwctl is just a way to provide a common interface for them.
The problem with the devlink approach, Gunthorpe added, is that, beyond the slow and painful nature of the process, it is guaranteed to fail. To be useful, an interface must be able to work with all of the parameters provided by the device:
As far as configuration/provisioning goes, it is really all or nothing.
If a specific site can configure only 90% of the stuff required because you will NAK the missing 10% it then it is still not usable and is a wasted effort for everyone.
You have never shown that there is a path to 100% with your approach to devlink. In fact I believe you've said flat out that 100% is not achievable.
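Gunthorpe's all-or-nothing argument reduces to a set comparison. What matters to a site is not the fraction of its required parameters that an interface exposes, but whether it exposes all of them. Here is a toy Python version with invented knob names:

```python
from typing import Set, Tuple


def provisioning_status(required: Set[str], exposed: Set[str]) -> Tuple[float, bool]:
    """Return (fraction of required knobs exposed, whether the site can be provisioned)."""
    if not required:
        return 1.0, True
    covered = required & exposed
    coverage = len(covered) / len(required)
    # Coverage is informational only: under this argument a site is usable only at 100%.
    return coverage, covered == required
```

With ten required knobs of which nine are exposed, the function reports 0.9 coverage but `False` for usability. That is the "90%" case in the quote. Kicinski's counterargument is effectively that some entries in `required` should not be there in the first place.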
Kicinski was not receptive to this argument, though, calling many of the knobs "hacks and lazy workarounds".
As of this writing, this discussion does not appear to be any closer to a resolution than it was in December. The positions taken have only hardened over time. In the end, the fate of this driver (or of a future fwctl subsystem) may well depend on whether Linus Torvalds is willing to allow a networking maintainer to block the merging of a driver that is, by most accounts, independent of the networking subsystem.
A network interface for one
At the beginning of April, Alexander Duyck posted a driver called "fbnic" for a custom network interface card that is used only within Meta. That prompted an immediate question from Jiri Pirko, who wondered why the community needs a driver for a device that nobody is able to acquire. Duyck responded that upstreaming the driver would make maintenance easier, that it would make it easier to introduce new networking features implemented in the driver, and that the company might someday open some of the hardware information as well. Pirko was unimpressed and said that the driver should not be merged.
Duyck called this reasoning "arbitrary and capricious". The driver will have a lot of users at Meta, he said. There have been other proprietary devices added to the kernel in the past; the Intel IDPF driver was mentioned as an example elsewhere in the conversation. Drivers also often show up for devices that are not yet for sale, and may never make it to the market. To reject the driver, he said, is an accusation of "some lack of good faith" from Meta.
Kicinski tried to redirect the discussion somewhat, saying that he did not want to be in the position of judging the "good faith" of companies. The community, he said, had to make its decision based on the interest of the project and the broader user base. He did not say, then, whether he thought the driver should be merged or not. Others, though, such as John Fastabend and Paolo Abeni, argued that fbnic appeared to be good code, and that in any case it is only a network-interface driver with no potential to harm the rest of the kernel, so there is no reason to keep it out.
Gunthorpe, while not arguing against the merging of fbnic, raised some concerns. There is a strong feeling that code should not be merged solely for the purpose of supporting proprietary user-space code, he said, and "this submission is clearly blurring that line". That could, he said, lead to problems in the future as more features are added to the driver.
There was a brief turn in the conversation when Andrew Lunn referred to the mlx5ctl discussion and asked Duyck to show that a separate firmware-tuning driver would not be required for this device. Kicinski said that showing that would not change anybody's mind. Ahern suggested that, in the future, when "the inevitable production problems" show up, a separate, mlx5ctl-like driver may well become necessary.
Perhaps the biggest concern, though, was expressed by Kicinski: what happens if changes elsewhere in the kernel break the driver, creating a regression for its users? Since the community as a whole cannot test the driver, such breaks could be hard to avoid and even harder to fix; that could lead to kernel changes being reverted. In such a situation, a private driver like fbnic could impede kernel development in general.
For that reason, though Kicinski eventually concluded that "there's broad support for merging the driver", he also said that there needs to be a slightly different set of rules governing drivers for private devices. These would include "weaker 'no regression' guarantees" and an expectation that the driver maintainers will participate actively in efforts to refactor subsystem interfaces. In the absence of such participation, a driver for a private device could be removed from the kernel. Pirko eventually agreed that, if the driver were to be marked as belonging to this new regime (which would have to be documented), it "would be ok to let this in".
So the fbnic driver seems likely to be merged in the end. The same may eventually be true of mlx5ctl in some form as well. The Linux kernel did not get to the position it is in by refusing to let users access the full capabilities of their hardware, and it seems unlikely to adopt such a policy now. A more difficult prospect, though, is to guess how many more lengthy discussions will be required to reach that decision.
A tale of two troublesome drivers
Posted Apr 12, 2024 16:53 UTC (Fri) by NYKevin (subscriber, #129325)
Am I understanding this argument correctly? It sounds as if they're saying "devlink satisfies all of our technical requirements, but we can't use it for political reasons." If so, that seems like a very poor argument for adding a second configuration interface. What makes you think you won't run into exactly the same political issues with the new interface?
A tale of two troublesome drivers
Posted Apr 12, 2024 17:20 UTC (Fri) by pizza (subscriber, #46)
It's more like "devlink's politics prevent it from satisfying all of our technical requirements"
("technical requirements" meaning "users can configure all necessary features of the device")
A tale of two troublesome drivers
Posted Apr 12, 2024 17:24 UTC (Fri) by NYKevin (subscriber, #129325)
A tale of two troublesome drivers
Posted Apr 12, 2024 20:00 UTC (Fri) by pizza (subscriber, #46)
devlink v. fwctl
Posted Apr 12, 2024 20:03 UTC (Fri) by corbet (editor, #1)
The difference is that, while devlink parameters are subject to review with the intent of having them work the same way across devices, fwctl parameters are whatever the firmware provides and aren't discussed on the mailing list at all. They probably do not appear in the driver code. So I would not expect fwctl to have the same difficulties (and neither do its proponents).
devlink v. fwctl
Posted Apr 12, 2024 21:15 UTC (Fri) by tialaramex (subscriber, #21167)
I expect it to have a whole different set of difficulties and to eventually get kicked out by Linus. Not because this couldn't work, but because this is exactly the sort of "give an inch" which idiots at hardware companies abuse until it's intolerable.
devlink v. fwctl
Posted Apr 12, 2024 23:24 UTC (Fri) by jgg (subscriber, #55211)
devlink v. fwctl
Posted Apr 14, 2024 3:45 UTC (Sun) by lutchann (subscriber, #8872)
devlink v. fwctl
Posted Apr 14, 2024 10:15 UTC (Sun) by intelfx (subscriber, #130118)
Might also be for the purpose of "ship now, think later" (aka cost-cutting). The rest of your comment is on point.
devlink v. fwctl
Posted Apr 12, 2024 21:46 UTC (Fri) by Binary-Eater (subscriber, #159553)
https://www.kernel.org/doc/html/latest/networking/devlink/
I think the problem the fwctl subsystem is trying to solve is that for multi-class devices, such as a chip that can do both Infiniband and netdev work, devlink cannot contain Infiniband or other class device related parameters. For example, you could not propose common GPU parameters to devlink. devlink is aimed more at unifying network-type devices than at finding commonality among devices with various applications. I think fwctl would likely audit the FW parameters supported over the transport rather than being an arbitrary transport, but it would use a model of accepting first and unifying later rather than trying to unify up front. Just my personal interpretation.
devlink v. fwctl
Posted Apr 12, 2024 23:41 UTC (Fri) by jgg (subscriber, #55211)
You can see examples of what "works across all devices" looks like - eg look at the discussion about max_io_eqs and how it doesn't converge because it isn't clear if the mlx5 HW notion of EQ really translates to other devices. Another different kind of device, idpf, wanting to do SFs needs a different set of knobs. So "common" becomes a set of 3 configurables that maybe no device will implement all of, or nothing is agreed and nobody gets anything.
It is a common story of trying to tease out commonality in micro details from devices that were never designed to be common at all, sort of a Rorschach test.
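The convergence problem described in this comment can be modeled as set arithmetic. The union of per-device knobs is what gets proposed as "common", while the intersection is what is actually common. In this sketch, max_io_eqs comes from the comment, but the device names and the other knob names are invented.

```python
from typing import Dict, List, Set


def common_knobs(devices: Dict[str, Set[str]]) -> Set[str]:
    """Knobs implemented by every device: the only genuinely common subset."""
    sets = list(devices.values())
    if not sets:
        return set()
    return set(sets[0]).intersection(*sets[1:])


def proposed_knobs(devices: Dict[str, Set[str]]) -> Set[str]:
    """Union of device-specific knobs, i.e. what a "common" API would have to cover."""
    return set().union(*devices.values())


def full_implementers(devices: Dict[str, Set[str]], knobs: Set[str]) -> List[str]:
    """Devices that implement every knob in `knobs`."""
    return sorted(name for name, have in devices.items() if knobs <= have)


# Hypothetical devices; only "max_io_eqs" is taken from the comment above.
DEVICES = {
    "dev_a": {"max_io_eqs", "eq_depth"},
    "dev_b": {"max_sfs", "sf_queue_count"},
    "dev_c": {"max_io_eqs", "max_sfs"},
}
```

With these made-up inputs, `common_knobs(DEVICES)` is empty, while `proposed_knobs(DEVICES)` has four entries that no single device fully implements. Those are the two outcomes the comment describes.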
Two driver rule
Posted Apr 12, 2024 16:54 UTC (Fri) by nickodell (subscriber, #125165)
What is the two driver rule?
Two driver rule
Posted Apr 12, 2024 17:42 UTC (Fri) by moorray (subscriber, #54145)
Two driver rule
Posted Apr 12, 2024 18:50 UTC (Fri) by ballombe (subscriber, #9523)
making it a standard.
Two driver rule
Posted Apr 12, 2024 20:45 UTC (Fri) by WolfWings (subscriber, #56790)
See also: why "Web SQL" got discontinued mid-standards-process. Every single implementation ended up just picking SQLite because it worked well enough, and the W3C dropped things like a hot potato because that's not a standard, that's just a shim layer for SQLite embedding.
SQLite is the only logical choice
Posted Apr 13, 2024 13:50 UTC (Sat) by DemiMarie (subscriber, #164188)
SQLite is the only logical choice
Posted Apr 13, 2024 14:29 UTC (Sat) by geofft (subscriber, #59789)
There is a polyfill which uses a wasm build of SQLite, which allows individual websites to upgrade or accept the risk of old versions as they prefer.
Two driver rule
Posted Apr 13, 2024 14:07 UTC (Sat) by pbonzini (subscriber, #60935)
Two driver rule
Posted Apr 14, 2024 5:57 UTC (Sun) by WolfWings (subscriber, #56790)
If you explicitly need to parse an SQLite database? Yes, you'd need a WASM compile of SQLite, but that doesn't help you with storing data in the browser.
If you mean "What standard did they go with?", then that's IndexedDB, where they changed the basic approach to get folks to implement their own ideas instead of using raw SQL, which had resulted in everyone using SQLite.
fbnic Hardware Programming Interface Documentation
Posted Apr 13, 2024 9:04 UTC (Sat) by kalvdans (subscriber, #82065)
If Meta could link to the documentation / register-map of their device, anyone could evaluate whether the code is correct or not. One could even write a simulated model of the device to test the kernel without access to the hardware.
I would like it if all drivers had a link to the hardware documentation at the top of the source code file.
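Here is a minimal sketch of the simulated-model idea: a software device that implements a register map, plus driver code written only against read32()/write32(), so the driver can be exercised in tests without hardware. The register offsets, bit names, and link-up behavior are all hypothetical. The article does not describe fbnic's actual register layout.

```python
# Hypothetical register map, invented for this sketch.
REG_CTRL = 0x00
REG_STATUS = 0x04
CTRL_LINK_ENABLE = 1 << 0
STATUS_LINK_UP = 1 << 0


class SimDevice:
    """Software model of a device: registers plus the side effects a spec would describe."""

    def __init__(self) -> None:
        self.regs = {REG_CTRL: 0, REG_STATUS: 0}

    def read32(self, addr: int) -> int:
        return self.regs[addr]

    def write32(self, addr: int, value: int) -> None:
        self.regs[addr] = value & 0xFFFFFFFF
        if addr == REG_CTRL:
            # Modeled behavior: the link is up exactly when the enable bit is set.
            if value & CTRL_LINK_ENABLE:
                self.regs[REG_STATUS] |= STATUS_LINK_UP
            else:
                self.regs[REG_STATUS] &= ~STATUS_LINK_UP


def driver_link_up(dev, max_polls: int = 10) -> bool:
    """Driver code under test: uses only read32/write32, like real MMIO accessors."""
    ctrl = dev.read32(REG_CTRL)
    dev.write32(REG_CTRL, ctrl | CTRL_LINK_ENABLE)
    for _ in range(max_polls):
        if dev.read32(REG_STATUS) & STATUS_LINK_UP:
            return True
    return False


def driver_link_down(dev) -> None:
    ctrl = dev.read32(REG_CTRL)
    dev.write32(REG_CTRL, ctrl & ~CTRL_LINK_ENABLE)
```

Tests like these could run without the hardware. How faithfully such a model tracks the real device is, of course, the open question.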
fbnic Hardware Programming Interface Documentation
Posted Apr 25, 2024 17:33 UTC (Thu) by sammythesnake (guest, #17693)
Armed with this, any kernel changes that would require changes to this code could be done without access to the hardware and validated against the simulation+tests. In any case where the simulation+tests did not catch problems, then the only ones who miss out are Meta, who are thereby given incentive to ensure quality/coverage of what they provide.
I think that would seem like a win-win for everyone concerned, no...?
TCP Offload engines
Posted Apr 13, 2024 9:05 UTC (Sat) by garloff (subscriber, #319)
They would have circumvented the kernel's networking stack to a significant degree, which was undesired, as networking would have become very much dependent on specific hardware and proprietary firmware implementations for users of that functionality.
It was rejected in the end. Instead, common ground was built with checksumming offload, TCP segmentation offload and some of the other things that `ethtool -k` exposes.
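The generic offloads mentioned here show up as vendor-neutral feature names in `ethtool -k` output, whatever driver sits underneath. The parser below is a small sketch. The sample text is illustrative output written for this example, not captured from a particular machine. Real output lists many more features.

```python
from typing import Dict, Tuple

# Illustrative `ethtool -k` style output (abbreviated; written for this example).
SAMPLE = """Features for eth0:
rx-checksumming: on
tx-checksumming: on
\ttx-checksum-ipv4: on
tcp-segmentation-offload: on
large-receive-offload: off [fixed]
"""


def parse_ethtool_features(text: str) -> Dict[str, Tuple[bool, bool]]:
    """Parse `ethtool -k` style text into {feature: (enabled, fixed)}."""
    features: Dict[str, Tuple[bool, bool]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("Features for"):
            continue                     # skip blank lines and the header
        name, sep, rest = line.partition(":")
        words = rest.split()
        if not sep or not words:
            continue                     # not a "name: value" line
        # "fixed" marks a feature the driver does not allow to be toggled.
        features[name] = (words[0] == "on", "[fixed]" in rest)
    return features
```

Because the names are shared across drivers, the same parsing and the same policy work for any NIC. That is the kind of common ground the comment contrasts with per-vendor firmware knobs.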
I believe that was a win for everyone except for hardware vendors that would have preferred the lock-in of course.
Applying that to mlx5ctl: If it exposes things that will be and need be hardware specific, we should probably allow it.
If it exposes things that should be done generically (affecting different hardware of the same class), we should force a generic approach. Generic debugging settings, generic filtering capabilities, ...
This is an arms race: We have devices these days that have the functionality and complexity of operating systems. They might control more of your computer's operation than you might realize. (I'm thinking of SmartNICs and DPUs here.)
Them being proprietary does remove a lot of the transparency and control that open source operating systems brought us.
TCP Offload engines
Posted Apr 13, 2024 14:45 UTC (Sat) by marcH (subscriber, #57642)
Them being proprietary does remove a lot of the transparency and control that open source operating systems brought us.
Not sure what "arms race" means exactly in this context but yeah, "devices" have been running operating systems for a long time now. I put "devices" in quotes because there have been micro-controllers running operating systems on the CPU and GPU chips themselves for a long time too. I think that "race" is pretty much over by now, we have been in the "Distributed System on Chip" era for a very long time already.
TCP Offload engines
Posted Apr 30, 2024 3:35 UTC (Tue) by fest3er (guest, #60379)
TCP Offload engines
Posted Apr 13, 2024 19:25 UTC (Sat) by jgg (subscriber, #55211)
TCP offload hardware is operated through the RDMA stack instead.