- 27 Feb, 2020 25 commits
-
Ido Schimmel authored
During port initialization the driver instructs the device to only advertise speeds that can be supported by the port's current width. Since the device now returns the supported speeds based on the port's current width, the driver no longer needs to compute the speeds that can be advertised. Simplify port initialization by setting the advertised speeds to the queried supported speeds. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Spectrum-1 and Spectrum-2 do not have a per-TC counter of the number of packets marked by ECN. The value currently reported is the total number of marked packets, so showing it at individual TC Qdiscs is misleading. Move the counter to ethtool instead. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Currently, the periodic work issues only one SFN query at a time, processing 64 entries. The next work iteration is scheduled 100ms later, which means the maximum rate of learned FDB entries is limited to 640/s (64 entries every 100ms). That is slow. Fix this with two optimizations: 1) Run 10 SFN queries at a time. 2) If the SFN is not drained, schedule the work with zero delay so that processing of the remaining records continues immediately. On a test setup with 500K entries, the processing time decreased from 870secs to 10secs. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Tested-by: Alex Kushnarov <alexanderk@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
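The throughput limit follows from simple arithmetic. The sketch below (the helper names are made up for illustration) computes an upper bound on learned entries per second when each work iteration runs a fixed number of SFN queries of 64 records and the next iteration is delayed by a fixed interval. It ignores query latency, so real throughput can only be lower.

```c
#include <assert.h>

/* Upper bound on FDB entries learned per second. Each work iteration
 * issues `queries` SFN queries of `records_per_query` records each, and
 * the next iteration runs `interval_ms` later. Query latency is ignored,
 * so real throughput can only be lower. */
unsigned long sfn_max_rate(unsigned long records_per_query,
                           unsigned long queries,
                           unsigned long interval_ms)
{
    assert(interval_ms > 0);
    return records_per_query * queries * 1000UL / interval_ms;
}

/* Lower bound, in seconds, on the time needed to learn `entries`
 * entries at the given rate. */
double sfn_min_drain_secs(unsigned long entries, unsigned long rate)
{
    assert(rate > 0);
    return (double)entries / (double)rate;
}
```

With one 64-record query every 100ms, `sfn_max_rate(64, 1, 100)` is 640 entries per second, and `sfn_min_drain_secs(500000, 640)` is roughly 781 seconds, in line with the 870 seconds measured before the change. Ten queries at the same interval would raise the bound tenfold; rescheduling with zero delay while records remain removes the interval altogether, so the bound is then set by how quickly the device answers the queries.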
-
Randy Dunlap authored
When debugging via dprintk() is not enabled, make the dprintk() macro an empty do-while loop, as is done in <linux/sunrpc/debug.h>. This fixes a gcc warning when -Wextra is set: ../net/llc/af_llc.c:974:51: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] I have verified that there is no object code change (with gcc 7.5.0). Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: netdev@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
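A user-space sketch of the macro pattern, with printf() standing in for printk(). The SKETCH_DEBUG switch and checked_len() are hypothetical, made up for illustration. When the disabled macro expands to nothing, `if (cond) dprintk(...);` becomes `if (cond) ;`, which is exactly what -Wempty-body flags; expanding to an empty do-while keeps a real statement there.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical debug switch for this sketch; set to 1 to print. */
#define SKETCH_DEBUG 0

#if SKETCH_DEBUG
#define dprintk(...) printf(__VA_ARGS__)
#else
/* Expands to a real, empty statement that still requires the trailing
 * ';', so an if body is never empty. The arguments are discarded and
 * never evaluated. */
#define dprintk(...) do {} while (0)
#endif

int checked_len(int len)
{
    if (len < 0)
        dprintk("negative length %d\n", len);
    return len;
}
```

The do-while form is the usual kernel idiom because it behaves like a single statement in every context, including an unbraced if/else.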
-
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
David S. Miller authored
Saeed Mahameed says: ==================== mlx5-updates-2020-02-25 The following series provides some misc updates to the mlx5 driver: 1) From Maxim, refactoring of the mlx5e netdev channel recreation flow: - Add error handling - Add context to the preactivate hook - Use the preactivate hook with context where it can be used, and subsequently unify the channel recreation flow everywhere. - Fix XPS cpumask to not reset upon channel recreation. 2) From Tariq: - Use indirect calls wrapper on RX. - Check LRO capability bit 3) Multiple small cleanups ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Hans Wippel says: ==================== net/smc: improve peer ID in CLC decline The following two patches improve the peer ID in CLC decline messages if RoCE devices are present in the host but no suitable device is found for a connection. The first patch reworks the peer ID initialization. The second patch contains the actual changes of the CLC decline messages. Changes v1 -> v2: * make smc_ib_is_valid_local_systemid() static in first patch * changed the if statement in smc_clc_send_decline() to remove curly braces Changes RFC -> v1: * split the patch into two parts * removed zero assignment to global variable (thanks Leon) Thanks to Leon Romanovsky and Karsten Graul for the feedback! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hans Wippel authored
According to RFC 7609, all CLC messages contain a peer ID that consists of a unique instance ID and the MAC address of one of the host's RoCE devices. But if an SMC-R connection cannot be established, e.g., because no matching pnet table entry is found, the current implementation uses a zero value in the CLC decline message even though the host's peer ID is set to a proper value. If neither a RoCE nor an ISM device is usable for a connection, there is no LGR, and the LGR check in smc_clc_send_decline() prevents the peer ID from being copied into the CLC decline message for both SMC-D and SMC-R. So, this patch modifies the check to also accept the case of no LGR. Also, only a valid peer ID is copied into the decline message. Signed-off-by: Hans Wippel <ndev@hwipl.net> Signed-off-by: David S. Miller <davem@davemloft.net>
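The revised decline check can be sketched as a pair of boolean predicates. This is a model of the logic as the message describes it, not the smc_clc.c code: struct conn_state, the function names, and the 8-byte peer ID length (2-byte instance ID plus 6-byte MAC) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PEER_ID_LEN 8  /* assumed: 2-byte instance ID + 6-byte MAC */

/* Hypothetical stand-in for the connection state the check looks at. */
struct conn_state {
    bool has_lgr;      /* a link group (LGR) exists for this connection */
    bool lgr_is_smcd;  /* ...and it is an SMC-D link group */
};

/* Before: copy the peer ID only when an SMC-R link group exists. */
bool copy_peer_id_old(const struct conn_state *c)
{
    return c->has_lgr && !c->lgr_is_smcd;
}

/* After: also accept "no link group", but only with a valid local ID. */
bool copy_peer_id_new(const struct conn_state *c, bool local_id_valid)
{
    return (!c->has_lgr || !c->lgr_is_smcd) && local_id_valid;
}

/* Fills the peer ID field of a decline message; zero unless copied. */
void fill_decline_peer_id(unsigned char out[PEER_ID_LEN],
                          const unsigned char local[PEER_ID_LEN],
                          const struct conn_state *c, bool local_id_valid)
{
    memset(out, 0, PEER_ID_LEN);
    if (copy_peer_id_new(c, local_id_valid))
        memcpy(out, local, PEER_ID_LEN);
}
```

With no link group, the old predicate never copies the ID, while the new one copies it whenever the local peer ID is valid; an existing SMC-D link group still gets a zero peer ID in both versions.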
-
Hans Wippel authored
This patch initializes the peer ID to a random instance ID and a zero MAC address. If a RoCE device is present in the host, the MAC address part of the peer ID is overwritten with that device's address. Also, a function is added that checks whether the peer ID is valid. A peer ID is considered valid if its MAC address part contains a non-zero MAC address. Signed-off-by: Hans Wippel <ndev@hwipl.net> Signed-off-by: David S. Miller <davem@davemloft.net>
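A minimal sketch of the initialization and validity rule described above. The layout (2-byte instance ID followed by a 6-byte MAC) and all names are assumptions for illustration, and rand() stands in for a proper kernel random source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define INSTANCE_ID_LEN 2  /* assumed layout: instance ID first... */
#define MAC_LEN 6          /* ...followed by a RoCE MAC address */
#define PEER_ID_LEN (INSTANCE_ID_LEN + MAC_LEN)

/* Random instance ID, zero MAC. rand() is only a stand-in RNG. */
void peer_id_init(unsigned char id[PEER_ID_LEN])
{
    for (int i = 0; i < INSTANCE_ID_LEN; i++)
        id[i] = (unsigned char)(rand() & 0xff);
    memset(id + INSTANCE_ID_LEN, 0, MAC_LEN);
}

/* Called when a RoCE device is found: overwrite the MAC part. */
void peer_id_set_mac(unsigned char id[PEER_ID_LEN],
                     const unsigned char mac[MAC_LEN])
{
    memcpy(id + INSTANCE_ID_LEN, mac, MAC_LEN);
}

/* Valid only if the MAC part is non-zero. */
bool peer_id_is_valid(const unsigned char id[PEER_ID_LEN])
{
    for (int i = INSTANCE_ID_LEN; i < PEER_ID_LEN; i++)
        if (id[i] != 0)
            return true;
    return false;
}
```

The instance ID alone never makes the peer ID valid; validity depends only on whether a RoCE MAC has been filled in.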
-
Arjun Roy authored
TCP receive zerocopy currently does not update the returned optlen for getsockopt() if the user passed in a larger than expected value. Thus, userspace cannot properly determine if all the fields are set in the passed-in struct. This patch sets the optlen for this case before returning, in keeping with the expected operation of getsockopt(). Fixes: c8856c05 ("tcp-zerocopy: Return inq along with tcp receive zerocopy.") Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
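The getsockopt() contract at issue is that optlen is a value-result argument. The sketch below models the copy-out step in user space. struct zc_args is a hypothetical stand-in for struct tcp_zerocopy_receive, its field list is illustrative only, and zc_copy_out() is not a kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result struct; the field list is illustrative only. */
struct zc_args {
    uint64_t address;
    uint32_t length;
    uint32_t recv_skip_hint;
    uint32_t inq;
};

/* getsockopt()-style copy-out: the caller passes its buffer size in
 * *optlen. At most sizeof(struct zc_args) bytes are written, and if the
 * caller's buffer was larger, *optlen is shrunk to the number of bytes
 * actually filled in (the behavior the fix adds). */
int zc_copy_out(void *optval, int *optlen, const struct zc_args *res)
{
    const size_t min_len = offsetof(struct zc_args, length) +
                           sizeof(res->length);
    int len = *optlen;

    if (len < 0 || (size_t)len < min_len)
        return -EINVAL;
    if ((size_t)len > sizeof(*res)) {
        len = (int)sizeof(*res);
        *optlen = len;
    }
    memcpy(optval, res, (size_t)len);
    return 0;
}
```

A caller that passes a 64-byte buffer gets back an optlen equal to sizeof(struct zc_args), so it can tell which trailing fields were written; before the fix it would have seen its own 64 unchanged.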
-
David S. Miller authored
Christian Brauner says: ==================== net: fix sysfs permissions when device changes network
/* v7 */ This is v7 with a build warning fixup that slipped past me the last time. It removes two unused variables in sysfs_group_change_owner(). I observed no warning when building just now.
/* v6 */ This is v6 with two small fixups. I missed adapting the commit message to reflect the renamed helper for changing the owner of sysfs files, and I also forgot to make the new dpm helper static inline.
/* v5 */ This is v5 with a small fixup requested by Rafael.
/* v4 */ This is v4 with more documentation and other fixes that Greg requested.
/* v3 */ This is v3 with explicit uid and gid parameters added to functions that change sysfs object ownership, as Greg requested.
(I've tagged this with net-next since it's triggered by a bug for network device files, but it also touches driver core aspects, so it's not clear-cut. I can of course split this series into separate patchsets.)
We have been struggling with a bug, reported by multiple users, surrounding the ownership of network device sysfs files when moving network devices between network namespaces owned by different user namespaces. Currently, when moving network devices between network namespaces, the ownership of the corresponding sysfs entries is not changed. This leads to problems when tools try to operate on the corresponding sysfs files. It also causes a bug when creating a network device in a network namespace owned by a user namespace and moving that network device back to the host network namespace: when a network device is created in a network namespace, it will be owned by the root user of the user namespace, and all its associated sysfs files will also be owned by the root user of the corresponding user namespace. If such a network device has to be moved back to the host network namespace, the permissions will still be set to the root user of the user namespace that owns the originating network namespace.
This means unprivileged users can, e.g., re-trigger uevents for such incorrectly owned devices on the host or in other network namespaces. They can also modify the settings of the device itself through sysfs when they wouldn't be able to do the same through netlink. Both of these things are unwanted. For example, quite a few workloads will create network devices in the host network namespace. Other tools will then proceed to move such devices between network namespaces owned by other user namespaces. While the ownership of the device itself is updated in net/core/dev.c:dev_change_net_namespace(), the corresponding sysfs entry for the device is not. Below is the result of moving a network device (here a veth device) from one network namespace into another network namespace owned by a different user namespace with a different id mapping. As you can see, the permissions are wrong even though the device is owned by the userns root user after it has been moved and can be interacted with through netlink:
    drwxr-xr-x 5 nobody nobody 0 Jan 25 18:08 .
    drwxr-xr-x 9 nobody nobody 0 Jan 25 18:08 ..
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 addr_assign_type
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 addr_len
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 address
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 broadcast
    -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_changes
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_down_count
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_up_count
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dev_id
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dev_port
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dormant
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 duplex
    -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 flags
    -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 gro_flush_timeout
    -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 ifalias
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 ifindex
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 iflink
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 link_mode
    -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 mtu
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 name_assign_type
    -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 netdev_group
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 operstate
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_port_id
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_port_name
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_switch_id
    drwxr-xr-x 2 nobody nobody 0 Jan 25 18:09 power
    -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 proto_down
    drwxr-xr-x 4 nobody nobody 0 Jan 25 18:09 queues
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 speed
    drwxr-xr-x 2 nobody nobody 0 Jan 25 18:09 statistics
    lrwxrwxrwx 1 nobody nobody 0 Jan 25 18:08 subsystem -> ../../../../class/net
    -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 tx_queue_len
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 type
    -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:08 uevent
Contrast this with creating a device of the same type in the network namespace directly. In this case the device's sysfs permissions will be correctly updated. (Please also note that in a lot of workloads this strategy of creating the network device directly in the network namespace to work around this issue cannot be used, either because the network device is dedicated after it has been created or because it is used by a process that is heavily sandboxed and couldn't create network devices itself.):
    drwxr-xr-x 5 root root 0 Jan 25 18:12 .
    drwxr-xr-x 9 nobody nobody 0 Jan 25 18:08 ..
    -r--r--r-- 1 root root 4096 Jan 25 18:12 addr_assign_type
    -r--r--r-- 1 root root 4096 Jan 25 18:12 addr_len
    -r--r--r-- 1 root root 4096 Jan 25 18:12 address
    -r--r--r-- 1 root root 4096 Jan 25 18:12 broadcast
    -rw-r--r-- 1 root root 4096 Jan 25 18:12 carrier
    -r--r--r-- 1 root root 4096 Jan 25 18:12 carrier_changes
    -r--r--r-- 1 root root 4096 Jan 25 18:12 carrier_down_count
    -r--r--r-- 1 root root 4096 Jan 25 18:12 carrier_up_count
    -r--r--r-- 1 root root 4096 Jan 25 18:12 dev_id
    -r--r--r-- 1 root root 4096 Jan 25 18:12 dev_port
    -r--r--r-- 1 root root 4096 Jan 25 18:12 dormant
    -r--r--r-- 1 root root 4096 Jan 25 18:12 duplex
    -rw-r--r-- 1 root root 4096 Jan 25 18:12 flags
    -rw-r--r-- 1 root root 4096 Jan 25 18:12 gro_flush_timeout
    -rw-r--r-- 1 root root 4096 Jan 25 18:12 ifalias
    -r--r--r-- 1 root root 4096 Jan 25 18:12 ifindex
    -r--r--r-- 1 root root 4096 Jan 25 18:12 iflink
    -r--r--r-- 1 root root 4096 Jan 25 18:12 link_mode
    -rw-r--r-- 1 root root 4096 Jan 25 18:12 mtu
    -r--r--r-- 1 root root 4096 Jan 25 18:12 name_assign_type
    -rw-r--r-- 1 root root 4096 Jan 25 18:12 netdev_group
    -r--r--r-- 1 root root 4096 Jan 25 18:12 operstate
    -r--r--r-- 1 root root 4096 Jan 25 18:12 phys_port_id
    -r--r--r-- 1 root root 4096 Jan 25 18:12 phys_port_name
    -r--r--r-- 1 root root 4096 Jan 25 18:12 phys_switch_id
    drwxr-xr-x 2 root root 0 Jan 25 18:12 power
    -rw-r--r-- 1 root root 4096 Jan 25 18:12 proto_down
    drwxr-xr-x 4 root root 0 Jan 25 18:12 queues
    -r--r--r-- 1 root root 4096 Jan 25 18:12 speed
    drwxr-xr-x 2 root root 0 Jan 25 18:12 statistics
    lrwxrwxrwx 1 nobody nobody 0 Jan 25 18:12 subsystem -> ../../../../class/net
    -rw-r--r-- 1 root root 4096 Jan 25 18:12 tx_queue_len
    -r--r--r-- 1 root root 4096 Jan 25 18:12 type
    -rw-r--r-- 1 root root 4096 Jan 25 18:12 uevent
Now, when creating a network device in a network namespace owned by a user namespace and moving it to the host, the permissions will be set to the id that the user namespace root user has been mapped to on the host, leading to all sorts of permission issues mentioned above:
    458752 drwxr-xr-x 5 458752 458752 0 Jan 25 18:12 .
    drwxr-xr-x 9 root root 0 Jan 25 18:08 ..
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 addr_assign_type
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 addr_len
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 address
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 broadcast
    -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier_changes
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier_down_count
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier_up_count
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 dev_id
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 dev_port
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 dormant
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 duplex
    -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 flags
    -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 gro_flush_timeout
    -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 ifalias
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 ifindex
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 iflink
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 link_mode
    -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 mtu
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 name_assign_type
    -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 netdev_group
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 operstate
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 phys_port_id
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 phys_port_name
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 phys_switch_id
    drwxr-xr-x 2 458752 458752 0 Jan 25 18:12 power
    -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 proto_down
    drwxr-xr-x 4 458752 458752 0 Jan 25 18:12 queues
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 speed
    drwxr-xr-x 2 458752 458752 0 Jan 25 18:12 statistics
    lrwxrwxrwx 1 root root 0 Jan 25 18:12 subsystem -> ../../../../class/net
    -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 tx_queue_len
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 type
    -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 uevent
Fix this by changing the ownership of the basic sysfs files associated with network devices when moving them between network namespaces. To this end we add some infrastructure to sysfs. The patchset takes care to only do this when the owning user namespace changes and the kids (kernel ids) differ. So there is only a performance overhead when the owning user namespace of the network namespace is different __and__ the kid mappings for the root user are different for the two user namespaces: Assume we have a netdev eth0 which we create in netns1 owned by userns1. userns1 has an id mapping of 0 100000 100000. Now we move eth0 into netns2, which is owned by userns2, which also defines an id mapping of 0 100000 100000. In this case sysfs doesn't need updating. The patch will handle this case and not do any needless work. Now assume eth0 is moved into netns3, which is owned by userns3, which defines an id mapping of 0 123456 65536. In this case the root user in each namespace corresponds to a different kid and sysfs needs updating. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
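The "only when the kids differ" rule can be checked against the worked example with a small model. This is a simplified sketch, not the kernel's code: it looks only at uid maps, assumes each user namespace has a single map extent written as `first lower count` (the format of /proc/<pid>/uid_map), and struct id_extent, map_to_kid() and needs_chown() are hypothetical names. The real series also considers whether the owning user namespace changed at all, and may compare gids as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One uid_map line "first lower count": IDs [first, first + count)
 * inside the namespace map to kernel IDs [lower, lower + count). */
struct id_extent {
    uint32_t first;
    uint32_t lower;
    uint32_t count;
};

#define INVALID_KID UINT32_MAX  /* stand-in for "unmapped" */

uint32_t map_to_kid(const struct id_extent *e, uint32_t id)
{
    if (id >= e->first && id - e->first < e->count)
        return e->lower + (id - e->first);
    return INVALID_KID;
}

/* sysfs ownership needs fixing only if the namespace root user maps
 * to different kernel IDs in the old and new owning namespaces. */
bool needs_chown(const struct id_extent *old_ns,
                 const struct id_extent *new_ns)
{
    return map_to_kid(old_ns, 0) != map_to_kid(new_ns, 0);
}
```

Encoding the example: userns1 and userns2 both use `0 100000 100000`, so root maps to kid 100000 in both and no update is needed; userns3 uses `0 123456 65536`, so root maps to kid 123456 and moving eth0 from netns2 to netns3 requires updating sysfs.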
-
Christian Brauner authored
Now that all the helpers are in place, make use of netdev_change_owner() to fix up the permissions when moving network devices between network namespaces. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christian Brauner authored
Add a function to change the owner of the queue entries for a network device when it is moved between network namespaces. Currently, when moving network devices between network namespaces, the ownership of the corresponding queue sysfs entries is not changed. This leads to problems when tools try to operate on the corresponding sysfs files. Fix this. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christian Brauner authored
Add a function to change the owner of a network device when it is moved between network namespaces. Currently, when moving network devices between network namespaces, the ownership of the corresponding sysfs entries is not changed. This leads to problems when tools try to operate on the corresponding sysfs files. It also leads to a bug whereby a network device that is created in a network namespace owned by a user namespace will have its corresponding sysfs entry owned by the root user of the corresponding user namespace. If such a network device has to be moved back to the host network namespace, the permissions will still be set to that user namespace's root user. This means unprivileged users can, e.g., trigger uevents for such incorrectly owned devices. They can also modify the settings of the device itself. Both of these things are unwanted. For example, workloads will create network devices in the host network namespace. Other tools will then proceed to move such devices between network namespaces owned by other user namespaces. While the ownership of the device itself is updated in net/core/dev.c:dev_change_net_namespace(), the corresponding sysfs entry for the device is not:
    drwxr-xr-x 5 nobody nobody 0 Jan 25 18:08 .
    drwxr-xr-x 9 nobody nobody 0 Jan 25 18:08 ..
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 addr_assign_type
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 addr_len
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 address
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 broadcast
    -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_changes
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_down_count
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_up_count
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dev_id
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dev_port
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dormant
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 duplex
    -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 flags
    -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 gro_flush_timeout
    -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 ifalias
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 ifindex
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 iflink
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 link_mode
    -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 mtu
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 name_assign_type
    -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 netdev_group
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 operstate
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_port_id
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_port_name
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_switch_id
    drwxr-xr-x 2 nobody nobody 0 Jan 25 18:09 power
    -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 proto_down
    drwxr-xr-x 4 nobody nobody 0 Jan 25 18:09 queues
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 speed
    drwxr-xr-x 2 nobody nobody 0 Jan 25 18:09 statistics
    lrwxrwxrwx 1 nobody nobody 0 Jan 25 18:08 subsystem -> ../../../../class/net
    -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 tx_queue_len
    -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 type
    -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:08 uevent
However, if a device is created directly in the network namespace, then the device's sysfs permissions will be correctly updated:
    drwxr-xr-x 5 root root 0 Jan 25 18:12 .
    drwxr-xr-x 9 nobody nobody 0 Jan 25 18:08 ..
    -r--r--r-- 1 root root 4096 Jan 25 18:12 addr_assign_type
    -r--r--r-- 1 root root 4096 Jan 25 18:12 addr_len
    -r--r--r-- 1 root root 4096 Jan 25 18:12 address
    -r--r--r-- 1 root root 4096 Jan 25 18:12 broadcast
    -rw-r--r-- 1 root root 4096 Jan 25 18:12 carrier
    -r--r--r-- 1 root root 4096 Jan 25 18:12 carrier_changes
    -r--r--r-- 1 root root 4096 Jan 25 18:12 carrier_down_count
    -r--r--r-- 1 root root 4096 Jan 25 18:12 carrier_up_count
    -r--r--r-- 1 root root 4096 Jan 25 18:12 dev_id
    -r--r--r-- 1 root root 4096 Jan 25 18:12 dev_port
    -r--r--r-- 1 root root 4096 Jan 25 18:12 dormant
    -r--r--r-- 1 root root 4096 Jan 25 18:12 duplex
    -rw-r--r-- 1 root root 4096 Jan 25 18:12 flags
    -rw-r--r-- 1 root root 4096 Jan 25 18:12 gro_flush_timeout
    -rw-r--r-- 1 root root 4096 Jan 25 18:12 ifalias
    -r--r--r-- 1 root root 4096 Jan 25 18:12 ifindex
    -r--r--r-- 1 root root 4096 Jan 25 18:12 iflink
    -r--r--r-- 1 root root 4096 Jan 25 18:12 link_mode
    -rw-r--r-- 1 root root 4096 Jan 25 18:12 mtu
    -r--r--r-- 1 root root 4096 Jan 25 18:12 name_assign_type
    -rw-r--r-- 1 root root 4096 Jan 25 18:12 netdev_group
    -r--r--r-- 1 root root 4096 Jan 25 18:12 operstate
    -r--r--r-- 1 root root 4096 Jan 25 18:12 phys_port_id
    -r--r--r-- 1 root root 4096 Jan 25 18:12 phys_port_name
    -r--r--r-- 1 root root 4096 Jan 25 18:12 phys_switch_id
    drwxr-xr-x 2 root root 0 Jan 25 18:12 power
    -rw-r--r-- 1 root root 4096 Jan 25 18:12 proto_down
    drwxr-xr-x 4 root root 0 Jan 25 18:12 queues
    -r--r--r-- 1 root root 4096 Jan 25 18:12 speed
    drwxr-xr-x 2 root root 0 Jan 25 18:12 statistics
    lrwxrwxrwx 1 nobody nobody 0 Jan 25 18:12 subsystem -> ../../../../class/net
    -rw-r--r-- 1 root root 4096 Jan 25 18:12 tx_queue_len
    -r--r--r-- 1 root root 4096 Jan 25 18:12 type
    -rw-r--r-- 1 root root 4096 Jan 25 18:12 uevent
Now, when creating a network device in a network namespace owned by a user namespace and moving it to the host, the permissions will be set to the id that the user namespace root user has been mapped to on the host, leading to all sorts of permission issues:
    458752 drwxr-xr-x 5 458752 458752 0 Jan 25 18:12 .
    drwxr-xr-x 9 root root 0 Jan 25 18:08 ..
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 addr_assign_type
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 addr_len
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 address
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 broadcast
    -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier_changes
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier_down_count
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier_up_count
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 dev_id
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 dev_port
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 dormant
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 duplex
    -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 flags
    -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 gro_flush_timeout
    -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 ifalias
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 ifindex
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 iflink
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 link_mode
    -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 mtu
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 name_assign_type
    -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 netdev_group
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 operstate
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 phys_port_id
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 phys_port_name
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 phys_switch_id
    drwxr-xr-x 2 458752 458752 0 Jan 25 18:12 power
    -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 proto_down
    drwxr-xr-x 4 458752 458752 0 Jan 25 18:12 queues
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 speed
    drwxr-xr-x 2 458752 458752 0 Jan 25 18:12 statistics
    lrwxrwxrwx 1 root root 0 Jan 25 18:12 subsystem -> ../../../../class/net
    -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 tx_queue_len
    -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 type
    -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 uevent
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christian Brauner authored
Add a helper to change the owner of a device's power entries. This needs to happen when the ownership of a device is changed, e.g. when moving network devices between network namespaces, and the helper will be used to correctly account for such ownership changes. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christian Brauner authored
Add a helper to change the owner of a device's sysfs entries. This needs to happen when the ownership of a device is changed, e.g. when moving network devices between network namespaces, and the helper will be used to correctly account for such ownership changes. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christian Brauner authored
Add a helper to change the owner of sysfs objects. This function will be used to correctly account for kobject ownership changes, e.g. when moving network devices between network namespaces. This mirrors how a kobject is added through driver core, which in its guts is done via kobject_add_internal(). In summary, that function creates the main directory via create_dir(), populates that directory with the groups associated with the ktype of the kobject (if any), and populates the directory with the basic attributes associated with the ktype of the kobject (if any). These are the basic steps that are associated with adding a kobject in sysfs. Any additional properties are added by the specific subsystem itself (not by driver core) after it has registered the device. So for the example of network devices, a network device will e.g. register a queues subdirectory under the basic sysfs directory for the network device and then further subdirectories within that queues subdirectory. But that is all specific to network devices, and they call the corresponding sysfs functions to do that directly when they create those queue objects. So anything that a subsystem adds outside of what driver core does must also be changed by it (that's already true for the removal of files it created outside of driver core), and it's the same for ownership changes. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
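The split of responsibilities described above (driver core changes the owner of what driver core created, the subsystem changes the owner of what it added itself) can be illustrated with a toy model. Nothing here is the kernfs/sysfs API: struct entry, struct kobj_model and kobj_model_change_owner() are hypothetical, and ownership is reduced to a uid/gid pair per entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: every sysfs entry is just a name plus an owner. */
struct entry {
    const char *name;
    unsigned int uid, gid;
};

/* What driver core creates for a kobject: its directory plus the
 * groups and basic attributes that come from the kobject's ktype. */
struct kobj_model {
    struct entry dir;
    struct entry *groups;
    size_t ngroups;
    struct entry *attrs;
    size_t nattrs;
};

void set_owner(struct entry *e, unsigned int uid, unsigned int gid)
{
    e->uid = uid;
    e->gid = gid;
}

/* Changes the owner of exactly the entries driver core created, in the
 * same three steps used when adding the kobject, and returns how many
 * entries were updated. Anything a subsystem added afterwards (e.g. a
 * netdev's queues/ subdirectory) is deliberately left alone. */
size_t kobj_model_change_owner(struct kobj_model *k,
                               unsigned int uid, unsigned int gid)
{
    size_t n = 0;

    set_owner(&k->dir, uid, gid);
    n++;
    for (size_t i = 0; i < k->ngroups; i++, n++)
        set_owner(&k->groups[i], uid, gid);
    for (size_t i = 0; i < k->nattrs; i++, n++)
        set_owner(&k->attrs[i], uid, gid);
    return n;
}
```

In this model the subsystem would make its own, separate call for its extra entries, which is the role the queue and power helpers in this series play for network devices.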
-
Christian Brauner authored
Add helpers to change the owner of sysfs groups. These helpers will be used to correctly account for kobject ownership changes, e.g. when moving network devices between network namespaces. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christian Brauner authored
Add a helper to change the owner of a sysfs link. This function will be used to correctly account for kobject ownership changes, e.g. when moving network devices between network namespaces. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christian Brauner authored
Add helpers to change the owner of sysfs files. These helpers will be used to correctly account for kobject ownership changes, e.g. when moving network devices between network namespaces. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Gustavo A. R. Silva authored
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99:
    struct foo {
        int stuff;
        struct boo array[];
    };
By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kinds of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] Lastly, fix the following checkpatch warning:
    CHECK: Prefer kernel type 'u32' over 'u_int32_t'
    #61: FILE: drivers/net/ethernet/cisco/enic/vnic_devcmd.h:653:
    + u_int32_t val[];
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
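A small user-space sketch of the struct from the message in use. The contents of struct boo and the foo_alloc() helper are illustrative assumptions. The point is that sizeof(struct foo) covers only the fixed part, so the allocation must add room for the trailing elements explicitly. In-kernel code would more typically compute this size with the struct_size() helper from <linux/overflow.h>, which also guards against arithmetic overflow.

```c
#include <assert.h>
#include <stdlib.h>

struct boo {
    int x;  /* illustrative payload */
};

struct foo {
    int stuff;
    struct boo array[];  /* flexible array member: must be last */
};

/* Allocates a struct foo with room for n trailing elements.
 * sizeof(*f) covers only the fixed part, never the array. */
struct foo *foo_alloc(size_t n)
{
    struct foo *f = malloc(sizeof(*f) + n * sizeof(f->array[0]));

    if (f) {
        f->stuff = (int)n;
        for (size_t i = 0; i < n; i++)
            f->array[i].x = (int)i;
    }
    return f;
}
```

Because the array has no size of its own, the same allocation pattern works whether the old code used `array[0]` or the new `array[]`, which is why the conversion leaves dynamic allocations unaffected.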
-
Gustavo A. R. Silva authored
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99:
    struct foo {
        int stuff;
        struct boo array[];
    };
By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kinds of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Gustavo A. R. Silva authored
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99:
    struct foo {
        int stuff;
        struct boo array[];
    };
By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kinds of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Gustavo A. R. Silva authored
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kinds of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Martin Habets <mhabets@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Gustavo A. R. Silva authored
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kinds of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was detected with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
This reverts commit 7458bd54 ("net: dsa: bcm_sf2: Also configure Port 5 for 2Gb/sec on 7278") as it causes advanced congestion buffering issues with 7278 switch devices when using their internal Gigabit PHY. While this is being debugged, continue with conservative defaults that work and do not cause packet loss. Fixes: 7458bd54 ("net: dsa: bcm_sf2: Also configure Port 5 for 2Gb/sec on 7278") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 26 Feb, 2020 15 commits
-
-
Jiri Pirko authored
Looks like the iavf code actually experienced a race condition: a developer took the code before the check for chain 0 was moved into a helper. So use the tc_cls_can_offload_and_chain0() helper instead of a direct check, and move the check to _cb() so that this is similar to the i40e code. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
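The helper-based check can be sketched in plain C. The types and function names below are hypothetical simplifications of `struct net_device`, `struct flow_cls_common_offload` and `tc_cls_can_offload_and_chain0()`; the sketch only models the two conditions that helper is understood to test (hardware TC offload enabled, chain index 0) and the `-EOPNOTSUPP` a block callback would then return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_netdev {
	bool hw_tc_offload;	/* models NETIF_F_HW_TC being enabled */
};

struct sketch_cls_common {
	unsigned int chain_index;
	const char *extack_msg;	/* models the netlink extended ack message */
};

/* Hypothetical model of the shared helper: one place for both checks. */
static bool sketch_can_offload_and_chain0(const struct sketch_netdev *dev,
					  struct sketch_cls_common *common)
{
	if (!dev->hw_tc_offload)
		return false;
	if (common->chain_index != 0) {
		common->extack_msg = "Driver supports only offload of chain 0";
		return false;
	}
	return true;
}

/* Hypothetical block callback: the check lives in the _cb() itself. */
static int sketch_setup_tc_block_cb(const struct sketch_netdev *dev,
				    struct sketch_cls_common *common)
{
	if (!sketch_can_offload_and_chain0(dev, common))
		return -EOPNOTSUPP;
	/* ... hand the classifier to the hardware here ... */
	return 0;
}
```

Keeping the check inside the callback means every classifier type routed through it is filtered the same way, instead of each call site repeating its own chain test.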
-
Saeed Mahameed authored
Return NULL instead of 0. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
-
Saeed Mahameed authored
drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c:191:13: sparse: warning: incorrect type in assignment (different base types) Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
-
Nathan Chancellor authored
Clang warns: In file included from ../drivers/net/ethernet/mellanox/mlx5/core/main.c:73: ../drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.h:4:9: warning: '__MLX5_RSC_DUMP_H' is used as a header guard here, followed by #define of a different macro [-Wheader-guard] #ifndef __MLX5_RSC_DUMP_H ^~~~~~~~~~~~~~~~~ ../drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.h:5:9: note: '__MLX5_RSC_DUMP__H' is defined here; did you mean '__MLX5_RSC_DUMP_H'? #define __MLX5_RSC_DUMP__H ^~~~~~~~~~~~~~~~~~ __MLX5_RSC_DUMP_H 1 warning generated. Make them match to get the intended behavior and remove the warning. Fixes: 12206b17 ("net/mlx5: Add support for resource dump") Link: https://github.com/ClangBuiltLinux/linux/issues/897 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
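As an illustration of why the two spellings must match, this self-contained file simulates including a hypothetical header twice. With a consistent guard the second copy is skipped; with the mismatched `#define` from the warning above, the guard macro would never be defined, so the second copy would redefine the struct and the build would fail. The header and struct names are hypothetical.

```c
#include <assert.h>

/* First "inclusion" of a hypothetical header: the macro tested by
 * #ifndef and the macro defined by #define are spelled identically. */
#ifndef SKETCH_RSC_DUMP_H
#define SKETCH_RSC_DUMP_H

struct sketch_rsc_dump {
	int id;
};

#endif /* SKETCH_RSC_DUMP_H */

/* Second "inclusion", as when two headers both pull it in. The guard is
 * now defined, so this body is skipped. Had the #define above used a
 * different spelling (e.g. SKETCH_RSC_DUMP__H), this struct would be
 * redefined here and compilation would fail. */
#ifndef SKETCH_RSC_DUMP_H
#define SKETCH_RSC_DUMP_H

struct sketch_rsc_dump {
	int id;
};

#endif /* SKETCH_RSC_DUMP_H */

static int sketch_rsc_dump_id(const struct sketch_rsc_dump *d)
{
	return d->id;
}
```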
-
Hans Wippel authored
Fix a vxlan typo in the mlx5 driver documentation. Signed-off-by: Hans Wippel <ndev@hwipl.net> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Tariq Toukan authored
We can avoid an indirect call per compressed completion by wrapping the completion handling call with the appropriate helper. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Tariq Toukan authored
We can avoid an indirect call per NAPI cycle by wrapping the RX descriptors posting call with the appropriate helper. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
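Both commits rely on the kernel's indirect-call wrappers from `<linux/indirect_call_wrapper.h>`. The userspace sketch below re-creates the idea with GNU C statement expressions; the `SKETCH_*` macros and the two handlers are hypothetical stand-ins, not the mlx5e functions. When the function pointer matches a likely target, the call is made directly, which avoids the retpoline cost of an indirect branch (in the kernel, the comparison is only compiled in when retpolines are enabled).

```c
#include <assert.h>

/* GNU C only: statement expressions and __builtin_expect. */
#define sketch_likely(x) __builtin_expect(!!(x), 1)

/* Call f1 directly if f points at it, else fall back to the indirect call. */
#define SKETCH_INDIRECT_CALL_1(f, f1, ...) \
	({ sketch_likely((f) == (f1)) ? f1(__VA_ARGS__) : (f)(__VA_ARGS__); })

/* Same, checking two likely targets in order. */
#define SKETCH_INDIRECT_CALL_2(f, f2, f1, ...) \
	({ sketch_likely((f) == (f2)) ? f2(__VA_ARGS__) : \
		SKETCH_INDIRECT_CALL_1(f, f1, __VA_ARGS__); })

/* Two hypothetical completion handlers a receive queue might point at. */
static int handle_cqe_striding(int cqe) { return cqe * 2; }
static int handle_cqe_legacy(int cqe) { return cqe + 1; }

typedef int (*cqe_handler_t)(int cqe);

/* Per-completion call site: known targets become direct calls. */
static int process_cqe(cqe_handler_t handler, int cqe)
{
	return SKETCH_INDIRECT_CALL_2(handler, handle_cqe_striding,
				      handle_cqe_legacy, cqe);
}
```

The result is identical to calling `handler(cqe)`; only the branch the CPU executes changes, which matters on hot paths that run once per completion or per NAPI cycle.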
-
Maxim Mikityanskiy authored
The current steps that are performed when the trust state changes, if the channels are active: 1. The trust state is changed in hardware. 2. The new inline mode is calculated. 3. If the new inline mode is different, the channels are recreated using the new inline mode. This approach has some issues: 1. There is a time gap between changing the trust state in hardware and starting to send enough inline headers (the latter happens after recreation of channels). It leads to failed transmissions and error CQEs. 2. If the new channels fail to open, we'll be left with the old ones, but the hardware will be configured for the new trust state, so the interval when we can see TX errors never ends. This patch fixes the issues above by moving the trust state change into the preactivate hook that runs during the recreation of the channels when no channels are active, so it eliminates the gap of partially applied configuration. If the inline mode doesn't change with the change of the trust state, the channels won't be recreated, just like before this patch. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Maxim Mikityanskiy authored
Sometimes the preactivate hook of mlx5e_safe_switch_channels needs more parameters than just struct mlx5e_priv *. For such cases, a new parameter (void *context) is added to preactivate hooks. Some of the existing normal functions are currently used as preactivate callbacks. To avoid adding an extra unused parameter to them, they are wrapped automatically using the MLX5E_DEFINE_PREACTIVATE_WRAPPER_CTX macro. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
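A simplified sketch of the new hook signature and of a wrapper macro in the spirit of `MLX5E_DEFINE_PREACTIVATE_WRAPPER_CTX`. Everything here (`struct sketch_priv`, the macro and both hooks) is hypothetical and models only the calling convention, not the real mlx5e code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct mlx5e_priv. */
struct sketch_priv {
	int trust_state;
	int queues_updated;
};

/* Hook signature with an extra opaque context pointer. */
typedef int (*sketch_preactivate_t)(struct sketch_priv *priv, void *context);

/* Generate fn##_ctx(), which ignores context and forwards to fn(priv),
 * so existing one-argument functions need no extra parameter. */
#define SKETCH_DEFINE_PREACTIVATE_WRAPPER_CTX(fn)			\
	static int fn##_ctx(struct sketch_priv *priv, void *context)	\
	{								\
		(void)context;						\
		return fn(priv);					\
	}

/* An existing "normal" function that needs only priv. */
static int sketch_update_queues(struct sketch_priv *priv)
{
	priv->queues_updated = 1;
	return 0;
}
SKETCH_DEFINE_PREACTIVATE_WRAPPER_CTX(sketch_update_queues)

/* A hook that genuinely needs extra data, passed through context. */
static int sketch_set_trust_state(struct sketch_priv *priv, void *context)
{
	priv->trust_state = *(const int *)context;
	return 0;
}

/* Hypothetical caller: runs the hook with whatever context it was given. */
static int sketch_run_preactivate(struct sketch_priv *priv,
				  sketch_preactivate_t preactivate,
				  void *context)
{
	return preactivate ? preactivate(priv, context) : 0;
}
```

Usage: `sketch_run_preactivate(&priv, sketch_update_queues_ctx, NULL)` runs the wrapped legacy function, while `sketch_set_trust_state` receives its new value through the context pointer.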
-
Maxim Mikityanskiy authored
Currently, mlx5e_switch_priv_channels expects that the preactivate hook doesn't fail; however, it can fail, because it may set hardware parameters. This commit addresses this issue and provides a way to recover from failures of the preactivate hook: the old channels are not closed until the point where nothing can fail anymore, so in case preactivate fails, the driver can roll back to the old channels and activate them again. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
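The rollback ordering can be modeled with a toy channel structure. This is a hypothetical sketch, not `mlx5e_switch_priv_channels` itself: opening channels, updating queues and programming hardware are reduced to flags so that only the order of operations remains.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a channel set: an id plus open/active flags. */
struct sketch_channels {
	int id;
	bool open;
	bool active;
};

struct sketch_priv {
	struct sketch_channels cur;	/* channels currently in use */
	int last_closed_id;		/* most recently closed set, -1 if none */
};

typedef int (*sketch_preactivate_t)(struct sketch_priv *priv, void *context);

static int sketch_safe_switch(struct sketch_priv *priv,
			      struct sketch_channels *new_chs,
			      sketch_preactivate_t preactivate, void *context)
{
	int err;

	/* The new channels were opened earlier; nothing old is touched yet. */
	assert(new_chs->open && !new_chs->active);

	/* Stop the old channels, but keep them open for a possible rollback. */
	priv->cur.active = false;

	if (preactivate) {
		err = preactivate(priv, context);
		if (err) {
			/* Roll back: drop the new channels, revive the old ones. */
			new_chs->open = false;
			priv->cur.active = true;
			return err;
		}
	}

	/* Nothing can fail past this point: close old, activate new. */
	priv->last_closed_id = priv->cur.id;
	priv->cur = *new_chs;
	priv->cur.active = true;
	return 0;
}

/* Two hypothetical hooks used to exercise both paths. */
static int sketch_hook_ok(struct sketch_priv *priv, void *context)
{
	(void)priv;
	(void)context;
	return 0;
}

static int sketch_hook_fail(struct sketch_priv *priv, void *context)
{
	(void)priv;
	(void)context;
	return -EIO;	/* e.g. a hardware command failed */
}
```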
-
Maxim Mikityanskiy authored
The number of queues is now updated by mlx5e_update_netdev_queues in a centralized way, when no channels are active. Remove an extra occurrence of netif_set_real_num_tx_queues in preparation for the next commit. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Maxim Mikityanskiy authored
Currently, mlx5e notifies the kernel about the number of queues and sets the default XPS cpumasks when channels are activated. This implementation has several corner cases, in which the kernel may not be updated on time, or XPS cpumasks may be reset when not directly touched by the user. This commit fixes these corner cases to match the following expected behavior: 1. The number of queues always corresponds to the number of channels configured. 2. XPS cpumasks are set to the driver's defaults on netdev attach. 3. XPS cpumasks set by the user are not reset, unless the number of channels changes. If the number of channels changes, they are reset to the driver's defaults. (In the general case, when the number of channels increases or decreases, it's not possible to guess how to convert the current XPS cpumasks to work with the new number of channels, so we let the user reconfigure them if they change the number of channels.) XPS cpumasks are no longer stored per channel. Only one temporary cpumask is used. The old stored cpumasks didn't reflect the user's changes and were not used after applying them. A scratchpad area is added to struct mlx5e_priv. As cpumask_var_t requires allocation, and the preactivate hook can't fail, we need to preallocate the temporary cpumask in advance. It's stored in the scratchpad. Fixes: 149e566f ("net/mlx5e: Expand XPS cpumask to cover all online cpus") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
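A sketch of a default per-channel XPS mask and of rule 3 above. It assumes CPUs can be modeled as completion-vector indices in a 64-bit mask and ignores NUMA-aware spreading; the round-robin assignment is modeled loosely on the referenced commit 149e566f, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical default XPS mask for channel ix out of nch channels, with
 * CPUs modeled as vector indices 0..nvec-1. Channel ix takes vectors
 * ix, ix + nch, ix + 2*nch, ... so that all channels together cover
 * every vector.
 */
static uint64_t sketch_default_xps_mask(unsigned int ix, unsigned int nch,
					unsigned int nvec)
{
	uint64_t mask = 0;
	unsigned int v;

	assert(nch > 0 && ix < nch && nvec <= 64);
	for (v = ix; v < nvec; v += nch)
		mask |= UINT64_C(1) << v;
	return mask;
}

/* Rule 3: user-set masks survive unless the channel count changes. */
static int sketch_should_reset_xps(unsigned int old_nch, unsigned int new_nch)
{
	return old_nch != new_nch;
}
```

For example, with 3 channels and 8 vectors, channel 0 gets vectors {0, 3, 6}, channel 1 gets {1, 4, 7} and channel 2 gets {2, 5}.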
-
Maxim Mikityanskiy authored
mlx5e_ethtool_set_channels updates the indirection table before switching to the new channels. If the switch fails, the indirection table is new, but the channels are old, which is wrong. Fix it by using the preactivate hook of mlx5e_safe_switch_channels to update the indirection table at the stage when nothing can fail anymore. As the code that updates the indirection table is now encapsulated into a new function, use that function in the attach flow when the driver has to reduce the number of channels, and prepare the code for the next commit. Fixes: 85082dba ("net/mlx5e: Correctly handle RSS indirection table when changing number of channels") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
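The indirection table update can be sketched as follows. The default spread (entry `i` maps to channel `i % nch`) and the rule of leaving a user-configured table untouched are assumptions made for illustration; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed default spread: entry i points at channel i % nch. */
static void sketch_build_default_indir(unsigned int *indir, size_t len,
				       unsigned int nch)
{
	size_t i;

	assert(nch > 0);
	for (i = 0; i < len; i++)
		indir[i] = (unsigned int)(i % nch);
}

/*
 * Hypothetical preactivate-style update: it runs only once the new
 * channels are known to be usable, and it cannot fail (returns void).
 * A table the user configured explicitly is kept as-is.
 */
static void sketch_update_indir(unsigned int *indir, size_t len,
				unsigned int new_nch, bool user_configured)
{
	if (!user_configured)
		sketch_build_default_indir(indir, len, new_nch);
}
```

Because the update happens at the stage where nothing can fail, a failed channel switch leaves both the old channels and the old table in place, which keeps them consistent.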
-
Maxim Mikityanskiy authored
mlx5e_safe_switch_channels accepts a callback to be called before activating new channels. It is intended to configure some hardware parameters in cases where channels are recreated because some configuration has changed. Recently, this callback has started being used to update the driver's internal MLX5E_STATE_XDP_OPEN flag, and the following patches also intend to use this callback for software preparations. This patch renames the hw_modify callback to preactivate, so that the name fits better. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Maxim Mikityanskiy authored
As a preparation for one of the following commits, create a function to encapsulate the code that notifies the kernel about the new number of RX and TX queues. The code will be called multiple times in the next commit. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-