1. 27 Feb, 2020 29 commits
    • mptcp: add work queue skeleton · 80992017
      Paolo Abeni authored
      Will be extended with functionality in followup patches.
      Initial user is moving skbs from subflows receive queue to
      the mptcp-level receive queue.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: add and use mptcp_data_ready helper · 101f6f85
      Florian Westphal authored
      Allows us to schedule the work queue to drain the ssk receive queue in
      a followup patch.
      
      This is needed to avoid sending all-too-pessimistic mptcp-level
      acknowledgements.  At this time, the ack_seq is what was last read by
      userspace instead of the highest in-sequence number queued for reading.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlxsw-Small-driver-update' · 5cd129dd
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      mlxsw: Small driver update
      
      This patchset contains a couple of patches not related to each other. They
      are small optimization and extension changes to the driver.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Add mlxsw_sp_span_ops.buffsize_get for Spectrum-3 · 3b909c55
      Petr Machata authored
      The buffer factor on Spectrum-3 is larger than on Spectrum-2. Add a new
      callback and use it for mlxsw_sp->span_ops on Spectrum-3.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Initialize advertised speeds to supported speeds · b401ff85
      Ido Schimmel authored
      During port initialization the driver instructs the device to only
      advertise speeds that can be supported by the port's current width.
      
      Since the device now returns the supported speeds based on the port's
      current width, the driver no longer needs to compute the speeds that can
      be advertised.
      
      Simplify port initialization by setting the advertised speeds to the
      queried supported speeds.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Move the ECN-marked packet counter to ethtool · 8a29581e
      Petr Machata authored
      Spectrum-1 and Spectrum-2 do not have a per-TC counter of number of packets
      marked by ECN. The value reported currently is the total number of marked
      packets. Showing this value at individual TC Qdiscs is misleading.
      
      Move the counter to ethtool instead.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_switchdev: Optimize SFN records processing · 648e53ca
      Jiri Pirko authored
      Currently, only one SFN query is done from repetitive work at a time,
      processing 64 entries. Another work iteration is scheduled in 100ms,
      which means that the max rate of learned FDB entries is limited to 640/s.
      That is slow. Fix this with two optimizations:
      1) Run 10 SFN queries at a time.
      2) In case the SFN is not drained, schedule work with 0 delay to
         continue processing the rest of the records.
      
      On a testing setup with 500K entries the time to process decreased
      from 870secs to 10secs.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Tested-by: Alex Kushnarov <alexanderk@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
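The rate limit in the SFN commit above can be checked with a little arithmetic. The following is only a back-of-the-envelope sketch: the 64-entry batch size and 100 ms interval come from the commit message, while the function names are made up for illustration and do not correspond to anything in the driver.

```python
# Figures taken from the commit message.
ENTRIES_PER_QUERY = 64    # records processed by one SFN query
WORK_INTERVAL_MS = 100    # delay between two work iterations


def max_learn_rate(queries_per_iteration, interval_ms=WORK_INTERVAL_MS):
    """Upper bound on learned FDB entries per second (hypothetical helper)."""
    return queries_per_iteration * ENTRIES_PER_QUERY * 1000 / interval_ms


def min_drain_time(total_entries, rate_per_s):
    """Lower bound, in seconds, on the time to process total_entries."""
    return total_entries / rate_per_s


before = max_learn_rate(1)    # one query per iteration
batched = max_learn_rate(10)  # ten queries per iteration, same interval
print(before, batched, min_drain_time(500_000, before))
```

With one query per iteration the model gives a floor of roughly 780 seconds for 500K entries, the same order of magnitude as the reported 870 seconds. The second optimization (rescheduling with zero delay while records remain) removes the fixed interval altogether, so this simple model does not predict the reported 10 seconds.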
    • af_llc: fix if-statement empty body warning · c535f920
      Randy Dunlap authored
      When debugging via dprintk() is not enabled, make the dprintk()
      macro be an empty do-while loop, as is done in
      <linux/sunrpc/debug.h>.
      
      This fixes a gcc warning when -Wextra is set:
      ../net/llc/af_llc.c:974:51: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
      
      I have verified that there is no object code change (with gcc 7.5.0).
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: netdev@vger.kernel.org
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-updates-2020-02-25' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 165b94ff
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2020-02-25
      
      The following series provides some misc updates to mlx5 driver:
      
      1) From Maxim, Refactoring for mlx5e netdev channels recreation flow.
        - Add error handling
        - Add context to the preactivate hook
        - Use preactivate hook with context where it can be used
          and subsequently unify channel recreation flow everywhere.
        - Fix XPS cpumask to not reset upon channel recreation.
      
      2) From Tariq:
        - Use indirect calls wrapper on RX.
        - Check LRO capability bit
      
      3) Multiple small cleanups
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-smc-improve-peer-ID-in-CLC-decline' · 06baf4be
      David S. Miller authored
      Hans Wippel says:
      
      ====================
      net/smc: improve peer ID in CLC decline
      
      The following two patches improve the peer ID in CLC decline messages if
      RoCE devices are present in the host but no suitable device is found for
      a connection. The first patch reworks the peer ID initialization. The
      second patch contains the actual changes of the CLC decline messages.
      
      Changes v1 -> v2:
      * make smc_ib_is_valid_local_systemid() static in first patch
      * changed if in smc_clc_send_decline() to remove curly braces
      
      Changes RFC -> v1:
      * split the patch into two parts
      * removed zero assignment to global variable (thanks Leon)
      
      Thanks to Leon Romanovsky and Karsten Graul for the feedback!
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: improve peer ID in CLC decline for SMC-R · a082ec89
      Hans Wippel authored
      According to RFC 7609, all CLC messages contain a peer ID that consists
      of a unique instance ID and the MAC address of one of the host's RoCE
      devices. But if an SMC-R connection cannot be established, e.g., because
      no matching pnet table entry is found, the current implementation uses a
      zero value in the CLC decline message although the host's peer ID is set
      to a proper value.
      
      If no RoCE and no ISM device is usable for a connection, there is no LGR
      and the LGR check in smc_clc_send_decline() prevents the peer ID from
      being copied into the CLC decline message for both SMC-D and SMC-R. So, this
      patch modifies the check to also accept the case of no LGR. Also, only a
      valid peer ID is copied into the decline message.
      Signed-off-by: Hans Wippel <ndev@hwipl.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: rework peer ID handling · 366bb249
      Hans Wippel authored
      This patch initializes the peer ID to a random instance ID and a zero
      MAC address. If a RoCE device is in the host, the MAC address part of
      the peer ID is overwritten with the respective address. Also, a function
      for checking if the peer ID is valid is added. A peer ID is considered
      valid if the MAC address part contains a non-zero MAC address.
      Signed-off-by: Hans Wippel <ndev@hwipl.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
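The peer ID handling described in the commit above can be modelled in a few lines. This is a rough sketch, not the kernel code: it assumes an 8-byte peer ID made of a 2-byte instance ID followed by a 6-byte MAC address, and all function names are hypothetical.

```python
import os

INSTANCE_ID_LEN = 2  # assumed width of the instance ID part
MAC_LEN = 6          # Ethernet MAC address length


def init_peer_id():
    """Random instance ID followed by an all-zero MAC address."""
    return bytearray(os.urandom(INSTANCE_ID_LEN) + bytes(MAC_LEN))


def set_mac(peer_id, mac):
    """Overwrite the MAC part once a RoCE device is present."""
    if len(mac) != MAC_LEN:
        raise ValueError("MAC address must be 6 bytes")
    peer_id[INSTANCE_ID_LEN:] = mac


def peer_id_is_valid(peer_id):
    """A peer ID counts as valid if its MAC part is non-zero."""
    return any(peer_id[INSTANCE_ID_LEN:])


peer_id = init_peer_id()
print(peer_id_is_valid(peer_id))   # MAC part is still all zero here
set_mac(peer_id, bytes.fromhex("0200deadbeef"))
print(peer_id_is_valid(peer_id))
```

Keeping the validity check separate from initialization lets a caller such as the decline path decide whether the ID is worth copying into a message.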
    • tcp-zerocopy: Update returned getsockopt() optlen. · 0b7f41f6
      Arjun Roy authored
      TCP receive zerocopy currently does not update the returned optlen for
      getsockopt() if the user passed in a larger than expected value.
      Thus, userspace cannot properly determine if all the fields are set in
      the passed-in struct. This patch sets the optlen for this case before
      returning, in keeping with the expected operation of getsockopt().
      
      Fixes: c8856c05 ("tcp-zerocopy: Return inq along with tcp receive zerocopy.")
      Signed-off-by: Arjun Roy <arjunroy@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
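The next sketch illustrates why the returned optlen in the tcp-zerocopy commit above matters to userspace. It is a simplified model of the behaviour the commit describes, not of the real struct: the field names, offsets and sizes are invented for illustration, and only the case of a caller passing a larger-than-expected length is modelled.

```python
# Hypothetical struct layout as known to the kernel: name -> (offset, size).
KERNEL_LAYOUT = {"field_a": (0, 8), "field_b": (8, 4), "field_c": (12, 4)}
KERNEL_STRUCT_SIZE = 16

# Userspace built against a newer header that has one extra field.
USER_LAYOUT = dict(KERNEL_LAYOUT, field_d=(16, 4))
USER_STRUCT_SIZE = 20


def reported_optlen(user_len, kernel_size=KERNEL_STRUCT_SIZE):
    """Length written back to the caller: never more than the kernel filled in."""
    return min(user_len, kernel_size)


def fields_set(optlen, layout):
    """Fields that lie entirely within the first optlen bytes."""
    return [name for name, (off, size) in layout.items() if off + size <= optlen]


optlen = reported_optlen(USER_STRUCT_SIZE)
print(optlen, fields_set(optlen, USER_LAYOUT))
```

Without the updated length, the caller in this model would have no way to tell that `field_d` was never written.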
    • Merge branch 'net-fix-sysfs-permssions-when-device-changes-network' · ebb4a4bf
      David S. Miller authored
      Christian Brauner says:
      
      ====================
      net: fix sysfs permssions when device changes network
      
      /* v7 */
      This is v7 with a build warning fixup that slipped past me the last
      time. It removes two unused variables in sysfs_group_change_owner(). I
      observed no warning when building just now.
      
      /* v6 */
      This is v6 with two small fixups. I missed adapting the commit message
      to reflect the renamed helper for changing the owner of sysfs files and
      I also forgot to make the new dpm helper static inline.
      
      /* v5 */
      This is v5 with a small fixup requested by Rafael.
      
      /* v4 */
      This is v4 with more documentation and other fixes that Greg requested.
      
      /* v3 */
      This is v3 with explicit uid and gid parameters added to functions that
      change sysfs object ownership as Greg requested.
      
      (I've tagged this with net-next since it's triggered by a bug for
       network device files but it also touches driver core aspects so it's
       not clear-cut. I can of course split this series into separate
       patchsets.)
      We have been struggling with a bug, reported by multiple users,
      surrounding the ownership of network device sysfs files when moving
      network devices between network namespaces owned by different user
      namespaces.
      
      Currently, when moving network devices between network namespaces the
      ownership of the corresponding sysfs entries is not changed. This leads
      to problems when tools try to operate on the corresponding sysfs files.
      
      It also causes a bug when creating a network device in a network
      namespace owned by a user namespace and moving that network device back
      to the host network namespace. When a network device is created in a
      network namespace, it is owned by the root user of the owning user
      namespace, and all its associated sysfs files are also owned by the
      root user of that user namespace.
      If such a network device has to be moved back to the host network
      namespace the permissions will still be set to the root user of the
      owning user namespace of the originating network namespace. This means
      unprivileged users can e.g. re-trigger uevents for such incorrectly
      owned devices on the host or in other network namespaces. They can also
      modify the settings of the device itself through sysfs when they
      wouldn't be able to do the same through netlink. Both of these things
      are unwanted.
      
      For example, quite a few workloads will create network devices in the
      host network namespace. Other tools will then proceed to move such
      devices between network namespaces owned by other user namespaces. While
      the ownership of the device itself is updated in
      net/core/net-sysfs.c:dev_change_net_namespace() the corresponding sysfs
      entry for the device is not. Below is the result of moving a network
      device (here a veth device) from one network namespace into another
      network namespace owned by a different user namespace with a different
      id mapping. As you can see the permissions are wrong even though the
      device is owned by the userns root user after it has been moved and can
      be interacted with through netlink:
      
      drwxr-xr-x 5 nobody nobody    0 Jan 25 18:08 .
      drwxr-xr-x 9 nobody nobody    0 Jan 25 18:08 ..
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 addr_assign_type
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 addr_len
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 address
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 broadcast
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_changes
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_down_count
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_up_count
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dev_id
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dev_port
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dormant
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 duplex
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 flags
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 gro_flush_timeout
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 ifalias
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 ifindex
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 iflink
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 link_mode
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 mtu
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 name_assign_type
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 netdev_group
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 operstate
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_port_id
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_port_name
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_switch_id
      drwxr-xr-x 2 nobody nobody    0 Jan 25 18:09 power
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 proto_down
      drwxr-xr-x 4 nobody nobody    0 Jan 25 18:09 queues
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 speed
      drwxr-xr-x 2 nobody nobody    0 Jan 25 18:09 statistics
      lrwxrwxrwx 1 nobody nobody    0 Jan 25 18:08 subsystem -> ../../../../class/net
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 tx_queue_len
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 type
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:08 uevent
      
      Contrast this with creating a device of the same type in the network
      namespace directly. In this case the device's sysfs permissions will be
      correctly updated.
      (Please also note that in a lot of workloads this strategy of creating
       the network device directly in the network namespace to work around this
       issue cannot be used, either because the network device is dedicated
       after it has been created or because it is used by a process that is
       heavily sandboxed and couldn't create network devices itself.):
      
      drwxr-xr-x 5 root   root      0 Jan 25 18:12 .
      drwxr-xr-x 9 nobody nobody    0 Jan 25 18:08 ..
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 addr_assign_type
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 addr_len
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 address
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 broadcast
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 carrier
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 carrier_changes
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 carrier_down_count
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 carrier_up_count
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 dev_id
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 dev_port
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 dormant
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 duplex
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 flags
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 gro_flush_timeout
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 ifalias
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 ifindex
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 iflink
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 link_mode
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 mtu
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 name_assign_type
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 netdev_group
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 operstate
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 phys_port_id
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 phys_port_name
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 phys_switch_id
      drwxr-xr-x 2 root   root      0 Jan 25 18:12 power
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 proto_down
      drwxr-xr-x 4 root   root      0 Jan 25 18:12 queues
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 speed
      drwxr-xr-x 2 root   root      0 Jan 25 18:12 statistics
      lrwxrwxrwx 1 nobody nobody    0 Jan 25 18:12 subsystem -> ../../../../class/net
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 tx_queue_len
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 type
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 uevent
      
      Now, when creating a network device in a network namespace owned by a
      user namespace and moving it to the host the permissions will be set to
      the id that the user namespace root user has been mapped to on the host
      leading to all sorts of permission issues mentioned above:
      
      458752
      drwxr-xr-x 5 458752 458752      0 Jan 25 18:12 .
      drwxr-xr-x 9 root   root        0 Jan 25 18:08 ..
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 addr_assign_type
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 addr_len
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 address
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 broadcast
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 carrier
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 carrier_changes
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 carrier_down_count
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 carrier_up_count
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 dev_id
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 dev_port
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 dormant
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 duplex
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 flags
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 gro_flush_timeout
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 ifalias
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 ifindex
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 iflink
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 link_mode
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 mtu
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 name_assign_type
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 netdev_group
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 operstate
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 phys_port_id
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 phys_port_name
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 phys_switch_id
      drwxr-xr-x 2 458752 458752      0 Jan 25 18:12 power
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 proto_down
      drwxr-xr-x 4 458752 458752      0 Jan 25 18:12 queues
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 speed
      drwxr-xr-x 2 458752 458752      0 Jan 25 18:12 statistics
      lrwxrwxrwx 1 root   root        0 Jan 25 18:12 subsystem -> ../../../../class/net
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 tx_queue_len
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 type
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 uevent
      
      Fix this by changing the ownership of the basic sysfs files associated
      with network devices when moving them between network namespaces. To
      this end we add some infrastructure to sysfs.
      
      The patchset takes care to only do this when the owning user namespace
      changes and the kids (kernel ids, i.e. kuid and kgid) differ. So there's
      only a performance overhead when the owning user namespace of the
      network namespace is different __and__ the kid mappings for the root
      user are different for the two user namespaces:
      Assume we have a netdev eth0 which we create in netns1 owned by userns1.
      userns1 has an id mapping of 0 100000 100000. Now we move eth0 into
      netns2 which is owned by userns2 which also defines an id mapping of 0
      100000 100000. In this case sysfs doesn't need updating. The patch will
      handle this case and not do any needless work. Now assume eth0 is moved
      into netns3 which is owned by userns3 which defines an id mapping of 0
      123456 65536. In this case the root user in each namespace corresponds
      to a different kid and sysfs needs updating.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
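The eth0 example at the end of the cover letter above can be written out as a small check. This is a simplified model under stated assumptions: id mappings are given as (inside, outside, count) triples, only the root user (id 0) is compared, and the helper names are hypothetical rather than kernel functions.

```python
def map_id(mapping, ns_id):
    """Translate an id inside a user namespace to the id it maps to outside.

    Returns None if the id is not covered by any extent of the mapping.
    """
    for inside, outside, count in mapping:
        if inside <= ns_id < inside + count:
            return outside + (ns_id - inside)
    return None


def needs_sysfs_chown(old_mapping, new_mapping):
    """True if root in the two user namespaces maps to different outside ids."""
    return map_id(old_mapping, 0) != map_id(new_mapping, 0)


userns1 = [(0, 100000, 100000)]
userns2 = [(0, 100000, 100000)]
userns3 = [(0, 123456, 65536)]

print(needs_sysfs_chown(userns1, userns2))  # netns1 -> netns2
print(needs_sysfs_chown(userns1, userns3))  # netns1 -> netns3
```

Comparing the mapped ids rather than the namespaces themselves is what lets the first move skip the ownership update even though userns1 and userns2 are distinct namespaces.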
    • net: fix sysfs permssions when device changes network namespace · ef6a4c88
      Christian Brauner authored
      Now that all the helpers are in place, make use of netdev_change_owner()
      to fix up the permissions when moving network devices between network
      namespaces.
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-sysfs: add queue_change_owner() · d755407d
      Christian Brauner authored
      Add a function to change the owner of the queue entries for a network device
      when it is moved between network namespaces.
      
      Currently, when moving network devices between network namespaces the
      ownership of the corresponding queue sysfs entries is not changed. This leads
      to problems when tools try to operate on the corresponding sysfs files. Fix
      this.
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-sysfs: add netdev_change_owner() · e6dee9f3
      Christian Brauner authored
      Add a function to change the owner of a network device when it is moved
      between network namespaces.
      
      Currently, when moving network devices between network namespaces the
      ownership of the corresponding sysfs entries is not changed. This leads
      to problems when tools try to operate on the corresponding sysfs files.
      This leads to a bug whereby a network device that is created in a
      network namespace owned by a user namespace will have its corresponding
      sysfs entry owned by the root user of the corresponding user namespace.
      If such a network device has to be moved back to the host network
      namespace the permissions will still be set to the root user of that
      user namespace. This means unprivileged users can e.g. trigger uevents
      for such incorrectly owned devices. They can also modify the settings
      of the device itself. Both of these things are unwanted.
      
      For example, workloads will create network devices in the host network
      namespace. Other tools will then proceed to move such devices between
      network namespaces owned by other user namespaces. While the ownership
      of the device itself is updated in
      net/core/net-sysfs.c:dev_change_net_namespace() the corresponding sysfs
      entry for the device is not:
      
      drwxr-xr-x 5 nobody nobody    0 Jan 25 18:08 .
      drwxr-xr-x 9 nobody nobody    0 Jan 25 18:08 ..
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 addr_assign_type
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 addr_len
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 address
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 broadcast
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_changes
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_down_count
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_up_count
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dev_id
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dev_port
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dormant
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 duplex
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 flags
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 gro_flush_timeout
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 ifalias
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 ifindex
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 iflink
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 link_mode
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 mtu
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 name_assign_type
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 netdev_group
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 operstate
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_port_id
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_port_name
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_switch_id
      drwxr-xr-x 2 nobody nobody    0 Jan 25 18:09 power
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 proto_down
      drwxr-xr-x 4 nobody nobody    0 Jan 25 18:09 queues
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 speed
      drwxr-xr-x 2 nobody nobody    0 Jan 25 18:09 statistics
      lrwxrwxrwx 1 nobody nobody    0 Jan 25 18:08 subsystem -> ../../../../class/net
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 tx_queue_len
      -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 type
      -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:08 uevent
      
      However, if a device is created directly in the network namespace then
      the device's sysfs permissions will be correctly updated:
      
      drwxr-xr-x 5 root   root      0 Jan 25 18:12 .
      drwxr-xr-x 9 nobody nobody    0 Jan 25 18:08 ..
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 addr_assign_type
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 addr_len
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 address
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 broadcast
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 carrier
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 carrier_changes
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 carrier_down_count
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 carrier_up_count
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 dev_id
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 dev_port
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 dormant
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 duplex
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 flags
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 gro_flush_timeout
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 ifalias
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 ifindex
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 iflink
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 link_mode
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 mtu
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 name_assign_type
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 netdev_group
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 operstate
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 phys_port_id
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 phys_port_name
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 phys_switch_id
      drwxr-xr-x 2 root   root      0 Jan 25 18:12 power
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 proto_down
      drwxr-xr-x 4 root   root      0 Jan 25 18:12 queues
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 speed
      drwxr-xr-x 2 root   root      0 Jan 25 18:12 statistics
      lrwxrwxrwx 1 nobody nobody    0 Jan 25 18:12 subsystem -> ../../../../class/net
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 tx_queue_len
      -r--r--r-- 1 root   root   4096 Jan 25 18:12 type
      -rw-r--r-- 1 root   root   4096 Jan 25 18:12 uevent
      
      Now, when creating a network device in a network namespace owned by a
      user namespace and moving it to the host the permissions will be set to
      the id that the user namespace root user has been mapped to on the host
      leading to all sorts of permission issues:
      
      458752
      drwxr-xr-x 5 458752 458752      0 Jan 25 18:12 .
      drwxr-xr-x 9 root   root        0 Jan 25 18:08 ..
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 addr_assign_type
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 addr_len
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 address
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 broadcast
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 carrier
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 carrier_changes
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 carrier_down_count
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 carrier_up_count
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 dev_id
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 dev_port
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 dormant
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 duplex
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 flags
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 gro_flush_timeout
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 ifalias
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 ifindex
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 iflink
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 link_mode
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 mtu
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 name_assign_type
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 netdev_group
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 operstate
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 phys_port_id
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 phys_port_name
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 phys_switch_id
      drwxr-xr-x 2 458752 458752      0 Jan 25 18:12 power
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 proto_down
      drwxr-xr-x 4 458752 458752      0 Jan 25 18:12 queues
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 speed
      drwxr-xr-x 2 458752 458752      0 Jan 25 18:12 statistics
      lrwxrwxrwx 1 root   root        0 Jan 25 18:12 subsystem -> ../../../../class/net
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 tx_queue_len
      -r--r--r-- 1 458752 458752   4096 Jan 25 18:12 type
      -rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 uevent
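
      A listing like the one above can also be checked mechanically. The
      following is a minimal userspace sketch and not part of this commit:
      count_foreign_owned() is a hypothetical helper that counts directory
      entries whose owner differs from an expected uid, using only POSIX calls
      (opendir(), readdir(), fstatat()).

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Count the entries directly inside `dir` whose owning uid differs from
 * `expected`. Symlinks are examined themselves, not their targets.
 * Returns -1 if the directory cannot be opened.
 */
int count_foreign_owned(const char *dir, uid_t expected)
{
	DIR *d;
	struct dirent *de;
	struct stat st;
	int n = 0;

	assert(dir != NULL);

	d = opendir(dir);
	if (!d)
		return -1;

	while ((de = readdir(d)) != NULL) {
		/* Skip the directory itself and its parent. */
		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		if (fstatat(dirfd(d), de->d_name, &st, AT_SYMLINK_NOFOLLOW) == 0 &&
		    st.st_uid != expected)
			n++;
	}

	closedir(d);
	return n;
}
```

      Assumed usage: calling it on a device directory under /sys/class/net
      with an expected uid of 0 after moving the device back to the host would
      report how many entries still carry the mapped id.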
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e6dee9f3
    • Christian Brauner's avatar
      drivers/base/power: add dpm_sysfs_change_owner() · 3b52fc5d
      Christian Brauner authored
      Add a helper to change the owner of a device's power entries. This
      needs to happen when the ownership of a device is changed, e.g. when
      moving network devices between network namespaces.
      The helper will be used to correctly account for such ownership changes.
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: "Rafael J. Wysocki" <rafael@kernel.org>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3b52fc5d
    • Christian Brauner's avatar
      device: add device_change_owner() · b8f33e5d
      Christian Brauner authored
      Add a helper to change the owner of a device's sysfs entries. This
      needs to happen when the ownership of a device is changed, e.g. when
      moving network devices between network namespaces.
      The helper will be used to correctly account for such ownership changes.
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b8f33e5d
    • Christian Brauner's avatar
      sysfs: add sysfs_change_owner() · 2c4f9401
      Christian Brauner authored
      Add a helper to change the owner of sysfs objects.
      This function will be used to correctly account for kobject ownership
      changes, e.g. when moving network devices between network namespaces.
      
      This mirrors how a kobject is added through driver core, which in its guts
      is done via kobject_add_internal(). In summary, that function creates the
      main directory via create_dir(), populates that directory with the groups
      associated with the ktype of the kobject (if any), and populates the
      directory with the basic attributes associated with the ktype of the
      kobject (if any). These are the basic steps that are associated with
      adding a kobject in sysfs.
      Any additional properties are added by the specific subsystem itself (not by
      driver core) after it has registered the device. For example, a network
      device will register a queues subdirectory under its basic sysfs directory
      and then further subdirectories within that queues subdirectory. But that
      is all specific to network devices, and they call the corresponding sysfs
      functions to do that directly when they create those queue objects. So
      anything that a subsystem adds outside of what driver core does must also
      be changed by it (that is already true for removal of files it created
      outside of driver core), and it is the same for ownership changes.
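
      The split of responsibilities described above can be sketched with a toy
      model. This is plain userspace C written for illustration; every type and
      function name below (toy_node, toy_kobject, toy_netdev,
      toy_change_owner(), toy_netdev_change_owner()) is hypothetical and none
      of it is kernel code. It only shows the order of the walk: main
      directory, then groups, then basic attributes, with the subsystem fixing
      up whatever it added on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; none of these are kernel types. */
struct toy_node {
	unsigned int uid;
	unsigned int gid;
};

struct toy_kobject {
	struct toy_node dir;      /* main directory, as made by create_dir() */
	struct toy_node *groups;  /* groups tied to the ktype, may be NULL */
	size_t ngroups;
	struct toy_node *attrs;   /* basic attributes tied to the ktype, may be NULL */
	size_t nattrs;
};

struct toy_netdev {
	struct toy_kobject kobj;  /* what the (toy) driver core knows about */
	struct toy_node queues;   /* extra directory added by the subsystem */
};

static void toy_set_owner(struct toy_node *n, unsigned int uid, unsigned int gid)
{
	n->uid = uid;
	n->gid = gid;
}

/* Generic part: directory first, then groups, then basic attributes. */
void toy_change_owner(struct toy_kobject *k, unsigned int uid, unsigned int gid)
{
	size_t i;

	assert(k != NULL);
	toy_set_owner(&k->dir, uid, gid);
	for (i = 0; i < k->ngroups; i++)
		toy_set_owner(&k->groups[i], uid, gid);
	for (i = 0; i < k->nattrs; i++)
		toy_set_owner(&k->attrs[i], uid, gid);
}

/* Subsystem part: reuse the generic walk, then fix up what it added itself. */
void toy_netdev_change_owner(struct toy_netdev *nd, unsigned int uid, unsigned int gid)
{
	assert(nd != NULL);
	toy_change_owner(&nd->kobj, uid, gid);
	toy_set_owner(&nd->queues, uid, gid);
}
```

      If the subsystem skipped its own fix-up step, the queues node in this
      model would keep its old owner, which is the kind of leftover the commit
      message warns about.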
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2c4f9401
    • Christian Brauner's avatar
      sysfs: add sysfs_group{s}_change_owner() · 303a4276
      Christian Brauner authored
      Add helpers to change the owner of sysfs groups.
      These functions will be used to correctly account for kobject ownership
      changes, e.g. when moving network devices between network namespaces.
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      303a4276
    • Christian Brauner's avatar
      sysfs: add sysfs_link_change_owner() · 0666a3ae
      Christian Brauner authored
      Add a helper to change the owner of a sysfs link.
      This function will be used to correctly account for kobject ownership
      changes, e.g. when moving network devices between network namespaces.
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0666a3ae
    • Christian Brauner's avatar
      sysfs: add sysfs_file_change_owner() · f70ce185
      Christian Brauner authored
      Add a helper to change the owner of a sysfs file.
      This function will be used to correctly account for kobject ownership
      changes, e.g. when moving network devices between network namespaces.
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f70ce185
    • Gustavo A. R. Silva's avatar
      net: cisco: Replace zero-length array with flexible-array member · d1c73cbd
      Gustavo A. R. Silva authored
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kinds of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
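
      As a minimal sketch of why allocations are unaffected, assuming an int
      element type in place of the undefined struct boo from the snippet
      above, and with foo_alloc() as a hypothetical helper name: the usual
      "header size plus n elements" computation is written the same way for a
      flexible array member as it was for a zero-length array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct foo {
	int stuff;
	int array[];	/* flexible array member: must be the last member */
};

/*
 * Allocate a zero-initialized struct foo with room for n trailing elements.
 * Returns NULL on overflow of the size computation or allocation failure.
 */
struct foo *foo_alloc(size_t n)
{
	struct foo *f;

	/* Guard the multiplication and addition below against overflow. */
	if (n > (SIZE_MAX - sizeof(*f)) / sizeof(f->array[0]))
		return NULL;

	/* sizeof(*f) covers the header only; the array contributes nothing. */
	f = calloc(1, sizeof(*f) + n * sizeof(f->array[0]));
	return f;
}
```

      Note that sizeof is applied to the whole structure and to a single
      element, never to the flexible array member itself, which is what the
      quoted documentation rules out.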
      
      Lastly, fix the following checkpatch warning:
      CHECK: Prefer kernel type 'u32' over 'u_int32_t'
      #61: FILE: drivers/net/ethernet/cisco/enic/vnic_devcmd.h:653:
      +	u_int32_t val[];
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d1c73cbd
    • Gustavo A. R. Silva's avatar
      net: marvell: Replace zero-length array with flexible-array member · 274ac283
      Gustavo A. R. Silva authored
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kinds of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      274ac283
    • Gustavo A. R. Silva's avatar
      net: hns: Replace zero-length array with flexible-array member · c5d6cf90
      Gustavo A. R. Silva authored
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kinds of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c5d6cf90
    • Gustavo A. R. Silva's avatar
      sfc: Replace zero-length array with flexible-array member · 62f19142
      Gustavo A. R. Silva authored
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kinds of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Acked-by: Martin Habets <mhabets@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      62f19142
    • Gustavo A. R. Silva's avatar
      qlogic: Replace zero-length array with flexible-array member · 4a34d825
      Gustavo A. R. Silva authored
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kinds of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      This issue was detected with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4a34d825
    • Florian Fainelli's avatar
      Revert "net: dsa: bcm_sf2: Also configure Port 5 for 2Gb/sec on 7278" · 3f02735e
      Florian Fainelli authored
      This reverts commit 7458bd54 ("net: dsa: bcm_sf2: Also configure Port 5
      for 2Gb/sec on 7278") as it causes advanced congestion buffering issues
      with 7278 switch devices when using their internal Gigabit PHY. While this
      is being debugged, continue with conservative defaults that work and do
      not cause packet loss.
      
      Fixes: 7458bd54 ("net: dsa: bcm_sf2: Also configure Port 5 for 2Gb/sec on 7278")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3f02735e
  2. 26 Feb, 2020 11 commits