1. 12 Jun, 2023 40 commits
    • net: mlxsw: i2c: Switch back to use struct i2c_driver's .probe() · 3a2cb45c
      Uwe Kleine-König authored
      After commit b8a1a4cd ("i2c: Provide a temporary .probe_new()
      call-back type"), all drivers were converted to .probe_new(). Now that
      commit 03c835f4 ("i2c: Switch .probe() to not take an id parameter")
      has landed, convert back to (the new) .probe() to be able to eventually
      drop .probe_new() from struct i2c_driver.
      Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Tested-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: add driver for MediaTek SoC built-in GE PHYs · 98c485ea
      Daniel Golle authored
      Some of MediaTek's Filogic SoCs come with built-in gigabit Ethernet
      PHYs which require calibration data from the SoC's efuse.
      Despite the similar design the driver doesn't share any code with the
      existing mediatek-ge.c.
      Add support for such PHYs by introducing a new driver with basic
      support for MediaTek SoCs MT7981 and MT7988 built-in 1GE PHYs.
      Signed-off-by: Daniel Golle <daniel@makrotopia.org>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-updates-2023-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · a89dc587
      David S. Miller authored
      mlx5-updates-2023-06-09
      
      1) Embedded CPU Virtual Functions
      2) Lightweight local SFs
      
      Daniel Jurgens says:
      ====================
      Embedded CPU Virtual Functions
      
      This series enables the creation of virtual functions on Bluefield (the
      embedded CPU platform), called embedded CPU virtual functions (EC VFs).
      EC VF creation, deletion and management interfaces are the same as those
      for virtual functions in a server with a ConnectX NIC.

      When using EC VFs on the ARM, the creation of virtual functions on the
      host system is still supported. Host VF eswitch vports occupy the range
      1..max_vfs; the EC VF vport range is max_vfs+1..max_ec_vfs.

      Every function (PF, ECPF, VF, EC VF, and subfunction) has a function ID
      associated with it. Prior to this series the function ID and the eswitch
      vport were the same. That is no longer the case: the EC VF function ID
      range is 1..max_ec_vfs. When querying or setting the capabilities of an
      EC VF function, a new bit must be set in the query/set HCA cap
      structure.
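
As a rough illustration of the vport / function ID split described above, here is a minimal Python sketch. It assumes the EC VF vports are max_ec_vfs consecutive numbers starting at max_vfs+1, which is only one reading of the ranges given here. The function name and the arithmetic are hypothetical and are not taken from the mlx5 driver.

```python
def ec_vf_function_id(vport: int, max_vfs: int, max_ec_vfs: int) -> int:
    """Map an EC VF eswitch vport number to its function ID.

    Assumption (not from the driver): EC VF vports are the max_ec_vfs
    numbers that directly follow the host VF vports 1..max_vfs.
    """
    first = max_vfs + 1
    last = max_vfs + max_ec_vfs
    if not first <= vport <= last:
        raise ValueError(f"vport {vport} is not in the EC VF range {first}..{last}")
    # EC VF function IDs restart at 1, so they no longer equal the vport number.
    return vport - max_vfs
```

With max_vfs = 4 and max_ec_vfs = 2, vports 5 and 6 would map to function IDs 1 and 2 under this assumption, while vport 4 is a host VF and is rejected.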
      
      This is a high level overview of the changes made:
      	- Allocate vports for EC VFs if they are enabled.
      	- Create representors and devlink ports for the EC VF vports.
      	- When querying/setting HCA caps by vport, break the assumption
      	  that the function ID is the same as the vport number and adjust
      	  accordingly.
      	- Create a new type of page, so that when SRIOV on the ARM is
      	  disabled, but remains enabled on the host, the driver can
      	  wait for the correct pages.
      	- Update SRIOV code to support EC VF creation/deletion.
      
      ===================
      
      Lightweight local SFs:
      
      Last 3 patches from Shay Drory:

      SFs are heavyweight and by default they come with the full package of
      ConnectX features. Usually users want specialized SFs for one specific
      purpose, and using devlink, users will almost always override the set of
      advertised features of an SF and reload it.
      
      Shay Drory says:
      ================
      In order to avoid the wasted time and resources on the reload, local SFs
      will probe without any auxiliary sub-device, so that the SFs can be
      configured prior to their full probe.
      
      The defaults of the enable_* devlink params of these SFs are set to
      false.
      
      Usage example:
      Create SF:
      $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
      $ devlink port function set pci/0000:08:00.0/32768 \
                     hw_addr 00:00:00:00:00:11 state active
      
      Enable ETH auxiliary device:
      $ devlink dev param set auxiliary/mlx5_core.sf.1 \
                    name enable_eth value true cmode driverinit
      
      Now, in order to fully probe the SF, use devlink reload:
      $ devlink dev reload auxiliary/mlx5_core.sf.1
      
      At this point the user has an SF devlink instance with an auxiliary
      device for the Ethernet functionality only.
      
      ================
    • Merge branch 'tcp-tx-headless' · 73f49f8c
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      tcp: tx path fully headless
      
      This series completes the transition of the TCP stack tx path
      to headless packets: all payload now resides in page frags,
      never in skb->head.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: remove size parameter from tcp_stream_alloc_skb() · 5882efff
      Eric Dumazet authored
      Now that all tcp_stream_alloc_skb() callers pass @size == 0, we can
      remove this parameter.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: remove some dead code · b4a24397
      Eric Dumazet authored
      Now that no skb in the write queue contains any payload in skb->head,
      we can remove some dead code.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: let tcp_send_syn_data() build headless packets · fbf93406
      Eric Dumazet authored
      tcp_send_syn_data() is the last component in TCP transmit
      path to put payload in skb->head.
      
      Switch it to use page frags, so that we can remove dead
      code later.
      
      This allows more payload to be sent than with the previous implementation.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ethtool-extack' · f2f069da
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      net: support extack in dump and simplify ethtool uAPI
      
      Ethtool currently requires the header nest to always be present even if
      it doesn't have to carry any attr for a given request. This inflicts
      unnecessary pain on the users.
      
      What makes it worse is that extack was not working in dump's ->start()
      callback. Address both of those issues.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethtool: don't require empty header nests · 500e1340
      Jakub Kicinski authored
      Ethtool currently requires a header nest (which is used to carry
      the common family options) in all requests including dumps.
      
        $ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get
        lib.ynl.NlError: Netlink error: Invalid argument
        nl_len = 64 (48) nl_flags = 0x300 nl_type = 2
      	error: -22      extack: {'msg': 'request header missing'}
      
        $ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get \
                 --json '{"header":{}}'
        [{'combined-count': 1,
          'combined-max': 1,
          'header': {'dev-index': 2, 'dev-name': 'enp1s0'}}]
      
      Requiring the header nest to always be there may seem nice
      from the consistency perspective, but it's not serving any
      practical purpose. We shouldn't burden the user like this.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: support extack in dump ->start() · 5ab8c41c
      Jakub Kicinski authored
      Commit 4a19edb6 ("netlink: Pass extack to dump handlers")
      added extack support to netlink dumps. It was focused on rtnl
      and since rtnl does not use ->start(), ->done() callbacks
      it ignored those. Genetlink on the other hand uses ->start()
      extensively, for parsing and input validation.
      
      Pass the extack in via struct netlink_dump_control and link
      it to cb for the time of ->start(). Both struct netlink_dump_control
      and extack itself live on the stack so we can't keep the same
      extack for the duration of the dump. This means that the extack
      visible in ->start() and in each ->dump() callback will be different.
      Corner cases like reporting a warning message in DONE across dump
      calls are still not supported.
      
      We could put the extack (for dumps) in the socket struct,
      but layering makes it slightly awkward (extack pointer is decided
      before the DO / DUMP split).
      
      The genetlink dump error extacks are now surfaced:
      
        $ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get
        lib.ynl.NlError: Netlink error: Invalid argument
        nl_len = 64 (48) nl_flags = 0x300 nl_type = 2
      	error: -22	extack: {'msg': 'request header missing'}
      
      Previously extack was missing:
      
        $ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get
        lib.ynl.NlError: Netlink error: Invalid argument
        nl_len = 36 (20) nl_flags = 0x100 nl_type = 2
      	error: -22
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ynl-ethtool' · 23813168
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      tools: ynl: generate code for the ethtool family
      
      And finally ethtool support. Thanks to Stan's work the ethtool family
      spec is quite complete, so there are a lot of operations to support.

      I chickened out of stats-get support; it requires at the very least
      type-value support on a u64 scalar. Type-value is an arrangement where
      a u16 attribute is encoded directly in the attribute type. Code gen can
      support this if the inside is a nest: we just throw in an extra
      field into that nest to carry the attr type. But a little more coding
      is needed for a scalar, because first we need to turn the scalar
      into a struct with one member, then we can add the attr type.
      
      Other than that ethtool required event support (notification which
      does not share contents with any GET), but the previous series
      already added that to the codegen.
      
      I haven't tested all the ops here, but the few I tried seem to work.
      ====================
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tools: ynl: add sample for ethtool · f561ff23
      Jakub Kicinski authored
      Configuring / reading ring sizes and counts is a fairly common
      operation for ethtool netlink. Present a sample doing that with
      YNL:
      
      $ ./ethtool
      Channels:
          enp1s0: combined 1
         eni1np1: combined 1
         eni2np1: combined 1
      Rings:
          enp1s0: rx 256 tx 256
         eni1np1: rx 0 tx 0
         eni2np1: rx 0 tx 0
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tools: ynl: generate code for the ethtool family · 2d7be507
      Jakub Kicinski authored
      Generate the protocol code for ethtool. Skip the stats
      for now, they are the only outlier in terms of complexity.
      Stats are a sort-of semi-polymorphic (attr space of a nest
      depends on value of another attr) or a type-value-scalar,
      depending on how one wants to look at it...
      A challenge for another time.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: specs: ethtool: mark pads as pads · 68335713
      Jakub Kicinski authored
      Pad is a separate type. Even though in practice they can
      only be a u32 the value should be discarded.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: specs: ethtool: untangle stats-get · 709d0c3b
      Jakub Kicinski authored
      Code gen for stats is a bit of a challenge, but from looking
      at the attrs I think that the format isn't quite right.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: specs: ethtool: untangle UDP tunnels and cable test a bit · 37c85222
      Jakub Kicinski authored
      UDP tunnel and cable test messages have a lot of nests,
      which do not match the names of the enum entries in C uAPI.
      Some of the structure / nesting also looks wrong.
      
      Untangle this a little bit based on the names, comments and
      educated guesses; I haven't actually tested the results.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: specs: ethtool: add empty enum stringset · 180ad455
      Jakub Kicinski authored
      C does not allow defining structures and enums with the same name.
      Since enum ethtool_stringset exists in the uAPI we need to include
      at least a stub of it in the spec. This will trigger name collision
      avoidance in the code gen.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tools: ynl-gen: resolve enum vs struct name conflicts · 2c9d47a0
      Jakub Kicinski authored
      Ethtool has an attribute set called stringset, from which
      we'll generate struct ethtool_stringset. Unfortunately,
      the old ethtool header declares enum ethtool_stringset
      (the same name), to which compilers object.
      
      This seems unavoidable. Check struct names against known
      constants and append an underscore if conflict is detected.
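
The collision check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in ynl-gen; the function name and the shape of the constants set are assumptions made for the example.

```python
def resolve_struct_name(struct_name: str, known_constants: set) -> str:
    """Return a struct name that does not clash with a known enum/constant name.

    Hypothetical helper: known_constants stands in for the names of the
    enums and constants already declared by the spec or the uAPI headers.
    """
    if struct_name in known_constants:
        # C shares one tag namespace between structs and enums, so
        # sidestep the clash by appending an underscore.
        return struct_name + "_"
    return struct_name


consts = {"ethtool_stringset"}
renamed = resolve_struct_name("ethtool_stringset", consts)  # "ethtool_stringset_"
kept = resolve_struct_name("ethtool_channels", consts)      # unchanged
```

Only the generated struct is renamed here, because the enum name comes from an existing uAPI header and cannot change.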
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tools: ynl-gen: don't generate enum types if unnamed · dddc9f53
      Jakub Kicinski authored
      If attr set or enum has empty enum name we need to use u32 or int
      as function arguments and struct members.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: specs: ethtool: add C render hints · d4813b11
      Jakub Kicinski authored
      Most of the C enum names are guessed correctly, but there
      is a handful of corner cases we need to name explicitly.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: specs: support setting prefix-name per attribute · ed2042cc
      Jakub Kicinski authored
      Ethtool's PSE PoDL has an attr nest with different prefixes:
      
      /* Power Sourcing Equipment */
      enum {
      	ETHTOOL_A_PSE_UNSPEC,
      	ETHTOOL_A_PSE_HEADER,			/* nest - _A_HEADER_* */
      	ETHTOOL_A_PODL_PSE_ADMIN_STATE,		/* u32 */
      	ETHTOOL_A_PODL_PSE_ADMIN_CONTROL,	/* u32 */
      	ETHTOOL_A_PODL_PSE_PW_D_STATUS,		/* u32 */
      
      The header has a prefix of ETHTOOL_A_PSE_ and the other attrs a prefix
      of ETHTOOL_A_PODL_PSE_, so we can't cover them uniformly.
      If PODL was after PSE life would be easy.

      Now we either need to add prefixes to attr names, which is yucky,
      or support setting the prefix name per attr.
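
To make the second option concrete, here is a minimal Python sketch of how a per-attribute prefix could override the attribute set's prefix when rendering C enum entry names. The dictionary layout and the `name-prefix` key are assumptions for the sake of the example, and the helper is hypothetical rather than the generator's actual code.

```python
def c_enum_entry(attr: dict, set_prefix: str) -> str:
    """Render the C enum entry name for one attribute.

    Assumption: an attribute may carry its own "name-prefix"; if it does
    not, the prefix of the enclosing attribute set is used.
    """
    prefix = attr.get("name-prefix", set_prefix)
    return (prefix + attr["name"]).upper().replace("-", "_")


# Two attributes of the PSE set: only the second overrides the prefix.
pse_attrs = [
    {"name": "header"},
    {"name": "admin-state", "name-prefix": "ethtool-a-podl-pse-"},
]
names = [c_enum_entry(a, "ethtool-a-pse-") for a in pse_attrs]
# names == ["ETHTOOL_A_PSE_HEADER", "ETHTOOL_A_PODL_PSE_ADMIN_STATE"]
```

The attribute names themselves stay short in this sketch, and the mismatch in the C uAPI is confined to the one place where the prefix is declared.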
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tools: ynl-gen: record extra args for regen · 33eedb00
      Jakub Kicinski authored
      ynl-regen needs to know the arguments used to generate a file.
      Record excluded ops and, while at it, user headers.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tools: ynl-gen: support excluding tricky ops · 008bcd68
      Jakub Kicinski authored
      The ethtool family has a small handful of quite tricky ops
      and a lot of simple very useful ops. Teach ynl-gen to skip
      ops so that we can bypass the tricky ones.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mdio: mdio-mux-mmioreg: Use of_property_read_reg() to parse "reg" · b30a1f30
      Rob Herring authored
      Use the recently added of_property_read_reg() helper to get the
      untranslated "reg" address value.
      Signed-off-by: Rob Herring <robh@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dt-bindings: net: drop unneeded quotes · 61ab5a06
      Krzysztof Kozlowski authored
      Clean up bindings by dropping unneeded quotes. Once all these are fixed,
      checking for this can be enabled in yamllint.
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'SCM_PIDFD-SCM_PEERPIDFD' · ba47545c
      David S. Miller authored
      Alexander Mikhalitsyn says:
      
      ====================
      Add SCM_PIDFD and SO_PEERPIDFD
      
      1. Implement SCM_PIDFD, a new CMSG type analogous to SCM_CREDENTIALS,
      but it contains a pidfd instead of a plain pid, which allows programmers
      not to care about the PID reuse problem.

      2. Add SO_PEERPIDFD, which allows getting a pidfd for the peer socket
      holder. It is a direct analog of SO_PEERCRED, which allows getting a
      plain PID.
      
      3. Add SCM_PIDFD / SO_PEERPIDFD kselftest
      
      Idea comes from UAPI kernel group:
      https://uapi-group.org/kernel-features/
      
      Big thanks to Christian Brauner and Lennart Poettering for productive
      discussions about this and Luca Boccassi for testing and reviewing this.
      
      === Motivation behind this patchset
      
      Eric Dumazet raised a question:
      > It seems that we already can use pidfd_open() (since linux-5.3), and
      > pass the resulting fd in af_unix SCM_RIGHTS message ?
      
      Yes, it's possible, but it means that on the receiver side we need
      to either trust the sent pidfd (in SCM_RIGHTS), or always use a
      combination of SCM_RIGHTS+SCM_CREDENTIALS: extract the pidfd from
      SCM_RIGHTS, acquire the plain pid from the pidfd, and then compare it
      with the pid from SCM_CREDENTIALS.
      
      A few comments from other folks regarding this.
      
      Christian Brauner wrote:
      
      >Let me try and provide some of the missing background.
      
      >There are a range of use-cases where we would like to authenticate a
      >client through sockets without being susceptible to PID recycling
      >attacks. Currently, we can't do this as the race isn't fully fixable.
      >We can only apply mitigations.
      
      >What this patchset will allow us to do is to get a pidfd without the
      >client having to send us an fd explicitly via SCM_RIGHTS. As that's
      >already possible as you correctly point out.
      
      >But for protocols like polkit this is quite important. Every message is
      >standalone and we would need to force a complete protocol change where
      >we would need to require that every client allocate and send a pidfd via
      >SCM_RIGHTS. That would also mean patching through all polkit users.
      
      >For something like systemd-journald where we provide logging facilities
      >and want to add metadata to the log we would also immensely benefit from
      >being able to get a receiver-side controlled pidfd.
      
      >With the message type we envisioned we don't need to change the sender
      >at all and can be safe against pid recycling.
      
      >Link: https://gitlab.freedesktop.org/polkit/polkit/-/merge_requests/154
      >Link: https://uapi-group.org/kernel-features
      
      Lennart Poettering wrote:
      
      >So yes, this is of course possible, but it would mean the pidfd would
      >have to be transported as part of the user protocol, explicitly sent
      >by the sender. (Moreover, the receiver after receiving the pidfd would
      >then still have to somehow be able to prove that the pidfd it just
      >received actually refers to the peer's process and not some random
      >process. – this part is actually solvable in userspace, but ugly)
      
      >The big thing is simply that we want that the pidfd is associated
      >*implicitly* with each AF_UNIX connection, not explicitly. A lot of
      >userspace already relies on this, both in the authentication area
      >(polkit) as well as in the logging area (systemd-journald). Right now
      >using the PID field from SO_PEERCRED/SCM_CREDENTIALS is racy though
      >and very hard to get right. Making this available as pidfd too, would
      >solve this raciness, without otherwise changing semantics of it all:
      >receivers can still enable the creds stuff as they wish, and the data
      >is then implicitly appended to the connections/datagrams the sender
      >initiates.
      
      >Or to turn this around: things like polkit are typically used to
      >authenticate arbitrary dbus methods calls: some service implements a
      >dbus method call, and when an unprivileged client then issues that
      >call, it will take the client's info, go to polkit and ask it if this
      >is ok. If we wanted to send the pidfd as part of the protocol we
      >basically would have to extend every single method call to contain the
      >client's pidfd along with it as an additional argument, which would be
      >a massive undertaking: it would change the prototypes of basically
      >*all* methods a service defines… And that's just ugly.
      
      >Note that Alex' patch set doesn't expose anything that wasn't exposed
      >before, or attach, propagate what wasn't before. All it does, is make
      >the field already available anyway (the struct ucred .pid field)
      >available also in a better way (as a pidfd), to solve a variety of
      >races, with no effect on the protocol actually spoken within the
      >AF_UNIX transport. It's a seamless improvement of the status quo.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • af_unix: Kconfig: make CONFIG_UNIX bool · 97154bcf
      Alexander Mikhalitsyn authored
      Let's make CONFIG_UNIX a bool instead of a tristate.
      We decided to do that during the discussion about the SCM_PIDFD patchset [1].
      
      [1] https://lore.kernel.org/lkml/20230524081933.44dc8bea@kernel.org/
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Leon Romanovsky <leon@kernel.org>
      Cc: David Ahern <dsahern@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
      Cc: Lennart Poettering <mzxreary@0pointer.de>
      Cc: Luca Boccassi <bluca@debian.org>
      Cc: linux-kernel@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
      Acked-by: Christian Brauner <brauner@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net: add SCM_PIDFD / SO_PEERPIDFD test · ec80f488
      Alexander Mikhalitsyn authored
      Basic test to check consistency between:
      - SCM_CREDENTIALS and SCM_PIDFD
      - SO_PEERCRED and SO_PEERPIDFD
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Leon Romanovsky <leon@kernel.org>
      Cc: David Ahern <dsahern@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-kselftest@vger.kernel.org
      Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: add getsockopt SO_PEERPIDFD · 7b26952a
      Alexander Mikhalitsyn authored
      Add SO_PEERPIDFD, which allows getting a pidfd for the peer socket
      holder. It is a direct analog of SO_PEERCRED, which allows getting a
      plain PID.
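
For illustration, a userspace caller might query the new option roughly as in the Python sketch below. The numeric value 77 is the asm-generic definition and is an assumption here, since a few architectures number socket options differently. The helper name is hypothetical, and on kernels without this feature the call is expected to fail, which the sketch reports as None.

```python
import os
import socket

# Assumption: asm-generic value; check your architecture's socket.h.
SO_PEERPIDFD = 77


def peer_pidfd(sock: socket.socket):
    """Return a pidfd referring to the peer of a connected AF_UNIX socket.

    Returns None when the running kernel does not support SO_PEERPIDFD.
    The caller owns the returned descriptor and must close it.
    """
    try:
        return sock.getsockopt(socket.SOL_SOCKET, SO_PEERPIDFD)
    except OSError:
        return None


if __name__ == "__main__":
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    fd = peer_pidfd(a)
    print("SO_PEERPIDFD unsupported" if fd is None else f"got pidfd {fd}")
    if fd is not None:
        os.close(fd)
    a.close()
    b.close()
```

Unlike the pid returned by SO_PEERCRED, the descriptor keeps referring to the same process even if its PID is later reused.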
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Leon Romanovsky <leon@kernel.org>
      Cc: David Ahern <dsahern@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
      Cc: Lennart Poettering <mzxreary@0pointer.de>
      Cc: Luca Boccassi <bluca@debian.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Stanislav Fomichev <sdf@google.com>
      Cc: bpf@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Tested-by: Luca Boccassi <bluca@debian.org>
      Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • scm: add SO_PASSPIDFD and SCM_PIDFD · 5e2ff670
      Alexander Mikhalitsyn authored
      Implement SCM_PIDFD, a new CMSG type analogous to SCM_CREDENTIALS,
      but it contains a pidfd instead of a plain pid, which allows programmers
      not to care about the PID reuse problem.
      
      We mask SO_PASSPIDFD feature if CONFIG_UNIX is not builtin because
      it depends on a pidfd_prepare() API which is not exported to the kernel
      modules.
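
A receiver could pick up the SCM_PIDFD control message along the lines of the Python sketch below. The constants 76 (SO_PASSPIDFD) and 0x04 (SCM_PIDFD) are assumed from the generic uAPI headers and may differ per architecture. Both helper names are hypothetical, and on kernels without the feature the sketch simply yields no pidfd.

```python
import socket
import struct

# Assumptions: generic uAPI values, not verified for every architecture.
SO_PASSPIDFD = 76
SCM_PIDFD = 0x04

INT_SIZE = struct.calcsize("i")


def enable_pidfd_passing(sock: socket.socket) -> bool:
    """Ask the kernel to attach SCM_PIDFD to received messages."""
    try:
        sock.setsockopt(socket.SOL_SOCKET, SO_PASSPIDFD, 1)
        return True
    except OSError:
        return False  # option unknown to this kernel


def recv_with_pidfd(sock: socket.socket, bufsize: int = 4096):
    """Receive one message; return (data, pidfd), pidfd is None if absent."""
    data, ancdata, _flags, _addr = sock.recvmsg(bufsize, socket.CMSG_SPACE(INT_SIZE))
    pidfd = None
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == SCM_PIDFD:
            # The payload is a single int: the new file descriptor.
            (pidfd,) = struct.unpack("i", cdata[:INT_SIZE])
    return data, pidfd
```

The sender needs no changes in this sketch: it calls send() as usual, and only the receiver opts in, which matches the motivation of not altering existing protocols.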
      
      Idea comes from UAPI kernel group:
      https://uapi-group.org/kernel-features/
      
      Big thanks to Christian Brauner and Lennart Poettering for productive
      discussions about this.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Leon Romanovsky <leon@kernel.org>
      Cc: David Ahern <dsahern@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
      Cc: Lennart Poettering <mzxreary@0pointer.de>
      Cc: Luca Boccassi <bluca@debian.org>
      Cc: linux-kernel@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Tested-by: Luca Boccassi <bluca@debian.org>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlxsw-cleanups' · 55d7c914
      David S. Miller authored
      Petr Machata says:
      
      ====================
      mlxsw: Cleanups in router code
      
      This patchset moves some router-related code from spectrum.c to
      spectrum_router.c where it should be. It also simplifies handlers of
      netevent notifications.
      
      - Patch #1 caches router pointer in a dedicated variable. This obviates the
        need to access the same as mlxsw_sp->router, making lines shorter, and
        permitting a future patch to add code that fits within 80 character
        limit.
      
      - Patch #2 moves IP / IPv6 validation notifier blocks from spectrum.c
        to spectrum_router, where the handlers are anyway.
      
      - In patch #3, pass router pointer to scheduler of deferred work directly,
        instead of having it deduce it on its own.
      
      - This makes the router pointer available in the handler function
        mlxsw_sp_router_netevent_event(), so in patch #4, use it directly,
        instead of finding it through mlxsw_sp_port.
      
      - In patch #5, extend mlxsw_sp_router_schedule_work() so that the
        NETEVENT_NEIGH_UPDATE handler can use it directly instead of inlining
        equivalent code.
      
      - In patches #6 and #7, add helpers for two common operations involving
        a backing netdev of a RIF. This makes it unnecessary for the function
        mlxsw_sp_rif_dev() to be visible outside of the router module, so in
        patch #8, hide it.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Privatize mlxsw_sp_rif_dev() · df95ae66
      Petr Machata authored
      Now that the external users of mlxsw_sp_rif_dev() have been converted in
      the preceding patches, make the function static.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      df95ae66
    • Petr Machata's avatar
      mlxsw: Convert does-RIF-have-this-netdev queries to a dedicated helper · 5374a50f
      Petr Machata authored
      In a number of places, a netdevice underlying a RIF is obtained only to
      compare it to another pointer. In order to clean up the interface between
      the router and the other modules, add a new helper to specifically answer
      this question, and convert the relevant uses to this new interface.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5374a50f
    • Petr Machata's avatar
      mlxsw: Convert RIF-has-netdevice queries to a dedicated helper · 0255f748
      Petr Machata authored
      In a number of places, a netdevice underlying a RIF is obtained only to
      check if it is a NULL pointer. In order to clean up the interface between the
      router and the other modules, add a new helper to specifically answer this
      question, and convert the relevant uses to this new interface.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0255f748
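The two helper commits above (5374a50f and 0255f748) replace "fetch the netdevice, then compare it" call sites with narrow predicate functions. A minimal sketch of that pattern follows. The struct layouts and the names `rif_dev()`, `rif_has_dev()` and `rif_dev_is()` are stand-ins chosen for illustration and are not taken from the driver source.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types for illustration only; not the kernel definitions. */
struct net_device { int ifindex; };
struct rif { struct net_device *dev; };

/* Raw accessor, kept file-local so that outside callers cannot use it. */
static struct net_device *rif_dev(const struct rif *rif)
{
	return rif->dev;
}

/* Answers "does this RIF have a backing netdevice at all?" */
bool rif_has_dev(const struct rif *rif)
{
	return rif_dev(rif) != NULL;
}

/* Answers "is this RIF backed by exactly this netdevice?" */
bool rif_dev_is(const struct rif *rif, const struct net_device *dev)
{
	return rif_dev(rif) == dev;
}
```

Once every external caller goes through the predicates, the raw accessor no longer needs external linkage, which is what allows the later patch (df95ae66) to make it static.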
    • Petr Machata's avatar
      mlxsw: spectrum_router: Reuse work neighbor initialization in work scheduler · 151b89f6
      Petr Machata authored
      After the struct mlxsw_sp_netevent_work.n field initialization is moved
      here, the body of code that handles NETEVENT_NEIGH_UPDATE is almost
      identical to the one in the helper function. Therefore defer to the helper
      instead of inlining the equivalent.
      
      Note that previously, the code took and put a reference of the netdevice.
      The new code defers to mlxsw_sp_dev_lower_is_port() to obviate the need for
      taking the reference.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      151b89f6
    • Petr Machata's avatar
      mlxsw: spectrum_router: Use the available router pointer for netevent handling · 14304e70
      Petr Machata authored
      This code handles NETEVENT_DELAY_PROBE_TIME_UPDATE, which is invoked every
      time the delay_probe_time changes. mlxsw router currently only maintains
      one timer, so the last delay_probe_time set wins.
      
      Currently, mlxsw uses mlxsw_sp_port_lower_dev_hold() to find a reference to
      the router. This is no longer necessary. But as a side effect, this makes
      sure that only updates to "interesting netdevices" (ones that have a
      physical netdevice lower) are projected.
      
      Retain that side effect by calling mlxsw_sp_port_dev_lower_find_rcu() and
      punting if there is none. Then just proceed using the router pointer that's
      already at hand in the helper.
      
      Note that previously, the code took and put a reference of the netdevice.
      Because the mlxsw_sp pointer is now obtained from the notifier block, the
      port pointer (non-) NULL-ness is all that's relevant, and the reference
      does not need to be taken anymore.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      14304e70
    • Petr Machata's avatar
      mlxsw: spectrum_router: Pass router to mlxsw_sp_router_schedule_work() directly · 48dde35e
      Petr Machata authored
      Instead of passing a notifier block and deducing the router pointer from
      that in the helper, do that in the caller, and pass the result. In the
      following patches, the pointer will also be made useful in the caller.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      48dde35e
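The change in 48dde35e can be pictured as moving the "notifier block to router" lookup out of the helper and into its caller. The sketch below is a userspace approximation under stated assumptions: the simplified `container_of()` macro, the struct layouts, and the function names `schedule_work()` and `netevent_event()` are all hypothetical stand-ins, not the driver's code.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's container_of(); no type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in types for illustration only. */
struct notifier_block { int priority; };

struct router {
	int id;
	struct notifier_block netevent_nb;	/* embedded notifier block */
};

/* After the change: the helper receives the router directly. */
int schedule_work(struct router *router)
{
	return router->id;	/* placeholder for queueing deferred work */
}

/* The callback recovers the router once, then passes it on. */
int netevent_event(struct notifier_block *nb)
{
	struct router *router = container_of(nb, struct router, netevent_nb);

	return schedule_work(router);
}
```

Doing the lookup in the caller means the router pointer is also available for other work in the callback, which is what the following patches rely on.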
    • Petr Machata's avatar
      mlxsw: spectrum_router: Move here inetaddr validator notifiers · 41b2bd20
      Petr Machata authored
      The validation logic is already in the router code. Move there the notifier
      blocks themselves as well.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      41b2bd20
    • Petr Machata's avatar
      mlxsw: spectrum_router: mlxsw_sp_router_fini(): Extract a helper variable · 50f6c3d5
      Petr Machata authored
      Make mlxsw_sp_router_fini() more similar to the _init() function (and more
      concise) by extracting the `router' handle to a named variable and using
      that throughout. The availability of a dedicated `router' variable will
      come in handy in following patches.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      50f6c3d5
    • Aaron Conole's avatar
      net: openvswitch: add support for l4 symmetric hashing · e069ba07
      Aaron Conole authored
      Since its introduction, the ovs module execute_hash action allowed
      hash algorithms other than the skb->l4_hash to be used.  However,
      additional hash algorithms were not implemented.  This means flows
      requiring different hash distributions weren't able to use the
      kernel datapath.
      
      Now, introduce support for a symmetric hashing algorithm as an
      alternative hash supported by the ovs module using the flow
      dissector.
      
      Output of flow using l4_sym hash:
      
          recirc_id(0),in_port(3),eth(),eth_type(0x0800),
          ipv4(dst=64.0.0.0/192.0.0.0,proto=6,frag=no), packets:30473425,
          bytes:45902883702, used:0.000s, flags:SP.,
          actions:hash(sym_l4(0)),recirc(0xd)
      
      Some performance testing with no GRO/GSO, two veths, single flow:
      
          hash(l4(0)):      4.35 GBits/s
          hash(l4_sym(0)):  4.24 GBits/s
      Signed-off-by: Aaron Conole <aconole@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e069ba07
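To illustrate what "symmetric" means for the L4 hash described in e069ba07: both directions of a connection must produce the same hash value. The sketch below shows one common way to get that property, by ordering the two endpoints canonically before hashing. It is an assumption-laden illustration, not the flow dissector's algorithm; `struct flow_tuple`, `mix32()` and `sym_l4_hash()` are hypothetical names, and the mixing step is borrowed from a generic 32-bit finalizer.

```c
#include <stdint.h>

/* Hypothetical 5-tuple for illustration; not a kernel structure. */
struct flow_tuple {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint8_t proto;
};

/* Generic 32-bit integer mixer (murmur3-style finalizer). */
static uint32_t mix32(uint32_t h)
{
	h ^= h >> 16;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	h *= 0xc2b2ae35u;
	h ^= h >> 16;
	return h;
}

/* Hash that is identical for A->B and B->A. */
uint32_t sym_l4_hash(struct flow_tuple t, uint32_t basis)
{
	uint32_t h = basis;

	/* Put the "smaller" endpoint first so both directions look alike. */
	if (t.saddr > t.daddr || (t.saddr == t.daddr && t.sport > t.dport)) {
		uint32_t addr = t.saddr;
		uint16_t port = t.sport;

		t.saddr = t.daddr;
		t.sport = t.dport;
		t.daddr = addr;
		t.dport = port;
	}

	h = mix32(h ^ t.saddr);
	h = mix32(h ^ t.daddr);
	h = mix32(h ^ (((uint32_t)t.sport << 16) | t.dport));
	h = mix32(h ^ t.proto);
	return h;
}
```

With a hash of this kind, the forward and reply packets of one connection hash to the same value, which matters when the hash is used to pick a recirculation bucket for both directions.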