1. 27 Oct, 2022 8 commits
  2. 26 Oct, 2022 20 commits
  3. 25 Oct, 2022 12 commits
    • Merge branch 'net-lan743x-pci11010-pci11414-devices-enhancements' · a264228c
      Jakub Kicinski authored
      Raju Lakkaraju says:
      
      ====================
      net: lan743x: PCI11010 / PCI11414 devices Enhancements
      
      This patch series continues with the addition of supported features for the
      Ethernet function of the PCI11010 / PCI11414 devices to the LAN743x driver.
      ====================
      
      Link: https://lore.kernel.org/r/20221024082516.661199-1-Raju.Lakkaraju@microchip.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: phy: micrel: Add PHY Auto/MDI/MDI-X set driver for KSZ9131 · b64e6a87
      Raju Lakkaraju authored
      Add support for MDI-X status and configuration for KSZ9131 chips
      Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
      Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: lan743x: Add support for get_pauseparam and set_pauseparam · cdc04540
      Raju Lakkaraju authored
      Add the get_pauseparam and set_pauseparam functions so that pause
      settings can be read and configured.
      Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dev: Convert sa_data to flexible array in struct sockaddr · b5f0de6d
      Kees Cook authored
      One of the worst offenders of "fake flexible arrays" is struct sockaddr,
      as it is the classic example of why GCC and Clang have been traditionally
      forced to treat all trailing arrays as fake flexible arrays: in the
      distant misty past, sa_data became too small, and code started just
      treating it as a flexible array, even though it was fixed-size. The
      special case by the compiler is specifically that sizeof(sa->sa_data)
      and FORTIFY_SOURCE (which uses __builtin_object_size(sa->sa_data, 1))
      do not agree (14 and -1 respectively), which makes FORTIFY_SOURCE treat
      it as a flexible array.
      
      However, the coming -fstrict-flex-arrays compiler flag will remove
      these special cases so that FORTIFY_SOURCE can gain coverage over all
      the trailing arrays in the kernel that are _not_ supposed to be treated
      as a flexible array. To deal with this change, convert sa_data to a true
      flexible array. To keep the structure size the same, move sa_data into
      a union with a newly introduced sa_data_min with the original size. The
      result is that FORTIFY_SOURCE can continue to have no idea how large
      sa_data may actually be, but anything using sizeof(sa->sa_data) must
      switch to sizeof(sa->sa_data_min).
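The layout described above can be sketched in userspace C. This is a minimal illustration, not the kernel's definition: `struct demo_sockaddr`, `empty_`, and `demo_copy_hwaddr()` are hypothetical names, and the empty-struct wrapper (a GNU C extension, similar in spirit to the kernel's DECLARE_FLEX_ARRAY() helper) is assumed here as the way to place a flexible array inside a union.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct sockaddr after the conversion. */
struct demo_sockaddr {
	unsigned short sa_family;
	union {
		char sa_data_min[14];       /* keeps the historical 14-byte size */
		struct {
			struct { } empty_;  /* GNU extension: zero-size named member */
			char sa_data[];     /* true flexible array, size unknown */
		};
	};
};

/*
 * Copy a 6-byte hardware address into the address payload. The bound comes
 * from sa_data_min, because sizeof(sa->sa_data) no longer compiles for a
 * flexible array.
 */
static size_t demo_copy_hwaddr(struct demo_sockaddr *sa, const unsigned char *mac)
{
	size_t n = 6 < sizeof(sa->sa_data_min) ? 6 : sizeof(sa->sa_data_min);

	memcpy(sa->sa_data, mac, n);
	return n;
}
```

Both members start at the same offset, so existing readers of sa_data see the same bytes, while size computations have to name sa_data_min explicitly.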
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Cc: David Ahern <dsahern@kernel.org>
      Cc: Dylan Yudaken <dylany@fb.com>
      Cc: Yajun Deng <yajun.deng@linux.dev>
      Cc: Petr Machata <petrm@nvidia.com>
      Cc: Hangbin Liu <liuhangbin@gmail.com>
      Cc: Leon Romanovsky <leon@kernel.org>
      Cc: syzbot <syzkaller@googlegroups.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20221018095503.never.671-kees@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bnx2: Use kmalloc_size_roundup() to match ksize() usage · d6dd5080
      Kees Cook authored
      Round up allocations with kmalloc_size_roundup() so that build_skb()'s
      use of ksize() is always accurate and no special handling of the memory
      is needed by KASAN, UBSAN_BOUNDS, or FORTIFY_SOURCE.
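The idea of rounding a request up to the allocator's bucket size before allocating can be modelled in plain C. This is a toy sketch only: the size-class table and the `demo_size_roundup()` name are assumptions made for illustration, and the real slab buckets depend on kernel configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size classes, chosen for illustration only. */
static const size_t demo_classes[] = {
	8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096
};

/*
 * Return the smallest size class that can hold `size`. Requesting this
 * rounded size up front means the caller's idea of the buffer length and
 * the allocator's usable size agree from the start.
 */
static size_t demo_size_roundup(size_t size)
{
	size_t i;

	for (i = 0; i < sizeof(demo_classes) / sizeof(demo_classes[0]); i++) {
		if (size <= demo_classes[i])
			return demo_classes[i];
	}
	return size; /* larger requests are left unchanged in this toy model */
}
```

A caller would compute the rounded size once, allocate that many bytes, and use the same value wherever the buffer length is needed later.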
      
      Cc: Rasesh Mody <rmody@marvell.com>
      Cc: GR-Linux-NIC-Dev@marvell.com
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: netdev@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20221022021004.gonna.489-kees@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'mptcp-socket-option-updates' · 6459838a
      Paolo Abeni authored
      Mat Martineau says:
      
      ====================
      mptcp: Socket option updates
      
      Patches 1 and 3 refactor a recent socket option helper function for more
      generic use, and make use of it in a couple of places.
      
      Patch 2 adds TCP_FASTOPEN_NO_COOKIE functionality to MPTCP sockets,
      similar to the TCP_FASTOPEN_CONNECT support recently added in v6.1.
      ====================
      
      Link: https://lore.kernel.org/r/20221022004505.160988-1-mathew.j.martineau@linux.intel.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • mptcp: sockopt: use new helper for TCP_DEFER_ACCEPT · caea6467
      Matthieu Baerts authored
      mptcp_setsockopt_sol_tcp_defer() was doing the same thing as
      mptcp_setsockopt_first_sf_only() except for the returned code in case of
      error.
      
      Ignoring the error is needed to mimic how TCP_DEFER_ACCEPT is handled
      when used with "plain" TCP sockets.
      
      The specific function for TCP_DEFER_ACCEPT can be replaced by the new
      mptcp_setsockopt_first_sf_only() helper and errors can be ignored to
      stay compatible with TCP. A bit of cleanup.
      Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • mptcp: add TCP_FASTOPEN_NO_COOKIE support · e64d4deb
      Matthieu Baerts authored
      The goal of this socket option is to configure MPTCP + TFO without
      cookie per socket.
      
      It was already possible to enable TFO without a cookie per netns by
      setting the net.ipv4.tcp_fastopen sysctl knob to the right value, and
      per route by setting the 'fastopen_no_cookie' option. This patch adds
      per-socket support, as is already possible with TCP through the
      TCP_FASTOPEN_NO_COOKIE socket option.
      
      The only thing to do here is to relay the request to the first subflow
      like it is already done for TCP_FASTOPEN_CONNECT.
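From userspace, the per-socket option described above would be set roughly as follows. This is a hedged sketch: the `demo_*` helper names are hypothetical, the fallback constants are only used when the installed libc headers lack them, and the call fails on kernels without MPTCP or without this patch.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fallbacks for older libc headers; values match the Linux UAPI. */
#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif
#ifndef TCP_FASTOPEN_NO_COOKIE
#define TCP_FASTOPEN_NO_COOKIE 34
#endif

/* Open an MPTCP socket; returns -1 if the kernel does not support MPTCP. */
static int demo_mptcp_socket(void)
{
	return socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);
}

/*
 * Ask for TFO without a cookie on this socket only. Returns 0 on success
 * and -1 with errno set otherwise.
 */
static int demo_enable_tfo_no_cookie(int fd)
{
	int one = 1;

	return setsockopt(fd, IPPROTO_TCP, TCP_FASTOPEN_NO_COOKIE,
			  &one, sizeof(one));
}
```

The option uses the TCP level even on an MPTCP socket, which is what lets the request be relayed to the first subflow.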
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • mptcp: sockopt: make 'tcp_fastopen_connect' generic · d3d42904
      Matthieu Baerts authored
      There are other socket options that need to act only on the first
      subflow, e.g. all TCP_FASTOPEN* socket options.
      
      This is similar to the getsockopt version.
      
      In the next commit, this new mptcp_setsockopt_first_sf_only() helper is
      used by another option.
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'soreuseport-fix-broken-so_incoming_cpu' · 818a2604
      Paolo Abeni authored
      Kuniyuki Iwashima says:
      
      ====================
      soreuseport: Fix broken SO_INCOMING_CPU.
      
      setsockopt(SO_INCOMING_CPU) for UDP/TCP is broken since 4.5/4.6 due to
      these commits:
      
        * e32ea7e7 ("soreuseport: fast reuseport UDP socket selection")
        * c125e80b ("soreuseport: fast reuseport TCP socket selection")
      
      These commits introduced the O(1) socket selection algorithm and removed
      the O(n) iteration over the list, but the new algorithm ignores the score
      calculated by compute_score().  As a result, it caused two misbehaviours:
      
        * Unconnected sockets receive packets sent to connected sockets
        * SO_INCOMING_CPU does not work
      
      The former is fixed by commit acdcecc6 ("udp: correct reuseport
      selection with connected sockets").  This series fixes the latter and
      adds some tests for SO_INCOMING_CPU.
      ====================
      
      Link: https://lore.kernel.org/r/20221021204435.4259-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • selftest: Add test for SO_INCOMING_CPU. · 6df96146
      Kuniyuki Iwashima authored
      Some highly optimised applications use SO_INCOMING_CPU for efficiency,
      but to avoid slowing down they never verified with getsockopt() that it
      was working correctly.  As a result, no one noticed it had been broken
      for years, so it's a good time to add a test to catch future regressions.
      
      The test does
      
        1) Create $(nproc) TCP listeners associated with each CPU.
      
        2) Create 32 child sockets for each listener by calling
           sched_setaffinity() for each CPU.
      
        3) Check if accept()ed sockets' sk_incoming_cpu matches
           listener's one.
      
      If we see -EAGAIN, SO_INCOMING_CPU is broken.  However, we might not see
      any error even if broken; the kernel could miraculously distribute all SYN
      to correct listeners.  Not to let that happen, we must increase the number
      of clients and CPUs to some extent, so the test requires $(nproc) >= 2 and
      creates 64 sockets at least.
      
      Test:
        $ nproc
        96
        $ ./so_incoming_cpu
      
      Before the previous patch:
      
        # Starting 12 tests from 5 test cases.
        #  RUN           so_incoming_cpu.before_reuseport.test1 ...
        # so_incoming_cpu.c:191:test1:Expected cpu (5) == i (0)
        # test1: Test terminated by assertion
        #          FAIL  so_incoming_cpu.before_reuseport.test1
        not ok 1 so_incoming_cpu.before_reuseport.test1
        ...
        # FAILED: 0 / 12 tests passed.
        # Totals: pass:0 fail:12 xfail:0 xpass:0 skip:0 error:0
      
      After:
      
        # Starting 12 tests from 5 test cases.
        #  RUN           so_incoming_cpu.before_reuseport.test1 ...
        # so_incoming_cpu.c:199:test1:SO_INCOMING_CPU is very likely to be working correctly with 3072 sockets.
        #            OK  so_incoming_cpu.before_reuseport.test1
        ok 1 so_incoming_cpu.before_reuseport.test1
        ...
        # PASSED: 12 / 12 tests passed.
        # Totals: pass:12 fail:0 xfail:0 xpass:0 skip:0 error:0
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • soreuseport: Fix socket selection for SO_INCOMING_CPU. · b261eda8
      Kuniyuki Iwashima authored
      Kazuho Oku reported that setsockopt(SO_INCOMING_CPU) does not work
      with setsockopt(SO_REUSEPORT) since v4.6.
      
      With the combination of SO_REUSEPORT and SO_INCOMING_CPU, we could
      build a highly efficient server application.
      
      setsockopt(SO_INCOMING_CPU) associates a CPU with a TCP listener
      or UDP socket, and then incoming packets processed on the CPU will
      likely be distributed to the socket.  Technically, a socket could
      even receive packets handled on another CPU if no sockets in the
      reuseport group have the same CPU receiving the flow.
      
      The logic exists in compute_score() so that a socket will get a higher
      score if it has the same CPU with the flow.  However, the score gets
      ignored after the blamed two commits, which introduced a faster socket
      selection algorithm for SO_REUSEPORT.
      
      This patch introduces a counter of sockets with SO_INCOMING_CPU in
      a reuseport group to check if we should iterate all sockets to find
      a proper one.  We increment the counter when
      
        * calling listen() if the socket has SO_INCOMING_CPU and SO_REUSEPORT
      
        * enabling SO_INCOMING_CPU if the socket is in a reuseport group
      
      Also, we decrement it when
      
        * detaching a socket out of the group to apply SO_INCOMING_CPU to
          migrated TCP requests
      
        * disabling SO_INCOMING_CPU if the socket is in a reuseport group
      
      When the counter reaches 0, we can get back to the O(1) selection
      algorithm.
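The counter-gated selection can be modelled with a small userspace sketch. This is a simplified illustration, not the kernel code: the `demo_*` names, the fixed-size array, and the modulo hashing are assumptions, and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_SOCKS 8

struct demo_sock {
	int incoming_cpu;               /* -1 means SO_INCOMING_CPU is unset */
};

struct demo_group {
	struct demo_sock *socks[DEMO_MAX_SOCKS];
	unsigned int num_socks;
	unsigned int incoming_cpu;      /* members with SO_INCOMING_CPU set */
};

/* Add a socket to the group and keep the counter in sync. */
static int demo_add(struct demo_group *g, struct demo_sock *sk)
{
	if (g->num_socks >= DEMO_MAX_SOCKS)
		return -1;
	g->socks[g->num_socks++] = sk;
	if (sk->incoming_cpu >= 0)
		g->incoming_cpu++;
	return 0;
}

/* Pick a socket for a flow with the given hash, processed on `cpu`. */
static struct demo_sock *demo_select(const struct demo_group *g,
				     unsigned int hash, int cpu)
{
	unsigned int start, i;

	if (g->num_socks == 0)
		return NULL;

	start = hash % g->num_socks;
	if (g->incoming_cpu == 0)
		return g->socks[start];         /* O(1) path: nobody cares about CPU */

	for (i = 0; i < g->num_socks; i++) {    /* O(n) path: look for a CPU match */
		struct demo_sock *sk = g->socks[(start + i) % g->num_socks];

		if (sk->incoming_cpu == cpu)
			return sk;
	}
	return g->socks[start];                 /* no match: fall back to the hash pick */
}
```

The counter is what keeps the common case cheap: groups in which no socket sets SO_INCOMING_CPU never enter the loop.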
      
      The overall changes are negligible for the non-SO_INCOMING_CPU case,
      and the only notable thing is that we have to update sk_incoming_cpu
      under reuseport_lock.  Otherwise, the race prevents transitioning to
      the O(n) algorithm and results in the wrong socket selection.
      
       cpu1 (setsockopt)               cpu2 (listen)
      +-----------------+             +-------------+
      
      lock_sock(sk1)                  lock_sock(sk2)
      
      reuseport_update_incoming_cpu(sk1, val)
      .
      |  /* set CPU as 0 */
      |- WRITE_ONCE(sk1->incoming_cpu, val)
      |
      |                               spin_lock_bh(&reuseport_lock)
      |                               reuseport_grow(sk2, reuse)
      |                               .
      |                               |- more_socks_size = reuse->max_socks * 2U;
      |                               |- if (more_socks_size > U16_MAX &&
      |                               |       reuse->num_closed_socks)
      |                               |  .
      |                               |  |- RCU_INIT_POINTER(sk1->sk_reuseport_cb, NULL);
      |                               |  `- __reuseport_detach_closed_sock(sk1, reuse)
      |                               |     .
      |                               |     `- reuseport_put_incoming_cpu(sk1, reuse)
      |                               |        .
      |                               |        |  /* Read shutdown()ed sk1's sk_incoming_cpu
      |                               |        |   * without lock_sock().
      |                               |        |   */
      |                               |        `- if (sk1->sk_incoming_cpu >= 0)
      |                               |           .
      |                               |           |  /* decrement not-yet-incremented
      |                               |           |   * count, which is never incremented.
      |                               |           |   */
      |                               |           `- __reuseport_put_incoming_cpu(reuse);
      |                               |
      |                               `- spin_unlock_bh(&reuseport_lock)
      |
      |- spin_lock_bh(&reuseport_lock)
      |
      |- reuse = rcu_dereference_protected(sk1->sk_reuseport_cb, ...)
      |- if (!reuse)
      |  .
      |  |  /* Cannot increment reuse->incoming_cpu. */
      |  `- goto out;
      |
      `- spin_unlock_bh(&reuseport_lock)
      
      Fixes: e32ea7e7 ("soreuseport: fast reuseport UDP socket selection")
      Fixes: c125e80b ("soreuseport: fast reuseport TCP socket selection")
      Reported-by: Kazuho Oku <kazuhooku@gmail.com>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>