1. 29 Aug, 2024 23 commits
    • Jakub Kicinski's avatar
      MAINTAINERS: exclude bluetooth and wireless DT bindings from netdev ML · b57d643a
      Jakub Kicinski authored
      We exclude wireless drivers from the netdev@ traffic, to delegate
      it to linux-wireless@, and avoid overwhelming netdev@.
      Bluetooth drivers are implicitly excluded because they live under
      drivers/bluetooth, not drivers/net.
      
      In both cases DT bindings sit under Documentation/devicetree/bindings/net/
      and aren't excluded. So if a patch series touches DT bindings
      netdev@ ends up getting CCed, and these are usually fairly boring
      series.
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20240828175821.2960423-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b57d643a
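The effect of excluding the bindings can be pictured with a small path matcher. This is only a sketch: the prefix lists below are assumptions chosen for illustration (they are not copied from the MAINTAINERS file), and the matching is a plain prefix test rather than what the kernel's own tooling does.

```python
# Simplified model of MAINTAINERS-style include and exclude paths.
# Both prefix lists are assumptions for illustration, not the real file.
INCLUDE_PREFIXES = [
    "drivers/net/",
    "Documentation/devicetree/bindings/net/",
]
EXCLUDE_PREFIXES = [
    "drivers/net/wireless/",
    "Documentation/devicetree/bindings/net/bluetooth/",
    "Documentation/devicetree/bindings/net/wireless/",
]


def netdev_is_cced(path: str) -> bool:
    """Return True if a touched path would pull in the netdev list."""
    if any(path.startswith(prefix) for prefix in EXCLUDE_PREFIXES):
        return False  # an exclusion always wins over an inclusion
    return any(path.startswith(prefix) for prefix in INCLUDE_PREFIXES)
```

With these assumed lists, a wireless binding such as `Documentation/devicetree/bindings/net/wireless/example.yaml` (a made-up file name) no longer matches, while other files under `bindings/net/` still do.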
    • Linus Torvalds's avatar
      Merge tag 'net-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 0dd5dd63
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from bluetooth, wireless and netfilter.
      
        No known outstanding regressions.
      
        Current release - regressions:
      
         - wifi: iwlwifi: fix hibernation
      
         - eth: ionic: prevent tx_timeout due to frequent doorbell ringing
      
        Previous releases - regressions:
      
         - sched: fix sch_fq incorrect behavior for small weights
      
         - wifi:
            - iwlwifi: take the mutex before running link selection
            - wfx: repair open network AP mode
      
         - netfilter: restore IP sanity checks for netdev/egress
      
         - tcp: fix forever orphan socket caused by tcp_abort
      
         - mptcp: close subflow when receiving TCP+FIN
      
         - bluetooth: fix random crash seen while removing btnxpuart driver
      
        Previous releases - always broken:
      
         - mptcp: more fixes for the in-kernel PM
      
         - eth: bonding: change ipsec_lock from spin lock to mutex
      
         - eth: mana: fix race of mana_hwc_post_rx_wqe and new hwc response
      
        Misc:
      
         - documentation: drop special comment style for net code"
      
      * tag 'net-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
        nfc: pn533: Add poll mod list filling check
        mailmap: update entry for Sriram Yagnaraman
        selftests: mptcp: join: check re-re-adding ID 0 signal
        mptcp: pm: ADD_ADDR 0 is not a new address
        selftests: mptcp: join: validate event numbers
        mptcp: avoid duplicated SUB_CLOSED events
        selftests: mptcp: join: check re-re-adding ID 0 endp
        mptcp: pm: fix ID 0 endp usage after multiple re-creations
        mptcp: pm: do not remove already closed subflows
        selftests: mptcp: join: no extra msg if no counter
        selftests: mptcp: join: check re-adding init endp with != id
        mptcp: pm: reset MPC endp ID when re-added
        mptcp: pm: skip connecting to already established sf
        mptcp: pm: send ACK on an active subflow
        selftests: mptcp: join: check removing ID 0 endpoint
        mptcp: pm: fix RM_ADDR ID for the initial subflow
        mptcp: pm: reuse ID 0 after delete and re-add
        net: busy-poll: use ktime_get_ns() instead of local_clock()
        sctp: fix association labeling in the duplicate COOKIE-ECHO case
        mptcp: pr_debug: add missing \n at the end
        ...
      0dd5dd63
    • Aleksandr Mishin's avatar
      nfc: pn533: Add poll mod list filling check · febccb39
      Aleksandr Mishin authored
      If im_protocols is 1 and tm_protocols is 0, this combination
      successfully passes the check
      'if (!im_protocols && !tm_protocols)' in nfc_start_poll().
      But after the pn533_poll_create_mod_list() call in pn533_start_poll(),
      the poll mod list remains empty and dev->poll_mod_count remains 0,
      which leads to a division by zero.
      
      Normally no im protocol has value 1 in the mask, so this combination is
      not expected by the driver. But these protocol values actually come from
      userspace via the Netlink interface (NFC_CMD_START_POLL operation). So a
      broken or malicious program may pass a message containing a "bad"
      combination of protocol parameter values so that dev->poll_mod_count
      is not incremented inside pn533_poll_create_mod_list(), thus leading
      to a division by zero.
      Call trace looks like:
      nfc_genl_start_poll()
        nfc_start_poll()
          ->start_poll()
          pn533_start_poll()
      
      Add poll mod list filling check.
      
      Found by Linux Verification Center (linuxtesting.org) with SVACE.
      
      Fixes: dfccd0f5 ("NFC: pn533: Add some polling entropy")
      Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
      Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Link: https://patch.msgid.link/20240827084822.18785-1-amishin@t-argos.ru
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      febccb39
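The failure mode in the pn533 commit above can be modelled in a few lines. This is a user-space sketch, not the driver code: the protocol bits, the modulation names and the `rand_byte` parameter are hypothetical, and `create_mod_list()` merely stands in for pn533_poll_create_mod_list().

```python
# Hypothetical protocol bits; the real masks live in the NFC headers.
IM_MODS = {0x02: "im-bit1", 0x04: "im-bit2"}
TM_MODS = {0x02: "tm-bit1"}


def create_mod_list(im_protocols: int, tm_protocols: int) -> list:
    """Stand-in for pn533_poll_create_mod_list(): collect known modulations."""
    mods = [name for bit, name in IM_MODS.items() if im_protocols & bit]
    mods += [name for bit, name in TM_MODS.items() if tm_protocols & bit]
    return mods


def start_poll(im_protocols: int, tm_protocols: int, rand_byte: int) -> str:
    """Pick the first modulation to poll, refusing an empty list."""
    if not im_protocols and not tm_protocols:
        raise ValueError("no protocol requested")  # the pre-existing check
    mods = create_mod_list(im_protocols, tm_protocols)
    if not mods:
        # The added check: without it the modulo below divides by zero.
        raise ValueError("no supported modulation in the requested masks")
    return mods[rand_byte % len(mods)]
```

A mask of `0x01` passes the first check but selects nothing in this model, so only the second check keeps the modulo from running with a zero divisor.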
    • Paolo Abeni's avatar
      Merge tag 'nf-24-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 0240bceb
      Paolo Abeni authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      Patch #1 sets NFT_PKTINFO_L4PROTO for UDP packets with less than
      4 bytes of payload from netdev/egress by subtracting
      skb_network_offset() when validating the IPv4 packet length; otherwise
      'meta l4proto udp' never matches.
      
      Patch #2 subtracts skb_network_offset() when validating IPv6 packet
      length for netdev/egress.
      
      netfilter pull request 24-08-28
      
      * tag 'nf-24-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables_ipv6: consider network offset in netdev/egress validation
        netfilter: nf_tables: restore IP sanity checks for netdev/egress
      ====================
      
      Link: https://patch.msgid.link/20240828214708.619261-1-pablo@netfilter.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      0240bceb
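Why the network offset matters for the netdev/egress length checks described in the pull request above can be shown with a simplified model. It is not the kernel's validation code: the function, its parameters and the exact set of checks are assumptions, and the 14-byte Ethernet header is just one example of a non-zero network offset.

```python
ETH_HLEN = 14       # link-layer header in front of the IP header on egress
IPV4_MIN_HLEN = 20  # smallest legal IPv4 header


def ipv4_transport_offset(buf_len, network_offset, ihl_bytes, tot_len,
                          subtract_offset=True):
    """Return the transport header offset from the buffer start, or None."""
    if subtract_offset:
        avail = buf_len - network_offset  # bytes from the IP header onwards
        thoff = ihl_bytes                 # measured from the IP header
    else:
        avail = buf_len                   # wrongly includes the link header
        thoff = network_offset + ihl_bytes
    if ihl_bytes < IPV4_MIN_HLEN or avail < tot_len or tot_len < thoff:
        return None                       # packet fails the sanity checks
    return network_offset + ihl_bytes
```

For a UDP datagram with 2 bytes of payload (IP total length 30, buffer length 44 with the Ethernet header), the model accepts the packet when the offset is subtracted and rejects it when it is not, because the transport offset counted from the buffer start (34) exceeds the IP total length.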
    • Sriram Yagnaraman's avatar
    • Paolo Abeni's avatar
      Merge branch 'mptcp-more-fixes-for-the-in-kernel-pm' · b666a651
      Paolo Abeni authored
      Matthieu Baerts says:
      
      ====================
      mptcp: more fixes for the in-kernel PM
      
      Here is a new batch of fixes for the MPTCP in-kernel path-manager:
      
      Patch 1 ensures the address ID is set to 0 when the path-manager sends
      an ADD_ADDR for the address of the initial subflow. The same fix is
      applied when a new subflow is created re-using this special address. A
      fix for v6.0.
      
      Patch 2 is similar, but for the case where an endpoint is removed: if
      this endpoint was used for the initial address, it is important to send
      a RM_ADDR with this ID set to 0, and look for existing subflows with the
      ID set to 0. A fix for v6.0 as well.
      
      Patch 3 validates the two previous patches.
      
      Patch 4 makes the PM select an "active" path to send an address
      notification in an ACK, instead of taking the first path in the list. A
      fix for v5.11.
      
      Patch 5 fixes skipping the establishment of a new subflow if a previous
      subflow using the same pair of addresses is being closed. A fix for
      v5.13.
      
      Patch 6 resets the ID linked to the initial subflow when the linked
      endpoint is re-added, possibly with a different ID. A fix for v6.0.
      
      Patch 7 validates the three previous patches.
      
      Patch 8 is a small fix for the MPTCP Join selftest, when being used with
      older subflows not supporting all MIB counters. A fix for a commit
      introduced in v6.4, but backported up to v5.10.
      
      Patch 9 prevents the PM from trying to close the initial subflow multiple
      times, and from incrementing counters when nothing happened. A fix for
      v5.10.
      
      Patch 10 stops incrementing local_addr_used and add_addr_accepted
      counters when dealing with the address ID 0, because these counters are
      not taking into account the initial subflow, and are then not
      decremented when the linked addresses are removed. A fix for v6.0.
      
      Patch 11 validates the previous patch.
      
      Patch 12 prevents the PM from sending multiple SUB_CLOSED events for the
      initial subflow. A fix for v5.12.
      
      Patch 13 validates the previous patch.
      
      Patch 14 stops treating the ADD_ADDR 0 as a new address, and accepts it
      in order to re-create the initial subflow if it has been closed, even if
      the limit for *new* addresses -- not taking into account the address of
      the initial subflow -- has been reached. A fix for v5.10.
      
      Patch 15 validates the previous patch.
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      ---
      Matthieu Baerts (NGI0) (15):
            mptcp: pm: reuse ID 0 after delete and re-add
            mptcp: pm: fix RM_ADDR ID for the initial subflow
            selftests: mptcp: join: check removing ID 0 endpoint
            mptcp: pm: send ACK on an active subflow
            mptcp: pm: skip connecting to already established sf
            mptcp: pm: reset MPC endp ID when re-added
            selftests: mptcp: join: check re-adding init endp with != id
            selftests: mptcp: join: no extra msg if no counter
            mptcp: pm: do not remove already closed subflows
            mptcp: pm: fix ID 0 endp usage after multiple re-creations
            selftests: mptcp: join: check re-re-adding ID 0 endp
            mptcp: avoid duplicated SUB_CLOSED events
            selftests: mptcp: join: validate event numbers
            mptcp: pm: ADD_ADDR 0 is not a new address
            selftests: mptcp: join: check re-re-adding ID 0 signal
      
       net/mptcp/pm.c                                  |   4 +-
       net/mptcp/pm_netlink.c                          |  87 ++++++++++----
       net/mptcp/protocol.c                            |   6 +
       net/mptcp/protocol.h                            |   5 +-
       tools/testing/selftests/net/mptcp/mptcp_join.sh | 153 ++++++++++++++++++++----
       tools/testing/selftests/net/mptcp/mptcp_lib.sh  |   4 +
       6 files changed, 209 insertions(+), 50 deletions(-)
      ---
      base-commit: 3a0504d5
      change-id: 20240826-net-mptcp-more-pm-fix-ffa61a36f817
      
      Best regards,
      ====================
      
      Link: https://patch.msgid.link/20240828-net-mptcp-more-pm-fix-v2-0-7f11b283fff7@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      b666a651
    • Matthieu Baerts (NGI0)'s avatar
      selftests: mptcp: join: check re-re-adding ID 0 signal · f18fa2ab
      Matthieu Baerts (NGI0) authored
      This test extends "delete re-add signal" to validate the previous
      commit: when the 'signal' endpoint linked to the initial subflow (ID 0)
      is re-added multiple times, it will re-send the ADD_ADDR with id 0. The
      client should still be able to re-create this subflow, even if the
      add_addr_accepted limit has been reached as this special address is not
      considered as a new address.
      
      The 'Fixes' tag here below is the same as the one from the previous
      commit: this patch here is not fixing anything wrong in the selftests,
      but it validates the previous fix for an issue introduced by this commit
      ID.
      
      Fixes: d0876b22 ("mptcp: add the incoming RM_ADDR support")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      f18fa2ab
    • Matthieu Baerts (NGI0)'s avatar
      mptcp: pm: ADD_ADDR 0 is not a new address · 57f86203
      Matthieu Baerts (NGI0) authored
      The ADD_ADDR 0 with the address from the initial subflow should not be
      considered as a new address: this is not something new. If the host
      receives it, it simply means that the address is available again.
      
      When receiving an ADD_ADDR for the ID 0, the PM already doesn't consider
      it as new by not incrementing the 'add_addr_accepted' counter. But the
      'accept_addr' might not be set if the limit has already been reached:
      this can be bypassed in this case. But before, it is important to check
      that this ADD_ADDR for the ID 0 is for the same address as the initial
      subflow. If not, it is not something that should happen, and the
      ADD_ADDR can be ignored.
      
      Note that if an ADD_ADDR is received while there is already a subflow
      opened using the same address, this ADD_ADDR is ignored as well. It
      means that if multiple ADD_ADDR for ID 0 are received, there will not be
      any duplicated subflows created by the client.
      
      Fixes: d0876b22 ("mptcp: add the incoming RM_ADDR support")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      57f86203
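The acceptance rules described in the ADD_ADDR 0 commit above can be summarised as a small decision function. This is a sketch under stated assumptions: the function name, its parameters and the use of plain strings for addresses are all hypothetical, and the real path manager tracks considerably more state.

```python
def accept_add_addr(addr_id, addr, initial_remote_addr,
                    connected_addrs, accepted, limit):
    """Decide whether a received ADD_ADDR should lead to a new subflow."""
    if addr in connected_addrs:
        return False  # a subflow to this address already exists
    if addr_id == 0:
        # ID 0 is only valid for the address of the initial subflow, and it
        # is not a *new* address, so the limit does not apply.
        return addr == initial_remote_addr
    return accepted < limit
```

In this model an ADD_ADDR with ID 0 is accepted even when `accepted` has reached `limit`, as long as it carries the initial address and no subflow to that address is open.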
    • Matthieu Baerts (NGI0)'s avatar
      selftests: mptcp: join: validate event numbers · 20ccc7c5
      Matthieu Baerts (NGI0) authored
      This test extends "delete and re-add" and "delete re-add signal" to
      validate the previous commit: the number of MPTCP events is checked to
      make sure there are no duplicated or unexpected ones.
      
      A new helper has been introduced to easily check these events. The
      missing events have been added to the lib.
      
      The 'Fixes' tag here below is the same as the one from the previous
      commit: this patch here is not fixing anything wrong in the selftests,
      but it validates the previous fix for an issue introduced by this commit
      ID.
      
      Fixes: b911c97c ("mptcp: add netlink event support")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      20ccc7c5
    • Matthieu Baerts (NGI0)'s avatar
      mptcp: avoid duplicated SUB_CLOSED events · d82809b6
      Matthieu Baerts (NGI0) authored
      The initial subflow might have already been closed, but still in the
      connection list. When the worker is instructed to close the subflows
      that have been marked as closed, it might then try to close the initial
      subflow again.
      
      A consequence of that is that the SUB_CLOSED event can be seen twice:
      
        # ip mptcp endpoint
        1.1.1.1 id 1 subflow dev eth0
        2.2.2.2 id 2 subflow dev eth1
      
        # ip mptcp monitor &
        [         CREATED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9
        [     ESTABLISHED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9
        [  SF_ESTABLISHED] remid=0 locid=2 saddr4=2.2.2.2 daddr4=9.9.9.9
      
        # ip mptcp endpoint delete id 1
        [       SF_CLOSED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9
        [       SF_CLOSED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9
      
      The first one is coming from mptcp_pm_nl_rm_subflow_received(), and the
      second one from __mptcp_close_subflow().
      
      To avoid doing the post-closed processing twice, the subflow is now
      marked as closed the first time.
      
      Note that it is not enough to check if we are dealing with the first
      subflow and check its sk_state: the subflow might have been reset or
      closed before calling mptcp_close_ssk().
      
      Fixes: b911c97c ("mptcp: add netlink event support")
      Cc: stable@vger.kernel.org
      Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      d82809b6
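The idea of marking the subflow as closed the first time, so that the post-close processing cannot run twice, might look like this in a toy model. The `Subflow` class, its `closed` flag and the event tuple are hypothetical names for illustration, not the kernel's data structures.

```python
from dataclasses import dataclass


@dataclass
class Subflow:
    local_id: int
    closed: bool = False  # hypothetical flag, set on the first close


def close_subflow(subflow: Subflow, events: list) -> None:
    """Run the post-close processing at most once per subflow."""
    if subflow.closed:
        return  # already handled by another code path
    subflow.closed = True
    events.append(("SF_CLOSED", subflow.local_id))
```

Calling `close_subflow()` from two different paths on the same object then records a single event, which is the behaviour the commit aims for.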
    • Matthieu Baerts (NGI0)'s avatar
      selftests: mptcp: join: check re-re-adding ID 0 endp · d397d724
      Matthieu Baerts (NGI0) authored
      This test extends "delete and re-add" to validate the previous commit:
      when the endpoint linked to the initial subflow (ID 0) is re-added
      multiple times, it was no longer being used, because the internal linked
      counters are not decremented for this special endpoint: it is not an
      additional endpoint.
      
      Here, the "del/add id 0" steps are done 3 times to ensure this case is
      validated.
      
      The 'Fixes' tag here below is the same as the one from the previous
      commit: this patch here is not fixing anything wrong in the selftests,
      but it validates the previous fix for an issue introduced by this commit
      ID.
      
      Fixes: 3ad14f54 ("mptcp: more accurate MPC endpoint tracking")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      d397d724
    • Matthieu Baerts (NGI0)'s avatar
      mptcp: pm: fix ID 0 endp usage after multiple re-creations · 9366922a
      Matthieu Baerts (NGI0) authored
      'local_addr_used' and 'add_addr_accepted' are decremented for addresses
      not related to the initial subflow (ID0), because the source and
      destination addresses of the initial subflows are known from the
      beginning: they don't count as "additional local address being used" or
      "ADD_ADDR being accepted".
      
      It is then required not to increment them when the endpoint used by
      the initial subflow is removed and re-added during a connection. Without
      this modification, this endpoint cannot be removed and re-added more
      than once.
      Reported-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/512
      Fixes: 3ad14f54 ("mptcp: more accurate MPC endpoint tracking")
      Reported-by: syzbot+455d38ecd5f655fc45cf@syzkaller.appspotmail.com
      Closes: https://lore.kernel.org/00000000000049861306209237f4@google.com
      Cc: stable@vger.kernel.org
      Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      9366922a
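The accounting problem in the commit above comes from a counter that is incremented for ID 0 but never decremented for it. A minimal model of the symmetric behaviour, assuming a single hypothetical `PmCounters` class and a simple per-connection limit, could be:

```python
class PmCounters:
    """Hypothetical model of the 'local_addr_used' accounting."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.local_addr_used = 0

    def use_local_addr(self, local_id: int) -> bool:
        if local_id == 0:
            return True  # initial subflow address: never counted
        if self.local_addr_used >= self.limit:
            return False
        self.local_addr_used += 1
        return True

    def release_local_addr(self, local_id: int) -> None:
        if local_id != 0 and self.local_addr_used > 0:
            self.local_addr_used -= 1
```

Because ID 0 is skipped on both sides, removing and re-adding the initial endpoint any number of times leaves the counter untouched and never exhausts the limit.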
    • Matthieu Baerts (NGI0)'s avatar
      mptcp: pm: do not remove already closed subflows · 58e1b66b
      Matthieu Baerts (NGI0) authored
      It is possible to have already closed subflows in the list, e.g. the
      initial subflow has already been closed, but is still in the list. There
      is no need to try to close it again and increment the related counters
      again.
      
      Fixes: 0ee4261a ("mptcp: implement mptcp_pm_remove_subflow")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      58e1b66b
    • Matthieu Baerts (NGI0)'s avatar
      selftests: mptcp: join: no extra msg if no counter · 76a2d839
      Matthieu Baerts (NGI0) authored
      The checksum and fail counters might not be available. Then no need to
      display an extra message with missing info.
      
      While at it, fix the indentation around, which is wrong since the same
      commit.
      
      Fixes: 47867f0a ("selftests: mptcp: join: skip check if MIB counter not supported")
      Cc: stable@vger.kernel.org
      Reviewed-by: Geliang Tang <geliang@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      76a2d839
    • Matthieu Baerts (NGI0)'s avatar
      selftests: mptcp: join: check re-adding init endp with != id · 1c2326fc
      Matthieu Baerts (NGI0) authored
      The initial subflow has a special local ID: 0. It is specific per
      connection.
      
      When a global endpoint is deleted and re-added later, it can have a
      different ID, but the kernel should still use the ID 0 if it corresponds
      to the initial address.
      
      This test validates this behaviour: the endpoint linked to the initial
      subflow is removed, and re-added with a different ID.
      
      Note that removing the initial subflow will not decrement the 'subflows'
      counters, which corresponds to the *additional* subflows. On the other
      hand, when the same endpoint is re-added, it will increment this
      counter, as it will be seen as an additional subflow this time.
      
      The 'Fixes' tag here below is the same as the one from the previous
      commit: this patch here is not fixing anything wrong in the selftests,
      but it validates the previous fix for an issue introduced by this commit
      ID.
      
      Fixes: 3ad14f54 ("mptcp: more accurate MPC endpoint tracking")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      1c2326fc
    • Matthieu Baerts (NGI0)'s avatar
      mptcp: pm: reset MPC endp ID when re-added · dce1c6d1
      Matthieu Baerts (NGI0) authored
      The initial subflow has a special local ID: 0. It is specific per
      connection.
      
      When a global endpoint is deleted and re-added later, it can have a
      different ID -- most services managing the endpoints automatically don't
      force the ID to be the same as before. It is then important to track
      these modifications to be consistent with the ID being used for the
      address used by the initial subflow, not to confuse the other peer or to
      send the ID 0 for the wrong address.
      
      Now when removing an endpoint, msk->mpc_endpoint_id is reset if it
      corresponds to this endpoint. When adding a new endpoint, the same
      variable is updated if the address matches the one of the initial
      subflow.
      
      Fixes: 3ad14f54 ("mptcp: more accurate MPC endpoint tracking")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      dce1c6d1
    • Matthieu Baerts (NGI0)'s avatar
      mptcp: pm: skip connecting to already established sf · bc19ff57
      Matthieu Baerts (NGI0) authored
      The lookup_subflow_by_daddr() helper checks if there is already a
      subflow connected to this address. But there could be a subflow that is
      closing, but taking time due to some reasons: latency, losses, data to
      process, etc.
      
      If an ADD_ADDR is received while the endpoint is being closed, it is
      better to try connecting to it, instead of rejecting it: the peer which
      has sent the ADD_ADDR will not be notified that the ADD_ADDR has been
      rejected for this reason, and the expected subflow will not be created
      at the end.
      
      This helper should then only look for subflows that are established, or
      going to be, but not the ones being closed.
      
      Fixes: d84ad049 ("mptcp: skip connecting the connected address")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      bc19ff57
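The change to the lookup described above, counting only subflows that are established or about to be, can be sketched as follows. The `Sf` tuple, the state names and the `LIVE_STATES` set are assumptions made for the example; the helper here only mirrors the idea of lookup_subflow_by_daddr(), not its code.

```python
from collections import namedtuple

# Hypothetical, minimal view of a subflow: remote address and TCP state.
Sf = namedtuple("Sf", "daddr state")

# States treated as "established, or going to be" in this sketch.
LIVE_STATES = {"SYN_SENT", "SYN_RECV", "ESTABLISHED"}


def has_live_subflow_to(subflows, daddr) -> bool:
    """True if a subflow to daddr exists and is not on its way out."""
    return any(sf.daddr == daddr and sf.state in LIVE_STATES
               for sf in subflows)
```

A subflow lingering in a closing state no longer blocks a fresh connection attempt to the same address, so an ADD_ADDR received at that moment is not silently dropped.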
    • Matthieu Baerts (NGI0)'s avatar
      mptcp: pm: send ACK on an active subflow · c07cc3ed
      Matthieu Baerts (NGI0) authored
      Taking the first one on the list doesn't work in some cases, e.g. if the
      initial subflow is being removed. Pick another one instead of not
      sending anything.
      
      Fixes: 84dfe367 ("mptcp: send out dedicated ADD_ADDR packet")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      c07cc3ed
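Picking an active subflow rather than blindly taking the head of the list amounts to a filtered search. In this sketch each subflow is a plain dict with a hypothetical `active` key; how the kernel decides that a subflow is active is not modelled.

```python
def pick_ack_subflow(subflows):
    """Return the first subflow still able to carry an ACK, else None."""
    for sf in subflows:
        if sf["active"]:
            return sf
    return None
```

If the initial subflow sits first in the list but is being removed, the search simply moves on to the next usable one instead of sending nothing.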
    • Matthieu Baerts (NGI0)'s avatar
      selftests: mptcp: join: check removing ID 0 endpoint · 5f94b08c
      Matthieu Baerts (NGI0) authored
      Removing the endpoint linked to the initial subflow should trigger a
      RM_ADDR for the right ID, and the removal of the subflow. That's what is
      now being verified in the "delete and re-add" test.
      
      Note that removing the initial subflow will not decrement the 'subflows'
      counters, which corresponds to the *additional* subflows. On the other
      hand, when the same endpoint is re-added, it will increment this
      counter, as it will be seen as an additional subflow this time.
      
      The 'Fixes' tag here below is the same as the one from the previous
      commit: this patch here is not fixing anything wrong in the selftests,
      but it validates the previous fix for an issue introduced by this commit
      ID.
      
      Fixes: 3ad14f54 ("mptcp: more accurate MPC endpoint tracking")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      5f94b08c
    • Matthieu Baerts (NGI0)'s avatar
      mptcp: pm: fix RM_ADDR ID for the initial subflow · 87b5896f
      Matthieu Baerts (NGI0) authored
      The initial subflow has a special local ID: 0. When an endpoint is being
      deleted, it is then important to check if its address is not linked to
      the initial subflow to send the right ID.
      
      If there was an endpoint linked to the initial subflow, msk's
      mpc_endpoint_id field will be set. We can then use this info when an
      endpoint is being removed to see if it is linked to the initial subflow.
      
      Now that the correct IDs are passed to mptcp_pm_nl_rm_addr_or_subflow(),
      mptcp_local_id_match() is no longer needed.
      
      Fixes: 3ad14f54 ("mptcp: more accurate MPC endpoint tracking")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      87b5896f
    • Matthieu Baerts (NGI0)'s avatar
      mptcp: pm: reuse ID 0 after delete and re-add · 8b8ed1b4
      Matthieu Baerts (NGI0) authored
      When the endpoint used by the initial subflow is removed and re-added
      later, the PM has to force the ID 0: it is a special case imposed by the
      MPTCP specs.
      
      Note that the endpoint should then be re-added reusing the same ID.
      
      Fixes: 3ad14f54 ("mptcp: more accurate MPC endpoint tracking")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      8b8ed1b4
    • Linus Torvalds's avatar
      Merge tag 'random-6.11-rc6-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random · d5d547aa
      Linus Torvalds authored
      Pull random number generator fix from Jason Donenfeld:
       "Reject invalid flags passed to vgetrandom() in the same way that
        getrandom() does, so that the behavior is the same, from Yann.
      
        The flags argument to getrandom() only has a behavioral effect on the
        function if the RNG isn't initialized yet, so vgetrandom() falls back
        to the syscall in that case. But if the RNG is initialized, all of the
        flags behave the same way, so vgetrandom() didn't bother checking
        them, and just ignored them entirely.
      
        But that doesn't account for invalid flags passed in, which need to be
        rejected so we can use them later"
      
      * tag 'random-6.11-rc6-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
        random: vDSO: reject unknown getrandom() flags
      d5d547aa
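Rejecting unknown flags, as described in the pull message above, is a mask test. The sketch below uses the getrandom() flag values from the Linux UAPI headers; the function name and the choice to return a negative errno are assumptions, and the real vDSO code may reach the same outcome by a different route.

```python
import errno

# getrandom() flag values as defined by the Linux UAPI headers.
GRND_NONBLOCK = 0x0001
GRND_RANDOM = 0x0002
GRND_INSECURE = 0x0004
KNOWN_FLAGS = GRND_NONBLOCK | GRND_RANDOM | GRND_INSECURE


def check_getrandom_flags(flags: int) -> int:
    """Return 0 for known flags, a negative errno for anything else."""
    if flags & ~KNOWN_FLAGS:
        return -errno.EINVAL  # unknown bits stay reserved for future use
    return 0
```

Keeping unknown bits invalid today is what allows a later kernel to give them a meaning without changing the behaviour of existing callers.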
    • Eric Dumazet's avatar
      net: busy-poll: use ktime_get_ns() instead of local_clock() · 0870b0d8
      Eric Dumazet authored
      Typically, busy-polling durations are below 100 usec.
      
      When/if the busy-poller thread migrates to another cpu,
      local_clock() can be off by +/-2msec or more for small
      values of HZ, depending on the platform.
      
      Use ktime_get_ns() to ensure deterministic behavior,
      which is the whole point of busy-polling.
      
      Fixes: 06021292 ("net: add low latency socket poll")
      Fixes: 9a3c71aa ("net: convert low latency sockets to sched_clock()")
      Fixes: 37089834 ("sched, net: Fixup busy_loop_us_clock()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Joe Damato <jdamato@fastly.com>
      Link: https://patch.msgid.link/20240827114916.223377-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0870b0d8
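The reasoning in the busy-poll commit above, that a short spin needs a clock which does not jump when the thread moves between CPUs, has a loose user-space analogy. The sketch uses Python's `time.monotonic_ns()`, a system-wide monotonic clock, to bound a polling loop; the function and its parameters are hypothetical and nothing here reproduces the kernel implementation.

```python
import time


def busy_poll(is_ready, budget_ns: int) -> bool:
    """Spin until is_ready() is true or the time budget is used up."""
    deadline = time.monotonic_ns() + budget_ns
    while True:
        if is_ready():
            return True
        if time.monotonic_ns() >= deadline:
            return False
```

With a budget in the tens of microseconds, an error of a couple of milliseconds in the clock would dwarf the intended duration, which is why the choice of time source matters here.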
  2. 28 Aug, 2024 7 commits
  3. 27 Aug, 2024 10 commits
    • Ondrej Mosnacek's avatar
      sctp: fix association labeling in the duplicate COOKIE-ECHO case · 3a0504d5
      Ondrej Mosnacek authored
      sctp_sf_do_5_2_4_dupcook() currently calls security_sctp_assoc_request()
      on new_asoc, but as it turns out, this association is always discarded
      and the LSM labels never get into the final association (asoc).
      
      This can be reproduced by having two SCTP endpoints try to initiate an
      association with each other at approximately the same time and then peel
      off the association into a new socket, which exposes the uninitialized
      labels and triggers SELinux denials.
      
      Fix it by calling security_sctp_assoc_request() on asoc instead of
      new_asoc. Xin Long also suggested limiting the hook call to cases
      A, B, and D, since in cases C and E the COOKIE ECHO chunk is discarded
      and the association doesn't enter the ESTABLISHED state, so rectify that
      as well.
      
      One related caveat with SELinux and peer labeling: When an SCTP
      connection is set up simultaneously in this way, we will end up with an
      association that is initialized with security_sctp_assoc_request() on
      both sides, so the MLS component of the security context of the
      association will get swapped between the peers, instead of just one side
      setting it to the other's MLS component. However, at that point
      security_sctp_assoc_request() had already been called on both sides in
      sctp_sf_do_unexpected_init() (on a temporary association) and thus if
      the exchange didn't fail before due to MLS, it won't fail now either
      (most likely both endpoints have the same MLS range).
      
      Tested by:
       - reproducer from https://src.fedoraproject.org/tests/selinux/pull-request/530
       - selinux-testsuite (https://github.com/SELinuxProject/selinux-testsuite/)
       - sctp-tests (https://github.com/sctp/sctp-tests) - no tests failed
         that wouldn't fail also without the patch applied
      
      Fixes: c081d53f ("security: pass asoc to sctp_assoc_request and sctp_sk_clone")
      Suggested-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
      Acked-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Paul Moore <paul@paul-moore.com> (LSM/SELinux)
      Link: https://patch.msgid.link/20240826130711.141271-1-omosnace@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      3a0504d5
    • Jakub Kicinski's avatar
      Merge branch 'mptcp-close-subflow-when-receiving-tcp-fin-and-misc' · 237c3851
      Jakub Kicinski authored
      Matthieu Baerts says:
      
      ====================
      mptcp: close subflow when receiving TCP+FIN and misc.
      
      Here are different fixes:
      
      Patch 1 closes the subflow after having received a FIN, instead
      of leaving it half-closed until the end of the MPTCP connection.
      A fix for v5.12.
      
      Patch 2 validates the previous patch.
      
      Patch 3 is a fix for a recent fix to check both directions for the
      backup flag. It can follow the 'Fixes' commit and be backported up
      to v5.7.
      
      Patch 4 adds the missing \n at the end of pr_debug() messages. Without
      it, debug messages are displayed with a delay, which is confusing when
      debugging.
      A fix for v5.6.
      ====================
      
      Link: https://patch.msgid.link/20240826-net-mptcp-close-extra-sf-fin-v1-0-905199fe1172@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      237c3851
    • Matthieu Baerts (NGI0)'s avatar
      mptcp: pr_debug: add missing \n at the end · cb41b195
      Matthieu Baerts (NGI0) authored
      pr_debug() calls have been added in various places in the MPTCP code to
      help developers debug some situations. With the dynamic debug feature,
      it is easy to enable all or some of them, and to ask users to reproduce
      issues with extra debug output.
      
      Many of these pr_debug() calls don't end with a new line, while no
      'pr_cont()' is used in the MPTCP code. So the goal was never to display
      multiple debug messages on one line: the '\n' was not omitted on purpose.
      Not having the new line at the end causes these messages to be printed
      with a delay, only when something else needs to be printed. This issue
      is not visible when many messages need to be printed, but it is annoying
      and confusing when only specific messages are expected, e.g.
      
        # echo "func mptcp_pm_add_addr_echoed +fmp" \
              > /sys/kernel/debug/dynamic_debug/control
        # ./mptcp_join.sh "signal address"; \
              echo "$(awk '{print $1}' /proc/uptime) - end"; \
              sleep 5s; \
              echo "$(awk '{print $1}' /proc/uptime) - restart"; \
              ./mptcp_join.sh "signal address"
        013 signal address
            (...)
        10.75 - end
        15.76 - restart
        013 signal address
        [  10.367935] mptcp:mptcp_pm_add_addr_echoed: MPTCP: msk=(...)
            (...)
      
        => a delay of 5 seconds: printed with a 10.36 ts, but after 'restart'
           which was printed at the 15.76 ts.
      
      The 'Fixes' tag here below points to the first pr_debug() used without
      '\n' in net/mptcp. This patch could be split in many small ones, with
      different Fixes tag, but it doesn't seem worth it, because it is easy to
      re-generate this patch with this simple 'sed' command:
      
        git grep -l pr_debug -- net/mptcp |
          xargs sed -i "s/\(pr_debug(\".*[^n]\)\(\"[,)]\)/\1\\\n\2/g"
      
      So in case of conflicts, simply drop the modifications, and launch this
      command.
      
      Fixes: f870fa0b ("mptcp: Add MPTCP socket stubs")
      Cc: stable@vger.kernel.org
      Reviewed-by: Geliang Tang <geliang@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://patch.msgid.link/20240826-net-mptcp-close-extra-sf-fin-v1-4-905199fe1172@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      cb41b195
    • Matthieu Baerts (NGI0)'s avatar
      mptcp: sched: check both backup in retrans · 2a1f596e
      Matthieu Baerts (NGI0) authored
      The 'mptcp_subflow_context' structure has two items related to the
      backup flags:
      
       - 'backup': the subflow has been marked as backup by the other peer
      
       - 'request_bkup': the backup flag has been set by the host
      
      Looking only at the 'backup' flag can make sense in some cases, but it
      is not the behaviour of the default packet scheduler when selecting
      paths.
      
      As explained in the commit b6a66e52 ("mptcp: sched: check both
      directions for backup"), the packet scheduler should look at both flags,
      because that was the behaviour from the beginning: the 'backup' flag was
      set by accident instead of the 'request_bkup' one. Now that the latter
      has been fixed, get_retrans() needs to be adapted as well.
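
The "look at both flags" rule can be illustrated with a minimal user-space sketch. The struct and helper names below are hypothetical stand-ins, not the kernel's own definitions; only the two field names come from the text above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in holding only the two backup-related fields named
 * in the commit message; the real mptcp_subflow_context is much larger. */
struct subflow_flags {
	bool backup;       /* subflow marked as backup by the other peer */
	bool request_bkup; /* backup flag set by this host */
};

/* Illustrative helper (assumed name): a subflow is treated as a backup
 * path if either direction has flagged it. */
static bool subflow_is_backup(const struct subflow_flags *sf)
{
	assert(sf);
	return sf->backup || sf->request_bkup;
}
```

A scheduler written this way would prefer any subflow for which `subflow_is_backup()` is false, and fall back to the others only when no such subflow is usable.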
      
      Fixes: b6a66e52 ("mptcp: sched: check both directions for backup")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://patch.msgid.link/20240826-net-mptcp-close-extra-sf-fin-v1-3-905199fe1172@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      2a1f596e
    • Matthieu Baerts (NGI0)'s avatar
      selftests: mptcp: join: cannot rm sf if closed · e93681af
      Matthieu Baerts (NGI0) authored
      Thanks to the previous commit, the MPTCP subflows are now closed in both
      directions even when only the MPTCP path-manager of one peer asks for
      their closure.
      
      In the two tests modified here -- "userspace pm add & remove address"
      and "userspace pm create destroy subflow" -- one peer is controlled by
      the userspace PM, and the other one by the in-kernel PM. When the
      userspace PM sends a RM_ADDR notification, the in-kernel PM will
      automatically react by closing all subflows using this address. Now
      that, thanks to the previous commit, the subflows are properly closed in
      both directions, the userspace PM can no longer close the same subflows
      if they are already closed. Before, it was OK to do that because the
      subflows were still half-open; it is still OK to send a RM_ADDR.
      
      In other words, thanks to the previous commit closing the subflows, an
      error will be returned to userspace if it tries to close a subflow
      that has already been closed. So there is no need to run this command,
      which means that the linked counters will not be incremented.
      
      These tests no longer send a RM_ADDR and then close the linked subflow
      right after. The test with the userspace PM on the server
      side is now removing one subflow linked to one address, then sending
      a RM_ADDR for another address. The test with the userspace PM on the
      client side is now only removing the subflow that was previously
      created.
      
      Fixes: 4369c198 ("selftests: mptcp: test userspace pm out of transfer")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://patch.msgid.link/20240826-net-mptcp-close-extra-sf-fin-v1-2-905199fe1172@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      e93681af
    • Matthieu Baerts (NGI0)'s avatar
      mptcp: close subflow when receiving TCP+FIN · f09b0ad5
      Matthieu Baerts (NGI0) authored
      When a peer decides to close one subflow in the middle of a connection
      having multiple subflows, the receiver of the first FIN should accept
      that, and close the subflow on its side as well. If not, the subflow
      will stay half closed, and would even continue to be used until the end
      of the MPTCP connection or a reset from the network.
      
      The issue has not been seen before, probably because the in-kernel
      path-manager always sends a RM_ADDR before closing the subflow. Upon the
      reception of this RM_ADDR, the other peer will initiate the closure on
      its side as well. On the other hand, if the RM_ADDR is lost, or if the
      path-manager of the other peer only closes the subflow without sending a
      RM_ADDR, the subflow would switch to TCP_CLOSE_WAIT, but that's it,
      leaving the subflow half-closed.
      
      So now, when the subflow switches to the TCP_CLOSE_WAIT state, and if
      the MPTCP connection has not been closed before with a DATA_FIN, the
      kernel owning the subflow schedules its worker to initiate the closure
      on its side as well.
      
      This issue can be easily reproduced with packetdrill, as visible in [1],
      by creating an additional subflow, injecting a FIN+ACK before sending
      the DATA_FIN, and expecting a FIN+ACK in return.
      
      Fixes: 40947e13 ("mptcp: schedule worker when subflow is closed")
      Cc: stable@vger.kernel.org
      Link: https://github.com/multipath-tcp/packetdrill/pull/154 [1]
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://patch.msgid.link/20240826-net-mptcp-close-extra-sf-fin-v1-1-905199fe1172@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f09b0ad5
    • Xueming Feng's avatar
      tcp: fix forever orphan socket caused by tcp_abort · bac76cf8
      Xueming Feng authored
      We have had problems closing zero-window fin-wait-1 TCP sockets in our
      environment. This patch comes from the investigation.
      
      Previously, tcp_abort only sent a reset and called tcp_done when the
      socket was not SOCK_DEAD (i.e. not an orphan). For an orphan socket, it
      only purged the write queue; it did not close the socket and left that
      to the timer.
      
      While purging the write queue, tp->packets_out and sk->sk_write_queue
      are cleared along the way. However, tcp_retransmit_timer returns early
      when !tp->packets_out, and tcp_probe_timer returns early when
      !sk->sk_write_queue.
      
      As a result, ICSK_TIME_RETRANS and ICSK_TIME_PROBE0 were not rescheduled
      and the socket was never killed by the timers, turning a zero-window
      orphan into a forever orphan.
      
      This patch removes the SOCK_DEAD check in tcp_abort, making it send a
      reset to the peer and close the socket accordingly, which prevents the
      timer-less orphan from happening.
      
      According to Lorenzo's email in the v1 thread, the check was there to
      prevent force-closing the same socket twice. That situation is handled
      by testing for TCP_CLOSE inside lock, and returning -ENOENT if it is
      already closed.
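
The difference between the old and new behaviour can be sketched with a toy user-space model. Everything below (the struct, the state enum, the function names) is a hypothetical simplification written from the description above, not the kernel's tcp_abort().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model: only the pieces of socket state the commit message mentions. */
enum model_state { MODEL_FIN_WAIT1, MODEL_CLOSE };

struct model_sock {
	enum model_state state;
	bool dead;        /* models SOCK_DEAD, i.e. an orphan socket */
	int packets_out;  /* models tp->packets_out */
	bool reset_sent;  /* whether a reset went to the peer */
};

/* Old behaviour as described: an orphan only has its queue purged, so it
 * stays open while the timers no longer have anything to act on. */
static int model_abort_old(struct model_sock *sk)
{
	assert(sk);
	if (!sk->dead) {
		sk->reset_sent = true;
		sk->state = MODEL_CLOSE;
	}
	sk->packets_out = 0; /* write queue purged in both cases */
	return 0;
}

/* New behaviour as described: no SOCK_DEAD check; a second abort of an
 * already closed socket is rejected with -ENOENT. */
static int model_abort_new(struct model_sock *sk)
{
	assert(sk);
	if (sk->state == MODEL_CLOSE)
		return -ENOENT;
	sk->reset_sent = true;
	sk->packets_out = 0;
	sk->state = MODEL_CLOSE;
	return 0;
}
```

In this model, calling `model_abort_old()` on an orphan leaves it in `MODEL_FIN_WAIT1` with `packets_out == 0`, which is the "forever orphan" condition, while `model_abort_new()` closes it and returns `-ENOENT` on a repeated call.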
      
      The -ENOENT code comes from the associated patch Lorenzo made for
      iproute2-ss (link below), and it also conforms to RFC 9293.
      
      At the end of the patch, tcp_write_queue_purge(sk) is removed because it
      was already called in tcp_done_with_error().
      
      P.S. This is the same patch as v2, resent because it was mislabeled
      "changes requested" on patchwork.kernel.org.
      
      Link: https://patchwork.ozlabs.org/project/netdev/patch/1450773094-7978-3-git-send-email-lorenzo@google.com/
      Fixes: c1e64e29 ("net: diag: Support destroying TCP sockets.")
      Signed-off-by: Xueming Feng <kuro@kuroa.me>
      Tested-by: Lorenzo Colitti <lorenzo@google.com>
      Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/20240826102327.1461482-1-kuro@kuroa.me
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      bac76cf8
    • Cong Wang's avatar
      gtp: fix a potential NULL pointer dereference · defd8b3c
      Cong Wang authored
      When sockfd_lookup() fails, gtp_encap_enable_socket() returns a
      NULL pointer, but its callers only check for error pointers and thus
      miss the NULL pointer case.
      
      Fix it by returning an error pointer with the error code carried from
      sockfd_lookup().
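
Why a NULL return slips past an error-pointer check can be shown with a small user-space sketch. The `ERR_PTR`/`PTR_ERR`/`IS_ERR` helpers below are re-implemented here for illustration, modeled on the kernel idiom; `fake_lookup()`, `enable_socket_buggy()` and `enable_socket_fixed()` are hypothetical names, not the gtp driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* User-space re-implementation of the error-pointer idiom: small negative
 * errno values are encoded in the top MAX_ERRNO addresses. */
static void *ERR_PTR(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

static long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int dummy_sock; /* stands in for a real socket object */

/* Hypothetical lookup: on failure it returns NULL and reports the reason
 * through *err. */
static void *fake_lookup(int fd, int *err)
{
	if (fd < 0) {
		*err = -EBADF;
		return NULL;
	}
	*err = 0;
	return &dummy_sock;
}

/* Buggy shape: the NULL from the lookup is passed straight through. */
static void *enable_socket_buggy(int fd)
{
	int err;

	return fake_lookup(fd, &err);
}

/* Fixed shape: failure is turned into an error pointer carrying err. */
static void *enable_socket_fixed(int fd)
{
	int err;
	void *sock = fake_lookup(fd, &err);

	return sock ? sock : ERR_PTR(err);
}
```

A caller that only tests `IS_ERR()` accepts the NULL from `enable_socket_buggy(-1)` as a valid socket, whereas `enable_socket_fixed(-1)` is caught and `PTR_ERR()` recovers `-EBADF`.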
      
      (I found this bug during code inspection.)
      
      Fixes: 1e3a3abd ("gtp: make GTP sockets in gtp_newlink optional")
      Cc: Andreas Schultz <aschultz@tpip.net>
      Cc: Harald Welte <laforge@gnumonks.org>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Link: https://patch.msgid.link/20240825191638.146748-1-xiyou.wangcong@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      defd8b3c
    • Jakub Kicinski's avatar
      Merge branch 'fixes-for-ipsec-over-bonding' · 2fecbf75
      Jakub Kicinski authored
      Jianbo Liu says:
      
      ====================
      Fixes for IPsec over bonding
      
      This patchset provides bug fixes for IPsec over the bonding driver.
      
      It adds the missing xdo_dev_state_free API, and fixes "scheduling while
      atomic" by using a mutex instead of a spin lock.
      
      Series generated against:
      commit c07ff859 ("netem: fix return value if duplicate enqueue fails")
      ====================
      
      Link: https://patch.msgid.link/20240823031056.110999-1-jianbol@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      2fecbf75
    • Jianbo Liu's avatar
      bonding: change ipsec_lock from spin lock to mutex · 2aeeef90
      Jianbo Liu authored
      In the cited commit, bond->ipsec_lock is added to protect ipsec_list,
      hence xdo_dev_state_add and xdo_dev_state_delete are called inside
      this lock. As ipsec_lock is a spin lock and such xfrmdev ops may sleep,
      "scheduling while atomic" will be triggered when changing bond's
      active slave.
      
      [  101.055189] BUG: scheduling while atomic: bash/902/0x00000200
      [  101.055726] Modules linked in:
      [  101.058211] CPU: 3 PID: 902 Comm: bash Not tainted 6.9.0-rc4+ #1
      [  101.058760] Hardware name:
      [  101.059434] Call Trace:
      [  101.059436]  <TASK>
      [  101.060873]  dump_stack_lvl+0x51/0x60
      [  101.061275]  __schedule_bug+0x4e/0x60
      [  101.061682]  __schedule+0x612/0x7c0
      [  101.062078]  ? __mod_timer+0x25c/0x370
      [  101.062486]  schedule+0x25/0xd0
      [  101.062845]  schedule_timeout+0x77/0xf0
      [  101.063265]  ? asm_common_interrupt+0x22/0x40
      [  101.063724]  ? __bpf_trace_itimer_state+0x10/0x10
      [  101.064215]  __wait_for_common+0x87/0x190
      [  101.064648]  ? usleep_range_state+0x90/0x90
      [  101.065091]  cmd_exec+0x437/0xb20 [mlx5_core]
      [  101.065569]  mlx5_cmd_do+0x1e/0x40 [mlx5_core]
      [  101.066051]  mlx5_cmd_exec+0x18/0x30 [mlx5_core]
      [  101.066552]  mlx5_crypto_create_dek_key+0xea/0x120 [mlx5_core]
      [  101.067163]  ? bonding_sysfs_store_option+0x4d/0x80 [bonding]
      [  101.067738]  ? kmalloc_trace+0x4d/0x350
      [  101.068156]  mlx5_ipsec_create_sa_ctx+0x33/0x100 [mlx5_core]
      [  101.068747]  mlx5e_xfrm_add_state+0x47b/0xaa0 [mlx5_core]
      [  101.069312]  bond_change_active_slave+0x392/0x900 [bonding]
      [  101.069868]  bond_option_active_slave_set+0x1c2/0x240 [bonding]
      [  101.070454]  __bond_opt_set+0xa6/0x430 [bonding]
      [  101.070935]  __bond_opt_set_notify+0x2f/0x90 [bonding]
      [  101.071453]  bond_opt_tryset_rtnl+0x72/0xb0 [bonding]
      [  101.071965]  bonding_sysfs_store_option+0x4d/0x80 [bonding]
      [  101.072567]  kernfs_fop_write_iter+0x10c/0x1a0
      [  101.073033]  vfs_write+0x2d8/0x400
      [  101.073416]  ? alloc_fd+0x48/0x180
      [  101.073798]  ksys_write+0x5f/0xe0
      [  101.074175]  do_syscall_64+0x52/0x110
      [  101.074576]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
      
      bond_ipsec_add_sa_all and bond_ipsec_del_sa_all are only called
      from bond_change_active_slave, which requires holding the RTNL lock,
      and bond_ipsec_add_sa and bond_ipsec_del_sa are the xfrm state
      xdo_dev_state_add and xdo_dev_state_delete APIs, which run in user
      context. So ipsec_lock doesn't have to be a spin lock; change it to a
      mutex, which resolves the above issue.
      
      Fixes: 9a560550 ("bonding: Add struct bond_ipesc to manage SA")
      Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
      Acked-by: Jay Vosburgh <jv@jvosburgh.net>
      Link: https://patch.msgid.link/20240823031056.110999-4-jianbol@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      2aeeef90