1. 17 Nov, 2020 34 commits
  2. 16 Nov, 2020 6 commits
    • Merge branch 'mptcp-improve-multiple-xmit-streams-support' · 72308ecb
      Jakub Kicinski authored
      Paolo Abeni says:
      
      ====================
      mptcp: improve multiple xmit streams support
      
      This series improves MPTCP handling of multiple concurrent
      xmit streams.
      
      The to-be-transmitted data is enqueued to a subflow only when
      the send window is open, keeping the subflows' xmit queues shorter
      and allowing for faster switch-over.
      
      The above requires a more accurate msk socket state tracking
      and some additional infrastructure to allow pushing the data
      pending in the msk xmit queue as soon as the MPTCP's send window
      opens (patches 6-10).
      
      As a side effect, the MPTCP socket could enqueue data to subflows
      after close() time, to completely spool the data sitting in the
      msk xmit queue. Dealing with that requires some infrastructure and
      core TCP changes (patches 1-5).
      
      Finally, patches 11-12 introduce a more accurate tracking of the other
      end's receive window.
      
      Overall, this refactors the MPTCP xmit path without introducing
      new features; the new code is covered by the existing self-tests.
      
      v2 -> v3:
       - rebased,
       - fixed checkpatch issue in patch 1/13
       - fixed some state tracking issues in patch 8/13
      
      v1 -> v2:
       - this is just a repost, to cope with patchwork issues, no changes
         at all
      ====================
      
      Link: https://lore.kernel.org/r/cover.1605458224.git.pabeni@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: send explicit ack on delayed ack_seq incr · 7ed90803
      Paolo Abeni authored
      When the worker moves some bytes from the OoO queue into
      the receive queue, msk->ack_seq is updated, but the MPTCP-level
      ack carrying that value has to wait for the next ingress packet,
      possibly slowing down or hanging the peer.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
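A minimal user-space sketch of the fix, assuming a much simplified receiver model; `struct rx_model`, `advertised_seq`, `send_explicit_ack()` and `worker_move_ooo()` are hypothetical names, not kernel APIs. It shows the worker sending an ack as soon as it advances the MPTCP-level ack sequence, rather than leaving the new value to ride on some later packet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of the MPTCP-level receive side. */
struct rx_model {
	uint64_t ack_seq;        /* next expected data sequence number */
	uint64_t advertised_seq; /* last ack_seq value sent to the peer */
	unsigned int acks_sent;  /* explicit acks sent so far */
};

/* Stand-in for transmitting a pure ack that carries ack_seq. */
static void send_explicit_ack(struct rx_model *rx)
{
	rx->advertised_seq = rx->ack_seq;
	rx->acks_sent++;
}

/*
 * Called when the worker moves 'len' now in-sequence bytes from the
 * out-of-order queue into the receive queue.
 */
static void worker_move_ooo(struct rx_model *rx, uint64_t len)
{
	assert(len > 0);
	rx->ack_seq += len;

	/*
	 * No incoming packet is being processed here, so nothing else would
	 * carry the new ack_seq to the peer: send an ack explicitly.
	 */
	if (rx->ack_seq != rx->advertised_seq)
		send_explicit_ack(rx);
}
```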
    • mptcp: keep track of advertised windows right edge · 6f8a612a
      Florian Westphal authored
      Before sending 'x' new bytes also check that the new snd_una would
      be within the permitted receive window.
      
      For every ACK that also contains a DSS ack, check whether its tcp-level
      receive window would advance the current mptcp window right edge and
      update it if so.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Co-developed-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
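The right-edge bookkeeping can be sketched in user-space C as below. This is an illustration under stated assumptions, not the kernel code: `struct snd_model`, `seq_before()`, `sendable_bytes()` and `update_wnd_end()` are hypothetical, the TCP-level window is assumed to be already scaled, and the send check is written against the next sequence number to send. Comparisons use wrap-safe 64-bit arithmetic in the same style as TCP's `before()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe 64-bit sequence comparison: true if a comes before b. */
static bool seq_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/* Hypothetical, simplified MPTCP-level send state. */
struct snd_model {
	uint64_t snd_nxt; /* next data sequence number to send */
	uint64_t snd_una; /* oldest unacknowledged data sequence number */
	uint64_t wnd_end; /* right edge of the peer's advertised receive window */
};

/* How many of 'len' new bytes fit before the window's right edge. */
static uint64_t sendable_bytes(const struct snd_model *s, uint64_t len)
{
	uint64_t room;

	if (!seq_before(s->snd_nxt, s->wnd_end))
		return 0; /* window is full (or ends at or before snd_nxt) */
	room = s->wnd_end - s->snd_nxt;
	return len < room ? len : room;
}

/*
 * On an ACK that also carries a DSS data ack: 'data_ack' is the acked data
 * sequence number, 'rcv_wnd' the TCP-level receive window from the same
 * segment (assumed already scaled). The right edge only ever moves forward.
 */
static void update_wnd_end(struct snd_model *s, uint64_t data_ack, uint32_t rcv_wnd)
{
	uint64_t new_end = data_ack + rcv_wnd;

	assert(!seq_before(s->snd_nxt, data_ack)); /* peer cannot ack unsent data */
	if (seq_before(s->snd_una, data_ack))
		s->snd_una = data_ack;
	if (seq_before(s->wnd_end, new_end))
		s->wnd_end = new_end;
}
```

In this sketch, only moving `wnd_end` forward means an ACK advertising a smaller window does not pull back a limit the sender may already be relying on.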
    • mptcp: rework poll+nospace handling · 8edf0864
      Florian Westphal authored
      MPTCP maintains a status bit, MPTCP_SEND_SPACE, that is set when at
      least one subflow and the mptcp socket itself are writeable.
      
      mptcp_poll returns EPOLLOUT if the bit is set.
      
      mptcp_sendmsg makes sure MPTCP_SEND_SPACE gets cleared when the last
      write has used up the wmem of all subflows or of the mptcp socket.
      
      This reworks nospace handling as follows:
      
      MPTCP_SEND_SPACE is replaced with MPTCP_NOSPACE, i.e. inverted meaning.
      This bit is set when the mptcp socket is not writeable.
      The mptcp-level ack path will then schedule the mptcp worker
      to allow it to free already-acked data (and reduce wmem usage).
      
      This will then wake userspace processes that wait for a POLLOUT event.
      
      sendmsg will set MPTCP_NOSPACE only when it has to wait for more
      wmem (blocking I/O case).
      
      The poll path will set MPTCP_NOSPACE in case the mptcp socket is
      not writeable.
      
      Normal tcp-level notification (SOCK_NOSPACE) is only enabled
      in case the subflow socket has no available wmem.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
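A single-threaded sketch of the new poll/ack/worker interplay follows. `MODEL_NOSPACE`, `MODEL_POLLOUT`, `struct msk_model` and the `model_*()` functions are hypothetical stand-ins for `MPTCP_NOSPACE`, `EPOLLOUT` and the real kernel paths, and the sendmsg side (setting the bit before blocking) is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for MPTCP_NOSPACE and EPOLLOUT. */
#define MODEL_NOSPACE (1u << 0)
#define MODEL_POLLOUT (1u << 2)

struct msk_model {
	unsigned int flags;
	size_t wmem_queued;    /* bytes sent or queued but not yet acked */
	size_t sndbuf;         /* write memory limit */
	bool worker_scheduled;
};

static bool msk_writeable(const struct msk_model *m)
{
	return m->wmem_queued < m->sndbuf;
}

/* poll path: report POLLOUT, or record that a waiter needs a wakeup. */
static unsigned int model_poll(struct msk_model *m)
{
	if (msk_writeable(m))
		return MODEL_POLLOUT;
	m->flags |= MODEL_NOSPACE;
	return 0;
}

/* mptcp-level ack path: only schedule the worker if someone waits for space. */
static void model_data_ack(struct msk_model *m)
{
	if (m->flags & MODEL_NOSPACE)
		m->worker_scheduled = true;
}

/*
 * Worker: free 'acked' bytes of already-acked data. Returns true when the
 * socket became writeable again and waiters would be woken.
 */
static bool model_worker(struct msk_model *m, size_t acked)
{
	assert(acked <= m->wmem_queued);
	m->worker_scheduled = false;
	m->wmem_queued -= acked;

	if ((m->flags & MODEL_NOSPACE) && msk_writeable(m)) {
		m->flags &= ~MODEL_NOSPACE;
		return true;
	}
	return false;
}
```

Because the bit is set only when someone actually waits, the ack path can skip scheduling the worker otherwise. A concurrent implementation would also need to re-check writability after setting the bit, so that a release racing with poll does not cause a missed wakeup; this model omits that.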
    • mptcp: try to push pending data on snd una updates · 813e0a68
      Paolo Abeni authored
      After the previous patch, we may end up with unsent data
      in the write buffer. If that buffer is full, the writer
      will block indefinitely.
      
      We need to trigger the MPTCP xmit path from the subflow
      rx path as well, on MPTCP snd_una updates.
      
      Keep things simple and just schedule the work queue if
      needed.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
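The scheduling decision can be sketched as follows, assuming a simplified msk transmit state; `struct tx_model`, `has_pending_data()` and `on_snd_una_update()` are hypothetical, and setting `work_scheduled` stands in for queueing the MPTCP worker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe 64-bit sequence comparison: true if a comes before b. */
static bool seq_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/* Hypothetical, simplified view of the msk-level transmit state. */
struct tx_model {
	uint64_t snd_una;   /* oldest unacknowledged data sequence number */
	uint64_t snd_nxt;   /* next sequence number to hand to a subflow */
	uint64_t write_seq; /* end of the data queued by the writer */
	bool work_scheduled;
};

/* Data queued by the writer but not yet handed to any subflow? */
static bool has_pending_data(const struct tx_model *m)
{
	return seq_before(m->snd_nxt, m->write_seq);
}

/*
 * Subflow rx path: an MPTCP-level ack moved snd_una forward. Rather than
 * transmitting from this context, just schedule the worker if there is
 * pending data to push.
 */
static void on_snd_una_update(struct tx_model *m, uint64_t data_ack)
{
	assert(!seq_before(m->snd_nxt, data_ack)); /* cannot ack unsent data */
	if (!seq_before(m->snd_una, data_ack))
		return; /* old or duplicate ack */
	m->snd_una = data_ack;
	if (has_pending_data(m))
		m->work_scheduled = true;
}
```

Deferring the actual transmission to the worker keeps the subflow rx path short, in line with the commit's choice to keep things simple.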
    • mptcp: move page frag allocation in mptcp_sendmsg() · d9ca1de8
      Paolo Abeni authored
      mptcp_sendmsg() is refactored so that first it copies
      the data provided from user space into the send queue,
      and then tries to spool the send queue via sendmsg_frag.
      
      There is a subtle change in the mptcp-level collapsing of
      consecutive data fragments: we now allow it only on unsent
      data.
      
      sendmsg_frag no longer needs to deal with msghdr data
      and can be simplified considerably.
      
      snd_nxt and write_seq are now tracked independently.
      
      Overall, this allows some significant cleanup and will
      allow sending pending mptcp data on msk snd_una updates in
      a later patch.
      Co-developed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
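The two-phase flow, with `write_seq` and `snd_nxt` tracked independently, can be sketched like this. It is a simplification under stated assumptions: `struct sendq_model`, `push_pending()` and `model_sendmsg()` are hypothetical, data is counted in bytes only, and sequence wraparound is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified msk send-queue state. */
struct sendq_model {
	uint64_t write_seq; /* end of data copied in from user space */
	uint64_t snd_nxt;   /* end of data already handed to subflows */
	uint64_t wnd_end;   /* how far the peer's window allows sending */
};

/*
 * Spool as much queued-but-unsent data as the window allows.
 * Returns the number of bytes pushed (no wraparound handling here).
 */
static uint64_t push_pending(struct sendq_model *q)
{
	uint64_t limit = q->write_seq < q->wnd_end ? q->write_seq : q->wnd_end;
	uint64_t pushed;

	if (limit <= q->snd_nxt)
		return 0;
	pushed = limit - q->snd_nxt;
	q->snd_nxt = limit;
	assert(q->snd_nxt <= q->write_seq); /* never send data not yet queued */
	return pushed;
}

/* Phase 1: copy user data into the send queue; phase 2: try to spool it. */
static uint64_t model_sendmsg(struct sendq_model *q, size_t len)
{
	q->write_seq += len;
	return push_pending(q);
}
```

Keeping `push_pending()` separate from the copy step is what lets another context, such as an ack-driven worker, spool the queue later without any msghdr in hand.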