1. 30 Apr, 2018 7 commits
    • Merge branch 'tcp-mmap-rework-zerocopy-receive' · 5d659b1d
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      tcp: mmap: rework zerocopy receive
      
      syzbot reported a lockdep issue caused by tcp mmap() support.
      
      I implemented Andy Lutomirski's nice suggestions to resolve the
      issue and increase scalability as well.
      
      The first patch adds a new getsockopt() operation and changes mmap()
      behavior.
      
      The second patch updates the tcp_mmap reference program.
      
      v4: tcp mmap() support depends on CONFIG_MMU, as the kbuild bot told us.
      
      v3: change TCP_ZEROCOPY_RECEIVE to be a getsockopt() option
          instead of a setsockopt() one (feedback from Ka-Cheon Poon)
      
      v2: Added a missing page alignment of zc->length in tcp_zerocopy_receive().
          Properly clear zc->recv_skip_hint when the user request was completed.
      ====================
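A minimal sketch of the page alignment mentioned in the v2 note, assuming it means rounding the requested length down to a whole number of pages, since only complete pages can be mapped. The helper name page_align_down is hypothetical and is not the kernel's code; page_size is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not the kernel's code): round a requested mapping
 * length down to a whole number of pages. page_size must be a power of two. */
uint32_t page_align_down(uint32_t length, uint32_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	/* Clearing the low bits drops any partial trailing page. */
	return length & ~(page_size - 1);
}
```

With 4096-byte pages, a request for 10000 bytes would be trimmed to 8192, and a request smaller than one page to 0.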
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net: tcp_mmap must use TCP_ZEROCOPY_RECEIVE · aacb0c2e
      Eric Dumazet authored
      After the prior kernel change, mmap() on a TCP socket only reserves a VMA.
      
      We have to use getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...)
      to transfer pages from skbs in the TCP receive queue into such a VMA.
      
      struct tcp_zerocopy_receive {
      	__u64 address;		/* in: address of mapping */
      	__u32 length;		/* in/out: number of bytes to map/mapped */
      	__u32 recv_skip_hint;	/* out: amount of bytes to skip */
      };
      
      After a successful getsockopt(...TCP_ZEROCOPY_RECEIVE...), @length contains
      the number of bytes that were mapped, and @recv_skip_hint contains the number
      of bytes that should be read using conventional read()/recv()/recvmsg() system
      calls to skip a sequence of bytes that cannot be mapped because it is not
      properly page aligned.
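The semantics of @length and @recv_skip_hint suggest a receive loop along these lines. This is a minimal sketch, not the tcp_mmap selftest itself: the struct zc_args mirror, the injected zc_map_fn/zc_read_fn hooks and the toy in-memory queue are all hypothetical, so the loop can run without a kernel. In real code the hooks would wrap getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) and read(). Falling back to a plain read() when nothing at all is reported, to wait for data and detect EOF, is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the structure quoted above. */
struct zc_args {
	uint64_t address;        /* in: address of mapping */
	uint32_t length;         /* in/out: bytes to map / bytes mapped */
	uint32_t recv_skip_hint; /* out: bytes to read conventionally */
};

/* Hypothetical hooks: real code would wrap getsockopt(...TCP_ZEROCOPY_RECEIVE...)
 * and read(); here they are injected so the loop can be exercised offline. */
typedef int (*zc_map_fn)(void *ctx, struct zc_args *zc);
typedef long (*zc_read_fn)(void *ctx, void *buf, size_t len);

/* Consume a stream until EOF, counting mapped and copied bytes.
 * Returns total bytes consumed, or -1 on error. */
long long zc_drain(void *ctx, zc_map_fn map, zc_read_fn rd, uint64_t addr,
		   uint32_t chunk, char *buf, size_t buf_len, uint64_t *mapped)
{
	long long total = 0;

	*mapped = 0;
	for (;;) {
		struct zc_args zc = { .address = addr, .length = chunk };
		size_t want;
		long n;

		if (map(ctx, &zc) < 0)
			return -1;
		assert(zc.length <= chunk);
		/* zc.length bytes are now readable at addr. */
		*mapped += zc.length;
		total += zc.length;
		if (zc.length && !zc.recv_skip_hint)
			continue;

		/* Skip unmappable bytes with a plain read; if nothing was
		 * reported at all, a plain read also detects EOF (assumption). */
		want = zc.recv_skip_hint ? zc.recv_skip_hint : buf_len;
		if (want > buf_len)
			want = buf_len;
		n = rd(ctx, buf, want);
		if (n < 0)
			return -1;
		if (n == 0)
			return total; /* EOF */
		total += n;
	}
}

/* Toy receive queue: segments that are or are not page-mappable. */
struct toy_seg { uint32_t len; int mappable; };
struct toy_queue { struct toy_seg *segs; size_t nsegs, cur; };

static void toy_consume(struct toy_queue *q, uint32_t n)
{
	q->segs[q->cur].len -= n;
	if (q->segs[q->cur].len == 0)
		q->cur++;
}

int toy_map(void *ctx, struct zc_args *zc)
{
	struct toy_queue *q = ctx;

	zc->recv_skip_hint = 0;
	if (q->cur == q->nsegs) {
		zc->length = 0;
	} else if (!q->segs[q->cur].mappable) {
		zc->length = 0;
		zc->recv_skip_hint = q->segs[q->cur].len;
	} else {
		if (zc->length > q->segs[q->cur].len)
			zc->length = q->segs[q->cur].len;
		toy_consume(q, zc->length);
	}
	return 0;
}

long toy_read(void *ctx, void *buf, size_t len)
{
	struct toy_queue *q = ctx;

	(void)buf; /* the toy does not produce real bytes */
	if (q->cur == q->nsegs)
		return 0;
	if (len > q->segs[q->cur].len)
		len = q->segs[q->cur].len;
	toy_consume(q, (uint32_t)len);
	return (long)len;
}
```

With a toy queue of 8192 mappable bytes, 100 unaligned bytes and 4096 more mappable bytes, zc_drain() reports 12388 bytes consumed, 12288 of them mapped and the 100 unaligned bytes copied.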
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive · 05255b82
      Eric Dumazet authored
      When adding the tcp mmap() implementation, I forgot that the socket lock
      had to be taken before current->mm->mmap_sem. syzbot eventually caught
      the bug.
      
      Since we cannot lock the socket in the tcp mmap() handler, we have to
      split the operation into two phases.
      
      1) mmap() on a tcp socket simply reserves VMA space, and nothing else.
        This operation does not involve any TCP locking.
      
      2) getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) implements
        the transfer of pages from skbs to one VMA.
        This operation only uses down_read(&current->mm->mmap_sem) after
        holding the TCP socket lock, thus solving the lockdep issue.
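The two phases above can be sketched from user space as two small wrappers. This is a minimal sketch under stated assumptions: struct zc_receive_args is a local copy of the three-field layout quoted in the selftest commit, the fallback value 35 for TCP_ZEROCOPY_RECEIVE is taken from include/uapi/linux/tcp.h, and the names zc_reserve and zc_receive are hypothetical; real programs would normally take both definitions from <linux/tcp.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <netinet/in.h>
#include <sys/mman.h>
#include <sys/socket.h>

#ifndef TCP_ZEROCOPY_RECEIVE
#define TCP_ZEROCOPY_RECEIVE 35 /* assumed: value from include/uapi/linux/tcp.h */
#endif

/* Local copy of the 2018 struct tcp_zerocopy_receive layout (assumption). */
struct zc_receive_args {
	uint64_t address;        /* in: address of mapping */
	uint32_t length;         /* in/out: bytes to map / bytes mapped */
	uint32_t recv_skip_hint; /* out: bytes to read with read()/recv() */
};

/* Phase 1: reserve VMA space backed by the TCP socket.
 * Returns NULL on failure. No TCP locking happens here. */
void *zc_reserve(int fd, size_t len)
{
	void *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

	return addr == MAP_FAILED ? NULL : addr;
}

/* Phase 2: ask the kernel to map queued pages into the reserved VMA.
 * The same VMA may be reused for many calls without mmap()/munmap().
 * Returns 0 on success, -1 with errno set on failure. */
int zc_receive(int fd, void *addr, uint32_t len, struct zc_receive_args *zc)
{
	socklen_t zc_len = sizeof(*zc);

	assert(zc != NULL);
	zc->address = (uint64_t)(uintptr_t)addr;
	zc->length = len;
	zc->recv_skip_hint = 0;
	return getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, zc, &zc_len);
}
```

After a successful zc_receive(), zc->length bytes can be read at addr, zc->recv_skip_hint bytes are then consumed with read(), and munmap() releases the VMA once the receiver is done with it. Keeping the VMA across calls is what avoids write-locking mmap_sem on every receive.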
      
      This new implementation was suggested in great detail by Andy Lutomirski.
      
      Benefits are:
      
      - Better scalability in case multiple threads reuse VMAs
         (without mmap()/munmap() calls), since mmap_sem won't be write-locked.
      
      - Better error recovery.
         The previous mmap() model had to provide the expected size of the
         mapping. If for some reason one part could not be mapped (partial MSS),
         the whole operation had to be aborted.
         With the tcp_zerocopy_receive struct, kernel can report how
         many bytes were successfully mapped, and how many bytes should
         be read to skip the problematic sequence.
      
      - No more memory allocation to hold an array of page pointers.
        16 MB mappings needed 32 KB for this array, potentially using vmalloc() :/
      
      - skbs are freed after mmap_sem has been released.
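The 32 KB figure for a 16 MB mapping is consistent with 4 KB pages and 8-byte page pointers (a 64-bit kernel); both are assumptions here, since the commit does not state them:

```latex
\frac{16\,\mathrm{MB}}{4\,\mathrm{KB/page}} = 4096\ \text{pages},
\qquad 4096 \times 8\,\mathrm{B} = 32\,768\,\mathrm{B} = 32\,\mathrm{KB}
```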
      
      The following patch changes the tcp_mmap tool to demonstrate
      one possible use of mmap() and getsockopt(... TCP_ZEROCOPY_RECEIVE ...).
      
      Note that memcg might require additional changes.
      
      Fixes: 93ab6cc6 ("tcp: implement mmap() for zero copy receive")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Suggested-by: Andy Lutomirski <luto@kernel.org>
      Cc: linux-mm@kvack.org
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'dsa-mv88e6xxx-remove-Global-2-setup' · 589f84fb
      David S. Miller authored
      Vivien Didelot says:
      
      ====================
      net: dsa: mv88e6xxx: remove Global 2 setup
      
      Parts of the mv88e6xxx driver still write arbitrary registers of
      different banks at setup time, which is misleading especially when
      supporting multiple device models.
      
      This patchset moves the setup of two features into the top-level
      mv88e6xxx_setup function and kills the old Global 2 register bank setup
      function. It brings no functional changes.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: remove Global 2 setup · 5d49d603
      Vivien Didelot authored
      The remaining values written to the Switch Management Register in the
      mv88e6xxx_g2_setup function are specific to 88E6352 and older, and are
      the default values anyway.
      
      Thus, remove this function completely. The mv88e6xxx driver no longer
      contains setup code to access arbitrary Global 2 registers.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: move device mapping setup · c7f047b6
      Vivien Didelot authored
      Move the Device Mapping setup out of the specific Global 2 code,
      into the top level device setup function.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: move trunk setup · b28f872d
      Vivien Didelot authored
      Move the trunking setup out of the Global 2 specific setup into the
      top-level mv88e6xxx_setup function.
      
      Note that the 88E6390 family calls this LAG instead of Trunk and
      supports 32 possible ID routing vectors, with LAG ID bit 4 being placed
      in Global 2 register 0x1D...
      
      We don't need Trunk (or LAG) IDs for the moment, thus keep it simple.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 28 Apr, 2018 4 commits
  3. 27 Apr, 2018 29 commits