1. 05 May, 2018 1 commit
  2. 04 May, 2018 4 commits
    • Peter Zijlstra's avatar
      locking/mutex: Optimize __mutex_trylock_fast() · c427f695
      Peter Zijlstra authored
      Use try_cmpxchg to avoid the pointless TEST instruction..
      And add the (missing) atomic_long_try_cmpxchg*() wrappery.
      
      On x86_64 this gives:
      
      0000000000000710 <mutex_lock>:						0000000000000710 <mutex_lock>:
       710:   65 48 8b 14 25 00 00    mov    %gs:0x0,%rdx                      710:   65 48 8b 14 25 00 00    mov    %gs:0x0,%rdx
       717:   00 00                                                            717:   00 00
                              715: R_X86_64_32S       current_task                                    715: R_X86_64_32S       current_task
       719:   31 c0                   xor    %eax,%eax                         719:   31 c0                   xor    %eax,%eax
       71b:   f0 48 0f b1 17          lock cmpxchg %rdx,(%rdi)                 71b:   f0 48 0f b1 17          lock cmpxchg %rdx,(%rdi)
       720:   48 85 c0                test   %rax,%rax                         720:   75 02                   jne    724 <mutex_lock+0x14>
       723:   75 02                   jne    727 <mutex_lock+0x17>             722:   f3 c3                   repz retq
       725:   f3 c3                   repz retq                                724:   eb da                   jmp    700 <__mutex_lock_slowpath>
       727:   eb d7                   jmp    700 <__mutex_lock_slowpath>       726:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
       729:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)                         72d:   00 00 00
      
      On ARM64 this gives:
      
      000000000000638 <mutex_lock>:						0000000000000638 <mutex_lock>:
           638:       d5384101        mrs     x1, sp_el0                           638:       d5384101        mrs     x1, sp_el0
           63c:       d2800002        mov     x2, #0x0                             63c:       d2800002        mov     x2, #0x0
           640:       f9800011        prfm    pstl1strm, [x0]                      640:       f9800011        prfm    pstl1strm, [x0]
           644:       c85ffc03        ldaxr   x3, [x0]                             644:       c85ffc03        ldaxr   x3, [x0]
           648:       ca020064        eor     x4, x3, x2                           648:       ca020064        eor     x4, x3, x2
           64c:       b5000064        cbnz    x4, 658 <mutex_lock+0x20>            64c:       b5000064        cbnz    x4, 658 <mutex_lock+0x20>
           650:       c8047c01        stxr    w4, x1, [x0]                         650:       c8047c01        stxr    w4, x1, [x0]
           654:       35ffff84        cbnz    w4, 644 <mutex_lock+0xc>             654:       35ffff84        cbnz    w4, 644 <mutex_lock+0xc>
           658:       b40000c3        cbz     x3, 670 <mutex_lock+0x38>            658:       b5000043        cbnz    x3, 660 <mutex_lock+0x28>
           65c:       a9bf7bfd        stp     x29, x30, [sp,#-16]!                 65c:       d65f03c0        ret
           660:       910003fd        mov     x29, sp                              660:       a9bf7bfd        stp     x29, x30, [sp,#-16]!
           664:       97ffffef        bl      620 <__mutex_lock_slowpath>          664:       910003fd        mov     x29, sp
           668:       a8c17bfd        ldp     x29, x30, [sp],#16                   668:       97ffffee        bl      620 <__mutex_lock_slowpath>
           66c:       d65f03c0        ret                                          66c:       a8c17bfd        ldp     x29, x30, [sp],#16
           670:       d65f03c0        ret                                          670:       d65f03c0        ret
      Reported-by: default avatarMatthew Wilcox <mawilcox@microsoft.com>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      c427f695
    • Linus Torvalds's avatar
      Merge tag 'linux-kselftest-4.17-rc4' of... · 15042698
      Linus Torvalds authored
      Merge tag 'linux-kselftest-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull kselftest fixes from Shuah Khan:
       "This Kselftest update for 4.17-rc4 consists of a fix for a syntax
        error in the script that runs selftests. Mathieu Desnoyers found this
        bug in the script on systems running GNU Make 3.8 or older"
      
      * tag 'linux-kselftest-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        selftests: Fix lib.mk run_tests target shell script
      15042698
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · e523a256
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Various sockmap fixes from John Fastabend (pinned map handling,
          blocking in recvmsg, double page put, error handling during redirect
          failures, etc.)
      
       2) Fix dead code handling in x86-64 JIT, from Gianluca Borello.
      
       3) Missing device put in RDS IB code, from Dag Moxnes.
      
       4) Don't process fast open during repair mode in TCP< from Yuchung
          Cheng.
      
       5) Move address/port comparison fixes in SCTP, from Xin Long.
      
       6) Handle add a bond slave's master into a bridge properly, from
          Hangbin Liu.
      
       7) IPv6 multipath code can operate on unitialized memory due to an
          assumption that the icmp header is in the linear SKB area. Fix from
          Eric Dumazet.
      
       8) Don't invoke do_tcp_sendpages() recursively via TLS, from Dave
          Watson.
      
      9) Fix memory leaks in x86-64 JIT, from Daniel Borkmann.
      
      10) RDS leaks kernel memory to userspace, from Eric Dumazet.
      
      11) DCCP can invoke a tasklet on a freed socket, take a refcount. Also
          from Eric Dumazet.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (78 commits)
        dccp: fix tasklet usage
        smc: fix sendpage() call
        net/smc: handle unregistered buffers
        net/smc: call consolidation
        qed: fix spelling mistake: "offloded" -> "offloaded"
        net/mlx5e: fix spelling mistake: "loobpack" -> "loopback"
        tcp: restore autocorking
        rds: do not leak kernel memory to user land
        qmi_wwan: do not steal interfaces from class drivers
        ipv4: fix fnhe usage by non-cached routes
        bpf: sockmap, fix error handling in redirect failures
        bpf: sockmap, zero sg_size on error when buffer is released
        bpf: sockmap, fix scatterlist update on error path in send with apply
        net_sched: fq: take care of throttled flows before reuse
        ipv6: Revert "ipv6: Allow non-gateway ECMP for IPv6"
        bpf, x64: fix memleak when not converging on calls
        bpf, x64: fix memleak when not converging after image
        net/smc: restrict non-blocking connect finish
        8139too: Use disable_irq_nosync() in rtl8139_poll_controller()
        sctp: fix the issue that the cookie-ack with auth can't get processed
        ...
      e523a256
    • Linus Torvalds's avatar
      Merge branch 'parisc-4.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · bb609316
      Linus Torvalds authored
      Pull parisc fixes from Helge Deller:
       "Fix two section mismatches, convert to read_persistent_clock64(), add
        further documentation regarding the HPMC crash handler and make
        bzImage the default build target"
      
      * 'parisc-4.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Fix section mismatches
        parisc: drivers.c: Fix section mismatches
        parisc: time: Convert read_persistent_clock() to read_persistent_clock64()
        parisc: Document rules regarding checksum of HPMC handler
        parisc: Make bzImage default build target
      bb609316
  3. 03 May, 2018 16 commits
    • Eric Dumazet's avatar
      dccp: fix tasklet usage · a8d7aa17
      Eric Dumazet authored
      syzbot reported a crash in tasklet_action_common() caused by dccp.
      
      dccp needs to make sure socket wont disappear before tasklet handler
      has completed.
      
      This patch takes a reference on the socket when arming the tasklet,
      and moves the sock_put() from dccp_write_xmit_timer() to dccp_write_xmitlet()
      
      kernel BUG at kernel/softirq.c:514!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 17 Comm: ksoftirqd/1 Not tainted 4.17.0-rc3+ #30
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:tasklet_action_common.isra.19+0x6db/0x700 kernel/softirq.c:515
      RSP: 0018:ffff8801d9b3faf8 EFLAGS: 00010246
      dccp_close: ABORT with 65423 bytes unread
      RAX: 1ffff1003b367f6b RBX: ffff8801daf1f3f0 RCX: 0000000000000000
      RDX: ffff8801cf895498 RSI: 0000000000000004 RDI: 0000000000000000
      RBP: ffff8801d9b3fc40 R08: ffffed0039f12a95 R09: ffffed0039f12a94
      dccp_close: ABORT with 65423 bytes unread
      R10: ffffed0039f12a94 R11: ffff8801cf8954a3 R12: 0000000000000000
      R13: ffff8801d9b3fc18 R14: dffffc0000000000 R15: ffff8801cf895490
      FS:  0000000000000000(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b2bc28000 CR3: 00000001a08a9000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       tasklet_action+0x1d/0x20 kernel/softirq.c:533
       __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
      dccp_close: ABORT with 65423 bytes unread
       run_ksoftirqd+0x86/0x100 kernel/softirq.c:646
       smpboot_thread_fn+0x417/0x870 kernel/smpboot.c:164
       kthread+0x345/0x410 kernel/kthread.c:238
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
      Code: 48 8b 85 e8 fe ff ff 48 8b 95 f0 fe ff ff e9 94 fb ff ff 48 89 95 f0 fe ff ff e8 81 53 6e 00 48 8b 95 f0 fe ff ff e9 62 fb ff ff <0f> 0b 48 89 cf 48 89 8d e8 fe ff ff e8 64 53 6e 00 48 8b 8d e8
      RIP: tasklet_action_common.isra.19+0x6db/0x700 kernel/softirq.c:515 RSP: ffff8801d9b3faf8
      
      Fixes: dc841e30 ("dccp: Extend CCID packet dequeueing interface")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Cc: dccp@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a8d7aa17
    • David S. Miller's avatar
      Merge branch 'smc-fixes' · 31140b47
      David S. Miller authored
      Ursula Braun says:
      
      ====================
      net/smc: fixes 2018/05/03
      
      here are smc fixes for 2 problems:
       * receive buffers in SMC must be registered. If registration fails
         these buffers must not be kept within the link group for reuse.
         Patch 1 is a preparational patch; patch 2 contains the fix.
       * sendpage: do not hold the sock lock when calling kernel_sendpage()
                   or sock_no_sendpage()
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      31140b47
    • Stefan Raspl's avatar
      smc: fix sendpage() call · bda27ff5
      Stefan Raspl authored
      The sendpage() call grabs the sock lock before calling the default
      implementation - which tries to grab it once again.
      Signed-off-by: default avatarStefan Raspl <raspl@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com><
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bda27ff5
    • Karsten Graul's avatar
      net/smc: handle unregistered buffers · a6920d1d
      Karsten Graul authored
      When smc_wr_reg_send() fails then tag (regerr) the affected buffer and
      free it in smc_buf_unuse().
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a6920d1d
    • Karsten Graul's avatar
      net/smc: call consolidation · e63a5f8c
      Karsten Graul authored
      Consolidate the call to smc_wr_reg_send() in a new function.
      No functional changes.
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e63a5f8c
    • Colin Ian King's avatar
      qed: fix spelling mistake: "offloded" -> "offloaded" · df80b8fb
      Colin Ian King authored
      Trivial fix to spelling mistake in DP_NOTICE message
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      df80b8fb
    • Colin Ian King's avatar
      net/mlx5e: fix spelling mistake: "loobpack" -> "loopback" · 4e11581c
      Colin Ian King authored
      Trivial fix to spelling mistake in netdev_err error message
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4e11581c
    • Linus Torvalds's avatar
      Merge tag 'dma-mapping-4.17-4' of git://git.infradead.org/users/hch/dma-mapping · c15f6d8d
      Linus Torvalds authored
      Pull dma-mapping fix from Christoph Hellwig:
       "Fix an incorrect warning selection introduced in the last merge
        window"
      
      * tag 'dma-mapping-4.17-4' of git://git.infradead.org/users/hch/dma-mapping:
        swiotlb: fix inversed DMA_ATTR_NO_WARN test
      c15f6d8d
    • Eric Dumazet's avatar
      tcp: restore autocorking · 114f39fe
      Eric Dumazet authored
      When adding rb-tree for TCP retransmit queue, we inadvertently broke
      TCP autocorking.
      
      tcp_should_autocork() should really check if the rtx queue is not empty.
      
      Tested:
      
      Before the fix :
      $ nstat -n;./netperf -H 10.246.7.152 -Cc -- -m 500;nstat | grep AutoCork
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      540000 262144    500    10.00      2682.85   2.47     1.59     3.618   2.329
      TcpExtTCPAutoCorking            33                 0.0
      
      // Same test, but forcing TCP_NODELAY
      $ nstat -n;./netperf -H 10.246.7.152 -Cc -- -D -m 500;nstat | grep AutoCork
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET : nodelay
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      540000 262144    500    10.00      1408.75   2.44     2.96     6.802   8.259
      TcpExtTCPAutoCorking            1                  0.0
      
      After the fix :
      $ nstat -n;./netperf -H 10.246.7.152 -Cc -- -m 500;nstat | grep AutoCork
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      540000 262144    500    10.00      5472.46   2.45     1.43     1.761   1.027
      TcpExtTCPAutoCorking            361293             0.0
      
      // With TCP_NODELAY option
      $ nstat -n;./netperf -H 10.246.7.152 -Cc -- -D -m 500;nstat | grep AutoCork
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET : nodelay
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      540000 262144    500    10.00      5454.96   2.46     1.63     1.775   1.174
      TcpExtTCPAutoCorking            315448             0.0
      
      Fixes: 75c119af ("tcp: implement rb-tree based retransmit queue")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarMichael Wenig <mwenig@vmware.com>
      Tested-by: default avatarMichael Wenig <mwenig@vmware.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarMichael Wenig <mwenig@vmware.com>
      Tested-by: default avatarMichael Wenig <mwenig@vmware.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      114f39fe
    • Eric Dumazet's avatar
      rds: do not leak kernel memory to user land · eb80ca47
      Eric Dumazet authored
      syzbot/KMSAN reported an uninit-value in put_cmsg(), originating
      from rds_cmsg_recv().
      
      Simply clear the structure, since we have holes there, or since
      rx_traces might be smaller than RDS_MSG_RX_DGRAM_TRACE_MAX.
      
      BUG: KMSAN: uninit-value in copy_to_user include/linux/uaccess.h:184 [inline]
      BUG: KMSAN: uninit-value in put_cmsg+0x600/0x870 net/core/scm.c:242
      CPU: 0 PID: 4459 Comm: syz-executor582 Not tainted 4.16.0+ #87
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:53
       kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
       kmsan_internal_check_memory+0x135/0x1e0 mm/kmsan/kmsan.c:1157
       kmsan_copy_to_user+0x69/0x160 mm/kmsan/kmsan.c:1199
       copy_to_user include/linux/uaccess.h:184 [inline]
       put_cmsg+0x600/0x870 net/core/scm.c:242
       rds_cmsg_recv net/rds/recv.c:570 [inline]
       rds_recvmsg+0x2db5/0x3170 net/rds/recv.c:657
       sock_recvmsg_nosec net/socket.c:803 [inline]
       sock_recvmsg+0x1d0/0x230 net/socket.c:810
       ___sys_recvmsg+0x3fb/0x810 net/socket.c:2205
       __sys_recvmsg net/socket.c:2250 [inline]
       SYSC_recvmsg+0x298/0x3c0 net/socket.c:2262
       SyS_recvmsg+0x54/0x80 net/socket.c:2257
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Fixes: 3289025a ("RDS: add receive message trace used by application")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Cc: linux-rdma <linux-rdma@vger.kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      eb80ca47
    • Bjørn Mork's avatar
      qmi_wwan: do not steal interfaces from class drivers · 5697db4a
      Bjørn Mork authored
      The USB_DEVICE_INTERFACE_NUMBER matching macro assumes that
      the { vendorid, productid, interfacenumber } set uniquely
      identifies one specific function.  This has proven to fail
      for some configurable devices. One example is the Quectel
      EM06/EP06 where the same interface number can be either
      QMI or MBIM, without the device ID changing either.
      
      Fix by requiring the vendor-specific class for interface number
      based matching.  Functions of other classes can and should use
      class based matching instead.
      
      Fixes: 03304bcb ("net: qmi_wwan: use fixed interface number matching")
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5697db4a
    • Linus Torvalds's avatar
      Merge tag 'trace-v4.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · f4ef6a43
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "Various fixes in tracing:
      
         - Tracepoints should not give warning on OOM failures
      
         - Use special field for function pointer in trace event
      
         - Fix igrab issues in uprobes
      
         - Fixes to the new histogram triggers"
      
      * tag 'trace-v4.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracepoint: Do not warn on ENOMEM
        tracing: Add field modifier parsing hist error for hist triggers
        tracing: Add field parsing hist error for hist triggers
        tracing: Restore proper field flag printing when displaying triggers
        tracing: initcall: Ordered comparison of function pointers
        tracing: Remove igrab() iput() call from uprobes.c
        tracing: Fix bad use of igrab in trace_uprobe.c
      f4ef6a43
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · ecd649b3
      Linus Torvalds authored
      Pull input updates from Dmitry Torokhov:
       "Just a few driver fixes"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: atmel_mxt_ts - add missing compatible strings to OF device table
        Input: atmel_mxt_ts - fix the firmware update
        Input: atmel_mxt_ts - add touchpad button mapping for Samsung Chromebook Pro
        MAINTAINERS: Rakesh Iyer can't be reached anymore
        Input: hideep_ts - fix a typo in Kconfig
        Input: alps - fix reporting pressure of v3 trackstick
        Input: leds - fix out of bound access
        Input: synaptics-rmi4 - fix an unchecked out of memory error path
      ecd649b3
    • Julian Anastasov's avatar
      ipv4: fix fnhe usage by non-cached routes · 94720e3a
      Julian Anastasov authored
      Allow some non-cached routes to use non-expired fnhe:
      
      1. ip_del_fnhe: moved above and now called by find_exception.
      The 4.5+ commit deed49df expires fnhe only when caching
      routes. Change that to:
      
      1.1. use fnhe for non-cached local output routes, with the help
      from (2)
      
      1.2. allow __mkroute_input to detect expired fnhe (outdated
      fnhe_gw, for example) when do_cache is false, eg. when itag!=0
      for unicast destinations.
      
      2. __mkroute_output: keep fi to allow local routes with orig_oif != 0
      to use fnhe info even when the new route will not be cached into fnhe.
      After commit 839da4d9 ("net: ipv4: set orig_oif based on fib
      result for local traffic") it means all local routes will be affected
      because they are not cached. This change is used to solve a PMTU
      problem with IPVS (and probably Netfilter DNAT) setups that redirect
      local clients from target local IP (local route to Virtual IP)
      to new remote IP target, eg. IPVS TUN real server. Loopback has
      64K MTU and we need to create fnhe on the local route that will
      keep the reduced PMTU for the Virtual IP. Without this change
      fnhe_pmtu is updated from ICMP but never exposed to non-cached
      local routes. This includes routes with flowi4_oif!=0 for 4.6+ and
      with flowi4_oif=any for 4.14+).
      
      3. update_or_create_fnhe: make sure fnhe_expires is not 0 for
      new entries
      
      Fixes: 839da4d9 ("net: ipv4: set orig_oif based on fib result for local traffic")
      Fixes: d6d5e999 ("route: do not cache fib route info on local routes with oif")
      Fixes: deed49df ("route: check and remove route cache when we get route")
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Acked-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      94720e3a
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 3b6f9793
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Three small bug fixes: an illegally overlapping memcmp in target code,
        a potential infinite loop in isci under certain rare phy conditions
        and an ATA queue depth (performance) correction for storvsc"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: target: Fix fortify_panic kernel exception
        scsi: isci: Fix infinite loop in while loop
        scsi: storvsc: Set up correct queue depth values for IDE devices
      3b6f9793
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · e002434e
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-05-03
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Several BPF sockmap fixes mostly related to bugs in error path
         handling, that is, a bug in updating the scatterlist length /
         offset accounting, a missing sk_mem_uncharge() in redirect
         error handling, and a bug where the outstanding bytes counter
         sg_size was not zeroed, from John.
      
      2) Fix two memory leaks in the x86-64 BPF JIT, one in an error
         path where we still don't converge after image was allocated
         and another one where BPF calls are used and JIT passes don't
         converge, from Daniel.
      
      3) Minor fix in BPF selftests where in test_stacktrace_build_id()
         we drop useless args in urandom_read and we need to add a missing
         newline in a CHECK() error message, from Song.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e002434e
  4. 02 May, 2018 19 commits
    • Alexei Starovoitov's avatar
      Merge branch 'bpf-sockmap-fixes' · b5b6ff73
      Alexei Starovoitov authored
      John Fastabend says:
      
      ====================
      When I added the test_sockmap to selftests I mistakenly changed the
      test logic a bit. The result of this was on redirect cases we ended up
      choosing the wrong sock from the BPF program and ended up sending to a
      socket that had no receive handler. The result was the actual receive
      handler, running on a different socket, is timing out and closing the
      socket. This results in errors (-EPIPE to be specific) on the sending
      side. Typically happening if the sender does not complete the send
      before the receive side times out. So depending on timing and the size
      of the send we may get errors. This exposed some bugs in the sockmap
      error path handling.
      
      This series fixes the errors. The primary issue is we did not do proper
      memory accounting in these cases which resulted in missing a
      sk_mem_uncharge(). This happened in the redirect path and in one case
      on the normal send path. See the three patches for the details.
      
      The other take-away from this is we need to fix the test_sockmap and
      also add more negative test cases. That will happen in bpf-next.
      
      Finally, I tested this using the existing test_sockmap program, the
      older sockmap sample test script, and a few real use cases with
      Cilium. All of these seem to be in working correctly.
      
      v2: fix compiler warning, drop iterator variable 'i' that is no longer
          used in patch 3.
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      b5b6ff73
    • John Fastabend's avatar
      bpf: sockmap, fix error handling in redirect failures · abaeb096
      John Fastabend authored
      When a redirect failure happens we release the buffers in-flight
      without calling a sk_mem_uncharge(), the uncharge is called before
      dropping the sock lock for the redirecte, however we missed updating
      the ring start index. When no apply actions are in progress this
      is OK because we uncharge the entire buffer before the redirect.
      But, when we have apply logic running its possible that only a
      portion of the buffer is being redirected. In this case we only
      do memory accounting for the buffer slice being redirected and
      expect to be able to loop over the BPF program again and/or if
      a sock is closed uncharge the memory at sock destruct time.
      
      With an invalid start index however the program logic looks at
      the start pointer index, checks the length, and when seeing the
      length is zero (from the initial release and failure to update
      the pointer) aborts without uncharging/releasing the remaining
      memory.
      
      The fix for this is simply to update the start index. To avoid
      fixing this error in two locations we do a small refactor and
      remove one case where it is open-coded. Then fix it in the
      single function.
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      abaeb096
    • John Fastabend's avatar
      bpf: sockmap, zero sg_size on error when buffer is released · fec51d40
      John Fastabend authored
      When an error occurs during a redirect we have two cases that need
      to be handled (i) we have a cork'ed buffer (ii) we have a normal
      sendmsg buffer.
      
      In the cork'ed buffer case we don't currently support recovering from
      errors in a redirect action. So the buffer is released and the error
      should _not_ be pushed back to the caller of sendmsg/sendpage. The
      rationale here is the user will get an error that relates to old
      data that may have been sent by some arbitrary thread on that sock.
      Instead we simple consume the data and tell the user that the data
      has been consumed. We may add proper error recovery in the future.
      However, this patch fixes a bug where the bytes outstanding counter
      sg_size was not zeroed. This could result in a case where if the user
      has both a cork'ed action and apply action in progress we may
      incorrectly call into the BPF program when the user expected an
      old verdict to be applied via the apply action. I don't have a use
      case where using apply and cork at the same time is valid but we
      never explicitly reject it because it should work fine. This patch
      ensures the sg_size is zeroed so we don't have this case.
      
      In the normal sendmsg buffer case (no cork data) we also do not
      zero sg_size. Again this can confuse the apply logic when the logic
      calls into the BPF program when the BPF programmer expected the old
      verdict to remain. So ensure we set sg_size to zero here as well. And
      additionally to keep the psock state in-sync with the sk_msg_buff
      release all the memory as well. Previously we did this before
      returning to the user but this left a gap where psock and sk_msg_buff
      states were out of sync which seems fragile. No additional overhead
      is taken here except for a call to check the length and realize its
      already been freed. This is in the error path as well so in my
      opinion lets have robust code over optimized error paths.
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      fec51d40
    • John Fastabend's avatar
      bpf: sockmap, fix scatterlist update on error path in send with apply · 3cc9a472
      John Fastabend authored
      When the call to do_tcp_sendpage() fails to send the complete block
      requested we either retry if only a partial send was completed or
      abort if we receive a error less than or equal to zero. Before
      returning though we must update the scatterlist length/offset to
      account for any partial send completed.
      
      Before this patch we did this at the end of the retry loop, but
      this was buggy when used while applying a verdict to fewer bytes
      than in the scatterlist. When the scatterlist length was being set
      we forgot to account for the apply logic reducing the size variable.
      So the result was we chopped off some bytes in the scatterlist without
      doing proper cleanup on them. This results in a WARNING when the
      sock is tore down because the bytes have previously been charged to
      the socket but are never uncharged.
      
      The simple fix is to simply do the accounting inside the retry loop
      subtracting from the absolute scatterlist values rather than trying
      to accumulate the totals and subtract at the end.
      Reported-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      3cc9a472
    • Eric Dumazet's avatar
      net_sched: fq: take care of throttled flows before reuse · 7df40c26
      Eric Dumazet authored
      Normally, a socket can not be freed/reused unless all its TX packets
      left qdisc and were TX-completed. However connect(AF_UNSPEC) allows
      this to happen.
      
      With commit fc59d5bd ("pkt_sched: fq: clear time_next_packet for
      reused flows") we cleared f->time_next_packet but took no special
      action if the flow was still in the throttled rb-tree.
      
      Since f->time_next_packet is the key used in the rb-tree searches,
      blindly clearing it might break rb-tree integrity. We need to make
      sure the flow is no longer in the rb-tree to avoid this problem.
      
      Fixes: fc59d5bd ("pkt_sched: fq: clear time_next_packet for reused flows")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7df40c26
    • Ido Schimmel's avatar
      ipv6: Revert "ipv6: Allow non-gateway ECMP for IPv6" · 30ca22e4
      Ido Schimmel authored
      This reverts commit edd7ceb7 ("ipv6: Allow non-gateway ECMP for
      IPv6").
      
      Eric reported a division by zero in rt6_multipath_rebalance() which is
      caused by above commit that considers identical local routes to be
      siblings. The division by zero happens because a nexthop weight is not
      set for local routes.
      
      Revert the commit as it does not fix a bug and has side effects.
      
      To reproduce:
      
      # ip -6 address add 2001:db8::1/64 dev dummy0
      # ip -6 address add 2001:db8::1/64 dev dummy1
      
      Fixes: edd7ceb7 ("ipv6: Allow non-gateway ECMP for IPv6")
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Reported-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Tested-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      30ca22e4
    • Helge Deller's avatar
      parisc: Fix section mismatches · 8d73b180
      Helge Deller authored
      Fix three section mismatches:
      1) Section mismatch in reference from the function ioread8() to the
         function .init.text:pcibios_init_bridge()
      2) Section mismatch in reference from the function free_initmem() to the
         function .init.text:map_pages()
      3) Section mismatch in reference from the function ccio_ioc_init() to
         the function .init.text:count_parisc_driver()
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      8d73b180
    • Helge Deller's avatar
      parisc: drivers.c: Fix section mismatches · b819439f
      Helge Deller authored
      Fix two section mismatches in drivers.c:
      1) Section mismatch in reference from the function alloc_tree_node() to
         the function .init.text:create_tree_node().
      2) Section mismatch in reference from the function walk_native_bus() to
         the function .init.text:alloc_pa_dev().
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      b819439f
    • Alexei Starovoitov's avatar
      Merge branch 'x86-bpf-jit-fixes' · 0f58e58e
      Alexei Starovoitov authored
      Daniel Borkmann says:
      
      ====================
      Fix two memory leaks in x86 JIT. For details, please see
      individual patches in this series. Thanks!
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      0f58e58e
    • Daniel Borkmann's avatar
      bpf, x64: fix memleak when not converging on calls · 39f56ca9
      Daniel Borkmann authored
      The JIT logic in jit_subprogs() is as follows: for all subprogs we
      allocate a bpf_prog_alloc(), populate it (prog->is_func = 1 here),
      and pass it to bpf_int_jit_compile(). If a failure occurred during
      JIT and prog->jited is not set, then we bail out from attempting to
      JIT the whole program, and punt to the interpreter instead. In case
      JITing went successful, we fixup BPF call offsets and do another
      pass to bpf_int_jit_compile() (extra_pass is true at that point) to
      complete JITing calls. Given that requires to pass JIT context around
      addrs and jit_data from x86 JIT are freed in the extra_pass in
      bpf_int_jit_compile() when calls are involved (if not, they can
      be freed immediately). However, if in the original pass, the JIT
      image didn't converge then we leak addrs and jit_data since image
      itself is NULL, the prog->is_func is set and extra_pass is false
      in that case, meaning both will become unreachable and are never
      cleaned up, therefore we need to free as well on !image. Only x64
      JIT is affected.
      
      Fixes: 1c2a088a ("bpf: x64: add JIT support for multi-function programs")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      39f56ca9
    • Daniel Borkmann's avatar
      bpf, x64: fix memleak when not converging after image · 3aab8884
      Daniel Borkmann authored
      While reviewing x64 JIT code, I noticed that we leak the prior allocated
      JIT image in the case where proglen != oldproglen during the JIT passes.
      Prior to the commit e0ee9c12 ("x86: bpf_jit: fix two bugs in eBPF JIT
      compiler") we would just break out of the loop, and using the image as the
      JITed prog since it could only shrink in size anyway. After e0ee9c12,
      we would bail out to out_addrs label where we free addrs and jit_data but
      not the image coming from bpf_jit_binary_alloc().
      
      Fixes: e0ee9c12 ("x86: bpf_jit: fix two bugs in eBPF JIT compiler")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      3aab8884
    • Ursula Braun's avatar
      net/smc: restrict non-blocking connect finish · 784813ae
      Ursula Braun authored
      The smc_poll code tries to finish connect() if the socket is in
      state SMC_INIT and polling of the internal CLC-socket returns with
      EPOLLOUT. This makes sense for a select/poll call following a connect
      call, but not without preceding connect().
      With this patch smc_poll starts connect logic only, if the CLC-socket
      is no longer in its initial state TCP_CLOSE.
      
      In addition, a poll error on the internal CLC-socket is always
      propagated to the SMC socket.
      
      With this patch the code path mentioned by syzbot
      https://syzkaller.appspot.com/bug?extid=03faa2dc16b8b64be396
      is no longer possible.
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Reported-by: syzbot+03faa2dc16b8b64be396@syzkaller.appspotmail.com
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      784813ae
    • Ingo Molnar's avatar
      8139too: Use disable_irq_nosync() in rtl8139_poll_controller() · af3e0fcf
      Ingo Molnar authored
      Use disable_irq_nosync() instead of disable_irq() as this might be
      called in atomic context with netpoll.
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      af3e0fcf
    • Xin Long's avatar
      sctp: fix the issue that the cookie-ack with auth can't get processed · ce402f04
      Xin Long authored
      When auth is enabled for cookie-ack chunk, in sctp_inq_pop, sctp
      processes auth chunk first, then continues to the next chunk in
      this packet if chunk_end + chunk_hdr size < skb_tail_pointer().
      Otherwise, it will go to the next packet or discard this chunk.
      
      However, it missed the fact that cookie-ack chunk's size is equal
      to chunk_hdr size, which couldn't match that check, and thus this
      chunk would not get processed.
      
      This patch fixes it by changing the check to chunk_end + chunk_hdr
      size <= skb_tail_pointer().
      
      Fixes: 26b87c78 ("net: sctp: fix remote memory pressure from excessive queueing")
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ce402f04
    • Xin Long's avatar
      sctp: use the old asoc when making the cookie-ack chunk in dupcook_d · 46e16d4b
      Xin Long authored
      When processing a duplicate cookie-echo chunk, for case 'D', sctp will
      not process the param from this chunk. It means old asoc has nothing
      to be updated, and the new temp asoc doesn't have the complete info.
      
      So there's no reason to use the new asoc when creating the cookie-ack
      chunk. Otherwise, like when auth is enabled for cookie-ack, the chunk
      can not be set with auth, and it will definitely be dropped by peer.
      
      This issue is there since very beginning, and we fix it by using the
      old asoc instead.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      46e16d4b
    • Xin Long's avatar
      sctp: init active key for the new asoc in dupcook_a and dupcook_b · 4842a08f
      Xin Long authored
      When processing a duplicate cookie-echo chunk, for case 'A' and 'B',
      after sctp_process_init for the new asoc, if auth is enabled for the
      cookie-ack chunk, the active key should also be initialized.
      
      Otherwise, the cookie-ack chunk made later can not be set with auth
      shkey properly, and a crash can even be caused by this, as after
      Commit 1b1e0bc9 ("sctp: add refcnt support for sh_key"), sctp
      needs to hold the shkey when making control chunks.
      
      Fixes: 1b1e0bc9 ("sctp: add refcnt support for sh_key")
      Reported-by: default avatarJianwen Ji <jiji@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4842a08f
    • Neal Cardwell's avatar
      tcp_bbr: fix to zero idle_restart only upon S/ACKed data · e6e6a278
      Neal Cardwell authored
      Previously the bbr->idle_restart tracking was zeroing out the
      bbr->idle_restart bit upon ACKs that did not SACK or ACK anything,
      e.g. receiving incoming data or receiver window updates. In such
      situations BBR would forget that this was a restart-from-idle
      situation, and if the min_rtt had expired it would unnecessarily enter
      PROBE_RTT (even though we were actually restarting from idle but had
      merely forgotten that fact).
      
      The fix is simple: we need to remember we are restarting from idle
      until we receive a S/ACK for some data (a S/ACK for the first flight
      of data we send as we are restarting).
      
      This commit is a stable candidate for kernels back as far as 4.9.
      
      Fixes: 0f8782ea ("tcp_bbr: add BBR congestion control")
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarPriyaranjan Jha <priyarjha@google.com>
      Signed-off-by: default avatarYousuk Seung <ysseung@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e6e6a278
    • Grygorii Strashko's avatar
      net: ethernet: ti: cpsw: fix packet leaking in dual_mac mode · 5e5add17
      Grygorii Strashko authored
      In dual_mac mode packets arrived on one port should not be forwarded by
      switch hw to another port. Only Linux Host can forward packets between
      ports. The below test case (reported in [1]) shows that packet arrived on
      one port can be leaked to anoter (reproducible with dual port evms):
       - connect port 1 (eth0) to linux Host 0 and run tcpdump or Wireshark
       - connect port 2 (eth1) to linux Host 1 with vlan 1 configured
       - ping <IPx> from Host 1 through vlan 1 interface.
      ARP packets will be seen on Host 0.
      
      Issue happens because dual_mac mode is implemnted using two vlans: 1 (Port
      1+Port 0) and 2 (Port 2+Port 0), so there are vlan records created for for
      each vlan. By default, the ALE will find valid vlan record in its table
      when vlan 1 tagged packet arrived on Port 2 and so forwards packet to all
      ports which are vlan 1 members (like Port.
      
      To avoid such behaviorr the ALE VLAN ID Ingress Check need to be enabled
      for each external CPSW port (ALE_PORTCTLn.VID_INGRESS_CHECK) so ALE will
      drop ingress packets if Rx port is not VLAN member.
      Signed-off-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5e5add17
    • Michael S. Tsirkin's avatar
      Revert "vhost: make msg padding explicit" · c818aa88
      Michael S. Tsirkin authored
      This reverts commit 93c0d549c4c5a7382ad70de6b86610b7aae57406.
      
      Unfortunately the padding will break 32 bit userspace.
      Ouch. Need to add some compat code, revert for now.
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c818aa88