1. 26 Apr, 2022 2 commits
    • Maciej Fijalkowski's avatar
      xsk: Fix possible crash when multiple sockets are created · ba3beec2
      Maciej Fijalkowski authored
      Fix a crash that happens if an Rx only socket is created first, then a
      second socket is created that is Tx only and bound to the same umem as
      the first socket and also the same netdev and queue_id together with the
      XDP_SHARED_UMEM flag. In this specific case, the tx_descs array page
      pool was not created by the first socket as it was an Rx only socket.
      When the second socket is bound it needs this tx_descs array of this
      shared page pool as it has a Tx component, but unfortunately it was
      never allocated, leading to a crash. Note that this array is only used
      for zero-copy drivers using the batched Tx APIs, currently only ice and
      i40e.
      
      [ 5511.150360] BUG: kernel NULL pointer dereference, address: 0000000000000008
      [ 5511.158419] #PF: supervisor write access in kernel mode
      [ 5511.164472] #PF: error_code(0x0002) - not-present page
      [ 5511.170416] PGD 0 P4D 0
      [ 5511.173347] Oops: 0002 [#1] PREEMPT SMP PTI
      [ 5511.178186] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G            E     5.18.0-rc1+ #97
      [ 5511.187245] Hardware name: Intel Corp. GRANTLEY/GRANTLEY, BIOS GRRFCRB1.86B.0276.D07.1605190235 05/19/2016
      [ 5511.198418] RIP: 0010:xsk_tx_peek_release_desc_batch+0x198/0x310
      [ 5511.205375] Code: c0 83 c6 01 84 c2 74 6d 8d 46 ff 23 07 44 89 e1 48 83 c0 14 48 c1 e1 04 48 c1 e0 04 48 03 47 10 4c 01 c1 48 8b 50 08 48 8b 00 <48> 89 51 08 48 89 01 41 80 bd d7 00 00 00 00 75 82 48 8b 19 49 8b
      [ 5511.227091] RSP: 0018:ffffc90000003dd0 EFLAGS: 00010246
      [ 5511.233135] RAX: 0000000000000000 RBX: ffff88810c8da600 RCX: 0000000000000000
      [ 5511.241384] RDX: 000000000000003c RSI: 0000000000000001 RDI: ffff888115f555c0
      [ 5511.249634] RBP: ffffc90000003e08 R08: 0000000000000000 R09: ffff889092296b48
      [ 5511.257886] R10: 0000ffffffffffff R11: ffff889092296800 R12: 0000000000000000
      [ 5511.266138] R13: ffff88810c8db500 R14: 0000000000000040 R15: 0000000000000100
      [ 5511.274387] FS:  0000000000000000(0000) GS:ffff88903f800000(0000) knlGS:0000000000000000
      [ 5511.283746] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 5511.290389] CR2: 0000000000000008 CR3: 00000001046e2001 CR4: 00000000003706f0
      [ 5511.298640] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 5511.306892] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 5511.315142] Call Trace:
      [ 5511.317972]  <IRQ>
      [ 5511.320301]  ice_xmit_zc+0x68/0x2f0 [ice]
      [ 5511.324977]  ? ktime_get+0x38/0xa0
      [ 5511.328913]  ice_napi_poll+0x7a/0x6a0 [ice]
      [ 5511.333784]  __napi_poll+0x2c/0x160
      [ 5511.337821]  net_rx_action+0xdd/0x200
      [ 5511.342058]  __do_softirq+0xe6/0x2dd
      [ 5511.346198]  irq_exit_rcu+0xb5/0x100
      [ 5511.350339]  common_interrupt+0xa4/0xc0
      [ 5511.354777]  </IRQ>
      [ 5511.357201]  <TASK>
      [ 5511.359625]  asm_common_interrupt+0x1e/0x40
      [ 5511.364466] RIP: 0010:cpuidle_enter_state+0xd2/0x360
      [ 5511.370211] Code: 49 89 c5 0f 1f 44 00 00 31 ff e8 e9 00 7b ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 72 02 00 00 31 ff e8 02 0c 80 ff fb 45 85 f6 <0f> 88 11 01 00 00 49 63 c6 4c 2b 2c 24 48 8d 14 40 48 8d 14 90 49
      [ 5511.391921] RSP: 0018:ffffffff82a03e60 EFLAGS: 00000202
      [ 5511.397962] RAX: ffff88903f800000 RBX: 0000000000000001 RCX: 000000000000001f
      [ 5511.406214] RDX: 0000000000000000 RSI: ffffffff823400b9 RDI: ffffffff8234c046
      [ 5511.424646] RBP: ffff88810a384800 R08: 000005032a28c046 R09: 0000000000000008
      [ 5511.443233] R10: 000000000000000b R11: 0000000000000006 R12: ffffffff82bcf700
      [ 5511.461922] R13: 000005032a28c046 R14: 0000000000000001 R15: 0000000000000000
      [ 5511.480300]  cpuidle_enter+0x29/0x40
      [ 5511.494329]  do_idle+0x1c7/0x250
      [ 5511.507610]  cpu_startup_entry+0x19/0x20
      [ 5511.521394]  start_kernel+0x649/0x66e
      [ 5511.534626]  secondary_startup_64_no_verify+0xc3/0xcb
      [ 5511.549230]  </TASK>
      
      Detect such case during bind() and allocate this memory region via newly
      introduced xp_alloc_tx_descs(). Also, use kvcalloc instead of kcalloc as
      for other buffer pool allocations, so that it matches the kvfree() from
      xp_destroy().
      
      Fixes: d1bc532e ("i40e: xsk: Move tmp desc array from driver to pool")
      Signed-off-by: default avatarMaciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarMagnus Karlsson <magnus.karlsson@intel.com>
      Link: https://lore.kernel.org/bpf/20220425153745.481322-1-maciej.fijalkowski@intel.com
      ba3beec2
    • Adam Zabrocki's avatar
      kprobes: Fix KRETPROBES when CONFIG_KRETPROBE_ON_RETHOOK is set · 1d661ed5
      Adam Zabrocki authored
      The recent kernel change in 73f9b911 ("kprobes: Use rethook for kretprobe
      if possible"), introduced a potential NULL pointer dereference bug in the
      KRETPROBE mechanism. The official Kprobes documentation defines that "Any or
      all handlers can be NULL". Unfortunately, there is a missing return handler
      verification to fulfill these requirements and can result in a NULL pointer
      dereference bug.
      
      This patch adds such verification in kretprobe_rethook_handler() function.
      
      Fixes: 73f9b911 ("kprobes: Use rethook for kretprobe if possible")
      Signed-off-by: default avatarAdam Zabrocki <pi3@pi3.com.pl>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
      Cc: Anil S. Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Link: https://lore.kernel.org/bpf/20220422164027.GA7862@pi3.com.pl
      1d661ed5
  2. 22 Apr, 2022 1 commit
    • Eyal Birger's avatar
      bpf, lwt: Fix crash when using bpf_skb_set_tunnel_key() from bpf_xmit lwt hook · b02d196c
      Eyal Birger authored
      xmit_check_hhlen() observes the dst for getting the device hard header
      length to make sure a modified packet can fit. When a helper which changes
      the dst - such as bpf_skb_set_tunnel_key() - is called as part of the
      xmit program the accessed dst is no longer valid.
      
      This leads to the following splat:
      
       BUG: kernel NULL pointer dereference, address: 00000000000000de
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: 0000 [#1] PREEMPT SMP PTI
       CPU: 0 PID: 798 Comm: ping Not tainted 5.18.0-rc2+ #103
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
       RIP: 0010:bpf_xmit+0xfb/0x17f
       Code: c6 c0 4d cd 8e 48 c7 c7 7d 33 f0 8e e8 42 09 fb ff 48 8b 45 58 48 8b 95 c8 00 00 00 48 2b 95 c0 00 00 00 48 83 e0 fe 48 8b 00 <0f> b7 80 de 00 00 00 39 c2 73 22 29 d0 b9 20 0a 00 00 31 d2 48 89
       RSP: 0018:ffffb148c0bc7b98 EFLAGS: 00010282
       RAX: 0000000000000000 RBX: 0000000000240008 RCX: 0000000000000000
       RDX: 0000000000000010 RSI: 00000000ffffffea RDI: 00000000ffffffff
       RBP: ffff922a828a4e00 R08: ffffffff8f1350e8 R09: 00000000ffffdfff
       R10: ffffffff8f055100 R11: ffffffff8f105100 R12: 0000000000000000
       R13: ffff922a828a4e00 R14: 0000000000000040 R15: 0000000000000000
       FS:  00007f414e8f0080(0000) GS:ffff922afdc00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00000000000000de CR3: 0000000002d80006 CR4: 0000000000370ef0
       Call Trace:
        <TASK>
        lwtunnel_xmit.cold+0x71/0xc8
        ip_finish_output2+0x279/0x520
        ? __ip_finish_output.part.0+0x21/0x130
      
      Fix by fetching the device hard header length before running the BPF code.
      
      Fixes: 3a0af8fd ("bpf: BPF for lightweight tunnel infrastructure")
      Signed-off-by: default avatarEyal Birger <eyal.birger@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20220420165219.1755407-1-eyal.birger@gmail.com
      b02d196c
  3. 11 Apr, 2022 1 commit
  4. 07 Apr, 2022 3 commits
    • Maciej Fijalkowski's avatar
      xsk: Fix l2fwd for copy mode + busy poll combo · 8de8b71b
      Maciej Fijalkowski authored
      While checking AF_XDP copy mode combined with busy poll, strange
      results were observed. rxdrop and txonly scenarios worked fine, but
      l2fwd broke immediately.
      
      After a deeper look, it turned out that for l2fwd, Tx side was exiting
      early due to xsk_no_wakeup() returning true and in the end
      xsk_generic_xmit() was never called. Note that AF_XDP Tx in copy mode
      is syscall steered, so the current behavior is broken.
      
      Txonly scenario only worked due to the fact that
      sk_mark_napi_id_once_xdp() was never called - since Rx side is not in
      the picture for this case and mentioned function is called in
      xsk_rcv_check(), sk::sk_napi_id was never set, which in turn meant that
      xsk_no_wakeup() was returning false (see the sk->sk_napi_id >=
      MIN_NAPI_ID check in there).
      
      To fix this, prefer busy poll in xsk_sendmsg() only when zero copy is
      enabled on a given AF_XDP socket. By doing so, busy poll in copy mode
      would not exit early on Tx side and eventually xsk_generic_xmit() will
      be called.
      
      Fixes: a0731952 ("xsk: Add busy-poll support for {recv,send}msg()")
      Signed-off-by: default avatarMaciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20220406155804.434493-1-maciej.fijalkowski@intel.com
      8de8b71b
    • Duoming Zhou's avatar
      drivers: net: slip: fix NPD bug in sl_tx_timeout() · ec4eb8a8
      Duoming Zhou authored
      When a slip driver is detaching, the slip_close() will act to
      cleanup necessary resources and sl->tty is set to NULL in
      slip_close(). Meanwhile, the packet we transmit is blocked,
      sl_tx_timeout() will be called. Although slip_close() and
      sl_tx_timeout() use sl->lock to synchronize, we don`t judge
      whether sl->tty equals to NULL in sl_tx_timeout() and the
      null pointer dereference bug will happen.
      
         (Thread 1)                 |      (Thread 2)
                                    | slip_close()
                                    |   spin_lock_bh(&sl->lock)
                                    |   ...
      ...                           |   sl->tty = NULL //(1)
      sl_tx_timeout()               |   spin_unlock_bh(&sl->lock)
        spin_lock(&sl->lock);       |
        ...                         |   ...
        tty_chars_in_buffer(sl->tty)|
          if (tty->ops->..) //(2)   |
          ...                       |   synchronize_rcu()
      
      We set NULL to sl->tty in position (1) and dereference sl->tty
      in position (2).
      
      This patch adds check in sl_tx_timeout(). If sl->tty equals to
      NULL, sl_tx_timeout() will goto out.
      Signed-off-by: default avatarDuoming Zhou <duoming@zju.edu.cn>
      Reviewed-by: default avatarJiri Slaby <jirislaby@kernel.org>
      Link: https://lore.kernel.org/r/20220405132206.55291-1-duoming@zju.edu.cnSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      ec4eb8a8
    • Jakub Kicinski's avatar
      Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 8e9d0d7a
      Jakub Kicinski authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2022-04-06
      
      We've added 8 non-merge commits during the last 8 day(s) which contain
      a total of 9 files changed, 139 insertions(+), 36 deletions(-).
      
      The main changes are:
      
      1) rethook related fixes, from Jiri and Masami.
      
      2) Fix the case when tracing bpf prog is attached to struct_ops, from Martin.
      
      3) Support dual-stack sockets in bpf_tcp_check_syncookie, from Maxim.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        bpf: Adjust bpf_tcp_check_syncookie selftest to test dual-stack sockets
        bpf: Support dual-stack sockets in bpf_tcp_check_syncookie
        bpf: selftests: Test fentry tracing a struct_ops program
        bpf: Resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT
        rethook: Fix to use WRITE_ONCE() for rethook:: Handler
        selftests/bpf: Fix warning comparing pointer to 0
        bpf: Fix sparse warnings in kprobe_multi_resolve_syms
        bpftool: Explicit errno handling in skeletons
      ====================
      
      Link: https://lore.kernel.org/r/20220407031245.73026-1-alexei.starovoitov@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      8e9d0d7a
  5. 06 Apr, 2022 19 commits
  6. 05 Apr, 2022 11 commits
  7. 04 Apr, 2022 3 commits