    xsk: Fix race at socket teardown · 18b1ab7a
    Magnus Karlsson authored
    Fix a race in the xsk socket teardown code that can lead to a NULL pointer
    dereference splat. The current xsk unbind code in xsk_unbind_dev() starts by
    setting xs->state to XSK_UNBOUND, sets xs->dev to NULL and then waits for any
    NAPI processing to terminate using synchronize_net(). After that, the release
    code starts to tear down the socket state and free allocated memory.
    
      BUG: kernel NULL pointer dereference, address: 00000000000000c0
      PGD 8000000932469067 P4D 8000000932469067 PUD 0
      Oops: 0000 [#1] PREEMPT SMP PTI
      CPU: 25 PID: 69132 Comm: grpcpp_sync_ser Tainted: G          I       5.16.0+ #2
      Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.2.10 03/09/2015
      RIP: 0010:__xsk_sendmsg+0x2c/0x690
      [...]
      RSP: 0018:ffffa2348bd13d50 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 0000000000000040 RCX: ffff8d5fc632d258
      RDX: 0000000000400000 RSI: ffffa2348bd13e10 RDI: ffff8d5fc5489800
      RBP: ffffa2348bd13db0 R08: 0000000000000000 R09: 00007ffffffff000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8d5fc5489800
      R13: ffff8d5fcb0f5140 R14: ffff8d5fcb0f5140 R15: 0000000000000000
      FS:  00007f991cff9400(0000) GS:ffff8d6f1f700000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000000000c0 CR3: 0000000114888005 CR4: 00000000001706e0
      Call Trace:
      <TASK>
      ? aa_sk_perm+0x43/0x1b0
      xsk_sendmsg+0xf0/0x110
      sock_sendmsg+0x65/0x70
      __sys_sendto+0x113/0x190
      ? debug_smp_processor_id+0x17/0x20
      ? fpregs_assert_state_consistent+0x23/0x50
      ? exit_to_user_mode_prepare+0xa5/0x1d0
      __x64_sys_sendto+0x29/0x30
      do_syscall_64+0x3b/0xc0
      entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    There are two problems with the current code. First, setting xs->dev to NULL
    before waiting for all users to stop using the socket is not correct. Entry
    to the data plane functions xsk_poll(), xsk_sendmsg(), and xsk_recvmsg() is
    guarded by a test that xs->state is XSK_BOUND; if it is not, the function
    returns right away. But one process might have passed this test without yet
    having reached the point at which it uses xs->dev. In this interim, a second
    process executing xsk_unbind_dev() might set xs->dev to NULL, which leads to
    a crash in the first process. The solution here is simply to remove this
    NULL assignment, since nothing relies on it anymore. Before commit 42fddcc7
    ("xsk: use state member for socket synchronization"), xs->dev was the
    gatekeeper that admitted processes into the data plane functions, but that
    commit replaced it with the state variable xs->state.
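
The interleaving described above can be replayed in a small single-threaded
userspace model. This is only a sketch, not the kernel code: every toy_* name
is hypothetical, the "second process" is just a function call made in the gap
between the state test and the use of dev, and a return value of -1 stands in
for the point where the kernel would have hit the NULL pointer dereference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum toy_state { TOY_UNBOUND = 0, TOY_BOUND = 1 };

struct toy_dev {
    int ifindex;
};

struct toy_sock {
    enum toy_state state;
    struct toy_dev *dev;
};

/* Step 1 of a data plane call: the admission test on the state member. */
int toy_enter(const struct toy_sock *xs)
{
    return xs->state == TOY_BOUND;
}

/*
 * Step 2 of a data plane call: the later use of xs->dev.
 * Returns the ifindex, or -1 where the real code would have oopsed.
 */
int toy_use_dev(const struct toy_sock *xs)
{
    return xs->dev ? xs->dev->ifindex : -1;
}

/* Old unbind order: state change followed by the NULL assignment. */
void toy_unbind_old(struct toy_sock *xs)
{
    xs->state = TOY_UNBOUND;
    xs->dev = NULL;
}

/* Unbind without the NULL assignment, as this commit proposes. */
void toy_unbind_new(struct toy_sock *xs)
{
    xs->state = TOY_UNBOUND;
}

/* Replay the race: pass the test, let unbind run, then use dev. */
int toy_replay(void (*unbind)(struct toy_sock *))
{
    struct toy_dev dev = { .ifindex = 7 };
    struct toy_sock xs = { .state = TOY_BOUND, .dev = &dev };

    assert(unbind != NULL);
    if (!toy_enter(&xs))
        return -2;      /* not admitted; does not happen in this replay */
    unbind(&xs);        /* the second process runs in the gap */
    return toy_use_dev(&xs);
}
```

In this model toy_replay(toy_unbind_old) returns -1 and
toy_replay(toy_unbind_new) returns 7. The model only covers the NULL
assignment; keeping the device itself alive until all users are done is the
subject of the second problem.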
    
    The second problem is that synchronize_net() does not wait for any process in
    xsk_poll(), xsk_sendmsg(), or xsk_recvmsg() to complete, which means that the
    state they rely on might be cleaned up prematurely. This can happen when the
    notifier gets called (at driver unload, for example), as it uses
    xsk_unbind_dev(). Solve this by extending the RCU critical region from just
    the ndo_xsk_wakeup call to the whole of the functions mentioned above, so
    that both the test of xs->state == XSK_BOUND and the last use of any member
    of xs are covered by the RCU critical section. This guarantees that when
    synchronize_net() completes, no process is still executing xsk_poll(),
    xsk_sendmsg(), or xsk_recvmsg(), and state can be cleaned up safely. Note
    that we need to drop the RCU lock for the skb xmit path as it uses functions
    that might sleep. Because of this, we have to retest xs->state after we grab
    the mutex that protects the skb xmit code from, among a number of things, an
    xsk_unbind_dev() being executed from the notifier at the same time.
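
The ordering argument above can be illustrated with another single-threaded
userspace model. It is a sketch under loud assumptions: a plain reader counter
stands in for RCU read-side critical sections, model_grace_period_done() stands
in for the point at which synchronize_net() could return, a flag stands in for
the mutex, and all model_* names as well as the -ENXIO return value are choices
made for this model rather than something taken from the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum model_state { MODEL_UNBOUND = 0, MODEL_BOUND = 1 };

struct model_sock {
    enum model_state state;
    int readers;     /* stand-in for tasks inside an RCU read section */
    int mutex_held;  /* stand-in for the mutex guarding the skb path */
};

void model_read_lock(struct model_sock *xs)
{
    xs->readers++;
}

void model_read_unlock(struct model_sock *xs)
{
    assert(xs->readers > 0);
    xs->readers--;
}

/* Unbind flips the state; real code would then block until readers drain. */
void model_unbind_begin(struct model_sock *xs)
{
    xs->state = MODEL_UNBOUND;
}

/* Nonzero once the modelled grace period could end: no readers remain. */
int model_grace_period_done(const struct model_sock *xs)
{
    return xs->readers == 0;
}

/*
 * Entry to a data plane function: the state test happens inside the read
 * section. Returns 0 with the read section still held, or -ENXIO with it
 * released.
 */
int model_sendmsg_enter(struct model_sock *xs)
{
    model_read_lock(xs);
    if (xs->state != MODEL_BOUND) {
        model_read_unlock(xs);
        return -ENXIO;
    }
    return 0;
}

/*
 * skb-like path, called after the read section was dropped because the
 * path may sleep: take the mutex and retest the state before sending.
 */
int model_skb_xmit(struct model_sock *xs)
{
    int ret;

    assert(!xs->mutex_held);
    xs->mutex_held = 1;
    ret = (xs->state == MODEL_BOUND) ? 0 : -ENXIO;
    xs->mutex_held = 0;
    return ret;
}
```

A caller that is admitted by model_sendmsg_enter() keeps
model_grace_period_done() false until it calls model_read_unlock(), so cleanup
cannot start underneath it. If model_unbind_begin() runs after the read section
is dropped for the skb path, the retest in model_skb_xmit() returns -ENXIO
instead of touching state that may already be gone.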
    
    Fixes: 42fddcc7 ("xsk: use state member for socket synchronization")
    Reported-by: Elza Mathew <elza.mathew@intel.com>
    Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Björn Töpel <bjorn@kernel.org>
    Link: https://lore.kernel.org/bpf/20220228094552.10134-1-magnus.karlsson@gmail.com
xsk.c 32.5 KB