1. 20 May, 2023 1 commit
    • Aditi Ghag's avatar
      bpf: tcp: Avoid taking fast sock lock in iterator · 9378096e
      Aditi Ghag authored
      This is a preparatory commit to replace `lock_sock_fast` with
      `lock_sock`,and facilitate BPF programs executed from the TCP sockets
      iterator to be able to destroy TCP sockets using the bpf_sock_destroy
      kfunc (implemented in follow-up commits).
      
      Previously, BPF TCP iterator was acquiring the sock lock with BH
      disabled. This led to scenarios where the sockets hash table bucket lock
      can be acquired with BH enabled in some path versus disabled in other.
      In such situation, kernel issued a warning since it thinks that in the
      BH enabled path the same bucket lock *might* be acquired again in the
      softirq context (BH disabled), which will lead to a potential dead lock.
      Since bpf_sock_destroy also happens in a process context, the potential
      deadlock warning is likely a false alarm.
      
      Here is a snippet of annotated stack trace that motivated this change:
      
      ```
      
      Possible interrupt unsafe locking scenario:
      
            CPU0                    CPU1
            ----                    ----
       lock(&h->lhash2[i].lock);
                                    local_bh_disable();
                                    lock(&h->lhash2[i].lock);
      kernel imagined possible scenario:
        local_bh_disable();  /* Possible softirq */
        lock(&h->lhash2[i].lock);
      *** Potential Deadlock ***
      
      process context:
      
      lock_acquire+0xcd/0x330
      _raw_spin_lock+0x33/0x40
      ------> Acquire (bucket) lhash2.lock with BH enabled
      __inet_hash+0x4b/0x210
      inet_csk_listen_start+0xe6/0x100
      inet_listen+0x95/0x1d0
      __sys_listen+0x69/0xb0
      __x64_sys_listen+0x14/0x20
      do_syscall_64+0x3c/0x90
      entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      bpf_sock_destroy run from iterator:
      
      lock_acquire+0xcd/0x330
      _raw_spin_lock+0x33/0x40
      ------> Acquire (bucket) lhash2.lock with BH disabled
      inet_unhash+0x9a/0x110
      tcp_set_state+0x6a/0x210
      tcp_abort+0x10d/0x200
      bpf_prog_6793c5ca50c43c0d_iter_tcp6_server+0xa4/0xa9
      bpf_iter_run_prog+0x1ff/0x340
      ------> lock_sock_fast that acquires sock lock with BH disabled
      bpf_iter_tcp_seq_show+0xca/0x190
      bpf_seq_read+0x177/0x450
      
      ```
      
      Also, Yonghong reported a deadlock for non-listening TCP sockets that
      this change resolves. Previously, `lock_sock_fast` held the sock spin
      lock with BH which was again being acquired in `tcp_abort`:
      
      ```
      watchdog: BUG: soft lockup - CPU#0 stuck for 86s! [test_progs:2331]
      RIP: 0010:queued_spin_lock_slowpath+0xd8/0x500
      Call Trace:
       <TASK>
       _raw_spin_lock+0x84/0x90
       tcp_abort+0x13c/0x1f0
       bpf_prog_88539c5453a9dd47_iter_tcp6_client+0x82/0x89
       bpf_iter_run_prog+0x1aa/0x2c0
       ? preempt_count_sub+0x1c/0xd0
       ? from_kuid_munged+0x1c8/0x210
       bpf_iter_tcp_seq_show+0x14e/0x1b0
       bpf_seq_read+0x36c/0x6a0
      
      bpf_iter_tcp_seq_show
         lock_sock_fast
           __lock_sock_fast
             spin_lock_bh(&sk->sk_lock.slock);
      	/* * Fast path return with bottom halves disabled and * sock::sk_lock.slock held.* */
      
       ...
       tcp_abort
         local_bh_disable();
         spin_lock(&((sk)->sk_lock.slock)); // from bh_lock_sock(sk)
      
      ```
      
      With the switch to `lock_sock`, it calls `spin_unlock_bh` before returning:
      
      ```
      lock_sock
          lock_sock_nested
             spin_lock_bh(&sk->sk_lock.slock);
             :
             spin_unlock_bh(&sk->sk_lock.slock);
      ```
      Acked-by: default avatarYonghong Song <yhs@meta.com>
      Acked-by: default avatarStanislav Fomichev <sdf@google.com>
      Signed-off-by: default avatarAditi Ghag <aditi.ghag@isovalent.com>
      Link: https://lore.kernel.org/r/20230519225157.760788-2-aditi.ghag@isovalent.comSigned-off-by: default avatarMartin KaFai Lau <martin.lau@kernel.org>
      9378096e
  2. 19 May, 2023 3 commits
  3. 17 May, 2023 32 commits
  4. 16 May, 2023 4 commits