1. 05 Dec, 2023 5 commits
    • bpf: Optimize the free of inner map · af66bfd3
      Hou Tao authored
      When removing the inner map from the outer map, the inner map will be
      freed after one RCU grace period and one RCU tasks trace grace
      period, so it is certain that the bpf program, which may access the
      inner map, has exited before the inner map is freed.
      
      However, there is no need to wait for an RCU tasks trace grace period
      if the outer map is only accessed by non-sleepable programs. So add
      sleepable_refcnt to bpf_map and increase it when adding the outer map
      into env->used_maps for a sleepable program. Although the maximum
      number of bpf programs is INT_MAX - 1, the number of bpf programs being
      loaded may be greater than INT_MAX, so use atomic64_t instead of
      atomic_t for sleepable_refcnt. When removing the inner map from the
      outer map, use sleepable_refcnt to decide whether an RCU tasks trace
      grace period is needed before freeing the inner map.

      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Link: https://lore.kernel.org/r/20231204140425.1480317-6-houtao@huaweicloud.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
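The sleepable_refcnt decision described above can be sketched in userspace as follows. This is a minimal model that uses C11 `_Atomic int64_t` as a stand-in for atomic64_t. Only the field name sleepable_refcnt comes from the commit. The struct and all model_* functions are hypothetical, and the decrement when a sleepable program releases the map is an assumption of the model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of bpf_map: only the counter added by the commit.
 * _Atomic int64_t stands in for the kernel's atomic64_t. */
struct model_map {
	_Atomic int64_t sleepable_refcnt;
};

static void model_map_init(struct model_map *map)
{
	atomic_init(&map->sleepable_refcnt, 0);
}

/* Modeled on adding the outer map to env->used_maps: only sleepable
 * programs bump the counter. */
static void model_use_map(struct model_map *map, bool prog_sleepable)
{
	if (prog_sleepable)
		atomic_fetch_add(&map->sleepable_refcnt, 1);
}

/* Assumed counterpart when a sleepable program drops the map. */
static void model_unuse_map(struct model_map *map, bool prog_sleepable)
{
	if (prog_sleepable)
		atomic_fetch_sub(&map->sleepable_refcnt, 1);
}

/* On removing an inner map from this outer map: is an RCU tasks trace
 * grace period needed in addition to the normal RCU grace period? */
static bool model_need_tasks_trace_gp(struct model_map *outer)
{
	return atomic_load(&outer->sleepable_refcnt) > 0;
}
```

A 64-bit counter matches the commit's reasoning: the number of programs being loaded, not just the number already loaded, may exceed INT_MAX.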
    • bpf: Defer the free of inner map when necessary · 87667336
      Hou Tao authored
      When updating or deleting an inner map in a map array or map htab, the
      map may still be accessed by a non-sleepable or sleepable program.
      However, bpf_map_fd_put_ptr() decreases the ref-counter of the inner
      map directly through bpf_map_put(). If that is the last reference
      (which is true in most cases), the inner map will be freed by
      ops->map_free() in a kworker. But for now, most .map_free() callbacks
      don't use synchronize_rcu() or its variants to wait for an RCU grace
      period to elapse, so after ops->map_free() completes, a bpf program
      which is still accessing the inner map may incur a use-after-free
      problem.

      Fix the free of the inner map by invoking bpf_map_free_deferred() after
      both one RCU grace period and one tasks trace RCU grace period if the
      inner map has previously been removed from the outer map. The deferral
      is accomplished by using call_rcu() or call_rcu_tasks_trace() when
      releasing the last ref-counter of the bpf map. The newly-added rcu_head
      field in bpf_map shares the same storage space with the work field to
      reduce the size of bpf_map.
      
      Fixes: bba1dc0b ("bpf: Remove redundant synchronize_rcu.")
      Fixes: 638e4b82 ("bpf: Allows per-cpu maps and map-in-map in sleepable programs")
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Link: https://lore.kernel.org/r/20231204140425.1480317-5-houtao@huaweicloud.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
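The free path chosen on the last reference drop can be modeled as below. This is a hedged sketch, not kernel code. The stand-in rcu_head/work types, the removed_from_outer flag, and the model_* names are all hypothetical. The model returns which path would be taken instead of scheduling call_rcu() or call_rcu_tasks_trace() callbacks. It also shows the space-saving idea from the commit: the rcu and work members share storage in a union, because a map is freed through only one of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct rcu_head and struct work_struct. */
struct model_rcu_head {
	struct model_rcu_head *next;
	void (*func)(struct model_rcu_head *head);
};

struct model_work {
	void *entry[2];
	void (*func)(struct model_work *work);
};

enum model_put_result {
	PUT_NOT_LAST,        /* other references remain: nothing is freed */
	FREE_IN_WORK,        /* queue map_free() on a kworker right away */
	FREE_AFTER_BOTH_GPS, /* wait for a tasks trace RCU GP and an RCU GP first */
};

struct model_map {
	long refcnt;
	bool removed_from_outer; /* hypothetical: set when the outer map drops it with deferral */
	union {                  /* only one of the two is ever in use */
		struct model_work work;
		struct model_rcu_head rcu;
	};
};

/* Models the decision made when a reference is dropped. */
static enum model_put_result model_map_put(struct model_map *map)
{
	if (--map->refcnt > 0)
		return PUT_NOT_LAST;
	if (map->removed_from_outer)
		return FREE_AFTER_BOTH_GPS;
	return FREE_IN_WORK;
}
```

A map that was never an inner map keeps the immediate kworker path. Only maps removed from an outer map pay for the grace-period waits.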
    • bpf: Set need_defer as false when clearing fd array during map free · 79d93b3c
      Hou Tao authored
      The map deletion, map release and map free operations all use
      fd_array_map_delete_elem() to remove elements from the fd array, and
      need_defer is always true in fd_array_map_delete_elem(). For the map
      deletion and map release operations, need_defer=true is necessary,
      because a bpf program which accesses an element in the fd array may
      still be alive. However, for the map free operation, it is certain that
      the bpf program which owns the fd array has already exited, so setting
      need_defer to false is appropriate there.

      So fix it by adding a need_defer parameter to bpf_fd_array_map_clear()
      and adding a new helper, __fd_array_map_delete_elem(), to handle the
      map deletion, map release and map free operations correspondingly.

      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Link: https://lore.kernel.org/r/20231204140425.1480317-4-houtao@huaweicloud.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
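The split described above can be sketched with a small single-threaded model. Here, model_delete_elem() plays the role of __fd_array_map_delete_elem() and model_clear() plays the role of bpf_fd_array_map_clear(). All model_* names and the `deferred` flag are hypothetical. The real kernel swaps slots atomically and defers the final free rather than setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_ENTRIES 4

/* Hypothetical element: a refcount plus a record of whether its
 * release was requested with deferral. */
struct model_elem {
	int refcnt;
	bool deferred;
};

struct model_fd_array {
	struct model_elem *slots[MODEL_MAX_ENTRIES];
};

/* Stand-in for .map_fd_put_ptr(): drop one reference; with need_defer
 * the final free would have to wait for the grace period(s). */
static void model_put_ptr(struct model_elem *e, bool need_defer)
{
	if (need_defer)
		e->deferred = true;
	e->refcnt--;
}

/* Analogue of __fd_array_map_delete_elem(map, key, need_defer). */
static bool model_delete_elem(struct model_fd_array *a, size_t idx, bool need_defer)
{
	if (idx >= MODEL_MAX_ENTRIES || !a->slots[idx])
		return false;
	struct model_elem *old = a->slots[idx];
	a->slots[idx] = NULL; /* the kernel uses xchg() here */
	model_put_ptr(old, need_defer);
	return true;
}

/* Analogue of bpf_fd_array_map_clear(map, need_defer). */
static void model_clear(struct model_fd_array *a, bool need_defer)
{
	for (size_t i = 0; i < MODEL_MAX_ENTRIES; i++)
		model_delete_elem(a, i, need_defer);
}

/* Deletion while programs may still run: must defer. */
static bool model_user_delete(struct model_fd_array *a, size_t idx)
{
	return model_delete_elem(a, idx, true);
}

/* Map free: the owning program has already exited, so no deferral. */
static void model_map_free(struct model_fd_array *a)
{
	model_clear(a, false);
}
```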
    • bpf: Add map and need_defer parameters to .map_fd_put_ptr() · 20c20bd1
      Hou Tao authored
      map is the pointer to the outer map; need_defer needs some explanation.
      need_defer tells the implementation to defer the reference release of
      the passed element and ensure that the element stays alive until the
      bpf program, which may manipulate it, exits.

      The following three cases invoke map_fd_put_ptr(), and each passes
      need_defer accordingly:

      1) Release the reference of the old element in the map during map
         update or map deletion. The release must be deferred, otherwise the
         bpf program may incur a use-after-free problem, so need_defer must
         be true.
      2) Release the reference of the to-be-added element in the error path
         of map update. The to-be-added element is not visible to any bpf
         program, so it is OK to pass false for need_defer.
      3) Release the references of all elements in the map during map
         release. Any bpf program which had access to the map must have
         already exited and been released, so need_defer=false is OK.

      These two parameters will be used by the following patches to fix the
      potential use-after-free problem for map-in-map.

      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Link: https://lore.kernel.org/r/20231204140425.1480317-3-houtao@huaweicloud.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
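The three cases can be illustrated with a one-slot outer map model. This is a sketch under stated assumptions. The model_* names, the one-slot layout, and the fail_next_update switch used to reach the error path are all hypothetical. The model only records which need_defer value each call site would pass to .map_fd_put_ptr().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical element: refcount plus how its last put was requested. */
struct model_elem {
	int refcnt;
	bool last_put_deferred;
};

/* Hypothetical one-slot outer map; fail_next_update forces the error path. */
struct model_outer {
	struct model_elem *slot;
	bool fail_next_update;
};

/* Stand-in for .map_fd_put_ptr(map, ptr, need_defer). */
static void model_fd_put_ptr(struct model_outer *map, struct model_elem *e, bool need_defer)
{
	(void)map; /* the real callback may inspect the outer map */
	e->last_put_deferred = need_defer;
	e->refcnt--;
}

static int model_update(struct model_outer *map, struct model_elem *new_elem)
{
	new_elem->refcnt++; /* reference taken on behalf of the map */
	if (map->fail_next_update) {
		/* Case 2: error path, new_elem was never visible to programs. */
		model_fd_put_ptr(map, new_elem, false);
		return -1;
	}
	struct model_elem *old = map->slot;
	map->slot = new_elem;
	if (old)
		/* Case 1: a running program may still use old, so defer. */
		model_fd_put_ptr(map, old, true);
	return 0;
}

/* Case 3: map release, no program can access the map any more. */
static void model_release(struct model_outer *map)
{
	if (map->slot) {
		model_fd_put_ptr(map, map->slot, false);
		map->slot = NULL;
	}
}
```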
    • bpf: Check rcu_read_lock_trace_held() before calling bpf map helpers · 169410eb
      Hou Tao authored
      The bpf_map_{lookup,update,delete}_elem() helpers are also available
      to sleepable bpf programs, so add the corresponding lock assertion for
      sleepable bpf programs. Otherwise, the following warning is reported
      when a sleepable bpf program manipulates a bpf map in interpreter mode
      (aka bpf_jit_enable=0):
      
        WARNING: CPU: 3 PID: 4985 at kernel/bpf/helpers.c:40 ......
        CPU: 3 PID: 4985 Comm: test_progs Not tainted 6.6.0+ #2
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
        RIP: 0010:bpf_map_lookup_elem+0x54/0x60
        ......
        Call Trace:
         <TASK>
         ? __warn+0xa5/0x240
         ? bpf_map_lookup_elem+0x54/0x60
         ? report_bug+0x1ba/0x1f0
         ? handle_bug+0x40/0x80
         ? exc_invalid_op+0x18/0x50
         ? asm_exc_invalid_op+0x1b/0x20
         ? __pfx_bpf_map_lookup_elem+0x10/0x10
         ? rcu_lockdep_current_cpu_online+0x65/0xb0
         ? rcu_is_watching+0x23/0x50
         ? bpf_map_lookup_elem+0x54/0x60
         ? __pfx_bpf_map_lookup_elem+0x10/0x10
         ___bpf_prog_run+0x513/0x3b70
         __bpf_prog_run32+0x9d/0xd0
         ? __bpf_prog_enter_sleepable_recur+0xad/0x120
         ? __bpf_prog_enter_sleepable_recur+0x3e/0x120
         bpf_trampoline_6442580665+0x4d/0x1000
         __x64_sys_getpgid+0x5/0x30
         ? do_syscall_64+0x36/0xb0
         entry_SYSCALL_64_after_hwframe+0x6e/0x76
         </TASK>

      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Link: https://lore.kernel.org/r/20231204140425.1480317-2-houtao@huaweicloud.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
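The effect of the extra assertion can be modeled as a predicate over which RCU read-side locks are held. This sketch assumes that the helpers previously accepted only classic RCU or RCU-bh, and that the fix adds an RCU tasks trace check alongside them. The struct and model_* functions are hypothetical. In the kernel, the three booleans would come from rcu_read_lock_held(), rcu_read_lock_bh_held() and rcu_read_lock_trace_held().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the RCU read-side locks held by the caller. */
struct model_lock_ctx {
	bool rcu;       /* rcu_read_lock(): non-sleepable programs */
	bool rcu_bh;    /* rcu_read_lock_bh() */
	bool rcu_trace; /* rcu_read_lock_trace(): sleepable programs */
};

/* Assumed assertion before the change: warn unless RCU or RCU-bh is held. */
static bool model_would_warn_old(const struct model_lock_ctx *c)
{
	return !c->rcu && !c->rcu_bh;
}

/* Assertion after the change: holding RCU tasks trace also satisfies it. */
static bool model_would_warn_new(const struct model_lock_ctx *c)
{
	return !c->rcu && !c->rcu_trace && !c->rcu_bh;
}
```

A sleepable program holding only the tasks trace lock triggers the old check but passes the new one. A caller holding no lock at all still warns.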
  2. 04 Dec, 2023 2 commits
  3. 02 Dec, 2023 19 commits
  4. 01 Dec, 2023 14 commits