1. 15 Nov, 2022 5 commits
    • bpf: Rename RET_PTR_TO_ALLOC_MEM · 2de2669b
      Kumar Kartikeya Dwivedi authored
      Currently, the verifier has two return types, RET_PTR_TO_ALLOC_MEM and
      RET_PTR_TO_ALLOC_MEM_OR_NULL. The former is confusingly named: it
      implies that it carries MEM_ALLOC, while only the latter does. This
      causes confusion during code review, leading to incorrect conclusions
      such as that the return value of RET_PTR_TO_DYNPTR_MEM_OR_NULL (which is
      RET_PTR_TO_ALLOC_MEM | PTR_MAYBE_NULL) may be consumable by
      bpf_ringbuf_{submit,commit}.
      
      Rename it to make it clear MEM_ALLOC needs to be tacked on top of
      RET_PTR_TO_MEM.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20221114191547.1694267-6-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Support bpf_list_head in map values · f0c5941f
      Kumar Kartikeya Dwivedi authored
      Add support on the map side to parse, recognize, verify, and build the
      metadata table for a new special field of the type struct bpf_list_head.
      To parameterize the bpf_list_head for a certain value type and the
      list_node member it will accept in that value type, we use BTF
      declaration tags.
      
      The definition of bpf_list_head in a map value will be done as follows:
      
      struct foo {
      	struct bpf_list_node node;
      	int data;
      };
      
      struct map_value {
      	struct bpf_list_head head __contains(foo, node);
      };
      
      Then, the bpf_list_head only allows adding to the list 'head' using the
      bpf_list_node 'node' for the type struct foo.
      
      The 'contains' annotation is a BTF declaration tag composed of three
      parts, "contains:name:node", where the name is used to look up the
      type in the map BTF, with its kind hardcoded to BTF_KIND_STRUCT during
      the lookup. The node part gives the name of the member in this type that
      has the type struct bpf_list_node, which is actually used for linking
      into the linked list. For now, the kind is hardcoded as struct.
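
      To make the tag format concrete, here is a minimal userspace sketch of
      splitting a "contains:name:node" string into its name and node parts.
      The helper parse_contains_tag is hypothetical and is not the kernel's
      parser; it only illustrates the layout of the tag described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not kernel code): split "contains:name:node" into
 * its name and node parts. Returns 0 on success, -1 if the tag is
 * malformed or a part does not fit in the caller's buffer. */
static int parse_contains_tag(const char *tag,
                              char *name, size_t name_sz,
                              char *node, size_t node_sz)
{
    static const char prefix[] = "contains:";
    const char *p, *sep;
    size_t name_len, node_len;

    assert(tag && name && node);

    /* The tag must start with the literal "contains:" prefix. */
    if (strncmp(tag, prefix, sizeof(prefix) - 1) != 0)
        return -1;
    p = tag + sizeof(prefix) - 1;

    /* The next ':' separates the type name from the member name. */
    sep = strchr(p, ':');
    if (!sep)
        return -1;
    name_len = (size_t)(sep - p);
    node_len = strlen(sep + 1);

    /* Both parts must be non-empty, fit their buffers, and the node part
     * must not contain a further ':'. */
    if (name_len == 0 || node_len == 0 ||
        name_len >= name_sz || node_len >= node_sz ||
        strchr(sep + 1, ':'))
        return -1;

    memcpy(name, p, name_len);
    name[name_len] = '\0';
    memcpy(node, sep + 1, node_len + 1); /* includes the terminating NUL */
    return 0;
}
```

      With the definition above, "contains:foo:node" would yield the name
      "foo" and the node "node", matching the __contains(foo, node) example.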
      
      This allows building intrusive linked lists in BPF, using container_of
      to obtain a pointer to the entry, while being completely type safe from
      the perspective of the verifier. The verifier knows exactly the type of the
      nodes, and knows that list helpers return that type at some fixed offset
      where the bpf_list_node member used for this list exists. The verifier
      also uses this information to disallow adding types that are not
      accepted by a certain list.
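
      The intrusive-list idea can be sketched in plain userspace C. The types
      list_node, list_head, and elem below are stand-ins invented for this
      illustration (the kernel's struct bpf_list_node and struct
      bpf_list_head differ), and the node is deliberately placed at a
      non-zero offset so that container_of does real work.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins, assumed for illustration only. */
struct list_node { struct list_node *next; };
struct list_head { struct list_node *first; };

struct elem {
    int data;
    struct list_node node; /* embedded link, at a fixed non-zero offset */
};

/* Recover the enclosing object from a pointer to its embedded member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Link a node at the front of the list. */
static void list_push_front(struct list_head *h, struct list_node *n)
{
    n->next = h->first;
    h->first = n;
}

/* Unlink and return the first node, or NULL if the list is empty. */
static struct list_node *list_pop_front(struct list_head *h)
{
    struct list_node *n = h->first;

    if (n) {
        h->first = n->next;
        n->next = NULL;
    }
    return n;
}
```

      A caller pops a struct list_node and converts it back with
      container_of(n, struct elem, node). In this sketch nothing stops a
      caller from naming the wrong type in that conversion; per the text
      above, the verifier rules this out in BPF because it knows the node's
      type and offset.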
      
      For now, no elements can be added to such lists. Support for that is
      coming in future patches, hence draining and freeing items is done with
      a TODO that will be resolved in a future patch.
      
      Note that the bpf_list_head_free function moves the list out to a local
      variable under the lock and releases it, doing the actual draining of
      the list items outside the lock. This avoids holding the lock for too
      long, which would pessimize other concurrent list operations, but it is
      also necessary for deadlock prevention: unless every function called in
      the critical section were notrace, a fentry/fexit program could attach
      and call bpf_map_update_elem again on the map. If the key matches, that
      call would try to acquire the same lock and deadlock. While this
      requires some special effort on the part of the BPF programmer to
      trigger and is highly unlikely to occur in practice, it is always better
      if we can avoid such a condition.
      
      While notrace would prevent this, doing the draining outside the lock
      has advantages of its own, hence it is used to also fix the deadlock
      related problem.
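
      The "detach under the lock, drain outside it" pattern can be sketched
      in userspace as follows. This is not the kernel's bpf_list_head_free:
      the types, the C11 atomic_flag spinlock, and the use of free() are
      assumptions made for the illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct node { struct node *next; };

/* Illustrative list guarded by a simple test-and-set spinlock. */
struct locked_list {
    atomic_flag lock;
    struct node *first;
};

static void list_lock(struct locked_list *l)
{
    while (atomic_flag_test_and_set_explicit(&l->lock, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void list_unlock(struct locked_list *l)
{
    atomic_flag_clear_explicit(&l->lock, memory_order_release);
}

/* Detach the whole list while holding the lock, then free the nodes
 * after releasing it. Returns the number of nodes freed. */
static size_t list_drain(struct locked_list *l)
{
    struct node *local, *next;
    size_t freed = 0;

    list_lock(l);
    local = l->first;  /* move the list out to a local variable */
    l->first = NULL;
    list_unlock(l);

    /* Per-item work happens here with the lock no longer held. */
    while (local) {
        next = local->next;
        free(local);
        local = next;
        freed++;
    }
    return freed;
}
```

      The critical section only swaps a pointer, so nothing called while the
      lock is held can re-enter and try to take it again.
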
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20221114191547.1694267-5-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Fix copy_map_value, zero_map_value · e5feed0f
      Kumar Kartikeya Dwivedi authored
      The current offset needs to also skip over the already copied region in
      addition to the size of the next field. This case manifests where there
      are gaps between adjacent special fields.
      
      It was observed that for a map value with size 48, having fields at:
      off:  0, 16, 32
      size: 4, 16, 16
      
      The current code does:
      
      memcpy(dst + 0, src + 0, 0)
      memcpy(dst + 4, src + 4, 12)
      memcpy(dst + 20, src + 20, 12)
      memcpy(dst + 36, src + 36, 12)
      
      With the fix, it is done correctly as:
      
      memcpy(dst + 0, src + 0, 0)
      memcpy(dst + 4, src + 4, 12)
      memcpy(dst + 32, src + 32, 0)
      memcpy(dst + 48, src + 48, 0)
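
      The corrected offset arithmetic can be reproduced with a small
      userspace sketch. The function copy_skipping_fields and its parameters
      are hypothetical names chosen for this illustration, not the kernel's
      copy_map_value; it copies only the bytes that lie between special
      fields and advances the current offset by both the copied gap and the
      field size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the non-special regions of a value from src to dst, skipping each
 * special field. off[] and sz[] hold the field offsets and sizes, sorted
 * by offset; cnt is the number of fields. */
static void copy_skipping_fields(void *dst, const void *src, size_t value_size,
                                 const size_t *off, const size_t *sz,
                                 size_t cnt)
{
    size_t curr_off = 0;

    for (size_t i = 0; i < cnt; i++) {
        /* Bytes between the current position and the next field. */
        size_t gap = off[i] - curr_off;

        memcpy((char *)dst + curr_off, (const char *)src + curr_off, gap);
        /* Skip the copied gap AND the field itself. Advancing by sz[i]
         * alone reproduces the buggy sequence listed above. */
        curr_off += gap + sz[i];
    }
    /* Copy whatever remains after the last field. */
    memcpy((char *)dst + curr_off, (const char *)src + curr_off,
           value_size - curr_off);
}
```

      For the layout above (offsets 0, 16, 32; sizes 4, 16, 16; value size
      48), this issues copies of 0, 12, 0, and 0 bytes at offsets 0, 4, 32,
      and 48, so only bytes 4 through 15 are copied.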
      
      Fixes: 4d7d7f69 ("bpf: Adapt copy_map_value for multiple offset case")
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20221114191547.1694267-4-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Remove BPF_MAP_OFF_ARR_MAX · 2d577252
      Kumar Kartikeya Dwivedi authored
      In f71b2f64 ("bpf: Refactor map->off_arr handling"), map->off_arr
      was refactored to be btf_field_offs. The number of field offsets is
      equal to the maximum possible number of fields, which is limited by
      BTF_FIELDS_MAX. Hence, reuse BTF_FIELDS_MAX, as spin_lock and timer no
      longer need to be handled specially for offset sorting, fix the comment,
      and remove the incorrect WARN_ON, as rec->cnt can never exceed this
      value. The reason to keep a separate constant was that it was always 2
      more than the total number of kptrs. This is no longer the case.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20221114191547.1694267-3-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Remove local kptr references in documentation · 1f6d52f1
      Kumar Kartikeya Dwivedi authored
      We don't want to commit to a specific name for these. Simply call them
      allocated objects coming from bpf_obj_new, which is completely clear in
      itself.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20221114191547.1694267-2-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  2. 14 Nov, 2022 22 commits
  3. 12 Nov, 2022 13 commits