1. 25 Aug, 2022 6 commits
    • bpf: Introduce cgroup iter · d4ccaf58
      Hao Luo authored
      Cgroup_iter is a type of bpf_iter. It walks over cgroups in four modes:
      
       - walking a cgroup's descendants in pre-order.
       - walking a cgroup's descendants in post-order.
       - walking a cgroup's ancestors.
       - processing only the given cgroup.
      
      When attaching cgroup_iter, one can associate a cgroup with the
      iter_link created by the attachment. This cgroup is passed as a file
      descriptor or cgroup id and serves as the starting point of the walk.
      If no cgroup is specified, the starting point is the cgroup v2 root.
      
      For walking descendants, one can specify the order: either pre-order or
      post-order. For walking ancestors, the walk starts at the specified
      cgroup and ends at the root.
      
      One can also terminate the walk early by returning 1 from the iter
      program.
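
The four walk modes and the "return 1 to stop" contract can be illustrated with a small userspace model. This is only a sketch: `toy_cgroup`, `walk_pre`, `walk_post`, `walk_up`, and `record` are hypothetical names, not kernel or libbpf APIs, and the sketch assumes the starting node itself is visited in every mode.

```c
#include <stddef.h>

#define TOY_MAX_CHILDREN 4
#define TRACE_MAX 16

/* Toy userspace stand-in for a cgroup; not the kernel's struct cgroup. */
struct toy_cgroup {
    int id;
    struct toy_cgroup *parent;
    struct toy_cgroup *children[TOY_MAX_CHILDREN];
    size_t nr_children;
};

/* Visitor contract modeled on the iter program: return 1 to stop the walk. */
typedef int (*visit_fn)(const struct toy_cgroup *cg, void *arg);

/* Pre-order: visit a node, then its descendants. Returns 1 if stopped early. */
int walk_pre(const struct toy_cgroup *cg, visit_fn fn, void *arg)
{
    if (fn(cg, arg))
        return 1;
    for (size_t i = 0; i < cg->nr_children; i++)
        if (walk_pre(cg->children[i], fn, arg))
            return 1;
    return 0;
}

/* Post-order: visit the descendants first, then the node itself. */
int walk_post(const struct toy_cgroup *cg, visit_fn fn, void *arg)
{
    for (size_t i = 0; i < cg->nr_children; i++)
        if (walk_post(cg->children[i], fn, arg))
            return 1;
    return fn(cg, arg) != 0;
}

/* Ancestors: start at the given node and follow parents up to the root. */
int walk_up(const struct toy_cgroup *cg, visit_fn fn, void *arg)
{
    for (; cg; cg = cg->parent)
        if (fn(cg, arg))
            return 1;
    return 0;
}

/* Hypothetical visitor: records visited ids and stops after `limit` nodes. */
struct trace {
    int ids[TRACE_MAX];
    size_t len;
    size_t limit;
};

int record(const struct toy_cgroup *cg, void *arg)
{
    struct trace *t = arg;

    if (t->len < TRACE_MAX)
        t->ids[t->len] = cg->id;
    t->len++;
    return t->len >= t->limit;
}
```

Processing only the given cgroup corresponds to calling the visitor once on the starting node, so the sketch does not give it a separate function.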
      
      Note that because walking cgroup hierarchy holds cgroup_mutex, the iter
      program is called with cgroup_mutex held.
      
      Currently only one session is supported, which means that, depending on
      the volume of data the bpf program intends to send to user space, the
      number of cgroups that can be walked is limited. For example, given that
      the current buffer size is 8 * PAGE_SIZE, if the program sends 64B of
      data for each cgroup and PAGE_SIZE is 4KB, the total number of cgroups
      that can be walked is 512. This is a limitation of cgroup_iter. If the
      output data is larger than the kernel buffer size, then after all data
      in the kernel buffer is consumed by user space, the subsequent read()
      syscall will signal EOPNOTSUPP. To work around this, the user may have
      to update their program to reduce the volume of data sent to output,
      for example by skipping some uninteresting cgroups. In the future, we
      may extend bpf_iter flags to allow customizing the buffer size.
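
The 512-cgroup figure above is plain division of the buffer size by the per-cgroup record size. A minimal sketch of that arithmetic, assuming fixed-size records (`max_walkable_cgroups` is a hypothetical helper name, not a kernel function):

```c
#include <stddef.h>

/*
 * Upper bound on the number of cgroups a single session can report,
 * assuming every cgroup emits exactly `record_size` bytes and the
 * kernel buffer holds `pages` pages of `page_size` bytes each.
 */
size_t max_walkable_cgroups(size_t page_size, size_t pages, size_t record_size)
{
    if (record_size == 0)
        return 0; /* nothing is emitted, so the bound is not meaningful */
    return (page_size * pages) / record_size;
}
```

With the values from the commit message, `max_walkable_cgroups(4096, 8, 64)` gives 512; halving the record size to 32 bytes would double the bound to 1024.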
      Acked-by: Yonghong Song <yhs@fb.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Hao Luo <haoluo@google.com>
      Link: https://lore.kernel.org/r/20220824233117.1312810-2-haoluo@google.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Fix wrong size passed to bpf_setsockopt() · 7e165d19
      Yang Yingliang authored
      sizeof(new_cc) is not the real size of the memory that new_cc points
      to; introduce new_cc_len to store the size and pass it to
      bpf_setsockopt().
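
The underlying pitfall is that `sizeof` applied to a pointer yields the pointer's size, not the size of the object it points to. The sketch below is not the selftest itself: `cubic_cc`, `reno_cc`, `cc_choice`, and `pick_cc` are illustrative names. It shows the pattern the fix describes, which is to record the length alongside the pointer.

```c
#include <stddef.h>

/* Illustrative congestion-control names; the arrays know their own size. */
static const char cubic_cc[] = "cubic";
static const char reno_cc[] = "reno";

/* Keep the pointer and the real length together, as new_cc_len does. */
struct cc_choice {
    const char *new_cc;
    size_t new_cc_len;
};

struct cc_choice pick_cc(int use_cubic)
{
    struct cc_choice c;

    if (use_cubic) {
        c.new_cc = cubic_cc;
        c.new_cc_len = sizeof(cubic_cc); /* 6: five letters plus NUL */
    } else {
        c.new_cc = reno_cc;
        c.new_cc_len = sizeof(reno_cc);  /* 5: four letters plus NUL */
    }
    return c;
}

/* The buggy pattern: this is the size of a pointer, whatever it points to. */
size_t wrong_len(const char *new_cc)
{
    return sizeof(new_cc);
}
```

`wrong_len()` returns the same value for both names, which is why the size has to be captured where the array type is still visible.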
      
      Fixes: 31123c03 ("selftests/bpf: bpf_setsockopt tests")
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20220824013907.380448-1-yangyingliang@huawei.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add cb_refs test to s390x deny list · b03914f7
      Daniel Müller authored
      The cb_refs BPF selftest is failing execution on s390x machines. This is
      a newly added test that requires a feature not presently supported on
      this architecture.
      
      Denylist the test for this architecture.
      
      Fixes: 3cf7e7d8685c ("selftests/bpf: Add tests for reference state fixes for callbacks")
      Signed-off-by: Daniel Müller <deso@posteo.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20220824163906.1186832-1-deso@posteo.net
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'Fix reference state management for synchronous callbacks' · 09683080
      Alexei Starovoitov authored
      Kumar Kartikeya Dwivedi says:
      
      ====================
      
      This is patch 1, 2 + their individual tests split into a separate series from
      the RFC, so that these can be taken in, while we continue working towards a fix
      for handling stack access inside the callback.
      
      Changelog:
      ----------
      v1 -> v2:
      v1: https://lore.kernel.org/bpf/20220822131923.21476-1-memxor@gmail.com
      
        * Fix error for test_progs-no_alu32 due to distinct alloc_insn in errstr
      
      RFC v1 -> v1:
      RFC v1: https://lore.kernel.org/bpf/20220815051540.18791-1-memxor@gmail.com
      
        * Fix up commit log to add more explanation (Alexei)
        * Split reference state fix out into a separate series
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add tests for reference state fixes for callbacks · 35f14dbd
      Kumar Kartikeya Dwivedi authored
      These are regression tests to ensure we don't end up in invalid runtime
      state for helpers that execute callbacks multiple times. They exercise
      the fixes to verifier callback handling for reference state in previous
      patches.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20220823013226.24988-1-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Fix reference state management for synchronous callbacks · 9d9d00ac
      Kumar Kartikeya Dwivedi authored
      Currently, the verifier verifies callback functions (sync and async) as
      if they will be executed once (i.e. it explores execution state as if the
      function was being called once). The next insn to explore is set to
      start of subprog and the exit from nested frame is handled using
      curframe > 0 and prepare_func_exit. In case of async callback it uses a
      customized variant of push_stack simulating a kind of branch to set up
      custom state and execution context for the async callback.
      
      While this approach is simple and works when callback really will be
      executed only once, it is unsafe for all of our current helpers which
      are for_each style, i.e. they execute the callback multiple times.
      
      A callback releasing acquired references of the caller may do so
      multiple times, but currently the verifier sees it as one call inside
      the frame, which then returns to the caller. Hence, it thinks the
      callback released some reference that it got access to, e.g. through
      callback_ctx (a register filled inside the cb from a spilled typed
      register on the stack).
      
      Similarly, it may see that an acquire call is unpaired inside the
      callback, so the caller will copy the reference state of callback and
      then will have to release the register with new ref_obj_ids. But again,
      the callback may execute multiple times, but the verifier will only
      account for acquired references for a single symbolic execution of the
      callback, which will cause leaks.
      
      Note that for async callback case, things are different. While currently
      we have bpf_timer_set_callback which only executes it once, even for
      multiple executions it would be safe, as reference state is NULL and
      check_reference_leak would force program to release state before
      BPF_EXIT. The state is also unaffected by analysis for the caller frame.
      Hence async callback is safe.
      
      Since we want the reference state to be accessible, e.g. for pointers
      loaded from stack through callback_ctx's PTR_TO_STACK, we still have to
      copy caller's reference_state to callback's bpf_func_state, but we
      enforce that whatever references it adds to that reference_state have
      been released before it hits BPF_EXIT. This requires introducing a new
      callback_ref member in the reference state to distinguish between caller
      vs callee references. Hence, check_reference_leak now errors out if it
      sees we are in callback_fn and we have not released callback_ref refs.
      Since there can be multiple nested callbacks, like frame 0 -> cb1 -> cb2
      etc. we need to also distinguish between whether this particular ref
      belongs to this callback frame or parent, and only error for our own, so
      we store state->frameno (which is always non-zero for callbacks).
      
      In short, callbacks can read parent reference_state, but cannot mutate
      it, to be able to use pointers acquired by the caller. They must only
      undo their changes (by releasing their own acquired_refs before
      BPF_EXIT) on top of caller reference_state before returning (at which
      point the caller and callback state will match anyway, so no need to
      copy it back to caller).
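
The rule described above (a callback may see the caller's references, may only release references it acquired itself, and must release all of those before exit) can be modeled in a few lines of userspace C. This is a toy model under stated assumptions, not the verifier's code: the `toy_*` names and struct layout are hypothetical, and only the `callback_ref`/frame-number idea is taken from the commit message.

```c
#include <stddef.h>

#define TOY_MAX_REFS 8

/* One tracked reference; callback_ref is the acquiring frame (0 = main). */
struct toy_ref {
    int id;
    int callback_ref;
};

/* Reference state shared by the caller and the callbacks it invokes. */
struct toy_ref_state {
    struct toy_ref refs[TOY_MAX_REFS];
    size_t nr;
    int next_id;
};

/* Record a reference acquired in frame `frameno`; returns its id, or -1. */
int toy_acquire(struct toy_ref_state *st, int frameno)
{
    if (st->nr == TOY_MAX_REFS)
        return -1;
    st->refs[st->nr].id = ++st->next_id;
    st->refs[st->nr].callback_ref = frameno;
    st->nr++;
    return st->next_id;
}

/*
 * Release reference `id` from frame `frameno`. A callback frame
 * (frameno > 0) is refused when the reference belongs to another frame.
 */
int toy_release(struct toy_ref_state *st, int frameno, int id)
{
    for (size_t i = 0; i < st->nr; i++) {
        if (st->refs[i].id != id)
            continue;
        if (frameno > 0 && st->refs[i].callback_ref != frameno)
            return -1;
        st->refs[i] = st->refs[--st->nr]; /* swap-remove the entry */
        return 0;
    }
    return -1; /* unknown reference */
}

/*
 * Exit check for frame `frameno`: a callback only answers for its own
 * references, while the main frame must have released everything.
 */
int toy_check_leak(const struct toy_ref_state *st, int frameno)
{
    for (size_t i = 0; i < st->nr; i++)
        if (frameno == 0 || st->refs[i].callback_ref == frameno)
            return -1;
    return 0;
}
```

Because the callback leaves the shared state exactly as it found it, the result is the same no matter how many times the helper invokes the callback, which is the property the single symbolic execution needs.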
      
      Fixes: 69c087ba ("bpf: Add bpf_for_each_map_elem() helper")
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20220823013125.24938-1-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  2. 23 Aug, 2022 13 commits
  3. 19 Aug, 2022 18 commits
  4. 18 Aug, 2022 3 commits
    • net: ethernet: altera: Add use of ethtool_op_get_ts_info · fb8d784b
      Maxime Chevallier authored
      Add the ethtool_op_get_ts_info() callback to ethtool ops, so that we can
      at least use software timestamping.
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Link: https://lore.kernel.org/r/20220817095725.97444-1-maxime.chevallier@bootlin.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • stmmac: intel: remove unused 'has_crossts' flag · e34cfee6
      Wong Vee Khee authored
      The 'has_crossts' flag was not used anywhere in the stmmac driver,
      removing it from both header file and dwmac-intel driver.
      Signed-off-by: Wong Vee Khee <veekhee@apple.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Link: https://lore.kernel.org/r/20220817064324.10025-1-veekhee@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 3f5f728a
      Jakub Kicinski authored
      Andrii Nakryiko says:
      
      ====================
      bpf-next 2022-08-17
      
      We've added 45 non-merge commits during the last 14 day(s) which contain
      a total of 61 files changed, 986 insertions(+), 372 deletions(-).
      
      The main changes are:
      
      1) New bpf_ktime_get_tai_ns() BPF helper to access CLOCK_TAI, from Kurt
         Kanzenbach and Jesper Dangaard Brouer.
      
      2) Few clean ups and improvements for libbpf 1.0, from Andrii Nakryiko.
      
      3) Expose crash_kexec() as kfunc for BPF programs, from Artem Savkov.
      
      4) Add ability to define sleepable-only kfuncs, from Benjamin Tissoires.
      
      5) Teach libbpf's bpf_prog_load() and bpf_map_create() to gracefully handle
         unsupported names on old kernels, from Hangbin Liu.
      
      6) Allow opting out from auto-attaching BPF programs by libbpf's BPF skeleton,
         from Hao Luo.
      
      7) Relax libbpf's requirement for shared libs to be marked executable, from
         Hengqi Chen.
      
      8) Improve bpf_iter internals handling of error returns, from Hao Luo.
      
      9) Few accommodations in libbpf to support GCC-BPF quirks, from James Hilliard.
      
      10) Fix BPF verifier logic around tracking dynptr ref_obj_id, from Joanne Koong.
      
      11) bpftool improvements to handle full BPF program names better, from Manu
          Bretelle.
      
      12) bpftool fixes around libcap use, from Quentin Monnet.
      
      13) BPF map internals clean ups and improvements around memory allocations,
          from Yafang Shao.
      
      14) Allow to use cgroup_get_from_file() on cgroupv1, allowing BPF cgroup
          iterator to work on cgroupv1, from Yosry Ahmed.
      
      15) BPF verifier internal clean ups, from Dave Marchevsky and Joanne Koong.
      
      16) Various fixes and clean ups for selftests/bpf and vmtest.sh, from Daniel
          Xu, Artem Savkov, Joanne Koong, Andrii Nakryiko, Shibin Koikkara Reeny.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits)
        selftests/bpf: Few fixes for selftests/bpf built in release mode
        libbpf: Clean up deprecated and legacy aliases
        libbpf: Streamline bpf_attr and perf_event_attr initialization
        libbpf: Fix potential NULL dereference when parsing ELF
        selftests/bpf: Tests libbpf autoattach APIs
        libbpf: Allows disabling auto attach
        selftests/bpf: Fix attach point for non-x86 arches in test_progs/lsm
        libbpf: Making bpf_prog_load() ignore name if kernel doesn't support
        selftests/bpf: Update CI kconfig
        selftests/bpf: Add connmark read test
        selftests/bpf: Add existing connection bpf_*_ct_lookup() test
        bpftool: Clear errno after libcap's checks
        bpf: Clear up confusion in bpf_skb_adjust_room()'s documentation
        bpftool: Fix a typo in a comment
        libbpf: Add names for auxiliary maps
        bpf: Use bpf_map_area_alloc consistently on bpf map creation
        bpf: Make __GFP_NOWARN consistent in bpf map creation
        bpf: Use bpf_map_area_free instread of kvfree
        bpf: Remove unneeded memset in queue_stack_map creation
        libbpf: preserve errno across pr_warn/pr_info/pr_debug
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20220817215656.1180215-1-andrii@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>