15 Jan, 2021 (16 commits)
    • Merge branch 'perf: Add mmap2 build id support' · eed6a9a9
      Alexei Starovoitov authored
      Jiri Olsa says:
      
      ====================
      
      hi,
      adding support to have the build id stored in the mmap2 event,
      so we can bypass perf record's final hunt for build ids.
      
      This patchset allows perf to record the build ID in the mmap2 event,
      and adds perf tooling to store/download binaries to the .debug
      cache based on these build IDs.
      
      Note that the build id retrieval code is stolen from the bpf
      code, where it's been used (together with file offsets)
      to replace IPs in user space stack traces. It's now added
      under the lib directory.
      
      v7 changes:
        - included only missing kernel patches, cc-ed bpf@vger and
          rebased on bpf-next/master [Alexei]
      
      v6 changes:
        - last 4 patches rebased Arnaldo's perf/core
      
      v5 changes:
        - rebased on latest perf/core
        - several patches already pulled in
        - fixed trace+probe_vfs_getname.sh output redirection
        - fixed changelogs [Arnaldo]
        - renamed BUILD_ID_SIZE to BUILD_ID_SIZE_MAX [Song]
      
      v4 changes:
        - fixed typo in changelog [Namhyung]
        - removed force_download bool from struct dso_store_data,
          because it's not used [Namhyung]
      
      v3 changes:
        - added acks
        - removed forgotten debug code [Arnaldo]
        - fixed readlink termination [Ian]
        - fixed doc for --debuginfod=URLs [Ian]
        - adopted kernel's memchr_inv function and used
          it in build_id__is_defined function [Arnaldo]
      
      On recording server:
      
        - on the recording server we can run record with the --buildid-mmap
          option to store build ids in mmap2 events:
      
          # perf record --buildid-mmap
          ^C[ perf record: Woken up 2 times to write data ]
          [ perf record: Captured and wrote 0.836 MB perf.data ]
      
        - it stores nothing to ~/.debug cache:
      
          # find ~/.debug
          find: ‘/root/.debug’: No such file or directory
      
        - and still reports properly:
      
          # perf report --stdio
          ...
          99.82%  swapper          [kernel.kallsyms]  [k] native_safe_halt
           0.03%  swapper          [kernel.kallsyms]  [k] finish_task_switch
           0.02%  swapper          [kernel.kallsyms]  [k] __softirqentry_text_start
           0.01%  kcompactd0       [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
           0.01%  ksoftirqd/6      [kernel.kallsyms]  [k] slab_free_freelist_hook
           0.01%  kworker/17:1H-x  [kernel.kallsyms]  [k] slab_free_freelist_hook
      
        - display used/hit build ids:
      
          # perf buildid-list | head -5
          5dcec522abf136fcfd3128f47e131f2365834dd7 /proc/kcore
          589e403a34f55486bcac848a45e00bcdeedd1ca8 /usr/lib64/libcrypto.so.1.1.1g
          94569566d4eac7e9c87ba029d43d4e2158f9527e /usr/lib64/libpthread-2.30.so
          559b9702bebe31c6d132c8dc5cc887673d65d5b5 /usr/lib64/libc-2.30.so
          40da7abe89f631f60538a17686a7d65c6a02ed31 /usr/lib64/ld-2.30.so
      
        - store build id binaries into build id cache:
      
          # perf buildid-cache -a perf.data
          OK   5dcec522abf136fcfd3128f47e131f2365834dd7 /proc/kcore
          OK   589e403a34f55486bcac848a45e00bcdeedd1ca8 /usr/lib64/libcrypto.so.1.1.1g
          OK   94569566d4eac7e9c87ba029d43d4e2158f9527e /usr/lib64/libpthread-2.30.so
          OK   559b9702bebe31c6d132c8dc5cc887673d65d5b5 /usr/lib64/libc-2.30.so
          OK   40da7abe89f631f60538a17686a7d65c6a02ed31 /usr/lib64/ld-2.30.so
          OK   a674f7a47c78e35a088104647b9640710277b489 /usr/sbin/sshd
          OK   e5cb4ca25f46485bdbc691c3a92e7e111dac3ef2 /usr/bin/bash
          OK   9bc8589108223c944b452f0819298a0c3cba6215 /usr/bin/find
      
          # find ~/.debug | head -5
          /root/.debug
          /root/.debug/proc
          /root/.debug/proc/kcore
          /root/.debug/proc/kcore/5dcec522abf136fcfd3128f47e131f2365834dd7
          /root/.debug/proc/kcore/5dcec522abf136fcfd3128f47e131f2365834dd7/kallsyms
      
        - run the debuginfod daemon to provide binaries to another server
          (below); the initialization could take some time:
      
          # debuginfod -F /
      
      On another server:
      
        - copy perf.data from the 'record' server and run:
      
          $ find ~/.debug/
          find: ‘/home/jolsa/.debug/’: No such file or directory
      
          $ perf buildid-list | head -5
          No kallsyms or vmlinux with build-id 5dcec522abf136fcfd3128f47e131f2365834dd7 was found
          5dcec522abf136fcfd3128f47e131f2365834dd7 [kernel.kallsyms]
          5784f813b727a50cfd3363234aef9fcbab685cc4 /lib/modules/5.10.0-rc2speed+/kernel/fs/xfs/xfs.ko
          589e403a34f55486bcac848a45e00bcdeedd1ca8 /usr/lib64/libcrypto.so.1.1.1g
          94569566d4eac7e9c87ba029d43d4e2158f9527e /usr/lib64/libpthread-2.30.so
          559b9702bebe31c6d132c8dc5cc887673d65d5b5 /usr/lib64/libc-2.30.so
      
        - report does not resolve kernel symbols (the kernel build id
          does not match):
      
         $ perf report --stdio
         ...
          76.73%  swapper          [kernel.kallsyms]    [k] 0xffffffff81aa8ebe
           1.89%  find             [kernel.kallsyms]    [k] 0xffffffff810f2167
           0.93%  sshd             [kernel.kallsyms]    [k] 0xffffffff8153380c
           0.83%  swapper          [kernel.kallsyms]    [k] 0xffffffff81104b0b
           0.71%  kworker/u40:2-e  [kernel.kallsyms]    [k] 0xffffffff810f3850
           0.70%  kworker/u40:0-e  [kernel.kallsyms]    [k] 0xffffffff810f3850
           0.64%  find             [kernel.kallsyms]    [k] 0xffffffff81a9ba0a
           0.63%  find             [kernel.kallsyms]    [k] 0xffffffff81aa93b0
      
        - adding build ids does not work, because the binaries existing on
          this server have different build ids:
      
          $ perf buildid-cache -a perf.data
          No kallsyms or vmlinux with build-id 5dcec522abf136fcfd3128f47e131f2365834dd7 was found
          FAIL 5dcec522abf136fcfd3128f47e131f2365834dd7 [kernel.kallsyms]
          FAIL 5784f813b727a50cfd3363234aef9fcbab685cc4 /lib/modules/5.10.0-rc2speed+/kernel/fs/xfs/xfs.ko
          FAIL 589e403a34f55486bcac848a45e00bcdeedd1ca8 /usr/lib64/libcrypto.so.1.1.1g
          FAIL 94569566d4eac7e9c87ba029d43d4e2158f9527e /usr/lib64/libpthread-2.30.so
          FAIL 559b9702bebe31c6d132c8dc5cc887673d65d5b5 /usr/lib64/libc-2.30.so
          FAIL 40da7abe89f631f60538a17686a7d65c6a02ed31 /usr/lib64/ld-2.30.so
          FAIL a674f7a47c78e35a088104647b9640710277b489 /usr/sbin/sshd
          FAIL e5cb4ca25f46485bdbc691c3a92e7e111dac3ef2 /usr/bin/bash
          FAIL 9bc8589108223c944b452f0819298a0c3cba6215 /usr/bin/find
      
        - add build ids with a debuginfod setup pointing to the record server:
      
          $ perf buildid-cache -a perf.data --debuginfod http://192.168.122.174:8002
          No kallsyms or vmlinux with build-id 5dcec522abf136fcfd3128f47e131f2365834dd7 was found
          OK   5dcec522abf136fcfd3128f47e131f2365834dd7 [kernel.kallsyms]
          OK   5784f813b727a50cfd3363234aef9fcbab685cc4 /lib/modules/5.10.0-rc2speed+/kernel/fs/xfs/xfs.ko
          OK   589e403a34f55486bcac848a45e00bcdeedd1ca8 /usr/lib64/libcrypto.so.1.1.1g
          OK   94569566d4eac7e9c87ba029d43d4e2158f9527e /usr/lib64/libpthread-2.30.so
          OK   559b9702bebe31c6d132c8dc5cc887673d65d5b5 /usr/lib64/libc-2.30.so
          OK   40da7abe89f631f60538a17686a7d65c6a02ed31 /usr/lib64/ld-2.30.so
          OK   a674f7a47c78e35a088104647b9640710277b489 /usr/sbin/sshd
          OK   e5cb4ca25f46485bdbc691c3a92e7e111dac3ef2 /usr/bin/bash
          OK   9bc8589108223c944b452f0819298a0c3cba6215 /usr/bin/find
      
        - and report works:
      
          $ perf report --stdio
          ...
          76.73%  swapper          [kernel.kallsyms]    [k] native_safe_halt
           1.91%  find             [kernel.kallsyms]    [k] queue_work_on
           0.93%  sshd             [kernel.kallsyms]    [k] iowrite16
           0.83%  swapper          [kernel.kallsyms]    [k] finish_task_switch
           0.72%  kworker/u40:2-e  [kernel.kallsyms]    [k] process_one_work
           0.70%  kworker/u40:0-e  [kernel.kallsyms]    [k] process_one_work
           0.64%  find             [kernel.kallsyms]    [k] syscall_enter_from_user_mode
           0.63%  find             [kernel.kallsyms]    [k] _raw_spin_unlock_irqrestore
      
        - because we have the data in the build id cache:
      
          $ find ~/.debug | head -10
          .../.debug
          .../.debug/home
          .../.debug/home/jolsa
          .../.debug/home/jolsa/.cache
          .../.debug/home/jolsa/.cache/debuginfod_client
          .../.debug/home/jolsa/.cache/debuginfod_client/5dcec522abf136fcfd3128f47e131f2365834dd7
          .../.debug/home/jolsa/.cache/debuginfod_client/5dcec522abf136fcfd3128f47e131f2365834dd7/executable
          .../.debug/home/jolsa/.cache/debuginfod_client/5dcec522abf136fcfd3128f47e131f2365834dd7/executable/5dcec522abf136fcfd3128f47e131f2365834dd7
          .../.debug/home/jolsa/.cache/debuginfod_client/5dcec522abf136fcfd3128f47e131f2365834dd7/executable/5dcec522abf136fcfd3128f47e131f2365834dd7/elf
          .../.debug/home/jolsa/.cache/debuginfod_client/5dcec522abf136fcfd3128f47e131f2365834dd7/executable/5dcec522abf136fcfd3128f47e131f2365834dd7/debug
      
      Available also in:
        git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
        perf/build_id
      
      thanks,
      jirka
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • perf: Add build id data in mmap2 event · 88a16a13
      Jiri Olsa authored
      Adding support to carry build id data in the mmap2 event.
      
      The build id data replaces the maj/min/ino/ino_generation
      fields, which are also used to identify the map's binary,
      so it's ok to replace them with build id data:
      
        union {
                struct {
                        u32       maj;
                        u32       min;
                        u64       ino;
                        u64       ino_generation;
                };
                struct {
                        u8        build_id_size;
                        u8        __reserved_1;
                        u16       __reserved_2;
                        u8        build_id[20];
                };
        };
      
      The replaced maj/min/ino/ino_generation fields give us 24 bytes.
      We use 20 bytes for the build id data, 1 byte for its size, and
      the rest is unused.
      
      There's a new misc bit for mmap2 to signal there's build
      id data in it:
      
        #define PERF_RECORD_MISC_MMAP_BUILD_ID   (1 << 14)
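      A consumer of the event checks that misc bit to decide how to
      interpret the union; a minimal sketch (the event struct layout is
      abbreviated and the two use_* consumers are illustrative, not the
      full perf UAPI):

        if (ev->header.misc & PERF_RECORD_MISC_MMAP_BUILD_ID) {
                /* the union carries build id data */
                use_build_id(ev->build_id, ev->build_id_size);
        } else {
                /* the union carries maj/min/ino/ino_generation */
                use_dev_ino(ev->maj, ev->min, ev->ino, ev->ino_generation);
        }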
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/bpf/20210114134044.1418404-4-jolsa@kernel.org
    • bpf: Add size arg to build_id_parse function · 921f88fc
      Jiri Olsa authored
      It's possible to have build id types other than the default SHA1.
      Currently there's also ld support for MD5 build ids.

      Adding a size argument to the build_id_parse function that returns
      (if requested) the size of the parsed build id, so we can recognize
      the build id type.
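      The resulting declaration, with a caller sketch that tells the type
      apart by size (SHA1 build ids are 20 bytes, MD5 are 16; the pr_debug
      consumer is illustrative):

        int build_id_parse(struct vm_area_struct *vma,
                           unsigned char *build_id, __u32 *size);

        __u32 size;
        unsigned char id[BUILD_ID_SIZE_MAX];

        if (!build_id_parse(vma, id, &size))
                pr_debug("%s build id\n", size == 20 ? "SHA1" : "MD5");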
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210114134044.1418404-3-jolsa@kernel.org
    • bpf: Move stack_map_get_build_id into lib · bd7525da
      Jiri Olsa authored
      Moving stack_map_get_build_id into lib, with a declaration
      in the linux/buildid.h header:
      
        int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id);
      
      This function returns the build id for a given struct
      vm_area_struct. There is no functional change to the
      stack_map_get_build_id logic.
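      A minimal caller sketch (stash_build_id is illustrative;
      BUILD_ID_SIZE_MAX is the 20-byte SHA1 maximum from the same header):

        #include <linux/buildid.h>

        unsigned char build_id[BUILD_ID_SIZE_MAX];

        /* fill build_id from the file mapped by vma, if it has one */
        if (!build_id_parse(vma, build_id))
                stash_build_id(map, build_id);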
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20210114134044.1418404-2-jolsa@kernel.org
    • Merge branch 'Atomics for eBPF' · 7064a734
      Alexei Starovoitov authored
      Brendan Jackman says:
      
      ====================
      
      There's still one unresolved review comment from John[3] which I
      will resolve with a followup patch.
      
      Differences from v6->v7 [1]:
      
      * Fixed riscv build error detected by 0-day robot.
      
      Differences from v5->v6 [1]:
      
      * Carried Björn Töpel's ack for the RISC-V code, plus a couple more
        acks from Yonghong.
      
      * Doc fixups.
      
      * Trivial cleanups.
      
      Differences from v4->v5 [1]:
      
      * Fixed bogus type casts in interpreter that led to warnings from
        the 0day robot.
      
      * Dropped feature-detection for Clang per Andrii's suggestion in [4].
        The selftests will now fail to build unless you have llvm-project
        commit 286daafd6512. The ENABLE_ATOMICS_TEST macro is still needed
        to support the no_alu32 tests.
      
      * Carried some Acks from John and Yonghong.
      
      * Dropped confusing usage of __atomic_exchange from prog_test in
        favour of __sync_lock_test_and_set.
      
      * [Really] got rid of all the forest of instruction macros
        (BPF_ATOMIC_FETCH_ADD and friends); now there's just BPF_ATOMIC_OP
        to define all the instructions as we use them in the verifier
        tests. This makes the atomic ops less special in that API, and I
        don't think the resulting usage is actually any harder to read.
      
      Differences from v3->v4 [1]:
      
      * Added one Ack from Yonghong. He acked some other patches but those
        have now changed non-trivially so I didn't add those acks.
      
      * Fixups to commit messages.
      
      * Fixed disassembly and comments: first arg to atomic_fetch_* is a
        pointer.
      
      * Improved prog_test efficiency. BPF progs are now all loaded in a
        single call, then the skeleton is re-used for each subtest.
      
      * Dropped use of tools/build/feature in favour of a one-liner in the
        Makefile.
      
      * Dropped the commit that created an emit_neg helper in the x86
        JIT. It's not used any more (it wasn't used in v3 either).
      
      * Combined all the different filter.h macros (used to be
        BPF_ATOMIC_ADD, BPF_ATOMIC_FETCH_ADD, BPF_ATOMIC_AND, etc) into
        just BPF_ATOMIC32 and BPF_ATOMIC64.
      
      * Removed some references to BPF_STX_XADD from tools/, samples/ and
        lib/ that I missed before.
      
      Differences from v2->v3 [1]:
      
      * More minor fixes and naming/comment changes
      
      * Dropped atomic subtract: compilers can implement this by preceding
        an atomic add with a NEG instruction (which is what the x86 JIT did
        under the hood anyway).
      
      * Dropped the use of -mcpu=v4 in the Clang BPF command-line; there is
        no longer an architecture version bump. Instead a feature test is
        added to Kbuild - it builds a source file to check if Clang
        supports BPF atomics.
      
      * Fixed the prog_test so it no longer breaks
        test_progs-no_alu32. This requires some ifdef acrobatics to avoid
        complicating the prog_tests model where the same userspace code
        exercises both the normal and no_alu32 BPF test objects, using the
        same skeleton header.
      
      Differences from v1->v2 [1]:
      
      * Fixed mistakes in the netronome driver
      
      * Added sub, add, or, xor operations
      
      * The above led to some refactors to keep things readable. (Maybe I
        should have just waited until I'd implemented these before starting
        the review...)
      
      * Replaced BPF_[CMP]SET | BPF_FETCH with just BPF_[CMP]XCHG, which
        include the BPF_FETCH flag
      
      * Added a bit of documentation. Suggestions welcome for more places
        to dump this info...
      
      The prog_test that's added depends on Clang/LLVM features added by
      Yonghong in commit 286daafd6512 (was
      https://reviews.llvm.org/D72184).
      
      This only includes a JIT implementation for x86_64 - I don't plan to
      implement JIT support myself for other architectures.
      
      Operations
      ==========
      
      This patchset adds atomic operations to the eBPF instruction set. The
      use-case that motivated this work was a trivial and efficient way to
      generate globally-unique cookies in BPF progs, but I think it's
      obvious that these features are pretty widely applicable.  The
      instructions that are added here can be summarised with this list of
      kernel operations:
      
      * atomic[64]_[fetch_]add
      * atomic[64]_[fetch_]and
      * atomic[64]_[fetch_]or
      * atomic[64]_xchg
      * atomic[64]_cmpxchg
      
      The following are left out of scope for this effort:
      
      * 16 and 8 bit operations
      * Explicit memory barriers
      
      Encoding
      ========
      
      I originally planned to add new values for bpf_insn.opcode. This was
      rather unpleasant: the opcode space has holes in it but no entire
      instruction classes[2]. Yonghong Song had a better idea: use the
      immediate field of the existing STX XADD instruction to encode the
      operation. This works nicely, without breaking existing programs,
      because the immediate field is currently reserved-must-be-zero, and
      extra-nicely because BPF_ADD happens to be zero.
      
      Note that this of course makes immediate-source atomic operations
      impossible. It's hard to imagine a measurable speedup from such
      instructions, and if it existed it would certainly not benefit x86,
      which has no support for them.
      
      The BPF_OP opcode fields are re-used in the immediate, and an
      additional flag BPF_FETCH is used to mark instructions that should
      fetch a pre-modification value from memory.
      
      So, BPF_XADD is now called BPF_ATOMIC (the old name is kept to avoid
      breaking userspace builds), and where we previously had .imm = 0, we
      now have .imm = BPF_ADD (which is 0).
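      The BPF_ATOMIC_OP macro mentioned above builds any of these
      instructions by putting the operation into the immediate; a sketch
      of its shape (cf. the selftests' insn macros):

        #define BPF_ATOMIC_OP(SIZE, OP, DST, SRC, OFF)                    \
                ((struct bpf_insn) {                                      \
                        .code    = BPF_STX | BPF_SIZE(SIZE) | BPF_ATOMIC, \
                        .dst_reg = DST,                                   \
                        .src_reg = SRC,                                   \
                        .off     = OFF,                                   \
                        .imm     = OP })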
      
      Operands
      ========
      
      Reg-source eBPF instructions only have two operands, while these
      atomic operations have up to four. To avoid needing to encode
      additional operands:
      
      - One of the input registers is re-used as an output register
        (e.g. atomic_fetch_add both reads from and writes to the source
        register).
      
      - Where necessary (i.e. for cmpxchg), R0 is "hard-coded" as one of
        the operands.
      
      This approach also allows the new eBPF instructions to map directly
      to single x86 instructions.
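      Putting the two rules together, the semantics in C-style pseudocode
      (64-bit variants shown):

        /* imm = BPF_ADD:              *(u64 *)(dst + off) += src                        */
        /* imm = BPF_ADD | BPF_FETCH:  src = atomic_fetch_add((u64 *)(dst + off), src)   */
        /* imm = BPF_XCHG:             src = atomic_xchg((u64 *)(dst + off), src)        */
        /* imm = BPF_CMPXCHG:          r0  = atomic_cmpxchg((u64 *)(dst + off), r0, src) */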
      
      [1] Previous iterations:
          v1: https://lore.kernel.org/bpf/20201123173202.1335708-1-jackmanb@google.com/
          v2: https://lore.kernel.org/bpf/20201127175738.1085417-1-jackmanb@google.com/
          v3: https://lore.kernel.org/bpf/X8kN7NA7bJC7aLQI@google.com/
          v4: https://lore.kernel.org/bpf/20201207160734.2345502-1-jackmanb@google.com/
          v5: https://lore.kernel.org/bpf/20201215121816.1048557-1-jackmanb@google.com/
          v6: https://lore.kernel.org/bpf/20210112154235.2192781-1-jackmanb@google.com/
      
      [2] Visualisation of eBPF opcode space:
          https://gist.github.com/bjackman/00fdad2d5dfff601c1918bc29b16e778
      
      [3] Comment from John about propagating bounds in verifier:
          https://lore.kernel.org/bpf/5fcf0fbcc8aa8_9ab320853@john-XPS-13-9370.notmuch/
      
      [4] Mail from Andrii about not supporting old Clang in selftests:
          https://lore.kernel.org/bpf/CAEf4BzYBddPaEzRUs=jaWSo5kbf=LZdb7geAUVj85GxLQztuAQ@mail.gmail.com/
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Add tests for new BPF atomic operations · 98d666d0
      Brendan Jackman authored
      The prog_test that's added depends on Clang/LLVM features added by
      Yonghong in commit 286daafd6512 (was https://reviews.llvm.org/D72184).
      
      Note the use of a define called ENABLE_ATOMICS_TESTS: this is used
      to:
      
       - Avoid breaking the build for people on old versions of Clang
       - Avoid needing separate lists of test objects for no_alu32, where
         atomics are not supported even if Clang has the feature.
      
      The atomics_test.o BPF object is built unconditionally both for
      test_progs and test_progs-no_alu32. For test_progs, if Clang supports
      atomics, ENABLE_ATOMICS_TESTS is defined, so it includes the proper
      test code. Otherwise, progs and global vars are defined anyway, as
      stubs; this means that the skeleton user code still builds.
      
      The atomics_test.o userspace object is built once and used for both
      test_progs and test_progs-no_alu32. A variable called skip_tests is
      defined in the BPF object's data section, which tells the userspace
      object whether to skip the atomics test.
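      A condensed sketch of the pattern (variable and prog names are
      illustrative, modeled on the description above):

        #include <stdbool.h>
        #include <linux/bpf.h>
        #include <bpf/bpf_helpers.h>
        #include <bpf/bpf_tracing.h>

        #ifdef ENABLE_ATOMICS_TESTS
        bool skip_tests = false;

        __u64 add64_value = 1;

        SEC("fentry/bpf_fentry_test1")
        int BPF_PROG(add, int a)
        {
                __sync_fetch_and_add(&add64_value, 2);
                return 0;
        }
        #else
        bool skip_tests = true;         /* userspace reads this and skips */

        SEC("fentry/bpf_fentry_test1")
        int BPF_PROG(add, int a)        /* stub so the skeleton still builds */
        {
                return 0;
        }
        #endif

        char _license[] SEC("license") = "GPL";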
      Signed-off-by: Brendan Jackman <jackmanb@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210114181751.768687-11-jackmanb@google.com
    • bpf: Add bitwise atomic instructions · 981f94c3
      Brendan Jackman authored
      This adds instructions for
      
      atomic[64]_[fetch_]and
      atomic[64]_[fetch_]or
      atomic[64]_[fetch_]xor
      
      All these operations are isomorphic enough to implement with the same
      verifier, interpreter, and x86 JIT code, hence a single commit.
      
      The main interesting thing here is that x86 doesn't directly support
      the fetch_ version of these operations, so we need to generate a
      CMPXCHG loop in the JIT. This requires two temporary registers;
      IIUC it's safe to use BPF_REG_AX and x86's AUX_REG for this purpose.
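      Conceptually, the generated loop does the equivalent of this C sketch
      for a 64-bit fetch_or ('ptr' stands for dst_reg + off, 'src' for the
      source register):

        u64 old, new;

        do {
                old = *ptr;                      /* load current value   */
                new = old | src;                 /* apply the bitwise op */
        } while (cmpxchg(ptr, old, new) != old); /* retry if we raced    */

        src = old;      /* BPF_FETCH: old value lands in the src register */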
      Signed-off-by: Brendan Jackman <jackmanb@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210114181751.768687-10-jackmanb@google.com
    • bpf: Pull out a macro for interpreting atomic ALU operations · 46291067
      Brendan Jackman authored
      Since the atomic operations that are added in subsequent commits are
      all isomorphic with BPF_ADD, pull out a macro to avoid the
      interpreter becoming dominated by lines of atomic-related code.
      
      Note that this sacrifices interpreter performance (combining
      STX_ATOMIC_W and STX_ATOMIC_DW into a single switch case means that
      we need an extra conditional branch to differentiate them) in favour
      of compact and (relatively!) simple C code.
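      The non-fetch half of the macro is shaped roughly like this (a sketch
      after kernel/bpf/core.c; the in-tree version also covers the
      BPF_FETCH variants):

        #define ATOMIC_ALU_OP(BOP, KOP)                                     \
                case BOP:                                                   \
                        if (BPF_SIZE(insn->code) == BPF_W)                  \
                                atomic_##KOP((u32) SRC, (atomic_t *)        \
                                        (unsigned long) (DST + insn->off)); \
                        else                                                \
                                atomic64_##KOP((u64) SRC, (atomic64_t *)    \
                                        (unsigned long) (DST + insn->off)); \
                        break;

        /* instantiated as ATOMIC_ALU_OP(BPF_ADD, add),
         * ATOMIC_ALU_OP(BPF_AND, and), ATOMIC_ALU_OP(BPF_OR, or), ...
         */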
      Signed-off-by: Brendan Jackman <jackmanb@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210114181751.768687-9-jackmanb@google.com
    • bpf: Add instructions for atomic_[cmp]xchg · 5ffa2550
      Brendan Jackman authored
      This adds two atomic opcodes, both of which include the BPF_FETCH
      flag. XCHG without the BPF_FETCH flag would naturally encode
      atomic_set. This is not supported because it would be of limited
      value to userspace (it doesn't imply any barriers). CMPXCHG without
      BPF_FETCH would be an atomic compare-and-write. We don't have such
      an operation in the kernel so it isn't provided to BPF either.
      
      There are two significant design decisions made for the CMPXCHG
      instruction:
      
       - This operation fundamentally has 3 operands, but we only have two
         register fields, so the operand we compare against (the kernel's
         API calls it 'old') is hard-coded to be R0. x86 has a similar
         design (and A64 doesn't have this problem).
      
         A potential alternative might be to encode the other operand's
         register number in the immediate field.
      
       - The kernel's atomic_cmpxchg returns the old value, while the C11
         userspace APIs return a boolean indicating the comparison
         result. Which should BPF do? A64 returns the old value. x86 returns
         the old value in the hard-coded register (and also sets a
         flag). That means return-old-value is easier to JIT, so that's
         what we use; see the sketch below.
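      From BPF C code, Clang's __sync_val_compare_and_swap builtin compiles
      to BPF_CMPXCHG and follows this convention (sketch):

        __u64 val = 3;
        __u64 old = __sync_val_compare_and_swap(&val, 3, 4);
        /* old == 3 (the pre-swap value), val == 4 */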
      Signed-off-by: Brendan Jackman <jackmanb@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210114181751.768687-8-jackmanb@google.com
    • bpf: Add BPF_FETCH field / create atomic_fetch_add instruction · 5ca419f2
      Brendan Jackman authored
      The BPF_FETCH field can be set in bpf_insn.imm, for BPF_ATOMIC
      instructions, in order to have the previous value of the
      atomically-modified memory location loaded into the src register
      after an atomic op is carried out.
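      In insn-builder terms, using the BPF_ATOMIC_OP shape from the cover
      letter above (sketch; registers chosen arbitrarily):

        /* *(u64 *)(dst + off) += src; src is left unchanged */
        BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_1, BPF_REG_2, 0);

        /* src = atomic_fetch_add((u64 *)(dst + off), src)   */
        BPF_ATOMIC_OP(BPF_DW, BPF_ADD | BPF_FETCH, BPF_REG_1, BPF_REG_2, 0);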
      Suggested-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Brendan Jackman <jackmanb@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20210114181751.768687-7-jackmanb@google.com
    • bpf: Move BPF_STX reserved field check into BPF_STX verifier code · c5bcb5eb
      Brendan Jackman authored
      I can't find a reason why this code is in resolve_pseudo_ldimm64;
      since I'll be modifying it in a subsequent commit, tidy it up.
      Signed-off-by: Brendan Jackman <jackmanb@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20210114181751.768687-6-jackmanb@google.com
    • bpf: Rename BPF_XADD and prepare to encode other atomics in .imm · 91c960b0
      Brendan Jackman authored
      A subsequent patch will add additional atomic operations. These new
      operations will use the same opcode field as the existing XADD, with
      the immediate discriminating different operations.
      
      In preparation, rename the instruction mode to BPF_ATOMIC and start
      calling the zero immediate BPF_ADD.
      
      This is possible (doesn't break existing valid BPF progs) because the
      immediate field is currently reserved MBZ and BPF_ADD is zero.
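      Concretely, nothing changes at the byte level:

        /* before */ .code = BPF_STX | BPF_DW | BPF_XADD,   .imm = 0,
        /* after  */ .code = BPF_STX | BPF_DW | BPF_ATOMIC, .imm = BPF_ADD,

        /* BPF_XADD and BPF_ATOMIC are both 0xc0, and BPF_ADD is 0x00,
         * so both lines describe the identical instruction.
         */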
      
      All uses are removed from the tree but the BPF_XADD definition is
      kept around to avoid breaking builds for people including kernel
      headers.
      Signed-off-by: Brendan Jackman <jackmanb@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Björn Töpel <bjorn.topel@gmail.com>
      Link: https://lore.kernel.org/bpf/20210114181751.768687-5-jackmanb@google.com
    • bpf: x86: Factor out a lookup table for some ALU opcodes · e5f02cac
      Brendan Jackman authored
      A later commit will need to look up a subset of these opcodes. To
      avoid duplicating code, pull out a table.
      
      The shift opcodes won't be needed by that later commit, but they're
      already duplicated, so fold them into the table anyway.
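      The table is roughly of this shape (sketch; the arithmetic entries
      are the standard x86 "op r/m, reg" opcode bytes, the shift entries
      are the ModR/M bases used with the 0xC1/0xD3 shift opcodes):

        static const u8 simple_alu_opcodes[] = {
                [BPF_ADD]  = 0x01,
                [BPF_SUB]  = 0x29,
                [BPF_AND]  = 0x21,
                [BPF_OR]   = 0x09,
                [BPF_XOR]  = 0x31,
                [BPF_LSH]  = 0xE0,
                [BPF_RSH]  = 0xE8,
                [BPF_ARSH] = 0xF8,
        };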
      Signed-off-by: Brendan Jackman <jackmanb@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20210114181751.768687-4-jackmanb@google.com
    • bpf: x86: Factor out emission of REX byte · 74007cfc
      Brendan Jackman authored
      The JIT case for encoding atomic ops is about to get more
      complicated. In order to make the review & resulting code easier,
      let's factor out some shared helpers.
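      The decision the REX helper encapsulates is small; a conceptual
      sketch (helper and emit names are illustrative, the in-tree code
      uses the JIT's EMIT macros):

        /* Emit a REX prefix only when the insn needs one. */
        static void maybe_emit_rex(u8 **pprog, u32 dst_reg, u32 src_reg, bool is64)
        {
                if (is64)                       /* 64-bit operand: REX.W */
                        emit_byte(pprog, add_2mod(0x48, dst_reg, src_reg));
                else if (is_ereg(dst_reg) || is_ereg(src_reg))
                        emit_byte(pprog, add_2mod(0x40, dst_reg, src_reg));
        }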
      Signed-off-by: Brendan Jackman <jackmanb@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20210114181751.768687-3-jackmanb@google.com
    • bpf: x86: Factor out emission of ModR/M for *(reg + off) · 11c11d07
      Brendan Jackman authored
      The case for JITing atomics is about to get more complicated. Let's
      factor out some common code to make the review and result more
      readable.
      
      NB the atomics code doesn't yet use the new helper - a subsequent
      patch will add its use as a side-effect of other changes.
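      Conceptually, the factored-out code picks the displacement size for
      the *(reg + off) addressing form; a sketch (emit helper names are
      illustrative):

        static void emit_modrm_disp(u8 **pprog, u32 ptr_reg, u32 val_reg, int off)
        {
                if (is_imm8(off))       /* mod=01: 1-byte signed displacement */
                        emit_disp8(pprog, add_2reg(0x40, ptr_reg, val_reg), off);
                else                    /* mod=10: 4-byte signed displacement */
                        emit_disp32(pprog, add_2reg(0x80, ptr_reg, val_reg), off);
        }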
      Signed-off-by: Brendan Jackman <jackmanb@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20210114181751.768687-2-jackmanb@google.com