1. 27 Apr, 2021 11 commits
    • seq_file: Add a seq_bprintf function · 76d6a133
      Florent Revest authored
      Similarly to seq_buf_bprintf in lib/seq_buf.c, this function writes a
      printf formatted string with arguments provided in a "binary
      representation" built by functions such as vbin_printf.
      Signed-off-by: Florent Revest <revest@chromium.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210427174313.860948-2-revest@chromium.org
    • bpf, docs: Fix literal block for example code · 2551c2d1
      Hengqi Chen authored
      Add a missing colon so that the code block that follows is rendered
      properly.
      Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210424021208.832116-1-hengqi.chen@gmail.com
    • bpf, cpumap: Bulk skb using netif_receive_skb_list · bb024780
      Lorenzo Bianconi authored
      Rely on the netif_receive_skb_list routine to send skbs converted from
      xdp_frames in cpu_map_kthread_run in order to improve i-cache usage.
      The proposed patch has been tested by running the xdp_redirect_cpu bpf
      sample available in the kernel tree, which is used to redirect UDP frames
      from the ixgbe driver to a cpumap entry and then to the networking stack.
      UDP frames are generated using pktgen. Packets are discarded by the UDP
      layer.
      
      $ xdp_redirect_cpu  --cpu <cpu> --progname xdp_cpu_map0 --dev <eth>
      
      bpf-next: ~2.35Mpps
      bpf-next + cpumap skb-list: ~2.72Mpps
      
      Rename the drops counter to kmem_alloc_drops, since it now reports only
      kmem_cache_alloc_bulk failures.
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Link: https://lore.kernel.org/bpf/c729f83e5d7482d9329e0f165bdbe5adcefd1510.1619169700.git.lorenzo@kernel.org
    • bpf: Fix propagation of 32 bit unsigned bounds from 64 bit bounds · 10bf4e83
      Daniel Borkmann authored
      Similarly to b0270958 ("bpf: Fix propagation of 32-bit signed bounds
      from 64-bit bounds."), we also need to fix the propagation of 32 bit
      unsigned bounds from 64 bit counterparts. That is, really only set the
      u32_{min,max}_value when /both/ {umin,umax}_value safely fit in 32 bit
      space. For example, the register with a umin_value == 1 does /not/ imply
      that u32_min_value is also equal to 1, since umax_value could be much
      larger than a 32 bit subregister can hold, and thus u32_min_value is in
      the interval [0,1] instead.
      
      Before fix, invalid tracking result of R2_w=inv1:
      
        [...]
        5: R0_w=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0) R10=fp0
        5: (35) if r2 >= 0x1 goto pc+1
        [...] // goto path
        7: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umin_value=1) R10=fp0
        7: (b6) if w2 <= 0x1 goto pc+1
        [...] // goto path
        9: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,smin_value=-9223372036854775807,smax_value=9223372032559808513,umin_value=1,umax_value=18446744069414584321,var_off=(0x1; 0xffffffff00000000),s32_min_value=1,s32_max_value=1,u32_max_value=1) R10=fp0
        9: (bc) w2 = w2
        10: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv1 R10=fp0
        [...]
      
      After fix, correct tracking result of R2_w=inv(id=0,umax_value=1,var_off=(0x0; 0x1)):
      
        [...]
        5: R0_w=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0) R10=fp0
        5: (35) if r2 >= 0x1 goto pc+1
        [...] // goto path
        7: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umin_value=1) R10=fp0
        7: (b6) if w2 <= 0x1 goto pc+1
        [...] // goto path
        9: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,smax_value=9223372032559808513,umax_value=18446744069414584321,var_off=(0x0; 0xffffffff00000001),s32_min_value=0,s32_max_value=1,u32_max_value=1) R10=fp0
        9: (bc) w2 = w2
        10: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0,umax_value=1,var_off=(0x0; 0x1)) R10=fp0
        [...]
      
      Thus, same issue as in b0270958 holds for unsigned subregister tracking.
      Also, align __reg64_bound_u32() similarly to __reg64_bound_s32() as done in
      b0270958 to make them uniform again.
      
      Fixes: 3f50f132 ("bpf: Verifier, do explicit ALU32 bounds tracking")
      Reported-by: Manfred Paul (@_manfp)
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
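
      The rule stated in this commit message (set the 32 bit unsigned bounds
      only when both 64 bit bounds fit in 32 bits) can be sketched in plain
      user-space C. This is a simplified illustration, not the verifier's
      code: struct bounds, fits_in_u32() and derive_u32_bounds() are
      hypothetical names introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the verifier's register state. */
struct bounds {
	uint64_t umin, umax;       /* 64 bit unsigned bounds */
	uint32_t u32_min, u32_max; /* bounds of the low 32 bit subregister */
};

static bool fits_in_u32(uint64_t v)
{
	return v <= UINT32_MAX;
}

/* Derive subregister bounds from the 64 bit bounds. */
static void derive_u32_bounds(struct bounds *b)
{
	assert(b->umin <= b->umax);

	/* Start from "unknown": the low 32 bits may hold any value. */
	b->u32_min = 0;
	b->u32_max = UINT32_MAX;

	/*
	 * Narrow only when BOTH ends fit in 32 bits. If umax does not fit,
	 * the range contains values whose low 32 bits are smaller than umin.
	 */
	if (fits_in_u32(b->umin) && fits_in_u32(b->umax)) {
		b->u32_min = (uint32_t)b->umin;
		b->u32_max = (uint32_t)b->umax;
	}
}
```

      With umin == 1 and umax == 18446744069414584321 (the values from the
      trace above), the sketch leaves the 32 bit bounds unknown instead of
      copying umin into u32_min.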
    • bpf: Lock bpf_trace_printk's tmp buf before it is written to · 38d26d89
      Florent Revest authored
      bpf_trace_printk uses a shared static buffer to hold strings before they
      are printed. A recent refactoring mistakenly moved the locking of that
      buffer to after it gets filled.
      
      Fixes: d9c9e4db ("bpf: Factorize bpf_trace_printk and bpf_seq_printf")
      Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: Florent Revest <revest@chromium.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210427112958.773132-1-revest@chromium.org
    • Merge branch 'CO-RE relocation selftests fixes' · 92731504
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      
      Lorenz Bauer noticed that core_reloc selftest has two inverted CHECK()
      conditions, allowing failing tests to pass unnoticed. Fixing that exposed a
      few long-standing failures (field existence and direct memory bitfields) and
      one recent failure (BTF_KIND_FLOAT relos).
      
      This patch set fixes core_reloc selftest to capture such failures reliably in
      the future. It also fixes all the newly failing tests. See individual patches
      for details.
      
      This patch set also completes a set of ASSERT_xxx() macros, so there should
      now be very little reason to use the verbose and error-prone generic CHECK()
      macro.
      
      v1->v2:
        - updated bpf_core_fields_are_compat() comment to mention FLOAT (Lorenz).
      
      Cc: Lorenz Bauer <lmb@cloudflare.com>
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Fix core_reloc test runner · bede0ebf
      Andrii Nakryiko authored
      Fix the failed-test checks in the core_reloc test runner, which allowed
      failing tests to pass quietly. Also add an extra check to make sure that
      expected-to-fail test cases with invalid names are still caught as test
      failures, as this is not an expected failure mode. Also fix mislabeled probed
      vs direct bitfield test cases.
      
      Fixes: 124a892d ("selftests/bpf: Test TYPE_EXISTS and TYPE_SIZE CO-RE relocations")
      Reported-by: Lorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Lorenz Bauer <lmb@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20210426192949.416837-6-andrii@kernel.org
    • selftests/bpf: Fix field existence CO-RE reloc tests · 5a30eb23
      Andrii Nakryiko authored
      Negative field existence cases have a broken assumption that FIELD_EXISTS
      CO-RE relo will fail for fields that match the name but have incompatible type
      signature. That's not how CO-RE relocations generally behave. Types and fields
      that match by name but not by expected type are treated as non-matching
      candidates and are skipped. An error is reported later if no matching
      candidate was found. That's what happens for most relocations, but existence
      relocations (FIELD_EXISTS and TYPE_EXISTS) are more permissive: they are
      designed to return 0 or 1, depending on whether a match is found. This makes
      it easy to handle name-conflicting but incompatible types in BPF code.
      Combined with ___flavor suffixes, it's possible to handle pretty much any
      structural type change in the kernel within compile-once BPF source code.
      
      So, long story short, negative field existence test cases are invalid in their
      assumptions, so this patch reworks them into a single consolidated positive
      case that doesn't match any of the fields.
      
      Fixes: c7566a69 ("selftests/bpf: Add field existence CO-RE relocs tests")
      Reported-by: Lorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Lorenz Bauer <lmb@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20210426192949.416837-5-andrii@kernel.org
    • selftests/bpf: Fix BPF_CORE_READ_BITFIELD() macro · 0f20615d
      Andrii Nakryiko authored
      Fix BPF_CORE_READ_BITFIELD() macro used for reading CO-RE-relocatable
      bitfields. Missing breaks in a switch caused reads to always be 8 bytes. This
      can confuse libbpf because it does strict checks that memory load size
      corresponds to the original size of the field, which in this case quite often
      would be wrong.

      After fixing that, we run into another problem, which is quite subtle, so it
      is worth documenting here. The issue is in Clang optimization and CO-RE
      relocation interactions. Without that asm volatile construct (also known as
      barrier_var()), Clang will re-order BYTE_OFFSET and BYTE_SIZE relocations and
      will apply BYTE_OFFSET 4 times, once for each switch case arm. This will
      result in the same error from libbpf about mismatch of memory load size and
      original field size. I.e., if we were reading u32, we'd still have *(u8 *),
      *(u16 *), *(u32 *), and *(u64 *) memory loads, three of which will fail. Using
      barrier_var() forces Clang to apply BYTE_OFFSET relocation first (and once) to
      calculate p, after which value of p is used without relocation in each of
      switch case arms, doing appropriately-sized memory load.
      
      Here's the list of relevant relocations and pieces of generated BPF code
      before and after this patch for test_core_reloc_bitfields_direct selftests.
      
      BEFORE
      =====
       #45: core_reloc: insn #160 --> [5] + 0:5: byte_sz --> struct core_reloc_bitfields.u32
       #46: core_reloc: insn #167 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32
       #47: core_reloc: insn #174 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32
       #48: core_reloc: insn #178 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32
       #49: core_reloc: insn #182 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32
      
           157:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll
           159:       7b 12 20 01 00 00 00 00 *(u64 *)(r2 + 288) = r1
           160:       b7 02 00 00 04 00 00 00 r2 = 4
      ; BYTE_SIZE relocation here                 ^^^
           161:       66 02 07 00 03 00 00 00 if w2 s> 3 goto +7 <LBB0_63>
           162:       16 02 0d 00 01 00 00 00 if w2 == 1 goto +13 <LBB0_65>
           163:       16 02 01 00 02 00 00 00 if w2 == 2 goto +1 <LBB0_66>
           164:       05 00 12 00 00 00 00 00 goto +18 <LBB0_69>
      
      0000000000000528 <LBB0_66>:
           165:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
           167:       69 11 08 00 00 00 00 00 r1 = *(u16 *)(r1 + 8)
      ; BYTE_OFFSET relo here w/ WRONG size        ^^^^^^^^^^^^^^^^
           168:       05 00 0e 00 00 00 00 00 goto +14 <LBB0_69>
      
      0000000000000548 <LBB0_63>:
           169:       16 02 0a 00 04 00 00 00 if w2 == 4 goto +10 <LBB0_67>
           170:       16 02 01 00 08 00 00 00 if w2 == 8 goto +1 <LBB0_68>
           171:       05 00 0b 00 00 00 00 00 goto +11 <LBB0_69>
      
      0000000000000560 <LBB0_68>:
           172:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
           174:       79 11 08 00 00 00 00 00 r1 = *(u64 *)(r1 + 8)
      ; BYTE_OFFSET relo here w/ WRONG size        ^^^^^^^^^^^^^^^^
           175:       05 00 07 00 00 00 00 00 goto +7 <LBB0_69>
      
      0000000000000580 <LBB0_65>:
           176:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
           178:       71 11 08 00 00 00 00 00 r1 = *(u8 *)(r1 + 8)
      ; BYTE_OFFSET relo here w/ WRONG size        ^^^^^^^^^^^^^^^^
           179:       05 00 03 00 00 00 00 00 goto +3 <LBB0_69>
      
      00000000000005a0 <LBB0_67>:
           180:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
           182:       61 11 08 00 00 00 00 00 r1 = *(u32 *)(r1 + 8)
      ; BYTE_OFFSET relo here w/ RIGHT size        ^^^^^^^^^^^^^^^^
      
      00000000000005b8 <LBB0_69>:
           183:       67 01 00 00 20 00 00 00 r1 <<= 32
           184:       b7 02 00 00 00 00 00 00 r2 = 0
           185:       16 02 02 00 00 00 00 00 if w2 == 0 goto +2 <LBB0_71>
           186:       c7 01 00 00 20 00 00 00 r1 s>>= 32
           187:       05 00 01 00 00 00 00 00 goto +1 <LBB0_72>
      
      00000000000005e0 <LBB0_71>:
           188:       77 01 00 00 20 00 00 00 r1 >>= 32
      
      AFTER
      =====
      
       #30: core_reloc: insn #132 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32
       #31: core_reloc: insn #134 --> [5] + 0:5: byte_sz --> struct core_reloc_bitfields.u32
      
           129:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll
           131:       7b 12 20 01 00 00 00 00 *(u64 *)(r2 + 288) = r1
           132:       b7 01 00 00 08 00 00 00 r1 = 8
      ; BYTE_OFFSET relo here                     ^^^
      ; no size check for non-memory dereferencing instructions
           133:       0f 12 00 00 00 00 00 00 r2 += r1
           134:       b7 03 00 00 04 00 00 00 r3 = 4
      ; BYTE_SIZE relocation here                 ^^^
           135:       66 03 05 00 03 00 00 00 if w3 s> 3 goto +5 <LBB0_63>
           136:       16 03 09 00 01 00 00 00 if w3 == 1 goto +9 <LBB0_65>
           137:       16 03 01 00 02 00 00 00 if w3 == 2 goto +1 <LBB0_66>
           138:       05 00 0a 00 00 00 00 00 goto +10 <LBB0_69>
      
      0000000000000458 <LBB0_66>:
           139:       69 21 00 00 00 00 00 00 r1 = *(u16 *)(r2 + 0)
      ; NO CO-RE relocation here                   ^^^^^^^^^^^^^^^^
           140:       05 00 08 00 00 00 00 00 goto +8 <LBB0_69>
      
      0000000000000468 <LBB0_63>:
           141:       16 03 06 00 04 00 00 00 if w3 == 4 goto +6 <LBB0_67>
           142:       16 03 01 00 08 00 00 00 if w3 == 8 goto +1 <LBB0_68>
           143:       05 00 05 00 00 00 00 00 goto +5 <LBB0_69>
      
      0000000000000480 <LBB0_68>:
           144:       79 21 00 00 00 00 00 00 r1 = *(u64 *)(r2 + 0)
      ; NO CO-RE relocation here                   ^^^^^^^^^^^^^^^^
           145:       05 00 03 00 00 00 00 00 goto +3 <LBB0_69>
      
      0000000000000490 <LBB0_65>:
           146:       71 21 00 00 00 00 00 00 r1 = *(u8 *)(r2 + 0)
      ; NO CO-RE relocation here                   ^^^^^^^^^^^^^^^^
           147:       05 00 01 00 00 00 00 00 goto +1 <LBB0_69>
      
      00000000000004a0 <LBB0_67>:
           148:       61 21 00 00 00 00 00 00 r1 = *(u32 *)(r2 + 0)
      ; NO CO-RE relocation here                   ^^^^^^^^^^^^^^^^
      
      00000000000004a8 <LBB0_69>:
           149:       67 01 00 00 20 00 00 00 r1 <<= 32
           150:       b7 02 00 00 00 00 00 00 r2 = 0
           151:       16 02 02 00 00 00 00 00 if w2 == 0 goto +2 <LBB0_71>
           152:       c7 01 00 00 20 00 00 00 r1 s>>= 32
           153:       05 00 01 00 00 00 00 00 goto +1 <LBB0_72>
      
      00000000000004d0 <LBB0_71>:
           154:       77 01 00 00 20 00 00 00 r1 >>= 32
      
      Fixes: ee26dade ("libbpf: Add support for relocatable bitfields")
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Lorenz Bauer <lmb@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20210426192949.416837-4-andrii@kernel.org
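
      The first half of this fix (one break per switch arm, so that exactly
      one load of the right width is done) can be illustrated in user-space
      C. This is only a sketch of the pattern: load_sized() is a hypothetical
      helper, and it leaves out the CO-RE relocations and barrier_var() that
      the real macro depends on.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: load an unsigned value of byte_sz bytes from p.
 * memcpy keeps the sketch safe for unaligned pointers in user space.
 */
static uint64_t load_sized(const void *p, unsigned int byte_sz)
{
	uint64_t val = 0;
	uint8_t v8;
	uint16_t v16;
	uint32_t v32;

	assert(p != NULL);

	switch (byte_sz) {
	case 1:
		memcpy(&v8, p, sizeof(v8));
		val = v8;
		break; /* without this, control falls into the wider loads */
	case 2:
		memcpy(&v16, p, sizeof(v16));
		val = v16;
		break;
	case 4:
		memcpy(&v32, p, sizeof(v32));
		val = v32;
		break;
	case 8:
		memcpy(&val, p, sizeof(val));
		break;
	}
	return val; /* unsupported sizes yield 0 in this sketch */
}
```

      If the break statements are dropped, every supported size ends in the
      8-byte load, which is the always-8-bytes behavior described above.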
    • libbpf: Support BTF_KIND_FLOAT during type compatibility checks in CO-RE · 6709a914
      Andrii Nakryiko authored
      Add BTF_KIND_FLOAT support when doing CO-RE field type compatibility check.
      Without this, relocations against float/double fields will fail.
      
      Also adjust one error message to emit the instruction index instead of the
      less convenient instruction byte offset.
      
      Fixes: 22541a9e ("libbpf: Add BTF_KIND_FLOAT support")
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Lorenz Bauer <lmb@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20210426192949.416837-3-andrii@kernel.org
    • selftests/bpf: Add remaining ASSERT_xxx() variants · 7a2fa70a
      Andrii Nakryiko authored
      Add ASSERT_TRUE/ASSERT_FALSE for conditions evaluated to true/false with
      custom logic. Also add remaining arithmetic assertions:
        - ASSERT_LE -- less than or equal;
        - ASSERT_GT -- greater than;
        - ASSERT_GE -- greater than or equal.
      This should cover most scenarios where people fall back to error-prone
      CHECK()s.
      
      Also extend ASSERT_ERR() to print out errno, in addition to direct error.
      
      Also convert a few CHECK() instances to ensure new ASSERT_xxx() variants work
      as expected. A subsequent patch will also use ASSERT_TRUE/ASSERT_FALSE more
      extensively.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Lorenz Bauer <lmb@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20210426192949.416837-2-andrii@kernel.org
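
      To show the general shape of such comparison assertions, here is a
      minimal user-space sketch. It is not the selftests' implementation:
      the SKETCH_ASSERT_* macros and sketch_report() are hypothetical names,
      and the output format is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical reporting helper: print a message on failure, return outcome. */
static bool sketch_report(bool ok, const char *name, const char *op,
			  long long actual, long long expected)
{
	assert(name != NULL && op != NULL);
	if (!ok)
		fprintf(stderr, "%s: unexpected value: actual %lld, wanted %s %lld\n",
			name, actual, op, expected);
	return ok;
}

/* Shared body. Note: this sketch evaluates each argument twice. */
#define SKETCH_ASSERT_CMP(actual, op, expected, name)			\
	sketch_report((long long)(actual) op (long long)(expected),	\
		      (name), #op, (long long)(actual), (long long)(expected))

#define SKETCH_ASSERT_LE(actual, expected, name) \
	SKETCH_ASSERT_CMP(actual, <=, expected, name)
#define SKETCH_ASSERT_GT(actual, expected, name) \
	SKETCH_ASSERT_CMP(actual, >, expected, name)
#define SKETCH_ASSERT_GE(actual, expected, name) \
	SKETCH_ASSERT_CMP(actual, >=, expected, name)

/* Boolean variants for conditions computed with custom logic. */
#define SKETCH_ASSERT_TRUE(cond, name) \
	sketch_report(!!(cond), (name), "==", !!(cond), 1)
#define SKETCH_ASSERT_FALSE(cond, name) \
	sketch_report(!(cond), (name), "==", !!(cond), 0)
```

      Each macro returns whether the check passed, so a caller can write
      something like if (!SKETCH_ASSERT_GE(n, 0, "n")) return; and still get
      a message naming the failed check.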
  2. 26 Apr, 2021 27 commits
  3. 24 Apr, 2021 2 commits
    • net/mlx5: SF, Extend SF table for additional SF id range · f1b9acd3
      Parav Pandit authored
      Extend the SF table to cover the additional SF id range of the external
      controller.

      A user optionally provides the external controller number when creating
      an SF on the external controller.
      
      An example on eswitch system:
      $ devlink dev eswitch set pci/0033:01:00.0 mode switchdev
      
      $ devlink port show
      pci/0033:01:00.0/196607: type eth netdev enP51p1s0f0np0 flavour physical port 0 splittable false
      pci/0033:01:00.0/131072: type eth netdev eth0 flavour pcipf controller 1 pfnum 0 external true splittable false
        function:
          hw_addr 00:00:00:00:00:00
      
      $ devlink port add pci/0033:01:00.0 flavour pcisf pfnum 0 sfnum 77 controller 1
      pci/0033:01:00.0/163840: type eth netdev eth1 flavour pcisf controller 1 pfnum 0 sfnum 77 external true splittable false
        function:
          hw_addr 00:00:00:00:00:00 state inactive opstate detached
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: SF, Split mlx5_sf_hw_table into two parts · a3088f87
      Parav Pandit authored
      The device has SF ids in two different contiguous ranges: one for the local
      controller and a second for the external controller's PF.

      Each such range has its own maximum number of functions and base id.
      To allocate an SF from either range, prepare the code by splitting the
      range-specific fields into their own structure.
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Vu Pham <vuhuong@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>