1. 29 Mar, 2020 7 commits
  2. 28 Mar, 2020 15 commits
  3. 26 Mar, 2020 6 commits
    • bpf: Remove unused variable 'bpf_xdp_link_lops' · f54a5bba
      YueHaibing authored
      kernel/bpf/syscall.c:2263:34: warning: 'bpf_xdp_link_lops' defined but not used [-Wunused-const-variable=]
       static const struct bpf_link_ops bpf_xdp_link_lops;
                                        ^~~~~~~~~~~~~~~~~
      
      Commit 70ed506c ("bpf: Introduce pinnable bpf_link abstraction")
      introduced this unused variable; remove it.
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200326031613.19372-1-yuehaibing@huawei.com
    • bpf: Factor out attach_type to prog_type mapping for attach/detach · e28784e3
      Andrii Nakryiko authored
      Factor out logic mapping expected program attach type to program type and
      subsequent handling of program attach/detach. Also list out all supported
      cgroup BPF program types explicitly to prevent accidental bugs once more
      program types are added to a mapping. Do the same for prog_query API.
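A minimal sketch of the idea, assuming hypothetical toy enums (the `toy_` names are illustrative and are not the kernel's identifiers): one function owns the attach type to program type mapping, and the attach path lists the program types it handles explicitly, so anything else falls through to an error.

```c
/* Hypothetical, trimmed-down stand-ins for the kernel's enums. */
enum toy_prog_type {
	TOY_PROG_TYPE_UNSPEC,
	TOY_PROG_TYPE_CGROUP_SKB,
	TOY_PROG_TYPE_CGROUP_SOCK,
	TOY_PROG_TYPE_SK_MSG,
};

enum toy_attach_type {
	TOY_CGROUP_INET_INGRESS,
	TOY_CGROUP_INET_EGRESS,
	TOY_CGROUP_INET_SOCK_CREATE,
	TOY_SK_MSG_VERDICT,
	TOY_ATTACH_TYPE_MAX,
};

/* Single place that maps an attach type to the program type it expects. */
static enum toy_prog_type toy_attach_type_to_prog_type(enum toy_attach_type t)
{
	switch (t) {
	case TOY_CGROUP_INET_INGRESS:
	case TOY_CGROUP_INET_EGRESS:
		return TOY_PROG_TYPE_CGROUP_SKB;
	case TOY_CGROUP_INET_SOCK_CREATE:
		return TOY_PROG_TYPE_CGROUP_SOCK;
	case TOY_SK_MSG_VERDICT:
		return TOY_PROG_TYPE_SK_MSG;
	default:
		return TOY_PROG_TYPE_UNSPEC;
	}
}

/* Attach path: handled program types are listed explicitly, so a type that
 * is later added to the mapping is rejected here until it is opted in.
 */
static int toy_prog_attach(enum toy_attach_type t)
{
	switch (toy_attach_type_to_prog_type(t)) {
	case TOY_PROG_TYPE_CGROUP_SKB:
	case TOY_PROG_TYPE_CGROUP_SOCK:
		return 0;	/* would call the cgroup attach helper */
	case TOY_PROG_TYPE_SK_MSG:
		return 1;	/* would call the sockmap attach helper */
	default:
		return -1;	/* stands in for -EINVAL */
	}
}
```

The detach and query paths can reuse the same mapping function, which is the point of factoring it out.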
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200325065746.640559-3-andriin@fb.com
    • bpf: Factor out cgroup storages operations · 00c4eddf
      Andrii Nakryiko authored
      Refactor cgroup attach/detach code to abstract away common operations
      performed on all types of cgroup storages. This makes the high-level logic
      more apparent and allows more code to be reused across multiple functions.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200325065746.640559-2-andriin@fb.com
    • bpf: Test_verifier, #70 error message updates for 32-bit right shift · aa131ed4
      John Fastabend authored
      After the changes that call update_reg_bounds after ALU ops and add ALU32
      bounds tracking, the error message changes in the 32-bit right shift
      tests.
      
      Test "#70/u bounds check after 32-bit right shift with 64-bit input FAIL"
      now fails with,
      
      Unexpected error message!
      	EXP: R0 invalid mem access
      	RES: func#0 @0
      
      7: (b7) r1 = 2
      8: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP2 R10=fp0 fp-8_w=mmmmmmmm
      8: (67) r1 <<= 31
      9: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP4294967296 R10=fp0 fp-8_w=mmmmmmmm
      9: (74) w1 >>= 31
      10: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP0 R10=fp0 fp-8_w=mmmmmmmm
      10: (14) w1 -= 2
      11: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP4294967294 R10=fp0 fp-8_w=mmmmmmmm
      11: (0f) r0 += r1
      math between map_value pointer and 4294967294 is not allowed
      
      And test "#70/p bounds check after 32-bit right shift with 64-bit input
      FAIL" now fails with,
      
      Unexpected error message!
      	EXP: R0 invalid mem access
      	RES: func#0 @0
      
      7: (b7) r1 = 2
      8: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv2 R10=fp0 fp-8_w=mmmmmmmm
      8: (67) r1 <<= 31
      9: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv4294967296 R10=fp0 fp-8_w=mmmmmmmm
      9: (74) w1 >>= 31
      10: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv0 R10=fp0 fp-8_w=mmmmmmmm
      10: (14) w1 -= 2
      11: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv4294967294 R10=fp0 fp-8_w=mmmmmmmm
      11: (0f) r0 += r1
      last_idx 11 first_idx 0
      regs=2 stack=0 before 10: (14) w1 -= 2
      regs=2 stack=0 before 9: (74) w1 >>= 31
      regs=2 stack=0 before 8: (67) r1 <<= 31
      regs=2 stack=0 before 7: (b7) r1 = 2
      math between map_value pointer and 4294967294 is not allowed
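
The constant 4294967294 in both traces can be reproduced with plain integer arithmetic. The sketch below replays instructions 7 to 10 on a 64-bit variable, assuming the usual BPF semantics that `wN` operations act on the low 32 bits and zero-extend the result; the function name is hypothetical.

```c
#include <stdint.h>

/* Replays instructions 7-10 of the test on plain integers.
 * rN ops act on all 64 bits; wN ops act on the low 32 bits and
 * zero-extend the result into the upper half.
 */
static uint64_t replay_shift_test(void)
{
	uint64_t r1;

	r1 = 2;					/* 7: r1 = 2 */
	r1 <<= 31;				/* 8: r1 <<= 31 gives 4294967296 */
	r1 = (uint32_t)r1 >> 31;		/* 9: w1 >>= 31 gives 0 */
	r1 = (uint32_t)((uint32_t)r1 - 2);	/* 10: w1 -= 2 wraps to 0xfffffffe */
	return r1;
}
```

The 32-bit subtraction wraps below zero, which is why R1 ends up as 0xfffffffe rather than a small negative offset.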
      
      Before this series we did not trip the "math between map_value pointer..."
      error because check_reg_sane_offset is never called in
      adjust_ptr_min_max_vals(). Instead we have a register state that looks
      like this at line 11*,
      
      11: R0_w=map_value(id=0,off=0,ks=8,vs=8,
                         smin_value=0,smax_value=0,
                         umin_value=0,umax_value=0,
                         var_off=(0x0; 0x0))
          R1_w=invP(id=0,
                    smin_value=0,smax_value=4294967295,
                    umin_value=0,umax_value=4294967295,
                    var_off=(0xfffffffe; 0x0))
          R10=fp(id=0,off=0,
                 smin_value=0,smax_value=0,
                 umin_value=0,umax_value=0,
                 var_off=(0x0; 0x0)) fp-8_w=mmmmmmmm
      11: (0f) r0 += r1
      
      In R1 'smin_val != smax_val', yet we have a tnum_const, as seen
      by 'var_off=(0xfffffffe; 0x0)' with a 0x0 mask. So we hit this check
      in adjust_ptr_min_max_vals()
      
       if ((known && (smin_val != smax_val || umin_val != umax_val)) ||
            smin_val > smax_val || umin_val > umax_val) {
             /* Taint dst register if offset had invalid bounds derived from
              * e.g. dead branches.
              */
             __mark_reg_unknown(env, dst_reg);
             return 0;
       }
      
      So we don't throw an error here and instead only throw an error
      later in the verification when the memory access is made.
      
      The root cause in the verifier without alu32 bounds tracking is having
      'umin_value = 0' and 'umax_value = U64_MAX' from BPF_SUB, which we set
      when 'umin_value < umax_val' here,
      
       if (dst_reg->umin_value < umax_val) {
          /* Overflow possible, we know nothing */
          dst_reg->umin_value = 0;
          dst_reg->umax_value = U64_MAX;
       } else { ...}
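
As a rough, self-contained illustration of that branch, the sketch below applies the same unsigned rule to a toy bounds struct. The `toy_` names are hypothetical, and the else branch is an assumption about what the elided `{ ...}` does (subtract the largest source value from the minimum and the smallest from the maximum).

```c
#include <stdint.h>

/* Toy unsigned range; not the kernel's struct bpf_reg_state. */
struct toy_ubounds {
	uint64_t umin;
	uint64_t umax;
};

static void toy_sub_ubounds(struct toy_ubounds *dst, struct toy_ubounds src)
{
	if (dst->umin < src.umax) {
		/* Overflow possible, we know nothing */
		dst->umin = 0;
		dst->umax = UINT64_MAX;
	} else {
		/* Cannot wrap: the smallest result subtracts the largest src. */
		dst->umin -= src.umax;
		dst->umax -= src.umin;
	}
}
```

Subtracting the constant 2 from the constant 0, as in the test, takes the first branch and leaves the range [0, U64_MAX].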
      
      Later in adjust_scalar_min_max_vals we previously did a
      coerce_reg_to_size() which will clamp the U64_MAX to U32_MAX by
      truncating to 32 bits. But either way, without a call to update_reg_bounds
      the less precise bounds tracking will fall out of the alu op
      verification.
      
      After the latest changes we now exit adjust_scalar_min_max_vals with the
      more precise umin value, due to zero extension propagating bounds from
      alu32 bounds into alu64 bounds and then calling update_reg_bounds.
      This then causes the verifier to trigger an earlier error and we get
      the error in the output above.
      
      This patch updates tests to reflect new error message.
      
      * I have a local patch to print the entire verifier state regardless of
       whether we believe it is a constant, so we can get a full picture of the
       state. Usually if tnum_is_const() then bounds are also smin=smax, etc., but
       this is not always true and is a bit subtle. Being able to see these
       states helps in understanding dataflow imo. Let me know if we want
       something similar upstream.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/158507161475.15666.3061518385241144063.stgit@john-Precision-5820-Tower
    • bpf: Verifier, adjust_scalar_min_max_vals to always call update_reg_bounds() · 294f2fc6
      John Fastabend authored
      Currently, for all op verification we call __reg_deduce_bounds() and
      __reg_bound_offset() but we only call __update_reg_bounds() in bitwise
      ops. However, we could benefit from calling __update_reg_bounds() in
      BPF_ADD, BPF_SUB, and BPF_MUL cases as well.
      
      For example, a register with state 'R1_w=invP0' when we subtract from
      it,
      
       w1 -= 2
      
      Before coerce we will now have an smin_value=S64_MIN, smax_value=U64_MAX
      and unsigned bounds umin_value=0, umax_value=U64_MAX. These will then
      be clamped to S32_MIN, U32_MAX values by coerce in the case of alu32 op
      as done in above example. However tnum will be a constant because the
      ALU op is done on a constant.
      
      Without update_reg_bounds() we have a scenario where tnum is a const
      but our unsigned bounds do not reflect this. By calling update_reg_bounds
      after coerce to 32bit we further refine the umin_value to U64_MAX in the
      alu64 case or U32_MAX in the alu32 case above.
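
A minimal sketch of how a known-bits value can tighten unsigned bounds, assuming a toy register with only the unsigned range and a (value, mask) tnum. The `toy_` names are hypothetical and this covers only the unsigned half of what a bounds update would do.

```c
#include <stdint.h>

/* Toy tristate number: bits set in mask are unknown, the rest equal value. */
struct toy_tnum {
	uint64_t value;
	uint64_t mask;
};

struct toy_reg {
	uint64_t umin;
	uint64_t umax;
	struct toy_tnum var_off;
};

/* Intersect the unsigned range with the range implied by the tnum. */
static void toy_update_ubounds(struct toy_reg *reg)
{
	/* Smallest possible value: every unknown bit is 0. */
	uint64_t tmin = reg->var_off.value;
	/* Largest possible value: every unknown bit is 1. */
	uint64_t tmax = reg->var_off.value | reg->var_off.mask;

	if (reg->umin < tmin)
		reg->umin = tmin;
	if (reg->umax > tmax)
		reg->umax = tmax;
}
```

With umin=0, umax=4294967295 and a constant tnum of 0xfffffffe with a 0x0 mask, both bounds collapse to 0xfffffffe, so the range and the tnum agree again.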
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/158507151689.15666.566796274289413203.stgit@john-Precision-5820-Tower
    • bpf: Verifier, refactor adjust_scalar_min_max_vals · 07cd2631
      John Fastabend authored
      Pull per-op ALU logic into individual functions. We are about to add
      u32 versions of each of these; by pulling them out, the code gets a bit
      more readable here and nicer in the next patch.
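
The shape of such a refactor might look like the sketch below: one small helper per operation and a dispatcher that only selects among them. All `toy_` names are hypothetical, only unsigned bounds are tracked, and the overflow rules are simplified assumptions for illustration.

```c
#include <stdint.h>

struct toy_range {
	uint64_t umin;
	uint64_t umax;
};

enum toy_op { TOY_OP_ADD, TOY_OP_MUL, TOY_OP_OTHER };

static void toy_mark_unknown(struct toy_range *r)
{
	r->umin = 0;
	r->umax = UINT64_MAX;
}

/* ADD: give up if either bound can wrap around. */
static void toy_min_max_add(struct toy_range *dst, const struct toy_range *src)
{
	if (dst->umin + src->umin < src->umin ||
	    dst->umax + src->umax < src->umax) {
		toy_mark_unknown(dst);
	} else {
		dst->umin += src->umin;
		dst->umax += src->umax;
	}
}

/* MUL: only track the product when both operands fit in 32 bits. */
static void toy_min_max_mul(struct toy_range *dst, const struct toy_range *src)
{
	if (dst->umax > UINT32_MAX || src->umax > UINT32_MAX) {
		toy_mark_unknown(dst);
	} else {
		dst->umin *= src->umin;
		dst->umax *= src->umax;
	}
}

/* The dispatcher stays short: it only picks the per-op helper. */
static void toy_adjust_min_max(enum toy_op op, struct toy_range *dst,
			       const struct toy_range *src)
{
	switch (op) {
	case TOY_OP_ADD:
		toy_min_max_add(dst, src);
		break;
	case TOY_OP_MUL:
		toy_min_max_mul(dst, src);
		break;
	default:
		toy_mark_unknown(dst);
		break;
	}
}
```

With this layout, a 32-bit variant of each helper can be added next to its 64-bit counterpart without growing the dispatcher much.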
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/158507149518.15666.15672349629329072411.stgit@john-Precision-5820-Tower
  4. 25 Mar, 2020 3 commits
  5. 23 Mar, 2020 4 commits
  6. 20 Mar, 2020 5 commits
    • selftests/bpf: Fix mix of tabs and spaces · 1440e792
      Bill Wendling authored
      Clang's -Wmisleading-indentation warns about misleading indentations if
      there's a mixture of spaces and tabs. Remove extraneous spaces.
      Signed-off-by: Bill Wendling <morbo@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200320201510.217169-1-morbo@google.com
    • bpf, tcp: Make tcp_bpf_recvmsg static · c0fd336e
      YueHaibing authored
      After commit f747632b ("bpf: sockmap: Move generic sockmap
      hooks from BPF TCP"), tcp_bpf_recvmsg() is not used outside of
      tcp_bpf.c, so make it static and remove it from tcp.h. Also move
      it into the BPF_STREAM_PARSER #ifdef to fix unused function warnings.
      
      Fixes: f747632b ("bpf: sockmap: Move generic sockmap hooks from BPF TCP")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200320023426.60684-3-yuehaibing@huawei.com
    • bpf, tcp: Fix unused function warnings · a2652798
      YueHaibing authored
      If BPF_STREAM_PARSER is not set, gcc warns:
      
        net/ipv4/tcp_bpf.c:483:12: warning: 'tcp_bpf_sendpage' defined but not used [-Wunused-function]
        net/ipv4/tcp_bpf.c:395:12: warning: 'tcp_bpf_sendmsg' defined but not used [-Wunused-function]
        net/ipv4/tcp_bpf.c:13:13: warning: 'tcp_bpf_stream_read' defined but not used [-Wunused-function]
      
      Move the unused functions into the #ifdef CONFIG_BPF_STREAM_PARSER block.
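
The pattern can be sketched in isolation as follows. `HYPOTHETICAL_STREAM_PARSER` stands in for the real config option and the `toy_` functions are illustrative only: a static helper whose only caller is inside the guard is moved inside the guard too, so nothing is left defined but unused when the option is off.

```c
/* HYPOTHETICAL_STREAM_PARSER stands in for CONFIG_BPF_STREAM_PARSER;
 * build with -DHYPOTHETICAL_STREAM_PARSER to enable the guarded code.
 */
#ifdef HYPOTHETICAL_STREAM_PARSER
/* Only the guarded caller uses this helper, so it lives inside the guard. */
static int toy_sendmsg(int len)
{
	return len;
}

int toy_proto_send(int len)
{
	return toy_sendmsg(len);
}
#else
/* Without the option there is no static helper left to go unused. */
int toy_proto_send(int len)
{
	(void)len;
	return -1;
}
#endif
```

Had `toy_sendmsg()` been left outside the guard, a build without the option would define it with no caller, which is the situation -Wunused-function reports.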
      
      Fixes: f747632b ("bpf: sockmap: Move generic sockmap hooks from BPF TCP")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
      Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200320023426.60684-2-yuehaibing@huawei.com
    • bpftool: Add struct_ops support · 65c93628
      Martin KaFai Lau authored
      This patch adds struct_ops support to bpftool.

      To recap the recent bpf_struct_ops feature on the kernel side:
      at a high level, bpf_struct_ops is a struct_ops map populated
      with a number of bpf progs.  It currently supports implementing
      "struct tcp_congestion_ops" in bpf.  However, the bpf_struct_ops
      design is generic enough that other kernel struct ops can be
      supported in the future.
      
      Although struct_ops is map+progs at a high level, there are differences
      in the details.  For example,
      1) After registering a struct_ops, the struct_ops is held by the kernel
         subsystem (e.g. tcp-cc).  Thus, there is no need to pin a
         struct_ops map or its progs in order to keep them around.
      2) To iterate all struct_ops in a system, it iterates all maps
         of type BPF_MAP_TYPE_STRUCT_OPS.  BPF_MAP_TYPE_STRUCT_OPS is
         currently the usual filter.  In the future, it may need to
         filter by other struct_ops specific properties, e.g. by
         tcp_congestion_ops or other kernel subsystem ops.
      3) struct_ops requires the running kernel to have BTF info.  That allows
         more flexibility in handling other kernel structs, e.g. it can
         always dump the latest bpf_map_info.
      4) Also, "struct_ops" command is not intended to repeat all features
         already provided by "map" or "prog".  For example, if there really
         is a need to pin the struct_ops map, the user can use the "map" cmd
         to do that.
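
Point 2 can be pictured with a small sketch that walks a list of map descriptions and counts those passing a filter. Everything named `toy_` is hypothetical and this is not bpftool's code; the value 26 is taken from the "type" field of the sample dump in this message. Passing the filter as a function pointer leaves room for the future struct_ops specific filters mentioned above.

```c
#include <stddef.h>

/* Value of the "type" field shown for the struct_ops map in the sample dump. */
#define TOY_MAP_TYPE_STRUCT_OPS 26

/* Hypothetical, trimmed-down map description. */
struct toy_map_info {
	unsigned int id;
	unsigned int type;
	const char *name;
};

typedef int (*toy_filter_fn)(const struct toy_map_info *info);

/* The current filter: match on map type only. */
static int toy_is_struct_ops(const struct toy_map_info *info)
{
	return info->type == TOY_MAP_TYPE_STRUCT_OPS;
}

/* Walk all maps and count the ones the filter accepts. */
static size_t toy_count_matching(const struct toy_map_info *maps, size_t n,
				 toy_filter_fn filter)
{
	size_t i, found = 0;

	for (i = 0; i < n; i++)
		if (filter(&maps[i]))
			found++;
	return found;
}
```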
      
      While the first attempt was to reuse parts of map/prog.c, there turned
      out to be little to share.  The only obvious item is map_parse_fds(), but
      that still requires modifications to accommodate struct_ops map specific
      filtering (for immediate and future needs).  Together with the
      differences mentioned earlier, it is better to keep this separate from
      map/prog.c.
      
      The initial set of subcmds is: register, unregister, show, and dump.
      
      For register, it registers all struct_ops maps that can be found in an
      obj file.  An option can be added in the future to specify a particular
      struct_ops map.  Also, the common bpf_tcp_cc is stateless (e.g.
      bpf_cubic.c and bpf_dctcp.c).  The "reuse map" feature is not
      implemented in this patch and can be considered later.
      
      For other subcmds, please see the man doc for details.
      
      A sample output of dump:
      [root@arch-fb-vm1 bpf]# bpftool struct_ops dump name cubic
      [{
              "bpf_map_info": {
                  "type": 26,
                  "id": 64,
                  "key_size": 4,
                  "value_size": 256,
                  "max_entries": 1,
                  "map_flags": 0,
                  "name": "cubic",
                  "ifindex": 0,
                  "btf_vmlinux_value_type_id": 18452,
                  "netns_dev": 0,
                  "netns_ino": 0,
                  "btf_id": 52,
                  "btf_key_type_id": 0,
                  "btf_value_type_id": 0
              }
          },{
              "bpf_struct_ops_tcp_congestion_ops": {
                  "refcnt": {
                      "refs": {
                          "counter": 1
                      }
                  },
                  "state": "BPF_STRUCT_OPS_STATE_INUSE",
                  "data": {
                      "list": {
                          "next": 0,
                          "prev": 0
                      },
                      "key": 0,
                      "flags": 0,
                      "init": "void (struct sock *) bictcp_init/prog_id:138",
                      "release": "void (struct sock *) 0",
                      "ssthresh": "u32 (struct sock *) bictcp_recalc_ssthresh/prog_id:141",
                      "cong_avoid": "void (struct sock *, u32, u32) bictcp_cong_avoid/prog_id:140",
                      "set_state": "void (struct sock *, u8) bictcp_state/prog_id:142",
                      "cwnd_event": "void (struct sock *, enum tcp_ca_event) bictcp_cwnd_event/prog_id:139",
                      "in_ack_event": "void (struct sock *, u32) 0",
                      "undo_cwnd": "u32 (struct sock *) tcp_reno_undo_cwnd/prog_id:144",
                      "pkts_acked": "void (struct sock *, const struct ack_sample *) bictcp_acked/prog_id:143",
                      "min_tso_segs": "u32 (struct sock *) 0",
                      "sndbuf_expand": "u32 (struct sock *) 0",
                      "cong_control": "void (struct sock *, const struct rate_sample *) 0",
                      "get_info": "size_t (struct sock *, u32, int *, union tcp_cc_info *) 0",
                      "name": "bpf_cubic",
                      "owner": 0
                  }
              }
          }
      ]
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Quentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/bpf/20200318171656.129650-1-kafai@fb.com
    • bpftool: Translate prog_id to its bpf prog_name · d5ae04da
      Martin KaFai Lau authored
      The kernel struct_ops obj has kernel func ptrs implemented by bpf_progs.
      The bpf prog_id is stored as the value of the func ptr for introspection
      purposes.  In a later patch, a struct_ops dump subcmd will be added
      to introspect these func ptrs.  It is desirable to print the actual bpf
      prog_name instead of only printing the prog_id.
      
      Since struct_ops is the only usecase storing prog_id in the func ptr,
      this patch adds a prog_id_as_func_ptr bool (default is false) to
      "struct btf_dumper" in order not to mis-interpret the ptr value
      for the other existing use-cases.
      
      While printing a func_ptr as a bpf prog_name,
      this patch also prefixes the bpf prog_name with the ptr's func_proto.
      [ Note that it is the ptr's func_proto instead of the bpf prog's
        func_proto ]
      It reuses the current btf_dump_func() to obtain the ptr's func_proto
      string.
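
The resulting output format can be sketched with a small formatter. `toy_format_func_ptr()` is a hypothetical helper, not bpftool's code, and it assumes the prototype string and program name have already been looked up. A prog_id of 0 is treated as an unset pointer and printed as a bare 0 after the prototype.

```c
#include <stdio.h>

/* Writes "<func_proto> <prog_name>/prog_id:<id>" into buf, or
 * "<func_proto> 0" when the func ptr holds no prog_id.
 * Returns what snprintf() returns.
 */
static int toy_format_func_ptr(char *buf, size_t size, const char *proto,
			       const char *prog_name, unsigned int prog_id)
{
	if (prog_id == 0 || !prog_name)
		return snprintf(buf, size, "%s 0", proto);

	return snprintf(buf, size, "%s %s/prog_id:%u", proto, prog_name,
			prog_id);
}
```

Keeping the prototype first means set and unset pointers line up the same way when a whole struct is dumped.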
      
      Here is an example from the bpf_cubic.c:
      "void (struct sock *, u32, u32) bictcp_cong_avoid/prog_id:140"
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Quentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/bpf/20200318171650.129252-1-kafai@fb.com