1. 17 Nov, 2018 8 commits
  2. 11 Nov, 2018 4 commits
    • Alexei Starovoitov's avatar
      Merge branch 'narrow-loads' · 407be8d0
      Alexei Starovoitov authored
      Andrey Ignatov says:
      
      ====================
      This patch set adds support for narrow loads with offset > 0 to BPF
      verifier.
      
      Patch 1 provides more details and is the main patch in the set.
      Patches 2 and 3 add new test cases to test_verifier and test_sock_addr
      selftests.
      
      v1->v2:
      - fix -Wdeclaration-after-statement warning.
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      407be8d0
    • Andrey Ignatov's avatar
      selftests/bpf: Test narrow loads with off > 0 for bpf_sock_addr · e7605475
      Andrey Ignatov authored
      Add more test cases for context bpf_sock_addr to test narrow loads with
      offset > 0 for ctx->user_ip4 field (__u32):
      * off=1, size=1;
      * off=2, size=1;
      * off=3, size=1;
      * off=2, size=2.
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      e7605475
    • Andrey Ignatov's avatar
      selftests/bpf: Test narrow loads with off > 0 in test_verifier · 6c2afb67
      Andrey Ignatov authored
      Test the following narrow loads in test_verifier for context __sk_buff:
      * off=1, size=1 - ok;
      * off=2, size=1 - ok;
      * off=3, size=1 - ok;
      * off=0, size=2 - ok;
      * off=1, size=2 - fail;
      * off=0, size=2 - ok;
      * off=3, size=2 - fail.
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      6c2afb67
    • Andrey Ignatov's avatar
      bpf: Allow narrow loads with offset > 0 · 46f53a65
      Andrey Ignatov authored
      Currently BPF verifier allows narrow loads for a context field only with
      offset zero. E.g. if there is a __u32 field then only the following
      loads are permitted:
        * off=0, size=1 (narrow);
        * off=0, size=2 (narrow);
        * off=0, size=4 (full).
      
      On the other hand LLVM can generate a load with offset different than
      zero that make sense from program logic point of view, but verifier
      doesn't accept it.
      
      E.g. tools/testing/selftests/bpf/sendmsg4_prog.c has code:
      
        #define DST_IP4			0xC0A801FEU /* 192.168.1.254 */
        ...
        	if ((ctx->user_ip4 >> 24) == (bpf_htonl(DST_IP4) >> 24) &&
      
      where ctx is struct bpf_sock_addr.
      
      Some versions of LLVM can produce the following byte code for it:
      
             8:       71 12 07 00 00 00 00 00         r2 = *(u8 *)(r1 + 7)
             9:       67 02 00 00 18 00 00 00         r2 <<= 24
            10:       18 03 00 00 00 00 00 fe 00 00 00 00 00 00 00 00         r3 = 4261412864 ll
            12:       5d 32 07 00 00 00 00 00         if r2 != r3 goto +7 <LBB0_6>
      
      where `*(u8 *)(r1 + 7)` means narrow load for ctx->user_ip4 with size=1
      and offset=3 (7 - sizeof(ctx->user_family) = 3). This load is currently
      rejected by verifier.
      
      Verifier code that rejects such loads is in bpf_ctx_narrow_access_ok()
      what means any is_valid_access implementation, that uses the function,
      works this way, e.g. bpf_skb_is_valid_access() for __sk_buff or
      sock_addr_is_valid_access() for bpf_sock_addr.
      
      The patch makes such loads supported. Offset can be in [0; size_default)
      but has to be multiple of load size. E.g. for __u32 field the following
      loads are supported now:
        * off=0, size=1 (narrow);
        * off=1, size=1 (narrow);
        * off=2, size=1 (narrow);
        * off=3, size=1 (narrow);
        * off=0, size=2 (narrow);
        * off=2, size=2 (narrow);
        * off=0, size=4 (full).
      Reported-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      46f53a65
  3. 10 Nov, 2018 18 commits
  4. 09 Nov, 2018 7 commits
    • Nitin Hande's avatar
      bpf: Extend the sk_lookup() helper to XDP hookpoint. · c8123ead
      Nitin Hande authored
      This patch proposes to extend the sk_lookup() BPF API to the
      XDP hookpoint. The sk_lookup() helper supports a lookup
      on incoming packet to find the corresponding socket that will
      receive this packet. Current support for this BPF API is
      at the tc hookpoint. This patch will extend this API at XDP
      hookpoint. A XDP program can map the incoming packet to the
      5-tuple parameter and invoke the API to find the corresponding
      socket structure.
      Signed-off-by: default avatarNitin Hande <Nitin.Hande@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      c8123ead
    • David Ahern's avatar
      bpftool: Improve handling of ENOENT on map dumps · bf598a8f
      David Ahern authored
      bpftool output is not user friendly when dumping a map with only a few
      populated entries:
      
          $ bpftool map
          1: devmap  name tx_devmap  flags 0x0
                  key 4B  value 4B  max_entries 64  memlock 4096B
          2: array  name tx_idxmap  flags 0x0
                  key 4B  value 4B  max_entries 64  memlock 4096B
      
          $ bpftool map dump id 1
          key:
          00 00 00 00
          value:
          No such file or directory
          key:
          01 00 00 00
          value:
          No such file or directory
          key:
          02 00 00 00
          value:
          No such file or directory
          key: 03 00 00 00  value: 03 00 00 00
      
      Handle ENOENT by keeping the line format sane and dumping
      "<no entry>" for the value
      
          $ bpftool map dump id 1
          key: 00 00 00 00  value: <no entry>
          key: 01 00 00 00  value: <no entry>
          key: 02 00 00 00  value: <no entry>
          key: 03 00 00 00  value: 03 00 00 00
          ...
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Acked-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      bf598a8f
    • Sowmini Varadhan's avatar
      selftests/bpf: add a test case for sock_ops perf-event notification · 435f90a3
      Sowmini Varadhan authored
      This patch provides a tcp_bpf based eBPF sample. The test
      
      - ncat(1) as the TCP client program to connect() to a port
        with the intention of triggerring SYN retransmissions: we
        first install an iptables DROP rule to make sure ncat SYNs are
        resent (instead of aborting instantly after a TCP RST)
      
      - has a bpf kernel module that sends a perf-event notification for
        each TCP retransmit, and also tracks the number of such notifications
        sent in the global_map
      
      The test passes when the number of event notifications intercepted
      in user-space matches the value in the global_map.
      Signed-off-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      435f90a3
    • Sowmini Varadhan's avatar
      bpf: add perf event notificaton support for sock_ops · a5a3a828
      Sowmini Varadhan authored
      This patch allows eBPF programs that use sock_ops to send perf
      based event notifications using bpf_perf_event_output(). Our main
      use case for this is the following:
      
        We would like to monitor some subset of TCP sockets in user-space,
        (the monitoring application would define 4-tuples it wants to monitor)
        using TCP_INFO stats to analyze reported problems. The idea is to
        use those stats to see where the bottlenecks are likely to be ("is
        it application-limited?" or "is there evidence of BufferBloat in
        the path?" etc).
      
        Today we can do this by periodically polling for tcp_info, but this
        could be made more efficient if the kernel would asynchronously
        notify the application via tcp_info when some "interesting"
        thresholds (e.g., "RTT variance > X", or "total_retrans > Y" etc)
        are reached. And to make this effective, it is better if
        we could apply the threshold check *before* constructing the
        tcp_info netlink notification, so that we don't waste resources
        constructing notifications that will be discarded by the filter.
      
      This work solves the problem by adding perf event based notification
      support for sock_ops. The eBPF program can thus be designed to apply
      any desired filters to the bpf_sock_ops and trigger a perf event
      notification based on the evaluation from the filter. The user space
      component can use these perf event notifications to either read any
      state managed by the eBPF program, or issue a TCP_INFO netlink call
      if desired.
      Signed-off-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Co-developed-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      a5a3a828
    • Daniel Borkmann's avatar
      Merge branch 'bpf-max-pkt-offset' · 185067a8
      Daniel Borkmann authored
      Jiong Wang says:
      
      ====================
      The maximum packet offset accessed by one BPF program is useful
      information.
      
      Because sometimes there could be packet split and it is possible for some
      reasons (for example performance) we want to reject the BPF program if the
      maximum packet size would trigger such split. Normally, MTU value is
      treated as the maximum packet size, but one BPF program does not always
      access the whole packet, it could only access the head portion of the data.
      
      We could let verifier calculate the maximum packet offset ever used and
      record it inside prog auxiliar information structure as a new field
      "max_pkt_offset".
      ====================
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      185067a8
    • Jiong Wang's avatar
      nfp: bpf: relax prog rejection through max_pkt_offset · cf599f50
      Jiong Wang authored
      NFP is refusing to offload programs whenever the MTU is set to a value
      larger than the max packet bytes that fits in NFP Cluster Target Memory
      (CTM). However, a eBPF program doesn't always need to access the whole
      packet data.
      
      Verifier has always calculated maximum direct packet access (DPA) offset,
      and kept it in max_pkt_offset inside prog auxiliar information. This patch
      relax prog rejection based on max_pkt_offset.
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      cf599f50
    • Jiong Wang's avatar
      bpf: let verifier to calculate and record max_pkt_offset · e647815a
      Jiong Wang authored
      In check_packet_access, update max_pkt_offset after the offset has passed
      __check_packet_access.
      
      It should be safe to use u32 for max_pkt_offset as explained in code
      comment.
      
      Also, when there is tail call, the max_pkt_offset of the called program is
      unknown, so conservatively set max_pkt_offset to MAX_PACKET_OFF for such
      case.
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      e647815a
  5. 07 Nov, 2018 3 commits
    • Shannon Nelson's avatar
      bpf_load: add map name to load_maps error message · bce6a149
      Shannon Nelson authored
      To help when debugging bpf/xdp load issues, have the load_map()
      error message include the number and name of the map that
      failed.
      Signed-off-by: default avatarShannon Nelson <shannon.nelson@oracle.com>
      Acked-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      bce6a149
    • Quentin Monnet's avatar
      tools: bpftool: adjust rlimit RLIMIT_MEMLOCK when loading programs, maps · 8302b9bd
      Quentin Monnet authored
      The limit for memory locked in the kernel by a process is usually set to
      64 kbytes by default. This can be an issue when creating large BPF maps
      and/or loading many programs. A workaround is to raise this limit for
      the current process before trying to create a new BPF map. Changing the
      hard limit requires the CAP_SYS_RESOURCE and can usually only be done by
      root user (for non-root users, a call to setrlimit fails (and sets
      errno) and the program simply goes on with its rlimit unchanged).
      
      There is no API to get the current amount of memory locked for a user,
      therefore we cannot raise the limit only when required. One solution,
      used by bcc, is to try to create the map, and on getting a EPERM error,
      raising the limit to infinity before giving another try. Another
      approach, used in iproute2, is to raise the limit in all cases, before
      trying to create the map.
      
      Here we do the same as in iproute2: the rlimit is raised to infinity
      before trying to load programs or to create maps with bpftool.
      Signed-off-by: default avatarQuentin Monnet <quentin.monnet@netronome.com>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      8302b9bd
    • Quentin Monnet's avatar
      selftests/bpf: enable (uncomment) all tests in test_libbpf.sh · f96afa76
      Quentin Monnet authored
      libbpf is now able to load successfully test_l4lb_noinline.o and
      samples/bpf/tracex3_kern.o.
      
      For the test_l4lb_noinline, uncomment related tests from test_libbpf.c
      and remove the associated "TODO".
      
      For tracex3_kern.o, instead of loading a program from samples/bpf/ that
      might not have been compiled at this stage, try loading a program from
      BPF selftests. Since this test case is about loading a program compiled
      without the "-target bpf" flag, change the Makefile to compile one
      program accordingly (instead of passing the flag for compiling all
      programs).
      
      Regarding test_xdp_noinline.o: in its current shape the program fails to
      load because it provides no version section, but the loader needs one.
      The test was added to make sure that libbpf could load XDP programs even
      if they do not provide a version number in a dedicated section. But
      libbpf is already capable of doing that: in our case loading fails
      because the loader does not know that this is an XDP program (it does
      not need to, since it does not attach the program). So trying to load
      test_xdp_noinline.o does not bring much here: just delete this subtest.
      
      For the record, the error message obtained with tracex3_kern.o was
      fixed by commit e3d91b0c ("tools/libbpf: handle issues with bpf ELF
      objects containing .eh_frames")
      
      I have not been abled to reproduce the "libbpf: incorrect bpf_call
      opcode" error for test_l4lb_noinline.o, even with the version of libbpf
      present at the time when test_libbpf.sh and test_libbpf_open.c were
      created.
      
      RFC -> v1:
      - Compile test_xdp without the "-target bpf" flag, and try to load it
        instead of ../../samples/bpf/tracex3_kern.o.
      - Delete test_xdp_noinline.o subtest.
      
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarQuentin Monnet <quentin.monnet@netronome.com>
      Acked-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      f96afa76