1. 14 Sep, 2018 7 commits
    • selftests/bpf: fix bpf_flow.c build · 70e88c75
      Alexei Starovoitov authored
      Fix the following build error:
      clang -I. -I./include/uapi -I../../../include/uapi -idirafter /usr/local/include -idirafter /data/users/ast/llvm/bld/lib/clang/7.0.0/include -idirafter /usr/include -Wno-compare-distinct-pointer-types \
      	 -O2 -target bpf -emit-llvm -c bpf_flow.c -o - |      \
      llc -march=bpf -mcpu=generic  -filetype=obj -o /data/users/ast/bpf-next/tools/testing/selftests/bpf/bpf_flow.o
      LLVM ERROR: 'dissect' label emitted multiple times to assembly file
      make: *** [/data/users/ast/bpf-next/tools/testing/selftests/bpf/bpf_flow.o] Error 1
      
      Fixes: 9c98b13c ("flow_dissector: implements eBPF parser")
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'bpf-flow-dissector' · 4a9f42c9
      Alexei Starovoitov authored
      Petar Penkov says:
      
      ====================
      This patch series hardens the RX stack by allowing flow dissection in BPF,
      as previously discussed [1]. Because of the rigorous checks of the BPF
      verifier, this provides significant security guarantees. In particular, the
      BPF flow dissector cannot get inside of an infinite loop, as with
      CVE-2013-4348, because BPF programs are guaranteed to terminate. It cannot
      read outside of packet bounds, because all memory accesses are checked.
      Also, with BPF the administrator can decide which protocols to support,
      reducing potential attack surface. Rarely encountered protocols can be
      excluded from dissection and the program can be updated without kernel
      recompile or reboot if a bug is discovered.
      
      Patch 1 adds infrastructure to execute a BPF program in __skb_flow_dissect.
      This includes a new BPF program and attach type.
      
      Patch 2 adds the new BPF flow dissector definitions to tools/uapi.
      
      Patch 3 adds support for the new BPF program type to libbpf and bpftool.
      
      Patch 4 adds a flow dissector program in BPF. This parses most protocols in
      __skb_flow_dissect in BPF for a subset of flow keys (basic, control, ports,
      and address types).
      
      Patch 5 adds a selftest that attaches the BPF program to the flow dissector
      and sends traffic with different levels of encapsulation.
      
      Performance Evaluation:
      The in-kernel implementation was compared against the demo program from
      patch 4 using the test in patch 5 with IPv4/UDP traffic over 10 seconds.
      	$ perf record -a -C 4 taskset -c 4 ./test_flow_dissector -i 4 -f 8 \
      		-t 10
      
      In-kernel Dissector:
      	__skb_flow_dissect overhead: 2.12%
      	Total Packets: 3,272,597 (from output of ./test_flow_dissector)
      
      BPF Dissector:
      	__skb_flow_dissect overhead: 1.63%
      	Total Packets: 3,232,356 (from output of ./test_flow_dissector)
      
      No-op BPF Dissector:
      	__skb_flow_dissect overhead: 1.52%
      	Total Packets: 3,330,635 (from output of ./test_flow_dissector)
      
      Changes since v3:
      1/ struct bpf_flow_keys reorganized to remove holes in patch 1 and patch 2.
      
      Changes since v2:
      1/ Changes to tools/include/uapi pulled into a separate patch 2
      2/ Changes to tools/lib and tools/bpftool pulled into a separate patch 3
      3/ Changed flow_keys in __sk_buff from __u32 to struct bpf_flow_keys *
      4/ Added nhoff field in struct bpf_flow_keys to pass initial offset
      5/ Saving all of the modified control block, rather than just the qdisc
      6/ Sample BPF program in patch 4 modified to use the changes above
      
      Changes since v1:
      1/ LD_ABS instructions now disallowed for the new BPF prog type
      2/ now checks if skb is NULL in __skb_flow_dissect()
      3/ fixed incorrect accesses in flow_dissector_is_valid_access()
      	- writes to the flow_keys field now disallowed
      	- reads/writes to tc_classid and data_meta now disallowed
      4/ headers pulled with bpf_skb_load_data if direct access fails
      
      Changes since RFC:
      1/ Flow dissector hook changed from global to per-netns
      2/ Defined struct bpf_flow_keys to be used in BPF flow dissector
      programs instead of exposing the internal flow keys layout. Added a
      function to translate from bpf_flow_keys to the internal layout after BPF
      dissection is complete. The pointer to this struct is stored in
      qdisc_skb_cb rather than inside of the 20 byte control block which
      simplifies verification and allows access to all 20 bytes of the cb.
      3/ Removed GUE parsing as it relied on a hardcoded port
      4/ MPLS parsing now stops at the first label which is consistent
      with the in-kernel flow dissector
      5/ Refactored to use direct packet access and to write out to
      struct bpf_flow_keys
      
      [1] http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: test bpf flow dissection · 50b3ed57
      Petar Penkov authored
      Adds a test that sends different types of packets over multiple
      tunnels and verifies that valid packets are dissected correctly.  To do
      so, a tc-flower rule is added to drop packets on UDP src port 9, and
      packets are sent from ports 8, 9, and 10. Only the packets on port 9
      should be dropped. Because tc-flower relies on the flow dissector to
      match flows, correct classification demonstrates correct dissection.
      
      Also add support logic to load the BPF program and to inject the test
      packets.
      Signed-off-by: Petar Penkov <ppenkov@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • flow_dissector: implements eBPF parser · 9c98b13c
      Petar Penkov authored
      This eBPF program extracts basic/control/ip address/ports keys from
      incoming packets. It supports recursive parsing for IP encapsulation,
      and VLAN, along with IPv4/IPv6 and extension headers.  This program is
      meant to show how flow dissection and key extraction can be done in
      eBPF.
      
      Link: http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf
      Signed-off-by: Petar Penkov <ppenkov@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: support flow dissector in libbpf and bpftool · c22fbae7
      Petar Penkov authored
      This patch extends libbpf and bpftool to work with programs of type
      BPF_PROG_TYPE_FLOW_DISSECTOR.
      Signed-off-by: Petar Penkov <ppenkov@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: sync bpf.h uapi with tools/ · 2f965e3f
      Petar Penkov authored
      This patch syncs tools/include/uapi/linux/bpf.h with the flow dissector
      definitions from include/uapi/linux/bpf.h
      Signed-off-by: Petar Penkov <ppenkov@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • flow_dissector: implements flow dissector BPF hook · d58e468b
      Petar Penkov authored
      Adds a hook for programs of type BPF_PROG_TYPE_FLOW_DISSECTOR and
      attach type BPF_FLOW_DISSECTOR that is executed in the flow dissector
      path. The BPF program is per-network namespace.
      Signed-off-by: Petar Penkov <ppenkov@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  2. 11 Sep, 2018 5 commits
    • net/core/filter: fix unused-variable warning · 1edb6e03
      Anders Roxell authored
      Building with CONFIG_INET=n will show the warning below:
      net/core/filter.c: In function ‘____bpf_getsockopt’:
      net/core/filter.c:4048:19: warning: unused variable ‘tp’ [-Wunused-variable]
        struct tcp_sock *tp;
                         ^~
      net/core/filter.c:4046:31: warning: unused variable ‘icsk’ [-Wunused-variable]
        struct inet_connection_sock *icsk;
                                     ^~~~
      Move the variable declarations inside the {} block where they are used.
      
      Fixes: 1e215300 ("bpf: add TCP_SAVE_SYN/TCP_SAVED_SYN options for bpf_(set|get)sockopt")
      Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • tools/bpf: fix a netlink recv issue · 9d0b3c1f
      Yonghong Song authored
      Commit f7010770 ("tools/bpf: move bpf/lib netlink related
      functions into a new file") introduced a while loop for the
      netlink recv path. This while loop is needed since the
      buffer in recv syscall may not be enough to hold all the
      information and in such cases multiple recv calls are needed.
      
      The above commit introduced a bug: the while loop may block
      on the recv syscall when no more messages are expected. The
      netlink message header flag NLM_F_MULTI indicates that more
      messages are expected, so this patch fixes the bug by issuing
      a further recv syscall only if a multipart message is expected.
      
      The patch also adds a fix for a message length of 0. When
      netlink recv returns a message length of 0, no more messages
      will arrive, so the while loop can end.
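      
      The termination rule described above can be sketched as a small helper
      that inspects one recv() result. This is an illustration only, not the
      libbpf code: nl_more_expected() is a hypothetical name, and error
      handling (NLMSG_ERROR) and callbacks are left out.

```c
#include <linux/netlink.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of libbpf): look at the buffer filled by
 * one recv() call and report whether another recv() should be issued.
 */
bool nl_more_expected(void *buf, int len)
{
	struct nlmsghdr *nh;
	bool multipart = false;

	/* A length of 0 means no more data will arrive; stop looping. */
	if (len <= 0)
		return false;

	for (nh = buf; NLMSG_OK(nh, len); nh = NLMSG_NEXT(nh, len)) {
		/* NLM_F_MULTI marks a message that is part of a multipart reply. */
		if (nh->nlmsg_flags & NLM_F_MULTI)
			multipart = true;
		/* NLMSG_DONE terminates a multipart reply. */
		if (nh->nlmsg_type == NLMSG_DONE)
			return false;
	}

	/* Only a multipart reply that has not ended justifies another recv(). */
	return multipart;
}
```

      A receive loop would call recv() again only while this helper returns
      true, so a single non-multipart reply never causes a blocking second
      recv().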
      
      Fixes: f7010770 ("tools/bpf: move bpf/lib netlink related functions into a new file")
      Reported-by: Björn Töpel <bjorn.topel@intel.com>
      Tested-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'progarray_mapinmap_dump' · 2e2a0c96
      Alexei Starovoitov authored
      Yonghong Song says:
      
      ====================
      The support to dump program array and map_in_map maps
      for bpffs and bpftool is added. Patch #1 added bpffs support
      and Patch #2 added bpftool support. Please see
      individual patches for example output.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • tools/bpf: bpftool: support prog array map and map of maps · ad3338d2
      Yonghong Song authored
      Currently, prog array map and map of maps are not supported
      in bpftool. This patch adds the support.
      Unlike other map types, for prog array map and map of maps
      the key returned by bpf_get_next_key() may not point to a
      valid value. So for these two map types, no error is printed
      when this happens.
      
      The following is the plain and json dump if btf is not available:
        $ ./bpftool map dump id 10
          key: 08 00 00 00  value: 5c 01 00 00
          Found 1 element
        $ ./bpftool -jp map dump id 10
          [{
              "key": ["0x08","0x00","0x00","0x00"
              ],
              "value": ["0x5c","0x01","0x00","0x00"
              ]
          }]
      
      If BTF is available, the dump looks like this:
        $ ./bpftool map dump id 2
          [{
                  "key": 0,
                  "value": 7
              }
          ]
        $ ./bpftool -jp map dump id 2
          [{
              "key": ["0x00","0x00","0x00","0x00"
              ],
              "value": ["0x07","0x00","0x00","0x00"
              ],
              "formatted": {
                  "key": 0,
                  "value": 7
              }
          }]
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: add bpffs pretty print for program array map · a7c19db3
      Yonghong Song authored
      Added bpffs pretty print for program array map. For a particular
      array index, if the program array points to a valid program,
      the "<index>: <prog_id>" will be printed out like
         0: 6
      which means bpf program with id "6" is installed at index "0".
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  3. 07 Sep, 2018 9 commits
    • tools/bpf: bpftool: add net support · f6f3bac0
      Yonghong Song authored
      Add "bpftool net" support. Networking devices are enumerated
      to dump device index/name associated with xdp progs.
      
      For each networking device, tc classes and qdiscs are enumerated
      in order to check their bpf filters.
      In addition, root handle and clsact ingress/egress are also checked for
      bpf filters.  Not all filter information is printed out. Only ifindex,
      kind, filter name, prog_id and tag are printed out, which are good
      enough to show attachment information. If the filter action
      is a bpf action, its bpf program id, bpf name and tag will be
      printed out as well.
      
      For example,
        $ ./bpftool net
        xdp [
        ifindex 2 devname eth0 prog_id 198
        ]
        tc_filters [
        ifindex 2 kind qdisc_htb name prefix_matcher.o:[cls_prefix_matcher_htb]
                  prog_id 111727 tag d08fe3b4319bc2fd act []
        ifindex 2 kind qdisc_clsact_ingress name fbflow_icmp
                  prog_id 130246 tag 3f265c7f26db62c9 act []
        ifindex 2 kind qdisc_clsact_egress name prefix_matcher.o:[cls_prefix_matcher_clsact]
                  prog_id 111726 tag 99a197826974c876
        ifindex 2 kind qdisc_clsact_egress name cls_fg_dscp
                  prog_id 108619 tag dc4630674fd72dcc act []
        ifindex 2 kind qdisc_clsact_egress name fbflow_egress
                  prog_id 130245 tag 72d2d830d6888d2c
        ]
        $ ./bpftool -jp net
        [{
              "xdp": [{
                      "ifindex": 2,
                      "devname": "eth0",
                      "prog_id": 198
                  }
              ],
              "tc_filters": [{
                      "ifindex": 2,
                      "kind": "qdisc_htb",
                      "name": "prefix_matcher.o:[cls_prefix_matcher_htb]",
                      "prog_id": 111727,
                      "tag": "d08fe3b4319bc2fd",
                      "act": []
                  },{
                      "ifindex": 2,
                      "kind": "qdisc_clsact_ingress",
                      "name": "fbflow_icmp",
                      "prog_id": 130246,
                      "tag": "3f265c7f26db62c9",
                      "act": []
                  },{
                      "ifindex": 2,
                      "kind": "qdisc_clsact_egress",
                      "name": "prefix_matcher.o:[cls_prefix_matcher_clsact]",
                      "prog_id": 111726,
                      "tag": "99a197826974c876"
                  },{
                      "ifindex": 2,
                      "kind": "qdisc_clsact_egress",
                      "name": "cls_fg_dscp",
                      "prog_id": 108619,
                      "tag": "dc4630674fd72dcc",
                      "act": []
                  },{
                      "ifindex": 2,
                      "kind": "qdisc_clsact_egress",
                      "name": "fbflow_egress",
                      "prog_id": 130245,
                      "tag": "72d2d830d6888d2c"
                  }
              ]
          }
        ]
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • tools/bpf: add more netlink functionalities in lib/bpf · 36f1678d
      Yonghong Song authored
      This patch added a few netlink attribute parsing functions
      and the netlink API functions to query networking links, tc classes,
      tc qdiscs and tc filters. For example, the following API is
      to get networking links:
        int nl_get_link(int sock, unsigned int nl_pid,
                        dump_nlmsg_t dump_link_nlmsg,
                        void *cookie);
      
      Note that when the API is called, the user also provided a
      callback function with the following signature:
        int (*dump_nlmsg_t)(void *cookie, void *msg, struct nlattr **tb);
      
      The "cookie" is the parameter the user passed to the API and will
      be available for the callback function.
      The "msg" is the information about the result, e.g., ifinfomsg or
      tcmsg. The "tb" is the parsed netlink attributes.
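      
      A callback matching the dump_nlmsg_t signature above might look like
      the following sketch. The names count_link() and struct link_stats are
      hypothetical, and the kernel uapi struct nlattr is used as a stand-in
      for whatever attribute type the library declares.

```c
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical cookie type chosen for this sketch. */
struct link_stats {
	int count;        /* number of links reported so far */
	int last_ifindex; /* ifindex of the most recent link */
};

/* Same shape as dump_nlmsg_t: (cookie, msg, parsed attributes). */
int count_link(void *cookie, void *msg, struct nlattr **tb)
{
	struct link_stats *stats = cookie; /* what the caller passed to the API */
	struct ifinfomsg *ifinfo = msg;    /* result header for a link query */

	(void)tb; /* parsed attributes are not needed in this sketch */
	stats->count++;
	stats->last_ifindex = ifinfo->ifi_index;
	return 0;
}
```

      The caller would pass count_link as the dump_link_nlmsg argument and a
      pointer to a struct link_stats as the cookie.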
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • tools/bpf: move bpf/lib netlink related functions into a new file · f7010770
      Yonghong Song authored
      There is no functional change in this patch.
      
      In the subsequent patches, more netlink related library functions
      will be added and a separate file is better than cluttering bpf.c.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • tools/bpf: sync kernel uapi header if_link.h to tools · 52b7b784
      Yonghong Song authored
      Among others, this header will be used later for
      bpftool net support.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf/test_progs: do not check errno == 0 · f5bd3948
      Mauricio Vasquez B authored
      The errno man page states: "The value in errno is significant only when
      the return value of the call indicated an error..." It is therefore not
      correct to check errno after a successful call: it may be nonzero even
      though the function succeeded.
      
      The check causes false positives when errno was set by an earlier function.
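      
      The rule from the man page can be illustrated with a minimal sketch:
      consult errno only after the return value has signalled failure. The
      helper name can_open() is hypothetical and is not taken from the
      selftests.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 0 if path can be opened read-only,
 * otherwise a negative errno value.
 */
int can_open(const char *path)
{
	int fd = open(path, O_RDONLY);

	/* errno is meaningful only because open() reported an error. */
	if (fd < 0)
		return -errno;

	/* On success errno is not inspected; it may hold a stale value. */
	close(fd);
	return 0;
}
```

      A test written this way keeps passing even when an earlier, unrelated
      call left errno set to a nonzero value.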
      Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • xdp: split code for map vs non-map redirect · 47b123ed
      Jesper Dangaard Brouer authored
      The compiler does an efficient job of inlining static C functions.
      Perf top clearly shows that almost everything gets inlined into the
      function call xdp_do_redirect.
      
      The function xdp_do_redirect ends up containing and interleaving the
      map and non-map redirect code.  This is sub-optimal, as it would be
      strange for an XDP program to use both types of redirect in the same
      program. The two use-cases are separate, and interleaving the code
      just causes more instruction-cache pressure.
      
      I would like to stress (again) that the non-map variant bpf_redirect
      is very slow compared to the bpf_redirect_map variant, approx half the
      speed.  Measured with driver i40e the difference is:
      
      - map     redirect: 13,250,350 pps
      - non-map redirect:  7,491,425 pps
      
      For this reason, the non-map variant of redirect has been named
      xdp_do_redirect_slow.  This hopefully gives a hint, when using perf,
      that this is not the optimal XDP redirect operating mode.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • xdp: explicit inline __xdp_map_lookup_elem · 2a68d85f
      Jesper Dangaard Brouer authored
      The compiler chooses not to inline the function __xdp_map_lookup_elem,
      because it can see that it is used by both Generic-XDP and native-XDP
      do redirect calls (xdp_do_generic_redirect_map and xdp_do_redirect_map).
      
      The compiler cannot know that this is a bad choice, as it cannot know
      that a net device cannot run both XDP modes (Generic or Native) at the
      same time.  Thus, mark this function inline, even though we normally
      leave this up to the compiler.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • xdp: unlikely instrumentation for xdp map redirect · e1302542
      Jesper Dangaard Brouer authored
      Notice that the compiler-generated ASM code layout was suboptimal.  It
      assumed map enqueue errors to be the likely case, which they shouldn't be.
      It also assumed that xdp_do_flush_map() was a likely case, due to maps
      changing between packets, which should be very unlikely.
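      
      As a rough userspace illustration of this kind of branch hint, the
      sketch below marks an error path as unlikely with the GCC/Clang
      builtin __builtin_expect(). The macro and function names are
      hypothetical and do not come from the XDP code.

```c
/* Userspace stand-in for the kernel's unlikely() annotation. */
#define unlikely_sketch(x) __builtin_expect(!!(x), 0)

/*
 * Hypothetical enqueue: returns 0 on success, -1 when the queue is full.
 * The hint tells the compiler to lay out the success path as the
 * fall-through case.
 */
int enqueue_sketch(int *queue, int *count, int capacity, int item)
{
	if (unlikely_sketch(*count >= capacity))
		return -1; /* error path, expected to be rare */

	queue[(*count)++] = item;
	return 0;
}
```

      The hint changes only code layout; the result of the function is the
      same with or without it.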
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  4. 06 Sep, 2018 4 commits
    • bpf/verifier: fix verifier instability · a9c676bc
      Alexei Starovoitov authored
      Edward Cree says:
      In check_mem_access(), for the PTR_TO_CTX case, after check_ctx_access()
      has supplied a reg_type, the other members of the register state are set
      appropriately.  Previously reg.range was set to 0, but as it is in a
      union with reg.map_ptr, which is larger, upper bytes of the latter were
      left in place.  This then caused the memcmp() in regsafe() to fail,
      preventing some branches from being pruned (and occasionally causing the
      same program to take a varying number of processed insns on repeated
      verifier runs).
      
      Fix the instability by clearing bpf_reg_state in __mark_reg_[un]known()
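      
      The effect can be sketched in userspace with a simplified register
      state. The struct and function names below are hypothetical and the
      layout is not the verifier's real one; the point is only that writing
      the small union member leaves the bytes of the larger member in place
      (as observed in practice), so a byte-wise comparison needs the state
      cleared first.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the verifier's register state. */
struct reg_sketch {
	int type;
	union {
		uint16_t range; /* small member */
		void *map_ptr;  /* larger member sharing the same storage */
	};
};

/* Mirrors the idea of the fix: clear the whole state, then set fields. */
void mark_reg_known_sketch(struct reg_sketch *reg, int type)
{
	memset(reg, 0, sizeof(*reg)); /* drops stale map_ptr bytes too */
	reg->type = type;
	reg->range = 0;
}

/* Byte-wise comparison, in the spirit of the memcmp() in regsafe(). */
int regs_equal_sketch(const struct reg_sketch *a, const struct reg_sketch *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

      With the memset() in place, two registers that went through different
      histories compare equal once both have been marked, which is what
      lets pruning behave the same on every run.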
      
      Fixes: f1174f77 ("bpf/verifier: rework value tracking")
      Debugged-by: Edward Cree <ecree@solarflare.com>
      Acked-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • libbpf: Remove the duplicate checking of function storage · 69495d2a
      Taeung Song authored
      After the commit eac7d845 ("tools: libbpf: don't return '.text'
      as a program for multi-function programs"), bpf_program__next()
      in bpf_object__for_each_program skips the function storage such as .text,
      so eliminate the duplicate checking.
      
      Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • netlink: Make groups check less stupid in netlink_bind() · 428f944b
      Dmitry Safonov authored
      As Linus noted, the test for 0 is needless, groups type can follow the
      usual kernel style and 8*sizeof(unsigned long) is BITS_PER_LONG:
      
      > The code [..] isn't technically incorrect...
      > But it is stupid.
      > Why stupid? Because the test for 0 is pointless.
      >
      > Just doing
      >        if (nlk->ngroups < 8*sizeof(groups))
      >                groups &= (1UL << nlk->ngroups) - 1;
      >
      > would have been fine and more understandable, since the "mask by shift
      > count" already does the right thing for a ngroups value of 0. Now that
      > test for zero makes me go "what's special about zero?". It turns out
      > that the answer to that is "nothing".
      [..]
      > The type of "groups" is kind of silly too.
      >
      > Yeah, "long unsigned int" isn't _technically_ wrong. But we normally
      > call that type "unsigned long".
      
      Clean up my piece of pointlessness.
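      
      The point about zero needing no special case can be checked with a
      small standalone sketch. The names mask_groups_sketch() and
      BITS_PER_LONG_SKETCH are hypothetical stand-ins for the kernel code.

```c
#include <limits.h>

/* Userspace stand-in for the kernel's BITS_PER_LONG. */
#define BITS_PER_LONG_SKETCH ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Keep only the lowest ngroups bits of groups. */
unsigned long mask_groups_sketch(unsigned long groups, unsigned int ngroups)
{
	/*
	 * For ngroups == 0 the mask is (1UL << 0) - 1 == 0, so all bits are
	 * cleared without a separate test for zero. The bound check avoids
	 * an undefined shift by the full width of the type.
	 */
	if (ngroups < BITS_PER_LONG_SKETCH)
		groups &= (1UL << ngroups) - 1;
	return groups;
}
```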
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: netdev@vger.kernel.org
      Fairly-blamed-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Dmitry Safonov <dima@arista.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • packet: add sockopt to ignore outgoing packets · fa788d98
      Vincent Whitchurch authored
      Currently, the only way to ignore outgoing packets on a packet socket is
      via the BPF filter.  With MSG_ZEROCOPY, packets that are looped into
      AF_PACKET are copied in dev_queue_xmit_nit(), and this copy happens even
      if the filter run from packet_rcv() would reject them.  So the presence
      of a packet socket on the interface takes away the benefits of
      MSG_ZEROCOPY, even if the packet socket is not interested in outgoing
      packets.  (Even when MSG_ZEROCOPY is not used, the skb is unnecessarily
      cloned, but the cost for that is much lower.)
      
      Add a socket option to allow AF_PACKET sockets to ignore outgoing
      packets to solve this.  Note that the *BSDs already have something
      similar: BIOCSSEESENT/BIOCSDIRECTION and BIOCSDIRFILT.
      
      The first intended user is lldpd.
      Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 05 Sep, 2018 15 commits
    • net: lan743x_ptp: make function lan743x_ptp_set_sync_ts_insert() static · 05dcc712
      YueHaibing authored
      Fixes the following sparse warning:
      
      drivers/net/ethernet/microchip/lan743x_ptp.c:980:6: warning:
       symbol 'lan743x_ptp_set_sync_ts_insert' was not declared. Should it be static?
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Make function mlx5i_grp_sw_update_stats() static · fbb66ad5
      Wei Yongjun authored
      Fixes the following sparse warning:
      
      drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c:119:6: warning:
       symbol 'mlx5i_grp_sw_update_stats' was not declared. Should it be static?
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mac80211-next-for-davem-2018-09-05' of... · 579d03fe
      David S. Miller authored
      Merge tag 'mac80211-next-for-davem-2018-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
      
      Johannes Berg says:
      
      ====================
      This time, we have some pretty impactful work. Among
      the changes:
       * changes to make PTK rekeying work better, or actually
         better/safely if drivers get updated
       * VHT extended NSS support - some APs had capabilities
         that didn't fit into the VHT (11ac) spec, so the spec
         was updated and we follow that now
       * some TXQ and A-MSDU building work - will allow iwlwifi
         to use this soon
       * more HE work, including aligning to 802.11ax Draft 3.0
       * L-SIG and 0-length-PSDU support in radiotap
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cfg80211: validate wmm rule when setting · 014f5a25
      Stanislaw Gruszka authored
      Add a validation check for the wmm rule when copying rules from fwdb and
      print an error when the rule is invalid.
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    • mac80211: remove unnecessary NULL check · 40b5a0f8
      Gustavo A. R. Silva authored
      Both old and new cannot be NULL at the same time, hence checking
      new when old is not NULL is unnecessary.
      
      Also, notice that new is being dereferenced before it is checked:
      
      	idx = new->conf.keyidx;
      
      The above triggers a static code analysis warning.
      
      Address this by removing the NULL check on new and adding a code
      comment based on the following piece of code:
      
      	/* caller must provide at least one old/new */
      	if (WARN_ON(!new && !old))
      		return 0;
      
      Addresses-Coverity-ID: 1473176 ("Dereference before null check")
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    • mac80211: add an option for drivers to check if packets can be aggregated · 9739fe29
      Sara Sharon authored
      Some hardware has limitations on the types of packets in an AMSDU.
      Add an optional driver callback to determine whether two skbs can
      be used in the same AMSDU.
      Signed-off-by: Sara Sharon <sara.sharon@intel.com>
      Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    • mac80211: allow AMSDU size limitation per-TID · edba6bda
      Sara Sharon authored
      Some drivers may have an AMSDU size limitation per TID, due to
      HW constraints. Add an option to set this limit.
      Signed-off-by: Sara Sharon <sara.sharon@intel.com>
      Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    • mac80211: add an option for station management TXQ · 0eeb2b67
      Sara Sharon authored
      We have a TXQ abstraction for non-data packets that need
      powersave buffering. Since the AP cannot sleep, in case
      of station we can use this TXQ for all management frames,
      regardless if they are bufferable. Add HW flag to allow
      that.
      Signed-off-by: Sara Sharon <sara.sharon@intel.com>
      Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    • wireless: align to draft 11ax D3.0 · add7453a
      Shaul Triebitz authored
      Align to new 11ax draft D3.0.  Change/add new MAC and PHY capabilities
      and update drivers' 11ax capabilities and mac80211's debugfs
      accordingly.
      Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
      Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    • mac80211: fix saving a few HE values · 77cbbc35
      Naftali Goldstein authored
      After masking he_oper_params, the requested values must be shifted
      right, not left, to obtain them as integers.  Fix that by using the
      le32_get_bits() macro.
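To illustrate why the shift direction matters, here is a small stand-alone sketch of mask-then-shift-right field extraction. `field_get` is a hypothetical helper written for this example; the kernel's le32_get_bits() additionally converts the value from little-endian before extracting the field.

```c
#include <stdint.h>

/*
 * Extract the field selected by `mask` from `value` and return it as a
 * plain integer. The mask is expected to be a contiguous run of set bits.
 */
static uint32_t field_get(uint32_t value, uint32_t mask)
{
    unsigned int shift = 0;

    if (mask == 0)
        return 0;

    /* The position of the lowest set mask bit is the right-shift amount. */
    while (((mask >> shift) & 1u) == 0)
        shift++;

    return (value & mask) >> shift;
}
```

Shifting left instead would move the masked bits further away from bit 0, so the result would not be the field's integer value.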
      
      Fixes: 41cbb0f5 ("mac80211: add support for HE")
      Signed-off-by: Naftali Goldstein <naftali.goldstein@intel.com>
      [converted to use le32_get_bits()]
      Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      77cbbc35
    • Shaul Triebitz's avatar
      mac80211: support reporting 0-length PSDU in radiotap · c3d1f875
      Shaul Triebitz authored
      For certain sounding frames, it may be useful to report them
      to userspace even though they don't have a PSDU in order to
      determine the PHY parameters (e.g. VHT rate/stream config.)
      Add support for this to mac80211.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
      Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      c3d1f875
    • Alexander Wetzel's avatar
      mac80211: Fix PTK rekey freezes and clear text leak · 62872a9b
      Alexander Wetzel authored
      Rekeying PTK keys without "Extended Key ID for Individually Addressed
      Frames" used a procedure not suitable for replacing in-use keys and
      could cause the following issues:
      
       1) Freeze caused by incoming frames:
          If the local STA installed the key prior to the remote STA we still
          had the old key active in the hardware when mac80211 switched over
          to the new key.
          Therefore there was a window where the card could hand over frames
          decoded with the old key to mac80211 and bump the new PN (IV) value
          to an incorrect high number. When it happened the local replay
          detection silently started to drop all frames sent with the new key.
      
       2) Freeze caused by outgoing frames:
          If mac80211 was providing the PN (IV) and handed over a clear text
          frame for encryption to the hardware prior to a key change the
          driver/card could have processed the queued frame after switching
          to the new key. This bumped the PN value on the remote STA to an
          incorrect high number, tricking the remote STA into discarding all
          frames we sent later.
      
       3) Freeze caused by RX aggregation reorder buffer:
          An aggregation session started with the old key and ending after the
          switch to the new key also bumped the PN to an incorrect high number,
          freezing the connection quite similar to 1).
      
       4) Freeze caused by repeating lost frames in an aggregation session:
          A driver could repeat a lost frame and encrypt it with the new key
          while in a TX aggregation session without updating the PN for the
          new key. This also could freeze connections similar to 2).
      
       5) Clear text leak:
          Removing encryption offload from the card cleared the encryption
          offload flag only after the card had deleted the key and we did not
          stop TX during the rekey. The driver/card could therefore get
          unencrypted frames from mac80211 while no longer being instructed
          to encrypt them.
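The freezes in 1) to 4) share one mechanism: a packet number belonging to the old key is counted against the new key's replay counter. The sketch below models a strictly increasing PN check in a simplified form; `struct rx_key` and `replay_check` are hypothetical names, and real replay detection tracks counters per TID and per key.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical receive-side key state. */
struct rx_key {
    uint64_t last_pn; /* highest packet number accepted so far */
};

/* Accept a frame only if its PN is strictly greater than the last one. */
static bool replay_check(struct rx_key *key, uint64_t pn)
{
    if (pn <= key->last_pn)
        return false; /* treated as a replay and dropped */

    key->last_pn = pn;
    return true;
}
```

If a frame decrypted with the old key (say PN 5000) is accounted against a freshly installed key, every legitimate frame for the new key, whose PNs start again at low values, fails this check until the sender's PN passes 5000.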
      
      To prevent those issues the key install logic has been changed:
       - mac80211 drivers known to be able to rekey PTK0 keys have to set
         @NL80211_EXT_FEATURE_CAN_REPLACE_PTK0,
       - mac80211 stops queuing frames depending on the key during the replace
       - the key is first replaced in the hardware and after that in mac80211
       - and mac80211 stops/blocks new aggregation sessions during the rekey.
      
      For drivers not setting
      @NL80211_EXT_FEATURE_CAN_REPLACE_PTK0 the user space must avoid PTK
      rekeys if "Extended Key ID for Individually Addressed Frames" is not
      being used. Rekeys for mac80211 drivers without this flag will generate a
      warning and use an extra call to ieee80211_flush_queues() to both
      highlight and try to prevent the issues with not updated drivers.
      
      The core of the fix changes the key install procedure from:
       - atomic switch over to the new key in mac80211
       - remove the old key in the hardware (stops encryption offloading, fall
         back to software encryption with a potential clear text packet leak
         in between)
       - delete the inactive old key in mac80211
       - enable hardware encryption offloading for the new key
      to:
       - if it's a PTK mark the old key as tainted to drop TX frames with the
         outgoing key
       - replace the key in hardware with the new one
       - atomic switch over to the new (not marked as tainted) key in
         mac80211 (which also resumes TX)
       - delete the inactive old key in mac80211
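The new ordering can be sketched as a small state model. Everything here is illustrative: `struct key`, `struct link_state`, `tx_allowed` and `replace_ptk` are hypothetical names, and the `tx_was_blocked_during_swap` field exists only to make the intermediate state observable in this example.

```c
#include <stdbool.h>
#include <stddef.h>

struct key {
    int id;
    bool tainted; /* set on the outgoing key to drop TX frames using it */
};

struct link_state {
    struct key *sw_key; /* key the software stack selects for TX/RX */
    struct key *hw_key; /* key currently programmed into the hardware */
    bool tx_was_blocked_during_swap; /* recorded for illustration only */
};

/* TX may proceed only with a key that is present and not tainted. */
static bool tx_allowed(const struct link_state *st)
{
    return st->sw_key != NULL && !st->sw_key->tainted;
}

static void replace_ptk(struct link_state *st, struct key *new_key)
{
    struct key *old = st->sw_key;

    if (old)
        old->tainted = true; /* 1. drop TX frames with the outgoing key */

    st->hw_key = new_key;    /* 2. replace the key in hardware first */
    st->tx_was_blocked_during_swap = !tx_allowed(st);

    st->sw_key = new_key;    /* 3. switch the software side; TX resumes */

    /* 4. the old key is now inactive and can be deleted by the caller */
}
```

The point of the ordering is that no frame can be queued for transmission between the hardware swap and the software switch-over, which the recorded flag reflects.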
      
      With the new sequence the hardware will be unable to decrypt frames
      encrypted with the old key prior to switching to the new key in mac80211
      and thus prevents PNs from packets decrypted with the old key from being
      accounted against the new key.
      
      For that to work the drivers have to provide a clear boundary.
      mac80211 drivers setting @NL80211_EXT_FEATURE_CAN_REPLACE_PTK0 confirm
      that they provide it, and mac80211 will then be able to correctly rekey
      in-use PTK keys with those drivers.
      
      The mac80211 requirements for drivers to set the flag have been added to
      the "Hardware crypto acceleration" documentation section. It boils down
      to:
      The drivers must not hand over frames decrypted with the old key to
      mac80211 once the call to set_key() with %DISABLE_KEY has been
      completed. It's allowed to either drop or continue to use the old key
      for any outgoing frames which are already in the queues, but it must not
      send out any of them unencrypted or encrypted with the new key.
      
      Even with the new boundary in place, aggregation sessions with the
      reorder buffer are problematic:
      An RX aggregation session started prior to and completed after the
      rekey could still dump frames received with the old key at mac80211
      after it switched over to the new key. This is sidestepped by stopping
      all (RX and TX) aggregation sessions when replacing a PTK key while
      hardware key offloading is in use.
      Stopping TX aggregation sessions avoids the need to get
      the PNs (IVs) updated in frames prepared for the old key and
      (re)transmitted after the switch to the new key. As a bonus it improves
      the compatibility when the remote STA is not handling rekeys as it
      should.
      
      When using software crypto, aggregation sessions are not stopped.
      mac80211 will not be able to decode the dangerous frames and will
      discard them without special handling.
      Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
      [trim overly long rekey warning]
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      62872a9b
    • Alexander Wetzel's avatar
      nl80211: Add CAN_REPLACE_PTK0 API · 2b815b04
      Alexander Wetzel authored
      Drivers able to correctly replace an in-use key should set
      @NL80211_EXT_FEATURE_CAN_REPLACE_PTK0 to allow the user space (e.g.
      hostapd or wpa_supplicant) to rekey PTK keys.
      
      The user space must detect a PTK rekey attempt and only go ahead with it
      when the driver has set this flag. If the driver does not support the
      feature, the user space must either not replace the PTK key or perform
      a full re-association instead.
      
      Ignoring this flag and continuing to rekey the connection can still work
      but has to be considered insecure and broken. Depending on the driver it
      can leak clear text packets or freeze the connection and is only
      supported to allow the user space to be updated.
      Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
      Reviewed-by: Denis Kenzior <denkenz@gmail.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      2b815b04
    • Shaul Triebitz's avatar
      mac80211: support radiotap L-SIG data · d1332e7b
      Shaul Triebitz authored
      As before with HE, the data needs to be provided by the
      driver in the skb head, since there's not enough space
      in the skb CB.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
      Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      d1332e7b
    • Wen Gong's avatar
      mac80211: Store sk_pacing_shift in ieee80211_hw · 70e53669
      Wen Gong authored
      Make it possible for drivers to adjust the default sk_pacing_shift
      by storing it in the hardware struct.
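As background, and assuming the usual TCP small queues behaviour in which a socket may keep roughly its pacing rate shifted right by the pacing shift queued below the stack, the effect of the shift can be sketched as follows. `queue_budget_bytes` is a hypothetical helper for illustration, not a kernel function.

```c
#include <stdint.h>

/*
 * Approximate number of bytes a socket may have queued in lower layers:
 * the pacing rate (bytes per second) divided by 2^shift.
 */
static uint64_t queue_budget_bytes(uint64_t pacing_rate_bytes_per_sec,
                                   unsigned int pacing_shift)
{
    return pacing_rate_bytes_per_sec >> pacing_shift;
}
```

A shift of 10 corresponds to roughly 1 ms worth of data at the pacing rate and a shift of 8 to roughly 4 ms, so a smaller shift lets a driver that needs deeper queues (for example to build aggregates) hold more data.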
      Signed-off-by: Wen Gong <wgong@codeaurora.org>
      [adjust commit log, move & adjust comment]
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      70e53669