1. 09 Feb, 2018 3 commits
    • net: stmmac: discard disabled flags in interrupt status register · 1b84ca18
      Niklas Cassel authored
      The interrupt status register in both dwmac1000 and dwmac4 ignores
      interrupt enable (for dwmac4) / interrupt mask (for dwmac1000).
      Therefore, if we want to check only the bits that can actually trigger
      an irq, we have to filter the interrupt status register manually.
      
      Commit 0a764db1 ("stmmac: Discard masked flags in interrupt status
      register") fixed this for dwmac1000. Fix the same issue for dwmac4.
      
      Just like commit 0a764db1 ("stmmac: Discard masked flags in
      interrupt status register"), this makes sure that we do not get
      spurious link up/link down prints.
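The filtering described above can be sketched in plain C. This is a minimal illustration, not the driver's code: both helper names are hypothetical, and the sketch assumes an enable-style register where a set bit means the interrupt may fire (the dwmac4 case) and a mask-style register where a set bit means it is suppressed (the dwmac1000 case).

```c
#include <stdint.h>

/* Hypothetical helper: keep only status bits whose interrupt is enabled
 * (enable-style register: 1 = interrupt may fire). */
static inline uint32_t irq_status_filter_enable(uint32_t status, uint32_t enable)
{
	return status & enable;
}

/* Hypothetical helper: drop status bits whose interrupt is masked
 * (mask-style register: 1 = interrupt is suppressed). */
static inline uint32_t irq_status_filter_mask(uint32_t status, uint32_t mask)
{
	return status & ~mask;
}
```

Only the bits that survive the filter would then be examined for link or other events.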
      Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Reset long term map ID counter · faefaa97
      Thomas Falcon authored
      When allocating RX or TX buffer pools, the driver needs to provide a
      unique mapping ID to firmware for each pool. This value is assigned
      using a counter which is incremented after a new pool is created. The
      ID can be an integer ranging from 1-255. When migrating to a device
      that requests a different number of queues, this value was not being
      reset properly. As a result, after enough migrations, the counter
      exceeded the upper bound and pool creation failed. This is fixed by
      resetting the counter to one in this case.
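A bounded ID counter with an explicit reset could look like the sketch below. The struct and function names are hypothetical and do not come from the ibmvnic driver; only the 1-255 range and the reset-to-one behaviour are taken from the text above.

```c
#define MAP_ID_MIN 1   /* lowest valid ID, per the commit text */
#define MAP_ID_MAX 255 /* highest valid ID, per the commit text */

/* Hypothetical stand-in for the driver state that owns the counter. */
struct map_id_alloc {
	unsigned int next; /* next ID to hand out */
};

/* Restart numbering, e.g. before pools are re-created for a new queue count. */
static inline void map_id_reset(struct map_id_alloc *a)
{
	a->next = MAP_ID_MIN;
}

/* Return the next ID, or -1 once the 1-255 range is exhausted. */
static inline int map_id_next(struct map_id_alloc *a)
{
	if (a->next > MAP_ID_MAX)
		return -1;
	return (int)a->next++;
}
```

Without the reset, repeated pool re-creation keeps consuming IDs until map_id_next() starts returning -1, which mirrors the failure described above.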
      Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 437a4db6
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-02-09
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Two fixes for BPF sockmap in order to break up circular map references
         from programs attached to sockmap, and detaching related sockets in
         case of socket close() event. For the latter we get rid of the
         smap_state_change() and plug into ULP infrastructure, which will later
         also be used for additional features anyway such as TX hooks. For the
         second issue, dependency chain is broken up via map release callback
         to free parse/verdict programs, all from John.
      
      2) Fix a libbpf relocation issue that was found while implementing XDP
         support for the Suricata project. When clang was invoked with the
         default target instead of the bpf target, various other sections
         (e.g. debugging-related ones) were added to the ELF file. These
         contained relocation entries pointing to non-BPF sections, which
         libbpf tripped over instead of skipping. Test cases for libbpf are
         added as well, from Jesper.
      
      3) Various misc fixes for bpftool and one for libbpf: a small addition
         to libbpf to make sure it recognizes all standard section prefixes.
         Then, the Makefile in bpftool/Documentation is improved to explicitly
         check for rst2man being installed on the system as we otherwise risk
         installing empty man pages; the man page for bpftool-map is corrected
         and a set of missing bash completions added in order to avoid shipping
         bpftool where the completions are only partially working, from Quentin.
      
      4) Fix applying the relocation to immediate load instructions in the
         nfp JIT which were missing a shift, from Jakub.
      
      5) Two fixes for the BPF kernel selftests: handle CONFIG_BPF_JIT_ALWAYS_ON=y
         gracefully in the test_bpf.ko module and mark the affected tests as
         FLAG_EXPECTED_FAIL in this case; and explicitly delete the veth devices
         in the two tests test_xdp_{meta,redirect}.sh before dismantling the
         netnses. When selftests are run in batch mode, the workqueue that
         handles destruction might not have finished yet, and veth creation in
         the next test under the same dev name would then fail, from Yonghong.
      
      6) Fix test_kmod.sh to check the test_bpf.ko module path before performing
         an insmod, and fall back to modprobe. The latter is especially useful
         when the device under test has the modules installed instead,
         from Naresh.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 08 Feb, 2018 37 commits
    • Merge branch 'bpf-libbpf-relo-fix-and-tests' · d977ae59
      Daniel Borkmann authored
      Jesper Dangaard Brouer says:
      
      ====================
      While playing with using libbpf for the Suricata project, we had
      issues with LLVM >= 4.0.1 generating ELF files that could not be
      loaded with libbpf (tools/lib/bpf/).
      
      During the troubleshooting phase, I wrote a test program and improved
      the debugging output in libbpf.  I turned this into a selftests
      program, and it also serves as a code example for libbpf in itself.
      
      I discovered that there are at least three ELF load issues with
      libbpf.  I left them as TODO comments in (tools/testing/selftests/bpf)
      test_libbpf.sh. I've only fixed the load issue with eh_frames, and
      other types of relo-section that do not have exec flags.  We can
      work on the other issues later.
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • tools/libbpf: handle issues with bpf ELF objects containing .eh_frames · e3d91b0c
      Jesper Dangaard Brouer authored
      V3: More generic skipping of relo-section (suggested by Daniel)
      
      If clang >= 4.0.1 is missing the option '-target bpf', it will cause
      llc/llvm to create two ELF sections for "Exception Frames", with
      section names '.eh_frame' and '.rel.eh_frame'.
      
      The BPF ELF loader library libbpf fails when loading files with these
      sections.  The other in-kernel BPF ELF loader, in samples/bpf/bpf_load.c,
      handles this gracefully. The iproute2 loader also seems to work with
      these "eh" sections.
      
      The issue in libbpf is caused by bpf_object__elf_collect() skipping
      some sections, and later when performing relocation it will be
      pointing to a skipped section, as these sections cannot be found by
      bpf_object__find_prog_by_idx() in bpf_object__collect_reloc().
      
      This is a general issue that also occurs for other sections, like
      debug sections, which are also skipped and can have relo-sections.

      As suggested by Daniel, to avoid keeping state about all skipped
      sections, instead perform a direct lookup in the ELF object.  Look up
      the section that the relo-section points to and check if it contains
      executable machine instructions (denoted by the sh_flags
      SHF_EXECINSTR).  Use this check to also skip irrelevant relo-sections.
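The check described above can be sketched against raw ELF section headers. This is an illustration under assumptions, not libbpf's code: the function name is hypothetical, it works on an in-memory Elf64_Shdr array from <elf.h> rather than through an ELF library, and it only handles SHT_REL sections, whose sh_info field holds the index of the section the relocations apply to.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if relocation section `rel` applies to a section that holds
 * executable instructions. For SHT_REL sections, sh_info is the index of
 * the section the relocations apply to.
 */
static inline bool relo_targets_exec(const Elf64_Shdr *shdrs, size_t nr_shdrs,
				     const Elf64_Shdr *rel)
{
	if (rel->sh_type != SHT_REL || rel->sh_info >= nr_shdrs)
		return false;
	return (shdrs[rel->sh_info].sh_flags & SHF_EXECINSTR) != 0;
}
```

A loader using such a check would skip a relo-section like '.rel.eh_frame', because '.eh_frame' does not carry SHF_EXECINSTR, without having to remember which sections it skipped earlier.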
      
      Note, for samples/bpf/ the '-target bpf' parameter to clang cannot be used
      due to incompatibility with asm embedded headers that some of the samples
      include. This is explained in more detail by Yonghong Song in bpf_devel_QA.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • selftests/bpf: add selftest that use test_libbpf_open · f09b2e38
      Jesper Dangaard Brouer authored
      This script test_libbpf.sh will be part of the 'make run_tests'
      invocation, but can also be invoked manually in this directory,
      and a verbose mode can be enabled via setting the environment
      variable $VERBOSE like:
      
       $ VERBOSE=yes ./test_libbpf.sh
      
      The script contains some tests that are commented out, as they
      currently fail.  They are reminders about what we need to improve
      for the libbpf loader library.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • selftests/bpf: add test program for loading BPF ELF files · 864db336
      Jesper Dangaard Brouer authored
      V2: Moved program into selftests/bpf from tools/libbpf
      
      This program can be used on its own for testing/debugging if a
      BPF ELF-object file can be loaded with libbpf (from tools/lib/bpf).
      
      If something is wrong with the ELF object, the program has
      a --debug mode that will display the ELF sections and especially
      the skipped sections.  This allows for quickly identifying the
      problematic ELF section number, which can be correlated with the
      readelf tool.

      The program signals errors via return codes, and also has
      a --quiet mode, which is practical for use in scripts like
      selftests/bpf.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • tools/libbpf: improve the pr_debug statements to contain section numbers · 077c066a
      Jesper Dangaard Brouer authored
      While debugging a bpf ELF loading issue, I needed to correlate the
      ELF section number with the failed relocation section reference.
      Thus, add section numbers/index to the pr_debug.
      
      In debug mode, also print sections that were skipped.  This helped
      me identify that a section (.eh_frame) was skipped, and this was
      the reason the relocation section (.rel.eh_frame) could not find
      that section number.

      The section numbers correspond to the readelf tool's Section Headers [Nr].
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: Sync kernel ABI header with tooling header for bpf_common.h · 8c88181e
      Jesper Dangaard Brouer authored
      I recently fixed up a lot of commits that forgot to keep the tooling
      headers in sync.  And then I forgot to do the same thing in commit
      cb5f7334 ("bpf: add comments to BPF ld/ldx sizes"). Let's correct
      that before people notice ;-).
      
      Lawrence did partly fix/sync this for bpf.h in commit d6d4f60c
      ("bpf: add selftest for tcpbpf").
      
      Fixes: cb5f7334 ("bpf: add comments to BPF ld/ldx sizes")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • net: phy: fix phy_start to consider PHY_IGNORE_INTERRUPT · 08f51385
      Heiner Kallweit authored
      This condition wasn't adjusted when PHY_IGNORE_INTERRUPT (-2) was added
      long ago. In case of PHY_IGNORE_INTERRUPT the MAC interrupt also
      indicates PHY state changes and we should do what the symbol says.
      
      Fixes: 84a527a4 ("net: phylib: fix interrupts re-enablement in phy_start")
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: thunder: change q_len's type to handle max ring size · 88c991a9
      Dean Nelson authored
      The Cavium thunder nicvf driver supports rx/tx rings of up to 65536 entries each.
      The number of entries is stored in the q_len member of struct q_desc_mem. The
      problem is that q_len, being a u16, turns 65536 into 0.
      
      In getting pointers to descriptors in the rings, the driver uses q_len minus 1
      as a mask after incrementing the pointer, in order to go back to the beginning
      and not go past the end of the ring.
      
      With the q_len set to 0 the mask is no longer correct and the driver does go
      beyond the end of the ring, causing various ills. Usually the first thing that
      shows up is a "NETDEV WATCHDOG: enP2p1s0f1 (nicvf): transmit queue 7 timed out"
      warning.
      
      This patch remedies the problem by changing q_len to a u32.
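The truncation and its effect on the wrap-around mask can be reproduced in a few lines of standalone C. The helper names below are hypothetical and the code is not taken from the nicvf driver; it only illustrates the arithmetic described above.

```c
#include <stdint.h>

/* Ring length as it ends up after being stored in a 16-bit field. */
static inline uint32_t ring_len_as_u16(uint32_t requested)
{
	uint16_t q_len = (uint16_t)requested; /* 65536 truncates to 0 */

	return q_len;
}

/* Advance a ring index and wrap it with the (q_len - 1) mask. */
static inline uint32_t ring_next(uint32_t idx, uint32_t q_len)
{
	return (idx + 1) & (q_len - 1);
}
```

With q_len equal to 65536 the mask is 0xFFFF and index 65535 wraps back to 0. With the truncated value 0 the mask becomes 0xFFFFFFFF, so the index simply keeps growing past the end of the ring.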
      Signed-off-by: Dean Nelson <dnelson@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'wireless-drivers-next-for-davem-2018-02-08' of... · e0c42c8e
      David S. Miller authored
      Merge tag 'wireless-drivers-next-for-davem-2018-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
      
      Kalle Valo says:
      
      ====================
      wireless-drivers-next patches for 4.16
      
      The most important here is the ssb fix, it has been reported by the
      users frequently and the fix just missed the final v4.15. Also
      numerous other fixes, mt76 had multiple problems with aggregation and
      a long standing unaligned access bug in rtlwifi is finally fixed.
      
      Major changes:
      
      ath10k
      
      * correct firmware RAM dump length for QCA6174/QCA9377
      
      * add new QCA988X device id
      
      * fix a kernel panic during pci probe
      
      * revert a recent commit which broke ath10k firmware metadata parsing
      
      ath9k
      
      * fix a noise floor regression introduced during the merge window
      
      * add new device id
      
      rtlwifi
      
      * fix unaligned access seen on ARM architecture
      
      mt76
      
      * various aggregation fixes which fix connection stalls
      
      ssb
      
      * fix b43 and b44 on non-MIPS which broke in v4.15-rc9
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix skb truesize/datasize ratio control · 55b3280d
      Hoang Le authored
      In commit d618d09a ("tipc: enforce valid ratio between skb truesize
      and contents") we introduced a test for ensuring that the condition
      truesize/datasize <= 4 is true for a received buffer. Unfortunately this
      test has two problems.
      
      - Because of integer arithmetic, the test
        if (skb->truesize / buf_roundup_len(skb) > 4) will miss all
        ratios [4 < ratio < 5], which was not the intention.
      - The buffer returned by skb_copy() inherits skb->truesize of the
        original buffer, which doesn't help the situation at all.
      
      In this commit, we change the ratio condition and replace skb_copy()
      with a call to skb_copy_expand() to finally get this right.
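The first problem is a plain integer-division effect and can be shown outside the kernel. In the sketch below both function names are hypothetical, and the multiplication form is just one way to avoid the truncation; it is not claimed to be the condition the patch actually uses.

```c
#include <stdbool.h>
#include <stddef.h>

/* Truncating form: 450 / 100 == 4, so ratios in (4, 5) are not flagged.
 * datasize is assumed to be non-zero. */
static inline bool ratio_exceeds_4_div(size_t truesize, size_t datasize)
{
	return truesize / datasize > 4;
}

/* Multiplication form: flags every ratio strictly above 4. */
static inline bool ratio_exceeds_4_mul(size_t truesize, size_t datasize)
{
	return truesize > 4 * datasize;
}
```

For truesize 450 and datasize 100 the ratio is 4.5: the division form lets the buffer through while the multiplication form flags it.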
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: cls_u32: fix cls_u32 on filter replace · eb53f7af
      Ivan Vecera authored
      The following sequence is currently broken:
      
       # tc qdisc add dev foo ingress
       # tc filter replace dev foo protocol all ingress \
         u32 match u8 0 0 action mirred egress mirror dev bar1
       # tc filter replace dev foo protocol all ingress \
         handle 800::800 pref 49152 \
         u32 match u8 0 0 action mirred egress mirror dev bar2
       Error: cls_u32: Key node flags do not match passed flags.
       We have an error talking to the kernel, -1
      
      The error comes from u32_change() when comparing new and
      existing flags. The existing ones always contain one of the
      TCA_CLS_FLAGS_{,NOT}_IN_HW flags, depending on offloading state.
      These flags cannot be passed from userspace, so the check
      (n->flags != flags) in u32_change() always rejects the request.
      
      Fix the condition so the flags TCA_CLS_FLAGS_NOT_IN_HW and
      TCA_CLS_FLAGS_IN_HW are not taken into account.
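A comparison that ignores the two kernel-managed status bits might look like the following sketch. The DEMO_* names and bit positions are placeholders chosen for illustration, not the definitions from the kernel headers, and flags_match() is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values; not copied from the kernel headers. */
#define DEMO_FLAG_SKIP_HW   (1u << 0)
#define DEMO_FLAG_SKIP_SW   (1u << 1)
#define DEMO_FLAG_IN_HW     (1u << 2)
#define DEMO_FLAG_NOT_IN_HW (1u << 3)

/* Bits set by the kernel to report offload state, never by userspace. */
#define DEMO_HW_STATE_MASK (DEMO_FLAG_IN_HW | DEMO_FLAG_NOT_IN_HW)

/* Compare existing and requested flags, ignoring the offload status bits. */
static inline bool flags_match(uint32_t existing, uint32_t requested)
{
	return (existing & ~DEMO_HW_STATE_MASK) ==
	       (requested & ~DEMO_HW_STATE_MASK);
}
```

With this form, an existing node carrying only the "not in hardware" status bit matches a replace request that passes no flags at all.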
      
      Fixes: 24d3dc6d ("net/sched: cls_u32: Reflect HW offload status")
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mpls, nospec: Sanitize array index in mpls_label_ok() · 3968523f
      Dan Williams authored
      mpls_label_ok() validates that the 'platform_label' array index from a
      userspace netlink message payload is valid. Under speculation the
      mpls_label_ok() result may not resolve in the CPU pipeline until after
      the index is used to access an array element. Sanitize the index to zero
      to prevent userspace-controlled arbitrary out-of-bounds speculation, a
      precursor for a speculative execution side channel vulnerability.
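The idea of sanitizing an index to zero without relying on a branch can be illustrated in userspace C. In the kernel this job is done by array_index_nospec() from <linux/nospec.h>; the two helpers below are hypothetical stand-ins that only show the masking arithmetic. They assume size does not exceed LONG_MAX, a two's-complement conversion to long, and an arithmetic right shift of negative values (true for common compilers but implementation-defined in C), and they make no claim about what a compiler or CPU does speculatively with this code.

```c
#include <limits.h>

/* All-ones when index < size, zero otherwise, computed without a branch.
 * Assumes size <= LONG_MAX and an arithmetic right shift. */
static inline unsigned long index_mask(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

/* Out-of-range indexes collapse to 0; in-range indexes pass through. */
static inline unsigned long index_clamp(unsigned long index, unsigned long size)
{
	return index & index_mask(index, size);
}
```

Because the result is forced to 0 for any out-of-range value, a later array access can only touch element 0 even if the preceding bounds check has not yet resolved.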
      
      Cc: <stable@vger.kernel.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and... · ebeeb1ad
      Sowmini Varadhan authored
      rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds connection/workq management
      
      An rds_connection can get added during netns deletion between lines 528
      and 529 of
      
        506 static void rds_tcp_kill_sock(struct net *net)
        :
        /* code to pull out all the rds_connections that should be destroyed */
        :
        528         spin_unlock_irq(&rds_tcp_conn_lock);
        529         list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node)
        530                 rds_conn_destroy(tc->t_cpath->cp_conn);
      
      Such an rds_connection would miss the rds_conn_destroy()
      loop (that cancels all pending work) and (if it was scheduled
      after netns deletion) could trigger a use-after-free.
      
      A similar race-window exists for the module unload path
      in rds_tcp_exit -> rds_tcp_destroy_conns
      
      Concurrency with netns deletion (rds_tcp_kill_sock()) must be handled
      by checking check_net() before enqueuing new work or adding new
      connections.
      
      Concurrency with module-unload is handled by maintaining a module
      specific flag that is set at the start of the module exit function,
      and must be checked before enqueuing new work or adding new connections.
      
      This commit refactors existing RDS_DESTROY_PENDING checks added by
      commit 3db6e0d1 ("rds: use RCU to synchronize work-enqueue with
      connection teardown") and consolidates all the concurrency checks
      listed above into the function rds_destroy_pending().
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Whitelist the skbuff_head_cache "cb" field · 79a8a642
      Kees Cook authored
      Most callers of put_cmsg() use a "sizeof(foo)" for the length argument.
      Within put_cmsg(), a copy_to_user() call is made with a dynamic size, as a
      result of the cmsg header calculations. This means that hardened usercopy
      will examine the copy, even though it was technically a fixed size and
      should be implicitly whitelisted. All the put_cmsg() calls being built
      from values in skbuff_head_cache are coming out of the protocol-defined
      "cb" field, so whitelist this field entirely instead of creating per-use
      bounce buffers, for which there are concerns about performance.
      
      Original report was:
      
      Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLAB object 'skbuff_head_cache' (offset 64, size 16)!
      WARNING: CPU: 0 PID: 3663 at mm/usercopy.c:81 usercopy_warn+0xdb/0x100 mm/usercopy.c:76
      ...
       __check_heap_object+0x89/0xc0 mm/slab.c:4426
       check_heap_object mm/usercopy.c:236 [inline]
       __check_object_size+0x272/0x530 mm/usercopy.c:259
       check_object_size include/linux/thread_info.h:112 [inline]
       check_copy_size include/linux/thread_info.h:143 [inline]
       copy_to_user include/linux/uaccess.h:154 [inline]
       put_cmsg+0x233/0x3f0 net/core/scm.c:242
       sock_recv_errqueue+0x200/0x3e0 net/core/sock.c:2913
       packet_recvmsg+0xb2e/0x17a0 net/packet/af_packet.c:3296
       sock_recvmsg_nosec net/socket.c:803 [inline]
       sock_recvmsg+0xc9/0x110 net/socket.c:810
       ___sys_recvmsg+0x2a4/0x640 net/socket.c:2179
       __sys_recvmmsg+0x2a9/0xaf0 net/socket.c:2287
       SYSC_recvmmsg net/socket.c:2368 [inline]
       SyS_recvmmsg+0xc4/0x160 net/socket.c:2352
       entry_SYSCALL_64_fastpath+0x29/0xa0
      
      Reported-by: syzbot+e2d6cfb305e9f3911dea@syzkaller.appspotmail.com
      Fixes: 6d07d1cd ("usercopy: Restrict non-usercopy caches to size 0")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Extra '_get' in declaration of arch_get_platform_mac_address · e728789c
      Mathieu Malaterre authored
      In commit c7f5d105 ("net: Add eth_platform_get_mac_address() helper."),
      two declarations were added:
      
        int eth_platform_get_mac_address(struct device *dev, u8 *mac_addr);
        unsigned char *arch_get_platform_get_mac_address(void);
      
      An extra '_get' was introduced in arch_get_platform_get_mac_address, remove
      it. Fix compile warning using W=1:
      
        CC      net/ethernet/eth.o
      net/ethernet/eth.c:523:24: warning: no previous prototype for ‘arch_get_platform_mac_address’ [-Wmissing-prototypes]
       unsigned char * __weak arch_get_platform_mac_address(void)
                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        AR      net/ethernet/built-in.o
      Signed-off-by: Mathieu Malaterre <malat@debian.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: queue reset when CRQ gets closed during reset · ec95dffa
      Nathan Fontenot authored
      While handling a driver reset we can get an H_CLOSED return when trying
      to send a CRQ event. When this occurs we need to queue up another
      reset attempt. Without doing this we see instances where the driver
      is left in a closed state because the reset failed and there are no
      further attempts to reset the driver.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • atm: he: use 64-bit arithmetic instead of 32-bit · 583133b3
      Gustavo A. R. Silva authored
      Add suffix ULL to constants 272, 204, 136 and 68 in order to give the
      compiler complete information about the proper arithmetic to use.
      Notice that these constants are used in contexts that expect
      expressions of type unsigned long long (64 bits, unsigned).
      
      The following expressions are currently being evaluated using 32-bit
      arithmetic:
      
      272 * mult
      204 * mult
      136 * mult
      68 * mult
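The difference the suffix makes can be seen in a small standalone example. The function names are hypothetical, and the sketch assumes mult is a 32-bit unsigned value and that int is 32 bits wide; the type of mult in the driver is not stated above.

```c
#include <stdint.h>

/* 32-bit multiply: the product wraps modulo 2^32 before it is widened. */
static inline unsigned long long scale_32(uint32_t mult)
{
	return 272 * mult;
}

/* 64-bit multiply: the ULL suffix widens the operands first. */
static inline unsigned long long scale_64(uint32_t mult)
{
	return 272ULL * mult;
}
```

For mult = 20000000 the true product is 5440000000, which does not fit in 32 bits: scale_32() yields 1145032704 while scale_64() yields the full value.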
      
      Addresses-Coverity-ID: 201058
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rtnetlink: require unique netns identifier · 4ff66cae
      Christian Brauner authored
      Since we've added support for IFLA_IF_NETNSID for RTM_{DEL,GET,SET,NEW}LINK
      it is possible for userspace to send us requests with three different
      properties to identify a target network namespace. This affects at least
      RTM_{NEW,SET}LINK. Each of them could potentially refer to a different
      network namespace which is confusing. For legacy reasons the kernel will
      pick the IFLA_NET_NS_PID property first and then look for the
      IFLA_NET_NS_FD property but there is no reason to extend this type of
      behavior to network namespace ids. The regression potential is quite
      minimal since the rtnetlink requests in question either won't allow
      IFLA_IF_NETNSID requests before 4.16 is out (RTM_{NEW,SET}LINK) or don't
      support IFLA_NET_NS_{PID,FD} (RTM_{DEL,GET}LINK) in the first place.
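One way to read the rule described above is "a request may carry at most one of the three identifying attributes". The sketch below encodes that reading only; the struct, the function name and the choice of -EINVAL as the error code are all assumptions made for illustration and are not taken from the rtnetlink code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical view of which netns-identifying attributes a request carries. */
struct netns_attrs {
	bool has_pid;     /* IFLA_NET_NS_PID present */
	bool has_fd;      /* IFLA_NET_NS_FD present */
	bool has_netnsid; /* IFLA_IF_NETNSID present */
};

/* Return 0 if at most one identifier is present, -EINVAL otherwise.
 * The error code is a placeholder chosen for this sketch. */
static inline int check_unique_netns(const struct netns_attrs *a)
{
	int n = a->has_pid + a->has_fd + a->has_netnsid;

	return n > 1 ? -EINVAL : 0;
}
```

Rejecting ambiguous requests up front avoids having to define a precedence order among the identifiers.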
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Acked-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tuntap: add missing xdp flush · 762c330d
      Jason Wang authored
      When using devmap to redirect packets between interfaces,
      xdp_do_flush() is usually a must to flush any batched
      packets. Unfortunately this is missing in the current tuntap
      implementation.

      Most hardware drivers run XDP inside the NAPI loop and call
      xdp_do_flush() at the end of each round of poll. TAP instead runs it
      in process context, e.g. tun_get_user(). So fix this by counting the
      pending redirected packets and flushing when the count exceeds
      NAPI_POLL_WEIGHT or MSG_MORE was cleared by the sendmsg() caller.
      
      With this fix, xdp_redirect_map works again between two TAPs.
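The count-then-flush policy can be modelled without any kernel types. Everything in this sketch is hypothetical: the struct, the function, and DEMO_FLUSH_LIMIT, which stands in for NAPI_POLL_WEIGHT with an assumed value of 64. The flush itself is only counted, where a driver would call its real flush routine.

```c
#include <stdbool.h>

#define DEMO_FLUSH_LIMIT 64 /* stand-in for NAPI_POLL_WEIGHT; value assumed */

struct redirect_batch {
	unsigned int pending; /* redirected packets not yet flushed */
	unsigned int flushes; /* number of flushes issued so far */
};

/* Account one redirected packet. Flush when the batch exceeds the limit or
 * when the sender signals that no more data follows (MSG_MORE cleared).
 * Returns true if a flush was issued. */
static inline bool batch_account(struct redirect_batch *b, bool more_coming)
{
	b->pending++;
	if (b->pending > DEMO_FLUSH_LIMIT || !more_coming) {
		b->flushes++; /* a real driver would flush here */
		b->pending = 0;
		return true;
	}
	return false;
}
```

The two triggers bound both latency (the last packet of a burst is flushed immediately) and batch size (a long burst is flushed periodically).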
      
      Fixes: 761876c8 ("tap: XDP support")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: ensure to loop over all netns in genlmsg_multicast_allns() · cb9f7a9a
      Nicolas Dichtel authored
      Nowadays, nlmsg_multicast() returns only 0 or -ESRCH but this was not the
      case when commit 134e6375 was pushed.
      However, there was no reason to stop the loop if a netns does not have
      listeners.
      Return -ESRCH only if there were no listeners in any netns.
      
      To avoid having the same problem in the future, I didn't take the
      assumption that nlmsg_multicast() returns only 0 or -ESRCH.
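The intended loop behaviour can be sketched over an array of per-namespace results. The function name and the array-of-return-codes model are hypothetical, and treating any error other than -ESRCH as fatal is one reading of the last paragraph above rather than a statement about the actual patch.

```c
#include <errno.h>
#include <stddef.h>

/* results[i] models the return code of the multicast in netns i:
 * 0 = delivered, -ESRCH = no listeners, anything else = hard error. */
static inline int multicast_all(const int *results, size_t n)
{
	int delivered = 0;

	for (size_t i = 0; i < n; i++) {
		if (results[i] == 0)
			delivered = 1;
		else if (results[i] != -ESRCH)
			return results[i]; /* unexpected error: stop */
	}
	return delivered ? 0 : -ESRCH;
}
```

A namespace without listeners no longer ends the loop; -ESRCH is reported only when no namespace accepted the message.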
      
      Fixes: 134e6375 ("genetlink: make netns aware")
      CC: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Don't put crypto buffers on the stack · 8c2f826d
      David Howells authored
      Don't put buffers of data to be handed to crypto on the stack as this may
      cause an assertion failure in the kernel (see below).  Fix this by using a
      kmalloc'd buffer instead.
      
      kernel BUG at ./include/linux/scatterlist.h:147!
      ...
      RIP: 0010:rxkad_encrypt_response.isra.6+0x191/0x1b0 [rxrpc]
      RSP: 0018:ffffbe2fc06cfca8 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff989277d59900 RCX: 0000000000000028
      RDX: 0000259dc06cfd88 RSI: 0000000000000025 RDI: ffffbe30406cfd88
      RBP: ffffbe2fc06cfd60 R08: ffffbe2fc06cfd08 R09: ffffbe2fc06cfd08
      R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff7c5f80d9f95
      R13: ffffbe2fc06cfd88 R14: ffff98927a3f7aa0 R15: ffffbe2fc06cfd08
      FS:  0000000000000000(0000) GS:ffff98927fc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055b1ff28f0f8 CR3: 000000001b412003 CR4: 00000000003606f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       rxkad_respond_to_challenge+0x297/0x330 [rxrpc]
       rxrpc_process_connection+0xd1/0x690 [rxrpc]
       ? process_one_work+0x1c3/0x680
       ? __lock_is_held+0x59/0xa0
       process_one_work+0x249/0x680
       worker_thread+0x3a/0x390
       ? process_one_work+0x680/0x680
       kthread+0x121/0x140
       ? kthread_create_worker_on_cpu+0x70/0x70
       ret_from_fork+0x3a/0x50
      Reported-by: Jonathan Billings <jsbillings@jsbillings.org>
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Jonathan Billings <jsbillings@jsbillings.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge ath-current from git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git · 99ffd198
      Kalle Valo authored
      ath.git fixes for 4.16. Major changes:
      
      ath10k
      
      * correct firmware RAM dump length for QCA6174/QCA9377
      
      * add new QCA988X device id
      
      * fix a kernel panic during pci probe
      
      * revert a recent commit which broke ath10k firmware metadata parsing
      
      ath9k
      
      * fix a noise floor regression introduced during the merge window
      
      * add new device id
    • Merge branch 'nfp-fix-disabling-TC-offloads-in-flower-max-TSO-segs-and-module-version' · c7025586
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      nfp: fix disabling TC offloads in flower, max TSO segs and module version
      
      This set corrects the way nfp deals with the NETIF_F_HW_TC flag.
      It slipped through review that flower offload does not currently
      refuse disabling this flag when filter offload is active.
      
      nfp's flower offload does not actually keep track of how many filters
      for each port are offloaded.  The accounting of the number of filters
      is added to the nfp core structures, and BPF moved to use these
      structures as well.
      
      If users are allowed to disable TC offloads while filters are active,
      not only is it incorrect behaviour, but actually the NFP will never
      be told to remove the flows, leading to use-after-free when stats
      arrive.
      
      Fourth patch makes sure we declare the max number of TSO segments.
      FW should drop longer packets cleanly (otherwise this would be a
      security problem for untrusted VFs) but dropping longer TSO frames
      is not nice and driver should prevent them from being generated.
      
      Last small addition populates MODULE_VERSION with kernel version.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: populate MODULE_VERSION · 1a5e8e35
      Jakub Kicinski authored
      DKMS and similar out-of-tree module replacement services use
      module version to make sure the out-of-tree software is not
      older than the module shipped with the kernel.  We use the
      kernel version in ethtool -i output, put it into MODULE_VERSION
      as well.
      Reported-by: Jan Gutter <jan.gutter@netronome.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: limit the number of TSO segments · 0d592e52
      Jakub Kicinski authored
      Most FWs limit the number of TSO segments a frame can produce
      to 64.  This is for fairness and efficiency (of FW datapath)
      reasons.  If a frame with a larger number of segments is submitted,
      the FW will drop it.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0d592e52
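The limit described in this commit can be modelled outside the kernel. The following is a minimal sketch, assuming a frame is split into ceil(payload / MSS) segments; every `sketch_` name is hypothetical and is not taken from the nfp driver. Only the value 64 comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Limit quoted in the commit message; the macro name is illustrative. */
#define SKETCH_TSO_MAX_SEGS 64u

/* Number of segments a payload splits into at the given MSS (rounded up). */
static unsigned int sketch_tso_segs(size_t payload_len, unsigned int mss)
{
	assert(mss > 0);
	return (unsigned int)((payload_len + mss - 1) / mss);
}

/* True if the frame stays within the declared segment limit. */
static bool sketch_tso_frame_ok(size_t payload_len, unsigned int mss)
{
	return sketch_tso_segs(payload_len, mss) <= SKETCH_TSO_MAX_SEGS;
}
```

In the kernel the stack is expected to respect the limit a driver declares on its net_device (the `gso_max_segs` field), so oversized frames are not built in the first place; the check above only illustrates the arithmetic.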
    • Jakub Kicinski's avatar
      nfp: forbid disabling hw-tc-offload on representors while offload active · d692403e
      Jakub Kicinski authored
      All netdevs which can accept TC offloads must implement
      .ndo_set_features().  nfp_reprs currently do not do that, which
      means hw-tc-offload can be turned on and off even when offloads
      are active.
      
      Whether the offloads are active is really a question to nfp_ports,
      so remove the per-app tc_busy callback indirection thing, and
      simply count the number of offloaded items in nfp_port structure.
      
      Fixes: 8a276873 ("nfp: provide infrastructure for offloading flower based TC filters")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Tested-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d692403e
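The idea of counting offloaded items per port and refusing to clear hw-tc-offload while the count is non-zero could look roughly like the sketch below. It is a standalone model, not the driver code: the struct, the flag constant and the function name are assumptions made for illustration, and -EBUSY is an assumed choice of error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-in for the NETIF_F_HW_TC feature bit (value is illustrative). */
#define SKETCH_F_HW_TC (1ull << 0)

/* Hypothetical per-port state: a feature mask plus an offload counter. */
struct sketch_port {
	unsigned int tc_offload_cnt; /* number of offloaded items on this port */
	uint64_t features;           /* currently enabled feature bits */
};

/* Model of a set_features handler: reject turning the TC offload flag
 * off while any offloaded item is still counted on the port. */
static int sketch_set_features(struct sketch_port *port, uint64_t wanted)
{
	assert(port);
	if ((port->features & SKETCH_F_HW_TC) &&
	    !(wanted & SKETCH_F_HW_TC) &&
	    port->tc_offload_cnt)
		return -EBUSY;

	port->features = wanted;
	return 0;
}
```

Keeping the counter in the port structure means the check does not need to ask each app whether it is busy, which is the simplification the commit message describes.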
    • Jakub Kicinski's avatar
      nfp: don't advertise hw-tc-offload on non-port netdevs · 0b9de4ca
      Jakub Kicinski authored
      nfp_port is a structure which represents an ASIC port, either a
      PCIe vNIC (on a PF or a VF) or the external MAC port.  vNIC
      netdev (struct nfp_net) and pure representor netdev (struct
      nfp_repr) both have a pointer to this structure.  nfp_reprs
      always have a port associated.  nfp_nets, however, only represent
      a device port in legacy mode, where they are considered the
      MAC port. In switchdev mode they are just the CPU's side of
      the PCIe link.
      
      By definition TC offloads only apply to device ports.  Don't
      set the flag on vNICs without a port (i.e. in switchdev mode).
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Tested-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0b9de4ca
    • Jakub Kicinski's avatar
      nfp: bpf: require ETH table · e3ac6c07
      Jakub Kicinski authored
      Upcoming changes will require all netdevs supporting TC offloads
      to have a full struct nfp_port.  Require those for BPF offload.
      The operation without management FW reporting information about
      Ethernet ports is something we only support for very old and very
      basic NIC firmwares anyway.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Tested-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e3ac6c07
    • Ryan Hsu's avatar
      Revert "ath10k: add sanity check to ie_len before parsing fw/board ie" · 9ce8b24a
      Ryan Hsu authored
      This reverts commit 9ed4f916.
      
      The commit introduced a regression: the IE is over-read, so its
      padding is parsed as part of the payload.
      
      - the expected IE information
      
      ath10k_pci 0000:03:00.0: found firmware features ie (1 B)
      ath10k_pci 0000:03:00.0: Enabling feature bit: 6
      ath10k_pci 0000:03:00.0: Enabling feature bit: 7
      ath10k_pci 0000:03:00.0: features
      ath10k_pci 0000:03:00.0: 00000000: c0 00 00 00 00 00 00 00
      
      - the wrong IE with padding is read (0x77)
      
      ath10k_pci 0000:03:00.0: found firmware features ie (4 B)
      ath10k_pci 0000:03:00.0: Enabling feature bit: 6
      ath10k_pci 0000:03:00.0: Enabling feature bit: 7
      ath10k_pci 0000:03:00.0: Enabling feature bit: 8
      ath10k_pci 0000:03:00.0: Enabling feature bit: 9
      ath10k_pci 0000:03:00.0: Enabling feature bit: 10
      ath10k_pci 0000:03:00.0: Enabling feature bit: 12
      ath10k_pci 0000:03:00.0: Enabling feature bit: 13
      ath10k_pci 0000:03:00.0: Enabling feature bit: 14
      ath10k_pci 0000:03:00.0: Enabling feature bit: 16
      ath10k_pci 0000:03:00.0: Enabling feature bit: 17
      ath10k_pci 0000:03:00.0: Enabling feature bit: 18
      ath10k_pci 0000:03:00.0: features
      ath10k_pci 0000:03:00.0: 00000000: c0 77 07 00 00 00 00 00
      Tested-by: Mike Lothian <mike@fireburn.co.uk>
      Signed-off-by: Ryan Hsu <ryanhsu@codeaurora.org>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      9ce8b24a
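The effect shown in the two logs above can be reproduced with a small standalone decoder. This is a sketch only: the function name is hypothetical, and the sample bytes are taken from the second hex dump and treated as raw IE payload, which is an assumption. Bits are numbered from the least significant bit of the first byte, which matches the bit numbers printed in the logs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Collect the indices of all set bits in the first ie_len bytes of data.
 * Up to out_cap indices are stored in out; the total count is returned. */
static size_t sketch_feature_bits(const uint8_t *data, size_t ie_len,
				  unsigned int *out, size_t out_cap)
{
	size_t n = 0;

	assert(data && out);
	for (size_t bit = 0; bit < ie_len * 8; bit++) {
		if (data[bit / 8] & (1u << (bit % 8))) {
			if (n < out_cap)
				out[n] = (unsigned int)bit;
			n++;
		}
	}
	return n;
}
```

With the bytes `c0 77 07 00`, a length of 1 yields bits 6 and 7 only, while a length of 4 also picks up the bits set in the following bytes, which is the over-read the revert addresses.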
    • Daniel Borkmann's avatar
      Merge branch 'bpf-misc-nfp-bpftool-doc-fixes' · 69fe98ed
      Daniel Borkmann authored
      Jakub Kicinski says:
      
      ====================
      First patch in this series fixes applying the relocation to immediate
      load instructions in the NFP JIT.
      
      The remaining patches come from Quentin.  Small addition to libbpf
      makes sure it recognizes all standard section names.  Makefile in
      bpftool/Documentation is improved to explicitly check for rst2man
      being installed on the system, otherwise we risk installing empty
      files.  Man page for bpftool-map is corrected to include program
      as a potential value for map of programs.
      
      The last two patches are slightly longer; they update bash completions
      to include this release cycle's additions from Roman.  Maybe the use of
      Fixes tags is slightly frivolous there, but having bash completions
      which don't cover all commands and options could be disruptive to
      users' workflow.
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      69fe98ed
    • Quentin Monnet's avatar
      tools: bpftool: add bash completion for cgroup commands · a827a164
      Quentin Monnet authored
      Add bash completion for "bpftool cgroup" command family. While at it,
      also fix the formatting of some keywords in the man page for cgroups.
      
      Fixes: 5ccda64d ("bpftool: implement cgroup bpf operations")
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      a827a164
    • Quentin Monnet's avatar
      tools: bpftool: add bash completion for `bpftool prog load` · 55f3538c
      Quentin Monnet authored
      Add bash completion for bpftool command `prog load`. Completion for this
      command is easy, as it only takes existing file paths as arguments.
      
      Fixes: 49a086c2 ("bpftool: implement prog load command")
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      55f3538c
    • Quentin Monnet's avatar
      tools: bpftool: make syntax for program map update explicit in man page · 2148481d
      Quentin Monnet authored
      Specify in the documentation that when using bpftool to update a map of
      type BPF_MAP_TYPE_PROG_ARRAY, the syntax for the program used as a value
      should use the "id|tag|pinned" keywords convention, as used with
      "bpftool prog" commands.
      
      Fixes: ff69c21a ("tools: bpftool: add documentation")
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      2148481d
    • Quentin Monnet's avatar
      tools: bpftool: exit doc Makefile early if rst2man is not available · 92426820
      Quentin Monnet authored
      If rst2man is not available on the system, running `make doc` from the
      bpftool directory fails with an error message. However, it creates empty
      manual pages (.8 files in this case). A subsequent call to `make
      doc-install` would then succeed and install those empty man pages on the
      system.
      
      To prevent this, raise a Makefile error and exit immediately if rst2man
      is not available before generating the pages from the rst documentation.
      
      Fixes: ff69c21a ("tools: bpftool: add documentation")
      Reported-by: Jason van Aaardt <jason.vanaardt@netronome.com>
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      92426820
    • Quentin Monnet's avatar
      libbpf: complete list of strings for guessing program type · 0badd331
      Quentin Monnet authored
      It seems that the type guessing feature for libbpf, based on the name of
      the ELF section the program is located in, was inspired by
      samples/bpf/prog_load.c, which was not used by any sample for loading
      programs of certain types such as TC actions and classifiers, or
      LWT-related types. As a consequence, libbpf is not able to guess the
      type of such programs and to load them automatically if type is not
      provided to the `bpf_load_prog()` function.
      
      Add ELF section names associated with those eBPF program types so that
      they can be loaded with e.g. bpftool as well.
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      0badd331
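Guessing a program type from the ELF section name amounts to a prefix lookup in a table. The sketch below is a standalone illustration of that idea, not libbpf's code: the enum, the table and the function are hypothetical, and the section names listed are examples chosen for illustration, not the exact set the commit adds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical program types; not the kernel's enum bpf_prog_type. */
enum sketch_prog_type {
	SKETCH_UNSPEC,
	SKETCH_SOCKET_FILTER,
	SKETCH_SCHED_CLS,
	SKETCH_SCHED_ACT,
	SKETCH_XDP,
	SKETCH_LWT_IN,
};

/* Illustrative mapping from section-name prefix to program type. */
static const struct {
	const char *prefix;
	enum sketch_prog_type type;
} sketch_sections[] = {
	{ "socket",     SKETCH_SOCKET_FILTER },
	{ "classifier", SKETCH_SCHED_CLS },
	{ "action",     SKETCH_SCHED_ACT },
	{ "xdp",        SKETCH_XDP },
	{ "lwt_in",     SKETCH_LWT_IN },
};

/* Return the type of the first table entry whose prefix matches. */
static enum sketch_prog_type sketch_guess_type(const char *section)
{
	assert(section);
	for (size_t i = 0; i < sizeof(sketch_sections) / sizeof(sketch_sections[0]); i++) {
		size_t len = strlen(sketch_sections[i].prefix);

		if (strncmp(section, sketch_sections[i].prefix, len) == 0)
			return sketch_sections[i].type;
	}
	return SKETCH_UNSPEC;
}
```

A section name that is missing from the table falls through to the unspecified type, which corresponds to the situation the commit describes for TC and LWT programs before the names were added.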
    • Jakub Kicinski's avatar
      nfp: bpf: fix immed relocation for larger offsets · b7d99235
      Jakub Kicinski authored
      Immed relocation is missing a shift which means for larger
      offsets the lower and higher part of the address would be
      ORed together.
      
      Fixes: ce4ebfd8 ("nfp: bpf: add helpers for updating immediate instructions")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      b7d99235
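The failure mode described here, two halves of a value ORed together because a shift is missing, can be shown in isolation. This sketch assumes a 16-bit value carried as two 8-bit parts; the helper names are hypothetical, and the commit message does not say which driver helper lacked the shift.

```c
#include <assert.h>
#include <stdint.h>

/* Correct recombination: the high part is shifted into bits 8..15. */
static uint16_t sketch_combine(uint8_t lo, uint8_t hi)
{
	return (uint16_t)(lo | ((uint16_t)hi << 8));
}

/* Faulty recombination: without the shift both parts land in bits 0..7. */
static uint16_t sketch_combine_noshift(uint8_t lo, uint8_t hi)
{
	return (uint16_t)(lo | hi);
}
```

For small values the high part is zero and both versions agree, which is why such a bug only shows up for larger offsets.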
    • Song Liu's avatar
      tcp: tracepoint: only call trace_tcp_send_reset with full socket · 5c487bb9
      Song Liu authored
      The tcp_send_reset tracepoint requires a full socket to work. However,
      it may be called when the socket is in TCP_TIME_WAIT:
      
              case TCP_TW_RST:
                      tcp_v6_send_reset(sk, skb);
                      inet_twsk_deschedule_put(inet_twsk(sk));
                      goto discard_it;
      
      To avoid this problem, this patch checks the socket with sk_fullsock()
      before calling trace_tcp_send_reset().
      
      Fixes: c24b14c4 ("tcp: add tracepoint trace_tcp_send_reset")
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Reviewed-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5c487bb9
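The guard this commit adds can be modelled as follows. This is a minimal standalone sketch, assuming that a "full" socket is any socket that is not in the TIME_WAIT or NEW_SYN_RECV state; the struct, the counter and the function names are hypothetical stand-ins for the kernel's types and tracepoint.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of TCP states, using the kernel's numeric values. */
enum sketch_state {
	SKETCH_ESTABLISHED  = 1,
	SKETCH_TIME_WAIT    = 6,
	SKETCH_NEW_SYN_RECV = 12,
};

/* Hypothetical minimal socket: only the state matters here. */
struct sketch_sock {
	enum sketch_state state;
};

/* Counts how often the (modelled) tracepoint fired. */
static unsigned int sketch_trace_hits;

/* Model of the full-socket test: timewait and request sockets are not full. */
static bool sketch_fullsock(const struct sketch_sock *sk)
{
	assert(sk);
	return sk->state != SKETCH_TIME_WAIT && sk->state != SKETCH_NEW_SYN_RECV;
}

/* Model of the reset path: trace only for full sockets. */
static void sketch_send_reset(const struct sketch_sock *sk)
{
	if (sketch_fullsock(sk))
		sketch_trace_hits++;
	/* the reset itself would still be sent in either case */
}
```

The reset path itself is unchanged in this model; only the tracing step is skipped for sockets that do not carry the full set of fields the tracepoint reads.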