  1. 13 Jul, 2018 5 commits
    • Merge branch 'bpf-af-xdp-consistent-err-reporting' · 5e3e6e83
      Daniel Borkmann authored
      Magnus Karlsson says:
      
      ====================
      This patch set adjusts the AF_XDP TX error reporting so that it becomes
      consistent between copy mode and zero-copy. First some background:
      
      Copy-mode for TX uses the SKB path in which the action of sending the
      packet is performed from process context using the sendmsg
      syscall. Completions are usually done asynchronously from NAPI mode by
      using a TX interrupt. In this mode, send errors can be returned back
      through the syscall.
      
      In zero-copy mode both the sending of the packet and the completions
      are done asynchronously from NAPI mode for performance reasons. In
      this mode, the sendmsg syscall only makes sure that the TX NAPI loop
      will be run that performs both the actions of sending and
      completing. In this mode it is therefore not possible to return errors
      through the sendmsg syscall as the sending is done from the NAPI
      loop. Note that it is possible to implement a synchronous send with
      our API, but in our benchmarks that made the TX performance drop by
      nearly half due to synchronization requirements and cache line
      bouncing. But for some netdevs this might be preferable so let us
      leave it up to the implementation to decide.
      
      The problem is that the current code base returns some errors in
      copy-mode that are not possible to return in zero-copy mode. This
      patch set aligns them so that the two modes always return the same
      error code. We achieve this by removing some of the errors returned by
      sendmsg in copy-mode (and in one case adding an error message for
      zero-copy mode) and offering alternative error detection methods that
      are consistent between the two modes.
      
      The structure of the patch set is as follows:
      
      Patch 1: removes the ENXIO return code from copy-mode when someone has
      forcefully changed the number of queues on the device so that the
      queue bound to the socket is no longer available. Just silently stop
      sending anything as in zero-copy mode.
      
      Patch 2: stop returning EAGAIN in copy mode when the completion queue
      is full as zero-copy does not do this. Instead this situation can be
      detected by comparing the head and tail pointers of the completion
      queue in both modes. In any case, EAGAIN was not the correct error code
      here since no amount of calling sendmsg will solve the problem. Only
      consuming one or more messages on the completion queue will fix this.
      
      Patch 3: Always return ENOBUFS from sendmsg if there is no TX queue
      configured. This was not the case for zero-copy mode.
      
      Patch 4: stop returning EMSGSIZE when the size of the packet is larger
      than the MTU. Just send it to the device so that it will drop it as in
      zero-copy mode.
      
      Note that copy-mode can still return EAGAIN in certain circumstances,
      but as these conditions cannot occur in zero-copy mode it is fine for
      copy-mode to return them.
      ====================
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
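Patch 2 above says that a full completion queue can be detected by comparing the queue's head and tail pointers. A minimal sketch of that comparison on a generic single-producer, single-consumer ring follows. The `struct ring_idx` type and the `ring_used()` and `ring_full()` helpers are hypothetical names for illustration, not part of the AF_XDP uapi; the sketch assumes free-running 32-bit indices and a power-of-two ring size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the producer/consumer indices of a ring. */
struct ring_idx {
	uint32_t producer;  /* advanced by the side that adds entries */
	uint32_t consumer;  /* advanced by the side that removes entries */
	uint32_t size;      /* number of slots, assumed to be a power of two */
};

/* Entries currently queued; unsigned subtraction handles index wrap-around. */
static uint32_t ring_used(const struct ring_idx *r)
{
	return r->producer - r->consumer;
}

/* The ring is full when every slot holds an entry that was not consumed yet. */
static bool ring_full(const struct ring_idx *r)
{
	assert(r->size != 0 && (r->size & (r->size - 1)) == 0);
	return ring_used(r) == r->size;
}
```

With such a check, an application would drain completions when `ring_full()` reports true rather than retrying sendmsg, which matches the reasoning given for dropping EAGAIN.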
    • xsk: do not return EMSGSIZE in copy mode for packets larger than MTU · 09210c4b
      Magnus Karlsson authored
      This patch stops returning EMSGSIZE from sendmsg in copy mode when the
      size of the packet is larger than the MTU. Just send it to the device
      so that it will drop it as in zero-copy mode. This makes the error
      reporting consistent between copy mode and zero-copy mode.
      
      Fixes: 35fcde7f ("xsk: support for Tx")
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • xsk: always return ENOBUFS from sendmsg if there is no TX queue · 6efb4436
      Magnus Karlsson authored
      This patch makes sure ENOBUFS is always returned from sendmsg if there
      is no TX queue configured. This was not the case for zero-copy
      mode. With this patch this error reporting is consistent between copy
      mode and zero-copy mode.
      
      Fixes: ac98d8aa ("xsk: wire upp Tx zero-copy functions")
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • xsk: do not return EAGAIN from sendmsg when completion queue is full · 9684f5e7
      Magnus Karlsson authored
      This patch stops returning EAGAIN in TX copy mode when the completion
      queue is full as zero-copy does not do this. Instead this situation
      can be detected by comparing the head and tail pointers of the
      completion queue in both modes. In any case, EAGAIN was not the
      correct error code here since no amount of calling sendmsg will solve
      the problem. Only consuming one or more messages on the completion
      queue will fix this.
      
      With this patch, the error reporting becomes consistent between copy
      mode and zero-copy mode.
      
      Fixes: 35fcde7f ("xsk: support for Tx")
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • xsk: do not return ENXIO from TX copy mode · 509d7648
      Magnus Karlsson authored
      This patch removes the ENXIO return code from TX copy-mode when
      someone has forcefully changed the number of queues on the device so
      that the queue bound to the socket is no longer available. Just
      silently stop sending anything as in zero-copy mode so the error
      reporting gets consistent between the two modes.
      
      Fixes: 35fcde7f ("xsk: support for Tx")
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  2. 12 Jul, 2018 1 commit
    • bpf: don't leave partial mangled prog in jit_subprogs error path · c7a89784
      Daniel Borkmann authored
      syzkaller managed to trigger the following bug through fault injection:
      
        [...]
        [  141.043668] verifier bug. No program starts at insn 3
        [  141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
                       get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
        [  141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
                       fixup_call_args kernel/bpf/verifier.c:5587 [inline]
        [  141.044648] WARNING: CPU: 3 PID: 4072 at kernel/bpf/verifier.c:1613
                       bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
        [  141.047355] CPU: 3 PID: 4072 Comm: a.out Not tainted 4.18.0-rc4+ #51
        [  141.048446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS 1.10.2-1 04/01/2014
        [  141.049877] Call Trace:
        [  141.050324]  __dump_stack lib/dump_stack.c:77 [inline]
        [  141.050324]  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
        [  141.050950]  ? dump_stack_print_info.cold.2+0x52/0x52 lib/dump_stack.c:60
        [  141.051837]  panic+0x238/0x4e7 kernel/panic.c:184
        [  141.052386]  ? add_taint.cold.5+0x16/0x16 kernel/panic.c:385
        [  141.053101]  ? __warn.cold.8+0x148/0x1ba kernel/panic.c:537
        [  141.053814]  ? __warn.cold.8+0x117/0x1ba kernel/panic.c:530
        [  141.054506]  ? get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
        [  141.054506]  ? fixup_call_args kernel/bpf/verifier.c:5587 [inline]
        [  141.054506]  ? bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
        [  141.055163]  __warn.cold.8+0x163/0x1ba kernel/panic.c:538
        [  141.055820]  ? get_callee_stack_depth kernel/bpf/verifier.c:1612 [inline]
        [  141.055820]  ? fixup_call_args kernel/bpf/verifier.c:5587 [inline]
        [  141.055820]  ? bpf_check+0x525e/0x5e60 kernel/bpf/verifier.c:5952
        [...]
      
      What happens in jit_subprogs() is that kcalloc() for the subprog func
      buffer is failing with NULL where we then bail out. The latter is a plain
      return -ENOMEM, and this is definitely not okay since earlier in the
      loop we are walking all subprogs and temporarily rewrite insn->off to
      remember the subprog id as well as insn->imm to temporarily point the
      call to __bpf_call_base + 1 for the initial JIT pass. Thus, bailing
      out in such state and handing this over to the interpreter is troublesome
      since later/subsequent e.g. find_subprog() lookups are based on wrong
      insn->imm.
      
      Therefore, once we hit this point, we need to jump to out_free path
      where we undo all changes from earlier loop, so that interpreter can
      work on unmodified insn->{off,imm}.
      
      Another point is that should find_subprog() fail in jit_subprogs() due
      to a verifier bug, then we also should not simply defer the program to
      the interpreter since also here we did partial modifications. Instead
      we should just bail out entirely and return an error to the user who is
      trying to load the program.
      
      Fixes: 1c2a088a ("bpf: x64: add JIT support for multi-function programs")
      Reported-by: syzbot+7d427828b2ea6e592804@syzkaller.appspotmail.com
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
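The fix described above replaces a plain `return -ENOMEM` with a jump to a path that undoes the temporary rewrites. A minimal userspace sketch of that pattern follows. The `struct insn` type, the `saved_imm` array and the `fail_alloc` test hook are hypothetical simplifications and not the verifier's actual data structures; the sketch also assumes the original `off` value was 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified instruction: only the two rewritten fields. */
struct insn {
	short off;
	int imm;
};

/*
 * Temporarily rewrite every instruction, then allocate a work buffer.
 * If the allocation fails, restore the original values before returning
 * so the caller never sees a half-modified program.
 * fail_alloc stands in for fault injection.
 */
static int rewrite_insns(struct insn *insns, int *saved_imm, size_t cnt,
			 int fail_alloc)
{
	void *buf;
	size_t i;
	int err;

	assert(cnt > 0);
	for (i = 0; i < cnt; i++) {
		saved_imm[i] = insns[i].imm;  /* remember the original value */
		insns[i].off = (short)i;      /* temporary id */
		insns[i].imm = 1;             /* temporary placeholder */
	}

	buf = fail_alloc ? NULL : calloc(cnt, sizeof(void *));
	if (!buf) {
		err = -ENOMEM;
		goto out_undo;                /* not a bare "return -ENOMEM" */
	}

	/* ... later passes would use buf and the rewritten fields here ... */
	free(buf);
	return 0;

out_undo:
	for (i = 0; i < cnt; i++) {
		insns[i].off = 0;             /* assumed original value */
		insns[i].imm = saved_imm[i];
	}
	return err;
}
```

After a failed call the instructions compare equal to their state before the call, which is the property a fallback consumer such as the interpreter relies on.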
  3. 11 Jul, 2018 3 commits
    • bpf: fix panic due to oob in bpf_prog_test_run_skb · 6e6fddc7
      Daniel Borkmann authored
      syzkaller triggered several panics similar to the below:
      
        [...]
        [  248.851531] BUG: KASAN: use-after-free in _copy_to_user+0x5c/0x90
        [  248.857656] Read of size 985 at addr ffff8808017ffff2 by task a.out/1425
        [...]
        [  248.865902] CPU: 1 PID: 1425 Comm: a.out Not tainted 4.18.0-rc4+ #13
        [  248.865903] Hardware name: Supermicro SYS-5039MS-H12TRF/X11SSE-F, BIOS 2.1a 03/08/2018
        [  248.865905] Call Trace:
        [  248.865910]  dump_stack+0xd6/0x185
        [  248.865911]  ? show_regs_print_info+0xb/0xb
        [  248.865913]  ? printk+0x9c/0xc3
        [  248.865915]  ? kmsg_dump_rewind_nolock+0xe4/0xe4
        [  248.865919]  print_address_description+0x6f/0x270
        [  248.865920]  kasan_report+0x25b/0x380
        [  248.865922]  ? _copy_to_user+0x5c/0x90
        [  248.865924]  check_memory_region+0x137/0x190
        [  248.865925]  kasan_check_read+0x11/0x20
        [  248.865927]  _copy_to_user+0x5c/0x90
        [  248.865930]  bpf_test_finish.isra.8+0x4f/0xc0
        [  248.865932]  bpf_prog_test_run_skb+0x6a0/0xba0
        [...]
      
      After scrubbing the BPF prog a bit from the noise, turns out it called
      bpf_skb_change_head() for the lwt_xmit prog with headroom of 2. Nothing
      wrong in that, however, this was run with repeat >> 0 in bpf_prog_test_run_skb()
      and the same skb thus keeps changing until the pskb_expand_head() called
      from skb_cow() keeps bailing out in atomic alloc context with -ENOMEM.
      So upon return we'll basically have 0 headroom left yet blindly do the
      __skb_push() of 14 bytes and keep copying data from there in bpf_test_finish()
      out of bounds. Fix to check if we have enough headroom and if pskb_expand_head()
      fails, bail out with error.
      
      Another bug independent of this fix (but related in triggering above) is
      that BPF_PROG_TEST_RUN should be reworked to reset the skb/xdp buffer to
      its original state from input as otherwise repeating the same test in a
      loop won't work for benchmarking when underlying input buffer is getting
      changed by the prog each time and reused for the next run leading to
      unexpected results.
      
      Fixes: 1cf1cae9 ("bpf: introduce BPF_PROG_TEST_RUN command")
      Reported-by: syzbot+709412e651e55ed96498@syzkaller.appspotmail.com
      Reported-by: syzbot+54f39d6ab58f39720a55@syzkaller.appspotmail.com
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
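The core of the fix above is to check the available headroom before pushing the 14-byte header. A minimal sketch of such a guarded push follows. The `struct pkt_buf` type and its helpers are hypothetical stand-ins for an skb, and the sketch simply fails when headroom is short, whereas the commit describes first attempting to expand the head and bailing out only if that fails.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define HDR_LEN 14  /* the 14 bytes pushed in the description above */

/* Hypothetical buffer: data points somewhere inside the allocation. */
struct pkt_buf {
	unsigned char *head;  /* start of the allocation */
	unsigned char *data;  /* start of the packet payload */
	size_t len;           /* payload length */
};

/* Bytes available in front of the payload. */
static size_t pkt_headroom(const struct pkt_buf *b)
{
	return (size_t)(b->data - b->head);
}

/* Move data back by HDR_LEN only if that stays inside the allocation. */
static int pkt_push_hdr(struct pkt_buf *b)
{
	assert(b->data >= b->head);
	if (pkt_headroom(b) < HDR_LEN)
		return -ENOMEM;  /* bail out instead of pushing out of bounds */
	b->data -= HDR_LEN;
	b->len += HDR_LEN;
	return 0;
}
```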
    • bpf: btf: Fix bitfield extraction for big endian · b65f370d
      Okash Khawaja authored
      When extracting bitfield from a number, btf_int_bits_seq_show() builds
      a mask and accesses least significant byte of the number in a way
      specific to little-endian. This patch fixes that by checking endianness
      of the machine and then shifting left and right the unneeded bits.
      
      Thanks to Martin Lau for the help in navigating potential pitfalls when
      dealing with endianness and for the final solution.
      
      Fixes: b00b8dae ("bpf: btf: Add pretty print capability for data with BTF type info")
      Signed-off-by: Okash Khawaja <osk@fb.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
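The approach described above, shifting the unneeded bits out to the left and then to the right depending on endianness, can be sketched in userspace as follows. The function name `extract_bits()` is hypothetical, and the sketch uses the GCC/Clang predefined `__BYTE_ORDER__` macro as an assumption in place of whatever endianness test the kernel code uses. It assumes `bits_offset < 8` and `bits_offset + nr_bits <= 64`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Extract nr_bits starting bits_offset bits into the byte at data.
 * Bits are counted from the least significant bit on little endian and
 * from the most significant bit on big endian, as compilers lay out
 * bitfields on those machines.
 */
static uint64_t extract_bits(const void *data, unsigned bits_offset,
			     unsigned nr_bits)
{
	unsigned nr_copy_bits = bits_offset + nr_bits;
	unsigned nr_copy_bytes = (nr_copy_bits + 7) / 8;
	unsigned left_shift, right_shift;
	uint64_t num = 0;

	assert(bits_offset < 8 && nr_bits >= 1 && nr_copy_bits <= 64);
	memcpy(&num, data, nr_copy_bytes);

#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	left_shift = bits_offset;          /* drop the bits in front of the field */
#else
	left_shift = 64 - nr_copy_bits;    /* drop the bits above the field */
#endif
	right_shift = 64 - nr_bits;        /* move the field down to bit 0 */

	num <<= left_shift;
	num >>= right_shift;
	return num;
}
```

Because no mask is built from a byte address, the same two shifts work on both byte orders; only the left shift amount differs.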
    • bpf: fix availability probing for seg6 helpers · 61d76980
      Mathieu Xhonneux authored
      bpf_lwt_seg6_* helpers require CONFIG_IPV6_SEG6_BPF, and currently
      return -EOPNOTSUPP to indicate unavailability. This patch forces the
      BPF verifier to reject programs using these helpers when
      !CONFIG_IPV6_SEG6_BPF, allowing users to more easily probe if they are
      available or not.
      
      Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  4. 10 Jul, 2018 2 commits
  5. 09 Jul, 2018 1 commit
    • bpf: include errno.h from bpf-cgroup.h · f292b87d
      Roman Gushchin authored
      Commit fdb5c453 ("bpf: fix attach type BPF_LIRC_MODE2 dependency
      wrt CONFIG_CGROUP_BPF") caused some build issues, detected by 0-DAY
      kernel test infrastructure.
      
      The problem is that cgroup_bpf_prog_attach/detach/query() functions
      can return -EINVAL error code, which is not defined. Fix this adding
      errno.h to includes.
      
      Fixes: fdb5c453 ("bpf: fix attach type BPF_LIRC_MODE2 dependency wrt CONFIG_CGROUP_BPF")
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Cc: Sean Young <sean@mess.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  6. 08 Jul, 2018 5 commits
    • tcp: cleanup copied_seq and urg_data in tcp_disconnect · 6508b678
      Eric Dumazet authored
      tcp_zerocopy_receive() relies on tcp_inq() to limit number of bytes
      requested by user.
      
      syzbot found that after tcp_disconnect(), tcp_inq() was returning
      a stale value (number of bytes in queue before the disconnect).
      
      Note that after this patch, ioctl(fd, SIOCINQ, &val) is also fixed
      and returns 0, so this might be a candidate for all known linux kernels.
      
      While we are at this, we probably also should clear urg_data to
      avoid other syzkaller reports after it discovers how to deal with
      urgent data.
      
      syzkaller repro :
      
      socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
      bind(3, {sa_family=AF_INET, sin_port=htons(20000), sin_addr=inet_addr("224.0.0.1")}, 16) = 0
      connect(3, {sa_family=AF_INET, sin_port=htons(20000), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
      send(3, ..., 4096, 0) = 4096
      connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 128) = 0
      getsockopt(3, SOL_TCP, TCP_ZEROCOPY_RECEIVE, ..., [16]) = 0 // CRASH
      
      Fixes: 05255b82 ("tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 7f93d129
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2018-07-07
      
      The following pull-request contains BPF updates for your *net* tree.
      
      Plenty of fixes for different components:
      
      1) A set of critical fixes for sockmap and sockhash, from John Fastabend.
      
      2) fixes for several race conditions in af_xdp, from Magnus Karlsson.
      
      3) hash map refcnt fix, from Mauricio Vasquez.
      
      4) samples/bpf fixes, from Taeung Song.
      
      5) ifup+mtu check for xdp_redirect, from Toshiaki Makita.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipfrag: really prevent allocation on netns exit · f6f2a4a2
      Paolo Abeni authored
      Setting the low threshold to 0 has no effect on frags allocation,
      we need to clear high_thresh instead.
      
      The code was pre-existent to commit 648700f7 ("inet: frags:
      use rhashtables for reassembly units"), but before the above,
      such assignment had a different role: prevent concurrent eviction
      from the worker and the netns cleanup helper.
      
      Fixes: 648700f7 ("inet: frags: use rhashtables for reassembly units")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: diag: Don't double-free TCP_NEW_SYN_RECV sockets in tcp_abort · acc2cf4e
      Lorenzo Colitti authored
      When tcp_diag_destroy closes a TCP_NEW_SYN_RECV socket, it first
      frees it by calling inet_csk_reqsk_queue_drop_and_put in
      tcp_abort, and then frees it again by calling sock_gen_put.
      
      Since tcp_abort only has one caller, and all the other codepaths
      in tcp_abort don't free the socket, just remove the free in that
      function.
      
      Cc: David Ahern <dsa@cumulusnetworks.com>
      Tested: passes Android sock_diag_test.py, which exercises this codepath
      Fixes: d7226c7a ("net: diag: Fix refcnt leak in error path destroying socket")
      Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
      Tested-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv4: Set oif in fib_compute_spec_dst · e7372197
      David Ahern authored
      Xin reported that icmp replies may not use the address on the device the
      echo request is received on if the destination address is broadcast. Instead
      a route lookup is done without considering VRF context. Fix by setting
      oif in flow struct to the master device if it is enslaved. That directs
      the lookup to the VRF table. If the device is not enslaved, oif is still
      0 so there is no effect.
      
      Fixes: cd2fbe1b ("net: Use VRF device index for lookups on RX")
      Reported-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 07 Jul, 2018 23 commits
    • xdp: XDP_REDIRECT should check IFF_UP and MTU · d8d7218a
      Toshiaki Makita authored
      Otherwise we end up with attempting to send packets from down devices
      or to send oversized packets, which may cause unexpected driver/device
      behaviour. Generic XDP has already done this check, so reuse the logic
      in native XDP.
      
      Fixes: 814abfab ("xdp: add bpf_redirect helper function")
      Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
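The two conditions named in the commit above, that the target device is up and that the frame fits its MTU, can be sketched as a single predicate. Everything here is a hypothetical simplification: `struct mini_dev`, `DEV_FLAG_UP` (standing in for IFF_UP) and `redirect_ok()` are illustrative names, and the exact length allowance and error codes used by the kernel may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEV_FLAG_UP 0x1u  /* hypothetical stand-in for IFF_UP */

/* Hypothetical device: only the fields the check needs. */
struct mini_dev {
	unsigned flags;
	unsigned mtu;           /* largest payload the device accepts */
	unsigned hard_hdr_len;  /* link-layer header length */
};

/* Return 0 if a frame of frame_len bytes may be sent to dev. */
static int redirect_ok(const struct mini_dev *dev, unsigned frame_len)
{
	assert(dev != NULL);
	if (!(dev->flags & DEV_FLAG_UP))
		return -ENETDOWN;   /* never transmit from or to a down device */
	if (frame_len > dev->mtu + dev->hard_hdr_len)
		return -EMSGSIZE;   /* oversized frame */
	return 0;
}
```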
    • Merge branch 'sockhash-fixes' · 4fb126cb
      Alexei Starovoitov authored
      John Fastabend says:
      
      ====================
      First three patches resolve issues found while testing sockhash and
      reviewing code. Syzbot also found them about the same time as I was
      working on fixes. The main issue is in the sockhash path we reduced
      the scope of sk_callback lock but this meant we could get update and
      close running in parallel so fix that here.
      
      Then testing sk_msg and sk_skb programs together found that skb->dev
      is not always assigned and some of the helpers were depending on this
      to lookup max mtu. Fix this by using SKB_MAX_ALLOC when no MTU is
      available.
      
      Finally, Martin spotted that the sockmap code was still using the
      qdisc skb cb structure. But I was sure we had fixed this long ago.
      Looks like we missed it in a merge conflict resolution and then by
      chance data_end offset was the same in both structures so everything
      sort of continued to work even though it could break at any moment
      if the structs ever change. So redo the conversion and this time
      also convert the helpers.
      
      v2: fix '0 files changed' issue in patches
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: sockmap, convert bpf_compute_data_pointers to bpf_*_sk_skb · 0ea488ff
      John Fastabend authored
      In commit
      
        'bpf: bpf_compute_data uses incorrect cb structure' (8108a775)
      
      we added the routine bpf_compute_data_end_sk_skb() to compute the
      correct data_end values, but this has since been lost. In kernel
      v4.14 this was correct and the above patch was applied in its
      entirety. Then when v4.14 was merged into v4.15-rc1 net-next tree
      we lost the piece that renamed bpf_compute_data_pointers to the
      new function bpf_compute_data_end_sk_skb. This was done here,
      
      e1ea2f98 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
      
      When it conflicted with the following rename patch,
      
      6aaae2b6 ("bpf: rename bpf_compute_data_end into bpf_compute_data_pointers")
      
      Finally, after a refactor I thought even the function
      bpf_compute_data_end_sk_skb() was no longer needed and it was
      erroneously removed.
      
      However, we never reverted the sk_skb_convert_ctx_access() usage of
      tcp_skb_cb which had been committed and survived the merge conflict.
      Here we fix this by adding back the helper and *_data_end_sk_skb()
      usage. Using the bpf_skb_data_end mapping is not correct because it
      expects a qdisc_skb_cb object but at the sock layer this is not the
      case. Even though it happens to work here because we don't overwrite
      any data in-use at the socket layer and the cb structure is cleared
      later this has potential to create some subtle issues. But, even
      more concretely the filter.c access check uses tcp_skb_cb.
      
      And by some act of chance though,
      
      struct bpf_skb_data_end {
              struct qdisc_skb_cb        qdisc_cb;             /*     0    28 */
      
              /* XXX 4 bytes hole, try to pack */
      
              void *                     data_meta;            /*    32     8 */
              void *                     data_end;             /*    40     8 */
      
              /* size: 48, cachelines: 1, members: 3 */
              /* sum members: 44, holes: 1, sum holes: 4 */
              /* last cacheline: 48 bytes */
      };
      
      and then tcp_skb_cb,
      
      struct tcp_skb_cb {
      	[...]
                      struct {
                              __u32      flags;                /*    24     4 */
                              struct sock * sk_redir;          /*    32     8 */
                              void *     data_end;             /*    40     8 */
                      } bpf;                                   /*          24 */
              };
      
      So when we use offsetof() to track down the byte offset we get 40 in
      either case and everything continues to work. Fix this mess and use the
      correct structures; it is unclear how long this would actually keep
      working before someone moves the structs around.
      
      Reported-by: Martin KaFai Lau <kafai@fb.com>
      Fixes: e1ea2f98 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
      Fixes: 6aaae2b6 ("bpf: rename bpf_compute_data_end into bpf_compute_data_pointers")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
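The offset coincidence described in the commit above can be reproduced with simplified stand-in structures. In this sketch only the sizes and member order follow the layout dumps in the commit message; the `fake_` type names and the byte-array placeholders are hypothetical and replace the real kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct bpf_skb_data_end as dumped above. */
struct fake_skb_data_end {
	unsigned char qdisc_cb[28];  /* stands in for struct qdisc_skb_cb */
	void *data_meta;
	void *data_end;
};

/* Stand-in for the tail of struct tcp_skb_cb as dumped above. */
struct fake_tcp_skb_cb {
	unsigned char other[24];     /* stands in for the elided leading members */
	struct {
		uint32_t flags;
		void *sk_redir;
		void *data_end;
	} bpf;
};

/* Returns the shared offset; aborts if the two layouts ever diverge. */
static size_t shared_data_end_offset(void)
{
	size_t a = offsetof(struct fake_skb_data_end, data_end);
	size_t b = offsetof(struct fake_tcp_skb_cb, bpf.data_end);

	assert(a == b);  /* holds only by coincidence of the two layouts */
	return a;
}
```

Adding or reordering a member in either structure would make the two offsets differ, which is the fragility the commit removes by using the matching structure on each path.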
    • bpf: sockmap, consume_skb in close path · 7ebc14d5
      John Fastabend authored
      Currently, when a sock is closed and the bpf_tcp_close() callback is
      used we remove memory but do not free the skb. Call consume_skb() if
      the skb is attached to the buffer.
      
      Reported-by: syzbot+d464d2c20c717ef5a6a8@syzkaller.appspotmail.com
      Fixes: 1aa12bdf ("bpf: sockmap, add sock close() hook to remove socks")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: sockhash, disallow bpf_tcp_close and update in parallel · 99ba2b5a
      John Fastabend authored
      After latest lock updates there is no longer anything preventing a
      close and recvmsg call running in parallel. Additionally, we can
      race update with close if we close a socket and simultaneously update
      it via the BPF userspace API (note the cgroup ops are already run
      with sock_lock held).
      
      To resolve this take sock_lock in close and update paths.
      
      Reported-by: syzbot+b680e42077a0d7c9a0c4@syzkaller.appspotmail.com
      Fixes: e9db4ef6 ("bpf: sockhash fix omitted bucket lock in sock_close")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: fix sk_skb programs without skb->dev assigned · 0c6bc6e5
      John Fastabend authored
      Multiple BPF helpers in use by sk_skb programs calculate the max
      skb length using the __bpf_skb_max_len function. However, this
      calculates the max length using the skb->dev pointer which can be
      NULL when an sk_skb program is paired with an sk_msg program.
      
      To force this a sk_msg program needs to redirect into the ingress
      path of a sock with an attached sk_skb program. Then the sk_skb
      program would need to call one of the helpers that adjust the skb
      size.
      
      To fix the null ptr dereference use SKB_MAX_ALLOC size if no dev
      is available.
      
      Fixes: 8934ce2f ("bpf: sockmap redirect ingress support")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
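The fix above falls back to a fixed maximum when no device is attached to the skb. A minimal sketch of that NULL-safe length computation follows. The `mini_` types, `skb_max_len()` and the `FALLBACK_MAX_LEN` value are hypothetical; the constant only stands in for SKB_MAX_ALLOC and its value here is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define FALLBACK_MAX_LEN 16384u  /* hypothetical stand-in for SKB_MAX_ALLOC */

/* Hypothetical device and packet types with only the fields needed here. */
struct mini_netdev {
	unsigned mtu;
	unsigned hard_hdr_len;
};

struct mini_skb {
	const struct mini_netdev *dev;  /* may be NULL on the socket layer */
};

/* Largest length a helper may grow the packet to. */
static unsigned skb_max_len(const struct mini_skb *skb)
{
	assert(skb != NULL);
	if (!skb->dev)
		return FALLBACK_MAX_LEN;    /* no device: avoid the NULL dereference */
	return skb->dev->mtu + skb->dev->hard_hdr_len;
}
```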
    • Merge branch 'sockmap-fixes' · 631da853
      Alexei Starovoitov authored
      John Fastabend says:
      
      ====================
      I missed fixing the error path in the sockhash code to align with
      supporting socks in multiple maps. Simply checking if the psock is
      present does not mean we can decrement the reference count because
      it could be part of another map. Fix this by cleaning up the error
      path so this situation does not happen.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: sockmap, hash table is RCU so readers do not need locks · 1d1ef005
      John Fastabend authored
      This removes locking from readers of RCU hash table. It's not
      necessary.
      
      Fixes: 81110384 ("bpf: sockmap, add hash map support")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: sockmap, error path can not release psock in multi-map case · 547b3aa4
      John Fastabend authored
      The current code, in the error path of sock_hash_ctx_update_elem,
      checks if the sock has a psock in the user data and if so decrements
      the reference count of the psock. However, if the error happens early
      in the error path we may have never incremented the psock reference
      count and if the psock exists because the sock is in another map then
      we may inadvertently decrement the reference count.
      
      Fix this by making the error path only call smap_release_sock if the
      error happens after the increment.
      
      Reported-by: syzbot+d464d2c20c717ef5a6a8@syzkaller.appspotmail.com
      Fixes: 81110384 ("bpf: sockmap, add hash map support")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
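The rule stated above, that the error path releases the psock only if the error happens after the reference was taken, can be sketched as follows. The `struct mini_psock` type, `update_elem()` and the two failure flags are hypothetical test scaffolding and do not mirror the real sock_hash_ctx_update_elem() signature.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical psock: a reference count shared by all maps holding the sock. */
struct mini_psock {
	int refcnt;
};

/*
 * fail_early simulates an error before the reference is taken,
 * fail_late an error after it.
 */
static int update_elem(struct mini_psock *psock, bool fail_early,
		       bool fail_late)
{
	int err;

	assert(psock != NULL);
	if (fail_early)
		return -EINVAL;      /* nothing taken yet, nothing to release */

	psock->refcnt++;             /* reference taken on behalf of this map */

	if (fail_late) {
		err = -ENOMEM;
		goto out_release;
	}
	return 0;

out_release:
	psock->refcnt--;             /* drop only the reference taken above */
	return err;
}
```

Starting from a count of 1 held by another map, both failure cases leave the count at 1, so the other map's reference is never dropped by mistake.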
    • Merge branch 'net-sched-fix-NULL-dereference-in-goto-chain-control-action' · de508f8b
      David S. Miller authored
      Davide Caratti says:
      
      ====================
      net/sched: fix NULL dereference in 'goto chain' control action
      
      in a couple of TC actions (i.e. csum and tunnel_key), the control action
      is stored together with the action-specific configuration data.
      This avoids a race condition (see [1]), but it causes a crash when 'goto
      chain' is used with the above actions. Since this race condition is
      tolerated on the other TC actions (it's present even on actions where the
      spinlock is still used), storing the control action in the common area
      should be acceptable for tunnel_key and csum as well.
      
      [1] https://www.spinics.net/lists/netdev/msg472047.html
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: act_tunnel_key: fix NULL dereference when 'goto chain' is used · 38230a3e
      Davide Caratti authored
      the control action in the common member of struct tcf_tunnel_key must be a
      valid value, as it can contain the chain index when 'goto chain' is used.
      Ensure that the control action can be read as x->tcfa_action, when x is a
      pointer to struct tc_action and x->ops->type is TCA_ACT_TUNNEL_KEY, to
      prevent the following command:
      
       # tc filter add dev $h2 ingress protocol ip pref 1 handle 101 flower \
       > $tcflags dst_mac $h2mac action tunnel_key unset goto chain 1
      
      from causing a NULL dereference when a matching packet is received:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
       PGD 80000001097ac067 P4D 80000001097ac067 PUD 103b0a067 PMD 0
       Oops: 0000 [#1] SMP PTI
       CPU: 0 PID: 3491 Comm: mausezahn Tainted: G            E     4.18.0-rc2.auguri+ #421
       Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.58 02/07/2013
       RIP: 0010:tcf_action_exec+0xb8/0x100
       Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
       RSP: 0018:ffff95145ea03c40 EFLAGS: 00010246
       RAX: 0000000020000001 RBX: ffff9514499e5800 RCX: 0000000000000001
       RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
       RBP: ffff95145ea03e60 R08: 0000000000000000 R09: ffff95145ea03c9c
       R10: ffff95145ea03c78 R11: 0000000000000008 R12: ffff951456a69800
       R13: ffff951456a69808 R14: 0000000000000001 R15: ffff95144965ee40
       FS:  00007fd67ee11740(0000) GS:ffff95145ea00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000000 CR3: 00000001038a2006 CR4: 00000000001606f0
       Call Trace:
        <IRQ>
        fl_classify+0x1ad/0x1c0 [cls_flower]
        ? __update_load_avg_se.isra.47+0x1ca/0x1d0
        ? __update_load_avg_se.isra.47+0x1ca/0x1d0
        ? update_load_avg+0x665/0x690
        ? update_load_avg+0x665/0x690
        ? kmem_cache_alloc+0x38/0x1c0
        tcf_classify+0x89/0x140
        __netif_receive_skb_core+0x5ea/0xb70
        ? enqueue_entity+0xd0/0x270
        ? process_backlog+0x97/0x150
        process_backlog+0x97/0x150
        net_rx_action+0x14b/0x3e0
        __do_softirq+0xde/0x2b4
        do_softirq_own_stack+0x2a/0x40
        </IRQ>
        do_softirq.part.18+0x49/0x50
        __local_bh_enable_ip+0x49/0x50
        __dev_queue_xmit+0x4ab/0x8a0
        ? wait_woken+0x80/0x80
        ? packet_sendmsg+0x38f/0x810
        ? __dev_queue_xmit+0x8a0/0x8a0
        packet_sendmsg+0x38f/0x810
        sock_sendmsg+0x36/0x40
        __sys_sendto+0x10e/0x140
        ? do_vfs_ioctl+0xa4/0x630
        ? syscall_trace_enter+0x1df/0x2e0
        ? __audit_syscall_exit+0x22a/0x290
        __x64_sys_sendto+0x24/0x30
        do_syscall_64+0x5b/0x180
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7fd67e18dc93
       Code: 48 8b 0d 18 83 20 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 59 c7 20 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 2b f7 ff ff 48 89 04 24
       RSP: 002b:00007ffe0189b748 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
       RAX: ffffffffffffffda RBX: 00000000020ca010 RCX: 00007fd67e18dc93
       RDX: 0000000000000062 RSI: 00000000020ca322 RDI: 0000000000000003
       RBP: 00007ffe0189b780 R08: 00007ffe0189b760 R09: 0000000000000014
       R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000062
       R13: 00000000020ca322 R14: 00007ffe0189b760 R15: 0000000000000003
       Modules linked in: act_tunnel_key act_gact cls_flower sch_ingress vrf veth act_csum(E) xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter intel_rapl snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek coretemp snd_hda_codec_generic kvm_intel kvm irqbypass snd_hda_intel crct10dif_pclmul crc32_pclmul hp_wmi ghash_clmulni_intel pcbc snd_hda_codec aesni_intel sparse_keymap rfkill snd_hda_core snd_hwdep snd_seq crypto_simd iTCO_wdt gpio_ich iTCO_vendor_support wmi_bmof cryptd mei_wdt glue_helper snd_seq_device snd_pcm pcspkr snd_timer snd i2c_i801 lpc_ich sg soundcore wmi mei_me
        mei ie31200_edac nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod sr_mod cdrom i915 video i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci crc32c_intel libahci serio_raw sfc libata mtd drm ixgbe mdio i2c_core e1000e dca
       CR2: 0000000000000000
       ---[ end trace 1ab8b5b5d4639dfc ]---
       RIP: 0010:tcf_action_exec+0xb8/0x100
       Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
       RSP: 0018:ffff95145ea03c40 EFLAGS: 00010246
       RAX: 0000000020000001 RBX: ffff9514499e5800 RCX: 0000000000000001
       RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
       RBP: ffff95145ea03e60 R08: 0000000000000000 R09: ffff95145ea03c9c
       R10: ffff95145ea03c78 R11: 0000000000000008 R12: ffff951456a69800
       R13: ffff951456a69808 R14: 0000000000000001 R15: ffff95144965ee40
       FS:  00007fd67ee11740(0000) GS:ffff95145ea00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000000 CR3: 00000001038a2006 CR4: 00000000001606f0
       Kernel panic - not syncing: Fatal exception in interrupt
       Kernel Offset: 0x11400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
       ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
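
      As the message notes, the control action can carry the chain index when 'goto chain' is used. A minimal sketch of that encoding follows, assuming the extended-opcode layout of the Linux uapi header linux/pkt_cls.h (opcode in the top 4 bits, value in the low 28). The SKETCH_ names and both helpers are local to this example and are not kernel APIs.

```c
#include <stdint.h>

/* Modeled on the TC_ACT_* extended opcode layout in linux/pkt_cls.h;
 * the SKETCH_ names exist only in this example. */
#define SKETCH_ACT_EXT_SHIFT    28
#define SKETCH_ACT_EXT(op)      ((uint32_t)(op) << SKETCH_ACT_EXT_SHIFT)
#define SKETCH_ACT_EXT_VAL_MASK ((1u << SKETCH_ACT_EXT_SHIFT) - 1)
#define SKETCH_ACT_GOTO_CHAIN   SKETCH_ACT_EXT(2)

/* Returns 1 if the control action carries the 'goto chain' opcode. */
static int is_goto_chain(uint32_t action)
{
	return (action & ~SKETCH_ACT_EXT_VAL_MASK) == SKETCH_ACT_GOTO_CHAIN;
}

/* Extracts the chain index stored in the low 28 bits. */
static uint32_t goto_chain_index(uint32_t action)
{
	return action & SKETCH_ACT_EXT_VAL_MASK;
}
```

      Under this layout, 0x20000001 decodes as 'goto chain' with index 1, which appears to be the value in RAX in the trace for the `goto chain 1` command.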
      
      Fixes: d0f6dd8a ("net/sched: Introduce act_tunnel_key")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      38230a3e
    • Davide Caratti's avatar
      net/sched: act_csum: fix NULL dereference when 'goto chain' is used · 11a245e2
      Davide Caratti authored
      the control action in the common member of struct tcf_csum must be a valid
      value, as it can contain the chain index when 'goto chain' is used. Ensure
      that the control action can be read as x->tcfa_action, when x is a pointer
      to struct tc_action and x->ops->type is TCA_ACT_CSUM, to prevent the
      following command:
      
        # tc filter add dev $h2 ingress protocol ip pref 1 handle 101 flower \
        > $tcflags dst_mac $h2mac action csum ip or tcp or udp or sctp goto chain 1
      
      from triggering a NULL pointer dereference when a matching packet is
      received.
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
       PGD 800000010416b067 P4D 800000010416b067 PUD 1041be067 PMD 0
       Oops: 0000 [#1] SMP PTI
       CPU: 0 PID: 3072 Comm: mausezahn Tainted: G            E     4.18.0-rc2.auguri+ #421
       Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.58 02/07/2013
       RIP: 0010:tcf_action_exec+0xb8/0x100
       Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
       RSP: 0018:ffffa020dea03c40 EFLAGS: 00010246
       RAX: 0000000020000001 RBX: ffffa020d7ccef00 RCX: 0000000000000054
       RDX: 0000000000000000 RSI: ffffa020ca5ae000 RDI: ffffa020d7ccef00
       RBP: ffffa020dea03e60 R08: 0000000000000000 R09: ffffa020dea03c9c
       R10: ffffa020dea03c78 R11: 0000000000000008 R12: ffffa020d3fe4f00
       R13: ffffa020d3fe4f08 R14: 0000000000000001 R15: ffffa020d53ca300
       FS:  00007f5a46942740(0000) GS:ffffa020dea00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000000 CR3: 0000000104218002 CR4: 00000000001606f0
       Call Trace:
        <IRQ>
        fl_classify+0x1ad/0x1c0 [cls_flower]
        ? arp_rcv+0x121/0x1b0
        ? __x2apic_send_IPI_dest+0x40/0x40
        ? smp_reschedule_interrupt+0x1c/0xd0
        ? reschedule_interrupt+0xf/0x20
        ? reschedule_interrupt+0xa/0x20
        ? device_is_rmrr_locked+0xe/0x50
        ? iommu_should_identity_map+0x49/0xd0
        ? __intel_map_single+0x30/0x140
        ? e1000e_update_rdt_wa.isra.52+0x22/0xb0 [e1000e]
        ? e1000_alloc_rx_buffers+0x233/0x250 [e1000e]
        ? kmem_cache_alloc+0x38/0x1c0
        tcf_classify+0x89/0x140
        __netif_receive_skb_core+0x5ea/0xb70
        ? enqueue_task_fair+0xb6/0x7d0
        ? process_backlog+0x97/0x150
        process_backlog+0x97/0x150
        net_rx_action+0x14b/0x3e0
        __do_softirq+0xde/0x2b4
        do_softirq_own_stack+0x2a/0x40
        </IRQ>
        do_softirq.part.18+0x49/0x50
        __local_bh_enable_ip+0x49/0x50
        __dev_queue_xmit+0x4ab/0x8a0
        ? wait_woken+0x80/0x80
        ? packet_sendmsg+0x38f/0x810
        ? __dev_queue_xmit+0x8a0/0x8a0
        packet_sendmsg+0x38f/0x810
        sock_sendmsg+0x36/0x40
        __sys_sendto+0x10e/0x140
        ? do_vfs_ioctl+0xa4/0x630
        ? syscall_trace_enter+0x1df/0x2e0
        ? __audit_syscall_exit+0x22a/0x290
        __x64_sys_sendto+0x24/0x30
        do_syscall_64+0x5b/0x180
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f5a45cbec93
       Code: 48 8b 0d 18 83 20 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 59 c7 20 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 2b f7 ff ff 48 89 04 24
       RSP: 002b:00007ffd0ee6d748 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
       RAX: ffffffffffffffda RBX: 0000000001161010 RCX: 00007f5a45cbec93
       RDX: 0000000000000062 RSI: 0000000001161322 RDI: 0000000000000003
       RBP: 00007ffd0ee6d780 R08: 00007ffd0ee6d760 R09: 0000000000000014
       R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000062
       R13: 0000000001161322 R14: 00007ffd0ee6d760 R15: 0000000000000003
       Modules linked in: act_csum act_gact cls_flower sch_ingress vrf veth act_tunnel_key(E) xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_hdmi snd_hda_codec_realtek kvm snd_hda_codec_generic hp_wmi iTCO_wdt sparse_keymap rfkill mei_wdt iTCO_vendor_support wmi_bmof gpio_ich irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel snd_hda_intel crypto_simd cryptd snd_hda_codec glue_helper snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm pcspkr i2c_i801 snd_timer snd sg lpc_ich soundcore wmi mei_me
        mei ie31200_edac nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom sd_mod ahci libahci crc32c_intel i915 ixgbe serio_raw libata video dca i2c_algo_bit sfc drm_kms_helper syscopyarea mtd sysfillrect mdio sysimgblt fb_sys_fops drm e1000e i2c_core
       CR2: 0000000000000000
       ---[ end trace 3c9e9d1a77df4026 ]---
       RIP: 0010:tcf_action_exec+0xb8/0x100
       Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
       RSP: 0018:ffffa020dea03c40 EFLAGS: 00010246
       RAX: 0000000020000001 RBX: ffffa020d7ccef00 RCX: 0000000000000054
       RDX: 0000000000000000 RSI: ffffa020ca5ae000 RDI: ffffa020d7ccef00
       RBP: ffffa020dea03e60 R08: 0000000000000000 R09: ffffa020dea03c9c
       R10: ffffa020dea03c78 R11: 0000000000000008 R12: ffffa020d3fe4f00
       R13: ffffa020d3fe4f08 R14: 0000000000000001 R15: ffffa020d53ca300
       FS:  00007f5a46942740(0000) GS:ffffa020dea00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000000 CR3: 0000000104218002 CR4: 00000000001606f0
       Kernel panic - not syncing: Fatal exception in interrupt
       Kernel Offset: 0x26400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
       ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
      
      Fixes: 9c5f69bb ("net/sched: act_csum: don't use spinlock in the fast path")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      11a245e2
    • Harini Katakam's avatar
      net: macb: Allocate valid memory for TX and RX BD prefetch · 404cd086
      Harini Katakam authored
      The GEM version in ZynqMP, and most versions greater than r1p07, support
      TX and RX BD prefetch. The number of BDs that can be prefetched is a
      HW configurable parameter. For ZynqMP, this parameter is 4.
      
      When GEM DMA is accessing the last BD in the ring, even before the
      BD is processed and the WRAP bit is noticed, it will have prefetched
      BDs outside the BD ring. These will not be processed but it is
      necessary to have accessible memory after the last BD. Especially
      in cases where SMMU is used, memory locations immediately after the
      last BD may not have translation tables, triggering HRESP errors. Hence
      always allocate extra BDs to accommodate prefetch.
      The value of tx/rx bd prefetch for any given SoC version is:
      2 ^ (corresponding field in design config 10 register).
      (value of this field >= 1)
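
      The sizing rule above can be sketched as follows. This is only an illustration under stated assumptions: the function names and parameters are hypothetical, the field value is passed in already extracted from the design config 10 register, and the driver's actual allocation code may differ.

```c
#include <stddef.h>

/* Number of BDs the hardware may prefetch: 2 ^ field (field >= 1). */
static unsigned int bd_prefetch_count(unsigned int dcfg10_field)
{
	return 1u << dcfg10_field;
}

/* Bytes needed for a ring of ring_len BDs plus the prefetch slack
 * that must remain accessible after the last BD. */
static size_t bd_ring_alloc_size(size_t ring_len, size_t bd_size,
				 unsigned int dcfg10_field)
{
	return (ring_len + bd_prefetch_count(dcfg10_field)) * bd_size;
}
```

      With a field value of 2, the prefetch count is 4, matching the ZynqMP parameter quoted above.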
      
      Added a capability flag so that older IP versions that do not have
      DCFG10 or this prefetch capability are not affected.
      Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
      Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      404cd086
    • Harini Katakam's avatar
      net: macb: Free RX ring for all queues · e50b770e
      Harini Katakam authored
      The RX ring is allocated for all queues in macb_alloc_consistent.
      Free it for all queues instead of just Q0.
      Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
      Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e50b770e
    • Ursula Braun's avatar
      net/smc: reduce sock_put() for fallback sockets · e1bbdd57
      Ursula Braun authored
      smc_release() calls a sock_put() for smc fallback sockets to cover
      the passive closing sock_hold() in __smc_connect() and
      smc_tcp_listen_work(). This does not make sense for sockets in state
      SMC_LISTEN and SMC_INIT.
      An SMC socket stays in state SMC_INIT if connect fails. The sock_put
      in smc_connect_abort() does not cover all failures. Move it into
      smc_connect_decline_fallback().
      
      Fixes: ee9dfbef ("net/smc: handle sockopts forcing fallback")
      Reported-by: syzbot+3a0748c8f2f210c0ef9b@syzkaller.appspotmail.com
      Reported-by: syzbot+9e60d2428a42049a592a@syzkaller.appspotmail.com
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e1bbdd57
    • Arnd Bergmann's avatar
      net: bridge: fix br_vlan_get_{pvid,info} return values · 000244d3
      Arnd Bergmann authored
      These two functions return the regular -EINVAL failure in the normal
      code path, but return a nonstandard '-1' error otherwise, which gets
      interpreted as -EPERM.
      
      Let's change it to -EINVAL for the dummy functions as well.
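
      Why a bare '-1' reads as -EPERM can be shown with a small user-space sketch. It assumes Linux errno numbering, where EPERM is 1; the stub functions are hypothetical stand-ins for the dummy helpers, not the kernel's own code.

```c
#include <errno.h>

/* Hypothetical stub in the style of the old dummy helpers: a bare -1. */
static int stub_get_pvid_old(void)
{
	return -1;
}

/* The same stub after the fix: the regular -EINVAL failure. */
static int stub_get_pvid_new(void)
{
	return -EINVAL;
}

/* A caller that interprets negative returns as -errno sees -1 as -EPERM. */
static int is_eperm(int ret)
{
	return ret == -EPERM;
}
```

      A caller that maps the old stub's return value to an errno would report "operation not permitted" rather than "invalid argument".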
      
      Fixes: 4d4fd361 ("net: bridge: Publish bridge accessor functions")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      000244d3
    • Casey Leedom's avatar
      cxgb4: assume flash part size to be 4MB, if it can't be determined · 843789f6
      Casey Leedom authored
      t4_get_flash_params() fails in a fatal fashion if the FLASH part isn't
      one of the recognized parts. But this leads to desperate efforts to update
      drivers when various FLASH parts which we are using suddenly become
      unavailable and we need to substitute new FLASH parts.  This has led to
      more than one Customer Field Emergency when a Customer has an old driver
      and suddenly can't use newly shipped adapters.
      
      This commit fixes this by simply assuming that the FLASH part is 4MB in
      size if it can't be identified. Note that all Chelsio adapters will have
      flash parts which are at least 4MB in size.
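
      The fallback behaviour can be sketched as a lookup with a default. This is an illustration only: the table, the part IDs, and the function name are made up for the example and do not reflect the contents of t4_get_flash_params().

```c
#include <stddef.h>
#include <stdint.h>

#define FLASH_DEFAULT_SIZE (4u * 1024 * 1024) /* 4MB fallback from the text */

struct flash_part {
	uint32_t id;
	uint32_t size;
};

/* Hypothetical table of recognized parts; IDs and sizes are invented. */
static const struct flash_part known_parts[] = {
	{ 0x1001, 8u * 1024 * 1024 },
	{ 0x1002, 16u * 1024 * 1024 },
};

/* Returns the size of a recognized part, or 4MB if the ID is unknown. */
static uint32_t flash_size_for(uint32_t id)
{
	size_t i;

	for (i = 0; i < sizeof(known_parts) / sizeof(known_parts[0]); i++)
		if (known_parts[i].id == id)
			return known_parts[i].size;

	return FLASH_DEFAULT_SIZE; /* unrecognized part: assume 4MB, do not fail */
}
```

      The design choice is that an unknown part degrades to a conservative size instead of aborting initialization.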
      Signed-off-by: Casey Leedom <leedom@chelsio.com>
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      843789f6
    • David S. Miller's avatar
      Merge branch 'tipc-dad-fixes' · 7f978e85
      David S. Miller authored
      Jon Maloy says:
      
      ====================
      tipc: fixes in duplicate address discovery function
      
      commit 25b0b9c4 ("tipc: handle collisions of 32-bit node address
      hash values") introduced new functionality that has turned out to
      contain several bugs and weaknesses.
      
      We address those in this series.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7f978e85
    • Jon Maloy's avatar
      tipc: make function tipc_net_finalize() thread safe · 9faa89d4
      Jon Maloy authored
      The setting of the node address is not thread safe, meaning that
      two discoverers may decide to set it simultaneously, with a duplicate
      entry in the name table as a result. We fix that with this commit.
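
      One general way to make a "set once" step safe against two concurrent callers is a compare-and-swap on the unset value. The sketch below uses C11 atomics in user space; it illustrates the technique only and is not necessarily how this commit implements the fix. The variable and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t node_addr; /* 0 means "not set yet" */

/* Returns 1 if this caller set the address, 0 if it was already set.
 * Only the winner should go on to publish the name table entry. */
static int finalize_addr_once(uint32_t addr)
{
	uint32_t expected = 0;

	return atomic_compare_exchange_strong(&node_addr, &expected, addr);
}
```

      Because only one caller can swap the value away from zero, a second discoverer arriving late sees a zero return and skips the publication step.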
      
      Fixes: 25b0b9c4 ("tipc: handle collisions of 32-bit node address hash values")
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9faa89d4
    • Jon Maloy's avatar
      tipc: fix correct setting of message type in second discoverer · 92018c7c
      Jon Maloy authored
      The duplicate address discovery protocol is not safe against two
      discoverers running in parallel. The one executing first after the
      trial period is over will set the node address and change its own
      message type to DSC_REQ_MSG. The one executing last may find that the
      node address is already set, and never change message type, with the
      result that its links may never be established.
      
      In this commit we ensure that the message type is always set correctly
      after the trial period is over.
      
      Fixes: 25b0b9c4 ("tipc: handle collisions of 32-bit node address hash values")
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      92018c7c
    • Jon Maloy's avatar
      tipc: correct discovery message handling during address trial period · e415577f
      Jon Maloy authored
      With the duplicate address discovery protocol for TIPC node addresses
      we introduced a one-second trial period before a node is allocated a
      hash number to use as its address.
      
      Unfortunately, we fail to handle the case when a regular LINK REQUEST/
      RESPONSE arrives from a cluster node during the trial period. Such
      messages are not ignored as they should be, leading to link setup
      attempts while the node still has no address.
      
      Fixes: 25b0b9c4 ("tipc: handle collisions of 32-bit node address hash values")
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e415577f
    • Jon Maloy's avatar
      tipc: fix wrong return value from function tipc_node_try_addr() · 2a57f182
      Jon Maloy authored
      The function for checking if there is a node address conflict is
      supposed to return a suggestion for a new address if it finds a
      conflict, and zero otherwise. But if the peer being checked
      is previously unknown, it instead returns a "suggestion" for
      the checked address itself. This results in a DSC_TRIAL_FAIL_MSG
      being sent unnecessarily to the peer, and sometimes makes the trial
      period start over again.
      
      Fixes: 25b0b9c4 ("tipc: handle collisions of 32-bit node address hash values")
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2a57f182
    • David S. Miller's avatar
      Merge branch 'ravb-sh_eth-fix-sleep-in-atomic-by-reusing-shared-ethtool-handlers' · 0f62aeec
      David S. Miller authored
      Vladimir Zapolskiy says:
      
      ====================
      ravb/sh_eth: fix sleep in atomic by reusing shared ethtool handlers
      
      For ages, trivial changes to RAVB and SuperH Ethernet links made with
      standard 'ethtool' have triggered a 'sleeping function called from
      invalid context' bug. To reproduce it on r8a7795 ULCB:
      
        % ethtool -r eth0
        BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
        in_atomic(): 1, irqs_disabled(): 128, pid: 554, name: ethtool
        INFO: lockdep is turned off.
        irq event stamp: 0
        hardirqs last  enabled at (0): [<0000000000000000>]           (null)
        hardirqs last disabled at (0): [<ffff0000080e1d3c>] copy_process.isra.7.part.8+0x2cc/0x1918
        softirqs last  enabled at (0): [<ffff0000080e1d3c>] copy_process.isra.7.part.8+0x2cc/0x1918
        softirqs last disabled at (0): [<0000000000000000>]           (null)
        CPU: 5 PID: 554 Comm: ethtool Not tainted 4.17.0-rc4-arm64-renesas+ #33
        Hardware name: Renesas H3ULCB board based on r8a7795 ES2.0+ (DT)
        Call trace:
         dump_backtrace+0x0/0x198
         show_stack+0x24/0x30
         dump_stack+0xb8/0xf4
         ___might_sleep+0x1c8/0x1f8
         __might_sleep+0x58/0x90
         __mutex_lock+0x50/0x890
         mutex_lock_nested+0x3c/0x50
         phy_start_aneg_priv+0x38/0x180
         phy_start_aneg+0x24/0x30
         ravb_nway_reset+0x3c/0x68
         dev_ethtool+0x3dc/0x2338
         dev_ioctl+0x19c/0x490
         sock_do_ioctl+0xe0/0x238
         sock_ioctl+0x254/0x460
         do_vfs_ioctl+0xb0/0x918
         ksys_ioctl+0x50/0x80
         sys_ioctl+0x34/0x48
         __sys_trace_return+0x0/0x4
      
      The root cause is that the original attempt to modify the ECMR and
      GECMR registers only when the RX/TX function is disabled was
      overcomplicated, and processing of an optional Link Change interrupt
      added even more complexity. As a result, the implementation was
      error prone.
      
      The new locking scheme is confirmed to be correct by dumping driver
      specific and generic PHY framework function calls with aid of ftrace
      while running more or less advanced tests.
      
      Please note that sh_eth patches from the series were built-tested only.
      
      I deliberately do not add Fixes tags: the reused PHY handlers were added
      much later than the fixed problems were first found in the drivers.
      
      Changes from v1 to v2:
      * the original patches are split into bugfixes and enhancements only;
        the v1 and v2 series are identical in total, so I omit the
        description of changes in individual patches,
      * this implies that there should be no strict need for retesting,
        but because the two series are formally different, I have to drop the
        tags given by Geert and Andrew; please send your tags again.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0f62aeec