1. 09 Mar, 2021 2 commits
      bpf, x86: Use kvmalloc_array instead kmalloc_array in bpf_jit_comp · de920fc6
      Yonghong Song authored
      x86 bpf_jit_comp.c used kmalloc_array to store jited addresses
      for each bpf insn. With a large bpf program, we have seen the
      following allocation failures in our production server:
      
          page allocation failure: order:5, mode:0x40cc0(GFP_KERNEL|__GFP_COMP),
                                   nodemask=(null),cpuset=/,mems_allowed=0"
          Call Trace:
          dump_stack+0x50/0x70
          warn_alloc.cold.120+0x72/0xd2
          ? __alloc_pages_direct_compact+0x157/0x160
          __alloc_pages_slowpath+0xcdb/0xd00
          ? get_page_from_freelist+0xe44/0x1600
          ? vunmap_page_range+0x1ba/0x340
          __alloc_pages_nodemask+0x2c9/0x320
          kmalloc_order+0x18/0x80
          kmalloc_order_trace+0x1d/0xa0
          bpf_int_jit_compile+0x1e2/0x484
          ? kmalloc_order_trace+0x1d/0xa0
          bpf_prog_select_runtime+0xc3/0x150
          bpf_prog_load+0x480/0x720
          ? __mod_memcg_lruvec_state+0x21/0x100
          __do_sys_bpf+0xc31/0x2040
          ? close_pdeo+0x86/0xe0
          do_syscall_64+0x42/0x110
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
          RIP: 0033:0x7f2f300f7fa9
          Code: Bad RIP value.
      
      Dumped assembly:
      
          ffffffff810b6d70 <bpf_int_jit_compile>:
          ; {
          ffffffff810b6d70: e8 eb a5 b4 00        callq   0xffffffff81c01360 <__fentry__>
          ffffffff810b6d75: 41 57                 pushq   %r15
          ...
          ffffffff810b6f39: e9 72 fe ff ff        jmp     0xffffffff810b6db0 <bpf_int_jit_compile+0x40>
          ;       addrs = kmalloc_array(prog->len + 1, sizeof(*addrs), GFP_KERNEL);
          ffffffff810b6f3e: 8b 45 0c              movl    12(%rbp), %eax
          ;       return __kmalloc(bytes, flags);
          ffffffff810b6f41: be c0 0c 00 00        movl    $3264, %esi
          ;       addrs = kmalloc_array(prog->len + 1, sizeof(*addrs), GFP_KERNEL);
          ffffffff810b6f46: 8d 78 01              leal    1(%rax), %edi
          ;       if (unlikely(check_mul_overflow(n, size, &bytes)))
          ffffffff810b6f49: 48 c1 e7 02           shlq    $2, %rdi
          ;       return __kmalloc(bytes, flags);
          ffffffff810b6f4d: e8 8e 0c 1d 00        callq   0xffffffff81287be0 <__kmalloc>
          ;       if (!addrs) {
          ffffffff810b6f52: 48 85 c0              testq   %rax, %rax
      
      Change kmalloc_array() to kvmalloc_array() to avoid potential
      allocation error for big bpf programs.
      
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210309015647.3657852-1-yhs@fb.com
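
The order:5 failure in the trace above means the allocator was asked for 32 physically contiguous pages. A rough userspace sketch of that arithmetic follows. It assumes 4 KiB pages and 4-byte address slots (the `shlq $2` in the disassembly multiplies the element count by 4). `alloc_order()` and `addrs_bytes()` are hypothetical helpers written for this illustration, not kernel functions.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Hypothetical helper: smallest order with (PAGE_SIZE << order) >= bytes. */
static unsigned int alloc_order(size_t bytes)
{
    unsigned int order = 0;
    size_t span = SKETCH_PAGE_SIZE;

    while (span < bytes) {
        span <<= 1;
        order++;
    }
    return order;
}

/* Bytes needed for the addrs array: one 4-byte slot per insn, plus one. */
static size_t addrs_bytes(unsigned int prog_len)
{
    return ((size_t)prog_len + 1) * sizeof(uint32_t);
}
```

Under these assumptions a program of 20,000 instructions needs 80,004 bytes, which rounds up to an order-5 request. kvmalloc_array() can fall back to vmalloc when such a contiguous allocation is not available.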
      bpf: Don't do bpf_cgroup_storage_set() for kuprobe/tp programs · 05a68ce5
      Yonghong Song authored
      For kuprobe and tracepoint bpf programs, kernel calls
      trace_call_bpf() which calls BPF_PROG_RUN_ARRAY_CHECK()
      to run the program array. Currently, BPF_PROG_RUN_ARRAY_CHECK()
      also calls bpf_cgroup_storage_set() to set percpu
      cgroup local storage with NULL value. This is
      due to Commit 394e40a2 ("bpf: extend bpf_prog_array to store
      pointers to the cgroup storage") which modified
      __BPF_PROG_RUN_ARRAY() to call bpf_cgroup_storage_set()
      and this macro is also used by BPF_PROG_RUN_ARRAY_CHECK().
      
      kuprobe and tracepoint programs are not allowed to call the
      bpf_get_local_storage() helper and hence do not
      access percpu cgroup local storage. Let us
      change BPF_PROG_RUN_ARRAY_CHECK() not to
      modify percpu cgroup local storage.
      
      The issue was observed when I tried to debug [1] where
      percpu data is overwritten due to the
        preempt_disable -> migration_disable
      change. This patch does not completely fix the above issue,
      which will be addressed separately, e.g., multiple cgroup
      prog runs may preempt each other. But it does fix
      any potential issue caused by a tracing program
      overwriting percpu cgroup storage:
       - in a busy system, a tracing program may run between
         bpf_cgroup_storage_set() and the cgroup prog run.
       - a kprobe program is triggered by a helper in cgroup prog
         before bpf_get_local_storage() is called.
      
       [1] https://lore.kernel.org/bpf/CAKH8qBuXCfUz=w8L+Fj74OaUpbosO29niYwTki7e3Ag044_aww@mail.gmail.com/T
      
      Fixes: 394e40a2 ("bpf: extend bpf_prog_array to store pointers to the cgroup storage")
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Roman Gushchin <guro@fb.com>
      Link: https://lore.kernel.org/bpf/20210309185028.3763817-1-yhs@fb.com
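
The overwrite described in the commit above can be modelled in userspace with a single slot standing in for the percpu cgroup storage pointer. This is only a sketch under that assumption. All names below are hypothetical and none of them are the kernel's macros or helpers.

```c
#include <stddef.h>

/* Stand-in for the percpu cgroup storage pointer (one CPU only). */
static const char *cpu_storage;

static void storage_set(const char *s)
{
    cpu_storage = s;
}

/* Old behaviour: running a tracing prog array also reset the slot to NULL. */
static void run_tracing_progs(int resets_storage)
{
    if (resets_storage)
        storage_set(NULL);
    /* Tracing programs would run here; they never read the slot. */
}

/* A cgroup prog run with a tracing program firing between set and use. */
static const char *cgroup_prog_run(const char *storage, int tracing_resets)
{
    storage_set(storage);
    run_tracing_progs(tracing_resets);  /* e.g. a kprobe fires here */
    return cpu_storage;                 /* what a storage lookup would see */
}
```

With `tracing_resets` set, the cgroup program finds NULL where it expected its own storage. With it cleared, which models the behaviour after the patch, the slot survives the tracing run.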
  2. 08 Mar, 2021 5 commits
  3. 05 Mar, 2021 15 commits
  4. 04 Mar, 2021 18 commits
      cipso,calipso: resolve a number of problems with the DOI refcounts · ad5d07f4
      Paul Moore authored
      The current CIPSO and CALIPSO refcounting scheme for the DOI
      definitions is a bit flawed in that we:
      
      1. Don't correctly match gets/puts in netlbl_cipsov4_list().
      2. Decrement the refcount on each attempt to remove the DOI from the
         DOI list, only removing it from the list once the refcount drops
         to zero.
      
      This patch fixes these problems by adding the missing "puts" to
      netlbl_cipsov4_list() and introducing a more conventional, i.e.
      not-buggy, refcounting mechanism to the DOI definitions.  Upon the
      addition of a DOI to the DOI list, it is initialized with a refcount
      of one; removing a DOI from the list removes it from the list and
      drops the refcount by one; "gets" and "puts" behave as expected with
      respect to refcounts, increasing and decreasing the DOI's refcount by
      one.
      
      Fixes: b1edeb10 ("netlabel: Replace protocol/NetLabel linking with refrerence counts")
      Fixes: d7cce015 ("netlabel: Add support for removing a CALIPSO DOI.")
      Reported-by: syzbot+9ec037722d2603a9f52e@syzkaller.appspotmail.com
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
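
The refcounting rules this commit message describes can be sketched as a small userspace model. It is an illustration only: `struct doi_def` and the `doi_*` functions here are hypothetical stand-ins, and the `freed` flag replaces the real deallocation.

```c
#include <stdbool.h>

struct doi_def {            /* hypothetical stand-in for a DOI definition */
    int refcount;
    bool on_list;
    bool freed;
};

/* Adding to the list initializes the refcount to one (the list's reference). */
static void doi_add(struct doi_def *d)
{
    d->refcount = 1;
    d->on_list = true;
    d->freed = false;
}

static void doi_get(struct doi_def *d)
{
    d->refcount++;
}

static void doi_put(struct doi_def *d)
{
    if (--d->refcount == 0)
        d->freed = true;
}

/* Removal unlinks unconditionally, then drops the list's own reference. */
static bool doi_remove(struct doi_def *d)
{
    if (!d->on_list)
        return false;       /* already removed: do not drop another ref */
    d->on_list = false;
    doi_put(d);
    return true;
}
```

In this model a second removal attempt is a no-op, so repeated removals cannot drain references held by other users.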
      ibmvnic: always store valid MAC address · 67eb2114
      Jiri Wiesner authored
      The last change to ibmvnic_set_mac(), 8fc3672a, meant to prevent
      users from setting an invalid MAC address on an ibmvnic interface
      that has not been brought up yet. The change also prevented the
      requested MAC address from being stored by the adapter object for an
      ibmvnic interface when the state of the ibmvnic interface is
      VNIC_PROBED - that is after probing has finished but before the
      ibmvnic interface is brought up. The MAC address stored by the
      adapter object is used and sent to the hypervisor for checking when
      an ibmvnic interface is brought up.
      
      The ibmvnic driver ignoring the requested MAC address when in
      VNIC_PROBED state caused LACP bonds (bonds in 802.3ad mode) with more
      than one slave to malfunction. The bonding code must be able to
      change the MAC address of its slaves before they are brought up
      during enslaving. The inability of kernels with 8fc3672a to set
      the MAC addresses of bonding slaves is observable in the output of
      "ip address show". The MAC addresses of the slaves are the same as
      the MAC address of the bond on a working system whereas the slaves
      retain their original MAC addresses on a system with a malfunctioning
      LACP bond.
      
      Fixes: 8fc3672a ("ibmvnic: fix ibmvnic_set_mac")
      Signed-off-by: Jiri Wiesner <jwiesner@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
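
The behaviour described above, where the requested MAC address is always stored and only deferred from the hypervisor while the interface is in the probed state, could look roughly like the following userspace model. The state names, the struct, and both functions are hypothetical. The driver's real code is not shown in this log.

```c
#include <stdbool.h>
#include <string.h>

enum sketch_state { SKETCH_PROBED, SKETCH_OPEN };   /* hypothetical states */

struct sketch_adapter {
    enum sketch_state state;
    unsigned char mac[6];
    int sent_to_hypervisor;     /* counts simulated hypervisor requests */
};

/* Valid unicast MAC: not all-zero and multicast bit clear. */
static bool mac_is_valid(const unsigned char *m)
{
    static const unsigned char zero[6];

    return memcmp(m, zero, 6) != 0 && !(m[0] & 0x01);
}

static int sketch_set_mac(struct sketch_adapter *a, const unsigned char *m)
{
    if (!mac_is_valid(m))
        return -1;
    memcpy(a->mac, m, 6);           /* always remember the request */
    if (a->state != SKETCH_PROBED)
        a->sent_to_hypervisor++;    /* only contact the hypervisor once up */
    return 0;
}
```

An invalid address is still rejected up front, so the stored address stays valid, while a bonding-style caller can set the address before the interface is brought up.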
      netdevsim: init u64 stats for 32bit hardware · 863a42b2
      Hillf Danton authored
      Init the u64 stats in order to avoid the lockdep prints on the 32bit
      hardware like
      
       INFO: trying to register non-static key.
       the code is fine but needs lockdep annotation.
       turning off the locking correctness validator.
       CPU: 0 PID: 4695 Comm: syz-executor.0 Not tainted 5.11.0-rc5-syzkaller #0
       Hardware name: ARM-Versatile Express
       Backtrace:
       [<826fc5b8>] (dump_backtrace) from [<826fc82c>] (show_stack+0x18/0x1c arch/arm/kernel/traps.c:252)
       [<826fc814>] (show_stack) from [<8270d1f8>] (__dump_stack lib/dump_stack.c:79 [inline])
       [<826fc814>] (show_stack) from [<8270d1f8>] (dump_stack+0xa8/0xc8 lib/dump_stack.c:120)
       [<8270d150>] (dump_stack) from [<802bf9c0>] (assign_lock_key kernel/locking/lockdep.c:935 [inline])
       [<8270d150>] (dump_stack) from [<802bf9c0>] (register_lock_class+0xabc/0xb68 kernel/locking/lockdep.c:1247)
       [<802bef04>] (register_lock_class) from [<802baa2c>] (__lock_acquire+0x84/0x32d4 kernel/locking/lockdep.c:4711)
       [<802ba9a8>] (__lock_acquire) from [<802be840>] (lock_acquire.part.0+0xf0/0x554 kernel/locking/lockdep.c:5442)
       [<802be750>] (lock_acquire.part.0) from [<802bed10>] (lock_acquire+0x6c/0x74 kernel/locking/lockdep.c:5415)
       [<802beca4>] (lock_acquire) from [<81560548>] (seqcount_lockdep_reader_access include/linux/seqlock.h:103 [inline])
       [<802beca4>] (lock_acquire) from [<81560548>] (__u64_stats_fetch_begin include/linux/u64_stats_sync.h:164 [inline])
       [<802beca4>] (lock_acquire) from [<81560548>] (u64_stats_fetch_begin include/linux/u64_stats_sync.h:175 [inline])
       [<802beca4>] (lock_acquire) from [<81560548>] (nsim_get_stats64+0xdc/0xf0 drivers/net/netdevsim/netdev.c:70)
       [<8156046c>] (nsim_get_stats64) from [<81e2efa0>] (dev_get_stats+0x44/0xd0 net/core/dev.c:10405)
       [<81e2ef5c>] (dev_get_stats) from [<81e53204>] (rtnl_fill_stats+0x38/0x120 net/core/rtnetlink.c:1211)
       [<81e531cc>] (rtnl_fill_stats) from [<81e59d58>] (rtnl_fill_ifinfo+0x6d4/0x148c net/core/rtnetlink.c:1783)
       [<81e59684>] (rtnl_fill_ifinfo) from [<81e5ceb4>] (rtmsg_ifinfo_build_skb+0x9c/0x108 net/core/rtnetlink.c:3798)
       [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo_event net/core/rtnetlink.c:3830 [inline])
       [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo_event net/core/rtnetlink.c:3821 [inline])
       [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo+0x44/0x70 net/core/rtnetlink.c:3839)
       [<81e5d068>] (rtmsg_ifinfo) from [<81e45c2c>] (register_netdevice+0x664/0x68c net/core/dev.c:10103)
       [<81e455c8>] (register_netdevice) from [<815608bc>] (nsim_create+0xf8/0x124 drivers/net/netdevsim/netdev.c:317)
       [<815607c4>] (nsim_create) from [<81561184>] (__nsim_dev_port_add+0x108/0x188 drivers/net/netdevsim/dev.c:941)
       [<8156107c>] (__nsim_dev_port_add) from [<815620d8>] (nsim_dev_port_add_all drivers/net/netdevsim/dev.c:990 [inline])
       [<8156107c>] (__nsim_dev_port_add) from [<815620d8>] (nsim_dev_probe+0x5cc/0x750 drivers/net/netdevsim/dev.c:1119)
       [<81561b0c>] (nsim_dev_probe) from [<815661dc>] (nsim_bus_probe+0x10/0x14 drivers/net/netdevsim/bus.c:287)
       [<815661cc>] (nsim_bus_probe) from [<811724c0>] (really_probe+0x100/0x50c drivers/base/dd.c:554)
       [<811723c0>] (really_probe) from [<811729c4>] (driver_probe_device+0xf8/0x1c8 drivers/base/dd.c:740)
       [<811728cc>] (driver_probe_device) from [<81172fe4>] (__device_attach_driver+0x8c/0xf0 drivers/base/dd.c:846)
       [<81172f58>] (__device_attach_driver) from [<8116fee0>] (bus_for_each_drv+0x88/0xd8 drivers/base/bus.c:431)
       [<8116fe58>] (bus_for_each_drv) from [<81172c6c>] (__device_attach+0xdc/0x1d0 drivers/base/dd.c:914)
       [<81172b90>] (__device_attach) from [<8117305c>] (device_initial_probe+0x14/0x18 drivers/base/dd.c:961)
       [<81173048>] (device_initial_probe) from [<81171358>] (bus_probe_device+0x90/0x98 drivers/base/bus.c:491)
       [<811712c8>] (bus_probe_device) from [<8116e77c>] (device_add+0x320/0x824 drivers/base/core.c:3109)
       [<8116e45c>] (device_add) from [<8116ec9c>] (device_register+0x1c/0x20 drivers/base/core.c:3182)
       [<8116ec80>] (device_register) from [<81566710>] (nsim_bus_dev_new drivers/net/netdevsim/bus.c:336 [inline])
       [<8116ec80>] (device_register) from [<81566710>] (new_device_store+0x178/0x208 drivers/net/netdevsim/bus.c:215)
       [<81566598>] (new_device_store) from [<8116fcb4>] (bus_attr_store+0x2c/0x38 drivers/base/bus.c:122)
       [<8116fc88>] (bus_attr_store) from [<805b4b8c>] (sysfs_kf_write+0x48/0x54 fs/sysfs/file.c:139)
       [<805b4b44>] (sysfs_kf_write) from [<805b3c90>] (kernfs_fop_write_iter+0x128/0x1ec fs/kernfs/file.c:296)
       [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (call_write_iter include/linux/fs.h:1901 [inline])
       [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (new_sync_write fs/read_write.c:518 [inline])
       [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (vfs_write+0x3dc/0x57c fs/read_write.c:605)
       [<804d1f20>] (vfs_write) from [<804d2604>] (ksys_write+0x68/0xec fs/read_write.c:658)
       [<804d259c>] (ksys_write) from [<804d2698>] (__do_sys_write fs/read_write.c:670 [inline])
       [<804d259c>] (ksys_write) from [<804d2698>] (sys_write+0x10/0x14 fs/read_write.c:667)
       [<804d2688>] (sys_write) from [<80200060>] (ret_fast_syscall+0x0/0x2c arch/arm/mm/proc-v7.S:64)
      
      Fixes: 83c9e13a ("netdevsim: add software driver for testing offloads")
      Reported-by: syzbot+e74a6857f2d0efe3ad81@syzkaller.appspotmail.com
      Tested-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Hillf Danton <hdanton@sina.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Merge branch 'mptcp-fixes' · bdda7dfa
      David S. Miller authored
      Mat Martineau says:
      
      ====================
      mptcp: Fixes for v5.12
      
      These patches from the MPTCP tree fix a few multipath TCP issues:
      
      Patches 1 and 5 clear some stale pointers when subflows close.
      
      Patches 2, 4, and 9 plug some memory leaks.
      
      Patch 3 fixes a memory accounting error identified by syzkaller.
      
      Patches 6 and 7 fix a race condition that slowed data transmission.
      
      Patch 8 adds missing wakeups when write buffer space is freed.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      mptcp: free resources when the port number is mismatched · 9238e900
      Geliang Tang authored
      When the port number is mismatched with the announced ones, use
      'goto dispose_child' to free the resources instead of using 'goto out'.
      
      This patch also moves the port number checking code in
      subflow_syn_recv_sock before mptcp_finish_join, otherwise subflow_drop_ctx
      will fail in dispose_child.
      
      Fixes: 5bc56388 ("mptcp: add port number check for MP_JOIN")
      Reported-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
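
The difference between the two exit labels named in this commit can be illustrated with a generic goto-cleanup sketch. It is a userspace model under stated assumptions: `accept_join()`, `child_alloc()`, and `child_free()` are hypothetical, and a simple counter stands in for the resources that would otherwise leak.

```c
#include <stdbool.h>
#include <stdlib.h>

static int live_children;   /* allocations not yet freed */

static void *child_alloc(void)
{
    void *c = malloc(1);

    if (c)
        live_children++;
    return c;
}

static void child_free(void *c)
{
    free(c);
    live_children--;
}

/* Returns the accepted child, or NULL if the port check fails. */
static void *accept_join(bool port_matches)
{
    void *child = child_alloc();

    if (!child)
        goto out;
    if (!port_matches)
        goto dispose_child;     /* 'goto out' here would leak child */
    return child;

dispose_child:
    child_free(child);
    child = NULL;
out:
    return child;
}
```

Jumping to the disposing label on a mismatch keeps `live_children` at zero, which is the property the fix is after.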
      mptcp: fix missing wakeup · 417789df
      Paolo Abeni authored
      __mptcp_clean_una() can free write memory and should wake-up
      user-space processes when needed.
      
      When such function is invoked by the MPTCP receive path, the wakeup
      is not needed, as the TCP stack will later trigger subflow_write_space
      which will do the wakeup as needed.
      
      Other __mptcp_clean_una() call sites need an additional wakeup check.
      Let's bundle the relevant code in a new helper and use it.
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/165
      Fixes: 6e628cd3 ("mptcp: use mptcp release_cb for delayed tasks")
      Fixes: 64b9cea7 ("mptcp: fix spurious retransmissions")
      Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      mptcp: fix race in release_cb · c2e6048f
      Paolo Abeni authored
      If we receive a MPTCP_PUSH_PENDING event from a subflow when
      mptcp_release_cb() is serving the previous one, the latter
      will be delayed up to the next release_sock(msk).
      
      Address the issue by implementing a test/serve loop for such
      events.
      
      Additionally rename the push helper to __mptcp_push_pending()
      to be more consistent with the existing code.
      
      Fixes: 6e628cd3 ("mptcp: use mptcp release_cb for delayed tasks")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
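
A test/serve loop of the kind this commit mentions can be sketched as follows. This is a userspace model, not the MPTCP code: the flag bit, `struct sketch_msk`, and both functions are hypothetical, and the `rearm` field simulates events that arrive while an earlier one is being served.

```c
#define PUSH_PENDING 0x1u   /* hypothetical flag bit */

struct sketch_msk {
    unsigned int flags;
    int pushes;             /* how many times the push helper ran */
    int rearm;              /* events that will arrive while serving */
};

static void sketch_push_pending(struct sketch_msk *m)
{
    m->pushes++;
    if (m->rearm > 0) {     /* a subflow raises the event again mid-serve */
        m->rearm--;
        m->flags |= PUSH_PENDING;
    }
}

/* Keep testing and serving until no event is left pending. */
static void sketch_release_cb(struct sketch_msk *m)
{
    while (m->flags & PUSH_PENDING) {
        m->flags &= ~PUSH_PENDING;
        sketch_push_pending(m);
    }
}
```

A single test-then-serve pass would leave the re-raised flag set until the next call. The loop drains it before returning.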
      mptcp: factor out __mptcp_retrans helper() · 2948d0a1
      Paolo Abeni authored
      Will simplify the following patch, no functional change
      intended.
      
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      mptcp: reset 'first' and ack_hint on subflow close · c8fe62f0
      Florian Westphal authored
      Just like with last_snd, we have to NULL 'first' on subflow close.
      
      ack_hint isn't strictly required (it's never dereferenced), but better to
      clear this explicitly as well instead of making it an exception.
      
      msk->first is dereferenced unconditionally at accept time, but
      at that point the ssk is not on the conn_list yet -- this means
      worker can't see it when iterating the conn_list.
      
      Reported-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      mptcp: dispose initial struct socket when its subflow is closed · 17aee05d
      Florian Westphal authored
      Christoph Paasch reported the following crash:
      dst_release underflow
      WARNING: CPU: 0 PID: 1319 at net/core/dst.c:175 dst_release+0xc1/0xd0 net/core/dst.c:175
      CPU: 0 PID: 1319 Comm: syz-executor217 Not tainted 5.11.0-rc6af8e85128b4d0d24083c5cac646e891227052e0c #70
      Call Trace:
       rt_cache_route+0x12e/0x140 net/ipv4/route.c:1503
       rt_set_nexthop.constprop.0+0x1fc/0x590 net/ipv4/route.c:1612
       __mkroute_output net/ipv4/route.c:2484 [inline]
      ...
      
      The worker leaves msk->subflow alone even when it
      happened to close the subflow ssk associated with it.
      
      Fixes: 866f26f2 ("mptcp: always graft subflow socket to parent")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/157
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      mptcp: fix memory accounting on allocation error · eaeef1ce
      Paolo Abeni authored
      In case of memory pressure the MPTCP xmit path keeps
      at most a single skb in the tx cache, eventually freeing
      additional ones.
      
      The associated counter for forward memory is not updated
      accordingly, and that causes the following splat:
      
      WARNING: CPU: 0 PID: 12 at net/core/stream.c:208 sk_stream_kill_queues+0x3ca/0x530 net/core/stream.c:208
      Modules linked in:
      CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.11.0-rc2 #59
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Workqueue: events mptcp_worker
      RIP: 0010:sk_stream_kill_queues+0x3ca/0x530 net/core/stream.c:208
      Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 63 01 00 00 8b ab 00 01 00 00 e9 60 ff ff ff e8 2f 24 d3 fe 0f 0b eb 97 e8 26 24 d3 fe <0f> 0b eb a0 e8 1d 24 d3 fe 0f 0b e9 a5 fe ff ff 4c 89 e7 e8 0e d0
      RSP: 0018:ffffc900000c7bc8 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: ffff88810030ac40 RSI: ffffffff8262ca4a RDI: 0000000000000003
      RBP: 0000000000000d00 R08: 0000000000000000 R09: ffffffff85095aa7
      R10: ffffffff8262c9ea R11: 0000000000000001 R12: ffff888108908100
      R13: ffffffff85095aa0 R14: ffffc900000c7c48 R15: 1ffff92000018f85
      FS:  0000000000000000(0000) GS:ffff88811b200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fa7444baef8 CR3: 0000000035ee9005 CR4: 0000000000170ef0
      Call Trace:
       __mptcp_destroy_sock+0x4a7/0x6c0 net/mptcp/protocol.c:2547
       mptcp_worker+0x7dd/0x1610 net/mptcp/protocol.c:2272
       process_one_work+0x896/0x1170 kernel/workqueue.c:2275
       worker_thread+0x605/0x1350 kernel/workqueue.c:2421
       kthread+0x344/0x410 kernel/kthread.c:292
       ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:296
      
      At close time, as reported by syzkaller/Christoph.
      
      This change addresses the issue by properly updating the fwd
      allocated memory counter in the error path.
      
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/136
      Fixes: 724cfd2e ("mptcp: allocate TX skbs in msk context")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      mptcp: put subflow sock on connect error · f0715779
      Florian Westphal authored
      mptcp_add_pending_subflow() performs a sock_hold() on the subflow,
      then adds the subflow to the join list.
      
      Without a sock_put the subflow sk won't be freed in case connect() fails.
      
      unreferenced object 0xffff88810c03b100 (size 3000):
      [..]
          sk_prot_alloc.isra.0+0x2f/0x110
          sk_alloc+0x5d/0xc20
          inet6_create+0x2b7/0xd30
          __sock_create+0x17f/0x410
          mptcp_subflow_create_socket+0xff/0x9c0
          __mptcp_subflow_connect+0x1da/0xaf0
          mptcp_pm_nl_work+0x6e0/0x1120
          mptcp_worker+0x508/0x9a0
      
      Fixes: 5b950ff4 ("mptcp: link MPC subflow into msk only after accept")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      mptcp: reset last_snd on subflow close · e0be4931
      Florian Westphal authored
      Send logic caches last active subflow in the msk, so it needs to be
      cleared when the cached subflow is closed.
      
      Fixes: d5f49190 ("mptcp: allow picking different xmit subflows")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/155
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net: sched: avoid duplicates in classes dump · bfc25605
      Maximilian Heyne authored
      This is a follow up of commit ea327469 ("net: sched: avoid
      duplicates in qdisc dump") which has fixed the issue only for the qdisc
      dump.
      
      The duplicate printing also occurs when dumping the classes via
        tc class show dev eth0
      
      Fixes: 59cc1f61 ("net: sched: convert qdisc linked list to hashtable")
      Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net: usb: qmi_wwan: allow qmimux add/del with master up · 6c59cff3
      Daniele Palmas authored
      There's no reason for preventing the creation and removal
      of qmimux network interfaces when the underlying interface
      is up.
      
      This makes qmi_wwan mux implementation more similar to the
      rmnet one, simplifying userspace management of the same
      logical interfaces.
      
      Fixes: c6adf779 ("net: usb: qmi_wwan: add qmap mux protocol support")
      Reported-by: Aleksander Morgado <aleksander@aleksander.es>
      Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
      Acked-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net: dsa: sja1105: fix ucast/bcast flooding always remaining enabled · 6a5166e0
      Vladimir Oltean authored
      In the blamed patch I managed to introduce a bug while moving code
      around: the same logic is applied to the ucast_egress_floods and
      bcast_egress_floods variables both on the "if" and the "else" branches.
      
      This is clearly an unintended change compared to how the code used to be
      prior to that bugfix, so restore it.
      
      Fixes: 7f7ccdea ("net: dsa: sja1105: fix leakage of flooded frames outside bridging domain")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net: dsa: sja1105: fix SGMII PCS being forced to SPEED_UNKNOWN instead of SPEED_10 · 053d8ad1
      Vladimir Oltean authored
      When using MLO_AN_PHY or MLO_AN_FIXED, the MII_BMCR of the SGMII PCS is
      read before resetting the switch so it can be reprogrammed afterwards.
      This works for the speeds of 1Gbps and 100Mbps, but not for 10Mbps,
      because SPEED_10 is actually 0, so AND-ing anything with 0 is false,
      therefore that last branch is dead code.
      
      Do what others do (genphy_read_status_fixed, phy_mii_ioctl) and just
      remove the check for SPEED_10, let it fall into the default case.
      
      Fixes: ffe10e67 ("net: dsa: sja1105: Add support for the SGMII port")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
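
The dead branch described in this commit follows from the speed bits of the MII control register: the 10 Mbps encoding is all zeros, so a bitwise AND against it can never be true. The sketch below uses the BMCR values from the kernel's mii.h. The two decoding functions are hypothetical and only model the shape of the bug, not the driver's actual code.

```c
/* Speed-select encodings of the MII basic mode control register. */
#define BMCR_SPEED1000 0x0040
#define BMCR_SPEED100  0x2000
#define BMCR_SPEED10   0x0000

/* Buggy shape: the SPEED10 test is (bmcr & 0), which is always false. */
static int bmcr_to_speed_buggy(unsigned int bmcr)
{
    if (bmcr & BMCR_SPEED1000)
        return 1000;
    if (bmcr & BMCR_SPEED100)
        return 100;
    if (bmcr & BMCR_SPEED10)
        return 10;          /* unreachable */
    return -1;              /* stands in for "unknown speed" */
}

/* Fixed shape: drop the check and let 10 Mbps be the default case. */
static int bmcr_to_speed_fixed(unsigned int bmcr)
{
    if (bmcr & BMCR_SPEED1000)
        return 1000;
    if (bmcr & BMCR_SPEED100)
        return 100;
    return 10;
}
```

With both speed bits clear, the buggy variant reports an unknown speed while the fixed variant reports 10 Mbps.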
      net: mscc: ocelot: properly reject destination IP keys in VCAP IS1 · f1becbed
      Vladimir Oltean authored
      An attempt is made to warn the user about the fact that VCAP IS1 cannot
      offload keys matching on destination IP (at least given the current half
      key format), but sadly that warning fails miserably in practice, due to
      the fact that it operates on an uninitialized "match" variable. We must
      first decode the keys from the flow rule.
      
      Fixes: 75944fda ("net: mscc: ocelot: offload ingress skbedit and vlan actions to VCAP IS1")
      Reported-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>