1. 08 Jul, 2019 4 commits
    • Taehee Yoo's avatar
      gtp: fix Illegal context switch in RCU read-side critical section. · 3f167e19
      Taehee Yoo authored
      ipv4_pdp_add() is called in RCU read-side critical section.
      So GFP_KERNEL should not be used in the function.
      This patch make ipv4_pdp_add() to use GFP_ATOMIC instead of GFP_KERNEL.
      
      Test commands:
      gtp-link add gtp1 &
      gtp-tunnel add gtp1 v1 100 200 1.1.1.1 2.2.2.2
      
      Splat looks like:
      [  130.618881] =============================
      [  130.626382] WARNING: suspicious RCU usage
      [  130.626994] 5.2.0-rc6+ #50 Not tainted
      [  130.627622] -----------------------------
      [  130.628223] ./include/linux/rcupdate.h:266 Illegal context switch in RCU read-side critical section!
      [  130.629684]
      [  130.629684] other info that might help us debug this:
      [  130.629684]
      [  130.631022]
      [  130.631022] rcu_scheduler_active = 2, debug_locks = 1
      [  130.632136] 4 locks held by gtp-tunnel/1025:
      [  130.632925]  #0: 000000002b93c8b7 (cb_lock){++++}, at: genl_rcv+0x15/0x40
      [  130.634159]  #1: 00000000f17bc999 (genl_mutex){+.+.}, at: genl_rcv_msg+0xfb/0x130
      [  130.635487]  #2: 00000000c644ed8e (rtnl_mutex){+.+.}, at: gtp_genl_new_pdp+0x18c/0x1150 [gtp]
      [  130.636936]  #3: 0000000007a1cde7 (rcu_read_lock){....}, at: gtp_genl_new_pdp+0x187/0x1150 [gtp]
      [  130.638348]
      [  130.638348] stack backtrace:
      [  130.639062] CPU: 1 PID: 1025 Comm: gtp-tunnel Not tainted 5.2.0-rc6+ #50
      [  130.641318] Call Trace:
      [  130.641707]  dump_stack+0x7c/0xbb
      [  130.642252]  ___might_sleep+0x2c0/0x3b0
      [  130.642862]  kmem_cache_alloc_trace+0x1cd/0x2b0
      [  130.643591]  gtp_genl_new_pdp+0x6c5/0x1150 [gtp]
      [  130.644371]  genl_family_rcv_msg+0x63a/0x1030
      [  130.645074]  ? mutex_lock_io_nested+0x1090/0x1090
      [  130.645845]  ? genl_unregister_family+0x630/0x630
      [  130.646592]  ? debug_show_all_locks+0x2d0/0x2d0
      [  130.647293]  ? check_flags.part.40+0x440/0x440
      [  130.648099]  genl_rcv_msg+0xa3/0x130
      [ ... ]
      
      Fixes: 459aa660 ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3f167e19
    • Taehee Yoo's avatar
      gtp: remove duplicate code in gtp_dellink() · a635037a
      Taehee Yoo authored
      gtp_encap_disable() in gtp_dellink() is unnecessary because it will be
      called by unregister_netdevice().
      unregister_netdevice() internally calls gtp_dev_uninit() by ->ndo_uninit().
      And gtp_dev_uninit() calls gtp_encap_disable().
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a635037a
    • Taehee Yoo's avatar
      gtp: fix use-after-free in gtp_encap_destroy() · 1788b856
      Taehee Yoo authored
      gtp_encap_destroy() is called twice.
      1. When interface is deleted.
      2. When udp socket is destroyed.
      either gtp->sk0 or gtp->sk1u could be freed by sock_put() in
      gtp_encap_destroy(). so, when gtp_encap_destroy() is called again,
      it would uses freed sk pointer.
      
      patch makes gtp_encap_destroy() to set either gtp->sk0 or gtp->sk1u to
      null. in addition, both gtp->sk0 and gtp->sk1u pointer are protected
      by rtnl_lock. so, rtnl_lock() is added.
      
      Test command:
         gtp-link add gtp1 &
         killall gtp-link
         ip link del gtp1
      
      Splat looks like:
      [   83.182767] BUG: KASAN: use-after-free in __lock_acquire+0x3a20/0x46a0
      [   83.184128] Read of size 8 at addr ffff8880cc7d5360 by task ip/1008
      [   83.185567] CPU: 1 PID: 1008 Comm: ip Not tainted 5.2.0-rc6+ #50
      [   83.188469] Call Trace:
      [ ... ]
      [   83.200126]  lock_acquire+0x141/0x380
      [   83.200575]  ? lock_sock_nested+0x3a/0xf0
      [   83.201069]  _raw_spin_lock_bh+0x38/0x70
      [   83.201551]  ? lock_sock_nested+0x3a/0xf0
      [   83.202044]  lock_sock_nested+0x3a/0xf0
      [   83.202520]  gtp_encap_destroy+0x18/0xe0 [gtp]
      [   83.203065]  gtp_encap_disable.isra.14+0x13/0x50 [gtp]
      [   83.203687]  gtp_dellink+0x56/0x170 [gtp]
      [   83.204190]  rtnl_delete_link+0xb4/0x100
      [ ... ]
      [   83.236513] Allocated by task 976:
      [   83.236925]  save_stack+0x19/0x80
      [   83.237332]  __kasan_kmalloc.constprop.3+0xa0/0xd0
      [   83.237894]  kmem_cache_alloc+0xd8/0x280
      [   83.238360]  sk_prot_alloc.isra.42+0x50/0x200
      [   83.238874]  sk_alloc+0x32/0x940
      [   83.239264]  inet_create+0x283/0xc20
      [   83.239684]  __sock_create+0x2dd/0x540
      [   83.240136]  __sys_socket+0xca/0x1a0
      [   83.240550]  __x64_sys_socket+0x6f/0xb0
      [   83.240998]  do_syscall_64+0x9c/0x450
      [   83.241466]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   83.242061]
      [   83.242249] Freed by task 0:
      [   83.242616]  save_stack+0x19/0x80
      [   83.243013]  __kasan_slab_free+0x111/0x150
      [   83.243498]  kmem_cache_free+0x89/0x250
      [   83.244444]  __sk_destruct+0x38f/0x5a0
      [   83.245366]  rcu_core+0x7e9/0x1c20
      [   83.245766]  __do_softirq+0x213/0x8fa
      
      Fixes: 1e3a3abd ("gtp: make GTP sockets in gtp_newlink optional")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1788b856
    • Taehee Yoo's avatar
      gtp: fix suspicious RCU usage · e198987e
      Taehee Yoo authored
      gtp_encap_enable_socket() and gtp_encap_destroy() are not protected
      by rcu_read_lock(). and it's not safe to write sk->sk_user_data.
      This patch make these functions to use lock_sock() instead of
      rcu_dereference_sk_user_data().
      
      Test commands:
          gtp-link add gtp1
      
      Splat looks like:
      [   83.238315] =============================
      [   83.239127] WARNING: suspicious RCU usage
      [   83.239702] 5.2.0-rc6+ #49 Not tainted
      [   83.240268] -----------------------------
      [   83.241205] drivers/net/gtp.c:799 suspicious rcu_dereference_check() usage!
      [   83.243828]
      [   83.243828] other info that might help us debug this:
      [   83.243828]
      [   83.246325]
      [   83.246325] rcu_scheduler_active = 2, debug_locks = 1
      [   83.247314] 1 lock held by gtp-link/1008:
      [   83.248523]  #0: 0000000017772c7f (rtnl_mutex){+.+.}, at: __rtnl_newlink+0x5f5/0x11b0
      [   83.251503]
      [   83.251503] stack backtrace:
      [   83.252173] CPU: 0 PID: 1008 Comm: gtp-link Not tainted 5.2.0-rc6+ #49
      [   83.253271] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   83.254562] Call Trace:
      [   83.254995]  dump_stack+0x7c/0xbb
      [   83.255567]  gtp_encap_enable_socket+0x2df/0x360 [gtp]
      [   83.256415]  ? gtp_find_dev+0x1a0/0x1a0 [gtp]
      [   83.257161]  ? memset+0x1f/0x40
      [   83.257843]  gtp_newlink+0x90/0xa21 [gtp]
      [   83.258497]  ? __netlink_ns_capable+0xc3/0xf0
      [   83.259260]  __rtnl_newlink+0xb9f/0x11b0
      [   83.260022]  ? rtnl_link_unregister+0x230/0x230
      [ ... ]
      
      Fixes: 1e3a3abd ("gtp: make GTP sockets in gtp_newlink optional")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e198987e
  2. 07 Jul, 2019 3 commits
  3. 05 Jul, 2019 7 commits
    • Ido Schimmel's avatar
      ipv4: Fix NULL pointer dereference in ipv4_neigh_lookup() · 537de0c8
      Ido Schimmel authored
      Both ip_neigh_gw4() and ip_neigh_gw6() can return either a valid pointer
      or an error pointer, but the code currently checks that the pointer is
      not NULL.
      
      Fix this by checking that the pointer is not an error pointer, as this
      can result in a NULL pointer dereference [1]. Specifically, I believe
      that what happened is that ip_neigh_gw4() returned '-EINVAL'
      (0xffffffffffffffea) to which the offset of 'refcnt' (0x70) was added,
      which resulted in the address 0x000000000000005a.
      
      [1]
       BUG: KASAN: null-ptr-deref in refcount_inc_not_zero_checked+0x6e/0x180
       Read of size 4 at addr 000000000000005a by task swapper/2/0
      
       CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.2.0-rc6-custom-reg-179657-gaa32d89 #396
       Hardware name: Mellanox Technologies Ltd. MSN2010/SA002610, BIOS 5.6.5 08/24/2017
       Call Trace:
       <IRQ>
       dump_stack+0x73/0xbb
       __kasan_report+0x188/0x1ea
       kasan_report+0xe/0x20
       refcount_inc_not_zero_checked+0x6e/0x180
       ipv4_neigh_lookup+0x365/0x12c0
       __neigh_update+0x1467/0x22f0
       arp_process.constprop.6+0x82e/0x1f00
       __netif_receive_skb_one_core+0xee/0x170
       process_backlog+0xe3/0x640
       net_rx_action+0x755/0xd90
       __do_softirq+0x29b/0xae7
       irq_exit+0x177/0x1c0
       smp_apic_timer_interrupt+0x164/0x5e0
       apic_timer_interrupt+0xf/0x20
       </IRQ>
      
      Fixes: 5c9f7c1d ("ipv4: Add helpers for neigh lookup for nexthop")
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Reported-by: default avatarShalom Toledo <shalomt@mellanox.com>
      Reviewed-by: default avatarJiri Pirko <jiri@mellanox.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      537de0c8
    • Hayes Wang's avatar
      r8152: set RTL8152_UNPLUG only for real disconnection · ffa9fec3
      Hayes Wang authored
      Set the flag of RTL8152_UNPLUG if and only if the device is unplugged.
      Some error codes sometimes don't mean the real disconnection of usb device.
      For those situations, set the flag of RTL8152_UNPLUG causes the driver skips
      some flows of disabling the device, and it let the device stay at incorrect
      state.
      Signed-off-by: default avatarHayes Wang <hayeswang@realtek.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ffa9fec3
    • David S. Miller's avatar
      Merge branch 'hsr-bug-fixes' · fa804301
      David S. Miller authored
      Cong Wang says:
      
      ====================
      hsr: a few bug fixes
      
      This patchset contains 3 bug fixes for hsr triggered by a syzbot
      reproducer, please check each patch for details.
      ====================
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      fa804301
    • Cong Wang's avatar
      hsr: fix a NULL pointer deref in hsr_dev_xmit() · edf070a0
      Cong Wang authored
      hsr_port_get_hsr() could return NULL and kernel
      could crash:
      
       BUG: kernel NULL pointer dereference, address: 0000000000000010
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 8000000074b84067 P4D 8000000074b84067 PUD 7057d067 PMD 0
       Oops: 0000 [#1] SMP PTI
       CPU: 0 PID: 754 Comm: a.out Not tainted 5.2.0-rc6+ #718
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
       RIP: 0010:hsr_dev_xmit+0x20/0x31
       Code: 48 8b 1b eb e0 5b 5d 41 5c c3 66 66 66 66 90 55 48 89 fd 48 8d be 40 0b 00 00 be 04 00 00 00 e8 ee f2 ff ff 48 89 ef 48 89 c6 <48> 8b 40 10 48 89 45 10 e8 6c 1b 00 00 31 c0 5d c3 66 66 66 66 90
       RSP: 0018:ffffb5b400003c48 EFLAGS: 00010246
       RAX: 0000000000000000 RBX: ffff9821b4509a88 RCX: 0000000000000000
       RDX: ffff9821b4509a88 RSI: 0000000000000000 RDI: ffff9821bc3fc7c0
       RBP: ffff9821bc3fc7c0 R08: 0000000000000000 R09: 00000000000c2019
       R10: 0000000000000000 R11: 0000000000000002 R12: ffff9821bc3fc7c0
       R13: ffff9821b4509a88 R14: 0000000000000000 R15: 000000000000006e
       FS:  00007fee112a1800(0000) GS:ffff9821bd800000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000010 CR3: 000000006e9ce000 CR4: 00000000000406f0
       Call Trace:
        <IRQ>
        netdev_start_xmit+0x1b/0x38
        dev_hard_start_xmit+0x121/0x21e
        ? validate_xmit_skb.isra.0+0x19/0x1e3
        __dev_queue_xmit+0x74c/0x823
        ? lockdep_hardirqs_on+0x12b/0x17d
        ip6_finish_output2+0x3d3/0x42c
        ? ip6_mtu+0x55/0x5c
        ? mld_sendpack+0x191/0x229
        mld_sendpack+0x191/0x229
        mld_ifc_timer_expire+0x1f7/0x230
        ? mld_dad_timer_expire+0x58/0x58
        call_timer_fn+0x12e/0x273
        __run_timers.part.0+0x174/0x1b5
        ? mld_dad_timer_expire+0x58/0x58
        ? sched_clock_cpu+0x10/0xad
        ? mark_lock+0x26/0x1f2
        ? __lock_is_held+0x40/0x71
        run_timer_softirq+0x26/0x48
        __do_softirq+0x1af/0x392
        irq_exit+0x53/0xa2
        smp_apic_timer_interrupt+0x1c4/0x1d9
        apic_timer_interrupt+0xf/0x20
        </IRQ>
      
      Cc: Arvid Brodin <arvid.brodin@alten.se>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      edf070a0
    • Cong Wang's avatar
      hsr: implement dellink to clean up resources · b9a1e627
      Cong Wang authored
      hsr_link_ops implements ->newlink() but not ->dellink(),
      which leads that resources not released after removing the device,
      particularly the entries in self_node_db and node_db.
      
      So add ->dellink() implementation to replace the priv_destructor.
      This also makes the code slightly easier to understand.
      
      Reported-by: syzbot+c6167ec3de7def23d1e8@syzkaller.appspotmail.com
      Cc: Arvid Brodin <arvid.brodin@alten.se>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b9a1e627
    • Cong Wang's avatar
      hsr: fix a memory leak in hsr_del_port() · 619afef0
      Cong Wang authored
      hsr_del_port() should release all the resources allocated
      in hsr_add_port().
      
      As a consequence of this change, hsr_for_each_port() is no
      longer safe to work with hsr_del_port(), switch to
      list_for_each_entry_safe() as we always hold RTNL lock.
      
      Cc: Arvid Brodin <arvid.brodin@alten.se>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      619afef0
    • David S. Miller's avatar
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 114b5b35
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2019-07-05
      
      1)  Fix xfrm selector prefix length validation for
          inter address family tunneling.
          From Anirudh Gupta.
      
      2) Fix a memleak in pfkey.
         From Jeremy Sowden.
      
      3) Fix SA selector validation to allow empty selectors again.
         From Nicolas Dichtel.
      
      4) Select crypto ciphers for xfrm_algo, this fixes some
         randconfig builds. From Arnd Bergmann.
      
      5) Remove a duplicated assignment in xfrm_bydst_resize.
         From Cong Wang.
      
      6) Fix a hlist corruption on hash rebuild.
         From Florian Westphal.
      
      7) Fix a memory leak when creating xfrm interfaces.
         From Nicolas Dichtel.
      
      Please pull or let me know if there are problems.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      114b5b35
  4. 03 Jul, 2019 15 commits
    • Cong Wang's avatar
      bonding: validate ip header before check IPPROTO_IGMP · 9d1bc24b
      Cong Wang authored
      bond_xmit_roundrobin() checks for IGMP packets but it parses
      the IP header even before checking skb->protocol.
      
      We should validate the IP header with pskb_may_pull() before
      using iph->protocol.
      
      Reported-and-tested-by: syzbot+e5be16aa39ad6e755391@syzkaller.appspotmail.com
      Fixes: a2fd940f ("bonding: fix broken multicast with round-robin mode")
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9d1bc24b
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · c3ead2df
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2019-07-03
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix the interpreter to properly handle BPF_ALU32 | BPF_ARSH
         on BE architectures, from Jiong.
      
      2) Fix several bugs in the x32 BPF JIT for handling shifts by 0,
         from Luke and Xi.
      
      3) Fix NULL pointer deref in btf_type_is_resolve_source_only(),
         from Stanislav.
      
      4) Properly handle the check that forwarding is enabled on the device
         in bpf_ipv6_fib_lookup() helper code, from Anton.
      
      5) Fix UAPI bpf_prog_info fields alignment for archs that have 16 bit
         alignment such as m68k, from Baruch.
      
      6) Fix kernel hanging in unregister_netdevice loop while unregistering
         device bound to XDP socket, from Ilya.
      
      7) Properly terminate tail update in xskq_produce_flush_desc(), from Nathan.
      
      8) Fix broken always_inline handling in test_lwt_seg6local, from Jiri.
      
      9) Fix bpftool to use correct argument in cgroup errors, from Jakub.
      
      10) Fix detaching dummy prog in XDP redirect sample code, from Prashant.
      
      11) Add Jonathan to AF_XDP reviewers, from Björn.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c3ead2df
    • Yonglong Liu's avatar
      net: hns: add support for vlan TSO · 0d581ba3
      Yonglong Liu authored
      The hip07 chip support vlan TSO, this patch adds NETIF_F_TSO
      and NETIF_F_TSO6 flags to vlan_features to improve the
      performance after adding vlan to the net ports.
      Signed-off-by: default avatarYonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0d581ba3
    • Xin Long's avatar
      sctp: count data bundling sack chunk for outctrlchunks · 7af03301
      Xin Long authored
      Now all ctrl chunks are counted for asoc stats.octrlchunks and net
      SCTP_MIB_OUTCTRLCHUNKS either after queuing up or bundling, other
      than the chunk maked and bundled in sctp_packet_bundle_sack, which
      caused 'outctrlchunks' not consistent with 'inctrlchunks' in peer.
      
      This issue exists since very beginning, here to fix it by increasing
      both net SCTP_MIB_OUTCTRLCHUNKS and asoc stats.octrlchunks when sack
      chunk is maked and bundled in sctp_packet_bundle_sack.
      Reported-by: default avatarJa Ram Jeon <jajeon@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7af03301
    • Hayes Wang's avatar
      r8152: move calling r8153b_rx_agg_chg_indicate() · 9fae5418
      Hayes Wang authored
      r8153b_rx_agg_chg_indicate() needs to be called after enabling TX/RX and
      before calling rxdy_gated_en(tp, false). Otherwise, the change of the
      settings of RX aggregation wouldn't work.
      
      Besides, adjust rtl8152_set_coalesce() for the same reason. If
      rx_coalesce_usecs is changed, restart TX/RX to let the setting work.
      Signed-off-by: default avatarHayes Wang <hayeswang@realtek.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9fae5418
    • Stephen Hemminger's avatar
      net: don't warn in inet diag when IPV6 is disabled · 1e64d7cb
      Stephen Hemminger authored
      If IPV6 was disabled, then ss command would cause a kernel warning
      because the command was attempting to dump IPV6 socket information.
      The fix is to just remove the warning.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202249
      Fixes: 432490f9 ("net: ip, diag -- Add diag interface for raw sockets")
      Signed-off-by: default avatarStephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1e64d7cb
    • Ilya Maximets's avatar
      xdp: fix hang while unregistering device bound to xdp socket · 455302d1
      Ilya Maximets authored
      Device that bound to XDP socket will not have zero refcount until the
      userspace application will not close it. This leads to hang inside
      'netdev_wait_allrefs()' if device unregistering requested:
      
        # ip link del p1
        < hang on recvmsg on netlink socket >
      
        # ps -x | grep ip
        5126  pts/0    D+   0:00 ip link del p1
      
        # journalctl -b
      
        Jun 05 07:19:16 kernel:
        unregister_netdevice: waiting for p1 to become free. Usage count = 1
      
        Jun 05 07:19:27 kernel:
        unregister_netdevice: waiting for p1 to become free. Usage count = 1
        ...
      
      Fix that by implementing NETDEV_UNREGISTER event notification handler
      to properly clean up all the resources and unref device.
      
      This should also allow socket killing via ss(8) utility.
      
      Fixes: 965a9909 ("xsk: add support for bind for Rx")
      Signed-off-by: default avatarIlya Maximets <i.maximets@samsung.com>
      Acked-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      455302d1
    • Ilya Maximets's avatar
      xdp: hold device for umem regardless of zero-copy mode · 162c820e
      Ilya Maximets authored
      Device pointer stored in umem regardless of zero-copy mode,
      so we heed to hold the device in all cases.
      
      Fixes: c9b47cc1 ("xsk: fix bug when trying to use both copy and zero-copy on one queue id")
      Signed-off-by: default avatarIlya Maximets <i.maximets@samsung.com>
      Acked-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      162c820e
    • Jiri Benc's avatar
      selftests: bpf: fix inlines in test_lwt_seg6local · 11aca65e
      Jiri Benc authored
      Selftests are reporting this failure in test_lwt_seg6local.sh:
      
      + ip netns exec ns2 ip -6 route add fb00::6 encap bpf in obj test_lwt_seg6local.o sec encap_srh dev veth2
      Error fetching program/map!
      Failed to parse eBPF program: Operation not permitted
      
      The problem is __attribute__((always_inline)) alone is not enough to prevent
      clang from inserting those functions in .text. In that case, .text is not
      marked as relocateable.
      
      See the output of objdump -h test_lwt_seg6local.o:
      
      Idx Name          Size      VMA               LMA               File off  Algn
        0 .text         00003530  0000000000000000  0000000000000000  00000040  2**3
                        CONTENTS, ALLOC, LOAD, READONLY, CODE
      
      This causes the iproute bpf loader to fail in bpf_fetch_prog_sec:
      bpf_has_call_data returns true but bpf_fetch_prog_relo fails as there's no
      relocateable .text section in the file.
      
      To fix this, convert to 'static __always_inline'.
      
      v2: Use 'static __always_inline' instead of 'static inline
          __attribute__((always_inline))'
      
      Fixes: c99a84ea ("selftests/bpf: test for seg6local End.BPF action")
      Signed-off-by: default avatarJiri Benc <jbenc@redhat.com>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      11aca65e
    • Luke Nelson's avatar
      selftests: bpf: add tests for shifts by zero · ac8786c7
      Luke Nelson authored
      There are currently no tests for ALU64 shift operations when the shift
      amount is 0. This adds 6 new tests to make sure they are equivalent
      to a no-op. The x32 JIT had such bugs that could have been caught by
      these tests.
      
      Cc: Xi Wang <xi.wang@gmail.com>
      Signed-off-by: default avatarLuke Nelson <luke.r.nels@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      ac8786c7
    • Luke Nelson's avatar
      bpf, x32: Fix bug with ALU64 {LSH, RSH, ARSH} BPF_K shift by 0 · 6fa632e7
      Luke Nelson authored
      The current x32 BPF JIT does not correctly compile shift operations when
      the immediate shift amount is 0. The expected behavior is for this to
      be a no-op.
      
      The following program demonstrates the bug. The expexceted result is 1,
      but the current JITed code returns 2.
      
        r0 = 1
        r1 = 1
        r1 <<= 0
        if r1 == 1 goto end
        r0 = 2
      end:
        exit
      
      This patch simplifies the code and fixes the bug.
      
      Fixes: 03f5781b ("bpf, x86_32: add eBPF JIT compiler for ia32")
      Co-developed-by: default avatarXi Wang <xi.wang@gmail.com>
      Signed-off-by: default avatarXi Wang <xi.wang@gmail.com>
      Signed-off-by: default avatarLuke Nelson <luke.r.nels@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      6fa632e7
    • Luke Nelson's avatar
      bpf, x32: Fix bug with ALU64 {LSH, RSH, ARSH} BPF_X shift by 0 · 68a8357e
      Luke Nelson authored
      The current x32 BPF JIT for shift operations is not correct when the
      shift amount in a register is 0. The expected behavior is a no-op, whereas
      the current implementation changes bits in the destination register.
      
      The following example demonstrates the bug. The expected result of this
      program is 1, but the current JITed code returns 2.
      
        r0 = 1
        r1 = 1
        r2 = 0
        r1 <<= r2
        if r1 == 1 goto end
        r0 = 2
      end:
        exit
      
      The bug is caused by an incorrect assumption by the JIT that a shift by
      32 clear the register. On x32 however, shifts use the lower 5 bits of
      the source, making a shift by 32 equivalent to a shift by 0.
      
      This patch fixes the bug using double-precision shifts, which also
      simplifies the code.
      
      Fixes: 03f5781b ("bpf, x86_32: add eBPF JIT compiler for ia32")
      Co-developed-by: default avatarXi Wang <xi.wang@gmail.com>
      Signed-off-by: default avatarXi Wang <xi.wang@gmail.com>
      Signed-off-by: default avatarLuke Nelson <luke.r.nels@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      68a8357e
    • Nicolas Dichtel's avatar
      xfrm interface: fix memory leak on creation · 56c5ee1a
      Nicolas Dichtel authored
      The following commands produce a backtrace and return an error but the xfrm
      interface is created (in the wrong netns):
      $ ip netns add foo
      $ ip netns add bar
      $ ip -n foo netns set bar 0
      $ ip -n foo link add xfrmi0 link-netnsid 0 type xfrm dev lo if_id 23
      RTNETLINK answers: Invalid argument
      $ ip -n bar link ls xfrmi0
      2: xfrmi0@lo: <NOARP,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
          link/none 00:00:00:00:00:00 brd 00:00:00:00:00:00
      
      Here is the backtrace:
      [   79.879174] WARNING: CPU: 0 PID: 1178 at net/core/dev.c:8172 rollback_registered_many+0x86/0x3c1
      [   79.880260] Modules linked in: xfrm_interface nfsv3 nfs_acl auth_rpcgss nfsv4 nfs lockd grace sunrpc fscache button parport_pc parport serio_raw evdev pcspkr loop ext4 crc16 mbcache jbd2 crc32c_generic ide_cd_mod ide_gd_mod cdrom ata_$
      eneric ata_piix libata scsi_mod 8139too piix psmouse i2c_piix4 ide_core 8139cp mii i2c_core floppy
      [   79.883698] CPU: 0 PID: 1178 Comm: ip Not tainted 5.2.0-rc6+ #106
      [   79.884462] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      [   79.885447] RIP: 0010:rollback_registered_many+0x86/0x3c1
      [   79.886120] Code: 01 e8 d7 7d c6 ff 0f 0b 48 8b 45 00 4c 8b 20 48 8d 58 90 49 83 ec 70 48 8d 7b 70 48 39 ef 74 44 8a 83 d0 04 00 00 84 c0 75 1f <0f> 0b e8 61 cd ff ff 48 b8 00 01 00 00 00 00 ad de 48 89 43 70 66
      [   79.888667] RSP: 0018:ffffc900015ab740 EFLAGS: 00010246
      [   79.889339] RAX: ffff8882353e5700 RBX: ffff8882353e56a0 RCX: ffff8882353e5710
      [   79.890174] RDX: ffffc900015ab7e0 RSI: ffffc900015ab7e0 RDI: ffff8882353e5710
      [   79.891029] RBP: ffffc900015ab7e0 R08: ffffc900015ab7e0 R09: ffffc900015ab7e0
      [   79.891866] R10: ffffc900015ab7a0 R11: ffffffff82233fec R12: ffffc900015ab770
      [   79.892728] R13: ffffffff81eb7ec0 R14: ffff88822ed6cf00 R15: 00000000ffffffea
      [   79.893557] FS:  00007ff350f31740(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000
      [   79.894581] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   79.895317] CR2: 00000000006c8580 CR3: 000000022c272000 CR4: 00000000000006f0
      [   79.896137] Call Trace:
      [   79.896464]  unregister_netdevice_many+0x12/0x6c
      [   79.896998]  __rtnl_newlink+0x6e2/0x73b
      [   79.897446]  ? __kmalloc_node_track_caller+0x15e/0x185
      [   79.898039]  ? pskb_expand_head+0x5f/0x1fe
      [   79.898556]  ? stack_access_ok+0xd/0x2c
      [   79.899009]  ? deref_stack_reg+0x12/0x20
      [   79.899462]  ? stack_access_ok+0xd/0x2c
      [   79.899927]  ? stack_access_ok+0xd/0x2c
      [   79.900404]  ? __module_text_address+0x9/0x4f
      [   79.900910]  ? is_bpf_text_address+0x5/0xc
      [   79.901390]  ? kernel_text_address+0x67/0x7b
      [   79.901884]  ? __kernel_text_address+0x1a/0x25
      [   79.902397]  ? unwind_get_return_address+0x12/0x23
      [   79.903122]  ? __cmpxchg_double_slab.isra.37+0x46/0x77
      [   79.903772]  rtnl_newlink+0x43/0x56
      [   79.904217]  rtnetlink_rcv_msg+0x200/0x24c
      
      In fact, each time a xfrm interface was created, a netdev was allocated
      by __rtnl_newlink()/rtnl_create_link() and then another one by
      xfrmi_newlink()/xfrmi_create(). Only the second one was registered, it's
      why the previous commands produce a backtrace: dev_change_net_namespace()
      was called on a netdev with reg_state set to NETREG_UNINITIALIZED (the
      first one).
      
      CC: Lorenzo Colitti <lorenzo@google.com>
      CC: Benedict Wong <benedictwong@google.com>
      CC: Steffen Klassert <steffen.klassert@secunet.com>
      CC: Shannon Nelson <shannon.nelson@oracle.com>
      CC: Antony Antony <antony@phenome.org>
      CC: Eyal Birger <eyal.birger@gmail.com>
      Fixes: f203b76d ("xfrm: Add virtual xfrm interfaces")
      Reported-by: default avatarJulien Floret <julien.floret@6wind.com>
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      56c5ee1a
    • Florian Westphal's avatar
      xfrm: policy: fix bydst hlist corruption on hash rebuild · fd709721
      Florian Westphal authored
      syzbot reported following spat:
      
      BUG: KASAN: use-after-free in __write_once_size include/linux/compiler.h:221
      BUG: KASAN: use-after-free in hlist_del_rcu include/linux/rculist.h:455
      BUG: KASAN: use-after-free in xfrm_hash_rebuild+0xa0d/0x1000 net/xfrm/xfrm_policy.c:1318
      Write of size 8 at addr ffff888095e79c00 by task kworker/1:3/8066
      Workqueue: events xfrm_hash_rebuild
      Call Trace:
       __write_once_size include/linux/compiler.h:221 [inline]
       hlist_del_rcu include/linux/rculist.h:455 [inline]
       xfrm_hash_rebuild+0xa0d/0x1000 net/xfrm/xfrm_policy.c:1318
       process_one_work+0x814/0x1130 kernel/workqueue.c:2269
      Allocated by task 8064:
       __kmalloc+0x23c/0x310 mm/slab.c:3669
       kzalloc include/linux/slab.h:742 [inline]
       xfrm_hash_alloc+0x38/0xe0 net/xfrm/xfrm_hash.c:21
       xfrm_policy_init net/xfrm/xfrm_policy.c:4036 [inline]
       xfrm_net_init+0x269/0xd60 net/xfrm/xfrm_policy.c:4120
       ops_init+0x336/0x420 net/core/net_namespace.c:130
       setup_net+0x212/0x690 net/core/net_namespace.c:316
      
      The faulting address is the address of the old chain head,
      free'd by xfrm_hash_resize().
      
      In xfrm_hash_rehash(), chain heads get re-initialized without
      any hlist_del_rcu:
      
       for (i = hmask; i >= 0; i--)
          INIT_HLIST_HEAD(odst + i);
      
      Then, hlist_del_rcu() gets called on the about to-be-reinserted policy
      when iterating the per-net list of policies.
      
      hlist_del_rcu() will then make chain->first be nonzero again:
      
      static inline void __hlist_del(struct hlist_node *n)
      {
         struct hlist_node *next = n->next;   // address of next element in list
         struct hlist_node **pprev = n->pprev;// location of previous elem, this
                                              // can point at chain->first
              WRITE_ONCE(*pprev, next);       // chain->first points to next elem
              if (next)
                      next->pprev = pprev;
      
      Then, when we walk chainlist to find insertion point, we may find a
      non-empty list even though we're supposedly reinserting the first
      policy to an empty chain.
      
      To fix this first unlink all exact and inexact policies instead of
      zeroing the list heads.
      
      Add the commands equivalent to the syzbot reproducer to xfrm_policy.sh,
      without fix KASAN catches the corruption as it happens, SLUB poisoning
      detects it a bit later.
      
      Reported-by: syzbot+0165480d4ef07360eeda@syzkaller.appspotmail.com
      Fixes: 1548bc4e ("xfrm: policy: delete inexact policies from inexact list on hash rebuild")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      fd709721
    • Po-Hsu Lin's avatar
      selftests/net: skip psock_tpacket test if KALLSYMS was not enabled · ff95bf28
      Po-Hsu Lin authored
      The psock_tpacket test will need to access /proc/kallsyms, this would
      require the kernel config CONFIG_KALLSYMS to be enabled first.
      
      Apart from adding CONFIG_KALLSYMS to the net/config file here, check the
      file existence to determine if we can run this test will be helpful to
      avoid a false-positive test result when testing it directly with the
      following commad against a kernel that have CONFIG_KALLSYMS disabled:
          make -C tools/testing/selftests TARGETS=net run_tests
      Signed-off-by: default avatarPo-Hsu Lin <po-hsu.lin@canonical.com>
      Acked-by: default avatarShuah Khan <skhan@linuxfoundation.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ff95bf28
  5. 02 Jul, 2019 11 commits
    • David Howells's avatar
      rxrpc: Fix oops in tracepoint · 99f0eae6
      David Howells authored
      If the rxrpc_eproto tracepoint is enabled, an oops will be cause by the
      trace line that rxrpc_extract_header() tries to emit when a protocol error
      occurs (typically because the packet is short) because the call argument is
      NULL.
      
      Fix this by using ?: to assume 0 as the debug_id if call is NULL.
      
      This can then be induced by:
      
      	echo -e '\0\0\0\0\0\0\0\0' | ncat -4u --send-only <addr> 20001
      
      where addr has the following program running on it:
      
      	#include <stdio.h>
      	#include <stdlib.h>
      	#include <string.h>
      	#include <unistd.h>
      	#include <sys/socket.h>
      	#include <arpa/inet.h>
      	#include <linux/rxrpc.h>
      	int main(void)
      	{
      		struct sockaddr_rxrpc srx;
      		int fd;
      		memset(&srx, 0, sizeof(srx));
      		srx.srx_family			= AF_RXRPC;
      		srx.srx_service			= 0;
      		srx.transport_type		= AF_INET;
      		srx.transport_len		= sizeof(srx.transport.sin);
      		srx.transport.sin.sin_family	= AF_INET;
      		srx.transport.sin.sin_port	= htons(0x4e21);
      		fd = socket(AF_RXRPC, SOCK_DGRAM, AF_INET6);
      		bind(fd, (struct sockaddr *)&srx, sizeof(srx));
      		sleep(20);
      		return 0;
      	}
      
      It results in the following oops.
      
      	BUG: kernel NULL pointer dereference, address: 0000000000000340
      	#PF: supervisor read access in kernel mode
      	#PF: error_code(0x0000) - not-present page
      	...
      	RIP: 0010:trace_event_raw_event_rxrpc_rx_eproto+0x47/0xac
      	...
      	Call Trace:
      	 <IRQ>
      	 rxrpc_extract_header+0x86/0x171
      	 ? rcu_read_lock_sched_held+0x5d/0x63
      	 ? rxrpc_new_skb+0xd4/0x109
      	 rxrpc_input_packet+0xef/0x14fc
      	 ? rxrpc_input_data+0x986/0x986
      	 udp_queue_rcv_one_skb+0xbf/0x3d0
      	 udp_unicast_rcv_skb.isra.8+0x64/0x71
      	 ip_protocol_deliver_rcu+0xe4/0x1b4
      	 ip_local_deliver+0xf0/0x154
      	 __netif_receive_skb_one_core+0x50/0x6c
      	 netif_receive_skb_internal+0x26b/0x2e9
      	 napi_gro_receive+0xf8/0x1da
      	 rtl8169_poll+0x303/0x4c4
      	 net_rx_action+0x10e/0x333
      	 __do_softirq+0x1a5/0x38f
      	 irq_exit+0x54/0xc4
      	 do_IRQ+0xda/0xf8
      	 common_interrupt+0xf/0xf
      	 </IRQ>
      	 ...
      	 ? cpuidle_enter_state+0x23c/0x34d
      	 cpuidle_enter+0x2a/0x36
      	 do_idle+0x163/0x1ea
      	 cpu_startup_entry+0x1d/0x1f
      	 start_secondary+0x157/0x172
      	 secondary_startup_64+0xa4/0xb0
      
      Fixes: a25e21f0 ("rxrpc, afs: Use debug_ids rather than pointers in traces")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Reviewed-by: default avatarMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      99f0eae6
    • Phong Tran's avatar
      net: usb: asix: init MAC address buffers · 78226f6e
      Phong Tran authored
      This is for fixing bug KMSAN: uninit-value in ax88772_bind
      
      Tested by
      https://groups.google.com/d/msg/syzkaller-bugs/aFQurGotng4/eB_HlNhhCwAJ
      
      Reported-by: syzbot+8a3fc6674bbc3978ed4e@syzkaller.appspotmail.com
      
      syzbot found the following crash on:
      
      HEAD commit:    f75e4cfe kmsan: use kmsan_handle_urb() in urb.c
      git tree:       kmsan
      console output: https://syzkaller.appspot.com/x/log.txt?x=136d720ea00000
      kernel config:
      https://syzkaller.appspot.com/x/.config?x=602468164ccdc30a
      dashboard link:
      https://syzkaller.appspot.com/bug?extid=8a3fc6674bbc3978ed4e
      compiler:       clang version 9.0.0 (/home/glider/llvm/clang
      06d00afa61eef8f7f501ebdb4e8612ea43ec2d78)
      syz repro:
      https://syzkaller.appspot.com/x/repro.syz?x=12788316a00000
      C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=120359aaa00000
      
      ==================================================================
      BUG: KMSAN: uninit-value in is_valid_ether_addr
      include/linux/etherdevice.h:200 [inline]
      BUG: KMSAN: uninit-value in asix_set_netdev_dev_addr
      drivers/net/usb/asix_devices.c:73 [inline]
      BUG: KMSAN: uninit-value in ax88772_bind+0x93d/0x11e0
      drivers/net/usb/asix_devices.c:724
      CPU: 0 PID: 3348 Comm: kworker/0:2 Not tainted 5.1.0+ #1
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Workqueue: usb_hub_wq hub_event
      Call Trace:
        __dump_stack lib/dump_stack.c:77 [inline]
        dump_stack+0x191/0x1f0 lib/dump_stack.c:113
        kmsan_report+0x130/0x2a0 mm/kmsan/kmsan.c:622
        __msan_warning+0x75/0xe0 mm/kmsan/kmsan_instr.c:310
        is_valid_ether_addr include/linux/etherdevice.h:200 [inline]
        asix_set_netdev_dev_addr drivers/net/usb/asix_devices.c:73 [inline]
        ax88772_bind+0x93d/0x11e0 drivers/net/usb/asix_devices.c:724
        usbnet_probe+0x10f5/0x3940 drivers/net/usb/usbnet.c:1728
        usb_probe_interface+0xd66/0x1320 drivers/usb/core/driver.c:361
        really_probe+0xdae/0x1d80 drivers/base/dd.c:513
        driver_probe_device+0x1b3/0x4f0 drivers/base/dd.c:671
        __device_attach_driver+0x5b8/0x790 drivers/base/dd.c:778
        bus_for_each_drv+0x28e/0x3b0 drivers/base/bus.c:454
        __device_attach+0x454/0x730 drivers/base/dd.c:844
        device_initial_probe+0x4a/0x60 drivers/base/dd.c:891
        bus_probe_device+0x137/0x390 drivers/base/bus.c:514
        device_add+0x288d/0x30e0 drivers/base/core.c:2106
        usb_set_configuration+0x30dc/0x3750 drivers/usb/core/message.c:2027
        generic_probe+0xe7/0x280 drivers/usb/core/generic.c:210
        usb_probe_device+0x14c/0x200 drivers/usb/core/driver.c:266
        really_probe+0xdae/0x1d80 drivers/base/dd.c:513
        driver_probe_device+0x1b3/0x4f0 drivers/base/dd.c:671
        __device_attach_driver+0x5b8/0x790 drivers/base/dd.c:778
        bus_for_each_drv+0x28e/0x3b0 drivers/base/bus.c:454
        __device_attach+0x454/0x730 drivers/base/dd.c:844
        device_initial_probe+0x4a/0x60 drivers/base/dd.c:891
        bus_probe_device+0x137/0x390 drivers/base/bus.c:514
        device_add+0x288d/0x30e0 drivers/base/core.c:2106
        usb_new_device+0x23e5/0x2ff0 drivers/usb/core/hub.c:2534
        hub_port_connect drivers/usb/core/hub.c:5089 [inline]
        hub_port_connect_change drivers/usb/core/hub.c:5204 [inline]
        port_event drivers/usb/core/hub.c:5350 [inline]
        hub_event+0x48d1/0x7290 drivers/usb/core/hub.c:5432
        process_one_work+0x1572/0x1f00 kernel/workqueue.c:2269
        process_scheduled_works kernel/workqueue.c:2331 [inline]
        worker_thread+0x189c/0x2460 kernel/workqueue.c:2417
        kthread+0x4b5/0x4f0 kernel/kthread.c:254
        ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:355
      Signed-off-by: default avatarPhong Tran <tranmanphong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      78226f6e
    • David S. Miller's avatar
      Merge branch 'macsec-fix-some-bugs-in-the-receive-path' · bc389fd1
      David S. Miller authored
      Andreas Steinmetz says:
      
      ====================
      macsec: fix some bugs in the receive path
      
      This series fixes some bugs in the receive path of macsec. The first
      is a use after free when processing macsec frames with a SecTAG that
      has the TCI E bit set but the C bit clear. In the 2nd bug, the driver
      leaves an invalid checksumming state after decrypting the packet.
      
      This is a combined effort of Sabrina Dubroca <sd@queasysnail.net> and me.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bc389fd1
    • Andreas Steinmetz's avatar
      macsec: fix checksumming after decryption · 7d8b16b9
      Andreas Steinmetz authored
      Fix checksumming after decryption.
      Signed-off-by: default avatarAndreas Steinmetz <ast@domdv.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7d8b16b9
    • Andreas Steinmetz's avatar
      macsec: fix use-after-free of skb during RX · 095c02da
      Andreas Steinmetz authored
      Fix use-after-free of skb when rx_handler returns RX_HANDLER_PASS.
      Signed-off-by: default avatarAndreas Steinmetz <ast@domdv.de>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      095c02da
    • David Howells's avatar
      rxrpc: Fix send on a connected, but unbound socket · e835ada0
      David Howells authored
      If sendmsg() or sendmmsg() is called on a connected socket that hasn't had
      bind() called on it, then an oops will occur when the kernel tries to
      connect the call because no local endpoint has been allocated.
      
      Fix this by implicitly binding the socket if it is in the
      RXRPC_CLIENT_UNBOUND state, just like it does for the RXRPC_UNBOUND state.
      
      Further, the state should be transitioned to RXRPC_CLIENT_BOUND after this
      to prevent further attempts to bind it.
      
      This can be tested with:
      
      	#include <stdio.h>
      	#include <stdlib.h>
      	#include <string.h>
      	#include <sys/socket.h>
      	#include <arpa/inet.h>
      	#include <linux/rxrpc.h>
      	static const unsigned char inet6_addr[16] = {
      		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, 0xac, 0x14, 0x14, 0xaa
      	};
      	int main(void)
      	{
      		struct sockaddr_rxrpc srx;
      		struct cmsghdr *cm;
      		struct msghdr msg;
      		unsigned char control[16];
      		int fd;
      		memset(&srx, 0, sizeof(srx));
      		srx.srx_family = 0x21;
      		srx.srx_service = 0;
      		srx.transport_type = AF_INET;
      		srx.transport_len = 0x1c;
      		srx.transport.sin6.sin6_family = AF_INET6;
      		srx.transport.sin6.sin6_port = htons(0x4e22);
      		srx.transport.sin6.sin6_flowinfo = htons(0x4e22);
      		srx.transport.sin6.sin6_scope_id = htons(0xaa3b);
      		memcpy(&srx.transport.sin6.sin6_addr, inet6_addr, 16);
      		cm = (struct cmsghdr *)control;
      		cm->cmsg_len	= CMSG_LEN(sizeof(unsigned long));
      		cm->cmsg_level	= SOL_RXRPC;
      		cm->cmsg_type	= RXRPC_USER_CALL_ID;
      		*(unsigned long *)CMSG_DATA(cm) = 0;
      		msg.msg_name = NULL;
      		msg.msg_namelen = 0;
      		msg.msg_iov = NULL;
      		msg.msg_iovlen = 0;
      		msg.msg_control = control;
      		msg.msg_controllen = cm->cmsg_len;
      		msg.msg_flags = 0;
      		fd = socket(AF_RXRPC, SOCK_DGRAM, AF_INET);
      		connect(fd, (struct sockaddr *)&srx, sizeof(srx));
      		sendmsg(fd, &msg, 0);
      		return 0;
      	}
      
      Leading to the following oops:
      
      	BUG: kernel NULL pointer dereference, address: 0000000000000018
      	#PF: supervisor read access in kernel mode
      	#PF: error_code(0x0000) - not-present page
      	...
      	RIP: 0010:rxrpc_connect_call+0x42/0xa01
      	...
      	Call Trace:
      	 ? mark_held_locks+0x47/0x59
      	 ? __local_bh_enable_ip+0xb6/0xba
      	 rxrpc_new_client_call+0x3b1/0x762
      	 ? rxrpc_do_sendmsg+0x3c0/0x92e
      	 rxrpc_do_sendmsg+0x3c0/0x92e
      	 rxrpc_sendmsg+0x16b/0x1b5
      	 sock_sendmsg+0x2d/0x39
      	 ___sys_sendmsg+0x1a4/0x22a
      	 ? release_sock+0x19/0x9e
      	 ? reacquire_held_locks+0x136/0x160
      	 ? release_sock+0x19/0x9e
      	 ? find_held_lock+0x2b/0x6e
      	 ? __lock_acquire+0x268/0xf73
      	 ? rxrpc_connect+0xdd/0xe4
      	 ? __local_bh_enable_ip+0xb6/0xba
      	 __sys_sendmsg+0x5e/0x94
      	 do_syscall_64+0x7d/0x1bf
      	 entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 2341e077 ("rxrpc: Simplify connect() implementation and simplify sendmsg() op")
      Reported-by: syzbot+7966f2a0b2c7da8939b4@syzkaller.appspotmail.com
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Reviewed-by: default avatarMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e835ada0
    • David S. Miller's avatar
      Merge branch 'bridge-stale-ptrs' · f2f17175
      David S. Miller authored
      Nikolay Aleksandrov says:
      
      ====================
      net: bridge: fix possible stale skb pointers
      
      In the bridge driver we have a couple of places which call pskb_may_pull
      but we've cached skb pointers before that and use them after which can
      lead to out-of-bounds/stale pointer use. I've had these in my "to fix"
      list for some time and now we got a report (patch 01) so here they are.
      Patches 02-04 are fixes based on code inspection. Also patch 01 was
      tested by Martin Weinelt, Martin if you don't mind please add your
      tested-by tag to it by replying with Tested-by: name <email>.
      I've also briefly tested the set by trying to exercise those code paths.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f2f17175
    • Nikolay Aleksandrov's avatar
      net: bridge: stp: don't cache eth dest pointer before skb pull · 2446a68a
      Nikolay Aleksandrov authored
      Don't cache eth dest pointer before calling pskb_may_pull.
      
      Fixes: cf0f02d0 ("[BRIDGE]: use llc for receiving STP packets")
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2446a68a
    • Nikolay Aleksandrov's avatar
      net: bridge: don't cache ether dest pointer on input · 3d26eb8a
      Nikolay Aleksandrov authored
      We would cache ether dst pointer on input in br_handle_frame_finish but
      after the neigh suppress code that could lead to a stale pointer since
      both ipv4 and ipv6 suppress code do pskb_may_pull. This means we have to
      always reload it after the suppress code so there's no point in having
      it cached just retrieve it directly.
      
      Fixes: 057658cb ("bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports")
      Fixes: ed842fae ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports")
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3d26eb8a
    • Nikolay Aleksandrov's avatar
      net: bridge: mcast: fix stale ipv6 hdr pointer when handling v6 query · 3b26a5d0
      Nikolay Aleksandrov authored
      We get a pointer to the ipv6 hdr in br_ip6_multicast_query but we may
      call pskb_may_pull afterwards and end up using a stale pointer.
      So use the header directly, it's just 1 place where it's needed.
      
      Fixes: 08b202b6 ("bridge br_multicast: IPv6 MLD support.")
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Tested-by: default avatarMartin Weinelt <martin@linuxlounge.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3b26a5d0
    • Nikolay Aleksandrov's avatar
      net: bridge: mcast: fix stale nsrcs pointer in igmp3/mld2 report handling · e57f6185
      Nikolay Aleksandrov authored
      We take a pointer to grec prior to calling pskb_may_pull and use it
      afterwards to get nsrcs so record nsrcs before the pull when handling
      igmp3 and we get a pointer to nsrcs and call pskb_may_pull when handling
      mld2 which again could lead to reading 2 bytes out-of-bounds.
      
       ==================================================================
       BUG: KASAN: use-after-free in br_multicast_rcv+0x480c/0x4ad0 [bridge]
       Read of size 2 at addr ffff8880421302b4 by task ksoftirqd/1/16
      
       CPU: 1 PID: 16 Comm: ksoftirqd/1 Tainted: G           OE     5.2.0-rc6+ #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
       Call Trace:
        dump_stack+0x71/0xab
        print_address_description+0x6a/0x280
        ? br_multicast_rcv+0x480c/0x4ad0 [bridge]
        __kasan_report+0x152/0x1aa
        ? br_multicast_rcv+0x480c/0x4ad0 [bridge]
        ? br_multicast_rcv+0x480c/0x4ad0 [bridge]
        kasan_report+0xe/0x20
        br_multicast_rcv+0x480c/0x4ad0 [bridge]
        ? br_multicast_disable_port+0x150/0x150 [bridge]
        ? ktime_get_with_offset+0xb4/0x150
        ? __kasan_kmalloc.constprop.6+0xa6/0xf0
        ? __netif_receive_skb+0x1b0/0x1b0
        ? br_fdb_update+0x10e/0x6e0 [bridge]
        ? br_handle_frame_finish+0x3c6/0x11d0 [bridge]
        br_handle_frame_finish+0x3c6/0x11d0 [bridge]
        ? br_pass_frame_up+0x3a0/0x3a0 [bridge]
        ? virtnet_probe+0x1c80/0x1c80 [virtio_net]
        br_handle_frame+0x731/0xd90 [bridge]
        ? select_idle_sibling+0x25/0x7d0
        ? br_handle_frame_finish+0x11d0/0x11d0 [bridge]
        __netif_receive_skb_core+0xced/0x2d70
        ? virtqueue_get_buf_ctx+0x230/0x1130 [virtio_ring]
        ? do_xdp_generic+0x20/0x20
        ? virtqueue_napi_complete+0x39/0x70 [virtio_net]
        ? virtnet_poll+0x94d/0xc78 [virtio_net]
        ? receive_buf+0x5120/0x5120 [virtio_net]
        ? __netif_receive_skb_one_core+0x97/0x1d0
        __netif_receive_skb_one_core+0x97/0x1d0
        ? __netif_receive_skb_core+0x2d70/0x2d70
        ? _raw_write_trylock+0x100/0x100
        ? __queue_work+0x41e/0xbe0
        process_backlog+0x19c/0x650
        ? _raw_read_lock_irq+0x40/0x40
        net_rx_action+0x71e/0xbc0
        ? __switch_to_asm+0x40/0x70
        ? napi_complete_done+0x360/0x360
        ? __switch_to_asm+0x34/0x70
        ? __switch_to_asm+0x40/0x70
        ? __schedule+0x85e/0x14d0
        __do_softirq+0x1db/0x5f9
        ? takeover_tasklets+0x5f0/0x5f0
        run_ksoftirqd+0x26/0x40
        smpboot_thread_fn+0x443/0x680
        ? sort_range+0x20/0x20
        ? schedule+0x94/0x210
        ? __kthread_parkme+0x78/0xf0
        ? sort_range+0x20/0x20
        kthread+0x2ae/0x3a0
        ? kthread_create_worker_on_cpu+0xc0/0xc0
        ret_from_fork+0x35/0x40
      
       The buggy address belongs to the page:
       page:ffffea0001084c00 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0
       flags: 0xffffc000000000()
       raw: 00ffffc000000000 ffffea0000cfca08 ffffea0001098608 0000000000000000
       raw: 0000000000000000 0000000000000003 00000000ffffff7f 0000000000000000
       page dumped because: kasan: bad access detected
      
       Memory state around the buggy address:
       ffff888042130180: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff888042130200: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       > ffff888042130280: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                           ^
       ffff888042130300: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff888042130380: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ==================================================================
       Disabling lock debugging due to kernel taint
      
      Fixes: bc8c20ac ("bridge: multicast: treat igmpv3 report with INCLUDE and no sources as a leave")
      Reported-by: default avatarMartin Weinelt <martin@linuxlounge.net>
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Tested-by: default avatarMartin Weinelt <martin@linuxlounge.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e57f6185