1. 15 Dec, 2016 4 commits
    • Helge Deller's avatar
      parisc: Fix TLB related boot crash on SMP machines · a0bd6aa0
      Helge Deller authored
      commit 24d0492b upstream.
      
      At bootup we run measurements to calculate the best threshold for when we
      should be using full TLB flushes instead of just flushing a specific amount of
      TLB entries.  This performance test is run over the kernel text segment.
      
      But running this TLB performance test on the kernel text segment turned out to
      crash some SMP machines when the kernel text pages were mapped as huge pages.
      
      To avoid those crashes this patch simply skips this test on some SMP machines
      and calculates an optimal threshold based on the maximum number of available
      TLB entries and number of online CPUs.
      
      On a technical side, this seems to happen:
      The TLB measurement code uses flush_tlb_kernel_range() to flush specific TLB
      entries with a page size of 4k (pdtlb 0(sr1,addr)). On UP systems this purge
      instruction seems to work without problems even if the pages were mapped as
      huge pages.  But on SMP systems the TLB purge instruction is broadcasted to
      other CPUs. Those CPUs then crash the machine because the page size is not as
      expected.  C8000 machines with PA8800/PA8900 CPUs were not affected by this
      problem, because the required cache coherency prohibits to use huge pages at
      all.  Sadly I didn't found any documentation about this behaviour, so this
      finding is purely based on testing with phyiscal SMP machines (A500-44 and
      J5000, both were 2-way boxes).
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a0bd6aa0
    • John David Anglin's avatar
      parisc: Remove unnecessary TLB purges from flush_dcache_page_asm and flush_icache_page_asm · 605f315c
      John David Anglin authored
      commit febe4296 upstream.
      
      We have four routines in pacache.S that use temporary alias pages:
      copy_user_page_asm(), clear_user_page_asm(), flush_dcache_page_asm() and
      flush_icache_page_asm().  copy_user_page_asm() and clear_user_page_asm()
      don't purge the TLB entry used for the operation.
      flush_dcache_page_asm() and flush_icache_page_asm do purge the entry.
      
      Presumably, this was thought to optimize TLB use.  However, the
      operation is quite heavy weight on PA 1.X processors as we need to take
      the TLB lock and a TLB broadcast is sent to all processors.
      
      This patch removes the purges from flush_dcache_page_asm() and
      flush_icache_page_asm.
      Signed-off-by: default avatarJohn David Anglin  <dave.anglin@bell.net>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      605f315c
    • John David Anglin's avatar
      parisc: Purge TLB before setting PTE · db959860
      John David Anglin authored
      commit c78e710c upstream.
      
      The attached change interchanges the order of purging the TLB and
      setting the corresponding page table entry.  TLB purges are strongly
      ordered.  It occurred to me one night that setting the PTE first might
      have subtle ordering issues on SMP machines and cause random memory
      corruption.
      
      A TLB lock guards the insertion of user TLB entries.  So after the TLB
      is purged, a new entry can't be inserted until the lock is released.
      This ensures that the new PTE value is used when the lock is released.
      
      Since making this change, no random segmentation faults have been
      observed on the Debian hppa buildd servers.
      Signed-off-by: default avatarJohn David Anglin  <dave.anglin@bell.net>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      db959860
    • Andrew Donnellan's avatar
      powerpc/eeh: Fix deadlock when PE frozen state can't be cleared · 4bcea472
      Andrew Donnellan authored
      commit 409bf7f8 upstream.
      
      In eeh_reset_device(), we take the pci_rescan_remove_lock immediately after
      after we call eeh_reset_pe() to reset the PCI controller. We then call
      eeh_clear_pe_frozen_state(), which can return an error. In this case, we
      bail out of eeh_reset_device() without calling pci_unlock_rescan_remove().
      
      Add a call to pci_unlock_rescan_remove() in the eeh_clear_pe_frozen_state()
      error path so that we don't cause a deadlock later on.
      Reported-by: default avatarPradipta Ghosh <pradghos@in.ibm.com>
      Fixes: 78954700 ("powerpc/eeh: Avoid I/O access during PE reset")
      Signed-off-by: default avatarAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Acked-by: default avatarRussell Currey <ruscur@russell.cc>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4bcea472
  2. 10 Dec, 2016 29 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.4.38 · c95b7f1f
      Greg Kroah-Hartman authored
      c95b7f1f
    • Tobias Brunner's avatar
      esp6: Fix integrity verification when ESN are used · 52783ada
      Tobias Brunner authored
      commit a55e2386 upstream.
      
      When handling inbound packets, the two halves of the sequence number
      stored on the skb are already in network order.
      
      Fixes: 000ae7b2 ("esp6: Switch to new AEAD interface")
      Signed-off-by: default avatarTobias Brunner <tobias@strongswan.org>
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      52783ada
    • Tobias Brunner's avatar
      esp4: Fix integrity verification when ESN are used · 3bf28ce9
      Tobias Brunner authored
      commit 7c7fedd5 upstream.
      
      When handling inbound packets, the two halves of the sequence number
      stored on the skb are already in network order.
      
      Fixes: 7021b2e1 ("esp4: Switch to new AEAD interface")
      Signed-off-by: default avatarTobias Brunner <tobias@strongswan.org>
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3bf28ce9
    • Eli Cooper's avatar
      ipv4: Set skb->protocol properly for local output · 2176ec1c
      Eli Cooper authored
      commit f4180439 upstream.
      
      When xfrm is applied to TSO/GSO packets, it follows this path:
      
          xfrm_output() -> xfrm_output_gso() -> skb_gso_segment()
      
      where skb_gso_segment() relies on skb->protocol to function properly.
      
      This patch sets skb->protocol to ETH_P_IP before dst_output() is called,
      fixing a bug where GSO packets sent through a sit tunnel are dropped
      when xfrm is involved.
      Signed-off-by: default avatarEli Cooper <elicooper@gmx.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2176ec1c
    • Eli Cooper's avatar
      ipv6: Set skb->protocol properly for local output · 25d8b7c1
      Eli Cooper authored
      commit b4e479a9 upstream.
      
      When xfrm is applied to TSO/GSO packets, it follows this path:
      
          xfrm_output() -> xfrm_output_gso() -> skb_gso_segment()
      
      where skb_gso_segment() relies on skb->protocol to function properly.
      
      This patch sets skb->protocol to ETH_P_IPV6 before dst_output() is called,
      fixing a bug where GSO packets sent through an ipip6 tunnel are dropped
      when xfrm is involved.
      Signed-off-by: default avatarEli Cooper <elicooper@gmx.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      25d8b7c1
    • Linus Torvalds's avatar
      Don't feed anything but regular iovec's to blk_rq_map_user_iov · d41fb2fb
      Linus Torvalds authored
      commit a0ac402c upstream.
      
      In theory we could map other things, but there's a reason that function
      is called "user_iov".  Using anything else (like splice can do) just
      confuses it.
      Reported-and-tested-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d41fb2fb
    • Al Viro's avatar
      constify iov_iter_count() and iter_is_iovec() · fd1aa12c
      Al Viro authored
      commit b57332b4 upstream.
      
      [stable note, need this to prevent build warning in commit
      a0ac402c]
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fd1aa12c
    • Thomas Tai's avatar
      sparc64: fix compile warning section mismatch in find_node() · 899b6053
      Thomas Tai authored
      [ Upstream commit 87a349f9 ]
      
      A compile warning is introduced by a commit to fix the find_node().
      This patch fix the compile warning by moving find_node() into __init
      section. Because find_node() is only used by memblock_nid_range() which
      is only used by a __init add_node_ranges(). find_node() and
      memblock_nid_range() should also be inside __init section.
      Signed-off-by: default avatarThomas Tai <thomas.tai@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      899b6053
    • Thomas Tai's avatar
      sparc64: Fix find_node warning if numa node cannot be found · ed7b60db
      Thomas Tai authored
      [ Upstream commit 74a5ed5c ]
      
      When booting up LDOM, find_node() warns that a physical address
      doesn't match a NUMA node.
      
      WARNING: CPU: 0 PID: 0 at arch/sparc/mm/init_64.c:835
      find_node+0xf4/0x120 find_node: A physical address doesn't
      match a NUMA node rule. Some physical memory will be
      owned by node 0.Modules linked in:
      
      CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.0-rc3 #4
      Call Trace:
       [0000000000468ba0] __warn+0xc0/0xe0
       [0000000000468c74] warn_slowpath_fmt+0x34/0x60
       [00000000004592f4] find_node+0xf4/0x120
       [0000000000dd0774] add_node_ranges+0x38/0xe4
       [0000000000dd0b1c] numa_parse_mdesc+0x268/0x2e4
       [0000000000dd0e9c] bootmem_init+0xb8/0x160
       [0000000000dd174c] paging_init+0x808/0x8fc
       [0000000000dcb0d0] setup_arch+0x2c8/0x2f0
       [0000000000dc68a0] start_kernel+0x48/0x424
       [0000000000dcb374] start_early_boot+0x27c/0x28c
       [0000000000a32c08] tlb_fixup_done+0x4c/0x64
       [0000000000027f08] 0x27f08
      
      It is because linux use an internal structure node_masks[] to
      keep the best memory latency node only. However, LDOM mdesc can
      contain single latency-group with multiple memory latency nodes.
      
      If the address doesn't match the best latency node within
      node_masks[], it should check for an alternative via mdesc.
      The warning message should only be printed if the address
      doesn't match any node_masks[] nor within mdesc. To minimize
      the impact of searching mdesc every time, the last matched
      mask and index is stored in a variable.
      Signed-off-by: default avatarThomas Tai <thomas.tai@oracle.com>
      Reviewed-by: default avatarChris Hyser <chris.hyser@oracle.com>
      Reviewed-by: default avatarLiam Merwick <liam.merwick@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed7b60db
    • Andreas Larsson's avatar
    • Kees Cook's avatar
      net: ping: check minimum size on ICMP header length · 06cdad2b
      Kees Cook authored
      [ Upstream commit 0eab121e ]
      
      Prior to commit c0371da6 ("put iov_iter into msghdr") in v3.19, there
      was no check that the iovec contained enough bytes for an ICMP header,
      and the read loop would walk across neighboring stack contents. Since the
      iov_iter conversion, bad arguments are noticed, but the returned error is
      EFAULT. Returning EINVAL is a clearer error and also solves the problem
      prior to v3.19.
      
      This was found using trinity with KASAN on v3.18:
      
      BUG: KASAN: stack-out-of-bounds in memcpy_fromiovec+0x60/0x114 at addr ffffffc071077da0
      Read of size 8 by task trinity-c2/9623
      page:ffffffbe034b9a08 count:0 mapcount:0 mapping:          (null) index:0x0
      flags: 0x0()
      page dumped because: kasan: bad access detected
      CPU: 0 PID: 9623 Comm: trinity-c2 Tainted: G    BU         3.18.0-dirty #15
      Hardware name: Google Tegra210 Smaug Rev 1,3+ (DT)
      Call trace:
      [<ffffffc000209c98>] dump_backtrace+0x0/0x1ac arch/arm64/kernel/traps.c:90
      [<ffffffc000209e54>] show_stack+0x10/0x1c arch/arm64/kernel/traps.c:171
      [<     inline     >] __dump_stack lib/dump_stack.c:15
      [<ffffffc000f18dc4>] dump_stack+0x7c/0xd0 lib/dump_stack.c:50
      [<     inline     >] print_address_description mm/kasan/report.c:147
      [<     inline     >] kasan_report_error mm/kasan/report.c:236
      [<ffffffc000373dcc>] kasan_report+0x380/0x4b8 mm/kasan/report.c:259
      [<     inline     >] check_memory_region mm/kasan/kasan.c:264
      [<ffffffc00037352c>] __asan_load8+0x20/0x70 mm/kasan/kasan.c:507
      [<ffffffc0005b9624>] memcpy_fromiovec+0x5c/0x114 lib/iovec.c:15
      [<     inline     >] memcpy_from_msg include/linux/skbuff.h:2667
      [<ffffffc000ddeba0>] ping_common_sendmsg+0x50/0x108 net/ipv4/ping.c:674
      [<ffffffc000dded30>] ping_v4_sendmsg+0xd8/0x698 net/ipv4/ping.c:714
      [<ffffffc000dc91dc>] inet_sendmsg+0xe0/0x12c net/ipv4/af_inet.c:749
      [<     inline     >] __sock_sendmsg_nosec net/socket.c:624
      [<     inline     >] __sock_sendmsg net/socket.c:632
      [<ffffffc000cab61c>] sock_sendmsg+0x124/0x164 net/socket.c:643
      [<     inline     >] SYSC_sendto net/socket.c:1797
      [<ffffffc000cad270>] SyS_sendto+0x178/0x1d8 net/socket.c:1761
      
      CVE-2016-8399
      Reported-by: default avatarQidan He <i@flanker017.me>
      Fixes: c319b4d7 ("net: ipv4: add IPPROTO_ICMP socket kind")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      06cdad2b
    • Eric Dumazet's avatar
      net: avoid signed overflows for SO_{SND|RCV}BUFFORCE · 77125815
      Eric Dumazet authored
      [ Upstream commit b98b0bc8 ]
      
      CAP_NET_ADMIN users should not be allowed to set negative
      sk_sndbuf or sk_rcvbuf values, as it can lead to various memory
      corruptions, crashes, OOM...
      
      Note that before commit 82981930 ("net: cleanups in
      sock_setsockopt()"), the bug was even more serious, since SO_SNDBUF
      and SO_RCVBUF were vulnerable.
      
      This needs to be backported to all known linux kernels.
      
      Again, many thanks to syzkaller team for discovering this gem.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      77125815
    • Sabrina Dubroca's avatar
      geneve: avoid use-after-free of skb->data · 6e682c52
      Sabrina Dubroca authored
      [ Upstream commit 5b010147 ]
      
      geneve{,6}_build_skb can end up doing a pskb_expand_head(), which
      makes the ip_hdr(skb) reference we stashed earlier stale. Since it's
      only needed as an argument to ip_tunnel_ecn_encap(), move this
      directly in the function call.
      
      Fixes: 08399efc ("geneve: ensure ECN info is handled properly in all tx/rx paths")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6e682c52
    • Chris Brandt's avatar
      sh_eth: remove unchecked interrupts for RZ/A1 · a89e2ff8
      Chris Brandt authored
      [ Upstream commit 33d446db ]
      
      When streaming a lot of data and the RZ/A1 can't keep up, some status bits
      will get set that are not being checked or cleared which cause the
      following messages and the Ethernet driver to stop working. This
      patch fixes that issue.
      
      irq 21: nobody cared (try booting with the "irqpoll" option)
      handlers:
      [<c036b71c>] sh_eth_interrupt
      Disabling IRQ #21
      
      Fixes: db893473 ("sh_eth: Add support for r7s72100")
      Signed-off-by: default avatarChris Brandt <chris.brandt@renesas.com>
      Acked-by: default avatarSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a89e2ff8
    • Florian Fainelli's avatar
      net: bcmgenet: Utilize correct struct device for all DMA operations · c36a2a14
      Florian Fainelli authored
      [ Upstream commit 8c4799ac ]
      
      __bcmgenet_tx_reclaim() and bcmgenet_free_rx_buffers() are not using the
      same struct device during unmap that was used for the map operation,
      which makes DMA-API debugging warn about it. Fix this by always using
      &priv->pdev->dev throughout the driver, using an identical device
      reference for all map/unmap calls.
      
      Fixes: 1c1008c7 ("net: bcmgenet: add main driver file")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c36a2a14
    • Philip Pettersson's avatar
      packet: fix race condition in packet_set_ring · 5a01eaf1
      Philip Pettersson authored
      [ Upstream commit 84ac7260 ]
      
      When packet_set_ring creates a ring buffer it will initialize a
      struct timer_list if the packet version is TPACKET_V3. This value
      can then be raced by a different thread calling setsockopt to
      set the version to TPACKET_V1 before packet_set_ring has finished.
      
      This leads to a use-after-free on a function pointer in the
      struct timer_list when the socket is closed as the previously
      initialized timer will not be deleted.
      
      The bug is fixed by taking lock_sock(sk) in packet_setsockopt when
      changing the packet version while also taking the lock at the start
      of packet_set_ring.
      
      Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.")
      Signed-off-by: default avatarPhilip Pettersson <philip.pettersson@gmail.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5a01eaf1
    • Eric Dumazet's avatar
      net/dccp: fix use-after-free in dccp_invalid_packet · 1a15519f
      Eric Dumazet authored
      [ Upstream commit 648f0c28 ]
      
      pskb_may_pull() can reallocate skb->head, we need to reload dh pointer
      in dccp_invalid_packet() or risk use after free.
      
      Bug found by Andrey Konovalov using syzkaller.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1a15519f
    • Herbert Xu's avatar
      netlink: Do not schedule work from sk_destruct · baaf0c65
      Herbert Xu authored
      [ Upstream commit ed5d7788 ]
      
      It is wrong to schedule a work from sk_destruct using the socket
      as the memory reserve because the socket will be freed immediately
      after the return from sk_destruct.
      
      Instead we should do the deferral prior to sk_free.
      
      This patch does just that.
      
      Fixes: 707693c8 ("netlink: Call cb->done from a worker thread")
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      baaf0c65
    • Herbert Xu's avatar
      netlink: Call cb->done from a worker thread · d1ed9c1d
      Herbert Xu authored
      [ Upstream commit 707693c8 ]
      
      The cb->done interface expects to be called in process context.
      This was broken by the netlink RCU conversion.  This patch fixes
      it by adding a worker struct to make the cb->done call where
      necessary.
      
      Fixes: 21e4902a ("netlink: Lockless lookup with RCU grace...")
      Reported-by: default avatarSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d1ed9c1d
    • Amir Vadai's avatar
      net/sched: pedit: make sure that offset is valid · 6c42bd6a
      Amir Vadai authored
      [ Upstream commit 95c2027b ]
      
      Add a validation function to make sure offset is valid:
      1. Not below skb head (could happen when offset is negative).
      2. Validate both 'offset' and 'at'.
      Signed-off-by: default avatarAmir Vadai <amir@vadai.me>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6c42bd6a
    • Daniel Borkmann's avatar
      net, sched: respect rcu grace period on cls destruction · cfa7c16d
      Daniel Borkmann authored
      [ Upstream commit d9363774 ]
      
      Roi reported a crash in flower where tp->root was NULL in ->classify()
      callbacks. Reason is that in ->destroy() tp->root is set to NULL via
      RCU_INIT_POINTER(). It's problematic for some of the classifiers, because
      this doesn't respect RCU grace period for them, and as a result, still
      outstanding readers from tc_classify() will try to blindly dereference
      a NULL tp->root.
      
      The tp->root object is strictly private to the classifier implementation
      and holds internal data the core such as tc_ctl_tfilter() doesn't know
      about. Within some classifiers, such as cls_bpf, cls_basic, etc, tp->root
      is only checked for NULL in ->get() callback, but nowhere else. This is
      misleading and seemed to be copied from old classifier code that was not
      cleaned up properly. For example, d3fa76ee ("[NET_SCHED]: cls_basic:
      fix NULL pointer dereference") moved tp->root initialization into ->init()
      routine, where before it was part of ->change(), so ->get() had to deal
      with tp->root being NULL back then, so that was indeed a valid case, after
      d3fa76ee, not really anymore. We used to set tp->root to NULL long
      ago in ->destroy(), see 47a1a1d4 ("pkt_sched: remove unnecessary xchg()
      in packet classifiers"); but the NULLifying was reintroduced with the
      RCUification, but it's not correct for every classifier implementation.
      
      In the cases that are fixed here with one exception of cls_cgroup, tp->root
      object is allocated and initialized inside ->init() callback, which is always
      performed at a point in time after we allocate a new tp, which means tp and
      thus tp->root was not globally visible in the tp chain yet (see tc_ctl_tfilter()).
      Also, on destruction tp->root is strictly kfree_rcu()'ed in ->destroy()
      handler, same for the tp which is kfree_rcu()'ed right when we return
      from ->destroy() in tcf_destroy(). This means, the head object's lifetime
      for such classifiers is always tied to the tp lifetime. The RCU callback
      invocation for the two kfree_rcu() could be out of order, but that's fine
      since both are independent.
      
      Dropping the RCU_INIT_POINTER(tp->root, NULL) for these classifiers here
      means that 1) we don't need a useless NULL check in fast-path and, 2) that
      outstanding readers of that tp in tc_classify() can still execute under
      respect with RCU grace period as it is actually expected.
      
      Things that haven't been touched here: cls_fw and cls_route. They each
      handle tp->root being NULL in ->classify() path for historic reasons, so
      their ->destroy() implementation can stay as is. If someone actually
      cares, they could get cleaned up at some point to avoid the test in fast
      path. cls_u32 doesn't set tp->root to NULL. For cls_rsvp, I just added a
      !head should anyone actually be using/testing it, so it at least aligns with
      cls_fw and cls_route. For cls_flower we additionally need to defer rhashtable
      destruction (to a sleepable context) after RCU grace period as concurrent
      readers might still access it. (Note that in this case we need to hold module
      reference to keep work callback address intact, since we only wait on module
      unload for all call_rcu()s to finish.)
      
      This fixes one race to bring RCU grace period guarantees back. Next step
      as worked on by Cong however is to fix 1e052be6 ("net_sched: destroy
      proto tp when all filters are gone") to get the order of unlinking the tp
      in tc_ctl_tfilter() for the RTM_DELTFILTER case right by moving
      RCU_INIT_POINTER() before tcf_destroy() and let the notification for
      removal be done through the prior ->delete() callback. Both are independant
      issues. Once we have that right, we can then clean tp->root up for a number
      of classifiers by not making them RCU pointers, which requires a new callback
      (->uninit) that is triggered from tp's RCU callback, where we just kfree()
      tp->root from there.
      
      Fixes: 1f947bf1 ("net: sched: rcu'ify cls_bpf")
      Fixes: 9888faef ("net: sched: cls_basic use RCU")
      Fixes: 70da9f0b ("net: sched: cls_flow use RCU")
      Fixes: 77b9900e ("tc: introduce Flower classifier")
      Fixes: bf3994d2 ("net/sched: introduce Match-all classifier")
      Fixes: 952313bd ("net: sched: cls_cgroup use RCU")
      Reported-by: default avatarRoi Dayan <roid@mellanox.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Roi Dayan <roid@mellanox.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Acked-by: default avatarJohn Fastabend <john.r.fastabend@intel.com>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cfa7c16d
    • Florian Fainelli's avatar
      net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link change · 94de6f2f
      Florian Fainelli authored
      [ Upstream commit 76da8706 ]
      
      In case the link change and EEE is enabled or disabled, always try to
      re-negotiate this with the link partner.
      
      Fixes: 450b05c1 ("net: dsa: bcm_sf2: add support for controlling EEE")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      94de6f2f
    • Guillaume Nault's avatar
      l2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind() · 56366fa0
      Guillaume Nault authored
      [ Upstream commit 32c23116 ]
      
      Lock socket before checking the SOCK_ZAPPED flag in l2tp_ip6_bind().
      Without lock, a concurrent call could modify the socket flags between
      the sock_flag(sk, SOCK_ZAPPED) test and the lock_sock() call. This way,
      a socket could be inserted twice in l2tp_ip6_bind_table. Releasing it
      would then leave a stale pointer there, generating use-after-free
      errors when walking through the list or modifying adjacent entries.
      
      BUG: KASAN: use-after-free in l2tp_ip6_close+0x22e/0x290 at addr ffff8800081b0ed8
      Write of size 8 by task syz-executor/10987
      CPU: 0 PID: 10987 Comm: syz-executor Not tainted 4.8.0+ #39
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
       ffff880031d97838 ffffffff829f835b ffff88001b5a1640 ffff8800081b0ec0
       ffff8800081b15a0 ffff8800081b6d20 ffff880031d97860 ffffffff8174d3cc
       ffff880031d978f0 ffff8800081b0e80 ffff88001b5a1640 ffff880031d978e0
      Call Trace:
       [<ffffffff829f835b>] dump_stack+0xb3/0x118 lib/dump_stack.c:15
       [<ffffffff8174d3cc>] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156
       [<     inline     >] print_address_description mm/kasan/report.c:194
       [<ffffffff8174d666>] kasan_report_error+0x1f6/0x4d0 mm/kasan/report.c:283
       [<     inline     >] kasan_report mm/kasan/report.c:303
       [<ffffffff8174db7e>] __asan_report_store8_noabort+0x3e/0x40 mm/kasan/report.c:329
       [<     inline     >] __write_once_size ./include/linux/compiler.h:249
       [<     inline     >] __hlist_del ./include/linux/list.h:622
       [<     inline     >] hlist_del_init ./include/linux/list.h:637
       [<ffffffff8579047e>] l2tp_ip6_close+0x22e/0x290 net/l2tp/l2tp_ip6.c:239
       [<ffffffff850b2dfd>] inet_release+0xed/0x1c0 net/ipv4/af_inet.c:415
       [<ffffffff851dc5a0>] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
       [<ffffffff84c4581d>] sock_release+0x8d/0x1d0 net/socket.c:570
       [<ffffffff84c45976>] sock_close+0x16/0x20 net/socket.c:1017
       [<ffffffff817a108c>] __fput+0x28c/0x780 fs/file_table.c:208
       [<ffffffff817a1605>] ____fput+0x15/0x20 fs/file_table.c:244
       [<ffffffff813774f9>] task_work_run+0xf9/0x170
       [<ffffffff81324aae>] do_exit+0x85e/0x2a00
       [<ffffffff81326dc8>] do_group_exit+0x108/0x330
       [<ffffffff81348cf7>] get_signal+0x617/0x17a0 kernel/signal.c:2307
       [<ffffffff811b49af>] do_signal+0x7f/0x18f0
       [<ffffffff810039bf>] exit_to_usermode_loop+0xbf/0x150 arch/x86/entry/common.c:156
       [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:190
       [<ffffffff81006060>] syscall_return_slowpath+0x1a0/0x1e0 arch/x86/entry/common.c:259
       [<ffffffff85e4d726>] entry_SYSCALL_64_fastpath+0xc4/0xc6
      Object at ffff8800081b0ec0, in cache L2TP/IPv6 size: 1448
      Allocated:
      PID = 10987
       [ 1116.897025] [<ffffffff811ddcb6>] save_stack_trace+0x16/0x20
       [ 1116.897025] [<ffffffff8174c736>] save_stack+0x46/0xd0
       [ 1116.897025] [<ffffffff8174c9ad>] kasan_kmalloc+0xad/0xe0
       [ 1116.897025] [<ffffffff8174cee2>] kasan_slab_alloc+0x12/0x20
       [ 1116.897025] [<     inline     >] slab_post_alloc_hook mm/slab.h:417
       [ 1116.897025] [<     inline     >] slab_alloc_node mm/slub.c:2708
       [ 1116.897025] [<     inline     >] slab_alloc mm/slub.c:2716
       [ 1116.897025] [<ffffffff817476a8>] kmem_cache_alloc+0xc8/0x2b0 mm/slub.c:2721
       [ 1116.897025] [<ffffffff84c4f6a9>] sk_prot_alloc+0x69/0x2b0 net/core/sock.c:1326
       [ 1116.897025] [<ffffffff84c58ac8>] sk_alloc+0x38/0xae0 net/core/sock.c:1388
       [ 1116.897025] [<ffffffff851ddf67>] inet6_create+0x2d7/0x1000 net/ipv6/af_inet6.c:182
       [ 1116.897025] [<ffffffff84c4af7b>] __sock_create+0x37b/0x640 net/socket.c:1153
       [ 1116.897025] [<     inline     >] sock_create net/socket.c:1193
       [ 1116.897025] [<     inline     >] SYSC_socket net/socket.c:1223
       [ 1116.897025] [<ffffffff84c4b46f>] SyS_socket+0xef/0x1b0 net/socket.c:1203
       [ 1116.897025] [<ffffffff85e4d685>] entry_SYSCALL_64_fastpath+0x23/0xc6
      Freed:
      PID = 10987
       [ 1116.897025] [<ffffffff811ddcb6>] save_stack_trace+0x16/0x20
       [ 1116.897025] [<ffffffff8174c736>] save_stack+0x46/0xd0
       [ 1116.897025] [<ffffffff8174cf61>] kasan_slab_free+0x71/0xb0
       [ 1116.897025] [<     inline     >] slab_free_hook mm/slub.c:1352
       [ 1116.897025] [<     inline     >] slab_free_freelist_hook mm/slub.c:1374
       [ 1116.897025] [<     inline     >] slab_free mm/slub.c:2951
       [ 1116.897025] [<ffffffff81748b28>] kmem_cache_free+0xc8/0x330 mm/slub.c:2973
       [ 1116.897025] [<     inline     >] sk_prot_free net/core/sock.c:1369
       [ 1116.897025] [<ffffffff84c541eb>] __sk_destruct+0x32b/0x4f0 net/core/sock.c:1444
       [ 1116.897025] [<ffffffff84c5aca4>] sk_destruct+0x44/0x80 net/core/sock.c:1452
       [ 1116.897025] [<ffffffff84c5ad33>] __sk_free+0x53/0x220 net/core/sock.c:1460
       [ 1116.897025] [<ffffffff84c5af23>] sk_free+0x23/0x30 net/core/sock.c:1471
       [ 1116.897025] [<ffffffff84c5cb6c>] sk_common_release+0x28c/0x3e0 ./include/net/sock.h:1589
       [ 1116.897025] [<ffffffff8579044e>] l2tp_ip6_close+0x1fe/0x290 net/l2tp/l2tp_ip6.c:243
       [ 1116.897025] [<ffffffff850b2dfd>] inet_release+0xed/0x1c0 net/ipv4/af_inet.c:415
       [ 1116.897025] [<ffffffff851dc5a0>] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
       [ 1116.897025] [<ffffffff84c4581d>] sock_release+0x8d/0x1d0 net/socket.c:570
       [ 1116.897025] [<ffffffff84c45976>] sock_close+0x16/0x20 net/socket.c:1017
       [ 1116.897025] [<ffffffff817a108c>] __fput+0x28c/0x780 fs/file_table.c:208
       [ 1116.897025] [<ffffffff817a1605>] ____fput+0x15/0x20 fs/file_table.c:244
       [ 1116.897025] [<ffffffff813774f9>] task_work_run+0xf9/0x170
       [ 1116.897025] [<ffffffff81324aae>] do_exit+0x85e/0x2a00
       [ 1116.897025] [<ffffffff81326dc8>] do_group_exit+0x108/0x330
       [ 1116.897025] [<ffffffff81348cf7>] get_signal+0x617/0x17a0 kernel/signal.c:2307
       [ 1116.897025] [<ffffffff811b49af>] do_signal+0x7f/0x18f0
       [ 1116.897025] [<ffffffff810039bf>] exit_to_usermode_loop+0xbf/0x150 arch/x86/entry/common.c:156
       [ 1116.897025] [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:190
       [ 1116.897025] [<ffffffff81006060>] syscall_return_slowpath+0x1a0/0x1e0 arch/x86/entry/common.c:259
       [ 1116.897025] [<ffffffff85e4d726>] entry_SYSCALL_64_fastpath+0xc4/0xc6
      Memory state around the buggy address:
       ffff8800081b0d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff8800081b0e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff8800081b0e80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                                                          ^
       ffff8800081b0f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8800081b0f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      ==================================================================
      
      The same issue exists with l2tp_ip_bind() and l2tp_ip_bind_table.
      
      Fixes: c51ce497 ("l2tp: fix oops in L2TP IP sockets for connect() AF_UNSPEC case")
      Reported-by: default avatarBaozeng Ding <sploving1@gmail.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarBaozeng Ding <sploving1@gmail.com>
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      56366fa0
    • Sabrina Dubroca's avatar
      rtnetlink: fix FDB size computation · aece024e
      Sabrina Dubroca authored
      [ Upstream commit f82ef3e1 ]
      
      Add missing NDA_VLAN attribute's size.
      
      Fixes: 1e53d5bb ("net: Pass VLAN ID to rtnl_fdb_notify.")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aece024e
    • WANG Cong's avatar
      af_unix: conditionally use freezable blocking calls in read · 6ef59b98
      WANG Cong authored
      [ Upstream commit 06a77b07 ]
      
      Commit 2b15af6f ("af_unix: use freezable blocking calls in read")
      converts schedule_timeout() to its freezable version, it was probably
      correct at that time, but later, commit 2b514574
      ("net: af_unix: implement splice for stream af_unix sockets") breaks
      the strong requirement for a freezable sleep, according to
      commit 0f9548ca:
      
          We shouldn't try_to_freeze if locks are held.  Holding a lock can cause a
          deadlock if the lock is later acquired in the suspend or hibernate path
          (e.g.  by dpm).  Holding a lock can also cause a deadlock in the case of
          cgroup_freezer if a lock is held inside a frozen cgroup that is later
          acquired by a process outside that group.
      
      The pipe_lock is still held at that point.
      
      So use freezable version only for the recvmsg call path, avoid impact for
      Android.
      
      Fixes: 2b514574 ("net: af_unix: implement splice for stream af_unix sockets")
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Colin Cross <ccross@android.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6ef59b98
    • Jeremy Linton's avatar
      net: sky2: Fix shutdown crash · acf9504a
      Jeremy Linton authored
      [ Upstream commit 06ba3b21 ]
      
      The sky2 frequently crashes during machine shutdown with:
      
      sky2_get_stats+0x60/0x3d8 [sky2]
      dev_get_stats+0x68/0xd8
      rtnl_fill_stats+0x54/0x140
      rtnl_fill_ifinfo+0x46c/0xc68
      rtmsg_ifinfo_build_skb+0x7c/0xf0
      rtmsg_ifinfo.part.22+0x3c/0x70
      rtmsg_ifinfo+0x50/0x5c
      netdev_state_change+0x4c/0x58
      linkwatch_do_dev+0x50/0x88
      __linkwatch_run_queue+0x104/0x1a4
      linkwatch_event+0x30/0x3c
      process_one_work+0x140/0x3e0
      worker_thread+0x60/0x44c
      kthread+0xdc/0xf0
      ret_from_fork+0x10/0x50
      
      This is caused by the sky2 being called after it has been shutdown.
      A previous thread about this can be found here:
      
      https://lkml.org/lkml/2016/4/12/410
      
      An alternative fix is to assure that IFF_UP gets cleared by
      calling dev_close() during shutdown. This is similar to what the
      bnx2/tg3/xgene and maybe others are doing to assure that the driver
      isn't being called following _shutdown().
      Signed-off-by: default avatarJeremy Linton <jeremy.linton@arm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      acf9504a
    • Paolo Abeni's avatar
      ip6_tunnel: disable caching when the traffic class is inherited · 49695d1e
      Paolo Abeni authored
      [ Upstream commit b5c2d495 ]
      
      If an ip6 tunnel is configured to inherit the traffic class from
      the inner header, the dst_cache must be disabled or it will foul
      the policy routing.
      
      The issue is apprently there since at leat Linux-2.6.12-rc2.
      Reported-by: default avatarLiam McBirnie <liam.mcbirnie@boeing.com>
      Cc: Liam McBirnie <liam.mcbirnie@boeing.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      49695d1e
    • WANG Cong's avatar
      net: check dead netns for peernet2id_alloc() · 2b54505c
      WANG Cong authored
      [ Upstream commit cfc44a4d ]
      
      Andrei reports we still allocate netns ID from idr after we destroy
      it in cleanup_net().
      
      cleanup_net():
        ...
        idr_destroy(&net->netns_ids);
        ...
        list_for_each_entry_reverse(ops, &pernet_list, list)
          ops_exit_list(ops, &net_exit_list);
            -> rollback_registered_many()
              -> rtmsg_ifinfo_build_skb()
               -> rtnl_fill_ifinfo()
                 -> peernet2id_alloc()
      
      After that point we should not even access net->netns_ids, we
      should check the death of the current netns as early as we can in
      peernet2id_alloc().
      
      For net-next we can consider to avoid sending rtmsg totally,
      it is a good optimization for netns teardown path.
      
      Fixes: 0c7aecd4 ("netns: add rtnl cmd to add and get peer netns ids")
      Reported-by: default avatarAndrei Vagin <avagin@gmail.com>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarAndrei Vagin <avagin@openvz.org>
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b54505c
    • Eric Dumazet's avatar
      virtio-net: add a missing synchronize_net() · 790fd11f
      Eric Dumazet authored
      [ Upstream commit 963abe5c ]
      
      It seems many drivers do not respect napi_hash_del() contract.
      
      When napi_hash_del() is used before netif_napi_del(), an RCU grace
      period is needed before freeing NAPI object.
      
      Fixes: 91815639 ("virtio-net: rx busy polling support")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      790fd11f
  3. 08 Dec, 2016 7 commits