1. 26 Oct, 2016 40 commits
    • Zefan Li's avatar
      Fix incomplete backport of commit 423f04d6 · 6883832b
      Zefan Li authored
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      6883832b
    • Nicolas Dichtel's avatar
      ipv6: fix handling of blackhole and prohibit routes · af706acb
      Nicolas Dichtel authored
      commit ef2c7d7b upstream.
      
      When adding a blackhole or a prohibit route, they were handling like classic
      routes. Moreover, it was only possible to add this kind of routes by specifying
      an interface.
      
      Bug already reported here:
        http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=498498
      
      Before the patch:
        $ ip route add blackhole 2001::1/128
        RTNETLINK answers: No such device
        $ ip route add blackhole 2001::1/128 dev eth0
        $ ip -6 route | grep 2001
        2001::1 dev eth0  metric 1024
      
      After:
        $ ip route add blackhole 2001::1/128
        $ ip -6 route | grep 2001
        blackhole 2001::1 dev lo  metric 1024  error -22
      
      v2: wrong patch
      v3: add a field fc_type in struct fib6_config to store RTN_* type
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      af706acb
    • Michal Kubeček's avatar
      ipv6: don't call fib6_run_gc() until routing is ready · a4ea6252
      Michal Kubeček authored
      commit 2c861cc6 upstream.
      
      When loading the ipv6 module, ndisc_init() is called before
      ip6_route_init(). As the former registers a handler calling
      fib6_run_gc(), this opens a window to run the garbage collector
      before necessary data structures are initialized. If a network
      device is initialized in this window, adding MAC address to it
      triggers a NETDEV_CHANGEADDR event, leading to a crash in
      fib6_clean_all().
      
      Take the event handler registration out of ndisc_init() into a
      separate function ndisc_late_init() and move it after
      ip6_route_init().
      Signed-off-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      a4ea6252
    • Michal Kubeček's avatar
      ipv6: update ip6_rt_last_gc every time GC is run · 3b02ae3d
      Michal Kubeček authored
      commit 49a18d86 upstream.
      
      As pointed out by Eric Dumazet, net->ipv6.ip6_rt_last_gc should
      hold the last time garbage collector was run so that we should
      update it whenever fib6_run_gc() calls fib6_clean_all(), not only
      if we got there from ip6_dst_gc().
      Signed-off-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      3b02ae3d
    • Karl Heiss's avatar
      sctp: Prevent soft lockup when sctp_accept() is called during a timeout event · cc639575
      Karl Heiss authored
      commit 635682a1 upstream.
      
      A case can occur when sctp_accept() is called by the user during
      a heartbeat timeout event after the 4-way handshake.  Since
      sctp_assoc_migrate() changes both assoc->base.sk and assoc->ep, the
      bh_sock_lock in sctp_generate_heartbeat_event() will be taken with
      the listening socket but released with the new association socket.
      The result is a deadlock on any future attempts to take the listening
      socket lock.
      
      Note that this race can occur with other SCTP timeouts that take
      the bh_lock_sock() in the event sctp_accept() is called.
      
       BUG: soft lockup - CPU#9 stuck for 67s! [swapper:0]
       ...
       RIP: 0010:[<ffffffff8152d48e>]  [<ffffffff8152d48e>] _spin_lock+0x1e/0x30
       RSP: 0018:ffff880028323b20  EFLAGS: 00000206
       RAX: 0000000000000002 RBX: ffff880028323b20 RCX: 0000000000000000
       RDX: 0000000000000000 RSI: ffff880028323be0 RDI: ffff8804632c4b48
       RBP: ffffffff8100bb93 R08: 0000000000000000 R09: 0000000000000000
       R10: ffff880610662280 R11: 0000000000000100 R12: ffff880028323aa0
       R13: ffff8804383c3880 R14: ffff880028323a90 R15: ffffffff81534225
       FS:  0000000000000000(0000) GS:ffff880028320000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
       CR2: 00000000006df528 CR3: 0000000001a85000 CR4: 00000000000006e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
       Process swapper (pid: 0, threadinfo ffff880616b70000, task ffff880616b6cab0)
       Stack:
       ffff880028323c40 ffffffffa01c2582 ffff880614cfb020 0000000000000000
       <d> 0100000000000000 00000014383a6c44 ffff8804383c3880 ffff880614e93c00
       <d> ffff880614e93c00 0000000000000000 ffff8804632c4b00 ffff8804383c38b8
       Call Trace:
       <IRQ>
       [<ffffffffa01c2582>] ? sctp_rcv+0x492/0xa10 [sctp]
       [<ffffffff8148c559>] ? nf_iterate+0x69/0xb0
       [<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
       [<ffffffff8148c716>] ? nf_hook_slow+0x76/0x120
       [<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
       [<ffffffff8149757d>] ? ip_local_deliver_finish+0xdd/0x2d0
       [<ffffffff81497808>] ? ip_local_deliver+0x98/0xa0
       [<ffffffff81496ccd>] ? ip_rcv_finish+0x12d/0x440
       [<ffffffff81497255>] ? ip_rcv+0x275/0x350
       [<ffffffff8145cfeb>] ? __netif_receive_skb+0x4ab/0x750
       ...
      
      With lockdep debugging:
      
       =====================================
       [ BUG: bad unlock balance detected! ]
       -------------------------------------
       CslRx/12087 is trying to release lock (slock-AF_INET) at:
       [<ffffffffa01bcae0>] sctp_generate_timeout_event+0x40/0xe0 [sctp]
       but there are no more locks to release!
      
       other info that might help us debug this:
       2 locks held by CslRx/12087:
       #0:  (&asoc->timers[i]){+.-...}, at: [<ffffffff8108ce1f>] run_timer_softirq+0x16f/0x3e0
       #1:  (slock-AF_INET){+.-...}, at: [<ffffffffa01bcac3>] sctp_generate_timeout_event+0x23/0xe0 [sctp]
      
      Ensure the socket taken is also the same one that is released by
      saving a copy of the socket before entering the timeout event
      critical section.
      Signed-off-by: default avatarKarl Heiss <kheiss@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2:
       - Net namespaces are not used
       - Keep using sctp_bh_{,un}lock_sock()
       - Adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      cc639575
    • Dave Airlie's avatar
      drm/radeon: fix hotplug race at startup · d78e7762
      Dave Airlie authored
      commit 7f98ca45 upstream.
      
      We apparantly get a hotplug irq before we've initialised
      modesetting,
      
      [drm] Loading R100 Microcode
      BUG: unable to handle kernel NULL pointer dereference at   (null)
      IP: [<c125f56f>] __mutex_lock_slowpath+0x23/0x91
      *pde = 00000000
      Oops: 0002 [#1]
      Modules linked in: radeon(+) drm_kms_helper ttm drm i2c_algo_bit backlight pcspkr psmouse evdev sr_mod input_leds led_class cdrom sg parport_pc parport floppy intel_agp intel_gtt lpc_ich acpi_cpufreq processor button mfd_core agpgart uhci_hcd ehci_hcd rng_core snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm usbcore usb_common i2c_i801 i2c_core snd_timer snd soundcore thermal_sys
      CPU: 0 PID: 15 Comm: kworker/0:1 Not tainted 4.2.0-rc7-00015-gbf674028 #111
      Hardware name: MicroLink                               /D850MV                         , BIOS MV85010A.86A.0067.P24.0304081124 04/08/2003
      Workqueue: events radeon_hotplug_work_func [radeon]
      task: f6ca5900 ti: f6d3e000 task.ti: f6d3e000
      EIP: 0060:[<c125f56f>] EFLAGS: 00010282 CPU: 0
      EIP is at __mutex_lock_slowpath+0x23/0x91
      EAX: 00000000 EBX: f5e900fc ECX: 00000000 EDX: fffffffe
      ESI: f6ca5900 EDI: f5e90100 EBP: f5e90000 ESP: f6d3ff0c
       DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
      CR0: 8005003b CR2: 00000000 CR3: 36f61000 CR4: 000006d0
      Stack:
       f5e90100 00000000 c103c4c1 f6d2a5a0 f5e900fc f6df394c c125f162 f8b0faca
       f6d2a5a0 c138ca00 f6df394c f7395600 c1034741 00d40000 00000000 f6d2a5a0
       c138ca00 f6d2a5b8 c138ca10 c1034b58 00000001 f6d40000 f6ca5900 f6d0c940
      Call Trace:
       [<c103c4c1>] ? dequeue_task_fair+0xa4/0xb7
       [<c125f162>] ? mutex_lock+0x9/0xa
       [<f8b0faca>] ? radeon_hotplug_work_func+0x17/0x57 [radeon]
       [<c1034741>] ? process_one_work+0xfc/0x194
       [<c1034b58>] ? worker_thread+0x18d/0x218
       [<c10349cb>] ? rescuer_thread+0x1d5/0x1d5
       [<c103742a>] ? kthread+0x7b/0x80
       [<c12601c0>] ? ret_from_kernel_thread+0x20/0x30
       [<c10373af>] ? init_completion+0x18/0x18
      Code: 42 08 e8 8e a6 dd ff c3 57 56 53 83 ec 0c 8b 35 48 f7 37 c1 8b 10 4a 74 1a 89 c3 8d 78 04 8b 40 08 89 63
      Reported-and-Tested-by: default avatarMeelis Roos <mroos@linux.ee>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      d78e7762
    • Eric Dumazet's avatar
      udp: properly support MSG_PEEK with truncated buffers · 088be966
      Eric Dumazet authored
      commit 197c949e upstream.
      
      Backport of this upstream commit into stable kernels :
      89c22d8c ("net: Fix skb csum races when peeking")
      exposed a bug in udp stack vs MSG_PEEK support, when user provides
      a buffer smaller than skb payload.
      
      In this case,
      skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr),
                                       msg->msg_iov);
      returns -EFAULT.
      
      This bug does not happen in upstream kernels since Al Viro did a great
      job to replace this into :
      skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg);
      This variant is safe vs short buffers.
      
      For the time being, instead reverting Herbert Xu patch and add back
      skb->ip_summed invalid changes, simply store the result of
      udp_lib_checksum_complete() so that we avoid computing the checksum a
      second time, and avoid the problematic
      skb_copy_and_csum_datagram_iovec() call.
      
      This patch can be applied on recent kernels as it avoids a double
      checksumming, then backported to stable kernels as a bug fix.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      088be966
    • Herbert Xu's avatar
      net: Fix skb csum races when peeking · 538b3c02
      Herbert Xu authored
      [ Upstream commit 89c22d8c ]
      
      When we calculate the checksum on the recv path, we store the
      result in the skb as an optimisation in case we need the checksum
      again down the line.
      
      This is in fact bogus for the MSG_PEEK case as this is done without
      any locking.  So multiple threads can peek and then store the result
      to the same skb, potentially resulting in bogus skb states.
      
      This patch fixes this by only storing the result if the skb is not
      shared.  This preserves the optimisations for the few cases where
      it can be done safely due to locking or other reasons, e.g., SIOCINQ.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      538b3c02
    • Ben Hutchings's avatar
      USB: ti_usb_3410_502: Fix ID table size · c6192e5d
      Ben Hutchings authored
      Commit 35a2fbc9 ("USB: serial: ti_usb_3410_5052: new device id for
      Abbot strip port cable") failed to update the size of the
      ti_id_table_3410 array.  This doesn't need to be fixed upstream
      following commit d7ece651 ("USB: ti_usb_3410_5052: remove
      vendor/product module parameters") but should be fixed in stable
      branches older than 3.12.
      
      Backports of commit c9d09dc7 ("USB: serial: ti_usb_3410_5052: add
      Abbott strip port ID to combined table as well.") similarly failed to
      update the size of the ti_id_table_combined array.
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      c6192e5d
    • Eric Dumazet's avatar
      af_unix: fix a fatal race with bit fields · 5781d89c
      Eric Dumazet authored
      commit 60bc851a upstream.
      
      Using bit fields is dangerous on ppc64/sparc64, as the compiler [1]
      uses 64bit instructions to manipulate them.
      If the 64bit word includes any atomic_t or spinlock_t, we can lose
      critical concurrent changes.
      
      This is happening in af_unix, where unix_sk(sk)->gc_candidate/
      gc_maybe_cycle/lock share the same 64bit word.
      
      This leads to fatal deadlock, as one/several cpus spin forever
      on a spinlock that will never be available again.
      
      A safer way would be to use a long to store flags.
      This way we are sure compiler/arch wont do bad things.
      
      As we own unix_gc_lock spinlock when clearing or setting bits,
      we can use the non atomic __set_bit()/__clear_bit().
      
      recursion_level can share the same 64bit location with the spinlock,
      as it is set only with this spinlock held.
      
      [1] bug fixed in gcc-4.8.0 :
      http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080Reported-by: default avatarAmbrose Feinstein <ambrose@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Cc: hejianet <hejianet@gmail.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      5781d89c
    • Francesco Ruggeri's avatar
      net: possible use after free in dst_release · edd32246
      Francesco Ruggeri authored
      commit 07a5d384 upstream.
      
      dst_release should not access dst->flags after decrementing
      __refcnt to 0. The dst_entry may be in dst_busy_list and
      dst_gc_task may dst_destroy it before dst_release gets a chance
      to access dst->flags.
      
      Fixes: d69bbf88 ("net: fix a race in dst_release()")
      Fixes: 27b75c95 ("net: avoid RCU for NOCACHE dst")
      Signed-off-by: default avatarFrancesco Ruggeri <fruggeri@arista.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      edd32246
    • Colin Ian King's avatar
      ftrace/scripts: Fix incorrect use of sprintf in recordmcount · 626c6e6e
      Colin Ian King authored
      commit 713a3e4d upstream.
      
      Fix build warning:
      
      scripts/recordmcount.c:589:4: warning: format not a string
      literal and no format arguments [-Wformat-security]
          sprintf("%s: failed\n", file);
      
      Fixes: a50bd439 ("ftrace/scripts: Have recordmcount copy the object file")
      Link: http://lkml.kernel.org/r/1451516801-16951-1-git-send-email-colin.king@canonical.com
      
      Cc: Li Bin <huawei.libin@huawei.com>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      626c6e6e
    • Andrew Banman's avatar
      mm/memory_hotplug.c: check for missing sections in test_pages_in_a_zone() · 42d0b53b
      Andrew Banman authored
      commit 5f0f2887 upstream.
      
      test_pages_in_a_zone() does not account for the possibility of missing
      sections in the given pfn range.  pfn_valid_within always returns 1 when
      CONFIG_HOLES_IN_ZONE is not set, allowing invalid pfns from missing
      sections to pass the test, leading to a kernel oops.
      
      Wrap an additional pfn loop with PAGES_PER_SECTION granularity to check
      for missing sections before proceeding into the zone-check code.
      
      This also prevents a crash from offlining memory devices with missing
      sections.  Despite this, it may be a good idea to keep the related patch
      '[PATCH 3/3] drivers: memory: prohibit offlining of memory blocks with
      missing sections' because missing sections in a memory block may lead to
      other problems not covered by the scope of this fix.
      Signed-off-by: default avatarAndrew Banman <abanman@sgi.com>
      Acked-by: default avatarAlex Thorlton <athorlton@sgi.com>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Greg KH <greg@kroah.com>
      Cc: Seth Jennings <sjennings@variantweb.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      42d0b53b
    • Joseph Qi's avatar
      ocfs2: fix BUG when calculate new backup super · c173fefc
      Joseph Qi authored
      commit 5c9ee4cb upstream.
      
      When resizing, it firstly extends the last gd.  Once it should backup
      super in the gd, it calculates new backup super and update the
      corresponding value.
      
      But it currently doesn't consider the situation that the backup super is
      already done.  And in this case, it still sets the bit in gd bitmap and
      then decrease from bg_free_bits_count, which leads to a corrupted gd and
      trigger the BUG in ocfs2_block_group_set_bits:
      
          BUG_ON(le16_to_cpu(bg->bg_free_bits_count) < num_bits);
      
      So check whether the backup super is done and then do the updates.
      Signed-off-by: default avatarJoseph Qi <joseph.qi@huawei.com>
      Reviewed-by: default avatarJiufei Xue <xuejiufei@huawei.com>
      Reviewed-by: default avatarYiwen Jiang <jiangyiwen@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      c173fefc
    • Andrey Ryabinin's avatar
      ipv6/addrlabel: fix ip6addrlbl_get() · 2c165804
      Andrey Ryabinin authored
      commit e459dfee upstream.
      
      ip6addrlbl_get() has never worked. If ip6addrlbl_hold() succeeded,
      ip6addrlbl_get() will exit with '-ESRCH'. If ip6addrlbl_hold() failed,
      ip6addrlbl_get() will use about to be free ip6addrlbl_entry pointer.
      
      Fix this by inverting ip6addrlbl_hold() check.
      
      Fixes: 2a8cc6c8 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.")
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Reviewed-by: default avatarCong Wang <cwang@twopensource.com>
      Acked-by: default avatarYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      2c165804
    • Helge Deller's avatar
      parisc: Fix syscall restarts · 5244151e
      Helge Deller authored
      commit 71a71fb5 upstream.
      
      On parisc syscalls which are interrupted by signals sometimes failed to
      restart and instead returned -ENOSYS which in the worst case lead to
      userspace crashes.
      A similiar problem existed on MIPS and was fixed by commit e967ef02
      ("MIPS: Fix restart of indirect syscalls").
      
      On parisc the current syscall restart code assumes that all syscall
      callers load the syscall number in the delay slot of the ble
      instruction. That's how it is e.g. done in the unistd.h header file:
      	ble 0x100(%sr2, %r0)
      	ldi #syscall_nr, %r20
      Because of that assumption the current code never restored %r20 before
      returning to userspace.
      
      This assumption is at least not true for code which uses the glibc
      syscall() function, which instead uses this syntax:
      	ble 0x100(%sr2, %r0)
      	copy regX, %r20
      where regX depend on how the compiler optimizes the code and register
      usage.
      
      This patch fixes this problem by adding code to analyze how the syscall
      number is loaded in the delay branch and - if needed - copy the syscall
      number to regX prior returning to userspace for the syscall restart.
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      5244151e
    • David Howells's avatar
      KEYS: Fix race between read and revoke · 87f4dcb8
      David Howells authored
      commit b4a1b4f5 upstream.
      
      This fixes CVE-2015-7550.
      
      There's a race between keyctl_read() and keyctl_revoke().  If the revoke
      happens between keyctl_read() checking the validity of a key and the key's
      semaphore being taken, then the key type read method will see a revoked key.
      
      This causes a problem for the user-defined key type because it assumes in
      its read method that there will always be a payload in a non-revoked key
      and doesn't check for a NULL pointer.
      
      Fix this by making keyctl_read() check the validity of a key after taking
      semaphore instead of before.
      
      I think the bug was introduced with the original keyrings code.
      
      This was discovered by a multithreaded test program generated by syzkaller
      (http://github.com/google/syzkaller).  Here's a cleaned up version:
      
      	#include <sys/types.h>
      	#include <keyutils.h>
      	#include <pthread.h>
      	void *thr0(void *arg)
      	{
      		key_serial_t key = (unsigned long)arg;
      		keyctl_revoke(key);
      		return 0;
      	}
      	void *thr1(void *arg)
      	{
      		key_serial_t key = (unsigned long)arg;
      		char buffer[16];
      		keyctl_read(key, buffer, 16);
      		return 0;
      	}
      	int main()
      	{
      		key_serial_t key = add_key("user", "%", "foo", 3, KEY_SPEC_USER_KEYRING);
      		pthread_t th[5];
      		pthread_create(&th[0], 0, thr0, (void *)(unsigned long)key);
      		pthread_create(&th[1], 0, thr1, (void *)(unsigned long)key);
      		pthread_create(&th[2], 0, thr0, (void *)(unsigned long)key);
      		pthread_create(&th[3], 0, thr1, (void *)(unsigned long)key);
      		pthread_join(th[0], 0);
      		pthread_join(th[1], 0);
      		pthread_join(th[2], 0);
      		pthread_join(th[3], 0);
      		return 0;
      	}
      
      Build as:
      
      	cc -o keyctl-race keyctl-race.c -lkeyutils -lpthread
      
      Run as:
      
      	while keyctl-race; do :; done
      
      as it may need several iterations to crash the kernel.  The crash can be
      summarised as:
      
      	BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      	IP: [<ffffffff81279b08>] user_read+0x56/0xa3
      	...
      	Call Trace:
      	 [<ffffffff81276aa9>] keyctl_read_key+0xb6/0xd7
      	 [<ffffffff81277815>] SyS_keyctl+0x83/0xe0
      	 [<ffffffff815dbb97>] entry_SYSCALL_64_fastpath+0x12/0x6f
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Tested-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarJames Morris <james.l.morris@oracle.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      87f4dcb8
    • Alan Stern's avatar
      USB: fix invalid memory access in hub_activate() · 7a42c72c
      Alan Stern authored
      commit e50293ef upstream.
      
      Commit 8520f380 ("USB: change hub initialization sleeps to
      delayed_work") changed the hub_activate() routine to make part of it
      run in a workqueue.  However, the commit failed to take a reference to
      the usb_hub structure or to lock the hub interface while doing so.  As
      a result, if a hub is plugged in and quickly unplugged before the work
      routine can run, the routine will try to access memory that has been
      deallocated.  Or, if the hub is unplugged while the routine is
      running, the memory may be deallocated while it is in active use.
      
      This patch fixes the problem by taking a reference to the usb_hub at
      the start of hub_activate() and releasing it at the end (when the work
      is finished), and by locking the hub interface while the work routine
      is running.  It also adds a check at the start of the routine to see
      if the hub has already been disconnected, in which nothing should be
      done.
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Reported-by: default avatarAlexandru Cornea <alexandru.cornea@intel.com>
      Tested-by: default avatarAlexandru Cornea <alexandru.cornea@intel.com>
      Fixes: 8520f380 ("USB: change hub initialization sleeps to delayed_work")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [lizf: Backported to 3.4: add forward declaration of hub_release()]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      7a42c72c
    • Dan Carpenter's avatar
      USB: ipaq.c: fix a timeout loop · 681e2852
      Dan Carpenter authored
      commit abdc9a3b upstream.
      
      The code expects the loop to end with "retries" set to zero but, because
      it is a post-op, it will end set to -1.  I have fixed this by moving the
      decrement inside the loop.
      
      Fixes: 014aa2a3 ('USB: ipaq: minor ipaq_open() cleanup.')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      681e2852
    • Konrad Rzeszutek Wilk's avatar
      xen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set. · c053c997
      Konrad Rzeszutek Wilk authored
      commit 408fb0e5 upstream.
      
      commit f598282f ("PCI: Fix the NIU MSI-X problem in a better way")
      teaches us that dealing with MSI-X can be troublesome.
      
      Further checks in the MSI-X architecture shows that if the
      PCI_COMMAND_MEMORY bit is turned of in the PCI_COMMAND we
      may not be able to access the BAR (since they are memory regions).
      
      Since the MSI-X tables are located in there.. that can lead
      to us causing PCIe errors. Inhibit us performing any
      operation on the MSI-X unless the MEMORY bit is set.
      
      Note that Xen hypervisor with:
      "x86/MSI-X: access MSI-X table only after having enabled MSI-X"
      will return:
      xen_pciback: 0000:0a:00.1: error -6 enabling MSI-X for guest 3!
      
      When the generic MSI code tries to setup the PIRQ without
      MEMORY bit set. Which means with later versions of Xen
      (4.6) this patch is not neccessary.
      
      This is part of XSA-157
      Reviewed-by: default avatarJan Beulich <jbeulich@suse.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      c053c997
    • Konrad Rzeszutek Wilk's avatar
      xen/pciback: For XEN_PCI_OP_disable_msi[|x] only disable if device has MSI(X) enabled. · 7e8c20ac
      Konrad Rzeszutek Wilk authored
      commit 7cfb905b upstream.
      
      Otherwise just continue on, returning the same values as
      previously (return of 0, and op->result has the PIRQ value).
      
      This does not change the behavior of XEN_PCI_OP_disable_msi[|x].
      
      The pci_disable_msi or pci_disable_msix have the checks for
      msi_enabled or msix_enabled so they will error out immediately.
      
      However the guest can still call these operations and cause
      us to disable the 'ack_intr'. That means the backend IRQ handler
      for the legacy interrupt will not respond to interrupts anymore.
      
      This will lead to (if the device is causing an interrupt storm)
      for the Linux generic code to disable the interrupt line.
      
      Naturally this will only happen if the device in question
      is plugged in on the motherboard on shared level interrupt GSI.
      
      This is part of XSA-157
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      7e8c20ac
    • Konrad Rzeszutek Wilk's avatar
      xen/pciback: Do not install an IRQ handler for MSI interrupts. · 073a5ac1
      Konrad Rzeszutek Wilk authored
      commit a396f3a2 upstream.
      
      Otherwise an guest can subvert the generic MSI code to trigger
      an BUG_ON condition during MSI interrupt freeing:
      
       for (i = 0; i < entry->nvec_used; i++)
              BUG_ON(irq_has_action(entry->irq + i));
      
      Xen PCI backed installs an IRQ handler (request_irq) for
      the dev->irq whenever the guest writes PCI_COMMAND_MEMORY
      (or PCI_COMMAND_IO) to the PCI_COMMAND register. This is
      done in case the device has legacy interrupts the GSI line
      is shared by the backend devices.
      
      To subvert the backend the guest needs to make the backend
      to change the dev->irq from the GSI to the MSI interrupt line,
      make the backend allocate an interrupt handler, and then command
      the backend to free the MSI interrupt and hit the BUG_ON.
      
      Since the backend only calls 'request_irq' when the guest
      writes to the PCI_COMMAND register the guest needs to call
      XEN_PCI_OP_enable_msi before any other operation. This will
      cause the generic MSI code to setup an MSI entry and
      populate dev->irq with the new PIRQ value.
      
      Then the guest can write to PCI_COMMAND PCI_COMMAND_MEMORY
      and cause the backend to setup an IRQ handler for dev->irq
      (which instead of the GSI value has the MSI pirq). See
      'xen_pcibk_control_isr'.
      
      Then the guest disables the MSI: XEN_PCI_OP_disable_msi
      which ends up triggering the BUG_ON condition in 'free_msi_irqs'
      as there is an IRQ handler for the entry->irq (dev->irq).
      
      Note that this cannot be done using MSI-X as the generic
      code does not over-write dev->irq with the MSI-X PIRQ values.
      
      The patch inhibits setting up the IRQ handler if MSI or
      MSI-X (for symmetry reasons) code had been called successfully.
      
      P.S.
      Xen PCIBack when it sets up the device for the guest consumption
      ends up writting 0 to the PCI_COMMAND (see xen_pcibk_reset_device).
      XSA-120 addendum patch removed that - however when upstreaming said
      addendum we found that it caused issues with qemu upstream. That
      has now been fixed in qemu upstream.
      
      This is part of XSA-157
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      073a5ac1
    • Konrad Rzeszutek Wilk's avatar
      xen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or MSI-X enabled · 8f805dec
      Konrad Rzeszutek Wilk authored
      commit 5e0ce145 upstream.
      
      The guest sequence of:
      
        a) XEN_PCI_OP_enable_msix
        b) XEN_PCI_OP_enable_msix
      
      results in hitting an NULL pointer due to using freed pointers.
      
      The device passed in the guest MUST have MSI-X capability.
      
      The a) constructs and SysFS representation of MSI and MSI groups.
      The b) adds a second set of them but adding in to SysFS fails (duplicate entry).
      'populate_msi_sysfs' frees the newly allocated msi_irq_groups (note that
      in a) pdev->msi_irq_groups is still set) and also free's ALL of the
      MSI-X entries of the device (the ones allocated in step a) and b)).
      
      The unwind code: 'free_msi_irqs' deletes all the entries and tries to
      delete the pdev->msi_irq_groups (which hasn't been set to NULL).
      However the pointers in the SysFS are already freed and we hit an
      NULL pointer further on when 'strlen' is attempted on a freed pointer.
      
      The patch adds a simple check in the XEN_PCI_OP_enable_msix to guard
      against that. The check for msi_enabled is not stricly neccessary.
      
      This is part of XSA-157
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarJan Beulich <jbeulich@suse.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      8f805dec
    • Konrad Rzeszutek Wilk's avatar
      xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled · d3ec8867
      Konrad Rzeszutek Wilk authored
      commit 56441f3c upstream.
      
      The guest sequence of:
      
       a) XEN_PCI_OP_enable_msi
       b) XEN_PCI_OP_enable_msi
       c) XEN_PCI_OP_disable_msi
      
      results in hitting an BUG_ON condition in the msi.c code.
      
      The MSI code uses an dev->msi_list to which it adds MSI entries.
      Under the above conditions an BUG_ON() can be hit. The device
      passed in the guest MUST have MSI capability.
      
      The a) adds the entry to the dev->msi_list and sets msi_enabled.
      The b) adds a second entry but adding in to SysFS fails (duplicate entry)
      and deletes all of the entries from msi_list and returns (with msi_enabled
      is still set).  c) pci_disable_msi passes the msi_enabled checks and hits:
      
      BUG_ON(list_empty(dev_to_msi_list(&dev->dev)));
      
      and blows up.
      
      The patch adds a simple check in the XEN_PCI_OP_enable_msi to guard
      against that. The check for msix_enabled is not stricly neccessary.
      
      This is part of XSA-157.
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarJan Beulich <jbeulich@suse.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      d3ec8867
    • Konrad Rzeszutek Wilk's avatar
      xen/pciback: Save xen_pci_op commands before processing it · 550fe257
      Konrad Rzeszutek Wilk authored
      commit 8135cf8b upstream.
      
      Double fetch vulnerabilities that happen when a variable is
      fetched twice from shared memory but a security check is only
      performed the first time.
      
      The xen_pcibk_do_op function performs a switch statements on the op->cmd
      value which is stored in shared memory. Interestingly this can result
      in a double fetch vulnerability depending on the performed compiler
      optimization.
      
      This patch fixes it by saving the xen_pci_op command before
      processing it. We also use 'barrier' to make sure that the
      compiler does not perform any optimization.
      
      This is part of XSA155.
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarJan Beulich <JBeulich@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      550fe257
    • Roger Pau Monné's avatar
      xen-blkback: only read request operation from shared ring once · a7bc1af5
      Roger Pau Monné authored
      commit 1f13d75c upstream.
      
      A compiler may load a switch statement value multiple times, which could
      be bad when the value is in memory shared with the frontend.
      
      When converting a non-native request to a native one, ensure that
      src->operation is only loaded once by using READ_ONCE().
      
      This is part of XSA155.
      Signed-off-by: default avatarRoger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [lizf: Backported to 3.4:
       - adjust context
       - call ACCESS_ONCE instead of READ_ONCE]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      a7bc1af5
    • David Vrabel's avatar
      xen-netback: use RING_COPY_REQUEST() throughout · f97ed0a9
      David Vrabel authored
      commit 68a33bfd upstream.
      
      Instead of open-coding memcpy()s and directly accessing Tx and Rx
      requests, use the new RING_COPY_REQUEST() that ensures the local copy
      is correct.
      
      This is more than is strictly necessary for guest Rx requests since
      only the id and gref fields are used and it is harmless if the
      frontend modifies these.
      
      This is part of XSA155.
      Reviewed-by: default avatarWei Liu <wei.liu2@citrix.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [lizf: Backported to 3.4:
       - adjust context
       - s/queue/vif/g]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      f97ed0a9
    • David Vrabel's avatar
      xen-netback: don't use last request to determine minimum Tx credit · ac2ce7ef
      David Vrabel authored
      commit 0f589967 upstream.
      
      The last from guest transmitted request gives no indication about the
      minimum amount of credit that the guest might need to send a packet
      since the last packet might have been a small one.
      
      Instead allow for the worst case 128 KiB packet.
      
      This is part of XSA155.
      Reviewed-by: default avatarWei Liu <wei.liu2@citrix.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [lizf: Backported to 3.4: s/queue/vif/g]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      ac2ce7ef
    • David Vrabel's avatar
      xen: Add RING_COPY_REQUEST() · ffac7466
      David Vrabel authored
      commit 454d5d88 upstream.
      
      Using RING_GET_REQUEST() on a shared ring is easy to use incorrectly
      (i.e., by not considering that the other end may alter the data in the
      shared ring while it is being inspected).  Safe usage of a request
      generally requires taking a local copy.
      
      Provide a RING_COPY_REQUEST() macro to use instead of
      RING_GET_REQUEST() and an open-coded memcpy().  This takes care of
      ensuring that the copy is done correctly regardless of any possible
      compiler optimizations.
      
      Use a volatile source to prevent the compiler from reordering or
      omitting the copy.
      
      This is part of XSA155.
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      ffac7466
    • Steven Rostedt (Red Hat)'s avatar
      ftrace/scripts: Have recordmcount copy the object file · 334e9079
      Steven Rostedt (Red Hat) authored
      commit a50bd439 upstream.
      
      Russell King found that he had weird side effects when compiling the kernel
      with hard linked ccache. The reason was that recordmcount modified the
      kernel in place via mmap, and when a file gets modified twice by
      recordmcount, it will complain about it. To fix this issue, Russell wrote a
      patch that checked if the file was hard linked more than once and would
      unlink it if it was.
      
      Linus Torvalds was not happy with the fact that recordmcount does this in
      place modification. Instead of doing the unlink only if the file has two or
      more hard links, it does the unlink all the time. In otherwords, it always
      does a copy if it changed something. That is, it does the write out if a
      change was made.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      334e9079
    • Russell King's avatar
      scripts: recordmcount: break hardlinks · 38b8ae4f
      Russell King authored
      commit dd39a265 upstream.
      
      recordmcount edits the file in-place, which can cause problems when
      using ccache in hardlink mode.  Arrange for recordmcount to break a
      hardlinked object.
      
      Link: http://lkml.kernel.org/r/E1a7MVT-0000et-62@rmk-PC.arm.linux.org.ukSigned-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      38b8ae4f
    • Johan Hovold's avatar
      spi: fix parent-device reference leak · 1b17302a
      Johan Hovold authored
      commit 157f38f9 upstream.
      
      Fix parent-device reference leak due to SPI-core taking an unnecessary
      reference to the parent when allocating the master structure, a
      reference that was never released.
      
      Note that driver core takes its own reference to the parent when the
      master device is registered.
      
      Fixes: 49dce689 ("spi doesn't need class_device")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      1b17302a
    • Tilman Schmidt's avatar
      ser_gigaset: fix deallocation of platform device structure · 325c0423
      Tilman Schmidt authored
      commit 4c5e354a upstream.
      
      When shutting down the device, the struct ser_cardstate must not be
      kfree()d immediately after the call to platform_device_unregister()
      since the embedded struct platform_device is still in use.
      Move the kfree() call to the release method instead.
      Signed-off-by: default avatarTilman Schmidt <tilman@imap.cc>
      Fixes: 2869b23e ("drivers/isdn/gigaset: new M101 driver (v2)")
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarPaul Bolle <pebolle@tiscali.nl>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      325c0423
    • Dan Carpenter's avatar
      mISDN: fix a loop count · eeeeb976
      Dan Carpenter authored
      commit 40d24c4d upstream.
      
      There are two issue here.
      1)  cnt starts as maxloop + 1 so all these loops iterate one more time
          than intended.
      2)  At the end of the loop we test for "if (maxloop && !cnt)" but for
          the first two loops, we end with cnt equal to -1.  Changing this to
          a pre-op means we end with cnt set to 0.
      
      Fixes: cae86d4a ('mISDN: Add driver for Infineon ISDN chipset family')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      eeeeb976
    • Anson Huang's avatar
      ARM: 8471/1: need to save/restore arm register(r11) when it is corrupted · 0629784c
      Anson Huang authored
      commit fa0708b3 upstream.
      
      In cpu_v7_do_suspend routine, r11 is used while it is NOT
      saved/restored, different compiler may have different usage
      of ARM general registers, so it may cause issues during
      calling cpu_v7_do_suspend.
      
      We meet kernel fault occurs when using GCC 4.8.3, r11 contains
      valid value before calling into cpu_v7_do_suspend, but when returned
      from this routine, r11 is corrupted and lead to kernel fault.
      Doing save/restore for those corrupted registers is a must in
      assemble code.
      Signed-off-by: default avatarAnson Huang <Anson.Huang@freescale.com>
      Reviewed-by: default avatarNicolas Pitre <nico@linaro.org>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      0629784c
    • Sergei Shtylyov's avatar
      sh_eth: fix TX buffer byte-swapping · efd17100
      Sergei Shtylyov authored
      commit 3e230993 upstream.
      
      For the little-endian SH771x kernels the driver has to byte-swap the RX/TX
      buffers,  however yet unset physcial address from the TX descriptor is used
      to call sh_eth_soft_swap(). Use 'skb->data' instead...
      
      Fixes: 31fcb99d ("net: sh_eth: remove __flush_purge_region")
      Signed-off-by: default avatarSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      efd17100
    • Thomas Gleixner's avatar
      genirq: Prevent chip buslock deadlock · 7b4eaec6
      Thomas Gleixner authored
      commit abc7e40c upstream.
      
      If a interrupt chip utilizes chip->buslock then free_irq() can
      deadlock in the following way:
      
      CPU0				CPU1
      				interrupt(X) (Shared or spurious)
      free_irq(X)			interrupt_thread(X)
      chip_bus_lock(X)
      				   irq_finalize_oneshot(X)
      				     chip_bus_lock(X)
      synchronize_irq(X)
      
      synchronize_irq() waits for the interrupt thread to complete,
      i.e. forever.
      
      Solution is simple: Drop chip_bus_lock() before calling
      synchronize_irq() as we do with the irq_desc lock. There is nothing to
      be protected after the point where irq_desc lock has been released.
      
      This adds chip_bus_lock/unlock() to the remove_irq() code path, but
      that's actually correct in the case where remove_irq() is called on
      such an interrupt. The current users of remove_irq() are not affected
      as none of those interrupts is on a chip which requires buslock.
      Reported-by: default avatarFredrik Markström <fredrik.markstrom@gmail.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      7b4eaec6
    • Peter Hurley's avatar
      tty: Fix GPF in flush_to_ldisc() · d6aefa87
      Peter Hurley authored
      commit 9ce119f3 upstream.
      
      A line discipline which does not define a receive_buf() method can
      can cause a GPF if data is ever received [1]. Oddly, this was known
      to the author of n_tracesink in 2011, but never fixed.
      
      [1] GPF report
          BUG: unable to handle kernel NULL pointer dereference at           (null)
          IP: [<          (null)>]           (null)
          PGD 3752d067 PUD 37a7b067 PMD 0
          Oops: 0010 [#1] SMP KASAN
          Modules linked in:
          CPU: 2 PID: 148 Comm: kworker/u10:2 Not tainted 4.4.0-rc2+ #51
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          Workqueue: events_unbound flush_to_ldisc
          task: ffff88006da94440 ti: ffff88006db60000 task.ti: ffff88006db60000
          RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
          RSP: 0018:ffff88006db67b50  EFLAGS: 00010246
          RAX: 0000000000000102 RBX: ffff88003ab32f88 RCX: 0000000000000102
          RDX: 0000000000000000 RSI: ffff88003ab330a6 RDI: ffff88003aabd388
          RBP: ffff88006db67c48 R08: ffff88003ab32f9c R09: ffff88003ab31fb0
          R10: ffff88003ab32fa8 R11: 0000000000000000 R12: dffffc0000000000
          R13: ffff88006db67c20 R14: ffffffff863df820 R15: ffff88003ab31fb8
          FS:  0000000000000000(0000) GS:ffff88006dc00000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
          CR2: 0000000000000000 CR3: 0000000037938000 CR4: 00000000000006e0
          Stack:
           ffffffff829f46f1 ffff88006da94bf8 ffff88006da94bf8 0000000000000000
           ffff88003ab31fb0 ffff88003aabd438 ffff88003ab31ff8 ffff88006430fd90
           ffff88003ab32f9c ffffed0007557a87 1ffff1000db6cf78 ffff88003ab32078
          Call Trace:
           [<ffffffff8127cf91>] process_one_work+0x8f1/0x17a0 kernel/workqueue.c:2030
           [<ffffffff8127df14>] worker_thread+0xd4/0x1180 kernel/workqueue.c:2162
           [<ffffffff8128faaf>] kthread+0x1cf/0x270 drivers/block/aoe/aoecmd.c:1302
           [<ffffffff852a7c2f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:468
          Code:  Bad RIP value.
          RIP  [<          (null)>]           (null)
           RSP <ffff88006db67b50>
          CR2: 0000000000000000
          ---[ end trace a587f8947e54d6ea ]---
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarPeter Hurley <peter@hurleysoftware.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [lizf: Backportd to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      d6aefa87
    • Naoya Horiguchi's avatar
      mm: hugetlb: call huge_pte_alloc() only if ptep is null · c3634024
      Naoya Horiguchi authored
      commit 0d777df5 upstream.
      
      Currently at the beginning of hugetlb_fault(), we call huge_pte_offset()
      and check whether the obtained *ptep is a migration/hwpoison entry or
      not.  And if not, then we get to call huge_pte_alloc().  This is racy
      because the *ptep could turn into migration/hwpoison entry after the
      huge_pte_offset() check.  This race results in BUG_ON in
      huge_pte_alloc().
      
      We don't have to call huge_pte_alloc() when the huge_pte_offset()
      returns non-NULL, so let's fix this bug with moving the code into else
      block.
      
      Note that the *ptep could turn into a migration/hwpoison entry after
      this block, but that's not a problem because we have another
      !pte_present check later (we never go into hugetlb_no_page() in that
      case.)
      
      Fixes: 290408d4 ("hugetlb: hugepage migration core")
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: default avatarHillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      c3634024
    • Michal Hocko's avatar
      mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress · 80357219
      Michal Hocko authored
      commit 373ccbe5 upstream.
      
      Tetsuo Handa has reported that the system might basically livelock in
      OOM condition without triggering the OOM killer.
      
      The issue is caused by internal dependency of the direct reclaim on
      vmstat counter updates (via zone_reclaimable) which are performed from
      the workqueue context.  If all the current workers get assigned to an
      allocation request, though, they will be looping inside the allocator
      trying to reclaim memory but zone_reclaimable can see stalled numbers so
      it will consider a zone reclaimable even though it has been scanned way
      too much.  WQ concurrency logic will not consider this situation as a
      congested workqueue because it relies that worker would have to sleep in
      such a situation.  This also means that it doesn't try to spawn new
      workers or invoke the rescuer thread if the one is assigned to the
      queue.
      
      In order to fix this issue we need to do two things.  First we have to
      let wq concurrency code know that we are in trouble so we have to do a
      short sleep.  In order to prevent from issues handled by 0e093d99
      ("writeback: do not sleep on the congestion queue if there are no
      congested BDIs or if significant congestion is not being encountered in
      the current zone") we limit the sleep only to worker threads which are
      the ones of the interest anyway.
      
      The second thing to do is to create a dedicated workqueue for vmstat and
      mark it WQ_MEM_RECLAIM to note it participates in the reclaim and to
      have a spare worker thread for it.
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reported-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Cristopher Lameter <clameter@sgi.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Arkadiusz Miskiewicz <arekm@maven.pl>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      80357219