1. 27 Apr, 2016 40 commits
    • xhci: Add spurious wakeup quirk for LynxPoint-LP controllers · 4333b97f
      Laura Abbott authored
      commit fd7cd061 upstream.
      
      We received several reports of systems rebooting and powering on
      after an attempted shutdown. Testing showed that setting the
      XHCI_SPURIOUS_WAKEUP quirk in addition to the XHCI_SPURIOUS_REBOOT
      quirk allowed the system to shut down as expected for LynxPoint-LP
      xHCI controllers. Set the quirk back.
      
      Note that the quirk was originally introduced for LynxPoint and
      LynxPoint-LP just for this same reason. See:
      
      commit 638298dc ("xhci: Fix spurious wakeups after S5 on Haswell")
      
      It was later limited to HP machines only, as it caused
      regressions on some machines; see both the bug and the commit:
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=66171
      commit 6962d914 ("xhci: Limit the spurious wakeup fix only to HP machines")
      
      Later it was discovered that the powering on after shutdown
      was limited to LynxPoint-LP (Haswell-ULT) and that some non-LP HP
      machines suffered from spontaneous resume from S3 (which should
      not be related to the SPURIOUS_WAKEUP quirk at all). An attempt
      to fix this then removed the SPURIOUS_WAKEUP flag usage completely.
      
      commit b45abacd ("xhci: no switching back on non-ULT Haswell")
      
      Current understanding is that LynxPoint-LP (Haswell ULT) machines
      need the SPURIOUS_WAKEUP quirk, otherwise they will restart, and
      plain LynxPoint (Haswell) machines must _not_ have the quirk
      set, otherwise they will restart as well.
      Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Oliver Neukum <oneukum@suse.com>
      [Added more history to commit message -Mathias]
      Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
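The quirk selection described in the commit above can be sketched roughly as follows. This is only an illustration: the flag bits, the controller enum, and the function name are hypothetical and do not reproduce the driver's real definitions, and the assumption that plain LynxPoint keeps the reboot quirk is ours, not stated in the message.

```c
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define DEMO_QUIRK_SPURIOUS_REBOOT (1u << 0)
#define DEMO_QUIRK_SPURIOUS_WAKEUP (1u << 1)

/* Hypothetical controller kinds standing in for PCI device IDs. */
enum demo_controller {
    DEMO_LYNXPOINT,    /* plain LynxPoint (Haswell) */
    DEMO_LYNXPOINT_LP, /* LynxPoint-LP (Haswell ULT) */
    DEMO_OTHER
};

/* Pick quirks per controller: only the LP variant gets the wakeup quirk. */
uint32_t demo_quirks_for(enum demo_controller c)
{
    uint32_t quirks = 0;

    /* Assumption: both LynxPoint variants keep the reboot quirk. */
    if (c == DEMO_LYNXPOINT || c == DEMO_LYNXPOINT_LP)
        quirks |= DEMO_QUIRK_SPURIOUS_REBOOT;

    /* Per the commit message, the wakeup quirk is LP-only. */
    if (c == DEMO_LYNXPOINT_LP)
        quirks |= DEMO_QUIRK_SPURIOUS_WAKEUP;

    return quirks;
}
```

Keeping the two conditions separate makes it harder to re-enable the wakeup quirk for non-LP parts by accident.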
    • xhci: handle no ping response error properly · 720083a9
      Mathias Nyman authored
      commit 3b4739b8 upstream.
      
      If a host fails to wake up an isochronous SuperSpeed device from U1/U2
      in time for an isoch transfer, it will generate a "No ping response error".
      The host will then move to the next transfer descriptor.
      
      Handle this case in the same way as missed service errors, tag the
      current TD as skipped and handle it on the next transfer event.
      Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
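A minimal sketch of the handling described above, treating a "no ping response" completion the same way as a missed service error. The enum values, the struct, and the function name are hypothetical stand-ins and not the driver's actual identifiers.

```c
#include <stdbool.h>

/* Hypothetical completion codes; real xHCI code values are not reproduced. */
enum demo_comp_code {
    DEMO_COMP_SUCCESS,
    DEMO_COMP_MISSED_SERVICE,
    DEMO_COMP_NO_PING_RESPONSE,
    DEMO_COMP_OTHER_ERROR
};

/* Hypothetical per-endpoint state: "skip" tags the current TD as skipped. */
struct demo_ep {
    bool skip;
};

/* Returns true when the event was consumed by tagging the TD as skipped. */
bool demo_handle_event(struct demo_ep *ep, enum demo_comp_code code)
{
    switch (code) {
    case DEMO_COMP_MISSED_SERVICE:
    case DEMO_COMP_NO_PING_RESPONSE:
        /* Defer the real work to the next transfer event. */
        ep->skip = true;
        return true;
    default:
        return false;
    }
}
```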
    • iommu/vt-d: fix range computation when making room for large pages · d425239d
      Christian Zander authored
      commit ba2374fd upstream.
      
      In preparation for the installation of a large page, any small page
      tables that may still exist in the target IOV address range are
      removed.  However, if a scatter/gather list entry is large enough to
      fit more than one large page, the address space for any subsequent
      large pages is not cleared of conflicting small page tables.
      
      This can cause legitimate mapping requests to fail with errors of the
      form below, potentially followed by a series of IOMMU faults:
      
      ERROR: DMA PTE for vPFN 0xfde00 already set (to 7f83a4003 not 7e9e00083)
      
      In this example, a 4MiB scatter/gather list entry resulted in the
      successful installation of a large page @ vPFN 0xfdc00, followed by
      a failed attempt to install another large page @ vPFN 0xfde00, due to
      the presence of a pointer to a small page table @ 0x7f83a4000.
      
      To address this problem, compute the number of large pages that fit
      into a given scatter/gather list entry, and use it to derive the
      last vPFN covered by the large page(s).
      Signed-off-by: Christian Zander <christian@nervanasys.com>
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      [bwh: Backported to 3.2:
       - Add the lvl_pages variable, added by an earlier commit upstream
       - Also change arguments to dma_pte_clear_range(), which is called by
         dma_pte_free_pagetable() upstream]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
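The range computation in the commit above can be illustrated with a small sketch. It assumes 4 KiB small pages and 2 MiB large pages (512 small pages per large page); the constant and function names are hypothetical and this is not the driver's code.

```c
#include <stdint.h>

/* Assumption: 2 MiB large pages built from 4 KiB small pages. */
#define DEMO_SMALL_PAGES_PER_LARGE_PAGE 512u

/*
 * Return the last vPFN covered by the large pages that fit into a
 * scatter/gather entry of sg_pages small pages starting at iov_pfn.
 * The caller is assumed to pass sg_pages >= DEMO_SMALL_PAGES_PER_LARGE_PAGE.
 */
uint64_t demo_last_large_page_pfn(uint64_t iov_pfn, uint64_t sg_pages)
{
    /* Number of whole large pages that fit into this entry. */
    uint64_t nr_large = sg_pages / DEMO_SMALL_PAGES_PER_LARGE_PAGE;

    return iov_pfn + nr_large * DEMO_SMALL_PAGES_PER_LARGE_PAGE - 1;
}
```

With the commit's example, a 4 MiB entry (1024 small pages) starting at vPFN 0xfdc00 yields a range ending at 0xfdfff, so the second large page at 0xfde00 falls inside the range that gets cleared.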
    • crypto: ahash - ensure statesize is non-zero · 2147886b
      Russell King authored
      commit 8996eafd upstream.
      
      Unlike shash algorithms, ahash drivers must implement export
      and import, as their descriptors may contain hardware state and
      cannot be exported as is.  Unfortunately some ahash drivers did
      not provide them and ended up causing crashes with algif_hash.
      
      This patch adds a check to prevent these drivers from registering
      ahash algorithms until they are fixed.
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
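A rough sketch of the registration check described above. The struct is a simplified, hypothetical stand-in for an ahash algorithm descriptor, and the function name is invented for illustration; any other checks the real code performs are omitted.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified algorithm descriptor. */
struct demo_ahash_alg {
    size_t digestsize;
    size_t statesize; /* bytes needed by export()/import() */
};

/* Refuse registration when the driver declares no exportable state. */
int demo_register_ahash(const struct demo_ahash_alg *alg)
{
    if (alg->statesize == 0)
        return -EINVAL;

    return 0;
}
```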
    • xen-blkfront: check for null drvdata in blkback_changed (XenbusStateClosing) · efb04943
      Cathy Avery authored
      commit a54c8f0f upstream.
      
      xen-blkfront will crash if the call to talk_to_blkback()
      in blkback_changed() (XenbusStateInitWait) returns an error.
      The driver data is freed and info is set to NULL. Later, during
      the close process via talk_to_blkback()'s call to xenbus_dev_fatal(),
      the NULL pointer is passed to and dereferenced in blkfront_closing().
      Signed-off-by: Cathy Avery <cathy.avery@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • ALSA: synth: Fix conflicting OSS device registration on AWE32 · de7f6bfb
      Takashi Iwai authored
      commit 225db576 upstream.
      
      When OSS emulation is loaded on an ISA SB AWE32 chip, we now get kernel
      warnings like:
        WARNING: CPU: 0 PID: 2791 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x51/0x80()
        sysfs: cannot create duplicate filename '/devices/isa/sbawe.0/sound/card0/seq-oss-0-0'
      
      It's because both emux synth and opl3 drivers try to register their
      OSS device object with the same static index number 0.  This hasn't
      been a big problem until the recent rewrite of device management code
      (that exposes sysfs at the same time), but it's been an obvious bug.
      
      This patch works around it just by using a different index number for
      the emux synth object.  There may be a more elegant fix, but this is
      enough for now, as this code won't be touched often anyway.
      Reported-and-tested-by: Michael Shell <list1@michaelshell.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • drivers/tty: require read access for controlling terminal · 3f258c66
      Jann Horn authored
      commit 0c556271 upstream.
      
      This is mostly a hardening fix, given that write-only access to other
      users' ttys is usually only given through setgid tty executables.
      Signed-off-by: Jann Horn <jann@thejh.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • tty: fix stall caused by missing memory barrier in drivers/tty/n_tty.c · 3a0b4c1d
      Kosuke Tatsukawa authored
      commit e81107d4 upstream.
      
      My colleague ran into a program stall on an x86_64 server, where
      n_tty_read() was waiting for data even though there was data in the
      buffer in the pty.  The kernel stack for the stuck process looks like below.
       #0 [ffff88303d107b58] __schedule at ffffffff815c4b20
       #1 [ffff88303d107bd0] schedule at ffffffff815c513e
       #2 [ffff88303d107bf0] schedule_timeout at ffffffff815c7818
       #3 [ffff88303d107ca0] wait_woken at ffffffff81096bd2
       #4 [ffff88303d107ce0] n_tty_read at ffffffff8136fa23
       #5 [ffff88303d107dd0] tty_read at ffffffff81368013
       #6 [ffff88303d107e20] __vfs_read at ffffffff811a3704
       #7 [ffff88303d107ec0] vfs_read at ffffffff811a3a57
       #8 [ffff88303d107f00] sys_read at ffffffff811a4306
       #9 [ffff88303d107f50] entry_SYSCALL_64_fastpath at ffffffff815c86d7
      
      There seem to be two problems causing this issue.
      
      First, in drivers/tty/n_tty.c, __receive_buf() stores the data and
      updates ldata->commit_head using smp_store_release() and then checks
      the wait queue using waitqueue_active().  However, since there is no
      memory barrier, __receive_buf() could return without calling
      wake_up_interruptible_poll(), and at the same time, n_tty_read() could
      start to wait in wait_woken() as in the following chart.
      
              __receive_buf()                         n_tty_read()
      ------------------------------------------------------------------------
      if (waitqueue_active(&tty->read_wait))
      /* Memory operations issued after the
         RELEASE may be completed before the
         RELEASE operation has completed */
                                              add_wait_queue(&tty->read_wait, &wait);
                                              ...
                                              if (!input_available_p(tty, 0)) {
      smp_store_release(&ldata->commit_head,
                        ldata->read_head);
                                              ...
                                              timeout = wait_woken(&wait,
                                                TASK_INTERRUPTIBLE, timeout);
      ------------------------------------------------------------------------
      
      The second problem is that n_tty_read() also lacks a memory barrier
      call and could also cause __receive_buf() to return without calling
      wake_up_interruptible_poll(), and n_tty_read() to wait in wait_woken()
      as in the chart below.
      
              __receive_buf()                         n_tty_read()
      ------------------------------------------------------------------------
                                              spin_lock_irqsave(&q->lock, flags);
                                              /* from add_wait_queue() */
                                              ...
                                              if (!input_available_p(tty, 0)) {
                                              /* Memory operations issued after the
                                                 RELEASE may be completed before the
                                                 RELEASE operation has completed */
      smp_store_release(&ldata->commit_head,
                        ldata->read_head);
      if (waitqueue_active(&tty->read_wait))
                                              __add_wait_queue(q, wait);
                                              spin_unlock_irqrestore(&q->lock,flags);
                                              /* from add_wait_queue() */
                                              ...
                                              timeout = wait_woken(&wait,
                                                TASK_INTERRUPTIBLE, timeout);
      ------------------------------------------------------------------------
      
      There are also other places in drivers/tty/n_tty.c which have similar
      calls to waitqueue_active(), so instead of adding many memory barrier
      calls, this patch simply removes the call to waitqueue_active(),
      leaving just wake_up*() behind.
      
      This fixes both problems because, even though the memory access before
      or after the spinlocks in both wake_up*() and add_wait_queue() can
      sneak into the critical section, it cannot go past it and the critical
      section assures that they will be serialized (please see "INTER-CPU
      ACQUIRING BARRIER EFFECTS" in Documentation/memory-barriers.txt for a
      better explanation).  Moreover, the resulting code is much simpler.
      
      Latency measurement using a ping-pong test over a pty doesn't show any
      visible performance drop.
      Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      [lizf: Backported to 3.4:
       - adjust context
       - s/wake_up_interruptible_poll/wake_up_interruptible/
       - drop changes to __receive_buf() and n_tty_set_termios()]
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • usb: Add device quirk for Logitech PTZ cameras · d6706f05
      Vincent Palatin authored
      commit 72194739 upstream.
      
      Add a device quirk for the Logitech PTZ Pro Camera and its sibling, the
      ConferenceCam CC3000e Camera.
      This fixes the failed camera enumeration on some boots, particularly on
      machines with fast CPUs.
      
      Tested by connecting a Logitech PTZ Pro Camera to a machine with a
      Haswell Core i7-4600U CPU @ 2.10GHz, and doing thousands of reboot cycles
      while recording the kernel logs and taking a camera picture after each boot.
      Before the patch, more than 7% of the boots showed some enumeration transfer
      failures, and in a few of them the kernel gave up before actually
      enumerating the webcam. After the patch, the enumeration has been correct
      on every reboot.
      Signed-off-by: Vincent Palatin <vpalatin@chromium.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • USB: Add reset-resume quirk for two Plantronics usb headphones. · 18316131
      Yao-Wen Mao authored
      commit 8484bf29 upstream.
      
      These two headphones need a reset-resume quirk to properly resume to
      the original volume level.
      Signed-off-by: Yao-Wen Mao <yaowen@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • clocksource: Fix abs() usage w/ 64bit values · b81cc21d
      John Stultz authored
      commit 67dfae0c upstream.
      
      This patch fixes one case where abs() was being used with 64-bit
      nanosecond values, where the result may be capped at 32 bits.
      
      This potentially could cause watchdog false negatives on 32-bit
      systems, so this patch addresses the issue by using abs64().
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Link: http://lkml.kernel.org/r/1442279124-7309-2-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: Zefan Li <lizefan@huawei.com>
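The truncation problem described above can be shown in user space with a small sketch. Both helpers are hypothetical: abs_via_int() imitates an abs() that only operates on int, and abs_64() stands in for a 64-bit-safe variant such as abs64(). The narrowing conversion is implementation-defined in C, so the exact truncated value depends on the platform.

```c
#include <stdint.h>

/* Imitates an int-only abs(): the 64-bit argument is narrowed first. */
int64_t abs_via_int(int64_t v)
{
    /* Implementation-defined narrowing; drops the high bits on common ABIs. */
    int narrowed = (int)v;

    return narrowed < 0 ? -(int64_t)narrowed : (int64_t)narrowed;
}

/* 64-bit-safe absolute value (INT64_MIN is not handled in this sketch). */
int64_t abs_64(int64_t v)
{
    return v < 0 ? -v : v;
}
```

A delta of five seconds in nanoseconds (5000000000) does not fit in 32 bits, so the int-based helper returns a much smaller value, which is how a large clocksource skew could go unnoticed.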
    • mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault · e9c23599
      Mel Gorman authored
      commit 2f84a899 upstream.
      
      SunDong reported the following on
      
        https://bugzilla.kernel.org/show_bug.cgi?id=103841
      
      	I think I find a linux bug, I have the test cases is constructed. I
      	can stable recurring problems in fedora22(4.0.4) kernel version,
      	arch for x86_64.  I construct transparent huge page, when the parent
      	and child process with MAP_SHARE, MAP_PRIVATE way to access the same
      	huge page area, it has the opportunity to lead to huge page copy on
      	write failure, and then it will munmap the child corresponding mmap
      	area, but then the child mmap area with VM_MAYSHARE attributes, child
      	process munmap this area can trigger VM_BUG_ON in set_vma_resv_flags
      	functions (vma - > vm_flags & VM_MAYSHARE).
      
      There were a number of problems with the report (e.g.  it's hugetlbfs that
      triggers this, not transparent huge pages) but it was fundamentally
      correct in that a VM_BUG_ON in set_vma_resv_flags() can be triggered that
      looks like this
      
      	 vma ffff8804651fd0d0 start 00007fc474e00000 end 00007fc475e00000
      	 next ffff8804651fd018 prev ffff8804651fd188 mm ffff88046b1b1800
      	 prot 8000000000000027 anon_vma           (null) vm_ops ffffffff8182a7a0
      	 pgoff 0 file ffff88106bdb9800 private_data           (null)
      	 flags: 0x84400fb(read|write|shared|mayread|maywrite|mayexec|mayshare|dontexpand|hugetlb)
      	 ------------
      	 kernel BUG at mm/hugetlb.c:462!
      	 SMP
      	 Modules linked in: xt_pkttype xt_LOG xt_limit [..]
      	 CPU: 38 PID: 26839 Comm: map Not tainted 4.0.4-default #1
      	 Hardware name: Dell Inc. PowerEdge R810/0TT6JF, BIOS 2.7.4 04/26/2012
      	 set_vma_resv_flags+0x2d/0x30
      
      The VM_BUG_ON is correct because private and shared mappings have
      different reservation accounting but the warning clearly shows that the
      VMA is shared.
      
      When a private COW fails to allocate a new page then only the process
      that created the VMA gets the page -- all the children unmap the page.
      If the children access that data in the future then they get killed.
      
      The problem is that the same file is mapped shared and private.  During
      the COW, the allocation fails, the VMAs are traversed to unmap the other
      private pages but a shared VMA is found and the bug is triggered.  This
      patch identifies such VMAs and skips them.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Reported-by: SunDong <sund_sky@126.com>
      Reviewed-by: Michal Hocko <mhocko@suse.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: David Rientjes <rientjes@google.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • genirq: Fix race in register_irq_proc() · a03288de
      Ben Hutchings authored
      commit 95c2b175 upstream.
      
      Per-IRQ directories in procfs are created only when a handler is first
      added to the irqdesc, not when the irqdesc is created.  In the case of
      a shared IRQ, multiple tasks can race to create a directory.  This
      race condition seems to have been present forever, but is easier to
      hit with async probing.
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Link: http://lkml.kernel.org/r/1443266636.2004.2.camel@decadent.org.uk
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • x86/process: Add proper bound checks in 64bit get_wchan() · 12bdf057
      Thomas Gleixner authored
      commit eddd3826 upstream.
      
      Dmitry Vyukov reported the following using trinity and the memory
      error detector AddressSanitizer
      (https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel).
      
      [ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on
      address ffff88002e280000
      [ 124.576801] ffff88002e280000 is located 131938492886538 bytes to
      the left of 28857600-byte region [ffffffff81282e0a, ffffffff82e0830a)
      [ 124.578633] Accessed by thread T10915:
      [ 124.579295] inlined in describe_heap_address
      ./arch/x86/mm/asan/report.c:164
      [ 124.579295] #0 ffffffff810dd277 in asan_report_error
      ./arch/x86/mm/asan/report.c:278
      [ 124.580137] #1 ffffffff810dc6a0 in asan_check_region
      ./arch/x86/mm/asan/asan.c:37
      [ 124.581050] #2 ffffffff810dd423 in __tsan_read8 ??:0
      [ 124.581893] #3 ffffffff8107c093 in get_wchan
      ./arch/x86/kernel/process_64.c:444
      
      The address checks in the 64bit implementation of get_wchan() are
      wrong in several ways:
      
       - The lower bound of the stack is not the start of the stack
         page. It's the start of the stack page plus sizeof (struct
         thread_info)
      
       - The upper bound must be:
      
             top_of_stack - TOP_OF_KERNEL_STACK_PADDING - 2 * sizeof(unsigned long).
      
         The 2 * sizeof(unsigned long) is required because the stack pointer
         points at the frame pointer. The layout on the stack is: ... IP FP
         ... IP FP. So we need to make sure that both IP and FP are in the
         bounds.
      
      Fix the bound checks and get rid of the mix of numeric constants, u64
      and unsigned long. Making all unsigned long allows us to use the same
      function for 32bit as well.
      
      Use READ_ONCE() when accessing the stack. This does not prevent a
      concurrent wakeup of the task and the stack changing, but at least it
      avoids TOCTOU.
      
      Also check task state at the end of the loop. Again that does not
      prevent concurrent changes, but it avoids walking for nothing.
      
      Add proper comments while at it.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Based-on-patch-from: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov <bp@alien8.de>
      Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: kasan-dev <kasan-dev@googlegroups.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
      Link: http://lkml.kernel.org/r/20150930083302.694788319@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      [lizf: Backported to 3.4:
       - s/READ_ONCE/ACCESS_ONCE
       - remove TOP_OF_KERNEL_STACK_PADDING]
      Signed-off-by: Zefan Li <lizefan@huawei.com>
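The two bounds listed in the commit above can be sketched as a stand-alone check. The struct, its field values, and the function name are hypothetical; the real sizes are architecture- and configuration-specific.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a kernel stack area. */
struct demo_stack {
    unsigned long start;          /* lowest address of the stack area */
    unsigned long size;           /* total size in bytes */
    unsigned long thread_info_sz; /* bytes reserved at the bottom */
    unsigned long top_padding;    /* bytes reserved at the top */
};

/* True if a frame pointer leaves room for both FP and IP inside the stack. */
bool demo_frame_in_bounds(const struct demo_stack *s, unsigned long fp)
{
    /* Lower bound: start of the stack area plus the thread_info region. */
    unsigned long bottom = s->start + s->thread_info_sz;

    /* Upper bound: top of stack minus padding minus room for FP and IP. */
    unsigned long top = s->start + s->size - s->top_padding
                        - 2 * sizeof(unsigned long);

    return fp >= bottom && fp <= top;
}
```

Using unsigned long throughout mirrors the commit's point that a single type lets the same check work for 32-bit and 64-bit builds.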
    • UBI: return ENOSPC if no enough space available · 127cf7cb
      shengyong authored
      commit 7c7feb2e upstream.
      
      UBI: attaching mtd1 to ubi0
      UBI: scanning is finished
      UBI error: init_volumes: not enough PEBs, required 706, available 686
      UBI error: ubi_wl_init: no enough physical eraseblocks (-20, need 1)
      UBI error: ubi_attach_mtd_dev: failed to attach mtd1, error -12 <= NOT ENOMEM
      UBI error: ubi_init: cannot attach mtd1
      
      If available PEBs are not enough when initializing volumes, return -ENOSPC
      directly. If available PEBs are not enough when initializing WL, return
      -ENOSPC instead of -ENOMEM.
      Signed-off-by: Sheng Yong <shengyong1@huawei.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
      Reviewed-by: David Gstir <david@sigma-star.at>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
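A minimal sketch of the error-code choice described above: running out of physical eraseblocks is a space problem, not a memory allocation failure. The function name and parameters are hypothetical.

```c
#include <errno.h>

/* Return -ENOSPC (not -ENOMEM) when there are too few PEBs available. */
int demo_check_pebs(int required, int available)
{
    if (required > available)
        return -ENOSPC;

    return 0;
}
```

With the figures from the log above (706 required, 686 available), the sketch reports -ENOSPC.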
    • UBI: Validate data_size · 15acb368
      Richard Weinberger authored
      commit 281fda27 upstream.
      
      Make sure that data_size is less than LEB size.
      Otherwise a handcrafted UBI image is able to trigger
      an out of bounds memory access in ubi_compare_lebs().
      Signed-off-by: Richard Weinberger <richard@nod.at>
      Reviewed-by: David Gstir <david@sigma-star.at>
      [lizf: Backported to 3.4: use dbg_err() instead of ubi_err()]
      Signed-off-by: Zefan Li <lizefan@huawei.com>
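The validation described above might look roughly like the sketch below. The function name is hypothetical, and whether a data_size exactly equal to the LEB size should be accepted is an assumption here; the sketch only rejects sizes that exceed it.

```c
#include <errno.h>
#include <stdint.h>

/* Reject a header whose data_size cannot fit into one LEB. */
int demo_validate_data_size(uint32_t data_size, uint32_t leb_size)
{
    /* Assumption: equality is allowed, only larger values are invalid. */
    if (data_size > leb_size)
        return -EINVAL;

    return 0;
}
```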
    • x86/xen: Do not clip xen_e820_map to xen_e820_map_entries when sanitizing map · 9ed559d3
      Malcolm Crossley authored
      commit 64c98e7f upstream.
      
      Sanitizing the e820 map may produce extra E820 entries which would result in
      the topmost E820 entries being removed. The removed entries would typically
      include the top E820 usable RAM region and thus result in the domain having
      significantly less RAM available to it.
      
      Fix by allowing sanitize_e820_map to use the full size of the allocated E820
      array.
      Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      [lizf: Backported to 3.4: s/map/xen_e820_map]
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • m68k: Define asmlinkage_protect · d311156a
      Andreas Schwab authored
      commit 8474ba74 upstream.
      
      Make sure the compiler does not modify arguments of syscall functions.
      This can happen if the compiler generates a tailcall to another
      function.  For example, without asmlinkage_protect sys_openat is compiled
      into this function:
      
      sys_openat:
      	clr.l %d0
      	move.w 18(%sp),%d0
      	move.l %d0,16(%sp)
      	jbra do_sys_open
      
      Note how the fourth argument is modified in place, modifying the register
      %d4 that gets restored from this stack slot when the function returns to
      user-space.  The caller may expect the register to be unmodified across
      system calls.
      Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • ath9k: declare required extra tx headroom · edb236d7
      Felix Fietkau authored
      commit 029cd037 upstream.
      
      ath9k inserts padding between the 802.11 header and the data area (to
      align it). Since it didn't declare this extra required headroom, this
      led to some nasty issues like randomly dropped packets in some setups.
      Signed-off-by: Felix Fietkau <nbd@openwrt.org>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • ocfs2/dlm: fix deadlock when dispatch assert master · f5499bfc
      Joseph Qi authored
      commit 012572d4 upstream.
      
      The order of the following three spinlocks should be:
      dlm_domain_lock < dlm_ctxt->spinlock < dlm_lock_resource->spinlock
      
      But dlm_dispatch_assert_master() is called while holding
      dlm_ctxt->spinlock and dlm_lock_resource->spinlock, and then it calls
      dlm_grab() which will take dlm_domain_lock.
      
      If another thread (for example, dlm_query_join_handler) has already
      taken dlm_domain_lock and tries to take dlm_ctxt->spinlock, a deadlock
      occurs.
      Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: "Junxiao Bi" <junxiao.bi@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • cifs: use server timestamp for ntlmv2 authentication · 6fa2028d
      Peter Seiderer authored
      commit 98ce94c8 upstream.
      
      Linux cifs mount with ntlmssp against a Mac OS X (Yosemite
      10.10.5) share fails in case the clocks differ by more than +/-2h:
      
      digest-service: digest-request: od failed with 2 proto=ntlmv2
      digest-service: digest-request: kdc failed with -1561745592 proto=ntlmv2
      
      Fix this by (re-)using the given server timestamp for the
      ntlmv2 authentication (as Windows 7 does).
      
      A related problem was also reported earlier by Namjae Jeon (see below):
      
      Windows machines have an extended security feature which refuses to allow
      authentication when there is a time difference between server time and
      client time when ntlmv2 negotiation is used. This problem is prevalent
      in embedded environments where system time is set to the default of 1970.
      
      Modern servers send the server timestamp in the TargetInfo Av_Pair
      structure in the challenge message [see MS-NLMP 2.2.2.1].
      In [MS-NLMP 3.1.5.1.2] it is explicitly mentioned that the client must
      use the server-provided timestamp if present OR the current time if it is
      not.
      Reported-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Peter Seiderer <ps.report@gmx.net>
      Signed-off-by: Steve French <smfrench@gmail.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: Zefan Li <lizefan@huawei.com>
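The timestamp rule quoted from MS-NLMP above can be sketched as a simple selection. The function name and parameters are hypothetical; parsing the Av_Pair list out of the challenge message is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Pick the timestamp for the NTLMv2 response: prefer the one supplied by
 * the server in the challenge, fall back to the local clock otherwise.
 */
uint64_t demo_ntlmv2_timestamp(bool have_server_ts, uint64_t server_ts,
                               uint64_t local_now)
{
    return have_server_ts ? server_ts : local_now;
}
```

A client whose clock still reads 1970 would then authenticate with the server's notion of time whenever the server provides one.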
    • xhci: change xhci 1.0 only restrictions to support xhci 1.1 · 276a6c94
      Mathias Nyman authored
      commit dca77945 upstream.
      
      Some changes between the xhci 0.96 and xhci 1.0 specifications forced us to
      check the hci version in code. Some of these checks were implemented as
      hci_version == 1.0, which will not work with new xhci 1.1 controllers.
      
      xhci 1.1 behaves similarly to xhci 1.0 in these cases, so change these
      checks to hci_version >= 1.0.
      Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
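The difference between the two kinds of version check can be sketched as below. It assumes the interface version is encoded as a BCD-style number (0x0096, 0x0100, 0x0110); the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old style: matches xhci 1.0 only, so 1.1 controllers are missed. */
bool demo_is_exactly_1_0(uint16_t hci_version)
{
    return hci_version == 0x100;
}

/* New style: matches xhci 1.0 and anything newer, such as 1.1. */
bool demo_is_1_0_or_later(uint16_t hci_version)
{
    return hci_version >= 0x100;
}
```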
    • usb: xhci: Clear XHCI_STATE_DYING on start · fa8600fa
      Roger Quadros authored
      commit e5bfeab0 upstream.
      
      If, for whatever reason, XHCI died in the previous instance,
      it will never recover on the next xhci_start unless we
      clear the DYING flag.
      Signed-off-by: Roger Quadros <rogerq@ti.com>
      Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
    • xhci: give command abortion one more chance before killing xhci · 63a2bddd
      Mathias Nyman authored
      commit a6809ffd upstream.
      
      We want to give the command abortion an additional try to stop
      the command ring before we completely hose xhci.
      Tested-by: Vincent Pelletier <plr.vincent@gmail.com>
      Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      [lizf: Backported to 3.4: call handshake() instead of xhci_handshake()]
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      63a2bddd
    • Mathias Nyman's avatar
      usb: Use the USB_SS_MULT() macro to get the burst multiplier. · 40dba0fd
      Mathias Nyman authored
      commit ff30cbc8 upstream.
      
      Bits 1:0 of the bmAttributes are used for the burst multiplier.
      The rest of the bits used to be reserved (zero), but USB3.1 takes bit 7
      into use.
      
      Use the existing USB_SS_MULT() macro instead to make sure the mult value
      and hence max packet calculations are correct for USB3.1 devices.
      
      Note that burst multiplier in bmAttributes is zero based and that
      the USB_SS_MULT() macro adds one.
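      
      As an illustration of why the mask matters, here is a small sketch.
      SS_MULT() is a local stand-in assumed to have the same shape as the
      macro described above (keep bits 1:0, add one); mult_unmasked() is a
      hypothetical helper showing what goes wrong when the upper bits are
      trusted to be zero:

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in: keep bits 1:0 of bmAttributes, then add one. */
#define SS_MULT(p) (1 + ((p) & 0x3))

/* Buggy variant for comparison: assumes bits 7:2 are always zero. */
static unsigned int mult_unmasked(uint8_t bm_attributes)
{
    return bm_attributes + 1u;
}

/* Masked variant: ignores any bits above 1:0. */
static unsigned int mult_masked(uint8_t bm_attributes)
{
    return SS_MULT(bm_attributes);
}

/* Usage: bit 7 set plus a zero-based multiplier of 2 (0x82). */
static void self_check(void)
{
    assert(mult_masked(0x82) == 3);     /* 2 + 1, bit 7 ignored */
    assert(mult_unmasked(0x82) == 131); /* bit 7 leaks into the result */
    assert(mult_masked(0x00) == 1);     /* zero based, so minimum is 1 */
}
```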
      Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      40dba0fd
    • Paolo Bonzini's avatar
      KVM: x86: trap AMD MSRs for the TSeg base and mask · 7d148ce4
      Paolo Bonzini authored
      commit 3afb1121 upstream.
      
      These have roughly the same purpose as the SMRR, which we do not need
      to implement in KVM.  However, Linux accesses MSR_K8_TSEG_ADDR at
      boot, which causes problems when running a Xen dom0 under KVM.
      Just return 0, meaning that processor protection of SMRAM is not
      in effect.
      Reported-by: M A Young <m.a.young@durham.ac.uk>
      Acked-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      7d148ce4
    • Mark Brown's avatar
      regmap: debugfs: Don't bother actually printing when calculating max length · 42bffe1a
      Mark Brown authored
      commit 176fc2d5 upstream.
      
      The in-kernel snprintf() will conveniently return the actual length of
      the printed string even if not given an output buffer at all, so just do
      that rather than relying on the user to pass in a suitable buffer. This
      ensures that we don't need to worry about the output being truncated due
      to the size of the buffer passed in.
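      
      The same idiom works with the C99 snprintf() in user space, which
      makes it easy to try out. A minimal sketch; hex_width() is a
      hypothetical helper, not a function from the regmap code:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Number of characters needed to print `value` in hex, excluding the
 * terminating NUL. With a NULL buffer and size 0 nothing is written;
 * snprintf() only reports the length the output would have had.
 */
static int hex_width(unsigned int value)
{
    return snprintf(NULL, 0, "%x", value);
}

/* Usage: the width depends only on the value, not on any buffer. */
static void self_check(void)
{
    assert(hex_width(0xff) == 2);
    assert(hex_width(0x1000) == 4);
}
```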
      Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      42bffe1a
    • Mark Brown's avatar
      regmap: debugfs: Ensure we don't underflow when printing access masks · f4524d72
      Mark Brown authored
      commit b763ec17 upstream.
      
      If a read is attempted which is smaller than the line length then we may
      underflow the subtraction we're doing with the unsigned size_t type so
      move some of the calculation to be additions on the right hand side
      instead in order to avoid this.
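      
      A sketch of the general pattern, with hypothetical names rather than
      the actual regmap variables. The unsafe form subtracts two size_t
      values and wraps around when the read is shorter than a line; the
      safe form moves the term to the other side as an addition (this
      assumes the sum itself is small enough not to overflow):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Unsafe: count - line_len wraps to a huge value when count < line_len. */
static bool fits_unsafe(size_t used, size_t count, size_t line_len)
{
    return used <= count - line_len;
}

/* Safe: the same comparison with the subtraction moved to an addition. */
static bool fits_safe(size_t used, size_t count, size_t line_len)
{
    return used + line_len <= count;
}

/* Usage: a 4-byte read against a 16-byte line. */
static void self_check(void)
{
    assert(fits_unsafe(0, 4, 16));  /* wrongly claims the line fits */
    assert(!fits_safe(0, 4, 16));   /* correctly rejects it */
    assert(fits_safe(0, 16, 16));   /* exact fit is still accepted */
}
```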
      Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      f4524d72
    • Jeff Mahoney's avatar
      btrfs: skip waiting on ordered range for special files · 413e7340
      Jeff Mahoney authored
      commit a30e577c upstream.
      
      In btrfs_evict_inode, we properly truncate the page cache for evicted
      inodes but then we call btrfs_wait_ordered_range for every inode as well.
      It's the right thing to do for regular files but results in incorrect
      behavior for device inodes for block devices.
      
      filemap_fdatawrite_range gets called with inode->i_mapping which gets
      resolved to the block device inode before getting passed to
      wbc_attach_fdatawrite_inode and ultimately to inode_to_bdi.  What happens
      next depends on whether there's an open file handle associated with the
      inode.  If there is, we write to the block device, which is unexpected
      behavior.  If there isn't, we fall through normally and inode->i_data is used.
      We can also end up racing against open/close which can result in crashes
      when i_mapping points to a block device inode that has been closed.
      
      Since there can't be any page cache associated with special file inodes,
      it's safe to skip the btrfs_wait_ordered_range call entirely and avoid
      the problem.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=100911
      Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      413e7340
    • Guenter Roeck's avatar
      spi: Fix documentation of spi_alloc_master() · 557b53d8
      Guenter Roeck authored
      commit a394d635 upstream.
      
      Actually, spi_master_put() after spi_alloc_master() must _not_ be followed
      by kfree(). The memory is already freed with the call to spi_master_put()
      through spi_master_class, which registers a release function. Calling both
      spi_master_put() and kfree() results in often nasty (and delayed) crashes
      elsewhere in the kernel, often in the networking stack.
      
      This reverts commit eb4af0f5.
      
      Link to patch and concerns: https://lkml.org/lkml/2012/9/3/269
      or
      http://lkml.iu.edu/hypermail/linux/kernel/1209.0/00790.html
      
      Alexey Klimov: This revert becomes valid after
      94c69f76 when spi-imx.c
      has been fixed and there is no need to call kfree() so comment
      for spi_alloc_master() should be fixed.
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Alexey Klimov <alexey.klimov@linaro.org>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      557b53d8
    • Tan, Jui Nee's avatar
      spi: spi-pxa2xx: Check status register to determine if SSSR_TINT is disabled · 47d7e7e7
      Tan, Jui Nee authored
      commit 02bc933e upstream.
      
      On Intel Baytrail, there is a case where the interrupt handler gets called
      but no SPI message is captured. The RX FIFO is indeed empty when the RX
      timeout pending interrupt (SSSR_TINT) happens.
      
      Use the BIOS version where both HSUART and SPI are on the same IRQ. Both
      drivers are using IRQF_SHARED when calling the request_irq function. When
      running two separate and independent SPI and HSUART applications that
      generate data traffic on both components, the user will see messages like
      the one below on the console:
      
        pxa2xx-spi pxa2xx-spi.0: bad message state in interrupt handler
      
      This commit fixes this by first checking the Receiver Time-out Interrupt:
      if it is disabled, ignore the request and return without servicing.
      Signed-off-by: Tan, Jui Nee <jui.nee.tan@intel.com>
      Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      47d7e7e7
    • Dan Carpenter's avatar
      drm: crtc: integer overflow in drm_property_create_blob() · 2f5b9f27
      Dan Carpenter authored
      commit 9ac0934b upstream.
      
      The size here comes from the user via the ioctl; it is a number between
      1 and u32max, so the addition here could overflow on 32-bit systems.
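      
      A hedged sketch of the usual guard for this kind of addition.
      HEADER_SIZE and alloc_size() are hypothetical stand-ins for the
      fixed part of the allocation and the size computation; this is not
      the DRM code itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HEADER_SIZE 64u /* hypothetical size of the fixed header */

/*
 * Compute HEADER_SIZE + payload_len without wrapping. Returns false
 * for an empty payload or when the sum would not fit in size_t.
 */
static bool alloc_size(size_t payload_len, size_t *total)
{
    if (payload_len == 0 || payload_len > SIZE_MAX - HEADER_SIZE)
        return false;
    *total = HEADER_SIZE + payload_len;
    return true;
}

/* Usage: a normal request succeeds, an oversized one is rejected. */
static void self_check(void)
{
    size_t total = 0;

    assert(alloc_size(16, &total) && total == 80);
    assert(!alloc_size(SIZE_MAX, &total));
    assert(!alloc_size(0, &total));
}
```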
      
      Fixes: f453ba04 ('DRM: add mode setting support')
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Daniel Stone <daniels@collabora.com>
      Signed-off-by: Dave Airlie <airlied@gmail.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      2f5b9f27
    • NeilBrown's avatar
      md/raid1: don't clear bitmap bit when bad-block-list write fails. · 6126604d
      NeilBrown authored
      commit bd8688a1 upstream.
      
      When a write fails and a bad-block-list is present, we can
      update the bad-block-list instead of writing the data.  If
      this succeeds then it is OK to clear the relevant bitmap-bit as
      no further 'sync' of the block is needed.
      
      However if writing the bad-block-list fails then we need to
      treat the write as failed and particularly must not clear
      the bitmap bit.  Otherwise the device can be re-added (after
      any hardware connection issues are resolved) and because the
      relevant bit in the bitmap is clear, that block will not be
      resynced.  This leads to data corruption.
      
      We already delay the final bio_endio() on the write until
      the bad-block-list is written so that when the write
      returns: either that data is safe, the bad-block record is
      safe, or the fact that the device is faulty is safe.
      However we *don't* delay the clearing of the bitmap, so the
      bitmap bit can be recorded as cleared before we know if the
      bad-block-list was written safely.
      
      So: delay that until the write really is safe.
      i.e. move the call to close_write() until just before
      calling bio_endio(), and recheck the 'is array degraded'
      status before making that call.
      
      This bug goes back to v3.1 when bad-block-lists were
      introduced, though it only affects arrays created with
      mdadm-3.3 or later as only those have bad-block lists.
      
      Backports will require at least
      Commit: 55ce74d4 ("md/raid1: ensure device failure recorded before write request returns.")
      as well.  I'll send that to 'stable' separately.
      
      Note that of the two tests of R1BIO_WriteError that this
      patch adds, the first is certain to fail and the second is
      certain to succeed.  However doing it this way makes the
      patch more obviously correct.  I will tidy the code up in a
      future merge window.
      Reported-and-tested-by: Nate Dailey <nate.dailey@stratus.com>
      Cc: Jes Sorensen <Jes.Sorensen@redhat.com>
      Fixes: cd5ff9a1 ("md/raid1:  Handle write errors by updating badblock log.")
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      6126604d
    • NeilBrown's avatar
      md/raid1: ensure device failure recorded before write request returns. · f6b1d7cb
      NeilBrown authored
      commit 55ce74d4 upstream.
      
      When a write to one of the legs of a RAID1 fails, the failure is
      recorded in the metadata of the other leg(s) so that after a restart
      the data on the failed drive won't be trusted even if that drive seems
      to be working again (maybe a cable was unplugged).
      
      Similarly when we record a bad-block in response to a write failure,
      we must not let the write complete until the bad-block update is safe.
      
      Currently there is no interlock between the write request completing
      and the metadata update.  So it is possible that the write will
      complete, the app will confirm success in some way, and then the
      machine will crash before the metadata update completes.
      
      This is an extremely small hole for a race to fit in, but it is
      theoretically possible and so should be closed.
      
      So:
       - set MD_CHANGE_PENDING when requesting a metadata update for a
         failed device, so we can know with certainty when it completes
       - queue requests that experienced an error on a new queue which
         is only processed after the metadata update completes
       - call raid_end_bio_io() on bios in that queue when the time comes.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      f6b1d7cb
    • NeilBrown's avatar
      md/raid10: don't clear bitmap bit when bad-block-list write fails. · 0570dab3
      NeilBrown authored
      commit c340702c upstream.
      
      When a write fails and a bad-block-list is present, we can
      update the bad-block-list instead of writing the data.  If
      this succeeds then it is OK to clear the relevant bitmap-bit as
      no further 'sync' of the block is needed.
      
      However if writing the bad-block-list fails then we need to
      treat the write as failed and particularly must not clear
      the bitmap bit.  Otherwise the device can be re-added (after
      any hardware connection issues are resolved) and because the
      relevant bit in the bitmap is clear, that block will not be
      resynced.  This leads to data corruption.
      
      We already delay the final bio_endio() on the write until
      the bad-block-list is written so that when the write
      returns: either that data is safe, the bad-block record is
      safe, or the fact that the device is faulty is safe.
      However we *don't* delay the clearing of the bitmap, so the
      bitmap bit can be recorded as cleared before we know if the
      bad-block-list was written safely.
      
      So: delay that until the write really is safe.
      i.e. move the call to close_write() until just before
      calling bio_endio(), and recheck the 'is array degraded'
      status before making that call.
      
      This bug goes back to v3.1 when bad-block-lists were
      introduced, though it only affects arrays created with
      mdadm-3.3 or later as only those have bad-block lists.
      
      Backports will require at least
      Commit: 95af587e ("md/raid10: ensure device failure recorded before write request returns.")
      as well.  I'll send that to 'stable' separately.
      
      Note that of the two tests of R10BIO_WriteError that this
      patch adds, the first is certain to fail and the second is
      certain to succeed.  However doing it this way makes the
      patch more obviously correct.  I will tidy the code up in a
      future merge window.
      Reported-by: Nate Dailey <nate.dailey@stratus.com>
      Fixes: bd870a16 ("md/raid10:  Handle write errors by updating badblock log.")
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      0570dab3
    • NeilBrown's avatar
      md/raid10: ensure device failure recorded before write request returns. · 43bf02ba
      NeilBrown authored
      commit 95af587e upstream.
      
      When a write to one of the legs of a RAID10 fails, the failure is
      recorded in the metadata of the other legs so that after a restart
      the data on the failed drive won't be trusted even if that drive seems
      to be working again (maybe a cable was unplugged).
      
      Currently there is no interlock between the write request completing
      and the metadata update.  So it is possible that the write will
      complete, the app will confirm success in some way, and then the
      machine will crash before the metadata update completes.
      
      This is an extremely small hole for a race to fit in, but it is
      theoretically possible and so should be closed.
      
      So:
       - set MD_CHANGE_PENDING when requesting a metadata update for a
         failed device, so we can know with certainty when it completes
       - queue requests that experienced an error on a new queue which
         is only processed after the metadata update completes
       - call raid_end_bio_io() on bios in that queue when the time comes.
      Signed-off-by: NeilBrown <neilb@suse.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      43bf02ba
    • Vasant Hegde's avatar
      powerpc/rtas: Validate rtas.entry before calling enter_rtas() · 894f53c9
      Vasant Hegde authored
      commit 8832317f upstream.
      
      Currently we do not validate rtas.entry before calling enter_rtas(). This
      leads to a kernel oops when user space calls the rtas system call on a
      powernv platform (see below). This patch adds code to validate rtas.entry
      before making the enter_rtas() call.
      
        Oops: Exception in kernel mode, sig: 4 [#1]
        SMP NR_CPUS=1024 NUMA PowerNV
        task: c000000004294b80 ti: c0000007e1a78000 task.ti: c0000007e1a78000
        NIP: 0000000000000000 LR: 0000000000009c14 CTR: c000000000423140
        REGS: c0000007e1a7b920 TRAP: 0e40   Not tainted  (3.18.17-340.el7_1.pkvm3_1_0.2400.1.ppc64le)
        MSR: 1000000000081000 <HV,ME>  CR: 00000000  XER: 00000000
        CFAR: c000000000009c0c SOFTE: 0
        NIP [0000000000000000]           (null)
        LR [0000000000009c14] 0x9c14
        Call Trace:
        [c0000007e1a7bba0] [c00000000041a7f4] avc_has_perm_noaudit+0x54/0x110 (unreliable)
        [c0000007e1a7bd80] [c00000000002ddc0] ppc_rtas+0x150/0x2d0
        [c0000007e1a7be30] [c000000000009358] syscall_exit+0x0/0x98
      
      Fixes: 55190f88 ("powerpc: Add skeleton PowerNV platform")
      Reported-by: NAGESWARA R. SASTRY <nasastry@in.ibm.com>
      Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
      [mpe: Reword change log, trim oops, and add stable + fixes]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      894f53c9
    • Doron Tsur's avatar
      IB/cm: Fix rb-tree duplicate free and use-after-free · 7abd07f2
      Doron Tsur authored
      commit 0ca81a28 upstream.
      
      ib_send_cm_sidr_rep could sometimes erase the node from the sidr
      (depending on errors in the process). Since ib_send_cm_sidr_rep is
      called both from cm_sidr_req_handler and cm_destroy_id, cm_id_priv
      could be either erased from the rb_tree twice or not erased at all.
      Fix that by making sure it's erased only once before freeing
      cm_id_priv.
      
      Fixes: a977049d ('[PATCH] IB: Add the kernel CM implementation')
      Signed-off-by: Doron Tsur <doront@mellanox.com>
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      7abd07f2
    • Peter Zijlstra's avatar
      sched/core: Fix TASK_DEAD race in finish_task_switch() · a12321d3
      Peter Zijlstra authored
      commit 95913d97 upstream.
      
      So the problem this patch is trying to address is as follows:
      
              CPU0                            CPU1
      
              context_switch(A, B)
                                              ttwu(A)
                                                LOCK A->pi_lock
                                                A->on_cpu == 0
              finish_task_switch(A)
                prev_state = A->state  <-.
                WMB                      |
                A->on_cpu = 0;           |
                UNLOCK rq0->lock         |
                                         |    context_switch(C, A)
                                         `--  A->state = TASK_DEAD
                prev_state == TASK_DEAD
                  put_task_struct(A)
                                              context_switch(A, C)
                                              finish_task_switch(A)
                                                A->state == TASK_DEAD
                                                  put_task_struct(A)
      
      The argument being that the WMB will allow the load of A->state on CPU0
      to cross over and observe CPU1's store of A->state, which will then
      result in a double-drop and use-after-free.
      
      Now the comment states (and this was true once upon a long time ago)
      that we need to observe A->state while holding rq->lock because that
      will order us against the wakeup; however the wakeup will not in fact
      acquire (that) rq->lock; it takes A->pi_lock these days.
      
      We can obviously fix this by upgrading the WMB to an MB, but that is
      expensive, so we'd rather avoid that.
      
      The alternative this patch takes is: smp_store_release(&A->on_cpu, 0),
      which avoids the MB on some archs, but not important ones like ARM.
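      
      For readers outside the kernel, the ordering can be sketched with
      C11 atomics. This is only an analogue under stated assumptions:
      struct task and finish_switch() are hypothetical, and
      memory_order_release stands in for smp_store_release(). The point is
      that a release store keeps the earlier load of the state from being
      reordered after the store that clears on_cpu:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical, heavily reduced stand-in for a task descriptor. */
struct task {
    atomic_int state;
    atomic_int on_cpu;
};

/*
 * Read the previous task's state, then publish on_cpu = 0 with
 * release semantics. All loads and stores before the release store,
 * including the load of state, are ordered before it.
 */
static int finish_switch(struct task *prev)
{
    int prev_state = atomic_load_explicit(&prev->state,
                                          memory_order_relaxed);

    atomic_store_explicit(&prev->on_cpu, 0, memory_order_release);
    return prev_state;
}

/* Usage (single-threaded, so it only checks the values). */
static void self_check(void)
{
    struct task t;

    atomic_init(&t.state, 64);
    atomic_init(&t.on_cpu, 1);
    assert(finish_switch(&t) == 64);
    assert(atomic_load(&t.on_cpu) == 0);
}
```

      A waker that spins on on_cpu would pair this with an acquire load.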
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Cc: manfred@colorfullife.com
      Cc: will.deacon@arm.com
      Fixes: e4a52bcb ("sched: Remove rq->lock from the first half of ttwu()")
      Link: http://lkml.kernel.org/r/20150929124509.GG3816@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      [lizf: Backported to 3.4: use smp_mb() instead of smp_store_release(), which
       is not defined in 3.4.y]
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      a12321d3
    • Johannes Berg's avatar
      iwlwifi: dvm: fix D3 firmware PN programming · c2acc6aa
      Johannes Berg authored
      commit 5bd16687 upstream.
      
      The code to send the RX PN data (for each TID) to the firmware
      has a devastating bug: it overwrites the data for TID 0 with
      all the TID data, leaving the remaining TIDs zeroed. This will
      allow replays to actually be accepted by the firmware, which
      could allow waking up the system.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
      [lizf: Backported to 3.4: adjust filename]
      Signed-off-by: Zefan Li <lizefan@huawei.com>
      c2acc6aa