1. 22 Oct, 2015 40 commits
    • Peter Zijlstra's avatar
      sched/core: Fix TASK_DEAD race in finish_task_switch() · 8210b919
      Peter Zijlstra authored
      commit 95913d97 upstream.
      
      So the problem this patch is trying to address is as follows:
      
              CPU0                            CPU1
      
              context_switch(A, B)
                                              ttwu(A)
                                                LOCK A->pi_lock
                                                A->on_cpu == 0
              finish_task_switch(A)
                prev_state = A->state  <-.
                WMB                      |
                A->on_cpu = 0;           |
                UNLOCK rq0->lock         |
                                         |    context_switch(C, A)
                                         `--  A->state = TASK_DEAD
                prev_state == TASK_DEAD
                  put_task_struct(A)
                                              context_switch(A, C)
                                              finish_task_switch(A)
                                                A->state == TASK_DEAD
                                                  put_task_struct(A)
      
      The argument being that the WMB will allow the load of A->state on CPU0
      to cross over and observe CPU1's store of A->state, which will then
      result in a double-drop and use-after-free.
      
      Now the comment states (and this was true once upon a long time ago)
      that we need to observe A->state while holding rq->lock because that
      will order us against the wakeup; however the wakeup will not in fact
      acquire (that) rq->lock; it takes A->pi_lock these days.
      
      We can obviously fix this by upgrading the WMB to an MB, but that is
      expensive, so we'd rather avoid that.
      
      The alternative this patch takes is: smp_store_release(&A->on_cpu, 0),
      which avoids the MB on some archs, but not important ones like ARM.
      Reported-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Cc: manfred@colorfullife.com
      Cc: will.deacon@arm.com
      Fixes: e4a52bcb ("sched: Remove rq->lock from the first half of ttwu()")
      Link: http://lkml.kernel.org/r/20150929124509.GG3816@twins.programming.kicks-ass.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8210b919
    • Ricardo Ribalda Delgado's avatar
      leds/led-class: Add missing put_device() · 43843798
      Ricardo Ribalda Delgado authored
      commit e5b5a61f upstream.
      
      Devices found by class_find_device must be freed with put_device().
      Otherwise the reference count will not work properly.
      
      Fixes: a96aa64c ("leds/led-class: Handle LEDs with the same name")
      Reported-by: default avatarAlan Tull <delicious.quinoa@gmail.com>
      Signed-off-by: default avatarRicardo Ribalda Delgado <ricardo.ribalda@gmail.com>
      Signed-off-by: default avatarJacek Anaszewski <j.anaszewski@samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      43843798
    • Vitaly Kuznetsov's avatar
      x86/xen: Support kexec/kdump in HVM guests by doing a soft reset · 6cdd2beb
      Vitaly Kuznetsov authored
      commit 0b34a166 upstream.
      
      Currently there is a number of issues preventing PVHVM Xen guests from
      doing successful kexec/kdump:
      
        - Bound event channels.
        - Registered vcpu_info.
        - PIRQ/emuirq mappings.
        - shared_info frame after XENMAPSPACE_shared_info operation.
        - Active grant mappings.
      
      Basically, newly booted kernel stumbles upon already set up Xen
      interfaces and there is no way to reestablish them. In Xen-4.7 a new
      feature called 'soft reset' is coming. A guest performing kexec/kdump
      operation is supposed to call SCHEDOP_shutdown hypercall with
      SHUTDOWN_soft_reset reason before jumping to new kernel. Hypervisor
      (with some help from toolstack) will do full domain cleanup (but
      keeping its memory and vCPU contexts intact) returning the guest to
      the state it had when it was first booted and thus allowing it to
      start over.
      
      Doing SHUTDOWN_soft_reset on Xen hypervisors which don't support it is
      probably OK as by default all unknown shutdown reasons cause domain
      destroy with a message in toolstack log: 'Unknown shutdown reason code
      5. Destroying domain.'  which gives a clue to what the problem is and
      eliminates false expectations.
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6cdd2beb
    • Stephen Smalley's avatar
      x86/mm: Set NX on gap between __ex_table and rodata · b05730b2
      Stephen Smalley authored
      commit ab76f7b4 upstream.
      
      Unused space between the end of __ex_table and the start of
      rodata can be left W+x in the kernel page tables.  Extend the
      setting of the NX bit to cover this gap by starting from
      text_end rather than rodata_start.
      
        Before:
        ---[ High Kernel Mapping ]---
        0xffffffff80000000-0xffffffff81000000          16M                               pmd
        0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
        0xffffffff81600000-0xffffffff81754000        1360K     ro                 GLB x  pte
        0xffffffff81754000-0xffffffff81800000         688K     RW                 GLB x  pte
        0xffffffff81800000-0xffffffff81a00000           2M     ro         PSE     GLB NX pmd
        0xffffffff81a00000-0xffffffff81b3b000        1260K     ro                 GLB NX pte
        0xffffffff81b3b000-0xffffffff82000000        4884K     RW                 GLB NX pte
        0xffffffff82000000-0xffffffff82200000           2M     RW         PSE     GLB NX pmd
        0xffffffff82200000-0xffffffffa0000000         478M                               pmd
      
        After:
        ---[ High Kernel Mapping ]---
        0xffffffff80000000-0xffffffff81000000          16M                               pmd
        0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
        0xffffffff81600000-0xffffffff81754000        1360K     ro                 GLB x  pte
        0xffffffff81754000-0xffffffff81800000         688K     RW                 GLB NX pte
        0xffffffff81800000-0xffffffff81a00000           2M     ro         PSE     GLB NX pmd
        0xffffffff81a00000-0xffffffff81b3b000        1260K     ro                 GLB NX pte
        0xffffffff81b3b000-0xffffffff82000000        4884K     RW                 GLB NX pte
        0xffffffff82000000-0xffffffff82200000           2M     RW         PSE     GLB NX pmd
        0xffffffff82200000-0xffffffffa0000000         478M                               pmd
      Signed-off-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1443704662-3138-1-git-send-email-sds@tycho.nsa.govSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b05730b2
    • Thomas Gleixner's avatar
      x86/process: Add proper bound checks in 64bit get_wchan() · f0caabec
      Thomas Gleixner authored
      commit eddd3826 upstream.
      
      Dmitry Vyukov reported the following using trinity and the memory
      error detector AddressSanitizer
      (https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel).
      
      [ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on
      address ffff88002e280000
      [ 124.576801] ffff88002e280000 is located 131938492886538 bytes to
      the left of 28857600-byte region [ffffffff81282e0a, ffffffff82e0830a)
      [ 124.578633] Accessed by thread T10915:
      [ 124.579295] inlined in describe_heap_address
      ./arch/x86/mm/asan/report.c:164
      [ 124.579295] #0 ffffffff810dd277 in asan_report_error
      ./arch/x86/mm/asan/report.c:278
      [ 124.580137] #1 ffffffff810dc6a0 in asan_check_region
      ./arch/x86/mm/asan/asan.c:37
      [ 124.581050] #2 ffffffff810dd423 in __tsan_read8 ??:0
      [ 124.581893] #3 ffffffff8107c093 in get_wchan
      ./arch/x86/kernel/process_64.c:444
      
      The address checks in the 64bit implementation of get_wchan() are
      wrong in several ways:
      
       - The lower bound of the stack is not the start of the stack
         page. It's the start of the stack page plus sizeof (struct
         thread_info)
      
       - The upper bound must be:
      
             top_of_stack - TOP_OF_KERNEL_STACK_PADDING - 2 * sizeof(unsigned long).
      
         The 2 * sizeof(unsigned long) is required because the stack pointer
         points at the frame pointer. The layout on the stack is: ... IP FP
         ... IP FP. So we need to make sure that both IP and FP are in the
         bounds.
      
      Fix the bound checks and get rid of the mix of numeric constants, u64
      and unsigned long. Making all unsigned long allows us to use the same
      function for 32bit as well.
      
      Use READ_ONCE() when accessing the stack. This does not prevent a
      concurrent wakeup of the task and the stack changing, but at least it
      avoids TOCTOU.
      
      Also check task state at the end of the loop. Again that does not
      prevent concurrent changes, but it avoids walking for nothing.
      
      Add proper comments while at it.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Based-on-patch-from: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Reviewed-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: kasan-dev <kasan-dev@googlegroups.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
      Link: http://lkml.kernel.org/r/20150930083302.694788319@linutronix.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f0caabec
    • Lee, Chun-Yi's avatar
      x86/kexec: Fix kexec crash in syscall kexec_file_load() · 18b756c4
      Lee, Chun-Yi authored
      commit e3c41e37 upstream.
      
      The original bug is a page fault crash that sometimes happens
      on big machines when preparing ELF headers:
      
          BUG: unable to handle kernel paging request at ffffc90613fc9000
          IP: [<ffffffff8103d645>] prepare_elf64_ram_headers_callback+0x165/0x260
      
      The bug is caused by us under-counting the number of memory ranges
      and subsequently not allocating enough ELF header space for them.
      The bug is typically masked on smaller systems, because the ELF header
      allocation is rounded up to the next page.
      
      This patch modifies the code in fill_up_crash_elf_data() by using
      walk_system_ram_res() instead of walk_system_ram_range() to correctly
      count the max number of crash memory ranges. That's because the
      walk_system_ram_range() filters out small memory regions that
      reside in the same page, but walk_system_ram_res() does not.
      
      Here's how I found the bug:
      
      After tracing prepare_elf64_headers() and prepare_elf64_ram_headers_callback(),
      the code uses walk_system_ram_res() to fill-in crash memory regions information
      to the program header, so it counts those small memory regions that
      reside in a page area.
      
      But, when the kernel was using walk_system_ram_range() in
      fill_up_crash_elf_data() to count the number of crash memory regions,
      it filters out small regions.
      
      I printed those small memory regions, for example:
      
        kexec: Get nr_ram ranges. vaddr=0xffff880077592258 paddr=0x77592258, sz=0xdc0
      
      Based on the code in walk_system_ram_range(), this memory region
      will be filtered out:
      
        pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593
        end_pfn = (0x77592258 + 0xfc0 -1 + 1) >> 12 = 0x77593
        end_pfn - pfn = 0x77593 - 0x77593 = 0  <=== if (end_pfn > pfn) is FALSE
      
      So, the max_nr_ranges that's counted by the kernel doesn't include
      small memory regions - causing us to under-allocate the required space.
      That causes the page fault crash that happens in a later code path
      when preparing ELF headers.
      
      This bug is not easy to reproduce on small machines that have few
      CPUs, because the allocated page aligned ELF buffer has more free
      space to cover those small memory regions' PT_LOAD headers.
      Signed-off-by: default avatarLee, Chun-Yi <jlee@suse.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: kexec@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1443531537-29436-1-git-send-email-jlee@suse.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      18b756c4
    • Matt Fleming's avatar
      x86/efi: Fix boot crash by mapping EFI memmap entries bottom-up at runtime, instead of top-down · 6adcb2b1
      Matt Fleming authored
      commit a5caa209 upstream.
      
      Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced
      that signals that the firmware PE/COFF loader supports splitting
      code and data sections of PE/COFF images into separate EFI
      memory map entries. This allows the kernel to map those regions
      with strict memory protections, e.g. EFI_MEMORY_RO for code,
      EFI_MEMORY_XP for data, etc.
      
      Unfortunately, an unwritten requirement of this new feature is
      that the regions need to be mapped with the same offsets
      relative to each other as observed in the EFI memory map. If
      this is not done crashes like this may occur,
      
        BUG: unable to handle kernel paging request at fffffffefe6086dd
        IP: [<fffffffefe6086dd>] 0xfffffffefe6086dd
        Call Trace:
         [<ffffffff8104c90e>] efi_call+0x7e/0x100
         [<ffffffff81602091>] ? virt_efi_set_variable+0x61/0x90
         [<ffffffff8104c583>] efi_delete_dummy_variable+0x63/0x70
         [<ffffffff81f4e4aa>] efi_enter_virtual_mode+0x383/0x392
         [<ffffffff81f37e1b>] start_kernel+0x38a/0x417
         [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c
         [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef
      
      Here 0xfffffffefe6086dd refers to an address the firmware
      expects to be mapped but which the OS never claimed was mapped.
      The issue is that included in these regions are relative
      addresses to other regions which were emitted by the firmware
      toolchain before the "splitting" of sections occurred at
      runtime.
      
      Needless to say, we don't satisfy this unwritten requirement on
      x86_64 and instead map the EFI memory map entries in reverse
      order. The above crash is almost certainly triggerable with any
      kernel newer than v3.13 because that's when we rewrote the EFI
      runtime region mapping code, in commit d2f7cbe7 ("x86/efi:
      Runtime services virtual mapping"). For kernel versions before
      v3.13 things may work by pure luck depending on the
      fragmentation of the kernel virtual address space at the time we
      map the EFI regions.
      
      Instead of mapping the EFI memory map entries in reverse order,
      where entry N has a higher virtual address than entry N+1, map
      them in the same order as they appear in the EFI memory map to
      preserve this relative offset between regions.
      
      This patch has been kept as small as possible with the intention
      that it should be applied aggressively to stable and
      distribution kernels. It is very much a bugfix rather than
      support for a new feature, since when EFI_PROPERTIES_TABLE is
      enabled we must map things as outlined above to even boot - we
      have no way of asking the firmware not to split the code/data
      regions.
      
      In fact, this patch doesn't even make use of the more strict
      memory protections available in UEFI v2.5. That will come later.
      Suggested-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Reported-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chun-Yi <jlee@suse.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: James Bottomley <JBottomley@Odin.com>
      Cc: Lee, Chun-Yi <jlee@suse.com>
      Cc: Leif Lindholm <leif.lindholm@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1443218539-7610-2-git-send-email-matt@codeblueprint.co.ukSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6adcb2b1
    • Dirk Müller's avatar
      Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS · d6a4aed8
      Dirk Müller authored
      commit d2922422 upstream.
      
      The cpu feature flags are not ever going to change, so warning
      everytime can cause a lot of kernel log spam
      (in our case more than 10GB/hour).
      
      The warning seems to only occur when nested virtualization is
      enabled, so it's probably triggered by a KVM bug.  This is a
      sensible and safe change anyway, and the KVM bug fix might not
      be suitable for stable releases anyway.
      Signed-off-by: default avatarDirk Mueller <dmueller@suse.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6a4aed8
    • Andy Lutomirski's avatar
      x86/nmi/64: Fix a paravirt stack-clobbering bug in the NMI code · 3f2a8445
      Andy Lutomirski authored
      commit 83c133cf upstream.
      
      The NMI entry code that switches to the normal kernel stack needs to
      be very careful not to clobber any extra stack slots on the NMI
      stack.  The code is fine under the assumption that SWAPGS is just a
      normal instruction, but that assumption isn't really true.  Use
      SWAPGS_UNSAFE_STACK instead.
      
      This is part of a fix for some random crashes that Sasha saw.
      
      Fixes: 9b6e6a83 ("x86/nmi/64: Switch stacks on userspace NMI entry")
      Reported-and-tested-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Link: http://lkml.kernel.org/r/974bc40edffdb5c2950a5c4977f821a446b76178.1442791737.git.luto@kernel.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3f2a8445
    • Andy Lutomirski's avatar
      x86/paravirt: Replace the paravirt nop with a bona fide empty function · b3eb2816
      Andy Lutomirski authored
      commit fc57a7c6 upstream.
      
      PARAVIRT_ADJUST_EXCEPTION_FRAME generates this code (using nmi as an
      example, trimmed for readability):
      
          ff 15 00 00 00 00       callq  *0x0(%rip)        # 2796 <nmi+0x6>
                    2792: R_X86_64_PC32     pv_irq_ops+0x2c
      
      That's a call through a function pointer to regular C function that
      does nothing on native boots, but that function isn't protected
      against kprobes, isn't marked notrace, and is certainly not
      guaranteed to preserve any registers if the compiler is feeling
      perverse.  This is bad news for a CLBR_NONE operation.
      
      Of course, if everything works correctly, once paravirt ops are
      patched, it gets nopped out, but what if we hit this code before
      paravirt ops are patched in?  This can potentially cause breakage
      that is very difficult to debug.
      
      A more subtle failure is possible here, too: if _paravirt_nop uses
      the stack at all (even just to push RBP), it will overwrite the "NMI
      executing" variable if it's called in the NMI prologue.
      
      The Xen case, perhaps surprisingly, is fine, because it's already
      written in asm.
      
      Fix all of the cases that default to paravirt_nop (including
      adjust_exception_frame) with a big hammer: replace paravirt_nop with
      an asm function that is just a ret instruction.
      
      The Xen case may have other problems, so document them.
      
      This is part of a fix for some random crashes that Sasha saw.
      Reported-and-tested-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Link: http://lkml.kernel.org/r/8f5d2ba295f9d73751c33d97fda03e0495d9ade0.1442791737.git.luto@kernel.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b3eb2816
    • David Woodhouse's avatar
      x86/platform: Fix Geode LX timekeeping in the generic x86 build · e6491eca
      David Woodhouse authored
      commit 03da3ff1 upstream.
      
      In 2007, commit 07190a08 ("Mark TSC on GeodeLX reliable")
      bypassed verification of the TSC on Geode LX. However, this code
      (now in the check_system_tsc_reliable() function in
      arch/x86/kernel/tsc.c) was only present if CONFIG_MGEODE_LX was
      set.
      
      OpenWRT has recently started building its generic Geode target
      for Geode GX, not LX, to include support for additional
      platforms. This broke the timekeeping on LX-based devices,
      because the TSC wasn't marked as reliable:
      https://dev.openwrt.org/ticket/20531
      
      By adding a runtime check on is_geode_lx(), we can also include
      the fix if CONFIG_MGEODEGX1 or CONFIG_X86_GENERIC are set, thus
      fixing the problem.
      Signed-off-by: default avatarDavid Woodhouse <David.Woodhouse@intel.com>
      Cc: Andres Salomon <dilinger@queued.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Marcelo Tosatti <marcelo@kvack.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1442409003.131189.87.camel@infradead.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e6491eca
    • Thomas Gleixner's avatar
      x86/alternatives: Make optimize_nops() interrupt safe and synced · 34171109
      Thomas Gleixner authored
      commit 66c117d7 upstream.
      
      Richard reported the following crash:
      
      [    0.036000] BUG: unable to handle kernel paging request at 55501e06
      [    0.036000] IP: [<c0aae48b>] common_interrupt+0xb/0x38
      [    0.036000] Call Trace:
      [    0.036000]  [<c0409c80>] ? add_nops+0x90/0xa0
      [    0.036000]  [<c040a054>] apply_alternatives+0x274/0x630
      
      Chuck decoded:
      
       "  0:   8d 90 90 83 04 24       lea    0x24048390(%eax),%edx
          6:   80 fc 0f                cmp    $0xf,%ah
          9:   a8 0f                   test   $0xf,%al
       >> b:   a0 06 1e 50 55          mov    0x55501e06,%al
         10:   57                      push   %edi
         11:   56                      push   %esi
      
       Interrupt 0x30 occurred while the alternatives code was replacing the
       initial 0x90,0x90,0x90 NOPs (from the ASM_CLAC macro) with the
       optimized version, 0x8d,0x76,0x00. Only the first byte has been
       replaced so far, and it makes a mess out of the insn decoding."
      
      optimize_nops() is buggy in two aspects:
      
      - It's not disabling interrupts across the modification
      - It's lacking a sync_core() call
      
      Add both.
      
      Fixes: 4fd4b6e5 'x86/alternatives: Use optimized NOPs for padding'
      Reported-and-tested-by: default avatar"Richard W.M. Jones" <rjones@redhat.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Richard W.M. Jones <rjones@redhat.com>
      Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1509031232340.15006@nanosSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      34171109
    • Shaohua Li's avatar
      x86/apic: Serialize LVTT and TSC_DEADLINE writes · d97c1eca
      Shaohua Li authored
      commit 5d7c631d upstream.
      
      The APIC LVTT register is MMIO mapped but the TSC_DEADLINE register is an
      MSR. The write to the TSC_DEADLINE MSR is not serializing, so it's not
      guaranteed that the write to LVTT has reached the APIC before the
      TSC_DEADLINE MSR is written. In such a case the write to the MSR is
      ignored and as a consequence the local timer interrupt never fires.
      
      The SDM decribes this issue for xAPIC and x2APIC modes. The
      serialization methods recommended by the SDM differ.
      
      xAPIC:
       "1. Memory-mapped write to LVT Timer Register, setting bits 18:17 to 10b.
        2. WRMSR to the IA32_TSC_DEADLINE MSR a value much larger than current time-stamp counter.
        3. If RDMSR of the IA32_TSC_DEADLINE MSR returns zero, go to step 2.
        4. WRMSR to the IA32_TSC_DEADLINE MSR the desired deadline."
      
      x2APIC:
       "To allow for efficient access to the APIC registers in x2APIC mode,
        the serializing semantics of WRMSR are relaxed when writing to the
        APIC registers. Thus, system software should not use 'WRMSR to APIC
        registers in x2APIC mode' as a serializing instruction. Read and write
        accesses to the APIC registers will occur in program order. A WRMSR to
        an APIC register may complete before all preceding stores are globally
        visible; software can prevent this by inserting a serializing
        instruction, an SFENCE, or an MFENCE before the WRMSR."
      
      The xAPIC method is to just wait for the memory mapped write to hit
      the LVTT by checking whether the MSR write has reached the hardware.
      There is no reason why a proper MFENCE after the memory mapped write would
      not do the same. Andi Kleen confirmed that MFENCE is sufficient for the
      xAPIC case as well.
      
      Issue MFENCE before writing to the TSC_DEADLINE MSR. This can be done
      unconditionally as all CPUs which have TSC_DEADLINE also have MFENCE
      support.
      
      [ tglx: Massaged the changelog ]
      Signed-off-by: default avatarShaohua Li <shli@fb.com>
      Reviewed-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: <Kernel-team@fb.com>
      Cc: <lenb@kernel.org>
      Cc: <fenghua.yu@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Link: http://lkml.kernel.org/r/20150909041352.GA2059853@devbig257.prn2.facebook.comSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d97c1eca
    • Andy Shevchenko's avatar
      dmaengine: dw: properly read DWC_PARAMS register · c46eada1
      Andy Shevchenko authored
      commit 6bea0f6d upstream.
      
      In case we have less than maximum allowed channels (8) and autoconfiguration is
      enabled the DWC_PARAMS read is wrong because it uses different arithmetic to
      what is needed for channel priority setup.
      
      Re-do the caclulations properly. This now works on AVR32 board well.
      
      Fixes: fed2574b (dw_dmac: introduce software emulation of LLP transfers)
      Cc: yitian.bu@tangramtek.com
      Signed-off-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: default avatarVinod Koul <vinod.koul@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c46eada1
    • Jeff Moyer's avatar
      blockdev: don't set S_DAX for misaligned partitions · eb218995
      Jeff Moyer authored
      commit f0b2e563 upstream.
      
      The dax code doesn't currently support misaligned partitions,
      so disable O_DIRECT via dax until such time as that support
      materializes.
      Suggested-by: default avatarBoaz Harrosh <boaz@plexistor.com>
      Signed-off-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eb218995
    • Felipe F. Tonello's avatar
      ARM: dts: fix usb pin control for imx-rex dts · 7ae7e145
      Felipe F. Tonello authored
      commit 0af82211 upstream.
      
      This fixes a duplicated pin control causing this error:
      
      imx6q-pinctrl 20e0000.iomuxc: pin MX6Q_PAD_GPIO_1 already
      requested by regulators:regulator@2; cannot claim for 2184000.usb
      imx6q-pinctrl 20e0000.iomuxc: pin-137 (2184000.usb) status -22
      imx6q-pinctrl 20e0000.iomuxc: could not request pin 137
      (MX6Q_PAD_GPIO_1) from group usbotggrp  on device 20e0000.iomuxc
      imx_usb 2184000.usb: Error applying setting, reverse things
      back
      imx6q-pinctrl 20e0000.iomuxc: pin MX6Q_PAD_EIM_D31 already
      requested by regulators:regulator@1; cannot claim for 2184200.usb
      imx6q-pinctrl 20e0000.iomuxc: pin-52 (2184200.usb) status -22
      imx6q-pinctrl 20e0000.iomuxc: could not request pin 52 (MX6Q_PAD_EIM_D31)
      from group usbh1grp  on device 20e0000.iomuxc
      imx_usb 2184200.usb: Error applying setting, reverse things
      back
      Signed-off-by: default avatarFelipe F. Tonello <eu@felipetonello.com>
      Fixes: e2047e33 ("ARM: dts: add initial Rex Pro board support")
      Signed-off-by: default avatarShawn Guo <shawnguo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7ae7e145
    • Chanho Park's avatar
      ARM: EXYNOS: reset Little cores when cpu is up · d46cd96d
      Chanho Park authored
      commit 833b5794 upstream.
      
      The cpu booting of exynos5422 has been still broken since we discussed
      it in last year[1]. This patch is inspired from Odroid XU3
      code (Actually, it was from samsung exynos vendor kernel)[2]. This weird
      reset code was founded exynos5420 octa cores series SoCs and only
      required for the first boot core is the Little core (Cortex A7).
      Some of the exynos5420 boards and all of the exynos5422 boards will require
      this code.
      
      There is two ways to check the little core is the first cpu. One is
      checking GPG2CON[1] GPIO value and the other is checking the cluster
      number of the first cpu. I selected the latter because it's more easier
      than the former.
      
      [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-June/350632.html
      [2] https://patchwork.kernel.org/patch/6782891/
      
      Cc: Kevin Hilman <khilman@kernel.org>
      Cc: Javier Martinez Canillas <javier@osg.samsung.com>
      Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Tested-by: default avatarKevin Hilman <khilman@linaro.org>
      Signed-off-by: default avatarChanho Park <parkch98@gmail.com>
      [k.kozlowski: Adding stable for v4.1+, reformat comment]
      Signed-off-by: default avatarKrzysztof Kozlowski <k.kozlowski@samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d46cd96d
    • Carl Frederik Werner's avatar
      ARM: dts: omap3-beagle: make i2c3, ddc and tfp410 gpio work again · 49d8fc8a
      Carl Frederik Werner authored
      commit 3a2fa775 upstream.
      
      Let's fix pinmux address of gpio 170 used by tfp410 powerdown-gpio.
      
      According to the OMAP35x Technical Reference Manual
        CONTROL_PADCONF_I2C3_SDA[15:0]  0x480021C4 mode0: i2c3_sda
        CONTROL_PADCONF_I2C3_SDA[31:16] 0x480021C4 mode4: gpio_170
      the pinmux address of gpio 170 must be 0x480021C6.
      
      The former wrong address broke i2c3 (used by hdmi ddc), resulting in
      kernel message:
        omap_i2c 48060000.i2c: controller timed out
      
      Fixes: 8cecf52b ("ARM: omap3-beagle.dts: add display information")
      Signed-off-by: default avatarCarl Frederik Werner <frederik@cfbw.eu>
      Signed-off-by: default avatarTony Lindgren <tony@atomide.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      49d8fc8a
    • Grazvydas Ignotas's avatar
      ARM: dts: omap5-uevm.dts: fix i2c5 pinctrl offsets · b43dbfb4
      Grazvydas Ignotas authored
      commit 1dbdad75 upstream.
      
      The i2c5 pinctrl offsets are wrong. If the bootloader doesn't set the
      pins up, communication with tca6424a doesn't work (controller timeouts)
      and it is not possible to enable HDMI.
      
      Fixes: 9be495c4 ("ARM: dts: omap5-evm: Add I2c pinctrl data")
      Signed-off-by: default avatarGrazvydas Ignotas <notasas@gmail.com>
      Signed-off-by: default avatarTony Lindgren <tony@atomide.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b43dbfb4
    • Doug Anderson's avatar
      ARM: 8425/1: kgdb: Don't try to stop the machine when setting breakpoints · ca4104a0
      Doug Anderson authored
      commit 7ae85dc7 upstream.
      
      In (23a4e405 arm: kgdb: Handle read-only text / modules) we moved to
      using patch_text() to set breakpoints so that we could handle the case
      when we had CONFIG_DEBUG_RODATA.  That patch used patch_text().
      Unfortunately, patch_text() assumes that we're not in atomic context
      when it runs since it needs to grab a mutex and also wait for other
      CPUs to stop (which it does with a completion).
      
      This would result in a stack crawl if you had
      CONFIG_DEBUG_ATOMIC_SLEEP and tried to set a breakpoint in kgdb.  The
      crawl looked something like:
      
       BUG: scheduling while atomic: swapper/0/0/0x00010007
       CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-rc7-00133-geb63b34b #1073
       Hardware name: Rockchip (Device Tree)
        (unwind_backtrace) from [<c00133d4>] (show_stack+0x20/0x24)
        (show_stack) from [<c05400e8>] (dump_stack+0x84/0xb8)
        (dump_stack) from [<c004913c>] (__schedule_bug+0x54/0x6c)
        (__schedule_bug) from [<c054065c>] (__schedule+0x80/0x668)
        (__schedule) from [<c0540cfc>] (schedule+0xb8/0xd4)
        (schedule) from [<c0543a3c>] (schedule_timeout+0x2c/0x234)
        (schedule_timeout) from [<c05417c0>] (wait_for_common+0xf4/0x188)
        (wait_for_common) from [<c0541874>] (wait_for_completion+0x20/0x24)
        (wait_for_completion) from [<c00a0104>] (__stop_cpus+0x58/0x70)
        (__stop_cpus) from [<c00a0580>] (stop_cpus+0x3c/0x54)
        (stop_cpus) from [<c00a06c4>] (__stop_machine+0xcc/0xe8)
        (__stop_machine) from [<c00a0714>] (stop_machine+0x34/0x44)
        (stop_machine) from [<c00173e8>] (patch_text+0x28/0x34)
        (patch_text) from [<c001733c>] (kgdb_arch_set_breakpoint+0x40/0x4c)
        (kgdb_arch_set_breakpoint) from [<c00a0d68>] (kgdb_validate_break_address+0x2c/0x60)
        (kgdb_validate_break_address) from [<c00a0e90>] (dbg_set_sw_break+0x1c/0xdc)
        (dbg_set_sw_break) from [<c00a2e88>] (gdb_serial_stub+0x9c4/0xba4)
        (gdb_serial_stub) from [<c00a11cc>] (kgdb_cpu_enter+0x1f8/0x60c)
        (kgdb_cpu_enter) from [<c00a18cc>] (kgdb_handle_exception+0x19c/0x1d0)
        (kgdb_handle_exception) from [<c0016f7c>] (kgdb_compiled_brk_fn+0x30/0x3c)
        (kgdb_compiled_brk_fn) from [<c00091a4>] (do_undefinstr+0x1a4/0x20c)
        (do_undefinstr) from [<c001400c>] (__und_svc_finish+0x0/0x34)
      
      It turns out that when we're in kgdb all the CPUs are stopped anyway
      so there's no reason we should be calling patch_text().  We can
      instead directly call __patch_text() which assumes that CPUs have
      already been stopped.
      
      Fixes: 23a4e405 ("arm: kgdb: Handle read-only text / modules")
      Reported-by: default avatarAapo Vienamo <avienamo@nvidia.com>
      Signed-off-by: default avatarDouglas Anderson <dianders@chromium.org>
      Reviewed-by: default avatarStephen Boyd <sboyd@codeaurora.org>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ca4104a0
    • Paul Bolle's avatar
      windfarm: decrement client count when unregistering · c014e0ff
      Paul Bolle authored
      commit fe2b5921 upstream.
      
      wf_unregister_client() increments the client count when a client
      unregisters. That is obviously incorrect. Decrement that client count
      instead.
      
      Fixes: 75722d39 ("[PATCH] ppc64: Thermal control for SMU based machines")
      Signed-off-by: default avatarPaul Bolle <pebolle@tiscali.nl>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c014e0ff
    • Ard Biesheuvel's avatar
      ARM: 8429/1: disable GCC SRA optimization · fd9af2c0
      Ard Biesheuvel authored
      commit a077224f upstream.
      
      While working on the 32-bit ARM port of UEFI, I noticed a strange
      corruption in the kernel log. The following snprintf() statement
      (in drivers/firmware/efi/efi.c:efi_md_typeattr_format())
      
      	snprintf(pos, size, "|%3s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]",
      
      was producing the following output in the log:
      
      	|    |   |   |   |    |WB|WT|WC|UC]
      	|    |   |   |   |    |WB|WT|WC|UC]
      	|    |   |   |   |    |WB|WT|WC|UC]
      	|RUN|   |   |   |    |WB|WT|WC|UC]*
      	|RUN|   |   |   |    |WB|WT|WC|UC]*
      	|    |   |   |   |    |WB|WT|WC|UC]
      	|RUN|   |   |   |    |WB|WT|WC|UC]*
      	|    |   |   |   |    |WB|WT|WC|UC]
      	|RUN|   |   |   |    |   |   |   |UC]
      	|RUN|   |   |   |    |   |   |   |UC]
      
      As it turns out, this is caused by incorrect code being emitted for
      the string() function in lib/vsprintf.c. The following code
      
      	if (!(spec.flags & LEFT)) {
      		while (len < spec.field_width--) {
      			if (buf < end)
      				*buf = ' ';
      			++buf;
      		}
      	}
      	for (i = 0; i < len; ++i) {
      		if (buf < end)
      			*buf = *s;
      		++buf; ++s;
      	}
      	while (len < spec.field_width--) {
      		if (buf < end)
      			*buf = ' ';
      		++buf;
      	}
      
      when called with len == 0, triggers an issue in the GCC SRA optimization
      pass (Scalar Replacement of Aggregates), which handles promotion of signed
      struct members incorrectly. This is a known but as yet unresolved issue.
      (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65932). In this particular
      case, it is causing the second while loop to be executed erroneously a
      single time, causing the additional space characters to be printed.
      
      So disable the optimization by passing -fno-ipa-sra.
      Acked-by: default avatarNicolas Pitre <nico@linaro.org>
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fd9af2c0
    • Russell King's avatar
      ARM: fix Thumb2 signal handling when ARMv6 is enabled · 5e624cb5
      Russell King authored
      commit 9b55613f upstream.
      
      When a kernel is built covering ARMv6 to ARMv7, we omit to clear the
      IT state when entering a signal handler.  This can cause the first
      few instructions to be conditionally executed depending on the parent
      context.
      
      In any case, the original test for >= ARMv7 is broken - ARMv6 can have
      Thumb-2 support as well, and an ARMv6T2 specific build would omit this
      code too.
      
      Relax the test back to ARMv6 or greater.  This results in us always
      clearing the IT state bits in the PSR, even on CPUs where these bits
      are reserved.  However, they're reserved for the IT state, so this
      should cause no harm.
      
      Fixes: d71e1352 ("Clear the IT state when invoking a Thumb-2 signal handler")
      Acked-by: default avatarTony Lindgren <tony@atomide.com>
      Tested-by: default avatarH. Nikolaus Schaller <hns@goldelico.com>
      Tested-by: default avatarGrazvydas Ignotas <notasas@gmail.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5e624cb5
    • Guenter Roeck's avatar
      hwmon: (nct6775) Swap STEP_UP_TIME and STEP_DOWN_TIME registers for most chips · 28ef4c4b
      Guenter Roeck authored
      commit 728d2940 upstream.
      
      The STEP_UP_TIME and STEP_DOWN_TIME registers are swapped for all chips but
      NCT6775.
      Reported-by: default avatarGrazvydas Ignotas <notasas@gmail.com>
      Reviewed-by: default avatarJean Delvare <jdelvare@suse.de>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      28ef4c4b
    • Dominik Dingel's avatar
      sched: access local runqueue directly in single_task_running · feada04e
      Dominik Dingel authored
      commit 00cc1633 upstream.
      
      Commit 2ee507c4 ("sched: Add function single_task_running to let a task
      check if it is the only task running on a cpu") referenced the current
      runqueue with the smp_processor_id.  When CONFIG_DEBUG_PREEMPT is enabled,
      that is only allowed if preemption is disabled or the currrent task is
      bound to the local cpu (e.g. kernel worker).
      
      With commit f7819512 ("kvm: add halt_poll_ns module parameter") KVM
      calls single_task_running. If CONFIG_DEBUG_PREEMPT is enabled that
      generates a lot of kernel messages.
      
      To avoid adding preemption in that cases, as it would limit the usefulness,
      we change single_task_running to access directly the cpu local runqueue.
      
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Suggested-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Fixes: 2ee507c4Signed-off-by: default avatarDominik Dingel <dingel@linux.vnet.ibm.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      feada04e
    • Francesco Lavra's avatar
      watchdog: sunxi: fix activation of system reset · f829ccff
      Francesco Lavra authored
      commit 0919e444 upstream.
      
      Commit f2147de3 ("watchdog: sunxi: support parameterized compatible
      strings") introduced a regression in sunxi_wdt_start(), by which
      the system reset function of the watchdog is not enabled upon
      starting the watchdog. As a result, the system is not reset when the
      watchdog expires. Fix it.
      
      Fixes: f2147de3 ("watchdog: sunxi: support parameterized compatible strings")
      Signed-off-by: default avatarFrancesco Lavra <francescolavra.fl@gmail.com>
      Acked-by: default avatarMaxime Ripard <maxime.ripard@free-electrons.com>
      Reviewed-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarWim Van Sebroeck <wim@iguana.be>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f829ccff
    • Peter Zijlstra's avatar
      perf: Fix AUX buffer refcounting · 12522227
      Peter Zijlstra authored
      commit 57ffc5ca upstream.
      
      Its currently possible to drop the last refcount to the aux buffer
      from NMI context, which results in the expected fireworks.
      
      The refcounting needs a bigger overhaul, but to cure the immediate
      problem, delay the freeing by using an irq_work.
      Reviewed-and-tested-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Reported-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150618103249.GK19282@twins.programming.kicks-ass.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      12522227
    • Arnaldo Carvalho de Melo's avatar
      perf header: Fixup reading of HEADER_NRCPUS feature · 41e324d7
      Arnaldo Carvalho de Melo authored
      commit caa47047 upstream.
      
      The original patch introducing this header wrote the number of CPUs available
      and online in one order and then swapped those values when reading, fix it.
      
      Before:
      
        # perf record usleep 1
        # perf report --header-only | grep 'nrcpus \(online\|avail\)'
        # nrcpus online : 4
        # nrcpus avail : 4
        # echo 0 > /sys/devices/system/cpu/cpu2/online
        # perf record usleep 1
        # perf report --header-only | grep 'nrcpus \(online\|avail\)'
        # nrcpus online : 4
        # nrcpus avail : 3
        # echo 0 > /sys/devices/system/cpu/cpu1/online
        # perf record usleep 1
        # perf report --header-only | grep 'nrcpus \(online\|avail\)'
        # nrcpus online : 4
        # nrcpus avail : 2
      
      After the fix, bringing back the CPUs online:
      
        # perf report --header-only | grep 'nrcpus \(online\|avail\)'
        # nrcpus online : 2
        # nrcpus avail : 4
        # echo 1 > /sys/devices/system/cpu/cpu2/online
        # perf record usleep 1
        # perf report --header-only | grep 'nrcpus \(online\|avail\)'
        # nrcpus online : 3
        # nrcpus avail : 4
        # echo 1 > /sys/devices/system/cpu/cpu1/online
        # perf record usleep 1
        # perf report --header-only | grep 'nrcpus \(online\|avail\)'
        # nrcpus online : 4
        # nrcpus avail : 4
      Acked-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: fbe96f29 ("perf tools: Make perf.data more self-descriptive (v8)")
      Link: http://lkml.kernel.org/r/20150911153323.GP23511@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      41e324d7
    • Ben Hutchings's avatar
      perf tools: Add empty Build files for architectures lacking them · 1477cb59
      Ben Hutchings authored
      commit 93df8a1e upstream.
      
      perf currently fails to build on MIPS as there is no
      tools/perf/arch/mips/Build file.  Adding an empty file fixes this as
      there are no MIPS-specific sources to build.
      
      It looks like the same is needed for Alpha and PA-RISC, though I
      haven't been able to test those.
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Fixes: 5e8c0fb6 ("perf build: Add arch x86 objects building")
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1438704627.7315.2.camel@decadent.org.ukSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1477cb59
    • Kan Liang's avatar
      perf stat: Get correct cpu id for print_aggr · 63f155ca
      Kan Liang authored
      commit 601083cf upstream.
      
      print_aggr() fails to print per-core/per-socket statistics after commit
      582ec082 ("perf stat: Fix per-socket output bug for uncore events")
      if events have differnt cpus. Because in print_aggr(), aggr_get_id needs
      index (not cpu id) to find core/pkg id. Also, evsel cpu maps should be
      used to get aggregated id.
      
      Here is an example:
      
      Counting events cycles,uncore_imc_0/cas_count_read/. (Uncore event has
      cpumask 0,18)
      
        $ perf stat -e cycles,uncore_imc_0/cas_count_read/ -C0,18 --per-core sleep 2
      
      Without this patch, it failes to get CPU 18 result.
      
         Performance counter stats for 'CPU(s) 0,18':
      
        S0-C0           1            7526851      cycles
        S0-C0           1               1.05 MiB  uncore_imc_0/cas_count_read/
        S1-C0           0      <not counted>      cycles
        S1-C0           0      <not counted> MiB  uncore_imc_0/cas_count_read/
      
      With this patch, it can get both CPU0 and CPU18 result.
      
         Performance counter stats for 'CPU(s) 0,18':
      
        S0-C0           1            6327768      cycles
        S0-C0           1               0.47 MiB  uncore_imc_0/cas_count_read/
        S1-C0           1             330228      cycles
        S1-C0           1               0.29 MiB  uncore_imc_0/cas_count_read/
      Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Acked-by: default avatarStephane Eranian <eranian@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Fixes: 582ec082 ("perf stat: Fix per-socket output bug for uncore events")
      Link: http://lkml.kernel.org/r/1435820925-51091-1-git-send-email-kan.liang@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      63f155ca
    • Arnaldo Carvalho de Melo's avatar
      perf hists: Update the column width for the "srcline" sort key · 5c23f8cd
      Arnaldo Carvalho de Melo authored
      commit e8e6d37e upstream.
      
      When we introduce a new sort key, we need to update the
      hists__calc_col_len() function accordingly, otherwise the width
      will be limited to strlen(header).
      
      We can't update it when obtaining a line value for a column (for
      instance, in sort__srcline_cmp()), because we reset it all when doing a
      resort (see hists__output_recalc_col_len()), so we need to, from what is
      in the hist_entry fields, set each of the column widths.
      
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Fixes: 409a8be6 ("perf tools: Add sort by src line/number")
      Link: http://lkml.kernel.org/n/tip-jgbe0yx8v1gs89cslr93pvz2@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5c23f8cd
    • Adrian Hunter's avatar
      perf tools: Fix copying of /proc/kcore · d4cb7d0f
      Adrian Hunter authored
      commit b5cabbcb upstream.
      
      A copy of /proc/kcore containing the kernel text can be made to the
      buildid cache. e.g.
      
      	perf buildid-cache -v -k /proc/kcore
      
      To workaround objdump limitations, a copy is also made when annotating
      against /proc/kcore.
      
      The copying process stops working from libelf about v1.62 onwards (the
      problem was found with v1.63).
      
      The cause is that a call to gelf_getphdr() in kcore__add_phdr() fails
      because additional validation has been added to gelf_getphdr().
      
      The use of gelf_getphdr() is a misguided attempt to get default
      initialization of the Gelf_Phdr structure.  That should not be
      necessary because every member of the Gelf_Phdr structure is
      subsequently assigned.  So just remove the call to gelf_getphdr().
      
      Similarly, a call to gelf_getehdr() in gelf_kcore__init() can be
      removed also.
      
      Committer notes:
      
      Note to stable@kernel.org, from Adrian in the cover letter for this
      patchkit:
      
      The "Fix copying of /proc/kcore" problem goes back to v3.13 if you think
      it is important enough for stable.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1443089122-19082-3-git-send-email-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d4cb7d0f
    • Peter Zijlstra's avatar
      perf/x86/intel: Fix constraint access · 37f70c38
      Peter Zijlstra authored
      commit ebfb4988 upstream.
      
      Sasha reported that we can get here with .idx==-1, and
      cpuc->event_constraints unallocated.
      Suggested-by: default avatarStephane Eranian <eranian@google.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: b371b594 ("perf/x86: Fix event/group validation")
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      37f70c38
    • Azael Avalos's avatar
      toshiba_acpi: Fix hotkeys registration on some toshiba models · 61be5a59
      Azael Avalos authored
      commit 53147b6c upstream.
      
      Commit a2b3471b ("toshiba_acpi: Use the Hotkey Event Type function
      for keymap choosing") changed the *setup_keyboard function to query for
      the Hotkey Event Type to help choose the correct keymap, but turns out
      that here are certain Toshiba models out there not implementing this
      feature, and thus, failing to continue the input device registration and
      leaving such laptops without hotkey support.
      
      This patch changes such check, and instead of returning an error if
      the Hotkey Event Type is not present, we simply inform userspace about it,
      changing the message printed from err to notice, making the function
      responsible for registering the input device to continue.
      
      This issue was found on a Toshiba Portege Z30-B, but there might be
      some other models out there affected by this regression as well.
      Signed-off-by: default avatarAzael Avalos <coproscefalo@gmail.com>
      Signed-off-by: default avatarDarren Hart <dvhart@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      61be5a59
    • Nicholas Bellinger's avatar
      target: Fix v4.1 UNIT_ATTENTION se_node_acl->device_list[] NULL pointer · 35afa656
      Nicholas Bellinger authored
      This patch fixes a v4.1 only regression bug as reported by Martin
      where UNIT_ATTENTION checking for pre v4.2-rc1 RCU conversion code
      legacy se_node_acl->device_list[] was hitting a NULL pointer
      dereference in:
      
      [ 1858.639654] CPU: 2 PID: 1293 Comm: kworker/2:1 Tainted: G          I 4.1.6-fixxcopy+ #1
      [ 1858.639699] Hardware name: Dell Inc. PowerEdge R410/0N83VF, BIOS 1.11.0 07/20/2012
      [ 1858.639747] Workqueue: xcopy_wq target_xcopy_do_work [target_core_mod]
      [ 1858.639782] task: ffff880036f0cbe0 ti: ffff880317940000 task.ti: ffff880317940000
      [ 1858.639822] RIP: 0010:[<ffffffffa01d3774>]  [<ffffffffa01d3774>] target_scsi3_ua_check+0x24/0x60 [target_core_mod]
      [ 1858.639884] RSP: 0018:ffff880317943ce0  EFLAGS: 00010282
      [ 1858.639913] RAX: 0000000000000000 RBX: ffff880317943dc0 RCX: 0000000000000000
      [ 1858.639950] RDX: 0000000000000000 RSI: ffff880317943dd0 RDI: ffff88030eaee408
      [ 1858.639987] RBP: ffff88030eaee408 R08: 0000000000000001 R09: 0000000000000001
      [ 1858.640025] R10: 0000000000000000 R11: 00000000000706e0 R12: ffff880315e0a000
      [ 1858.640062] R13: ffff88030eaee408 R14: 0000000000000001 R15: ffff88030eaee408
      [ 1858.640100] FS:  0000000000000000(0000) GS:ffff880322e80000(0000) knlGS:0000000000000000
      [ 1858.640143] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 1858.640173] CR2: 0000000000000000 CR3: 000000000180d000 CR4: 00000000000006e0
      [ 1858.640210] Stack:
      [ 1858.640223]  ffffffffa01cadfa ffff88030eaee400 ffff880318e7c340 ffff880315e0a000
      [ 1858.640267]  ffffffffa01d8c25 ffff8800cae809e0 0000000000000400 0000000000000400
      [ 1858.640310]  ffff880318e7c3d0 0000000006b75800 0000000000080000 ffff88030eaee400
      [ 1858.640354] Call Trace:
      [ 1858.640379]  [<ffffffffa01cadfa>] ? target_setup_cmd_from_cdb+0x13a/0x2c0 [target_core_mod]
      [ 1858.640429]  [<ffffffffa01d8c25>] ? target_xcopy_setup_pt_cmd+0x85/0x320 [target_core_mod]
      [ 1858.640479]  [<ffffffffa01d9424>] ? target_xcopy_do_work+0x264/0x700 [target_core_mod]
      [ 1858.640526]  [<ffffffff810ac3a0>] ? pick_next_task_fair+0x720/0x8f0
      [ 1858.640562]  [<ffffffff8108b3fb>] ? process_one_work+0x14b/0x430
      [ 1858.640595]  [<ffffffff8108bf5b>] ? worker_thread+0x6b/0x560
      [ 1858.640627]  [<ffffffff8108bef0>] ? rescuer_thread+0x390/0x390
      [ 1858.640661]  [<ffffffff810913b3>] ? kthread+0xd3/0xf0
      [ 1858.640689]  [<ffffffff810912e0>] ? kthread_create_on_node+0x180/0x180
      
      Also, check for the same se_node_acl->device_list[] during EXTENDED_COPY
      operation as a non-holding persistent reservation port.
      Reported-by: default avatarMartin Svec <martin,svec@zoner.cz>
      Tested-by: default avatarMartin Svec <martin,svec@zoner.cz>
      Cc: Martin Svec <martin,svec@zoner.cz>
      Cc: Alex Gorbachev <ag@iss-integration.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      35afa656
    • Jenny Derzhavetz's avatar
      iser-target: Put the reference on commands waiting for unsol data · b6ac3aee
      Jenny Derzhavetz authored
      commit 3e03c4b0 upstream.
      
      The iscsi target core teardown sequence calls wait_conn for
      all active commands to finish gracefully by:
      - move the queue-pair to error state
      - drain all the completions
      - wait for the core to finish handling all session commands
      
      However, when tearing down a session while there are sequenced
      commands that are still waiting for unsolicited data outs, we can
      block forever as these are missing an extra reference put.
      
      We basically need the equivalent of iscsit_free_queue_reqs_for_conn()
      which is called after wait_conn has returned. Address this by an
      explicit walk on conn_cmd_list and put the extra reference.
      Signed-off-by: default avatarJenny Derzhavetz <jennyf@mellanox.com>
      Signed-off-by: default avatarSagi Grimberg <sagig@mellanox.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b6ac3aee
    • Jenny Derzhavetz's avatar
      iser-target: remove command with state ISTATE_REMOVE · 060f4515
      Jenny Derzhavetz authored
      commit a4c15cd9 upstream.
      
      As documented in iscsit_sequence_cmd:
      /*
       * Existing callers for iscsit_sequence_cmd() will silently
       * ignore commands with CMDSN_LOWER_THAN_EXP, so force this
       * return for CMDSN_MAXCMDSN_OVERRUN as well..
       */
      
      We need to silently finish a command when it's in ISTATE_REMOVE.
      This fixes an teardown hang we were seeing where a mis-behaved
      initiator (triggered by allocation error injections) sent us a
      cmdsn which was lower than expected.
      Signed-off-by: default avatarJenny Derzhavetz <jennyf@mellanox.com>
      Signed-off-by: default avatarSagi Grimberg <sagig@mellanox.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      060f4515
    • Nicholas Bellinger's avatar
      target: Attach EXTENDED_COPY local I/O descriptors to xcopy_pt_sess · 57bc4a2d
      Nicholas Bellinger authored
      commit 4416f89b upstream.
      
      This patch is a >= v4.1 regression bug-fix where control CDB
      emulation logic in commit 38b57f82 now expects a se_cmd->se_sess
      pointer to exist when determining T10-PI support is to be exposed
      for initiator host ports.
      
      To address this bug, go ahead and add locally generated se_cmd
      descriptors for copy-offload block-copy to it's own stand-alone
      se_session nexus, while the parent EXTENDED_COPY se_cmd descriptor
      remains associated with it's originating se_cmd->se_sess nexus.
      
      Note a valid se_cmd->se_sess is also required for future support
      of WRITE_INSERT and READ_STRIP software emulation when submitting
      backend I/O to se_device that exposes T10-PI suport.
      Reported-by: default avatarAlex Gorbachev <ag@iss-integration.com>
      Tested-by: default avatarAlex Gorbachev <ag@iss-integration.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Doug Gilbert <dgilbert@interlog.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      57bc4a2d
    • Michal Hocko's avatar
      scsi: fix scsi_error_handler vs. scsi_host_dev_release race · 4863a3f3
      Michal Hocko authored
      commit 537b604c upstream.
      
      b9d5c6b7 ("[SCSI] cleanup setting task state in
      scsi_error_handler()") has introduced a race between scsi_error_handler
      and scsi_host_dev_release resulting in the hang when the device goes
      away because scsi_error_handler might miss a wake up:
      
      CPU0					CPU1
      scsi_error_handler			scsi_host_dev_release
        					  kthread_stop()
        kthread_should_stop()
          test_bit(KTHREAD_SHOULD_STOP)
      					    set_bit(KTHREAD_SHOULD_STOP)
      					    wake_up_process()
      					    wait_for_completion()
      
        set_current_state(TASK_INTERRUPTIBLE)
        schedule()
      
      The most straightforward solution seems to be to invert the ordering of
      the set_current_state and kthread_should_stop.
      
      The issue has been noticed during reboot test on a 3.0 based kernel but
      the current code seems to be affected in the same way.
      
      [jejb: additional comment added]
      Reported-and-debugged-by: default avatarMike Mayer <Mike.Meyer@teradata.com>
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      Signed-off-by: default avatarJames Bottomley <JBottomley@Odin.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4863a3f3
    • Andy Grover's avatar
      target/iscsi: Fix np_ip bracket issue by removing np_ip · e77fb714
      Andy Grover authored
      commit 76c28f1f upstream.
      
      Revert commit 1997e625, which causes double brackets on ipv6
      inaddr_any addresses.
      
      Since we have np_sockaddr, if we need a textual representation we can
      use "%pISc".
      
      Change iscsit_add_network_portal() and iscsit_add_np() signatures to remove
      *ip_str parameter.
      
      Fix and extend some comments earlier in the function.
      
      Tested to work for :: and ::1 via iscsiadm, previously :: failed, see
      https://bugzilla.redhat.com/show_bug.cgi?id=1249107 .
      Signed-off-by: default avatarAndy Grover <agrover@redhat.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e77fb714