1. 12 Nov, 2015 29 commits
    • Stephen Smalley's avatar
      x86/mm: Set NX on gap between __ex_table and rodata · 29d61448
      Stephen Smalley authored
      commit ab76f7b4 upstream.
      
      Unused space between the end of __ex_table and the start of
      rodata can be left W+x in the kernel page tables.  Extend the
      setting of the NX bit to cover this gap by starting from
      text_end rather than rodata_start.
      
        Before:
        ---[ High Kernel Mapping ]---
        0xffffffff80000000-0xffffffff81000000          16M                               pmd
        0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
        0xffffffff81600000-0xffffffff81754000        1360K     ro                 GLB x  pte
        0xffffffff81754000-0xffffffff81800000         688K     RW                 GLB x  pte
        0xffffffff81800000-0xffffffff81a00000           2M     ro         PSE     GLB NX pmd
        0xffffffff81a00000-0xffffffff81b3b000        1260K     ro                 GLB NX pte
        0xffffffff81b3b000-0xffffffff82000000        4884K     RW                 GLB NX pte
        0xffffffff82000000-0xffffffff82200000           2M     RW         PSE     GLB NX pmd
        0xffffffff82200000-0xffffffffa0000000         478M                               pmd
      
        After:
        ---[ High Kernel Mapping ]---
        0xffffffff80000000-0xffffffff81000000          16M                               pmd
        0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
        0xffffffff81600000-0xffffffff81754000        1360K     ro                 GLB x  pte
        0xffffffff81754000-0xffffffff81800000         688K     RW                 GLB NX pte
        0xffffffff81800000-0xffffffff81a00000           2M     ro         PSE     GLB NX pmd
        0xffffffff81a00000-0xffffffff81b3b000        1260K     ro                 GLB NX pte
        0xffffffff81b3b000-0xffffffff82000000        4884K     RW                 GLB NX pte
        0xffffffff82000000-0xffffffff82200000           2M     RW         PSE     GLB NX pmd
        0xffffffff82200000-0xffffffffa0000000         478M                               pmd
      Signed-off-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1443704662-3138-1-git-send-email-sds@tycho.nsa.govSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      29d61448
    • Lee, Chun-Yi's avatar
      x86/kexec: Fix kexec crash in syscall kexec_file_load() · aee991fa
      Lee, Chun-Yi authored
      commit e3c41e37 upstream.
      
      The original bug is a page fault crash that sometimes happens
      on big machines when preparing ELF headers:
      
          BUG: unable to handle kernel paging request at ffffc90613fc9000
          IP: [<ffffffff8103d645>] prepare_elf64_ram_headers_callback+0x165/0x260
      
      The bug is caused by us under-counting the number of memory ranges
      and subsequently not allocating enough ELF header space for them.
      The bug is typically masked on smaller systems, because the ELF header
      allocation is rounded up to the next page.
      
      This patch modifies the code in fill_up_crash_elf_data() by using
      walk_system_ram_res() instead of walk_system_ram_range() to correctly
      count the max number of crash memory ranges. That's because the
      walk_system_ram_range() filters out small memory regions that
      reside in the same page, but walk_system_ram_res() does not.
      
      Here's how I found the bug:
      
      After tracing prepare_elf64_headers() and prepare_elf64_ram_headers_callback(),
      the code uses walk_system_ram_res() to fill-in crash memory regions information
      to the program header, so it counts those small memory regions that
      reside in a page area.
      
      But, when the kernel was using walk_system_ram_range() in
      fill_up_crash_elf_data() to count the number of crash memory regions,
      it filters out small regions.
      
      I printed those small memory regions, for example:
      
        kexec: Get nr_ram ranges. vaddr=0xffff880077592258 paddr=0x77592258, sz=0xdc0
      
      Based on the code in walk_system_ram_range(), this memory region
      will be filtered out:
      
        pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593
        end_pfn = (0x77592258 + 0xfc0 -1 + 1) >> 12 = 0x77593
        end_pfn - pfn = 0x77593 - 0x77593 = 0  <=== if (end_pfn > pfn) is FALSE
      
      So, the max_nr_ranges that's counted by the kernel doesn't include
      small memory regions - causing us to under-allocate the required space.
      That causes the page fault crash that happens in a later code path
      when preparing ELF headers.
      
      This bug is not easy to reproduce on small machines that have few
      CPUs, because the allocated page aligned ELF buffer has more free
      space to cover those small memory regions' PT_LOAD headers.
      Signed-off-by: default avatarLee, Chun-Yi <jlee@suse.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: kexec@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1443531537-29436-1-git-send-email-jlee@suse.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      aee991fa
    • Dave Airlie's avatar
      drm/dp/mst: drop cancel work sync in the mstb destroy path (v2) · 00e16080
      Dave Airlie authored
      commit 274d8352 upstream.
      
      Since 9eb1e57f
      drm/dp/mst: make sure mst_primary mstb is valid in work function
      
      we validate the mstb structs in the work function, and doing
      that takes a reference. So we should never get here with the
      work function running using the mstb device, only if the work
      function hasn't run yet or is running for another mstb.
      
      So we don't need to sync the work here, this was causing
      lockdep spew as below.
      
      [  +0.000160] =============================================
      [  +0.000001] [ INFO: possible recursive locking detected ]
      [  +0.000002] 3.10.0-320.el7.rhel72.stable.backport.3.x86_64.debug #1 Tainted: G        W      ------------
      [  +0.000001] ---------------------------------------------
      [  +0.000001] kworker/4:2/1262 is trying to acquire lock:
      [  +0.000001]  ((&mgr->work)){+.+.+.}, at: [<ffffffff810b29a5>] flush_work+0x5/0x2e0
      [  +0.000007]
      but task is already holding lock:
      [  +0.000001]  ((&mgr->work)){+.+.+.}, at: [<ffffffff810b57e4>] process_one_work+0x1b4/0x710
      [  +0.000004]
      other info that might help us debug this:
      [  +0.000001]  Possible unsafe locking scenario:
      
      [  +0.000002]        CPU0
      [  +0.000000]        ----
      [  +0.000001]   lock((&mgr->work));
      [  +0.000002]   lock((&mgr->work));
      [  +0.000001]
       *** DEADLOCK ***
      
      [  +0.000001]  May be due to missing lock nesting notation
      
      [  +0.000002] 2 locks held by kworker/4:2/1262:
      [  +0.000001]  #0:  (events_long){.+.+.+}, at: [<ffffffff810b57e4>] process_one_work+0x1b4/0x710
      [  +0.000004]  #1:  ((&mgr->work)){+.+.+.}, at: [<ffffffff810b57e4>] process_one_work+0x1b4/0x710
      [  +0.000003]
      stack backtrace:
      [  +0.000003] CPU: 4 PID: 1262 Comm: kworker/4:2 Tainted: G        W      ------------   3.10.0-320.el7.rhel72.stable.backport.3.x86_64.debug #1
      [  +0.000001] Hardware name: LENOVO 20EGS0R600/20EGS0R600, BIOS GNET71WW (2.19 ) 02/05/2015
      [  +0.000008] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
      [  +0.000001]  ffffffff82c26c90 00000000a527b914 ffff88046399bae8 ffffffff816fe04d
      [  +0.000004]  ffff88046399bb58 ffffffff8110f47f ffff880461438000 0001009b840fc003
      [  +0.000002]  ffff880461438a98 0000000000000000 0000000804dc26e1 ffffffff824a2c00
      [  +0.000003] Call Trace:
      [  +0.000004]  [<ffffffff816fe04d>] dump_stack+0x19/0x1b
      [  +0.000004]  [<ffffffff8110f47f>] __lock_acquire+0x115f/0x1250
      [  +0.000002]  [<ffffffff8110fd49>] lock_acquire+0x99/0x1e0
      [  +0.000002]  [<ffffffff810b29a5>] ? flush_work+0x5/0x2e0
      [  +0.000002]  [<ffffffff810b29ee>] flush_work+0x4e/0x2e0
      [  +0.000002]  [<ffffffff810b29a5>] ? flush_work+0x5/0x2e0
      [  +0.000004]  [<ffffffff81025905>] ? native_sched_clock+0x35/0x80
      [  +0.000002]  [<ffffffff81025959>] ? sched_clock+0x9/0x10
      [  +0.000002]  [<ffffffff810da1f5>] ? local_clock+0x25/0x30
      [  +0.000002]  [<ffffffff8110dca9>] ? mark_held_locks+0xb9/0x140
      [  +0.000003]  [<ffffffff810b4ed5>] ? __cancel_work_timer+0x95/0x160
      [  +0.000002]  [<ffffffff810b4ee8>] __cancel_work_timer+0xa8/0x160
      [  +0.000002]  [<ffffffff810b4fb0>] cancel_work_sync+0x10/0x20
      [  +0.000007]  [<ffffffffa0160d17>] drm_dp_destroy_mst_branch_device+0x27/0x120 [drm_kms_helper]
      [  +0.000006]  [<ffffffffa0163968>] drm_dp_mst_link_probe_work+0x78/0xa0 [drm_kms_helper]
      [  +0.000002]  [<ffffffff810b5850>] process_one_work+0x220/0x710
      [  +0.000002]  [<ffffffff810b57e4>] ? process_one_work+0x1b4/0x710
      [  +0.000005]  [<ffffffff810b5e5b>] worker_thread+0x11b/0x3a0
      [  +0.000003]  [<ffffffff810b5d40>] ? process_one_work+0x710/0x710
      [  +0.000002]  [<ffffffff810beced>] kthread+0xed/0x100
      [  +0.000003]  [<ffffffff810bec00>] ? insert_kthread_work+0x80/0x80
      [  +0.000003]  [<ffffffff817121d8>] ret_from_fork+0x58/0x90
      
      v2: add flush_work.
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      00e16080
    • Dave Airlie's avatar
      drm/dp/mst: fixup handling hotplug on port removal. · ae2be527
      Dave Airlie authored
      commit df4839fd upstream.
      
      output ports should always have a connector, unless
      in the rare case connector allocation fails in the
      driver.
      
      In this case we only need to teardown the pdt,
      and free the struct, and there is no need to
      send a hotplug msg.
      
      In the case were we add the port to the destroy
      list we need to send a hotplug if we destroy
      any connectors, so userspace knows to reprobe
      stuff.
      
      this patch also handles port->connector allocation
      failing which should be a rare event, but makes
      the code consistent.
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ae2be527
    • Mel Gorman's avatar
      mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault · f3bbe547
      Mel Gorman authored
      commit 2f84a899 upstream.
      
      SunDong reported the following on
      
        https://bugzilla.kernel.org/show_bug.cgi?id=103841
      
      	I think I find a linux bug, I have the test cases is constructed. I
      	can stable recurring problems in fedora22(4.0.4) kernel version,
      	arch for x86_64.  I construct transparent huge page, when the parent
      	and child process with MAP_SHARE, MAP_PRIVATE way to access the same
      	huge page area, it has the opportunity to lead to huge page copy on
      	write failure, and then it will munmap the child corresponding mmap
      	area, but then the child mmap area with VM_MAYSHARE attributes, child
      	process munmap this area can trigger VM_BUG_ON in set_vma_resv_flags
      	functions (vma - > vm_flags & VM_MAYSHARE).
      
      There were a number of problems with the report (e.g.  it's hugetlbfs that
      triggers this, not transparent huge pages) but it was fundamentally
      correct in that a VM_BUG_ON in set_vma_resv_flags() can be triggered that
      looks like this
      
      	 vma ffff8804651fd0d0 start 00007fc474e00000 end 00007fc475e00000
      	 next ffff8804651fd018 prev ffff8804651fd188 mm ffff88046b1b1800
      	 prot 8000000000000027 anon_vma           (null) vm_ops ffffffff8182a7a0
      	 pgoff 0 file ffff88106bdb9800 private_data           (null)
      	 flags: 0x84400fb(read|write|shared|mayread|maywrite|mayexec|mayshare|dontexpand|hugetlb)
      	 ------------
      	 kernel BUG at mm/hugetlb.c:462!
      	 SMP
      	 Modules linked in: xt_pkttype xt_LOG xt_limit [..]
      	 CPU: 38 PID: 26839 Comm: map Not tainted 4.0.4-default #1
      	 Hardware name: Dell Inc. PowerEdge R810/0TT6JF, BIOS 2.7.4 04/26/2012
      	 set_vma_resv_flags+0x2d/0x30
      
      The VM_BUG_ON is correct because private and shared mappings have
      different reservation accounting but the warning clearly shows that the
      VMA is shared.
      
      When a private COW fails to allocate a new page then only the process
      that created the VMA gets the page -- all the children unmap the page.
      If the children access that data in the future then they get killed.
      
      The problem is that the same file is mapped shared and private.  During
      the COW, the allocation fails, the VMAs are traversed to unmap the other
      private pages but a shared VMA is found and the bug is triggered.  This
      patch identifies such VMAs and skips them.
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Reported-by: default avatarSunDong <sund_sky@126.com>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: David Rientjes <rientjes@google.com>
      Reviewed-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      f3bbe547
    • Dirk Müller's avatar
      Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS · 3c67c5eb
      Dirk Müller authored
      commit d2922422 upstream.
      
      The cpu feature flags are not ever going to change, so warning
      everytime can cause a lot of kernel log spam
      (in our case more than 10GB/hour).
      
      The warning seems to only occur when nested virtualization is
      enabled, so it's probably triggered by a KVM bug.  This is a
      sensible and safe change anyway, and the KVM bug fix might not
      be suitable for stable releases anyway.
      Signed-off-by: default avatarDirk Mueller <dmueller@suse.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      3c67c5eb
    • Bandan Das's avatar
      KVM: nSVM: Check for NRIPS support before updating control field · e0ce5838
      Bandan Das authored
      commit f104765b upstream.
      
      If hardware doesn't support DecodeAssist - a feature that provides
      more information about the intercept in the VMCB, KVM decodes the
      instruction and then updates the next_rip vmcb control field.
      However, NRIP support itself depends on cpuid Fn8000_000A_EDX[NRIPS].
      Since skip_emulated_instruction() doesn't verify nrip support
      before accepting control.next_rip as valid, avoid writing this
      field if support isn't present.
      Signed-off-by: default avatarBandan Das <bsd@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      e0ce5838
    • Matt Fleming's avatar
      x86/efi: Fix boot crash by mapping EFI memmap entries bottom-up at runtime, instead of top-down · 4a76b64a
      Matt Fleming authored
      commit a5caa209 upstream.
      
      Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced
      that signals that the firmware PE/COFF loader supports splitting
      code and data sections of PE/COFF images into separate EFI
      memory map entries. This allows the kernel to map those regions
      with strict memory protections, e.g. EFI_MEMORY_RO for code,
      EFI_MEMORY_XP for data, etc.
      
      Unfortunately, an unwritten requirement of this new feature is
      that the regions need to be mapped with the same offsets
      relative to each other as observed in the EFI memory map. If
      this is not done crashes like this may occur,
      
        BUG: unable to handle kernel paging request at fffffffefe6086dd
        IP: [<fffffffefe6086dd>] 0xfffffffefe6086dd
        Call Trace:
         [<ffffffff8104c90e>] efi_call+0x7e/0x100
         [<ffffffff81602091>] ? virt_efi_set_variable+0x61/0x90
         [<ffffffff8104c583>] efi_delete_dummy_variable+0x63/0x70
         [<ffffffff81f4e4aa>] efi_enter_virtual_mode+0x383/0x392
         [<ffffffff81f37e1b>] start_kernel+0x38a/0x417
         [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c
         [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef
      
      Here 0xfffffffefe6086dd refers to an address the firmware
      expects to be mapped but which the OS never claimed was mapped.
      The issue is that included in these regions are relative
      addresses to other regions which were emitted by the firmware
      toolchain before the "splitting" of sections occurred at
      runtime.
      
      Needless to say, we don't satisfy this unwritten requirement on
      x86_64 and instead map the EFI memory map entries in reverse
      order. The above crash is almost certainly triggerable with any
      kernel newer than v3.13 because that's when we rewrote the EFI
      runtime region mapping code, in commit d2f7cbe7 ("x86/efi:
      Runtime services virtual mapping"). For kernel versions before
      v3.13 things may work by pure luck depending on the
      fragmentation of the kernel virtual address space at the time we
      map the EFI regions.
      
      Instead of mapping the EFI memory map entries in reverse order,
      where entry N has a higher virtual address than entry N+1, map
      them in the same order as they appear in the EFI memory map to
      preserve this relative offset between regions.
      
      This patch has been kept as small as possible with the intention
      that it should be applied aggressively to stable and
      distribution kernels. It is very much a bugfix rather than
      support for a new feature, since when EFI_PROPERTIES_TABLE is
      enabled we must map things as outlined above to even boot - we
      have no way of asking the firmware not to split the code/data
      regions.
      
      In fact, this patch doesn't even make use of the more strict
      memory protections available in UEFI v2.5. That will come later.
      Suggested-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Reported-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chun-Yi <jlee@suse.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: James Bottomley <JBottomley@Odin.com>
      Cc: Lee, Chun-Yi <jlee@suse.com>
      Cc: Leif Lindholm <leif.lindholm@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1443218539-7610-2-git-send-email-matt@codeblueprint.co.ukSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      4a76b64a
    • Ben Hutchings's avatar
      genirq: Fix race in register_irq_proc() · af9f14ca
      Ben Hutchings authored
      commit 95c2b175 upstream.
      
      Per-IRQ directories in procfs are created only when a handler is first
      added to the irqdesc, not when the irqdesc is created.  In the case of
      a shared IRQ, multiple tasks can race to create a directory.  This
      race condition seems to have been present forever, but is easier to
      hit with async probing.
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Link: http://lkml.kernel.org/r/1443266636.2004.2.camel@decadent.org.ukSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      af9f14ca
    • Fabiano Fidêncio's avatar
      drm/qxl: recreate the primary surface when the bo is not primary · b7b62249
      Fabiano Fidêncio authored
      commit 8d0d9401 upstream.
      
      When disabling/enabling a crtc the primary area must be updated
      independently of which crtc has been disabled/enabled.
      
      Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1264735Signed-off-by: default avatarFabiano Fidêncio <fidencio@redhat.com>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      b7b62249
    • Thomas Gleixner's avatar
      x86/process: Add proper bound checks in 64bit get_wchan() · f46e6b3b
      Thomas Gleixner authored
      commit eddd3826 upstream.
      
      Dmitry Vyukov reported the following using trinity and the memory
      error detector AddressSanitizer
      (https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel).
      
      [ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on
      address ffff88002e280000
      [ 124.576801] ffff88002e280000 is located 131938492886538 bytes to
      the left of 28857600-byte region [ffffffff81282e0a, ffffffff82e0830a)
      [ 124.578633] Accessed by thread T10915:
      [ 124.579295] inlined in describe_heap_address
      ./arch/x86/mm/asan/report.c:164
      [ 124.579295] #0 ffffffff810dd277 in asan_report_error
      ./arch/x86/mm/asan/report.c:278
      [ 124.580137] #1 ffffffff810dc6a0 in asan_check_region
      ./arch/x86/mm/asan/asan.c:37
      [ 124.581050] #2 ffffffff810dd423 in __tsan_read8 ??:0
      [ 124.581893] #3 ffffffff8107c093 in get_wchan
      ./arch/x86/kernel/process_64.c:444
      
      The address checks in the 64bit implementation of get_wchan() are
      wrong in several ways:
      
       - The lower bound of the stack is not the start of the stack
         page. It's the start of the stack page plus sizeof (struct
         thread_info)
      
       - The upper bound must be:
      
             top_of_stack - TOP_OF_KERNEL_STACK_PADDING - 2 * sizeof(unsigned long).
      
         The 2 * sizeof(unsigned long) is required because the stack pointer
         points at the frame pointer. The layout on the stack is: ... IP FP
         ... IP FP. So we need to make sure that both IP and FP are in the
         bounds.
      
      Fix the bound checks and get rid of the mix of numeric constants, u64
      and unsigned long. Making all unsigned long allows us to use the same
      function for 32bit as well.
      
      Use READ_ONCE() when accessing the stack. This does not prevent a
      concurrent wakeup of the task and the stack changing, but at least it
      avoids TOCTOU.
      
      Also check task state at the end of the loop. Again that does not
      prevent concurrent changes, but it avoids walking for nothing.
      
      Add proper comments while at it.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Based-on-patch-from: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Reviewed-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: kasan-dev <kasan-dev@googlegroups.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
      Link: http://lkml.kernel.org/r/20150930083302.694788319@linutronix.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      f46e6b3b
    • Andy Lutomirski's avatar
      x86/asm/entry: Create and use a 'TOP_OF_KERNEL_STACK_PADDING' macro · 9f21be21
      Andy Lutomirski authored
      commit 3ee4298f upstream.
      
      x86_32, unlike x86_64, pads the top of the kernel stack, because the
      hardware stack frame formats are variable in size.
      
      Document this padding and give it a name.
      
      This should make no change whatsoever to the compiled kernel
      image. It also doesn't fix any of the current bugs in this area.
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Acked-by: default avatarDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/02bf2f54b8dcb76a62a142b6dfe07d4ef7fc582e.1426009661.git.luto@amacapital.net
      [ Fixed small details, such as a missed magic constant in entry_32.S pointed out by Denys Vlasenko. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      [ kamal: 3.19-stable prereq for
        eddd3826 x86/process: Add proper bound checks in 64bit get_wchan() ]
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      9f21be21
    • Linus Torvalds's avatar
      Initialize msg/shm IPC objects before doing ipc_addid() · c5d00e12
      Linus Torvalds authored
      commit b9a53227 upstream.
      
      As reported by Dmitry Vyukov, we really shouldn't do ipc_addid() before
      having initialized the IPC object state.  Yes, we initialize the IPC
      object in a locked state, but with all the lockless RCU lookup work,
      that IPC object lock no longer means that the state cannot be seen.
      
      We already did this for the IPC semaphore code (see commit e8577d1f:
      "ipc/sem.c: fully initialize sem_array before making it visible") but we
      clearly forgot about msg and shm.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      c5d00e12
    • Paul Burton's avatar
      MIPS: CPS: #ifdef on CONFIG_MIPS_MT_SMP rather than CONFIG_MIPS_MT · 86fddae6
      Paul Burton authored
      commit 7a63076d upstream.
      
      The CONFIG_MIPS_MT symbol can be selected by CONFIG_MIPS_VPE_LOADER in
      addition to CONFIG_MIPS_MT_SMP. We only want MT code in the CPS SMP boot
      vector if we're using MT for SMP. Thus switch the config symbol we ifdef
      against to CONFIG_MIPS_MT_SMP.
      Signed-off-by: default avatarPaul Burton <paul.burton@imgtec.com>
      Cc: Markos Chandras <markos.chandras@imgtec.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/10867/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      86fddae6
    • Paul Burton's avatar
      MIPS: CPS: Don't include MT code in non-MT kernels. · 1e3159d2
      Paul Burton authored
      commit a5b0f6db upstream.
      
      The MT-specific code in mips_cps_boot_vpes can safely be omitted from
      kernels which don't support MT, with the default VPE==0 case being used
      as it would be after the has_mt (Config3.MT) check failed at runtime.
      Discarding the code entirely will save us a few bytes & allow cleaner
      handling of MT ASE instructions by later patches.
      Signed-off-by: default avatarPaul Burton <paul.burton@imgtec.com>
      Cc: Markos Chandras <markos.chandras@imgtec.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/10866/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      1e3159d2
    • Paul Burton's avatar
      MIPS: CPS: Stop dangling delay slot from has_mt. · 3d291824
      Paul Burton authored
      commit 1e5fb282 upstream.
      
      The has_mt macro ended with a branch, leaving its callers with a delay
      slot that would be executed if Config3.MT is not set. However it would
      not be executed if Config3 (or earlier Config registers) don't exist
      which makes it somewhat inconsistent at best. Fill the delay slot in the
      macro & fix the mips_cps_boot_vpes caller appropriately.
      Signed-off-by: default avatarPaul Burton <paul.burton@imgtec.com>
      Cc: Markos Chandras <markos.chandras@imgtec.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/10865/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      3d291824
    • James Hogan's avatar
      MIPS: dma-default: Fix 32-bit fall back to GFP_DMA · afdfb2e7
      James Hogan authored
      commit 53960059 upstream.
      
      If there is a DMA zone (usually 24bit = 16MB I believe), but no DMA32
      zone, as is the case for some 32-bit kernels, then massage_gfp_flags()
      will cause DMA memory allocated for devices with a 32..63-bit
      coherent_dma_mask to fall back to using __GFP_DMA, even though there may
      only be 32-bits of physical address available anyway.
      
      Correct that case to compare against a mask the size of phys_addr_t
      instead of always using a 64-bit mask.
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Fixes: a2e715a8 ("MIPS: DMA: Fix computation of DMA flags from device's coherent_dma_mask.")
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/9610/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      afdfb2e7
    • Alex Deucher's avatar
      drm/radeon: fix dpms when driver backlight control is disabled · 3da62041
      Alex Deucher authored
      commit ae93580e upstream.
      
      If driver backlight control is disabled, either by driver
      parameter or default per-asic setting, revert to the old behavior.
      
      Fixes a regression in commit:
      4281f46eReviewed-by: default avatarMichel Dänzer <michel.daenzer@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      3da62041
    • Alex Deucher's avatar
      drm/radeon: move bl encoder assignment into bl init · 64353bbf
      Alex Deucher authored
      commit 4cee6a90 upstream.
      
      So that the bl encoder will be null if the GPU does not
      control the backlight.
      Reviewed-by: default avatarMichel Dänzer <michel.daenzer@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      64353bbf
    • Michel Dänzer's avatar
      drm/radeon: Restore LCD backlight level on resume (>= R5xx) · 96aa5ef3
      Michel Dänzer authored
      commit 4281f46e upstream.
      
      Instead of only enabling the backlight (which seems to set it to max
      brightness), just re-set the current backlight level, which also takes
      care of enabling the backlight if necessary.
      
      Only the radeon_atom_encoder_dpms_dig part tested on a Kaveri laptop,
      the radeon_atom_encoder_dpms_avivo part is only compile tested.
      Signed-off-by: default avatarMichel Dänzer <michel.daenzer@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      96aa5ef3
    • shengyong's avatar
      UBI: return ENOSPC if no enough space available · 220dc62f
      shengyong authored
      commit 7c7feb2e upstream.
      
      UBI: attaching mtd1 to ubi0
      UBI: scanning is finished
      UBI error: init_volumes: not enough PEBs, required 706, available 686
      UBI error: ubi_wl_init: no enough physical eraseblocks (-20, need 1)
      UBI error: ubi_attach_mtd_dev: failed to attach mtd1, error -12 <= NOT ENOMEM
      UBI error: ubi_init: cannot attach mtd1
      
      If available PEBs are not enough when initializing volumes, return -ENOSPC
      directly. If available PEBs are not enough when initializing WL, return
      -ENOSPC instead of -ENOMEM.
      Signed-off-by: default avatarSheng Yong <shengyong1@huawei.com>
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      Reviewed-by: default avatarDavid Gstir <david@sigma-star.at>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      220dc62f
    • Richard Weinberger's avatar
      UBI: Validate data_size · 5cddde9e
      Richard Weinberger authored
      commit 281fda27 upstream.
      
      Make sure that data_size is less than LEB size.
      Otherwise a handcrafted UBI image is able to trigger
      an out of bounds memory access in ubi_compare_lebs().
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      Reviewed-by: default avatarDavid Gstir <david@sigma-star.at>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      5cddde9e
    • Andreas Schwab's avatar
      m68k: Define asmlinkage_protect · 0c88ed43
      Andreas Schwab authored
      commit 8474ba74 upstream.
      
      Make sure the compiler does not modify arguments of syscall functions.
      This can happen if the compiler generates a tailcall to another
      function.  For example, without asmlinkage_protect sys_openat is compiled
      into this function:
      
      sys_openat:
      	clr.l %d0
      	move.w 18(%sp),%d0
      	move.l %d0,16(%sp)
      	jbra do_sys_open
      
      Note how the fourth argument is modified in place, modifying the register
      %d4 that gets restored from this stack slot when the function returns to
      user-space.  The caller may expect the register to be unmodified across
      system calls.
      Signed-off-by: default avatarAndreas Schwab <schwab@linux-m68k.org>
      Signed-off-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      0c88ed43
    • Adrian Hunter's avatar
      perf tools: Fix copying of /proc/kcore · 7f7afdb9
      Adrian Hunter authored
      commit b5cabbcb upstream.
      
      A copy of /proc/kcore containing the kernel text can be made to the
      buildid cache. e.g.
      
      	perf buildid-cache -v -k /proc/kcore
      
      To workaround objdump limitations, a copy is also made when annotating
      against /proc/kcore.
      
      The copying process stops working from libelf about v1.62 onwards (the
      problem was found with v1.63).
      
      The cause is that a call to gelf_getphdr() in kcore__add_phdr() fails
      because additional validation has been added to gelf_getphdr().
      
      The use of gelf_getphdr() is a misguided attempt to get default
      initialization of the Gelf_Phdr structure.  That should not be
      necessary because every member of the Gelf_Phdr structure is
      subsequently assigned.  So just remove the call to gelf_getphdr().
      
      Similarly, a call to gelf_getehdr() in gelf_kcore__init() can be
      removed also.
      
      Committer notes:
      
      Note to stable@kernel.org, from Adrian in the cover letter for this
      patchkit:
      
      The "Fix copying of /proc/kcore" problem goes back to v3.13 if you think
      it is important enough for stable.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1443089122-19082-3-git-send-email-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      7f7afdb9
    • Kapileshwar Singh's avatar
      tools lib traceevent: Fix string handling in heterogeneous arch environments · 9f56ed5d
      Kapileshwar Singh authored
      commit c2e4b24f upstream.
      
      When a trace recorded on a 32-bit device is processed with a 64-bit
      binary, the higher 32-bits of the address need to ignored.
      
      The lack of this results in the output of the 64-bit pointer
      value to the trace as the 32-bit address lookup fails in find_printk().
      
      Before:
      
        burn-1778  [003]   548.600305: bputs:   0xc0046db2s: 2cec5c058d98c
      
      After:
      
        burn-1778  [003]   548.600305: bputs:   0xc0046db2s: RT throttling activated
      
      The problem occurs in PRINT_FIELD when the field is recognized as a
      pointer to a string (of the type const char *)
      
      Heterogeneous architectures cases below can arise and should be handled:
      
      * Traces recorded using 32-bit addresses processed on a 64-bit machine
      * Traces recorded using 64-bit addresses processed on a 32-bit machine
      Reported-by: default avatarJuri Lelli <juri.lelli@arm.com>
      Signed-off-by: default avatarKapileshwar Singh <kapileshwar.singh@arm.com>
      Reviewed-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Javi Merino <javi.merino@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/1442928123-13824-1-git-send-email-kapileshwar.singh@arm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      9f56ed5d
    • Salva Peiró's avatar
      staging/dgnc: fix info leak in ioctl · 08ff7938
      Salva Peiró authored
      commit 4b618433 upstream.
      
      The dgnc_mgmt_ioctl() code fails to initialize the 16 _reserved bytes of
      struct digi_dinfo after the ->dinfo_nboards member. Add an explicit
      memset(0) before filling the structure to avoid the info leak.
      Signed-off-by: default avatarSalva Peiró <speirofr@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reference: CVE-2015-7885
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      08ff7938
    • Salva Peiró's avatar
      [media] media/vivid-osd: fix info leak in ioctl · 1f7cfc48
      Salva Peiró authored
      commit eda98796 upstream.
      
      The vivid_fb_ioctl() code fails to initialize the 16 _reserved bytes of
      struct fb_vblank after the ->hcount member. Add an explicit
      memset(0) before filling the structure to avoid the info leak.
      Signed-off-by: default avatarSalva Peiró <speirofr@gmail.com>
      Signed-off-by: default avatarHans Verkuil <hans.verkuil@cisco.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      Reference: CVE-2015-7884
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      1f7cfc48
    • Ben Hutchings's avatar
      ppp, slip: Validate VJ compression slot parameters completely · 875f1a0d
      Ben Hutchings authored
      commit 4ab42d78 upstream.
      
      Currently slhc_init() treats out-of-range values of rslots and tslots
      as equivalent to 0, except that if tslots is too large it will
      dereference a null pointer (CVE-2015-7799).
      
      Add a range-check at the top of the function and make it return an
      ERR_PTR() on error instead of NULL.  Change the callers accordingly.
      
      Compile-tested only.
      Reported-by: default avatar郭永刚 <guoyonggang@360.cn>
      References: http://article.gmane.org/gmane.comp.security.oss.general/17908Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      875f1a0d
    • Ben Hutchings's avatar
      isdn_ppp: Add checks for allocation failure in isdn_ppp_open() · 00e31a21
      Ben Hutchings authored
      commit 0baa57d8 upstream.
      
      Compile-tested only.
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Reference: CVE-2015-7799
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      00e31a21
  2. 30 Oct, 2015 1 commit
  3. 28 Oct, 2015 2 commits
    • Thomas Hellstrom's avatar
      drm/vmwgfx: Fix kernel NULL pointer dereference on older hardware · 64bc2869
      Thomas Hellstrom authored
      commit ed7d78b2 upstream.
      
      The commit "drm/vmwgfx: Fix up user_dmabuf refcounting", while fixing a
      kernel crash introduced a NULL pointer dereference on older hardware.
      Fix this.
      Signed-off-by: default avatarThomas Hellstrom <thellstrom@vmware.com>
      Reviewed-by: default avatarSinclair Yeh <syeh@vmware.com>
      Reviewed-by: default avatarBrian Paul <brianp@vmware.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      64bc2869
    • Filipe Manana's avatar
      Btrfs: update fix for read corruption of compressed and shared extents · d9f75147
      Filipe Manana authored
      commit 808f80b4 upstream.
      
      My previous fix in commit 005efedf ("Btrfs: fix read corruption of
      compressed and shared extents") was effective only if the compressed
      extents cover a file range with a length that is not a multiple of 16
      pages. That's because the detection of when we reached a different range
      of the file that shares the same compressed extent as the previously
      processed range was done at extent_io.c:__do_contiguous_readpages(),
      which covers subranges with a length up to 16 pages, because
      extent_readpages() groups the pages in clusters no larger than 16 pages.
      So fix this by tracking the start of the previously processed file
      range's extent map at extent_readpages().
      
      The following test case for fstests reproduces the issue:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
      
        # real QA test starts here
        _need_to_be_root
        _supported_fs btrfs
        _supported_os Linux
        _require_scratch
        _require_cloner
      
        rm -f $seqres.full
      
        test_clone_and_read_compressed_extent()
        {
            local mount_opts=$1
      
            _scratch_mkfs >>$seqres.full 2>&1
            _scratch_mount $mount_opts
      
            # Create our test file with a single extent of 64Kb that is going to
            # be compressed no matter which compression algo is used (zlib/lzo).
            $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 64K" \
                $SCRATCH_MNT/foo | _filter_xfs_io
      
            # Now clone the compressed extent into an adjacent file offset.
            $CLONER_PROG -s 0 -d $((64 * 1024)) -l $((64 * 1024)) \
                $SCRATCH_MNT/foo $SCRATCH_MNT/foo
      
            echo "File digest before unmount:"
            md5sum $SCRATCH_MNT/foo | _filter_scratch
      
            # Remount the fs or clear the page cache to trigger the bug in
            # btrfs. Because the extent has an uncompressed length that is a
            # multiple of 16 pages, all the pages belonging to the second range
            # of the file (64K to 128K), which points to the same extent as the
            # first range (0K to 64K), had their contents full of zeroes instead
            # of the byte 0xaa. This was a bug exclusively in the read path of
            # compressed extents, the correct data was stored on disk, btrfs
            # just failed to fill in the pages correctly.
            _scratch_remount
      
            echo "File digest after remount:"
            # Must match the digest we got before.
            md5sum $SCRATCH_MNT/foo | _filter_scratch
        }
      
        echo -e "\nTesting with zlib compression..."
        test_clone_and_read_compressed_extent "-o compress=zlib"
      
        _scratch_unmount
      
        echo -e "\nTesting with lzo compression..."
        test_clone_and_read_compressed_extent "-o compress=lzo"
      
        status=0
        exit
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Tested-by: default avatarTimofey Titovets <nefelim4ag@gmail.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      d9f75147
  4. 26 Oct, 2015 8 commits