1. 22 Feb, 2014 5 commits
    • Naoya Horiguchi's avatar
      mm/memory-failure.c: move refcount only in !MF_COUNT_INCREASED · 2186bb4e
      Naoya Horiguchi authored
      commit 8d547ff4 upstream.
      
      mce-test detected a test failure when injecting error to a thp tail
      page.  This is because we take page refcount of the tail page in
      madvise_hwpoison() while the fix in commit a3e0f9e4
      ("mm/memory-failure.c: transfer page count from head page to tail page
      after split thp") assumes that we always take refcount on the head page.
      
      When a real memory error happens we take refcount on the head page where
      memory_failure() is called without MF_COUNT_INCREASED set, so it seems
      to me that testing memory error on thp tail page using madvise makes
      little sense.
      
      This patch cancels moving refcount in !MF_COUNT_INCREASED for valid
      testing.
      
      [akpm@linux-foundation.org: s/&&/&/]
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Chen Gong <gong.chen@linux.intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2186bb4e
    • Eric W. Biederman's avatar
      fs/file.c:fdtable: avoid triggering OOMs from alloc_fdmem · 434672c0
      Eric W. Biederman authored
      commit 96c7a2ff upstream.
      
      Recently due to a spike in connections per second memcached on 3
      separate boxes triggered the OOM killer from accept.  At the time the
      OOM killer was triggered there was 4GB out of 36GB free in zone 1.  The
      problem was that alloc_fdtable was allocating an order 3 page (32KiB) to
      hold a bitmap, and there was sufficient fragmentation that the largest
      page available was 8KiB.
      
      I find the logic that PAGE_ALLOC_COSTLY_ORDER can't fail pretty dubious
      but I do agree that order 3 allocations are very likely to succeed.
      
      There are always pathologies where order > 0 allocations can fail when
      there are copious amounts of free memory available.  Using the pigeon
      hole principle it is easy to show that it requires 1 page more than 50%
      of the pages being free to guarantee an order 1 (8KiB) allocation will
      succeed, 1 page more than 75% of the pages being free to guarantee an
      order 2 (16KiB) allocation will succeed and 1 page more than 87.5% of
      the pages being free to guarantee an order 3 allocate will succeed.
      
      A server churning memory with a lot of small requests and replies like
      memcached is a common case that if anything can will skew the odds
      against large pages being available.
      
      Therefore let's not give external applications a practical way to kill
      linux server applications, and specify __GFP_NORETRY to the kmalloc in
      alloc_fdmem.  Unless I am misreading the code and by the time the code
      reaches should_alloc_retry in __alloc_pages_slowpath (where
      __GFP_NORETRY becomes signification).  We have already tried everything
      reasonable to allocate a page and the only thing left to do is wait.  So
      not waiting and falling back to vmalloc immediately seems like the
      reasonable thing to do even if there wasn't a chance of triggering the
      OOM killer.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: Cong Wang <cwang@twopensource.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      434672c0
    • Frediano Ziglio's avatar
      xen: Fix possible user space selector corruption · 119c1d25
      Frediano Ziglio authored
      commit 7cde9b27 upstream.
      
      Due to the way kernel is initialized under Xen is possible that the
      ring1 selector used by the kernel for the boot cpu end up to be copied
      to userspace leading to segmentation fault in the userspace.
      
      Xen code in the kernel initialize no-boot cpus with correct selectors (ds
      and es set to __USER_DS) but the boot one keep the ring1 (passed by Xen).
      On task context switch (switch_to) we assume that ds, es and cs already
      point to __USER_DS and __KERNEL_CSso these selector are not changed.
      
      If processor is an Intel that support sysenter instruction sysenter/sysexit
      is used so ds and es are not restored switching back from kernel to
      userspace. In the case the selectors point to a ring1 instead of __USER_DS
      the userspace code will crash on first memory access attempt (to be
      precise Xen on the emulated iret used to do sysexit will detect and set ds
      and es to zero which lead to GPF anyway).
      
      Now if an userspace process call kernel using sysenter and get rescheduled
      (for me it happen on a specific init calling wait4) could happen that the
      ring1 selector is set to ds and es.
      
      This is quite hard to detect cause after a while these selectors are fixed
      (__USER_DS seems sticky).
      
      Bisecting the code commit 7076aada appears
      to be the first one that have this issue.
      Signed-off-by: default avatarFrediano Ziglio <frediano.ziglio@citrix.com>
      Signed-off-by: default avatarStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Reviewed-by: default avatarAndrew Cooper <andrew.cooper3@citrix.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      119c1d25
    • David Vrabel's avatar
      xen/p2m: check MFN is in range before using the m2p table · 3d048e58
      David Vrabel authored
      commit 0160676b upstream.
      
      On hosts with more than 168 GB of memory, a 32-bit guest may attempt
      to grant map an MFN that is error cannot lookup in its mapping of the
      m2p table.  There is an m2p lookup as part of m2p_add_override() and
      m2p_remove_override().  The lookup falls off the end of the mapped
      portion of the m2p and (because the mapping is at the highest virtual
      address) wraps around and the lookup causes a fault on what appears to
      be a user space address.
      
      do_page_fault() (thinking it's a fault to a userspace address), tries
      to lock mm->mmap_sem.  If the gntdev device is used for the grant map,
      m2p_add_override() is called from from gnttab_mmap() with mm->mmap_sem
      already locked.  do_page_fault() then deadlocks.
      
      The deadlock would most commonly occur when a 64-bit guest is started
      and xenconsoled attempts to grant map its console ring.
      
      Introduce mfn_to_pfn_no_overrides() which checks the MFN is within the
      mapped portion of the m2p table before accessing the table and use
      this in m2p_add_override(), m2p_remove_override(), and mfn_to_pfn()
      (which already had the correct range check).
      
      All faults caused by accessing the non-existant parts of the m2p are
      thus within the kernel address space and exception_fixup() is called
      without trying to lock mm->mmap_sem.
      
      This means that for MFNs that are outside the mapped range of the m2p
      then mfn_to_pfn() will always look in the m2p overrides.  This is
      correct because it must be a foreign MFN (and the PFN in the m2p in
      this case is only relevant for the other domain).
      
      v3: check for auto_translated_physmap in mfn_to_pfn_no_overrides()
      v2: in mfn_to_pfn() look in m2p_overrides if the MFN is out of
          range as it's probably foreign.
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: default avatarStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3d048e58
    • David Vrabel's avatar
      xen-blkfront: handle backend CLOSED without CLOSING · 80ead821
      David Vrabel authored
      commit 36613717 upstream.
      
      Backend drivers shouldn't transistion to CLOSED unless the frontend is
      CLOSED.  If a backend does transition to CLOSED too soon then the
      frontend may not see the CLOSING state and will not properly shutdown.
      
      So, treat an unexpected backend CLOSED state the same as CLOSING.
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Acked-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      80ead821
  2. 20 Feb, 2014 26 commits
    • Greg Kroah-Hartman's avatar
      Linux 3.10.31 · a43e02cf
      Greg Kroah-Hartman authored
      a43e02cf
    • Xishi Qiu's avatar
      mm: fix process accidentally killed by mce because of huge page migration · 6843d925
      Xishi Qiu authored
      Based on c8721bbb upstream, but only the
      bugfix portion pulled out.
      
      Hi Naoya or Greg,
      
      We found a bug in 3.10.x.
      The problem is that we accidentally have a hwpoisoned hugepage in free
      hugepage list. It could happend in the the following scenario:
      
              process A                           process B
      
        migrate_huge_page
        put_page (old hugepage)
          linked to free hugepage list
                                           hugetlb_fault
                                             hugetlb_no_page
                                               alloc_huge_page
                                                 dequeue_huge_page_vma
                                                   dequeue_huge_page_node
                                                     (steal hwpoisoned hugepage)
        set_page_hwpoison_huge_page
        dequeue_hwpoisoned_huge_page
          (fail to dequeue)
      
      I tested this bug, one process keeps allocating huge page, and I 
      use sysfs interface to soft offline a huge page, then received:
      "MCE: Killing UCP:2717 due to hardware memory corruption fault at 8200034"
      
      Upstream kernel is free from this bug because of these two commits:
      
      f15bdfa8
      mm/memory-failure.c: fix memory leak in successful soft offlining
      
      c8721bbb
      mm: memory-hotplug: enable memory hotplug to handle hugepage
      
      The first one, although the problem is about memory leak, this patch
      moves unset_migratetype_isolate(), which is important to avoid the race.
      The latter is not a bug fix and it's too big, so I rewrite a small one.
      
      The following patch can fix this bug.(please apply f15bdfa8 first)
      Signed-off-by: default avatarXishi Qiu <qiuxishi@huawei.com>
      Reviewed-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6843d925
    • Jan Kara's avatar
      IB/qib: Convert qib_user_sdma_pin_pages() to use get_user_pages_fast() · 2d9258e4
      Jan Kara authored
      commit 603e7729 upstream.
      
      qib_user_sdma_queue_pkts() gets called with mmap_sem held for
      writing. Except for get_user_pages() deep down in
      qib_user_sdma_pin_pages() we don't seem to need mmap_sem at all.  Even
      more interestingly the function qib_user_sdma_queue_pkts() (and also
      qib_user_sdma_coalesce() called somewhat later) call copy_from_user()
      which can hit a page fault and we deadlock on trying to get mmap_sem
      when handling that fault.
      
      So just make qib_user_sdma_pin_pages() use get_user_pages_fast() and
      leave mmap_sem locking for mm.
      
      This deadlock has actually been observed in the wild when the node
      is under memory pressure.
      Reviewed-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      [Backported to 3.10: (Thanks to Ben Huthings)
       - Adjust context
       - Adjust indentation and nr_pages argument in qib_user_sdma_pin_pages()]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2d9258e4
    • Naoya Horiguchi's avatar
      mm/memory-failure.c: fix memory leak in successful soft offlining · b0d4c0f8
      Naoya Horiguchi authored
      commit f15bdfa8 upstream.
      
      After a successful page migration by soft offlining, the source page is
      not properly freed and it's never reusable even if we unpoison it
      afterward.
      
      This is caused by the race between freeing page and setting PG_hwpoison.
      In successful soft offlining, the source page is put (and the refcount
      becomes 0) by putback_lru_page() in unmap_and_move(), where it's linked
      to pagevec and actual freeing back to buddy is delayed.  So if
      PG_hwpoison is set for the page before freeing, the freeing does not
      functions as expected (in such case freeing aborts in
      free_pages_prepare() check.)
      
      This patch tries to make sure to free the source page before setting
      PG_hwpoison on it.  To avoid reallocating, the page keeps
      MIGRATE_ISOLATE until after setting PG_hwpoison.
      
      This patch also removes obsolete comments about "keeping elevated
      refcount" because what they say is not true.  Unlike memory_failure(),
      soft_offline_page() uses no special page isolation code, and the
      soft-offlined pages have no elevated.
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b0d4c0f8
    • Stanislaw Gruszka's avatar
      pinctrl: protect pinctrl_list add · 1fc42b84
      Stanislaw Gruszka authored
      commit 7b320cb1 upstream.
      
      We have few fedora bug reports about list corruption on pinctrl,
      for example:
      https://bugzilla.redhat.com/show_bug.cgi?id=1051918
      
      Most likely corruption happen due lack of protection of pinctrl_list
      when adding new nodes to it. Patch corrects that.
      
      Fixes: 42fed7ba ("pinctrl: move subsystem mutex to pinctrl_dev struct")
      Signed-off-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Acked-by: default avatarStephen Warren <swarren@nvidia.com>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1fc42b84
    • Tony Prisk's avatar
      pinctrl: vt8500: Change devicetree data parsing · c2b570f3
      Tony Prisk authored
      commit f17248ed upstream.
      
      Due to an assumption in the VT8500 pinctrl driver, the value passed
      from devicetree for 'wm,pull' was not explicitly translated before
      being passed to pinconf.
      
      Since v3.10, changes to 'enum pin_config_param', PIN_CONFIG_BIAS_PULL_(UP/DOWN)
      no longer map 1-to-1 with the expected values in devicetree.
      
      This patch adds a small translation between the devicetree values (0..2)
      and the enum pin_config_param equivalent values.
      Signed-off-by: default avatarTony Prisk <linux@prisktech.co.nz>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c2b570f3
    • Peter Oberparleiter's avatar
      x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y · f878b94f
      Peter Oberparleiter authored
      commit 6583327c upstream.
      
      Commit d61931d8, "x86: Add optimized popcnt variants" introduced
      compile flag -fcall-saved-rdi for lib/hweight.c. When combined with
      options -fprofile-arcs and -O2, this flag causes gcc to generate
      broken constructor code. As a result, a 64 bit x86 kernel compiled
      with CONFIG_GCOV_PROFILE_ALL=y prints message "gcov: could not create
      file" and runs into sproadic BUGs during boot.
      
      The gcc people indicate that these kinds of problems are endemic when
      using ad hoc calling conventions.  It is therefore best to treat any
      file compiled with ad hoc calling conventions as an isolated
      environment and avoid things like profiling or coverage analysis,
      since those subsystems assume a "normal" calling conventions.
      
      This patch avoids the bug by excluding lib/hweight.o from coverage
      profiling.
      Reported-by: default avatarMeelis Roos <mroos@linux.ee>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarPeter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/52F3A30C.7050205@linux.vnet.ibm.comSigned-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f878b94f
    • Dave Jones's avatar
      mxl111sf: Fix compile when CONFIG_DVB_USB_MXL111SF is unset · f19148c7
      Dave Jones authored
      commit 13e1b87c upstream.
      
      Fix the following build error:
      
      drivers/media/usb/dvb-usb-v2/
      mxl111sf-tuner.h:72:9: error: expected ‘;’, ‘,’ or ‘)’ before ‘struct’
               struct mxl111sf_tuner_config *cfg)
      Signed-off-by: default avatarDave Jones <davej@fedoraproject.org>
      Signed-off-by: default avatarMichael Krufky <mkrufky@linuxtv.org>
      Signed-off-by: default avatarMauro Carvalho Chehab <m.chehab@samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f19148c7
    • Antti Palosaari's avatar
      af9035: add ID [2040:f900] Hauppauge WinTV-MiniStick 2 · 728311f5
      Antti Palosaari authored
      commit f2e4c5e0 upstream.
      
      Add USB ID [2040:f900] for Hauppauge WinTV-MiniStick 2.
      Device is build upon IT9135 chipset.
      Tested-by: default avatarStefan Becker <schtefan@gmx.net>
      Signed-off-by: default avatarAntti Palosaari <crope@iki.fi>
      Signed-off-by: default avatarMauro Carvalho Chehab <m.chehab@samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      728311f5
    • Mel Gorman's avatar
      x86: mm: change tlb_flushall_shift for IvyBridge · 9fa5c526
      Mel Gorman authored
      commit f98b7a77 upstream.
      
      There was a large performance regression that was bisected to
      commit 611ae8e3 ("x86/tlb: enable tlb flush range support for
      x86").  This patch simply changes the default balance point
      between a local and global flush for IvyBridge.
      
      In the interest of allowing the tests to be reproduced, this
      patch was tested using mmtests 0.15 with the following
      configurations
      
      	configs/config-global-dhp__tlbflush-performance
      	configs/config-global-dhp__scheduler-performance
      	configs/config-global-dhp__network-performance
      
      Results are from two machines
      
      Ivybridge   4 threads:  Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz
      Ivybridge   8 threads:  Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
      
      Page fault microbenchmark showed nothing interesting.
      
      Ebizzy was configured to run multiple iterations and threads.
      Thread counts ranged from 1 to NR_CPUS*2. For each thread count,
      it ran 100 iterations and each iteration lasted 10 seconds.
      
      Ivybridge 4 threads
                          3.13.0-rc7            3.13.0-rc7
                             vanilla           altshift-v3
      Mean   1     6395.44 (  0.00%)     6789.09 (  6.16%)
      Mean   2     7012.85 (  0.00%)     8052.16 ( 14.82%)
      Mean   3     6403.04 (  0.00%)     6973.74 (  8.91%)
      Mean   4     6135.32 (  0.00%)     6582.33 (  7.29%)
      Mean   5     6095.69 (  0.00%)     6526.68 (  7.07%)
      Mean   6     6114.33 (  0.00%)     6416.64 (  4.94%)
      Mean   7     6085.10 (  0.00%)     6448.51 (  5.97%)
      Mean   8     6120.62 (  0.00%)     6462.97 (  5.59%)
      
      Ivybridge 8 threads
                           3.13.0-rc7            3.13.0-rc7
                              vanilla           altshift-v3
      Mean   1      7336.65 (  0.00%)     7787.02 (  6.14%)
      Mean   2      8218.41 (  0.00%)     9484.13 ( 15.40%)
      Mean   3      7973.62 (  0.00%)     8922.01 ( 11.89%)
      Mean   4      7798.33 (  0.00%)     8567.03 (  9.86%)
      Mean   5      7158.72 (  0.00%)     8214.23 ( 14.74%)
      Mean   6      6852.27 (  0.00%)     7952.45 ( 16.06%)
      Mean   7      6774.65 (  0.00%)     7536.35 ( 11.24%)
      Mean   8      6510.50 (  0.00%)     6894.05 (  5.89%)
      Mean   12     6182.90 (  0.00%)     6661.29 (  7.74%)
      Mean   16     6100.09 (  0.00%)     6608.69 (  8.34%)
      
      Ebizzy hits the worst case scenario for TLB range flushing every
      time and it shows for these Ivybridge CPUs at least that the
      default choice is a poor on.  The patch addresses the problem.
      
      Next was a tlbflush microbenchmark written by Alex Shi at
      http://marc.info/?l=linux-kernel&m=133727348217113 .  It
      measures access costs while the TLB is being flushed.  The
      expectation is that if there are always full TLB flushes that
      the benchmark would suffer and it benefits from range flushing
      
      There are 320 iterations of the test per thread count.  The
      number of entries is randomly selected with a min of 1 and max
      of 512.  To ensure a reasonably even spread of entries, the full
      range is broken up into 8 sections and a random number selected
      within that section.
      
      iteration 1, random number between 0-64
      iteration 2, random number between 64-128 etc
      
      This is still a very weak methodology.  When you do not know
      what are typical ranges, random is a reasonable choice but it
      can be easily argued that the opimisation was for smaller ranges
      and an even spread is not representative of any workload that
      matters.  To improve this, we'd need to know the probability
      distribution of TLB flush range sizes for a set of workloads
      that are considered "common", build a synthetic trace and feed
      that into this benchmark.  Even that is not perfect because it
      would not account for the time between flushes but there are
      limits of what can be reasonably done and still be doing
      something useful.  If a representative synthetic trace is
      provided then this benchmark could be revisited and the shift values retuned.
      
      Ivybridge 4 threads
                              3.13.0-rc7            3.13.0-rc7
                                 vanilla           altshift-v3
      Mean       1       10.50 (  0.00%)       10.50 (  0.03%)
      Mean       2       17.59 (  0.00%)       17.18 (  2.34%)
      Mean       3       22.98 (  0.00%)       21.74 (  5.41%)
      Mean       5       47.13 (  0.00%)       46.23 (  1.92%)
      Mean       8       43.30 (  0.00%)       42.56 (  1.72%)
      
      Ivybridge 8 threads
                               3.13.0-rc7            3.13.0-rc7
                                  vanilla           altshift-v3
      Mean       1         9.45 (  0.00%)        9.36 (  0.93%)
      Mean       2         9.37 (  0.00%)        9.70 ( -3.54%)
      Mean       3         9.36 (  0.00%)        9.29 (  0.70%)
      Mean       5        14.49 (  0.00%)       15.04 ( -3.75%)
      Mean       8        41.08 (  0.00%)       38.73 (  5.71%)
      Mean       13       32.04 (  0.00%)       31.24 (  2.49%)
      Mean       16       40.05 (  0.00%)       39.04 (  2.51%)
      
      For both CPUs, average access time is reduced which is good as
      this is the benchmark that was used to tune the shift values in
      the first place albeit it is now known *how* the benchmark was
      used.
      
      The scheduler benchmarks were somewhat inconclusive.  They
      showed gains and losses and makes me reconsider how stable those
      benchmarks really are or if something else might be interfering
      with the test results recently.
      
      Network benchmarks were inconclusive.  Almost all results were
      flat except for netperf-udp tests on the 4 thread machine.
      These results were unstable and showed large variations between
      reboots.  It is unknown if this is a recent problems but I've
      noticed before that netperf-udp results tend to vary.
      
      Based on these results, changing the default for Ivybridge seems
      like a logical choice.
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Tested-by: default avatarDavidlohr Bueso <davidlohr@hp.com>
      Reviewed-by: default avatarAlex Shi <alex.shi@linaro.org>
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-cqnadffh1tiqrshthRj3Esge@git.kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9fa5c526
    • KOSAKI Motohiro's avatar
      mm: __set_page_dirty uses spin_lock_irqsave instead of spin_lock_irq · 7e405afc
      KOSAKI Motohiro authored
      commit 227d53b3 upstream.
      
      To use spin_{un}lock_irq is dangerous if caller disabled interrupt.
      During aio buffer migration, we have a possibility to see the following
      call stack.
      
      aio_migratepage  [disable interrupt]
        migrate_page_copy
          clear_page_dirty_for_io
            set_page_dirty
              __set_page_dirty_buffers
                __set_page_dirty
                  spin_lock_irq
      
      This mean, current aio migration is a deadlockable.  spin_lock_irqsave
      is a safer alternative and we should use it.
      Signed-off-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reported-by: David Rientjes rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7e405afc
    • KOSAKI Motohiro's avatar
      mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq() · dce0b4fc
      KOSAKI Motohiro authored
      commit a85d9df1 upstream.
      
      During aio stress test, we observed the following lockdep warning.  This
      mean AIO+numa_balancing is currently deadlockable.
      
      The problem is, aio_migratepage disable interrupt, but
      __set_page_dirty_nobuffers unintentionally enable it again.
      
      Generally, all helper function should use spin_lock_irqsave() instead of
      spin_lock_irq() because they don't know caller at all.
      
         other info that might help us debug this:
          Possible unsafe locking scenario:
      
                CPU0
                ----
           lock(&(&ctx->completion_lock)->rlock);
           <Interrupt>
             lock(&(&ctx->completion_lock)->rlock);
      
          *** DEADLOCK ***
      
            dump_stack+0x19/0x1b
            print_usage_bug+0x1f7/0x208
            mark_lock+0x21d/0x2a0
            mark_held_locks+0xb9/0x140
            trace_hardirqs_on_caller+0x105/0x1d0
            trace_hardirqs_on+0xd/0x10
            _raw_spin_unlock_irq+0x2c/0x50
            __set_page_dirty_nobuffers+0x8c/0xf0
            migrate_page_copy+0x434/0x540
            aio_migratepage+0xb1/0x140
            move_to_new_page+0x7d/0x230
            migrate_pages+0x5e5/0x700
            migrate_misplaced_page+0xbc/0xf0
            do_numa_page+0x102/0x190
            handle_pte_fault+0x241/0x970
            handle_mm_fault+0x265/0x370
            __do_page_fault+0x172/0x5a0
            do_page_fault+0x1a/0x70
            page_fault+0x28/0x30
      Signed-off-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dce0b4fc
    • Takashi Iwai's avatar
      ALSA: hda - Add missing mixer widget for AD1983 · 99d35c4d
      Takashi Iwai authored
      commit c7579fed upstream.
      
      The mixer widget on AD1983 at NID 0x0e was missing in the commit
      [f2f8be43: ALSA: hda - Add aamix NID to AD codecs].
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=70011Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      99d35c4d
    • Takashi Iwai's avatar
      ALSA: hda - Fix missing VREF setup for Mac Pro 1,1 · 1e8968b3
      Takashi Iwai authored
      commit c20f31ec upstream.
      
      Mac Pro 1,1 with ALC889A codec needs the VREF setup on NID 0x18 to
      VREF50, in order to make the speaker working.  The same fixup was
      already needed for MacBook Air 1,1, so we can reuse it.
      Reported-by: default avatarNicolai Beuermann <mail@nico-beuermann.de>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1e8968b3
    • Takashi Iwai's avatar
      ALSA: usb-audio: Add missing kconfig dependecy · c3d49ed2
      Takashi Iwai authored
      commit 4fa71c15 upstream.
      
      The commit 44dcbbb1 introduced the usage of bitreverse helpers but
      forgot to add the dependency.  This patch adds the selection for
      CONFIG_BITREVERSE.
      
      Fixes: 44dcbbb1 ('ALSA: snd-usb: add support for bit-reversed byte formats')
      Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c3d49ed2
    • Vinayak Kale's avatar
      arm64: add DSB after icache flush in __flush_icache_all() · 02599bad
      Vinayak Kale authored
      commit 5044bad4 upstream.
      
      Add DSB after icache flush to complete the cache maintenance operation.
      The function __flush_icache_all() is used only for user space mappings
      and an ISB is not required because of an exception return before executing
      user instructions. An exception return would behave like an ISB.
      Signed-off-by: default avatarVinayak Kale <vkale@apm.com>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      02599bad
    • Nathan Lynch's avatar
      arm64: vdso: fix coarse clock handling · fb569d15
      Nathan Lynch authored
      commit 069b9186 upstream.
      
      When __kernel_clock_gettime is called with a CLOCK_MONOTONIC_COARSE or
      CLOCK_REALTIME_COARSE clock id, it returns incorrectly to whatever the
      caller has placed in x2 ("ret x2" to return from the fast path).  Fix
      this by saving x30/LR to x2 only in code that will call
      __do_get_tspec, restoring x30 afterward, and using a plain "ret" to
      return from the routine.
      
      Also: while the resulting tv_nsec value for CLOCK_REALTIME and
      CLOCK_MONOTONIC must be computed using intermediate values that are
      left-shifted by cs_shift (x12, set by __do_get_tspec), the results for
      coarse clocks should be calculated using unshifted values
      (xtime_coarse_nsec is in units of actual nanoseconds).  The current
      code shifts intermediate values by x12 unconditionally, but x12 is
      uninitialized when servicing a coarse clock.  Fix this by setting x12
      to 0 once we know we are dealing with a coarse clock id.
      Signed-off-by: default avatarNathan Lynch <nathan_lynch@mentor.com>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fb569d15
    • Catalin Marinas's avatar
      arm64: Invalidate the TLB when replacing pmd entries during boot · f35f27e7
      Catalin Marinas authored
      commit a55f9929 upstream.
      
      With the 64K page size configuration, __create_page_tables in head.S
      maps enough memory to get started but using 64K pages rather than 512M
      sections with a single pgd/pud/pmd entry pointing to a pte table.
      create_mapping() may override the pgd/pud/pmd table entry with a block
      (section) one if the RAM size is more than 512MB and aligned correctly.
      For the end of this block to be accessible, the old TLB entry must be
      invalidated.
      Reported-by: default avatarMark Salter <msalter@redhat.com>
      Tested-by: default avatarMark Salter <msalter@redhat.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f35f27e7
    • Will Deacon's avatar
      arm64: vdso: prevent ld from aligning PT_LOAD segments to 64k · 6737eaeb
      Will Deacon authored
      commit 40507403 upstream.
      
      Whilst the text segment for our VDSO is marked as PT_LOAD in the ELF
      headers, it is mapped by the kernel and not actually subject to
      demand-paging. ld doesn't realise this, and emits a p_align field of 64k
      (the maximum supported page size), which conflicts with the load address
      picked by the kernel on 4k systems, which will be 4k aligned. This
      causes GDB to fail with "Failed to read a valid object file image from
      memory" when attempting to load the VDSO.
      
      This patch passes the -n option to ld, which prevents it from aligning
      PT_LOAD segments to the maximum page size.
      Reported-by: default avatarKyle McMartin <kyle@redhat.com>
      Acked-by: default avatarKyle McMartin <kyle@redhat.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6737eaeb
    • Nathan Lynch's avatar
      arm64: vdso: update wtm fields for CLOCK_MONOTONIC_COARSE · b666f382
      Nathan Lynch authored
      commit d4022a33 upstream.
      
      Update wall-to-monotonic fields in the VDSO data page
      unconditionally.  These are used to service CLOCK_MONOTONIC_COARSE,
      which is not guarded by use_syscall.
      Signed-off-by: default avatarNathan Lynch <nathan_lynch@mentor.com>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b666f382
    • Lior Amsalem's avatar
      irqchip: armada-370-xp: fix IPI race condition · 86f06cac
      Lior Amsalem authored
      commit a6f089e9 upstream.
      
      In the Armada 370/XP driver, when we receive an IRQ 0, we read the
      list of doorbells that caused the interrupt from register
      ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS. This gives the list of IPIs that
      were generated. However, instead of acknowledging only the IPIs that
      were generated, we acknowledge *all* the IPIs, by writing
      ~IPI_DOORBELL_MASK in the ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS register.
      
      This creates a race condition: if a new IPI that isn't part of the
      ones read into the temporary "ipimask" variable is fired before we
      acknowledge all IPIs, then we will simply loose it. This is causing
      scheduling hangs on SMP intensive workloads.
      
      It is important to mention that this ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS
      register has the following behavior: "A CPU write of 0 clears the bits
      in this field. A CPU write of 1 has no effect". This is what allows us
      to simply write ~ipimask to acknoledge the handled IPIs.
      
      Notice that the same problem is present in the MSI implementation, but
      it will be fixed as a separate patch, so that this IPI fix can be
      pushed to older stable versions as appropriate (all the way to 3.8),
      while the MSI code only appeared in 3.13.
      Signed-off-by: default avatarLior Amsalem <alior@marvell.com>
      Signed-off-by: default avatarThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Fixes: 344e873e 'arm: mvebu: Add IPI support via doorbells'
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarJason Cooper <jason@lakedaemon.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      86f06cac
    • Harald Freudenberger's avatar
      crypto: s390 - fix des and des3_ede ctr concurrency issue · 4cd88080
      Harald Freudenberger authored
      commit ee97dc7d upstream.
      
      In s390 des and 3des ctr mode there is one preallocated page
      used to speed up the en/decryption. This page is not protected
      against concurrent usage and thus there is a potential of data
      corruption with multiple threads.
      
      The fix introduces locking/unlocking the ctr page and a slower
      fallback solution at concurrency situations.
      Signed-off-by: default avatarHarald Freudenberger <freude@linux.vnet.ibm.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4cd88080
    • Harald Freudenberger's avatar
      crypto: s390 - fix des and des3_ede cbc concurrency issue · 21d6cfc9
      Harald Freudenberger authored
      commit adc3fcf1 upstream.
      
      In s390 des and des3_ede cbc mode the iv value is not protected
      against concurrency access and modifications from another running
      en/decrypt operation which is using the very same tfm struct
      instance. This fix copies the iv to the local stack before
      the crypto operation and stores the value back when done.
      Signed-off-by: default avatarHarald Freudenberger <freude@linux.vnet.ibm.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      21d6cfc9
    • Harald Freudenberger's avatar
      crypto: s390 - fix concurrency issue in aes-ctr mode · 5b64d8f7
      Harald Freudenberger authored
      commit 0519e9ad upstream.
      
      The aes-ctr mode uses one preallocated page without any concurrency
      protection. When multiple threads run aes-ctr encryption or decryption
      this can lead to data corruption.
      
      The patch introduces locking for the page and a fallback solution with
      slower en/decryption performance in concurrency situations.
      Signed-off-by: default avatarHarald Freudenberger <freude@linux.vnet.ibm.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5b64d8f7
    • Josef Bacik's avatar
      Btrfs: disable snapshot aware defrag for now · a082f872
      Josef Bacik authored
      commit 8101c8db upstream.
      
      It's just broken and it's taking a lot of effort to fix it, so for now just
      disable it so people can defrag in peace.  Thanks,
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a082f872
    • Stephen Smalley's avatar
      SELinux: Fix kernel BUG on empty security contexts. · 95664c96
      Stephen Smalley authored
      commit 2172fa70 upstream.
      
      Setting an empty security context (length=0) on a file will
      lead to incorrectly dereferencing the type and other fields
      of the security context structure, yielding a kernel BUG.
      As a zero-length security context is never valid, just reject
      all such security contexts whether coming from userspace
      via setxattr or coming from the filesystem upon a getxattr
      request by SELinux.
      
      Setting a security context value (empty or otherwise) unknown to
      SELinux in the first place is only possible for a root process
      (CAP_MAC_ADMIN), and, if running SELinux in enforcing mode, only
      if the corresponding SELinux mac_admin permission is also granted
      to the domain by policy.  In Fedora policies, this is only allowed for
      specific domains such as livecd for setting down security contexts
      that are not defined in the build host policy.
      
      Reproducer:
      su
      setenforce 0
      touch foo
      setfattr -n security.selinux foo
      
      Caveat:
      Relabeling or removing foo after doing the above may not be possible
      without booting with SELinux disabled.  Any subsequent access to foo
      after doing the above will also trigger the BUG.
      
      BUG output from Matthew Thode:
      [  473.893141] ------------[ cut here ]------------
      [  473.962110] kernel BUG at security/selinux/ss/services.c:654!
      [  473.995314] invalid opcode: 0000 [#6] SMP
      [  474.027196] Modules linked in:
      [  474.058118] CPU: 0 PID: 8138 Comm: ls Tainted: G      D   I
      3.13.0-grsec #1
      [  474.116637] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0
      07/29/10
      [  474.149768] task: ffff8805f50cd010 ti: ffff8805f50cd488 task.ti:
      ffff8805f50cd488
      [  474.183707] RIP: 0010:[<ffffffff814681c7>]  [<ffffffff814681c7>]
      context_struct_compute_av+0xce/0x308
      [  474.219954] RSP: 0018:ffff8805c0ac3c38  EFLAGS: 00010246
      [  474.252253] RAX: 0000000000000000 RBX: ffff8805c0ac3d94 RCX:
      0000000000000100
      [  474.287018] RDX: ffff8805e8aac000 RSI: 00000000ffffffff RDI:
      ffff8805e8aaa000
      [  474.321199] RBP: ffff8805c0ac3cb8 R08: 0000000000000010 R09:
      0000000000000006
      [  474.357446] R10: 0000000000000000 R11: ffff8805c567a000 R12:
      0000000000000006
      [  474.419191] R13: ffff8805c2b74e88 R14: 00000000000001da R15:
      0000000000000000
      [  474.453816] FS:  00007f2e75220800(0000) GS:ffff88061fc00000(0000)
      knlGS:0000000000000000
      [  474.489254] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  474.522215] CR2: 00007f2e74716090 CR3: 00000005c085e000 CR4:
      00000000000207f0
      [  474.556058] Stack:
      [  474.584325]  ffff8805c0ac3c98 ffffffff811b549b ffff8805c0ac3c98
      ffff8805f1190a40
      [  474.618913]  ffff8805a6202f08 ffff8805c2b74e88 00068800d0464990
      ffff8805e8aac860
      [  474.653955]  ffff8805c0ac3cb8 000700068113833a ffff880606c75060
      ffff8805c0ac3d94
      [  474.690461] Call Trace:
      [  474.723779]  [<ffffffff811b549b>] ? lookup_fast+0x1cd/0x22a
      [  474.778049]  [<ffffffff81468824>] security_compute_av+0xf4/0x20b
      [  474.811398]  [<ffffffff8196f419>] avc_compute_av+0x2a/0x179
      [  474.843813]  [<ffffffff8145727b>] avc_has_perm+0x45/0xf4
      [  474.875694]  [<ffffffff81457d0e>] inode_has_perm+0x2a/0x31
      [  474.907370]  [<ffffffff81457e76>] selinux_inode_getattr+0x3c/0x3e
      [  474.938726]  [<ffffffff81455cf6>] security_inode_getattr+0x1b/0x22
      [  474.970036]  [<ffffffff811b057d>] vfs_getattr+0x19/0x2d
      [  475.000618]  [<ffffffff811b05e5>] vfs_fstatat+0x54/0x91
      [  475.030402]  [<ffffffff811b063b>] vfs_lstat+0x19/0x1b
      [  475.061097]  [<ffffffff811b077e>] SyS_newlstat+0x15/0x30
      [  475.094595]  [<ffffffff8113c5c1>] ? __audit_syscall_entry+0xa1/0xc3
      [  475.148405]  [<ffffffff8197791e>] system_call_fastpath+0x16/0x1b
      [  475.179201] Code: 00 48 85 c0 48 89 45 b8 75 02 0f 0b 48 8b 45 a0 48
      8b 3d 45 d0 b6 00 8b 40 08 89 c6 ff ce e8 d1 b0 06 00 48 85 c0 49 89 c7
      75 02 <0f> 0b 48 8b 45 b8 4c 8b 28 eb 1e 49 8d 7d 08 be 80 01 00 00 e8
      [  475.255884] RIP  [<ffffffff814681c7>]
      context_struct_compute_av+0xce/0x308
      [  475.296120]  RSP <ffff8805c0ac3c38>
      [  475.328734] ---[ end trace f076482e9d754adc ]---
      Reported-by: default avatarMatthew Thode <mthode@mthode.org>
      Signed-off-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: default avatarPaul Moore <pmoore@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      95664c96
  3. 13 Feb, 2014 9 commits
    • Greg Kroah-Hartman's avatar
      Linux 3.10.30 · 29b5f720
      Greg Kroah-Hartman authored
      29b5f720
    • Dirk Brandewie's avatar
      intel_pstate: Correct calculation of min pstate value · 0df520d4
      Dirk Brandewie authored
      commit 7244cb62 upstream.
      
      The minimum pstate is supposed to be a percentage of the maximum P
      state available.  Calculate min using max pstate and not the
      current max which may have been limited by the user
      Signed-off-by: default avatarDirk Brandewie <dirk.j.brandewie@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0df520d4
    • Brennan Shacklett's avatar
      intel_pstate: Improve accuracy by not truncating until final result · e34ce30f
      Brennan Shacklett authored
      commit d253d2a5 upstream.
      
      This patch addresses Bug 60727
      (https://bugzilla.kernel.org/show_bug.cgi?id=60727)
      which was due to the truncation of intermediate values in the
      calculations, which causes the code to consistently underestimate the
      current cpu frequency, specifically 100% cpu utilization was truncated
      down to the setpoint of 97%. This patch fixes the problem by keeping
      the results of all intermediate calculations as fixed point numbers
      rather scaling them back and forth between integers and fixed point.
      
      References: https://bugzilla.kernel.org/show_bug.cgi?id=60727Signed-off-by: default avatarBrennan Shacklett <bpshacklett@gmail.com>
      Acked-by: default avatarDirk Brandewie <dirk.j.brandewie@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e34ce30f
    • Srinivas Pandruvada's avatar
      intel_pstate: fix no_turbo · 3dc642a3
      Srinivas Pandruvada authored
      commit 1ccf7a1c upstream.
      
      When sysfs for no_turbo is set, then also some p states in turbo regions
      are observed. This patch will set IDA Engage bit when no_turbo is set to
      explicitly disengage turbo.
      Signed-off-by: default avatarSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Acked-by: default avatarDirk Brandewie <dirk.j.brandewie@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3dc642a3
    • Nell Hardcastle's avatar
      intel_pstate: Add Haswell CPU models · 0b977de8
      Nell Hardcastle authored
      commit 6cdcdb79 upstream.
      
      Enable the intel_pstate driver for Haswell CPUs. One missing Ivy Bridge
      model (0x3E) is also included. Models referenced from
      tools/power/x86/turbostat/turbostat.c:has_nehalem_turbo_ratio_limit
      Signed-off-by: default avatarNell Hardcastle <nell@spicious.com>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Acked-by: default avatarDirk Brandewie <dirk.j.brandewie@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0b977de8
    • John Stultz's avatar
      timekeeping: Avoid possible deadlock from clock_was_set_delayed · d9e8fada
      John Stultz authored
      commit 6fdda9a9 upstream.
      
      As part of normal operaions, the hrtimer subsystem frequently calls
      into the timekeeping code, creating a locking order of
        hrtimer locks -> timekeeping locks
      
      clock_was_set_delayed() was suppoed to allow us to avoid deadlocks
      between the timekeeping the hrtimer subsystem, so that we could
      notify the hrtimer subsytem the time had changed while holding
      the timekeeping locks. This was done by scheduling delayed work
      that would run later once we were out of the timekeeing code.
      
      But unfortunately the lock chains are complex enoguh that in
      scheduling delayed work, we end up eventually trying to grab
      an hrtimer lock.
      
      Sasha Levin noticed this in testing when the new seqlock lockdep
      enablement triggered the following (somewhat abrieviated) message:
      
      [  251.100221] ======================================================
      [  251.100221] [ INFO: possible circular locking dependency detected ]
      [  251.100221] 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 Not tainted
      [  251.101967] -------------------------------------------------------
      [  251.101967] kworker/10:1/4506 is trying to acquire lock:
      [  251.101967]  (timekeeper_seq){----..}, at: [<ffffffff81160e96>] retrigger_next_event+0x56/0x70
      [  251.101967]
      [  251.101967] but task is already holding lock:
      [  251.101967]  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
      [  251.101967]
      [  251.101967] which lock already depends on the new lock.
      [  251.101967]
      [  251.101967]
      [  251.101967] the existing dependency chain (in reverse order) is:
      [  251.101967]
      -> #5 (hrtimer_bases.lock#11){-.-...}:
      [snipped]
      -> #4 (&rt_b->rt_runtime_lock){-.-...}:
      [snipped]
      -> #3 (&rq->lock){-.-.-.}:
      [snipped]
      -> #2 (&p->pi_lock){-.-.-.}:
      [snipped]
      -> #1 (&(&pool->lock)->rlock){-.-...}:
      [  251.101967]        [<ffffffff81194803>] validate_chain+0x6c3/0x7b0
      [  251.101967]        [<ffffffff81194d9d>] __lock_acquire+0x4ad/0x580
      [  251.101967]        [<ffffffff81194ff2>] lock_acquire+0x182/0x1d0
      [  251.101967]        [<ffffffff84398500>] _raw_spin_lock+0x40/0x80
      [  251.101967]        [<ffffffff81153e69>] __queue_work+0x1a9/0x3f0
      [  251.101967]        [<ffffffff81154168>] queue_work_on+0x98/0x120
      [  251.101967]        [<ffffffff81161351>] clock_was_set_delayed+0x21/0x30
      [  251.101967]        [<ffffffff811c4bd1>] do_adjtimex+0x111/0x160
      [  251.101967]        [<ffffffff811e2711>] compat_sys_adjtimex+0x41/0x70
      [  251.101967]        [<ffffffff843a4b49>] ia32_sysret+0x0/0x5
      [  251.101967]
      -> #0 (timekeeper_seq){----..}:
      [snipped]
      [  251.101967] other info that might help us debug this:
      [  251.101967]
      [  251.101967] Chain exists of:
        timekeeper_seq --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock#11
      
      [  251.101967]  Possible unsafe locking scenario:
      [  251.101967]
      [  251.101967]        CPU0                    CPU1
      [  251.101967]        ----                    ----
      [  251.101967]   lock(hrtimer_bases.lock#11);
      [  251.101967]                                lock(&rt_b->rt_runtime_lock);
      [  251.101967]                                lock(hrtimer_bases.lock#11);
      [  251.101967]   lock(timekeeper_seq);
      [  251.101967]
      [  251.101967]  *** DEADLOCK ***
      [  251.101967]
      [  251.101967] 3 locks held by kworker/10:1/4506:
      [  251.101967]  #0:  (events){.+.+.+}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
      [  251.101967]  #1:  (hrtimer_work){+.+...}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
      [  251.101967]  #2:  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
      [  251.101967]
      [  251.101967] stack backtrace:
      [  251.101967] CPU: 10 PID: 4506 Comm: kworker/10:1 Not tainted 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053
      [  251.101967] Workqueue: events clock_was_set_work
      
      So the best solution is to avoid calling clock_was_set_delayed() while
      holding the timekeeping lock, and instead using a flag variable to
      decide if we should call clock_was_set() once we've released the locks.
      
      This works for the case here, where the do_adjtimex() was the deadlock
      trigger point. Unfortuantely, in update_wall_time() we still hold
      the jiffies lock, which would deadlock with the ipi triggered by
      clock_was_set(), preventing us from calling it even after we drop the
      timekeeping lock. So instead call clock_was_set_delayed() at that point.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Tested-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d9e8fada
    • Borislav Petkov's avatar
      rtc-cmos: Add an alarm disable quirk · 29a16182
      Borislav Petkov authored
      commit d5a1c7e3 upstream.
      
      41c7f742 ("rtc: Disable the alarm in the hardware (v2)") added the
      functionality to disable the RTC wake alarm when shutting down the box.
      
      However, there are at least two b0rked BIOSes we know about:
      
      https://bugzilla.novell.com/show_bug.cgi?id=812592
      https://bugzilla.novell.com/show_bug.cgi?id=805740
      
      where, when wakeup alarm is enabled in the BIOS, the machine reboots
      automatically right after shutdown, regardless of what wakeup time is
      programmed.
      
      Bisecting the issue lead to this patch so disable its functionality with
      a DMI quirk only for those boxes.
      
      Cc: Brecht Machiels <brecht@mos6581.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Rabin Vincent <rabin.vincent@stericsson.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      [jstultz: Changed variable name for clarity, added extra dmi entry]
      Tested-by: default avatarBrecht Machiels <brecht@mos6581.org>
      Tested-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      29a16182
    • John Stultz's avatar
      timekeeping: Fix missing timekeeping_update in suspend path · 226e0f71
      John Stultz authored
      commit 330a1617 upstream.
      
      Since 48cdc135 (Implement a shadow timekeeper), we have to
      call timekeeping_update() after any adjustment to the timekeeping
      structure in order to make sure that any adjustments to the structure
      persist.
      
      In the timekeeping suspend path, we udpate the timekeeper
      structure, so we should be sure to update the shadow-timekeeper
      before releasing the timekeeping locks. Currently this isn't done.
      
      In most cases, the next time related code to run would be
      timekeeping_resume, which does update the shadow-timekeeper, but
      in an abundence of caution, this patch adds the call to
      timekeeping_update() in the suspend path.
      
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      226e0f71
    • John Stultz's avatar
      timekeeping: Fix CLOCK_TAI timer/nanosleep delays · a8ad6b67
      John Stultz authored
      commit 04005f60 upstream.
      
      A think-o in the calculation of the monotonic -> tai time offset
      results in CLOCK_TAI timers and nanosleeps to expire late (the
      latency is ~2x the tai offset).
      
      Fix this by adding the tai offset from the realtime offset instead
      of subtracting.
      
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a8ad6b67