1. 20 Jun, 2014 21 commits
    • ARM: imx: fix error handling in ipu device registration · 93a5cb6c
      Emil Goode authored
      commit d1d70e5d upstream.
      
      If we fail to allocate struct platform_device pdev we
      dereference it after the goto label err.
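
The failure mode can be sketched in plain user-space C. This is only an illustration of the pattern, not the i.MX code: `struct fake_pdev`, `fake_pdev_alloc()`, `fake_pdev_put()` and `register_device()` are hypothetical names. The point is that an allocation failure returns directly, because the cleanup label dereferences the pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct platform_device. */
struct fake_pdev {
    int refcount;
};

/* Hypothetical allocator; returns NULL when 'fail' is non-zero. */
static struct fake_pdev *fake_pdev_alloc(int fail)
{
    struct fake_pdev *pdev;

    if (fail)
        return NULL;
    pdev = calloc(1, sizeof(*pdev));
    if (pdev)
        pdev->refcount = 1;
    return pdev;
}

/* Cleanup helper that dereferences its argument, so it must never see NULL. */
static void fake_pdev_put(struct fake_pdev *pdev)
{
    assert(pdev != NULL);
    if (--pdev->refcount == 0)
        free(pdev);
}

/*
 * Returns 0 on success, -1 on failure.
 * alloc_fails and add_fails simulate the two possible failure points.
 */
static int register_device(int alloc_fails, int add_fails)
{
    struct fake_pdev *pdev = fake_pdev_alloc(alloc_fails);

    if (!pdev)
        return -1;          /* nothing to undo: do not jump to err */

    if (add_fails)
        goto err;           /* pdev is valid here, so cleanup is safe */

    fake_pdev_put(pdev);    /* sketch only: release the device right away */
    return 0;

err:
    fake_pdev_put(pdev);
    return -1;
}
```

Jumping to `err` from the allocation-failure branch would hand a NULL pointer to `fake_pdev_put()`, which is the shape of the bug the commit describes.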
      
      This bug was found using coccinelle.
      
      Fixes: afa77ef3 (ARM: mx3: dynamically allocate "ipu-core" devices)
      Signed-off-by: Emil Goode <emilgoode@gmail.com>
      Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
      Signed-off-by: Olof Johansson <olof@lixom.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • SCSI: scsi_transport_sas: move bsg destructor into sas_rphy_remove · 517e5981
      Joe Lawrence authored
      commit 6aa6caff upstream.
      
      The recent change in sysfs, bcdde7e2
      "sysfs: make __sysfs_remove_dir() recursive" revealed an asymmetric
      rphy device creation/deletion sequence in scsi_transport_sas:
      
        modprobe mpt2sas
          sas_rphy_add
            device_add A               rphy->dev
            device_add B               sas_device transport class
            device_add C               sas_end_device transport class
            device_add D               bsg class
      
        rmmod mpt2sas
          sas_rphy_delete
            sas_rphy_remove
              device_del B
              device_del C
              device_del A
                sysfs_remove_group     recursive sysfs dir removal
            sas_rphy_free
              device_del D             warning
      
        where device A is the parent of B, C, and D.
      
      When sas_rphy_free tries to unregister the bsg request queue (device D
      above), the ensuing sysfs cleanup discovers that its sysfs group has
      already been removed and emits a warning, "sysfs group... not found for
      kobject 'end_device-X:0'".
      
      Since bsg creation is a side effect of sas_rphy_add, move its
      complementary removal call into sas_rphy_remove. This imposes the
      following tear-down order for the devices above: D, B, C, A.
      
      Note the sas_device and sas_end_device transport class devices (B and C
      above) are created and destroyed both via the list match traversal in
      attribute_container_device_trigger, so the order in which they are
      handled is fixed. This is fine as long as they are deleted before their
      parent device.
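
The two tear-down orders can be compared with a toy model. This is a sketch under simplifying assumptions, not the sysfs implementation: devices are just flags, deleting parent A silently removes every remaining child (standing in for the recursive sysfs removal), and deleting an already-removed device counts as one warning. `teardown_warnings()` and the `DEV_*` names are made up for the example.

```c
#include <assert.h>

/* A is the parent of B, C and D, as in the commit message. */
enum { DEV_A, DEV_B, DEV_C, DEV_D, DEV_COUNT };

/* Returns how many deletions hit a device that was already gone. */
static int teardown_warnings(const int *order, int n)
{
    int present[DEV_COUNT] = { 1, 1, 1, 1 };
    int warnings = 0;

    for (int i = 0; i < n; i++) {
        int dev = order[i];

        assert(dev >= 0 && dev < DEV_COUNT);
        if (!present[dev]) {
            warnings++;         /* cleanup finds nothing left to remove */
            continue;
        }
        present[dev] = 0;
        if (dev == DEV_A) {
            /* Recursive removal: the parent takes its children with it. */
            present[DEV_B] = 0;
            present[DEV_C] = 0;
            present[DEV_D] = 0;
        }
    }
    return warnings;
}
```

With the order B, C, A, D the model reports one warning (for D); with D, B, C, A it reports none.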
      Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • drm/radeon: handle non-VGA class pci devices with ATRM · 7d683054
      Alex Deucher authored
      commit d8ade352 upstream.
      
      Newer PX systems have non-VGA pci class dGPUs.  Update
      the ATRM fetch method to handle those cases.
      
      bug:
      https://bugzilla.kernel.org/show_bug.cgi?id=75401
      
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • drm/radeon: also try GART for CPU accessed buffers · bac59d1b
      Christian König authored
      commit 54409259 upstream.
      
      Placing them exclusively into VRAM might not work all the time.
      
      Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=78297
      Signed-off-by: Christian König <christian.koenig@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • drm/gf119-/disp: fix nasty bug which can clobber SOR0's clock setup · afb44e17
      Ben Skeggs authored
      commit 0f1d360b upstream.
      
      Fixes a LVDS bleed issue on Lenovo W530 that can occur under a
      number of circumstances.
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • hwmon: (ntc_thermistor) Fix OF device ID mapping · 1b2b80af
      Jean Delvare authored
      commit ead82d67 upstream.
      
      The mapping from OF device IDs to platform device IDs is wrong.
      TYPE_NCPXXWB473 is 0, TYPE_NCPXXWL333 is 1, so
      ntc_thermistor_id[TYPE_NCPXXWB473] is { "ncp15wb473", TYPE_NCPXXWB473 }
      while
      ntc_thermistor_id[TYPE_NCPXXWL333] is { "ncp18wb473", TYPE_NCPXXWB473 }.
      
      So the name is wrong for all but the "ntc,ncp15wb473" entry, and the
      type is wrong for the "ntc,ncp15wl333" entry.
      
      So map the entries by index, it is neither elegant nor robust but at
      least it is correct.
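
A small sketch of why indexing the ID table by type value goes wrong. The first two rows and the enum values come from the description above; the third row and the helper names `lookup_by_type()` and `lookup_by_row()` are assumptions made for the example, and the table is abbreviated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum ntc_type { TYPE_NCPXXWB473 = 0, TYPE_NCPXXWL333 = 1 };

struct id_entry {
    const char *name;
    enum ntc_type type;
};

/* Abbreviated table: several part names share one type. */
static const struct id_entry id_table[] = {
    { "ncp15wb473", TYPE_NCPXXWB473 },  /* row 0 */
    { "ncp18wb473", TYPE_NCPXXWB473 },  /* row 1 */
    { "ncp15wl333", TYPE_NCPXXWL333 },  /* row 2 (assumed position) */
};

#define ID_TABLE_ROWS (sizeof(id_table) / sizeof(id_table[0]))

/* Buggy shape: treats the type value as a row number. */
static const struct id_entry *lookup_by_type(enum ntc_type type)
{
    return &id_table[type];
}

/* Fixed shape: the caller names the row it wants. */
static const struct id_entry *lookup_by_row(size_t row)
{
    assert(row < ID_TABLE_ROWS);
    return &id_table[row];
}
```

Here `lookup_by_type(TYPE_NCPXXWL333)` lands on row 1, which carries the name "ncp18wb473" and the type TYPE_NCPXXWB473, so both name and type are wrong for an "ncp15wl333" part.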
      Signed-off-by: Jean Delvare <jdelvare@suse.de>
      Fixes: 9e8269de hwmon: (ntc_thermistor) Add DT with IIO support to NTC thermistor driver
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Cc: Naveen Krishna Chatradhi <ch.naveen@samsung.com>
      Cc: Doug Anderson <dianders@chromium.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • hwmon: (ntc_thermistor) Fix dependencies · f8239ad5
      Jean Delvare authored
      commit 59cf4243 upstream.
      
      In commit 9e8269de, support was added for ntc_thermistor devices being
      declared in the device tree and implemented on top of IIO. With that
      change, a dependency was added to the ntc_thermistor driver:
      
      	depends on (!OF && !IIO) || (OF && IIO)
      
      This construct has the drawback that the driver can no longer be
      selected when OF is set and IIO isn't, nor when IIO is set and OF is
      not. This is a regression for the original users of the driver.
      
      As the new code depends on IIO and is useless without OF, include it
      only if both are enabled, and set the dependencies accordingly. This
      is clearer, more simple and more correct.
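
The effect of the old expression can be checked by writing it as a boolean function. This is just the quoted Kconfig condition transcribed into C; `old_dependency()` and `dt_iio_code_built()` are names invented for the sketch, and the second function only restates "include it only if both are enabled".

```c
#include <stdbool.h>

/* The old condition: (!OF && !IIO) || (OF && IIO). */
static bool old_dependency(bool of, bool iio)
{
    return (!of && !iio) || (of && iio);
}

/* After the fix, as described: the DT/IIO code is built only with both set. */
static bool dt_iio_code_built(bool of, bool iio)
{
    return of && iio;
}
```

Of the four OF/IIO combinations, `old_dependency()` rejects the two mixed ones (OF without IIO, and IIO without OF), which is the regression described above.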
      Signed-off-by: Jean Delvare <jdelvare@suse.de>
      Fixes: 9e8269de hwmon: (ntc_thermistor) Add DT with IIO support to NTC thermistor driver
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Cc: Naveen Krishna Chatradhi <ch.naveen@samsung.com>
      Cc: Doug Anderson <dianders@chromium.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Documentation: fix DOCBOOKS=... building · 4552eb24
      Johannes Berg authored
      commit e60cbeed upstream.
      
      Prior to commit 42661299 ("[media] DocBook: Move all media docbook
      stuff into its own directory") it was possible to build only a single
      (or more) book(s) by calling, for example
      
          make htmldocs DOCBOOKS=80211.xml
      
      This now fails:
      
          cp: target `.../Documentation/DocBook//media_api' is not a directory
      
      Ignore errors from that copy to make this possible again.
      
      Fixes: 42661299 ("[media] DocBook: Move all media docbook stuff into its own directory")
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Acked-by: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • mm/memory-failure.c: fix memory leak by race between poison and unpoison · 191df2f2
      Naoya Horiguchi authored
      commit 3e030ecc upstream.
      
      When a memory error happens on an in-use page or (free and in-use)
      hugepage, the victim page is isolated with its refcount set to one.
      
      When you try to unpoison it later, unpoison_memory() calls put_page()
      for it twice in order to bring the page back to free page pool (buddy or
      free hugepage list).  However, if another memory error occurs on the
      page which we are unpoisoning, memory_failure() returns without
      releasing the refcount which was incremented in the same call at first,
      which results in a memory leak and inconsistent num_poisoned_pages
      statistics.  This patch fixes it.
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • perf: Fix race in removing an event · c8db9ba0
      Peter Zijlstra authored
      commit 46ce0fe9 upstream.
      
      When removing a (sibling) event we do:
      
      	raw_spin_lock_irq(&ctx->lock);
      	perf_group_detach(event);
      	raw_spin_unlock_irq(&ctx->lock);
      
      	<hole>
      
      	perf_remove_from_context(event);
      		raw_spin_lock_irq(&ctx->lock);
      		...
      		raw_spin_unlock_irq(&ctx->lock);
      
      Now, assuming the event is a sibling, it will be 'unreachable' for
      things like ctx_sched_out() because that iterates the
      groups->siblings, and we just unhooked the sibling.
      
      So, if during <hole> we get ctx_sched_out(), it will miss the event
      and not call event_sched_out() on it, leaving it programmed on the
      PMU.
      
      The subsequent perf_remove_from_context() call will find the ctx is
      inactive and only call list_del_event() to remove the event from all
      other lists.
      
      Hereafter we can proceed to free the event; while still programmed!
      
      Close this hole by moving perf_group_detach() inside the same
      ctx->lock region(s) perf_remove_from_context() has.
      
      The condition on inherited events only in __perf_event_exit_task() is
      likely complete crap because non-inherited events are part of groups
      too and we're tearing down just the same. But leave that for another
      patch.
      
      Most-likely-Fixes: e03a9a55 ("perf: Change close() semantics for group events")
      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Tested-by: Vince Weaver <vincent.weaver@maine.edu>
      Much-staring-at-traces-by: Vince Weaver <vincent.weaver@maine.edu>
      Much-staring-at-traces-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20140505093124.GN17778@laptop.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • perf: Limit perf_event_attr::sample_period to 63 bits · ed8acba3
      Peter Zijlstra authored
      commit 0819b2e3 upstream.
      
      Vince reported that using a large sample_period (one with bit 63 set)
      results in wreckage: while the sample_period is fundamentally
      unsigned (negative periods don't make sense), the way we implement
      things very much relies on signed logic.
      
      So limit sample_period to 63 bits to avoid tripping over this.
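
A minimal sketch of such a limit, assuming a validation helper that rejects any period with the top bit set. `period_has_sign_bit()` and `check_sample_period()` are hypothetical names and the return convention (0 or -1) is chosen for the example; the actual check in the perf code may be written differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when bit 63 is set, i.e. the value cannot be a positive int64_t. */
static bool period_has_sign_bit(uint64_t sample_period)
{
    return (sample_period & (UINT64_C(1) << 63)) != 0;
}

/* Hypothetical validation helper: 0 if the period is usable, -1 otherwise. */
static int check_sample_period(uint64_t sample_period)
{
    return period_has_sign_bit(sample_period) ? -1 : 0;
}
```

Any value up to INT64_MAX passes, so code that later treats the period as a signed 64-bit quantity never sees a negative number.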
      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-p25fhunibl4y3qi0zuqmyf4b@git.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • perf: Prevent false warning in perf_swevent_add · e36e8c8f
      Jiri Olsa authored
      commit 39af6b16 upstream.
      
      The perf cpu offline callback takes down all cpu context
      events and releases swhash->swevent_hlist.
      
      This could race with a task context software event that is just
      being scheduled on this cpu via perf_swevent_add while the cpu
      hotplug code has already cleaned up the event's data.
      
      The race happens in the gap between the cpu notifier code
      and the cpu being actually taken down. Note that only cpu
      ctx events are terminated in the perf cpu hotplug code.
      
      It's easily reproduced with:
        $ perf record -e faults perf bench sched pipe
      
      while putting one of the cpus offline:
        # echo 0 > /sys/devices/system/cpu/cpu1/online
      
      The console emits the following warning:
        WARNING: CPU: 1 PID: 2845 at kernel/events/core.c:5672 perf_swevent_add+0x18d/0x1a0()
        Modules linked in:
        CPU: 1 PID: 2845 Comm: sched-pipe Tainted: G        W    3.14.0+ #256
        Hardware name: Intel Corporation Montevina platform/To be filled by O.E.M., BIOS AMVACRB1.86C.0066.B00.0805070703 05/07/2008
         0000000000000009 ffff880077233ab8 ffffffff81665a23 0000000000200005
         0000000000000000 ffff880077233af8 ffffffff8104732c 0000000000000046
         ffff88007467c800 0000000000000002 ffff88007a9cf2a0 0000000000000001
        Call Trace:
         [<ffffffff81665a23>] dump_stack+0x4f/0x7c
         [<ffffffff8104732c>] warn_slowpath_common+0x8c/0xc0
         [<ffffffff8104737a>] warn_slowpath_null+0x1a/0x20
         [<ffffffff8110fb3d>] perf_swevent_add+0x18d/0x1a0
         [<ffffffff811162ae>] event_sched_in.isra.75+0x9e/0x1f0
         [<ffffffff8111646a>] group_sched_in+0x6a/0x1f0
         [<ffffffff81083dd5>] ? sched_clock_local+0x25/0xa0
         [<ffffffff811167e6>] ctx_sched_in+0x1f6/0x450
         [<ffffffff8111757b>] perf_event_sched_in+0x6b/0xa0
         [<ffffffff81117a4b>] perf_event_context_sched_in+0x7b/0xc0
         [<ffffffff81117ece>] __perf_event_task_sched_in+0x43e/0x460
         [<ffffffff81096f1e>] ? put_lock_stats.isra.18+0xe/0x30
         [<ffffffff8107b3c8>] finish_task_switch+0xb8/0x100
         [<ffffffff8166a7de>] __schedule+0x30e/0xad0
         [<ffffffff81172dd2>] ? pipe_read+0x3e2/0x560
         [<ffffffff8166b45e>] ? preempt_schedule_irq+0x3e/0x70
         [<ffffffff8166b45e>] ? preempt_schedule_irq+0x3e/0x70
         [<ffffffff8166b464>] preempt_schedule_irq+0x44/0x70
         [<ffffffff816707f0>] retint_kernel+0x20/0x30
         [<ffffffff8109e60a>] ? lockdep_sys_exit+0x1a/0x90
         [<ffffffff812a4234>] lockdep_sys_exit_thunk+0x35/0x67
         [<ffffffff81679321>] ? sysret_check+0x5/0x56
      
      Fix this by tracking the cpu hotplug state and displaying
      the WARN only if the current cpu is initialized properly.
      
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1396861448-10097-1-git-send-email-jolsa@redhat.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • sched: Sanitize irq accounting madness · 3cca5deb
      Thomas Gleixner authored
      commit 2d513868 upstream.
      
      Russell reported that irqtime_account_idle_ticks() takes ages due to:
      
             for (i = 0; i < ticks; i++)
                     irqtime_account_process_tick(current, 0, rq);
      
      It's sad that this code was written way _AFTER_ the NOHZ idle
      functionality was available. I charge myself guilty for not paying
      attention when that crap got merged with commit abb74cef ("sched:
      Export ns irqtimes through /proc/stat").
      
      So instead of looping nr_ticks times just apply the whole thing at
      once.
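
The difference between the two shapes can be sketched with a deliberately simplified accounting model. The real irqtime_account_process_tick() does more than add a constant, so this only illustrates replacing a per-tick loop with one batched update; `struct idle_acct`, `TICK_NS` and the function names are assumptions made for the example.

```c
#include <stdint.h>

/* Assumed tick length for the sketch: 1 ms in nanoseconds. */
#define TICK_NS UINT64_C(1000000)

/* Hypothetical accumulator standing in for the real cputime bookkeeping. */
struct idle_acct {
    uint64_t idle_ns;
};

static void account_one_tick(struct idle_acct *acct)
{
    acct->idle_ns += TICK_NS;
}

/* Old shape: one call per elapsed tick, O(ticks) after a long idle period. */
static void account_ticks_loop(struct idle_acct *acct, unsigned long ticks)
{
    for (unsigned long i = 0; i < ticks; i++)
        account_one_tick(acct);
}

/* New shape: apply the whole interval at once, O(1). */
static void account_ticks_batched(struct idle_acct *acct, unsigned long ticks)
{
    acct->idle_ns += (uint64_t)ticks * TICK_NS;
}
```

Both functions leave the accumulator in the same state; only the cost of getting there differs.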
      
      As a side note: The whole cputime_t vs. u64 business in that context
      wants to be cleaned up as well. There is no point in having all these
      back and forth conversions. Let's standardise on u64 nsec for all
      kernel internal accounting and be done with it. Everything else does
      not make sense at all for fine grained accounting. Frederic, can you
      please take care of that?
      Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: Shaun Ruffell <sruffell@digium.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1405022307000.6261@ionos.tec.linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • sched: Use CPUPRI_NR_PRIORITIES instead of MAX_RT_PRIO in cpupri check · 21af1041
      Steven Rostedt (Red Hat) authored
      commit 6227cb00 upstream.
      
      The check at the beginning of cpupri_find() makes sure that the task_pri
      variable does not exceed the cp->pri_to_cpu array length. But that length
      is CPUPRI_NR_PRIORITIES not MAX_RT_PRIO, where it will miss the last two
      priorities in that array.
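
A sketch of the two bounds, assuming MAX_RT_PRIO is 100 and the array has two more slots than that, which matches the "last two priorities" wording above. The constants are restated locally and `index_ok_old()` / `index_ok_new()` are names invented for the example.

```c
#include <stdbool.h>

/* Assumed values for the sketch. */
#define MAX_RT_PRIO           100
#define CPUPRI_NR_PRIORITIES  (MAX_RT_PRIO + 2)

/* Old bound: treats the last two valid indexes as out of range. */
static bool index_ok_old(int task_pri)
{
    return task_pri >= 0 && task_pri < MAX_RT_PRIO;
}

/* New bound: matches the real array length. */
static bool index_ok_new(int task_pri)
{
    return task_pri >= 0 && task_pri < CPUPRI_NR_PRIORITIES;
}
```

Under these assumptions indexes 100 and 101 are valid array slots that the old bound refuses, while 102 is rejected by both.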
      
      As task_pri is computed from convert_prio(), which should never be bigger
      than CPUPRI_NR_PRIORITIES, the check should cause a panic if it is
      hit.
      Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1397015410.5212.13.camel@marge.simpson.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • iommu/vt-d: Fix missing IOTLB flush in intel_iommu_unmap() · 4801cb6b
      David Woodhouse authored
      This is a small excerpt of the upstream commit
      ea8ea460 (iommu/vt-d: Clean up and fix
      page table clear/free behaviour).
      
      This missing IOTLB flush was added as a minor, inconsequential bug-fix
      in commit ea8ea460 ("iommu/vt-d: Clean up and fix page table clear/free
      behaviour") in 3.15. It wasn't originally intended for -stable but a
      couple of users have reported issues which turn out to be fixed by
      adding the missing flush.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • ARM: perf: hook up perf_sample_event_took around pmu irq handling · 22259d1f
      Will Deacon authored
      commit 5f5092e7 upstream.
      
      Since we indirect all of our PMU IRQ handling through a dispatcher, it's
      trivial to hook up perf_sample_event_took to prevent applications such
      as oprofile from generating interrupt storms due to an unrealistically
      low sample period.
      Reported-by: Robert Richter <rric@kernel.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • mm/compaction: make isolate_freepages start at pageblock boundary · 4ce13868
      Vlastimil Babka authored
      commit 49e068f0 upstream.
      
      The compaction freepage scanner implementation in isolate_freepages()
      starts by taking the current cc->free_pfn value as the first pfn.  In a
      for loop, it scans from this first pfn to the end of the pageblock, and
      then subtracts pageblock_nr_pages from the first pfn to obtain the first
      pfn for the next for loop iteration.
      
      This means that when cc->free_pfn starts at offset X rather than being
      aligned on pageblock boundary, the scanner will start at offset X in all
      scanned pageblock, ignoring potentially many free pages.  Currently this
      can happen when
      
       a) zone's end pfn is not pageblock aligned, or
      
       b) through zone->compact_cached_free_pfn with CONFIG_HOLES_IN_ZONE
          enabled and a hole spanning the beginning of a pageblock
      
      This patch fixes the problem by aligning the initial pfn in
      isolate_freepages() to pageblock boundary.  This also permits replacing
      the end-of-pageblock alignment within the for loop with a simple
      pageblock_nr_pages increment.
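
The alignment step can be sketched with a fixed, power-of-two pageblock size. `PAGEBLOCK_NR_PAGES` (512 here), `pageblock_start()` and `pfns_scanned()` are assumptions for the illustration; the kernel derives the block size from its configuration.

```c
/* Assumed pageblock size for the sketch; must be a power of two. */
#define PAGEBLOCK_NR_PAGES 512UL

/* Round a pfn down to the first pfn of its pageblock. */
static unsigned long pageblock_start(unsigned long pfn)
{
    return pfn & ~(PAGEBLOCK_NR_PAGES - 1);
}

/* Number of pfns covered when scanning from start_pfn to the end of its block. */
static unsigned long pfns_scanned(unsigned long start_pfn)
{
    unsigned long block_end = pageblock_start(start_pfn) + PAGEBLOCK_NR_PAGES;

    return block_end - start_pfn;
}
```

Starting at pfn 1000 covers only 24 pfns of that block, and stepping back by one block to pfn 488 again covers only 24, because the offset is carried along. Starting from `pageblock_start(1000)`, which is 512, covers the full 512.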
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Heesub Shin <heesub.shin@samsung.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Dongjun Shin <d.j.shin@samsung.com>
      Cc: Sunghwan Yun <sunghwan.yun@samsung.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • mm: compaction: detect when scanners meet in isolate_freepages · 31662557
      Vlastimil Babka authored
      commit 7ed695e0 upstream.
      
      Compaction of a zone is finished when the migrate scanner (which begins
      at the zone's lowest pfn) meets the free page scanner (which begins at
      the zone's highest pfn).  This is detected in compact_zone() and in the
      case of direct compaction, the compact_blockskip_flush flag is set so
      that kswapd later resets the cached scanner pfn's, and a new compaction
      may again start at the zone's borders.
      
      The meeting of the scanners can happen during either scanner's activity.
      However, it may currently fail to be detected when it occurs in the free
      page scanner, due to two problems.  First, isolate_freepages() keeps
      free_pfn at the highest block where it isolated pages from, for the
      purposes of not missing the pages that are returned back to allocator
      when migration fails.  Second, failing to isolate enough free pages due
      to scanners meeting results in -ENOMEM being returned by
      migrate_pages(), which makes compact_zone() bail out immediately without
      calling compact_finished() that would detect scanners meeting.
      
      This failure to detect scanners meeting might result in repeated
      attempts at compaction of a zone that keep starting from the cached
      pfn's close to the meeting point, and quickly failing through the
      -ENOMEM path, without the cached pfns being reset, over and over.  This
      has been observed (through additional tracepoints) in the third phase of
      the mmtests stress-highalloc benchmark, where the allocator runs on an
      otherwise idle system.  The problem was observed in the DMA32 zone,
      which was used as a fallback to the preferred Normal zone, but on the
      4GB system it was actually the largest zone.  The problem is even
      amplified for such fallback zone - the deferred compaction logic, which
      could (after being fixed by a previous patch) reset the cached scanner
      pfn's, is only applied to the preferred zone and not for the fallbacks.
      
      The problem in the third phase of the benchmark was further amplified by
      commit 81c0a2bb ("mm: page_alloc: fair zone allocator policy") which
      resulted in a non-deterministic regression of the allocation success
      rate from ~85% to ~65%.  This occurs in about half of benchmark runs,
      making bisection problematic.  It is unlikely that the commit itself is
      buggy, but it should put more pressure on the DMA32 zone during phases 1
      and 2, which may leave it more fragmented in phase 3 and expose the bugs
      that this patch fixes.
      
      The fix is to make scanners meeting in isolate_freepages() stay that way,
      and to check in compact_zone() for scanners meeting when migrate_pages()
      returns -ENOMEM.  The result is that compact_finished() also detects
      scanners meeting and sets the compact_blockskip_flush flag to make
      kswapd reset the scanner pfn's.
      
      The results in stress-highalloc benchmark show that the "regression" by
      commit 81c0a2bb in phase 3 no longer occurs, and phase 1 and 2
      allocation success rates are also significantly improved.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • mm: compaction: reset cached scanner pfn's before reading them · 948ec1db
      Vlastimil Babka authored
      commit d3132e4b upstream.
      
      Compaction caches pfn's for its migrate and free scanners to avoid
      scanning the whole zone each time.  In compact_zone(), the cached values
      are read to set up initial values for the scanners.  There are several
      situations when these cached pfn's are reset to the first and last pfn
      of the zone, respectively.  One of these situations is when a compaction
      has been deferred for a zone and is now being restarted during a direct
      compaction, which is also done in compact_zone().
      
      However, compact_zone() currently reads the cached pfn's *before*
      resetting them.  This means the reset doesn't affect the compaction that
      performs it, and with good chance also subsequent compactions, as
      update_pageblock_skip() is likely to be called and update the cached
      pfn's to those being processed.  Another chance for a successful reset
      is when a direct compaction detects that migration and free scanners
      meet (which has its own problems addressed by another patch) and sets
      the compact_blockskip_flush flag, which kswapd uses to do the reset when
      it goes to sleep.
      
      This is clearly a bug that results in non-deterministic behavior, so
      this patch moves the cached pfn reset to be performed *before* the
      values are read.
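
The ordering problem can be reduced to a few lines. This sketch uses a made-up `struct zone_state` and helper names; it models only the migrate scanner's starting pfn and says nothing about the rest of compact_zone().

```c
/* Hypothetical, stripped-down view of the per-zone cached scanner positions. */
struct zone_state {
    unsigned long zone_start_pfn;
    unsigned long zone_end_pfn;
    unsigned long cached_migrate_pfn;
    unsigned long cached_free_pfn;
};

static void reset_cached(struct zone_state *z)
{
    z->cached_migrate_pfn = z->zone_start_pfn;
    z->cached_free_pfn = z->zone_end_pfn;
}

/* Old shape: the cached value is read first, so this run ignores the reset. */
static unsigned long start_read_then_reset(struct zone_state *z, int restart)
{
    unsigned long migrate_pfn = z->cached_migrate_pfn;

    if (restart)
        reset_cached(z);
    return migrate_pfn;
}

/* New shape: reset first, then read, so the restart takes effect at once. */
static unsigned long start_reset_then_read(struct zone_state *z, int restart)
{
    if (restart)
        reset_cached(z);
    return z->cached_migrate_pfn;
}
```

With a cached migrate pfn of 3000 in a zone starting at pfn 0, the old shape starts the restarted run at 3000 while the new shape starts it at 0.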
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • target: Fix NULL pointer dereference for XCOPY in target_put_sess_cmd · 9c7e0735
      Nicholas Bellinger authored
      commit 0ed6e189 upstream.
      
      This patch fixes a NULL pointer dereference regression bug that was
      introduced with:
      
      commit 1e1110c4
      Author: Mikulas Patocka <mpatocka@redhat.com>
      Date:   Sat May 17 06:49:22 2014 -0400
      
          target: fix memory leak on XCOPY
      
      Now that target_put_sess_cmd() -> kref_put_spinlock_irqsave() is
      called with a valid se_cmd->cmd_kref, a NULL pointer dereference
      is triggered because the XCOPY passthrough commands don't have
      an associated se_session pointer.
      
      To address this bug, check for a NULL se_sess pointer
      within target_put_sess_cmd(), and call se_cmd->se_tfo->release_cmd()
      to release the XCOPY's xcopy_pt_cmd memory.
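
The guard can be sketched with stand-in types. `struct session`, `struct cmd`, `release_cmd()` and `put_cmd()` are hypothetical and far simpler than the target core structures; the sketch only shows checking the session pointer before touching it and releasing the command directly when there is none.

```c
#include <stddef.h>

/* Hypothetical stand-ins for se_session and se_cmd. */
struct session {
    int cmd_count;
};

struct cmd {
    struct session *sess;   /* NULL for session-less passthrough commands */
    int released;
};

static void release_cmd(struct cmd *c)
{
    c->released = 1;
}

/* Drops a command; returns 1 once the command has been released. */
static int put_cmd(struct cmd *c)
{
    if (c->sess == NULL) {
        /* No session to update: release the command directly. */
        release_cmd(c);
        return 1;
    }
    c->sess->cmd_count--;   /* safe: sess was checked above */
    release_cmd(c);
    return 1;
}
```

Without the NULL check, the session-less case would dereference `c->sess` and crash, which is the regression described above.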
      Reported-by: Thomas Glanzmann <thomas@glanzmann.de>
      Cc: Thomas Glanzmann <thomas@glanzmann.de>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org # 3.12+
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • btrfs: fix defrag 32-bit integer overflow · 6e8451cb
      Justin Maggard authored
      commit c41570c9 upstream.
      
      When defragging a very large file, the cluster variable can wrap its 32-bit
      signed int type and become negative, which eventually gets passed to
      btrfs_force_ra() as a very large unsigned long value.  On 32-bit platforms,
      this eventually results in an Oops from the SLAB allocator.
      
      Change the cluster and max_cluster signed int variables to unsigned long to
      match the readahead functions.  This also allows the min() comparison in
      btrfs_defrag_file() to work as intended.
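
Two effects of the signed type can be shown in isolation. The helpers below are illustrative only (`as_readahead_len()`, `min_int()` and `min_ul()` are not btrfs functions): converting a negative int to unsigned long yields a huge value, and a signed minimum happily picks the negative operand instead of clamping.

```c
#include <limits.h>

/* Converting a negative int to unsigned long wraps modulo ULONG_MAX + 1. */
static unsigned long as_readahead_len(int cluster)
{
    return (unsigned long)cluster;
}

/* Signed minimum: a negative cluster "wins" and is not clamped. */
static int min_int(int a, int b)
{
    return a < b ? a : b;
}

/* Unsigned minimum: with unsigned long operands the clamp works as intended. */
static unsigned long min_ul(unsigned long a, unsigned long b)
{
    return a < b ? a : b;
}
```

For example, `as_readahead_len(-1)` equals ULONG_MAX, and `min_int(-4096, 256)` returns -4096 rather than limiting the value to 256.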
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
  2. 18 Jun, 2014 7 commits
  3. 11 Jun, 2014 2 commits
  4. 09 Jun, 2014 10 commits