1. 08 Apr, 2015 1 commit
  2. 02 Apr, 2015 9 commits
  3. 31 Mar, 2015 5 commits
  4. 25 Mar, 2015 2 commits
  5. 17 Mar, 2015 1 commit
  6. 13 Mar, 2015 19 commits
    • Keith Busch's avatar
      blk-mq: don't wait in blk_mq_queue_enter() if __GFP_WAIT isn't set · bfd343aa
      Keith Busch authored
      Return -EBUSY if we're unable to enter a queue immediately when
      allocating a blk-mq request without __GFP_WAIT.
      Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      bfd343aa
    • Mike Snitzer's avatar
      blk-mq: export blk_mq_run_hw_queues · b94ec296
      Mike Snitzer authored
      Rename blk_mq_run_queues to blk_mq_run_hw_queues, add async argument,
      and export it.
      
      DM's suspend support must be able to run the queue without starting
      stopped hw queues.
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      b94ec296
    • Mike Snitzer's avatar
      blk-mq: add blk_mq_init_allocated_queue and export blk_mq_register_disk · b62c21b7
      Mike Snitzer authored
      Add a variant of blk_mq_init_queue that allows a previously allocated
      queue to be initialized.  blk_mq_init_allocated_queue models
      blk_init_allocated_queue -- which was also created for DM's use.
      
      DM's approach to device creation requires a placeholder request_queue be
      allocated for use with alloc_dev() but the decision about what type of
      request_queue will be ultimately created is deferred until all component
      devices referenced in the DM table are processed to determine the table
      type (request-based, blk-mq request-based, or bio-based).
      
      Also, because of DM's late finalization of the request_queue type
      the call to blk_mq_register_disk() doesn't happen during alloc_dev().
      Must export blk_mq_register_disk() so that DM can backfill the 'mq' dir
      once the blk-mq queue is fully allocated.
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Reviewed-by: default avatarMing Lei <ming.lei@canonical.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      b62c21b7
    • Jens Axboe's avatar
      Merge branch 'for-linus' into for-4.1/core · 64f9b683
      Jens Axboe authored
      64f9b683
    • Mike Snitzer's avatar
      blk-mq: fix use of incorrect goto label in blk_mq_init_queue error path · 9a30b096
      Mike Snitzer authored
      If percpu_ref_init() fails the allocated q and hctxs must get cleaned
      up; using 'err_map' doesn't allow that to happen.
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Reviewed-by: default avatarMing Lei <ming.lei@canonical.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      9a30b096
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · c202baf0
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "13 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        memcg: disable hierarchy support if bound to the legacy cgroup hierarchy
        mm: reorder can_do_mlock to fix audit denial
        kasan, module: move MODULE_ALIGN macro into <linux/moduleloader.h>
        kasan, module, vmalloc: rework shadow allocation for modules
        fanotify: fix event filtering with FAN_ONDIR set
        mm/nommu.c: export symbol max_mapnr
        arch/c6x/include/asm/pgtable.h: define dummy pgprot_writecombine for !MMU
        nilfs2: fix deadlock of segment constructor during recovery
        mm: cma: fix CMA aligned offset calculation
        mm, hugetlb: close race when setting PageTail for gigantic pages
        mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disabled
        drivers/rtc/rtc-s3c.c: add .needs_src_clk to s3c6410 RTC data
        ocfs2: make append_dio an incompat feature
      c202baf0
    • Vladimir Davydov's avatar
      memcg: disable hierarchy support if bound to the legacy cgroup hierarchy · 7feee590
      Vladimir Davydov authored
      If the memory cgroup controller is initially mounted in the scope of the
      default cgroup hierarchy and then remounted to a legacy hierarchy, it will
      still have hierarchy support enabled, which is incorrect.  We should
      disable hierarchy support if bound to the legacy cgroup hierarchy.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7feee590
    • Jeff Vander Stoep's avatar
      mm: reorder can_do_mlock to fix audit denial · a5a6579d
      Jeff Vander Stoep authored
      A userspace call to mmap(MAP_LOCKED) may result in the successful locking
      of memory while also producing a confusing audit log denial.  can_do_mlock
      checks capable and rlimit.  If either of these return positive
      can_do_mlock returns true.  The capable check leads to an LSM hook used by
      apparmour and selinux which produce the audit denial.  Reordering so
      rlimit is checked first eliminates the denial on success, only recording a
      denial when the lock is unsuccessful as a result of the denial.
      Signed-off-by: default avatarJeff Vander Stoep <jeffv@google.com>
      Acked-by: default avatarNick Kralevich <nnk@google.com>
      Cc: Jeff Vander Stoep <jeffv@google.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Paul Cassella <cassella@cray.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a5a6579d
    • Andrey Ryabinin's avatar
      kasan, module: move MODULE_ALIGN macro into <linux/moduleloader.h> · d3733e5c
      Andrey Ryabinin authored
      include/linux/moduleloader.h is more suitable place for this macro.
      Also change alignment to PAGE_SIZE for CONFIG_KASAN=n as such
      alignment already assumed in several places.
      Signed-off-by: default avatarAndrey Ryabinin <a.ryabinin@samsung.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Acked-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d3733e5c
    • Andrey Ryabinin's avatar
      kasan, module, vmalloc: rework shadow allocation for modules · a5af5aa8
      Andrey Ryabinin authored
      Current approach in handling shadow memory for modules is broken.
      
      Shadow memory could be freed only after memory shadow corresponds it is no
      longer used.  vfree() called from interrupt context could use memory its
      freeing to store 'struct llist_node' in it:
      
          void vfree(const void *addr)
          {
          ...
              if (unlikely(in_interrupt())) {
                  struct vfree_deferred *p = this_cpu_ptr(&vfree_deferred);
                  if (llist_add((struct llist_node *)addr, &p->list))
                          schedule_work(&p->wq);
      
      Later this list node used in free_work() which actually frees memory.
      Currently module_memfree() called in interrupt context will free shadow
      before freeing module's memory which could provoke kernel crash.
      
      So shadow memory should be freed after module's memory.  However, such
      deallocation order could race with kasan_module_alloc() in module_alloc().
      
      Free shadow right before releasing vm area.  At this point vfree()'d
      memory is not used anymore and yet not available for other allocations.
      New VM_KASAN flag used to indicate that vm area has dynamically allocated
      shadow memory so kasan frees shadow only if it was previously allocated.
      Signed-off-by: default avatarAndrey Ryabinin <a.ryabinin@samsung.com>
      Acked-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a5af5aa8
    • Suzuki K. Poulose's avatar
      fanotify: fix event filtering with FAN_ONDIR set · b3c1030d
      Suzuki K. Poulose authored
      With FAN_ONDIR set, the user can end up getting events, which it hasn't
      marked.  This was revealed with fanotify04 testcase failure on
      Linux-4.0-rc1, and is a regression from 3.19, revealed with 66ba93c0
      ("fanotify: don't set FAN_ONDIR implicitly on a marks ignored mask").
      
         # /opt/ltp/testcases/bin/fanotify04
         [ ... ]
        fanotify04    7  TPASS  :  event generated properly for type 100000
        fanotify04    8  TFAIL  :  fanotify04.c:147: got unexpected event 30
        fanotify04    9  TPASS  :  No event as expected
      
      The testcase sets the adds the following marks : FAN_OPEN | FAN_ONDIR for
      a fanotify on a dir.  Then does an open(), followed by close() of the
      directory and expects to see an event FAN_OPEN(0x20).  However, the
      fanotify returns (FAN_OPEN|FAN_CLOSE_NOWRITE(0x10)).  This happens due to
      the flaw in the check for event_mask in fanotify_should_send_event() which
      does:
      
      	if (event_mask & marks_mask & ~marks_ignored_mask)
      		return true;
      
      where, event_mask == (FAN_ONDIR | FAN_CLOSE_NOWRITE),
             marks_mask == (FAN_ONDIR | FAN_OPEN),
             marks_ignored_mask == 0
      
      Fix this by masking the outgoing events to the user, as we already take
      care of FAN_ONDIR and FAN_EVENT_ON_CHILD.
      Signed-off-by: default avatarSuzuki K. Poulose <suzuki.poulose@arm.com>
      Tested-by: default avatarLino Sanfilippo <LinoSanfilippo@gmx.de>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b3c1030d
    • gchen gchen's avatar
      mm/nommu.c: export symbol max_mapnr · 5b8bf307
      gchen gchen authored
      Several modules may need max_mapnr, so export, the related error with
      allmodconfig under c6x:
      
        MODPOST 3327 modules
        ERROR: "max_mapnr" [fs/pstore/ramoops.ko] undefined!
        ERROR: "max_mapnr" [drivers/media/v4l2-core/videobuf2-dma-contig.ko] undefined!
      Signed-off-by: default avatarChen Gang <gang.chen.5i5j@gmail.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5b8bf307
    • Chen Gang's avatar
      arch/c6x/include/asm/pgtable.h: define dummy pgprot_writecombine for !MMU · 65b9ab88
      Chen Gang authored
      When !MMU, asm-generic will not define default pgprot_writecombine, so c6x
      needs to define it by itself.  The related error:
      
          CC [M]  fs/pstore/ram_core.o
        fs/pstore/ram_core.c: In function 'persistent_ram_vmap':
        fs/pstore/ram_core.c:399:10: error: implicit declaration of function 'pgprot_writecombine' [-Werror=implicit-function-declaration]
           prot = pgprot_writecombine(PAGE_KERNEL);
                  ^
        fs/pstore/ram_core.c:399:8: error: incompatible types when assigning to type 'pgprot_t {aka struct <anonymous>}' from type 'int'
           prot = pgprot_writecombine(PAGE_KERNEL);
                ^
      Signed-off-by: default avatarChen Gang <gang.chen.5i5j@gmail.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      65b9ab88
    • Ryusuke Konishi's avatar
      nilfs2: fix deadlock of segment constructor during recovery · 283ee148
      Ryusuke Konishi authored
      According to a report from Yuxuan Shui, nilfs2 in kernel 3.19 got stuck
      during recovery at mount time.  The code path that caused the deadlock was
      as follows:
      
        nilfs_fill_super()
          load_nilfs()
            nilfs_salvage_orphan_logs()
              * Do roll-forwarding, attach segment constructor for recovery,
                and kick it.
      
              nilfs_segctor_thread()
                nilfs_segctor_thread_construct()
                 * A lock is held with nilfs_transaction_lock()
                   nilfs_segctor_do_construct()
                     nilfs_segctor_drop_written_files()
                       iput()
                         iput_final()
                           write_inode_now()
                             writeback_single_inode()
                               __writeback_single_inode()
                                 do_writepages()
                                   nilfs_writepage()
                                     nilfs_construct_dsync_segment()
                                       nilfs_transaction_lock() --> deadlock
      
      This can happen if commit 7ef3ff2f ("nilfs2: fix deadlock of segment
      constructor over I_SYNC flag") is applied and roll-forward recovery was
      performed at mount time.  The roll-forward recovery can happen if datasync
      write is done and the file system crashes immediately after that.  For
      instance, we can reproduce the issue with the following steps:
      
       < nilfs2 is mounted on /nilfs (device: /dev/sdb1) >
       # dd if=/dev/zero of=/nilfs/test bs=4k count=1 && sync
       # dd if=/dev/zero of=/nilfs/test conv=notrunc oflag=dsync bs=4k
       count=1 && reboot -nfh
       < the system will immediately reboot >
       # mount -t nilfs2 /dev/sdb1 /nilfs
      
      The deadlock occurs because iput() can run segment constructor through
      writeback_single_inode() if MS_ACTIVE flag is not set on sb->s_flags.  The
      above commit changed segment constructor so that it calls iput()
      asynchronously for inodes with i_nlink == 0, but that change was
      imperfect.
      
      This fixes the another deadlock by deferring iput() in segment constructor
      even for the case that mount is not finished, that is, for the case that
      MS_ACTIVE flag is not set.
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Reported-by: default avatarYuxuan Shui <yshuiv7@gmail.com>
      Tested-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      283ee148
    • Danesh Petigara's avatar
      mm: cma: fix CMA aligned offset calculation · 850fc430
      Danesh Petigara authored
      The CMA aligned offset calculation is incorrect for non-zero order_per_bit
      values.
      
      For example, if cma->order_per_bit=1, cma->base_pfn= 0x2f800000 and
      align_order=12, the function returns a value of 0x17c00 instead of 0x400.
      
      This patch fixes the CMA aligned offset calculation.
      
      The previous calculation was wrong and would return too-large values for
      the offset, so that when cma_alloc looks for free pages in the bitmap with
      the requested alignment > order_per_bit, it starts too far into the bitmap
      and so CMA allocations will fail despite there actually being plenty of
      free pages remaining.  It will also probably have the wrong alignment.
      With this change, we will get the correct offset into the bitmap.
      
      One affected user is powerpc KVM, which has kvm_cma->order_per_bit set to
      KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, or 18 - 12 = 6.
      
      [gregory.0xf0@gmail.com: changelog additions]
      Signed-off-by: default avatarDanesh Petigara <dpetigara@broadcom.com>
      Reviewed-by: default avatarGregory Fong <gregory.0xf0@gmail.com>
      Acked-by: default avatarMichal Nazarewicz <mina86@mina86.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      850fc430
    • David Rientjes's avatar
      mm, hugetlb: close race when setting PageTail for gigantic pages · 44fc8057
      David Rientjes authored
      Now that gigantic pages are dynamically allocatable, care must be taken to
      ensure that p->first_page is valid before setting PageTail.
      
      If this isn't done, then it is possible to race and have compound_head()
      return NULL.
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Acked-by: default avatarDavidlohr Bueso <dave@stgolabs.net>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: default avatarHillf Danton <hillf.zj@alibaba-inc.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      44fc8057
    • Michal Hocko's avatar
      mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disabled · e009d5dc
      Michal Hocko authored
      Tetsuo Handa has pointed out that __GFP_NOFAIL allocations might fail
      after OOM killer is disabled if the allocation is performed by a kernel
      thread.  This behavior was introduced from the very beginning by
      7f33d49a ("mm, PM/Freezer: Disable OOM killer when tasks are frozen").
       This means that the basic contract for the allocation request is broken
      and the context requesting such an allocation might blow up unexpectedly.
      
      There are basically two ways forward.
      
      1) move oom_killer_disable after kernel threads are frozen.  This has a
         risk that the OOM victim wouldn't be able to finish because it would
         depend on an already frozen kernel thread.  This would be really tricky
         to debug.
      
      2) do not fail GFP_NOFAIL allocation no matter what and risk a
         potential Freezable kernel threads will loop and fail the suspend.
         Incidental allocations after kernel threads are frozen will at least
         dump a warning - if we are lucky and the serial console is still active
         of course...
      
      This patch implements the later option because it is safer.  We would see
      warning rather than allocation failures for the kernel threads which would
      blow up otherwise and have a higher chances to identify __GFP_NOFAIL users
      from deeper pm code.
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarDavid Rientjes <rientjes@gooogle.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e009d5dc
    • Javier Martinez Canillas's avatar
      drivers/rtc/rtc-s3c.c: add .needs_src_clk to s3c6410 RTC data · 8792f777
      Javier Martinez Canillas authored
      Commit df9e26d0 ("rtc: s3c: add support for RTC of Exynos3250 SoC")
      added an "rtc_src" DT property to specify the clock used as a source to
      the S3C real-time clock.
      
      Not all SoCs needs this so commit eaf3a659 ("drivers/rtc/rtc-s3c.c:
      fix initialization failure without rtc source clock") changed to check
      the struct s3c_rtc_data .needs_src_clk to conditionally grab the clock.
      
      But that commit didn't update the data for each IP version so the RTC
      broke on the boards that needs a source clock. This is the case of at
      least Exynos5250 and Exynos5440 which uses the s3c6410 RTC IP block.
      
      This commit fixes the S3C rtc on the Exynos5250 Snow and Exynos5420
      Peach Pit and Pi Chromebooks.
      Signed-off-by: default avatarJavier Martinez Canillas <javier.martinez@collabora.co.uk>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Chanwoo Choi <cw00.choi@samsung.com>
      Cc: Doug Anderson <dianders@chromium.org>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Tyler Baker <tyler.baker@linaro.org>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8792f777
    • Mark Fasheh's avatar
      ocfs2: make append_dio an incompat feature · 18d585f0
      Mark Fasheh authored
      It turns out that making this feature ro_compat isn't quite enough to
      prevent accidental corruption on mount from older kernels.  Ocfs2 (like
      other file systems) will process orphaned inodes even when the user mounts
      in 'ro' mode.  So for the case of a filesystem not knowing the append_dio
      feature, mounting the filesystem could result in orphaned-for-dio files
      being deleted, which we clearly don't want.
      
      So instead, turn this into an incompat flag.
      
      Btw, this is kind of my fault - initially I asked that we add a flag to
      cover the feature and even suggested that we use an ro flag.  It wasn't
      until I was looking through our commits for v4.0-rc1 that I realized we
      actually want this to be incompat.
      Signed-off-by: default avatarMark Fasheh <mfasheh@suse.de>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      18d585f0
  7. 12 Mar, 2015 3 commits