1. 13 Apr, 2015 2 commits
    • Heiko Carstens's avatar
      s390/hibernate: fix save and restore of kernel text section · d7441949
      Heiko Carstens authored
      Sebastian reported a crash caused by a jump label mismatch after resume.
      This happens because we do not save the kernel text section during suspend
      and therefore also do not restore it during resume, but use the kernel image
      that restores the old system.
      
      This means that after a suspend/resume cycle we lost all modifications done
      to the kernel text section.
      The reason for this is the pfn_is_nosave() function, which incorrectly
      returns that read-only pages don't need to be saved. This is incorrect since
      we mark the kernel text section read-only.
      We still need to make sure to not save and restore pages contained within
      NSS and DCSS segment.
      To fix this add an extra case for the kernel text section and only save
      those pages if they are not contained within an NSS segment.
      
      Fixes the following crash (and the above bugs as well):
      
      Jump label code mismatch at netif_receive_skb_internal+0x28/0xd0
      Found:    c0 04 00 00 00 00
      Expected: c0 f4 00 00 00 11
      New:      c0 04 00 00 00 00
      Kernel panic - not syncing: Corrupted kernel text
      CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.19.0-01975-gb1b096e70f23 #4
      Call Trace:
        [<0000000000113972>] show_stack+0x72/0xf0
        [<000000000081f15e>] dump_stack+0x6e/0x90
        [<000000000081c4e8>] panic+0x108/0x2b0
        [<000000000081be64>] jump_label_bug.isra.2+0x104/0x108
        [<0000000000112176>] __jump_label_transform+0x9e/0xd0
        [<00000000001121e6>] __sm_arch_jump_label_transform+0x3e/0x50
        [<00000000001d1136>] multi_cpu_stop+0x12e/0x170
        [<00000000001d1472>] cpu_stopper_thread+0xb2/0x168
        [<000000000015d2ac>] smpboot_thread_fn+0x134/0x1b0
        [<0000000000158baa>] kthread+0x10a/0x110
        [<0000000000824a86>] kernel_thread_starter+0x6/0xc
      Reported-and-tested-by: default avatarSebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      d7441949
    • Heiko Carstens's avatar
      s390/cacheinfo: add missing facility check · 77bb36e5
      Heiko Carstens authored
      Git commit d97d929f ("s390: move cacheinfo sysfs to generic cacheinfo
      infrastructure") removed the general-instructions-extension availability
      check before the ecag instruction is executed.
      Without this check this may lead to crashes on machines without this facility.
      Therefore add the check again where needed.
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      77bb36e5
  2. 30 Mar, 2015 1 commit
  3. 25 Mar, 2015 20 commits
  4. 13 Mar, 2015 17 commits
    • Martin Schwidefsky's avatar
      s390/mm: limit STACK_RND_MASK for compat tasks · e143fa93
      Martin Schwidefsky authored
      For compat tasks the mmap randomization does not use the maximum
      randomization value from mmap_rnd_mask but the fixed value of 0x7ff.
      This needs to be respected in the definition of STACK_RND_MASK as
      well.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      e143fa93
    • Heiko Carstens's avatar
    • Hendrik Brueckner's avatar
      s390/cpum_sf: add diagnostic sampling event only if it is authorized · 0a648150
      Hendrik Brueckner authored
      The SF_CYCLES_BASIC_DIAG is always registered even if it is turned of in the
      current hardware configuration.  Because diagnostic-sampling is typically not
      turned on in the hardware configuration, do not register this perf event by
      default.  Enable it only if the diagnostic-sampling function is authorized.
      Signed-off-by: default avatarHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      0a648150
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · c202baf0
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "13 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        memcg: disable hierarchy support if bound to the legacy cgroup hierarchy
        mm: reorder can_do_mlock to fix audit denial
        kasan, module: move MODULE_ALIGN macro into <linux/moduleloader.h>
        kasan, module, vmalloc: rework shadow allocation for modules
        fanotify: fix event filtering with FAN_ONDIR set
        mm/nommu.c: export symbol max_mapnr
        arch/c6x/include/asm/pgtable.h: define dummy pgprot_writecombine for !MMU
        nilfs2: fix deadlock of segment constructor during recovery
        mm: cma: fix CMA aligned offset calculation
        mm, hugetlb: close race when setting PageTail for gigantic pages
        mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disabled
        drivers/rtc/rtc-s3c.c: add .needs_src_clk to s3c6410 RTC data
        ocfs2: make append_dio an incompat feature
      c202baf0
    • Vladimir Davydov's avatar
      memcg: disable hierarchy support if bound to the legacy cgroup hierarchy · 7feee590
      Vladimir Davydov authored
      If the memory cgroup controller is initially mounted in the scope of the
      default cgroup hierarchy and then remounted to a legacy hierarchy, it will
      still have hierarchy support enabled, which is incorrect.  We should
      disable hierarchy support if bound to the legacy cgroup hierarchy.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7feee590
    • Jeff Vander Stoep's avatar
      mm: reorder can_do_mlock to fix audit denial · a5a6579d
      Jeff Vander Stoep authored
      A userspace call to mmap(MAP_LOCKED) may result in the successful locking
      of memory while also producing a confusing audit log denial.  can_do_mlock
      checks capable and rlimit.  If either of these return positive
      can_do_mlock returns true.  The capable check leads to an LSM hook used by
      apparmour and selinux which produce the audit denial.  Reordering so
      rlimit is checked first eliminates the denial on success, only recording a
      denial when the lock is unsuccessful as a result of the denial.
      Signed-off-by: default avatarJeff Vander Stoep <jeffv@google.com>
      Acked-by: default avatarNick Kralevich <nnk@google.com>
      Cc: Jeff Vander Stoep <jeffv@google.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Paul Cassella <cassella@cray.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a5a6579d
    • Andrey Ryabinin's avatar
      kasan, module: move MODULE_ALIGN macro into <linux/moduleloader.h> · d3733e5c
      Andrey Ryabinin authored
      include/linux/moduleloader.h is more suitable place for this macro.
      Also change alignment to PAGE_SIZE for CONFIG_KASAN=n as such
      alignment already assumed in several places.
      Signed-off-by: default avatarAndrey Ryabinin <a.ryabinin@samsung.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Acked-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d3733e5c
    • Andrey Ryabinin's avatar
      kasan, module, vmalloc: rework shadow allocation for modules · a5af5aa8
      Andrey Ryabinin authored
      Current approach in handling shadow memory for modules is broken.
      
      Shadow memory could be freed only after memory shadow corresponds it is no
      longer used.  vfree() called from interrupt context could use memory its
      freeing to store 'struct llist_node' in it:
      
          void vfree(const void *addr)
          {
          ...
              if (unlikely(in_interrupt())) {
                  struct vfree_deferred *p = this_cpu_ptr(&vfree_deferred);
                  if (llist_add((struct llist_node *)addr, &p->list))
                          schedule_work(&p->wq);
      
      Later this list node used in free_work() which actually frees memory.
      Currently module_memfree() called in interrupt context will free shadow
      before freeing module's memory which could provoke kernel crash.
      
      So shadow memory should be freed after module's memory.  However, such
      deallocation order could race with kasan_module_alloc() in module_alloc().
      
      Free shadow right before releasing vm area.  At this point vfree()'d
      memory is not used anymore and yet not available for other allocations.
      New VM_KASAN flag used to indicate that vm area has dynamically allocated
      shadow memory so kasan frees shadow only if it was previously allocated.
      Signed-off-by: default avatarAndrey Ryabinin <a.ryabinin@samsung.com>
      Acked-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a5af5aa8
    • Suzuki K. Poulose's avatar
      fanotify: fix event filtering with FAN_ONDIR set · b3c1030d
      Suzuki K. Poulose authored
      With FAN_ONDIR set, the user can end up getting events, which it hasn't
      marked.  This was revealed with fanotify04 testcase failure on
      Linux-4.0-rc1, and is a regression from 3.19, revealed with 66ba93c0
      ("fanotify: don't set FAN_ONDIR implicitly on a marks ignored mask").
      
         # /opt/ltp/testcases/bin/fanotify04
         [ ... ]
        fanotify04    7  TPASS  :  event generated properly for type 100000
        fanotify04    8  TFAIL  :  fanotify04.c:147: got unexpected event 30
        fanotify04    9  TPASS  :  No event as expected
      
      The testcase sets the adds the following marks : FAN_OPEN | FAN_ONDIR for
      a fanotify on a dir.  Then does an open(), followed by close() of the
      directory and expects to see an event FAN_OPEN(0x20).  However, the
      fanotify returns (FAN_OPEN|FAN_CLOSE_NOWRITE(0x10)).  This happens due to
      the flaw in the check for event_mask in fanotify_should_send_event() which
      does:
      
      	if (event_mask & marks_mask & ~marks_ignored_mask)
      		return true;
      
      where, event_mask == (FAN_ONDIR | FAN_CLOSE_NOWRITE),
             marks_mask == (FAN_ONDIR | FAN_OPEN),
             marks_ignored_mask == 0
      
      Fix this by masking the outgoing events to the user, as we already take
      care of FAN_ONDIR and FAN_EVENT_ON_CHILD.
      Signed-off-by: default avatarSuzuki K. Poulose <suzuki.poulose@arm.com>
      Tested-by: default avatarLino Sanfilippo <LinoSanfilippo@gmx.de>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b3c1030d
    • gchen gchen's avatar
      mm/nommu.c: export symbol max_mapnr · 5b8bf307
      gchen gchen authored
      Several modules may need max_mapnr, so export, the related error with
      allmodconfig under c6x:
      
        MODPOST 3327 modules
        ERROR: "max_mapnr" [fs/pstore/ramoops.ko] undefined!
        ERROR: "max_mapnr" [drivers/media/v4l2-core/videobuf2-dma-contig.ko] undefined!
      Signed-off-by: default avatarChen Gang <gang.chen.5i5j@gmail.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5b8bf307
    • Chen Gang's avatar
      arch/c6x/include/asm/pgtable.h: define dummy pgprot_writecombine for !MMU · 65b9ab88
      Chen Gang authored
      When !MMU, asm-generic will not define default pgprot_writecombine, so c6x
      needs to define it by itself.  The related error:
      
          CC [M]  fs/pstore/ram_core.o
        fs/pstore/ram_core.c: In function 'persistent_ram_vmap':
        fs/pstore/ram_core.c:399:10: error: implicit declaration of function 'pgprot_writecombine' [-Werror=implicit-function-declaration]
           prot = pgprot_writecombine(PAGE_KERNEL);
                  ^
        fs/pstore/ram_core.c:399:8: error: incompatible types when assigning to type 'pgprot_t {aka struct <anonymous>}' from type 'int'
           prot = pgprot_writecombine(PAGE_KERNEL);
                ^
      Signed-off-by: default avatarChen Gang <gang.chen.5i5j@gmail.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      65b9ab88
    • Ryusuke Konishi's avatar
      nilfs2: fix deadlock of segment constructor during recovery · 283ee148
      Ryusuke Konishi authored
      According to a report from Yuxuan Shui, nilfs2 in kernel 3.19 got stuck
      during recovery at mount time.  The code path that caused the deadlock was
      as follows:
      
        nilfs_fill_super()
          load_nilfs()
            nilfs_salvage_orphan_logs()
              * Do roll-forwarding, attach segment constructor for recovery,
                and kick it.
      
              nilfs_segctor_thread()
                nilfs_segctor_thread_construct()
                 * A lock is held with nilfs_transaction_lock()
                   nilfs_segctor_do_construct()
                     nilfs_segctor_drop_written_files()
                       iput()
                         iput_final()
                           write_inode_now()
                             writeback_single_inode()
                               __writeback_single_inode()
                                 do_writepages()
                                   nilfs_writepage()
                                     nilfs_construct_dsync_segment()
                                       nilfs_transaction_lock() --> deadlock
      
      This can happen if commit 7ef3ff2f ("nilfs2: fix deadlock of segment
      constructor over I_SYNC flag") is applied and roll-forward recovery was
      performed at mount time.  The roll-forward recovery can happen if datasync
      write is done and the file system crashes immediately after that.  For
      instance, we can reproduce the issue with the following steps:
      
       < nilfs2 is mounted on /nilfs (device: /dev/sdb1) >
       # dd if=/dev/zero of=/nilfs/test bs=4k count=1 && sync
       # dd if=/dev/zero of=/nilfs/test conv=notrunc oflag=dsync bs=4k
       count=1 && reboot -nfh
       < the system will immediately reboot >
       # mount -t nilfs2 /dev/sdb1 /nilfs
      
      The deadlock occurs because iput() can run segment constructor through
      writeback_single_inode() if MS_ACTIVE flag is not set on sb->s_flags.  The
      above commit changed segment constructor so that it calls iput()
      asynchronously for inodes with i_nlink == 0, but that change was
      imperfect.
      
      This fixes the another deadlock by deferring iput() in segment constructor
      even for the case that mount is not finished, that is, for the case that
      MS_ACTIVE flag is not set.
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Reported-by: default avatarYuxuan Shui <yshuiv7@gmail.com>
      Tested-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      283ee148
    • Danesh Petigara's avatar
      mm: cma: fix CMA aligned offset calculation · 850fc430
      Danesh Petigara authored
      The CMA aligned offset calculation is incorrect for non-zero order_per_bit
      values.
      
      For example, if cma->order_per_bit=1, cma->base_pfn= 0x2f800000 and
      align_order=12, the function returns a value of 0x17c00 instead of 0x400.
      
      This patch fixes the CMA aligned offset calculation.
      
      The previous calculation was wrong and would return too-large values for
      the offset, so that when cma_alloc looks for free pages in the bitmap with
      the requested alignment > order_per_bit, it starts too far into the bitmap
      and so CMA allocations will fail despite there actually being plenty of
      free pages remaining.  It will also probably have the wrong alignment.
      With this change, we will get the correct offset into the bitmap.
      
      One affected user is powerpc KVM, which has kvm_cma->order_per_bit set to
      KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, or 18 - 12 = 6.
      
      [gregory.0xf0@gmail.com: changelog additions]
      Signed-off-by: default avatarDanesh Petigara <dpetigara@broadcom.com>
      Reviewed-by: default avatarGregory Fong <gregory.0xf0@gmail.com>
      Acked-by: default avatarMichal Nazarewicz <mina86@mina86.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      850fc430
    • David Rientjes's avatar
      mm, hugetlb: close race when setting PageTail for gigantic pages · 44fc8057
      David Rientjes authored
      Now that gigantic pages are dynamically allocatable, care must be taken to
      ensure that p->first_page is valid before setting PageTail.
      
      If this isn't done, then it is possible to race and have compound_head()
      return NULL.
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Acked-by: default avatarDavidlohr Bueso <dave@stgolabs.net>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: default avatarHillf Danton <hillf.zj@alibaba-inc.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      44fc8057
    • Michal Hocko's avatar
      mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disabled · e009d5dc
      Michal Hocko authored
      Tetsuo Handa has pointed out that __GFP_NOFAIL allocations might fail
      after OOM killer is disabled if the allocation is performed by a kernel
      thread.  This behavior was introduced from the very beginning by
      7f33d49a ("mm, PM/Freezer: Disable OOM killer when tasks are frozen").
       This means that the basic contract for the allocation request is broken
      and the context requesting such an allocation might blow up unexpectedly.
      
      There are basically two ways forward.
      
      1) move oom_killer_disable after kernel threads are frozen.  This has a
         risk that the OOM victim wouldn't be able to finish because it would
         depend on an already frozen kernel thread.  This would be really tricky
         to debug.
      
      2) do not fail GFP_NOFAIL allocation no matter what and risk a
         potential Freezable kernel threads will loop and fail the suspend.
         Incidental allocations after kernel threads are frozen will at least
         dump a warning - if we are lucky and the serial console is still active
         of course...
      
      This patch implements the later option because it is safer.  We would see
      warning rather than allocation failures for the kernel threads which would
      blow up otherwise and have a higher chances to identify __GFP_NOFAIL users
      from deeper pm code.
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarDavid Rientjes <rientjes@gooogle.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e009d5dc
    • Javier Martinez Canillas's avatar
      drivers/rtc/rtc-s3c.c: add .needs_src_clk to s3c6410 RTC data · 8792f777
      Javier Martinez Canillas authored
      Commit df9e26d0 ("rtc: s3c: add support for RTC of Exynos3250 SoC")
      added an "rtc_src" DT property to specify the clock used as a source to
      the S3C real-time clock.
      
      Not all SoCs needs this so commit eaf3a659 ("drivers/rtc/rtc-s3c.c:
      fix initialization failure without rtc source clock") changed to check
      the struct s3c_rtc_data .needs_src_clk to conditionally grab the clock.
      
      But that commit didn't update the data for each IP version so the RTC
      broke on the boards that needs a source clock. This is the case of at
      least Exynos5250 and Exynos5440 which uses the s3c6410 RTC IP block.
      
      This commit fixes the S3C rtc on the Exynos5250 Snow and Exynos5420
      Peach Pit and Pi Chromebooks.
      Signed-off-by: default avatarJavier Martinez Canillas <javier.martinez@collabora.co.uk>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Chanwoo Choi <cw00.choi@samsung.com>
      Cc: Doug Anderson <dianders@chromium.org>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Tyler Baker <tyler.baker@linaro.org>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8792f777
    • Mark Fasheh's avatar
      ocfs2: make append_dio an incompat feature · 18d585f0
      Mark Fasheh authored
      It turns out that making this feature ro_compat isn't quite enough to
      prevent accidental corruption on mount from older kernels.  Ocfs2 (like
      other file systems) will process orphaned inodes even when the user mounts
      in 'ro' mode.  So for the case of a filesystem not knowing the append_dio
      feature, mounting the filesystem could result in orphaned-for-dio files
      being deleted, which we clearly don't want.
      
      So instead, turn this into an incompat flag.
      
      Btw, this is kind of my fault - initially I asked that we add a flag to
      cover the feature and even suggested that we use an ro flag.  It wasn't
      until I was looking through our commits for v4.0-rc1 that I realized we
      actually want this to be incompat.
      Signed-off-by: default avatarMark Fasheh <mfasheh@suse.de>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      18d585f0