1. 20 Aug, 2016 27 commits
  2. 16 Aug, 2016 13 commits
    • Linux 4.7.1 · 95f15f5e
      Greg Kroah-Hartman authored
    • ext4: fix reference counting bug on block allocation error · 9cab14ce
      Vegard Nossum authored
      commit 554a5ccc upstream.
      
      If we hit this error when mounted with errors=continue or
      errors=remount-ro:
      
          EXT4-fs error (device loop0): ext4_mb_mark_diskspace_used:2940: comm ext4.exe: Allocating blocks 5090-6081 which overlap fs metadata
      
      then ext4_mb_new_blocks() will call ext4_mb_release_context() and try to
      continue. However, ext4_mb_release_context() is the wrong thing to call
      here since we are still actually using the allocation context.
      
      Instead, just error out. We could retry the allocation, but there is a
      possibility of getting stuck in an infinite loop instead, so this seems
      safer.
      
      [ Fixed up so we don't return EAGAIN to userspace. --tytso ]
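
      A sketch of the resulting error path in ext4_mb_mark_diskspace_used(),
      paraphrased from the description above (not necessarily the verbatim
      patch):

          if (!ext4_data_block_valid(sbi, block, len)) {
              ext4_error(sb, "Allocating blocks %llu-%llu which overlap "
                         "fs metadata", block, block + len);
              /* fix the bitmap (we leak some of the blocks here) ... */
              err = -EFSCORRUPTED;    /* error out; not -EAGAIN, which
                                       * would leak to userspace */
              goto out_err;
          }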
      
      Fixes: 8556e8f3 ("ext4: Don't allow new groups to be added during block allocation")
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: short-cut orphan cleanup on error · b58d417d
      Vegard Nossum authored
      commit c65d5c6c upstream.
      
      If we encounter a filesystem error during orphan cleanup, we should stop.
      Otherwise, we may end up in an infinite loop where the same inode is
      processed again and again.
      
          EXT4-fs (loop0): warning: checktime reached, running e2fsck is recommended
          EXT4-fs error (device loop0): ext4_mb_generate_buddy:758: group 2, block bitmap and bg descriptor inconsistent: 6117 vs 0 free clusters
          Aborting journal on device loop0-8.
          EXT4-fs (loop0): Remounting filesystem read-only
          EXT4-fs error (device loop0) in ext4_free_blocks:4895: Journal has aborted
          EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
          EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
          EXT4-fs error (device loop0) in ext4_ext_remove_space:3068: IO failure
          EXT4-fs error (device loop0) in ext4_ext_truncate:4667: Journal has aborted
          EXT4-fs error (device loop0) in ext4_orphan_del:2927: Journal has aborted
          EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
          EXT4-fs (loop0): Inode 16 (00000000618192a0): orphan list check failed!
          [...]
          EXT4-fs (loop0): Inode 16 (0000000061819748): orphan list check failed!
          [...]
          EXT4-fs (loop0): Inode 16 (0000000061819bf0): orphan list check failed!
          [...]
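
      A sketch of the short-cut at the top of the orphan-list loop in
      ext4_orphan_cleanup(), paraphrased from the description above:

          while (es->s_last_orphan) {
              struct inode *inode;

              /* A filesystem error during cleanup means we would keep
               * reprocessing the same inode; stop instead. */
              if (EXT4_SB(sb)->s_mount_state & EXT4_ERROR_FS) {
                  es->s_last_orphan = 0;
                  break;
              }
              /* ... normal orphan processing ... */
          }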
      
      See-also: c9eb13a9 ("ext4: fix hang when processing corrupted orphaned inode list")
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: validate s_reserved_gdt_blocks on mount · 9cfaab73
      Theodore Ts'o authored
      commit 5b9554dc upstream.
      
      If s_reserved_gdt_blocks is extremely large, it's possible for
      ext4_init_block_bitmap(), which is called when ext4 sets up an
      uninitialized block bitmap, to corrupt random kernel memory.  Add the
      same check that e2fsck has --- the value must never be larger than
      blocksize / sizeof(__u32) --- and then add a backup check in
      ext4_init_block_bitmap() in case the superblock gets modified after
      the file system is mounted.
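
      A sketch of the mount-time check, assuming it sits in
      ext4_fill_super() next to the other superblock sanity checks
      (paraphrased; blocksize / 4 is blocksize / sizeof(__u32)):

          if (le16_to_cpu(es->s_reserved_gdt_blocks) > (blocksize / 4)) {
              ext4_msg(sb, KERN_ERR,
                       "Number of reserved GDT blocks insanely large: %d",
                       le16_to_cpu(es->s_reserved_gdt_blocks));
              goto failed_mount;
          }
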
      Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: don't call ext4_should_journal_data() on the journal inode · efc588bf
      Vegard Nossum authored
      commit 6a7fd522 upstream.
      
      If ext4_fill_super() fails early, it's possible for ext4_evict_inode()
      to call ext4_should_journal_data() before superblock options and flags
      are fully set up.  In that case, the iput() on the journal inode can
      end up causing a BUG().
      
      Work around this problem by reordering the tests so we only call
      ext4_should_journal_data() after we know it's not the journal inode.
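
      A sketch of the reordered condition in ext4_evict_inode(), so the
      journal inode short-circuits the test before
      ext4_should_journal_data() can run (paraphrased):

          if (inode->i_ino != EXT4_JOURNAL_INO &&   /* now tested first */
              ext4_should_journal_data(inode) &&
              (S_ISLNK(inode->i_mode) || S_ISREG(inode->i_mode))) {
              /* ... flush journalled data before truncate ... */
          }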
      
      Fixes: 2d859db3 ("ext4: fix data corruption in inodes with journalled data")
      Fixes: 2b405bfa ("ext4: fix data=journal fast mount/umount hang")
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix deadlock during page writeback · 4e23a593
      Jan Kara authored
      commit 646caa9c upstream.
      
      Commit 06bd3c36 (ext4: fix data exposure after a crash) uncovered a
      deadlock in ext4_writepages() which was previously much harder to hit.
      After this commit xfstest generic/130 reproduces the deadlock on small
      filesystems.
      
      The problem happens when ext4_do_update_inode() sets the LARGE_FILE
      feature and marks the current inode handle as synchronous. That
      subsequently causes ext4_journal_stop(), called from
      ext4_writepages(), to block waiting for the transaction commit while
      still holding page locks, a reference to the io_end, and some
      prepared bio in the mpd structure, each of which can block the
      transaction commit from completing, and thus results in a deadlock.
      
      Fix the problem by releasing page locks, io_end reference, and
      submitting prepared bio before calling ext4_journal_stop().
      
      [ Changed to defer the call to ext4_journal_stop() only if the handle
        is synchronous.  --tytso ]
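
      A sketch of the resulting order of operations in ext4_writepages(),
      deferring ext4_journal_stop() only for synchronous handles as the
      note above says (paraphrased, not the verbatim patch):

          /* A sync handle may wait for transaction commit, which can in
           * turn wait on the page locks, io_end reference, and bio we
           * still hold, so release those first. */
          if (!ext4_handle_valid(handle) || handle->h_sync == 0) {
              ext4_journal_stop(handle);
              handle = NULL;
          }
          ext4_io_submit(&mpd.io_submit);         /* submit prepared bio */
          mpage_release_unused_pages(&mpd, give_up_on_write); /* page locks */
          ext4_put_io_end(mpd.io_submit.io_end);  /* drop io_end reference */
          if (handle)                             /* sync case: stop last */
              ext4_journal_stop(handle);
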
      Reported-and-tested-by: Eryu Guan <eguan@redhat.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: check for extents that wrap around · 7fc7b085
      Vegard Nossum authored
      commit f70749ca upstream.
      
      An extent with lblock = 4294967295 and len = 1 will pass the
      ext4_valid_extent() test:
      
      	ext4_lblk_t last = lblock + len - 1;
      
      	if (len == 0 || lblock > last)
      		return 0;
      
      since last = 4294967295 + 1 - 1 = 4294967295. This would later trigger
      the BUG_ON(es->es_lblk + es->es_len < es->es_lblk) in ext4_es_end().
      
      We can simplify this by removing the "- 1" altogether and changing
      the test to lblock + len <= lblock: now if len == 0, then
      lblock + 0 == lblock and the test fails, and if len > 0, then
      lblock + len must be greater than lblock to pass (i.e. it must not
      overflow).
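
      The revised check then reduces to a single overflow-safe comparison
      (a sketch, per the description above):

          if (lblock + len <= lblock)   /* fails for len == 0 and wrap-around */
              return 0;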
      
      Fixes: 5946d089 ("ext4: check for overlapping extents in ext4_valid_extent_entries()")
      Fixes: 2f974865 ("ext4: check for zero length extent explicitly")
      Cc: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • serial: mvebu-uart: free the IRQ in ->shutdown() · 5b3045bc
      Thomas Petazzoni authored
      commit c2c1659b upstream.
      
      As suggested by the serial port infrastructure documentation, the IRQ is
      requested in ->startup(). However, it is never freed in the ->shutdown()
      hook.
      
      On simple systems that open the serial port once and always have at
      least one process keeping it open, there was no problem. But on a
      more complicated system (*cough* systemd *cough*), the serial port
      is opened and closed many times, so at some point no process has the
      serial port open at all. On the next open, ->startup() gets called
      again and tries to request_irq() again, which fails.
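
      A minimal sketch of the fix, assuming ->shutdown() only needs the
      matching free_irq() for ->startup()'s request_irq() (paraphrased;
      register name as used by the driver):

          static void mvebu_uart_shutdown(struct uart_port *port)
          {
              writel(0, port->membase + UART_CTRL);  /* mask interrupts */
              free_irq(port->irq, port);  /* undo ->startup()'s request_irq() */
          }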
      
      Fixes: 30530791 ("serial: mvebu-uart: initial support for Armada-3700 serial port")
      Cc: Ofer Heifetz <oferh@marvell.com>
      Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: scatterwalk - Fix test in scatterwalk_done · 1539c4d5
      Herbert Xu authored
      commit 5f070e81 upstream.
      
      When there is more data to be processed, the current test in
      scatterwalk_done may prevent us from calling pagedone even when
      we should.
      
      In particular, if we're on an SG entry spanning multiple pages
      where the last page is not a full page, we will incorrectly skip
      calling pagedone on the second last page.
      
      This patch fixes this by adding a separate test for whether we've
      reached the end of a page.
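
      A sketch of the adjusted test in scatterwalk_done(), with the
      separate end-of-page and end-of-entry checks (paraphrased from the
      description, not necessarily the verbatim patch):

          static inline void scatterwalk_done(struct scatter_walk *walk,
                                              int out, int more)
          {
              if (!more ||
                  walk->offset >= walk->sg->offset + walk->sg->length ||
                  !(walk->offset & (PAGE_SIZE - 1)))  /* hit a page boundary */
                  scatterwalk_pagedone(walk, out, more);
          }
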
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: gcm - Filter out async ghash if necessary · cab0805d
      Herbert Xu authored
      commit b30bdfa8 upstream.
      
      As it is, if you ask for a sync gcm you may actually end up with an
      async one, because the lookup does not filter out async
      implementations of ghash.
      
      This patch fixes this by adding the necessary filter when looking
      for ghash.
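
      A sketch of the added filter, assuming the ghash lookup in the gcm
      template's create path ORs in the sync requirement derived from the
      requested type/mask (paraphrased):

          ghash_alg = crypto_find_alg(ghash_name, &crypto_ahash_type,
                                      CRYPTO_ALG_TYPE_HASH,
                                      CRYPTO_ALG_TYPE_AHASH_MASK |
                                      crypto_requires_sync(algt->type,
                                                           algt->mask));
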
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Revert "cpufreq: pcc-cpufreq: update default value of cpuinfo_transition_latency" · efbe5a35
      Andreas Herrmann authored
      commit da7d3abe upstream.
      
      This reverts commit 790d849b.
      
      Using a v4.7-rc7 kernel on an HP ProLiant triggered the following messages:
      
       pcc-cpufreq: (v1.10.00) driver loaded with frequency limits: 1200 MHz, 2800 MHz
       cpufreq: ondemand governor failed, too long transition latency of HW, fallback to performance governor
      
      The last line was shown for each CPU in the system.
      Testing v4.5 (where commit 790d849b was integrated) triggered
      similar messages. Same behaviour on a 2nd HP ProLiant system.
      
      So commit 790d849b (cpufreq: pcc-cpufreq: update default value of
      cpuinfo_transition_latency) causes the system to use the performance
      governor, which, I guess, was not the intention of the patch.
      
      Enabling debug output in pcc-cpufreq provides the following verbose output:
      
       pcc-cpufreq: (v1.10.00) driver loaded with frequency limits: 1200 MHz, 2800 MHz
       pcc_get_offset: for CPU 0: pcc_cpu_data input_offset: 0x44, pcc_cpu_data output_offset: 0x48
       init: policy->max is 2800000, policy->min is 1200000
       get: get_freq for CPU 0
       get: SUCCESS: (virtual) output_offset for cpu 0 is 0xffffc9000d7c0048, contains a value of: 0xff06. Speed is: 168000 MHz
       cpufreq: ondemand governor failed, too long transition latency of HW, fallback to performance governor
       target: CPU 0 should go to target freq: 2800000 (virtual) input_offset is 0xffffc9000d7c0044
       target: was SUCCESSFUL for cpu 0
      
      I am asking to revert 790d849b to re-enable use of the ondemand
      governor with pcc-cpufreq.
      
      Fixes: 790d849b ("cpufreq: pcc-cpufreq: update default value of cpuinfo_transition_latency")
      Signed-off-by: Andreas Herrmann <aherrmann@suse.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fs/dcache.c: avoid soft-lockup in dput() · 84603f74
      Wei Fang authored
      commit 47be6184 upstream.
      
      We triggered a soft lockup under a stress test that
      opens/accesses/writes/closes one file concurrently on more than
      five different CPUs:
      
      WARN: soft lockup - CPU#0 stuck for 11s! [who:30631]
      ...
      [<ffffffc0003986f8>] dput+0x100/0x298
      [<ffffffc00038c2dc>] terminate_walk+0x4c/0x60
      [<ffffffc00038f56c>] path_lookupat+0x5cc/0x7a8
      [<ffffffc00038f780>] filename_lookup+0x38/0xf0
      [<ffffffc000391180>] user_path_at_empty+0x78/0xd0
      [<ffffffc0003911f4>] user_path_at+0x1c/0x28
      [<ffffffc00037d4fc>] SyS_faccessat+0xb4/0x230
      
      The ->d_lock trylock may fail many times because of the concurrent
      operations, so dput() may execute for a long time.
      
      Fix this by replacing cpu_relax() with cond_resched(). dput() used
      to be sleepable, so making it sleepable again should be safe.
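
      A sketch of the resulting retry loop in dput() (paraphrased; the
      cpu_relax() it replaces lived on dentry_kill()'s trylock-failure
      path):

          repeat:
              might_sleep();
              /* ... fast-path refcount drop, trylock of ->d_lock ... */
          kill_it:
              dentry = dentry_kill(dentry);
              if (dentry) {
                  cond_resched();   /* may sleep; lets ->d_lock holders run */
                  goto repeat;
              }
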
      Signed-off-by: Wei Fang <fangwei1@huawei.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements" · 378eef49
      Michal Hocko authored
      commit 4e390b2b upstream.
      
      This reverts commit f9054c70 ("mm, mempool: only set __GFP_NOMEMALLOC
      if there are free elements").
      
      There has been a report about the OOM killer being invoked when
      swapping out to a dm-crypt device.  The primary reason seems to be
      that the swap-out IO managed to completely deplete memory reserves.
      Ondrej was able to bisect and explained the issue by pointing to
      f9054c70 ("mm, mempool: only set __GFP_NOMEMALLOC if there are free
      elements").
      
      The reason is that the swapout path is not throttled properly
      because the md-raid layer needs to allocate from the
      generic_make_request path, which means it allocates from the
      PF_MEMALLOC context.  The dm layer uses mempool_alloc in order to
      guarantee forward progress, and mempool_alloc used to inhibit access
      to memory reserves when falling back to the page allocator.  This
      changed with f9054c70 ("mm, mempool: only set __GFP_NOMEMALLOC if
      there are free elements"), which dropped the __GFP_NOMEMALLOC
      protection when the memory pool is depleted.
      
      If we are running out of memory and the only way to free memory is
      to perform swapout, we just keep consuming memory reserves rather
      than throttling the mempool allocations and allowing the pending IO
      to complete, up to the moment when memory is depleted completely and
      there is no way forward but to invoke the OOM killer.  This is less
      than optimal.
      
      The original intention of f9054c70 was to help with OOM situations
      where the OOM victim depends on a mempool allocation to make forward
      progress.  David has mentioned the following backtrace:
      
        schedule
        schedule_timeout
        io_schedule_timeout
        mempool_alloc
        __split_and_process_bio
        dm_request
        generic_make_request
        submit_bio
        mpage_readpages
        ext4_readpages
        __do_page_cache_readahead
        ra_submit
        filemap_fault
        handle_mm_fault
        __do_page_fault
        do_page_fault
        page_fault
      
      We do not know more about why the mempool is depleted without being
      replenished in time, though.  In any case, the dm layer shouldn't
      depend on any allocations outside of the dedicated pools, so forward
      progress should be guaranteed.  If this is not the case then dm
      should be fixed rather than papering over the problem and postponing
      it by accessing more memory reserves.
      
      mempools are a mechanism to maintain dedicated memory reserves to
      guarantee forward progress.  Allowing them unbounded access to the
      page allocator's memory reserves goes against the whole purpose of
      this mechanism.
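
      With the revert, mempool_alloc() again masks off access to reserves
      unconditionally; a sketch of the restored flag setup (paraphrased):

          gfp_mask |= __GFP_NOMEMALLOC;  /* don't allocate emergency reserves */
          gfp_mask |= __GFP_NORETRY;     /* don't loop in __alloc_pages */
          gfp_mask |= __GFP_NOWARN;      /* failures are OK */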
      
      Bisected by Ondrej Kozina.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/20160721145309.GR26379@dhcp22.suse.cz
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Ondrej Kozina <okozina@redhat.com>
      Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: NeilBrown <neilb@suse.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Ondrej Kozina <okozina@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>