1. 29 Dec, 2018 1 commit
    • Dave Chinner's avatar
      iomap: Revert "fs/iomap.c: get/put the page in iomap_page_create/release()" · 38d072a4
      Dave Chinner authored
      [ Upstream commit a837eca2 ]
      
      This reverts commit 61c6de66.
      
      The reverted commit added page reference counting to iomap page
      structures that are used to track block size < page size state. This
      was supposed to align the code with page migration page accounting
      assumptions, but what it has done instead is break XFS filesystems.
      Every fstests run I've done on sub-page block size XFS filesystems
      has since picking up this commit 2 days ago has failed with bad page
      state errors such as:
      
      # ./run_check.sh "-m rmapbt=1,reflink=1 -i sparse=1 -b size=1k" "generic/038"
      ....
      SECTION       -- xfs
      FSTYP         -- xfs (debug)
      PLATFORM      -- Linux/x86_64 test1 4.20.0-rc6-dgc+
      MKFS_OPTIONS  -- -f -m rmapbt=1,reflink=1 -i sparse=1 -b size=1k /dev/sdc
      MOUNT_OPTIONS -- /dev/sdc /mnt/scratch
      
      generic/038 454s ...
       run fstests generic/038 at 2018-12-20 18:43:05
       XFS (sdc): Unmounting Filesystem
       XFS (sdc): Mounting V5 Filesystem
       XFS (sdc): Ending clean mount
       BUG: Bad page state in process kswapd0  pfn:3a7fa
       page:ffffea0000ccbeb0 count:0 mapcount:0 mapping:ffff88800d9b6360 index:0x1
       flags: 0xfffffc0000000()
       raw: 000fffffc0000000 dead000000000100 dead000000000200 ffff88800d9b6360
       raw: 0000000000000001 0000000000000000 00000000ffffffff
       page dumped because: non-NULL mapping
       CPU: 0 PID: 676 Comm: kswapd0 Not tainted 4.20.0-rc6-dgc+ #915
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
       Call Trace:
        dump_stack+0x67/0x90
        bad_page.cold.116+0x8a/0xbd
        free_pcppages_bulk+0x4bf/0x6a0
        free_unref_page_list+0x10f/0x1f0
        shrink_page_list+0x49d/0xf50
        shrink_inactive_list+0x19d/0x3b0
        shrink_node_memcg.constprop.77+0x398/0x690
        ? shrink_slab.constprop.81+0x278/0x3f0
        shrink_node+0x7a/0x2f0
        kswapd+0x34b/0x6d0
        ? node_reclaim+0x240/0x240
        kthread+0x11f/0x140
        ? __kthread_bind_mask+0x60/0x60
        ret_from_fork+0x24/0x30
       Disabling lock debugging due to kernel taint
      ....
      
      The failures are from anyway that frees pages and empties the
      per-cpu page magazines, so it's not a predictable failure or an easy
      to debug failure.
      
      generic/038 is a reliable reproducer of this problem - it has a 9 in
      10 failure rate on one of my test machines. Failure on other
      machines have been at random points in fstests runs but every run
      has ended up tripping this problem. Hence generic/038 was used to
      bisect the failure because it was the most reliable failure.
      
      It is too close to the 4.20 release (not to mention holidays) to
      try to diagnose, fix and test the underlying cause of the problem,
      so reverting the commit is the only option we have right now. The
      revert has been tested against a current tot 4.20-rc7+ kernel across
      multiple machines running sub-page block size XFs filesystems and
      none of the bad page state failures have been seen.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Cc: Piotr Jaroszynski <pjaroszynski@nvidia.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Brian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      38d072a4
  2. 21 Dec, 2018 39 commits