• Ryusuke Konishi's avatar
    nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary · c5449a5c
    Ryusuke Konishi authored
    commit 136e8770 upstream.
    
    nilfs2: fix issue of nilfs_set_page_dirty for page at EOF boundary
    
    DESCRIPTION:
     There are use-cases when NILFS2 file system (formatted with block size
    lesser than 4 KB) can be remounted in RO mode because of encountering of
    "broken bmap" issue.
    
    The issue was reported by Anthony Doggett <Anthony2486@interfaces.org.uk>:
     "The machine I've been trialling nilfs on is running Debian Testing,
      Linux version 3.2.0-4-686-pae (debian-kernel@lists.debian.org) (gcc
      version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2), but I've
      also reproduced it (identically) with Debian Unstable amd64 and Debian
      Experimental (using the 3.8-trunk kernel).  The problematic partitions
      were formatted with "mkfs.nilfs2 -b 1024 -B 8192"."
    
    SYMPTOMS:
    (1) System log contains error messages likewise:
    
        [63102.496756] nilfs_direct_assign: invalid pointer: 0
        [63102.496786] NILFS error (device dm-17): nilfs_bmap_assign: broken bmap (inode number=28)
        [63102.496798]
        [63102.524403] Remounting filesystem read-only
    
    (2) The NILFS2 file system is remounted in RO mode.
    
    REPRODUSING PATH:
    (1) Create volume group with name "unencrypted" by means of vgcreate utility.
    (2) Run script (prepared by Anthony Doggett <Anthony2486@interfaces.org.uk>):
    
    ----------------[BEGIN SCRIPT]--------------------
    
    VG=unencrypted
    lvcreate --size 2G --name ntest $VG
    mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest
    mkdir /var/tmp/n
    mkdir /var/tmp/n/ntest
    mount /dev/mapper/$VG-ntest /var/tmp/n/ntest
    mkdir /var/tmp/n/ntest/thedir
    cd /var/tmp/n/ntest/thedir
    sleep 2
    date
    darcs init
    sleep 2
    dmesg|tail -n 5
    date
    darcs whatsnew || true
    date
    sleep 2
    dmesg|tail -n 5
    ----------------[END SCRIPT]--------------------
    
    REPRODUCIBILITY: 100%
    
    INVESTIGATION:
    As it was discovered, the issue takes place during segment
    construction after executing such sequence of user-space operations:
    
      open("_darcs/index", O_RDWR|O_CREAT|O_NOCTTY, 0666) = 7
      fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
      ftruncate(7, 60)
    
    The error message "NILFS error (device dm-17): nilfs_bmap_assign: broken
    bmap (inode number=28)" takes place because of trying to get block
    number for third block of the file with logical offset #3072 bytes.  As
    it is possible to see from above output, the file has 60 bytes of the
    whole size.  So, it is enough one block (1 KB in size) allocation for
    the whole file.  Trying to operate with several blocks instead of one
    takes place because of discovering several dirty buffers for this file
    in nilfs_segctor_scan_file() method.
    
    The root cause of this issue is in nilfs_set_page_dirty function which
    is called just before writing to an mmapped page.
    
    When nilfs_page_mkwrite function handles a page at EOF boundary, it
    fills hole blocks only inside EOF through __block_page_mkwrite().
    
    The __block_page_mkwrite() function calls set_page_dirty() after filling
    hole blocks, thus nilfs_set_page_dirty function (=
    a_ops->set_page_dirty) is called.  However, the current implementation
    of nilfs_set_page_dirty() wrongly marks all buffers dirty even for page
    at EOF boundary.
    
    As a result, buffers outside EOF are inconsistently marked dirty and
    queued for write even though they are not mapped with nilfs_get_block
    function.
    
    FIX:
    This modifies nilfs_set_page_dirty() not to mark hole blocks dirty.
    
    Thanks to Vyacheslav Dubeyko for his effort on analysis and proposals
    for this issue.
    Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Reported-by: default avatarAnthony Doggett <Anthony2486@interfaces.org.uk>
    Reported-by: default avatarVyacheslav Dubeyko <slava@dubeyko.com>
    Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
    Tested-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    c5449a5c
inode.c 29 KB