1. 25 Nov, 2016 16 commits
    • f2fs: set ->owner for debugfs status file's file_operations · 05e6ea26
      Nicolai Stange authored
      The struct file_operations instance serving the f2fs/status debugfs file
      lacks an initialization of its ->owner.
      
      This means that although that file might have been opened, the f2fs module
      can still get removed. Any further operation on that opened file, releasing
      included, will cause accesses to unmapped memory.
      
      Indeed, Mike Marshall reported the following:
      
        BUG: unable to handle kernel paging request at ffffffffa0307430
        IP: [<ffffffff8132a224>] full_proxy_release+0x24/0x90
        <...>
        Call Trace:
         [] __fput+0xdf/0x1d0
         [] ____fput+0xe/0x10
         [] task_work_run+0x8e/0xc0
         [] do_exit+0x2ae/0xae0
         [] ? __audit_syscall_entry+0xae/0x100
         [] ? syscall_trace_enter+0x1ca/0x310
         [] do_group_exit+0x44/0xc0
         [] SyS_exit_group+0x14/0x20
         [] do_syscall_64+0x61/0x150
         [] entry_SYSCALL64_slow_path+0x25/0x25
        <...>
        ---[ end trace f22ae883fa3ea6b8 ]---
        Fixing recursive fault but reboot is needed!
      
      Fix this by initializing the f2fs/status file_operations' ->owner with
      THIS_MODULE.
      
      This will allow debugfs to grab a reference to the f2fs module upon any
      open on that file, thus preventing it from getting removed.
      
      Fixes: 902829aa ("f2fs: move proc files to debugfs")
      Reported-by: Mike Marshall <hubcap@omnibond.com>
      Reported-by: Martin Brandenburg <martin@omnibond.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Nicolai Stange <nicstange@gmail.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
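The effect of a missing ->owner can be illustrated with a small userspace model. This is only a sketch: `module_model`, `fops_model`, and the `model_*` helpers are hypothetical names, not kernel APIs, and the kernel's real reference counting is more involved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for a kernel module; not the real struct module. */
struct module_model {
    int refcount;   /* number of open files pinning the module */
    bool loaded;
};

/* Stand-in for struct file_operations, reduced to the ->owner field. */
struct fops_model {
    struct module_model *owner;   /* NULL models the missing initialization */
};

/* Opening the file pins the owning module, if an owner was set. */
static void model_open(const struct fops_model *fops)
{
    if (fops->owner)
        fops->owner->refcount++;
}

/* Releasing the file drops the pin taken at open time. */
static void model_release(const struct fops_model *fops)
{
    if (fops->owner)
        fops->owner->refcount--;
}

/* Unloading succeeds only while no open file pins the module. */
static bool model_unload(struct module_model *mod)
{
    if (mod->refcount > 0)
        return false;
    mod->loaded = false;
    return true;
}
```

With `owner` left NULL, `model_open()` pins nothing and `model_unload()` succeeds while the file is still open, which corresponds to the situation behind the reported oops. With `owner` set, the unload is refused until `model_release()` runs.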
    • f2fs: fix incorrect free inode count in ->statfs · b08b12d2
      Chao Yu authored
      When calculating the maximum number of inodes that can still be created in
      the remaining space, we should account for the space already occupied by
      data and node blocks, since both are allocated from the same main area.
      So fix the wrong calculation in ->statfs.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
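One way to read this, assuming every new inode needs at least one free block, is that the free inode count is capped by both the unused node ids and the blocks still available. The helper below is a hypothetical illustration of that reading and is not taken from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the number of inodes that can still be created is
 * limited by unused node ids and by the blocks left in the shared main area.
 */
static uint64_t free_inode_count(uint64_t total_nids, uint64_t used_nids,
                                 uint64_t avail_blocks)
{
    /* Guard against underflow if the counters are inconsistent. */
    uint64_t free_nids = total_nids > used_nids ? total_nids - used_nids : 0;

    return free_nids < avail_blocks ? free_nids : avail_blocks;
}
```

For example, with 900 unused node ids but only 50 available blocks, the sketch reports 50 free inodes.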
    • f2fs: drop duplicate header timer.h · b4ceec29
      Geliang Tang authored
      Drop duplicate header timer.h from segment.c.
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix wrong AUTO_RECOVER condition · 97dd26ad
      Jaegeuk Kim authored
      If i_size is not aligned to f2fs's block size, we should not skip the inode
      update during fsync.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
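The alignment test mentioned above can be sketched as follows. The helper name is hypothetical, the block size is assumed to be a power of two, and the real AUTO_RECOVER condition in f2fs involves more than this single check.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if i_size falls exactly on a block boundary.
 * Assumes block_size is a power of two (e.g. 4096).
 */
static bool i_size_block_aligned(uint64_t i_size, uint32_t block_size)
{
    return (i_size & ((uint64_t)block_size - 1)) == 0;
}
```

Under this sketch, an fsync path would only consider skipping the inode update when `i_size_block_aligned()` returns true.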
    • f2fs: do not recover i_size if it's valid · 3a3a5ead
      Jaegeuk Kim authored
      If i_size is already valid during roll_forward recovery, we should not update
      it according to the block alignment.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix fdatasync · 281518c6
      Chao Yu authored
      In the two cases below, we can't guarantee data consistency:
      
      a)
      1. xfs_io "pwrite 0 4195328" "fsync"
      2. xfs_io "pwrite 4195328 1024" "fdatasync"
      3. godown
      4. umount & mount
      --> isize we updated before fdatasync won't be recovered
      
      b)
      1. xfs_io "pwrite -S 0xcc 0 4202496" "fsync"
      2. xfs_io "fpunch 4194304 4096" "fdatasync"
      3. godown
      4. umount & mount
      --> dnode we punched before fdatasync won't be recovered
      
      The reason is that fdatasync is normally unaware of metadata changes in
      the file, e.g. an isize change or a dnode update, so ->fsync skips
      flushing node pages in the cases above, and the fdatasynced data is lost
      during recovery.
      
      We have already introduced the DIRTY_META global list in sbi for selectively
      tracking dirty inodes, so in fdatasync we can choose whether to flush nodes
      depending on the dirty state of the current inode in that list.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
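The decision described above can be sketched as a small predicate. The `inode_model` type and `must_flush_nodes()` are hypothetical, and the sketch simplifies a full fsync to "always flush nodes", which the real ->fsync path does not necessarily do.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-inode state tracked via DIRTY_META. */
struct inode_model {
    bool meta_dirty;   /* e.g. isize changed or a dnode was updated */
};

/*
 * Simplified model: a full fsync always flushes node pages, while
 * fdatasync flushes them only if the inode's metadata is dirty.
 */
static bool must_flush_nodes(const struct inode_model *inode, bool datasync)
{
    if (!datasync)
        return true;
    return inode->meta_dirty;
}
```

In both reproducers above, the second xfs_io step changes metadata, so this model would flush the node pages even though fdatasync was requested.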
    • f2fs: fix to account total free nid correctly · 04d47e67
      Chao Yu authored
      Thread A		Thread B		Thread C
      - f2fs_create
       - f2fs_new_inode
        - f2fs_lock_op
         - alloc_nid
          alloc last nid
        - f2fs_unlock_op
      			- f2fs_create
      			 - f2fs_new_inode
      			  - f2fs_lock_op
      			   - alloc_nid
      			    as node count still not
      			    be increased, we will
      			    loop in alloc_nid
      						- f2fs_write_node_pages
      						 - f2fs_balance_fs_bg
      						  - f2fs_sync_fs
      						   - write_checkpoint
      						    - block_operations
      						     - f2fs_lock_all
       - f2fs_lock_op
      
      While creating a new inode, we do not allocate and account for the nid
      atomically, so when almost no free nids are left, we may hit an infinite
      loop like the one in the stack above.
      
      To avoid that, reuse nm_i::available_nids for accounting free nids, and
      make nid allocation and counting atomic during node creation.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
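The idea of reserving a nid and updating the free count in one atomic step can be sketched in userspace C. This sketch swaps in a C11 compare-and-swap loop for whatever locking the kernel code uses; `nid_pool_model`, `reserve_nid()`, and `release_nid()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the free-nid counter (cf. nm_i::available_nids). */
struct nid_pool_model {
    atomic_uint available_nids;
};

/*
 * Reserve one nid: the check for "any left?" and the decrement happen as a
 * single atomic step, so a caller either gets a nid or fails immediately
 * instead of looping on a count that another thread already consumed.
 */
static bool reserve_nid(struct nid_pool_model *pool)
{
    unsigned int avail = atomic_load(&pool->available_nids);

    while (avail > 0) {
        /* On failure, avail is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&pool->available_nids,
                                         &avail, avail - 1))
            return true;
    }
    return false;
}

/* Return a nid to the pool, e.g. when inode creation fails later on. */
static void release_nid(struct nid_pool_model *pool)
{
    atomic_fetch_add(&pool->available_nids, 1);
}
```

In terms of the stack above, Thread B would get `false` from `reserve_nid()` right after Thread A took the last nid, rather than spinning while holding f2fs_lock_op.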
    • f2fs: fix an infinite loop when flush nodes in cp · d40a43af
      Yunlei He authored
      Thread A			Thread B
      
      - write_checkpoint
       - block_operations
         -blk_start_plug
          -sync_node_pages		- f2fs_do_sync_file
      				 - fsync_node_pages
      				  - f2fs_wait_on_page_writeback
      
      Thread A waits for the global F2FS_DIRTY_NODES count to drop to zero.
      It has started a plug list, and some requests have been added to that list.
      Thread B locks one dirty node page and waits for its writeback.
      But this page is already in thread A's plug list with the PG_writeback flag set.
      Thread A keeps running and its plug list never gets a chance to finish,
      so this looks like a deadlock between the cp and fsync paths.
      
      This patch adds a wait on page writeback before setting the node page dirty
      to avoid this problem.
      Signed-off-by: Yunlei He <heyunlei@huawei.com>
      Signed-off-by: Pengyang Hou <houpengyang@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: don't wait writeback for datas during checkpoint · 36951b38
      Chao Yu authored
      Normally, while committing a checkpoint, we wait for all pages to be
      written back, no matter whether a page holds data or metadata. So when
      lots of data IO is submitted together with metadata, we may suffer long
      latency waiting for writeback during checkpoint.
      
      In fact, we only care about persistence of metadata pages, not data
      pages, since file system consistency depends only on metadata. To avoid
      the long latency in the scenario above, recognize and reference
      metadata in submitted IOs, and wait for writeback only on metadata.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix wrong written_valid_blocks counting · c79b7ff1
      Jaegeuk Kim authored
      Previously, written_valid_blocks was taken from ckpt->valid_block_count. But if
      the last checkpoint has some NEW_ADDR due to a power cut, we can get a wrong value.
      Fix it by taking the number from the actual written block count in the sit entries.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
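Counting from the sit entries amounts to summing the per-segment valid block counts. The sketch below uses a hypothetical `sit_entry_model` type holding only that count, so it illustrates the idea and not the real in-kernel structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-segment record; the real SIT entry holds more fields. */
struct sit_entry_model {
    uint16_t valid_blocks;   /* blocks actually written and still valid */
};

/* Sum the valid block counts over all segments. */
static uint64_t count_written_valid_blocks(const struct sit_entry_model *sit,
                                           size_t nsegs)
{
    uint64_t total = 0;

    for (size_t i = 0; i < nsegs; i++)
        total += sit[i].valid_blocks;
    return total;
}
```

Blocks that were only reserved (NEW_ADDR) and never written do not appear in any segment's count, so they are not included in the sum.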
    • f2fs: avoid BG_GC in f2fs_balance_fs · 7702bdbe
      Jaegeuk Kim authored
      If many threads hit has_not_enough_free_secs() in f2fs_balance_fs() at the same
      time, all the threads would do FG_GC or BG_GC.
      In this critical path, we don't need to do BG_GC at all.
      Let's avoid that.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix redundant block allocation · c040ff9d
      Jaegeuk Kim authored
      In direct_IO path of f2fs_file_write_iter(),
      1. f2fs_preallocate_blocks(F2FS_GET_BLOCK_PRE_DIO)
         -> allocate LBA X
      2. f2fs_direct_IO()
         -> return 0;
      
      Then,
      f2fs_write_data_page() will allocate another LBA X+1.
      
      This causes the HM-SMR device to trigger EIO.
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: use err for f2fs_preallocate_blocks · a7de6086
      Jaegeuk Kim authored
      This patch has no functional change.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: support multiple devices · 3c62be17
      Jaegeuk Kim authored
      This patch implements multiple devices support for f2fs.
      Given multiple devices by mkfs.f2fs, f2fs shows them entirely as one big
      volume under one f2fs instance.
      
      Internal block management is very simple, but we will modify block allocation
      and background GC policy to boost IO speed by exploiting the devices according to
      each device's speed.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
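Presenting several devices as one volume can be pictured as a lookup from a volume-wide block address to the device that holds it. The sketch below assumes each device covers one contiguous, inclusive block range; `dev_range_model` and `target_device_index()` are hypothetical and do not reproduce the patch's data structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one device's slice of the volume. */
struct dev_range_model {
    uint64_t start_blk;   /* first block address on this device */
    uint64_t end_blk;     /* last block address on this device (inclusive) */
};

/*
 * Return the index of the device that holds blkaddr, or -1 if the address
 * lies outside every device range.
 */
static int target_device_index(const struct dev_range_model *devs,
                               size_t ndevs, uint64_t blkaddr)
{
    for (size_t i = 0; i < ndevs; i++) {
        if (blkaddr >= devs[i].start_blk && blkaddr <= devs[i].end_blk)
            return (int)i;
    }
    return -1;
}
```

A linear scan is enough for a handful of devices. An IO to block address X would then be sent to device `target_device_index(devs, ndevs, X)` at offset `X - devs[i].start_blk`.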
    • f2fs: allow dio read for LFS mode · e57e9ae5
      Jaegeuk Kim authored
      We can allow dio reads for LFS mode, while doing buffered writes for dio writes.
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: revert segment allocation for direct IO · 6ae1be13
      Jaegeuk Kim authored
      Now we don't need to be too careful about storage alignment for dio, since
      its speed has become quite fast, and we'd better avoid any misalignment first.
      
      Revert: 38aa0889 (f2fs: align direct_io'ed data to section)
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  2. 23 Nov, 2016 24 commits