1. 02 Jul, 2013 10 commits
    • Merge tag 'staging-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · ce49b628
      Linus Torvalds authored
      Pull staging tree update from Greg KH:
       "Here's the large staging tree merge for 3.11-rc1
      
        Huge thing here is the Lustre client code.  Unfortunately, due to it
        not building properly on a wide variety of different architectures
        (this was production code???), it is currently disabled from the build
        so as to not annoy people.
      
        Other than Lustre, there are loads of comedi patches, working to clean
        up that subsystem, iio updates and new drivers, and a load of cleanups
        from the OPW applicants in their quest to get a summer internship.
      
        All of these have been in the linux-next releases for a while (hence
        the Lustre code being disabled)"
      
      Fixed up trivial conflict in drivers/staging/serqt_usb2/serqt_usb2.c due
      to independent renamings in the staging driver cleanup and the USB
      tree.
      
      * tag 'staging-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (868 commits)
        Revert "Revert "Revert "staging/lustre: drop CONFIG_BROKEN dependency"""
        staging: rtl8192u: fix line length in r819xU_phy.h
        staging: rtl8192u: rename variables in r819xU_phy.h
        staging: rtl8192u: fix comments in r819xU_phy.h
        staging: rtl8192u: fix whitespace in r819xU_phy.h
        staging: rtl8192u: fix newlines in r819xU_phy.c
        staging: comedi: unioxx5: use comedi_alloc_spriv()
        staging: comedi: unioxx5: fix unioxx5_detach()
        silicom: checkpatch: errors caused by macros
        Staging: silicom: remove the board_t typedef in bpctl_mod.c
        Staging: silicom: capitalize labels in the bp_media_type enum
        Staging: silicom: remove bp_media_type enum typedef
        staging: rtl8192u: replace msleep(1) with usleep_range() in r819xU_phy.c
        staging: rtl8192u: rename dwRegRead and rtStatus in r819xU_phy.c
        staging: rtl8192u: replace __FUNCTION__ in r819xU_phy.c
        staging: rtl8192u: limit line size in r819xU_phy.c
        zram: allow request end to coincide with disksize
        staging: drm/imx: use generic irq chip unused field to block out invalid irqs
        staging: drm/imx: use generic irqchip
        staging: drm/imx: ipu-dmfc: use defines for ipu channel numbers
        ...
    • Merge tag 'tty-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 0de10f9e
      Linus Torvalds authored
      Pull tty/serial updates from Greg KH:
       "Here is the big TTY / Serial driver merge for 3.11-rc1.
      
        It's not all that big, nothing major changed in the tty api, which is
        a nice change, just a number of serial driver fixes and updates and
        new drivers, along with some n_tty fixes to help resolve some reported
        issues.
      
        All of these have been in the linux-next releases for a while, with
        the exception of the last revert patch, which was reported this past
        weekend by two different people as being needed."
      
      * tag 'tty-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (51 commits)
        Revert "serial: 8250_pci: add support for another kind of NetMos Technology PCI 9835 Multi-I/O Controller"
        pch_uart: Add uart_clk selection for the MinnowBoard
        tty: atmel_serial: prepare clk before calling enable
        tty: Reset itty for other pty
        n_tty: Buffer work should not reschedule itself
        n_tty: Fix unsafe update of available buffer space
        n_tty: Untangle read completion variables
        n_tty: Encapsulate minimum_to_wake within N_TTY
        serial: omap: Fix device tree based PM runtime
        serial: imx: Fix serial clock unbalance
        serial/mpc52xx_uart: fix kernel panic when system reboot
        serial: mfd: Add sysrq support
        serial: imx: enable the clocks for console
        tty: serial: add Freescale lpuart driver support
        serial: imx: Improve Kconfig text
        serial: imx: Allow module build
        serial: imx: Fix warning when !CONFIG_SERIAL_IMX_CONSOLE
        tty/serial/sirf: fix error propagation in sirfsoc_uart_probe()
        serial: omap: fix potential NULL pointer dereference in serial_omap_runtime_suspend()
        tty: serial: Enable uartlite for ARM zynq
        ...
    • Merge tag 'usb-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · a8427018
      Linus Torvalds authored
      Pull USB updates from Greg KH:
       "Here's the big USB 3.11-rc1 merge request.
      
        Lots of gadget and finally, chipidea driver updates (they were much
        needed), along with a new host controller driver, lots of little
        serial driver fixes, the removal of the 255 usb-serial device
        limitation, and a variety of other minor things.
      
        All of these have been in the linux-next releases for a while"
      
      * tag 'usb-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (254 commits)
        usb: musb: omap2430: make it compile again
        usb: chipidea: ci_hdrc_imx: access phy via private data
        xhci: Add missing unlocks on error paths
        USB: option,qcserial: move Novatel Gobi1K IDs to qcserial
        ehci-atmel.c: prepare clk before calling enable
        USB: ohci-at91: prepare clk before calling enable
        USB: HWA: fix device probe failure
        wusbcore: add entries in Documentation/ABI for new wusbhc sysfs attributes
        wusbcore: add sysfs attribute for retry count
        wusbcore: add sysfs attribute for DNTS count and interval
        usb: chipidea: drop "13xxx" infix
        usb: phy: tegra: remove duplicated include from phy-tegra-usb.c
        usb: host: xhci-plat: release mem region while removing module
        usbmisc_imx: allow autoloading on according to dt ids
        usb: fix build error without CONFIG_USB_PHY
        usb: check usb_hub_to_struct_hub() return value
        xhci: check for failed dma pool allocation
        usb: gadget: f_subset: fix missing unlock on error in geth_alloc()
        usb: gadget: f_ncm: fix missing unlock on error in ncm_alloc()
        usb: gadget: f_ecm: fix missing unlock on error in ecm_alloc()
        ...
    • Merge tag 'fscache-20130702' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · bcd7351e
      Linus Torvalds authored
      Pull FS-Cache updates from David Howells:
       "This contains a number of fixes for various FS-Cache issues plus some
        cleanups.  The commits are, in order:
      
         1) Provide a system wait_on_atomic_t() and wake_up_atomic_t() sharing
            the bit-wait table (enhancement for #8).
      
         2) Don't put spin_lock() in a while-condition as spin_lock() may have
            a do {} while(0) wrapper (cleanup).
      
         3) Symbolically name i_mutex lock classes rather than using numbers
            in CacheFiles (cleanup).
      
         4) Don't sleep in page release if __GFP_FS is not set (deadlock vs
            ext4).
      
         5) Uninline fscache_object_init() (cleanup for #7).
      
         6) Wrap checks on object state (cleanup for #7).
      
         7) Simplify the object state machine by separating work states from
            wait states.
      
         8) Simplify cookie retention by objects (NULL pointer deref fix).
      
         9) Remove unused list_to_page() macro (cleanup).
      
        10) Make the remaining-pages counter in the retrieval op atomic
            (assertion failure fix).
      
        11) Don't use spin_is_locked() in assertions (assertion failure fix)"
      
      * tag 'fscache-20130702' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        FS-Cache: Don't use spin_is_locked() in assertions
        FS-Cache: The retrieval remaining-pages counter needs to be atomic_t
        cachefiles: remove unused macro list_to_page()
        FS-Cache: Simplify cookie retention for fscache_objects, fixing oops
        FS-Cache: Fix object state machine to have separate work and wait states
        FS-Cache: Wrap checks on object state
        FS-Cache: Uninline fscache_object_init()
        FS-Cache: Don't sleep in page release if __GFP_FS is not set
        CacheFiles: name i_mutex lock class explicitly
        fs/fscache: remove spin_lock() from the condition in while()
        Add wait_on_atomic_t() and wake_up_atomic_t()
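Item 2 in the FS-Cache list above can be illustrated with a small userspace sketch. It assumes a toy lock whose acquire operation is a `do { } while (0)` macro; `toy_lock`, `toy_lock_acquire()`, `toy_lock_release()` and `drain_queue()` are hypothetical names for illustration, not kernel APIs. A statement-like macro cannot appear inside a `while` condition, so the lock is taken inside the loop body instead.

```c
#include <assert.h>

/* Hypothetical stand-in for a lock; this is not the kernel's spinlock_t. */
struct toy_lock {
	int held;
};

/* Statement-like macros: valid as statements, but not as expressions. */
#define toy_lock_acquire(l) do { assert(!(l)->held); (l)->held = 1; } while (0)
#define toy_lock_release(l) do { assert((l)->held); (l)->held = 0; } while (0)

/*
 * The following would not compile, because the macro expands to a
 * statement rather than an expression:
 *
 *     while (toy_lock_acquire(lock), *queue_len > 0) { ... }
 *
 * Taking the lock inside the loop body avoids the problem.
 */
static int drain_queue(struct toy_lock *lock, int *queue_len)
{
	int drained = 0;

	for (;;) {
		toy_lock_acquire(lock);
		if (*queue_len == 0) {
			toy_lock_release(lock);
			break;
		}
		(*queue_len)--;
		drained++;
		toy_lock_release(lock);
	}
	return drained;
}
```

The restructured loop behaves the same whether the lock primitive is a function or a macro, which is the portability concern the cleanup addresses.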
    • Merge tag 'dlm-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm · 6072a93b
      Linus Torvalds authored
      Pull dlm updates from David Teigland:
       "This set includes a number of SCTP related fixes in the dlm, and a few
        other minor fixes and changes."
      
      * tag 'dlm-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
        dlm: Avoid LVB truncation
        dlm: log an error for unmanaged lockspaces
        dlm: config: using strlcpy instead of strncpy
        dlm: remove duplicated include from lowcomms.c
        dlm: disable nagle for SCTP
        dlm: retry failed SCTP sends
        dlm: try other IPs when sctp init assoc fails
        dlm: clear correct bit during sctp init failure handling
        dlm: set sctp assoc id during setup
        dlm: clear correct init bit during sctp setup
    • Merge tag 'for-f2fs-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs · 3f490f7f
      Linus Torvalds authored
      Pull f2fs updates from Jaegeuk Kim:
       "This patch-set includes the following major enhancement patches:
         - remount_fs callback function
         - restore parent inode number to enhance the fsync performance
         - xattr security labels
         - reduce the number of redundant lock/unlock data pages
         - avoid frequent write_inode calls
      
        The other minor bug fixes are as follows.
         - endian conversion bugs
         - various bugs in the roll-forward recovery routine"
      
      * tag 'for-f2fs-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (56 commits)
        f2fs: fix to recover i_size from roll-forward
        f2fs: remove the unused argument "sbi" of func destroy_fsync_dnodes()
        f2fs: remove reusing any prefree segments
        f2fs: code cleanup and simplify in func {find/add}_gc_inode
        f2fs: optimize the init_dirty_segmap function
        f2fs: fix an endian conversion bug detected by sparse
        f2fs: fix crc endian conversion
        f2fs: add remount_fs callback support
        f2fs: recover wrong pino after checkpoint during fsync
        f2fs: optimize do_write_data_page()
        f2fs: make locate_dirty_segment() as static
        f2fs: remove unnecessary parameter "offset" from __add_sum_entry()
        f2fs: avoid freqeunt write_inode calls
        f2fs: optimise the truncate_data_blocks_range() range
        f2fs: use the F2FS specific flags in f2fs_ioctl()
        f2fs: sync dir->i_size with its block allocation
        f2fs: fix i_blocks translation on various types of files
        f2fs: set sb->s_fs_info before calling parse_options()
        f2fs: support xattr security labels
        f2fs: fix iget/iput of dir during recovery
        ...
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw · c4eb1b07
      Linus Torvalds authored
      Pull GFS2 updates from Steven Whitehouse:
       "There are a few bug fixes for various, mostly very minor corner cases,
        plus some interesting new features.
      
        The new features include atomic_open whose main benefit will be the
        reduction in locking overhead in case of combined lookup/create and
        open operations, sorting the log buffer lists by block number to
        improve the efficiency of AIL writeback, and aggressively issuing
        revokes in gfs2_log_flush to reduce overhead when dropping glocks."
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
        GFS2: Reserve journal space for quota change in do_grow
        GFS2: Fix fstrim boundary conditions
        GFS2: fix warning message
        GFS2: aggressively issue revokes in gfs2_log_flush
        GFS2: fix regression in dir_double_exhash
        GFS2: Add atomic_open support
        GFS2: Only do one directory search on create
        GFS2: fix error propagation in init_threads()
        GFS2: Remove no-op wrapper function
        GFS2: Cocci spatch "ptr_ret.spatch"
        GFS2: Eliminate gfs2_rg_lops
        GFS2: Sort buffer lists by inplace block number
    • Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 9e239bb9
      Linus Torvalds authored
      Pull ext4 update from Ted Ts'o:
       "Lots of bug fixes, cleanups and optimizations.  In the bug fixes
        category, of note is a fix for on-line resizing file systems where the
        block size is smaller than the page size (i.e., file systems with 1k blocks
        on x86, or more interestingly file systems with 4k blocks on Power or
        ia64 systems.)
      
        In the cleanup category, the ext4's punch hole implementation was
        significantly improved by Lukas Czerner, and now supports bigalloc
        file systems.  In addition, Jan Kara significantly cleaned up the
        write submission code path.  We also improved error checking and added
        a few sanity checks.
      
        In the optimizations category, two major optimizations deserve
        mention.  The first is that ext4_writepages() is now used for
        nodelalloc and ext3 compatibility mode.  This allows writes to be
        submitted much more efficiently as a single bio request, instead of
        being sent as individual 4k writes into the block layer (which then
        relied on the elevator code to coalesce the requests in the block
        queue).  Secondly, the extent cache shrink mechanism, which was
        introduced in 3.9, no longer has a scalability bottleneck caused by the
        i_es_lru spinlock.  Other optimizations include some changes to reduce
        CPU usage and to avoid issuing empty commits unnecessarily."
      
      * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (86 commits)
        ext4: optimize starting extent in ext4_ext_rm_leaf()
        jbd2: invalidate handle if jbd2_journal_restart() fails
        ext4: translate flag bits to strings in tracepoints
        ext4: fix up error handling for mpage_map_and_submit_extent()
        jbd2: fix theoretical race in jbd2__journal_restart
        ext4: only zero partial blocks in ext4_zero_partial_blocks()
        ext4: check error return from ext4_write_inline_data_end()
        ext4: delete unnecessary C statements
        ext3,ext4: don't mess with dir_file->f_pos in htree_dirblock_to_tree()
        jbd2: move superblock checksum calculation to jbd2_write_superblock()
        ext4: pass inode pointer instead of file pointer to punch hole
        ext4: improve free space calculation for inline_data
        ext4: reduce object size when !CONFIG_PRINTK
        ext4: improve extent cache shrink mechanism to avoid to burn CPU time
        ext4: implement error handling of ext4_mb_new_preallocation()
        ext4: fix corruption when online resizing a fs with 1K block size
        ext4: delete unused variables
        ext4: return FIEMAP_EXTENT_UNKNOWN for delalloc extents
        jbd2: remove debug dependency on debug_fs and update Kconfig help text
        jbd2: use a single printk for jbd_debug()
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 63580e51
      Linus Torvalds authored
      Pull VFS patches (part 1) from Al Viro:
       "The major change in this pile is ->readdir() replacement with
        ->iterate(), dealing with ->f_pos races in ->readdir() instances for
        good.
      
        There's a lot more, but I'd prefer to split the pull request into
        several stages and this is the first obvious cutoff point."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (67 commits)
        [readdir] constify ->actor
        [readdir] ->readdir() is gone
        [readdir] convert ecryptfs
        [readdir] convert coda
        [readdir] convert ocfs2
        [readdir] convert fatfs
        [readdir] convert xfs
        [readdir] convert btrfs
        [readdir] convert hostfs
        [readdir] convert afs
        [readdir] convert ncpfs
        [readdir] convert hfsplus
        [readdir] convert hfs
        [readdir] convert befs
        [readdir] convert cifs
        [readdir] convert freevxfs
        [readdir] convert fuse
        [readdir] convert hpfs
        reiserfs: switch reiserfs_readdir_dentry to inode
        reiserfs: is_privroot_deh() needs only directory inode, actually
        ...
    • sync: don't block the flusher thread waiting on IO · 7747bd4b
      Dave Chinner authored
      When sync does its WB_SYNC_ALL writeback, it issues data IO and
      then immediately waits for IO completion. This is done in the
      context of the flusher thread, and hence completely ties up the
      flusher thread for the backing device until all the dirty inodes
      have been synced. On filesystems that are dirtying inodes constantly
      and quickly, this means the flusher thread can be tied up for
      minutes per sync call and hence badly affect system level write IO
      performance as the page cache cannot be cleaned quickly.
      
      We already have a wait loop for IO completion for sync(2), so cut
      this out of the flusher thread and delegate it to wait_sb_inodes().
      Hence we can do rapid IO submission, and then wait for it all to
      complete.
      
      Effect of sync on fsmark before the patch:
      
      FSUse%        Count         Size    Files/sec     App Overhead
      .....
           0       640000         4096      35154.6          1026984
           0       720000         4096      36740.3          1023844
           0       800000         4096      36184.6           916599
           0       880000         4096       1282.7          1054367
           0       960000         4096       3951.3           918773
           0      1040000         4096      40646.2           996448
           0      1120000         4096      43610.1           895647
           0      1200000         4096      40333.1           921048
      
      And a single sync pass took:
      
        real    0m52.407s
        user    0m0.000s
        sys     0m0.090s
      
      After the patch, there is no impact on fsmark results, and each
      individual sync(2) operation run concurrently with the same fsmark
      workload takes roughly 7s:
      
        real    0m6.930s
        user    0m0.000s
        sys     0m0.039s
      
      IOWs, sync is 7-8x faster on a busy filesystem and does not have an
      adverse impact on ongoing async data write operations.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
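The "7-8x faster" figure in the sync commit above follows from the two wall-clock times it quotes (52.407s before, 6.930s after). A minimal sketch of that arithmetic, where `speedup()` is a hypothetical helper written for this illustration:

```c
#include <assert.h>

/* Ratio of elapsed time before a change to elapsed time after it. */
static double speedup(double before_s, double after_s)
{
	assert(after_s > 0.0);
	return before_s / after_s;
}
```

Calling `speedup(52.407, 6.930)` gives a ratio of roughly 7.6, which falls inside the quoted range.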
  2. 01 Jul, 2013 23 commits
    • f2fs: fix to recover i_size from roll-forward · a1dd3c13
      Jaegeuk Kim authored
      If a user issues many data writes together with fsync, the last updated
      i_size should be stored in the inode block consistently.

      However, the previous write_end only marks the inode as dirty and does not
      write its metadata into the inode block.
      After that, fsync writes the inode block with the newly updated data index
      but without the inode metadata updates.

      So this patch introduces a write_end that also updates the inode block when
      i_size changes.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: remove the unused argument "sbi" of func destroy_fsync_dnodes() · 5ebefc5b
      Gu Zheng authored
      Since destroy_fsync_dnodes() is a simple list-cleanup function, delete its
      unused and unrelated f2fs_sb_info argument.
      Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: remove reusing any prefree segments · 763bfe1b
      Jaegeuk Kim authored
      This patch removes check_prefree_segments, which was initially designed to
      enhance performance by narrowing the range of LBA usage across the whole
      block device.

      When allocating a new segment, f2fs previously tried to find a suitable
      prefree segment and, if it found one, reused it for further data or node
      block allocation.

      However, I found that this was a totally wrong approach, since the prefree
      segments still hold data or node blocks that will be used by the
      roll-forward mechanism that runs after a sudden power-off.
      
      Let's assume the following scenario.
      
      /* write 8MB with fsync */
      for (i = 0; i < 2048; i++) {
      	offset = i * 4096;
      	write(fd, offset, 4KB);
      	fsync(fd);
      }
      
      In this case, naive segment allocation sequence will be like:
       data segment: x, x+1, x+2, x+3
       node segment: y, y+1, y+2, y+3.
      
      But, if we can reuse prefree segments, the sequence can be like:
       data segment: x, x+1, y, y+1
       node segment: y, y+1, y+2, y+3.
      This is because y, y+1, and y+2 became prefree segments one by one and were
      then reused by data allocation.

      After running this workload, we should consider how to recover the latest
      inode with its data.
      If we reuse prefree segments such as y or y+1, we lose the old node blocks,
      so f2fs cannot even start roll-forward recovery.

      Therefore, I suggest that we stop reusing prefree segments.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
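The "write 8MB with fsync" scenario in the commit above is written as pseudocode. A runnable userspace sketch of the same workload might look like the following; it assumes POSIX `pwrite()` and `fsync()`, and the function name `write_blocks_with_fsync()` and the fill pattern are hypothetical choices made for this illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_BLOCK_SIZE 4096

/*
 * Write nblocks 4 KiB blocks to path, calling fsync() after each write.
 * Returns the number of bytes written, or -1 on error.
 */
static long write_blocks_with_fsync(const char *path, int nblocks)
{
	char buf[DEMO_BLOCK_SIZE];
	long total = 0;
	int fd, i;

	assert(path != NULL && nblocks >= 0);
	memset(buf, 0xab, sizeof(buf));

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	for (i = 0; i < nblocks; i++) {
		off_t offset = (off_t)i * DEMO_BLOCK_SIZE;

		/* One 4 KiB write followed by one fsync per iteration. */
		if (pwrite(fd, buf, sizeof(buf), offset) != (ssize_t)sizeof(buf) ||
		    fsync(fd) != 0) {
			close(fd);
			return -1;
		}
		total += DEMO_BLOCK_SIZE;
	}
	if (close(fd) != 0)
		return -1;
	return total;
}
```

Passing `nblocks = 2048` reproduces the 8MB case from the commit message; each iteration forces both a data block and a node block to disk, which is what makes the node segments turn prefree one after another.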
    • f2fs: code cleanup and simplify in func {find/add}_gc_inode · 6cc4af56
      Gu Zheng authored
      This patch simplifies list operations in find_gc_inode and add_gc_inode.
      Just simple code cleanup.
      Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
      [Jaegeuk Kim: add description]
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: optimize the init_dirty_segmap function · 8736fbf0
      Namjae Jeon authored
      Optimize the while loop condition.

      The loop condition is always true, and the loop is terminated by the
      following check in the code:

      if (segno >= TOTAL_SEGS(sbi))
          break;
      Hence we can replace the loop condition with while(1) instead of checking
      on every iteration that segno is less than the total number of segments.

      Also, we do not need to evaluate TOTAL_SEGS() every time. We can store
      this value in a local variable since it is constant.
      Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
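The loop shape described in the commit above can be sketched in userspace as follows. This is not the f2fs code: `next_dirty()` is a hypothetical helper that follows the same contract as the kernel's find_next_bit() (it returns the total size when nothing is found), and a byte array stands in for the dirty-segment bitmap.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: index of the next non-zero entry at or after start,
 * or total if there is none.
 */
static unsigned int next_dirty(const unsigned char *map, unsigned int total,
			       unsigned int start)
{
	while (start < total && !map[start])
		start++;
	return start;
}

/* Count dirty entries; total_segs is computed once by the caller. */
static unsigned int count_dirty(const unsigned char *map,
				unsigned int total_segs)
{
	unsigned int segno, offset = 0, count = 0;

	assert(map != NULL);
	while (1) {
		segno = next_dirty(map, total_segs, offset);
		/* The only exit: a test in the loop header would be redundant. */
		if (segno >= total_segs)
			break;
		offset = segno + 1;
		count++;
	}
	return count;
}
```

Because the search helper already reports "not found" by returning the total, the break inside the body is sufficient to end the loop.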
    • f2fs: fix an endian conversion bug detected by sparse · 060dd67b
      Jaegeuk Kim authored
      This patch should fix the following bug reported by kbuild test robot.
      
      fs/f2fs/recovery.c:233:33: sparse: incorrect type in assignment
      (different base types)
      
      parse warnings: (new ones prefixed by >>)
      
      >> recovery.c:233: sparse: incorrect type in assignment (different base types)
         recovery.c:233:    expected unsigned int [unsigned] [assigned] ofs_in_node
         recovery.c:233:    got restricted __le16 [assigned] [usertype] ofs_in_node
      >> recovery.c:238: sparse: incorrect type in assignment (different base types)
         recovery.c:238:    expected unsigned int [unsigned] ofs_in_node
         recovery.c:238:    got restricted __le16 [assigned] [usertype] ofs_in_node
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    • f2fs: fix crc endian conversion · 7e586fa0
      Jaegeuk Kim authored
      While calculating CRC for the checkpoint block, we use __u32, but when storing
      the crc value to the disk, we use __le32.
      
      Let's fix the inconsistency.
      Reported-and-Tested-by: Oded Gabbay <ogabbay@advaoptical.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
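The mismatch described in the commit above (a CPU-endian value computed in memory, a little-endian value stored on disk) is the kind of problem the kernel's cpu_to_le32() and le32_to_cpu() helpers exist to solve. The sketch below shows the same idea in portable userspace C; `disk_crc`, `store_le32()` and `load_le32()` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk field: four bytes, always little-endian. */
struct disk_crc {
	uint8_t bytes[4];
};

/* Store a CPU-endian value in little-endian byte order. */
static void store_le32(struct disk_crc *dst, uint32_t cpu_value)
{
	assert(dst != NULL);
	dst->bytes[0] = (uint8_t)(cpu_value & 0xff);
	dst->bytes[1] = (uint8_t)((cpu_value >> 8) & 0xff);
	dst->bytes[2] = (uint8_t)((cpu_value >> 16) & 0xff);
	dst->bytes[3] = (uint8_t)((cpu_value >> 24) & 0xff);
}

/* Load a little-endian on-disk value back into CPU byte order. */
static uint32_t load_le32(const struct disk_crc *src)
{
	assert(src != NULL);
	return (uint32_t)src->bytes[0] |
	       ((uint32_t)src->bytes[1] << 8) |
	       ((uint32_t)src->bytes[2] << 16) |
	       ((uint32_t)src->bytes[3] << 24);
}
```

Writing the bytes explicitly makes the result identical on little-endian and big-endian hosts, so a checksum compared after a round trip through the disk format matches regardless of the CPU.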
    • ext4: optimize starting extent in ext4_ext_rm_leaf() · 6ae06ff5
      Ashish Sangwan authored
      Both hole punch and truncate use ext4_ext_rm_leaf() for removing
      blocks.  Currently we choose the last extent as the starting
      point for removing blocks:
      
      	ex = EXT_LAST_EXTENT(eh);
      
      This is OK for truncate but for hole punch we can optimize the extent
      selection as the path is already initialized.  We could use this
      information to select proper starting extent.  The code change in this
      patch will not affect truncate as for truncate path[depth].p_ext will
      always be NULL.
      Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
      Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: invalidate handle if jbd2_journal_restart() fails · 41a5b913
      Theodore Ts'o authored
      If jbd2_journal_restart() fails the handle will have been disconnected
      from the current transaction.  In this situation, the handle must not
      be used for any jbd2 function other than jbd2_journal_stop().
      Enforce this by treating a handle which has a NULL transaction
      pointer as an aborted handle, and issue a kernel warning if
      jbd2_journal_extend(), jbd2_journal_get_write_access(),
      jbd2_journal_dirty_metadata(), etc. is called with an invalid handle.
      
      This commit also fixes a bug where jbd2_journal_stop() would trip over
      a kernel jbd2 assertion check when trying to free an invalid handle.
      
      Also move the responsibility of setting current->journal_info to
      start_this_handle(), simplifying the three users of this function.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reported-by: Younger Liu <younger.liu@huawei.com>
      Cc: Jan Kara <jack@suse.cz>
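The rule in the commit above, that a handle with a NULL transaction pointer counts as aborted, can be sketched with toy types. Nothing here is the jbd2 implementation: `toy_handle`, `toy_transaction`, `toy_handle_is_aborted()` and `toy_dirty_metadata()` are hypothetical, and returning -EROFS for an invalid handle is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for a journal transaction and handle. */
struct toy_transaction {
	int tid;
};

struct toy_handle {
	struct toy_transaction *transaction;	/* NULL once disconnected */
	int aborted;
};

/* A handle is unusable if it was aborted or has lost its transaction. */
static int toy_handle_is_aborted(const struct toy_handle *h)
{
	assert(h != NULL);
	return h->aborted || h->transaction == NULL;
}

/* Any operation other than "stop" refuses to work on an invalid handle. */
static int toy_dirty_metadata(struct toy_handle *h)
{
	if (toy_handle_is_aborted(h))
		return -EROFS;	/* assumed error code for this sketch */
	/* ... real work on h->transaction would go here ... */
	return 0;
}
```

Centralising the check in one predicate means every entry point rejects a disconnected handle the same way instead of dereferencing a NULL transaction pointer.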
    • ext4: translate flag bits to strings in tracepoints · 21ddd568
      Theodore Ts'o authored
      Translate the bitfields used in various flags argument to strings to
      make the tracepoint output more human-readable.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: fix up error handling for mpage_map_and_submit_extent() · cb530541
      Theodore Ts'o authored
      The function mpage_release_unused_pages() must only be called once;
      otherwise the kernel will BUG() when the second call to
      mpage_release_unused_pages() tries to unlock the pages which had been
      unlocked by the first call.
      
      Also restructure the error handling so that we only give up on writing
      the dirty pages in the case of ENOSPC where retrying the allocation
      won't help.  Otherwise, a transient failure, such as a kmalloc()
      failure in calling ext4_map_blocks() might cause us to give up on
      those pages, leading to a scary message in /var/log/messages plus data
      loss.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
    • jbd2: fix theoretical race in jbd2__journal_restart · 39c04153
      Theodore Ts'o authored
      Once we decrement transaction->t_updates, if this is the last handle
      holding the transaction from closing, and once we release the
      t_handle_lock spinlock, it's possible for the transaction to commit
      and be released.  In practice with normal kernels, this probably won't
      happen, since the commit happens in a separate kernel thread and it's
      unlikely this could all happen within the space of a few CPU cycles.
      
      On the other hand, with a real-time kernel, this could potentially
      happen, so save the tid found in transaction->t_tid before we release
      t_handle_lock.  It would require an insane configuration, such as one
      where the jbd2 thread was set to a very high real-time priority,
      perhaps because a high priority real-time thread is trying to read or
      write to a file system.  But some people who use real-time kernels
      have been known to do insane things, including controlling
      laser-wielding industrial robots.  :-)
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
    • ext4: only zero partial blocks in ext4_zero_partial_blocks() · e1be3a92
      Lukas Czerner authored
      Currently, if we pass a range that covers an entire block into
      ext4_zero_partial_blocks(), we attempt to zero it even though we should
      only zero the unaligned part of a block.

      Fix this by checking whether the range covers the whole block and skipping
      the zeroing if so.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
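The check described in the commit above amounts to working out which parts of a byte range are not block-aligned. The following sketch is not ext4's implementation; `byte_range` and `partial_block_ranges()` are hypothetical names, and the function only computes which sub-ranges would need zeroing.

```c
#include <assert.h>
#include <stddef.h>

struct byte_range {
	unsigned long long start;
	unsigned long long len;
};

/*
 * For the range [offset, offset + length), fill out[] with the partial
 * (unaligned) pieces at its head and tail and return how many there are
 * (0, 1 or 2). Fully covered blocks produce no entries.
 */
static int partial_block_ranges(unsigned long long offset,
				unsigned long long length,
				unsigned int blocksize,
				struct byte_range out[2])
{
	unsigned long long end, head, tail;
	int n = 0;

	assert(out != NULL && blocksize > 0 && length > 0);
	end = offset + length;		/* exclusive end of the range */
	head = offset % blocksize;	/* bytes before offset in its block */
	tail = end % blocksize;		/* bytes used in the last block */

	if (offset / blocksize == (end - 1) / blocksize) {
		/* The range sits inside one block. */
		if (length == blocksize)
			return 0;	/* whole block: nothing partial */
		out[0].start = offset;
		out[0].len = length;
		return 1;
	}
	if (head != 0) {
		out[n].start = offset;
		out[n].len = blocksize - head;
		n++;
	}
	if (tail != 0) {
		out[n].start = end - tail;
		out[n].len = tail;
		n++;
	}
	return n;
}
```

With a 4096-byte block size, a range of exactly one aligned block yields no entries, while a range starting at byte 4000 with length 5000 yields a 96-byte head piece and an 808-byte tail piece.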
    • ext4: check error return from ext4_write_inline_data_end() · 42c832de
      Theodore Ts'o authored
      The function ext4_write_inline_data_end() can return an error.  So we
      need to assign it to a signed integer variable to check for an error
      return (since copied is an unsigned int).
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Zheng Liu <wenqing.lz@taobao.com>
      Cc: stable@vger.kernel.org
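The signed-versus-unsigned point in the commit above can be shown with a small sketch. `toy_write_end()` is a hypothetical stand-in for a function that returns a byte count or a negative errno; it is not ext4_write_inline_data_end(), and the value 42 is arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical: returns bytes copied on success, or a negative errno. */
static int toy_write_end(int fail)
{
	return fail ? -EIO : 42;
}

/*
 * Buggy pattern (shown only as a comment):
 *
 *     unsigned int copied = toy_write_end(1);
 *     if (copied < 0)    // never true: an unsigned value cannot be negative
 *             return copied;
 *
 * Correct pattern: capture the result in a signed int first.
 */
static int finish_write(int fail, unsigned int *copied)
{
	int ret;

	assert(copied != NULL);
	ret = toy_write_end(fail);
	if (ret < 0)
		return ret;		/* propagate the error */
	*copied = (unsigned int)ret;	/* safe: ret is non-negative here */
	return 0;
}
```

Assigning a negative errno directly to an unsigned variable turns it into a very large positive count, so the error is silently treated as a successful copy.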
    • ext4: delete unnecessary C statements · 353eefd3
      jon ernst authored
      Checking whether an unsigned variable is less than 0 always evaluates to
      false.  err = 0 is duplicated and unnecessary.
      
      [ tytso: Also cleaned up error handling in ext4_block_zero_page_range() ]
      Signed-off-by: "Jon Ernst" <jonernst07@gmx.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext3,ext4: don't mess with dir_file->f_pos in htree_dirblock_to_tree() · 64cb9273
      Al Viro authored
      In both ext3 and ext4, htree_dirblock_to_tree() just fills the
      in-core rbtree for use by call_filldir().  All updates of ->f_pos are
      done by the latter; bumping it here (on error) is obviously wrong - we
      might very well have it nowhere near the block we'd found an error in.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
    • Theodore Ts'o's avatar
      jbd2: move superblock checksum calculation to jbd2_write_superblock() · fe52d17c
      Theodore Ts'o authored
      Some of the functions which modify the jbd2 superblock were not
      updating the checksum before calling jbd2_write_superblock().  Move
      the call to jbd2_superblock_csum_set() to jbd2_write_superblock(), so
      that the checksum is calculated consistently.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: stable@vger.kernel.org
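
      The design choice can be illustrated with a toy example: when the
      checksum is computed inside the one function that writes the
      structure, no caller can forget it. Everything here is hypothetical
      (struct toy_sb, toy_csum(), toy_write_sb()), and the checksum is a
      simple stand-in rather than the one jbd2 uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical superblock; the checksum is the last field. */
struct toy_sb {
    uint32_t sequence;
    uint32_t start;
    uint32_t checksum;
};

/* Toy checksum over every byte that precedes the checksum field. */
uint32_t toy_csum(const struct toy_sb *sb)
{
    const unsigned char *p = (const unsigned char *)sb;
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < offsetof(struct toy_sb, checksum); i++)
        sum = sum * 31 + p[i];
    return sum;
}

/*
 * Single write path: the checksum is refreshed here, so every caller
 * that modifies the superblock gets a consistent on-disk copy.
 */
void toy_write_sb(struct toy_sb *sb, struct toy_sb *disk)
{
    sb->checksum = toy_csum(sb);
    *disk = *sb;
}
```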
    • Ashish Sangwan's avatar
      ext4: pass inode pointer instead of file pointer to punch hole · aeb2817a
      Ashish Sangwan authored
      There is no need to pass a file pointer when we can pass the inode
      pointer directly.
      Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
      Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • boxi liu's avatar
      ext4: improve free space calculation for inline_data · c4932dbe
      boxi liu authored
      The ext4 inline_data feature uses the xattr space to store inline
      data in the inode.  When we account for the inline data as an xattr,
      we add the padding.  But get_max_inline_xattr_value_size() counts the
      free space without the padding.  As a result, some contents are moved
      to a block even though they could be stored in the inode.
      Signed-off-by: liulei <lewis.liulei@huawei.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Tao Ma <boyu.mt@taobao.com>
    • Joe Perches's avatar
      ext4: reduce object size when !CONFIG_PRINTK · e7c96e8e
      Joe Perches authored
      Reducing the object size by ~10% could be useful for embedded systems.
      
      Add #ifdef CONFIG_PRINTK #else #endif blocks to hold formats and
      arguments, passing " " to functions when !CONFIG_PRINTK and still
      verifying format and arguments with no_printk.
      
      $ size fs/ext4/built-in.o*
         text	   data	    bss	    dec	    hex	filename
       239375	    610	    888	 240873	  3ace9	fs/ext4/built-in.o.new
       264167	    738	    888	 265793	  40e41	fs/ext4/built-in.o.old
      
          $ grep -E "CONFIG_EXT4|CONFIG_PRINTK" .config
          # CONFIG_PRINTK is not set
          CONFIG_EXT4_FS=y
          CONFIG_EXT4_USE_FOR_EXT23=y
          CONFIG_EXT4_FS_POSIX_ACL=y
          # CONFIG_EXT4_FS_SECURITY is not set
          # CONFIG_EXT4_DEBUG is not set
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
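
      A userspace sketch of the #ifdef pattern described above. All names
      are hypothetical: DEMO_PRINTK stands in for CONFIG_PRINTK,
      demo_no_printk() imitates the role of no_printk, and demo_report()
      imitates a reporting function. The ##__VA_ARGS__ form is a GNU
      extension accepted by GCC and Clang.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical switch; set to 0 to drop the format strings. */
#define DEMO_PRINTK 1

char demo_last_msg[128];
unsigned int demo_last_line;

/* Stand-in reporting function: records the formatted message. */
void demo_report(const char *func, unsigned int line, const char *fmt, ...)
{
    va_list ap;

    (void)func;
    demo_last_line = line;
    va_start(ap, fmt);
    vsnprintf(demo_last_msg, sizeof(demo_last_msg), fmt, ap);
    va_end(ap);
}

/* Never executed, but the compiler still checks fmt against its arguments. */
#define demo_no_printk(fmt, ...) \
    do { if (0) printf(fmt, ##__VA_ARGS__); } while (0)

#if DEMO_PRINTK
#define demo_error(fmt, ...) \
    demo_report(__func__, __LINE__, fmt, ##__VA_ARGS__)
#else
/* Formats and arguments are verified, then replaced by " ". */
#define demo_error(fmt, ...) \
    do { \
        demo_no_printk(fmt, ##__VA_ARGS__); \
        demo_report("", 0, " "); \
    } while (0)
#endif
```

      With DEMO_PRINTK set to 0, the real format string is only referenced
      inside the dead "if (0)" branch, so a compiler can typically discard
      it while still diagnosing mismatched arguments.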
    • Zheng Liu's avatar
      ext4: improve extent cache shrink mechanism to avoid to burn CPU time · d3922a77
      Zheng Liu authored
      Currently we maintain a proper in-order LRU list in ext4 to reclaim
      entries from the extent status tree when we are under heavy memory
      pressure.  To keep this order, a spin lock is used to protect the
      list.  But this lock burns a lot of CPU time.  The following steps
      trigger the problem.
      
        % cd /dev/shm
        % dd if=/dev/zero of=ext4-img bs=1M count=2k
        % mkfs.ext4 ext4-img
        % mount -t ext4 -o loop ext4-img /mnt
        % cd /mnt
        % for ((i=0;i<160;i++)); do truncate -s 64g $i; done
        % for ((i=0;i<160;i++)); do cp $i /dev/null &; done
        % perf record -a -g
        % perf report
      
      This commit tries to fix this problem.  A new member called
      i_touch_when is added to ext4_inode_info to record the last access
      time for an inode.  With it, we no longer need to keep a proper
      in-order LRU list, which avoids burning CPU time.  When we try to
      reclaim some entries from the extent status tree, we use list_sort()
      to get a proper in-order list.  Then we traverse this list to discard
      some entries.  In ext4_sb_info, we use s_es_last_sorted to record the
      last time this list was sorted.  When we traverse the list, we skip
      any inode that is newer than this time and move it to the tail of the
      LRU list.  When the head of the list is newer than s_es_last_sorted,
      we sort the LRU list again.
      
      In this commit, we break the loop if s_extent_cache_cnt == 0 because
      that means that all extents in extent status tree have been reclaimed.
      
      Meanwhile in this commit, ext4_es_{un}register_shrinker()'s prototype is
      changed to save a local variable in these functions.
      Reported-by: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
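
      A simplified sketch of the idea: the access path only records a
      timestamp, and ordering is established with a sort at reclaim time.
      The names (struct toy_inode, toy_touch(), toy_shrink()) are
      hypothetical, an array and qsort() replace the kernel list and
      list_sort(), and the s_es_last_sorted bookkeeping is left out.

```c
#include <stdlib.h>

/* Hypothetical per-inode state: last access time and cached entry count. */
struct toy_inode {
    unsigned long touch_when;
    int nr_cached;
};

/* Hot path: just record the access time; no list reordering, no lock. */
void toy_touch(struct toy_inode *inode, unsigned long now)
{
    inode->touch_when = now;
}

/* qsort() comparator: oldest access first. */
int by_touch_when(const void *a, const void *b)
{
    const struct toy_inode *x = a;
    const struct toy_inode *y = b;

    return (x->touch_when > y->touch_when) - (x->touch_when < y->touch_when);
}

/*
 * Cold path: sort only when reclaiming, then discard entries from the
 * least recently touched inodes. Returns the number of entries freed.
 */
int toy_shrink(struct toy_inode *list, size_t n, int nr_to_scan)
{
    int freed = 0;
    size_t i;

    qsort(list, n, sizeof(*list), by_touch_when);
    for (i = 0; i < n && nr_to_scan > 0; i++) {
        int take = list[i].nr_cached < nr_to_scan ?
                   list[i].nr_cached : nr_to_scan;

        list[i].nr_cached -= take;
        nr_to_scan -= take;
        freed += take;
    }
    return freed;
}
```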
    • Alexey Khoroshilov's avatar
      ext4: implement error handling of ext4_mb_new_preallocation() · 2c00ef3e
      Alexey Khoroshilov authored
      If memory allocation in ext4_mb_new_group_pa() fails, it returns an
      error code and ext4_mb_new_preallocation() propagates it, but
      ext4_mb_new_blocks() ignores it.
      
      An observed result was:
      
      - the allocation failure means that ext4_mb_new_group_pa() does not
        update the ext4_allocation_context;
      
      - ext4_mb_new_blocks() sets ext4_allocation_request->len (ar->len =
        ac->ac_b_ex.fe_len;) to the number of blocks preallocated (512)
        instead of the number of blocks requested (1);
      
      - that activates update cycle in ext4_splice_branch():
          for (i = 1; i < blks; i++) <-- blks is 512 instead of 1 here
            *(where->p + i) = cpu_to_le32(current_block++);
      
      - it iterates 511 times and corrupts a chunk of memory including inode
        structure;
      
      - page fault happens at EXT4_SB(inode->i_sb) in ext4_mark_inode_dirty();
      
      - system hangs with 'scheduling while atomic' BUG.
      
      The patch implements a check for ext4_mb_new_preallocation() error
      code and handles its failure as if ext4_mb_regular_allocator() fails.
      
      Found by Linux File System Verification project (linuxtesting.org).
      
      [ Patch restructured by tytso to make the flow of control easier to follow. ]
      Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
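
      The failure mode can be sketched with a toy model. The names
      (struct toy_ctx, toy_prealloc(), toy_new_blocks()) are hypothetical;
      the stale value 512 and the request for 1 block are taken from the
      description above.

```c
#include <errno.h>

/* Hypothetical allocation context: 'got' is only valid on success. */
struct toy_ctx {
    int requested;
    int got;
};

/* On failure the context is left untouched, so 'got' stays stale. */
int toy_prealloc(struct toy_ctx *ac, int fail)
{
    if (fail)
        return -ENOMEM;
    ac->got = ac->requested;
    return 0;
}

/*
 * The caller must check the return value before trusting ac.got.
 * Ignoring it would report the stale group-sized length (512) for a
 * request of 1 block.
 */
int toy_new_blocks(int requested, int fail, int *len)
{
    struct toy_ctx ac = { requested, 512 };
    int err = toy_prealloc(&ac, fail);

    if (err) {
        *len = 0;        /* nothing was allocated */
        return err;
    }
    *len = ac.got;
    return 0;
}
```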
    • Maarten ter Huurne's avatar
      ext4: fix corruption when online resizing a fs with 1K block size · 6ca792ed
      Maarten ter Huurne authored
      Subtracting the number of the first data block places the superblock
      backups one block too early, corrupting the file system. When the block
      size is larger than 1K, the first data block is 0, so the subtraction
      has no effect and no corruption occurs.
      Signed-off-by: Maarten ter Huurne <maarten@treewalker.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      CC: stable@vger.kernel.org
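
      The off-by-one can be checked with a little arithmetic. This sketch
      assumes the usual ext4 layout in which a group's first block is the
      first data block plus the group number times the blocks per group,
      and in which a 1K-block file system has a first data block of 1 and
      8192 blocks per group. group_first_block() is a hypothetical helper.

```c
#include <stdint.h>

/* First block of a group, where a superblock backup would be placed. */
uint64_t group_first_block(uint64_t group, uint64_t blocks_per_group,
                           uint64_t first_data_block)
{
    return first_data_block + group * blocks_per_group;
}
```

      For group 1 on a 1K-block file system this gives block 8193.
      Subtracting the first data block again gives 8192, one block too
      early. With larger block sizes the first data block is 0, so the
      subtraction changes nothing.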
  3. 30 Jun, 2013 7 commits