Commit 5993692f authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:

 - further restructure ext4 documentation

 - fix up ext4's delayed allocation for bigalloc file systems

 - fix up some syzbot-detected races in EXT4_IOC_MOVE_EXT,
   EXT4_IOC_SWAP_BOOT, and ext4_remount

 - ... and a few other miscellaneous bugs and optimizations.

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (21 commits)
  ext4: fix use-after-free race in ext4_remount()'s error path
  ext4: cache NULL when both default_acl and acl are NULL
  docs: promote the ext4 data structures book to top level
  docs: move ext4 administrative docs to admin-guide/
  jbd2: fix use after free in jbd2_log_do_checkpoint()
  ext4: propagate error from dquot_initialize() in EXT4_IOC_FSSETXATTR
  ext4: fix setattr project check in fssetxattr ioctl
  docs: make ext4 readme tables readable
  docs: fix ext4 documentation table formatting problems
  docs: generate a separate ext4 pdf file from the documentation
  ext4: convert fault handler to use vm_fault_t type
  ext4: initialize retries variable in ext4_da_write_inline_data_begin()
  ext4: fix EXT4_IOC_SWAP_BOOT
  ext4: fix build error when DX_DEBUG is defined
  ext4: fix argument checking in EXT4_IOC_MOVE_EXT
  ext4: fix reserved cluster accounting at page invalidation time
  ext4: adjust reserved cluster count when removing extents
  ext4: reduce reserved cluster count by number of allocated clusters
  ext4: fix reserved cluster accounting at delayed write time
  ext4: add new pending reservation mechanism
  ...
parents d6edff78 33458eab
This diff is collapsed.
......@@ -71,6 +71,7 @@ configure specific aspects of kernel behavior to your liking.
java
ras
bcache
ext4
pm/index
thunderbolt
LSM/index
......
......@@ -383,6 +383,10 @@ latex_documents = [
'The kernel development community', 'manual'),
('filesystems/index', 'filesystems.tex', 'Linux Filesystems API',
'The kernel development community', 'manual'),
('admin-guide/ext4', 'ext4-admin-guide.tex', 'ext4 Administration Guide',
'ext4 Community', 'manual'),
('filesystems/ext4/index', 'ext4-data-structures.tex',
'ext4 Data Structures and Algorithms', 'ext4 Community', 'manual'),
('gpu/index', 'gpu.tex', 'Linux GPU Driver Developer\'s Guide',
'The kernel development community', 'manual'),
('input/index', 'linux-input.tex', 'The Linux input driver subsystem',
......
......@@ -30,7 +30,7 @@ Extended attributes, when stored after the inode, have a header
``ext4_xattr_ibody_header`` that is 4 bytes long:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -47,7 +47,7 @@ The beginning of an extended attribute block is in
``struct ext4_xattr_header``, which is 32 bytes long:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -92,7 +92,7 @@ entries must be stored in sorted order. The sort order is
Attributes stored inside an inode do not need be stored in sorted order.
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -157,7 +157,7 @@ attribute name index field is set, and matching string is removed from
the key name. Here is a map of name index values to key prefixes:
.. list-table::
:widths: 1 79
:widths: 16 64
:header-rows: 1
* - Name Index
......
......@@ -28,7 +28,7 @@ of checksum. The checksum function is whatever the superblock describes
(crc32c as of October 2013) unless noted otherwise.
.. list-table::
:widths: 1 1 4
:widths: 20 8 50
:header-rows: 1
* - Metadata
......
......@@ -34,7 +34,7 @@ is at most 263 bytes long, though on disk you'll need to reference
``dirent.rec_len`` to know for sure.
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -66,7 +66,7 @@ tree traversal. This format is ``ext4_dir_entry_2``, which is at most
``dirent.rec_len`` to know for sure.
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -99,7 +99,7 @@ tree traversal. This format is ``ext4_dir_entry_2``, which is at most
The directory file type is one of the following values:
.. list-table::
:widths: 1 79
:widths: 16 64
:header-rows: 1
* - Value
......@@ -130,7 +130,7 @@ in the place where the name normally goes. The structure is
``struct ext4_dir_entry_tail``:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -212,7 +212,7 @@ The root of the htree is in ``struct dx_root``, which is the full length
of a data block:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -305,7 +305,7 @@ of a data block:
The directory hash is one of the following values:
.. list-table::
:widths: 1 79
:widths: 16 64
:header-rows: 1
* - Value
......@@ -327,7 +327,7 @@ Interior nodes of an htree are recorded as ``struct dx_node``, which is
also the full length of a data block:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -375,7 +375,7 @@ The hash maps that exist in both ``struct dx_root`` and
long:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -405,7 +405,7 @@ directory index (which will ensure that there's space for the checksum.
The dx\_tail structure is 8 bytes long and looks like this:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......
This diff is collapsed.
......@@ -43,7 +43,7 @@ entire bitmap.
The block group descriptor is laid out in ``struct ext4_group_desc``.
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -157,7 +157,7 @@ The block group descriptor is laid out in ``struct ext4_group_desc``.
Block group flags can be any combination of the following:
.. list-table::
:widths: 1 79
:widths: 16 64
:header-rows: 1
* - Value
......
......@@ -68,7 +68,7 @@ The extent tree header is recorded in ``struct ext4_extent_header``,
which is 12 bytes long:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -104,7 +104,7 @@ Internal nodes of the extent tree, also known as index nodes, are
recorded as ``struct ext4_extent_idx``, and are 12 bytes long:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -134,7 +134,7 @@ Leaf nodes of the extent tree are recorded as ``struct ext4_extent``,
and are also 12 bytes long:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -174,7 +174,7 @@ including) the checksum itself.
``struct ext4_extent_tail`` is 4 bytes long:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......
.. SPDX-License-Identifier: GPL-2.0
===============
ext4 Filesystem
===============
General usage and on-disk artifacts writen by ext4. More documentation may
be ported from the wiki as time permits. This should be considered the
canonical source of information as the details here have been reviewed by
the ext4 community.
===================================
ext4 Data Structures and Algorithms
===================================
.. toctree::
:maxdepth: 5
:maxdepth: 6
:numbered:
ext4
ondisk/index
about.rst
overview.rst
globals.rst
dynamic.rst
......@@ -29,8 +29,9 @@ and the inode structure itself.
The inode table entry is laid out in ``struct ext4_inode``.
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
:class: longtable
* - Offset
- Size
......@@ -176,7 +177,7 @@ The inode table entry is laid out in ``struct ext4_inode``.
The ``i_mode`` value is a combination of the following flags:
.. list-table::
:widths: 1 79
:widths: 16 64
:header-rows: 1
* - Value
......@@ -227,7 +228,7 @@ The ``i_mode`` value is a combination of the following flags:
The ``i_flags`` field is a combination of these values:
.. list-table::
:widths: 1 79
:widths: 16 64
:header-rows: 1
* - Value
......@@ -314,7 +315,7 @@ The ``osd1`` field has multiple meanings depending on the creator:
Linux:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -331,7 +332,7 @@ Linux:
Hurd:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -346,7 +347,7 @@ Hurd:
Masix:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -365,7 +366,7 @@ The ``osd2`` field has multiple meanings depending on the filesystem creator:
Linux:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -402,7 +403,7 @@ Linux:
Hurd:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -433,7 +434,7 @@ Hurd:
Masix:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......
......@@ -48,7 +48,7 @@ Layout
Generally speaking, the journal has this format:
.. list-table::
:widths: 1 1 78
:widths: 16 48 16
:header-rows: 1
* - Superblock
......@@ -76,7 +76,7 @@ The journal superblock will be in the next full block after the
superblock.
.. list-table::
:widths: 1 1 1 1 76
:widths: 12 12 12 32 12
:header-rows: 1
* - 1024 bytes of padding
......@@ -98,7 +98,7 @@ Every block in the journal starts with a common 12-byte header
``struct journal_header_s``:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -124,7 +124,7 @@ Every block in the journal starts with a common 12-byte header
The journal block type can be any one of:
.. list-table::
:widths: 1 79
:widths: 16 64
:header-rows: 1
* - Value
......@@ -154,7 +154,7 @@ The journal superblock is recorded as ``struct journal_superblock_s``,
which is 1024 bytes long:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -264,7 +264,7 @@ which is 1024 bytes long:
The journal compat features are any combination of the following:
.. list-table::
:widths: 1 79
:widths: 16 64
:header-rows: 1
* - Value
......@@ -278,7 +278,7 @@ The journal compat features are any combination of the following:
The journal incompat features are any combination of the following:
.. list-table::
:widths: 1 79
:widths: 16 64
:header-rows: 1
* - Value
......@@ -306,7 +306,7 @@ Journal checksum type codes are one of the following. crc32 or crc32c are the
most likely choices.
.. list-table::
:widths: 1 79
:widths: 16 64
:header-rows: 1
* - Value
......@@ -330,7 +330,7 @@ described by a data structure, but here is the block structure anyway.
Descriptor blocks consume at least 36 bytes, but use a full block:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -355,7 +355,7 @@ defined as ``struct journal_block_tag3_s``, which looks like the
following. The size is 16 or 32 bytes.
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -400,7 +400,7 @@ following. The size is 16 or 32 bytes.
The journal tag flags are any combination of the following:
.. list-table::
:widths: 1 79
:widths: 16 64
:header-rows: 1
* - Value
......@@ -421,7 +421,7 @@ is defined as ``struct journal_block_tag_s``, which looks like the
following. The size is 8, 12, 24, or 28 bytes:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -471,7 +471,7 @@ JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3 are set, the end of the block is a
``struct jbd2_journal_block_tail``, which looks like this:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -513,7 +513,7 @@ Revocation blocks are described in
length, but use a full block:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -543,7 +543,7 @@ JBD2\_FEATURE\_INCOMPAT\_CSUM\_V3 are set, the end of the revocation
block is a ``struct jbd2_journal_revoke_tail``, which has this format:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -567,7 +567,7 @@ The commit block is described by ``struct commit_header``, which is 32
bytes long (but uses a full block):
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......
......@@ -32,7 +32,7 @@ The checksum is calculated against the FS UUID and the MMP structure.
The MMP structure (``struct mmp_struct``) is as follows:
.. list-table::
:widths: 1 1 1 77
:widths: 8 12 20 40
:header-rows: 1
* - Offset
......
.. SPDX-License-Identifier: GPL-2.0
==============================
Data Structures and Algorithms
==============================
.. include:: about.rst
.. include:: overview.rst
.. include:: globals.rst
.. include:: dynamic.rst
......@@ -6,7 +6,7 @@ Special inodes
ext4 reserves some inode for special features, as follows:
.. list-table::
:widths: 1 79
:widths: 6 70
:header-rows: 1
* - inode Number
......
......@@ -19,7 +19,7 @@ The ext4 superblock is laid out as follows in
``struct ext4_super_block``:
.. list-table::
:widths: 1 1 1 77
:widths: 8 8 24 40
:header-rows: 1
* - Offset
......@@ -483,7 +483,7 @@ The ext4 superblock is laid out as follows in
The superblock state is some combination of the following:
.. list-table::
:widths: 1 79
:widths: 8 72
:header-rows: 1
* - Value
......@@ -500,7 +500,7 @@ The superblock state is some combination of the following:
The superblock error policy is one of the following:
.. list-table::
:widths: 1 79
:widths: 8 72
:header-rows: 1
* - Value
......@@ -517,7 +517,7 @@ The superblock error policy is one of the following:
The filesystem creator is one of the following:
.. list-table::
:widths: 1 79
:widths: 8 72
:header-rows: 1
* - Value
......@@ -538,7 +538,7 @@ The filesystem creator is one of the following:
The superblock revision is one of the following:
.. list-table::
:widths: 1 79
:widths: 8 72
:header-rows: 1
* - Value
......@@ -556,7 +556,7 @@ The superblock compatible features field is a combination of any of the
following:
.. list-table::
:widths: 1 79
:widths: 16 64
:header-rows: 1
* - Value
......@@ -595,7 +595,7 @@ The superblock incompatible features field is a combination of any of the
following:
.. list-table::
:widths: 1 79
:widths: 16 64
:header-rows: 1
* - Value
......@@ -647,7 +647,7 @@ The superblock read-only compatible features field is a combination of any of
the following:
.. list-table::
:widths: 1 79
:widths: 16 64
:header-rows: 1
* - Value
......@@ -702,7 +702,7 @@ the following:
The ``s_def_hash_version`` field is one of the following:
.. list-table::
:widths: 1 79
:widths: 8 72
:header-rows: 1
* - Value
......@@ -725,7 +725,7 @@ The ``s_def_hash_version`` field is one of the following:
The ``s_default_mount_opts`` field is any combination of the following:
.. list-table::
:widths: 1 79
:widths: 8 72
:header-rows: 1
* - Value
......@@ -767,7 +767,7 @@ The ``s_default_mount_opts`` field is any combination of the following:
The ``s_flags`` field is any combination of the following:
.. list-table::
:widths: 1 79
:widths: 8 72
:header-rows: 1
* - Value
......@@ -784,7 +784,7 @@ The ``s_flags`` field is any combination of the following:
The ``s_encrypt_algos`` list can contain any of the following:
.. list-table::
:widths: 1 79
:widths: 8 72
:header-rows: 1
* - Value
......
......@@ -284,12 +284,16 @@ ext4_init_acl(handle_t *handle, struct inode *inode, struct inode *dir)
error = __ext4_set_acl(handle, inode, ACL_TYPE_DEFAULT,
default_acl, XATTR_CREATE);
posix_acl_release(default_acl);
} else {
inode->i_default_acl = NULL;
}
if (acl) {
if (!error)
error = __ext4_set_acl(handle, inode, ACL_TYPE_ACCESS,
acl, XATTR_CREATE);
posix_acl_release(acl);
} else {
inode->i_acl = NULL;
}
return error;
}
......@@ -628,6 +628,7 @@ enum {
#define EXT4_FREE_BLOCKS_NO_QUOT_UPDATE 0x0008
#define EXT4_FREE_BLOCKS_NOFREE_FIRST_CLUSTER 0x0010
#define EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER 0x0020
#define EXT4_FREE_BLOCKS_RERESERVE_CLUSTER 0x0040
/*
* ioctl commands
......@@ -1030,6 +1031,9 @@ struct ext4_inode_info {
ext4_lblk_t i_da_metadata_calc_last_lblock;
int i_da_metadata_calc_len;
/* pending cluster reservations for bigalloc file systems */
struct ext4_pending_tree i_pending_tree;
/* on-disk additional length */
__u16 i_extra_isize;
......@@ -1401,7 +1405,8 @@ struct ext4_sb_info {
u32 s_min_batch_time;
struct block_device *journal_bdev;
#ifdef CONFIG_QUOTA
char *s_qf_names[EXT4_MAXQUOTAS]; /* Names of quota files with journalled quota */
/* Names of quota files with journalled quota */
char __rcu *s_qf_names[EXT4_MAXQUOTAS];
int s_jquota_fmt; /* Format of quota to use */
#endif
unsigned int s_want_extra_isize; /* New inodes should reserve # bytes */
......@@ -2483,10 +2488,11 @@ extern int ext4_writepage_trans_blocks(struct inode *);
extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks);
extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
loff_t lstart, loff_t lend);
extern int ext4_page_mkwrite(struct vm_fault *vmf);
extern int ext4_filemap_fault(struct vm_fault *vmf);
extern vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf);
extern vm_fault_t ext4_filemap_fault(struct vm_fault *vmf);
extern qsize_t *ext4_get_reserved_space(struct inode *inode);
extern int ext4_get_projid(struct inode *inode, kprojid_t *projid);
extern void ext4_da_release_space(struct inode *inode, int to_free);
extern void ext4_da_update_reserve_space(struct inode *inode,
int used, int quota_claim);
extern int ext4_issue_zeroout(struct inode *inode, ext4_lblk_t lblk,
......@@ -3142,10 +3148,6 @@ extern struct ext4_ext_path *ext4_find_extent(struct inode *, ext4_lblk_t,
int flags);
extern void ext4_ext_drop_refs(struct ext4_ext_path *);
extern int ext4_ext_check_inode(struct inode *inode);
extern int ext4_find_delalloc_range(struct inode *inode,
ext4_lblk_t lblk_start,
ext4_lblk_t lblk_end);
extern int ext4_find_delalloc_cluster(struct inode *inode, ext4_lblk_t lblk);
extern ext4_lblk_t ext4_ext_next_allocated_block(struct ext4_ext_path *path);
extern int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
__u64 start, __u64 len);
......@@ -3156,6 +3158,7 @@ extern int ext4_swap_extents(handle_t *handle, struct inode *inode1,
struct inode *inode2, ext4_lblk_t lblk1,
ext4_lblk_t lblk2, ext4_lblk_t count,
int mark_unwritten,int *err);
extern int ext4_clu_mapped(struct inode *inode, ext4_lblk_t lclu);
/* move_extent.c */
extern void ext4_double_down_write_data_sem(struct inode *first,
......
......@@ -119,6 +119,19 @@ struct ext4_ext_path {
struct buffer_head *p_bh;
};
/*
* Used to record a portion of a cluster found at the beginning or end
* of an extent while traversing the extent tree during space removal.
* A partial cluster may be removed if it does not contain blocks shared
* with extents that aren't being deleted (tofree state). Otherwise,
* it cannot be removed (nofree state).
*/
struct partial_cluster {
ext4_fsblk_t pclu; /* physical cluster number */
ext4_lblk_t lblk; /* logical block number within logical cluster */
enum {initial, tofree, nofree} state;
};
/*
* structure for external API
*/
......
This diff is collapsed.
This diff is collapsed.
......@@ -78,6 +78,51 @@ struct ext4_es_stats {
struct percpu_counter es_stats_shk_cnt;
};
/*
* Pending cluster reservations for bigalloc file systems
*
* A cluster with a pending reservation is a logical cluster shared by at
* least one extent in the extents status tree with delayed and unwritten
* status and at least one other written or unwritten extent. The
* reservation is said to be pending because a cluster reservation would
* have to be taken in the event all blocks in the cluster shared with
* written or unwritten extents were deleted while the delayed and
* unwritten blocks remained.
*
* The set of pending cluster reservations is an auxiliary data structure
* used with the extents status tree to implement reserved cluster/block
* accounting for bigalloc file systems. The set is kept in memory and
* records all pending cluster reservations.
*
* Its primary function is to avoid the need to read extents from the
* disk when invalidating pages as a result of a truncate, punch hole, or
* collapse range operation. Page invalidation requires a decrease in the
* reserved cluster count if it results in the removal of all delayed
* and unwritten extents (blocks) from a cluster that is not shared with a
* written or unwritten extent, and no decrease otherwise. Determining
* whether the cluster is shared can be done by searching for a pending
* reservation on it.
*
* Secondarily, it provides a potentially faster method for determining
* whether the reserved cluster count should be increased when a physical
* cluster is deallocated as a result of a truncate, punch hole, or
* collapse range operation. The necessary information is also present
* in the extents status tree, but might be more rapidly accessed in
* the pending reservation set in many cases due to smaller size.
*
* The pending cluster reservation set is implemented as a red-black tree
* with the goal of minimizing per page search time overhead.
*/
struct pending_reservation {
struct rb_node rb_node;
ext4_lblk_t lclu;
};
struct ext4_pending_tree {
struct rb_root root;
};
extern int __init ext4_init_es(void);
extern void ext4_exit_es(void);
extern void ext4_es_init_tree(struct ext4_es_tree *tree);
......@@ -90,11 +135,18 @@ extern void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk,
unsigned int status);
extern int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
ext4_lblk_t len);
extern void ext4_es_find_delayed_extent_range(struct inode *inode,
ext4_lblk_t lblk, ext4_lblk_t end,
struct extent_status *es);
extern void ext4_es_find_extent_range(struct inode *inode,
int (*match_fn)(struct extent_status *es),
ext4_lblk_t lblk, ext4_lblk_t end,
struct extent_status *es);
extern int ext4_es_lookup_extent(struct inode *inode, ext4_lblk_t lblk,
struct extent_status *es);
extern bool ext4_es_scan_range(struct inode *inode,
int (*matching_fn)(struct extent_status *es),
ext4_lblk_t lblk, ext4_lblk_t end);
extern bool ext4_es_scan_clu(struct inode *inode,
int (*matching_fn)(struct extent_status *es),
ext4_lblk_t lblk);
static inline unsigned int ext4_es_status(struct extent_status *es)
{
......@@ -126,6 +178,16 @@ static inline int ext4_es_is_hole(struct extent_status *es)
return (ext4_es_type(es) & EXTENT_STATUS_HOLE) != 0;
}
static inline int ext4_es_is_mapped(struct extent_status *es)
{
return (ext4_es_is_written(es) || ext4_es_is_unwritten(es));
}
static inline int ext4_es_is_delonly(struct extent_status *es)
{
return (ext4_es_is_delayed(es) && !ext4_es_is_unwritten(es));
}
static inline void ext4_es_set_referenced(struct extent_status *es)
{
es->es_pblk |= ((ext4_fsblk_t)EXTENT_STATUS_REFERENCED) << ES_SHIFT;
......@@ -175,4 +237,16 @@ extern void ext4_es_unregister_shrinker(struct ext4_sb_info *sbi);
extern int ext4_seq_es_shrinker_info_show(struct seq_file *seq, void *v);
extern int __init ext4_init_pending(void);
extern void ext4_exit_pending(void);
extern void ext4_init_pending_tree(struct ext4_pending_tree *tree);
extern void ext4_remove_pending(struct inode *inode, ext4_lblk_t lblk);
extern bool ext4_is_pending(struct inode *inode, ext4_lblk_t lblk);
extern int ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
bool allocated);
extern unsigned int ext4_es_delayed_clu(struct inode *inode, ext4_lblk_t lblk,
ext4_lblk_t len);
extern void ext4_es_remove_blks(struct inode *inode, ext4_lblk_t lblk,
ext4_lblk_t len);
#endif /* _EXT4_EXTENTS_STATUS_H */
......@@ -863,7 +863,7 @@ int ext4_da_write_inline_data_begin(struct address_space *mapping,
handle_t *handle;
struct page *page;
struct ext4_iloc iloc;
int retries;
int retries = 0;
ret = ext4_get_inode_loc(inode, &iloc);
if (ret)
......
......@@ -577,8 +577,8 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
!(status & EXTENT_STATUS_WRITTEN) &&
ext4_find_delalloc_range(inode, map->m_lblk,
map->m_lblk + map->m_len - 1))
ext4_es_scan_range(inode, &ext4_es_is_delayed, map->m_lblk,
map->m_lblk + map->m_len - 1))
status |= EXTENT_STATUS_DELAYED;
ret = ext4_es_insert_extent(inode, map->m_lblk,
map->m_len, map->m_pblk, status);
......@@ -701,8 +701,8 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
!(status & EXTENT_STATUS_WRITTEN) &&
ext4_find_delalloc_range(inode, map->m_lblk,
map->m_lblk + map->m_len - 1))
ext4_es_scan_range(inode, &ext4_es_is_delayed, map->m_lblk,
map->m_lblk + map->m_len - 1))
status |= EXTENT_STATUS_DELAYED;
ret = ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
map->m_pblk, status);
......@@ -1595,7 +1595,7 @@ static int ext4_da_reserve_space(struct inode *inode)
return 0; /* success */
}
static void ext4_da_release_space(struct inode *inode, int to_free)
void ext4_da_release_space(struct inode *inode, int to_free)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
struct ext4_inode_info *ei = EXT4_I(inode);
......@@ -1634,13 +1634,11 @@ static void ext4_da_page_release_reservation(struct page *page,
unsigned int offset,
unsigned int length)
{
int to_release = 0, contiguous_blks = 0;
int contiguous_blks = 0;
struct buffer_head *head, *bh;
unsigned int curr_off = 0;
struct inode *inode = page->mapping->host;
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
unsigned int stop = offset + length;
int num_clusters;
ext4_fsblk_t lblk;
BUG_ON(stop > PAGE_SIZE || stop < length);
......@@ -1654,7 +1652,6 @@ static void ext4_da_page_release_reservation(struct page *page,
break;
if ((offset <= curr_off) && (buffer_delay(bh))) {
to_release++;
contiguous_blks++;
clear_buffer_delay(bh);
} else if (contiguous_blks) {
......@@ -1662,7 +1659,7 @@ static void ext4_da_page_release_reservation(struct page *page,
(PAGE_SHIFT - inode->i_blkbits);
lblk += (curr_off >> inode->i_blkbits) -
contiguous_blks;
ext4_es_remove_extent(inode, lblk, contiguous_blks);
ext4_es_remove_blks(inode, lblk, contiguous_blks);
contiguous_blks = 0;
}
curr_off = next_off;
......@@ -1671,21 +1668,9 @@ static void ext4_da_page_release_reservation(struct page *page,
if (contiguous_blks) {
lblk = page->index << (PAGE_SHIFT - inode->i_blkbits);
lblk += (curr_off >> inode->i_blkbits) - contiguous_blks;
ext4_es_remove_extent(inode, lblk, contiguous_blks);
ext4_es_remove_blks(inode, lblk, contiguous_blks);
}
/* If we have released all the blocks belonging to a cluster, then we
* need to release the reserved space for that cluster. */
num_clusters = EXT4_NUM_B2C(sbi, to_release);
while (num_clusters > 0) {
lblk = (page->index << (PAGE_SHIFT - inode->i_blkbits)) +
((num_clusters - 1) << sbi->s_cluster_bits);
if (sbi->s_cluster_ratio == 1 ||
!ext4_find_delalloc_cluster(inode, lblk))
ext4_da_release_space(inode, 1);
num_clusters--;
}
}
/*
......@@ -1780,6 +1765,65 @@ static int ext4_bh_delay_or_unwritten(handle_t *handle, struct buffer_head *bh)
return (buffer_delay(bh) || buffer_unwritten(bh)) && buffer_dirty(bh);
}
/*
* ext4_insert_delayed_block - adds a delayed block to the extents status
* tree, incrementing the reserved cluster/block
* count or making a pending reservation
* where needed
*
* @inode - file containing the newly added block
* @lblk - logical block to be added
*
* Returns 0 on success, negative error code on failure.
*/
static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
int ret;
bool allocated = false;
/*
* If the cluster containing lblk is shared with a delayed,
* written, or unwritten extent in a bigalloc file system, it's
* already been accounted for and does not need to be reserved.
* A pending reservation must be made for the cluster if it's
* shared with a written or unwritten extent and doesn't already
* have one. Written and unwritten extents can be purged from the
* extents status tree if the system is under memory pressure, so
* it's necessary to examine the extent tree if a search of the
* extents status tree doesn't get a match.
*/
if (sbi->s_cluster_ratio == 1) {
ret = ext4_da_reserve_space(inode);
if (ret != 0) /* ENOSPC */
goto errout;
} else { /* bigalloc */
if (!ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk)) {
if (!ext4_es_scan_clu(inode,
&ext4_es_is_mapped, lblk)) {
ret = ext4_clu_mapped(inode,
EXT4_B2C(sbi, lblk));
if (ret < 0)
goto errout;
if (ret == 0) {
ret = ext4_da_reserve_space(inode);
if (ret != 0) /* ENOSPC */
goto errout;
} else {
allocated = true;
}
} else {
allocated = true;
}
}
}
ret = ext4_es_insert_delayed_block(inode, lblk, allocated);
errout:
return ret;
}
/*
* This function is grabs code from the very beginning of
* ext4_map_blocks, but assumes that the caller is from delayed write
......@@ -1859,28 +1903,14 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
add_delayed:
if (retval == 0) {
int ret;
/*
* XXX: __block_prepare_write() unmaps passed block,
* is it OK?
*/
/*
* If the block was allocated from previously allocated cluster,
* then we don't need to reserve it again. However we still need
* to reserve metadata for every block we're going to write.
*/
if (EXT4_SB(inode->i_sb)->s_cluster_ratio == 1 ||
!ext4_find_delalloc_cluster(inode, map->m_lblk)) {
ret = ext4_da_reserve_space(inode);
if (ret) {
/* not enough space to reserve */
retval = ret;
goto out_unlock;
}
}
ret = ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
~0, EXTENT_STATUS_DELAYED);
if (ret) {
ret = ext4_insert_delayed_block(inode, map->m_lblk);
if (ret != 0) {
retval = ret;
goto out_unlock;
}
......@@ -3450,7 +3480,8 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
ext4_lblk_t end = map.m_lblk + map.m_len - 1;
struct extent_status es;
ext4_es_find_delayed_extent_range(inode, map.m_lblk, end, &es);
ext4_es_find_extent_range(inode, &ext4_es_is_delayed,
map.m_lblk, end, &es);
if (!es.es_len || es.es_lblk > end) {
/* entire range is a hole */
......@@ -6153,13 +6184,14 @@ static int ext4_bh_unmapped(handle_t *handle, struct buffer_head *bh)
return !buffer_mapped(bh);
}
int ext4_page_mkwrite(struct vm_fault *vmf)
vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
struct page *page = vmf->page;
loff_t size;
unsigned long len;
int ret;
int err;
vm_fault_t ret;
struct file *file = vma->vm_file;
struct inode *inode = file_inode(file);
struct address_space *mapping = inode->i_mapping;
......@@ -6172,8 +6204,8 @@ int ext4_page_mkwrite(struct vm_fault *vmf)
down_read(&EXT4_I(inode)->i_mmap_sem);
ret = ext4_convert_inline_data(inode);
if (ret)
err = ext4_convert_inline_data(inode);
if (err)
goto out_ret;
/* Delalloc case is easy... */
......@@ -6181,9 +6213,9 @@ int ext4_page_mkwrite(struct vm_fault *vmf)
!ext4_should_journal_data(inode) &&
!ext4_nonda_switch(inode->i_sb)) {
do {
ret = block_page_mkwrite(vma, vmf,
err = block_page_mkwrite(vma, vmf,
ext4_da_get_block_prep);
} while (ret == -ENOSPC &&
} while (err == -ENOSPC &&
ext4_should_retry_alloc(inode->i_sb, &retries));
goto out_ret;
}
......@@ -6228,8 +6260,8 @@ int ext4_page_mkwrite(struct vm_fault *vmf)
ret = VM_FAULT_SIGBUS;
goto out;
}
ret = block_page_mkwrite(vma, vmf, get_block);
if (!ret && ext4_should_journal_data(inode)) {
err = block_page_mkwrite(vma, vmf, get_block);
if (!err && ext4_should_journal_data(inode)) {
if (ext4_walk_page_buffers(handle, page_buffers(page), 0,
PAGE_SIZE, NULL, do_journal_get_write_access)) {
unlock_page(page);
......@@ -6240,24 +6272,24 @@ int ext4_page_mkwrite(struct vm_fault *vmf)
ext4_set_inode_state(inode, EXT4_STATE_JDATA);
}
ext4_journal_stop(handle);
if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
if (err == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
goto retry_alloc;
out_ret:
ret = block_page_mkwrite_return(ret);
ret = block_page_mkwrite_return(err);
out:
up_read(&EXT4_I(inode)->i_mmap_sem);
sb_end_pagefault(inode->i_sb);
return ret;
}
int ext4_filemap_fault(struct vm_fault *vmf)
vm_fault_t ext4_filemap_fault(struct vm_fault *vmf)
{
struct inode *inode = file_inode(vmf->vma->vm_file);
int err;
vm_fault_t ret;
down_read(&EXT4_I(inode)->i_mmap_sem);
err = filemap_fault(vmf);
ret = filemap_fault(vmf);
up_read(&EXT4_I(inode)->i_mmap_sem);
return err;
return ret;
}
......@@ -67,7 +67,6 @@ static void swap_inode_data(struct inode *inode1, struct inode *inode2)
ei1 = EXT4_I(inode1);
ei2 = EXT4_I(inode2);
swap(inode1->i_flags, inode2->i_flags);
swap(inode1->i_version, inode2->i_version);
swap(inode1->i_blocks, inode2->i_blocks);
swap(inode1->i_bytes, inode2->i_bytes);
......@@ -85,6 +84,21 @@ static void swap_inode_data(struct inode *inode1, struct inode *inode2)
i_size_write(inode2, isize);
}
static void reset_inode_seed(struct inode *inode)
{
struct ext4_inode_info *ei = EXT4_I(inode);
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
__le32 inum = cpu_to_le32(inode->i_ino);
__le32 gen = cpu_to_le32(inode->i_generation);
__u32 csum;
if (!ext4_has_metadata_csum(inode->i_sb))
return;
csum = ext4_chksum(sbi, sbi->s_csum_seed, (__u8 *)&inum, sizeof(inum));
ei->i_csum_seed = ext4_chksum(sbi, csum, (__u8 *)&gen, sizeof(gen));
}
/**
* Swap the information from the given @inode and the inode
* EXT4_BOOT_LOADER_INO. It will basically swap i_data and all other
......@@ -102,10 +116,13 @@ static long swap_inode_boot_loader(struct super_block *sb,
struct inode *inode_bl;
struct ext4_inode_info *ei_bl;
if (inode->i_nlink != 1 || !S_ISREG(inode->i_mode))
if (inode->i_nlink != 1 || !S_ISREG(inode->i_mode) ||
IS_SWAPFILE(inode) || IS_ENCRYPTED(inode) ||
ext4_has_inline_data(inode))
return -EINVAL;
if (!inode_owner_or_capable(inode) || !capable(CAP_SYS_ADMIN))
if (IS_RDONLY(inode) || IS_APPEND(inode) || IS_IMMUTABLE(inode) ||
!inode_owner_or_capable(inode) || !capable(CAP_SYS_ADMIN))
return -EPERM;
inode_bl = ext4_iget(sb, EXT4_BOOT_LOADER_INO);
......@@ -120,13 +137,13 @@ static long swap_inode_boot_loader(struct super_block *sb,
* that only 1 swap_inode_boot_loader is running. */
lock_two_nondirectories(inode, inode_bl);
truncate_inode_pages(&inode->i_data, 0);
truncate_inode_pages(&inode_bl->i_data, 0);
/* Wait for all existing dio workers */
inode_dio_wait(inode);
inode_dio_wait(inode_bl);
truncate_inode_pages(&inode->i_data, 0);
truncate_inode_pages(&inode_bl->i_data, 0);
handle = ext4_journal_start(inode_bl, EXT4_HT_MOVE_EXTENTS, 2);
if (IS_ERR(handle)) {
err = -EINVAL;
......@@ -159,6 +176,8 @@ static long swap_inode_boot_loader(struct super_block *sb,
inode->i_generation = prandom_u32();
inode_bl->i_generation = prandom_u32();
reset_inode_seed(inode);
reset_inode_seed(inode_bl);
ext4_discard_preallocations(inode);
......@@ -169,6 +188,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
inode->i_ino, err);
/* Revert all changes: */
swap_inode_data(inode, inode_bl);
ext4_mark_inode_dirty(handle, inode);
} else {
err = ext4_mark_inode_dirty(handle, inode_bl);
if (err < 0) {
......@@ -178,6 +198,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
/* Revert all changes: */
swap_inode_data(inode, inode_bl);
ext4_mark_inode_dirty(handle, inode);
ext4_mark_inode_dirty(handle, inode_bl);
}
}
ext4_journal_stop(handle);
......@@ -339,19 +360,14 @@ static int ext4_ioctl_setproject(struct file *filp, __u32 projid)
if (projid_eq(kprojid, EXT4_I(inode)->i_projid))
return 0;
err = mnt_want_write_file(filp);
if (err)
return err;
err = -EPERM;
inode_lock(inode);
/* Is it quota file? Do not allow user to mess with it */
if (ext4_is_quota_file(inode))
goto out_unlock;
return err;
err = ext4_get_inode_loc(inode, &iloc);
if (err)
goto out_unlock;
return err;
raw_inode = ext4_raw_inode(&iloc);
if (!EXT4_FITS_IN_INODE(raw_inode, ei, i_projid)) {
......@@ -359,20 +375,20 @@ static int ext4_ioctl_setproject(struct file *filp, __u32 projid)
EXT4_SB(sb)->s_want_extra_isize,
&iloc);
if (err)
goto out_unlock;
return err;
} else {
brelse(iloc.bh);
}
dquot_initialize(inode);
err = dquot_initialize(inode);
if (err)
return err;
handle = ext4_journal_start(inode, EXT4_HT_QUOTA,
EXT4_QUOTA_INIT_BLOCKS(sb) +
EXT4_QUOTA_DEL_BLOCKS(sb) + 3);
if (IS_ERR(handle)) {
err = PTR_ERR(handle);
goto out_unlock;
}
if (IS_ERR(handle))
return PTR_ERR(handle);
err = ext4_reserve_inode_write(handle, inode, &iloc);
if (err)
......@@ -400,9 +416,6 @@ static int ext4_ioctl_setproject(struct file *filp, __u32 projid)
err = rc;
out_stop:
ext4_journal_stop(handle);
out_unlock:
inode_unlock(inode);
mnt_drop_write_file(filp);
return err;
}
#else
......@@ -626,6 +639,30 @@ static long ext4_ioctl_group_add(struct file *file,
return err;
}
static int ext4_ioctl_check_project(struct inode *inode, struct fsxattr *fa)
{
/*
* Project Quota ID state is only allowed to change from within the init
* namespace. Enforce that restriction only if we are trying to change
* the quota ID state. Everything else is allowed in user namespaces.
*/
if (current_user_ns() == &init_user_ns)
return 0;
if (__kprojid_val(EXT4_I(inode)->i_projid) != fa->fsx_projid)
return -EINVAL;
if (ext4_test_inode_flag(inode, EXT4_INODE_PROJINHERIT)) {
if (!(fa->fsx_xflags & FS_XFLAG_PROJINHERIT))
return -EINVAL;
} else {
if (fa->fsx_xflags & FS_XFLAG_PROJINHERIT)
return -EINVAL;
}
return 0;
}
long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
struct inode *inode = file_inode(filp);
......@@ -1025,19 +1062,19 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return err;
inode_lock(inode);
err = ext4_ioctl_check_project(inode, &fa);
if (err)
goto out;
flags = (ei->i_flags & ~EXT4_FL_XFLAG_VISIBLE) |
(flags & EXT4_FL_XFLAG_VISIBLE);
err = ext4_ioctl_setflags(inode, flags);
inode_unlock(inode);
mnt_drop_write_file(filp);
if (err)
return err;
goto out;
err = ext4_ioctl_setproject(filp, fa.fsx_projid);
if (err)
return err;
return 0;
out:
inode_unlock(inode);
mnt_drop_write_file(filp);
return err;
}
case EXT4_IOC_SHUTDOWN:
return ext4_shutdown(sb, arg);
......
......@@ -4915,9 +4915,17 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,
&sbi->s_flex_groups[flex_group].free_clusters);
}
if (!(flags & EXT4_FREE_BLOCKS_NO_QUOT_UPDATE))
dquot_free_block(inode, EXT4_C2B(sbi, count_clusters));
percpu_counter_add(&sbi->s_freeclusters_counter, count_clusters);
/*
* on a bigalloc file system, defer the s_freeclusters_counter
* update to the caller (ext4_remove_space and friends) so they
* can determine if a cluster freed here should be rereserved
*/
if (!(flags & EXT4_FREE_BLOCKS_RERESERVE_CLUSTER)) {
if (!(flags & EXT4_FREE_BLOCKS_NO_QUOT_UPDATE))
dquot_free_block(inode, EXT4_C2B(sbi, count_clusters));
percpu_counter_add(&sbi->s_freeclusters_counter,
count_clusters);
}
ext4_mb_unload_buddy(&e4b);
......
......@@ -516,9 +516,13 @@ mext_check_arguments(struct inode *orig_inode,
orig_inode->i_ino, donor_inode->i_ino);
return -EINVAL;
}
if (orig_eof < orig_start + *len - 1)
if (orig_eof <= orig_start)
*len = 0;
else if (orig_eof < orig_start + *len - 1)
*len = orig_eof - orig_start;
if (donor_eof < donor_start + *len - 1)
if (donor_eof <= donor_start)
*len = 0;
else if (donor_eof < donor_start + *len - 1)
*len = donor_eof - donor_start;
if (!*len) {
ext4_debug("ext4 move extent: len should not be 0 "
......
......@@ -2261,7 +2261,7 @@ static int ext4_dx_add_entry(handle_t *handle, struct ext4_filename *fname,
dxroot->info.indirect_levels += 1;
dxtrace(printk(KERN_DEBUG
"Creating %d level index...\n",
info->indirect_levels));
dxroot->info.indirect_levels));
err = ext4_handle_dirty_dx_node(handle, dir, frame->bh);
if (err)
goto journal_error;
......
......@@ -914,6 +914,18 @@ static inline void ext4_quota_off_umount(struct super_block *sb)
for (type = 0; type < EXT4_MAXQUOTAS; type++)
ext4_quota_off(sb, type);
}
/*
* This is a helper function which is used in the mount/remount
* codepaths (which holds s_umount) to fetch the quota file name.
*/
static inline char *get_qf_name(struct super_block *sb,
struct ext4_sb_info *sbi,
int type)
{
return rcu_dereference_protected(sbi->s_qf_names[type],
lockdep_is_held(&sb->s_umount));
}
#else
static inline void ext4_quota_off_umount(struct super_block *sb)
{
......@@ -965,7 +977,7 @@ static void ext4_put_super(struct super_block *sb)
percpu_free_rwsem(&sbi->s_journal_flag_rwsem);
#ifdef CONFIG_QUOTA
for (i = 0; i < EXT4_MAXQUOTAS; i++)
kfree(sbi->s_qf_names[i]);
kfree(get_qf_name(sb, sbi, i));
#endif
/* Debugging code just in case the in-memory inode orphan list
......@@ -1040,6 +1052,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
ei->i_da_metadata_calc_len = 0;
ei->i_da_metadata_calc_last_lblock = 0;
spin_lock_init(&(ei->i_block_reservation_lock));
ext4_init_pending_tree(&ei->i_pending_tree);
#ifdef CONFIG_QUOTA
ei->i_reserved_quota = 0;
memset(&ei->i_dquot, 0, sizeof(ei->i_dquot));
......@@ -1530,11 +1543,10 @@ static const char deprecated_msg[] =
static int set_qf_name(struct super_block *sb, int qtype, substring_t *args)
{
struct ext4_sb_info *sbi = EXT4_SB(sb);
char *qname;
char *qname, *old_qname = get_qf_name(sb, sbi, qtype);
int ret = -1;
if (sb_any_quota_loaded(sb) &&
!sbi->s_qf_names[qtype]) {
if (sb_any_quota_loaded(sb) && !old_qname) {
ext4_msg(sb, KERN_ERR,
"Cannot change journaled "
"quota options when quota turned on");
......@@ -1551,8 +1563,8 @@ static int set_qf_name(struct super_block *sb, int qtype, substring_t *args)
"Not enough memory for storing quotafile name");
return -1;
}
if (sbi->s_qf_names[qtype]) {
if (strcmp(sbi->s_qf_names[qtype], qname) == 0)
if (old_qname) {
if (strcmp(old_qname, qname) == 0)
ret = 1;
else
ext4_msg(sb, KERN_ERR,
......@@ -1565,7 +1577,7 @@ static int set_qf_name(struct super_block *sb, int qtype, substring_t *args)
"quotafile must be on filesystem root");
goto errout;
}
sbi->s_qf_names[qtype] = qname;
rcu_assign_pointer(sbi->s_qf_names[qtype], qname);
set_opt(sb, QUOTA);
return 1;
errout:
......@@ -1577,15 +1589,16 @@ static int clear_qf_name(struct super_block *sb, int qtype)
{
struct ext4_sb_info *sbi = EXT4_SB(sb);
char *old_qname = get_qf_name(sb, sbi, qtype);
if (sb_any_quota_loaded(sb) &&
sbi->s_qf_names[qtype]) {
if (sb_any_quota_loaded(sb) && old_qname) {
ext4_msg(sb, KERN_ERR, "Cannot change journaled quota options"
" when quota turned on");
return -1;
}
kfree(sbi->s_qf_names[qtype]);
sbi->s_qf_names[qtype] = NULL;
rcu_assign_pointer(sbi->s_qf_names[qtype], NULL);
synchronize_rcu();
kfree(old_qname);
return 1;
}
#endif
......@@ -1960,7 +1973,7 @@ static int parse_options(char *options, struct super_block *sb,
int is_remount)
{
struct ext4_sb_info *sbi = EXT4_SB(sb);
char *p;
char *p, __maybe_unused *usr_qf_name, __maybe_unused *grp_qf_name;
substring_t args[MAX_OPT_ARGS];
int token;
......@@ -1991,11 +2004,13 @@ static int parse_options(char *options, struct super_block *sb,
"Cannot enable project quota enforcement.");
return 0;
}
if (sbi->s_qf_names[USRQUOTA] || sbi->s_qf_names[GRPQUOTA]) {
if (test_opt(sb, USRQUOTA) && sbi->s_qf_names[USRQUOTA])
usr_qf_name = get_qf_name(sb, sbi, USRQUOTA);
grp_qf_name = get_qf_name(sb, sbi, GRPQUOTA);
if (usr_qf_name || grp_qf_name) {
if (test_opt(sb, USRQUOTA) && usr_qf_name)
clear_opt(sb, USRQUOTA);
if (test_opt(sb, GRPQUOTA) && sbi->s_qf_names[GRPQUOTA])
if (test_opt(sb, GRPQUOTA) && grp_qf_name)
clear_opt(sb, GRPQUOTA);
if (test_opt(sb, GRPQUOTA) || test_opt(sb, USRQUOTA)) {
......@@ -2029,6 +2044,7 @@ static inline void ext4_show_quota_options(struct seq_file *seq,
{
#if defined(CONFIG_QUOTA)
struct ext4_sb_info *sbi = EXT4_SB(sb);
char *usr_qf_name, *grp_qf_name;
if (sbi->s_jquota_fmt) {
char *fmtname = "";
......@@ -2047,11 +2063,14 @@ static inline void ext4_show_quota_options(struct seq_file *seq,
seq_printf(seq, ",jqfmt=%s", fmtname);
}
if (sbi->s_qf_names[USRQUOTA])
seq_show_option(seq, "usrjquota", sbi->s_qf_names[USRQUOTA]);
if (sbi->s_qf_names[GRPQUOTA])
seq_show_option(seq, "grpjquota", sbi->s_qf_names[GRPQUOTA]);
rcu_read_lock();
usr_qf_name = rcu_dereference(sbi->s_qf_names[USRQUOTA]);
grp_qf_name = rcu_dereference(sbi->s_qf_names[GRPQUOTA]);
if (usr_qf_name)
seq_show_option(seq, "usrjquota", usr_qf_name);
if (grp_qf_name)
seq_show_option(seq, "grpjquota", grp_qf_name);
rcu_read_unlock();
#endif
}
......@@ -5103,6 +5122,7 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
int err = 0;
#ifdef CONFIG_QUOTA
int i, j;
char *to_free[EXT4_MAXQUOTAS];
#endif
char *orig_data = kstrdup(data, GFP_KERNEL);
......@@ -5122,8 +5142,9 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
old_opts.s_jquota_fmt = sbi->s_jquota_fmt;
for (i = 0; i < EXT4_MAXQUOTAS; i++)
if (sbi->s_qf_names[i]) {
old_opts.s_qf_names[i] = kstrdup(sbi->s_qf_names[i],
GFP_KERNEL);
char *qf_name = get_qf_name(sb, sbi, i);
old_opts.s_qf_names[i] = kstrdup(qf_name, GFP_KERNEL);
if (!old_opts.s_qf_names[i]) {
for (j = 0; j < i; j++)
kfree(old_opts.s_qf_names[j]);
......@@ -5352,9 +5373,12 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
#ifdef CONFIG_QUOTA
sbi->s_jquota_fmt = old_opts.s_jquota_fmt;
for (i = 0; i < EXT4_MAXQUOTAS; i++) {
kfree(sbi->s_qf_names[i]);
sbi->s_qf_names[i] = old_opts.s_qf_names[i];
to_free[i] = get_qf_name(sb, sbi, i);
rcu_assign_pointer(sbi->s_qf_names[i], old_opts.s_qf_names[i]);
}
synchronize_rcu();
for (i = 0; i < EXT4_MAXQUOTAS; i++)
kfree(to_free[i]);
#endif
kfree(orig_data);
return err;
......@@ -5545,7 +5569,7 @@ static int ext4_write_info(struct super_block *sb, int type)
*/
static int ext4_quota_on_mount(struct super_block *sb, int type)
{
return dquot_quota_on_mount(sb, EXT4_SB(sb)->s_qf_names[type],
return dquot_quota_on_mount(sb, get_qf_name(sb, EXT4_SB(sb), type),
EXT4_SB(sb)->s_jquota_fmt, type);
}
......@@ -5954,6 +5978,10 @@ static int __init ext4_init_fs(void)
if (err)
return err;
err = ext4_init_pending();
if (err)
goto out6;
err = ext4_init_pageio();
if (err)
goto out5;
......@@ -5992,6 +6020,8 @@ static int __init ext4_init_fs(void)
out4:
ext4_exit_pageio();
out5:
ext4_exit_pending();
out6:
ext4_exit_es();
return err;
......@@ -6009,6 +6039,7 @@ static void __exit ext4_exit_fs(void)
ext4_exit_system_zone();
ext4_exit_pageio();
ext4_exit_es();
ext4_exit_pending();
}
MODULE_AUTHOR("Remy Card, Stephen Tweedie, Andrew Morton, Andreas Dilger, Theodore Ts'o and others");
......
......@@ -251,8 +251,8 @@ int jbd2_log_do_checkpoint(journal_t *journal)
bh = jh2bh(jh);
if (buffer_locked(bh)) {
spin_unlock(&journal->j_list_lock);
get_bh(bh);
spin_unlock(&journal->j_list_lock);
wait_on_buffer(bh);
/* the journal_head may have gone by now */
BUFFER_TRACE(bh, "brelse");
......@@ -333,8 +333,8 @@ int jbd2_log_do_checkpoint(journal_t *journal)
jh = transaction->t_checkpoint_io_list;
bh = jh2bh(jh);
if (buffer_locked(bh)) {
spin_unlock(&journal->j_list_lock);
get_bh(bh);
spin_unlock(&journal->j_list_lock);
wait_on_buffer(bh);
/* the journal_head may have gone by now */
BUFFER_TRACE(bh, "brelse");
......
......@@ -242,7 +242,7 @@ int block_commit_write(struct page *page, unsigned from, unsigned to);
int block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
get_block_t get_block);
/* Convert errno to return value from ->page_mkwrite() call */
static inline int block_page_mkwrite_return(int err)
static inline vm_fault_t block_page_mkwrite_return(int err)
{
if (err == 0)
return VM_FAULT_LOCKED;
......
......@@ -17,6 +17,7 @@ struct mpage_da_data;
struct ext4_map_blocks;
struct extent_status;
struct ext4_fsmap;
struct partial_cluster;
#define EXT4_I(inode) (container_of(inode, struct ext4_inode_info, vfs_inode))
......@@ -2035,21 +2036,23 @@ TRACE_EVENT(ext4_ext_show_extent,
);
TRACE_EVENT(ext4_remove_blocks,
TP_PROTO(struct inode *inode, struct ext4_extent *ex,
ext4_lblk_t from, ext4_fsblk_t to,
long long partial_cluster),
TP_PROTO(struct inode *inode, struct ext4_extent *ex,
ext4_lblk_t from, ext4_fsblk_t to,
struct partial_cluster *pc),
TP_ARGS(inode, ex, from, to, partial_cluster),
TP_ARGS(inode, ex, from, to, pc),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( ext4_lblk_t, from )
__field( ext4_lblk_t, to )
__field( long long, partial )
__field( ext4_fsblk_t, ee_pblk )
__field( ext4_lblk_t, ee_lblk )
__field( unsigned short, ee_len )
__field( ext4_fsblk_t, pc_pclu )
__field( ext4_lblk_t, pc_lblk )
__field( int, pc_state)
),
TP_fast_assign(
......@@ -2057,14 +2060,16 @@ TRACE_EVENT(ext4_remove_blocks,
__entry->ino = inode->i_ino;
__entry->from = from;
__entry->to = to;
__entry->partial = partial_cluster;
__entry->ee_pblk = ext4_ext_pblock(ex);
__entry->ee_lblk = le32_to_cpu(ex->ee_block);
__entry->ee_len = ext4_ext_get_actual_len(ex);
__entry->pc_pclu = pc->pclu;
__entry->pc_lblk = pc->lblk;
__entry->pc_state = pc->state;
),
TP_printk("dev %d,%d ino %lu extent [%u(%llu), %u]"
"from %u to %u partial_cluster %lld",
"from %u to %u partial [pclu %lld lblk %u state %d]",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->ee_lblk,
......@@ -2072,45 +2077,53 @@ TRACE_EVENT(ext4_remove_blocks,
(unsigned short) __entry->ee_len,
(unsigned) __entry->from,
(unsigned) __entry->to,
(long long) __entry->partial)
(long long) __entry->pc_pclu,
(unsigned int) __entry->pc_lblk,
(int) __entry->pc_state)
);
TRACE_EVENT(ext4_ext_rm_leaf,
TP_PROTO(struct inode *inode, ext4_lblk_t start,
struct ext4_extent *ex,
long long partial_cluster),
struct partial_cluster *pc),
TP_ARGS(inode, start, ex, partial_cluster),
TP_ARGS(inode, start, ex, pc),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( long long, partial )
__field( ext4_lblk_t, start )
__field( ext4_lblk_t, ee_lblk )
__field( ext4_fsblk_t, ee_pblk )
__field( short, ee_len )
__field( ext4_fsblk_t, pc_pclu )
__field( ext4_lblk_t, pc_lblk )
__field( int, pc_state)
),
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
__entry->partial = partial_cluster;
__entry->start = start;
__entry->ee_lblk = le32_to_cpu(ex->ee_block);
__entry->ee_pblk = ext4_ext_pblock(ex);
__entry->ee_len = ext4_ext_get_actual_len(ex);
__entry->pc_pclu = pc->pclu;
__entry->pc_lblk = pc->lblk;
__entry->pc_state = pc->state;
),
TP_printk("dev %d,%d ino %lu start_lblk %u last_extent [%u(%llu), %u]"
"partial_cluster %lld",
"partial [pclu %lld lblk %u state %d]",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->start,
(unsigned) __entry->ee_lblk,
(unsigned long long) __entry->ee_pblk,
(unsigned short) __entry->ee_len,
(long long) __entry->partial)
(long long) __entry->pc_pclu,
(unsigned int) __entry->pc_lblk,
(int) __entry->pc_state)
);
TRACE_EVENT(ext4_ext_rm_idx,
......@@ -2168,9 +2181,9 @@ TRACE_EVENT(ext4_ext_remove_space,
TRACE_EVENT(ext4_ext_remove_space_done,
TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t end,
int depth, long long partial, __le16 eh_entries),
int depth, struct partial_cluster *pc, __le16 eh_entries),
TP_ARGS(inode, start, end, depth, partial, eh_entries),
TP_ARGS(inode, start, end, depth, pc, eh_entries),
TP_STRUCT__entry(
__field( dev_t, dev )
......@@ -2178,7 +2191,9 @@ TRACE_EVENT(ext4_ext_remove_space_done,
__field( ext4_lblk_t, start )
__field( ext4_lblk_t, end )
__field( int, depth )
__field( long long, partial )
__field( ext4_fsblk_t, pc_pclu )
__field( ext4_lblk_t, pc_lblk )
__field( int, pc_state )
__field( unsigned short, eh_entries )
),
......@@ -2188,18 +2203,23 @@ TRACE_EVENT(ext4_ext_remove_space_done,
__entry->start = start;
__entry->end = end;
__entry->depth = depth;
__entry->partial = partial;
__entry->pc_pclu = pc->pclu;
__entry->pc_lblk = pc->lblk;
__entry->pc_state = pc->state;
__entry->eh_entries = le16_to_cpu(eh_entries);
),
TP_printk("dev %d,%d ino %lu since %u end %u depth %d partial %lld "
TP_printk("dev %d,%d ino %lu since %u end %u depth %d "
"partial [pclu %lld lblk %u state %d] "
"remaining_entries %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
(unsigned) __entry->start,
(unsigned) __entry->end,
__entry->depth,
(long long) __entry->partial,
(long long) __entry->pc_pclu,
(unsigned int) __entry->pc_lblk,
(int) __entry->pc_state,
(unsigned short) __entry->eh_entries)
);
......@@ -2270,7 +2290,7 @@ TRACE_EVENT(ext4_es_remove_extent,
__entry->lblk, __entry->len)
);
TRACE_EVENT(ext4_es_find_delayed_extent_range_enter,
TRACE_EVENT(ext4_es_find_extent_range_enter,
TP_PROTO(struct inode *inode, ext4_lblk_t lblk),
TP_ARGS(inode, lblk),
......@@ -2292,7 +2312,7 @@ TRACE_EVENT(ext4_es_find_delayed_extent_range_enter,
(unsigned long) __entry->ino, __entry->lblk)
);
TRACE_EVENT(ext4_es_find_delayed_extent_range_exit,
TRACE_EVENT(ext4_es_find_extent_range_exit,
TP_PROTO(struct inode *inode, struct extent_status *es),
TP_ARGS(inode, es),
......@@ -2512,6 +2532,41 @@ TRACE_EVENT(ext4_es_shrink,
__entry->scan_time, __entry->nr_skipped, __entry->retried)
);
TRACE_EVENT(ext4_es_insert_delayed_block,
TP_PROTO(struct inode *inode, struct extent_status *es,
bool allocated),
TP_ARGS(inode, es, allocated),
TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( ext4_lblk_t, lblk )
__field( ext4_lblk_t, len )
__field( ext4_fsblk_t, pblk )
__field( char, status )
__field( bool, allocated )
),
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
__entry->lblk = es->es_lblk;
__entry->len = es->es_len;
__entry->pblk = ext4_es_pblock(es);
__entry->status = ext4_es_status(es);
__entry->allocated = allocated;
),
TP_printk("dev %d,%d ino %lu es [%u/%u) mapped %llu status %s "
"allocated %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
__entry->lblk, __entry->len,
__entry->pblk, show_extent_status(__entry->status),
__entry->allocated)
);
/* fsmap traces */
DECLARE_EVENT_CLASS(ext4_fsmap_class,
TP_PROTO(struct super_block *sb, u32 keydev, u32 agno, u64 bno, u64 len,
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment