1. 23 Dec, 2009 12 commits
    • Dmitry Monakhov's avatar
      ext4: fix sleep inside spinlock issue with quota and dealloc (#14739) · 39bc680a
      Dmitry Monakhov authored
      Unlock i_block_reservation_lock before vfs_dq_reserve_block().
      This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=14739
      
      CC: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      39bc680a
    • Dmitry Monakhov's avatar
      ext4: Fix potential quota deadlock · d21cd8f1
      Dmitry Monakhov authored
      We have to delay vfs_dq_claim_space() until allocation context destruction.
      Currently we have following call-trace:
      ext4_mb_new_blocks()
        /* task is already holding ac->alloc_semp */
       ->ext4_mb_mark_diskspace_used
          ->vfs_dq_claim_space()  /*  acquire dqptr_sem here. Possible deadlock */
       ->ext4_mb_release_context() /* drop ac->alloc_semp here */
      
      Let's move quota claiming to ext4_da_update_reserve_space()
      
       =======================================================
       [ INFO: possible circular locking dependency detected ]
       2.6.32-rc7 #18
       -------------------------------------------------------
       write-truncate-/3465 is trying to acquire lock:
        (&s->s_dquot.dqptr_sem){++++..}, at: [<c025e73b>] dquot_claim_space+0x3b/0x1b0
      
       but task is already holding lock:
        (&meta_group_info[i]->alloc_sem){++++..}, at: [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #3 (&meta_group_info[i]->alloc_sem){++++..}:
              [<c017d04b>] __lock_acquire+0xd7b/0x1260
              [<c017d5ea>] lock_acquire+0xba/0xd0
              [<c0527191>] down_read+0x51/0x90
              [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370
              [<c02d0c1c>] ext4_mb_free_blocks+0x46c/0x870
              [<c029c9d3>] ext4_free_blocks+0x73/0x130
              [<c02c8cfc>] ext4_ext_truncate+0x76c/0x8d0
              [<c02a8087>] ext4_truncate+0x187/0x5e0
              [<c01e0f7b>] vmtruncate+0x6b/0x70
              [<c022ec02>] inode_setattr+0x62/0x190
              [<c02a2d7a>] ext4_setattr+0x25a/0x370
              [<c022ee81>] notify_change+0x151/0x340
              [<c021349d>] do_truncate+0x6d/0xa0
              [<c0221034>] may_open+0x1d4/0x200
              [<c022412b>] do_filp_open+0x1eb/0x910
              [<c021244d>] do_sys_open+0x6d/0x140
              [<c021258e>] sys_open+0x2e/0x40
              [<c0103100>] sysenter_do_call+0x12/0x32
      
       -> #2 (&ei->i_data_sem){++++..}:
              [<c017d04b>] __lock_acquire+0xd7b/0x1260
              [<c017d5ea>] lock_acquire+0xba/0xd0
              [<c0527191>] down_read+0x51/0x90
              [<c02a5787>] ext4_get_blocks+0x47/0x450
              [<c02a74c1>] ext4_getblk+0x61/0x1d0
              [<c02a7a7f>] ext4_bread+0x1f/0xa0
              [<c02bcddc>] ext4_quota_write+0x12c/0x310
              [<c0262d23>] qtree_write_dquot+0x93/0x120
              [<c0261708>] v2_write_dquot+0x28/0x30
              [<c025d3fb>] dquot_commit+0xab/0xf0
              [<c02be977>] ext4_write_dquot+0x77/0x90
              [<c02be9bf>] ext4_mark_dquot_dirty+0x2f/0x50
              [<c025e321>] dquot_alloc_inode+0x101/0x180
              [<c029fec2>] ext4_new_inode+0x602/0xf00
              [<c02ad789>] ext4_create+0x89/0x150
              [<c0221ff2>] vfs_create+0xa2/0xc0
              [<c02246e7>] do_filp_open+0x7a7/0x910
              [<c021244d>] do_sys_open+0x6d/0x140
              [<c021258e>] sys_open+0x2e/0x40
              [<c0103100>] sysenter_do_call+0x12/0x32
      
       -> #1 (&sb->s_type->i_mutex_key#7/4){+.+...}:
              [<c017d04b>] __lock_acquire+0xd7b/0x1260
              [<c017d5ea>] lock_acquire+0xba/0xd0
              [<c0526505>] mutex_lock_nested+0x65/0x2d0
              [<c0260c9d>] vfs_load_quota_inode+0x4bd/0x5a0
              [<c02610af>] vfs_quota_on_path+0x5f/0x70
              [<c02bc812>] ext4_quota_on+0x112/0x190
              [<c026345a>] sys_quotactl+0x44a/0x8a0
              [<c0103100>] sysenter_do_call+0x12/0x32
      
       -> #0 (&s->s_dquot.dqptr_sem){++++..}:
              [<c017d361>] __lock_acquire+0x1091/0x1260
              [<c017d5ea>] lock_acquire+0xba/0xd0
              [<c0527191>] down_read+0x51/0x90
              [<c025e73b>] dquot_claim_space+0x3b/0x1b0
              [<c02cb95f>] ext4_mb_mark_diskspace_used+0x36f/0x380
              [<c02d210a>] ext4_mb_new_blocks+0x34a/0x530
              [<c02c83fb>] ext4_ext_get_blocks+0x122b/0x13c0
              [<c02a5966>] ext4_get_blocks+0x226/0x450
              [<c02a5ff3>] mpage_da_map_blocks+0xc3/0xaa0
              [<c02a6ed6>] ext4_da_writepages+0x506/0x790
              [<c01de272>] do_writepages+0x22/0x50
              [<c01d766d>] __filemap_fdatawrite_range+0x6d/0x80
              [<c01d7b9b>] filemap_flush+0x2b/0x30
              [<c02a40ac>] ext4_alloc_da_blocks+0x5c/0x60
              [<c029e595>] ext4_release_file+0x75/0xb0
              [<c0216b59>] __fput+0xf9/0x210
              [<c0216c97>] fput+0x27/0x30
              [<c02122dc>] filp_close+0x4c/0x80
              [<c014510e>] put_files_struct+0x6e/0xd0
              [<c01451b7>] exit_files+0x47/0x60
              [<c0146a24>] do_exit+0x144/0x710
              [<c0147028>] do_group_exit+0x38/0xa0
              [<c0159abc>] get_signal_to_deliver+0x2ac/0x410
              [<c0102849>] do_notify_resume+0xb9/0x890
              [<c01032d2>] work_notifysig+0x13/0x21
      
       other info that might help us debug this:
      
       3 locks held by write-truncate-/3465:
        #0:  (jbd2_handle){+.+...}, at: [<c02e1f8f>] start_this_handle+0x38f/0x5c0
        #1:  (&ei->i_data_sem){++++..}, at: [<c02a57f6>] ext4_get_blocks+0xb6/0x450
        #2:  (&meta_group_info[i]->alloc_sem){++++..}, at: [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370
      
       stack backtrace:
       Pid: 3465, comm: write-truncate- Not tainted 2.6.32-rc7 #18
       Call Trace:
        [<c0524cb3>] ? printk+0x1d/0x22
        [<c017ac9a>] print_circular_bug+0xca/0xd0
        [<c017d361>] __lock_acquire+0x1091/0x1260
        [<c016bca2>] ? sched_clock_local+0xd2/0x170
        [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0
        [<c017d5ea>] lock_acquire+0xba/0xd0
        [<c025e73b>] ? dquot_claim_space+0x3b/0x1b0
        [<c0527191>] down_read+0x51/0x90
        [<c025e73b>] ? dquot_claim_space+0x3b/0x1b0
        [<c025e73b>] dquot_claim_space+0x3b/0x1b0
        [<c02cb95f>] ext4_mb_mark_diskspace_used+0x36f/0x380
        [<c02d210a>] ext4_mb_new_blocks+0x34a/0x530
        [<c02c601d>] ? ext4_ext_find_extent+0x25d/0x280
        [<c02c83fb>] ext4_ext_get_blocks+0x122b/0x13c0
        [<c016bca2>] ? sched_clock_local+0xd2/0x170
        [<c016be60>] ? sched_clock_cpu+0x120/0x160
        [<c016beef>] ? cpu_clock+0x4f/0x60
        [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0
        [<c052712c>] ? down_write+0x8c/0xa0
        [<c02a5966>] ext4_get_blocks+0x226/0x450
        [<c016be60>] ? sched_clock_cpu+0x120/0x160
        [<c016beef>] ? cpu_clock+0x4f/0x60
        [<c017908b>] ? trace_hardirqs_off+0xb/0x10
        [<c02a5ff3>] mpage_da_map_blocks+0xc3/0xaa0
        [<c01d69cc>] ? find_get_pages_tag+0x16c/0x180
        [<c01d6860>] ? find_get_pages_tag+0x0/0x180
        [<c02a73bd>] ? __mpage_da_writepage+0x16d/0x1a0
        [<c01dfc4e>] ? pagevec_lookup_tag+0x2e/0x40
        [<c01ddf1b>] ? write_cache_pages+0xdb/0x3d0
        [<c02a7250>] ? __mpage_da_writepage+0x0/0x1a0
        [<c02a6ed6>] ext4_da_writepages+0x506/0x790
        [<c016beef>] ? cpu_clock+0x4f/0x60
        [<c016bca2>] ? sched_clock_local+0xd2/0x170
        [<c016be60>] ? sched_clock_cpu+0x120/0x160
        [<c016be60>] ? sched_clock_cpu+0x120/0x160
        [<c02a69d0>] ? ext4_da_writepages+0x0/0x790
        [<c01de272>] do_writepages+0x22/0x50
        [<c01d766d>] __filemap_fdatawrite_range+0x6d/0x80
        [<c01d7b9b>] filemap_flush+0x2b/0x30
        [<c02a40ac>] ext4_alloc_da_blocks+0x5c/0x60
        [<c029e595>] ext4_release_file+0x75/0xb0
        [<c0216b59>] __fput+0xf9/0x210
        [<c0216c97>] fput+0x27/0x30
        [<c02122dc>] filp_close+0x4c/0x80
        [<c014510e>] put_files_struct+0x6e/0xd0
        [<c01451b7>] exit_files+0x47/0x60
        [<c0146a24>] do_exit+0x144/0x710
        [<c017b163>] ? lock_release_holdtime+0x33/0x210
        [<c0528137>] ? _spin_unlock_irq+0x27/0x30
        [<c0147028>] do_group_exit+0x38/0xa0
        [<c017babb>] ? trace_hardirqs_on+0xb/0x10
        [<c0159abc>] get_signal_to_deliver+0x2ac/0x410
        [<c0102849>] do_notify_resume+0xb9/0x890
        [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0
        [<c017b163>] ? lock_release_holdtime+0x33/0x210
        [<c0165b50>] ? autoremove_wake_function+0x0/0x50
        [<c017ba54>] ? trace_hardirqs_on_caller+0x134/0x190
        [<c017babb>] ? trace_hardirqs_on+0xb/0x10
        [<c0300ba4>] ? security_file_permission+0x14/0x20
        [<c0215761>] ? vfs_write+0x131/0x190
        [<c0214f50>] ? do_sync_write+0x0/0x120
        [<c0103115>] ? sysenter_do_call+0x27/0x32
        [<c01032d2>] work_notifysig+0x13/0x21
      
      CC: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      d21cd8f1
    • Jan Kara's avatar
      quota: Fix 64-bit limits setting on 32-bit archs · 82fdfa92
      Jan Kara authored
      Fix warnings:
      fs/quota/quota_v2.c: In function ‘v2_read_file_info’:
      fs/quota/quota_v2.c:123: warning: integer constant is too large for ‘long’ type
      fs/quota/quota_v2.c:124: warning: integer constant is too large for ‘long’ type
      Reported-by: default avatarJerry Leo <jerryleo860202@gmail.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      82fdfa92
    • Eric Sandeen's avatar
      ext3: Replace lock/unlock_super() with an explicit lock for resizing · 96d2a495
      Eric Sandeen authored
      Use a separate lock to protect s_groups_count and the other block
      group descriptors which get changed via an on-line resize operation,
      so we can stop overloading the use of lock_super().
      
      Port of ext4 commit 32ed5058 by
      Theodore Ts'o <tytso@mit.edu>.
      
      CC: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      96d2a495
    • Eric Sandeen's avatar
      ext3: Replace lock/unlock_super() with an explicit lock for the orphan list · b8a052d0
      Eric Sandeen authored
      Use a separate lock to protect the orphan list, so we can stop
      overloading the use of lock_super().
      
      Port of ext4 commit 3b9d4ed2
      by Theodore Ts'o <tytso@mit.edu>.
      
      CC: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      b8a052d0
    • Eric Sandeen's avatar
      ext3: ext3_mark_recovery_complete() doesn't need to use lock_super · 4854a5f0
      Eric Sandeen authored
      The function ext3_mark_recovery_complete() is called from two call
      paths: either (a) while mounting the filesystem, in which case there's
      no danger of any other CPU calling write_super() until the mount is
      completed, and (b) while remounting the filesystem read-write, in
      which case the fs core has already locked the superblock.  This also
      allows us to take out a very vile unlock_super()/lock_super() pair in
      ext3_remount().
      
      Port of ext4 commit a63c9eb2 by
      Theodore Ts'o <tytso@mit.edu>.
      
      CC: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      4854a5f0
    • Eric Sandeen's avatar
      ext3: Remove outdated comment about lock_super() · ed505ee4
      Eric Sandeen authored
      ext3_fill_super() is no longer called by read_super(), and it is no
      longer called with the superblock locked.  The
      unlock_super()/lock_super() is no longer present, so this comment is
      entirely superfluous.
      
      Port of ext4 commit 32ed5058 by
      Theodore Ts'o <tytso@mit.edu>.
      
      CC: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      ed505ee4
    • Dmitry Monakhov's avatar
      quota: Move duplicated code to separate functions · dc52dd3a
      Dmitry Monakhov authored
      - for(..) { mark_dquot_dirty(); } -> mark_all_dquot_dirty()
      - for(..) { dput(); }             -> dqput_all()
      Signed-off-by: default avatarDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      dc52dd3a
    • Dmitry Monakhov's avatar
      ext4: Convert to generic reserved quota's space management. · a9e7f447
      Dmitry Monakhov authored
      This patch also fixes write vs chown race condition.
      Acked-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      a9e7f447
    • Dmitry Monakhov's avatar
      quota: decouple fs reserved space from quota reservation · fd8fbfc1
      Dmitry Monakhov authored
      Currently inode_reservation is managed by fs itself and this
      reservation is transfered on dquot_transfer(). This means what
      inode_reservation must always be in sync with
      dquot->dq_dqb.dqb_rsvspace. Otherwise dquot_transfer() will result
      in incorrect quota(WARN_ON in dquot_claim_reserved_space() will be
      triggered)
      This is not easy because of complex locking order issues
      for example http://bugzilla.kernel.org/show_bug.cgi?id=14739
      
      The patch introduce quota reservation field for each fs-inode
      (fs specific inode is used in order to prevent bloating generic
      vfs inode). This reservation is managed by quota code internally
      similar to i_blocks/i_bytes and may not be always in sync with
      internal fs reservation.
      
      Also perform some code rearrangement:
      - Unify dquot_reserve_space() and dquot_reserve_space()
      - Unify dquot_release_reserved_space() and dquot_free_space()
      - Also this patch add missing warning update to release_rsv()
        dquot_release_reserved_space() must call flush_warnings() as
        dquot_free_space() does.
      Signed-off-by: default avatarDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      fd8fbfc1
    • Dmitry Monakhov's avatar
      Add unlocked version of inode_add_bytes() function · b462707e
      Dmitry Monakhov authored
      Quota code requires unlocked version of this function. Off course
      we can just copy-paste the code, but copy-pasting is always an evil.
      Signed-off-by: default avatarDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      b462707e
    • Dmitry Monakhov's avatar
      ext3: quota macros cleanup [V2] · c459001f
      Dmitry Monakhov authored
      Currently all quota block reservation macros contains hardcoded "2"
      aka MAXQUOTAS value. This is no good because in some places it is not
      obvious to understand what does this digit represent. Let's introduce
      new macro with self descriptive name.
      Signed-off-by: default avatarDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      c459001f
  2. 22 Dec, 2009 28 commits