1. 04 Apr, 2013 7 commits
    • Dmitry Monakhov's avatar
      ext4: unregister es_shrinker if mount failed · a75ae78f
      Dmitry Monakhov authored
      Otherwise destroyed ext_sb_info will be part of global shinker list
      and result in the following OOPS:
      
      JBD2: corrupted journal superblock
      JBD2: recovery failed
      EXT4-fs (dm-2): error loading journal
      general protection fault: 0000 [#1] SMP
      Modules linked in: fuse acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel microcode sg button sd_mod crc_t10dif ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_\
      mod
      CPU 1
      Pid: 2758, comm: mount Not tainted 3.8.0-rc3+ #136                  /DH55TC
      RIP: 0010:[<ffffffff811bfb2d>]  [<ffffffff811bfb2d>] unregister_shrinker+0xad/0xe0
      RSP: 0000:ffff88011d5cbcd8  EFLAGS: 00010207
      RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b53 RCX: 0000000000000006
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000246
      RBP: ffff88011d5cbce8 R08: 0000000000000002 R09: 0000000000000001
      R10: 0000000000000001 R11: 0000000000000000 R12: ffff88011cd3f848
      R13: ffff88011cd3f830 R14: ffff88011cd3f000 R15: 0000000000000000
      FS:  00007f7b721dd7e0(0000) GS:ffff880121a00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 00007fffa6f75038 CR3: 000000011bc1c000 CR4: 00000000000007e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process mount (pid: 2758, threadinfo ffff88011d5ca000, task ffff880116aacb80)
      Stack:
      ffff88011cd3f000 ffffffff8209b6c0 ffff88011d5cbd18 ffffffff812482f1
      00000000000003f3 00000000ffffffea ffff880115f4c200 0000000000000000
      ffff88011d5cbda8 ffffffff81249381 ffff8801219d8bf8 ffffffff00000000
      Call Trace:
      [<ffffffff812482f1>] deactivate_locked_super+0x91/0xb0
      [<ffffffff81249381>] mount_bdev+0x331/0x340
      [<ffffffff81376730>] ? ext4_alloc_flex_bg_array+0x180/0x180
      [<ffffffff81362035>] ext4_mount+0x15/0x20
      [<ffffffff8124869a>] mount_fs+0x9a/0x2e0
      [<ffffffff81277e25>] vfs_kern_mount+0xc5/0x170
      [<ffffffff81279c02>] do_new_mount+0x172/0x2e0
      [<ffffffff8127aa56>] do_mount+0x376/0x380
      [<ffffffff8127ab98>] sys_mount+0x138/0x150
      [<ffffffff818ffed9>] system_call_fastpath+0x16/0x1b
      Code: 8b 05 88 04 eb 00 48 3d 90 ff 06 82 48 8d 58 e8 75 19 4c 89 e7 e8 e4 d7 2c 00 48 c7 c7 00 ff 06 82 e8 58 5f ef ff 5b 41 5c c9 c3 <48> 8b 4b 18 48 8b 73 20 48 89 da 31 c0 48 c7 c7 c5 a0 e4 81 e\
      8
      RIP  [<ffffffff811bfb2d>] unregister_shrinker+0xad/0xe0
      RSP <ffff88011d5cbcd8>
      Signed-off-by: default avatarDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      a75ae78f
    • Dmitry Monakhov's avatar
      ext4: fix journal callback list traversal · 5d3ee208
      Dmitry Monakhov authored
      It is incorrect to use list_for_each_entry_safe() for journal callback
      traversial because ->next may be removed by other task:
      ->ext4_mb_free_metadata()
        ->ext4_mb_free_metadata()
          ->ext4_journal_callback_del()
      
      This results in the following issue:
      
      WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
      Hardware name:
      list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
      Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
      Pid: 16400, comm: jbd2/dm-1-8 Tainted: G        W    3.8.0-rc3+ #107
      Call Trace:
       [<ffffffff8106fb0d>] warn_slowpath_common+0xad/0xf0
       [<ffffffff8106fc06>] warn_slowpath_fmt+0x46/0x50
       [<ffffffff813637e9>] ? ext4_journal_commit_callback+0x99/0xc0
       [<ffffffff8148cae0>] __list_del_entry+0x1c0/0x250
       [<ffffffff813637bf>] ext4_journal_commit_callback+0x6f/0xc0
       [<ffffffff813ca336>] jbd2_journal_commit_transaction+0x23a6/0x2570
       [<ffffffff8108aa42>] ? try_to_del_timer_sync+0x82/0xa0
       [<ffffffff8108b491>] ? del_timer_sync+0x91/0x1e0
       [<ffffffff813d3ecf>] kjournald2+0x19f/0x6a0
       [<ffffffff810ad630>] ? wake_up_bit+0x40/0x40
       [<ffffffff813d3d30>] ? bit_spin_lock+0x80/0x80
       [<ffffffff810ac6be>] kthread+0x10e/0x120
       [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
       [<ffffffff818ff6ac>] ret_from_fork+0x7c/0xb0
       [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
      
      This patch fix the issue as follows:
      - ext4_journal_commit_callback() make list truly traversial safe
        simply by always starting from list_head
      - fix race between two ext4_journal_callback_del() and
        ext4_journal_callback_try_del()
      Signed-off-by: default avatarDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: stable@vger.kernel.com
      5d3ee208
    • Dmitry Monakhov's avatar
      jbd2: fix race between jbd2_journal_remove_checkpoint and ->j_commit_callback · 794446c6
      Dmitry Monakhov authored
      The following race is possible:
      
      [kjournald2]                              other_task
      jbd2_journal_commit_transaction()
        j_state = T_FINISHED;
        spin_unlock(&journal->j_list_lock);
                                               ->jbd2_journal_remove_checkpoint()
      					   ->jbd2_journal_free_transaction();
      					     ->kmem_cache_free(transaction)
        ->j_commit_callback(journal, transaction);
          -> USE_AFTER_FREE
      
      WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
      Hardware name:
      list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
      Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
      Pid: 16400, comm: jbd2/dm-1-8 Tainted: G        W    3.8.0-rc3+ #107
      Call Trace:
       [<ffffffff8106fb0d>] warn_slowpath_common+0xad/0xf0
       [<ffffffff8106fc06>] warn_slowpath_fmt+0x46/0x50
       [<ffffffff813637e9>] ? ext4_journal_commit_callback+0x99/0xc0
       [<ffffffff8148cae0>] __list_del_entry+0x1c0/0x250
       [<ffffffff813637bf>] ext4_journal_commit_callback+0x6f/0xc0
       [<ffffffff813ca336>] jbd2_journal_commit_transaction+0x23a6/0x2570
       [<ffffffff8108aa42>] ? try_to_del_timer_sync+0x82/0xa0
       [<ffffffff8108b491>] ? del_timer_sync+0x91/0x1e0
       [<ffffffff813d3ecf>] kjournald2+0x19f/0x6a0
       [<ffffffff810ad630>] ? wake_up_bit+0x40/0x40
       [<ffffffff813d3d30>] ? bit_spin_lock+0x80/0x80
       [<ffffffff810ac6be>] kthread+0x10e/0x120
       [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
       [<ffffffff818ff6ac>] ret_from_fork+0x7c/0xb0
       [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
      
      In order to demonstrace this issue one should mount ext4 with mount -o
      discard option on SSD disk.  This makes callback longer and race
      window becomes wider.
      
      In order to fix this we should mark transaction as finished only after
      callbacks have completed
      Signed-off-by: default avatarDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      794446c6
    • Theodore Ts'o's avatar
      ext4: support simple conversion of extent-mapped inodes to use i_blocks · 996bb9fd
      Theodore Ts'o authored
      In order to make it simpler to test the code which support
      i_blocks/indirect-mapped inodes, support the conversion of inodes
      which are less than 12 blocks and which are contained in no more than
      a single extent.
      
      The primary intended use of this code is to converting freshly created
      zero-length files and empty directories.
      
      Note that the version of chattr in e2fsprogs 1.42.7 and earlier has a
      check that prevents the clearing of the extent flag.  A simple patch
      which allows "chattr -e <file>" to work will be checked into the
      e2fsprogs git repository.
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      996bb9fd
    • Theodore Ts'o's avatar
      ext4/jbd2: don't wait (forever) for stale tid caused by wraparound · d76a3a77
      Theodore Ts'o authored
      In the case where an inode has a very stale transaction id (tid) in
      i_datasync_tid or i_sync_tid, it's possible that after a very large
      (2**31) number of transactions, that the tid number space might wrap,
      causing tid_geq()'s calculations to fail.
      
      Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified
      by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily",
      attempted to fix this problem, but it only avoided kjournald spinning
      forever by fixing the logic in jbd2_log_start_commit().
      
      Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c
      that might call jbd2_log_start_commit() with a stale tid, those
      functions will subsequently call jbd2_log_wait_commit() with the same
      stale tid, and then wait for a very long time.  To fix this, we
      replace the calls to jbd2_log_start_commit() and
      jbd2_log_wait_commit() with a call to a new function,
      jbd2_complete_transaction(), which will correctly handle stale tid's.
      
      As a bonus, jbd2_complete_transaction() will avoid locking
      j_state_lock for writing unless a commit needs to be started.  This
      should have a small (but probably not measurable) improvement for
      ext4's scalability.
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Reported-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Reported-by: default avatarGeorge Barnett <gbarnett@atlassian.com>
      Cc: stable@vger.kernel.org
      
      d76a3a77
    • Theodore Ts'o's avatar
      ext4: add might_sleep() annotations · b10a44c3
      Theodore Ts'o authored
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: default avatarLukas Czerner <lczerner@redhat.com>
      b10a44c3
    • Theodore Ts'o's avatar
      ext4: add mutex_is_locked() assertion to ext4_truncate() · 19b5ef61
      Theodore Ts'o authored
      [ Added fixup from Lukáš Czerner which only checks the assertion when
        the inode is not new and is not being freed. ]
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      19b5ef61
  2. 03 Apr, 2013 6 commits
  3. 31 Mar, 2013 5 commits
    • Linus Torvalds's avatar
      Linux 3.9-rc5 · 07961ac7
      Linus Torvalds authored
      07961ac7
    • Linus Torvalds's avatar
      Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma · 0bb44280
      Linus Torvalds authored
      Pull slave-dmaengine fixes from Vinod Koul:
       "Two fixes for slave-dmaengine.
      
        The first one is for making slave_id value correct for dw_dmac and
        the other one fixes the endieness in DT parsing"
      
      * 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
        dw_dmac: adjust slave_id accordingly to request line base
        dmaengine: dw_dma: fix endianess for DT xlate function
      0bb44280
    • Linus Torvalds's avatar
      Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · a7b436d3
      Linus Torvalds authored
      Pull media fixes from Mauro Carvalho Chehab:
       "For a some fixes for Kernel 3.9:
         - subsystem build fix when VIDEO_DEV=y, VIDEO_V4L2=m and I2C=m
         - compilation fix for arm multiarch preventing IR_RX51 to be selected
         - regression fix at bttv crop logic
         - s5p-mfc/m5mols/exynos: a few fixes for cameras on exynos hardware"
      
      * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        [media] [REGRESSION] bt8xx: Fix too large height in cropcap
        [media] fix compilation with both V4L2 and I2C as 'm'
        [media] m5mols: Fix bug in stream on handler
        [media] s5p-fimc: Do not attempt to disable not enabled media pipeline
        [media] s5p-mfc: Fix encoder control 15 issue
        [media] s5p-mfc: Fix frame skip bug
        [media] s5p-fimc: send valid m2m ctx to fimc_m2m_job_finish
        [media] exynos-gsc: send valid m2m ctx to gsc_m2m_job_finish
        [media] fimc-lite: Fix the variable type to avoid possible crash
        [media] fimc-lite: Initialize 'step' field in fimc_lite_ctrl structure
        [media] ir: IR_RX51 only works on OMAP2
      a7b436d3
    • Linus Torvalds's avatar
      Merge tag 'for-linus-20130331' of git://git.kernel.dk/linux-block · d299c290
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "Alright, this time from 10K up in the air.
      
        Collection of fixes that have been queued up since the merge window
        opened, hence postponed until later in the cycle.  The pull request
        contains:
      
         - A bunch of fixes for the xen blk front/back driver.
      
         - A round of fixes for the new IBM RamSan driver, fixing various
           nasty issues.
      
         - Fixes for multiple drives from Wei Yongjun, bad handling of return
           values and wrong pointer math.
      
         - A fix for loop properly killing partitions when being detached."
      
      * tag 'for-linus-20130331' of git://git.kernel.dk/linux-block: (25 commits)
        mg_disk: fix error return code in mg_probe()
        rsxx: remove unused variable
        rsxx: enable error return of rsxx_eeh_save_issued_dmas()
        block: removes dynamic allocation on stack
        Block: blk-flush: Fixed indent code style
        cciss: fix invalid use of sizeof in cciss_find_cfgtables()
        loop: cleanup partitions when detaching loop device
        loop: fix error return code in loop_add()
        mtip32xx: fix error return code in mtip_pci_probe()
        xen-blkfront: remove frame list from blk_shadow
        xen-blkfront: pre-allocate pages for requests
        xen-blkback: don't store dev_bus_addr
        xen-blkfront: switch from llist to list
        xen-blkback: fix foreach_grant_safe to handle empty lists
        xen-blkfront: replace kmalloc and then memcpy with kmemdup
        xen-blkback: fix dispatch_rw_block_io() error path
        rsxx: fix missing unlock on error return in rsxx_eeh_remap_dmas()
        Adding in EEH support to the IBM FlashSystem 70/80 device driver
        block: IBM RamSan 70/80 error message bug fix.
        block: IBM RamSan 70/80 branding changes.
        ...
      d299c290
    • Paul Walmsley's avatar
      Revert "lockdep: check that no locks held at freeze time" · dbf520a9
      Paul Walmsley authored
      This reverts commit 6aa97070.
      
      Commit 6aa97070 ("lockdep: check that no locks held at freeze time")
      causes problems with NFS root filesystems.  The failures were noticed on
      OMAP2 and 3 boards during kernel init:
      
        [ BUG: swapper/0/1 still has locks held! ]
        3.9.0-rc3-00344-ga937536b #1 Not tainted
        -------------------------------------
        1 lock held by swapper/0/1:
         #0:  (&type->s_umount_key#13/1){+.+.+.}, at: [<c011e84c>] sget+0x248/0x574
      
        stack backtrace:
          rpc_wait_bit_killable
          __wait_on_bit
          out_of_line_wait_on_bit
          __rpc_execute
          rpc_run_task
          rpc_call_sync
          nfs_proc_get_root
          nfs_get_root
          nfs_fs_mount_common
          nfs_try_mount
          nfs_fs_mount
          mount_fs
          vfs_kern_mount
          do_mount
          sys_mount
          do_mount_root
          mount_root
          prepare_namespace
          kernel_init_freeable
          kernel_init
      
      Although the rootfs mounts, the system is unstable.  Here's a transcript
      from a PM test:
      
        http://www.pwsan.com/omap/testlogs/test_v3.9-rc3/20130317194234/pm/37xxevm/37xxevm_log.txt
      
      Here's what the test log should look like:
      
        http://www.pwsan.com/omap/testlogs/test_v3.8/20130218214403/pm/37xxevm/37xxevm_log.txt
      
      Mailing list discussion is here:
      
        http://lkml.org/lkml/2013/3/4/221
      
      Deal with this for v3.9 by reverting the problem commit, until folks can
      figure out the right long-term course of action.
      Signed-off-by: default avatarPaul Walmsley <paul@pwsan.com>
      Cc: Mandeep Singh Baines <msb@chromium.org>
      Cc: Jeff Layton <jlayton@redhat.com>
      Cc: Shawn Guo <shawn.guo@linaro.org>
      Cc: <maciej.rutecki@gmail.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ben Chan <benchan@chromium.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dbf520a9
  4. 30 Mar, 2013 1 commit
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending · 13d2080d
      Linus Torvalds authored
      Pull SCSI target fixes from Nicholas Bellinger:
       "This includes the bug-fix for a >= v3.8-rc1 regression specific to
        iscsi-target persistent reservation conflict handling (CC'ed to
        stable), and a tcm_vhost patch to drop VIRTIO_RING_F_EVENT_IDX usage
        so that in-flight qemu vhost-scsi-pci device code can detect the
        proper vhost feature bits.
      
        Also, there are two more tcm_vhost patches still being discussed by
        MST and Asias for v3.9 that will be required for the in-flight qemu
        vhost-scsi-pci device patch to function properly, and that should
        (hopefully) be the last target fixes for this round."
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
        target: Fix RESERVATION_CONFLICT status regression for iscsi-target special case
        tcm_vhost: Avoid VIRTIO_RING_F_EVENT_IDX feature bit
      13d2080d
  5. 29 Mar, 2013 12 commits
  6. 28 Mar, 2013 9 commits