1. 19 May, 2016 40 commits
    • Linux 4.5.5 · 3b41b7e3
      Greg Kroah-Hartman authored
    • nf_conntrack: avoid kernel pointer value leak in slab name · 7ef374ef
      Linus Torvalds authored
      commit 31b0b385 upstream.
      
      The slab name ends up being visible in the directory structure under
      /sys, and even if you don't have access rights to the file you can see
      the filenames.
      
      Just use a 64-bit counter instead of the pointer to the 'net' structure
      to generate a unique name.
      
      This code will go away in 4.7 when the conntrack code moves to a single
      kmemcache, but this is the backportable simple solution to avoiding
      leaking kernel pointers to user space.
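A rough user-space Python sketch of the idea follows. It is not the kernel patch itself: the function names and the exact name format are assumptions made for illustration. It contrasts a name derived from an object's address with a name derived from a shared counter.

```python
import itertools
import threading

# Hypothetical stand-in for a global 64-bit counter shared by all namespaces.
_next_id = itertools.count(1)
_next_id_lock = threading.Lock()


def slab_name_from_pointer(net_addr: int) -> str:
    """Leaky scheme: the address of the 'net' structure ends up in the name."""
    return f"nf_conntrack_{net_addr:x}"


def slab_name_from_counter() -> str:
    """Safer scheme: a unique counter value, wrapped to 64 bits."""
    with _next_id_lock:
        ident = next(_next_id) & 0xFFFFFFFFFFFFFFFF
    return f"nf_conntrack_{ident}"


if __name__ == "__main__":
    print(slab_name_from_pointer(0xFFFF880012345678))
    print(slab_name_from_counter())
    print(slab_name_from_counter())
```

Both schemes give unique names, but only the counter-based one reveals nothing about memory layout to someone who can list the directory.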
      
      Fixes: 5b3501fa ("netfilter: nf_conntrack: per netns nf_conntrack_cachep")
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: Reset IO error counters before start of device replacing · 9abb19bd
      Yauhen Kharuzhy authored
      commit 7ccefb98 upstream.
      
      If a device replace entry is found on disk at mount time and its
      num_write_errors stats counter has a nonzero value, the replace operation
      will never finish and btrfs_scrub_dev() will report -EIO, because this
      counter is never reset.
      
       # mount -o degraded /media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/
       # btrfs replace status /media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/
       Started on 25.Mar 07:28:00, canceled on 25.Mar 07:28:01 at 0.0%, 40 write errs, 0 uncorr. read errs
       # btrfs replace start -B 4 /dev/sdg /media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/
       ERROR: ioctl(DEV_REPLACE_START) failed on "/media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/": Input/output error, no error
      
      Reset the num_write_errors and num_uncorrectable_read_errors counters in
      the dev_replace structure before the replace starts.
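The effect of the reset can be modelled in a few lines of Python. This is a simplified sketch, not the btrfs code: the class, the function names, and the rule "any recorded write error makes the scrub step fail with -EIO" are assumptions drawn from the description above.

```python
import errno
from dataclasses import dataclass


@dataclass
class DevReplaceState:
    """Hypothetical model of the persisted replace counters."""
    num_write_errors: int = 0
    num_uncorrectable_read_errors: int = 0


def scrub_dev(state: DevReplaceState) -> int:
    """Assumed behaviour: leftover write errors make the scrub report -EIO."""
    return -errno.EIO if state.num_write_errors > 0 else 0


def start_replace(state: DevReplaceState, reset_counters: bool) -> int:
    """Start a replace; optionally clear counters left by an earlier attempt."""
    if reset_counters:
        state.num_write_errors = 0
        state.num_uncorrectable_read_errors = 0
    return scrub_dev(state)


if __name__ == "__main__":
    # 40 write errors left over from a cancelled replace, as in the log above.
    print(start_replace(DevReplaceState(num_write_errors=40), reset_counters=False))
    print(start_replace(DevReplaceState(num_write_errors=40), reset_counters=True))
```

Without the reset, every new attempt inherits the stale count and fails in the same way, which matches the "will never finish" symptom.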
      Signed-off-by: Yauhen Kharuzhy <yauhen.kharuzhy@zavadatar.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: don't use src fd for printk · f188c432
      Josef Bacik authored
      commit c79b4713 upstream.
      
      The fd we pass in may not be on a btrfs file system, so don't try to do
      BTRFS_I() on it.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: fallback to vmalloc in btrfs_compare_tree · 440323f9
      David Sterba authored
      commit 8f282f71 upstream.
      
      The allocation of node could fail if the memory is too fragmented for a
      given node size, practically observed with 64k.
      
      http://article.gmane.org/gmane.comp.file-systems.btrfs/54689

      Reported-and-tested-by: Jean-Denis Girard <jd.girard@sysnux.pf>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: handle non-fatal errors in btrfs_qgroup_inherit() · 52ec5549
      Mark Fasheh authored
      commit 918c2ee1 upstream.
      
      create_pending_snapshot() will go readonly on _any_ error return from
      btrfs_qgroup_inherit(). If qgroups are enabled, a user can crash their fs by
      just making a snapshot and asking it to inherit from an invalid qgroup. For
      example:
      
      $ btrfs sub snap -i 1/10 /btrfs/ /btrfs/foo
      
      Will cause a transaction abort.
      
      Fix this by only throwing errors in btrfs_qgroup_inherit() when we know
      going readonly is acceptable.
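The change in error handling can be illustrated with a small Python model. It is a sketch under stated assumptions, not the btrfs implementation: the function names mirror the ones mentioned above, but the `strict` switch, the exception type, and the use of ENOENT are hypothetical.

```python
import errno


class TransactionAborted(Exception):
    """Hypothetical marker for the filesystem going read-only."""


def qgroup_inherit(existing_qgroups: set, inherit_from: list, strict: bool) -> int:
    """Return 0 on success or a negative errno-style code."""
    for qgroupid in inherit_from:
        if qgroupid not in existing_qgroups:
            if strict:
                # Old behaviour: an invalid source qgroup is reported upward.
                return -errno.ENOENT
            # New behaviour: a non-fatal problem is skipped silently.
            continue
    return 0


def create_pending_snapshot(existing_qgroups: set, inherit_from: list, strict: bool) -> str:
    """Any error from qgroup_inherit() aborts the transaction."""
    ret = qgroup_inherit(existing_qgroups, inherit_from, strict)
    if ret:
        raise TransactionAborted(ret)
    return "snapshot created"


if __name__ == "__main__":
    print(create_pending_snapshot({"0/5"}, ["1/10"], strict=False))
```

Because the caller treats every nonzero return as fatal, the only way to keep a bad user argument from aborting the transaction is to not return an error for it in the first place.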
      
      The following xfstests test case reproduces this bug:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
      
        here=`pwd`
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
        	cd /
        	rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
      
        # remove previous $seqres.full before test
        rm -f $seqres.full
      
        # real QA test starts here
        _supported_fs btrfs
        _supported_os Linux
        _require_scratch
      
        rm -f $seqres.full
      
        _scratch_mkfs
        _scratch_mount
        _run_btrfs_util_prog quota enable $SCRATCH_MNT
        # The qgroup '1/10' does not exist and should be silently ignored
        _run_btrfs_util_prog subvolume snapshot -i 1/10 $SCRATCH_MNT $SCRATCH_MNT/snap1
      
        _scratch_unmount
      
        echo "Silence is golden"
      
        status=0
        exit
      Signed-off-by: Mark Fasheh <mfasheh@suse.de>
      Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: fix invalid reference in replace_path · 24469ab4
      Liu Bo authored
      commit 264813ac upstream.
      
      Dan Carpenter's static checker found this error, which was introduced by
      commit 64c043de
      ("Btrfs: fix up read_tree_block to return proper error").
      
      It is supposed to 'break' out of the loop on error, like the others do.
      
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: do not write corrupted metadata blocks to disk · b96513cc
      Alex Lyakas authored
      commit 0f805531 upstream.
      
      csum_dirty_buffer was issuing a warning when the extent buffer
      did not look right, but was still returning success.
      Return an error in this case, and also add a sanity
      check on the extent buffer header.
      A caller up the chain may BUG_ON on this (flush_epd_write_bio will, for
      example), but that is better than silent metadata corruption on disk.
      Signed-off-by: Alex Lyakas <alex@zadarastorage.com>
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Alex Lyakas's avatar
      342da5ce
    • Btrfs: do not collect ordered extents when logging that inode exists · 13b7e683
      Filipe Manana authored
      commit 5e33a2bd upstream.
      
      When logging that an inode exists, for example as part of a directory
      fsync operation, we were collecting any ordered extents for the inode but
      we ended up doing nothing with them except tagging them as processed, by
      setting the flag BTRFS_ORDERED_LOGGED on them, which prevented a
      subsequent fsync of that inode (using the LOG_INODE_ALL mode) from
      collecting and processing them. This created a time window where a second
      fsync against the inode, using the fast path, ended up not logging the
      checksums for the new extents but it logged the extents since they were
      part of the list of modified extents. This happened because the ordered
      extents were not collected and checksums were not yet added to the csum
      tree - the ordered extents have not gone through btrfs_finish_ordered_io()
      yet (which is where we add them to the csum tree by calling
      inode.c:add_pending_csums()).
      
      So fix this by not collecting an inode's ordered extents if we are logging
      it with the LOG_INODE_EXISTS mode.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: fix race when checking if we can skip fsync'ing an inode · ce91def6
      Filipe Manana authored
      commit affc0ff9 upstream.
      
      If we're about to do a fast fsync for an inode and btrfs_inode_in_log()
      returns false, it's possible that we had an ordered extent in progress
      (btrfs_finish_ordered_io() not run yet) when we noticed that the inode's
      last_trans field was not greater than the id of the last committed
      transaction, but shortly after, before we checked if there were any
      ongoing ordered extents, the ordered extent had just completed and
      removed itself from the inode's ordered tree, in which case we end up not
      logging the inode, losing some data if a power failure or crash happens
      after the fsync handler returns and before the transaction is committed.
      
      Fix this by checking first if there are any ongoing ordered extents
      before comparing the inode's last_trans with the id of the last committed
      transaction - when it completes, an ordered extent always updates the
      inode's last_trans before it removes itself from the inode's ordered
      tree (at btrfs_finish_ordered_io()).
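The reordering can be checked with a deterministic Python model of the race. This is a simplified sketch, not the kernel code: the `Inode` class, the function names, and the `racing_completion` hook (which simulates an ordered extent completing between the two checks) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Inode:
    last_trans: int        # id of the last transaction that modified the inode
    ordered_extents: int   # ordered extents still in the inode's ordered tree


def finish_ordered_io(inode: Inode, current_transid: int) -> None:
    """Completion updates last_trans first, then leaves the ordered tree."""
    inode.last_trans = current_transid
    inode.ordered_extents -= 1


def can_skip_fsync_old(inode: Inode, last_committed: int,
                       racing_completion: Optional[Callable[[], None]] = None) -> bool:
    """Old order: compare last_trans first, look for ordered extents second."""
    committed = inode.last_trans <= last_committed
    if racing_completion:
        racing_completion()  # ordered extent completes between the checks
    no_ordered = inode.ordered_extents == 0
    return committed and no_ordered


def can_skip_fsync_new(inode: Inode, last_committed: int,
                       racing_completion: Optional[Callable[[], None]] = None) -> bool:
    """New order: look for ordered extents first, compare last_trans second."""
    no_ordered = inode.ordered_extents == 0
    if racing_completion:
        racing_completion()
    committed = inode.last_trans <= last_committed
    return committed and no_ordered


if __name__ == "__main__":
    for check in (can_skip_fsync_old, can_skip_fsync_new):
        inode = Inode(last_trans=5, ordered_extents=1)
        skipped = check(inode, 5, lambda: finish_ordered_io(inode, 6))
        print(check.__name__, "skips fsync:", skipped)
```

With the new order, whichever side of the first check the completion lands on, at least one of the two conditions still observes the pending write, so the inode is logged.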
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: fix deadlock between direct IO reads and buffered writes · 8eb64c91
      Filipe Manana authored
      commit ade77029 upstream.
      
      While running a test with a mix of buffered IO and direct IO against
      the same files I hit a deadlock reported by the following trace:
      
      [11642.140352] INFO: task kworker/u32:3:15282 blocked for more than 120 seconds.
      [11642.142452]       Not tainted 4.4.0-rc6-btrfs-next-21+ #1
      [11642.143982] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [11642.146332] kworker/u32:3   D ffff880230ef7988 [11642.147737] systemd-journald[571]: Sent WATCHDOG=1 notification.
      [11642.149771]     0 15282      2 0x00000000
      [11642.151205] Workqueue: btrfs-flush_delalloc btrfs_flush_delalloc_helper [btrfs]
      [11642.154074]  ffff880230ef7988 0000000000000246 0000000000014ec0 ffff88023ec94ec0
      [11642.156722]  ffff880233fe8f80 ffff880230ef8000 ffff88023ec94ec0 7fffffffffffffff
      [11642.159205]  0000000000000002 ffffffff8147b7f9 ffff880230ef79a0 ffffffff8147b541
      [11642.161403] Call Trace:
      [11642.162129]  [<ffffffff8147b7f9>] ? bit_wait+0x2f/0x2f
      [11642.163396]  [<ffffffff8147b541>] schedule+0x82/0x9a
      [11642.164871]  [<ffffffff8147e7fe>] schedule_timeout+0x43/0x109
      [11642.167020]  [<ffffffff8147b7f9>] ? bit_wait+0x2f/0x2f
      [11642.167931]  [<ffffffff8108afd1>] ? trace_hardirqs_on_caller+0x17b/0x197
      [11642.182320]  [<ffffffff8108affa>] ? trace_hardirqs_on+0xd/0xf
      [11642.183762]  [<ffffffff810b079b>] ? timekeeping_get_ns+0xe/0x33
      [11642.185308]  [<ffffffff810b0f61>] ? ktime_get+0x41/0x52
      [11642.186782]  [<ffffffff8147ac08>] io_schedule_timeout+0xa0/0x102
      [11642.188217]  [<ffffffff8147ac08>] ? io_schedule_timeout+0xa0/0x102
      [11642.189626]  [<ffffffff8147b814>] bit_wait_io+0x1b/0x39
      [11642.190803]  [<ffffffff8147bb21>] __wait_on_bit_lock+0x4c/0x90
      [11642.192158]  [<ffffffff8111829f>] __lock_page+0x66/0x68
      [11642.193379]  [<ffffffff81082f29>] ? autoremove_wake_function+0x3a/0x3a
      [11642.194831]  [<ffffffffa0450ddd>] lock_page+0x31/0x34 [btrfs]
      [11642.197068]  [<ffffffffa0454e3b>] extent_write_cache_pages.isra.19.constprop.35+0x1af/0x2f4 [btrfs]
      [11642.199188]  [<ffffffffa0455373>] extent_writepages+0x4b/0x5c [btrfs]
      [11642.200723]  [<ffffffffa043c913>] ? btrfs_writepage_start_hook+0xce/0xce [btrfs]
      [11642.202465]  [<ffffffffa043aa82>] btrfs_writepages+0x28/0x2a [btrfs]
      [11642.203836]  [<ffffffff811236bc>] do_writepages+0x23/0x2c
      [11642.205624]  [<ffffffff811198c9>] __filemap_fdatawrite_range+0x5a/0x61
      [11642.207057]  [<ffffffff81119946>] filemap_fdatawrite_range+0x13/0x15
      [11642.208529]  [<ffffffffa044f87e>] btrfs_start_ordered_extent+0xd0/0x1a1 [btrfs]
      [11642.210375]  [<ffffffffa0462613>] ? btrfs_scrubparity_helper+0x140/0x33a [btrfs]
      [11642.212132]  [<ffffffffa044f974>] btrfs_run_ordered_extent_work+0x25/0x34 [btrfs]
      [11642.213837]  [<ffffffffa046262f>] btrfs_scrubparity_helper+0x15c/0x33a [btrfs]
      [11642.215457]  [<ffffffffa046293b>] btrfs_flush_delalloc_helper+0xe/0x10 [btrfs]
      [11642.217095]  [<ffffffff8106483e>] process_one_work+0x256/0x48b
      [11642.218324]  [<ffffffff81064f20>] worker_thread+0x1f5/0x2a7
      [11642.219466]  [<ffffffff81064d2b>] ? rescuer_thread+0x289/0x289
      [11642.220801]  [<ffffffff8106a500>] kthread+0xd4/0xdc
      [11642.222032]  [<ffffffff8106a42c>] ? kthread_parkme+0x24/0x24
      [11642.223190]  [<ffffffff8147fdef>] ret_from_fork+0x3f/0x70
      [11642.224394]  [<ffffffff8106a42c>] ? kthread_parkme+0x24/0x24
      [11642.226295] 2 locks held by kworker/u32:3/15282:
      [11642.227273]  #0:  ("%s-%s""btrfs", name){++++.+}, at: [<ffffffff8106474d>] process_one_work+0x165/0x48b
      [11642.229412]  #1:  ((&work->normal_work)){+.+.+.}, at: [<ffffffff8106474d>] process_one_work+0x165/0x48b
      [11642.231414] INFO: task kworker/u32:8:15289 blocked for more than 120 seconds.
      [11642.232872]       Not tainted 4.4.0-rc6-btrfs-next-21+ #1
      [11642.234109] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [11642.235776] kworker/u32:8   D ffff88020de5f848     0 15289      2 0x00000000
      [11642.237412] Workqueue: writeback wb_workfn (flush-btrfs-481)
      [11642.238670]  ffff88020de5f848 0000000000000246 0000000000014ec0 ffff88023ed54ec0
      [11642.240475]  ffff88021b1ece40 ffff88020de60000 ffff88023ed54ec0 7fffffffffffffff
      [11642.242154]  0000000000000002 ffffffff8147b7f9 ffff88020de5f860 ffffffff8147b541
      [11642.243715] Call Trace:
      [11642.244390]  [<ffffffff8147b7f9>] ? bit_wait+0x2f/0x2f
      [11642.245432]  [<ffffffff8147b541>] schedule+0x82/0x9a
      [11642.246392]  [<ffffffff8147e7fe>] schedule_timeout+0x43/0x109
      [11642.247479]  [<ffffffff8147b7f9>] ? bit_wait+0x2f/0x2f
      [11642.248551]  [<ffffffff8108afd1>] ? trace_hardirqs_on_caller+0x17b/0x197
      [11642.249968]  [<ffffffff8108affa>] ? trace_hardirqs_on+0xd/0xf
      [11642.251043]  [<ffffffff810b079b>] ? timekeeping_get_ns+0xe/0x33
      [11642.252202]  [<ffffffff810b0f61>] ? ktime_get+0x41/0x52
      [11642.253210]  [<ffffffff8147ac08>] io_schedule_timeout+0xa0/0x102
      [11642.254307]  [<ffffffff8147ac08>] ? io_schedule_timeout+0xa0/0x102
      [11642.256118]  [<ffffffff8147b814>] bit_wait_io+0x1b/0x39
      [11642.257131]  [<ffffffff8147bb21>] __wait_on_bit_lock+0x4c/0x90
      [11642.258200]  [<ffffffff8111829f>] __lock_page+0x66/0x68
      [11642.259168]  [<ffffffff81082f29>] ? autoremove_wake_function+0x3a/0x3a
      [11642.260516]  [<ffffffffa0450ddd>] lock_page+0x31/0x34 [btrfs]
      [11642.261841]  [<ffffffffa0454e3b>] extent_write_cache_pages.isra.19.constprop.35+0x1af/0x2f4 [btrfs]
      [11642.263531]  [<ffffffffa0455373>] extent_writepages+0x4b/0x5c [btrfs]
      [11642.264747]  [<ffffffffa043c913>] ? btrfs_writepage_start_hook+0xce/0xce [btrfs]
      [11642.266148]  [<ffffffffa043aa82>] btrfs_writepages+0x28/0x2a [btrfs]
      [11642.267264]  [<ffffffff811236bc>] do_writepages+0x23/0x2c
      [11642.268280]  [<ffffffff81192a2b>] __writeback_single_inode+0xda/0x5ba
      [11642.269407]  [<ffffffff811939f0>] writeback_sb_inodes+0x27b/0x43d
      [11642.270476]  [<ffffffff81193c28>] __writeback_inodes_wb+0x76/0xae
      [11642.271547]  [<ffffffff81193ea6>] wb_writeback+0x19e/0x41c
      [11642.272588]  [<ffffffff81194821>] wb_workfn+0x201/0x341
      [11642.273523]  [<ffffffff81194821>] ? wb_workfn+0x201/0x341
      [11642.274479]  [<ffffffff8106483e>] process_one_work+0x256/0x48b
      [11642.275497]  [<ffffffff81064f20>] worker_thread+0x1f5/0x2a7
      [11642.276518]  [<ffffffff81064d2b>] ? rescuer_thread+0x289/0x289
      [11642.277520]  [<ffffffff81064d2b>] ? rescuer_thread+0x289/0x289
      [11642.278517]  [<ffffffff8106a500>] kthread+0xd4/0xdc
      [11642.279371]  [<ffffffff8106a42c>] ? kthread_parkme+0x24/0x24
      [11642.280468]  [<ffffffff8147fdef>] ret_from_fork+0x3f/0x70
      [11642.281607]  [<ffffffff8106a42c>] ? kthread_parkme+0x24/0x24
      [11642.282604] 3 locks held by kworker/u32:8/15289:
      [11642.283423]  #0:  ("writeback"){++++.+}, at: [<ffffffff8106474d>] process_one_work+0x165/0x48b
      [11642.285629]  #1:  ((&(&wb->dwork)->work)){+.+.+.}, at: [<ffffffff8106474d>] process_one_work+0x165/0x48b
      [11642.287538]  #2:  (&type->s_umount_key#37){+++++.}, at: [<ffffffff81171217>] trylock_super+0x1b/0x4b
      [11642.289423] INFO: task fdm-stress:26848 blocked for more than 120 seconds.
      [11642.290547]       Not tainted 4.4.0-rc6-btrfs-next-21+ #1
      [11642.291453] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [11642.292864] fdm-stress      D ffff88022c107c20     0 26848  26591 0x00000000
      [11642.294118]  ffff88022c107c20 000000038108affa 0000000000014ec0 ffff88023ed54ec0
      [11642.295602]  ffff88013ab1ca40 ffff88022c108000 ffff8800b2fc19d0 00000000000e0fff
      [11642.297098]  ffff8800b2fc19b0 ffff88022c107c88 ffff88022c107c38 ffffffff8147b541
      [11642.298433] Call Trace:
      [11642.298896]  [<ffffffff8147b541>] schedule+0x82/0x9a
      [11642.299738]  [<ffffffffa045225d>] lock_extent_bits+0xfe/0x1a3 [btrfs]
      [11642.300833]  [<ffffffff81082eef>] ? add_wait_queue_exclusive+0x44/0x44
      [11642.301943]  [<ffffffffa0447516>] lock_and_cleanup_extent_if_need+0x68/0x18e [btrfs]
      [11642.303270]  [<ffffffffa04485ba>] __btrfs_buffered_write+0x238/0x4c1 [btrfs]
      [11642.304552]  [<ffffffffa044b50a>] ? btrfs_file_write_iter+0x17c/0x408 [btrfs]
      [11642.305782]  [<ffffffffa044b682>] btrfs_file_write_iter+0x2f4/0x408 [btrfs]
      [11642.306878]  [<ffffffff8116e298>] __vfs_write+0x7c/0xa5
      [11642.307729]  [<ffffffff8116e7d1>] vfs_write+0x9d/0xe8
      [11642.308602]  [<ffffffff8116efbb>] SyS_write+0x50/0x7e
      [11642.309410]  [<ffffffff8147fa97>] entry_SYSCALL_64_fastpath+0x12/0x6b
      [11642.310403] 3 locks held by fdm-stress/26848:
      [11642.311108]  #0:  (&f->f_pos_lock){+.+.+.}, at: [<ffffffff811877e8>] __fdget_pos+0x3a/0x40
      [11642.312578]  #1:  (sb_writers#11){.+.+.+}, at: [<ffffffff811706ee>] __sb_start_write+0x5f/0xb0
      [11642.314170]  #2:  (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffffa044b401>] btrfs_file_write_iter+0x73/0x408 [btrfs]
      [11642.316796] INFO: task fdm-stress:26849 blocked for more than 120 seconds.
      [11642.317842]       Not tainted 4.4.0-rc6-btrfs-next-21+ #1
      [11642.318691] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [11642.319959] fdm-stress      D ffff8801964ffa68     0 26849  26591 0x00000000
      [11642.321312]  ffff8801964ffa68 00ff8801e9975f80 0000000000014ec0 ffff88023ed94ec0
      [11642.322555]  ffff8800b00b4840 ffff880196500000 ffff8801e9975f20 0000000000000002
      [11642.323715]  ffff8801e9975f18 ffff8800b00b4840 ffff8801964ffa80 ffffffff8147b541
      [11642.325096] Call Trace:
      [11642.325532]  [<ffffffff8147b541>] schedule+0x82/0x9a
      [11642.326303]  [<ffffffff8147e7fe>] schedule_timeout+0x43/0x109
      [11642.327180]  [<ffffffff8108ae40>] ? mark_held_locks+0x5e/0x74
      [11642.328114]  [<ffffffff8147f30e>] ? _raw_spin_unlock_irq+0x2c/0x4a
      [11642.329051]  [<ffffffff8108afd1>] ? trace_hardirqs_on_caller+0x17b/0x197
      [11642.330053]  [<ffffffff8147bceb>] __wait_for_common+0x109/0x147
      [11642.330952]  [<ffffffff8147bceb>] ? __wait_for_common+0x109/0x147
      [11642.331869]  [<ffffffff8147e7bb>] ? usleep_range+0x4a/0x4a
      [11642.332925]  [<ffffffff81074075>] ? wake_up_q+0x47/0x47
      [11642.333736]  [<ffffffff8147bd4d>] wait_for_completion+0x24/0x26
      [11642.334672]  [<ffffffffa044f5ce>] btrfs_wait_ordered_extents+0x1c8/0x217 [btrfs]
      [11642.335858]  [<ffffffffa0465b5a>] btrfs_mksubvol+0x224/0x45d [btrfs]
      [11642.336854]  [<ffffffff81082eef>] ? add_wait_queue_exclusive+0x44/0x44
      [11642.337820]  [<ffffffffa0465edb>] btrfs_ioctl_snap_create_transid+0x148/0x17a [btrfs]
      [11642.339026]  [<ffffffffa046603b>] btrfs_ioctl_snap_create_v2+0xc7/0x110 [btrfs]
      [11642.340214]  [<ffffffffa0468582>] btrfs_ioctl+0x590/0x27bd [btrfs]
      [11642.341123]  [<ffffffff8147dc00>] ? mutex_unlock+0xe/0x10
      [11642.341934]  [<ffffffffa00fa6e9>] ? ext4_file_write_iter+0x2a3/0x36f [ext4]
      [11642.342936]  [<ffffffff8108895d>] ? __lock_is_held+0x3c/0x57
      [11642.343772]  [<ffffffff81186a1d>] ? rcu_read_unlock+0x3e/0x5d
      [11642.344673]  [<ffffffff8117dc95>] do_vfs_ioctl+0x458/0x4dc
      [11642.346024]  [<ffffffff81186bbe>] ? __fget_light+0x62/0x71
      [11642.346873]  [<ffffffff8117dd70>] SyS_ioctl+0x57/0x79
      [11642.347720]  [<ffffffff8147fa97>] entry_SYSCALL_64_fastpath+0x12/0x6b
      [11642.350222] 4 locks held by fdm-stress/26849:
      [11642.350898]  #0:  (sb_writers#11){.+.+.+}, at: [<ffffffff811706ee>] __sb_start_write+0x5f/0xb0
      [11642.352375]  #1:  (&type->i_mutex_dir_key#4/1){+.+.+.}, at: [<ffffffffa0465981>] btrfs_mksubvol+0x4b/0x45d [btrfs]
      [11642.354072]  #2:  (&fs_info->subvol_sem){++++..}, at: [<ffffffffa0465a2a>] btrfs_mksubvol+0xf4/0x45d [btrfs]
      [11642.355647]  #3:  (&root->ordered_extent_mutex){+.+...}, at: [<ffffffffa044f456>] btrfs_wait_ordered_extents+0x50/0x217 [btrfs]
      [11642.357516] INFO: task fdm-stress:26850 blocked for more than 120 seconds.
      [11642.358508]       Not tainted 4.4.0-rc6-btrfs-next-21+ #1
      [11642.359376] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [11642.368625] fdm-stress      D ffff88021f167688     0 26850  26591 0x00000000
      [11642.369716]  ffff88021f167688 0000000000000001 0000000000014ec0 ffff88023edd4ec0
      [11642.370950]  ffff880128a98680 ffff88021f168000 ffff88023edd4ec0 7fffffffffffffff
      [11642.372210]  0000000000000002 ffffffff8147b7f9 ffff88021f1676a0 ffffffff8147b541
      [11642.373430] Call Trace:
      [11642.373853]  [<ffffffff8147b7f9>] ? bit_wait+0x2f/0x2f
      [11642.374623]  [<ffffffff8147b541>] schedule+0x82/0x9a
      [11642.375948]  [<ffffffff8147e7fe>] schedule_timeout+0x43/0x109
      [11642.376862]  [<ffffffff8147b7f9>] ? bit_wait+0x2f/0x2f
      [11642.377637]  [<ffffffff8108afd1>] ? trace_hardirqs_on_caller+0x17b/0x197
      [11642.378610]  [<ffffffff8108affa>] ? trace_hardirqs_on+0xd/0xf
      [11642.379457]  [<ffffffff810b079b>] ? timekeeping_get_ns+0xe/0x33
      [11642.380366]  [<ffffffff810b0f61>] ? ktime_get+0x41/0x52
      [11642.381353]  [<ffffffff8147ac08>] io_schedule_timeout+0xa0/0x102
      [11642.382255]  [<ffffffff8147ac08>] ? io_schedule_timeout+0xa0/0x102
      [11642.383162]  [<ffffffff8147b814>] bit_wait_io+0x1b/0x39
      [11642.383945]  [<ffffffff8147bb21>] __wait_on_bit_lock+0x4c/0x90
      [11642.384875]  [<ffffffff8111829f>] __lock_page+0x66/0x68
      [11642.385749]  [<ffffffff81082f29>] ? autoremove_wake_function+0x3a/0x3a
      [11642.386721]  [<ffffffffa0450ddd>] lock_page+0x31/0x34 [btrfs]
      [11642.387596]  [<ffffffffa0454e3b>] extent_write_cache_pages.isra.19.constprop.35+0x1af/0x2f4 [btrfs]
      [11642.389030]  [<ffffffffa0455373>] extent_writepages+0x4b/0x5c [btrfs]
      [11642.389973]  [<ffffffff810a25ad>] ? rcu_read_lock_sched_held+0x61/0x69
      [11642.390939]  [<ffffffffa043c913>] ? btrfs_writepage_start_hook+0xce/0xce [btrfs]
      [11642.392271]  [<ffffffffa0451c32>] ? __clear_extent_bit+0x26e/0x2c0 [btrfs]
      [11642.393305]  [<ffffffffa043aa82>] btrfs_writepages+0x28/0x2a [btrfs]
      [11642.394239]  [<ffffffff811236bc>] do_writepages+0x23/0x2c
      [11642.395045]  [<ffffffff811198c9>] __filemap_fdatawrite_range+0x5a/0x61
      [11642.395991]  [<ffffffff81119946>] filemap_fdatawrite_range+0x13/0x15
      [11642.397144]  [<ffffffffa044f87e>] btrfs_start_ordered_extent+0xd0/0x1a1 [btrfs]
      [11642.398392]  [<ffffffffa0452094>] ? clear_extent_bit+0x17/0x19 [btrfs]
      [11642.399363]  [<ffffffffa0445945>] btrfs_get_blocks_direct+0x12b/0x61c [btrfs]
      [11642.400445]  [<ffffffff8119f7a1>] ? dio_bio_add_page+0x3d/0x54
      [11642.401309]  [<ffffffff8119fa93>] ? submit_page_section+0x7b/0x111
      [11642.402213]  [<ffffffff811a0258>] do_blockdev_direct_IO+0x685/0xc24
      [11642.403139]  [<ffffffffa044581a>] ? btrfs_page_exists_in_range+0x1a1/0x1a1 [btrfs]
      [11642.404360]  [<ffffffffa043d267>] ? btrfs_get_extent_fiemap+0x1c0/0x1c0 [btrfs]
      [11642.406187]  [<ffffffff811a0828>] __blockdev_direct_IO+0x31/0x33
      [11642.407070]  [<ffffffff811a0828>] ? __blockdev_direct_IO+0x31/0x33
      [11642.407990]  [<ffffffffa043d267>] ? btrfs_get_extent_fiemap+0x1c0/0x1c0 [btrfs]
      [11642.409192]  [<ffffffffa043b4ca>] btrfs_direct_IO+0x1c7/0x27e [btrfs]
      [11642.410146]  [<ffffffffa043d267>] ? btrfs_get_extent_fiemap+0x1c0/0x1c0 [btrfs]
      [11642.411291]  [<ffffffff81119a2c>] generic_file_read_iter+0x89/0x4e1
      [11642.412263]  [<ffffffff8108ac05>] ? mark_lock+0x24/0x201
      [11642.413057]  [<ffffffff8116e1f8>] __vfs_read+0x79/0x9d
      [11642.413897]  [<ffffffff8116e6f1>] vfs_read+0x8f/0xd2
      [11642.414708]  [<ffffffff8116ef3d>] SyS_read+0x50/0x7e
      [11642.415573]  [<ffffffff8147fa97>] entry_SYSCALL_64_fastpath+0x12/0x6b
      [11642.416572] 1 lock held by fdm-stress/26850:
      [11642.417345]  #0:  (&f->f_pos_lock){+.+.+.}, at: [<ffffffff811877e8>] __fdget_pos+0x3a/0x40
      [11642.418703] INFO: task fdm-stress:26851 blocked for more than 120 seconds.
      [11642.419698]       Not tainted 4.4.0-rc6-btrfs-next-21+ #1
      [11642.420612] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [11642.421807] fdm-stress      D ffff880196483d28     0 26851  26591 0x00000000
      [11642.422878]  ffff880196483d28 00ff8801c8f60740 0000000000014ec0 ffff88023ed94ec0
      [11642.424149]  ffff8801c8f60740 ffff880196484000 0000000000000246 ffff8801c8f60740
      [11642.425374]  ffff8801bb711840 ffff8801bb711878 ffff880196483d40 ffffffff8147b541
      [11642.426591] Call Trace:
      [11642.427013]  [<ffffffff8147b541>] schedule+0x82/0x9a
      [11642.427856]  [<ffffffff8147b6d5>] schedule_preempt_disabled+0x18/0x24
      [11642.428852]  [<ffffffff8147c23a>] mutex_lock_nested+0x1d7/0x3b4
      [11642.429743]  [<ffffffffa044f456>] ? btrfs_wait_ordered_extents+0x50/0x217 [btrfs]
      [11642.430911]  [<ffffffffa044f456>] btrfs_wait_ordered_extents+0x50/0x217 [btrfs]
      [11642.432102]  [<ffffffffa044f674>] ? btrfs_wait_ordered_roots+0x57/0x191 [btrfs]
      [11642.433259]  [<ffffffffa044f456>] ? btrfs_wait_ordered_extents+0x50/0x217 [btrfs]
      [11642.434431]  [<ffffffffa044f6ea>] btrfs_wait_ordered_roots+0xcd/0x191 [btrfs]
      [11642.436079]  [<ffffffffa0410cab>] btrfs_sync_fs+0xe0/0x1ad [btrfs]
      [11642.437009]  [<ffffffff81197900>] ? SyS_tee+0x23c/0x23c
      [11642.437860]  [<ffffffff81197920>] sync_fs_one_sb+0x20/0x22
      [11642.438723]  [<ffffffff81171435>] iterate_supers+0x75/0xc2
      [11642.439597]  [<ffffffff81197d00>] sys_sync+0x52/0x80
      [11642.440454]  [<ffffffff8147fa97>] entry_SYSCALL_64_fastpath+0x12/0x6b
      [11642.441533] 3 locks held by fdm-stress/26851:
      [11642.442370]  #0:  (&type->s_umount_key#37){+++++.}, at: [<ffffffff8117141f>] iterate_supers+0x5f/0xc2
      [11642.444043]  #1:  (&fs_info->ordered_operations_mutex){+.+...}, at: [<ffffffffa044f661>] btrfs_wait_ordered_roots+0x44/0x191 [btrfs]
      [11642.446010]  #2:  (&root->ordered_extent_mutex){+.+...}, at: [<ffffffffa044f456>] btrfs_wait_ordered_extents+0x50/0x217 [btrfs]
      
      This happened because under specific timings the path for direct IO reads
      can deadlock with concurrent buffered writes. The diagram below shows how
      this happens for an example file that has the following layout:
      
           [  extent A  ]  [  extent B  ]  [ ....
           0K              4K              8K
      
           CPU 1                                               CPU 2                             CPU 3
      
      DIO read against range
       [0K, 8K[ starts
      
      btrfs_direct_IO()
        --> calls btrfs_get_blocks_direct()
            which finds the extent map for the
            extent A and leaves the range
            [0K, 4K[ locked in the inode's
            io tree
      
                                                         buffered write against
                                                         range [4K, 8K[ starts
      
                                                         __btrfs_buffered_write()
                                                           --> dirties page at 4K
      
                                                                                           a user space
                                                                                           task calls sync
                                                                                           for e.g or
                                                                                           writepages() is
                                                                                           invoked by mm
      
                                                                                           writepages()
                                                                                             run_delalloc_range()
                                                                                               cow_file_range()
                                                                                                 --> ordered extent X
                                                                                                     for the buffered
                                                                                                     write is created
                                                                                                     and
                                                                                                     writeback starts
      
        --> calls btrfs_get_blocks_direct()
            again, without submitting first
            a bio for reading extent A, and
            finds the extent map for extent B
      
        --> calls lock_extent_direct()
      
            --> locks range [4K, 8K[
            --> finds ordered extent X
                covering range [4K, 8K[
            --> unlocks range [4K, 8K[
      
                                                        buffered write against
                                                        range [0K, 8K[ starts
      
                                                        __btrfs_buffered_write()
                                                          prepare_pages()
                                                            --> locks pages with
                                                                offsets 0 and 4K
                                                          lock_and_cleanup_extent_if_need()
                                                            --> blocks attempting to
                                                                lock range [0K, 8K[ in
                                                                the inode's io tree,
                                                                because the range [0, 4K[
                                                                is already locked by the
                                                                direct IO task at CPU 1
      
            --> calls
                btrfs_start_ordered_extent(oe X)
      
                btrfs_start_ordered_extent(oe X)
      
                  --> At this point writeback for ordered
                      extent X has not finished yet
      
                  filemap_fdatawrite_range()
                    btrfs_writepages()
                      extent_writepages()
                        extent_write_cache_pages()
                          --> finds page with offset 0
                              with the writeback tag
                              (and not dirty)
                          --> tries to lock it
                               --> deadlock, task at CPU 2
                                   has the page locked and
                                   is blocked on the io range
                                   [0, 4K[ that was locked
                                   earlier by this task
      
      So fix this by falling back to a buffered read in the direct IO read path
      when an ordered extent for a buffered write is found.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8eb64c91
    • Filipe Manana's avatar
      Btrfs: fix extent_same allowing destination offset beyond i_size · 4165ca37
      Filipe Manana authored
      commit f4dfe687 upstream.
      
      When using the same file as the source and destination for a dedup
      (extent_same ioctl) operation we were allowing it to dedup to a
      destination offset beyond the file's size, which doesn't make sense and
      it's not allowed for the case where the source and destination files are
      not the same file. This made the deduplication operation successful only
      when the source range corresponded to a hole, a prealloc extent or an
      extent with all bytes having a value of 0x00. This was also leaving a
      file hole (between i_size and destination offset) without the
      corresponding file extent items, which can be reproduced with the
      following steps for example:
      
        $ mkfs.btrfs -f /dev/sdi
        $ mount /dev/sdi /mnt/sdi
      
        $ xfs_io -f -c "pwrite -S 0xab 304457 404990" /mnt/sdi/foobar
        wrote 404990/404990 bytes at offset 304457
        395 KiB, 99 ops; 0.0000 sec (31.150 MiB/sec and 7984.5149 ops/sec)
      
        $ /git/hub/duperemove/btrfs-extent-same 24576 /mnt/sdi/foobar 28672 /mnt/sdi/foobar 929792
        Deduping 2 total files
        (28672, 24576): /mnt/sdi/foobar
        (929792, 24576): /mnt/sdi/foobar
        1 files asked to be deduped
        i: 0, status: 0, bytes_deduped: 24576
        24576 total bytes deduped in this operation
      
        $ umount /mnt/sdi
        $ btrfsck /dev/sdi
        Checking filesystem on /dev/sdi
        UUID: 98c528aa-0833-427d-9403-b98032ffbf9d
        checking extents
        checking free space cache
        checking fs roots
        root 5 inode 257 errors 100, file extent discount
        Found file extent holes:
                start: 712704, len: 217088
        found 540673 bytes used err is 1
        total csum bytes: 400
        total tree bytes: 131072
        total fs tree bytes: 32768
        total extent tree bytes: 16384
        btree space waste bytes: 123675
        file data blocks allocated: 671744
          referenced 671744
        btrfs-progs v4.2.3
      
      So fix this by not allowing the destination to go beyond the file's size,
      just as we do for the case where the source and destination files are not
      the same.
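
The bounds rule described above can be sketched in user-space C. This is only an illustration under assumptions: `dedup_dst_within_file` is a hypothetical helper, not the kernel's function, and the real extent_same code also deals with alignment and locking that are omitted here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): returns true when the dedup
 * destination range [dst_off, dst_off + len) lies entirely within a
 * file whose size is i_size.
 */
static inline bool dedup_dst_within_file(uint64_t dst_off, uint64_t len,
                                         uint64_t i_size)
{
    /* Check len first so the subtraction below cannot wrap around. */
    if (len > i_size)
        return false;
    return dst_off <= i_size - len;
}
```

With the reproducer above, the file size is 304457 + 404990 = 709447 bytes, so a destination offset of 929792 with a length of 24576 would be rejected by this check, while the source range starting at 28672 passes it.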
      
      A test for xfstests follows.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4165ca37
    • Filipe Manana's avatar
      Btrfs: fix file loss on log replay after renaming a file and fsync · 36ff8607
      Filipe Manana authored
      commit 2be63d5c upstream.
      
      We have two cases where we end up deleting a file at log replay time
      when we should not. For this to happen the file must have been renamed
      and a directory inode must have been fsynced/logged.
      
      Two examples that exercise these two cases are listed below.
      
        Case 1)
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
        $ mkdir -p /mnt/a/b
        $ mkdir /mnt/c
        $ touch /mnt/a/b/foo
        $ sync
        $ mv /mnt/a/b/foo /mnt/c/
        # Create file bar just to make sure the fsync on directory a/ does
        # something and it's not a no-op.
        $ touch /mnt/a/bar
        $ xfs_io -c "fsync" /mnt/a
        < power fail / crash >
      
        The next time the filesystem is mounted, the log replay procedure
        deletes file foo.
      
        Case 2)
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
        $ mkdir /mnt/a
        $ mkdir /mnt/b
        $ mkdir /mnt/c
        $ touch /mnt/a/foo
        $ ln /mnt/a/foo /mnt/b/foo_link
        $ touch /mnt/b/bar
        $ sync
        $ unlink /mnt/b/foo_link
        $ mv /mnt/b/bar /mnt/c/
        $ xfs_io -c "fsync" /mnt/a/foo
        < power fail / crash >
      
        The next time the filesystem is mounted, the log replay procedure
        deletes file bar.
      
      The files are deleted because when we log inodes
      other than the fsync target inode, we ignore their last_unlink_trans
      value and leave the log without enough information to later replay the
      rename operations. So we need to look at the last_unlink_trans values
      and fall back to a transaction commit if they are greater than the
      id of the last committed transaction.
      
      So fix this by looking at the last_unlink_trans values and falling back to
      transaction commits when needed. Also, when logging other inodes (for
      case 1 we logged descendants of the fsync target inode while for case 2
      we logged ascendants) we need to care about concurrent tasks updating
      the last_unlink_trans of inodes we are logging (which was already an
      existing problem in check_parent_dirs_for_sync()). Since we can not
      acquire their inode mutex (vfs' struct inode ->i_mutex), as that causes
      deadlocks with other concurrent operations that acquire the i_mutex of
      2 inodes (other fsyncs or renames for example), we need to serialize on
      the log_mutex of the inode we are logging. A task setting a new value for
      an inode's last_unlink_trans must acquire the inode's log_mutex and it
      must do this update before doing the actual unlink operation (which is
      already the case except when deleting a snapshot). Conversely, the task
      logging the inode must first log the inode and then check the inode's
      last_unlink_trans value while holding its log_mutex. If the value is
      not greater than the id of the last committed transaction, it means it
      logged a safe state of the inode's items. If the value is
      greater than the id of the last committed transaction, it means the inode
      state it has logged might not be safe (the concurrent task might have
      just updated last_unlink_trans but has not yet done the unlink operation)
      and therefore a transaction commit must be done.
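
The ordering rule above can be sketched in user-space C with a pthread mutex standing in for the inode's log_mutex. Everything here is an assumption made for illustration: `struct sketch_inode`, `record_unlink` and `must_commit_after_logging` are hypothetical names, and the kernel code is structured differently.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the inode fields named in the commit message. */
struct sketch_inode {
    pthread_mutex_t log_mutex;
    uint64_t last_unlink_trans;
};

/* Unlink side: publish the transaction id before doing the unlink itself. */
static inline void record_unlink(struct sketch_inode *inode, uint64_t transid)
{
    pthread_mutex_lock(&inode->log_mutex);
    inode->last_unlink_trans = transid;
    pthread_mutex_unlock(&inode->log_mutex);
    /* ... the actual unlink operation would run here, after the update ... */
}

/*
 * Logging side: call this only after the inode has been logged.
 * Returns true when the caller should fall back to a transaction commit.
 */
static inline bool must_commit_after_logging(struct sketch_inode *inode,
                                             uint64_t last_committed)
{
    bool commit;

    pthread_mutex_lock(&inode->log_mutex);
    commit = inode->last_unlink_trans > last_committed;
    pthread_mutex_unlock(&inode->log_mutex);
    return commit;
}
```

Because the unlink side updates last_unlink_trans before unlinking, and the logging side checks it only after logging, a logger that sees an old value knows it logged the inode before any unlink from a newer transaction took place.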
      
      Test cases for xfstests follow in separate patches.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      36ff8607
    • Filipe Manana's avatar
      Btrfs: fix unreplayable log after snapshot delete + parent dir fsync · 29b3f509
      Filipe Manana authored
      commit 1ec9a1ae upstream.
      
      If we delete a snapshot, fsync its parent directory and crash/power fail
      before the next transaction commit, on the next mount when we attempt to
      replay the log tree of the root containing the parent directory we will
      fail and prevent the filesystem from mounting, which is solvable by wiping
      out the log trees with the btrfs-zero-log tool but very inconvenient as
      we will lose any data and metadata fsynced before the parent directory
      was fsynced.
      
      For example:
      
        $ mkfs.btrfs -f /dev/sdc
        $ mount /dev/sdc /mnt
        $ mkdir /mnt/testdir
        $ btrfs subvolume snapshot /mnt /mnt/testdir/snap
        $ btrfs subvolume delete /mnt/testdir/snap
        $ xfs_io -c "fsync" /mnt/testdir
        < crash / power failure and reboot >
        $ mount /dev/sdc /mnt
        mount: mount(2) failed: No such file or directory
      
      And in dmesg/syslog we get the following message and trace:
      
      [192066.361162] BTRFS info (device dm-0): failed to delete reference to snap, inode 257 parent 257
      [192066.363010] ------------[ cut here ]------------
      [192066.365268] WARNING: CPU: 4 PID: 5130 at fs/btrfs/inode.c:3986 __btrfs_unlink_inode+0x17a/0x354 [btrfs]()
      [192066.367250] BTRFS: Transaction aborted (error -2)
      [192066.368401] Modules linked in: btrfs dm_flakey dm_mod ppdev sha256_generic xor raid6_pq hmac drbg ansi_cprng aesni_intel acpi_cpufreq tpm_tis aes_x86_64 tpm ablk_helper evdev cryptd sg parport_pc i2c_piix4 psmouse lrw parport i2c_core pcspkr gf128mul processor serio_raw glue_helper button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs]
      [192066.377154] CPU: 4 PID: 5130 Comm: mount Tainted: G        W       4.4.0-rc6-btrfs-next-20+ #1
      [192066.378875] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
      [192066.380889]  0000000000000000 ffff880143923670 ffffffff81257570 ffff8801439236b8
      [192066.382561]  ffff8801439236a8 ffffffff8104ec07 ffffffffa039dc2c 00000000fffffffe
      [192066.384191]  ffff8801ed31d000 ffff8801b9fc9c88 ffff8801086875e0 ffff880143923710
      [192066.385827] Call Trace:
      [192066.386373]  [<ffffffff81257570>] dump_stack+0x4e/0x79
      [192066.387387]  [<ffffffff8104ec07>] warn_slowpath_common+0x99/0xb2
      [192066.388429]  [<ffffffffa039dc2c>] ? __btrfs_unlink_inode+0x17a/0x354 [btrfs]
      [192066.389236]  [<ffffffff8104ec68>] warn_slowpath_fmt+0x48/0x50
      [192066.389884]  [<ffffffffa039dc2c>] __btrfs_unlink_inode+0x17a/0x354 [btrfs]
      [192066.390621]  [<ffffffff81184b55>] ? iput+0xb0/0x266
      [192066.391200]  [<ffffffffa039ea25>] btrfs_unlink_inode+0x1c/0x3d [btrfs]
      [192066.391930]  [<ffffffffa03ca623>] check_item_in_log+0x1fe/0x29b [btrfs]
      [192066.392715]  [<ffffffffa03ca827>] replay_dir_deletes+0x167/0x1cf [btrfs]
      [192066.393510]  [<ffffffffa03cccc7>] replay_one_buffer+0x417/0x570 [btrfs]
      [192066.394241]  [<ffffffffa03ca164>] walk_up_log_tree+0x10e/0x1dc [btrfs]
      [192066.394958]  [<ffffffffa03cac72>] walk_log_tree+0xa5/0x190 [btrfs]
      [192066.395628]  [<ffffffffa03ce8b8>] btrfs_recover_log_trees+0x239/0x32c [btrfs]
      [192066.396790]  [<ffffffffa03cc8b0>] ? replay_one_extent+0x50a/0x50a [btrfs]
      [192066.397891]  [<ffffffffa0394041>] open_ctree+0x1d8b/0x2167 [btrfs]
      [192066.398897]  [<ffffffffa03706e1>] btrfs_mount+0x5ef/0x729 [btrfs]
      [192066.399823]  [<ffffffff8108ad98>] ? trace_hardirqs_on+0xd/0xf
      [192066.400739]  [<ffffffff8108959b>] ? lockdep_init_map+0xb9/0x1b3
      [192066.401700]  [<ffffffff811714b9>] mount_fs+0x67/0x131
      [192066.402482]  [<ffffffff81188560>] vfs_kern_mount+0x6c/0xde
      [192066.403930]  [<ffffffffa03702bd>] btrfs_mount+0x1cb/0x729 [btrfs]
      [192066.404831]  [<ffffffff8108ad98>] ? trace_hardirqs_on+0xd/0xf
      [192066.405726]  [<ffffffff8108959b>] ? lockdep_init_map+0xb9/0x1b3
      [192066.406621]  [<ffffffff811714b9>] mount_fs+0x67/0x131
      [192066.407401]  [<ffffffff81188560>] vfs_kern_mount+0x6c/0xde
      [192066.408247]  [<ffffffff8118ae36>] do_mount+0x893/0x9d2
      [192066.409047]  [<ffffffff8113009b>] ? strndup_user+0x3f/0x8c
      [192066.409842]  [<ffffffff8118b187>] SyS_mount+0x75/0xa1
      [192066.410621]  [<ffffffff8147e517>] entry_SYSCALL_64_fastpath+0x12/0x6b
      [192066.411572] ---[ end trace 2de42126c1e0a0f0 ]---
      [192066.412344] BTRFS: error (device dm-0) in __btrfs_unlink_inode:3986: errno=-2 No such entry
      [192066.413748] BTRFS: error (device dm-0) in btrfs_replay_log:2464: errno=-2 No such entry (Failed to recover log tree)
      [192066.415458] BTRFS error (device dm-0): cleaner transaction attach returned -30
      [192066.444613] BTRFS: open_ctree failed
      
      This happens because when we are replaying the log and processing the
      directory entry pointing to the snapshot in the subvolume tree, we treat
      its btrfs_dir_item item as having a location with a key type matching
      BTRFS_INODE_ITEM_KEY, which is wrong because the type matches
      BTRFS_ROOT_ITEM_KEY and therefore must be processed differently, as the
      object id refers to a root number and not to an inode in the root
      containing the parent directory.
      
      So fix this by triggering a transaction commit if an fsync against the
      parent directory is requested after deleting a snapshot. This is the
      simplest approach for a rare use case. Some alternative that avoids the
      transaction commit would require more code to explicitly delete the
      snapshot at log replay time (factoring out common code from ioctl.c:
      btrfs_ioctl_snap_destroy()), special care at fsync time to remove the
      log tree of the snapshot's root from the log root of the root of tree
      roots, amongst other steps.
      
      A test case for xfstests that triggers the issue follows.
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            _cleanup_flakey
            cd /
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
        . ./common/dmflakey
      
        # real QA test starts here
        _need_to_be_root
        _supported_fs btrfs
        _supported_os Linux
        _require_scratch
        _require_dm_target flakey
        _require_metadata_journaling $SCRATCH_DEV
      
        rm -f $seqres.full
      
        _scratch_mkfs >>$seqres.full 2>&1
        _init_flakey
        _mount_flakey
      
        # Create a snapshot at the root of our filesystem (mount point path), delete it,
        # fsync the mount point path, crash and mount to replay the log. This should
        # succeed and after the filesystem is mounted the snapshot should not be visible
        # anymore.
        _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap1
        _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap1
        $XFS_IO_PROG -c "fsync" $SCRATCH_MNT
        _flakey_drop_and_remount
        [ -e $SCRATCH_MNT/snap1 ] && \
            echo "Snapshot snap1 still exists after log replay"
      
        # Similar scenario as above, but this time the snapshot is created inside a
        # directory and not directly under the root (mount point path).
        mkdir $SCRATCH_MNT/testdir
        _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/testdir/snap2
        _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/testdir/snap2
        $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir
        _flakey_drop_and_remount
        [ -e $SCRATCH_MNT/testdir/snap2 ] && \
            echo "Snapshot snap2 still exists after log replay"
      
        _unmount_flakey
      
        echo "Silence is golden"
        status=0
        exit
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Tested-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      29b3f509
    • David Sterba's avatar
      btrfs: change max_inline default to 2048 · 259d2fbd
      David Sterba authored
      commit f7e98a7f upstream.
      
      The current practical default is ~4k on x86_64 (the logic is more complex,
      simplified for brevity). The inlined files land in the metadata group and
      thus consume space that could be needed for the real metadata.
      
      The inlining brings some usability surprises:
      
      1) total space consumption measured on various filesystems and btrfs
         with DUP metadata was quite visible because of the duplicated data
         within metadata
      
      2) inlined data may exhaust the metadata, which are more precious in case
         the entire device space is allocated to chunks (ie. balance cannot
         make the space more compact)
      
      3) performance suffers a bit as the inlined blocks are duplicated and
         stored far away on the device.
      
      Proposed fix: set the default to 2048
      
      This mainly fixes 1): the total filesystem space consumption will be on
      par with other filesystems.
      
      Partially fixes 2), more data are pushed to the data block groups.
      
      The characteristics of 3) are based on actual small file size
      distribution.
      
      The change is independent of the metadata blockgroup type (though it's
      most visible with DUP) or system page size, as these parameters are not
      trivial to find out, compared to file size.
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      259d2fbd
    • David Sterba's avatar
      btrfs: remove error message from search ioctl for nonexistent tree · 202efaae
      David Sterba authored
      commit 11ea474f upstream.
      
      Let's remove the error message that appears when the tree_id is not
      present. This can happen with the quota tree and has been observed in
      practice. The applications are supposed to handle -ENOENT and we don't
      need to report that in the system log as it's not a fatal error.
      Reported-by: Vlastimil Babka <vbabka@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      202efaae
    • Josef Bacik's avatar
      Btrfs: fix truncate_space_check · 94863434
      Josef Bacik authored
      commit dc95f7bf upstream.
      
      truncate_space_check is using btrfs_csum_bytes_to_leaves() but forgetting to
      multiply by nodesize so we get an actual byte count.  We need a tracepoint here
      so that we have the matching reserve for the release that will come later.  Also
      add a comment to make clear what the intent of truncate_space_check is.
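
The unit fix described above amounts to converting a leaf count into bytes. A minimal sketch, assuming a hypothetical helper `leaves_to_bytes` (the kernel performs this multiplication inline, and the leaf count itself comes from btrfs_csum_bytes_to_leaves()):

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a number of tree leaves into the number
 * of bytes to reserve, given the filesystem's nodesize in bytes.
 */
static inline uint64_t leaves_to_bytes(uint64_t num_leaves, uint32_t nodesize)
{
    /* Widen nodesize so the multiplication is done in 64 bits. */
    return num_leaves * (uint64_t)nodesize;
}
```

For example, three leaves with a 16384-byte nodesize need a 49152-byte reservation, whereas using the raw leaf count would reserve only 3 bytes.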
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      94863434
    • Zhao Lei's avatar
      btrfs: reada: Fix in-segment calculation for reada · 94ad29d2
      Zhao Lei authored
      commit 50378530 upstream.
      
      reada_zone->end is end pos of segment:
       end = start + cache->key.offset - 1;
      
      So we need to use "<=" in the condition to judge whether a pos is in the
      segment.
      
      The problem happened rarely, because the logical pos rarely pointed
      to the last 4k of a blockgroup, but we need to fix it to make the code
      logically correct.
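
The inclusive-end comparison can be illustrated with a small user-space sketch. `struct zone_sketch` and `pos_in_zone` are hypothetical names used only for this example; they are not the kernel's reada structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical segment whose end is inclusive: end = start + length - 1. */
struct zone_sketch {
    uint64_t start;
    uint64_t end;
};

/* A position belongs to the segment when start <= pos <= end. */
static inline bool pos_in_zone(const struct zone_sketch *zone, uint64_t pos)
{
    return pos >= zone->start && pos <= zone->end;
}
```

With start = 4096 and a length of 8192, end is 12287. A strict "<" comparison against end would wrongly exclude position 12287, which is the off-by-one described above.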
      Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      94ad29d2
    • Alex Deucher's avatar
      drm/amdgpu: fix DP mode validation · e23744bf
      Alex Deucher authored
      commit c47b9e09 upstream.
      
      Switch the order of the loops to walk the rates on the top
      so we exhaust all DP 1.1 rate/lane combinations before trying
      DP 1.2 rate/lane combos.
      
      This avoids selecting rates that are supported by the monitor
      but not the connector, which led to valid modes getting rejected.
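
The loop ordering described above can be sketched as follows. This is an illustration under assumptions, not the driver's code: `pick_dp_link`, the rate table, and the `max_rate_khz` parameter are hypothetical, and link rates are taken to be in kHz with 8 bits of payload per lane per link clock.

```c
#include <stddef.h>

/* Illustrative link rates in kHz: two DP 1.1 rates, then the DP 1.2 rate. */
static const unsigned int dp_rates_khz[] = { 162000, 270000, 540000 };

/*
 * Hypothetical helper: walk rates in the outer loop and lane counts in
 * the inner loop, so every lane count is tried at a lower rate before a
 * higher rate is considered. bpp must be nonzero.
 * Returns 0 and fills the outputs on success, -1 if nothing fits.
 */
static inline int pick_dp_link(unsigned int pix_clock_khz, unsigned int bpp,
                               unsigned int max_lanes,
                               unsigned int max_rate_khz,
                               unsigned int *lanes_out,
                               unsigned int *rate_out)
{
    size_t n = sizeof(dp_rates_khz) / sizeof(dp_rates_khz[0]);

    for (size_t i = 0; i < n; i++) {
        unsigned int rate = dp_rates_khz[i];

        if (rate > max_rate_khz)
            break;
        for (unsigned int lanes = 1; lanes <= max_lanes; lanes <<= 1) {
            /* Highest pixel clock this rate/lane combination can carry. */
            unsigned int max_pix_clock = (lanes * rate * 8) / bpp;

            if (pix_clock_khz <= max_pix_clock) {
                *lanes_out = lanes;
                *rate_out = rate;
                return 0;
            }
        }
    }
    return -1;
}
```

For a 148500 kHz pixel clock at 24 bpp, this ordering settles on 4 lanes at 162000 kHz. With the loops swapped (lanes outside, rates inside), the first match would be 1 lane at 540000 kHz, a rate that a DP 1.1-only connector cannot drive.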
      
      bug:
      https://bugs.freedesktop.org/show_bug.cgi?id=95206
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e23744bf
    • Alex Deucher's avatar
      drm/radeon: fix DP mode validation · df6ac4bc
      Alex Deucher authored
      commit ff0bd441 upstream.
      
      Switch the order of the loops to walk the rates on the top
      so we exhaust all DP 1.1 rate/lane combinations before trying
      DP 1.2 rate/lane combos.
      
      This avoids selecting rates that are supported by the monitor
      but not the connector, which led to valid modes getting rejected.
      
      bug:
      https://bugs.freedesktop.org/show_bug.cgi?id=95206
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      df6ac4bc
    • Arindam Nath's avatar
      drm/radeon: fix DP link training issue with second 4K monitor · 0f1b8876
      Arindam Nath authored
      commit 1a738347 upstream.
      
      There is an issue observed when we hotplug a second DP
      4K monitor to the system. Sometimes, the link training
      fails for the second monitor after HPD interrupt
      generation.
      
      The issue happens when some queued or deferred transactions
      are already present on the AUX channel when we initiate
      a new transaction to (say) get DPCD or during link training.
      
      We set AUX_IGNORE_HPD_DISCON bit in the AUX_CONTROL
      register so that we can ignore any such deferred
      transactions when a new AUX transaction is initiated.
      Signed-off-by: Arindam Nath <arindam.nath@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0f1b8876
    • Imre Deak's avatar
      drm/i915/bdw: Add missing delay during L3 SQC credit programming · 9907c72b
      Imre Deak authored
      commit d6a862fe upstream.
      
      BSpec requires us to wait ~100 clocks before re-enabling clock gating,
      so make sure we do this.
      
      CC: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Imre Deak <imre.deak@intel.com>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1462280061-1457-2-git-send-email-imre.deak@intel.com
      (cherry picked from commit 48e5d68d)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9907c72b
    • Lyude's avatar
      Revert "drm/i915: start adding dp mst audio" · 6c1827a9
      Lyude authored
      commit 26792526 upstream.
      
      Right now MST audio is causing too many kernel panics to really keep
      around in the kernel. On top of that, even after fixing said panics it's
      still basically non-functional (at least on all the setups I've tested
      it on). Revert until we have a proper solution for this.
      
      This reverts commit 3d52ccf5.
      Signed-off-by: Lyude <cpaul@redhat.com>
      Fixes: 3d52ccf5 ("drm/i915: start adding dp mst audio")
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/1462287692-28570-1-git-send-email-cpaul@redhat.com
      (cherry picked from commit 5a8f97ea)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6c1827a9
    • Daniel Vetter's avatar
      drm/i915: Bail out of pipe config compute loop on LPT · 36d8fb8f
      Daniel Vetter authored
      commit 2700818a upstream.
      
      LPT is pch, so might run into the fdi bandwidth constraint (especially
      since it has only 2 lanes). But right now we just force pipe_bpp back
      to 24, resulting in a nice loop (which we bail out with a loud
      WARN_ON). Fix this.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      References: https://bugs.freedesktop.org/show_bug.cgi?id=93477
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/1462264381-7573-1-git-send-email-daniel.vetter@ffwll.ch
      (cherry picked from commit f58a1acc)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      36d8fb8f
    • Lucas Stach's avatar
      drm/radeon: fix PLL sharing on DCE6.1 (v2) · eca3d50f
      Lucas Stach authored
      commit e3c00d87 upstream.
      
      On DCE6.1 PPLL2 is exclusively available to UNIPHYA, so it should not
      be taken into consideration when looking for an already enabled PLL
      to be shared with other outputs.
      
      This fixes the broken VGA port (TRAVIS DP->VGA bridge) on my Richland
      based laptop, where the internal display is connected to UNIPHYA through
      a TRAVIS DP->LVDS bridge.
      
      Bug:
      https://bugs.freedesktop.org/show_bug.cgi?id=78987
      
      v2: agd: add check in radeon_get_shared_nondp_ppll as well, drop
          extra parameter.
      Signed-off-by: Lucas Stach <dev@lynxeye.de>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      eca3d50f
    • Ville Syrjälä's avatar
      drm/i915: Update CDCLK_FREQ register on BDW after changing cdclk frequency · be193155
      Ville Syrjälä authored
      commit a04e23d4 upstream.
      
      Update CDCLK_FREQ on BDW after changing the cdclk frequency. Not sure
      if this is a late addition to the spec, or if I simply overlooked this
      step when writing the original code.
      
      This is what Bspec has to say about CDCLK_FREQ:
      "Program this field to the CD clock frequency minus one. This is used to
       generate a divided down clock for miscellaneous timers in display."
      
      And the "Broadwell Sequences for Changing CD Clock Frequency" section
      clarifies this further:
      "For CD clock 337.5 MHz, program 337 decimal.
       For CD clock 450 MHz, program 449 decimal.
       For CD clock 540 MHz, program 539 decimal.
       For CD clock 675 MHz, program 674 decimal."
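
The quoted values are consistent with rounding the frequency to the nearest MHz and then subtracting one. A minimal sketch, assuming the frequency is given in kHz and using a hypothetical helper name `cdclk_freq_field` (not the driver's function):

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute the value to program for a CD clock
 * given in kHz, as "frequency in MHz, rounded to nearest, minus one".
 */
static inline uint32_t cdclk_freq_field(uint32_t cdclk_khz)
{
    return (cdclk_khz + 500) / 1000 - 1;
}
```

Rounding matters for the 337.5 MHz case: 337500 kHz rounds to 338 MHz, giving 337 as quoted, whereas truncating to 337 MHz first would give 336.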
      
      Cc: Mika Kahola <mika.kahola@intel.com>
      Fixes: b432e5cf ("drm/i915: BDW clock change support")
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1461689194-6079-2-git-send-email-ville.syrjala@linux.intel.com
      Reviewed-by: Mika Kahola <mika.kahola@intel.com>
      (cherry picked from commit 7f1052a8)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      be193155
    • Mauro Carvalho Chehab's avatar
      Revert "[media] videobuf2-v4l2: Verify planes array in buffer dequeueing" · 1943bd0f
      Mauro Carvalho Chehab authored
      commit 93f0750d upstream.
      
      This patch causes a Kernel panic when called on a DVB driver.
      
      This was also reported by David R <david@unsolicited.net>:
      
      May  7 14:47:35 server kernel: [  501.247123] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
      May  7 14:47:35 server kernel: [  501.247239] IP: [<ffffffffa0222c71>] __verify_planes_array.isra.3+0x1/0x80 [videobuf2_v4l2]
      May  7 14:47:35 server kernel: [  501.247354] PGD cae6f067 PUD ca99c067 PMD 0
      May  7 14:47:35 server kernel: [  501.247426] Oops: 0000 [#1] SMP
      May  7 14:47:35 server kernel: [  501.247482] Modules linked in: xfs tun xt_connmark xt_TCPMSS xt_tcpmss xt_owner xt_REDIRECT nf_nat_redirect xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 ts_kmp ts_bm xt_string ipt_REJECT nf_reject_ipv4 xt_recent xt_conntrack xt_multiport xt_pkttype xt_tcpudp xt_mark nf_log_ipv4 nf_log_common xt_LOG xt_limit iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables ip6table_filter ip6_tables x_tables pppoe pppox dm_crypt ts2020 regmap_i2c ds3000 cx88_dvb dvb_pll cx88_vp3054_i2c mt352 videobuf2_dvb cx8800 cx8802 cx88xx pl2303 tveeprom videobuf2_dma_sg ppdev videobuf2_memops videobuf2_v4l2 videobuf2_core dvb_usb_digitv snd_hda_codec_via snd_hda_codec_hdmi snd_hda_codec_generic radeon dvb_usb snd_hda_intel amd64_edac_mod serio_raw snd_hda_codec edac_core fbcon k10temp bitblit softcursor snd_hda_core font snd_pcm_oss i2c_piix4 snd_mixer_oss tileblit drm_kms_helper syscopyarea snd_pcm snd_seq_dummy sysfillrect snd_seq_oss sysimgblt fb_sys_fops ttm snd_seq_midi r8169 snd_rawmidi drm snd_seq_midi_event e1000e snd_seq snd_seq_device snd_timer snd ptp pps_core i2c_algo_bit soundcore parport_pc ohci_pci shpchp tpm_tis tpm nfsd auth_rpcgss oid_registry hwmon_vid exportfs nfs_acl mii nfs bonding lockd grace lp sunrpc parport
      May  7 14:47:35 server kernel: [  501.249564] CPU: 1 PID: 6889 Comm: vb2-cx88[0] Not tainted 4.5.3 #3
      May  7 14:47:35 server kernel: [  501.249644] Hardware name: System manufacturer System Product Name/M4A785TD-V EVO, BIOS 0211    07/08/2009
      May  7 14:47:35 server kernel: [  501.249767] task: ffff8800aebf3600 ti: ffff8801e07a0000 task.ti: ffff8801e07a0000
      May  7 14:47:35 server kernel: [  501.249861] RIP: 0010:[<ffffffffa0222c71>]  [<ffffffffa0222c71>] __verify_planes_array.isra.3+0x1/0x80 [videobuf2_v4l2]
      May  7 14:47:35 server kernel: [  501.250002] RSP: 0018:ffff8801e07a3de8  EFLAGS: 00010086
      May  7 14:47:35 server kernel: [  501.250071] RAX: 0000000000000283 RBX: ffff880210dc5000 RCX: 0000000000000283
      May  7 14:47:35 server kernel: [  501.250161] RDX: ffffffffa0222cf0 RSI: 0000000000000000 RDI: ffff880210dc5014
      May  7 14:47:35 server kernel: [  501.250251] RBP: ffff8801e07a3df8 R08: ffff8801e07a0000 R09: 0000000000000000
      May  7 14:47:35 server kernel: [  501.250348] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8800cda2a9d8
      May  7 14:47:35 server kernel: [  501.250438] R13: ffff880210dc51b8 R14: 0000000000000000 R15: ffff8800cda2a828
      May  7 14:47:35 server kernel: [  501.250528] FS:  00007f5b77fff700(0000) GS:ffff88021fc40000(0000) knlGS:00000000adaffb40
      May  7 14:47:35 server kernel: [  501.250631] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      May  7 14:47:35 server kernel: [  501.250704] CR2: 0000000000000004 CR3: 00000000ca19d000 CR4: 00000000000006e0
      May  7 14:47:35 server kernel: [  501.250794] Stack:
      May  7 14:47:35 server kernel: [  501.250822]  ffff8801e07a3df8 ffffffffa0222cfd ffff8801e07a3e70 ffffffffa0236beb
      May  7 14:47:35 server kernel: [  501.250937]  0000000000000283 ffff8801e07a3e94 0000000000000000 0000000000000000
      May  7 14:47:35 server kernel: [  501.251051]  ffff8800aebf3600 ffffffff8108d8e0 ffff8801e07a3e38 ffff8801e07a3e38
      May  7 14:47:35 server kernel: [  501.251165] Call Trace:
      May  7 14:47:35 server kernel: [  501.251200]  [<ffffffffa0222cfd>] ? __verify_planes_array_core+0xd/0x10 [videobuf2_v4l2]
      May  7 14:47:35 server kernel: [  501.251306]  [<ffffffffa0236beb>] vb2_core_dqbuf+0x2eb/0x4c0 [videobuf2_core]
      May  7 14:47:35 server kernel: [  501.251398]  [<ffffffff8108d8e0>] ? prepare_to_wait_event+0x100/0x100
      May  7 14:47:35 server kernel: [  501.251482]  [<ffffffffa023855b>] vb2_thread+0x1cb/0x220 [videobuf2_core]
      May  7 14:47:35 server kernel: [  501.251569]  [<ffffffffa0238390>] ? vb2_core_qbuf+0x230/0x230 [videobuf2_core]
      May  7 14:47:35 server kernel: [  501.251662]  [<ffffffffa0238390>] ? vb2_core_qbuf+0x230/0x230 [videobuf2_core]
      May  7 14:47:35 server kernel: [  501.255982]  [<ffffffff8106f984>] kthread+0xc4/0xe0
      May  7 14:47:35 server kernel: [  501.260292]  [<ffffffff8106f8c0>] ? kthread_park+0x50/0x50
      May  7 14:47:35 server kernel: [  501.264615]  [<ffffffff81697a5f>] ret_from_fork+0x3f/0x70
      May  7 14:47:35 server kernel: [  501.268962]  [<ffffffff8106f8c0>] ? kthread_park+0x50/0x50
      May  7 14:47:35 server kernel: [  501.273216] Code: 0d 01 74 16 48 8b 46 28 48 8b 56 30 48 89 87 d0 01 00 00 48 89 97 d8 01 00 00 5d c3 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 <8b> 46 04 48 89 e5 8d 50 f7 31 c0 83 fa 01 76 02 5d c3 48 83 7e
      May  7 14:47:35 server kernel: [  501.282146] RIP  [<ffffffffa0222c71>] __verify_planes_array.isra.3+0x1/0x80 [videobuf2_v4l2]
      May  7 14:47:35 server kernel: [  501.286391]  RSP <ffff8801e07a3de8>
      May  7 14:47:35 server kernel: [  501.290619] CR2: 0000000000000004
      May  7 14:47:35 server kernel: [  501.294786] ---[ end trace b2b354153ccad110 ]---
      
      This reverts commit 2c1f6951.
      
      Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
      Cc: Hans Verkuil <hans.verkuil@cisco.com>
      Fixes: 2c1f6951 ("[media] videobuf2-v4l2: Verify planes array in buffer dequeueing")
      Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1943bd0f
    • Marek Szyprowski's avatar
      Input: max8997-haptic - fix NULL pointer dereference · 6d8b57ff
      Marek Szyprowski authored
      commit 6ae645d5 upstream.
      
      NULL pointer dereference happens when booting with DTB because the
      platform data for haptic device is not set in supplied data from parent
      MFD device.
      
      The MFD device creates only platform data (from Device Tree) for itself,
      not for haptic child.
      
      Unable to handle kernel NULL pointer dereference at virtual address 0000009c
      pgd = c0004000
      	[0000009c] *pgd=00000000
      	Internal error: Oops: 5 [#1] PREEMPT SMP ARM
      	(max8997_haptic_probe) from [<c03f9cec>] (platform_drv_probe+0x4c/0xb0)
      	(platform_drv_probe) from [<c03f8440>] (driver_probe_device+0x214/0x2c0)
      	(driver_probe_device) from [<c03f8598>] (__driver_attach+0xac/0xb0)
      	(__driver_attach) from [<c03f67ac>] (bus_for_each_dev+0x68/0x9c)
      	(bus_for_each_dev) from [<c03f7a38>] (bus_add_driver+0x1a0/0x218)
      	(bus_add_driver) from [<c03f8db0>] (driver_register+0x78/0xf8)
      	(driver_register) from [<c0101774>] (do_one_initcall+0x90/0x1d8)
      	(do_one_initcall) from [<c0a00dbc>] (kernel_init_freeable+0x15c/0x1fc)
      	(kernel_init_freeable) from [<c06bb5b4>] (kernel_init+0x8/0x114)
      	(kernel_init) from [<c0107938>] (ret_from_fork+0x14/0x3c)
      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Fixes: 104594b0 ("Input: add driver support for MAX8997-haptic")
      [k.kozlowski: Write commit message, add CC-stable]
      Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6d8b57ff
    • Al Viro's avatar
      get_rock_ridge_filename(): handle malformed NM entries · f18783e6
      Al Viro authored
      commit 99d82582 upstream.
      
      Payloads of NM entries are not supposed to contain NUL.  When we run
      into such, only the part prior to the first NUL goes into the
      concatenation (i.e. the directory entry name being encoded by a bunch
      of NM entries).  We do stop when the amount collected so far + the
      claimed amount in the current NM entry exceed 254.  So far, so good,
      but what we return as the total length is the sum of *claimed*
      sizes, not the actual amount collected.  And that can grow pretty
      large - not unlimited, since you'd need to put CE entries in
      between to be able to get more than the maximum that could be
      contained in one isofs directory entry / continuation chunk and
      we stop once we have encountered 32 CEs, but you can get about 8Kb
      easily.  And that's what will be passed to readdir callback as the
      name length.  8Kb __copy_to_user() from a buffer allocated by
      __get_free_page()
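
The mismatch between claimed and collected length can be modelled in a few lines. This is a simplified Python sketch under stated assumptions, not the kernel code: `collect_name` and its arguments are hypothetical, CE handling is left out, and the only goal is to contrast the sum of claimed NM sizes with the bytes that actually end up in the name.

```python
def collect_name(nm_entries, limit=254):
    """Concatenate NM payloads as the commit message describes (hypothetical model).

    nm_entries: list of bytes objects, one payload per NM entry.
    Returns (name, claimed_total, actual_total).
    """
    name = b""
    claimed = 0
    for payload in nm_entries:
        # Stop when the amount collected so far plus the claimed size exceeds the limit.
        if len(name) + len(payload) > limit:
            break
        # Only the part before the first NUL goes into the concatenation.
        name += payload.split(b"\0", 1)[0]
        # The pre-fix behaviour: the returned length sums the *claimed* sizes.
        claimed += len(payload)
    return name, claimed, len(name)


# 40 malformed entries, each claiming 200 bytes but contributing one byte.
entries = [b"a\0" + b"x" * 198] * 40
name, claimed, actual = collect_name(entries)
```

In this model every entry passes the 254-byte check because the collected name grows by a single byte per entry, while the claimed total grows by 200 each time.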
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f18783e6
    • Steven Rostedt's avatar
      tools lib traceevent: Do not reassign parg after collapse_tree() · 9bd227fa
      Steven Rostedt authored
      commit 106b816c upstream.
      
      At the end of process_filter(), collapse_tree() was changed to update
      the parg parameter, but the reassignment after the call wasn't removed.
      
      What happens is that the "current_op" gets modified and freed and parg
      is assigned to the new allocated argument. But after the call to
      collapse_tree(), parg is assigned again to the just freed "current_op",
      and this causes the tool to crash.
      
      The current_op variable must also be assigned to NULL in case of error,
      otherwise it will cause it to be free()ed twice.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Fixes: 42d6194d ("tools lib traceevent: Refactor process_filter()")
      Link: http://lkml.kernel.org/r/20160511150936.678c18a1@gandalf.local.home
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9bd227fa
    • Johannes Thumshirn's avatar
      qla1280: Don't allocate 512kb of host tags · c9458953
      Johannes Thumshirn authored
      commit 2bcbc814 upstream.
      
      The qla1280 driver sets the scsi_host_template's can_queue field to 0xfffff
      which results in an allocation failure when allocating the block layer tags
      for the driver's queues. This was introduced with the change for host wide
      tags in commit 64d513ac - "scsi: use host wide tags by default".
      
      Reduce can_queue to MAX_OUTSTANDING_COMMANDS (512) to solve the allocation
      error.
      Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
      Fixes: 64d513ac - "scsi: use host wide tags by default"
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Michael Reed <mdr@sgi.com>
      Reviewed-by: Laurence Oberman <loberman@redhat.com>
      Reviewed-by: Lee Duncan <lduncan@suse.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: James Bottomley <jejb@linux.vnet.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c9458953
    • Al Viro's avatar
      atomic_open(): fix the handling of create_error · aa9c7ecf
      Al Viro authored
      commit 10c64cea upstream.
      
      * if we have a hashed negative dentry and either CREAT|EXCL on
      r/o filesystem, or CREAT|TRUNC on r/o filesystem, or CREAT|EXCL
      with failing may_o_create(), we should fail with EROFS or the
      error may_o_create() has returned, but not ENOENT.  Which is what
      the current code ends up returning.
      
      * if we have CREAT|TRUNC hitting a regular file on a read-only
      filesystem, we can't fail with EROFS here.  At the very least,
      not until we'd done follow_managed() - we might have a writable
      file (or a device, for that matter) bound on top of that one.
      Moreover, the code downstream will see that O_TRUNC and attempt
      to grab the write access (*after* following possible mount), so
      if we really should fail with EROFS, it will happen.  No need
      to do that inside atomic_open().
      
      The real logic is much simpler than what the current code is
      trying to do - if we decided to go for simple lookup, ended
      up with a negative dentry *and* had create_error set, fail with
      create_error.  No matter whether we'd got that negative dentry
      from lookup_real() or had found it in dcache.
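
The rule in the last paragraph is small enough to state as a function. This is a hedged Python model, not the kernel implementation: `simple_lookup_result` is a hypothetical name, and `create_error` is modelled as a positive errno value (0 meaning "no error") purely for illustration.

```python
import errno


def simple_lookup_result(dentry_is_negative, create_error):
    """Return 0 on success or a negative errno, following the rule quoted above.

    dentry_is_negative: True if the simple lookup ended with a negative dentry,
                        whether it came from a fresh lookup or from the dcache.
    create_error:       positive errno recorded earlier (0 if none); assumption
                        of this sketch, not the kernel's representation.
    """
    if dentry_is_negative and create_error:
        # Fail with the recorded error instead of a generic ENOENT.
        return -create_error
    return 0


erofs_case = simple_lookup_result(True, errno.EROFS)
positive_dentry_case = simple_lookup_result(False, errno.EROFS)
```

The second call covers the CREAT|TRUNC case from the second bullet: a positive dentry is not failed here, so any EROFS decision is left to the code that runs after mounts have been followed.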
      Acked-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      aa9c7ecf
    • Hans de Goede's avatar
      regulator: axp20x: Fix axp22x ldo_io voltage ranges · c8ace74a
      Hans de Goede authored
      commit a2262e5a upstream.
      
      The minimum voltage of 1800mV is a copy and paste error from the axp20x
      regulator info. The correct minimum voltage for the ldo_io regulators
      on the axp22x is 700mV.
      
      Fixes: 1b82b4e4 ("regulator: axp20x: Add support for AXP22X regulators")
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Acked-by: Chen-Yu Tsai <wens@csie.org>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c8ace74a
    • Krzysztof Kozlowski's avatar
      regulator: s2mps11: Fix invalid selector mask and voltages for buck9 · f64ce585
      Krzysztof Kozlowski authored
      commit 3b672623 upstream.
      
      The buck9 regulator of S2MPS11 PMIC had incorrect vsel_mask (0xff
      instead of 0x1f) thus reading entire register as buck9's voltage. This
      effectively caused regulator core to interpret values as higher voltages
      than they were and then to set real voltage much lower than intended.
      
      The buck9 provides power to other regulators, including LDO13
      and LDO19 which supply the MMC2 (SD card). On Odroid XU3/XU4 the lower
      voltage caused SD card detection errors:
      	mmc1: card never left busy state
      	mmc1: error -110 whilst initialising SD card
      
      During driver probe the regulator core was checking whether initial
      voltage matches the constraints. With incorrect vsel_mask of 0xff and
      default value of 0x50, the core interpreted this as 5 V which is outside
      of constraints (3-3.775 V). Then the regulator core was adjusting the
      voltage to match the constraints. With incorrect vsel_mask this new
      voltage mapped to a very low voltage in the driver.
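
The effect of the mask on the value read back can be checked with a little arithmetic. The Python sketch below is an illustration only: the 3 V lower bound and the register value 0x50 come from the text above, while the 25 mV step is an assumption inferred from the quoted 3-3.775 V range spread over a 5-bit selector; `selector_to_uv` is a hypothetical helper, not a regulator-core function.

```python
MIN_UV = 3_000_000   # 3 V lower bound quoted in the message
STEP_UV = 25_000     # assumed step: (3.775 V - 3 V) / 31 selector increments


def selector_to_uv(reg_value, vsel_mask):
    """Map a raw register value to microvolts using the given selector mask."""
    return MIN_UV + (reg_value & vsel_mask) * STEP_UV


# Default register value from the message, decoded with both masks.
wrong_uv = selector_to_uv(0x50, 0xFF)   # whole register taken as the selector
right_uv = selector_to_uv(0x50, 0x1F)   # only the low five bits are the selector
```

Under these assumptions the 0xff mask yields 5 V, matching the value the message says the core saw, while the 0x1f mask yields a value inside the 3-3.775 V constraints.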
      Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Reviewed-by: Javier Martinez Canillas <javier@osg.samsung.com>
      Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f64ce585
    • Wanpeng Li's avatar
      workqueue: fix rebind bound workers warning · 911cf4b9
      Wanpeng Li authored
      commit f7c17d26 upstream.
      
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 16 at kernel/workqueue.c:4559 rebind_workers+0x1c0/0x1d0
      Modules linked in:
      CPU: 0 PID: 16 Comm: cpuhp/0 Not tainted 4.6.0-rc4+ #31
      Hardware name: IBM IBM System x3550 M4 Server -[7914IUW]-/00Y8603, BIOS -[D7E128FUS-1.40]- 07/23/2013
       0000000000000000 ffff881037babb58 ffffffff8139d885 0000000000000010
       0000000000000000 0000000000000000 0000000000000000 ffff881037babba8
       ffffffff8108505d ffff881037ba0000 000011cf3e7d6e60 0000000000000046
      Call Trace:
       dump_stack+0x89/0xd4
       __warn+0xfd/0x120
       warn_slowpath_null+0x1d/0x20
       rebind_workers+0x1c0/0x1d0
       workqueue_cpu_up_callback+0xf5/0x1d0
       notifier_call_chain+0x64/0x90
       ? trace_hardirqs_on_caller+0xf2/0x220
       ? notify_prepare+0x80/0x80
       __raw_notifier_call_chain+0xe/0x10
       __cpu_notify+0x35/0x50
       notify_down_prepare+0x5e/0x80
       ? notify_prepare+0x80/0x80
       cpuhp_invoke_callback+0x73/0x330
       ? __schedule+0x33e/0x8a0
       cpuhp_down_callbacks+0x51/0xc0
       cpuhp_thread_fun+0xc1/0xf0
       smpboot_thread_fn+0x159/0x2a0
       ? smpboot_create_threads+0x80/0x80
       kthread+0xef/0x110
       ? wait_for_completion+0xf0/0x120
       ? schedule_tail+0x35/0xf0
       ret_from_fork+0x22/0x50
       ? __init_kthread_worker+0x70/0x70
      ---[ end trace eb12ae47d2382d8f ]---
      notify_down_prepare: attempt to take down CPU 0 failed
      
      This bug can be reproduced with the config below and nohz_full= covering all CPUs:
      
      CONFIG_BOOTPARAM_HOTPLUG_CPU0=y
      CONFIG_DEBUG_HOTPLUG_CPU0=y
      CONFIG_NO_HZ_FULL=y
      
      As Thomas pointed out:
      
      | If a down prepare callback fails, then DOWN_FAILED is invoked for all
      | callbacks which have successfully executed DOWN_PREPARE.
      |
      | But, workqueue has actually two notifiers. One which handles
      | UP/DOWN_FAILED/ONLINE and one which handles DOWN_PREPARE.
      |
      | Now look at the priorities of those callbacks:
      |
      | CPU_PRI_WORKQUEUE_UP        = 5
      | CPU_PRI_WORKQUEUE_DOWN      = -5
      |
      | So the call order on DOWN_PREPARE is:
      |
      | CB 1
      | CB ...
      | CB workqueue_up() -> Ignores DOWN_PREPARE
      | CB ...
      | CB X ---> Fails
      |
      | So we call up to CB X with DOWN_FAILED
      |
      | CB 1
      | CB ...
      | CB workqueue_up() -> Handles DOWN_FAILED
      | CB ...
      | CB X-1
      |
      | So the problem is that the workqueue stuff handles DOWN_FAILED in the up
      | callback, while it should do it in the down callback. Which is not a good idea
      | either because it wants to be called early on rollback...
      |
      | Brilliant stuff, isn't it? The hotplug rework will solve this problem because
      | the callbacks become symmetric, but for the existing mess, we need some
      | workaround in the workqueue code.
      
      The boot CPU handles housekeeping duty (unbound timers, workqueues,
      timekeeping, ...) on behalf of full dynticks CPUs. It must remain
      online when nohz full is enabled. Every notifier_block has a
      priority:
      
      workqueue_cpu_up > tick_nohz_cpu_down > workqueue_cpu_down
      
      So the tick_nohz_cpu_down callback fails on DOWN_PREPARE for CPU 0, and
      the notifier_blocks behind tick_nohz_cpu_down are never called, which
      means the workers are not actually unbound. The hotplug state machine
      then falls back to undo and brings CPU 0 online again. Workers are
      rebound unconditionally, even if they were never unbound, which
      triggers the warning in the process.
      
      This patch fixes it by checking for !DISASSOCIATED to avoid rebinding
      bound workers.
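
The ordering problem can be replayed with a toy notifier chain. This Python sketch is a model under stated assumptions, not kernel code: the priorities 5 and -5 come from the quote above, the priority 0 for the tick callback is an assumption that merely places it between the two, and all function and class names are hypothetical stand-ins.

```python
DISASSOCIATED = "DISASSOCIATED"


class Pool:
    def __init__(self):
        self.flags = set()   # DISASSOCIATED is set once workers are unbound
        self.warnings = 0    # counts simulated WARN hits


def make_workqueue_up(fixed):
    def workqueue_up(event, pool):
        if event == "DOWN_FAILED":
            if DISASSOCIATED not in pool.flags:
                if fixed:
                    return True      # never unbound, so nothing to rebind
                pool.warnings += 1   # stands in for the WARN in rebind_workers()
            pool.flags.discard(DISASSOCIATED)
        return True                  # DOWN_PREPARE is ignored by this callback
    return workqueue_up


def tick_nohz_down(event, pool):
    # The housekeeping CPU refuses to go down.
    return event != "DOWN_PREPARE"


def workqueue_down(event, pool):
    if event == "DOWN_PREPARE":
        pool.flags.add(DISASSOCIATED)   # workers get unbound here
    return True


def take_cpu_down(chain, pool):
    """Run DOWN_PREPARE by descending priority; on failure, send DOWN_FAILED
    to the callbacks that had already run."""
    ordered = sorted(chain, key=lambda entry: entry[0], reverse=True)
    already_run = []
    for _prio, callback in ordered:
        if not callback("DOWN_PREPARE", pool):
            for earlier in already_run:
                earlier("DOWN_FAILED", pool)
            return False
        already_run.append(callback)
    return True


def simulate(fixed):
    pool = Pool()
    chain = [(-5, workqueue_down), (5, make_workqueue_up(fixed)), (0, tick_nohz_down)]
    return take_cpu_down(chain, pool), pool.warnings
```

In the model, `simulate(False)` reports one warning because the up callback tries to rebind workers that the down callback never got to unbind; `simulate(True)` skips the rebind and reports none. The CPU stays online in both cases.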
      
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Suggested-by: Lai Jiangshan <jiangshanlai@gmail.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      911cf4b9
    • Boris Brezillon's avatar
      ARM: dts: at91: sam9x5: Fix the memory range assigned to the PMC · bef800fb
      Boris Brezillon authored
      commit aab0a4c8 upstream.
      
      The memory range assigned to the PMC (Power Management Controller) was
      not including the PMC_PCR register, which is used to control peripheral
      clocks.
      
      This was working fine thanks to the page granularity of ioremap(), but
      started to fail when we switched to syscon/regmap, because regmap is
      making sure that all accesses are falling into the reserved range.
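
The difference between the two checks can be sketched as follows. This is a rough Python model, not the ioremap() or regmap implementation: the base address, the declared sizes and the PMC_PCR offset are illustrative assumptions rather than values taken from the commit, and both helper functions are hypothetical.

```python
PAGE_SIZE = 4096


def ioremap_allows(base, size, offset, width=4):
    """Page-granular mapping: anything inside the covering pages is reachable."""
    start = base & ~(PAGE_SIZE - 1)
    end = (base + size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)
    addr = base + offset
    return start <= addr and addr + width <= end


def regmap_allows(size, offset, width=4):
    """Range-checked access: the register must lie inside the declared range."""
    return 0 <= offset and offset + width <= size


BASE = 0xFFFFFC00      # assumed PMC base address
PCR_OFFSET = 0x10C     # assumed offset of PMC_PCR
OLD_SIZE = 0x100       # assumed size of the too-small range
NEW_SIZE = 0x200       # assumed size after widening the range

old_ioremap_ok = ioremap_allows(BASE, OLD_SIZE, PCR_OFFSET)
old_regmap_ok = regmap_allows(OLD_SIZE, PCR_OFFSET)
new_regmap_ok = regmap_allows(NEW_SIZE, PCR_OFFSET)
```

With these assumed numbers the register sits just past the declared range but inside the same page, so the page-granular check passes while the range check rejects it until the range is widened.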
      Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
      Reported-by: Richard Genoud <richard.genoud@gmail.com>
      Tested-by: Richard Genoud <richard.genoud@gmail.com>
      Fixes: 863a81c3 ("clk: at91: make use of syscon to share PMC registers in several drivers")
      Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      bef800fb
    • Miklos Szeredi's avatar
      vfs: rename: check backing inode being equal · 1c1c3e93
      Miklos Szeredi authored
      commit 9409e22a upstream.
      
      If a file is renamed to a hardlink of itself POSIX specifies that rename(2)
      should do nothing and return success.
      
      This condition is checked in vfs_rename().  However it won't detect hard
      links on overlayfs where these are given separate inodes on the overlayfs
      layer.
      
      Overlayfs itself detects this condition and returns success without doing
      anything, but then vfs_rename() will proceed as if this was a successful
      rename (detach_mounts(), d_move()).
      
      The correct thing to do is to detect this condition before even calling
      into overlayfs.  This patch does this by calling vfs_select_inode() to get
      the underlying inodes.
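
The POSIX behaviour described in the first paragraph can be observed from user space. The Python sketch below is a small demonstration on an ordinary POSIX filesystem, assuming the temporary directory supports hard links; `rename_onto_hardlink` is a hypothetical helper and does not exercise overlayfs or the kernel change itself.

```python
import os
import tempfile


def rename_onto_hardlink():
    """Rename a file onto a hard link of itself and report what survives."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")
        b = os.path.join(d, "b")
        with open(a, "w") as f:
            f.write("data")
        os.link(a, b)                        # b is a hard link to a
        same_before = os.path.samefile(a, b)
        os.rename(a, b)                      # POSIX: do nothing, return success
        return same_before, os.path.exists(a), os.path.exists(b)


result = rename_onto_hardlink()
```

Both names are expected to remain after the call, which is the "do nothing and return success" outcome that vfs_rename() is meant to preserve.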
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1c1c3e93
    • Miklos Szeredi's avatar
      vfs: add vfs_select_inode() helper · ad56dcb2
      Miklos Szeredi authored
      commit 54d5ca87 upstream.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ad56dcb2
    • Alexander Shishkin's avatar
      perf/core: Disable the event on a truncated AUX record · b45e5d3d
      Alexander Shishkin authored
      commit 9f448cd3 upstream.
      
      When the PMU driver reports a truncated AUX record, it effectively means
      that there is no more usable room in the event's AUX buffer (even though
      there may still be some room, so that perf_aux_output_begin() doesn't take
      action). At this point the consumer still has to be woken up and the event
      has to be disabled, otherwise the event will just keep spinning between
      perf_aux_output_begin() and perf_aux_output_end() until its context gets
      unscheduled.
      
      Again, for cpu-wide events this means never, so once in this condition,
      they will be forever losing data.
      
      Fix this by disabling the event and waking up the consumer in case of a
      truncated AUX record.
      Reported-by: Markus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/1462886313-13660-3-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b45e5d3d