1. 09 Mar, 2016 40 commits
    • Thomas Betker's avatar
      Revert "jffs2: Fix lock acquisition order bug in jffs2_write_begin" · 17f95016
      Thomas Betker authored
      commit 157078f6 upstream.
      
      This reverts commit 5ffd3412
      ("jffs2: Fix lock acquisition order bug in jffs2_write_begin").
      
      The commit modified jffs2_write_begin() to remove a deadlock with
      jffs2_garbage_collect_live(), but this introduced new deadlocks found
      by multiple users. page_lock() actually has to be called before
      mutex_lock(&c->alloc_sem) or mutex_lock(&f->sem) because
      jffs2_write_end() and jffs2_readpage() are called with the page locked,
      and they acquire c->alloc_sem and f->sem, resp.
      
      In other words, the lock order in jffs2_write_begin() was correct, and
      it is the jffs2_garbage_collect_live() path that has to be changed.
      
      Revert the commit to get rid of the new deadlocks, and to clear the way
      for a better fix of the original deadlock.
      Reported-by: default avatarDeng Chao <deng.chao1@zte.com.cn>
      Reported-by: default avatarMing Liu <liu.ming50@gmail.com>
      Reported-by: default avatarwangzaiwei <wangzaiwei@top-vision.cn>
      Signed-off-by: default avatarThomas Betker <thomas.betker@rohde-schwarz.com>
      Signed-off-by: default avatarDavid Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      17f95016
    • Filipe Manana's avatar
      Btrfs: fix loading of orphan roots leading to BUG_ON · c8421505
      Filipe Manana authored
      commit 909c3a22 upstream.
      
      When looking for orphan roots during mount we can end up hitting a
      BUG_ON() (at root-item.c:btrfs_find_orphan_roots()) if a log tree is
      replayed and qgroups are enabled. This is because after a log tree is
      replayed, a transaction commit is made, which triggers qgroup extent
      accounting which in turn does backref walking which ends up reading and
      inserting all roots in the radix tree fs_info->fs_root_radix, including
      orphan roots (deleted snapshots). So after the log tree is replayed, when
      finding orphan roots we hit the BUG_ON with the following trace:
      
      [118209.182438] ------------[ cut here ]------------
      [118209.183279] kernel BUG at fs/btrfs/root-tree.c:314!
      [118209.184074] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      [118209.185123] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic ppdev xor raid6_pq evdev sg parport_pc parport acpi_cpufreq tpm_tis tpm psmouse
      processor i2c_piix4 serio_raw pcspkr i2c_core button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata
      virtio_pci virtio_ring virtio scsi_mod e1000 floppy [last unloaded: btrfs]
      [118209.186318] CPU: 14 PID: 28428 Comm: mount Tainted: G        W       4.5.0-rc5-btrfs-next-24+ #1
      [118209.186318] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
      [118209.186318] task: ffff8801ec131040 ti: ffff8800af34c000 task.ti: ffff8800af34c000
      [118209.186318] RIP: 0010:[<ffffffffa04237d7>]  [<ffffffffa04237d7>] btrfs_find_orphan_roots+0x1fc/0x244 [btrfs]
      [118209.186318] RSP: 0018:ffff8800af34faa8  EFLAGS: 00010246
      [118209.186318] RAX: 00000000ffffffef RBX: 00000000ffffffef RCX: 0000000000000001
      [118209.186318] RDX: 0000000080000000 RSI: 0000000000000001 RDI: 00000000ffffffff
      [118209.186318] RBP: ffff8800af34fb08 R08: 0000000000000001 R09: 0000000000000000
      [118209.186318] R10: ffff8800af34f9f0 R11: 6db6db6db6db6db7 R12: ffff880171b97000
      [118209.186318] R13: ffff8801ca9d65e0 R14: ffff8800afa2e000 R15: 0000160000000000
      [118209.186318] FS:  00007f5bcb914840(0000) GS:ffff88023edc0000(0000) knlGS:0000000000000000
      [118209.186318] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [118209.186318] CR2: 00007f5bcaceb5d9 CR3: 00000000b49b5000 CR4: 00000000000006e0
      [118209.186318] Stack:
      [118209.186318]  fffffbffffffffff 010230ffffffffff 0101000000000000 ff84000000000000
      [118209.186318]  fbffffffffffffff 30ffffffffffffff 0000000000000101 ffff880082348000
      [118209.186318]  0000000000000000 ffff8800afa2e000 ffff8800afa2e000 0000000000000000
      [118209.186318] Call Trace:
      [118209.186318]  [<ffffffffa042e2db>] open_ctree+0x1e37/0x21b9 [btrfs]
      [118209.186318]  [<ffffffffa040a753>] btrfs_mount+0x97e/0xaed [btrfs]
      [118209.186318]  [<ffffffff8108e1c0>] ? trace_hardirqs_on+0xd/0xf
      [118209.186318]  [<ffffffff8117b87e>] mount_fs+0x67/0x131
      [118209.186318]  [<ffffffff81192d2b>] vfs_kern_mount+0x6c/0xde
      [118209.186318]  [<ffffffffa0409f81>] btrfs_mount+0x1ac/0xaed [btrfs]
      [118209.186318]  [<ffffffff8108e1c0>] ? trace_hardirqs_on+0xd/0xf
      [118209.186318]  [<ffffffff8108c26b>] ? lockdep_init_map+0xb9/0x1b3
      [118209.186318]  [<ffffffff8117b87e>] mount_fs+0x67/0x131
      [118209.186318]  [<ffffffff81192d2b>] vfs_kern_mount+0x6c/0xde
      [118209.186318]  [<ffffffff81195637>] do_mount+0x8a6/0x9e8
      [118209.186318]  [<ffffffff8119598d>] SyS_mount+0x77/0x9f
      [118209.186318]  [<ffffffff81493017>] entry_SYSCALL_64_fastpath+0x12/0x6b
      [118209.186318] Code: 64 00 00 85 c0 89 c3 75 24 f0 41 80 4c 24 20 20 49 8b bc 24 f0 01 00 00 4c 89 e6 e8 e8 65 00 00 85 c0 89 c3 74 11 83 f8 ef 75 02 <0f> 0b
      4c 89 e7 e8 da 72 00 00 eb 1c 41 83 bc 24 00 01 00 00 00
      [118209.186318] RIP  [<ffffffffa04237d7>] btrfs_find_orphan_roots+0x1fc/0x244 [btrfs]
      [118209.186318]  RSP <ffff8800af34faa8>
      [118209.230735] ---[ end trace 83938f987d85d477 ]---
      
      So fix this by not treating the error -EEXIST, returned when attempting
      to insert a root already inserted by the backref walking code, as an error.
      
      The following test case for xfstests reproduces the bug:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            _cleanup_flakey
            cd /
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
        . ./common/dmflakey
      
        # real QA test starts here
        _supported_fs btrfs
        _supported_os Linux
        _require_scratch
        _require_dm_target flakey
        _require_metadata_journaling $SCRATCH_DEV
      
        rm -f $seqres.full
      
        _scratch_mkfs >>$seqres.full 2>&1
        _init_flakey
        _mount_flakey
      
        _run_btrfs_util_prog quota enable $SCRATCH_MNT
      
        # Create 2 directories with one file in one of them.
        # We use these just to trigger a transaction commit later, moving the file from
        # directory a to directory b and doing an fsync against directory a.
        mkdir $SCRATCH_MNT/a
        mkdir $SCRATCH_MNT/b
        touch $SCRATCH_MNT/a/f
        sync
      
        # Create our test file with 2 4K extents.
        $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 8K" $SCRATCH_MNT/foobar | _filter_xfs_io
      
        # Create a snapshot and delete it. This doesn't really delete the snapshot
        # immediately, just makes it inaccessible and invisible to user space, the
        # snapshot is deleted later by a dedicated kernel thread (cleaner kthread)
        # which is woke up at the next transaction commit.
        # A root orphan item is inserted into the tree of tree roots, so that if a
        # power failure happens before the dedicated kernel thread does the snapshot
        # deletion, the next time the filesystem is mounted it resumes the snapshot
        # deletion.
        _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap
        _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap
      
        # Now overwrite half of the extents we wrote before. Because we made a snapshpot
        # before, which isn't really deleted yet (since no transaction commit happened
        # after we did the snapshot delete request), the non overwritten extents get
        # referenced twice, once by the default subvolume and once by the snapshot.
        $XFS_IO_PROG -c "pwrite -S 0xbb 4K 8K" $SCRATCH_MNT/foobar | _filter_xfs_io
      
        # Now move file f from directory a to directory b and fsync directory a.
        # The fsync on the directory a triggers a transaction commit (because a file
        # was moved from it to another directory) and the file fsync leaves a log tree
        # with file extent items to replay.
        mv $SCRATCH_MNT/a/f $SCRATCH_MNT/a/b
        $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/a
        $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar
      
        echo "File digest before power failure:"
        md5sum $SCRATCH_MNT/foobar | _filter_scratch
      
        # Now simulate a power failure and mount the filesystem to replay the log tree.
        # After the log tree was replayed, we used to hit a BUG_ON() when processing
        # the root orphan item for the deleted snapshot. This is because when processing
        # an orphan root the code expected to be the first code inserting the root into
        # the fs_info->fs_root_radix radix tree, while in reallity it was the second
        # caller attempting to do it - the first caller was the transaction commit that
        # took place after replaying the log tree, when updating the qgroup counters.
        _flakey_drop_and_remount
      
        echo "File digest before after failure:"
        # Must match what he got before the power failure.
        md5sum $SCRATCH_MNT/foobar | _filter_scratch
      
        _unmount_flakey
        status=0
        exit
      
      Fixes: 2d9e9776 ("Btrfs: use btrfs_get_fs_root in resolve_indirect_ref")
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c8421505
    • Gabor Juhos's avatar
      pata-rb532-cf: get rid of the irq_to_gpio() call · 5667d50c
      Gabor Juhos authored
      commit 01836176 upstream.
      
      The RB532 platform specific irq_to_gpio() implementation has been
      removed with commit 832f5dac ("MIPS: Remove all the uses of
      custom gpio.h"). Now the platform uses the generic stub which causes
      the following error:
      
        pata-rb532-cf pata-rb532-cf: no GPIO found for irq149
        pata-rb532-cf: probe of pata-rb532-cf failed with error -2
      
      Drop the irq_to_gpio() call and get the GPIO number from platform
      data instead. After this change, the driver works again:
      
        scsi host0: pata-rb532-cf
        ata1: PATA max PIO4 irq 149
        ata1.00: CFA: CF 1GB, 20080820, max MWDMA4
        ata1.00: 1989792 sectors, multi 0: LBA
        ata1.00: configured for PIO4
        scsi 0:0:0:0: Direct-Access     ATA      CF 1GB           0820 PQ: 0\
        ANSI: 5
        sd 0:0:0:0: [sda] 1989792 512-byte logical blocks: (1.01 GB/971 MiB)
        sd 0:0:0:0: [sda] Write Protect is off
        sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't\
        support DPO or FUA
         sda: sda1 sda2
        sd 0:0:0:0: [sda] Attached SCSI disk
      
      Fixes: 832f5dac ("MIPS: Remove all the uses of custom gpio.h")
      Cc: Alban Bedel <albeu@free.fr>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarGabor Juhos <juhosg@openwrt.org>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5667d50c
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Do not have 'comm' filter override event 'comm' field · 180c86a4
      Steven Rostedt (Red Hat) authored
      commit e57cbaf0 upstream.
      
      Commit 9f616680 "tracing: Allow triggers to filter for CPU ids and
      process names" added a 'comm' filter that will filter events based on the
      current tasks struct 'comm'. But this now hides the ability to filter events
      that have a 'comm' field too. For example, sched_migrate_task trace event.
      That has a 'comm' field of the task to be migrated.
      
       echo 'comm == "bash"' > events/sched_migrate_task/filter
      
      will now filter all sched_migrate_task events for tasks named "bash" that
      migrates other tasks (in interrupt context), instead of seeing when "bash"
      itself gets migrated.
      
      This fix requires a couple of changes.
      
      1) Change the look up order for filter predicates to look at the events
         fields before looking at the generic filters.
      
      2) Instead of basing the filter function off of the "comm" name, have the
         generic "comm" filter have its own filter_type (FILTER_COMM). Test
         against the type instead of the name to assign the filter function.
      
      3) Add a new "COMM" filter that works just like "comm" but will filter based
         on the current task, even if the trace event contains a "comm" field.
      
      Do the same for "cpu" field, adding a FILTER_CPU and a filter "CPU".
      
      Fixes: 9f616680 "tracing: Allow triggers to filter for CPU ids and process names"
      Reported-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      180c86a4
    • Manuel Lauss's avatar
      ata: ahci: don't mark HotPlugCapable Ports as external/removable · a55479ab
      Manuel Lauss authored
      commit dc8b4afc upstream.
      
      The HPCP bit is set by bioses for on-board sata ports either because
      they think sata is hotplug capable in general or to allow Windows
      to display a "device eject" icon on ports which are routed to an
      external connector bracket.
      
      However in Redhat Bugzilla #1310682, users report that with kernel 4.4,
      where this bit test first appeared, a lot of partitions on sata drives
      are now mounted automatically.
      
      This patch should fix redhat and a lot of other distros which
      unconditionally automount all devices which have the "removable"
      bit set.
      Signed-off-by: default avatarManuel Lauss <manuel.lauss@gmail.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Fixes: 8a3e33cf ("ata: ahci: find eSATA ports and flag them as removable" changes userspace behavior)
      Link: http://lkml.kernel.org/g/56CF35FA.1070500@redhat.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a55479ab
    • Todd E Brandt's avatar
      PM / sleep / x86: Fix crash on graph trace through x86 suspend · af2d87ae
      Todd E Brandt authored
      commit 92f9e179 upstream.
      
      Pause/unpause graph tracing around do_suspend_lowlevel as it has
      inconsistent call/return info after it jumps to the wakeup vector.
      The graph trace buffer will otherwise become misaligned and
      may eventually crash and hang on suspend.
      
      To reproduce the issue and test the fix:
      Run a function_graph trace over suspend/resume and set the graph
      function to suspend_devices_and_enter. This consistently hangs the
      system without this fix.
      Signed-off-by: default avatarTodd Brandt <todd.e.brandt@linux.intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      af2d87ae
    • Ard Biesheuvel's avatar
      arm64: vmemmap: use virtual projection of linear region · e75c4b65
      Ard Biesheuvel authored
      commit dfd55ad8 upstream.
      
      Commit dd006da2 ("arm64: mm: increase VA range of identity map") made
      some changes to the memory mapping code to allow physical memory to reside
      at an offset that exceeds the size of the virtual mapping.
      
      However, since the size of the vmemmap area is proportional to the size of
      the VA area, but it is populated relative to the physical space, we may
      end up with the struct page array being mapped outside of the vmemmap
      region. For instance, on my Seattle A0 box, I can see the following output
      in the dmesg log.
      
         vmemmap : 0xffffffbdc0000000 - 0xffffffbfc0000000   (     8 GB maximum)
                   0xffffffbfc0000000 - 0xffffffbfd0000000   (   256 MB actual)
      
      We can fix this by deciding that the vmemmap region is not a projection of
      the physical space, but of the virtual space above PAGE_OFFSET, i.e., the
      linear region. This way, we are guaranteed that the vmemmap region is of
      sufficient size, and we can even reduce the size by half.
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e75c4b65
    • Alexandra Yates's avatar
      Adding Intel Lewisburg device IDs for SATA · 419a2cd1
      Alexandra Yates authored
      commit f5bdd66c upstream.
      
      This patch complements the list of device IDs previously
      added for lewisburg sata.
      Signed-off-by: default avatarAlexandra Yates <alexandra.yates@linux.intel.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      419a2cd1
    • Tejun Heo's avatar
      writeback: flush inode cgroup wb switches instead of pinning super_block · c5cbbec5
      Tejun Heo authored
      commit a1a0e23e upstream.
      
      If cgroup writeback is in use, inodes can be scheduled for
      asynchronous wb switching.  Before 5ff8eaac ("writeback: keep
      superblock pinned during cgroup writeback association switches"), this
      could race with umount leading to super_block being destroyed while
      inodes are pinned for wb switching.  5ff8eaac fixed it by bumping
      s_active while wb switches are in flight; however, this allowed
      in-flight wb switches to make umounts asynchronous when the userland
      expected synchronosity - e.g. fsck immediately following umount may
      fail because the device is still busy.
      
      This patch removes the problematic super_block pinning and instead
      makes generic_shutdown_super() flush in-flight wb switches.  wb
      switches are now executed on a dedicated isw_wq so that they can be
      flushed and isw_nr_in_flight keeps track of the number of in-flight wb
      switches so that flushing can be avoided in most cases.
      
      v2: Move cgroup_writeback_umount() further below and add MS_ACTIVE
          check in inode_switch_wbs() as Jan an Al suggested.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarTahsin Erdogan <tahsin@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Link: http://lkml.kernel.org/g/CAAeU0aNCq7LGODvVGRU-oU_o-6enii5ey0p1c26D1ZzYwkDc5A@mail.gmail.com
      Fixes: 5ff8eaac ("writeback: keep superblock pinned during cgroup writeback association switches")
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Tested-by: default avatarTahsin Erdogan <tahsin@google.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c5cbbec5
    • Ming Lei's avatar
      block: bio: introduce helpers to get the 1st and last bvec · 7adb5cc0
      Ming Lei authored
      commit 7bcd79ac upstream.
      
      The bio passed to bio_will_gap() may be fast cloned from upper
      layer(dm, md, bcache, fs, ...), or from bio splitting in block
      core.
      
      Unfortunately bio_will_gap() just figures out the last bvec via
      'bi_io_vec[prev->bi_vcnt - 1]' directly, and this way is obviously
      wrong.
      
      This patch introduces two helpers for getting the first and last
      bvec of one bio for fixing the issue.
      Reported-by: default avatarSagi Grimberg <sagig@dev.mellanox.co.il>
      Reviewed-by: default avatarSagi Grimberg <sagig@mellanox.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMing Lei <ming.lei@canonical.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7adb5cc0
    • Harvey Hunt's avatar
      libata: Align ata_device's id on a cacheline · cea2cbff
      Harvey Hunt authored
      commit 4ee34ea3 upstream.
      
      The id buffer in ata_device is a DMA target, but it isn't explicitly
      cacheline aligned. Due to this, adjacent fields can be overwritten with
      stale data from memory on non coherent architectures. As a result, the
      kernel is sometimes unable to communicate with an ATA device.
      
      Fix this by ensuring that the id buffer is cacheline aligned.
      
      This issue is similar to that fixed by Commit 84bda12a
      ("libata: align ap->sector_buf").
      Signed-off-by: default avatarHarvey Hunt <harvey.hunt@imgtec.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cea2cbff
    • Arnd Bergmann's avatar
      libata: fix HDIO_GET_32BIT ioctl · b693f2ad
      Arnd Bergmann authored
      commit 287e6611 upstream.
      
      As reported by Soohoon Lee, the HDIO_GET_32BIT ioctl does not
      work correctly in compat mode with libata.
      
      I have investigated the issue further and found multiple problems
      that all appeared with the same commit that originally introduced
      HDIO_GET_32BIT handling in libata back in linux-2.6.8 and presumably
      also linux-2.4, as the code uses "copy_to_user(arg, &val, 1)" to copy
      a 'long' variable containing either 0 or 1 to user space.
      
      The problems with this are:
      
      * On big-endian machines, this will always write a zero because it
        stores the wrong byte into user space.
      
      * In compat mode, the upper three bytes of the variable are updated
        by the compat_hdio_ioctl() function, but they now contain
        uninitialized stack data.
      
      * The hdparm tool calling this ioctl uses a 'static long' variable
        to store the result. This means at least the upper bytes are
        initialized to zero, but calling another ioctl like HDIO_GET_MULTCOUNT
        would fill them with data that remains stale when the low byte
        is overwritten. Fortunately libata doesn't implement any of the
        affected ioctl commands, so this would only happen when we query
        both an IDE and an ATA device in the same command such as
        "hdparm -N -c /dev/hda /dev/sda"
      
      * The libata code for unknown reasons started using ATA_IOC_GET_IO32
        and ATA_IOC_SET_IO32 as aliases for HDIO_GET_32BIT and HDIO_SET_32BIT,
        while the ioctl commands that were added later use the normal
        HDIO_* names. This is harmless but rather confusing.
      
      This addresses all four issues by changing the code to use put_user()
      on an 'unsigned long' variable in HDIO_GET_32BIT, like the IDE subsystem
      does, and by clarifying the names of the ioctl commands.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Reported-by: default avatarSoohoon Lee <Soohoon.Lee@f5.com>
      Tested-by: default avatarSoohoon Lee <Soohoon.Lee@f5.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b693f2ad
    • Arindam Nath's avatar
      drm/amdgpu: return from atombios_dp_get_dpcd only when error · f17a7e40
      Arindam Nath authored
      commit 0b39c531 upstream.
      
      In amdgpu_connector_hotplug(), we need to start DP link
      training only after we have received DPCD. The function
      amdgpu_atombios_dp_get_dpcd() returns non-zero value only
      when an error condition is met, otherwise returns zero.
      So in case the function encounters an error, we need to
      skip rest of the code and return from amdgpu_connector_hotplug()
      immediately. Only when we are successfull in reading DPCD
      pin, we should carry on with turning-on the monitor.
      Signed-off-by: default avatarArindam Nath <arindam.nath@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f17a7e40
    • Chunming Zhou's avatar
      drm/amdgpu/gfx8: specify which engine to wait before vm flush · 624a59a5
      Chunming Zhou authored
      commit 9cac5373 upstream.
      
      Select between me and pfp properly.
      Signed-off-by: default avatarChunming Zhou <David1.Zhou@amd.com>
      Reviewed-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      624a59a5
    • Christian König's avatar
      drm/amdgpu: apply gfx_v8 fixes to gfx_v7 as well · a46a700a
      Christian König authored
      commit feebe91a upstream.
      
      We never ported that back to CIK, so we could run into VM faults here.
      Signed-off-by: default avatarChristian König <christian.koenig@amd.com>
      Reviewed-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a46a700a
    • Alex Deucher's avatar
      drm/amdgpu/pm: update current crtc info after setting the powerstate · 3b0809af
      Alex Deucher authored
      commit eda1d1cf upstream.
      
      On CI, we need to see if the number of crtcs changes to determine
      whether or not we need to upload the mclk table again.  In practice
      we don't currently upload the mclk table again after the initial load.
      The only reason you would would be to add new states, e.g., for
      arbitrary mclk setting which is not currently supported.
      Acked-by: default avatarJordan Lazare <Jordan.Lazare@amd.com>
      Acked-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3b0809af
    • Alex Deucher's avatar
      drm/radeon/pm: update current crtc info after setting the powerstate · 80e726c7
      Alex Deucher authored
      commit 5e031d9f upstream.
      
      On CI, we need to see if the number of crtcs changes to determine
      whether or not we need to upload the mclk table again.  In practice
      we don't currently upload the mclk table again after the initial load.
      The only reason you would would be to add new states, e.g., for
      arbitrary mclk setting which is not currently supported.
      Acked-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      80e726c7
    • Timothy Pearson's avatar
      drm/ast: Fix incorrect register check for DRAM width · 598fc9b7
      Timothy Pearson authored
      commit 2d02b8bd upstream.
      
      During DRAM initialization on certain ASpeed devices, an incorrect
      bit (bit 10) was checked in the "SDRAM Bus Width Status" register
      to determine DRAM width.
      
      Query bit 6 instead in accordance with the Aspeed AST2050 datasheet v1.05.
      Signed-off-by: default avatarTimothy Pearson <tpearson@raptorengineeringinc.com>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      598fc9b7
    • Mike Christie's avatar
      target: Fix WRITE_SAME/DISCARD conversion to linux 512b sectors · 3028963a
      Mike Christie authored
      commit 8a9ebe71 upstream.
      
      In a couple places we are not converting to/from the Linux
      block layer 512 bytes sectors.
      
      1.
      
      The request queue values and what we do are a mismatch of
      things:
      
      max_discard_sectors - This is in linux block layer 512 byte
      sectors. We are just copying this to max_unmap_lba_count.
      
      discard_granularity - This is in bytes. We are converting it
      to Linux block layer 512 byte sectors.
      
      discard_alignment - This is in bytes. We are just copying
      this over.
      
      The problem is that the core LIO code exports these values in
      spc_emulate_evpd_b0 and we use them to test request arguments
      in sbc_execute_unmap, but we never convert to the block size
      we export to the initiator. If we are not using 512 byte sectors
      then we are exporting the wrong values or are checks are off.
      And, for the discard_alignment/bytes case we are just plain messed
      up.
      
      2.
      
      blkdev_issue_discard's start and number of sector arguments
      are supposed to be in linux block layer 512 byte sectors. We are
      currently passing in the values we get from the initiator which
      might be based on some other sector size.
      
      There is a similar problem in iblock_execute_write_same where
      the bio functions want values in 512 byte sectors but we are
      passing in what we got from the initiator.
      Signed-off-by: default avatarMike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      [ kamal: backport to 4.4-stable: no unmap_zeroes_data ]
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3028963a
    • Joerg Roedel's avatar
      iommu/vt-d: Use BUS_NOTIFY_REMOVED_DEVICE in hotplug path · 00548ecf
      Joerg Roedel authored
      commit e6a8c9b3 upstream.
      
      In the PCI hotplug path of the Intel IOMMU driver, replace
      the usage of the BUS_NOTIFY_DEL_DEVICE notifier, which is
      executed before the driver is unbound from the device, with
      BUS_NOTIFY_REMOVED_DEVICE, which runs after that.
      
      This fixes a kernel BUG being triggered in the VT-d code
      when the device driver tries to unmap DMA buffers and the
      VT-d driver already destroyed all mappings.
      Reported-by: default avatarStefani Seibold <stefani@seibold.net>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      00548ecf
    • Suravee Suthikulpanit's avatar
      iommu/amd: Fix boot warning when device 00:00.0 is not iommu covered · 44792d5c
      Suravee Suthikulpanit authored
      commit 38e45d02 upstream.
      
      The setup code for the performance counters in the AMD IOMMU driver
      tests whether the counters can be written. It tests to setup a counter
      for device 00:00.0, which fails on systems where this particular device
      is not covered by the IOMMU.
      
      Fix this by not relying on device 00:00.0 but only on the IOMMU being
      present.
      Signed-off-by: default avatarSuravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      44792d5c
    • Jay Cornwall's avatar
      iommu/amd: Apply workaround for ATS write permission check · ad0f9dad
      Jay Cornwall authored
      commit 358875fd upstream.
      
      The AMD Family 15h Models 30h-3Fh (Kaveri) BIOS and Kernel Developer's
      Guide omitted part of the BIOS IOMMU L2 register setup specification.
      Without this setup the IOMMU L2 does not fully respect write permissions
      when handling an ATS translation request.
      
      The IOMMU L2 will set PTE dirty bit when handling an ATS translation with
      write permission request, even when PTE RW bit is clear. This may occur by
      direct translation (which would cause a PPR) or by prefetch request from
      the ATC.
      
      This is observed in practice when the IOMMU L2 modifies a PTE which maps a
      pagecache page. The ext4 filesystem driver BUGs when asked to writeback
      these (non-modified) pages.
      
      Enable ATS write permission check in the Kaveri IOMMU L2 if BIOS has not.
      Signed-off-by: default avatarJay Cornwall <jay@jcornwall.me>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ad0f9dad
    • Michael S. Tsirkin's avatar
      arm/arm64: KVM: Fix ioctl error handling · d1c623c9
      Michael S. Tsirkin authored
      commit 4cad67fc upstream.
      
      Calling return copy_to_user(...) in an ioctl will not
      do the right thing if there's a pagefault:
      copy_to_user returns the number of bytes not copied
      in this case.
      
      Fix up kvm to do
      	return copy_to_user(...)) ?  -EFAULT : 0;
      
      everywhere.
      Acked-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d1c623c9
    • Paolo Bonzini's avatar
      KVM: x86: fix root cause for missed hardware breakpoints · 25e86186
      Paolo Bonzini authored
      commit 70e4da7a upstream.
      
      Commit 172b2386 ("KVM: x86: fix missed hardware breakpoints",
      2016-02-10) worked around a case where the debug registers are not loaded
      correctly on preemption and on the first entry to KVM_RUN.
      
      However, Xiao Guangrong pointed out that the root cause must be that
      KVM_DEBUGREG_BP_ENABLED is not being set correctly.  This can indeed
      happen due to the lazy debug exit mechanism, which does not call
      kvm_update_dr7.  Fix it by replacing the existing loop (more or less
      equivalent to kvm_update_dr0123) with calls to all the kvm_update_dr*
      functions.
      
      Fixes: 172b2386Reviewed-by: default avatarXiao Guangrong <guangrong.xiao@linux.intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      25e86186
    • Michael S. Tsirkin's avatar
      vfio: fix ioctl error handling · 7931825d
      Michael S. Tsirkin authored
      commit 8160c4e4 upstream.
      
      Calling return copy_to_user(...) in an ioctl will not
      do the right thing if there's a pagefault:
      copy_to_user returns the number of bytes not copied
      in this case.
      
      Fix up vfio to do
      	return copy_to_user(...)) ?
      		-EFAULT : 0;
      
      everywhere.
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7931825d
    • Yadan Fan's avatar
      Fix cifs_uniqueid_to_ino_t() function for s390x · bbb3c507
      Yadan Fan authored
      commit 1ee9f4bd upstream.
      
      This issue is caused by commit 02323db1 ("cifs: fix
      cifs_uniqueid_to_ino_t not to ever return 0"), when BITS_PER_LONG
      is 64 on s390x, the corresponding cifs_uniqueid_to_ino_t()
      function will cast 64-bit fileid to 32-bit by using (ino_t)fileid,
      because ino_t (typdefed __kernel_ino_t) is int type.
      
      It's defined in arch/s390/include/uapi/asm/posix_types.h
      
          #ifndef __s390x__
      
          typedef unsigned long   __kernel_ino_t;
          ...
          #else /* __s390x__ */
      
          typedef unsigned int    __kernel_ino_t;
      
      So the #ifdef condition is wrong for s390x, we can just still use
      one cifs_uniqueid_to_ino_t() function with comparing sizeof(ino_t)
      and sizeof(u64) to choose the correct execution accordingly.
      Signed-off-by: default avatarYadan Fan <ydfan@suse.com>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bbb3c507
    • Pavel Shilovsky's avatar
      CIFS: Fix SMB2+ interim response processing for read requests · b52a3958
      Pavel Shilovsky authored
      commit 6cc3b242 upstream.
      
      For interim responses we only need to parse a header and update
      a number credits. Now it is done for all SMB2+ command except
      SMB2_READ which is wrong. Fix this by adding such processing.
      Signed-off-by: default avatarPavel Shilovsky <pshilovsky@samba.org>
      Tested-by: default avatarShirish Pargaonkar <shirishpargaonkar@gmail.com>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b52a3958
    • Justin Maggard's avatar
      cifs: fix out-of-bounds access in lease parsing · 77f0cc0e
      Justin Maggard authored
      commit deb7deff upstream.
      
      When opening a file, SMB2_open() attempts to parse the lease state from the
      SMB2 CREATE Response.  However, the parsing code was not careful to ensure
      that the create contexts are not empty or invalid, which can lead to out-
      of-bounds memory access.  This can be seen easily by trying
      to read a file from a OSX 10.11 SMB3 server.  Here is sample crash output:
      
      BUG: unable to handle kernel paging request at ffff8800a1a77cc6
      IP: [<ffffffff8828a734>] SMB2_open+0x804/0x960
      PGD 8f77067 PUD 0
      Oops: 0000 [#1] SMP
      Modules linked in:
      CPU: 3 PID: 2876 Comm: cp Not tainted 4.5.0-rc3.x86_64.1+ #14
      Hardware name: NETGEAR ReadyNAS 314          /ReadyNAS 314          , BIOS 4.6.5 10/11/2012
      task: ffff880073cdc080 ti: ffff88005b31c000 task.ti: ffff88005b31c000
      RIP: 0010:[<ffffffff8828a734>]  [<ffffffff8828a734>] SMB2_open+0x804/0x960
      RSP: 0018:ffff88005b31fa08  EFLAGS: 00010282
      RAX: 0000000000000015 RBX: 0000000000000000 RCX: 0000000000000006
      RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff88007eb8c8b0
      RBP: ffff88005b31fad8 R08: 666666203d206363 R09: 6131613030383866
      R10: 3030383866666666 R11: 00000000000002b0 R12: ffff8800660fd800
      R13: ffff8800a1a77cc2 R14: 00000000424d53fe R15: ffff88005f5a28c0
      FS:  00007f7c8a2897c0(0000) GS:ffff88007eb80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: ffff8800a1a77cc6 CR3: 000000005b281000 CR4: 00000000000006e0
      Stack:
       ffff88005b31fa70 ffffffff88278789 00000000000001d3 ffff88005f5a2a80
       ffffffff00000003 ffff88005d029d00 ffff88006fde05a0 0000000000000000
       ffff88005b31fc78 ffff88006fde0780 ffff88005b31fb2f 0000000100000fe0
      Call Trace:
       [<ffffffff88278789>] ? cifsConvertToUTF16+0x159/0x2d0
       [<ffffffff8828cf68>] smb2_open_file+0x98/0x210
       [<ffffffff8811e80c>] ? __kmalloc+0x1c/0xe0
       [<ffffffff882685f4>] cifs_open+0x2a4/0x720
       [<ffffffff88122cef>] do_dentry_open+0x1ff/0x310
       [<ffffffff88268350>] ? cifsFileInfo_get+0x30/0x30
       [<ffffffff88123d92>] vfs_open+0x52/0x60
       [<ffffffff88131dd0>] path_openat+0x170/0xf70
       [<ffffffff88097d48>] ? remove_wait_queue+0x48/0x50
       [<ffffffff88133a29>] do_filp_open+0x79/0xd0
       [<ffffffff8813f2ca>] ? __alloc_fd+0x3a/0x170
       [<ffffffff881240c4>] do_sys_open+0x114/0x1e0
       [<ffffffff881241a9>] SyS_open+0x19/0x20
       [<ffffffff8896e257>] entry_SYSCALL_64_fastpath+0x12/0x6a
      Code: 4d 8d 6c 07 04 31 c0 4c 89 ee e8 47 6f e5 ff 31 c9 41 89 ce 44 89 f1 48 c7 c7 28 b1 bd 88 31 c0 49 01 cd 4c 89 ee e8 2b 6f e5 ff <45> 0f b7 75 04 48 c7 c7 31 b1 bd 88 31 c0 4d 01 ee 4c 89 f6 e8
      RIP  [<ffffffff8828a734>] SMB2_open+0x804/0x960
       RSP <ffff88005b31fa08>
      CR2: ffff8800a1a77cc6
      ---[ end trace d9f69ba64feee469 ]---
      Signed-off-by: default avatarJustin Maggard <jmaggard@netgear.com>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      77f0cc0e
    • Jean-Philippe Brucker's avatar
      fbcon: set a default value to blink interval · 2f553b9b
      Jean-Philippe Brucker authored
      commit a1e533ec upstream.
      
      Since commit 27a4c827
      	fbcon: use the cursor blink interval provided by vt
      
      two attempts have been made at fixing a possible hang caused by
      cursor_timer_handler. That function registers a timer to be triggered at
      "jiffies + fbcon_ops.cur_blink_jiffies".
      
      A new case had been encountered during initialisation of clcd-pl11x:
      
          fbcon_fb_registered
          do_fbcon_takeover
      
          ->  do_register_con_driver
              fbcon_startup
          (A) add_cursor_timer (with cur_blink_jiffies = 0)
      
          ->  do_bind_con_driver
              visual_init
              fbcon_init
          (B) cur_blink_jiffies = msecs_to_jiffies(vc->vc_cur_blink_ms);
      
      If we take an softirq anywhere between A and B (and we do),
      cursor_timer_handler executes indefinitely.
      
      Instead of patching all possible paths that lead to this case one at a
      time, fix the issue at the source and initialise cur_blink_jiffies to
      200ms when allocating fbcon_ops. This was its default value before
      aforesaid commit. fbcon_cursor or fbcon_init will refine this value
      downstream.
      Signed-off-by: default avatarJean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Tested-by: default avatarScot Doyle <lkml14@scotdoyle.com>
      Signed-off-by: default avatarTomi Valkeinen <tomi.valkeinen@ti.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2f553b9b
    • Owen Hofmann's avatar
      kvm: x86: Update tsc multiplier on change. · 8cedd1c0
      Owen Hofmann authored
      commit 2680d6da upstream.
      
      vmx.c writes the TSC_MULTIPLIER field in vmx_vcpu_load, but only when a
      vcpu has migrated physical cpus. Record the last value written and
      update in vmx_vcpu_load on any change, otherwise a cpu migration must
      occur for TSC frequency scaling to take effect.
      
      Fixes: ff2c3a18Signed-off-by: default avatarOwen Hofmann <osh@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8cedd1c0
    • Michael S. Tsirkin's avatar
      mips/kvm: fix ioctl error handling · ce16d9cf
      Michael S. Tsirkin authored
      commit 0178fd7d upstream.
      
      Returning directly whatever copy_to_user(...) or copy_from_user(...)
      returns may not do the right thing if there's a pagefault:
      copy_to_user/copy_from_user return the number of bytes not copied in
      this case, but ioctls need to return -EFAULT instead.
      
      Fix up kvm on mips to do
      	return copy_to_user(...)) ?  -EFAULT : 0;
      and
      	return copy_from_user(...)) ?  -EFAULT : 0;
      
      everywhere.
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ce16d9cf
    • Helge Deller's avatar
      parisc: Fix ptrace syscall number and return value modification · 59dc7ab2
      Helge Deller authored
      commit 98e8b6c9 upstream.
      
      Mike Frysinger reported that his ptrace testcase showed strange
      behaviour on parisc: It was not possible to avoid a syscall and the
      return value of a syscall couldn't be changed.
      
      To modify a syscall number, we were missing to save the new syscall
      number to gr20 which is then picked up later in assembly again.
      
      The effect that the return value couldn't be changed is a side-effect of
      another bug in the assembly code. When a process is ptraced, userspace
      expects each syscall to report entrance and exit of a syscall.  If a
      syscall number was given which doesn't exist, we jumped to the normal
      syscall exit code instead of informing userspace that the (non-existant)
      syscall exits. This unexpected behaviour confuses userspace and thus the
      bug was misinterpreted as if we can't change the return value.
      
      This patch fixes both problems and was tested on 64bit kernel with
      32bit userspace.
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Tested-by: default avatarMike Frysinger <vapier@gentoo.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      59dc7ab2
    • Murali Karicheri's avatar
      PCI: keystone: Fix MSI code that retrieves struct pcie_port pointer · 5bbc2b41
      Murali Karicheri authored
      commit 79e3f4a8 upstream.
      
      Commit cbce7900 ("PCI: designware: Make driver arch-agnostic") changed
      the host bridge sysdata pointer from the ARM pci_sys_data to the DesignWare
      pcie_port structure, and changed pcie-designware.c to reflect that.  But it
      did not change the corresponding code in pci-keystone-dw.c, so it caused
      crashes on Keystone:
      
        Unable to handle kernel NULL pointer dereference at virtual address 00000030
        pgd = c0003000
        [00000030] *pgd=80000800004003, *pmd=00000000
        Internal error: Oops: 206 [#1] PREEMPT SMP ARM
        CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.4.2-00139-gb74f926 #2
        Hardware name: Keystone
        PC is at ks_dw_pcie_msi_irq_unmask+0x24/0x58
      
      Change pci-keystone-dw.c to expect sysdata to be the struct pcie_port
      pointer.
      
      [bhelgaas: changelog]
      Fixes: cbce7900 ("PCI: designware: Make driver arch-agnostic")
      Signed-off-by: default avatarMurali Karicheri <m-karicheri2@ti.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      CC: Zhou Wang <wangzhou1@hisilicon.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5bbc2b41
    • Keith Busch's avatar
      block: Initialize max_dev_sectors to 0 · 9ec19090
      Keith Busch authored
      commit 5f009d3f upstream.
      
      The new queue limit is not used by the majority of block drivers, and
      should be initialized to 0 for the driver's requested settings to be used.
      Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Acked-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: default avatarSagi Grimberg <sagig@mellanox.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9ec19090
    • Oded Gabbay's avatar
    • Qu Wenruo's avatar
      btrfs: async-thread: Fix a use-after-free error for trace · d60a3749
      Qu Wenruo authored
      commit 0a95b851 upstream.
      
      Parameter of trace_btrfs_work_queued() can be freed in its workqueue.
      So no one use use that pointer after queue_work().
      
      Fix the user-after-free bug by move the trace line before queue_work().
      Reported-by: default avatarDave Jones <davej@codemonkey.org.uk>
      Signed-off-by: default avatarQu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d60a3749
    • Zhao Lei's avatar
      btrfs: Fix no_space in write and rm loop · 08acfd9d
      Zhao Lei authored
      commit e1746e83 upstream.
      
      I see no_space in v4.4-rc1 again in xfstests generic/102.
      It happened randomly in some node only.
      (one of 4 phy-node, and a kvm with non-virtio block driver)
      
      By bisect, we can found the first-bad is:
       commit bdced438 ("block: setup bi_phys_segments after splitting")'
      But above patch only triggered the bug by making bio operation
      faster(or slower).
      
      Main reason is in our space_allocating code, we need to commit
      page writeback before wait it complish, this patch fixed above
      bug.
      
      BTW, there is another reason for generic/102 fail, caused by
      disable default mixed-blockgroup, I'll fix it in xfstests.
      Signed-off-by: default avatarZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      08acfd9d
    • Filipe Manana's avatar
      Btrfs: fix deadlock running delayed iputs at transaction commit time · 2232d6dc
      Filipe Manana authored
      commit c2d6cb16 upstream.
      
      While running a stress test I ran into a deadlock when running the delayed
      iputs at transaction time, which produced the following report and trace:
      
      [  886.399989] =============================================
      [  886.400871] [ INFO: possible recursive locking detected ]
      [  886.401663] 4.4.0-rc6-btrfs-next-18+ #1 Not tainted
      [  886.402384] ---------------------------------------------
      [  886.403182] fio/8277 is trying to acquire lock:
      [  886.403568]  (&fs_info->delayed_iput_sem){++++..}, at: [<ffffffffa0538823>] btrfs_run_delayed_iputs+0x36/0xbf [btrfs]
      [  886.403568]
      [  886.403568] but task is already holding lock:
      [  886.403568]  (&fs_info->delayed_iput_sem){++++..}, at: [<ffffffffa0538823>] btrfs_run_delayed_iputs+0x36/0xbf [btrfs]
      [  886.403568]
      [  886.403568] other info that might help us debug this:
      [  886.403568]  Possible unsafe locking scenario:
      [  886.403568]
      [  886.403568]        CPU0
      [  886.403568]        ----
      [  886.403568]   lock(&fs_info->delayed_iput_sem);
      [  886.403568]   lock(&fs_info->delayed_iput_sem);
      [  886.403568]
      [  886.403568]  *** DEADLOCK ***
      [  886.403568]
      [  886.403568]  May be due to missing lock nesting notation
      [  886.403568]
      [  886.403568] 3 locks held by fio/8277:
      [  886.403568]  #0:  (sb_writers#11){.+.+.+}, at: [<ffffffff81174c4c>] __sb_start_write+0x5f/0xb0
      [  886.403568]  #1:  (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffffa054620d>] btrfs_file_write_iter+0x73/0x408 [btrfs]
      [  886.403568]  #2:  (&fs_info->delayed_iput_sem){++++..}, at: [<ffffffffa0538823>] btrfs_run_delayed_iputs+0x36/0xbf [btrfs]
      [  886.403568]
      [  886.403568] stack backtrace:
      [  886.403568] CPU: 6 PID: 8277 Comm: fio Not tainted 4.4.0-rc6-btrfs-next-18+ #1
      [  886.403568] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
      [  886.403568]  0000000000000000 ffff88009f80f770 ffffffff8125d4fd ffffffff82af1fc0
      [  886.403568]  ffff88009f80f830 ffffffff8108e5f9 0000000200000000 ffff88009fd92290
      [  886.403568]  0000000000000000 ffffffff82af1fc0 ffffffff829cfb01 00042b216d008804
      [  886.403568] Call Trace:
      [  886.403568]  [<ffffffff8125d4fd>] dump_stack+0x4e/0x79
      [  886.403568]  [<ffffffff8108e5f9>] __lock_acquire+0xd42/0xf0b
      [  886.403568]  [<ffffffff810c22db>] ? __module_address+0xdf/0x108
      [  886.403568]  [<ffffffff8108eb77>] lock_acquire+0x10d/0x194
      [  886.403568]  [<ffffffff8108eb77>] ? lock_acquire+0x10d/0x194
      [  886.403568]  [<ffffffffa0538823>] ? btrfs_run_delayed_iputs+0x36/0xbf [btrfs]
      [  886.489542]  [<ffffffff8148556b>] down_read+0x3e/0x4d
      [  886.489542]  [<ffffffffa0538823>] ? btrfs_run_delayed_iputs+0x36/0xbf [btrfs]
      [  886.489542]  [<ffffffffa0538823>] btrfs_run_delayed_iputs+0x36/0xbf [btrfs]
      [  886.489542]  [<ffffffffa0533953>] btrfs_commit_transaction+0x8f5/0x96e [btrfs]
      [  886.489542]  [<ffffffffa0521d7a>] flush_space+0x435/0x44a [btrfs]
      [  886.489542]  [<ffffffffa052218b>] ? reserve_metadata_bytes+0x26a/0x384 [btrfs]
      [  886.489542]  [<ffffffffa05221ae>] reserve_metadata_bytes+0x28d/0x384 [btrfs]
      [  886.489542]  [<ffffffffa052256c>] ? btrfs_block_rsv_refill+0x58/0x96 [btrfs]
      [  886.489542]  [<ffffffffa0522584>] btrfs_block_rsv_refill+0x70/0x96 [btrfs]
      [  886.489542]  [<ffffffffa053d747>] btrfs_evict_inode+0x394/0x55a [btrfs]
      [  886.489542]  [<ffffffff81188e31>] evict+0xa7/0x15c
      [  886.489542]  [<ffffffff81189878>] iput+0x1d3/0x266
      [  886.489542]  [<ffffffffa053887c>] btrfs_run_delayed_iputs+0x8f/0xbf [btrfs]
      [  886.489542]  [<ffffffffa0533953>] btrfs_commit_transaction+0x8f5/0x96e [btrfs]
      [  886.489542]  [<ffffffff81085096>] ? signal_pending_state+0x31/0x31
      [  886.489542]  [<ffffffffa0521191>] btrfs_alloc_data_chunk_ondemand+0x1d7/0x288 [btrfs]
      [  886.489542]  [<ffffffffa0521282>] btrfs_check_data_free_space+0x40/0x59 [btrfs]
      [  886.489542]  [<ffffffffa05228f5>] btrfs_delalloc_reserve_space+0x1e/0x4e [btrfs]
      [  886.489542]  [<ffffffffa053620a>] btrfs_direct_IO+0x10c/0x27e [btrfs]
      [  886.489542]  [<ffffffff8111d9a1>] generic_file_direct_write+0xb3/0x128
      [  886.489542]  [<ffffffffa05463c3>] btrfs_file_write_iter+0x229/0x408 [btrfs]
      [  886.489542]  [<ffffffff8108ae38>] ? __lock_is_held+0x38/0x50
      [  886.489542]  [<ffffffff8117279e>] __vfs_write+0x7c/0xa5
      [  886.489542]  [<ffffffff81172cda>] vfs_write+0xa0/0xe4
      [  886.489542]  [<ffffffff811734cc>] SyS_write+0x50/0x7e
      [  886.489542]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
      [ 1081.852335] INFO: task fio:8244 blocked for more than 120 seconds.
      [ 1081.854348]       Not tainted 4.4.0-rc6-btrfs-next-18+ #1
      [ 1081.857560] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 1081.863227] fio        D ffff880213f9bb28     0  8244   8240 0x00000000
      [ 1081.868719]  ffff880213f9bb28 00ffffff810fc6b0 ffffffff0000000a ffff88023ed55240
      [ 1081.872499]  ffff880206b5d400 ffff880213f9c000 ffff88020a4d5318 ffff880206b5d400
      [ 1081.876834]  ffffffff00000001 ffff880206b5d400 ffff880213f9bb40 ffffffff81482ba4
      [ 1081.880782] Call Trace:
      [ 1081.881793]  [<ffffffff81482ba4>] schedule+0x7f/0x97
      [ 1081.883340]  [<ffffffff81485eb5>] rwsem_down_write_failed+0x2d5/0x325
      [ 1081.895525]  [<ffffffff8108d48d>] ? trace_hardirqs_on_caller+0x16/0x1ab
      [ 1081.897419]  [<ffffffff81269723>] call_rwsem_down_write_failed+0x13/0x20
      [ 1081.899251]  [<ffffffff81269723>] ? call_rwsem_down_write_failed+0x13/0x20
      [ 1081.901063]  [<ffffffff81089fae>] ? __down_write_nested.isra.0+0x1f/0x21
      [ 1081.902365]  [<ffffffff814855bd>] down_write+0x43/0x57
      [ 1081.903846]  [<ffffffffa05211b0>] ? btrfs_alloc_data_chunk_ondemand+0x1f6/0x288 [btrfs]
      [ 1081.906078]  [<ffffffffa05211b0>] btrfs_alloc_data_chunk_ondemand+0x1f6/0x288 [btrfs]
      [ 1081.908846]  [<ffffffff8108d461>] ? mark_held_locks+0x56/0x6c
      [ 1081.910409]  [<ffffffffa0521282>] btrfs_check_data_free_space+0x40/0x59 [btrfs]
      [ 1081.912482]  [<ffffffffa05228f5>] btrfs_delalloc_reserve_space+0x1e/0x4e [btrfs]
      [ 1081.914597]  [<ffffffffa053620a>] btrfs_direct_IO+0x10c/0x27e [btrfs]
      [ 1081.919037]  [<ffffffff8111d9a1>] generic_file_direct_write+0xb3/0x128
      [ 1081.920754]  [<ffffffffa05463c3>] btrfs_file_write_iter+0x229/0x408 [btrfs]
      [ 1081.922496]  [<ffffffff8108ae38>] ? __lock_is_held+0x38/0x50
      [ 1081.923922]  [<ffffffff8117279e>] __vfs_write+0x7c/0xa5
      [ 1081.925275]  [<ffffffff81172cda>] vfs_write+0xa0/0xe4
      [ 1081.926584]  [<ffffffff811734cc>] SyS_write+0x50/0x7e
      [ 1081.927968]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
      [ 1081.985293] INFO: lockdep is turned off.
      [ 1081.986132] INFO: task fio:8249 blocked for more than 120 seconds.
      [ 1081.987434]       Not tainted 4.4.0-rc6-btrfs-next-18+ #1
      [ 1081.988534] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 1081.990147] fio        D ffff880218febbb8     0  8249   8240 0x00000000
      [ 1081.991626]  ffff880218febbb8 00ffffff81486b8e ffff88020000000b ffff88023ed75240
      [ 1081.993258]  ffff8802120a9a00 ffff880218fec000 ffff88020a4d5318 ffff8802120a9a00
      [ 1081.994850]  ffffffff00000001 ffff8802120a9a00 ffff880218febbd0 ffffffff81482ba4
      [ 1081.996485] Call Trace:
      [ 1081.997037]  [<ffffffff81482ba4>] schedule+0x7f/0x97
      [ 1081.998017]  [<ffffffff81485eb5>] rwsem_down_write_failed+0x2d5/0x325
      [ 1081.999241]  [<ffffffff810852a5>] ? finish_wait+0x6d/0x76
      [ 1082.000306]  [<ffffffff81269723>] call_rwsem_down_write_failed+0x13/0x20
      [ 1082.001533]  [<ffffffff81269723>] ? call_rwsem_down_write_failed+0x13/0x20
      [ 1082.002776]  [<ffffffff81089fae>] ? __down_write_nested.isra.0+0x1f/0x21
      [ 1082.003995]  [<ffffffff814855bd>] down_write+0x43/0x57
      [ 1082.005000]  [<ffffffffa05211b0>] ? btrfs_alloc_data_chunk_ondemand+0x1f6/0x288 [btrfs]
      [ 1082.007403]  [<ffffffffa05211b0>] btrfs_alloc_data_chunk_ondemand+0x1f6/0x288 [btrfs]
      [ 1082.008988]  [<ffffffffa0545064>] btrfs_fallocate+0x7c1/0xc2f [btrfs]
      [ 1082.010193]  [<ffffffff8108a1ba>] ? percpu_down_read+0x4e/0x77
      [ 1082.011280]  [<ffffffff81174c4c>] ? __sb_start_write+0x5f/0xb0
      [ 1082.012265]  [<ffffffff81174c4c>] ? __sb_start_write+0x5f/0xb0
      [ 1082.013021]  [<ffffffff811712e4>] vfs_fallocate+0x170/0x1ff
      [ 1082.013738]  [<ffffffff81181ebb>] ioctl_preallocate+0x89/0x9b
      [ 1082.014778]  [<ffffffff811822d7>] do_vfs_ioctl+0x40a/0x4ea
      [ 1082.015778]  [<ffffffff81176ea7>] ? SYSC_newfstat+0x25/0x2e
      [ 1082.016806]  [<ffffffff8118b4de>] ? __fget_light+0x4d/0x71
      [ 1082.017789]  [<ffffffff8118240e>] SyS_ioctl+0x57/0x79
      [ 1082.018706]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
      
      This happens because we can recursively acquire the semaphore
      fs_info->delayed_iput_sem when attempting to allocate space to satisfy
      a file write request as shown in the first trace above - when committing
      a transaction we acquire (down_read) the semaphore before running the
      delayed iputs, and when running a delayed iput() we can end up calling
      an inode's eviction handler, which in turn commits another transaction
      and attempts to acquire (down_read) again the semaphore to run more
      delayed iput operations.
      This results in a deadlock because if a task acquires multiple times a
      semaphore it should invoke down_read_nested() with a different lockdep
      class for each level of recursion.
      
      Fix this by simplifying the implementation and use a mutex instead that
      is acquired by the cleaner kthread before it runs the delayed iputs
      instead of always acquiring a semaphore before delayed references are
      run from anywhere.
      
      Fixes: d7c15171 (btrfs: Fix NO_SPACE bug caused by delayed-iput)
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2232d6dc
    • Geert Uytterhoeven's avatar
      drivers: sh: Restore legacy clock domain on SuperH platforms · ad1d8666
      Geert Uytterhoeven authored
      commit 0378ba48 upstream.
      
      CONFIG_ARCH_SHMOBILE is not only enabled for Renesas ARM platforms
      (which are DT based and multi-platform), but also on a select set of
      Renesas SuperH platforms (SH7722/SH7723/SH7724/SH7343/SH7366). Hence
      since commit 0ba58de2 ("drivers: sh: Get rid of
      CONFIG_ARCH_SHMOBILE_MULTI"), the legacy clock domain is no longer
      installed on these SuperH platforms, and module clocks may not be
      enabled when needed, leading to driver failures.
      
      To fix this, add an additional check for CONFIG_OF.
      
      Fixes: 0ba58de2 ("drivers: sh: Get rid of CONFIG_ARCH_SHMOBILE_MULTI").
      Signed-off-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: default avatarSimon Horman <horms+renesas@verge.net.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ad1d8666
    • Al Viro's avatar
      use ->d_seq to get coherency between ->d_inode and ->d_flags · 53729dbb
      Al Viro authored
      commit a528aca7 upstream.
      
      Games with ordering and barriers are way too brittle.  Just
      bump ->d_seq before and after updating ->d_inode and ->d_flags
      type bits, so that verifying ->d_seq would guarantee they are
      coherent.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      53729dbb