1. 26 Jun, 2017 32 commits
    • Johannes Thumshirn's avatar
      scsi: qla2xxx: don't disable a not previously enabled PCI device · eecbbd83
      Johannes Thumshirn authored
      [ Upstream commit ddff7ed4 ]
      
      When pci_enable_device() or pci_enable_device_mem() fail in
      qla2x00_probe_one() we bail out but do a call to
      pci_disable_device(). This causes the dev_WARN_ON() in
      pci_disable_device() to trigger, as the device wasn't enabled
      previously.
      
      So instead of taking the 'probe_out' error path we can directly return
      *iff* one of the pci_enable_device() calls fails.
      
      Additionally rename the 'probe_out' goto label's name to the more
      descriptive 'disable_device'.
      Signed-off-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Fixes: e315cd28 ("[SCSI] qla2xxx: Code changes for qla data structure refactoring")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: default avatarGiridhar Malavali <giridhar.malavali@cavium.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      eecbbd83
    • Marc Zyngier's avatar
      KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages · 4a213a0f
      Marc Zyngier authored
      [ Upstream commit d6dbdd3c ]
      
      Under memory pressure, we start ageing pages, which amounts to parsing
      the page tables. Since we don't want to allocate any extra level,
      we pass NULL for our private allocation cache. Which means that
      stage2_get_pud() is allowed to fail. This results in the following
      splat:
      
      [ 1520.409577] Unable to handle kernel NULL pointer dereference at virtual address 00000008
      [ 1520.417741] pgd = ffff810f52fef000
      [ 1520.421201] [00000008] *pgd=0000010f636c5003, *pud=0000010f56f48003, *pmd=0000000000000000
      [ 1520.429546] Internal error: Oops: 96000006 [#1] PREEMPT SMP
      [ 1520.435156] Modules linked in:
      [ 1520.438246] CPU: 15 PID: 53550 Comm: qemu-system-aar Tainted: G        W       4.12.0-rc4-00027-g1885c397eaec #7205
      [ 1520.448705] Hardware name: FOXCONN R2-1221R-A4/C2U4N_MB, BIOS G31FB12A 10/26/2016
      [ 1520.463726] task: ffff800ac5fb4e00 task.stack: ffff800ce04e0000
      [ 1520.469666] PC is at stage2_get_pmd+0x34/0x110
      [ 1520.474119] LR is at kvm_age_hva_handler+0x44/0xf0
      [ 1520.478917] pc : [<ffff0000080b137c>] lr : [<ffff0000080b149c>] pstate: 40000145
      [ 1520.486325] sp : ffff800ce04e33d0
      [ 1520.489644] x29: ffff800ce04e33d0 x28: 0000000ffff40064
      [ 1520.494967] x27: 0000ffff27e00000 x26: 0000000000000000
      [ 1520.500289] x25: ffff81051ba65008 x24: 0000ffff40065000
      [ 1520.505618] x23: 0000ffff40064000 x22: 0000000000000000
      [ 1520.510947] x21: ffff810f52b20000 x20: 0000000000000000
      [ 1520.516274] x19: 0000000058264000 x18: 0000000000000000
      [ 1520.521603] x17: 0000ffffa6fe7438 x16: ffff000008278b70
      [ 1520.526940] x15: 000028ccd8000000 x14: 0000000000000008
      [ 1520.532264] x13: ffff7e0018298000 x12: 0000000000000002
      [ 1520.537582] x11: ffff000009241b93 x10: 0000000000000940
      [ 1520.542908] x9 : ffff0000092ef800 x8 : 0000000000000200
      [ 1520.548229] x7 : ffff800ce04e36a8 x6 : 0000000000000000
      [ 1520.553552] x5 : 0000000000000001 x4 : 0000000000000000
      [ 1520.558873] x3 : 0000000000000000 x2 : 0000000000000008
      [ 1520.571696] x1 : ffff000008fd5000 x0 : ffff0000080b149c
      [ 1520.577039] Process qemu-system-aar (pid: 53550, stack limit = 0xffff800ce04e0000)
      [...]
      [ 1521.510735] [<ffff0000080b137c>] stage2_get_pmd+0x34/0x110
      [ 1521.516221] [<ffff0000080b149c>] kvm_age_hva_handler+0x44/0xf0
      [ 1521.522054] [<ffff0000080b0610>] handle_hva_to_gpa+0xb8/0xe8
      [ 1521.527716] [<ffff0000080b3434>] kvm_age_hva+0x44/0xf0
      [ 1521.532854] [<ffff0000080a58b0>] kvm_mmu_notifier_clear_flush_young+0x70/0xc0
      [ 1521.539992] [<ffff000008238378>] __mmu_notifier_clear_flush_young+0x88/0xd0
      [ 1521.546958] [<ffff00000821eca0>] page_referenced_one+0xf0/0x188
      [ 1521.552881] [<ffff00000821f36c>] rmap_walk_anon+0xec/0x250
      [ 1521.558370] [<ffff000008220f78>] rmap_walk+0x78/0xa0
      [ 1521.563337] [<ffff000008221104>] page_referenced+0x164/0x180
      [ 1521.569002] [<ffff0000081f1af0>] shrink_active_list+0x178/0x3b8
      [ 1521.574922] [<ffff0000081f2058>] shrink_node_memcg+0x328/0x600
      [ 1521.580758] [<ffff0000081f23f4>] shrink_node+0xc4/0x328
      [ 1521.585986] [<ffff0000081f2718>] do_try_to_free_pages+0xc0/0x340
      [ 1521.592000] [<ffff0000081f2a64>] try_to_free_pages+0xcc/0x240
      [...]
      
      The trivial fix is to handle this NULL pud value early, rather than
      dereferencing it blindly.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: default avatarChristoffer Dall <cdall@linaro.org>
      Signed-off-by: default avatarChristoffer Dall <cdall@linaro.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      4a213a0f
    • Jeff Mahoney's avatar
      btrfs: fix memory leak in update_space_info failure path · 951269f9
      Jeff Mahoney authored
      [ Upstream commit 896533a7 ]
      
      If we fail to add the space_info kobject, we'll leak the memory
      for the percpu counter.
      
      Fixes: 6ab0a202 (btrfs: publish allocation data in sysfs)
      Cc: <stable@vger.kernel.org> # v3.14+
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      951269f9
    • David Sterba's avatar
      btrfs: use correct types for page indices in btrfs_page_exists_in_range · d42014c8
      David Sterba authored
      [ Upstream commit cc2b702c ]
      
      Variables start_idx and end_idx are supposed to hold a page index
      derived from the file offsets. The int type is not the right one though,
      offsets larger than 1 << 44 will get silently trimmed off the high bits.
      (1 << 44 is 16TiB)
      
      What can go wrong, if start is below the boundary and end gets trimmed:
      - if there's a page after start, we'll find it (radix_tree_gang_lookup_slot)
      - the final check "if (page->index <= end_idx)" will unexpectedly fail
      
      The function will return false, ie. "there's no page in the range",
      although there is at least one.
      
      btrfs_page_exists_in_range is used to prevent races in:
      
      * in hole punching, where we make sure there are not pages in the
        truncated range, otherwise we'll wait for them to finish and redo
        truncation, but we're going to replace the pages with holes anyway so
        the only problem is the intermediate state
      
      * lock_extent_direct: we want to make sure there are no pages before we
        lock and start DIO, to prevent stale data reads
      
      For practical occurence of the bug, there are several constaints.  The
      file must be quite large, the affected range must cross the 16TiB
      boundary and the internal state of the file pages and pending operations
      must match.  Also, we must not have started any ordered data in the
      range, otherwise we don't even reach the buggy function check.
      
      DIO locking tries hard in several places to avoid deadlocks with
      buffered IO and avoids waiting for ranges. The worst consequence seems
      to be stale data read.
      
      CC: Liu Bo <bo.li.liu@oracle.com>
      CC: stable@vger.kernel.org	# 3.16+
      Fixes: fc4adbff ("btrfs: Drop EXTENT_UPTODATE check in hole punching and direct locking")
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      d42014c8
    • Frederic Barrat's avatar
      cxl: Fix error path on bad ioctl · cc558c20
      Frederic Barrat authored
      [ Upstream commit cec422c1 ]
      
      Fix error path if we can't copy user structure on CXL_IOCTL_START_WORK
      ioctl. We shouldn't unlock the context status mutex as it was not
      locked (yet).
      
      Fixes: 0712dc7e ("cxl: Fix issues when unmapping contexts")
      Cc: stable@vger.kernel.org # v3.19+
      Signed-off-by: default avatarFrederic Barrat <fbarrat@linux.vnet.ibm.com>
      Reviewed-by: default avatarVaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Reviewed-by: default avatarAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      cc558c20
    • Al Viro's avatar
      ufs: set correct ->s_maxsize · c58e11d1
      Al Viro authored
      [ Upstream commit 6b0d144f ]
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      c58e11d1
    • Al Viro's avatar
      fix ufs_isblockset() · 7ba100d5
      Al Viro authored
      [ Upstream commit 414cf718 ]
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      7ba100d5
    • Tejun Heo's avatar
      cpuset: consider dying css as offline · 7f805350
      Tejun Heo authored
      [ Upstream commit 41c25707 ]
      
      In most cases, a cgroup controller don't care about the liftimes of
      cgroups.  For the controller, a css becomes online when ->css_online()
      is called on it and offline when ->css_offline() is called.
      
      However, cpuset is special in that the user interface it exposes cares
      whether certain cgroups exist or not.  Combined with the RCU delay
      between cgroup removal and css offlining, this can lead to user
      visible behavior oddities where operations which should succeed after
      cgroup removals fail for some time period.  The effects of cgroup
      removals are delayed when seen from userland.
      
      This patch adds css_is_dying() which tests whether offline is pending
      and updates is_cpuset_online() so that the function returns false also
      while offline is pending.  This gets rid of the userland visible
      delays.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarDaniel Jordan <daniel.m.jordan@oracle.com>
      Link: http://lkml.kernel.org/r/327ca1f5-7957-fbb9-9e5f-9ba149d40ba2@oracle.com
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      7f805350
    • Matt Ranostay's avatar
      iio: proximity: as3935: fix AS3935_INT mask · 51037ec2
      Matt Ranostay authored
      [ Upstream commit 275292d3 ]
      
      AS3935 interrupt mask has been incorrect so valid lightning events
      would never trigger an buffer event. Also noise interrupt should be
      BIT(0).
      
      Fixes: 24ddb0e4 ("iio: Add AS3935 lightning sensor support")
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarMatt Ranostay <matt.ranostay@konsulko.com>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      51037ec2
    • Oleg Drokin's avatar
      staging/lustre/lov: remove set_fs() call from lov_getstripe() · 60e9d774
      Oleg Drokin authored
      [ Upstream commit 0a33252e ]
      
      lov_getstripe() calls set_fs(KERNEL_DS) so that it can handle a struct
      lov_user_md pointer from user- or kernel-space.  This changes the
      behavior of copy_from_user() on SPARC and may result in a misaligned
      access exception which in turn oopses the kernel.  In fact the
      relevant argument to lov_getstripe() is never called with a
      kernel-space pointer and so changing the address limits is unnecessary
      and so we remove the calls to save, set, and restore the address
      limits.
      Signed-off-by: default avatarJohn L. Hammond <john.hammond@intel.com>
      Reviewed-on: http://review.whamcloud.com/6150
      Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3221Reviewed-by: default avatarAndreas Dilger <andreas.dilger@intel.com>
      Reviewed-by: default avatarLi Wei <wei.g.li@intel.com>
      Signed-off-by: default avatarOleg Drokin <green@linuxhacker.ru>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      60e9d774
    • Michael Thalmeier's avatar
      usb: chipidea: debug: check before accessing ci_role · 6f4f7e81
      Michael Thalmeier authored
      [ Upstream commit 0340ff83 ]
      
      ci_role BUGs when the role is >= CI_ROLE_END.
      
      Cc: stable@vger.kernel.org  #v3.10+
      Signed-off-by: default avatarMichael Thalmeier <michael.thalmeier@hale.at>
      Signed-off-by: default avatarPeter Chen <peter.chen@nxp.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      6f4f7e81
    • Jisheng Zhang's avatar
      usb: chipidea: udc: fix NULL pointer dereference if udc_start failed · 9738b3df
      Jisheng Zhang authored
      [ Upstream commit aa1f058d ]
      
      Fix below NULL pointer dereference. we set ci->roles[CI_ROLE_GADGET]
      too early in ci_hdrc_gadget_init(), if udc_start() fails due to some
      reason, the ci->roles[CI_ROLE_GADGET] check in  ci_hdrc_gadget_destroy
      can't protect us.
      
      We fix this issue by only setting ci->roles[CI_ROLE_GADGET] if
      udc_start() succeed.
      
      [    1.398550] Unable to handle kernel NULL pointer dereference at
      virtual address 00000000
      ...
      [    1.448600] PC is at dma_pool_free+0xb8/0xf0
      [    1.453012] LR is at dma_pool_free+0x28/0xf0
      [    2.113369] [<ffffff80081817d8>] dma_pool_free+0xb8/0xf0
      [    2.118857] [<ffffff800841209c>] destroy_eps+0x4c/0x68
      [    2.124165] [<ffffff8008413770>] ci_hdrc_gadget_destroy+0x28/0x50
      [    2.130461] [<ffffff800840fa30>] ci_hdrc_probe+0x588/0x7e8
      [    2.136129] [<ffffff8008380fb8>] platform_drv_probe+0x50/0xb8
      [    2.142066] [<ffffff800837f494>] driver_probe_device+0x1fc/0x2a8
      [    2.148270] [<ffffff800837f68c>] __device_attach_driver+0x9c/0xf8
      [    2.154563] [<ffffff800837d570>] bus_for_each_drv+0x58/0x98
      [    2.160317] [<ffffff800837f174>] __device_attach+0xc4/0x138
      [    2.166072] [<ffffff800837f738>] device_initial_probe+0x10/0x18
      [    2.172185] [<ffffff800837e58c>] bus_probe_device+0x94/0xa0
      [    2.177940] [<ffffff800837c560>] device_add+0x3f0/0x560
      [    2.183337] [<ffffff8008380d20>] platform_device_add+0x180/0x240
      [    2.189541] [<ffffff800840f0e8>] ci_hdrc_add_device+0x440/0x4f8
      [    2.195654] [<ffffff8008414194>] ci_hdrc_usb2_probe+0x13c/0x2d8
      [    2.201769] [<ffffff8008380fb8>] platform_drv_probe+0x50/0xb8
      [    2.207705] [<ffffff800837f494>] driver_probe_device+0x1fc/0x2a8
      [    2.213910] [<ffffff800837f5ec>] __driver_attach+0xac/0xb0
      [    2.219575] [<ffffff800837d4b0>] bus_for_each_dev+0x60/0xa0
      [    2.225329] [<ffffff800837ec80>] driver_attach+0x20/0x28
      [    2.230816] [<ffffff800837e880>] bus_add_driver+0x1d0/0x238
      [    2.236571] [<ffffff800837fdb0>] driver_register+0x60/0xf8
      [    2.242237] [<ffffff8008380ef4>] __platform_driver_register+0x44/0x50
      [    2.248891] [<ffffff80086fd440>] ci_hdrc_usb2_driver_init+0x18/0x20
      [    2.255365] [<ffffff8008082950>] do_one_initcall+0x38/0x128
      [    2.261121] [<ffffff80086e0d00>] kernel_init_freeable+0x1ac/0x250
      [    2.267414] [<ffffff800852f0b8>] kernel_init+0x10/0x100
      [    2.272810] [<ffffff8008082680>] ret_from_fork+0x10/0x50
      
      Cc: stable <stable@vger.kernel.org>
      Fixes: 3f124d23 ("usb: chipidea: add role init and destroy APIs")
      Signed-off-by: default avatarJisheng Zhang <jszhang@marvell.com>
      Signed-off-by: default avatarPeter Chen <peter.chen@nxp.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      9738b3df
    • Thinh Nguyen's avatar
      usb: gadget: f_mass_storage: Serialize wake and sleep execution · db87e41d
      Thinh Nguyen authored
      [ Upstream commit dc9217b6 ]
      
      f_mass_storage has a memorry barrier issue with the sleep and wake
      functions that can cause a deadlock. This results in intermittent hangs
      during MSC file transfer. The host will reset the device after receiving
      no response to resume the transfer. This issue is seen when dwc3 is
      processing 2 transfer-in-progress events at the same time, invoking
      completion handlers for CSW and CBW. Also this issue occurs depending on
      the system timing and latency.
      
      To increase the chance to hit this issue, you can force dwc3 driver to
      wait and process those 2 events at once by adding a small delay (~100us)
      in dwc3_check_event_buf() whenever the request is for CSW and read the
      event count again. Avoid debugging with printk and ftrace as extra
      delays and memory barrier will mask this issue.
      
      Scenario which can lead to failure:
      -----------------------------------
      1) The main thread sleeps and waits for the next command in
         get_next_command().
      2) bulk_in_complete() wakes up main thread for CSW.
      3) bulk_out_complete() tries to wake up the running main thread for CBW.
      4) thread_wakeup_needed is not loaded with correct value in
         sleep_thread().
      5) Main thread goes to sleep again.
      
      The pattern is shown below. Note the 2 critical variables.
       * common->thread_wakeup_needed
       * bh->state
      
      	CPU 0 (sleep_thread)		CPU 1 (wakeup_thread)
      	==============================  ===============================
      
      					bh->state = BH_STATE_FULL;
      					smp_wmb();
      	thread_wakeup_needed = 0;	thread_wakeup_needed = 1;
      	smp_rmb();
      	if (bh->state != BH_STATE_FULL)
      		sleep again ...
      
      As pointed out by Alan Stern, this is an R-pattern issue. The issue can
      be seen when there are two wakeups in quick succession. The
      thread_wakeup_needed can be overwritten in sleep_thread, and the read of
      the bh->state maybe reordered before the write to thread_wakeup_needed.
      
      This patch applies full memory barrier smp_mb() in both sleep_thread()
      and wakeup_thread() to ensure the order which the thread_wakeup_needed
      and bh->state are written and loaded.
      
      However, a better solution in the future would be to use wait_queue
      method that takes care of managing memory barrier between waker and
      waiter.
      
      Cc: <stable@vger.kernel.org>
      Acked-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: default avatarThinh Nguyen <thinhn@synopsys.com>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      db87e41d
    • Konstantin Khlebnikov's avatar
      ext4: keep existing extra fields when inode expands · 92629579
      Konstantin Khlebnikov authored
      [ Upstream commit 887a9730 ]
      
      ext4_expand_extra_isize() should clear only space between old and new
      size.
      
      Fixes: 6dd4ee7c # v2.6.23
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      92629579
    • Jan Kara's avatar
      ext4: fix SEEK_HOLE · 4d1adc2a
      Jan Kara authored
      [ Upstream commit 7d95eddf ]
      
      Currently, SEEK_HOLE implementation in ext4 may both return that there's
      a hole at some offset although that offset already has data and skip
      some holes during a search for the next hole. The first problem is
      demostrated by:
      
      xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "seek -h 0" file
      wrote 57344/57344 bytes at offset 0
      56 KiB, 14 ops; 0.0000 sec (2.054 GiB/sec and 538461.5385 ops/sec)
      Whence	Result
      HOLE	0
      
      Where we can see that SEEK_HOLE wrongly returned offset 0 as containing
      a hole although we have written data there. The second problem can be
      demonstrated by:
      
      xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "pwrite 128k 8k"
             -c "seek -h 0" file
      
      wrote 57344/57344 bytes at offset 0
      56 KiB, 14 ops; 0.0000 sec (1.978 GiB/sec and 518518.5185 ops/sec)
      wrote 8192/8192 bytes at offset 131072
      8 KiB, 2 ops; 0.0000 sec (2 GiB/sec and 500000.0000 ops/sec)
      Whence	Result
      HOLE	139264
      
      Where we can see that hole at offsets 56k..128k has been ignored by the
      SEEK_HOLE call.
      
      The underlying problem is in the ext4_find_unwritten_pgoff() which is
      just buggy. In some cases it fails to update returned offset when it
      finds a hole (when no pages are found or when the first found page has
      higher index than expected), in some cases conditions for detecting hole
      are just missing (we fail to detect a situation where indices of
      returned pages are not contiguous).
      
      Fix ext4_find_unwritten_pgoff() to properly detect non-contiguous page
      indices and also handle all cases where we got less pages then expected
      in one place and handle it properly there.
      
      CC: stable@vger.kernel.org
      Fixes: c8c0df24
      CC: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      4d1adc2a
    • Wanpeng Li's avatar
      KVM: async_pf: avoid async pf injection when in guest mode · 8406f302
      Wanpeng Li authored
      [ Upstream commit 9bc1f09f ]
      
       INFO: task gnome-terminal-:1734 blocked for more than 120 seconds.
             Not tainted 4.12.0-rc4+ #8
       "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
       gnome-terminal- D    0  1734   1015 0x00000000
       Call Trace:
        __schedule+0x3cd/0xb30
        schedule+0x40/0x90
        kvm_async_pf_task_wait+0x1cc/0x270
        ? __vfs_read+0x37/0x150
        ? prepare_to_swait+0x22/0x70
        do_async_page_fault+0x77/0xb0
        ? do_async_page_fault+0x77/0xb0
        async_page_fault+0x28/0x30
      
      This is triggered by running both win7 and win2016 on L1 KVM simultaneously,
      and then gives stress to memory on L1, I can observed this hang on L1 when
      at least ~70% swap area is occupied on L0.
      
      This is due to async pf was injected to L2 which should be injected to L1,
      L2 guest starts receiving pagefault w/ bogus %cr2(apf token from the host
      actually), and L1 guest starts accumulating tasks stuck in D state in
      kvm_async_pf_task_wait() since missing PAGE_READY async_pfs.
      
      This patch fixes the hang by doing async pf when executing L1 guest.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      8406f302
    • Marc Zyngier's avatar
      arm: KVM: Allow unaligned accesses at HYP · fdb67b2a
      Marc Zyngier authored
      [ Upstream commit 33b5c388 ]
      
      We currently have the HSCTLR.A bit set, trapping unaligned accesses
      at HYP, but we're not really prepared to deal with it.
      
      Since the rest of the kernel is pretty happy about that, let's follow
      its example and set HSCTLR.A to zero. Modern CPUs don't really care.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarChristoffer Dall <cdall@linaro.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      fdb67b2a
    • Wanpeng Li's avatar
      KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation · 1e8dabb6
      Wanpeng Li authored
      [ Upstream commit a3641631 ]
      
      If "i" is the last element in the vcpu->arch.cpuid_entries[] array, it
      potentially can be exploited the vulnerability. this will out-of-bounds
      read and write.  Luckily, the effect is small:
      
      	/* when no next entry is found, the current entry[i] is reselected */
      	for (j = i + 1; ; j = (j + 1) % nent) {
      		struct kvm_cpuid_entry2 *ej = &vcpu->arch.cpuid_entries[j];
      		if (ej->function == e->function) {
      
      It reads ej->maxphyaddr, which is user controlled.  However...
      
      			ej->flags |= KVM_CPUID_FLAG_STATE_READ_NEXT;
      
      After cpuid_entries there is
      
      	int maxphyaddr;
      	struct x86_emulate_ctxt emulate_ctxt;  /* 16-byte aligned */
      
      So we have:
      
      - cpuid_entries at offset 1B50 (6992)
      - maxphyaddr at offset 27D0 (6992 + 3200 = 10192)
      - padding at 27D4...27DF
      - emulate_ctxt at 27E0
      
      And it writes in the padding.  Pfew, writing the ops field of emulate_ctxt
      would have been much worse.
      
      This patch fixes it by modding the index to avoid the out-of-bounds
      access. Worst case, i == j and ej->function == e->function,
      the loop can bail out.
      Reported-by: default avatarMoguofang <moguofang@huawei.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Guofang Mo <moguofang@huawei.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      1e8dabb6
    • Paolo Bonzini's avatar
      kvm: async_pf: fix rcu_irq_enter() with irqs enabled · 702eb8d2
      Paolo Bonzini authored
      [ Upstream commit bbaf0e2b ]
      
      native_safe_halt enables interrupts, and you just shouldn't
      call rcu_irq_enter() with interrupts enabled.  Reorder the
      call with the following local_irq_disable() to respect the
      invariant.
      Reported-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Acked-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      702eb8d2
    • J. Bruce Fields's avatar
      nfsd4: fix null dereference on replay · 4b1bf4b0
      J. Bruce Fields authored
      [ Upstream commit 9a307403 ]
      
      if we receive a compound such that:
      
      	- the sessionid, slot, and sequence number in the SEQUENCE op
      	  match a cached succesful reply with N ops, and
      	- the Nth operation of the compound is a PUTFH, PUTPUBFH,
      	  PUTROOTFH, or RESTOREFH,
      
      then nfsd4_sequence will return 0 and set cstate->status to
      nfserr_replay_cache.  The current filehandle will not be set.  This will
      cause us to call check_nfsd_access with first argument NULL.
      
      To nfsd4_compound it looks like we just succesfully executed an
      operation that set a filehandle, but the current filehandle is not set.
      
      Fix this by moving the nfserr_replay_cache earlier.  There was never any
      reason to have it after the encode_op label, since the only case where
      he hit that is when opdesc->op_func sets it.
      
      Note that there are two ways we could hit this case:
      
      	- a client is resending a previously sent compound that ended
      	  with one of the four PUTFH-like operations, or
      	- a client is sending a *new* compound that (incorrectly) shares
      	  sessionid, slot, and sequence number with a previously sent
      	  compound, and the length of the previously sent compound
      	  happens to match the position of a PUTFH-like operation in the
      	  new compound.
      
      The second is obviously incorrect client behavior.  The first is also
      very strange--the only purpose of a PUTFH-like operation is to set the
      current filehandle to be used by the following operation, so there's no
      point in having it as the last in a compound.
      
      So it's likely this requires a buggy or malicious client to reproduce.
      Reported-by: default avatarScott Mayhew <smayhew@redhat.com>
      Cc: stable@kernel.vger.org
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      4b1bf4b0
    • Gilad Ben-Yossef's avatar
      crypto: gcm - wait for crypto op not signal safe · 026ed759
      Gilad Ben-Yossef authored
      [ Upstream commit f3ad5870 ]
      
      crypto_gcm_setkey() was using wait_for_completion_interruptible() to
      wait for completion of async crypto op but if a signal occurs it
      may return before DMA ops of HW crypto provider finish, thus
      corrupting the data buffer that is kfree'ed in this case.
      
      Resolve this by using wait_for_completion() instead.
      Reported-by: default avatarEric Biggers <ebiggers3@gmail.com>
      Signed-off-by: default avatarGilad Ben-Yossef <gilad@benyossef.com>
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      026ed759
    • Eric Biggers's avatar
      KEYS: fix freeing uninitialized memory in key_update() · e02ed52d
      Eric Biggers authored
      [ Upstream commit 63a0b050 ]
      
      key_update() freed the key_preparsed_payload even if it was not
      initialized first.  This would cause a crash if userspace called
      keyctl_update() on a key with type like "asymmetric" that has a
      ->preparse() method but not an ->update() method.  Possibly it could
      even be triggered for other key types by racing with keyctl_setperm() to
      make the KEY_NEED_WRITE check fail (the permission was already checked,
      so normally it wouldn't fail there).
      
      Reproducer with key type "asymmetric", given a valid cert.der:
      
      keyctl new_session
      keyid=$(keyctl padd asymmetric desc @s < cert.der)
      keyctl setperm $keyid 0x3f000000
      keyctl update $keyid data
      
      [  150.686666] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
      [  150.687601] IP: asymmetric_key_free_kids+0x12/0x30
      [  150.688139] PGD 38a3d067
      [  150.688141] PUD 3b3de067
      [  150.688447] PMD 0
      [  150.688745]
      [  150.689160] Oops: 0000 [#1] SMP
      [  150.689455] Modules linked in:
      [  150.689769] CPU: 1 PID: 2478 Comm: keyctl Not tainted 4.11.0-rc4-xfstests-00187-ga9f6b6b8 #742
      [  150.690916] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
      [  150.692199] task: ffff88003b30c480 task.stack: ffffc90000350000
      [  150.692952] RIP: 0010:asymmetric_key_free_kids+0x12/0x30
      [  150.693556] RSP: 0018:ffffc90000353e58 EFLAGS: 00010202
      [  150.694142] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000004
      [  150.694845] RDX: ffffffff81ee3920 RSI: ffff88003d4b0700 RDI: 0000000000000001
      [  150.697569] RBP: ffffc90000353e60 R08: ffff88003d5d2140 R09: 0000000000000000
      [  150.702483] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
      [  150.707393] R13: 0000000000000004 R14: ffff880038a4d2d8 R15: 000000000040411f
      [  150.709720] FS:  00007fcbcee35700(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
      [  150.711504] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  150.712733] CR2: 0000000000000001 CR3: 0000000039eab000 CR4: 00000000003406e0
      [  150.714487] Call Trace:
      [  150.714975]  asymmetric_key_free_preparse+0x2f/0x40
      [  150.715907]  key_update+0xf7/0x140
      [  150.716560]  ? key_default_cmp+0x20/0x20
      [  150.717319]  keyctl_update_key+0xb0/0xe0
      [  150.718066]  SyS_keyctl+0x109/0x130
      [  150.718663]  entry_SYSCALL_64_fastpath+0x1f/0xc2
      [  150.719440] RIP: 0033:0x7fcbce75ff19
      [  150.719926] RSP: 002b:00007ffd5d167088 EFLAGS: 00000206 ORIG_RAX: 00000000000000fa
      [  150.720918] RAX: ffffffffffffffda RBX: 0000000000404d80 RCX: 00007fcbce75ff19
      [  150.721874] RDX: 00007ffd5d16785e RSI: 000000002866cd36 RDI: 0000000000000002
      [  150.722827] RBP: 0000000000000006 R08: 000000002866cd36 R09: 00007ffd5d16785e
      [  150.723781] R10: 0000000000000004 R11: 0000000000000206 R12: 0000000000404d80
      [  150.724650] R13: 00007ffd5d16784d R14: 00007ffd5d167238 R15: 000000000040411f
      [  150.725447] Code: 83 c4 08 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 85 ff 74 23 55 48 89 e5 53 48 89 fb <48> 8b 3f e8 06 21 c5 ff 48 8b 7b 08 e8 fd 20 c5 ff 48 89 df e8
      [  150.727489] RIP: asymmetric_key_free_kids+0x12/0x30 RSP: ffffc90000353e58
      [  150.728117] CR2: 0000000000000001
      [  150.728430] ---[ end trace f7f8fe1da2d5ae8d ]---
      
      Fixes: 4d8c0250 ("KEYS: Call ->free_preparse() even after ->preparse() returns an error")
      Cc: stable@vger.kernel.org # 3.17+
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarJames Morris <james.l.morris@oracle.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      e02ed52d
    • Eric W. Biederman's avatar
      ptrace: Properly initialize ptracer_cred on fork · a38f69cb
      Eric W. Biederman authored
      [ Upstream commit c70d9d80 ]
      
      When I introduced ptracer_cred I failed to consider the weirdness of
      fork where the task_struct copies the old value by default.  This
      winds up leaving ptracer_cred set even when a process forks and
      the child process does not wind up being ptraced.
      
      Because ptracer_cred is not set on non-ptraced processes whose
      parents were ptraced this has broken the ability of the enlightenment
      window manager to start setuid children.
      
      Fix this by properly initializing ptracer_cred in ptrace_init_task
      
      This must be done with a little bit of care to preserve the current value
      of ptracer_cred when ptrace carries through fork.  Re-reading the
      ptracer_cred from the ptracing process at this point is inconsistent
      with how PT_PTRACE_CAP has been maintained all of these years.
      Tested-by: default avatarTakashi Iwai <tiwai@suse.de>
      Fixes: 64b875f7 ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP")
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      a38f69cb
    • Jane Chu's avatar
      arch/sparc: support NR_CPUS = 4096 · 94d53c50
      Jane Chu authored
      [ Upstream commit c79a1373 ]
      
      Linux SPARC64 limits NR_CPUS to 4064 because init_cpu_send_mondo_info()
      only allocates a single page for NR_CPUS mondo entries. Thus we cannot
      use all 4096 CPUs on some SPARC platforms.
      
      To fix, allocate (2^order) pages where order is set according to the size
      of cpu_list for possible cpus. Since cpu_list_pa and cpu_mondo_block_pa
      are not used in asm code, there are no imm13 offsets from the base PA
      that will break because they can only reach one page.
      
      Orabug: 25505750
      Signed-off-by: default avatarJane Chu <jane.chu@oracle.com>
      Reviewed-by: default avatarBob Picco <bob.picco@oracle.com>
      Reviewed-by: default avatarAtish Patra <atish.patra@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      94d53c50
    • Pavel Tatashin's avatar
      sparc64: delete old wrap code · 252bf31f
      Pavel Tatashin authored
      [ Upstream commit 0197e41c ]
      
      The old method that is using xcall and softint to get new context id is
      deleted, as it is replaced by a method of using per_cpu_secondary_mm
      without xcall to perform the context wrap.
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: default avatarBob Picco <bob.picco@oracle.com>
      Reviewed-by: default avatarSteven Sistare <steven.sistare@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      252bf31f
    • Pavel Tatashin's avatar
      sparc64: new context wrap · 0837a048
      Pavel Tatashin authored
      [ Upstream commit a0582f26 ]
      
      The current wrap implementation has a race issue: it is called outside of
      the ctx_alloc_lock, and also does not wait for all CPUs to complete the
      wrap.  This means that a thread can get a new context with a new version
      and another thread might still be running with the same context. The
      problem is especially severe on CPUs with shared TLBs, like sun4v. I used
      the following test to very quickly reproduce the problem:
      - start over 8K processes (must be more than context IDs)
      - write and read values at a  memory location in every process.
      
      Very quickly memory corruptions start happening, and what we read back
      does not equal what we wrote.
      
      Several approaches were explored before settling on this one:
      
      Approach 1:
      Move smp_new_mmu_context_version() inside ctx_alloc_lock, and wait for
      every process to complete the wrap. (Note: every CPU must WAIT before
      leaving smp_new_mmu_context_version_client() until every one arrives).
      
      This approach ends up with deadlocks, as some threads own locks which other
      threads are waiting for, and they never receive softint until these threads
      exit smp_new_mmu_context_version_client(). Since we do not allow the exit,
      deadlock happens.
      
      Approach 2:
      Handle wrap right during mondo interrupt. Use etrap/rtrap to enter into
      into C code, and issue new versions to every CPU.
      This approach adds some overhead to runtime: in switch_mm() we must add
      some checks to make sure that versions have not changed due to wrap while
      we were loading the new secondary context. (could be protected by PSTATE_IE
      but that degrades performance as on M7 and older CPUs as it takes 50 cycles
      for each access). Also, we still need a global per-cpu array of MMs to know
      where we need to load new contexts, otherwise we can change context to a
      thread that is going way (if we received mondo between switch_mm() and
      switch_to() time). Finally, there are some issues with window registers in
      rtrap() when context IDs are changed during CPU mondo time.
      
      The approach in this patch is the simplest and has almost no impact on
      runtime.  We use the array with mm's where last secondary contexts were
      loaded onto CPUs and bump their versions to the new generation without
      changing context IDs. If a new process comes in to get a context ID, it
      will go through get_new_mmu_context() because of version mismatch. But the
      running processes do not need to be interrupted. And wrap is quicker as we
      do not need to xcall and wait for everyone to receive and complete wrap.
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: default avatarBob Picco <bob.picco@oracle.com>
      Reviewed-by: default avatarSteven Sistare <steven.sistare@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      0837a048
    • Pavel Tatashin's avatar
      sparc64: add per-cpu mm of secondary contexts · 169dc5fd
      Pavel Tatashin authored
      [ Upstream commit 7a5b4bbf ]
      
      The new wrap is going to use information from this array to figure out
      mm's that currently have valid secondary contexts setup.
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: default avatarBob Picco <bob.picco@oracle.com>
      Reviewed-by: default avatarSteven Sistare <steven.sistare@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      169dc5fd
    • Pavel Tatashin's avatar
      sparc64: redefine first version · ccadb4e6
      Pavel Tatashin authored
      [ Upstream commit c4415235 ]
      
      CTX_FIRST_VERSION defines the first context version, but also it defines
      first context. This patch redefines it to only include the first context
      version.
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: default avatarBob Picco <bob.picco@oracle.com>
      Reviewed-by: default avatarSteven Sistare <steven.sistare@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      ccadb4e6
    • Pavel Tatashin's avatar
      sparc64: combine activate_mm and switch_mm · 5203c6c9
      Pavel Tatashin authored
      [ Upstream commit 14d0334c ]
      
      The only difference between these two functions is that in activate_mm we
      unconditionally flush context. However, there is no need to keep this
      difference after fixing a bug where cpumask was not reset on a wrap. So, in
      this patch we combine these.
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: default avatarBob Picco <bob.picco@oracle.com>
      Reviewed-by: default avatarSteven Sistare <steven.sistare@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      5203c6c9
    • Pavel Tatashin's avatar
      sparc64: reset mm cpumask after wrap · 317a4448
      Pavel Tatashin authored
      [ Upstream commit 58897485 ]
      
      After a wrap (getting a new context version) a process must get a new
      context id, which means that we would need to flush the context id from
      the TLB before running for the first time with this ID on every CPU. But,
      we use mm_cpumask to determine if this process has been running on this CPU
      before, and this mask is not reset after a wrap. So, there are two possible
      fixes for this issue:
      
      1. Clear mm cpumask whenever mm gets a new context id
      2. Unconditionally flush context every time process is running on a CPU
      
      This patch implements the first solution
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: default avatarBob Picco <bob.picco@oracle.com>
      Reviewed-by: default avatarSteven Sistare <steven.sistare@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      317a4448
    • James Clarke's avatar
      sparc: Machine description indices can vary · a2334e23
      James Clarke authored
      [ Upstream commit c982aa9c ]
      
      VIO devices were being looked up by their index in the machine
      description node block, but this often varies over time as devices are
      added and removed. Instead, store the ID and look up using the type,
      config handle and ID.
      Signed-off-by: default avatarJames Clarke <jrtc27@jrtc27.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112541Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      a2334e23
    • Mike Kravetz's avatar
      sparc64: mm: fix copy_tsb to correctly copy huge page TSBs · 8ee93884
      Mike Kravetz authored
      [ Upstream commit 654f4807 ]
      
      When a TSB grows beyond its current capacity, a new TSB is allocated
      and copy_tsb is called to copy entries from the old TSB to the new.
      A hash shift based on page size is used to calculate the index of an
      entry in the TSB.  copy_tsb has hard coded PAGE_SHIFT in these
      calculations.  However, for huge page TSBs the value REAL_HPAGE_SHIFT
      should be used.  As a result, when copy_tsb is called for a huge page
      TSB the entries are placed at the incorrect index in the newly
      allocated TSB.  When doing hardware table walk, the MMU does not
      match these entries and we end up in the TSB miss handling code.
      This code will then create and write an entry to the correct index
      in the TSB.  We take a performance hit for the table walk miss and
      recreation of these entries.
      
      Pass a new parameter to copy_tsb that is the page size shift to be
      used when copying the TSB.
      Suggested-by: default avatarAnthony Yznaga <anthony.yznaga@oracle.com>
      Signed-off-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      8ee93884
  2. 25 Jun, 2017 8 commits