1. 03 Jul, 2018 40 commits
    • RDMA/mlx4: Discard unknown SQP work requests · 786c8d79
      Leon Romanovsky authored
      commit 6b1ca7ec upstream.
      
      There is no need to crash the machine when an unknown work request is
      received on the SQP MAD path; discard it instead.
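
      A minimal user-space sketch of the idea. The opcode enum and the header
      builder are hypothetical (not mlx4 code), and returning -EINVAL is an
      assumption: an unrecognized opcode fails only the current post.

```c
#include <errno.h>

/* Hypothetical opcodes for this sketch; the driver uses enum ib_wr_opcode. */
enum sketch_wr_opcode {
	SKETCH_WR_SEND,
	SKETCH_WR_SEND_WITH_IMM,
	SKETCH_WR_RDMA_READ,	/* treated as "unknown" by this sketch */
};

/*
 * Hypothetical header builder for a special-QP (SQP) MAD send. A known
 * opcode fills in *has_imm and succeeds; anything else is discarded with
 * an error so that only this post fails, not the whole machine.
 */
static int sketch_build_sqp_header(enum sketch_wr_opcode op, int *has_imm)
{
	switch (op) {
	case SKETCH_WR_SEND:
		*has_imm = 0;
		return 0;
	case SKETCH_WR_SEND_WITH_IMM:
		*has_imm = 1;
		return 0;
	default:
		return -EINVAL;	/* unknown work request: reject, don't crash */
	}
}
```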
      
      Cc: <stable@vger.kernel.org> # 3.6
      Fixes: 37bfc7c1 ("IB/mlx4: SR-IOV multiplex and demultiplex MADs")
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • IB/hfi1: Fix user context tail allocation for DMA_RTAIL · a3369992
      Mike Marciniszyn authored
      commit 1bc0299d upstream.
      
      The following code fails to allocate a buffer for the
      tail address that the hardware DMAs into when the user
      context DMA_RTAIL is set.
      
      if (HFI1_CAP_KGET_MASK(rcd->flags, DMA_RTAIL)) {
      	rcd->rcvhdrtail_kvaddr = dma_zalloc_coherent(
      		&dd->pcidev->dev, PAGE_SIZE, &dma_hdrqtail,
      		gfp_flags);
      	if (!rcd->rcvhdrtail_kvaddr)
      		goto bail_free;
      	rcd->rcvhdrqtailaddr_dma = dma_hdrqtail;
      }
      
      As a result, rcvhdrtail_kvaddr is left NULL when DMA_RTAIL is set
      only for the user context.
      
      The mmap logic fails to check for a NULL rcvhdrtail_kvaddr.
      
      The fix is to test for both the user and kernel DMA_RTAIL options
      during the allocation, and to test for a NULL rcvhdrtail_kvaddr
      during mmap processing.
      
      Additionally, all downstream tests of the capmask for DMA_RTAIL
      have been replaced with tests of rcvhdrtail_kvaddr.
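
      A user-space sketch of the three rules above. The struct and helpers
      are hypothetical stand-ins (calloc() plays the role of the coherent
      DMA allocation), and the -EINVAL returned by the mmap check is an
      assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the receive context; not the hfi1 struct. */
struct sketch_rcd {
	bool kernel_dma_rtail;		/* kernel capability bit for DMA_RTAIL */
	bool user_dma_rtail;		/* user capability bit for DMA_RTAIL */
	void *rcvhdrtail_kvaddr;	/* buffer the hardware DMAs the tail into */
};

/* Allocate the tail buffer if either the kernel or the user option is set. */
static int sketch_alloc_rcvhdrtail(struct sketch_rcd *rcd, size_t size)
{
	rcd->rcvhdrtail_kvaddr = NULL;
	if (rcd->kernel_dma_rtail || rcd->user_dma_rtail) {
		rcd->rcvhdrtail_kvaddr = calloc(1, size);
		if (!rcd->rcvhdrtail_kvaddr)
			return -ENOMEM;
	}
	return 0;
}

/* mmap path: refuse to map a tail buffer that was never allocated. */
static int sketch_mmap_rcvhdrtail(const struct sketch_rcd *rcd)
{
	if (!rcd->rcvhdrtail_kvaddr)
		return -EINVAL;
	return 0;	/* a real driver would remap the buffer here */
}

/* Downstream code tests the pointer instead of the capability mask. */
static bool sketch_uses_dma_rtail(const struct sketch_rcd *rcd)
{
	return rcd->rcvhdrtail_kvaddr != NULL;
}
```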
      
      Cc: <stable@vger.kernel.org> # 4.9.x
      Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • IB/hfi1: Optimize kthread pointer locking when queuing CQ entries · 964705c4
      Sebastian Sanchez authored
      commit af8aab71 upstream.
      
      All threads queuing CQ entries on different CQs are unnecessarily
      synchronized by a spin lock that checks whether the CQ kthread worker
      has been destroyed before a CQ entry is queued.
      
      The lock used in 6efaf10f ("IB/rdmavt: Avoid queuing work into a
      destroyed cq kthread worker") is a device global lock and will have
      poor performance at scale as completions are entered from a large
      number of CPUs.
      
      Convert to RCU: rvt_cq_enter() acts as the RCU read side, checking
      that the worker is still alive before triggering the completion
      event, while rvt_driver_cq_init() and rvt_cq_exit() apply the
      write-side RCU semantics.
      
      Fixes: 6efaf10f ("IB/rdmavt: Avoid queuing work into a destroyed cq kthread worker")
      Cc: <stable@vger.kernel.org> # 4.14.x
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • IB/hfi1: Reorder incorrect send context disable · 2bd28cba
      Michael J. Ruhl authored
      commit a93a0a31 upstream.
      
      User send context integrity bits are cleared before the context is
      disabled.  If the send context is still processing data, any packets
      that need those integrity bits will cause an error and halt the send
      context.
      
      During the disable handling, the driver waits for the context to drain.
      A halted context never drains, so the driver eventually times out and
      then incorrectly bounces the link.
      
      Reorder the bit clearing and the context disable.
      
      Examine the software state and send context status as well as the
      egress status to determine if a send context is in the halted state.
      
      Promote the check macros to static functions for consistency with the
      new check and to follow kernel style.
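
      A sketch of a halted check written as a static function instead of a
      macro. The struct, the bit positions and the choice to treat any one
      source as sufficient (logical OR) are assumptions for illustration,
      not the hfi1 implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status bits; the real hfi1 register layouts differ. */
#define SKETCH_SC_STATUS_HALTED	(1u << 0)
#define SKETCH_EGRESS_HALTED	(1u << 3)

struct sketch_send_ctxt {
	bool sw_halted;		/* software state recorded by the driver */
	uint32_t sc_status;	/* send context status */
	uint32_t egress_status;	/* egress status */
};

/*
 * A static function rather than a macro: the argument is type-checked
 * and evaluated once. Here the context counts as halted if any of the
 * three sources reports it.
 */
static bool sketch_sc_is_halted(const struct sketch_send_ctxt *sc)
{
	return sc->sw_halted ||
	       (sc->sc_status & SKETCH_SC_STATUS_HALTED) ||
	       (sc->egress_status & SKETCH_EGRESS_HALTED);
}
```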
      
      Remove an unused define that refers to the egress timeout.
      
      Cc: <stable@vger.kernel.org> # 4.9.x
      Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • IB/hfi1: Fix fault injection init/exit issues · 9e81f9a2
      Mike Marciniszyn authored
      commit 8c79d822 upstream.
      
      There are config-dependent code paths that expose panics in the unload
      paths, both in this file and in debugfs_remove_recursive(), because
      CONFIG_FAULT_INJECTION and CONFIG_FAULT_INJECTION_DEBUG_FS can be
      set independently.
      
      Having CONFIG_FAULT_INJECTION set and CONFIG_FAULT_INJECTION_DEBUG_FS
      unset causes fault_create_debugfs_attr() to return an error.
      
      The debugfs.c routines tolerate failures, but module unload panics on
      a NULL dereference in the two exit routines. Even if that is fixed,
      the dir passed to debugfs_remove_recursive() comes from a memory
      location that was freed and potentially reused, causing a segfault or
      memory corruption.
      
      Here is an example of the NULL deref panic:
      
      [66866.286829] BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
      [66866.295602] IP: hfi1_dbg_ibdev_exit+0x2a/0x80 [hfi1]
      [66866.301138] PGD 858496067 P4D 858496067 PUD 8433a7067 PMD 0
      [66866.307452] Oops: 0000 [#1] SMP
      [66866.310953] Modules linked in: hfi1(-) rdmavt rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm iw_cm ib_cm ib_core rpcsec_gss_krb5 nfsv4 dns_resolver nfsv3 nfs fscache sb_edac x86_pkg_temp_thermal intel_powerclamp vfat fat coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel iTCO_wdt iTCO_vendor_support crypto_simd mei_me glue_helper cryptd mxm_wmi ipmi_si pcspkr lpc_ich sg mei ioatdma ipmi_devintf i2c_i801 mfd_core shpchp ipmi_msghandler wmi acpi_power_meter acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt igb fb_sys_fops ttm ahci ptp crc32c_intel libahci pps_core drm dca libata i2c_algo_bit i2c_core [last unloaded: opa_vnic]
      [66866.385551] CPU: 8 PID: 7470 Comm: rmmod Not tainted 4.14.0-mam-tid-rdma #2
      [66866.393317] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0018.C4.072020161249 07/20/2016
      [66866.405252] task: ffff88084f28c380 task.stack: ffffc90008454000
      [66866.411866] RIP: 0010:hfi1_dbg_ibdev_exit+0x2a/0x80 [hfi1]
      [66866.417984] RSP: 0018:ffffc90008457da0 EFLAGS: 00010202
      [66866.423812] RAX: 0000000000000000 RBX: ffff880857de0000 RCX: 0000000180040001
      [66866.431773] RDX: 0000000180040002 RSI: ffffea0021088200 RDI: 0000000040000000
      [66866.439734] RBP: ffffc90008457da8 R08: ffff88084220e000 R09: 0000000180040001
      [66866.447696] R10: 000000004220e001 R11: ffff88084220e000 R12: ffff88085a31c000
      [66866.455657] R13: ffffffffa07c9820 R14: ffffffffa07c9890 R15: ffff881059d78100
      [66866.463618] FS:  00007f6876047740(0000) GS:ffff88085f800000(0000) knlGS:0000000000000000
      [66866.472644] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [66866.479053] CR2: 0000000000000088 CR3: 0000000856357006 CR4: 00000000001606e0
      [66866.487013] Call Trace:
      [66866.489747]  remove_one+0x1f/0x220 [hfi1]
      [66866.494221]  pci_device_remove+0x39/0xc0
      [66866.498596]  device_release_driver_internal+0x141/0x210
      [66866.504424]  driver_detach+0x3f/0x80
      [66866.508409]  bus_remove_driver+0x55/0xd0
      [66866.512784]  driver_unregister+0x2c/0x50
      [66866.517164]  pci_unregister_driver+0x2a/0xa0
      [66866.521934]  hfi1_mod_cleanup+0x10/0xaa2 [hfi1]
      [66866.526988]  SyS_delete_module+0x171/0x250
      [66866.531558]  do_syscall_64+0x67/0x1b0
      [66866.535644]  entry_SYSCALL64_slow_path+0x25/0x25
      [66866.540792] RIP: 0033:0x7f6875525c27
      [66866.544777] RSP: 002b:00007ffd48528e78 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
      [66866.553224] RAX: ffffffffffffffda RBX: 0000000001cc01d0 RCX: 00007f6875525c27
      [66866.561185] RDX: 00007f6875596000 RSI: 0000000000000800 RDI: 0000000001cc0238
      [66866.569146] RBP: 0000000000000000 R08: 00007f68757e9060 R09: 00007f6875596000
      [66866.577120] R10: 00007ffd48528c00 R11: 0000000000000206 R12: 00007ffd48529db4
      [66866.585080] R13: 0000000000000000 R14: 0000000001cc01d0 R15: 0000000001cc0010
      [66866.593040] Code: 90 0f 1f 44 00 00 48 83 3d a3 8b 03 00 00 55 48 89 e5 53 48 89 fb 74 4e 48 8d bf 18 0c 00 00 e8 9d f2 ff ff 48 8b 83 20 0c 00 00 <48> 8b b8 88 00 00 00 e8 2a 21 b3 e0 48 8b bb 20 0c 00 00 e8 0e
      [66866.614127] RIP: hfi1_dbg_ibdev_exit+0x2a/0x80 [hfi1] RSP: ffffc90008457da0
      [66866.621885] CR2: 0000000000000088
      [66866.625618] ---[ end trace c4817425783fb092 ]---
      
      Fix by ensuring that, upon failure from fault_create_debugfs_attr(),
      the parent pointer for the routines is always set to NULL, and by
      adding guards in the exit routines to ensure that
      debugfs_remove_recursive() is not called when the parent pointer is NULL.
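
      A user-space model of the init/exit pattern. Everything here is
      hypothetical: the dentry struct, the helpers, the sentinel that stands
      in for an error-pointer return, and the -ENODEV code. Init stores NULL
      on failure; exit refuses to remove a NULL parent.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for a debugfs directory and its helpers. */
struct sketch_dentry { int id; };

static struct sketch_dentry sketch_dir_obj = { 1 };
static struct sketch_dentry sketch_err_obj = { -1 };
#define SKETCH_ERR (&sketch_err_obj)	/* models an error-pointer return */

static int sketch_remove_calls;

/* Fails when the debugfs side is configured out. */
static struct sketch_dentry *sketch_create_attr(int debugfs_enabled)
{
	return debugfs_enabled ? &sketch_dir_obj : SKETCH_ERR;
}

static void sketch_remove_recursive(struct sketch_dentry *dir)
{
	(void)dir;
	sketch_remove_calls++;
}

struct sketch_fault { struct sketch_dentry *dir; };

static int sketch_fault_init(struct sketch_fault *f, int debugfs_enabled)
{
	struct sketch_dentry *d = sketch_create_attr(debugfs_enabled);

	if (d == SKETCH_ERR) {
		f->dir = NULL;	/* never keep an error or stale pointer */
		return -ENODEV;
	}
	f->dir = d;
	return 0;
}

static void sketch_fault_exit(struct sketch_fault *f)
{
	if (!f->dir)	/* guard: nothing was created, nothing to remove */
		return;
	sketch_remove_recursive(f->dir);
	f->dir = NULL;
}
```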
      
      Fixes: 0181ce31 ("IB/hfi1: Add receive fault injection feature")
      Cc: <stable@vger.kernel.org> # 4.14.x
      Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • IB/isert: fix T10-pi check mask setting · c3295186
      Max Gurtovoy authored
      commit 0e12af84 upstream.
      
      A copy/paste bug (probably) caused the app_tag check mask to be set
      whenever a ref_tag check was requested.
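
      A sketch of the intended mapping: each requested check sets its own
      mask bits. The request flags and the mask values (0xc0 guard, 0x30
      app_tag, 0x0f ref_tag) are assumptions chosen for illustration; the
      bug described above effectively keyed the app_tag bits on the ref_tag
      request.

```c
#include <stdint.h>

/* Hypothetical request flags (the target core has its own names). */
#define SKETCH_CHECK_GUARD	(1u << 0)
#define SKETCH_CHECK_APPTAG	(1u << 1)
#define SKETCH_CHECK_REFTAG	(1u << 2)

/* Assumed mask bits in the signature-offload check mask. */
#define SKETCH_MASK_GUARD	0xc0u
#define SKETCH_MASK_APPTAG	0x30u
#define SKETCH_MASK_REFTAG	0x0fu

static uint8_t sketch_check_mask(unsigned int prot_checks)
{
	uint8_t mask = 0;

	if (prot_checks & SKETCH_CHECK_GUARD)
		mask |= SKETCH_MASK_GUARD;
	if (prot_checks & SKETCH_CHECK_APPTAG)	/* not SKETCH_CHECK_REFTAG */
		mask |= SKETCH_MASK_APPTAG;
	if (prot_checks & SKETCH_CHECK_REFTAG)
		mask |= SKETCH_MASK_REFTAG;
	return mask;
}
```

      Using separate if statements also sidesteps the operator-precedence
      traps of chaining several `?:` expressions with `|`.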
      
      Fixes: 38a2d0d4 ("IB/isert: convert to the generic RDMA READ/WRITE API")
      Fixes: 9e961ae7 ("IB/isert: Support T10-PI protected transactions")
      Cc: stable@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • IB/isert: Fix for lib/dma_debug check_sync warning · 7d4aaca8
      Alex Estrin authored
      commit 763b6965 upstream.
      
      The following error message occurs on a target host in a debug build
      during session login:
      
      [ 3524.411874] WARNING: CPU: 5 PID: 12063 at lib/dma-debug.c:1207 check_sync+0x4ec/0x5b0
      [ 3524.421057] infiniband hfi1_0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x0000000000000000] [size=76 bytes]
      ......snip .....
      
      [ 3524.535846] CPU: 5 PID: 12063 Comm: iscsi_np Kdump: loaded Not tainted 3.10.0-862.el7.x86_64.debug #1
      [ 3524.546764] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.2.6 06/08/2015
      [ 3524.555740] Call Trace:
      [ 3524.559102]  [<ffffffffa5fe915b>] dump_stack+0x19/0x1b
      [ 3524.565477]  [<ffffffffa58a2f58>] __warn+0xd8/0x100
      [ 3524.571557]  [<ffffffffa58a2fdf>] warn_slowpath_fmt+0x5f/0x80
      [ 3524.578610]  [<ffffffffa5bf5b8c>] check_sync+0x4ec/0x5b0
      [ 3524.585177]  [<ffffffffa58efc3f>] ? set_cpus_allowed_ptr+0x5f/0x1c0
      [ 3524.592812]  [<ffffffffa5bf5cd0>] debug_dma_sync_single_for_cpu+0x80/0x90
      [ 3524.601029]  [<ffffffffa586add3>] ? x2apic_send_IPI_mask+0x13/0x20
      [ 3524.608574]  [<ffffffffa585ee1b>] ? native_smp_send_reschedule+0x5b/0x80
      [ 3524.616699]  [<ffffffffa58e9b76>] ? resched_curr+0xf6/0x140
      [ 3524.623567]  [<ffffffffc0879af0>] isert_create_send_desc.isra.26+0xe0/0x110 [ib_isert]
      [ 3524.633060]  [<ffffffffc087af95>] isert_put_login_tx+0x55/0x8b0 [ib_isert]
      [ 3524.641383]  [<ffffffffa58ef114>] ? try_to_wake_up+0x1a4/0x430
      [ 3524.648561]  [<ffffffffc098cfed>] iscsi_target_do_tx_login_io+0xdd/0x230 [iscsi_target_mod]
      [ 3524.658557]  [<ffffffffc098d827>] iscsi_target_do_login+0x1a7/0x600 [iscsi_target_mod]
      [ 3524.668084]  [<ffffffffa59f9bc9>] ? kstrdup+0x49/0x60
      [ 3524.674420]  [<ffffffffc098e976>] iscsi_target_start_negotiation+0x56/0xc0 [iscsi_target_mod]
      [ 3524.684656]  [<ffffffffc098c2ee>] __iscsi_target_login_thread+0x90e/0x1070 [iscsi_target_mod]
      [ 3524.694901]  [<ffffffffc098ca50>] ? __iscsi_target_login_thread+0x1070/0x1070 [iscsi_target_mod]
      [ 3524.705446]  [<ffffffffc098ca50>] ? __iscsi_target_login_thread+0x1070/0x1070 [iscsi_target_mod]
      [ 3524.715976]  [<ffffffffc098ca78>] iscsi_target_login_thread+0x28/0x60 [iscsi_target_mod]
      [ 3524.725739]  [<ffffffffa58d60ff>] kthread+0xef/0x100
      [ 3524.732007]  [<ffffffffa58d6010>] ? insert_kthread_work+0x80/0x80
      [ 3524.739540]  [<ffffffffa5fff1b7>] ret_from_fork_nospec_begin+0x21/0x21
      [ 3524.747558]  [<ffffffffa58d6010>] ? insert_kthread_work+0x80/0x80
      [ 3524.755088] ---[ end trace 23f8bf9238bd1ed8 ]---
      [ 3595.510822] iSCSI/iqn.1994-05.com.redhat:537fa56299: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
      
      The code calls dma_sync on login_tx_desc->dma_addr before that field
      has been initialized with a DMA-mapped address.
      login_tx_desc is part of the iser_conn structure and is used only once,
      during login negotiation, so the issue is fixed by using a special-case
      routine that skips the dma_sync call for this buffer.
      
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: Don Dutile <ddutile@redhat.com>
      Signed-off-by: Alex Estrin <alex.estrin@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • IB/mlx5: Fetch soft WQE's on fatal error state · c06f8c21
      Erez Shitrit authored
      commit 7b74a83c upstream.
      
      On fatal error, the driver simulates CQEs for ULPs that rely on
      completion of all their posted work requests.
      
      For GSI traffic, mlx5 has its own mechanism that sends the
      completions via software CQEs directly to the relevant CQ.
      
      This mechanism must keep working in the fatal error state too, so the
      driver should simulate such CQEs with the specified error status in
      order to complete GSI QP work requests.
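
      A user-space sketch of "simulating" completions: every outstanding
      request in a software queue gets a completion carrying an error status.
      The queue layout, the names and the use of a flush-error status are
      assumptions for illustration, not the mlx5 implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical completion status values. */
enum sketch_wc_status { SKETCH_WC_SUCCESS, SKETCH_WC_WR_FLUSH_ERR };

struct sketch_wc {
	uint64_t wr_id;
	enum sketch_wc_status status;
};

/* Hypothetical software queue of posted-but-uncompleted GSI requests. */
struct sketch_sw_queue {
	uint64_t pending[16];
	size_t head, tail;	/* pending[head..tail) are outstanding */
};

/*
 * Hardware will never complete these requests after a fatal error, so
 * emit an error completion for each one; consumers waiting for their
 * posted work requests can then make progress instead of deadlocking.
 */
static size_t sketch_flush_pending(struct sketch_sw_queue *q,
				   struct sketch_wc *out, size_t max)
{
	size_t n = 0;

	while (q->head != q->tail && n < max) {
		out[n].wr_id = q->pending[q->head++];
		out[n].status = SKETCH_WC_WR_FLUSH_ERR;
		n++;
	}
	return n;
}
```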
      
      Without the fix, the following deadlock may occur:
              schedule_timeout+0x274/0x350
              wait_for_common+0xec/0x240
              mcast_remove_one+0xd0/0x120 [ib_core]
              ib_unregister_device+0x12c/0x230 [ib_core]
              mlx5_ib_remove+0xc4/0x270 [mlx5_ib]
              mlx5_detach_device+0x184/0x1a0 [mlx5_core]
              mlx5_unload_one+0x308/0x340 [mlx5_core]
              mlx5_pci_err_detected+0x74/0xe0 [mlx5_core]
      
      Cc: <stable@vger.kernel.org> # 4.7
      Fixes: 89ea94a7 ("IB/mlx5: Reset flow support for IB kernel ULPs")
      Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • IB/core: Make testing MR flags for writability a static inline function · 96fb9b88
      Jack Morgenstein authored
      commit 08bb558a upstream.
      
      Make the MR writability flags check, which is performed in umem.c,
      a static inline function in ib_verbs.h.
      
      This allows the function to be used by low-level InfiniBand drivers.
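
      A standalone sketch of such a helper. The flag values and the exact set
      of flags tested are assumptions modeled on the IB_ACCESS_* bits; check
      include/rdma/ib_verbs.h for the authoritative definition.

```c
#include <stdbool.h>

/* Assumed values; see include/rdma/ib_verbs.h for the real ones. */
enum {
	SKETCH_ACCESS_LOCAL_WRITE	= 1 << 0,
	SKETCH_ACCESS_REMOTE_WRITE	= 1 << 1,
	SKETCH_ACCESS_REMOTE_READ	= 1 << 2,
	SKETCH_ACCESS_REMOTE_ATOMIC	= 1 << 3,
	SKETCH_ACCESS_MW_BIND		= 1 << 4,
};

/*
 * Does this access mask need writable backing memory? Local and remote
 * writes obviously do; atomics modify memory too; a memory-window bind
 * can change permissions later.
 */
static inline bool sketch_access_writable(int access_flags)
{
	return access_flags & (SKETCH_ACCESS_LOCAL_WRITE |
			       SKETCH_ACCESS_REMOTE_WRITE |
			       SKETCH_ACCESS_REMOTE_ATOMIC |
			       SKETCH_ACCESS_MW_BIND);
}
```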
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • IB/mlx4: Mark user MR as writable if actual virtual memory is writable · 1c82abc1
      Jack Morgenstein authored
      commit d8f9cc32 upstream.
      
      To allow rereg_user_mr to modify the MR from read-only to writable without
      using get_user_pages again, we needed to define the initial MR as writable.
      However, this was originally done unconditionally, without taking into
      account the writability of the underlying virtual memory.
      
      As a result, any attempt to register a read-only MR over read-only
      virtual memory failed.
      
      To fix this, do not add the writable flag bit when the user virtual memory
      is not writable (e.g. const memory).
      
      However, when the underlying memory is NOT writable (and we therefore
      do not define the initial MR as writable), the IB core adds a
      "force writable" flag to its user-pages request. If this succeeds,
      the reg_user_mr caller gets a writable copy of the original pages.
      
      If the user-space caller then does a rereg_user_mr operation to enable
      writability, this will succeed. This should not be allowed, since
      the original virtual memory was not writable.
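
      A user-space sketch of the two rules: add the write bit at registration
      only when the backing memory is writable, and reject a later rereg that
      asks for write access over read-only memory. The struct, the flag value
      and the -EPERM error code are hypothetical choices for this sketch.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_MR_LOCAL_WRITE 0x1	/* assumed flag value */

struct sketch_mr {
	int access;		/* access flags currently in effect */
	bool mem_writable;	/* was the backing user memory writable? */
};

/* Register: add the write bit only when the user memory is writable. */
static void sketch_reg_user_mr(struct sketch_mr *mr, int access,
			       bool mem_writable)
{
	mr->mem_writable = mem_writable;
	mr->access = access;
	if (mem_writable)
		mr->access |= SKETCH_MR_LOCAL_WRITE;	/* permits a later rereg */
}

/* Re-register: refuse to grant write access over read-only memory. */
static int sketch_rereg_user_mr(struct sketch_mr *mr, int new_access)
{
	if ((new_access & SKETCH_MR_LOCAL_WRITE) && !mr->mem_writable)
		return -EPERM;
	mr->access = new_access;
	return 0;
}
```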
      
      Cc: <stable@vger.kernel.org>
      Fixes: 9376932d ("IB/mlx4_ib: Add support for user MR re-registration")
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • IB/{hfi1, qib}: Add handling of kernel restart · 49e10832
      Alex Estrin authored
      commit 8d3e7113 upstream.
      
      A warm restart fails to unload the driver, potentially leaving the link
      state flapping until the BIOS resets the adapter.
      Correct the issue by hooking the PCI shutdown method, which brings the
      port down.
      
      Cc: <stable@vger.kernel.org> # 4.9.x
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Alex Estrin <alex.estrin@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • IB/qib: Fix DMA api warning with debug kernel · e884ed82
      Mike Marciniszyn authored
      commit 0252f733 upstream.
      
      The following error occurs in a debug build when running MPI PSM:
      
      [  307.415911] WARNING: CPU: 4 PID: 23867 at lib/dma-debug.c:1158
      check_unmap+0x4ee/0xa20
      [  307.455661] ib_qib 0000:05:00.0: DMA-API: device driver failed to check map
      error[device address=0x00000000df82b000] [size=4096 bytes] [mapped as page]
      [  307.517494] Modules linked in:
      [  307.531584]  ib_isert iscsi_target_mod ib_srpt target_core_mod rpcrdma
      sunrpc ib_srp scsi_transport_srp scsi_tgt ib_iser libiscsi ib_ipoib
      scsi_transport_iscsi rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm
      ib_qib intel_powerclamp coretemp rdmavt intel_rapl iosf_mbi kvm_intel kvm
      irqbypass crc32_pclmul ghash_clmulni_intel ipmi_ssif ib_core aesni_intel sg
      ipmi_si lrw gf128mul dca glue_helper ipmi_devintf iTCO_wdt gpio_ich hpwdt
      iTCO_vendor_support ablk_helper hpilo acpi_power_meter cryptd ipmi_msghandler
      ie31200_edac shpchp pcc_cpufreq lpc_ich pcspkr ip_tables xfs libcrc32c sd_mod
      crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea
      sysfillrect sysimgblt fb_sys_fops ttm ahci crct10dif_pclmul crct10dif_common
      drm crc32c_intel libahci tg3 libata serio_raw ptp i2c_core
      [  307.846113]  pps_core dm_mirror dm_region_hash dm_log dm_mod
      [  307.866505] CPU: 4 PID: 23867 Comm: mpitests-IMB-MP Kdump: loaded Not
      tainted 3.10.0-862.el7.x86_64.debug #1
      [  307.911178] Hardware name: HP ProLiant DL320e Gen8, BIOS J05 11/09/2013
      [  307.944206] Call Trace:
      [  307.956973]  [<ffffffffbd9e915b>] dump_stack+0x19/0x1b
      [  307.982201]  [<ffffffffbd2a2f58>] __warn+0xd8/0x100
      [  308.005999]  [<ffffffffbd2a2fdf>] warn_slowpath_fmt+0x5f/0x80
      [  308.034260]  [<ffffffffbd5f667e>] check_unmap+0x4ee/0xa20
      [  308.060801]  [<ffffffffbd41acaa>] ? page_add_file_rmap+0x2a/0x1d0
      [  308.090689]  [<ffffffffbd5f6c4d>] debug_dma_unmap_page+0x9d/0xb0
      [  308.120155]  [<ffffffffbd4082e0>] ? might_fault+0xa0/0xb0
      [  308.146656]  [<ffffffffc07761a5>] qib_tid_free.isra.14+0x215/0x2a0 [ib_qib]
      [  308.180739]  [<ffffffffc0776bf4>] qib_write+0x894/0x1280 [ib_qib]
      [  308.210733]  [<ffffffffbd540b00>] ? __inode_security_revalidate+0x70/0x80
      [  308.244837]  [<ffffffffbd53c2b7>] ? security_file_permission+0x27/0xb0
      [  308.266025] qib_ib0.8006: multicast join failed for
      ff12:401b:8006:0000:0000:0000:ffff:ffff, status -22
      [  308.323421]  [<ffffffffbd46f5d3>] vfs_write+0xc3/0x1f0
      [  308.347077]  [<ffffffffbd492a5c>] ? fget_light+0xfc/0x510
      [  308.372533]  [<ffffffffbd47045a>] SyS_write+0x8a/0x100
      [  308.396456]  [<ffffffffbd9ff355>] system_call_fastpath+0x1c/0x21
      
      The code calls qib_map_page(), which has never correctly tested for a
      mapping error.
      
      Fix by testing for pci_dma_mapping_error() in all cases and properly
      handling the failure in the caller.
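
      A user-space sketch of "check every mapping, propagate the failure".
      The mapper and the error cookie are hypothetical; real driver code must
      use the kernel's mapping-error helper (pci_dma_mapping_error() or
      dma_mapping_error()) rather than compare against a constant.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sketch_dma_addr_t;

/* Hypothetical error cookie standing in for a failed mapping. */
#define SKETCH_DMA_MAPPING_ERROR ((sketch_dma_addr_t)~0ULL)

static bool sketch_mapping_error(sketch_dma_addr_t addr)
{
	return addr == SKETCH_DMA_MAPPING_ERROR;
}

/* Hypothetical mapper: "fails" on request to exercise the error path. */
static sketch_dma_addr_t sketch_map_page(uint64_t phys, bool fail)
{
	return fail ? SKETCH_DMA_MAPPING_ERROR : phys;
}

/* Caller: test every mapping and leave *out untouched on failure. */
static int sketch_setup_tid(uint64_t phys, bool fail, sketch_dma_addr_t *out)
{
	sketch_dma_addr_t addr = sketch_map_page(phys, fail);

	if (sketch_mapping_error(addr))
		return -ENOMEM;
	*out = addr;
	return 0;
}
```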
      
      Additionally, streamline qib_map_page() arguments to satisfy just
      the single caller.
      
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Alex Estrin <alex.estrin@intel.com>
      Tested-by: Don Dutile <ddutile@redhat.com>
      Reviewed-by: Don Dutile <ddutile@redhat.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tpm: fix race condition in tpm_common_write() · c41cb9cb
      Tadeusz Struk authored
      commit 3ab2011e upstream.
      
      There is a race condition in tpm_common_write() that allows two
      threads on the same /dev/tpm<N>, or two different applications on the
      same /dev/tpmrm<N>, to overwrite each other's commands/responses.
      Fix this by taking priv->buffer_mutex early in the function.
      
      Also convert priv->data_pending from atomic to a regular size_t.
      It does not need to be atomic, since it is only touched under the
      protection of priv->buffer_mutex.
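
      A user-space sketch of the locking order, assuming POSIX threads. The
      struct, buffer size and error codes are illustrative, not the TPM
      driver's: every check of shared state happens only after the mutex is
      held, which is why data_pending can be a plain size_t.

```c
#include <errno.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical per-file state, loosely modeled on the description above. */
struct sketch_file_priv {
	pthread_mutex_t buffer_mutex;
	size_t data_pending;	/* plain size_t: only touched under the mutex */
	unsigned char data_buffer[64];
};

static ssize_t sketch_common_write(struct sketch_file_priv *priv,
				   const void *buf, size_t size)
{
	ssize_t ret;

	/* Take the lock before looking at any shared state. */
	pthread_mutex_lock(&priv->buffer_mutex);

	if (priv->data_pending != 0) {	/* previous response not read yet */
		ret = -EBUSY;
		goto out;
	}
	if (size > sizeof(priv->data_buffer)) {
		ret = -E2BIG;
		goto out;
	}
	memcpy(priv->data_buffer, buf, size);
	/* Stand-in: a real driver would transmit and store the response length. */
	priv->data_pending = size;
	ret = (ssize_t)size;
out:
	pthread_mutex_unlock(&priv->buffer_mutex);
	return ret;
}
```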
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Cc: stable@vger.kernel.org
      Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
      Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tpm: fix use after free in tpm2_load_context() · 1bf1a5e2
      Tadeusz Struk authored
      commit 8c81c247 upstream.
      
      If the load context command returns TPM2_RC_HANDLE or
      TPM2_RC_REFERENCE_H0, there is a use-after-free at line 114 and a
      double free at line 117.
      
      Fixes: 4d57856a ("tpm2: add session handle context saving and restoring to the space code")
      Cc: stable@vger.kernel.org
      Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
      Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • of: platform: stop accessing invalid dev in of_platform_device_destroy · 1ed68714
      Srinivas Kandagatla authored
      commit 522811e9 upstream.
      
      Immediately after platform_device_unregister(), the device may be
      freed. Accessing the freed pointer after that point will crash the
      system.
      
      This bug was found with a kernel built with CONFIG_PAGE_POISONING while
      loading and unloading audio drivers in a loop on Qcom platforms.
      
      Fix this by moving of_node_clear_flag() just before the unregister calls.
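
      A user-space model of the ordering rule, with hypothetical structs and
      a hypothetical flag bit: unregistering may drop the last reference and
      free the device, so anything reached through dev (here dev->of_node)
      must be touched first.

```c
#include <stdlib.h>

#define SKETCH_OF_POPULATED (1u << 0)	/* hypothetical flag bit */

struct sketch_node { unsigned int flags; };
struct sketch_pdev { struct sketch_node *of_node; };

/* Unregistering drops the last reference, so the device memory is freed. */
static void sketch_device_unregister(struct sketch_pdev *dev)
{
	free(dev);
}

static void sketch_device_destroy(struct sketch_pdev *dev)
{
	/* Touch dev->of_node while dev is still valid... */
	dev->of_node->flags &= ~SKETCH_OF_POPULATED;
	/* ...and only then unregister; dev must not be used afterwards. */
	sketch_device_unregister(dev);
}
```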
      
      Below is the crash trace:
      
      Unable to handle kernel paging request at virtual address 6b6b6b6b6b6c03
      Mem abort info:
        ESR = 0x96000021
        Exception class = DABT (current EL), IL = 32 bits
        SET = 0, FnV = 0
        EA = 0, S1PTW = 0
      Data abort info:
        ISV = 0, ISS = 0x00000021
        CM = 0, WnR = 0
      [006b6b6b6b6b6c03] address between user and kernel address ranges
      Internal error: Oops: 96000021 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 2 PID: 1784 Comm: sh Tainted: G        W         4.17.0-rc7-02230-ge3a63a7ef641-dirty #204
      Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
      pstate: 80000005 (Nzcv daif -PAN -UAO)
      pc : clear_bit+0x18/0x2c
      lr : of_platform_device_destroy+0x64/0xb8
      sp : ffff00000c9c3930
      x29: ffff00000c9c3930 x28: ffff80003d39b200
      x27: ffff000008bb1000 x26: 0000000000000040
      x25: 0000000000000124 x24: ffff80003a9a3080
      x23: 0000000000000060 x22: ffff00000939f518
      x21: ffff80003aa79e98 x20: ffff80003aa3dae0
      x19: ffff80003aa3c890 x18: ffff800009feb794
      x17: 0000000000000000 x16: 0000000000000000
      x15: ffff800009feb790 x14: 0000000000000000
      x13: ffff80003a058778 x12: ffff80003a058728
      x11: ffff80003a058750 x10: 0000000000000000
      x9 : 0000000000000006 x8 : ffff80003a825988
      x7 : bbbbbbbbbbbbbbbb x6 : 0000000000000001
      x5 : 0000000000000000 x4 : 0000000000000001
      x3 : 0000000000000008 x2 : 0000000000000001
      x1 : 6b6b6b6b6b6b6c03 x0 : 0000000000000000
      Process sh (pid: 1784, stack limit = 0x        (ptrval))
      Call trace:
       clear_bit+0x18/0x2c
       q6afe_remove+0x20/0x38
       apr_device_remove+0x30/0x70
       device_release_driver_internal+0x170/0x208
       device_release_driver+0x14/0x20
       bus_remove_device+0xcc/0x150
       device_del+0x10c/0x310
       device_unregister+0x1c/0x70
       apr_remove_device+0xc/0x18
       device_for_each_child+0x50/0x80
       apr_remove+0x18/0x20
       rpmsg_dev_remove+0x38/0x68
       device_release_driver_internal+0x170/0x208
       device_release_driver+0x14/0x20
       bus_remove_device+0xcc/0x150
       device_del+0x10c/0x310
       device_unregister+0x1c/0x70
       qcom_smd_remove_device+0xc/0x18
       device_for_each_child+0x50/0x80
       qcom_smd_unregister_edge+0x3c/0x70
       smd_subdev_remove+0x18/0x28
       rproc_stop+0x48/0xd8
       rproc_shutdown+0x60/0xe8
       state_store+0xbc/0xf8
       dev_attr_store+0x18/0x28
       sysfs_kf_write+0x3c/0x50
       kernfs_fop_write+0x118/0x1e0
       __vfs_write+0x18/0x110
       vfs_write+0xa4/0x1a8
       ksys_write+0x48/0xb0
       sys_write+0xc/0x18
       el0_svc_naked+0x30/0x34
      Code: d2800022 8b400c21 f9800031 9ac32043 (c85f7c22)
      ---[ end trace 32020935775616a2 ]---

      Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • of: unittest: for strings, account for trailing \0 in property length field · 6ba51909
      Stefan M Schaeckeler authored
      commit 3b9cf790 upstream.
      
      For strings, account for the trailing \0 in the property length field.
      
      This is consistent with how dtc builds string properties.
      
      Function __of_prop_dup() would misbehave on such properties, as it
      duplicates properties based on the property length field, creating new
      string values without trailing \0s.
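
      A user-space sketch of why the length must include the terminator. The
      struct and the dup helper are hypothetical stand-ins that, like the dup
      described above, copy exactly `length` bytes: with strlen() + 1 the copy
      is a valid C string, while with strlen() alone it would not be.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical property layout, loosely following a DT property. */
struct sketch_prop {
	int length;	/* bytes in value, including the '\0' for strings */
	void *value;
};

/* Duplicate a property by copying exactly 'length' bytes. */
static int sketch_prop_dup(const struct sketch_prop *src,
			   struct sketch_prop *dst)
{
	dst->value = malloc((size_t)src->length);
	if (!dst->value)
		return -1;
	memcpy(dst->value, src->value, (size_t)src->length);
	dst->length = src->length;
	return 0;
}
```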

      Signed-off-by: Stefan M Schaeckeler <sschaeck@cisco.com>
      Reviewed-by: Frank Rowand <frank.rowand@sony.com>
      Tested-by: Frank Rowand <frank.rowand@sony.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • of: overlay: validate offset from property fixups · 4910cc25
      Frank Rowand authored
      commit 482137bf upstream.
      
      The smatch static checker marks the data in offset as untrusted,
      leading it to warn:
      
        drivers/of/resolver.c:125 update_usages_of_a_phandle_reference()
        error: buffer underflow 'prop->value' 's32min-s32max'
      
      Add a check to verify that offset is within the property data.
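
      A sketch of such a bounds check, assuming the fixup writes a 32-bit
      phandle at value + offset. The struct and function are hypothetical;
      the offset must be non-negative and leave room for four bytes, and the
      sum is computed in 64 bits so a large offset cannot overflow.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical property: 'length' bytes of data at 'value'. */
struct sketch_fixup_prop {
	int length;
	const unsigned char *value;
};

/* Reject offsets that would read or write outside the property data. */
static int sketch_check_fixup_offset(const struct sketch_fixup_prop *prop,
				     int offset)
{
	if (offset < 0 ||
	    (int64_t)offset + (int64_t)sizeof(uint32_t) > prop->length)
		return -EINVAL;
	return 0;
}
```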

      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Frank Rowand <frank.rowand@sony.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ARM64: dts: meson: disable sd-uhs modes on the libretech-cc · 728ea230
      Jerome Brunet authored
      commit d5b4885b upstream.
      
      There is a problem with the sd-uhs mode when doing a soft reboot.
      Switching back from 1.8v to 3.3v messes with the card, which no longer
      responds (timeout errors). According to the specification, we should
      perform a card reset (power cycling the card), but this is something we
      cannot control on this design.
      
      The only way to restore communication with the card is then to unplug
      and replug it, which is not acceptable.
      
      Until we find a solution, if any, disable the sd-uhs modes on this design.
      People currently using UHS will see a performance drop as a result.
      
      Fixes: 3cde63eb ("ARM64: dts: meson-gxl: libretech-cc: enable high speed modes")
      Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Kevin Hilman <khilman@baylibre.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Will Deacon
      arm64: mm: Ensure writes to swapper are ordered wrt subsequent cache maintenance · 64df84dc
      Will Deacon authored
      commit 71c8fc0c upstream.
      
      When rewriting swapper using nG mappings, we must perform cache
      maintenance around each page table access in order to avoid coherency
      problems with the host's cacheable alias under KVM. To ensure correct
      ordering of the maintenance with respect to Device memory accesses made
      with the Stage-1 MMU disabled, DMBs need to be added between the
      maintenance and the corresponding memory access.
      
      This patch adds a missing DMB between writing a new page table entry and
      performing a clean+invalidate on the same line.
      
      Fixes: f992b4df ("arm64: kpti: Add ->enable callback to remap swapper using nG mappings")
      Cc: <stable@vger.kernel.org> # 4.16.x-
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Will Deacon
      arm64: kpti: Use early_param for kpti= command-line option · ee6ae5ac
      Will Deacon authored
      commit b5b7dd64 upstream.
      
      We inspect __kpti_forced early on as part of the cpufeature enable
      callback which remaps the swapper page table using non-global entries.
      
      Ensure that __kpti_forced has been updated to reflect the kpti=
      command-line option before we start using it.
      
      Fixes: ea1e3de8 ("arm64: entry: Add fake CPU feature for unmapping the kernel at EL0")
      Cc: <stable@vger.kernel.org> # 4.16.x-
      Reported-by: Wei Xu <xuwei5@hisilicon.com>
      Tested-by: Sudeep Holla <sudeep.holla@arm.com>
      Tested-by: Wei Xu <xuwei5@hisilicon.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Dave Martin
      arm64: Fix syscall restarting around signal suppressed by tracer · cdfa28c2
      Dave Martin authored
      commit 0fe42512 upstream.
      
      Commit 17c28958 ("arm64: Abstract syscallno manipulation") abstracts
      out the pt_regs.syscallno value for a syscall cancelled by a tracer
      as NO_SYSCALL, and provides helpers to set and check for this
      condition.  However, the way this was implemented has the
      unintended side-effect of disabling part of the syscall restart
      logic.
      
      This comes about because the second in_syscall() check in
      do_signal() re-evaluates the "in a syscall" condition based on the
      updated pt_regs instead of the original pt_regs.  forget_syscall()
      is explicitly called prior to the second check in order to prevent
      restart logic in the ret_to_user path being spuriously triggered,
      which means that the second in_syscall() check always yields false.
      
      This triggers a failure in
      tools/testing/selftests/seccomp/seccomp_bpf.c when using ptrace to
      suppress a signal that interrupts a nanosleep() syscall.
      
      Misbehaviour of this type is only expected in the case where a
      tracer suppresses a signal and the target process is either being
      single-stepped or the interrupted syscall attempts to restart via
      -ERESTARTBLOCK.
      
      This patch restores the old behaviour by performing the
      in_syscall() check only once at the start of the function.
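
      A minimal C sketch of why evaluating the condition once matters. The
      struct, the NO_SYSCALL value and all three functions are hypothetical
      stand-ins that only mimic the helpers named above; the real do_signal()
      does considerably more.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for pt_regs.syscallno and its helpers. */
#define SKETCH_NO_SYSCALL (-1)

struct sketch_regs {
	int syscallno;
};

static bool sketch_in_syscall(const struct sketch_regs *regs)
{
	return regs->syscallno != SKETCH_NO_SYSCALL;
}

static void sketch_forget_syscall(struct sketch_regs *regs)
{
	regs->syscallno = SKETCH_NO_SYSCALL;
}

/*
 * Report whether the later restart check sees "in a syscall".
 * check_once == true: use the value captured on entry (fixed behaviour).
 * check_once == false: re-evaluate after forget_syscall(), which can
 * never succeed (the broken behaviour described above).
 */
static bool sketch_second_check(struct sketch_regs *regs, bool check_once)
{
	bool syscall = sketch_in_syscall(regs);

	/* Cleared so the ret_to_user path does not spuriously restart. */
	sketch_forget_syscall(regs);

	return check_once ? syscall : sketch_in_syscall(regs);
}
```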
      
      Fixes: 17c28958 ("arm64: Abstract syscallno manipulation")
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Reported-by: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: <stable@vger.kernel.org> # 4.14.x-
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Dinh Nguyen
      ARM: dts: socfpga: Fix NAND controller node compatible for Arria10 · 14ca7d34
      Dinh Nguyen authored
      commit 3877ef7a upstream.
      
      The NAND compatible "denali,denal-nand-dt" property has never been used and
      is obsolete. Remove it.
      
      Cc: stable@vger.kernel.org
      Fixes: f549af06 ("ARM: dts: socfpga: Add NAND device tree for Arria10")
      Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Marek Vasut
      ARM: dts: socfpga: Fix NAND controller clock supply · ae6647c7
      Marek Vasut authored
      commit 4eda9b76 upstream.
      
      The Denali NAND x-clock should be supplied by nand_x_clk, not by
      nand_clk. Fix this, otherwise the Denali driver gets incorrect
      clock frequency information and incorrectly configures the NAND
      timing.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Marek Vasut <marex@denx.de>
      Fixes: d837a80d ("ARM: dts: socfpga: add nand controller nodes")
      Cc: Steffen Trumtrar <s.trumtrar@pengutronix.de>
      Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Marek Vasut
      ARM: dts: socfpga: Fix NAND controller node compatible · 3482130d
      Marek Vasut authored
      commit d9a695f3 upstream.
      
      The compatible string for the Denali NAND controller is incorrect;
      fix it by replacing it with one matching the DT bindings and the
      driver.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Marek Vasut <marex@denx.de>
      Fixes: d837a80d ("ARM: dts: socfpga: add nand controller nodes")
      Cc: Steffen Trumtrar <s.trumtrar@pengutronix.de>
      Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Thor Thayer
      ARM: dts: Fix SPI node for Arria10 · 3db24d2e
      Thor Thayer authored
      commit 975ba94c upstream.
      
      Remove the unused bus-num property and change num-chipselect
      to num-cs to match the SPI bindings.
      
      Cc: stable@vger.kernel.org
      Fixes: f2d6f8f8 ("ARM: dts: socfpga: Add SPI Master1 for Arria10 SR chip")
      Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
      Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
      Signed-off-by: Olof Johansson <olof@lixom.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • David Rivshin
      ARM: 8764/1: kgdb: fix NUMREGBYTES so that gdb_regs[] is the correct size · eda170a9
      David Rivshin authored
      commit 76ed0b80 upstream.
      
      NUMREGBYTES (which is used as the size for gdb_regs[]) is incorrectly
      based on DBG_MAX_REG_NUM instead of GDB_MAX_REGS. DBG_MAX_REG_NUM
      is the number of total registers, while GDB_MAX_REGS is the number
      of 'unsigned longs' it takes to serialize those registers. Since
      FP registers require 3 'unsigned longs' each, DBG_MAX_REG_NUM is
      smaller than GDB_MAX_REGS.
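
      The size mismatch can be checked with a little arithmetic. The register
      counts below (16 core registers, 8 FP registers of 3 'unsigned longs'
      each, 2 extra registers) are assumptions for illustration, as is the
      4-byte 'unsigned long' of 32-bit ARM; the macro names are hypothetical.

```c
/*
 * Assumed register layout, for illustration only: 16 core registers,
 * 8 FP registers that each serialize as 3 'unsigned longs', and
 * 2 extra registers.
 */
#define SKETCH_GP_REGS    16
#define SKETCH_FP_REGS    8
#define SKETCH_EXTRA_REGS 2

/* Registers counted once each: 26. */
#define SKETCH_DBG_MAX_REG_NUM \
	(SKETCH_GP_REGS + SKETCH_FP_REGS + SKETCH_EXTRA_REGS)

/* 'unsigned longs' needed to serialize them for GDB: 42. */
#define SKETCH_GDB_MAX_REGS \
	(SKETCH_GP_REGS + (SKETCH_FP_REGS * 3) + SKETCH_EXTRA_REGS)

/* Buffer sizes in bytes, at 4 bytes per 'unsigned long'. */
#define SKETCH_NUMREGBYTES_OLD   (SKETCH_DBG_MAX_REG_NUM * 4) /* 104: too small */
#define SKETCH_NUMREGBYTES_FIXED (SKETCH_GDB_MAX_REGS * 4)    /* 168 */
```

      Under these assumptions the old buffer holds 104 bytes: the core
      registers take 64, and the remaining 40 cover f0 to f2 (36 bytes) plus
      only 4 of f3's 12 bytes. Assuming GDB numbers r0-r15 as 0-15 and f0-f7
      as 16-23, f3 is register 19, which is consistent with the error quoted
      below.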
      
      This causes GDB 8.0 to give the following error on connect:
      "Truncated register 19 in remote 'g' packet"
      
      This also causes the register serialization/deserialization logic
      to overflow gdb_regs[], overwriting whatever follows.
      
      Fixes: 834b2964 ("kgdb,arm: fix register dump")
      Cc: <stable@vger.kernel.org> # 2.6.37+
      Signed-off-by: David Rivshin <drivshin@allworx.com>
      Acked-by: Rabin Vincent <rabin@rab.in>
      Tested-by: Daniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Vaibhav Jain
      cxl: Disable prefault_mode in Radix mode · c9debbd1
      Vaibhav Jain authored
      commit b6c84ba2 upstream.
      
      Currently we see a kernel-oops reported on Power-9 while attaching a
      context to an AFU, with radix-mode and sysfs attr 'prefault_mode' set
      to anything other than 'none'. The backtrace of the oops is of this
      form:
      
        Unable to handle kernel paging request for data at address 0x00000080
        Faulting instruction address: 0xc00800000bcf3b20
        cpu 0x1: Vector: 300 (Data Access) at [c00000037f003800]
            pc: c00800000bcf3b20: cxl_load_segment+0x178/0x290 [cxl]
            lr: c00800000bcf39f0: cxl_load_segment+0x48/0x290 [cxl]
            sp: c00000037f003a80
           msr: 9000000000009033
           dar: 80
         dsisr: 40000000
          current = 0xc00000037f280000
          paca    = 0xc0000003ffffe600   softe: 3        irq_happened: 0x01
            pid   = 3529, comm = afp_no_int
        <snip>
        cxl_prefault+0xfc/0x248 [cxl]
        process_element_entry_psl9+0xd8/0x1a0 [cxl]
        cxl_attach_dedicated_process_psl9+0x44/0x130 [cxl]
        native_attach_process+0xc0/0x130 [cxl]
        afu_ioctl+0x3f4/0x5e0 [cxl]
        do_vfs_ioctl+0xdc/0x890
        ksys_ioctl+0x68/0xf0
        sys_ioctl+0x40/0xa0
        system_call+0x58/0x6c
      
      The issue arises because, on Power-8, the AFU attr 'prefault_mode' was
      used to improve initial storage fault performance by prefaulting
      process segments. However, on Power-9 with radix mode we don't have
      storage segments that we can prefault. Also, prefaulting process pages
      would be too costly and fine-grained.
      
      Hence, since the prefaulting mechanism doesn't make sense in radix
      mode, this patch updates prefault_mode_store() to not allow any
      other value apart from CXL_PREFAULT_NONE when radix mode is enabled.
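
      A minimal C sketch of that validation. Only CXL_PREFAULT_NONE comes from
      the text; the other enum values and the function name are hypothetical,
      and the -EINVAL return is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Only CXL_PREFAULT_NONE is named in the text; the others are hypothetical. */
enum sketch_prefault_mode {
	CXL_PREFAULT_NONE,
	SKETCH_PREFAULT_SOME,
	SKETCH_PREFAULT_ALL,
};

/*
 * Store a requested mode, rejecting anything other than NONE in radix
 * mode, where there are no storage segments to prefault.
 */
static int sketch_prefault_mode_store(enum sketch_prefault_mode *cur,
				      enum sketch_prefault_mode req,
				      bool radix_enabled)
{
	if (radix_enabled && req != CXL_PREFAULT_NONE)
		return -EINVAL;	/* leave the current mode untouched */
	*cur = req;
	return 0;
}
```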
      
      Fixes: f24be42a ("cxl: Add psl9 specific code")
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
      Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
      Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Finley Xiao
      soc: rockchip: power-domain: Fix wrong value when power up pd with writemask · 971a5557
      Finley Xiao authored
      commit 9e59c5f6 upstream.
      
      Fix power domains (pds) that could only ever be turned off, never turned
      on again, when the pd registers have writemask bits.
      
      So far this affects the rk3328 only.
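
      The failure mode can be modelled in a few lines of C. This assumes the
      usual Rockchip "hiword" convention, where bits 31:16 of a write are
      write-enable bits for bits 15:0, and assumes a set power bit means the
      domain is off; the bit position and function names are hypothetical,
      and this is an illustration rather than the driver's code.

```c
#include <stdint.h>

/*
 * Model of a hiword register write: a low bit changes only when its
 * write-enable bit (the same bit shifted up by 16) is set in the same write.
 */
static uint32_t hiword_apply(uint32_t reg, uint32_t written)
{
	uint32_t wmask = written >> 16;

	return (reg & ~wmask & 0xffffu) | (written & wmask & 0xffffu);
}

/* Hypothetical domain bit and its write-enable bit. */
#define SKETCH_PWR_MASK   (1u << 3)
#define SKETCH_PWR_W_MASK (SKETCH_PWR_MASK << 16)

static uint32_t sketch_power_off(uint32_t reg)
{
	/* Value bit and write-enable bit together: the bit is set. */
	return hiword_apply(reg, SKETCH_PWR_W_MASK | SKETCH_PWR_MASK);
}

static uint32_t sketch_power_on(uint32_t reg)
{
	/* Write-enable bit only, value bit clear: the bit is cleared. */
	return hiword_apply(reg, SKETCH_PWR_W_MASK);
}
```

      In this model, a write that carries a value bit but not its
      write-enable bit leaves the register untouched, which is one way a
      domain can end up stuck off.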
      
      Fixes: 79bb17ce ("soc: rockchip: power-domain: Support domain control in hiword-registers")
      Cc: stable@vger.kernel.org
      Signed-off-by: Finley Xiao <finley.xiao@rock-chips.com>
      Signed-off-by: Elaine Zhang <zhangqing@rock-chips.com>
      Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Heiko Stuebner <heiko@sntech.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Mahesh Salgaonkar
      powerpc/fadump: Unregister fadump on kexec down path. · 56fbab60
      Mahesh Salgaonkar authored
      commit 722cde76 upstream.
      
      Unregister fadump on the kexec down path; otherwise the fadump
      registration in the new kexec-ed kernel complains that fadump is already
      registered. This makes the new kernel continue using the fadump
      registered by the previous kernel, which may lead to invalid vmcore
      generation. Hence this patch fixes the issue by unregistering fadump in
      fadump_cleanup(), which is called during the kexec path, so that the new
      kernel can register fadump with new valid values.
      
      Fixes: b500afff ("fadump: Invalidate registration and release reserved memory for general use.")
      Cc: stable@vger.kernel.org # v3.4+
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Gautham R. Shenoy
      cpuidle: powernv: Fix promotion from snooze if next state disabled · 3b185e66
      Gautham R. Shenoy authored
      commit 0a4ec6aa upstream.
      
      The commit 78eaa10f ("cpuidle: powernv/pseries: Auto-promotion of
      snooze to deeper idle state") introduced a timeout for the snooze idle
      state so that it could eventually be promoted to a deeper idle
      state. The snooze timeout value is static and set to the target
      residency of the next idle state, which would train the cpuidle
      governor to pick the next idle state eventually.
      
      The unfortunate side-effect of this is that if the next idle state(s)
      are disabled, the CPU will forever remain in snooze, despite the fact
      that the system is completely idle, and other deeper idle states are
      available.
      
      This patch fixes the issue by dynamically setting the snooze timeout
      to the target residency of the next enabled state on the device.
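
      A minimal C sketch of that selection. The struct, the fallback
      parameter and the function name are hypothetical; the sketch only shows
      the idea of skipping disabled states when choosing the timeout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified idle state table; index 0 is snooze. */
struct sketch_idle_state {
	uint64_t target_residency_us;
	bool disabled;
};

/*
 * Return the target residency of the first enabled state deeper than
 * 'index', or 'fallback' if every deeper state is disabled.
 */
static uint64_t sketch_snooze_timeout(const struct sketch_idle_state *states,
				      int count, int index, uint64_t fallback)
{
	for (int i = index + 1; i < count; i++) {
		if (!states[i].disabled)
			return states[i].target_residency_us;
	}
	return fallback;
}
```

      Computing the timeout on each entry, rather than once at init, is what
      lets it follow states being disabled or re-enabled at runtime.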
      
      Before Patch:
        POWER8 : Only nap disabled.
        $ cpupower monitor sleep 30
        sleep took 30.01297 seconds and exited with status 0
                      |Idle_Stats
        PKG |CORE|CPU | snoo | Nap  | Fast
           0|   8|   0| 96.41|  0.00|  0.00
           0|   8|   1| 96.43|  0.00|  0.00
           0|   8|   2| 96.47|  0.00|  0.00
           0|   8|   3| 96.35|  0.00|  0.00
           0|   8|   4| 96.37|  0.00|  0.00
           0|   8|   5| 96.37|  0.00|  0.00
           0|   8|   6| 96.47|  0.00|  0.00
           0|   8|   7| 96.47|  0.00|  0.00
      
        POWER9: Shallow states (stop0lite, stop1lite, stop2lite, stop0, stop1,
        stop2) disabled:
        $ cpupower monitor sleep 30
        sleep took 30.05033 seconds and exited with status 0
                      |Idle_Stats
        PKG |CORE|CPU | snoo | stop | stop | stop | stop | stop | stop | stop | stop
           0|  16|   0| 89.79|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00
           0|  16|   1| 90.12|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00
           0|  16|   2| 90.21|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00
           0|  16|   3| 90.29|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00
      
      After Patch:
        POWER8 : Only nap disabled.
        $ cpupower monitor sleep 30
        sleep took 30.01200 seconds and exited with status 0
                      |Idle_Stats
        PKG |CORE|CPU | snoo | Nap  | Fast
           0|   8|   0| 16.58|  0.00| 77.21
           0|   8|   1| 18.42|  0.00| 75.38
           0|   8|   2|  4.70|  0.00| 94.09
           0|   8|   3| 17.06|  0.00| 81.73
           0|   8|   4|  3.06|  0.00| 95.73
           0|   8|   5|  7.00|  0.00| 96.80
           0|   8|   6|  1.00|  0.00| 98.79
           0|   8|   7|  5.62|  0.00| 94.17
      
        POWER9: Shallow states (stop0lite, stop1lite, stop2lite, stop0, stop1,
        stop2) disabled:
      
        $ cpupower monitor sleep 30
        sleep took 30.02110 seconds and exited with status 0
                      |Idle_Stats
        PKG |CORE|CPU | snoo | stop | stop | stop | stop | stop | stop | stop | stop
           0|   0|   0|  0.69|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00|  9.39| 89.70
           0|   0|   1|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00|  0.05| 93.21
           0|   0|   2|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00| 89.93
           0|   0|   3|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00|  0.00| 93.26
      
      Fixes: 78eaa10f ("cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state")
      Cc: stable@vger.kernel.org # v4.2+
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Reviewed-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Akshay Adiga
      powerpc/powernv/cpuidle: Init all present cpus for deep states · a5d49dfb
      Akshay Adiga authored
      commit ac9816dc upstream.
      
      Init all present cpus for deep states instead of "all possible" cpus.
      Init fails if a possible cpu is guarded, resulting in only non-deep
      states being available for cpuidle/hotplug.

      Stewart says this means that for single-threaded workloads, if you
      guard out a CPU core you'll not get WoF (Workload Optimised
      Frequency), which means that performance goes down when you wouldn't
      expect it to.
      
      Fixes: 77b54e9f ("powernv/powerpc: Add winkle support for offline cpus")
      Cc: stable@vger.kernel.org # v3.19+
      Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Haren Myneni
      powerpc/powernv: copy/paste - Mask SO bit in CR · 134e70c2
      Haren Myneni authored
      commit 75743649 upstream.
      
      NX can set the 3rd bit in the CR register for XER[SO] (Summary Overflow),
      which is not related to the paste request. The current paste function
      returns failure for a successful request when this bit is set. So mask
      this bit and check the proper return status.
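
      A minimal C sketch of the masking. It assumes CR0 sits in the top four
      bits of the 32-bit CR as LT, GT, EQ, SO (SO being bit 3 in the ISA's
      big-endian numbering) and, for illustration only, that a successful
      paste reports EQ alone; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* CR0 nibble: LT = 0x8, GT = 0x4, EQ = 0x2, SO = 0x1 (assumed layout). */
#define SKETCH_CR0_SHIFT 28
#define SKETCH_CR0_MASK  0xFu
#define SKETCH_CR0_SO    0x1u
#define SKETCH_CR0_EQ    0x2u

/* Extract CR0 and drop SO so it cannot disturb the status check. */
static unsigned int sketch_paste_status(uint32_t cr)
{
	return (cr >> SKETCH_CR0_SHIFT) & (SKETCH_CR0_MASK & ~SKETCH_CR0_SO);
}

/* Assumed success encoding: CR0 == EQ once SO is masked off. */
static int sketch_paste_ok(uint32_t cr)
{
	return sketch_paste_status(cr) == SKETCH_CR0_EQ;
}
```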
      
      Fixes: 2392c8c8 ("powerpc/powernv/vas: Define copy/paste interfaces")
      Cc: stable@vger.kernel.org # v4.14+
      Signed-off-by: Haren Myneni <haren@us.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Alexey Kardashevskiy
      powerpc/powernv/ioda2: Remove redundant free of TCE pages · 0e8bb91c
      Alexey Kardashevskiy authored
      commit 98fd72fe upstream.
      
      When IODA2 creates a PE, it creates an IOMMU table with it_ops::free
      set to pnv_ioda2_table_free() which calls pnv_pci_ioda2_table_free_pages().
      
      Since iommu_tce_table_put() calls it_ops::free when the last reference
      to the table is released, an explicit call to
      pnv_pci_ioda2_table_free_pages() is not needed, so let's remove it.

      This should fix a double free in the case of PCI hotplug, as
      pnv_pci_ioda2_table_free_pages() resets neither iommu_table::it_base
      nor ::it_size.

      This was not exposed by SRIOV, as it uses a different code path via
      pnv_pcibios_sriov_disable().

      IODA1 does not initialize it_ops::free, so it does not have this issue.
      
      Fixes: c5f7700b ("powerpc/powernv: Dynamically release PE")
      Cc: stable@vger.kernel.org # v4.8+
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Michael Neuling
      powerpc/ptrace: Fix enforcement of DAWR constraints · 919c9b81
      Michael Neuling authored
      commit cd6ef7ee upstream.
      
      Back when we first introduced the DAWR, in commit 4ae7ebe9
      ("powerpc: Change hardware breakpoint to allow longer ranges"), we
      screwed up the constraint, making it a 1024-byte boundary rather than a
      512-byte one. This makes the check overly permissive. Fortunately, GDB is
      the only real user and it always did the right thing, so we never
      noticed.
      
      This fixes the constraint to 512 bytes.
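
      The difference between the two constraints can be seen in a minimal C
      sketch. It assumes the watched range is widened to 8-byte alignment at
      both ends and must not cross a 2^shift-byte boundary; the names are
      hypothetical, and len is assumed to be at least 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 8-byte (doubleword) alignment of the watched range. */
#define SKETCH_HW_BRK_ALIGN 0x7ull

/*
 * Accept [addr, addr + len) only if its aligned start and end fall in the
 * same block of 2^boundary_shift bytes (10 = 1024 bytes, 9 = 512 bytes).
 */
static bool sketch_dawr_range_ok(uint64_t addr, uint64_t len,
				 unsigned int boundary_shift)
{
	uint64_t start = addr & ~SKETCH_HW_BRK_ALIGN;
	uint64_t end = (addr + len - 1) | SKETCH_HW_BRK_ALIGN;

	return (start >> boundary_shift) == (end >> boundary_shift);
}
```

      A 16-byte range at 0x1F8 straddles the 512-byte boundary at 0x200 but
      stays inside the first 1024-byte block, so a 1024-byte check accepts it
      while a 512-byte check rejects it.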
      
      Fixes: 4ae7ebe9 ("powerpc: Change hardware breakpoint to allow longer ranges")
      Cc: stable@vger.kernel.org # v3.9+
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Anju T Sudhakar
      powerpc/perf: Fix memory allocation for core-imc based on num_possible_cpus() · 1ab90923
      Anju T Sudhakar authored
      commit d2032678 upstream.
      
      Currently memory is allocated for core-imc based on cpu_present_mask,
      which has bit 'cpu' set iff cpu is populated. We use (cpu number / threads
      per core) as the array index to access the memory.
      
      Under some circumstances firmware marks a CPU as GUARDed and boots the
      system; until cleared of errors, these CPUs are unavailable for all
      subsequent boots. GUARDed CPUs are possible but not present from Linux's
      view, so they blow a hole in the CPU numbering: we assume the max length
      of our allocation is driven by the number of present CPUs, whereas one
      of the online CPUs might have a number beyond that count because of
      the hole. So the (cpu number / threads per core) value exceeds the
      array bounds and leads to memory overflow.
      
      Call trace observed during a guard test:
      
      Faulting instruction address: 0xc000000000149f1c
      cpu 0x69: Vector: 380 (Data Access Out of Range) at [c000003fea303420]
          pc:c000000000149f1c: prefetch_freepointer+0x14/0x30
          lr:c00000000014e0f8: __kmalloc+0x1a8/0x1ac
          sp:c000003fea3036a0
         msr:9000000000009033
         dar:c9c54b2c91dbf6b7
        current = 0xc000003fea2c0000
        paca    = 0xc00000000fddd880	 softe: 3	 irq_happened: 0x01
          pid   = 1, comm = swapper/104
      Linux version 4.16.7-openpower1 (smc@smc-desktop) (gcc version 6.4.0
      (Buildroot 2018.02.1-00006-ga8d1126)) #2 SMP Fri May 4 16:44:54 PDT 2018
      enter ? for help
      call trace:
      	 __kmalloc+0x1a8/0x1ac
      	 (unreliable)
      	 init_imc_pmu+0x7f4/0xbf0
      	 opal_imc_counters_probe+0x3fc/0x43c
      	 platform_drv_probe+0x48/0x80
      	 driver_probe_device+0x22c/0x308
      	 __driver_attach+0xa0/0xd8
      	 bus_for_each_dev+0x88/0xb4
      	 driver_attach+0x2c/0x40
      	 bus_add_driver+0x1e8/0x228
      	 driver_register+0xd0/0x114
      	 __platform_driver_register+0x50/0x64
      	 opal_imc_driver_init+0x24/0x38
      	 do_one_initcall+0x150/0x15c
      	 kernel_init_freeable+0x250/0x254
      	 kernel_init+0x1c/0x150
      	 ret_from_kernel_thread+0x5c/0xc8
      
      Allocating memory for core-imc based on cpu_possible_mask, which has
      bit 'cpu' set iff cpu is populatable, will fix this issue.
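
      A worked example with hypothetical numbers makes the overflow concrete.
      Assume 8 threads per core, 160 possible CPUs (20 cores) and core 12
      (CPUs 96-103) GUARDed, leaving 152 present CPUs; assume also that the
      per-core array is sized by rounding the CPU count up to whole cores.
      The helper names are hypothetical.

```c
/* Round n up to a whole number of d-sized groups. */
static int sketch_div_round_up(int n, int d)
{
	return (n + d - 1) / d;
}

/* Per-core slots allocated when sizing by a CPU count. */
static int sketch_nr_cores(int nr_cpus, int threads_per_core)
{
	return sketch_div_round_up(nr_cpus, threads_per_core);
}

/* Is cpu / threads_per_core a valid index into an array sized from nr_cpus? */
static int sketch_index_in_bounds(int cpu, int nr_cpus, int threads_per_core)
{
	return cpu / threads_per_core < sketch_nr_cores(nr_cpus, threads_per_core);
}
```

      With these numbers, sizing by present CPUs gives 19 slots (indices
      0-18), yet online CPU 159 maps to index 19; sizing by possible CPUs
      gives 20 slots and the index fits.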

      Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
      Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
      Reviewed-by: Balbir Singh <bsingharora@gmail.com>
      Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
      Fixes: 39a846db ("powerpc/perf: Add core IMC PMU support")
      Cc: stable@vger.kernel.org # v4.14+
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Michael Neuling
      powerpc/ptrace: Fix setting 512B aligned breakpoints with PTRACE_SET_DEBUGREG · c12d2416
      Michael Neuling authored
      commit 4f7c06e2 upstream.
      
      In commit e2a800be ("powerpc/hw_brk: Fix off by one error when
      validating DAWR region end") we fixed setting the DAWR end point to
      its max value via PPC_PTRACE_SETHWDEBUG. Unfortunately we broke
      PTRACE_SET_DEBUGREG when setting a 512 byte aligned breakpoint.
      
      PTRACE_SET_DEBUGREG currently sets the length of the breakpoint to
      zero (memset() in hw_breakpoint_init()). This worked with
      arch_validate_hwbkpt_settings() before the above patch was applied but
      is now broken if the breakpoint is 512-byte aligned.
      
      This sets the length of the breakpoint to 8 bytes when using
      PTRACE_SET_DEBUGREG.
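
      A minimal C sketch of why only 512-byte aligned addresses broke. It
      assumes the validation computes the last watched byte as
      address + length - 1, widens both ends to 8-byte alignment, and requires
      both ends to fall in the same 512-byte block; the names are
      hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 8-byte alignment of the watched range. */
#define SKETCH_ALIGN 0x7ull

/*
 * The last watched byte is addr + len - 1, so len == 0 puts the end one
 * byte before addr. Start and end must share a 512-byte block.
 */
static bool sketch_validate(uint64_t addr, uint64_t len)
{
	uint64_t start = addr & ~SKETCH_ALIGN;
	uint64_t end = (addr + len - 1) | SKETCH_ALIGN;

	return (start >> 9) == (end >> 9);
}
```

      With a zero length at 0x1000, the end lands at 0xFFF, in the previous
      512-byte block, so the breakpoint is rejected. At 0x1008 the end
      (0x1007) stays in the same block, which is why unaligned addresses kept
      working. An 8-byte length keeps the end at 0x1007 for 0x1000 as well.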
      
      Fixes: e2a800be ("powerpc/hw_brk: Fix off by one error when validating DAWR region end")
      Cc: stable@vger.kernel.org # v3.11+
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Aneesh Kumar K.V
      powerpc/mm/hash: Add missing isync prior to kernel stack SLB switch · 5fefd9a5
      Aneesh Kumar K.V authored
      commit 91d06971 upstream.
      
      Currently we do not have an isync, or any other context-synchronizing
      instruction, prior to the slbie/slbmte in _switch() that updates the
      SLB entry for the kernel stack.

      However, that is not correct, as outlined in the ISA.
      
      From Power ISA Version 3.0B, Book III, Chapter 11, page 1133:
      
        "Changing the contents of ... the contents of SLB entries ... can
         have the side effect of altering the context in which data
         addresses and instruction addresses are interpreted, and in which
         instructions are executed and data accesses are performed.
         ...
         These side effects need not occur in program order, and therefore
         may require explicit synchronization by software.
         ...
         The synchronizing instruction before the context-altering
         instruction ensures that all instructions up to and including that
         synchronizing instruction are fetched and executed in the context
         that existed before the alteration."
      
      And page 1136:
      
        "For data accesses, the context synchronizing instruction before the
         slbie, slbieg, slbia, slbmte, tlbie, or tlbiel instruction ensures
         that all preceding instructions that access data storage have
         completed to a point at which they have reported all exceptions
         they will cause."
      
      We're not aware of any bugs caused by this, but it should be fixed
      regardless.
      
      Add the missing isync when updating kernel stack SLB entry.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      [mpe: Flesh out change log with more ISA text & explanation]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Miklos Szeredi
      fuse: fix control dir setup and teardown · 69829f74
      Miklos Szeredi authored
      commit 6becdb60 upstream.
      
      syzbot is reporting a NULL pointer dereference in fuse_ctl_remove_conn() [1].
      Since fc->ctl_ndents is incremented by fuse_ctl_add_conn() even when
      new_inode() fails, fuse_ctl_remove_conn() reaches an inode-less dentry
      and tries to clear the d_inode(dentry)->i_private field.
      
      Fix by only adding the dentry to the array after being fully set up.
      
      When tearing down the control directory, do d_invalidate() on it to get rid
      of any mounts that might have been added.
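
      The "record only after full setup" ordering can be sketched in C. The
      structs and functions are hypothetical stand-ins for the control
      directory entries and fc->ctl_ndents, and a NULL inode models a failed
      new_inode(); the sketch does not cover d_invalidate().

```c
#include <stddef.h>

#define SKETCH_MAX_ENTRIES 4

/* Hypothetical stand-ins for a control-dir entry and its connection. */
struct sketch_inode { void *i_private; };
struct sketch_entry { struct sketch_inode *inode; };

struct sketch_conn {
	struct sketch_entry *entries[SKETCH_MAX_ENTRIES];
	int nentries;
};

/*
 * Record an entry only after it is fully set up, so the teardown loop
 * never sees an entry without an inode.
 */
static int sketch_add_entry(struct sketch_conn *fc, struct sketch_entry *e,
			    struct sketch_inode *inode)
{
	if (fc->nentries >= SKETCH_MAX_ENTRIES || inode == NULL)
		return -1;	/* nothing recorded on failure */
	e->inode = inode;
	fc->entries[fc->nentries++] = e;
	return 0;
}

/* Teardown dereferences every recorded entry's inode; safe by construction. */
static void sketch_remove_all(struct sketch_conn *fc)
{
	for (int i = fc->nentries - 1; i >= 0; i--)
		fc->entries[i]->inode->i_private = NULL;
	fc->nentries = 0;
}
```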
      
      [1] https://syzkaller.appspot.com/bug?id=f396d863067238959c91c0b7cfc10b163638cac6

      Reported-by: syzbot <syzbot+32c236387d66c4516827@syzkaller.appspotmail.com>
      Fixes: bafa9654 ("[PATCH] fuse: add control filesystem")
      Cc: <stable@vger.kernel.org> # v2.6.18
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Tetsuo Handa
      fuse: don't keep dead fuse_conn at fuse_fill_super(). · 3a37d85a
      Tetsuo Handa authored
      commit 543b8f86 upstream.
      
      syzbot is reporting a use-after-free in fuse_kill_sb_blk() [1].
      Since the sb->s_fs_info field is not cleared after fc is released by
      fuse_conn_put() when initialization fails, fuse_kill_sb_blk() finds the
      already-released fc and tries to take its lock. Fix this by clearing
      sb->s_fs_info field after calling fuse_conn_put().
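
      A minimal C sketch of the pattern. The structs and functions are
      hypothetical stand-ins for fuse_conn, the superblock, fuse_conn_put()
      and fuse_kill_sb_blk(); the reference counting is deliberately
      simplified.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the connection and the superblock. */
struct sketch_conn { int refcount; };
struct sketch_sb { struct sketch_conn *s_fs_info; };

/* Drop a reference; free the connection when the last one goes away. */
static void sketch_conn_put(struct sketch_conn *fc)
{
	if (--fc->refcount == 0)
		free(fc);
}

/* Failed fill_super-style setup: release fc, then clear the pointer. */
static int sketch_fill_super_fail(struct sketch_sb *sb)
{
	sketch_conn_put(sb->s_fs_info);
	sb->s_fs_info = NULL;	/* later teardown must not see the freed fc */
	return -1;
}

/* kill_sb-style teardown: touch fc only if it is still attached. */
static bool sketch_kill_sb(struct sketch_sb *sb)
{
	if (sb->s_fs_info == NULL)
		return false;	/* nothing to lock; no use-after-free */
	/* ... would lock and use sb->s_fs_info here ... */
	return true;
}
```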
      
      [1] https://syzkaller.appspot.com/bug?id=a07a680ed0a9290585ca424546860464dd9658db

      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: syzbot <syzbot+ec3986119086fe4eec97@syzkaller.appspotmail.com>
      Fixes: 3b463ae0 ("fuse: invalidation reverse calls")
      Cc: John Muir <john@jmuir.com>
      Cc: Csaba Henk <csaba@gluster.com>
      Cc: Anand Avati <avati@redhat.com>
      Cc: <stable@vger.kernel.org> # v2.6.31
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Miklos Szeredi
      fuse: atomic_o_trunc should truncate pagecache · 2f7bf369
      Miklos Szeredi authored
      commit df0e91d4 upstream.
      
      Fuse has an "atomic_o_trunc" mode, where the userspace filesystem uses
      the O_TRUNC flag in the OPEN request to truncate the file atomically
      with the open.
      
      In this mode there's no need to send a SETATTR request to userspace after
      the open, so fuse_do_setattr() checks this mode and returns.  But this
      misses the important step of truncating the pagecache.
      
      Add the missing parts of truncation to the ATTR_OPEN branch.

      Reported-by: Chad Austin <chadaustin@fb.com>
      Fixes: 6ff958ed ("fuse: add atomic open+truncate support")
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>