- 30 Nov, 2017 40 commits
-
-
Nicholas Piggin authored
commit 85e3f1ad upstream. Radix VA space allocations test addresses against mm->task_size which is 512TB, even in cases where the intention is to limit allocation to below 128TB. This results in mmap with a hint address below 128TB but address + length above 128TB succeeding when it should fail (as hash does after the previous patch). Set the high address limit to be considered up front, and base subsequent allocation checks on that consistently. Fixes: f4ea6dcb ("powerpc/mm: Enable mappings above 128TB") Signed-off-by:
Nicholas Piggin <npiggin@gmail.com> Reviewed-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Ellerman authored
commit 475b581f upstream. On 64-bit Book3s, when we take an instruction fault the reason for the fault may be reported in SRR1. For data faults the reason is reported in DSISR (Data Storage Instruction Status Register). The reasons reported in each do not necessarily correspond, so we mask the SRR1 bits before copying them to the DSISR, which is then used by the page fault code. Prior to commit b4c001dc ("powerpc/mm: Use symbolic constants for filtering SRR1 bits on ISIs") we used a hard-coded mask of 0x58200000, which corresponds to: DSISR_NOHPTE 0x40000000 /* no translation found */ DSISR_NOEXEC_OR_G 0x10000000 /* exec of no-exec or guarded */ DSISR_PROTFAULT 0x08000000 /* protection fault */ DSISR_KEYFAULT 0x00200000 /* Storage Key fault */ That commit added a #define for the mask, DSISR_SRR1_MATCH_64S, but incorrectly used a different similarly named DSISR_BAD_FAULT_64S. This had the effect of changing the mask to 0xa43a0000, which omits everything but DSISR_KEYFAULT. Luckily this had no visible effect, because in practice we hardly use the DSISR bits. The lack of DSISR_NOHPTE means a TLB flush optimisation was missed in the native HPTE code, and DSISR_NOEXEC_OR_G and DSISR_PROTFAULT are both only used to trigger rare warnings. So we got lucky, but let's fix it. The new value only has bits between 17 and 30 set, so we can continue to use andis. Fixes: b4c001dc ("powerpc/mm: Use symbolic constants for filtering SRR1 bits on ISIs") Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Naveen N. Rao authored
commit 46725b17 upstream. When a uprobe is installed on an instruction that we currently do not emulate, we copy the instruction into a xol buffer and single step that instruction. If that instruction generates a fault, we abort the single stepping before invoking the signal handler. Once the signal handler is done, the uprobe trap is hit again since the instruction is retried and the process repeats. We use uprobe_deny_signal() to detect if the xol instruction triggered a signal. If so, we clear TIF_SIGPENDING and set TIF_UPROBE so that the signal is not handled until after the single stepping is aborted. In this case, uprobe_deny_signal() returns true and get_signal() ends up returning 0. However, in do_signal(), we are not looking at the return value, but depending on ksig.sig for further action, all with an uninitialized ksig that is not touched in this scenario. Fix the same by initializing ksig.sig to 0. Fixes: 129b69df ("powerpc: Use get_signal() signal_setup_done()") Reported-by:
Anton Blanchard <anton@samba.org> Signed-off-by:
Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Ellerman authored
commit f3f1dfd6 upstream. init_imc_pmu() uses topology_physical_package_id() to detect the node id of the processor it is on to get local memory, but that's wrong, and can lead to crashes. Fix it to use cpu_to_node(). Fixes: 885dcd70 ("powerpc/perf: Add nest IMC PMU support") Reported-By:
Rob Lippert <rlippert@google.com> Tested-By:
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by:
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Balbir Singh authored
commit f79ad50e upstream. When using the radix MMU on Power9 DD1, to work around a hardware problem, radix__pte_update() is required to do a two stage update of the PTE. First we write a zero value into the PTE, then we flush the TLB, and then we write the new PTE value. In the normal case that works OK, but it does not work if we're updating the PTE that maps the code we're executing, because the mapping is removed by the TLB flush and we can no longer execute from it. Unfortunately the STRICT_RWX code needs to do exactly that. The exact symptoms when we hit this case vary, sometimes we print an oops and then get stuck after that, but I've also seen a machine just get stuck continually page faulting with no oops printed. The variance is presumably due to the exact layout of the text and the page size used for the mappings. In all cases we are unable to boot to a shell. There are possible solutions such as creating a second mapping of the TLB flush code, executing from that, and then jumping back to the original. However we don't want to add that level of complexity for a DD1 work around. So just detect that we're running on Power9 DD1 and refrain from changing the permissions, effectively disabling STRICT_RWX on Power9 DD1. Fixes: 7614ff32 ("powerpc/mm/radix: Implement STRICT_RWX/mark_rodata_ro() for Radix") Reported-by:
Andrew Jeffery <andrew@aj.id.au> [Changelog as suggested by Michael Ellerman <mpe@ellerman.id.au>] Signed-off-by:
Balbir Singh <bsingharora@gmail.com> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christophe Leroy authored
commit 252eb558 upstream. On powerpc32, patch_instruction() is called by apply_feature_fixups() which is called from early_init() There is the following note in front of early_init(): * Note that the kernel may be running at an address which is different * from the address that it was linked at, so we must use RELOC/PTRRELOC * to access static data (including strings). -- paulus Therefore, slab_is_available() cannot be called yet, and text_poke_area must be addressed with PTRRELOC() Fixes: 95902e6c ("powerpc/mm: Implement STRICT_KERNEL_RWX on PPC32") Reported-by:
Meelis Roos <mroos@linux.ee> Signed-off-by:
Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
John David Anglin authored
commit 05f016d2 upstream. As noted by Christoph Biedl, passing a pointer size of 4 in the new CAS implementation causes a kernel crash. The attached patch corrects the off by one error in the argument validity check. In reviewing the code, I noticed that we only perform word operations with the pointer size argument. The subi instruction intentionally uses a word condition on 64-bit kernels. Nullification was used instead of a cmpib instruction as the branch should never be taken. The shlw pseudo-operation generates a depw,z instruction and it clears the target before doing a shift left word deposit. Thus, we don't need to clip the upper 32 bits of this argument on 64-bit kernels. Tested with a gcc testsuite run with a 64-bit kernel. The gcc atomic code in libgcc is the only direct user of the new CAS implementation that I am aware of. Signed-off-by:
John David Anglin <dave.anglin@bell.net> Signed-off-by:
Helge Deller <deller@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Brian King authored
commit 0a9a17e3 upstream. This patch fixes an issue seen on Power systems with ixgbe which results in skb list corruption and an eventual kernel oops. The following is what was observed: CPU 1 CPU2 ============================ ============================ 1: ixgbe_xmit_frame_ring ixgbe_clean_tx_irq 2: first->skb = skb eop_desc = tx_buffer->next_to_watch 3: ixgbe_tx_map read_barrier_depends() 4: wmb check adapter written status bit 5: first->next_to_watch = tx_desc napi_consume_skb(tx_buffer->skb ..); 6: writel(i, tx_ring->tail); The read_barrier_depends is insufficient to ensure that tx_buffer->skb does not get loaded prior to tx_buffer->next_to_watch, which then results in loading a stale skb pointer. This patch replaces the read_barrier_depends with smp_rmb to ensure loads are ordered with respect to the load of tx_buffer->next_to_watch. Signed-off-by:
Brian King <brking@linux.vnet.ibm.com> Acked-by:
Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by:
Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by:
Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Brian King authored
commit 7b8edcc6 upstream. The original issue being fixed in this patch was seen with the ixgbe driver, but the same issue exists with fm10k as well, as the code is very similar. read_barrier_depends is not sufficient to ensure loads following it are not speculatively loaded out of order by the CPU, which can result in stale data being loaded, causing potential system crashes. Signed-off-by:
Brian King <brking@linux.vnet.ibm.com> Acked-by:
Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by:
Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Brian King authored
commit f72271e2 upstream. The original issue being fixed in this patch was seen with the ixgbe driver, but the same issue exists with i40evf as well, as the code is very similar. read_barrier_depends is not sufficient to ensure loads following it are not speculatively loaded out of order by the CPU, which can result in stale data being loaded, causing potential system crashes. Signed-off-by:
Brian King <brking@linux.vnet.ibm.com> Acked-by:
Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by:
Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by:
Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Brian King authored
commit ae0c585d upstream. The original issue being fixed in this patch was seen with the ixgbe driver, but the same issue exists with ixgbevf as well, as the code is very similar. read_barrier_depends is not sufficient to ensure loads following it are not speculatively loaded out of order by the CPU, which can result in stale data being loaded, causing potential system crashes. Signed-off-by:
Brian King <brking@linux.vnet.ibm.com> Acked-by:
Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by:
Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by:
Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Brian King authored
commit 1e1f9ca5 upstream. The original issue being fixed in this patch was seen with the ixgbe driver, but the same issue exists with igbvf as well, as the code is very similar. read_barrier_depends is not sufficient to ensure loads following it are not speculatively loaded out of order by the CPU, which can result in stale data being loaded, causing potential system crashes. Signed-off-by:
Brian King <brking@linux.vnet.ibm.com> Acked-by:
Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by:
Aaron Brown <aaron.f.brown@intel.com> Signed-off-by:
Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Brian King authored
commit c4cb9918 upstream. The original issue being fixed in this patch was seen with the ixgbe driver, but the same issue exists with igb as well, as the code is very similar. read_barrier_depends is not sufficient to ensure loads following it are not speculatively loaded out of order by the CPU, which can result in stale data being loaded, causing potential system crashes. Signed-off-by:
Brian King <brking@linux.vnet.ibm.com> Acked-by:
Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by:
Aaron Brown <aaron.f.brown@intel.com> Signed-off-by:
Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Brian King authored
commit 52c6912f upstream. The original issue being fixed in this patch was seen with the ixgbe driver, but the same issue exists with i40e as well, as the code is very similar. read_barrier_depends is not sufficient to ensure loads following it are not speculatively loaded out of order by the CPU, which can result in stale data being loaded, causing potential system crashes. Signed-off-by:
Brian King <brking@linux.vnet.ibm.com> Acked-by:
Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by:
Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by:
Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Bin Meng authored
commit 9d63f176 upstream. There are two bugs in current intel_spi_sw_cycle(): - The 'data byte count' field should be the number of bytes transferred minus 1 - SSFSTS_CTL is the offset from ispi->sregs, not ispi->base Signed-off-by:
Bin Meng <bmeng.cn@gmail.com> Acked-by:
Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by:
Cyrille Pitchen <cyrille.pitchen@wedev4u.fr> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Johan Hovold authored
commit c45e3e4c upstream. A recent change fixing NFC device allocation itself introduced an error-handling bug by returning an error pointer in case device-id allocation failed. This is clearly broken as the callers still expected NULL to be returned on errors as detected by Dan's static checker. Fix this up by returning NULL in the event that we've run out of memory when allocating a new device id. Note that the offending commit is marked for stable (3.8) so this fix needs to be backported along with it. Fixes: 20777bc5 ("NFC: fix broken device allocation") Reported-by:
Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by:
Johan Hovold <johan@kernel.org> Signed-off-by:
Samuel Ortiz <sameo@linux.intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Daniel Jurgens authored
commit 877add28 upstream. When modify QP is called on a shared QP update the security context for the real QP. When security is subsequently enforced the shared QP handles will be checked as well. Without this change shared QP handles get added to the port/pkey lists, which is a bug, because not all shared QP handles will be checked for access. Also the shared QP security context wouldn't get removed from the port/pkey lists causing access to free memory and list corruption when they are destroyed. Fixes: d291f1a6 ("IB/core: Enforce PKey security on QPs") Signed-off-by:
Daniel Jurgens <danielj@mellanox.com> Reviewed-by:
Parav Pandit <parav@mellanox.com> Signed-off-by:
Leon Romanovsky <leon@kernel.org> Signed-off-by:
Doug Ledford <dledford@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Parav Pandit authored
commit 89548bca upstream. Below kernel crash is observed when Pkey security enforcement fails on received MADs. This issue is reported in [1]. ib_free_recv_mad() accesses the rmpp_list, whose initialization is needed before accessing it. When security enformcent fails on received MADs, MAD processing avoided due to security checks failed. OpenSM[3770]: SM port is down kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 kernel: IP: ib_free_recv_mad+0x44/0xa0 [ib_core] kernel: PGD 0 kernel: P4D 0 kernel: kernel: Oops: 0002 [#1] SMP kernel: CPU: 0 PID: 2833 Comm: kworker/0:1H Tainted: P IO 4.13.4-1-pve #1 kernel: Hardware name: Dell XS23-TY3 /9CMP63, BIOS 1.71 09/17/2013 kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] kernel: task: ffffa069c6541600 task.stack: ffffb9a729054000 kernel: RIP: 0010:ib_free_recv_mad+0x44/0xa0 [ib_core] kernel: RSP: 0018:ffffb9a729057d38 EFLAGS: 00010286 kernel: RAX: ffffa069cb138a48 RBX: ffffa069cb138a10 RCX: 0000000000000000 kernel: RDX: ffffb9a729057d38 RSI: 0000000000000000 RDI: ffffa069cb138a20 kernel: RBP: ffffb9a729057d60 R08: ffffa072d2d49800 R09: ffffa069cb138ae0 kernel: R10: ffffa069cb138ae0 R11: ffffa072b3994e00 R12: ffffb9a729057d38 kernel: R13: ffffa069d1c90000 R14: 0000000000000000 R15: ffffa069d1c90880 kernel: FS: 0000000000000000(0000) GS:ffffa069dba00000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000000008 CR3: 00000011f51f2000 CR4: 00000000000006f0 kernel: Call Trace: kernel: ib_mad_recv_done+0x5cc/0xb50 [ib_core] kernel: __ib_process_cq+0x5c/0xb0 [ib_core] kernel: ib_cq_poll_work+0x20/0x60 [ib_core] kernel: process_one_work+0x1e9/0x410 kernel: worker_thread+0x4b/0x410 kernel: kthread+0x109/0x140 kernel: ? process_one_work+0x410/0x410 kernel: ? kthread_create_on_node+0x70/0x70 kernel: ? SyS_exit_group+0x14/0x20 kernel: ret_from_fork+0x25/0x30 kernel: RIP: ib_free_recv_mad+0x44/0xa0 [ib_core] RSP: ffffb9a729057d38 kernel: CR2: 0000000000000008 [1] : https://www.spinics.net/lists/linux-rdma/msg56190.html Fixes: 47a2b338 ("IB/core: Enforce security on management datagrams") Signed-off-by:
Parav Pandit <parav@mellanox.com> Reported-by:
Chris Blake <chrisrblake93@gmail.com> Reviewed-by:
Daniel Jurgens <danielj@mellanox.com> Reviewed-by:
Hal Rosenstock <hal@mellanox.com> Signed-off-by:
Doug Ledford <dledford@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Bart Van Assche authored
commit 8a0d18c6 upstream. This patch fixes the following kernel crash: general protection fault: 0000 [#1] PREEMPT SMP Workqueue: ib_mad2 timeout_sends [ib_core] Call Trace: ib_sa_path_rec_callback+0x1c4/0x1d0 [ib_core] send_handler+0xb2/0xd0 [ib_core] timeout_sends+0x14d/0x220 [ib_core] process_one_work+0x200/0x630 worker_thread+0x4e/0x3b0 kthread+0x113/0x150 Fixes: commit aef9ec39 ("IB: Add SCSI RDMA Protocol (SRP) initiator") Signed-off-by:
Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by:
Sagi Grimberg <sagi@grimberg.me> Signed-off-by:
Doug Ledford <dledford@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael J. Ruhl authored
commit d7d62617 upstream. The addition of the VNIC contexts to num_rcv_contexts changes the meaning of the sysfs value nctxts from available user contexts, to user contexts + reserved VNIC contexts. User applications that use nctxts are now broken. Update the calculation so that VNIC contexts are used only if there are hardware contexts available, and do not silently affect nctxts. Update code to use the calculated VNIC context number. Update the sysfs value nctxts to be available user contexts only. Fixes: 2280740f ("IB/hfi1: Virtual Network Interface Controller (VNIC) HW support") Reviewed-by:
Ira Weiny <ira.weiny@intel.com> Reviewed-by:
Niranjana Vishwanathapura <Niranjana.Vishwanathapura@intel.com> Reviewed-by:
Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by:
Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Doug Ledford <dledford@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Parav Pandit authored
commit 5a3dc323 upstream. In recent code, two path record entries are alwasy cleared while allocated could be either one or two path record entries. This leads to zero out of unallocated memory. This fix initializes alternative path record only when alternative path is set. While we are at it, path record allocation doesn't check for OPA alternative path, but rest of the code checks for OPA alternative path. Path record allocation code doesn't check for OPA alternative LID. This can further lead to memory corruption when only one path record is allocated, but there is actually alternative OPA path record present in CM request. Fixes: 9fdca4da ("IB/SA: Split struct sa_path_rec based on IB and ROCE specific fields") Signed-off-by:
Parav Pandit <parav@mellanox.com> Reviewed-by:
Moni Shoua <monis@mellanox.com> Signed-off-by:
Leon Romanovsky <leon@kernel.org> Signed-off-by:
Doug Ledford <dledford@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Bart Van Assche authored
commit c70ca389 upstream. Make srpt_parse_i_port_id() return a negative value if hex2bin() fails. Fixes: commit a42d985b ("ib_srpt: Initial SRP Target merge for v3.3-rc1") Signed-off-by:
Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by:
Doug Ledford <dledford@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Chuck Lever authored
commit 0bad47ca upstream. During each NFSv4 callback Call, an RDMA Send completion frees the page that contains the RPC Call message. If the upper layer determines that a retransmit is necessary, this is too soon. One possible symptom: after a GARBAGE_ARGS response an NFSv4.1 callback request, the following BUG fires on the NFS server: kernel: BUG: Bad page state in process kworker/0:2H pfn:7d3ce2 kernel: page:ffffea001f4f3880 count:-2 mapcount:0 mapping: (null) index:0x0 kernel: flags: 0x2fffff80000000() kernel: raw: 002fffff80000000 0000000000000000 0000000000000000 fffffffeffffffff kernel: raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000 kernel: page dumped because: nonzero _refcount kernel: Modules linked in: cts rpcsec_gss_krb5 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue rpcrdm a ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pc lmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt iTCO_vendor_support aesni_intel crypto_simd glue_helper cryptd pcspkr lpc_ich i2c_i801 mei_me mf d_core mei raid0 sg wmi ioatdma ipmi_si ipmi_devintf ipmi_msghandler shpchp acpi_power_meter acpi_pad nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs libcrc32c mlx4_en mlx4_ib mlx5_ib ib_core sd_mod sr_mod cdrom ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci crc32c_intel libahci drm mlx5_core igb libata mlx4_core dca i2c_algo_bit i2c_core nvme kernel: ptp nvme_core pps_core dm_mirror dm_region_hash dm_log dm_mod dax kernel: CPU: 0 PID: 11495 Comm: kworker/0:2H Not tainted 4.14.0-rc3-00001-g577ce48 #811 kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015 kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] kernel: Call Trace: kernel: dump_stack+0x62/0x80 kernel: bad_page+0xfe/0x11a kernel: free_pages_check_bad+0x76/0x78 kernel: free_pcppages_bulk+0x364/0x441 kernel: ? ttwu_do_activate.isra.61+0x71/0x78 kernel: free_hot_cold_page+0x1c5/0x202 kernel: __put_page+0x2c/0x36 kernel: svc_rdma_put_context+0xd9/0xe4 [rpcrdma] kernel: svc_rdma_wc_send+0x50/0x98 [rpcrdma] This issue exists all the way back to v4.5, but refactoring and code re-organization prevents this simple patch from applying to kernels older than v4.12. The fix is the same, however, if someone needs to backport it. Reported-by:
Ben Coddington <bcodding@redhat.com> BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=314 Fixes: 5d252f90 ('svcrdma: Add class for RDMA backwards ... ') Signed-off-by:
Chuck Lever <chuck.lever@oracle.com> Reviewed-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
J. Bruce Fields <bfields@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Williams authored
commit c1fb3542 upstream. For the same reason that /proc/iomem returns 0's for non-root readers and acpi tables are root-only, make the 'resource' attribute for namespace devices only readable by root. Otherwise we disclose physical address information. Fixes: bf9bccc1 ("libnvdimm: pmem label sets and namespace instantiation") Reported-by:
Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by:
Dan Williams <dan.j.williams@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Williams authored
commit b8ff981f upstream. For the same reason that /proc/iomem returns 0's for non-root readers and acpi tables are root-only, make the 'resource' attribute for region devices only readable by root. Otherwise we disclose physical address information. Fixes: 802f4be6 ("libnvdimm: Add 'resource' sysfs attribute to regions") Cc: Dave Jiang <dave.jiang@intel.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Reported-by:
Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by:
Dan Williams <dan.j.williams@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Williams authored
commit b18d4b8a upstream. The set of valid sequence numbers is {1,2,3}. The specification indicates that an implementation should consider 0 a sign of a critical error: UEFI 2.7: 13.19 NVDIMM Label Protocol Software never writes the sequence number 00, so a correctly check-summed Index Block with this sequence number probably indicates a critical error. When software discovers this case it treats it as an invalid Index Block indication. While the expectation is that the invalid block is just thrown away, the Robustness Principle says we should fix this to make both sequence numbers valid. Fixes: f524bf27 ("libnvdimm: write pmem label set") Reported-by:
Juston Li <juston.li@intel.com> Signed-off-by:
Dan Williams <dan.j.williams@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Williams authored
commit 26417ae4 upstream. For the same reason that /proc/iomem returns 0's for non-root readers and acpi tables are root-only, make the 'resource' attribute for pfn devices only readable by root. Otherwise we disclose physical address information. Fixes: f6ed58c7 ("libnvdimm, pfn: 'resource'-address and 'size'...") Reported-by:
Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by:
Dan Williams <dan.j.williams@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Williams authored
commit d34cb808 upstream. If we successfully enable a DIMM then it must not be locked and we can clear the label-read failure condition. Otherwise, we need to reload the entire bus provider driver to achieve the same effect, and that can disrupt unrelated DIMMs and namespaces. Fixes: 9d62ed96 ("libnvdimm: handle locked label storage areas") Signed-off-by:
Dan Williams <dan.j.williams@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Johan Hovold authored
commit 33ec6dbc upstream. Fix child node-lookup during probe, which ended up searching the whole device tree depth-first starting at parent rather than just matching on its children. Note that the original premature free of the parent node has already been fixed separately, but that fix was apparently never backported to stable. Fixes: 9ac33b0c ("CLK: TI: Driver for DRA7 ATL (Audio Tracking Logic)") Fixes: 660e1551 ("clk: ti: dra7-atl-clock: Fix of_node reference counting") Cc: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by:
Johan Hovold <johan@kernel.org> Acked-by:
Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by:
Stephen Boyd <sboyd@codeaurora.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Trond Myklebust authored
commit e9d4bf21 upstream. There is no guarantee that either the request or the svc_xprt exist by the time we get round to printing the trace message. Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by:
J. Bruce Fields <bfields@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mikulas Patocka authored
commit 9f586fff upstream. Don't crash in case of allocation failure in dax_alloc_inode. syzkaller hit the following crash on e4880bc5 kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access [..] RIP: 0010:dax_alloc_inode+0x3b/0x70 drivers/dax/super.c:348 Call Trace: alloc_inode+0x65/0x180 fs/inode.c:208 new_inode_pseudo+0x69/0x190 fs/inode.c:890 new_inode+0x1c/0x40 fs/inode.c:919 mount_pseudo_xattr+0x288/0x560 fs/libfs.c:261 mount_pseudo include/linux/fs.h:2137 [inline] dax_mount+0x2e/0x40 drivers/dax/super.c:388 mount_fs+0x66/0x2d0 fs/super.c:1223 Fixes: 7b6be844 ("dax: refactor dax-fs into a generic provider...") Reported-by:
syzbot <syzkaller@googlegroups.com> Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Signed-off-by:
Dan Williams <dan.j.williams@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jeff Moyer authored
commit 957ac8c4 upstream. PMD faults on a zero length file on a file system mounted with -o dax will not generate SIGBUS as expected. fd = open(...O_TRUNC); addr = mmap(NULL, 2*1024*1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); *addr = 'a'; <expect SIGBUS> The problem is this code in dax_iomap_pmd_fault: max_pgoff = (i_size_read(inode) - 1) >> PAGE_SHIFT; If the inode size is zero, we end up with a max_pgoff that is way larger than 0. :) Fix it by using DIV_ROUND_UP, as is done elsewhere in the kernel. I tested this with some simple test code that ensured that SIGBUS was received where expected. Fixes: 642261ac ("dax: add struct iomap based DAX PMD support") Signed-off-by:
Jeff Moyer <jmoyer@redhat.com> Signed-off-by:
Dan Williams <dan.j.williams@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Paolo Bonzini authored
commit 8a1b4392 upstream. This is more or less a revert of commit 2c82878b ("KVM: VMX: require virtual NMI support", 2017-03-27); it turns out that Core 2 Duo machines only had virtual NMIs in some SKUs. The revert is not trivial because in the meanwhile there have been several fixes to nested NMI injection. Therefore, the entire vNMI state is moved to struct loaded_vmcs. Another change compared to before the patch is a simplification here: if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked && !(is_guest_mode(vcpu) && nested_cpu_has_virtual_nmis( get_vmcs12(vcpu))))) { The final condition here is always true (because nested_cpu_has_virtual_nmis is always false) and is removed. Fixes: 2c82878b Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1490803Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Radim Krčmář <rkrcmar@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Paolo Bonzini authored
commit 15038e14 upstream. For many years some users of assigned devices have reported worse performance on AMD processors with NPT than on AMD without NPT, Intel or bare metal. The reason turned out to be that SVM is discarding the guest PAT setting and uses the default (PA0=PA4=WB, PA1=PA5=WT, PA2=PA6=UC-, PA3=UC). The guest might be using a different setting, and especially might want write combining but isn't getting it (instead getting slow UC or UC- accesses). Thanks a lot to geoff@hostfission.com for noticing the relation to the g_pat setting. The patch has been tested also by a bunch of people on VFIO users forums. Fixes: 709ddebf Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=196409Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Reviewed-by:
David Hildenbrand <david@redhat.com> Tested-by:
Nick Sarnie <commendsarnex@gmail.com> Signed-off-by:
Radim Krčmář <rkrcmar@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ladi Prosek authored
commit 21f2d551 upstream. Intel SDM 27.5.2 Loading Host Segment and Descriptor-Table Registers: "The GDTR and IDTR limits are each set to FFFFH." Signed-off-by:
Ladi Prosek <lprosek@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Paul Mackerras authored
commit 00bb6ae5 upstream. When running a guest on a POWER9 system with the in-kernel XICS emulation disabled (for example by running QEMU with the parameter "-machine pseries,kernel_irqchip=off"), the kernel does not pass the XICS-related hypercalls such as H_CPPR up to userspace for emulation there as it should. The reason for this is that the real-mode handlers for these hypercalls don't check whether a XICS device has been instantiated before calling the xics-on-xive code. That code doesn't check either, leading to potential NULL pointer dereferences because vcpu->arch.xive_vcpu is NULL. Those dereferences won't cause an exception in real mode but will lead to kernel memory corruption. This fixes it by adding kvmppc_xics_enabled() checks before calling the XICS functions. Fixes: 5af50993 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Signed-off-by:
Paul Mackerras <paulus@ozlabs.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Vasily Averin authored
commit dc3033e1 upstream. lockd_up() can call lockd_unregister_notifiers twice: inside lockd_start_svc() when it calls lockd_svc_exit_thread() and then in error path of lockd_up() Patch forces lockd_start_svc() to unregister notifiers in all error cases and removes extra unregister in error path of lockd_up(). Fixes: cb7d224f "lockd: unregister notifier blocks if the service ..." Signed-off-by:
Vasily Averin <vvs@virtuozzo.com> Reviewed-by:
Jeff Layton <jlayton@kernel.org> Signed-off-by:
J. Bruce Fields <bfields@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Johan Hovold authored
commit 00ee9a1c upstream. Fix child-node lookup during initialisation, which ended up searching the whole device tree depth-first starting at the parent rather than just matching on its children. To make things worse, the parent gic node was prematurely freed, while the ppi-partitions node was leaked. Fixes: e3825ba1 ("irqchip/gic-v3: Add support for partitioned PPIs") Signed-off-by:
Johan Hovold <johan@kernel.org> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marc Zyngier authored
commit 4f8413a3 upstream. When requesting a shared interrupt, we assume that the firmware support code (DT or ACPI) has called irqd_set_trigger_type already, so that we can retrieve it and check that the requester is being reasonnable. Unfortunately, we still have non-DT, non-ACPI systems around, and these guys won't call irqd_set_trigger_type before requesting the interrupt. The consequence is that we fail the request that would have worked before. We can either chase all these use cases (boring), or address it in core code (easier). Let's have a per-irq_desc flag that indicates whether irqd_set_trigger_type has been called, and let's just check it when checking for a shared interrupt. If it hasn't been set, just take whatever the interrupt requester asks. Fixes: 382bd4de ("genirq: Use irqd_get_trigger_type to compare the trigger type for shared IRQs") Reported-and-tested-by:
Petr Cvek <petrcvekcz@gmail.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nate Dailey authored
commit f6eca2d4 upstream. If freeze_array is attempted in the middle of close_sync/ wait_all_barriers, deadlock can occur. freeze_array will wait for nr_pending and nr_queued to line up. wait_all_barriers increments nr_pending for each barrier bucket, one at a time, but doesn't actually issue IO that could be counted in nr_queued. So freeze_array is blocked until wait_all_barriers completes and allow_all_barriers runs. At the same time, when _wait_barrier sees array_frozen == 1, it stops and waits for freeze_array to complete. Prevent the deadlock by making close_sync call _wait_barrier and _allow_barrier for one bucket at a time, instead of deferring the _allow_barrier calls until after all _wait_barriers are complete. Signed-off-by:
Nate Dailey <nate.dailey@stratus.com> Fix: fd76863e(RAID1: a new I/O barrier implementation to remove resync window) Reviewed-by:
Coly Li <colyli@suse.de> Signed-off-by:
Shaohua Li <shli@fb.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-