- 04 Jan, 2020 14 commits
-
-
Aneesh Kumar K.V authored
[ Upstream commit 75838a32 ] If the hypervisor returned H_PTEG_FULL for H_ENTER hcall, retry a hash page table insert by removing a random entry from the group. After some runtime, it is very well possible to find all the 8 hash page table entry slot in the hpte group used for mapping. Don't fail a bolted entry insert in that case. With Storage class memory a user can find this error easily since a namespace enable/disable is equivalent to memory add/remove. This results in failures as reported below: $ ndctl create-namespace -r region1 -t pmem -m devdax -a 65536 -s 100M libndctl: ndctl_dax_enable: dax1.3: failed to enable Error: namespace1.2: failed to enable failed to create namespace: No such device or address In kernel log we find the details as below: Unable to create mapping for hot added memory 0xc000042006000000..0xc00004200d000000: -1 dax_pmem: probe of dax1.3 failed with error -14 This indicates that we failed to create a bolted hash table entry for direct-map address backing the namespace. We also observe failures such that not all namespaces will be enabled with ndctl enable-namespace all command. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191024093542.29777-2-aneesh.kumar@linux.ibm.comSigned-off-by: Sasha Levin <sashal@kernel.org>
-
Michael Ellerman authored
[ Upstream commit eb8e20f8 ] accumulate_stolen_time() is called prior to interrupt state being reconciled, which can trip the warning in arch_local_irq_restore(): WARNING: CPU: 5 PID: 1017 at arch/powerpc/kernel/irq.c:258 .arch_local_irq_restore+0x9c/0x130 ... NIP .arch_local_irq_restore+0x9c/0x130 LR .rb_start_commit+0x38/0x80 Call Trace: .ring_buffer_lock_reserve+0xe4/0x620 .trace_function+0x44/0x210 .function_trace_call+0x148/0x170 .ftrace_ops_no_ops+0x180/0x1d0 ftrace_call+0x4/0x8 .accumulate_stolen_time+0x1c/0xb0 decrementer_common+0x124/0x160 For now just mark it as notrace. We may change the ordering to call it after interrupt state has been reconciled, but that is a larger change. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191024055932.27940-1-mpe@ellerman.id.auSigned-off-by: Sasha Levin <sashal@kernel.org>
-
Xiang Chen authored
[ Upstream commit 550c0d89 ] For IOs from upper layer, preemption may be disabled as it may be called by function __blk_mq_delay_run_hw_queue which will call get_cpu() (it disables preemption). So if flags HISI_SAS_REJECT_CMD_BIT is set in function hisi_sas_task_exec(), it may disable preempt twice after down() and up() which will cause following call trace: BUG: scheduling while atomic: fio/60373/0x00000002 Call trace: dump_backtrace+0x0/0x150 show_stack+0x24/0x30 dump_stack+0xa0/0xc4 __schedule_bug+0x68/0x88 __schedule+0x4b8/0x548 schedule+0x40/0xd0 schedule_timeout+0x200/0x378 __down+0x78/0xc8 down+0x54/0x70 hisi_sas_task_exec.isra.10+0x598/0x8d8 [hisi_sas_main] hisi_sas_queue_command+0x28/0x38 [hisi_sas_main] sas_queuecommand+0x168/0x1b0 [libsas] scsi_queue_rq+0x2ac/0x980 blk_mq_dispatch_rq_list+0xb0/0x550 blk_mq_do_dispatch_sched+0x6c/0x110 blk_mq_sched_dispatch_requests+0x114/0x1d8 __blk_mq_run_hw_queue+0xb8/0x130 __blk_mq_delay_run_hw_queue+0x1c0/0x220 blk_mq_run_hw_queue+0xb0/0x128 blk_mq_sched_insert_requests+0xdc/0x208 blk_mq_flush_plug_list+0x1b4/0x3a0 blk_flush_plug_list+0xdc/0x110 blk_finish_plug+0x3c/0x50 blkdev_direct_IO+0x404/0x550 generic_file_read_iter+0x9c/0x848 blkdev_read_iter+0x50/0x78 aio_read+0xc8/0x170 io_submit_one+0x1fc/0x8d8 __arm64_sys_io_submit+0xdc/0x280 el0_svc_common.constprop.0+0xe0/0x1e0 el0_svc_handler+0x34/0x90 el0_svc+0x10/0x14 ... To solve the issue, check preemptible() to avoid disabling preempt multiple when flag HISI_SAS_REJECT_CMD_BIT is set. Link: https://lore.kernel.org/r/1571926105-74636-5-git-send-email-john.garry@huawei.comSigned-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Dan Carpenter authored
[ Upstream commit d6c9b31a ] These are called with IRQs disabled from csio_mgmt_tmo_handler() so we can't call spin_unlock_irq() or it will enable IRQs prematurely. Fixes: a3667aae ("[SCSI] csiostor: Chelsio FCoE offload driver") Link: https://lore.kernel.org/r/20191019085913.GA14245@mwandaSigned-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
James Smart authored
[ Upstream commit feff8b3d ] When operating in private loop mode, PLOGI exchanges are racing and the driver tries to abort it's PLOGI. But the PLOGI abort ends up terminating the login with the other end causing the other end to abort its PLOGI as well. Discovery never fully completes. Fix by disabling the PLOGI abort when private loop and letting the state machine play out. Link: https://lore.kernel.org/r/20191018211832.7917-5-jsmart2021@gmail.comSigned-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
David Disseldorp authored
[ Upstream commit 9cef2a79 ] RFC 2307 states: For CHAP [RFC1994], in the first step, the initiator MUST send: CHAP_A=<A1,A2...> Where A1,A2... are proposed algorithms, in order of preference. ... For the Algorithm, as stated in [RFC1994], one value is required to be implemented: 5 (CHAP with MD5) LIO currently checks for this value by only comparing a single byte in the tokenized Algorithm string, which means that any value starting with a '5' (e.g. "55") is interpreted as "CHAP with MD5". Fix this by comparing the entire tokenized string. Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: David Disseldorp <ddiss@suse.de> Link: https://lore.kernel.org/r/20190912095547.22427-2-ddiss@suse.deSigned-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Nicholas Graumann authored
[ Upstream commit 8a631a5a ] Whenever we reset the channel, we need to clear desc_pendingcount along with desc_submitcount. Otherwise when a new transaction is submitted, the irq coalesce level could be programmed to an incorrect value in the axidma case. This behavior can be observed when terminating pending transactions with xilinx_dma_terminate_all() and then submitting new transactions without releasing and requesting the channel. Signed-off-by: Nicholas Graumann <nick.graumann@gmail.com> Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Link: https://lore.kernel.org/r/1571150904-3988-8-git-send-email-radhey.shyam.pandey@xilinx.comSigned-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Thierry Reding authored
[ Upstream commit 96d3ab80 ] Page tables that reside in physical memory beyond the 4 GiB boundary are currently not working properly. The reason is that when the physical address for page directory entries is read, it gets truncated at 32 bits and can cause crashes when passing that address to the DMA API. Fix this by first casting the PDE value to a dma_addr_t and then using the page frame number mask for the SMMU instance to mask out the invalid bits, which are typically used for mapping attributes, etc. Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Ezequiel Garcia authored
[ Upstream commit 42bb97b8 ] IOMMU domain resource life is well-defined, managed by .domain_alloc and .domain_free. Therefore, domain-specific resources shouldn't be tied to the device life, but instead to its domain. Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Chao Yu authored
[ Upstream commit fe1897ea ] generic/018 reports an inconsistent status of atime, the testcase is as below: - open file with O_SYNC - write file to construct fraged space - calc md5 of file - record {a,c,m}time - defrag file --- do nothing - umount & mount - check {a,c,m}time The root cause is, as f2fs enables lazytime by default, atime update will dirty vfs inode, rather than dirtying f2fs inode (by set with FI_DIRTY_INODE), so later f2fs_write_inode() called from VFS will fail to update inode page due to our skip: f2fs_write_inode() if (is_inode_flag_set(inode, FI_DIRTY_INODE)) return 0; So eventually, after evict(), we lose last atime for ever. To fix this issue, we need to check whether {a,c,m,cr}time is consistent in between inode cache and inode page, and only skip f2fs_update_inode() if f2fs inode is not dirty and time is consistent as well. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Evan Green authored
[ Upstream commit 463fa44e ] Across suspend and resume, we are seeing error messages like the following: atmel_mxt_ts i2c-PRP0001:00: __mxt_read_reg: i2c transfer failed (-121) atmel_mxt_ts i2c-PRP0001:00: Failed to read T44 and T5 (-121) This occurs because the driver leaves its IRQ enabled. Upon resume, there is an IRQ pending, but the interrupt is serviced before both the driver and the underlying I2C bus have been resumed. This causes EREMOTEIO errors. Disable the IRQ in suspend, and re-enable it on resume. If there are cases where the driver enters suspend with interrupts disabled, that's a bug we should fix separately. Signed-off-by: Evan Green <evgreen@chromium.org> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
James Smart authored
[ Upstream commit 07b85824 ] Symptoms were seen of the driver not having valid data for mailbox commands. After debugging, the following sequence was found: The driver maintains a port-wide pointer of the mailbox command that is currently in execution. Once finished, the port-wide pointer is cleared (done in lpfc_sli4_mq_release()). The next mailbox command issued will set the next pointer and so on. The mailbox response data is only copied if there is a valid port-wide pointer. In the failing case, it was seen that a new mailbox command was being attempted in parallel with the completion. The parallel path was seeing the mailbox no long in use (flag check under lock) and thus set the port pointer. The completion path had cleared the active flag under lock, but had not touched the port pointer. The port pointer is cleared after the lock is released. In this case, the completion path cleared the just-set value by the parallel path. Fix by making the calls that clear mbox state/port pointer while under lock. Also slightly cleaned up the error path. Link: https://lore.kernel.org/r/20190922035906.10977-8-jsmart2021@gmail.comSigned-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Sreekanth Reddy authored
[ Upstream commit 782b2818 ] When user issues diag register command from application with required size, and if driver unable to allocate the memory, then it will fail the register command. While failing the register command, driver is not currently clearing MPT3_CMD_PENDING bit in ctl_cmds.status variable which was set before trying to allocate the memory. As this bit is set, subsequent register command will be failed with BUSY status even when user wants to register the trace buffer will less memory. Clear MPT3_CMD_PENDING bit in ctl_cmds.status before returning the diag register command with no memory status. Link: https://lore.kernel.org/r/1568379890-18347-4-git-send-email-sreekanth.reddy@broadcom.comSigned-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
James Smart authored
[ Upstream commit 3f97aed6 ] An issue was seen discovering all SCSI Luns when a target device undergoes link bounce. The driver currently does not qualify the FC4 support on the target. Therefore it will send a SCSI PRLI and an NVMe PRLI. The expectation is that the target will reject the PRLI if it is not supported. If a PRLI times out, the driver will retry. The driver will not proceed with the device until both SCSI and NVMe PRLIs are resolved. In the failure case, the device is FCP only and does not respond to the NVMe PRLI, thus initiating the wait/retry loop in the driver. During that time, a RSCN is received (device bounced) causing the driver to issue a GID_FT. The GID_FT response comes back before the PRLI mess is resolved and it prematurely cancels the PRLI retry logic and leaves the device in a STE_PRLI_ISSUE state. Discovery with the target never completes or resets. Fix by resetting the node state back to STE_NPR_NODE when GID_FT completes, thereby restarting the discovery process for the node. Link: https://lore.kernel.org/r/20190922035906.10977-10-jsmart2021@gmail.comSigned-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-
- 31 Dec, 2019 26 commits
-
-
Greg Kroah-Hartman authored
-
Masami Hiramatsu authored
commit 91e2f539 upstream. Fix die_walk_lines() to list the function entry line correctly. Since the dwarf_entrypc() does not return the entry pc if the DIE has only range attribute, __die_walk_funclines() fails to list the declaration line (entry line) in that case. To solve this issue, this introduces die_entrypc() which correctly returns the entry PC (the first address range) even if the DIE has only range attribute. With this fix die_walk_lines() shows the function entry line is able to probe correctly. Fixes: 4cc9cec6 ("perf probe: Introduce lines walker interface") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/157190837419.1859.4619125803596816752.stgit@devnote2Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Backlund <tmb@mageia.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mike Christie authored
commit 1c05839a upstream. This fixes a regression added with: commit e9e006f5 Author: Mike Christie <mchristi@redhat.com> Date: Sun Aug 4 14:10:06 2019 -0500 nbd: fix max number of supported devs where we can deadlock during device shutdown. The problem occurs if the recv_work's nbd_config_put occurs after nbd_start_device_ioctl has returned and the userspace app has droppped its reference via closing the device and running nbd_release. The recv_work nbd_config_put call would then drop the refcount to zero and try to destroy the config which would try to do destroy_workqueue from the recv work. This patch just has nbd_start_device_ioctl do a flush_workqueue when it wakes so we know after the ioctl returns running works have exited. This also fixes a possible race where we could try to reuse the device while old recv_works are still running. Cc: stable@vger.kernel.org Fixes: e9e006f5 ("nbd: fix max number of supported devs") Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Adrian Hunter authored
commit 75d27ea1 upstream. Command queuing has been reported broken on some systems based on Intel GLK. A separate patch disables command queuing in some cases. This patch adds a quirk for broken command queuing, which enables users with problems to disable command queuing using sdhci module parameters for quirks. Fixes: 8ee82bda ("mmc: sdhci-pci: Add CQHCI support for Intel GLK") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20191217095349.14592-2-adrian.hunter@intel.comSigned-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Adrian Hunter authored
commit bedf9fc0 upstream. Command queuing has been reported broken on some Lenovo systems based on Intel GLK. This is likely a BIOS issue, so disable command queuing for Intel GLK if the BIOS vendor string is "LENOVO". Fixes: 8ee82bda ("mmc: sdhci-pci: Add CQHCI support for Intel GLK") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20191217095349.14592-1-adrian.hunter@intel.comSigned-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Yangbo Lu authored
commit fe0acab4 upstream. Two previous patches introduced below quirks for P2020 platforms. - SDHCI_QUIRK_RESET_AFTER_REQUEST - SDHCI_QUIRK_BROKEN_TIMEOUT_VAL The patches made a mistake to add them in quirks2 of sdhci_host structure, while they were defined for quirks. host->quirks2 |= SDHCI_QUIRK_RESET_AFTER_REQUEST; host->quirks2 |= SDHCI_QUIRK_BROKEN_TIMEOUT_VAL; This patch is to fix them. host->quirks |= SDHCI_QUIRK_RESET_AFTER_REQUEST; host->quirks |= SDHCI_QUIRK_BROKEN_TIMEOUT_VAL; Fixes: 05cb6b2a ("mmc: sdhci-of-esdhc: add erratum eSDHC-A001 and A-008358 support") Fixes: a46e4271 ("mmc: sdhci-of-esdhc: add erratum eSDHC5 support") Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20191216031842.40068-1-yangbo.lu@nxp.comSigned-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Faiz Abbas authored
commit 2c92dd20 upstream. Tuning support in DDR50 speed mode was added in SD Specifications Part1 Physical Layer Specification v3.01. Its not possible to distinguish between v3.00 and v3.01 from the SCR and that is why since commit 4324f6de ("mmc: core: enable CMD19 tuning for DDR50 mode") tuning failures are ignored in DDR50 speed mode. Cards compatible with v3.00 don't respond to CMD19 in DDR50 and this error gets printed during enumeration and also if retune is triggered at any time during operation. Update the printk level to pr_debug so that these errors don't lead to false error reports. Signed-off-by: Faiz Abbas <faiz_abbas@ti.com> Cc: stable@vger.kernel.org # v4.4+ Link: https://lore.kernel.org/r/20191206114326.15856-1-faiz_abbas@ti.comSigned-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Rasmus Villemoes authored
commit 8b6dc6b2 upstream. This reverts commit 5dd19552. First, the fix seems to be plain wrong, since the erratum suggests waiting 5ms before setting setting SYSCTL[RSTD], but this msleep() happens after the call of sdhci_reset() which is where that bit gets set (if SDHCI_RESET_DATA is in mask). Second, walking the whole device tree to figure out if some node has a "fsl,p2020-esdhc" compatible string is hugely expensive - about 70 to 100 us on our mpc8309 board. Walking the device tree is done under a raw_spin_lock, so this is obviously really bad on an -rt system, and a waste of time on all. In fact, since esdhc_reset() seems to get called around 100 times per second, that mpc8309 now spends 0.8% of its time determining that it is not a p2020. Whether those 100 calls/s are normal or due to some other bug or misconfiguration, regularly hitting a 100 us non-preemptible window is unacceptable. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20191204085447.27491-1-linux@rasmusvillemoes.dkSigned-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Veerabhadrarao Badiganti authored
commit fa56ac97 upstream. The DDR_CONFIG register offset got updated after a specific minor version of sdcc V4. This offset change has not been properly taken care of while updating register changes for sdcc V5. Correcting proper offset for this register. Also updating this register value to reflect the recommended RCLK delay. Signed-off-by: Veerabhadrarao Badiganti <vbadigan@codeaurora.org> Link: https://lore.kernel.org/r/0101016ea738ec72-fa0f852d-20f8-474a-80b2-4b0ef63b132c-000000@us-west-2.amazonses.com Fixes: f1535888 ("mmc: sdhci-msm: Define new Register address map") Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christophe Leroy authored
commit 099bc481 upstream. Before commit 0366a1c7 ("powerpc/irq: Run softirqs off the top of the irq stack"), check_stack_overflow() was called by do_IRQ(), before switching to the irq stack. In that commit, do_IRQ() was renamed __do_irq(), and is now executing on the irq stack, so check_stack_overflow() has just become almost useless. Move check_stack_overflow() call in do_IRQ() to do the check while still on the current stack. Fixes: 0366a1c7 ("powerpc/irq: Run softirqs off the top of the irq stack") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e033aa8116ab12b7ca9a9c75189ad0741e3b9b5f.1575872340.git.christophe.leroy@c-s.frSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Srikar Dronamraju authored
commit 14c73bd3 upstream. With commit 247f2f6f ("sched/core: Don't schedule threads on pre-empted vCPUs"), the scheduler avoids preempted vCPUs to schedule tasks on wakeup. This leads to wrong choice of CPU, which in-turn leads to larger wakeup latencies. Eventually, it leads to performance regression in latency sensitive benchmarks like soltp, schbench etc. On Powerpc, vcpu_is_preempted() only looks at yield_count. If the yield_count is odd, the vCPU is assumed to be preempted. However yield_count is increased whenever the LPAR enters CEDE state (idle). So any CPU that has entered CEDE state is assumed to be preempted. Even if vCPU of dedicated LPAR is preempted/donated, it should have right of first-use since they are supposed to own the vCPU. On a Power9 System with 32 cores: # lscpu Architecture: ppc64le Byte Order: Little Endian CPU(s): 128 On-line CPU(s) list: 0-127 Thread(s) per core: 8 Core(s) per socket: 1 Socket(s): 16 NUMA node(s): 2 Model: 2.2 (pvr 004e 0202) Model name: POWER9 (architected), altivec supported Hypervisor vendor: pHyp Virtualization type: para L1d cache: 32K L1i cache: 32K L2 cache: 512K L3 cache: 10240K NUMA node0 CPU(s): 0-63 NUMA node1 CPU(s): 64-127 # perf stat -a -r 5 ./schbench v5.4 v5.4 + patch Latency percentiles (usec) Latency percentiles (usec) 50.0000th: 45 50.0th: 45 75.0000th: 62 75.0th: 63 90.0000th: 71 90.0th: 74 95.0000th: 77 95.0th: 78 *99.0000th: 91 *99.0th: 82 99.5000th: 707 99.5th: 83 99.9000th: 6920 99.9th: 86 min=0, max=10048 min=0, max=96 Latency percentiles (usec) Latency percentiles (usec) 50.0000th: 45 50.0th: 46 75.0000th: 61 75.0th: 64 90.0000th: 72 90.0th: 75 95.0000th: 79 95.0th: 79 *99.0000th: 691 *99.0th: 83 99.5000th: 3972 99.5th: 85 99.9000th: 8368 99.9th: 91 min=0, max=16606 min=0, max=117 Latency percentiles (usec) Latency percentiles (usec) 50.0000th: 45 50.0th: 46 75.0000th: 61 75.0th: 64 90.0000th: 71 90.0th: 75 95.0000th: 77 95.0th: 79 *99.0000th: 106 *99.0th: 83 99.5000th: 2364 99.5th: 84 99.9000th: 7480 99.9th: 90 min=0, max=10001 min=0, max=95 Latency percentiles (usec) Latency percentiles (usec) 50.0000th: 45 50.0th: 47 75.0000th: 62 75.0th: 65 90.0000th: 72 90.0th: 75 95.0000th: 78 95.0th: 79 *99.0000th: 93 *99.0th: 84 99.5000th: 108 99.5th: 85 99.9000th: 6792 99.9th: 90 min=0, max=17681 min=0, max=117 Latency percentiles (usec) Latency percentiles (usec) 50.0000th: 46 50.0th: 45 75.0000th: 62 75.0th: 64 90.0000th: 73 90.0th: 75 95.0000th: 79 95.0th: 79 *99.0000th: 113 *99.0th: 82 99.5000th: 2724 99.5th: 83 99.9000th: 6184 99.9th: 93 min=0, max=9887 min=0, max=111 Performance counter stats for 'system wide' (5 runs): context-switches 43,373 ( +- 0.40% ) 44,597 ( +- 0.55% ) cpu-migrations 1,211 ( +- 5.04% ) 220 ( +- 6.23% ) page-faults 15,983 ( +- 5.21% ) 15,360 ( +- 3.38% ) Waiman Long suggested using static_keys. Fixes: 247f2f6f ("sched/core: Don't schedule threads on pre-empted vCPUs") Cc: stable@vger.kernel.org # v4.18+ Reported-by: Parth Shah <parth@linux.ibm.com> Reported-by: Ihor Pasichnyk <Ihor.Pasichnyk@ibm.com> Tested-by: Juri Lelli <juri.lelli@redhat.com> Acked-by: Waiman Long <longman@redhat.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Phil Auld <pauld@redhat.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Tested-by: Parth Shah <parth@linux.ibm.com> [mpe: Move the key and setting of the key to pseries/setup.c] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191213035036.6913-1-mpe@ellerman.id.auSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Yazen Ghannam authored
commit 966af209 upstream. Each logical CPU in Scalable MCA systems controls a unique set of MCA banks in the system. These banks are not shared between CPUs. The bank types and ordering will be the same across CPUs on currently available systems. However, some CPUs may see a bank as Reserved/Read-as-Zero (RAZ) while other CPUs do not. In this case, the bank seen as Reserved on one CPU is assumed to be the same type as the bank seen as a known type on another CPU. In general, this occurs when the hardware represented by the MCA bank is disabled, e.g. disabled memory controllers on certain models, etc. The MCA bank is disabled in the hardware, so there is no possibility of getting an MCA/MCE from it even if it is assumed to have a known type. For example: Full system: Bank | Type seen on CPU0 | Type seen on CPU1 ------------------------------------------------ 0 | LS | LS 1 | UMC | UMC 2 | CS | CS System with hardware disabled: Bank | Type seen on CPU0 | Type seen on CPU1 ------------------------------------------------ 0 | LS | LS 1 | UMC | RAZ 2 | CS | CS For this reason, there is a single, global struct smca_banks[] that is initialized at boot time. This array is initialized on each CPU as it comes online. However, the array will not be updated if an entry already exists. This works as expected when the first CPU (usually CPU0) has all possible MCA banks enabled. But if the first CPU has a subset, then it will save a "Reserved" type in smca_banks[]. Successive CPUs will then not be able to update smca_banks[] even if they encounter a known bank type. This may result in unexpected behavior. Depending on the system configuration, a user may observe issues enumerating the MCA thresholding sysfs interface. The issues may be as trivial as sysfs entries not being available, or as severe as system hangs. For example: Bank | Type seen on CPU0 | Type seen on CPU1 ------------------------------------------------ 0 | LS | LS 1 | RAZ | UMC 2 | CS | CS Extend the smca_banks[] entry check to return if the entry is a non-reserved type. Otherwise, continue so that CPUs that encounter a known bank type can update smca_banks[]. Fixes: 68627a69 ("x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191121141508.141273-1-Yazen.Ghannam@amd.comSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Konstantin Khlebnikov authored
commit 246ff09f upstream. ... because interrupts are disabled that early and sending IPIs can deadlock: BUG: sleeping function called from invalid context at kernel/sched/completion.c:99 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1 no locks held by swapper/1/0. irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffff8106dda9>] copy_process+0x8b9/0x1ca0 softirqs last enabled at (0): [<ffffffff8106dda9>] copy_process+0x8b9/0x1ca0 softirqs last disabled at (0): [<0000000000000000>] 0x0 Preemption disabled at: [<ffffffff8104703b>] start_secondary+0x3b/0x190 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.5.0-rc2+ #1 Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018 Call Trace: dump_stack ___might_sleep.cold.92 wait_for_completion ? generic_exec_single rdmsr_safe_on_cpu ? wrmsr_on_cpus mce_amd_feature_init mcheck_cpu_init identify_cpu identify_secondary_cpu smp_store_cpu_info start_secondary secondary_startup_64 The function smca_configure() is called only on the current CPU anyway, therefore replace rdmsr_safe_on_cpu() with atomic rdmsr_safe() and avoid the IPI. [ bp: Update commit message. ] Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/157252708836.3876.4604398213417262402.stgit@buzzSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Will Deacon authored
commit 1ce74e96 upstream. Commit 4b927b94 ("KVM: arm/arm64: vgic: Introduce find_reg_by_id()") introduced 'find_reg_by_id()', which looks up a system register only if the 'id' index parameter identifies a valid system register. As part of the patch, existing callers of 'find_reg()' were ported over to the new interface, but this breaks 'index_to_sys_reg_desc()' in the case that the initial lookup in the vCPU target table fails because we will then call into 'find_reg()' for the system register table with an uninitialised 'param' as the key to the lookup. GCC 10 is bright enough to spot this (amongst a tonne of false positives, but hey!): | arch/arm64/kvm/sys_regs.c: In function ‘index_to_sys_reg_desc.part.0.isra’: | arch/arm64/kvm/sys_regs.c:983:33: warning: ‘params.Op2’ may be used uninitialized in this function [-Wmaybe-uninitialized] | 983 | (u32)(x)->CRn, (u32)(x)->CRm, (u32)(x)->Op2); | [...] Revert the hunk of 4b927b94 which breaks 'index_to_sys_reg_desc()' so that the old behaviour of checking the index upfront is restored. Fixes: 4b927b94 ("KVM: arm/arm64: vgic: Introduce find_reg_by_id()") Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20191212094049.12437-1-will@kernel.orgSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Carpenter authored
commit 7f420d64 upstream. We need to unlock the xattr before returning on this error path. Cc: stable@kernel.org # 4.13 Fixes: c03b45b8 ("ext4, project: expand inode extra size if possible") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20191213185010.6k7yl2tck3wlsdkt@kili.mountainSigned-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jan Kara authored
commit 109ba779 upstream. ext4_check_dir_entry() currently does not catch a case when a directory entry ends so close to the block end that the header of the next directory entry would not fit in the remaining space. This can lead to directory iteration code trying to access address beyond end of current buffer head leading to oops. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191202170213.4761-3-jack@suse.czSigned-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jan Kara authored
commit 64d4ce89 upstream. Function ext4_empty_dir() doesn't correctly handle directories with holes and crashes on bh->b_data dereference when bh is NULL. Reorganize the loop to use 'offset' variable all the times instead of comparing pointers to current direntry with bh->b_data pointer. Also add more strict checking of '.' and '..' directory entries to avoid entering loop in possibly invalid state on corrupted filesystems. References: CVE-2019-19037 CC: stable@vger.kernel.org Fixes: 4e19d6b6 ("ext4: allow directory holes") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191202170213.4761-2-jack@suse.czSigned-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ian Abbott authored
commit ab42b48f upstream. The "auto-attach" handler function `gsc_hpdi_auto_attach()` calls `dma_alloc_coherent()` in a loop to allocate some DMA data buffers, and also calls it to allocate a buffer for a DMA descriptor chain. However, it does not check the return value of any of these calls. Change `gsc_hpdi_auto_attach()` to return `-ENOMEM` if any of these `dma_alloc_coherent()` calls fail. This will result in the comedi core calling the "detach" handler `gsc_hpdi_detach()` as part of the clean-up, which will call `gsc_hpdi_free_dma()` to free any allocated DMA coherent memory buffers. Cc: <stable@vger.kernel.org> #4.6+ Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Link: https://lore.kernel.org/r/20191216110823.216237-1-abbotti@mev.co.ukSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Hans de Goede authored
commit 133b2ace upstream. At least on the HP Envy x360 15-cp0xxx model the WMI interface for HPWMI_FEATURE2_QUERY requires an outsize of at least 128 bytes, otherwise it fails with an error code 5 (HPWMI_RET_INVALID_PARAMETERS): Dec 06 00:59:38 kernel: hp_wmi: query 0xd returned error 0x5 We do not care about the contents of the buffer, we just want to know if the HPWMI_FEATURE2_QUERY command is supported. This commits bumps the buffer size, fixing the error. Fixes: 8a1513b4 ("hp-wmi: limit hotkey enable") Cc: stable@vger.kernel.org BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1520703Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexander Shishkin authored
commit 88385866 upstream. This adds support for Intel Trace Hub in Elkhart Lake. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20191217115527.74383-3-alexander.shishkin@linux.intel.comSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexander Shishkin authored
commit e4de2a5d upstream. This adds Intel(R) Trace Hub PCI ID for Comet Lake PCH-V. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20191217115527.74383-2-alexander.shishkin@linux.intel.comSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Erkka Talvitie authored
commit 64cc3f12 upstream. When disconnecting a USB hub that has some child device(s) connected to it (such as a USB mouse), then the stack tries to clear halt and reset device(s) which are _already_ physically disconnected. The issue has been reproduced with: CPU: IMX6D5EYM10AD or MCIMX6D5EYM10AE. SW: U-Boot 2019.07 and kernel 4.19.40. CPU: HP Proliant Microserver Gen8. SW: Linux version 4.2.3-300.fc23.x86_64 In this situation there will be error bit for MMF active yet the CERR equals EHCI_TUNE_CERR + halt. Existing implementation interprets this as a stall [1] (chapter 8.4.5). The possible conditions when the MMF will be active + halt can be found from [2] (Table 4-13). Fix for the issue is to check whether MMF is active and PID Code is IN before checking for the stall. If these conditions are true then it is not a stall. What happens after the fix is that when disconnecting a hub with attached device(s) the situation is not interpret as a stall. [1] [https://www.usb.org/document-library/usb-20-specification, usb_20.pdf] [2] [https://www.intel.com/content/dam/www/public/us/en/documents/ technical-specifications/ehci-specification-for-usb.pdf] Signed-off-by: Erkka Talvitie <erkka.talvitie@vincit.fi> Reviewed-by: Alan Stern <stern@rowland.harvard.edu> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/ef70941d5f349767f19c0ed26b0dd9eed8ad81bb.1576050523.git.erkka.talvitie@vincit.fiSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Rafael J. Wysocki authored
commit 85572c2c upstream. The scheduler code calling cpufreq_update_util() may run during CPU offline on the target CPU after the IRQ work lists have been flushed for it, so the target CPU should be prevented from running code that may queue up an IRQ work item on it at that point. Unfortunately, that may not be the case if dvfs_possible_from_any_cpu is set for at least one cpufreq policy in the system, because that allows the CPU going offline to run the utilization update callback of the cpufreq governor on behalf of another (online) CPU in some cases. If that happens, the cpufreq governor callback may queue up an IRQ work on the CPU running it, which is going offline, and the IRQ work may not be flushed after that point. Moreover, that IRQ work cannot be flushed until the "offlining" CPU goes back online, so if any other CPU calls irq_work_sync() to wait for the completion of that IRQ work, it will have to wait until the "offlining" CPU is back online and that may not happen forever. In particular, a system-wide deadlock may occur during CPU online as a result of that. The failing scenario is as follows. CPU0 is the boot CPU, so it creates a cpufreq policy and becomes the "leader" of it (policy->cpu). It cannot go offline, because it is the boot CPU. Next, other CPUs join the cpufreq policy as they go online and they leave it when they go offline. The last CPU to go offline, say CPU3, may queue up an IRQ work while running the governor callback on behalf of CPU0 after leaving the cpufreq policy because of the dvfs_possible_from_any_cpu effect described above. Then, CPU0 is the only online CPU in the system and the stale IRQ work is still queued on CPU3. When, say, CPU1 goes back online, it will run irq_work_sync() to wait for that IRQ work to complete and so it will wait for CPU3 to go back online (which may never happen even in principle), but (worse yet) CPU0 is waiting for CPU1 at that point too and a system-wide deadlock occurs. To address this problem notice that CPUs which cannot run cpufreq utilization update code for themselves (for example, because they have left the cpufreq policies that they belonged to), should also be prevented from running that code on behalf of the other CPUs that belong to a cpufreq policy with dvfs_possible_from_any_cpu set and so in that case the cpufreq_update_util_data pointer of the CPU running the code must not be NULL as well as for the CPU which is the target of the cpufreq utilization update in progress. Accordingly, change cpufreq_this_cpu_can_update() into a regular function in kernel/sched/cpufreq.c (instead of a static inline in a header file) and make it check the cpufreq_update_util_data pointer of the local CPU if dvfs_possible_from_any_cpu is set for the target cpufreq policy. Also update the schedutil governor to do the cpufreq_this_cpu_can_update() check in the non-fast-switch case too to avoid the stale IRQ work issues. Fixes: 99d14d0e ("cpufreq: Process remote callbacks from any CPU if the platform permits") Link: https://lore.kernel.org/linux-pm/20191121093557.bycvdo4xyinbc5cb@vireshk-i7/Reported-by: Anson Huang <anson.huang@nxp.com> Tested-by: Anson Huang <anson.huang@nxp.com> Cc: 4.14+ <stable@vger.kernel.org> # 4.14+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Peng Fan <peng.fan@nxp.com> (i.MX8QXP-MEK) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Suwan Kim authored
commit aabb5b83 upstream. If a transaction error happens in vhci_recv_ret_submit(), event handler closes connection and changes port status to kick hub_event. Then hub tries to flush the endpoint URBs, but that causes infinite loop between usb_hub_flush_endpoint() and vhci_urb_dequeue() because "vhci_priv" in vhci_urb_dequeue() was already released by vhci_recv_ret_submit() before a transmission error occurred. Thus, vhci_urb_dequeue() terminates early and usb_hub_flush_endpoint() continuously calls vhci_urb_dequeue(). The root cause of this issue is that vhci_recv_ret_submit() terminates early without giving back URB when transaction error occurs in vhci_recv_ret_submit(). That causes the error URB to still be linked at endpoint list without “vhci_priv". So, in the case of transaction error in vhci_recv_ret_submit(), unlink URB from the endpoint, insert proper error code in urb->status and give back URB. Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Suwan Kim <suwan.kim027@gmail.com> Cc: stable <stable@vger.kernel.org> Acked-by: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20191213023055.19933-3-suwan.kim027@gmail.comSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Suwan Kim authored
commit d986294e upstream. When vhci uses SG and receives data whose size is smaller than SG buffer size, it tries to receive more data even if it acutally receives all the data from the server. If then, it erroneously adds error event and triggers connection shutdown. vhci-hcd should check if it received all the data even if there are more SG entries left. So, check if it receivces all the data from the server in for_each_sg() loop. Fixes: ea44d190 ("usbip: Implement SG support to vhci-hcd and stub driver") Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Suwan Kim <suwan.kim027@gmail.com> Acked-by: Shuah Khan <skhan@linuxfoundation.org> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20191213023055.19933-2-suwan.kim027@gmail.comSigned-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dan Carpenter authored
[ Upstream commit b6293c82 ] Callers of alloc_test_extent_buffer have not correctly interpreted the return value as error pointer, as alloc_test_extent_buffer should behave as alloc_extent_buffer. The self-tests were unaffected but btrfs_find_create_tree_block could call both functions and that would cause problems up in the call chain. Fixes: faa2dbf0 ("Btrfs: add sanity tests for new qgroup accounting code") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
-