- 03 Jul, 2019 40 commits
-
-
Xin Long authored
[ Upstream commit c492d4c7 ] This patch is to fix a dst defcnt leak, which can be reproduced by doing: # ip net a c; ip net a s; modprobe tipc # ip net e s ip l a n eth1 type veth peer n eth1 netns c # ip net e c ip l s lo up; ip net e c ip l s eth1 up # ip net e s ip l s lo up; ip net e s ip l s eth1 up # ip net e c ip a a 1.1.1.2/8 dev eth1 # ip net e s ip a a 1.1.1.1/8 dev eth1 # ip net e c tipc b e m udp n u1 localip 1.1.1.2 # ip net e s tipc b e m udp n u1 localip 1.1.1.1 # ip net d c; ip net d s; rmmod tipc and it will get stuck and keep logging the error: unregister_netdevice: waiting for lo to become free. Usage count = 1 The cause is that a dst is held by the udp sock's sk_rx_dst set on udp rx path with udp_early_demux == 1, and this dst (eventually holding lo dev) can't be released as bearer's removal in tipc pernet .exit happens after lo dev's removal, default_device pernet .exit. "There are two distinct types of pernet_operations recognized: subsys and device. At creation all subsys init functions are called before device init functions, and at destruction all device exit functions are called before subsys exit function." So by calling register_pernet_device instead to register tipc_net_ops, the pernet .exit() will be invoked earlier than loopback dev's removal when a netns is being destroyed, as fou/gue does. Note that vxlan and geneve udp tunnels don't have this issue, as the udp sock is released in their device ndo_stop(). This fix is also necessary for tipc dst_cache, which will hold dsts on tx path and I will introduce in my next patch. Reported-by:
Li Shuang <shuali@redhat.com> Signed-off-by:
Xin Long <lucien.xin@gmail.com> Acked-by:
Jon Maloy <jon.maloy@ericsson.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
YueHaibing authored
[ Upstream commit ee429742 ] We should rather have vlan_tci filled all the way down to the transmitting netdevice and let it do the hw/sw vlan implementation. Suggested-by:
Jiri Pirko <jiri@resnulli.us> Signed-off-by:
YueHaibing <yuehaibing@huawei.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Xin Long authored
[ Upstream commit 25bff6d5 ] Now in sctp_endpoint_init(), it holds the sk then creates auth shkey. But when the creation fails, it doesn't release the sk, which causes a sk defcnf leak, Here to fix it by only holding the sk when auth shkey is created successfully. Fixes: a29a5bd4 ("[SCTP]: Implement SCTP-AUTH initializations.") Reported-by: syzbot+afabda3890cc2f765041@syzkaller.appspotmail.com Reported-by: syzbot+276ca1c77a19977c0130@syzkaller.appspotmail.com Signed-off-by:
Xin Long <lucien.xin@gmail.com> Acked-by:
Neil Horman <nhorman@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Roland Hii authored
[ Upstream commit d0bb82fd ] When transmitting certain PTP frames, e.g. SYNC and DELAY_REQ, the PTP daemon, e.g. ptp4l, is polling the driver for the frame transmit hardware timestamp. The polling will most likely timeout if the tx coalesce is enabled due to the Interrupt-on-Completion (IC) bit is not set in tx descriptor for those frames. This patch will ignore the tx coalesce parameter and set the IC bit when transmitting PTP frames which need to report out the frame transmit hardware timestamp to user space. Fixes: f748be53 ("net: stmmac: Rework coalesce timer and fix multi-queue races") Signed-off-by:
Roland Hii <roland.king.guan.hii@intel.com> Signed-off-by:
Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by:
Voon Weifeng <weifeng.voon@intel.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Roland Hii authored
[ Upstream commit a1e5388b ] When ADDSUB bit is set, the system time seconds field is calculated as the complement of the seconds part of the update value. For example, if 3.000000001 seconds need to be subtracted from the system time, this field is calculated as 2^32 - 3 = 4294967296 - 3 = 0x100000000 - 3 = 0xFFFFFFFD Previously, the 0x100000000 is mistakenly written as 100000000. This is further simplified from sec = (0x100000000ULL - sec); to sec = -sec; Fixes: ba1ffd74 ("stmmac: fix PTP support for GMAC4") Signed-off-by:
Roland Hii <roland.king.guan.hii@intel.com> Signed-off-by:
Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by:
Voon Weifeng <weifeng.voon@intel.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
JingYi Hou authored
[ Upstream commit d0bae4a0 ] In sock_getsockopt(), 'optlen' is fetched the first time from userspace. 'len < 0' is then checked. Then in condition 'SO_MEMINFO', 'optlen' is fetched the second time from userspace. If change it between two fetches may cause security problems or unexpected behaivor, and there is no reason to fetch it a second time. To fix this, we need to remove the second fetch. Signed-off-by:
JingYi Hou <houjingyi647@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit 55655e3d ] syzbot found we can leak memory in packet_set_ring(), if user application provides buggy parameters. Fixes: 7f953ab2 ("af_packet: TX_RING support for TPACKET_V3") Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com> Reported-by:
syzbot <syzkaller@googlegroups.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stephen Suryaputra authored
[ Upstream commit 38c73529 ] In commit 19e4e768 ("ipv4: Fix raw socket lookup for local traffic"), the dif argument to __raw_v4_lookup() is coming from the returned value of inet_iif() but the change was done only for the first lookup. Subsequent lookups in the while loop still use skb->dev->ifIndex. Fixes: 19e4e768 ("ipv4: Fix raw socket lookup for local traffic") Signed-off-by:
Stephen Suryaputra <ssuryaextr@gmail.com> Reviewed-by:
David Ahern <dsahern@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
YueHaibing authored
[ Upstream commit 30d8177e ] We build vlan on top of bonding interface, which vlan offload is off, bond mode is 802.3ad (LACP) and xmit_hash_policy is BOND_XMIT_POLICY_ENCAP34. Because vlan tx offload is off, vlan tci is cleared and skb push the vlan header in validate_xmit_vlan() while sending from vlan devices. Then in bond_xmit_hash, __skb_flow_dissect() fails to get information from protocol headers encapsulated within vlan, because 'nhoff' is points to IP header, so bond hashing is based on layer 2 info, which fails to distribute packets across slaves. This patch always enable bonding's vlan tx offload, pass the vlan packets to the slave devices with vlan tci, let them to handle vlan implementation. Fixes: 278339a4 ("bonding: propogate vlan_features to bonding master") Suggested-by:
Jiri Pirko <jiri@resnulli.us> Signed-off-by:
YueHaibing <yuehaibing@huawei.com> Acked-by:
Jiri Pirko <jiri@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Neil Horman authored
[ Upstream commit 89ed5b51 ] When an application is run that: a) Sets its scheduler to be SCHED_FIFO and b) Opens a memory mapped AF_PACKET socket, and sends frames with the MSG_DONTWAIT flag cleared, its possible for the application to hang forever in the kernel. This occurs because when waiting, the code in tpacket_snd calls schedule, which under normal circumstances allows other tasks to run, including ksoftirqd, which in some cases is responsible for freeing the transmitted skb (which in AF_PACKET calls a destructor that flips the status bit of the transmitted frame back to available, allowing the transmitting task to complete). However, when the calling application is SCHED_FIFO, its priority is such that the schedule call immediately places the task back on the cpu, preventing ksoftirqd from freeing the skb, which in turn prevents the transmitting task from detecting that the transmission is complete. We can fix this by converting the schedule call to a completion mechanism. By using a completion queue, we force the calling task, when it detects there are no more frames to send, to schedule itself off the cpu until such time as the last transmitted skb is freed, allowing forward progress to be made. Tested by myself and the reporter, with good results Change Notes: V1->V2: Enhance the sleep logic to support being interruptible and allowing for honoring to SK_SNDTIMEO (Willem de Bruijn) V2->V3: Rearrage the point at which we wait for the completion queue, to avoid needing to check for ph/skb being null at the end of the loop. Also move the complete call to the skb destructor to avoid needing to modify __packet_set_status. Also gate calling complete on packet_read_pending returning zero to avoid multiple calls to complete. (Willem de Bruijn) Move timeo computation within loop, to re-fetch the socket timeout since we also use the timeo variable to record the return code from the wait_for_complete call (Neil Horman) V3->V4: Willem has requested that the control flow be restored to the previous state. Doing so lets us eliminate the need for the po->wait_on_complete flag variable, and lets us get rid of the packet_next_frame function, but introduces another complexity. Specifically, but using the packet pending count, we can, if an applications calls sendmsg multiple times with MSG_DONTWAIT set, each set of transmitted frames, when complete, will cause tpacket_destruct_skb to issue a complete call, for which there will never be a wait_on_completion call. This imbalance will lead to any future call to wait_for_completion here to return early, when the frames they sent may not have completed. To correct this, we need to re-init the completion queue on every call to tpacket_snd before we enter the loop so as to ensure we wait properly for the frames we send in this iteration. Change the timeout and interrupted gotos to out_put rather than out_status so that we don't try to free a non-existant skb Clean up some extra newlines (Willem de Bruijn) Reviewed-by:
Willem de Bruijn <willemb@google.com> Signed-off-by:
Neil Horman <nhorman@tuxdriver.com> Reported-by:
Matteo Croce <mcroce@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Wang Xin authored
commit 9a9e295e upstream. Within at24_loop_until_timeout the timestamp used for timeout checking is recorded after the I2C transfer and sleep_range(). Under high CPU load either the execution time for I2C transfer or sleep_range() could actually be larger than the timeout value. Worst case the I2C transfer is only tried once because the loop will exit due to the timeout although the EEPROM is now ready. To fix this issue the timestamp is recorded at the beginning of each iteration. That is, before I2C transfer and sleep. Then the timeout is actually checked against the timestamp of the previous iteration. This makes sure that even if the timeout is reached, there is still one more chance to try the I2C transfer in case the EEPROM is ready. Example: If you have a system which combines high CPU load with repeated EEPROM writes you will run into the following scenario. - System makes a successful regmap_bulk_write() to EEPROM. - System wants to perform another write to EEPROM but EEPROM is still busy with the last write. - Because of high CPU load the usleep_range() will sleep more than 25 ms (at24_write_timeout). - Within the over-long sleeping the EEPROM finished the previous write operation and is ready again. - at24_loop_until_timeout() will detect timeout and won't try to write. Signed-off-by:
Wang Xin <xin.wang7@cn.bosch.com> Signed-off-by:
Mark Jonas <mark.jonas@de.bosch.com> Signed-off-by:
Bartosz Golaszewski <brgl@bgdev.pl> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Paul Burton authored
commit 6d4d367d upstream. The MIPS GIC contains a block of registers used to map local interrupts to a particular CPU interrupt pin. Since these registers are found at a consecutive range of addresses we access them using an index, via the (read|write)_gic_v[lo]_map accessor functions. We currently use values from enum mips_gic_local_interrupt as those indices. Unfortunately whilst enum mips_gic_local_interrupt provides the correct offsets for bits in the pending & mask registers, the ordering of the map registers is subtly different... Compared with the ordering of pending & mask bits, the map registers move the FDC from the end of the list to index 3 after the timer interrupt. As a result the performance counter & software interrupts are therefore at indices 4-6 rather than indices 3-5. Notably this causes problems with performance counter interrupts being incorrectly mapped on some systems, and presumably will also cause problems for FDC interrupts. Introduce a function to map from enum mips_gic_local_interrupt to the index of the corresponding map register, and use it to ensure we access the map registers for the correct interrupts. Signed-off-by:
Paul Burton <paul.burton@mips.com> Fixes: a0dc5cb5 ("irqchip: mips-gic: Simplify gic_local_irq_domain_map()") Fixes: da61fcf9 ("irqchip: mips-gic: Use irq_cpu_online to (un)mask all-VP(E) IRQs") Reported-and-tested-by:
Archer Yan <ayan@wavecomp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jason Cooper <jason@lakedaemon.net> Cc: stable@vger.kernel.org # v4.14+ Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Trond Myklebust authored
commit 9dc6edcf upstream. Move the initialisation back into xprt.c. Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Yihao Wu <wuyihao@linux.alibaba.com> Cc: Caspar Zhang <caspar@linux.alibaba.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Geert Uytterhoeven authored
commit 1bf72720 upstream. Currently, if the user specifies an unsupported mitigation strategy on the kernel command line, it will be ignored silently. The code will fall back to the default strategy, possibly leaving the system more vulnerable than expected. This may happen due to e.g. a simple typo, or, for a stable kernel release, because not all mitigation strategies have been backported. Inform the user by printing a message. Fixes: 98af8452 ("cpu/speculation: Add 'mitigations=' cmdline option") Signed-off-by:
Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Acked-by:
Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190516070935.22546-1-geert@linux-m68k.orgSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Trond Myklebust authored
commit 68f46159 upstream. Fix a typo where we're confusing the default TCP retrans value (NFS_DEF_TCP_RETRANS) for the default TCP timeout value. Fixes: 15d03055 ("pNFS/flexfiles: Set reasonable default ...") Cc: stable@vger.kernel.org # 4.8+ Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by:
Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sean Christopherson authored
commit b6b80c78 upstream. SVM's Nested Page Tables (NPT) reuses x86 paging for the host-controlled page walk. For 32-bit KVM, this means PAE paging is used even when TDP is enabled, i.e. the PAE root array needs to be allocated. Fixes: ee6268ba ("KVM: x86: Skip pae_root shadow allocation if tdp enabled") Cc: stable@vger.kernel.org Reported-by:
Jiri Palecek <jpalecek@web.de> Signed-off-by:
Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Cc: Jiri Palecek <jpalecek@web.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Reinette Chatre authored
commit 32f010de upstream. While the DOC at the beginning of lib/bitmap.c explicitly states that "The number of valid bits in a given bitmap does _not_ need to be an exact multiple of BITS_PER_LONG.", some of the bitmap operations do indeed access BITS_PER_LONG portions of the provided bitmap no matter the size of the provided bitmap. For example, if find_first_bit() is provided with an 8 bit bitmap the operation will access BITS_PER_LONG bits from the provided bitmap. While the operation ensures that these extra bits do not affect the result, the memory is still accessed. The capacity bitmasks (CBMs) are typically stored in u32 since they can never exceed 32 bits. A few instances exist where a bitmap_* operation is performed on a CBM by simply pointing the bitmap operation to the stored u32 value. The consequence of this pattern is that some bitmap_* operations will access out-of-bounds memory when interacting with the provided CBM. This same issue has previously been addressed with commit 49e00eee ("x86/intel_rdt: Fix out-of-bounds memory access in CBM tests") but at that time not all instances of the issue were fixed. Fix this by using an unsigned long to store the capacity bitmask data that is passed to bitmap functions. Fixes: e6519011 ("x86/intel_rdt: Introduce "bit_usage" to display cache allocations details") Fixes: f4e80d67 ("x86/intel_rdt: Resctrl files reflect pseudo-locked information") Fixes: 95f0b77e ("x86/intel_rdt: Initialize new resource group with sane defaults") Signed-off-by:
Reinette Chatre <reinette.chatre@intel.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: stable <stable@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/58c9b6081fd9bf599af0dfc01a6fdd335768efef.1560975645.git.reinette.chatre@intel.comSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thomas Gleixner authored
commit 5423f5ce upstream. A recent change moved the microcode loader hotplug callback into the early startup phase which is running with interrupts disabled. It missed that the callbacks invoke sysfs functions which might sleep causing nice 'might sleep' splats with proper debugging enabled. Split the callbacks and only load the microcode in the early startup phase and move the sysfs handling back into the later threaded and preemptible bringup phase where it was before. Fixes: 78f4e932 ("x86/microcode, cpuhotplug: Add a microcode loader CPU hotplug callback") Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: stable@vger.kernel.org Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1906182228350.1766@nanos.tec.linutronix.deSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alejandro Jimenez authored
commit c1f7fec1 upstream. The bits set in x86_spec_ctrl_mask are used to calculate the guest's value of SPEC_CTRL that is written to the MSR before VMENTRY, and control which mitigations the guest can enable. In the case of SSBD, unless the host has enabled SSBD always on mode (by passing "spec_store_bypass_disable=on" in the kernel parameters), the SSBD bit is not set in the mask and the guest can not properly enable the SSBD always on mitigation mode. This has been confirmed by running the SSBD PoC on a guest using the SSBD always on mitigation mode (booted with kernel parameter "spec_store_bypass_disable=on"), and verifying that the guest is vulnerable unless the host is also using SSBD always on mode. In addition, the guest OS incorrectly reports the SSB vulnerability as mitigated. Always set the SSBD bit in x86_spec_ctrl_mask when the host CPU supports it, allowing the guest to use SSBD whether or not the host has chosen to enable the mitigation in any of its modes. Fixes: be6fcb54 ("x86/bugs: Rework spec_ctrl base and mask logic") Signed-off-by:
Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Liam Merwick <liam.merwick@oracle.com> Reviewed-by:
Mark Kanda <mark.kanda@oracle.com> Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com> Cc: bp@alien8.de Cc: rkrcmar@redhat.com Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1560187210-11054-1-git-send-email-alejandro.j.jimenez@oracle.comSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jan Kara authored
commit 240b4cc8 upstream. Once we unlock adapter->hw_lock in pvscsi_queue_lck() nothing prevents just queued scsi_cmnd from completing and freeing the request. Thus cmd->cmnd[0] dereference can dereference already freed request leading to kernel crashes or other issues (which one of our customers observed). Store cmd->cmnd[0] in a local variable before unlocking adapter->hw_lock to fix the issue. CC: <stable@vger.kernel.org> Signed-off-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Ewan D. Milne <emilne@redhat.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
zhangyi (F) authored
commit 211ad4b7 upstream. Currently, although we submit super bios in order (and super.nr_entries is incremented by each logged entry), submit_bio() is async so each super sector may not be written to log device in order and then the final nr_entries may be smaller than it should be. This problem can be reproduced by the xfstests generic/455 with ext4: QA output created by 455 -Silence is golden +mark 'end' does not exist Fix this by serializing submission of super sectors to make sure each is written to the log disk in order. Fixes: 0e9cebe7 ("dm: add log writes target") Cc: stable@vger.kernel.org Signed-off-by:
zhangyi (F) <yi.zhang@huawei.com> Suggested-by:
Josef Bacik <josef@toxicpanda.com> Reviewed-by:
Josef Bacik <josef@toxicpanda.com> Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Colin Ian King authored
commit 7298e3b0 upstream. Currently the calcuation of end_pfn can round up the pfn number to more than the actual maximum number of pfns, causing an Oops. Fix this by ensuring end_pfn is never more than max_pfn. This can be easily triggered when on systems where the end_pfn gets rounded up to more than max_pfn using the idle-page stress-ng stress test: sudo stress-ng --idle-page 0 BUG: unable to handle kernel paging request at 00000000000020d8 #PF error: [normal kernel read fault] PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 11039 Comm: stress-ng-idle- Not tainted 5.0.0-5-generic #6-Ubuntu Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:page_idle_get_page+0xc8/0x1a0 Code: 0f b1 0a 75 7d 48 8b 03 48 89 c2 48 c1 e8 33 83 e0 07 48 c1 ea 36 48 8d 0c 40 4c 8d 24 88 49 c1 e4 07 4c 03 24 d5 00 89 c3 be <49> 8b 44 24 58 48 8d b8 80 a1 02 00 e8 07 d5 77 00 48 8b 53 08 48 RSP: 0018:ffffafd7c672fde8 EFLAGS: 00010202 RAX: 0000000000000005 RBX: ffffe36341fff700 RCX: 000000000000000f RDX: 0000000000000284 RSI: 0000000000000275 RDI: 0000000001fff700 RBP: ffffafd7c672fe00 R08: ffffa0bc34056410 R09: 0000000000000276 R10: ffffa0bc754e9b40 R11: ffffa0bc330f6400 R12: 0000000000002080 R13: ffffe36341fff700 R14: 0000000000080000 R15: ffffa0bc330f6400 FS: 00007f0ec1ea5740(0000) GS:ffffa0bc7db00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000020d8 CR3: 0000000077d68000 CR4: 00000000000006e0 Call Trace: page_idle_bitmap_write+0x8c/0x140 sysfs_kf_bin_write+0x5c/0x70 kernfs_fop_write+0x12e/0x1b0 __vfs_write+0x1b/0x40 vfs_write+0xab/0x1b0 ksys_write+0x55/0xc0 __x64_sys_write+0x1a/0x20 do_syscall_64+0x5a/0x110 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Link: http://lkml.kernel.org/r/20190618124352.28307-1-colin.king@canonical.com Fixes: 33c3fc71 ("mm: introduce idle page tracking") Signed-off-by:
Colin Ian King <colin.king@canonical.com> Reviewed-by:
Andrew Morton <akpm@linux-foundation.org> Acked-by:
Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Naoya Horiguchi authored
commit faf53def upstream. madvise(MADV_SOFT_OFFLINE) often returns -EBUSY when calling soft offline for hugepages with overcommitting enabled. That was caused by the suboptimal code in current soft-offline code. See the following part: ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL, MIGRATE_SYNC, MR_MEMORY_FAILURE); if (ret) { ... } else { /* * We set PG_hwpoison only when the migration source hugepage * was successfully dissolved, because otherwise hwpoisoned * hugepage remains on free hugepage list, then userspace will * find it as SIGBUS by allocation failure. That's not expected * in soft-offlining. */ ret = dissolve_free_huge_page(page); if (!ret) { if (set_hwpoison_free_buddy_page(page)) num_poisoned_pages_inc(); } } return ret; Here dissolve_free_huge_page() returns -EBUSY if the migration source page was freed into buddy in migrate_pages(), but even in that case we actually has a chance that set_hwpoison_free_buddy_page() succeeds. So that means current code gives up offlining too early now. dissolve_free_huge_page() checks that a given hugepage is suitable for dissolving, where we should return success for !PageHuge() case because the given hugepage is considered as already dissolved. This change also affects other callers of dissolve_free_huge_page(), which are cleaned up together. [n-horiguchi@ah.jp.nec.com: v3] Link: http://lkml.kernel.org/r/1560761476-4651-3-git-send-email-n-horiguchi@ah.jp.nec.comLink: http://lkml.kernel.org/r/1560154686-18497-3-git-send-email-n-horiguchi@ah.jp.nec.com Fixes: 6bc9b564 ("mm: fix race on soft-offlining") Signed-off-by:
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by:
Chen, Jerry T <jerry.t.chen@intel.com> Tested-by:
Chen, Jerry T <jerry.t.chen@intel.com> Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by:
Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Xishi Qiu <xishi.qiuxishi@alibaba-inc.com> Cc: "Chen, Jerry T" <jerry.t.chen@intel.com> Cc: "Zhuo, Qiuxu" <qiuxu.zhuo@intel.com> Cc: <stable@vger.kernel.org> [4.19+] Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Naoya Horiguchi authored
commit b38e5962 upstream. The pass/fail of soft offline should be judged by checking whether the raw error page was finally contained or not (i.e. the result of set_hwpoison_free_buddy_page()), but current code do not work like that. It might lead us to misjudge the test result when set_hwpoison_free_buddy_page() fails. Without this fix, there are cases where madvise(MADV_SOFT_OFFLINE) may not offline the original page and will not return an error. Link: http://lkml.kernel.org/r/1560154686-18497-2-git-send-email-n-horiguchi@ah.jp.nec.comSigned-off-by:
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Fixes: 6bc9b564 ("mm: fix race on soft-offlining") Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by:
Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Xishi Qiu <xishi.qiuxishi@alibaba-inc.com> Cc: "Chen, Jerry T" <jerry.t.chen@intel.com> Cc: "Zhuo, Qiuxu" <qiuxu.zhuo@intel.com> Cc: <stable@vger.kernel.org> [4.19+] Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dinh Nguyen authored
commit 74684cce upstream. The fixed dividers for the emac clocks should be 2 not 4. Cc: stable@vger.kernel.org Signed-off-by:
Dinh Nguyen <dinguyen@kernel.org> Signed-off-by:
Stephen Boyd <sboyd@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jann Horn authored
commit 867bfa4a upstream. load_flat_shared_library() is broken: It only calls load_flat_file() if prepare_binprm() returns zero, but prepare_binprm() returns the number of bytes read - so this only happens if the file is empty. Instead, call into load_flat_file() if the number of bytes read is non-negative. (Even if the number of bytes is zero - in that case, load_flat_file() will see nullbytes and return a nice -ENOEXEC.) In addition, remove the code related to bprm creds and stop using prepare_binprm() - this code is loading a library, not a main executable, and it only actually uses the members "buf", "file" and "filename" of the linux_binprm struct. Instead, call kernel_read() directly. Link: http://lkml.kernel.org/r/20190524201817.16509-1-jannh@google.com Fixes: 287980e4 ("remove lots of IS_ERR_VALUE abuses") Signed-off-by:
Jann Horn <jannh@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
zhong jiang authored
commit 29b190fa upstream. mpol_rebind_nodemask() is called for MPOL_BIND and MPOL_INTERLEAVE mempoclicies when the tasks's cpuset's mems_allowed changes. For policies created without MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES, it works by remapping the policy's allowed nodes (stored in v.nodes) using the previous value of mems_allowed (stored in w.cpuset_mems_allowed) as the domain of map and the new mems_allowed (passed as nodes) as the range of the map (see the comment of bitmap_remap() for details). The result of remapping is stored back as policy's nodemask in v.nodes, and the new value of mems_allowed should be stored in w.cpuset_mems_allowed to facilitate the next rebind, if it happens. However, 213980c0 ("mm, mempolicy: simplify rebinding mempolicies when updating cpusets") introduced a bug where the result of remapping is stored in w.cpuset_mems_allowed instead. Thus, a mempolicy's allowed nodes can evolve in an unexpected way after a series of rebinding due to cpuset mems_allowed changes, possibly binding to a wrong node or a smaller number of nodes which may e.g. overload them. This patch fixes the bug so rebinding again works as intended. [vbabka@suse.cz: new changlog] Link: http://lkml.kernel.org/r/ef6a69c6-c052-b067-8f2c-9d615c619bb9@suse.cz Link: http://lkml.kernel.org/r/1558768043-23184-1-git-send-email-zhongjiang@huawei.com Fixes: 213980c0 ("mm, mempolicy: simplify rebinding mempolicies when updating cpusets") Signed-off-by:
zhong jiang <zhongjiang@huawei.com> Reviewed-by:
Vlastimil Babka <vbabka@suse.cz> Cc: Oscar Salvador <osalvador@suse.de> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
John Ogness authored
commit cb8f381f upstream. 0a1eb2d4 ("fs/proc: Stop reporting eip and esp in /proc/PID/stat") stopped reporting eip/esp and fd7d5627 ("fs/proc: Report eip/esp in /prod/PID/stat for coredumping") reintroduced the feature to fix a regression with userspace core dump handlers (such as minicoredumper). Because PF_DUMPCORE is only set for the primary thread, this didn't fix the original problem for secondary threads. Allow reporting the eip/esp for all threads by checking for PF_EXITING as well. This is set for all the other threads when they are killed. coredump_wait() waits for all the tasks to become inactive before proceeding to invoke a core dumper. Link: http://lkml.kernel.org/r/87y32p7i7a.fsf@linutronix.de Link: http://lkml.kernel.org/r/20190522161614.628-1-jlu@pengutronix.de Fixes: fd7d5627 ("fs/proc: Report eip/esp in /prod/PID/stat for coredumping") Signed-off-by:
John Ogness <john.ogness@linutronix.de> Reported-by:
Jan Luebbe <jlu@pengutronix.de> Tested-by:
Jan Luebbe <jlu@pengutronix.de> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jack Pham authored
commit bd674224 upstream OUT endpoint requests may somtimes have this flag set when preparing to be submitted to HW indicating that there is an additional TRB chained to the request for alignment purposes. If that request is removed before the controller can execute the transfer (e.g. ep_dequeue/ep_disable), the request will not go through the dwc3_gadget_ep_cleanup_completed_request() handler and will not have its needs_extra_trb flag cleared when dwc3_gadget_giveback() is called. This same request could be later requeued for a new transfer that does not require an extra TRB and if it is successfully completed, the cleanup and TRB reclamation will incorrectly process the additional TRB which belongs to the next request, and incorrectly advances the TRB dequeue pointer, thereby messing up calculation of the next requeust's actual/remaining count when it completes. The right thing to do here is to ensure that the flag is cleared before it is given back to the function driver. A good place to do that is in dwc3_gadget_del_and_unmap_request(). Fixes: c6267a51 ("usb: dwc3: gadget: align transfers to wMaxPacketSize") Cc: Fei Yang <fei.yang@intel.com> Cc: Sam Protsenko <semen.protsenko@linaro.org> Cc: Felipe Balbi <balbi@kernel.org> Cc: linux-usb@vger.kernel.org Cc: stable@vger.kernel.org # 4.19.y Signed-off-by:
Jack Pham <jackp@codeaurora.org> Signed-off-by:
Felipe Balbi <felipe.balbi@linux.intel.com> (cherry picked from commit bd674224) Signed-off-by:
John Stultz <john.stultz@linaro.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Felipe Balbi authored
commit fec9095b upstream Now that we have a list of cancelled requests, we can skip over TRBs when END_TRANSFER command completes. Cc: Fei Yang <fei.yang@intel.com> Cc: Sam Protsenko <semen.protsenko@linaro.org> Cc: Felipe Balbi <balbi@kernel.org> Cc: linux-usb@vger.kernel.org Cc: stable@vger.kernel.org # 4.19.y Signed-off-by:
Felipe Balbi <felipe.balbi@linux.intel.com> (cherry picked from commit fec9095b) Signed-off-by:
John Stultz <john.stultz@linaro.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Felipe Balbi authored
commit d4f1afe5 upstream Whenever we have a request in flight, we can move it to the cancelled list and later simply iterate over that list and skip over any TRBs we find. Cc: Fei Yang <fei.yang@intel.com> Cc: Sam Protsenko <semen.protsenko@linaro.org> Cc: Felipe Balbi <balbi@kernel.org> Cc: linux-usb@vger.kernel.org Cc: stable@vger.kernel.org # 4.19.y Signed-off-by:
Felipe Balbi <felipe.balbi@linux.intel.com> (cherry picked from commit d4f1afe5) Signed-off-by:
John Stultz <john.stultz@linaro.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Felipe Balbi authored
commit d5443bbf upstream This list will host cancelled requests who still have TRBs being processed. Cc: Fei Yang <fei.yang@intel.com> Cc: Sam Protsenko <semen.protsenko@linaro.org> Cc: Felipe Balbi <balbi@kernel.org> Cc: linux-usb@vger.kernel.org Cc: stable@vger.kernel.org # 4.19.y Signed-off-by:
Felipe Balbi <felipe.balbi@linux.intel.com> (cherry picked from commit d5443bbf) Signed-off-by:
John Stultz <john.stultz@linaro.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Felipe Balbi authored
commit 7746a8df upstream Extract the logic for skipping over TRBs to its own function. This makes the code slightly more readable and makes it easier to move this call to its final resting place as a following patch. Cc: Fei Yang <fei.yang@intel.com> Cc: Sam Protsenko <semen.protsenko@linaro.org> Cc: Felipe Balbi <balbi@kernel.org> Cc: linux-usb@vger.kernel.org Cc: stable@vger.kernel.org # 4.19.y Signed-off-by:
Felipe Balbi <felipe.balbi@linux.intel.com> (cherry picked from commit 7746a8df) Signed-off-by:
John Stultz <john.stultz@linaro.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Felipe Balbi authored
commit c3acd590 upstream Now that we track how many TRBs a request uses, it's easier to skip over them in case of a call to usb_ep_dequeue(). Let's do so and simplify the code a bit. Cc: Fei Yang <fei.yang@intel.com> Cc: Sam Protsenko <semen.protsenko@linaro.org> Cc: Felipe Balbi <balbi@kernel.org> Cc: linux-usb@vger.kernel.org Cc: stable@vger.kernel.org # 4.19.y Signed-off-by:
Felipe Balbi <felipe.balbi@linux.intel.com> (cherry picked from commit c3acd590) Signed-off-by:
John Stultz <john.stultz@linaro.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Felipe Balbi authored
commit 09fe1f8d upstream This will help us remove the wait_event() from our ->dequeue(). Cc: Fei Yang <fei.yang@intel.com> Cc: Sam Protsenko <semen.protsenko@linaro.org> Cc: Felipe Balbi <balbi@kernel.org> Cc: linux-usb@vger.kernel.org Cc: stable@vger.kernel.org # 4.19.y Signed-off-by:
Felipe Balbi <felipe.balbi@linux.intel.com> (cherry picked from commit 09fe1f8d) Signed-off-by:
John Stultz <john.stultz@linaro.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Felipe Balbi authored
commit 1a22ec64 upstream Both flags are used for the same purpose in dwc3: appending an extra TRB at the end to deal with controller requirements. By combining both flags into one, we make it clear that the situation is the same and that they should be treated equally. Cc: Fei Yang <fei.yang@intel.com> Cc: Sam Protsenko <semen.protsenko@linaro.org> Cc: Felipe Balbi <balbi@kernel.org> Cc: linux-usb@vger.kernel.org Cc: stable@vger.kernel.org # 4.19.y Signed-off-by:
Felipe Balbi <felipe.balbi@linux.intel.com> (cherry picked from commit 1a22ec64) Signed-off-by:
John Stultz <john.stultz@linaro.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
John Stultz authored
This reverts commit 25ad17d6, as we will be cherry-picking a number of changes from upstream that allows us to later cherry-pick the same fix from upstream rather than using this modified backported version. Cc: Fei Yang <fei.yang@intel.com> Cc: Sam Protsenko <semen.protsenko@linaro.org> Cc: Felipe Balbi <balbi@kernel.org> Cc: linux-usb@vger.kernel.org Cc: stable@vger.kernel.org # 4.19.y Signed-off-by:
John Stultz <john.stultz@linaro.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Bjørn Mork authored
[ Upstream commit 904d88d7 ] The syzbot reported Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xca/0x13e lib/dump_stack.c:113 print_address_description+0x67/0x231 mm/kasan/report.c:188 __kasan_report.cold+0x1a/0x32 mm/kasan/report.c:317 kasan_report+0xe/0x20 mm/kasan/common.c:614 qmi_wwan_probe+0x342/0x360 drivers/net/usb/qmi_wwan.c:1417 usb_probe_interface+0x305/0x7a0 drivers/usb/core/driver.c:361 really_probe+0x281/0x660 drivers/base/dd.c:509 driver_probe_device+0x104/0x210 drivers/base/dd.c:670 __device_attach_driver+0x1c2/0x220 drivers/base/dd.c:777 bus_for_each_drv+0x15c/0x1e0 drivers/base/bus.c:454 Caused by too many confusing indirections and casts. id->driver_info is a pointer stored in a long. We want the pointer here, not the address of it. Thanks-to: Hillf Danton <hdanton@sina.com> Reported-by: syzbot+b68605d7fadd21510de1@syzkaller.appspotmail.com Cc: Kristian Evensen <kristian.evensen@gmail.com> Fixes: e4bf6348 ("qmi_wwan: Add quirk for Quectel dynamic config") Signed-off-by:
Bjørn Mork <bjorn@mork.no> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Adeodato Simó authored
[ Upstream commit 52ad259e ] This silences -Wmissing-prototypes when defining p9_release_pages. Link: http://lkml.kernel.org/r/b1c4df8f21689b10d451c28fe38e860722d20e71.1542089696.git.dato@net.com.org.esSigned-off-by:
Adeodato Simó <dato@net.com.org.es> Signed-off-by:
Dominique Martinet <dominique.martinet@cea.fr> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Dominique Martinet authored
[ Upstream commit fb488fc1 ] p9_read_work/p9_write_work might still hold references to a req after having been cancelled; make sure we put any of these to avoid potential request leak on disconnect. Fixes: 728356de ("9p: Add refcount to p9_req_t") Link: http://lkml.kernel.org/r/1539057956-23741-2-git-send-email-asmadeus@codewreck.orgSigned-off-by:
Dominique Martinet <dominique.martinet@cea.fr> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Latchesar Ionkov <lucho@ionkov.net> Reviewed-by:
Tomas Bortoli <tomasbortoli@gmail.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-