- 25 May, 2018 40 commits
-
-
Martin Schwidefsky authored
[ Upstream commit 4253b0e0 ] The nospec-branch.c file is compiled without the gcc options to generate expoline thunks. The return branch of the sysfs show functions cpu_show_spectre_v1 and cpu_show_spectre_v2 is an indirect branch as well. These need to be compiled with expolines. Move the sysfs functions for spectre reporting to a separate file and loose an '.' for one of the messages. Cc: stable@vger.kernel.org # 4.16 Fixes: d424986f ("s390: add sysfs attributes for spectre") Signed-off-by:
Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Martin Schwidefsky authored
[ Upstream commit c50c84c3 ] The assember code in arch/s390/kernel uses a few more indirect branches which need to be done with execute trampolines for CONFIG_EXPOLINE=y. Cc: stable@vger.kernel.org # 4.16 Fixes: f19fbd5e ("s390: introduce execute-trampolines for branches") Reviewed-by:
Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by:
Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Martin Schwidefsky authored
[ Upstream commit 23a4d7fd ] The return from the ftrace_stub, _mcount, ftrace_caller and return_to_handler functions is done with "br %r14" and "br %r1". These are indirect branches as well and need to use execute trampolines for CONFIG_EXPOLINE=y. The ftrace_caller function is a special case as it returns to the start of a function and may only use %r0 and %r1. For a pre z10 machine the standard execute trampoline uses a LARL + EX to do this, but this requires *two* registers in the range %r1..%r15. To get around this the 'br %r1' located in the lowcore is used, then the EX instruction does not need an address register. But the lowcore trick may only be used for pre z14 machines, with noexec=on the mapping for the first page may not contain instructions. The solution for that is an ALTERNATIVE in the expoline THUNK generated by 'GEN_BR_THUNK %r1' to switch to EXRL, this relies on the fact that a machine that supports noexec=on has EXRL as well. Cc: stable@vger.kernel.org # 4.16 Fixes: f19fbd5e ("s390: introduce execute-trampolines for branches") Signed-off-by:
Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Martin Schwidefsky authored
[ Upstream commit 97489e06 ] The return from the memmove, memset, memcpy, __memset16, __memset32 and __memset64 functions are done with "br %r14". These are indirect branches as well and need to use execute trampolines for CONFIG_EXPOLINE=y. Cc: stable@vger.kernel.org # 4.16 Fixes: f19fbd5e ("s390: introduce execute-trampolines for branches") Reviewed-by:
Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by:
Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Martin Schwidefsky authored
[ Upstream commit 467a3bf2 ] The return from the crc32_le_vgfm_16/crc32c_le_vgfm_16 and the crc32_be_vgfm_16 functions are done with "br %r14". These are indirect branches as well and need to use execute trampolines for CONFIG_EXPOLINE=y. Cc: stable@vger.kernel.org # 4.16 Fixes: f19fbd5e ("s390: introduce execute-trampolines for branches") Reviewed-by:
Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by:
Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Martin Schwidefsky authored
[ Upstream commit 6dd85fbb ] To be able to use the expoline branches in different assembler files move the associated macros from entry.S to a new header nospec-insn.h. While we are at it make the macros a bit nicer to use. Cc: stable@vger.kernel.org # 4.16 Fixes: f19fbd5e ("s390: introduce execute-trampolines for branches") Signed-off-by:
Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Martin Schwidefsky authored
[ Upstream commit fba9eb79 ] Add a header with macros usable in assembler files to emit alternative code sequences. It works analog to the alternatives for inline assmeblies in C files, with the same restrictions and capabilities. The syntax is ALTERNATIVE "<default instructions sequence>", \ "<alternative instructions sequence>", \ "<features-bit>" and ALTERNATIVE_2 "<default instructions sequence>", \ "<alternative instructions sqeuence #1>", \ "<feature-bit #1>", "<alternative instructions sqeuence #2>", \ "<feature-bit #2>" Reviewed-by:
Vasily Gorbik <gor@linux.vnet.ibm.com> Signed-off-by:
Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Al Viro authored
commit 5aa1437d upstream. open file, unlink it, then use ioctl(2) to make it immutable or append only. Now close it and watch the blocks *not* freed... Immutable/append-only checks belong in ->setattr(). Note: the bug is old and backport to anything prior to 737f2e93 ("ext2: convert to use the new truncate convention") will need these checks lifted into ext2_setattr(). Cc: stable@kernel.org Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Arvind Yadav authored
[ Upstream commit 00ad691a ] Never directly free @dev after calling device_register(), even if it returned an error. Always use put_device() to give up the reference initialized. Signed-off-by:
Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mohammed Gamal authored
[ Commit 55be9f25 upstream. ] On older windows hosts the net_device instance is returned to the caller of rndis_filter_device_add() without having the presence bit set first. This would cause any subsequent calls to network device operations (e.g. MTU change, channel change) to fail after the device is detached once, returning -ENODEV. Instead of returning the device instabce, we take the exit path where we call netif_device_attach() Fixes: 7b2ee50c ("hv_netvsc: common detach logic") Signed-off-by:
Mohammed Gamal <mgamal@redhat.com> Reviewed-by:
Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mohammed Gamal authored
[ Commit a56d99d7 upstream. ] Prior to commit 0cf73780 ("hv_netvsc: netvsc_teardown_gpadl() split") the call sequence in netvsc_device_remove() was as follows (as implemented in netvsc_destroy_buf()): 1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message 2- Teardown receive buffer GPADL 3- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message 4- Teardown send buffer GPADL 5- Close vmbus This didn't work for WS2016 hosts. Commit 0cf73780 ("hv_netvsc: netvsc_teardown_gpadl() split") rearranged the teardown sequence as follows: 1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message 2- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message 3- Close vmbus 4- Teardown receive buffer GPADL 5- Teardown send buffer GPADL That worked well for WS2016 hosts, but it prevented guests on older hosts from shutting down after changing network settings. Commit 0ef58b0a ("hv_netvsc: change GPAD teardown order on older versions") ensured the following message sequence for older hosts 1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message 2- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message 3- Teardown receive buffer GPADL 4- Teardown send buffer GPADL 5- Close vmbus However, with this sequence calling `ip link set eth0 mtu 1000` hangs and the process becomes uninterruptible. On futher analysis it turns out that on tearing down the receive buffer GPADL the kernel is waiting indefinitely in vmbus_teardown_gpadl() for a completion to be signaled. Here is a snippet of where this occurs: int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle) { struct vmbus_channel_gpadl_teardown *msg; struct vmbus_channel_msginfo *info; unsigned long flags; int ret; info = kmalloc(sizeof(*info) + sizeof(struct vmbus_channel_gpadl_teardown), GFP_KERNEL); if (!info) return -ENOMEM; init_completion(&info->waitevent); info->waiting_channel = channel; [....] ret = vmbus_post_msg(msg, sizeof(struct vmbus_channel_gpadl_teardown), true); if (ret) goto post_msg_err; wait_for_completion(&info->waitevent); [....] } The completion is signaled from vmbus_ongpadl_torndown(), which gets called when the corresponding message is received from the host, which apparently never happens in that case. This patch works around the issue by restoring the first mentioned message sequence for older hosts Fixes: 0ef58b0a ("hv_netvsc: change GPAD teardown order on older versions") Signed-off-by:
Mohammed Gamal <mgamal@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mohammed Gamal authored
[ Commit 7992894c upstream. ] Split each of the functions into two for each of send/recv buffers. This will be needed in order to implement a fine-grained messaging sequence to the host so that we accommodate the requirements of different Windows versions Fixes: 0ef58b0a ("hv_netvsc: change GPAD teardown order on older versions") Signed-off-by:
Mohammed Gamal <mgamal@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mohammed Gamal authored
commit 2afc5d61A upstram When changing network interface settings, Windows guests older than WS2016 can no longer shutdown. This was addressed by commit 0ef58b0a ("hv_netvsc: change GPAD teardown order on older versions"), however the issue also occurs on WS2012 guests that share NVSP protocol versions with WS2016 guests. Hence we use Windows version directly to differentiate them. Fixes: 0ef58b0a ("hv_netvsc: change GPAD teardown order on older versions") Signed-off-by:
Mohammed Gamal <mgamal@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stephen Hemminger authored
[ Commit 7b2ee50c upstream. ] Make common function for detaching internals of device during changes to MTU and RSS. Make sure no more packets are transmitted and all packets have been received before doing device teardown. Change the wait logic to be common and use usleep_range(). Changes transmit enabling logic so that transmit queues are disabled during the period when lower device is being changed. And enabled only after sub channels are setup. This avoids issue where it could be that a packet was being sent while subchannel was not initialized. Fixes: 8195b139 ("hv_netvsc: fix deadlock on hotplug") Signed-off-by:
Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stephen Hemminger authored
[ Commit 0ef58b0a upstream. ] On older versions of Windows, the host ignores messages after vmbus channel is closed. Workaround this by doing what Windows does and send the teardown before close on older versions of NVSP protocol. Reported-by:
Mohammed Gamal <mgamal@redhat.com> Fixes: 0cf73780 ("hv_netvsc: netvsc_teardown_gpadl() split") Signed-off-by:
Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stephen Hemminger authored
[ Commit 02400fce upstream. ] The receive processing may continue to happen while the internal network device state is in RCU grace period. The internal RNDIS structure is associated with the internal netvsc_device structure; both have the same RCU lifetime. Defer freeing all associated parts until after grace period. Fixes: 0cf73780 ("hv_netvsc: netvsc_teardown_gpadl() split") Signed-off-by:
Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stephen Hemminger authored
[ Commit 8348e046 upstream. ] This makes sure that no CPU is still process packets when the channel is closed. Fixes: 76bb5db5 ("netvsc: fix use after free on module removal") Signed-off-by:
Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stephen Hemminger authored
[ Commit b3bf5666 upstream. ] When VF is used for accelerated networking it will likely have more queues (and different policy) than the synthetic NIC. This patch defers the queue policy to the VF so that all the queues can be used. This impacts workloads like local generate UDP. Signed-off-by:
Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stephen Hemminger authored
[ Commit d64e38ae upstream. ] There is a race between napi_reschedule and re-enabling interrupts which could lead to missed host interrrupts. This occurs when interrupts are re-enabled (hv_end_read) and vmbus irq callback (netvsc_channel_cb) has already scheduled NAPI. Signed-off-by:
Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stephen Hemminger authored
[ Commit a7483ec0 upstream. ] Block setup of multiple channels earlier in the teardown process. This avoids possible races between halt and subchannel initialization. Suggested-by:
Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by:
Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stephen Hemminger authored
[ Commit fcfb4a00 upstream. ] Need to delete NAPI association if vmbus_open fails. Signed-off-by:
Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stephen Hemminger authored
[ Commit f4950e45 upstream. ] Don't wake transmit queues if link is not up yet. Signed-off-by:
Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stephen Hemminger authored
[ Commit 12f69661 upstream. ] Change the initialization order so that the device is ready to transmit (ie connect vsp is completed) before setting the internal reference to the device with RCU. This avoids any races on initialization and prevents retry issues on shutdown. Signed-off-by:
Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Haiyang Zhang authored
[ Commit 25a39f7f upstream. ] Since we no longer localize channel/CPU affiliation within one NUMA node, num_online_cpus() is used as the number of channel cap, instead of the number of processors in a NUMA node. This patch allows a bigger range for tuning the number of channels. Signed-off-by:
Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stephen Hemminger authored
[ Commit cfd8afd9 upstream. ] If the transmit queue is known full, then don't keep aggregating data. And the cp_partial flag which indicates that the current aggregation buffer is full can be folded in to avoid more conditionals. Signed-off-by:
Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Vitaly Kuznetsov authored
[ Commit aefd80e8 upstream. ] rndis_filter_device_add() is called both from netvsc_probe() when we initially create the device and from set channels/mtu/ringparam routines where we basically remove the device and add it back. hw_features is reset in rndis_filter_device_add() and filled with host data. However, we lose all additional flags which are set outside of the driver, e.g. register_netdevice() adds NETIF_F_SOFT_FEATURES and many others. Unfortunately, calls to rndis_{query_hwcaps(), _set_offload_params()} calls cannot be avoided on every RNDIS reset: host expects us to set required features explicitly. Moreover, in theory hardware capabilities can change and we need to reflect the change in hw_features. Reset net->hw_features bits according to host data in rndis_netdev_set_hwcaps(), clear corresponding feature bits from net->features in case some features went missing (will never happen in real life I guess but let's be consistent). Signed-off-by:
Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by:
Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Vitaly Kuznetsov authored
[ Commit 0cf73780 upstream. ] It was found that in some cases host refuses to teardown GPADL for send/ receive buffers (probably when some work with these buffere is scheduled or ongoing). Change the teardown logic to be: 1) Send NVSP_MSG1_TYPE_REVOKE_* messages 2) Close the channel 3) Teardown GPADLs. This seems to work reliably. Signed-off-by:
Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Haiyang Zhang authored
[ Commit a6fb6aa3 upstream. ] In some cases, like internal vSwitch, the host doesn't provide send indirection table updates. This patch sets the table to be equal weight after subchannels are all open. Otherwise, all workload will be on one TX channel. As tested, this patch has largely increased the throughput over internal vSwitch. Signed-off-by:
Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Haiyang Zhang authored
[ Commit 6b0cbe31 upstream. ] tx_table is part of the private data of kernel net_device. It is only zero-ed out when allocating net_device. We may recreate netvsc_device w/o recreating net_device, so the private netdev data, including tx_table, are not zeroed. It may contain channel numbers for the older netvsc_device. This patch adds initialization of tx_table each time we recreate netvsc_device. Signed-off-by:
Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Haiyang Zhang authored
[ Commit 39e91cfb upstream. ] Simplify the variable name: tx_send_table Signed-off-by:
Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Haiyang Zhang authored
[ Commit 47371300 upstream. ] Rename this variable because it is the Receive indirection table. Signed-off-by:
Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Haiyang Zhang authored
[ Commit 6450f8f2 upstream. ] For older hosts without multi-channel (vRSS) support, and some error cases, we still need to set the real number of queues to one. This patch adds this missing setting. Fixes: 8195b139 ("hv_netvsc: fix deadlock on hotplug") Signed-off-by:
Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by:
Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
hpreg@vmware.com authored
[ Upstream commit f3002c13 ] The gen bits must be read first from (resp. written last to) DMA memory. The proper way to enforce this on Linux is to call dma_rmb() (resp. dma_wmb()). Signed-off-by:
Regis Duchesne <hpreg@vmware.com> Acked-by:
Ronak Doshi <doshir@vmware.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
hpreg@vmware.com authored
[ Upstream commit 61aeecea ] The DMA mask must be set before, not after, the first DMA map operation, or the first DMA map operation could in theory fail on some systems. Fixes: b0eb57cb ("VMXNET3: Add support for virtual IOMMU") Signed-off-by:
Regis Duchesne <hpreg@vmware.com> Acked-by:
Ronak Doshi <doshir@vmware.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit 7f582b24 ] syzkaller found a reliable way to crash the host, hitting a BUG() in __tcp_retransmit_skb() Malicous MSG_FASTOPEN is the root cause. We need to purge write queue in tcp_connect_init() at the point we init snd_una/write_seq. This patch also replaces the BUG() by a less intrusive WARN_ON_ONCE() kernel BUG at net/ipv4/tcp_output.c:2837! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 5276 Comm: syz-executor0 Not tainted 4.17.0-rc3+ #51 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__tcp_retransmit_skb+0x2992/0x2eb0 net/ipv4/tcp_output.c:2837 RSP: 0000:ffff8801dae06ff8 EFLAGS: 00010206 RAX: ffff8801b9fe61c0 RBX: 00000000ffc18a16 RCX: ffffffff864e1a49 RDX: 0000000000000100 RSI: ffffffff864e2e12 RDI: 0000000000000005 RBP: ffff8801dae073a0 R08: ffff8801b9fe61c0 R09: ffffed0039c40dd2 R10: ffffed0039c40dd2 R11: ffff8801ce206e93 R12: 00000000421eeaad R13: ffff8801ce206d4e R14: ffff8801ce206cc0 R15: ffff8801cd4f4a80 FS: 0000000000000000(0000) GS:ffff8801dae00000(0063) knlGS:00000000096bc900 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 0000000020000000 CR3: 00000001c47b6000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> tcp_retransmit_skb+0x2e/0x250 net/ipv4/tcp_output.c:2923 tcp_retransmit_timer+0xc50/0x3060 net/ipv4/tcp_timer.c:488 tcp_write_timer_handler+0x339/0x960 net/ipv4/tcp_timer.c:573 tcp_write_timer+0x111/0x1d0 net/ipv4/tcp_timer.c:593 call_timer_fn+0x230/0x940 kernel/time/timer.c:1326 expire_timers kernel/time/timer.c:1363 [inline] __run_timers+0x79e/0xc50 kernel/time/timer.c:1666 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692 __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285 invoke_softirq kernel/softirq.c:365 [inline] irq_exit+0x1d1/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:525 [inline] smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863 Fixes: cf60af03 ("net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)") Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Reported-by:
syzbot <syzkaller@googlegroups.com> Acked-by:
Neal Cardwell <ncardwell@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit 9709020c ] We must not call sock_diag_has_destroy_listeners(sk) on a socket that has no reference on net structure. BUG: KASAN: use-after-free in sock_diag_has_destroy_listeners include/linux/sock_diag.h:75 [inline] BUG: KASAN: use-after-free in __sk_free+0x329/0x340 net/core/sock.c:1609 Read of size 8 at addr ffff88018a02e3a0 by task swapper/1/0 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.17.0-rc5+ #54 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433 sock_diag_has_destroy_listeners include/linux/sock_diag.h:75 [inline] __sk_free+0x329/0x340 net/core/sock.c:1609 sk_free+0x42/0x50 net/core/sock.c:1623 sock_put include/net/sock.h:1664 [inline] reqsk_free include/net/request_sock.h:116 [inline] reqsk_put include/net/request_sock.h:124 [inline] inet_csk_reqsk_queue_drop_and_put net/ipv4/inet_connection_sock.c:672 [inline] reqsk_timer_handler+0xe27/0x10e0 net/ipv4/inet_connection_sock.c:739 call_timer_fn+0x230/0x940 kernel/time/timer.c:1326 expire_timers kernel/time/timer.c:1363 [inline] __run_timers+0x79e/0xc50 kernel/time/timer.c:1666 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692 __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285 invoke_softirq kernel/softirq.c:365 [inline] irq_exit+0x1d1/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:525 [inline] smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863 </IRQ> RIP: 0010:native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:54 RSP: 0018:ffff8801d9ae7c38 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13 RAX: dffffc0000000000 RBX: 1ffff1003b35cf8a RCX: 0000000000000000 RDX: 1ffffffff11a30d0 RSI: 0000000000000001 RDI: ffffffff88d18680 RBP: ffff8801d9ae7c38 R08: ffffed003b5e46c3 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 R13: ffff8801d9ae7cf0 R14: ffffffff897bef20 R15: 0000000000000000 arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline] default_idle+0xc2/0x440 arch/x86/kernel/process.c:354 arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:345 default_idle_call+0x6d/0x90 kernel/sched/idle.c:93 cpuidle_idle_call kernel/sched/idle.c:153 [inline] do_idle+0x395/0x560 kernel/sched/idle.c:262 cpu_startup_entry+0x104/0x120 kernel/sched/idle.c:368 start_secondary+0x426/0x5b0 arch/x86/kernel/smpboot.c:269 secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:242 Allocated by task 4557: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554 kmem_cache_zalloc include/linux/slab.h:691 [inline] net_alloc net/core/net_namespace.c:383 [inline] copy_net_ns+0x159/0x4c0 net/core/net_namespace.c:423 create_new_namespaces+0x69d/0x8f0 kernel/nsproxy.c:107 unshare_nsproxy_namespaces+0xc3/0x1f0 kernel/nsproxy.c:206 ksys_unshare+0x708/0xf90 kernel/fork.c:2408 __do_sys_unshare kernel/fork.c:2476 [inline] __se_sys_unshare kernel/fork.c:2474 [inline] __x64_sys_unshare+0x31/0x40 kernel/fork.c:2474 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 69: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 __cache_free mm/slab.c:3498 [inline] kmem_cache_free+0x86/0x2d0 mm/slab.c:3756 net_free net/core/net_namespace.c:399 [inline] net_drop_ns.part.14+0x11a/0x130 net/core/net_namespace.c:406 net_drop_ns net/core/net_namespace.c:405 [inline] cleanup_net+0x6a1/0xb20 net/core/net_namespace.c:541 process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145 worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279 kthread+0x345/0x410 kernel/kthread.c:240 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412 The buggy address belongs to the object at ffff88018a02c140 which belongs to the cache net_namespace of size 8832 The buggy address is located 8800 bytes inside of 8832-byte region [ffff88018a02c140, ffff88018a02e3c0) The buggy address belongs to the page: page:ffffea0006280b00 count:1 mapcount:0 mapping:ffff88018a02c140 index:0x0 compound_mapcount: 0 flags: 0x2fffc0000008100(slab|head) raw: 02fffc0000008100 ffff88018a02c140 0000000000000000 0000000100000001 raw: ffffea00062a1320 ffffea0006268020 ffff8801d9bdde40 0000000000000000 page dumped because: kasan: bad access detected Fixes: b922622e ("sock_diag: don't broadcast kernel sockets") Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Craig Gallek <kraig@google.com> Reported-by:
syzbot <syzkaller@googlegroups.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Willem de Bruijn authored
[ Upstream commit b84bbaf7 ] Packet sockets allow construction of packets shorter than dev->hard_header_len to accommodate protocols with variable length link layer headers. These packets are padded to dev->hard_header_len, because some device drivers interpret that as a minimum packet size. packet_snd reserves dev->hard_header_len bytes on allocation. SOCK_DGRAM sockets call skb_push in dev_hard_header() to ensure that link layer headers are stored in the reserved range. SOCK_RAW sockets do the same in tpacket_snd, but not in packet_snd. Syzbot was able to send a zero byte packet to a device with massive 116B link layer header, causing padding to cross over into skb_shinfo. Fix this by writing from the start of the llheader reserved range also in the case of packet_snd/SOCK_RAW. Update skb_set_network_header to the new offset. This also corrects it for SOCK_DGRAM, where it incorrectly double counted reserve due to the skb_push in dev_hard_header. Fixes: 9ed988cd ("packet: validate variable length ll headers") Reported-by: syzbot+71d74a5406d02057d559@syzkaller.appspotmail.com Signed-off-by:
Willem de Bruijn <willemb@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Willem de Bruijn authored
[ Upstream commit 113f99c3 ] Device features may change during transmission. In particular with corking, a device may toggle scatter-gather in between allocating and writing to an skb. Do not unconditionally assume that !NETIF_F_SG at write time implies that the same held at alloc time and thus the skb has sufficient tailroom. This issue predates git history. Fixes: 1da177e4 ("Linux-2.6.12-rc2") Reported-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
Willem de Bruijn <willemb@google.com> Reviewed-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Biggers authored
[ Upstream commit d49baa7e ] It's possible to crash the kernel in several different ways by sending messages to the SMC_PNETID generic netlink family that are missing the expected attributes: - Missing SMC_PNETID_NAME => null pointer dereference when comparing names. - Missing SMC_PNETID_ETHNAME => null pointer dereference accessing smc_pnetentry::ndev. - Missing SMC_PNETID_IBNAME => null pointer dereference accessing smc_pnetentry::smcibdev. - Missing SMC_PNETID_IBPORT => out of bounds array access to smc_ib_device::pattr[-1]. Fix it by validating that all expected attributes are present and that SMC_PNETID_IBPORT is nonzero. Reported-by: syzbot+5cd61039dc9b8bfa6e47@syzkaller.appspotmail.com Fixes: 6812baab ("smc: establish pnet table management") Cc: <stable@vger.kernel.org> # v4.11+ Signed-off-by:
Eric Biggers <ebiggers@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Paolo Abeni authored
[ Upstream commit 44a63b13 ] Hangbin reported an Oops triggered by the syzkaller qdisc rules: kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI Modules linked in: sch_red CPU: 0 PID: 28699 Comm: syz-executor5 Not tainted 4.17.0-rc4.kcov #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:qdisc_hash_add+0x26/0xa0 RSP: 0018:ffff8800589cf470 EFLAGS: 00010203 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff824ad971 RDX: 0000000000000007 RSI: ffffc9000ce9f000 RDI: 000000000000003c RBP: 0000000000000001 R08: ffffed000b139ea2 R09: ffff8800589cf4f0 R10: ffff8800589cf50f R11: ffffed000b139ea2 R12: ffff880054019fc0 R13: ffff880054019fb4 R14: ffff88005c0af600 R15: ffff880054019fb0 FS: 00007fa6edcb1700(0000) GS:ffff88005ce00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000740 CR3: 000000000fc16000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: red_change+0x2d2/0xed0 [sch_red] qdisc_create+0x57e/0xef0 tc_modify_qdisc+0x47f/0x14e0 rtnetlink_rcv_msg+0x6a8/0x920 netlink_rcv_skb+0x2a2/0x3c0 netlink_unicast+0x511/0x740 netlink_sendmsg+0x825/0xc30 sock_sendmsg+0xc5/0x100 ___sys_sendmsg+0x778/0x8e0 __sys_sendmsg+0xf5/0x1b0 do_syscall_64+0xbd/0x3b0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x450869 RSP: 002b:00007fa6edcb0c48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007fa6edcb16b4 RCX: 0000000000450869 RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000013 RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 0000000000008778 R14: 0000000000702838 R15: 00007fa6edcb1700 Code: e9 0b fe ff ff 0f 1f 44 00 00 55 53 48 89 fb 89 f5 e8 3f 07 f3 fe 48 8d 7b 3c 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 51 RIP: qdisc_hash_add+0x26/0xa0 RSP: ffff8800589cf470 When a red qdisc is updated with a 0 limit, the child qdisc is left unmodified, no additional scheduler is created in red_change(), the 'child' local variable is rightfully NULL and must not add it to the hash table. This change addresses the above issue moving qdisc_hash_add() right after the child qdisc creation. It additionally removes unneeded checks for noop_qdisc. Reported-by:
Hangbin Liu <liuhangbin@gmail.com> Fixes: 49b49971 ("net: sched: make default fifo qdiscs appear in the dump") Signed-off-by:
Paolo Abeni <pabeni@redhat.com> Acked-by:
Jiri Kosina <jkosina@suse.cz> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-