- 15 Nov, 2019 40 commits
-
-
David S. Miller authored
Karsten Graul says: ==================== net/smc: improve termination handling (part 3) Part 3 of the SMC termination patches improves the link group termination processing and introduces the ability to immediately terminate a link group. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ursula Braun authored
If the SMC module is unloaded or an IB device is thrown away, the immediate link group freeing introduced for SMCD is exploited for SMCR as well. That means SMCR-specifics are added to smc_conn_kill(). Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ursula Braun authored
Make sure all pending work requests are completed before freeing a link. Dismiss tx pending slots already when terminating a link group to exploit termination shortcut in tx completion queue handler. And kill the completion queue tasklets after destroy of the completion queues, otherwise there is a time window for another tasklet schedule of an already killed tasklet. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ursula Braun authored
For abnormal termination issue an LLC DELETE_LINK without the orderly flag. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ursula Braun authored
Avoid waiting for a free work request buffer, if the link group is already terminating. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ursula Braun authored
If the ism module is unloaded return control from exit routine only, if all link groups are freed. If an IB device is thrown away return control from device removal only, if all link groups belonging to this device are freed. A counters for the total number of SMCD link groups per ISM device is introduced. ism module unloading continues only if the total number of SMCD link groups for all ISM devices is zero. ISM device removal continues only it the total number of SMCD link groups per ISM device has decreased to zero. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ursula Braun authored
A final cleanup due to SMCD device removal means immediate freeing of all link groups belonging to this device in interrupt context. This patch introduces a separate SMCD link group termination routine, which terminates all link groups of an SMCD device. This new routine smcd_terminate_all ()is reused if the smc module is unloaded. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ursula Braun authored
SMCD link group termination is called when peer signals its shutdown of its corresponding link group. For regular shutdowns no connections exist anymore. For abnormal shutdowns connections must be killed and their DMBs must be unregistered immediately. That means the SMCR method to delay the link group freeing several seconds does not fit. This patch adds immediate termination of a link group and its SMCD connections and makes sure all SMCD link group related cleanup steps are finished. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ursula Braun authored
If peer announces shutdown, use the link group terminate worker for local cleanup of link groups and connections to terminate link group in proper context. Make sure link groups are cleaned up first before destroying the event queue of the SMCD device, because link group cleanup may raise events. Send signal shutdown only if peer has not done it already. Send socket abort or close only, if peer has not already announced shutdown. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jose Abreu says: ==================== net: stmmac: CPU Performance Improvements CPU Performance improvements for stmmac. Please check bellow for results before and after the series. Patch 1/7, allows RX Interrupt on Completion to be disabled and only use the RX HW Watchdog. Patch 2/7, setups the default RX coalesce settings instead of using the minimum value. Patch 3/7 and 4/7, removes the uneeded computations for RX Flow Control activation/de-activation, on some cases. Patch 5/7, tunes-up the default coalesce settings. Patch 6/7, re-works the TX coalesce timer activation logic. Patch 7/7, removes the now uneeded TBU interrupt. NetPerf UDP Results: -------------------- Socket Message Elapsed Messages CPU Service Size Size Time Okay Errors Throughput Util Demand bytes bytes secs # # 10^6bits/sec % SS us/KB --- XGMAC@2.5G: Before 212992 1400 10.00 2100620 0 2351.7 36.69 5.112 212992 10.00 2100539 2351.6 26.18 3.648 --- XGMAC@2.5G: After 212992 1400 10.00 2108972 0 2361.5 21.73 3.015 212992 10.00 2097038 2348.1 19.21 2.666 --- GMAC5@1G: Before 212992 1400 10.00 786000 0 880.2 34.71 12.923 212992 10.00 786000 880.2 23.42 8.719 --- GMAC5@1G: After 212992 1400 10.00 842648 0 943.7 14.12 4.903 212992 10.00 842648 943.7 12.73 4.418 Perf TCP Results on RX Path: ---------------------------- --- XGMAC@2.5G: Before 22.51% swapper [stmmac] [k] dwxgmac2_dma_interrupt 10.82% swapper [stmmac] [k] dwxgmac2_host_mtl_irq_status 5.21% swapper [stmmac] [k] dwxgmac2_host_irq_status 4.67% swapper [stmmac] [k] dwxgmac3_safety_feat_irq_status 3.63% swapper [kernel.kallsyms] [k] stack_trace_consume_entry 2.74% iperf3 [kernel.kallsyms] [k] copy_user_enhanced_fast_string 2.52% swapper [kernel.kallsyms] [k] update_stack_state 1.94% ksoftirqd/0 [stmmac] [k] dwxgmac2_dma_interrupt 1.45% iperf3 [kernel.kallsyms] [k] queued_spin_lock_slowpath 1.26% swapper [kernel.kallsyms] [k] create_object --- XGMAC@2.5G: After 7.43% swapper [kernel.kallsyms] [k] stack_trace_consume_entry 5.86% swapper [stmmac] [k] dwxgmac2_dma_interrupt 5.68% swapper [kernel.kallsyms] [k] update_stack_state 4.71% iperf3 [kernel.kallsyms] [k] copy_user_enhanced_fast_string 2.88% swapper [kernel.kallsyms] [k] create_object 2.69% swapper [stmmac] [k] dwxgmac2_host_mtl_irq_status 2.61% swapper [stmmac] [k] stmmac_napi_poll_rx 2.52% swapper [kernel.kallsyms] [k] unwind_next_frame.part.4 1.48% swapper [kernel.kallsyms] [k] unwind_get_return_address 1.38% swapper [kernel.kallsyms] [k] arch_stack_walk --- GMAC5@1G: Before 31.29% swapper [stmmac] [k] dwmac4_dma_interrupt 14.57% swapper [stmmac] [k] dwmac4_irq_mtl_status 10.66% swapper [stmmac] [k] dwmac4_irq_status 1.97% swapper [kernel.kallsyms] [k] stack_trace_consume_entry 1.73% iperf3 [kernel.kallsyms] [k] copy_user_enhanced_fast_string 1.59% swapper [kernel.kallsyms] [k] update_stack_state 1.15% iperf3 [kernel.kallsyms] [k] do_syscall_64 1.01% ksoftirqd/0 [stmmac] [k] dwmac4_dma_interrupt 0.89% swapper [kernel.kallsyms] [k] __default_send_IPI_dest_field 0.75% swapper [stmmac] [k] stmmac_napi_poll_rx --- GMAC5@1G: After 6.70% swapper [kernel.kallsyms] [k] stack_trace_consume_entry 5.79% swapper [stmmac] [k] dwmac4_dma_interrupt 5.29% swapper [kernel.kallsyms] [k] update_stack_state 3.52% iperf3 [kernel.kallsyms] [k] copy_user_enhanced_fast_string 2.83% swapper [stmmac] [k] dwmac4_irq_mtl_status 2.62% swapper [kernel.kallsyms] [k] create_object 2.46% swapper [stmmac] [k] stmmac_napi_poll_rx 2.32% swapper [kernel.kallsyms] [k] unwind_next_frame.part.4 2.19% swapper [stmmac] [k] dwmac4_irq_status 1.39% swapper [kernel.kallsyms] [k] unwind_get_return_address ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
Now that TX Coalesce has been rewritten we no longer need this additional interrupt enabled. This reduces CPU usage. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
Coalesce logic currently increments the number of packets and sets the IC bit when the coalesced packets have passed a given limit. This does not reflect very well what coalesce was meant for as we can have a large number of packets that are coalesced and then a single one, sent later on that has the IC bit. Rework the logic so that it coalesces only upon a limit of packets and sets the IC bit for large number of packets. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
Tune-up the defalt coalesce settings for optimal values. This gives the best performance in most of the use-cases. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
RFA and RFD should not be dependent on FIFO size. In fact, the more FIFO space we have, the later we can activate Flow Control. Let's use hard-coded values for RFA and RFD for all FIFO sizes with the exception of 4k, which is a special case. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
RFA and RFD should not be dependent on FIFO size. In fact, the more FIFO space we have, the later we can activate Flow Control. Let's use hard-coded values for RFA and RFD for all FIFO sizes with the exception of 4k, which is a special case. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
For performance reasons, sometimes using the minimum RX Coalesce value is not optimal. Lets setup a default value that is optimal in most of the use cases. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
We may only want to use the RX Watchdog so lets check if RX Coalesce settings are non-zero and only set the RX Interrupt on Completion bit if its not. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Commit 0c3cbbf9 ("mlxsw: Add specific trap for packets routed via invalid nexthops") allocated an adjacency entry during driver initialization whose purpose is to discard packets hitting the route pointing to it. These adjacency entries are allocated from a resource called KVD linear (KVDL). There are situations in which the user can decide to set the size of this resource (via devlink-resource) to 0, in which case the driver will not be able to load. Therefore, instead of pre-allocating this adjacency entry, simply allocate it only when needed. A variable indicating the validity of the entry is added and is used to ensure it is only allocated and written once and that it is freed after all the routes were flushed. Fixes: 0c3cbbf9 ("mlxsw: Add specific trap for packets routed via invalid nexthops") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
YueHaibing authored
If PROC_FS is not set, gcc warning this: net/tls/tls_proc.c:23:12: warning: 'tls_statistics_seq_show' defined but not used [-Wunused-function] Use #ifdef to guard this. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Julian Wiedmann says: ==================== s390/qeth: updates 2019-11-14 please apply the following qeth patches to net-next. Along with the usual cleanups, this (1) reduces collateral packet loss in the RX path when dealing with bad packets and/or allocation errors, and (2) simplifies how the L3 driver deals with mcast IP addresses. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
Given the way how the sysfs attributes are registered / unregistered, the show/store helpers will never be called with a NULL drvdata. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
The remaining usage effectively is a kmemdup() of the query object. By not wrapping it, some of the callers can now use GFP_KERNEL for the allocation. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
Use vlan_for_each() instead of tracking each registered VID internally. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
Current code processes each (VLAN) device twice - once to inspect the IPv4 mcast addresses, and then a second time to walk the IPv6 mcast addresses. Unify all this into a single helper, thus removing some checks and a duplicated VLAN lookup. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
Trust the IPv4/IPv6 code to properly remove its mcast addresses when a VLAN device is unregistered, and then also trigger an RX modeset whenever it's needed. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
Push the inet6_dev locking down into the helper that actually needs it for walking the mc_list. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
qeth_core_free_card() is meant to be the counterpart of qeth_alloc_card() - but unfortunately was also picked as the place to free the QDIO queues. This gets messy when qeth_core_probe_device() fails during qeth_add_dbf_entry(). At this point the card->qdio.state is not initialized yet, so qeth_free_qdio_queues() ends up operating on uninitialized data. Luckily for now, the whole qeth_card struct is zero-allocated and the value of the QETH_QDIO_UNINITIALIZED enum is 0 as well. So there's no real impact from this bug at the moment, it's just really fragile. Clean this up by moving the qeth_free_qdio_queues() call up one level in the hierarchy. This way it doesn't get called from the error path. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
When current code fails to allocate an skb in the RX path, it drops the whole RX buffer. Considering the large number of packets that a single RX buffer might contain, this is quite drastic. Skip over the packet instead, and try to extract the next packet from the RX buffer. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
Packets with an unexpected HW format are currently first extracted from the RX buffer, passed upwards to the layer-specific driver and only then finally dropped. Enhance the RX path so that we can drop such packets before even allocating an skb. For this, add some additional logic so that when a packet is meant to be dropped, we can still walk along the packet's data chunks in the RX buffer. This allows us to extract the following packet(s) from the buffer. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
Each RX buffer may contain up to 64KB worth of data. In case the device needs to discard a packet _after_ already having reserved space for it in the buffer, the whole buffer gets set to ERROR state. As the buffer might contain any number of good packets, this can result in collateral packet loss. qeth can provide relief by enabling per-frame invalidation. The RX buffer is then presented as usual, we just need to spot & drop any individual packet that was flagged as invalid. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julian Wiedmann authored
Where available, use the fine-grained counters in rtnl_link_stats64 to indicate different RX error causes. For drop reasons, use driver-private ethtool counters. In particular this patch allows us to keep track of driver-side drops due to unknown/unsupported HW descriptor format. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Stefano Garzarella says: ==================== vsock: add multi-transports support Most of the patches are reviewed by Dexuan, Stefan, and Jorgen. The following patches need reviews: - [11/15] vsock: add multi-transports support - [12/15] vsock/vmci: register vmci_transport only when VMCI guest/host are active - [15/15] vhost/vsock: refuse CID assigned to the guest->host transport RFC: https://patchwork.ozlabs.org/cover/1168442/ v1: https://patchwork.ozlabs.org/cover/1181986/ v1 -> v2: - Patch 11: + vmci_transport: sent reset when vsock_assign_transport() fails [Jorgen] + fixed loopback in the guests, checking if the remote_addr is the same of transport_g2h->get_local_cid() + virtio_transport_common: updated space available while creating the new child socket during a connection request - Patch 12: + removed 'features' variable in vmci_transport_init() [Stefan] + added a flag to register only once the host [Jorgen] - Added patch 15 to refuse CID assigned to the guest->host transport in the vhost_transport This series adds the multi-transports support to vsock, following this proposal: https://www.spinics.net/lists/netdev/msg575792.html With the multi-transports support, we can use VSOCK with nested VMs (using also different hypervisors) loading both guest->host and host->guest transports at the same time. Before this series, vmci_transport supported this behavior but only using VMware hypervisor on L0, L1, etc. The first 9 patches are cleanups and preparations, maybe some of these can go regardless of this series. Patch 10 changes the hvs_remote_addr_init(). setting the VMADDR_CID_HOST as remote CID instead of VMADDR_CID_ANY to make the choice of transport to be used work properly. Patch 11 adds multi-transports support. Patch 12 changes a little bit the vmci_transport and the vmci driver to register the vmci_transport only when there are active host/guest. Patch 13 prevents the transport modules unloading while sockets are assigned to them. Patch 14 fixes an issue in the bind() logic discoverable only with the new multi-transport support. Patch 15 refuses CID assigned to the guest->host transport in the vhost_transport. I've tested this series with nested KVM (vsock-transport [L0,L1], virtio-transport[L1,L2]) and with VMware (L0) + KVM (L1) (vmci-transport [L0,L1], vhost-transport [L1], virtio-transport[L2]). Dexuan successfully tested the RFC series on HyperV with a Linux guest. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefano Garzarella authored
In a nested VM environment, we have to refuse to assign to a nested guest the same CID assigned to our guest->host transport. In this way, the user can use the local CID for loopback. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefano Garzarella authored
When we are looking for a socket bound to a specific address, we also have to take into account the CID. This patch is useful with multi-transports support because it allows the binding of the same port with different CID, and it prevents a connection to a wrong socket bound to the same port, but with different CID. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefano Garzarella authored
This patch adds 'module' member in the 'struct vsock_transport' in order to get/put the transport module. This prevents the module unloading while sockets are assigned to it. We increase the module refcnt when a socket is assigned to a transport, and we decrease the module refcnt when the socket is destructed. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefano Garzarella authored
To allow other transports to be loaded with vmci_transport, we register the vmci_transport as G2H or H2G only when a VMCI guest or host is active. To do that, this patch adds a callback registered in the vmci driver that will be called when the host or guest becomes active. This callback will register the vmci_transport in the VSOCK core. Cc: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefano Garzarella authored
This patch adds the support of multiple transports in the VSOCK core. With the multi-transports support, we can use vsock with nested VMs (using also different hypervisors) loading both guest->host and host->guest transports at the same time. Major changes: - vsock core module can be loaded regardless of the transports - vsock_core_init() and vsock_core_exit() are renamed to vsock_core_register() and vsock_core_unregister() - vsock_core_register() has a feature parameter (H2G, G2H, DGRAM) to identify which directions the transport can handle and if it's support DGRAM (only vmci) - each stream socket is assigned to a transport when the remote CID is set (during the connect() or when we receive a connection request on a listener socket). The remote CID is used to decide which transport to use: - remote CID <= VMADDR_CID_HOST will use guest->host transport; - remote CID == local_cid (guest->host transport) will use guest->host transport for loopback (host->guest transports don't support loopback); - remote CID > VMADDR_CID_HOST will use host->guest transport; - listener sockets are not bound to any transports since no transport operations are done on it. In this way we can create a listener socket, also if the transports are not loaded or with VMADDR_CID_ANY to listen on all transports. - DGRAM sockets are handled as before, since only the vmci_transport provides this feature. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefano Garzarella authored
Remote peer is always the host, so we set VMADDR_CID_HOST as remote CID instead of VMADDR_CID_ANY. Reviewed-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefano Garzarella authored
vsock_insert_unbound() was called only when 'sock' parameter of __vsock_create() was not null. This only happened when __vsock_create() was called by vsock_create(). In order to simplify the multi-transports support, this patch moves vsock_insert_unbound() at the end of vsock_create(). Reviewed-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefano Garzarella authored
All transports call __vsock_create() with the same parameters, most of them depending on the parent socket. In order to simplify the VSOCK core APIs exposed to the transports, this patch adds the vsock_create_connected() callable from transports to create a new socket when a connection request is received. We also unexported the __vsock_create(). Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-