- 01 May, 2017 21 commits
-
-
David S. Miller authored
Jakub Kicinski says: ==================== nfp: optimize XDP TX and small fixes This series optimizes the nfp XDP TX performance a little bit. I run quick tests on an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz. Single core/queue performance for both touch and drop and touch and forward is above 20Mpps @64B packets, drop being 2Mpps faster. I think this is max for a single queue on the low power NFPs. There are also a few minor fixes included for code in net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
For legacy reasons NFP FW may be compiled to DMA packets to a constant offset into the buffer and use the space before it for metadata. This ensures that packets data always start at a certain offset regardless of the amount of preceding metadata. If rx offset is set to 0 there may still be up to 64 bytes of metadata but metadata will start at the beginning of the buffer, instead of: data_start_offset = rx_offset - meta_len Even though we make the buffers larger to accommodate up to 64 bytes of metadata, if there is only N bytes of metadata, we will end up with N bytes of headroom and 64 - N bytes of tailroom. Therefore we can't rely on that space for XDP headroom. Make sure we always allocate full 256 bytes. This, unfortunately, means we can't fit the headroom on an u8 any more. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Right now the required Service Process ABI version is still tied to max ID of known commands. For new NSP commands we are adding we are checking if NSP version is recent enough on command-by-command basis. The driver doesn't have to force the device to have the very latest flash, anything newer than 0.8 should do. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Reading TX queue indexes from the device memory on each interrupt is expensive. It's doubly expensive with XDP running since we have two TX rings to check there. If the software indexes indicate that the TX queue is completely empty, however, we don't need to look at the device completion index at all. The queuing CPU is doing a wmb() before kicking the device TX so we should be safe to assume on the CPU handling the completions will never see old value of the software copy of the index. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
On the RX path we follow the "drop if allocation of replacement buffer fails" rule. With XDP we extended that to the TX action, so if XDP prog returned TX but allocation of replacement RX buffer failed, we will drop the packet. To improve our XDP TX performance extend the idea of rings being always full to XDP TX rings. Pre-fill the XDP TX rings with RX buffers, and when XDP prog returns TX action swap the RX buffer with the next buffer from the TX ring. XDP TX complete will no longer free the buffers but let them sit on the TX ring and wait for swap with RX buffer, instead. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
We will soon allocate RX buffers for caching on XDP TX rings. The rx_ring parameter passed to nfp_net_rx_alloc_one() is not actually used, remove it. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
As Or points out in commit 423b3aec ("net/mlx4: Change ENOTSUPP to EOPNOTSUPP"), ENOTSUPP is NFS specific error. Replace it with EOPNOTSUPP. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Willem de Bruijn authored
Avoid hashing the tx napi struct into napi_hash[], which is used for busy polling receive queues. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Howells authored
Initialise init_net.count to 1 for its pointer from init_nsproxy lest someone tries to do a get_net() and a put_net() in a process in which current->ns_proxy->net_ns points to the initial network namespace. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Girish Moodalbail authored
Creating a geneve link with 'udpcsum' set results in a creation of link for which UDP checksum will NOT be computed on outbound packets, as can be seen below. 11: gen0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN link/ether c2:85:27:b6:b4:15 brd ff:ff:ff:ff:ff:ff promiscuity 0 geneve id 200 remote 192.168.13.1 dstport 6081 noudpcsum Similarly, creating a link with 'noudpcsum' set results in a creation of link for which UDP checksum will be computed on outbound packets. Fixes: 9b4437a5 ("geneve: Unify LWT and netdev handling.") Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Lance Richardson <lrichard@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jiri Benc says: ==================== vxlan: do not error out on disabled IPv6 This patchset fixes a bug with metadata based tunnels when booted with ipv6.disable=1. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Benc authored
The message "Cannot bind port X, err=Y" creates only confusion. In metadata based mode, failure of IPv6 socket creation is okay if IPv6 is disabled and no error message should be printed. But when IPv6 tunnel was requested, such failure is fatal. The vxlan_socket_create does not know when the error is harmless and when it's not. Instead of passing such information down to vxlan_socket_create, remove the message completely. It's not useful. We propagate the error code up to the user space and the port number comes from the user space. There's nothing in the message that the process creating vxlan interface does not know. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Benc authored
When IPv6 is compiled but disabled at runtime, __vxlan_sock_add returns -EAFNOSUPPORT. For metadata based tunnels, this causes failure of the whole operation of bringing up the tunnel. Ignore failure of IPv6 socket creation for metadata based tunnels caused by IPv6 not being available. Fixes: b1be00a6 ("vxlan: support both IPv4 and IPv6 sockets in a single vxlan device") Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andy Shevchenko authored
Replace pattern int status; ... status = func(...); return status; by return func(...); No functional change intented. Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andy Shevchenko authored
Reuse bnx2x_null_format_ver() in functions where it's appropriated instead of open coded variant. Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andy Shevchenko authored
Use scnprintf() when printing version instead of custom open coded variants. Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Merge tag 'linux-can-next-for-4.12-20170427' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2017-04-25 this is a pull request of 1 patch for net-next/master. This patch by Oliver Hartkopp fixes the build of the broad cast manager with CONFIG_PROC_FS disabled. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Chenbo Feng authored
The description inside uapi/linux/bpf.h about bpf_get_socket_uid helper function is no longer valid. It returns overflowuid rather than 0 when failed. Signed-off-by: Chenbo Feng <fengc@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Davide Caratti authored
avoid direct access to sk->sk_state when tcp_poll() is called on a socket using active TCP fastopen with deferred connect. Use local variable 'state', which stores the result of sk_state_load(), like it was done in commit 00fd38d9 ("tcp: ensure proper barriers in lockless contexts"). Fixes: 19f6d3f3 ("net/tcp-fastopen: Add new API support") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
While testing a fix [1] in ___pskb_trim(), addressing the WARN_ON_ONCE() in skb_try_coalesce() reported by Andrey, I found that we had an skb with skb->sk set but no skb->destructor. This invalidated heuristic found in commit 158f323b ("net: adjust skb->truesize in pskb_expand_head()") and in cited patch. Considering the BUG_ON(skb->sk) we have in skb_orphan(), we should restrain the temporary setting to a minimal section. [1] https://patchwork.ozlabs.org/patch/755570/ net: adjust skb->truesize in ___pskb_trim() Fixes: 8f917bba ("bpf: pass sk to helper functions") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexandre Belloni authored
Since 83a77e9e, the phydev irq is explicitly set to PHY_POLL when there is no pdata. It doesn't work on DT enabled platforms because the phydev irq is already set by libphy before. Fixes: 83a77e9e ("net: macb: Added PCI wrapper for Platform Driver.") Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 30 Apr, 2017 19 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queueDavid S. Miller authored
Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2017-04-30 This series contains updates to i40e and i40evf only. Jake provides majority of the changes in this series, starting with the renaming of a flag to avoid confusion. Then renamed a variable to a more meaningful name to clarify what is actually being done and to reduce confusion. Amortizes the wait time when initializing or disabling lots of VFs by using i40e_reset_all_vfs() and i40e_vsi_stop_rings_no_wait(). Cleaned up a unnecessary delay since pci_disable_sriov() already has its own delay, so need to add a additional delay when removing VFs. Avoid using the same name flags for both vsi->state and pf->state, to make code review easier and assist future work to use the correct state field when checking bits. Use DECLARE_BITMAP() to ensure that we always allocate enough space for flags. Replace hw_disabled_flags with the new _AUTO_DISABLED flags, which are more readable because we are not setting an *_ENABLED flag to disable the feature. Alex corrects a oversight where we were not reprogramming the ports after a reset, which was causing us to lose all of the receive tunnel offloads. Arnd Bergmann moves the declaration of a local variable to avoid a warning seen on architectures with larger pages about an unused variable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queueDavid S. Miller authored
Jeff Kirsher says: ==================== 1GbE Intel Wired LAN Driver Updates 2017-04-30 This series contains updates to e1000e only. Jarod Wilson fixes an issue where the workaround for 82574 & 82583 is needed for i218 as well, so set the appropriate flags. Sasha adds support for the upcoming new i219 devices for the client platform (CannonLake), which includes the support for 38.4MHz frequency to support PTP on CannonLake. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sasha Neftin authored
Add support for 38.4MHz frequency is required for PTP on CannonLake. SYSTIM frequency adjustment attributes for TIMINCA are get/set dependent on the hardware clock frequency for a different types of adapters. 38.4MHz frequency supported by CannonLake and active once time synchronisation mechanism was enabled Changed abbreviation from Hz to HZ to be compliant checkpatch code style Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Reviewed-by: Raanan Avargil <raanan.avargil@intel.com> Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Sasha Neftin authored
The propagation of CannonLake mac type to driver functionality Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Reviewed-by: Raanan Avargil <raanan.avargil@intel.com> Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Sasha Neftin authored
i219 (6) and i219 (7) are the next LOM generations that will be available on the nextIntel Client platform (CannonLake) This patch provides the initial support for these devices Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Reviewed-by: Raanan Avargil <raanan.avargil@intel.com> Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jarod Wilson authored
I've got reports that the Intel I-218V NIC in Intel NUC5i5RYH systems used as a PTP slave experiences random ~10 hour clock jumps, which are resolved if the same workaround for the 82574 and 82583 is employed, so set the appropriate flag2 in e1000_pch_lpt_info too. Reported-by: Rupesh Patel <rupatel@redhat.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Arnd Bergmann authored
On architectures with larger pages, we get a warning about an unused variable: drivers/net/ethernet/intel/i40evf/i40evf_main.c: In function 'i40evf_configure_rx': drivers/net/ethernet/intel/i40evf/i40evf_main.c:690:21: error: unused variable 'netdev' [-Werror=unused-variable] This moves the declaration into the #ifdef to avoid the warning. Fixes: dab86afd ("i40e/i40evf: Change the way we limit the maximum frame size for Rx") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jacob Keller authored
This matches the ordering of how we free stuff during reset and remove. It also makes logical sense because we set the interrupts based on the number of queues. Currently this doesn't really matter in practice. However a future patch moves the assignment of num_active_queues into i40evf_alloc_queues, which is required by i40evf_set_interrupt_capability. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jacob Keller authored
The flag used by the common code and PF code is I40E_FLAG_FD_ATR_ENABLED, not *FDIR*. It turns out none of the txrx code actually shared with the VF driver actually checks the ATR flag. This is made even more obvious by the typo in the VF header file. Let's just remove the flag from the VF driver since it's not needed. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jacob Keller authored
The hw_disabled_flags field was added as a way of signifying that a feature was automatically or temporarily disabled. However, we actually only use this for FDir features. Replace its use with new _AUTO_DISABLED flags instead. This is more readable, because you aren't setting an *_ENABLED flag to *disable* the feature. Additionally, clean up a few areas where we used these bits. First, we don't really need to set the auto-disable flag for ATR if we're fully disabling the feature via ethtool. Second, we should always clear the auto-disable bits in case they somehow got set when the feature was disabled. However, avoid displaying a message that we've re-enabled the feature. Third, we shouldn't be re-enabling ATR in the SB ntuple add flow, because it might have been disabled due to space constraints. Instead, we should just wait for the fdir_check_and_reenable to be called by the watchdog. Overall, this change allows us to simplify some code by removing an extra field we didn't need, and the result should make it more clear as to what we're actually doing with these flags. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jacob Keller authored
We already set pairs to the value of adapter->num_active_queues. This value is limited by vsi_res->num_queue_pairs and num_online_cpus(). This means that pairs by definition is already smaller than num_online_cpus()*2, so we don't even need to bother with this check. Lets just remove it and update the comment. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jacob Keller authored
Instead of assuming our flags fit within an unsigned long, use DECLARE_BITMAP which will ensure that we always allocate enough space. Additionally, use __I40E_STATE_SIZE__ markers as the last element of the enumeration so that the size of the BITMAP is compile-time assigned rather than programmer-time assigned. This ensures that potential future flag additions do not actually overrun the array. This is especially important as 32bit systems would only have 32bit longs instead of 64bit longs as we generally have assumed in the prior code. This change also removes a dereference of the state fields throughout the code, so it does have a bit of code churn. The conversions were automated using sed replacements with an alternation s/&(vsi->back|vsi|pf)->state/\1->state/ s/&adapter->vsi.state/adapter->vsi.state/ For debugfs, we modify the printing so that we can display chunks of the state value on new lines. This ensures that we can print the entire set of state values. Additionally, we now print them as 08lx to ensure that they display nicely. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jacob Keller authored
Avoid using the same named flags for both vsi->state and pf->state. This makes code review easier, as it is more likely that future authors will use the correct state field when checking bits. Previous commits already found issues with at least one check, and possibly others may be incorrect. This reduces confusion as it is more clear what each flag represents, and which flags are valid for which state field. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jacob Keller authored
The delay was added because of a desire to ensure that the VF driver can finish up removing. However, pci_disable_sriov already has its own ssleep() call that will sleep for an entire second, so there is no reason to add extra delay on top of this by using msleep here. In practice, an msleep() won't have a huge impact on timing but there is no real value in keeping it, so lets just simplify the code and remove it. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jacob Keller authored
Just as we do in i40e_reset_all_vfs, save some time when freeing VFs by amortizing the wait time for stopping queues. We can use i40e_vsi_stop_rings_no_wait() to begin the process of stopping all the VF rings at once. Then, once we've started the process on each VF we can begin waiting for the VFs to stop. This helps reduce the total wait time by a large factor. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Alexander Duyck authored
This patch corrects a major oversight in that we were not reprogramming the ports after a reset. As a result we completely lost all of the Rx tunnel offloads on receive including Rx checksum, RSS on inner headers, and ATR. The fix for this is pretty standard as all we needed to do is reset the filter bits to pending for all active filters and schedule the sync event. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jacob Keller authored
The .index field of i40e_udp_port_config represents the udp port number. Rename this variable to port so that it is more obvious. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jacob Keller authored
When allocating a large number of VFs, the driver previously used i40e_reset_vf in a sequence. Just as when performing a normal reset, this accumulates a large amount of delay for handling all of the VFs in sequence. This delay is mainly due to a hardware requirement to wait after initiating a reset on the VF. We recently added a new function, i40e_reset_all_vfs() which can be used to amortize the delay time, by first triggering all VF resets, then waiting once, and finally cleaning up and allocating the VFs. This is almost as good as truly running the resets in parallel. In order to avoid sending a spurious reset message to a client interface, we have a check to see whether we've assigned pf->num_alloc_vfs yet. This was originally intended as a way to distinguish the "initialization" case from the regular reset case. Unfortunately, this means that we can't directly use i40e_reset_all_vfs yet. Lets avoid this check of pf->num_alloc_vfs by replacing it with a proper VSI state bit which we can use instead. This makes the intention much clearer and allows us to re-use the i40e_reset_all_vfs function directly. Change-ID: I694279b37eb6b5a91b6670182d0c15d10244fd6e Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jacob Keller authored
These flags represent the state of the VF at various times. Do not spell them as _STAT_ which can be confusing to readers who may think these refer to statistics. Change-ID: I6bc092cd472e8276896a1fd7498aced2084312df Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-