- 27 Feb, 2018 40 commits
-
-
Petr Machata authored
Similarly to mirror-to-gretap, this enables mirroring to IPv6 gretap netdevice. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
When a user requests mirror from a mlxsw physical port (possibly based on an ACL match) to a gretap netdevice, the driver needs to resolve the request to a particular physical port that the mirrored packets will egress through, and a suite of configuration keys (importantly, IP and MAC addresses). That means calling into routing and neighbor kernel code to simulate the decisions made by the system for packets passing through a gretap netdevice. Add a new instance of mlxsw_sp_span_entry_ops to support this. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
The check for whether a mirror port (which is a mlxsw front panel port) belongs to the same mlxsw instance as the mirrored port, is currently only done in spectrum_acl, even though it's applicable for the matchall case as well. Thus move it to mlxsw_sp_span_entry_create(). Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
For some netdevices, for which mlxsw offloads mirroring, may have a complex relationship between the declared intent and low-level device configuration. Trying to accurately track which changes might influence offloading decisions is finicky and error-prone. Instead, this patch introduces a function mlxsw_sp_span_entry_respin, which re-queries the configuration anew and, if different, removes the existing offloads and installs new ones. Call this function strategically at event handlers that might influence the mirroring configuration. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
To support mirroring to different device types, the functions that partake in configuring the port analyzer need to be extended to admit non-trivial SPAN types. Create a structure where all details of SPAN configuration are kept, struct mlxsw_sp_span_parms. Also create struct mlxsw_sp_span_entry_ops to keep per-SPAN-type operations. Instantiate the latter once for MLXSW_REG_MPAT_SPAN_TYPE_LOCAL_ETH, and once for a suite of NOP callbacks used for invalidated SPAN entry. Put the formet as a sole member of a new array mlxsw_sp_span_entry_types, where all known SPAN types are kept. Introduce a new function, mlxsw_sp_span_entry_ops(), to look up the right ops suite given a netdevice. Change mlxsw_sp_span_mirror_add() to use both parms and ops structures. Change mlxsw_sp_span_entry_get() and mlxsw_sp_span_entry_create() to take these as arguments. Modify mlxsw_sp_span_entry_configure() and mlxsw_sp_span_entry_deconfigure() to dispatch to ops. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Currently the only mirror action supported by mlxsw is mirror to another mlxsw physical port. Correspondingly, span_entry, which tracks each mlxsw mirror in the system, currently holds a u8 number of the destination port. To extend this system to mirror to gretap and ip6gretap netdevices, have struct mlxsw_sp_span_entry actually hold the destination netdevice itself. This change then trickles down in obvious manner to SPAN module API and mirror-related interfaces in struct mlxsw_afa_ops. To prevent use of invalid pointer, NETDEV_UNREGISTER needs to be hooked and the corresponding SPAN entry invalidated. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Configuring the hardware for encapsulated SPAN involves more code than the simple mirroring case. Extract the related code to a separate function to separate it from the rest of SPAN entry creation. Extract deconfigure as well for symmetry, even though disablement is the same regardless of SPAN type. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
It is known statically ahead of time which SPAN entry will have which ID. Just initialize it eagerly in mlxsw_sp_span_init(), don't wait until the entry is actually created. This simplifies some code in mlxsw_sp_span_entry_create() Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Instead of removing span_entry by the port number, allow removing by SPAN id. That simplifies some code right here, and for mirroring to soft netdevices, avoids problems with netdevice pointer invalidation and reuse. Rename mlxsw_sp_span_entry_find() to mlxsw_sp_span_entry_find_by_port() and keep it--follow-up patches will make use of it. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
To support encapsulated SPAN, extend mlxsw_reg_mpat_pack() with a field to set the SPAN type. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
MPAT Register is used to query and configure the Switch Port Analyzer Table. To configure Port Analyzer to encapsulate mirrored packets, additional fields need to be specified for the MPAT register. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Initializing struct flowi4 is useful for drivers that need to emulate routing decisions made by a tunnel interface. Publish the function (appropriately renamed) so that the drivers in question don't need to cut'n'paste it around. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Determining whether a device is a GRE device is easily done by inspecting struct net_device.type. However, for the tap variants, the type is just ARPHRD_ETHER. Therefore introduce two predicate functions that use netdev_ops to tell the tap devices. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
To support mirroring to ip6gretap, the SPAN module needs to be able to decode IPv6 addresses specified at that tunnel. Extend mlxsw_sp_ipip_netdev_saddr() and mlxsw_sp_ipip_netdev_daddr() to support IPv6 addresses. To that end, add and publish a support function mlxsw_sp_ipip_netdev_parms6(). Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Extract the logic for determining whether a given IPv4/IPv6 address is all-zeroes from mlxsw_sp_ipip_tunnel_complete to a separate function. Make that function public within the module. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Thomas Falcon says: ==================== ibmvnic: Miscellaneous driver fixes and enhancements There is not a general theme to this patch set other than that it fixes a few issues with the ibmvnic driver. I will just give a quick summary of what each patch does here. "ibmvnic: Fix TX descriptor tracking again" resolves a race condition introduced in an earlier fix to track outstanding transmit descriptors. This condition can throw off the tracking counter to the point that a transmit queue will halt forever. "ibmvnic: Allocate statistics buffers during probe" allocates queue statistics buffers on device probe to avoid a crash when accessing statistics of an unopened interface. "ibmvnic: Harden TX/RX pool cleaning" includes additional checks to avoid a bad access when cleaning RX and TX buffer pools during a device reset. "ibmvnic: Report queue stops and restarts as debug output" changes TX queue state notifications from informational to debug messages. This information is not necessarily useful to a user and under load can result in a lot of log output. "ibmvnic: Do not attempt to login if RX or TX queues are not allocated" checks that device queues have been allocated successfully before attempting device login. This resolves a panic that could occur if a user attempted to configure a device after a failed reset. Thanks for your attention. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Falcon authored
If a device reset fails for some reason, TX and RX queue resources could be released. If a user attempts to open the device in this scenario, it may result in a kernel panic as the driver tries to access this memory. To fix this, include a check before device login that TX/RX queues are still there before enabling the device. In addition, return a value that can be checked in case of any errors to avoid waiting for a completion that will never come. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Falcon authored
It's not necessary to report each time a queue is stopped and restarted as an informational message. Change that to be a debug message so that it can be observed if needed but not printed by default. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Falcon authored
If the driver releases resources after a failed reset or some other error, the driver might attempt to clean up and free memory that isn't there anymore. Include some additional checks that RX/TX queues along with their associated structures are still there before cleaning. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Falcon authored
Currently, buffers holding individual queue statistics are allocated when the device is opened. If an ibmvnic interface is hotplugged or initialized but never opened, an attempt to get statistics with ethtool will result in a kernel panic. Since the driver allocates a constant number, the maximum supported queues, of buffers, these can be allocated during device probe and freed when the device is hot-unplugged or the module is removed. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Falcon authored
Sorry, the previous change introduced a race condition between transmit completion processing and tracking TX descriptors. If a completion is received before the number of descriptors is logged, the number of descriptors will be add but not removed. After enough times, this could halt the transmit queue forever. Log the number of descriptors used by a transmit before sending. I stress tested the fix on two different systems running over the weekend without any issues. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Niklas Cassel says: ==================== stmmac barrier fixes and cleanup ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Niklas Cassel authored
Make dwmac4_release_tx_desc() clear all descriptor fields, not just TDES2 and TDES3. I'm suspecting that TDES0 and TDES1 wasn't cleared because the DMA engine uses them to store the tx hardware timestamp (if PTP is enabled). However, stmmac_tx_clean() calls stmmac_get_tx_hwtstamp(), which reads and saves the timestamp, before it calls release_tx_desc(), so this is not an issue. stmmac_xmit() and stmmac_tso_xmit() both always overwrite TDES0, however, stmmac_tso_xmit() sometimes sets TDES1, and since neither stmmac_xmit() nor stmmac_tso_xmit() explicitly clears TDES1, both functions might reuse a DMA descriptor with old TDES1 data. I haven't observed any misbehavior even though TDES1 sometimes point to an old skb, however, explicitly clearing both TDES0 and TDES1 in dwmac4_release_tx_desc() minimizes the chances of undefined behavior. Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Niklas Cassel authored
According to Documentation/memory-barriers.txt, we need to use a dma_rmb() after reading the status/own bit, to ensure that all descriptor fields are read after reading the own bit. This way, we ensure that the DMA engine is done with the DMA descriptor before we read the other descriptor fields, e.g. reading the tx hardware timestamp (if PTP is enabled). Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Niklas Cassel authored
The last memory barrier in stmmac_xmit()/stmmac_tso_xmit() is placed between a coherent memory write and a MMIO write: The own bit is written in First Desc (TSO: MSS desc or First Desc). <barrier> The DMA engine is started by a write to the tx desc tail pointer/ enable dma transmission register, i.e. a MMIO write. This barrier cannot be a simple dma_wmb(), since a dma_wmb() is only used to guarantee the ordering, with respect to other writes, to cache coherent DMA memory. To guarantee that the cache coherent memory writes have completed before we attempt to write to the cache incoherent MMIO region, we need to use the more heavyweight barrier wmb(). Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Niklas Cassel authored
A dma_wmb() is used to guarantee the ordering, with respect to other writes, to cache coherent DMA memory. There is a dma_wmb() in prepare_tx_desc()/prepare_tso_tx_desc() which ensures that TDES0/1/2 is written before TDES3 (which contains the own bit), for First Desc. However, in the rare case that MSS changes, there will be a MSS context descriptor in front of the regular DMA descriptors: <MSS desc> <- DMA Next Descriptor <First Desc> <desc n> <Last Desc> Thus, for this special case, we need a dma_wmb() after prepare_tso_tx_desc()/before writing the own bit to the MSS desc, so that we flush the write to TDES3 for First Desc, in order to ensure that the MSS descriptor is the last descriptor to set the own bit. Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Sowmini Varadhan says: ==================== RDS: optimized notification for zerocopy completion Resending with acked-by additions: previous attempt does not show up in Patchwork. This time with a new mail Message-Id. RDS applications use predominantly request-response, transacation based IPC, so that ingress and egress traffic are well-balanced, and it is possible/desirable to reduce system-call overhead by piggybacking the notifications for zerocopy completion response with data. Moreover, it has been pointed out that socket functions block if sk_err is non-zero, thus if the RDS code does not plan/need to use sk_error_queue path for completion notification, it is preferable to remove the sk_errror_queue related paths in RDS. Both of these goals are implemented in this series. v2: removed sk_error_queue support v3: incorporated additional code review comments (details in each patch) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sowmini Varadhan authored
PF_RDS sockets pass up cookies for zerocopy completion as ancillary data. Update msg_zerocopy to reap this information. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sowmini Varadhan authored
This commit is an optimization over commit 01883eda ("rds: support for zcopy completion notification") for PF_RDS sockets. RDS applications are predominantly request-response transactions, so it is more efficient to reduce the number of system calls and have zerocopy completion notification delivered as ancillary data on the POLLIN channel. Cookies are passed up as ancillary data (at level SOL_RDS) in a struct rds_zcopy_cookies when the returned value of recvmsg() is greater than, or equal to, 0. A max of RDS_MAX_ZCOOKIES may be passed with each message. This commit removes support for zerocopy completion notification on MSG_ERRQUEUE for PF_RDS sockets. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sowmini Varadhan authored
In preparation for optimized reception of zerocopy completion, revert the Rx side changes introduced by Commit dfb8434b ("selftests/net: add zerocopy support for PF_RDS test case") Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queueDavid S. Miller authored
Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2018-02-26 This series contains updates to i40e and i40evf only. Mariusz adds a new ethtool private flag for forcing true link state with the requested changes from Jakub Kicinski. Paweł fixes an issue where we were double locking the same resource which would generate a kernel panic after bringing an interface up for i40evf. Alan modifies both drivers to use software values to determine if there are packets stalled on the ring with the added benefit of being less CPU intensive since we do not need to reach into the hardware to get the values. Colin Ian King provides a few fixes detected by Coverity, first was to pass a struct by reference versus by value to be more efficient. Then verify the VSI pointer is not NULL before trying to dereference it. Cleaned up redundant checks that always return true. Dan Carpenter fixes over indented lines of code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
This patch improves few aspects of interrupt handling: - update to current interrupt allocation API (use pci_alloc_irq_vectors() instead of deprecated pci_enable_msi()) - this implicitly will allocate a MSI-X interrupt if available - get rid of flag RTL_FEATURE_MSI - remove some dead code, intentionally disabling (unreliable) MSI being partially available on old PCI chips. The patch works fine on a RTL8168evl (chip version 34) and on a RTL8169SB (chip version 04). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
Fixes: 153e1b84 ("selftests: Add FIB onlink tests") Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Madalin Bucur says: ==================== DPAA Ethernet fixes Fixed an issue on the Tx path that was visible in netperf TCP_SENDFILE tests. Addressed another issue with Rx errors not being always counted. Adding control for allmulti. v2: rephrased commit message, reduced changes in the SG mapping fix ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Radu Bulie authored
This patch adds allmulticast option for memac, dtsec and 10GEC controllers. Signed-off-by: Radu Bulie <radu-andrei.bulie@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Madalin Bucur authored
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Madalin Bucur authored
Simplify the code and avoid some Rx errors not being accounted. Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Madalin Bucur authored
An issue in the code mapping the skb fragments into scatter-gather frames was evidentiated by netperf TCP_SENDFILE tests. The size was set wrong for all fragments but the first, affecting the transmission of any skb with more than one fragment. Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Merge branch 'ieee802154-for-davem-2018-02-26' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next Stefan Schmidt says: ==================== pull-request: ieee802154-next 2018-02-26 An update from ieee802154 for *net-next* Alexander corrected a setting which got lost during some 6lowpan rework a while back and Xue Liu provided us with a new driver for the MCR20A transceiver. If there are any issues let me know. If not, please pull. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Kirill Tkhai says: ==================== Converting pernet_operations (part #3) This patchset continues to review and to convert pernet_operations to async. Where it is possible, they are grouped by type of actions init/exit methods ([1/28], for example). I hope this will make the review a little bit easier. The changes are tree-wide: in net, fs, drivers and security. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-