- 21 Nov, 2020 25 commits
-
-
Yonglong Liu authored
For DEVICE_VERSION_V1/2, there are total 1024 queues and queue sets. For DEVICE_VERSION_V3, it increases to 1280, and can be assigned to one pf, so remove the limitation of 1024. To keep compatible with DEVICE_VERSION_V1/2 and old driver version, the queue number is split into two part: tqp_num(range 0~1023) and ext_tqp_num(range 1024~1279). Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Thomas Falcon says: ==================== ibmvnic: Performance improvements and other updates The first three patches utilize a hypervisor call allowing multiple TX and RX buffer replenishment descriptors to be sent in one operation, which significantly reduces hypervisor call overhead. The xmit_more and Byte Queue Limit API's are leveraged to provide this support for TX descriptors. The subsequent two patches remove superfluous code and members in TX completion handling function and TX buffer structure, respectively, and remove unused routines. Finally, four patches which ensure that device queue memory is cache-line aligned, resolving slowdowns observed in PCI traces, as well as optimize the driver's NAPI polling function and to RX buffer replenishment are provided by Dwip Banerjee. This series provides significant performance improvements, allowing the driver to fully utilize 100Gb NIC's. ==================== Link: https://lore.kernel.org/r/1605748345-32062-1-git-send-email-tlfalcon@linux.ibm.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Dwip N. Banerjee authored
Reduce the amount of time spent replenishing RX buffers by only doing so once available buffers has fallen under a certain threshold, in this case half of the total number of buffers, or if the polling loop exits before the packets processed is less than its budget. Non-exhaustion of NAPI budget implies lower incoming packet pressure, allowing the leeway to refill the buffers in preparation for any impending burst. Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Dwip N. Banerjee authored
Take advantage of the additional optimizations in netdev_alloc_skb when allocating socket buffers to be used for packet reception. Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com> Acked-by: Lijun Pan <ljp@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Dwip N. Banerjee authored
If the current NAPI polling loop exits without completing it's budget, only re-enable interrupts if there are no entries remaining in the queue and napi_complete_done is successful. If there are entries remaining on the queue that were missed, restart the polling loop. Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Dwip N. Banerjee authored
PCI bus slowdowns were observed on IBM VNIC devices as a result of partial cache line writes and non-cache aligned full cache line writes. Ensure that packet data buffers are cache-line aligned to avoid these slowdowns. Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Thomas Falcon authored
It is not longer used, so remove it. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Acked-by: Lijun Pan <ljp@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Thomas Falcon authored
Remove unused and superfluous code and members in existing TX implementation and data structures. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Thomas Falcon authored
Include support for the xmit_more feature utilizing the H_SEND_SUB_CRQ_INDIRECT hypervisor call which allows the sending of multiple subordinate Command Response Queue descriptors in one hypervisor call via a DMA-mapped buffer. This update reduces hypervisor calls and thus hypervisor call overhead per TX descriptor. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Thomas Falcon authored
Utilize the H_SEND_SUB_CRQ_INDIRECT hypervisor call to send multiple RX buffer descriptors to the device in one hypervisor call operation. This change will reduce the number of hypervisor calls and thus hypervisor call overhead needed to transmit RX buffer descriptors to the device. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Thomas Falcon authored
This patch introduces the infrastructure to send batched subordinate Command Response Queue descriptors, which are used by the ibmvnic driver to send TX frame and RX buffer descriptors. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Acked-by: Lijun Pan <ljp@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Alex Elder says: ==================== net: ipa: add a driver shutdown callback The final patch in this series adds a driver shutdown callback for the IPA driver. The patches leading up to that address some issues encountered while ensuring that callback worked as expected: - The first just reports a little more information when channels or event rings are in unexpected states - The second patch recognizes a condition where an as-yet-unused channel does not require a reset during teardown - The third patch explicitly ignores a certain error condition, because it can't be avoided, and is harmless if it occurs - The fourth properly handles requests to retry a channel HALT request - The fifth makes a second attempt to stop modem activity during shutdown if it's busy The shutdown callback is implemented by calling the existing remove callback function (reporting if that returns an error). ==================== Link: https://lore.kernel.org/r/20201119224929.23819-1-elder@linaro.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
A system shutdown can happen at essentially any time, and it's possible that the IPA driver is busy when a shutdown is underway. IPA hardware accesses IMEM and SMEM memory regions using an IOMMU, and at some point during shutdown, needed I/O mappings could become invalid. This could be disastrous for any "in flight" IPA activity. Avoid this by defining a new driver shutdown callback that stops all IPA activity and cleanly shuts down the driver. It merely calls the driver's existing remove callback, reporting the error if it returns one. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
The IPA driver remove callback, ipa_remove(), calls ipa_modem_stop() if the setup stage of initialization is complete. If a concurrent call to ipa_modem_start() or ipa_modem_stop() has begin but not completed, ipa_modem_stop() can return an error (-EBUSY). The next patch adds a driver shutdown callback, which will simply call ipa_remove(). We really want our shutdown callback to clean things up. So add a single retry to the ipa_modem_stop() call in ipa_remove() after a short (millisecond) delay. This offers no guarantee the shutdown will complete successfully, but we'll at least try a little harder before giving up. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
When stopping an AP RX channel, there can be a transient period while the channel enters STOP_IN_PROC state before reaching the final STOPPED state. In that case we make another attempt to stop the channel. Similarly, when stopping a modem channel (using a GSI generic command issued from the AP), it's possible that multiple attempts will be required before the channel reaches STOPPED state. Add a field to the GSI structure to record an errno representing the result code provided when a generic command completes. If the result learned in gsi_isr_gp_int1() is RETRY, record -EAGAIN in the result code, otherwise record 0 for success, or -EIO for any other result. If we time out nf gsi_generic_command() waiting for the command to complete, return -ETIMEDOUT (as before). Otherwise return the result stashed by gsi_isr_gp_int1(). Add a loop in gsi_modem_channel_halt() to reissue the HALT command if the result code indicates -EAGAIN. Limit this to 10 retries (after the initial attempt). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
IPA v4.2 has a hardware quirk that requires the AP to allocate GSI channels for the modem to use. It is recommended that these modem channels get stopped (with a HALT generic command) by the AP when its IPA driver gets removed. The AP has no way of knowing the current state of a modem channel. So when the IPA driver issues a HALT command it's possible the channel is not running, and in that case we get an error indication. This error simply means we didn't need to stop the channel, so we can ignore it. This patch adds an explanation for this situation, and arranges for this condition to *not* report an error message. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
If the rmnet_ipa0 network device has not been opened at the time we remove or shut down the IPA driver, its underlying TX and RX GSI channels will not have been started, and they will still be in ALLOCATED state. The RESET command on a channel is meant to return a channel to ALLOCATED state after it's been stopped. But if it was never started, its state will still be ALLOCATED, the RESET command is not required. Quietly skip doing the reset without printing an error message if a channel is already in ALLOCATED state when we request it be reset. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
When a GSI command is used to change the state of a channel or event ring we check the state before and after the command to ensure it is as expected. If not, we print an error message, but it does not include the channel or event ring id, and it easily can. Add the channel or event ring id to these error messages. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Alex Elder says: ==================== net: ipa: platform-specific clock and interconnect rates This series changes the way the IPA core clock rate and the bandwidth parameters for interconnects are specified. Previously these were specified with hard-wired constants, with the same values used for the SDM845 and SC7180 platforms. Now these parameters are recorded in platform-specific configuration data. For the SC7180 this means we use an all-new core clock rate and interconnect parameters. Additionally, while developing this I learned that the average bandwidth setting for two of the interconnects is ignored (on both platforms). Zero is now used explicitly as that unused bandwidth value. This means the SDM845 bandwidth settings are also changed by this series. ==================== Link: https://lore.kernel.org/r/20201119224041.16066-1-elder@linaro.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Stop assuming a fixed IPA core clock rate and interconnect bandwidths. Use the configuration data defined for these things instead. Get rid of the previously-used constants. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Populate the core clock rate and interconnect average and peak bandwidth data for SDM845 and SC7180 in their configuration data files. At this point we still don't *use* this data. Note that SC7180 actually defines a new core clock rate (100 MHz instead of 75 MHz) and new interconnect bandwidth values. They will be activated in the next commit, which uses the configured values rather than the fixed constants. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Define a new type of configuration data, used to initialize the IPA core clock and interconnects. This is the first of three patches, and defines the data types and interface but doesn't yet use them. Switch the return value if there is no matching configuration data to ENODEV instead of ENOTSUPP (to avoid using the nonstandard errno). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Grygorii Strashko authored
The mdio_bus may have dependencies from GPIO controller and so got deferred. Now it will print error message every time -EPROBE_DEFER is returned which from: __mdiobus_register() |-devm_gpiod_get_optional() without actually identifying error code. "mdio_bus 4a101000.mdio: mii_bus 4a101000.mdio couldn't get reset GPIO" Hence, suppress error message for devm_gpiod_get_optional() returning -EPROBE_DEFER case by using dev_err_probe(). Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Link: https://lore.kernel.org/r/20201119203446.20857-1-grygorii.strashko@ti.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Heiner Kallweit authored
Tiny improvement, let dev_err_probe() deal with EPROBE_DEFER. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/b0c4ebcf-2047-e933-b890-8a20e4bdb19f@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Heiner Kallweit authored
Some chip versions have a hw bug resulting in lost door bell rings. To work around this the doorbell is also rung whenever we still have tx descriptors in flight after having cleaned up tx descriptors. These PCI(e) writes come at a cost, therefore let's reduce the number of extra doorbell rings. If skb is NULL then this means: - last cleaned-up descriptor belongs to a skb with at least one fragment and last fragment isn't marked as sent yet - hw is in progress sending the skb, therefore no extra doorbell ring is needed for this skb - once last fragment is marked as transmitted hw will trigger a tx done interrupt and we come here again (with skb != NULL) and ring the doorbell if needed Therefore skip the workaround doorbell ring if skb is NULL. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/0a15a83c-aecf-ab51-8071-b29d9dcd529a@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 20 Nov, 2020 15 commits
-
-
Jakub Kicinski authored
Mat Martineau says: ==================== mptcp: More miscellaneous MPTCP fixes Here's another batch of fixup and enhancement patches that we have collected in the MPTCP tree. Patch 1 removes an unnecessary flag and related code. Patch 2 fixes a bug encountered when closing fallback sockets. Patches 3 and 4 choose a better transmit subflow, with a self test. Patch 5 adjusts tracking of unaccepted subflows Patches 6-8 improve handling of long ADD_ADDR options, with a test. Patch 9 more reliably tracks the MPTCP-level window shared with peers. Patch 10 sends MPTCP-level acknowledgements more aggressively, so the peer can send more data without extra delay. ==================== Link: https://lore.kernel.org/r/20201119194603.103158-1-mathew.j.martineau@linux.intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Paolo Abeni authored
Send timely MPTCP-level ack is somewhat difficult when the insertion into the msk receive level is performed by the worker. It needs TCP-level dup-ack to notify the MPTCP-level ack_seq increase, as both the TCP-level ack seq and the rcv window are unchanged. We can actually avoid processing incoming data with the worker, and let the subflow or recevmsg() send ack as needed. When recvmsg() moves the skbs inside the msk receive queue, the msk space is still unchanged, so tcp_cleanup_rbuf() could end-up skipping TCP-level ack generation. Anyway, when __mptcp_move_skbs() is invoked, a known amount of bytes is going to be consumed soon: we update rcv wnd computation taking them in account. Additionally we need to explicitly trigger tcp_cleanup_rbuf() when recvmsg() consumes a significant amount of the receive buffer. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Florian Westphal authored
OoO handling attempts to detect when packet is out-of-window by testing current ack sequence and remaining space vs. sequence number. This doesn't work reliably. Store the highest allowed sequence number that we've announced and use it to detect oow packets. Do this when mptcp options get written to the packet (wire format). For this to work we need to move the write_options call until after stack selected a new tcp window. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Geliang Tang authored
This patch added IPv6 support for do_transfer, and the test cases for ADD_ADDR IPv6. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Geliang Tang authored
When ADD_ADDR suboption includes an IPv6 address, the size is 28 octets. It will not fit when other MPTCP suboptions are included in this packet, e.g. DSS. So here we send out an ADD_ADDR dedicated packet to carry only ADD_ADDR suboption, no other MPTCP suboptions. In mptcp_pm_announce_addr, we check whether this is an IPv6 ADD_ADDR. If it is, we set the flag MPTCP_ADD_ADDR_IPV6 to true. Then we call mptcp_pm_nl_add_addr_send_ack to sent out a new pure ACK packet. In mptcp_established_options_add_addr, we check whether this is a pure ACK packet for ADD_ADDR. If it is, we drop all other MPTCP suboptions in this packet, only put ADD_ADDR suboption in it. Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Geliang Tang authored
This patch changed the 'add_addr_signal' type from bool to char, so that we could encode the addr type there. Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Paolo Abeni authored
This will simplify all operation dealing with subflows before accept time (e.g. data fin processing, add_addr). The join list is already flushed by mptcp_stream_accept() before returning the newly created msk to the user space. This also fixes an potential bug present into the old code: conn_list was manipulated without helding the msk lock in mptcp_stream_accept(). Tested-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Florian Westphal authored
Add a test case where a link fails with multiple subflows. The expectation is that MPTCP will transmit any data that could not be delivered via the failed link on another subflow. Co-developed-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Florian Westphal authored
In case a subflow path is blocked, MPTCP-level retransmit may not take place anymore because such subflow is likely to have unacked data left in its write queue. Ignore subflows that have experienced loss and test next candidate. Fixes: 3b1d6210 ("mptcp: implement and use MPTCP-level retransmission") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Paolo Abeni authored
We need to cope with some more state transition for fallback sockets, or could still end-up moving to TCP_CLOSE too early and avoid spooling some pending data Fixes: e16163b6 ("mptcp: refactor shutdown and close") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Paolo Abeni authored
Only mptcp_close() can actually cancel the workqueue, no need to add and use this flag. Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Ido Schimmel says: ==================== mlxsw: Add support for nexthop objects This patch set adds support for nexthop objects in mlxsw. Nexthop objects are treated as another front-end for programming nexthops, in addition to the existing IPv4 and IPv6 front-ends. Patch #1 registers a listener to the nexthop notification chain and parses the nexthop information into the existing mlxsw data structures that are already used by the IPv4 and IPv6 front-ends. Blackhole nexthops are currently rejected. Support will be added in a follow-up patch set. Patch #2 extends mlxsw to resolve its internal nexthop objects from the nexthop identifier encoded in the FIB info of the notified routes. Patch #3 finally removes the limitation of rejecting routes that use nexthop objects. Patch #4 adds a selftest. Patches #5-#8 add generic forwarding selftests that can be used with veth pairs or physical loopbacks. ==================== Link: https://lore.kernel.org/r/20201119130848.407918-1-idosch@idosch.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
Add a nexthop objects version of gre_multipath.sh. Unlike the original test, it also tests IPv6 overlay which is not possible with the legacy nexthop implementation. See commit 9a2ad362 ("selftests: forwarding: gre_multipath: Drop IPv6 tests") for more info. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
In a similar fashion to router_multipath.sh and its nexthop objects version router_mpath_nh.sh, create a nexthop objects version of router.sh. It reuses the same topology, but uses device-only nexthop objects instead of legacy ones. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
In addition to IPv4 multipath tests with IPv4 nexthops, also test IPv4 multipath with nexthops that use IPv6 link-local addresses. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-