- 16 Apr, 2021 21 commits
-
-
Florian Westphal authored
Similar to PRIORITY/KEEPALIVE: needs to be mirrored to all subflows. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Westphal authored
Similar to previous patch: needs to be mirrored to all subflows. Device bind is simpler: it is only done on the initial (listener) sk. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Westphal authored
Start with something simple: both take an integer value, and both need to be mirrored to all subflows. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
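A minimal sketch of the "mirror to every subflow" pattern these patches describe, assuming an integer option whose value maps onto a plain sock field (SO_PRIORITY-style); the helper name is illustrative, not the exact mainline symbol:

	/* Illustrative only: apply an already-validated integer option to each
	 * subflow so it matches the parent MPTCP socket. The caller is assumed
	 * to hold the msk socket lock.
	 */
	static void mptcp_sockopt_mirror_priority(struct mptcp_sock *msk, int val)
	{
		struct mptcp_subflow_context *subflow;

		mptcp_for_each_subflow(msk, subflow) {
			struct sock *ssk = mptcp_subflow_tcp_sock(subflow);

			lock_sock(ssk);
			ssk->sk_priority = val;	/* same value as the parent msk */
			release_sock(ssk);
		}
	}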
-
Florian Westphal authored
Paolo Abeni suggested avoiding re-syncing new subflows because they inherit options from the listener. In case options were set on the listener but are not set on the mptcp-socket, there is no need to do any synchronisation for new subflows. This change sets the sockopt_seq of new mptcp sockets to the seq of the mptcp listener sock. The subflow sequence is set to that of the embedded tcp listener sk. Add a comment explaining why sk_state is involved in sockopt_seq generation. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Westphal authored
Handle the following cases: 1. setsockopt is called with multiple subflows. The change might have to be mirrored to all of them. This is done directly in process context, i.e. in the setsockopt call. 2. An outgoing subflow is created after one or several setsockopt() calls have been made. The old setsockopt changes should be synced to the new socket. 3. An incoming subflow, after setsockopt call(s). Cases 2 and 3 are handled right after the join list is spliced to the conn list. Not all sockopt values can just be copied by value; some require helper calls, and those can acquire the socket lock (which can sleep). If the join->conn list splicing is done from preemptible context, synchronization can be done right away, otherwise it is deferred to the work queue. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
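A rough sketch of the sequence-number check that decides whether a given subflow (new or pre-existing) still needs synchronization; the field names follow the description above but may not match the final code exactly:

	/* Illustrative only: skip the (possibly sleeping) sync work when the
	 * subflow already carries the latest setsockopt generation.
	 */
	static void mptcp_sockopt_sync(struct mptcp_sock *msk, struct sock *ssk)
	{
		struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);

		if (READ_ONCE(subflow->setsockopt_seq) == msk->setsockopt_seq)
			return;		/* nothing changed since the last sync */

		/* ... re-apply the recorded options to ssk ... */
		subflow->setsockopt_seq = msk->setsockopt_seq;
	}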
-
Paolo Abeni authored
Unrolling mcast state at msk dismantle time is bug-prone, as syzkaller reported: ====================================================== WARNING: possible circular locking dependency detected 5.11.0-syzkaller #0 Not tainted ------------------------------------------------------ syz-executor905/8822 is trying to acquire lock: ffffffff8d678fe8 (rtnl_mutex){+.+.}-{3:3}, at: ipv6_sock_mc_close+0xd7/0x110 net/ipv6/mcast.c:323 but task is already holding lock: ffff888024390120 (sk_lock-AF_INET6){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1600 [inline] ffff888024390120 (sk_lock-AF_INET6){+.+.}-{0:0}, at: mptcp6_release+0x57/0x130 net/mptcp/protocol.c:3507 which lock already depends on the new lock. Instead we can simply forbid any mcast-related setsockopt. Let's do the same with all other unsupported sockopts. Fixes: 717e79c8 ("mptcp: Add setsockopt()/getsockopt() socket operations") Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
The MPTCP sockopt implementation is going to become much bigger and more complex soon. Let's move it to a different source file. No functional change intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Matthieu Baerts authored
This change reverts commit 86581852 ("mptcp: forbit mcast-related sockopt on MPTCP sockets"). As announced in the cover letter of the patch mentioned above, the following commits introduce a larger MPTCP sockopt implementation refactor. This time, we switch from a blocklist to an allowlist. This is safer for the future, where new socket options could be added while not yet being fully supported with MPTCP sockets, and thus cause instabilities. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
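A condensed sketch of the allowlist idea, assuming a predicate consulted at the top of the MPTCP setsockopt path; the particular options listed here are illustrative:

	/* Illustrative only: anything not explicitly listed is rejected, so a
	 * newly added socket option stays unsupported until it has been
	 * audited for MPTCP.
	 */
	static bool mptcp_supported_sockopt(int level, int optname)
	{
		if (level == SOL_SOCKET) {
			switch (optname) {
			case SO_KEEPALIVE:
			case SO_PRIORITY:
			case SO_SNDBUF:
			case SO_RCVBUF:
				return true;
			}
			return false;
		}

		/* SOL_IP, SOL_IPV6 and SOL_TCP would be audited the same way */
		return false;
	}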
-
Gatis Peisenieks authored
Tx queue cleanup happens in the interrupt handler on the same core as rx queue processing. Both can take a considerable amount of processing in high packet-per-second scenarios. Sending large amounts of packets can stall rx processing, which is unfair, and can also lead to an out-of-memory condition, since __dev_kfree_skb_irq queues the skbs for later kfree in softirq, which is not allowed to happen with heavy load in the interrupt handler. This puts tx cleanup in its own napi and enables threaded napi to allow the rx/tx queue processing to happen on different cores. The ability to sustain equal amounts of tx/rx traffic increased: from 280Kpps to 1130Kpps on a Threadripper 3960X with the upcoming Mikrotik 10/25G NIC, and from 520Kpps to 850Kpps on an Intel i3-3320 with a Mikrotik RB44Ge adapter. Signed-off-by: Gatis Peisenieks <gatis@mikrotik.com> Signed-off-by: David S. Miller <davem@davemloft.net>
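A sketch of the two mechanisms the commit message combines: a dedicated NAPI context for tx cleanup plus opt-in threaded NAPI via dev_set_threaded(). The driver structure and poll functions are hypothetical; only the NAPI/netdev calls are standard kernel API.

	#include <linux/netdevice.h>

	struct mydrv_adapter {			/* hypothetical driver state */
		struct net_device *netdev;
		struct napi_struct rx_napi;
		struct napi_struct tx_napi;
	};

	/* poll handlers defined elsewhere in the (hypothetical) driver */
	static int mydrv_rx_poll(struct napi_struct *napi, int budget);
	static int mydrv_tx_clean_poll(struct napi_struct *napi, int budget);

	static int mydrv_setup_napi(struct mydrv_adapter *adapter)
	{
		/* one NAPI context per direction instead of cleaning tx inside
		 * the rx interrupt handler
		 */
		netif_napi_add(adapter->netdev, &adapter->rx_napi,
			       mydrv_rx_poll, NAPI_POLL_WEIGHT);
		netif_napi_add(adapter->netdev, &adapter->tx_napi,
			       mydrv_tx_clean_poll, NAPI_POLL_WEIGHT);

		/* threaded NAPI lets rx and tx polling run on different cores */
		return dev_set_threaded(adapter->netdev, true);
	}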
-
David S. Miller authored
Vladimir Oltean says: ==================== Pass the BR_FDB_LOCAL information to switchdev drivers Bridge FDB entries with the is_local flag are entries which are terminated locally and not forwarded. Switchdev drivers might want to be notified of these addresses so they can trap them. If they don't program these entries to hardware, there is no guarantee that they will do the right thing with these entries and that they won't be, let's say, flooded. Ideally none of the switchdev drivers should ignore these entries, but having access to the is_local bit is the bare minimum change that should be done in the bridge layer, before this is even possible. These 2 changes are extracted from the larger "RX filtering in DSA" series: https://patchwork.kernel.org/project/netdevbpf/patch/20210224114350.2791260-8-olteanv@gmail.com/ https://patchwork.kernel.org/project/netdevbpf/patch/20210224114350.2791260-9-olteanv@gmail.com/ and submitted separately, because they touch all switchdev drivers, while the rest is mostly specific to DSA. This change is not a functional one, in the sense that everybody still ignores the local FDB entries, but this will be changed by further patches at least for DSA. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
As explained in bugfix commit 6ab4c311 ("net: bridge: don't notify switchdev for local FDB addresses") as well as in this discussion: https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/ the switchdev notifiers for FDB entries managed to have a zero-day bug, which was that drivers would not know what to do with local FDB entries, because they were not told that they are local. The bug fix was to simply not notify them of those addresses. Let us now add the 'is_local' bit to bridge FDB entries, and make all drivers ignore these entries by their own choice. Co-developed-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
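A sketch of what a switchdev driver's FDB notifier handler can now do with the bit; the driver-side function is hypothetical, while the event and struct are the existing switchdev ones:

	#include <linux/notifier.h>
	#include <net/switchdev.h>

	/* Illustrative only: a driver that cannot trap locally terminated
	 * addresses simply skips them instead of offloading them.
	 */
	static int mydrv_switchdev_fdb_event(struct notifier_block *nb,
					     unsigned long event, void *ptr)
	{
		struct switchdev_notifier_fdb_info *fdb_info = ptr;

		switch (event) {
		case SWITCHDEV_FDB_ADD_TO_DEVICE:
			if (fdb_info->is_local)
				break;	/* terminated by the bridge, not forwarded */
			/* ... program fdb_info->addr / fdb_info->vid to hardware ... */
			break;
		}

		return NOTIFY_DONE;
	}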
-
Tobias Waldekranz authored
Instead of having to add more and more arguments to br_switchdev_fdb_call_notifiers, get rid of it and build the info struct directly in br_switchdev_fdb_notify. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
We need to store cmlen instead of len in cm->cmsg_len. Fixes: 38ebcf50 ("scm: optimize put_cmsg()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
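For context, a hedged sketch of the distinction: cmsg_len must carry the CMSG_LEN()-padded value that covers header plus payload, while the raw payload length is only used for copying the data (the real put_cmsg() writes to user space; this simplified variant fills a kernel buffer):

	#include <linux/socket.h>
	#include <linux/string.h>

	/* Illustrative only: fill a control-message header in a kernel buffer. */
	static void fill_cmsg(struct cmsghdr *cm, int level, int type,
			      const void *data, int len)
	{
		int cmlen = CMSG_LEN(len);

		cm->cmsg_level = level;
		cm->cmsg_type = type;
		cm->cmsg_len = cmlen;		/* storing "len" here was the bug */
		memcpy(CMSG_DATA(cm), data, len);
	}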
-
David S. Miller authored
Jakub Kicinski says: ==================== ethtool: add standard FEC statistics This set adds uAPI for reporting standard FEC statistics, and implements it in a handful of drivers. The statistics are taken from the IEEE standard, with one extra seemingly popular but non-standard statistic added. The implementation is similar to that of the pause frame statistics: the user requests the stats by setting a bit (ETHTOOL_FLAG_STATS) in the common ethtool header of ETHTOOL_MSG_FEC_GET. Since the standard defines the statistics per lane, what's reported is both total and per-lane counters:

	# ethtool -I --show-fec eth0
	FEC parameters for eth0:
	Configured FEC encodings: None
	Active FEC encoding: None
	Statistics:
	  corrected_blocks: 256
	    Lane 0: 255
	    Lane 1: 1
	  uncorrectable_blocks: 145
	    Lane 0: 128
	    Lane 1: 17

v2: check for errors in mlx5 register access ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Report corrected bits. v2: catch reg access errors (Saeed) Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Report what appear to be the standard block counts: - 30.5.1.1.17 aFECCorrectedBlocks - 30.5.1.1.18 aFECUncorrectableBlocks Don't report the per-lane symbol counts; if those really count symbols, they are not what the standard calls for (even if symbols seem like the most useful thing to count). Fingers crossed that fec_corrected_errors is not in symbols. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Report corrected bits. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Similarly to pause statistics, add stats for FEC. The IEEE standard mandates two sets of counters: - 30.5.1.1.17 aFECCorrectedBlocks - 30.5.1.1.18 aFECUncorrectableBlocks where a block is a block of bits FEC operates on. Each of these counters is defined per lane (PCS instance). Multiple vendors provide the number of corrected _bits_ rather than/as well as blocks. This set adds the 2 standard-based block counters and an extra one for corrected bits. Counters are exposed to user space via netlink in new attributes. Each attribute carries an array of u64s: the first element is the total count, and the following ones are a per-lane breakdown. Much like with pause stats, the operation will not fail when the driver does not implement the get_fec_stats callback (nor can the driver fail the operation by returning an error). If stats can't be reported, the relevant attributes will be empty. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
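A sketch of the driver side of the new callback, assuming the ethtool_fec_stats layout described above (a total plus optional per-lane values for each counter); the register reads and private struct are hypothetical:

	#include <linux/ethtool.h>
	#include <linux/netdevice.h>

	struct mydrv_priv { void __iomem *regs; };			/* hypothetical */
	static u64 mydrv_read_counter(struct mydrv_priv *p, u32 reg);	/* hypothetical HW read */

	#define MYDRV_FEC_COR	0x100	/* hypothetical register offsets */
	#define MYDRV_FEC_UNCOR	0x104

	/* Counters the driver cannot provide are simply left untouched; the
	 * core then reports the corresponding attributes as empty instead of
	 * failing the request.
	 */
	static void mydrv_get_fec_stats(struct net_device *dev,
					struct ethtool_fec_stats *fec_stats)
	{
		struct mydrv_priv *priv = netdev_priv(dev);

		fec_stats->corrected_blocks.total =
			mydrv_read_counter(priv, MYDRV_FEC_COR);
		fec_stats->uncorrectable_blocks.total =
			mydrv_read_counter(priv, MYDRV_FEC_UNCOR);
	}

	static const struct ethtool_ops mydrv_ethtool_ops = {
		.get_fec_stats	= mydrv_get_fec_stats,
		/* .get_fecparam / .set_fecparam and the rest go here */
	};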
-
Jakub Kicinski authored
Refactor fec_prepare_data() a little bit to skip the body of the function and exit on error. Currently the code depends on the fact that we only have one call which may fail between ethnl_ops_begin() and ethnl_ops_complete() and simply saves the error code. This will get hairy with the stats also being queried. No functional changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
We'll need it for FEC stats as well. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Calling copy_to_user() twice for very small regions has very high overhead. Switch to inlined unsafe_put_user() to save one stac/clac sequence, and avoid copy_to_user(). Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
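A hedged sketch of the pattern being described: batch small user-space stores inside one user_access_begin()/user_access_end() window rather than paying the copy_to_user() entry/exit cost twice. The two fields copied here are placeholders.

	#include <linux/uaccess.h>

	/* Illustrative only: one stac/clac window covers both stores. */
	static int copy_two_u32_to_user(u32 __user *uptr, u32 a, u32 b)
	{
		if (!user_access_begin(uptr, 2 * sizeof(u32)))
			return -EFAULT;
		unsafe_put_user(a, &uptr[0], efault);
		unsafe_put_user(b, &uptr[1], efault);
		user_access_end();
		return 0;

	efault:
		user_access_end();
		return -EFAULT;
	}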
-
- 15 Apr, 2021 19 commits
-
-
Yangbo Lu authored
Convert system_wq queue_work() to schedule_work(), which is a wrapper around it, since the former is a rare construct. Fixes: 7294380c ("enetc: support PTP Sync packet one-step timestamping") Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
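For reference, the two spellings are equivalent; schedule_work() is just the idiomatic way to queue onto the system workqueue (sketch with a placeholder work item):

	#include <linux/workqueue.h>

	static void my_work_fn(struct work_struct *work)
	{
		/* hypothetical deferred handler */
	}
	static DECLARE_WORK(my_work, my_work_fn);

	static void kick_work(void)
	{
		/* before: queue_work(system_wq, &my_work); */
		schedule_work(&my_work);	/* same queue, conventional spelling */
	}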
-
David S. Miller authored
Huazhong Tan says: ==================== net: hns3: updates for -next This series adds support for pushing link status to VFs for the HNS3 ethernet driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Guangbin Huang authored
To reduce unnecessary mailbox command processing, VFs stop sending the link status request command in their periodic service task when the PF supports actively pushing its link status to them. Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Guangbin Huang authored
Previously, a VF updated its link status every second by sending a query command to the PF in its periodic service task. If the PF's link status changed, a VF could need up to one second to update its own link status. To reduce this link status delay between PF and VFs, the PF now actively pushes its link status to the VFs whenever it is updated. To let VFs know the PF supports this new feature, the link status changed mailbox command adds one bit to indicate it. Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Bauer authored
The Atheros AR8031 and AR8033 expose different registers for SGMII/Fiber as well as the copper side of the PHY depending on the BT_BX_REG_SEL bit in the chip configure register. The driver assumes the copper side is selected on probe, but this might not be the case depending on which page was last selected by the bootloader. Notably, Ubiquiti UniFi bootloaders show this behavior. Select the copper page when probing to circumvent this. Signed-off-by: David Bauer <mail@david-bauer.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queueDavid S. Miller authored
Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2021-04-14 This series contains updates to the ice driver only. Bruce changes open-coded values to instead use existing kernel defines, and suppresses false cppcheck issues. Ani adds new VSI states to track netdev allocation and registration. He also removes leading underscores in the ice_pf_state enum. Jesse refactors ITR by introducing helpers to reduce duplicated code and structures to simplify checking of ITR mode. He also triggers a software interrupt when exiting napi poll or busy-poll to ensure all work is processed, and modifies /proc/iomem to display the driver name instead of the PCI address. He also changes the checks of vsi->type to use a local variable in ice_vsi_rebuild() and removes an unneeded struct member. Jake replaces the driver's adaptive interrupt moderation algorithm with the kernel's DIM library implementation. Scott reworks module reads to reduce the number of reads needed and remove an excessive increment of the QSFP page. Brett sets vsi->vf_id to invalid for non-VF VSIs. Paul removes the return value from ice_vsi_manage_rss_lut() as it's not communicating anything critical. He also reduces the scope of a variable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paul M Stillwell Jr authored
The scope of this variable can be reduced, so do that. Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Paul M Stillwell Jr authored
We were saving the return value from ice_vsi_manage_rss_lut(), but the errors from that function are not critical, so change it to return void and remove the code that saved the value. Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Bruce Allan authored
Silence false errors, warnings and style issues reported by cppcheck. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Brett Creeley authored
Currently vsi->vf_id is set only for ICE_VSI_VF and is left as 0 for all other VSI types. This is confusing and could be problematic since 0 is a valid vf_id. Fix this by always setting non-VF VSI types to ICE_INVAL_VFID. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Jesse Brandeburg authored
The only time you can ever have a rq_last_status is if a firmware event was somehow reporting a status on the receive queue; such events are generally firmware-initiated events or mailbox messages from a VF. Mostly this struct member was unused. Fix this problem by still printing the value of the field in a debug print, but don't store the value forever in a struct, potentially creating opportunities for callers to use the wrong struct member. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Jesse Brandeburg authored
Do a minor refactor on ice_vsi_rebuild to use a local variable to store vsi->type. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Jesse Brandeburg authored
The driver previously printed its PCI address in the name field for the PCI resource, which, when displayed via /proc/iomem, would print the same thing twice. It's more useful for debugging to see the driver name, as most other modules do. Here's a diff of before and after this change:

	  99100000-991fffff : 0000:3b:00.1
	  9a000000-a04fffff : PCI Bus 0000:3b
	    9a000000-9bffffff : 0000:3b:00.1
	-     9a000000-9bffffff : 0000:3b:00.1
	+     9a000000-9bffffff : ice
	    9c000000-9dffffff : 0000:3b:00.0
	-     9c000000-9dffffff : 0000:3b:00.0
	+     9c000000-9dffffff : ice
	    9e000000-9effffff : 0000:3b:00.1
	    9f000000-9fffffff : 0000:3b:00.0
	  a0000000-a000ffff : 0000:3b:00.1

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Scott W Taylor authored
There was an excessive increment of the QSFP page, which is now fixed. Additionally, this new update now reads 8 bytes at a time and will retry each request if the module/bus is busy. Also, prevent reading from upper pages if module does not support those pages. Signed-off-by: Scott W Taylor <scott.w.taylor@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Jesse Brandeburg authored
Use a dedicated bitfield in order both to increase the amount of checking around the length of ITR writes and to simplify the checks of dynamic mode. Basically, unpack the "high bit means dynamic" logic into bitfields. Also, remove some unused ITR defines. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
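A sketch of the kind of bitfield the message describes, replacing "high bit of the ITR value means dynamic" with explicit fields; the names and widths here are illustrative rather than the driver's exact layout:

	/* Illustrative only: interval and mode carried in separate fields so
	 * length checks and dynamic-mode checks no longer overload one value.
	 */
	struct ring_container_itr {
		u16 itr_setting:13;	/* interval, in register granularity */
		u16 itr_reserved:2;
		u16 itr_mode:1;		/* dynamic (adaptive) vs. static */
	};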
-
Jesse Brandeburg authored
The driver would occasionally miss that there were outstanding descriptors to clean when exiting busy/napi poll. This issue has been in the code since the introduction of the ice driver. Attempt to "catch" any remaining work by triggering a software interrupt when exiting napi poll or busy-poll. This will not cause extra interrupts in the case of normal execution. This issue was found when running sfnt-pingpong, with busy poll enabled, and typically with larger I/O sizes like > 8192, the program would occasionally report > 1 second maximums to complete a ping pong. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Jacob Keller authored
The ice driver has support for adaptive interrupt moderation, an algorithm for tuning the interrupt rate dynamically. This algorithm is based on various assumptions about ring size, socket buffer size, link speed, SKB overhead, ethernet frame overhead and more. The Linux kernel has support for a dynamic interrupt moderation algorithm known as "dimlib". Replace the custom driver-specific implementation of dynamic interrupt moderation with the kernel's algorithm. The Intel hardware has a different hardware implementation than the one the originators of the dimlib code had to work with, which requires the driver to use a slightly different set of inputs for the actual moderation values, while getting all the advice from dimlib of better/worse, shift left or right. The change made for this implementation is to use a pair of values for each of the 5 "slots" that the dimlib moderation expects, and the driver will program those pairs when dimlib recommends a slot to use. The current implementation uses two tables, one for receive and one for transmit, and the pairs of values in each slot set the maximum delay of an interrupt and a maximum number of interrupts per second (both expressed in microseconds). There are two separate kinds of bugs fixed by using DIMLIB: one is that UDP single-stream send was too slow, and the other is that 8K ping-pong was going to the most aggressive moderation and had much too high latency. The overall result of using DIMLIB is that we meet or exceed our performance expectations set based on the old algorithm. Co-developed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
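For readers unfamiliar with dimlib, the usual wiring looks roughly like the sketch below: the napi path feeds samples to net_dim(), and a work item applies whatever profile slot the library recommends. The per-queue struct, the hardware write helper and the way the recommended pair is applied are hypothetical; the dim calls themselves are the kernel API.

	#include <linux/dim.h>
	#include <linux/workqueue.h>

	struct myq {				/* hypothetical per-queue state */
		struct dim dim;
		u16 intr_count;			/* interrupts since queue start */
		u64 pkts, bytes;		/* totals accumulated in the hot path */
	};

	/* hypothetical: program the pair of moderation values into hardware */
	static void myq_apply_moderation(struct myq *q, u16 usecs, u16 max_frames);

	/* Work item dimlib schedules when it recommends a new profile slot. */
	static void myq_dim_work(struct work_struct *work)
	{
		struct dim *dim = container_of(work, struct dim, work);
		struct myq *q = container_of(dim, struct myq, dim);
		struct dim_cq_moder cur;

		cur = net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
		myq_apply_moderation(q, cur.usec, cur.pkts);
		dim->state = DIM_START_MEASURE;
	}

	/* Called from napi poll after accounting the completed work. */
	static void myq_update_dim(struct myq *q)
	{
		struct dim_sample sample;

		dim_update_sample(q->intr_count, q->pkts, q->bytes, &sample);
		net_dim(&q->dim, sample);
	}

	static void myq_init_dim(struct myq *q)
	{
		INIT_WORK(&q->dim.work, myq_dim_work);
		q->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
	}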
-
Jesse Brandeburg authored
Introduce several new helpers for writing ITR and GLINT_RATE registers, and refactor the code calling them. This resulted in the removal of several duplicate functions and rolled a bunch of simple code back into the calling routines. In particular, this removes some code that was doing both a store and a set in a helper function, which seems better done as separate tasks in the caller (and generally takes fewer lines of code even with a tiny bit of repetition). Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Anirudh Venkataramanan authored
Add two new VSI states, one to track if a netdev for the VSI has been allocated and the other to track if the netdev has been registered. Call unregister_netdev/free_netdev only when the corresponding state bits are set. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
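A sketch of the teardown ordering this enables, using hypothetical bit names in the spirit of the ones being added:

	#include <linux/netdevice.h>

	enum { MY_VSI_NETDEV_ALLOCD, MY_VSI_NETDEV_REGISTERED, MY_VSI_STATE_NBITS };

	struct my_vsi {					/* hypothetical VSI state */
		struct net_device *netdev;
		DECLARE_BITMAP(state, MY_VSI_STATE_NBITS);
	};

	/* Illustrative only: tear the netdev down strictly according to how far
	 * setup actually got, instead of guessing from pointer checks.
	 */
	static void my_vsi_clear_netdev(struct my_vsi *vsi)
	{
		if (test_and_clear_bit(MY_VSI_NETDEV_REGISTERED, vsi->state))
			unregister_netdev(vsi->netdev);

		if (test_and_clear_bit(MY_VSI_NETDEV_ALLOCD, vsi->state)) {
			free_netdev(vsi->netdev);
			vsi->netdev = NULL;
		}
	}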
-