- 27 Apr, 2022 1 commit
-
-
Eric Dumazet authored
Logic added in commit f35f8219 ("tcp: defer skb freeing after socket lock is released") helped bulk TCP flows to move the cost of skb frees outside of the critical section where the socket lock was held.

But for RPC traffic, or hosts with RFS enabled, the solution is far from ideal.

For RPC traffic, recvmsg() has to return to user space right after the skb payload has been consumed, meaning that the BH handler has no chance to pick up the skb before the recvmsg() thread. This issue is more visible with BIG TCP, as more RPCs fit in one skb.

For RFS, even if the BH handler picks up the skbs, they are still picked up from the cpu on which the user thread is running.

Ideally, it is better to free the skbs (and associated page frags) on the cpu that originally allocated them.

This patch removes the per-socket anchor (sk->defer_list) and instead uses a per-cpu list, which will hold more skbs per round.

This new per-cpu list is drained at the end of net_rx_action(), after incoming packets have been processed, to lower latencies.

In normal conditions, skbs are added to the per-cpu list with no further action. In the (unlikely) cases where the cpu does not run the net_rx_action() handler fast enough, we use an IPI to raise NET_RX_SOFTIRQ on the remote cpu.

Also, we do not bother draining the per-cpu list from dev_cpu_dead(). This is because skbs in this list have no requirement on how fast they should be freed.

Note that we can add a small per-cpu cache in the future if we see any contention on sd->defer_lock.

Tested on a pair of hosts with 100Gbit NIC, RFS enabled, and /proc/sys/net/ipv4/tcp_rmem[2] tuned to 16MB to work around the page recycling strategy used by the NIC driver (its page pool capacity being too small compared to the number of skbs/pages held in socket receive queues).

Note that this tuning was only done to demonstrate worse conditions for skb freeing for this particular test. These conditions can happen in more general production workloads.

10 runs of one TCP_STREAM flow

Before:
Average throughput: 49685 Mbit.

Kernel profiles on the cpu running the user thread recvmsg() show high cost for skb freeing related functions (*)

    57.81%  [kernel]  [k] copy_user_enhanced_fast_string
(*) 12.87%  [kernel]  [k] skb_release_data
(*)  4.25%  [kernel]  [k] __free_one_page
(*)  3.57%  [kernel]  [k] __list_del_entry_valid
     1.85%  [kernel]  [k] __netif_receive_skb_core
     1.60%  [kernel]  [k] __skb_datagram_iter
(*)  1.59%  [kernel]  [k] free_unref_page_commit
(*)  1.16%  [kernel]  [k] __slab_free
     1.16%  [kernel]  [k] _copy_to_iter
(*)  1.01%  [kernel]  [k] kfree
(*)  0.88%  [kernel]  [k] free_unref_page
     0.57%  [kernel]  [k] ip6_rcv_core
     0.55%  [kernel]  [k] ip6t_do_table
     0.54%  [kernel]  [k] flush_smp_call_function_queue
(*)  0.54%  [kernel]  [k] free_pcppages_bulk
     0.51%  [kernel]  [k] llist_reverse_order
     0.38%  [kernel]  [k] process_backlog
(*)  0.38%  [kernel]  [k] free_pcp_prepare
     0.37%  [kernel]  [k] tcp_recvmsg_locked
(*)  0.37%  [kernel]  [k] __list_add_valid
     0.34%  [kernel]  [k] sock_rfree
     0.34%  [kernel]  [k] _raw_spin_lock_irq
(*)  0.33%  [kernel]  [k] __page_cache_release
     0.33%  [kernel]  [k] tcp_v6_rcv
(*)  0.33%  [kernel]  [k] __put_page
(*)  0.29%  [kernel]  [k] __mod_zone_page_state
     0.27%  [kernel]  [k] _raw_spin_lock

After patch:
Average throughput: 73076 Mbit.

Kernel profiles on the cpu running the user thread recvmsg() look better:

    81.35%  [kernel]  [k] copy_user_enhanced_fast_string
     1.95%  [kernel]  [k] _copy_to_iter
     1.95%  [kernel]  [k] __skb_datagram_iter
     1.27%  [kernel]  [k] __netif_receive_skb_core
     1.03%  [kernel]  [k] ip6t_do_table
     0.60%  [kernel]  [k] sock_rfree
     0.50%  [kernel]  [k] tcp_v6_rcv
     0.47%  [kernel]  [k] ip6_rcv_core
     0.45%  [kernel]  [k] read_tsc
     0.44%  [kernel]  [k] _raw_spin_lock_irqsave
     0.37%  [kernel]  [k] _raw_spin_lock
     0.37%  [kernel]  [k] native_irq_return_iret
     0.33%  [kernel]  [k] __inet6_lookup_established
     0.31%  [kernel]  [k] ip6_protocol_deliver_rcu
     0.29%  [kernel]  [k] tcp_rcv_established
     0.29%  [kernel]  [k] llist_reverse_order

v2: kdoc issue (kernel bots)
    do not defer if (alloc_cpu == smp_processor_id()) (Paolo)
    replace the sk_buff_head with a single-linked list (Jakub)
    add a READ_ONCE()/WRITE_ONCE() for the lockless read of sd->defer_list

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20220422201237.416238-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
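The deferral scheme described in this commit message can be modeled in user space roughly as follows. This is a minimal Python sketch, not the kernel code: the class names, the `DEFER_MAX` threshold, and the `ipi_sent` and `freed` counters are hypothetical, and locking and softirq context are ignored entirely.

```python
from dataclasses import dataclass, field

DEFER_MAX = 4  # hypothetical threshold; the commit message does not state a value


@dataclass
class Skb:
    alloc_cpu: int  # cpu that originally allocated the buffer


@dataclass
class SoftnetData:
    """Toy stand-in for per-cpu state holding the deferred-free list."""
    defer_list: list = field(default_factory=list)
    ipi_sent: int = 0  # how many times a remote cpu had to kick us
    freed: int = 0     # how many skbs were freed on this cpu


class Host:
    def __init__(self, ncpus):
        self.sd = [SoftnetData() for _ in range(ncpus)]

    def free_skb(self, skb, current_cpu):
        """Called by the recvmsg() thread once the payload is consumed."""
        if skb.alloc_cpu == current_cpu:
            # v2 change: do not defer when already on the allocating cpu.
            self.sd[current_cpu].freed += 1
            return
        remote = self.sd[skb.alloc_cpu]
        remote.defer_list.append(skb)
        # Unlikely case: the remote cpu is not draining fast enough, so
        # kick it once (the kernel uses an IPI raising NET_RX_SOFTIRQ).
        if len(remote.defer_list) == DEFER_MAX:
            remote.ipi_sent += 1

    def net_rx_action(self, cpu):
        """End of the rx softirq on `cpu`: drain its deferred skbs."""
        sd = self.sd[cpu]
        sd.freed += len(sd.defer_list)
        sd.defer_list.clear()
```

In this model the cpu running recvmsg() never pays for a free unless it also allocated the skb, which is the effect the profiles above are meant to show.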
-
- 26 Apr, 2022 4 commits
-
-
Ethan Yang authored
Add support for Sierra Wireless EM7590 0xc081 composition. Signed-off-by: Ethan Yang <etyang@sierrawireless.com> Acked-by: Bjørn Mork <bjorn@mork.no> Link: https://lore.kernel.org/r/20220425054028.5444-1-etyang@sierrawireless.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Hangbin Liu authored
Currently, the kernel drops GSO VLAN-tagged packets if they are created with socket(AF_PACKET, SOCK_RAW, 0) plus virtio_net_hdr. The reason is that AF_PACKET doesn't adjust the skb network header if there is a VLAN tag. Then, after virtio_net_hdr_set_proto() is called, skb->protocol is set to ETH_P_IP/IPv6, and later in inet/ipv6_gso_segment() the skb is dropped because the network header position is invalid. Let's handle VLAN packets by adjusting the network header position in packet_parse_headers(). The adjustment is safe and does not affect the later xmit, as the tap device also does that. In packet_snd(), packet_parse_headers() needs to be moved before the call to virtio_net_hdr_set_proto(), so we can set the correct skb->protocol and network header first. There is no need to update tpacket_snd(), as it calls packet_parse_headers() in tpacket_fill_skb(), which is already before the calls to the virtio_net_hdr_* functions. The skb->no_fcs setting is also moved up to keep all skb settings together and stay consistent with packet_sendmsg_spkt(). Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20220425014502.985464-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
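To illustrate why the network header position matters, here is a small Python sketch that locates the network header in a raw Ethernet frame by skipping any VLAN tags. It is not the kernel's packet_parse_headers(); the function name `network_header` is made up for this example, and only the standard 802.1Q/802.1ad ethertypes are handled.

```python
import struct

ETH_HLEN = 14          # dst MAC (6) + src MAC (6) + ethertype (2)
VLAN_HLEN = 4          # TCI (2) + encapsulated ethertype (2)
ETH_P_8021Q = 0x8100
ETH_P_8021AD = 0x88A8
ETH_P_IP = 0x0800


def network_header(frame: bytes):
    """Return (offset, ethertype) of the network header, skipping VLAN tags."""
    offset = ETH_HLEN
    (proto,) = struct.unpack_from("!H", frame, 12)
    while proto in (ETH_P_8021Q, ETH_P_8021AD):
        # The encapsulated ethertype sits right after the 2-byte TCI.
        (proto,) = struct.unpack_from("!H", frame, offset + 2)
        offset += VLAN_HLEN
    return offset, proto
```

For a single-tagged IPv4 frame this yields offset 18 rather than 14. A sender that assumes 14 while the protocol is reported as IPv4 ends up pointing the network header at the VLAN tag, which is the kind of mismatch the commit describes.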
-
Arun Ramadoss authored
The ksz8795 and ksz9477 use the same algorithm for the port_stp_state_set function, except that the register address is different. So move the algorithm to ksz_common.c and use the dev_ops for register read and write. This function can also be used for the lan937x part, hence making it generic for all the parts. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220424112831.11504-1-arun.ramadoss@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Arun Ramadoss authored
Add the config_intr and handle_interrupt callbacks for the LAN937x phy, which are the same as for the LAN87xx phy. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220423154727.29052-1-arun.ramadoss@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
- 25 Apr, 2022 19 commits
-
-
Tetsuo Handa authored
Flushing system-wide workqueues is dangerous and will be forbidden. Replace system_wq with local wwan_wq. While we are at it, make err_clean_devs: label of wwan_hwsim_init() behave like wwan_hwsim_exit(), for it is theoretically possible to call wwan_hwsim_debugfs_devcreate_write()/wwan_hwsim_debugfs_devdestroy_write() by the moment wwan_hwsim_init_devs() returns. Link: https://lkml.kernel.org/r/49925af7-78a8-a3dd-bce6-cfc02e1a9236@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Link: https://lore.kernel.org/r/7390d51f-60e2-3cee-5277-b819a55ceabe@I-love.SAKURA.ne.jp Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Marcin Wojtas authored
Reduce the number of included headers to the necessary minimum. Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yajun Deng authored
net/ipv4/arp.c:1412:36: warning: unused variable 'arp_seq_ops' [-Wunused-const-variable] Add #ifdef CONFIG_PROC_FS for 'arp_seq_ops'. Fixes: e968b1b3 ("arp: Remove #ifdef CONFIG_PROC_FS") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
The aggregation byte limit for an endpoint is currently computed based on the endpoint's receive buffer size. However, some bytes at the front of each receive buffer are reserved on the assumption that--as with SKBs--it might be useful to insert data (such as headers) before what lands in the buffer. The aggregation byte limit currently doesn't take into account that reserved space, and as a result, aggregation could require space past that which is available in the buffer. Fix this by reducing the size used to compute the aggregation byte limit by the NET_SKB_PAD offset reserved for each receive buffer. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
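The arithmetic behind this fix can be sketched as below. This is an illustrative Python model only: the function names are hypothetical, `NET_SKB_PAD = 64` is an assumed value (it depends on the architecture's cache line size), and the real driver applies further adjustments and unit conversions that are omitted here.

```python
NET_SKB_PAD = 64  # assumed value for illustration; architecture dependent


def aggr_limit_before(rx_buffer_size: int) -> int:
    """Old behavior: limit derived from the whole receive buffer."""
    return rx_buffer_size


def aggr_limit_after(rx_buffer_size: int, pad: int = NET_SKB_PAD) -> int:
    """Fixed behavior: exclude the headroom reserved at the buffer front."""
    if rx_buffer_size <= pad:
        raise ValueError("receive buffer smaller than reserved headroom")
    return rx_buffer_size - pad


def fits(rx_buffer_size: int, limit: int, pad: int = NET_SKB_PAD) -> bool:
    """Data lands at offset `pad`; check a full aggregate stays in bounds."""
    return pad + limit <= rx_buffer_size
```

With an 8192-byte buffer, the old limit allows an aggregate that ends 64 bytes past the buffer, while the reduced limit ends exactly at its last byte.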
-
Dan Carpenter authored
Check if the kzalloc() failed. Fixes: 804775df ("net: ethernet: mtk_eth_soc: add support for Wireless Ethernet Dispatch (WED)") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yang Yingliang authored
Replace the BUG_ON() with returning error code to handle the fault more gracefully. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Haowen Bai authored
The payload buffer is only memset and never used, so drop it. Signed-off-by: Haowen Bai <baihaowen@meizu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Ido Schimmel says:

====================
mlxsw: extend line card model by devices and info

Jiri says:

This patchset is extending the line card model by three items:
1) line card devices
2) line card info
3) line card device info

First three patches are introducing the necessary changes in devlink core.

Then, all three extensions are implemented in mlxsw alongside with selftest.

Examples:

$ devlink lc show pci/0000:01:00.0 lc 8
pci/0000:01:00.0:
  lc 8 state active type 16x100G
    supported_types:
      16x100G
    devices:
      device 0
      device 1
      device 2
      device 3

$ devlink lc info pci/0000:01:00.0 lc 8
pci/0000:01:00.0:
  lc 8
    versions:
      fixed:
        hw.revision 0
      running:
        ini.version 4
    devices:
      device 0
        versions:
          running:
            fw 19.2010.1310
      device 1
        versions:
          running:
            fw 19.2010.1310
      device 2
        versions:
          running:
            fw 19.2010.1310
      device 3
        versions:
          running:
            fw 19.2010.1310

Note that device FW flashing is going to be implemented in the follow-up patchset.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Once the line card is activated, check that the device FW version is exposed. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Extend MDDQ to obtain FW version of line card device and implement device_info_get() op to fill up the info with that. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Add FW version fields to MDDQ device_info. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Once line card is provisioned, check if HW revision and INI version are exposed. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Implement info_get() to expose HW revision of a linecard and loaded INI version. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Once line card is provisioned, check the count of devices on it and print them out. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
In case the line card is provisioned, go over all possible existing devices (gearboxes) on it and attach them, so devlink core is aware of them. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Extend the existing MDDQ register with the possibility to query information about devices residing on a line card. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Extend the line card info message with information (e.g., FW version) about devices found on the line card.

Example:

$ devlink lc info pci/0000:01:00.0 lc 8
pci/0000:01:00.0:
  lc 8
    versions:
      fixed:
        hw.revision 0
      running:
        ini.version 4
    devices:
      device 0
        versions:
          running:
            fw 19.2010.1310
      device 1
        versions:
          running:
            fw 19.2010.1310
      device 2
        versions:
          running:
            fw 19.2010.1310
      device 3
        versions:
          running:
            fw 19.2010.1310

Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Allow the driver to provide a per-line-card info get op to fill in the info, similar to "devlink dev info".

Example:

$ devlink lc info pci/0000:01:00.0 lc 8
pci/0000:01:00.0:
  lc 8
    versions:
      fixed:
        hw.revision 0
      running:
        ini.version 4

Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
A line card can contain one or more devices that it makes sense to make visible to the user. For example, this can be a gearbox with flash memory, which could be updated. Give the driver the possibility to attach such devices to a line card and expose them to the user.

Example:

$ devlink lc show pci/0000:01:00.0 lc 8
pci/0000:01:00.0:
  lc 8 state active type 16x100G
    supported_types:
      16x100G
    devices:
      device 0
      device 1
      device 2
      device 3

Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 23 Apr, 2022 16 commits
-
-
David S. Miller authored
Vladimir Oltean says:

====================
DSA selftests

When working on complex new features or reworks it becomes increasingly difficult to ensure there aren't regressions being introduced, and therefore it would be nice if we could go over the functionality we already have and write some tests for it.

Verbally I know from Tobias Waldekranz that he has been working on some selftests for DSA, yet I have never seen them, so here I am adding some tests I have written which have been useful for me. The list is by no means complete (it only covers elementary functionality), but it's still good to have as a starting point.

I also borrowed some refactoring changes from Joachim Wiberg that he submitted for his "net: bridge: forwarding of unknown IPv4/IPv6/MAC BUM traffic" series, but not the entirety of his selftests. I now think that his selftests have some overlap with bridge_vlan_unaware.sh and bridge_vlan_aware.sh and they should be more tightly integrated with each other - yet I didn't do that either :). Another issue I had with his selftests was that they jumped straight ahead to configure brport flags on br0 (a radical new idea still at RFC status) while we have bigger problems, and we don't have nearly enough coverage for the *existing* functionality.

One idea introduced here which I haven't seen before is the symlinking of relevant forwarding selftests to the selftests/drivers/net/<my-driver>/ folder, plus a forwarding.config file. I think there's some value in having things structured this way, since the forwarding dir has so many selftests that aren't relevant to DSA that it is a bit difficult to find the ones that are.

While searching for applications that I could use for multicast testing (not my domain of interest/knowledge really), I found Joachim Wiberg's mtools, mcjoin and omping, and I tried them all with various degrees of success. In particular, I was going to use mcjoin, but I faced some issues getting IPv6 multicast traffic to work in a VRF, and I bothered David Ahern about it here:
https://lore.kernel.org/netdev/97eaffb8-2125-834e-641f-c99c097b6ee2@gmail.com/t/
It seems that the problem is that this application should use SO_BINDTODEVICE, yet it doesn't.

So I ended up patching the bare-bones mtools (msend, mreceive) forked by Joachim from the University of Virginia's Multimedia Networks Group to include IPv6 support, and to use SO_BINDTODEVICE. This is what I'm using now for IPv6.

Note that mausezahn doesn't appear to do a particularly good job of supporting IPv6 really, and I needed a program to emit the actual IP_ADD_MEMBERSHIP calls, for dev_mc_add(), so I could test RX filtering. Crafting the IGMP/MLD reports by hand doesn't really do the trick. While extremely bare-bones, the mreceive application now seems to do what I need it to.

Feedback appreciated, it is very likely that I could have done things in a better way.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
This adds an initial subset of forwarding selftests which I considered to be relevant for DSA drivers, along with a forwarding.config that makes it easier to run them (disables veth pair creation, makes sure MAC addresses are unique and stable). The intention is to request driver writers to run these selftests during review and make sure that the tests pass, or at least that the problems are known. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
This tests the capability of switch ports to filter out undesired traffic. Different drivers are expected to have different capabilities here (so some may fail and some may pass), yet the test still has some value, for example to check for regressions.

There are 2 kinds of failures, one is when a packet which should have been accepted isn't (and that should be fixed), and the other "failure" (as reported by the test) is when a packet could have been filtered out (for being unnecessary) yet it was received.

The bridge driver fares particularly badly at this test:

TEST: br0: Unicast IPv4 to primary MAC address              [ OK ]
TEST: br0: Unicast IPv4 to macvlan MAC address              [ OK ]
TEST: br0: Unicast IPv4 to unknown MAC address              [FAIL]
        reception succeeded, but should have failed
TEST: br0: Unicast IPv4 to unknown MAC address, promisc     [ OK ]
TEST: br0: Unicast IPv4 to unknown MAC address, allmulti    [FAIL]
        reception succeeded, but should have failed
TEST: br0: Multicast IPv4 to joined group                   [ OK ]
TEST: br0: Multicast IPv4 to unknown group                  [FAIL]
        reception succeeded, but should have failed
TEST: br0: Multicast IPv4 to unknown group, promisc         [ OK ]
TEST: br0: Multicast IPv4 to unknown group, allmulti        [ OK ]
TEST: br0: Multicast IPv6 to joined group                   [ OK ]
TEST: br0: Multicast IPv6 to unknown group                  [FAIL]
        reception succeeded, but should have failed
TEST: br0: Multicast IPv6 to unknown group, promisc         [ OK ]
TEST: br0: Multicast IPv6 to unknown group, allmulti        [ OK ]

mainly because it does not implement IFF_UNICAST_FLT. Yet I still think having the test (with the failures) is useful in case somebody wants to tackle that problem in the future, to make an easy before-and-after comparison.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
Bombard a standalone switch port with various kinds of traffic to ensure it is really standalone and doesn't leak packets to other switch ports. Also check for switch ports in different bridges, and switch ports in a VLAN-aware bridge but having different pvids. No forwarding should take place in either case. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
Pinging an IPv6 link-local multicast address selects the link-local unicast address of the interface as source, and we'd like to monitor for that in tcpdump. Add a helper to the forwarding library which retrieves the link-local IPv6 address of an interface, to make that task easier. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
Extend the forwarding library with calls to some small C programs which join an IP multicast group and send some packets to it. Both IPv4 and IPv6 groups are supported. Use cases range from testing IGMP/MLD snooping, to RX filtering, to multicast routing. Testing multicast traffic using msend/mreceive is intended to be done using tcpdump. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joachim Wiberg authored
Extend tcpdump_start() & Co. to handle multiple instances. Useful when observing bridge operation, e.g., unicast learning/flooding, and any case of multicast distribution (to these ports but not that one ...). This means the interface argument is now a mandatory argument to all tcpdump_*() functions, hence the changes to the ocelot flower test. Signed-off-by: Joachim Wiberg <troglobit@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joachim Wiberg authored
For some use-cases we may want to change the tcpdump flags used in tcpdump_start(). For instance, observing interfaces without the PROMISC flag, e.g. to see what's really being forwarded to the bridge interface. Signed-off-by: Joachim Wiberg <troglobit@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vladimir Oltean authored
By default, DSA switch ports inherit their MAC address from the DSA master. This works well for practical situations, but some selftests like bridge_vlan_unaware.sh loop back 2 standalone DSA ports with 2 bridged DSA ports, and require the bridge to forward packets between the standalone ports. Due to the bridge seeing that the MAC DA it needs to forward is present as a local FDB entry (it coincides with the MAC address of the bridge ports), the test packets are not forwarded, but terminated locally on br0. In turn, this makes the ping and ping6 tests fail. Address this by introducing an option to have stable MAC addresses. When mac_addr_prepare is called, the current addresses of the netifs are saved and replaced with 00:01:02:03:04:${netif number}. Then when mac_addr_restore is called at the end of the test, the original MAC addresses are restored. This ensures that the MAC addresses are unique, which makes the test pass even for DSA ports. The usage model is for the behavior to be opt-in via STABLE_MAC_ADDRS, which DSA should set to true, all others behave as before. By hooking the calls to mac_addr_prepare and mac_addr_restore within the forwarding lib itself, we do not need to patch each individual selftest, the only requirement is that pre_cleanup is called. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
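The save/replace/restore cycle described above can be sketched as follows. The real helpers are shell functions in the forwarding library; this Python version is only a model. The dict-based interface table, the numbering starting at 1, and the two-digit hex formatting of the last octet are assumptions made for the example.

```python
def stable_mac(index: int) -> str:
    """Build 00:01:02:03:04:NN from a netif number (assumed two hex digits)."""
    if not 0 <= index <= 0xFF:
        raise ValueError("netif number does not fit in one octet")
    return f"00:01:02:03:04:{index:02x}"


def mac_addr_prepare(netifs: dict) -> dict:
    """Save current addresses and replace them with stable, unique ones."""
    saved = dict(netifs)
    for index, name in enumerate(netifs, start=1):
        netifs[name] = stable_mac(index)
    return saved


def mac_addr_restore(netifs: dict, saved: dict) -> None:
    """Put back the addresses recorded by mac_addr_prepare()."""
    netifs.update(saved)
```

Two DSA ports that inherited the same address from the DSA master end up with distinct addresses for the duration of the test, so the bridge no longer terminates the test packets locally.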
-
David S. Miller authored
Mat Martineau says: ==================== mptcp: TCP fallback for established connections RFC 8684 allows some MPTCP connections to fall back to regular TCP when the MPTCP DSS checksum detects middlebox interference, there is only a single subflow, and there is no unacknowledged out-of-sequence data. When this condition is detected, the stack sends a MPTCP DSS option with an "infinite mapping" to signal that a fallback is happening, and the peers will stop sending MPTCP options in their TCP headers. The Linux MPTCP stack has not yet supported this type of fallback, instead closing the connection when the MPTCP checksum fails. This series adds support for fallback to regular TCP in a more limited scenario, for only MPTCP connections that have never connected additional subflows or transmitted out-of-sequence data. The selftests are also updated to check new MIBs that track infinite mappings. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geliang Tang authored
This patch adds a function chk_infi_nr() to check the mibs for the infinite mapping. Invoke it in chk_join_nr() when validate_checksum is set. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geliang Tang authored
In trace event class mptcp_dump_mpext, dump the newly added infinite_map field of struct mptcp_dump_mpext too. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geliang Tang authored
This patch adds a new mib named MPTCP_MIB_INFINITEMAPTX and increments it when an infinite mapping has been sent out. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geliang Tang authored
This patch adds the infinite mapping receiving logic. When the infinite mapping is received, set the map_data_len of the subflow to 0. In subflow_check_data_avail(), only reset the subflow when the map_data_len of the subflow is non-zero. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geliang Tang authored
This patch adds the infinite mapping sending logic. Add a new flag send_infinite_map in struct mptcp_subflow_context. Set it true when a single contiguous subflow is in use and the allow_infinite_fallback flag is true in mptcp_pm_mp_fail_received(). In mptcp_sendmsg_frag(), if this flag is true, call the new function mptcp_update_infinite_map() to set the infinite mapping. Add a new flag infinite_map in struct mptcp_ext, set it true in mptcp_update_infinite_map(), and check this flag in a new helper mptcp_check_infinite_map(). In mptcp_update_infinite_map(), set data_len to 0, and clear the send_infinite_map flag, then do fallback. In mptcp_established_options(), use the helper mptcp_check_infinite_map() so that the infinite mapping DSS can be sent out in fallback mode. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geliang Tang authored
This patch adds a new member allow_infinite_fallback in mptcp_sock, which is initialized to 'true' when the connection begins and is set to 'false' on any retransmit or successful MP_JOIN. Only do infinite mapping fallback if there is a single subflow AND there have been no retransmissions AND there have never been any MP_JOINs. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
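The fallback eligibility rule stated above can be expressed as a tiny state model. This is a Python sketch under stated assumptions, not the kernel implementation: the class, its method names, and the `subflows` counter are hypothetical, and only the `allow_infinite_fallback` flag name comes from the commit message.

```python
class MptcpSockModel:
    """Toy model of the conditions gating infinite mapping fallback."""

    def __init__(self):
        # Initialized to true when the connection begins.
        self.allow_infinite_fallback = True
        self.subflows = 1

    def on_retransmit(self):
        # Any retransmission permanently disables the fallback.
        self.allow_infinite_fallback = False

    def on_mp_join_success(self):
        # A successful MP_JOIN adds a subflow and disables the fallback,
        # even if that subflow is closed again later.
        self.subflows += 1
        self.allow_infinite_fallback = False

    def on_subflow_closed(self):
        self.subflows -= 1

    def can_fall_back(self) -> bool:
        # Single subflow AND no retransmissions AND never any MP_JOIN.
        return self.subflows == 1 and self.allow_infinite_fallback
```

The flag is sticky by design in this model: returning to a single subflow after an MP_JOIN does not make the connection eligible again, matching the "never been any MP_JOINs" wording.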
-