- 17 Nov, 2020 8 commits
-
-
Daniel Borkmann authored
Magnus Karlsson says: ==================== This patch set improves the performance of mainly the Tx processing of AF_XDP sockets. Though, patch 3 also improves the Rx path. All in all, this patch set improves the throughput of the l2fwd xdpsock application by around 11%. If we just take a look at Tx processing part, it is improved by 35% to 40%. Hopefully the new batched Tx interfaces should be of value to other drivers implementing AF_XDP zero-copy support. But patch #3 is generic and will improve performance of all drivers when using AF_XDP sockets (under the premises explained in that patch). @Daniel. In patch 3, I apply all the padding required to hinder the adjacency prefetcher to prefetch the wrong things. After this patch set, I will submit another patch set that introduces ____cacheline_padding_in_smp in include/linux/cache.h according to your suggestions. The last patch in that patch set will then convert the explicit paddings that we have now to ____cacheline_padding_in_smp. v2 -> v3: * Fixed #pragma warning with clang and defined a loop_unrolled_for macro for easier readability [lkp, Nick] * Simplified invalid descriptor handling in xskq_cons_read_desc_batch() v1 -> v2: * Removed added parameter in i40e_setup_tx_descriptors and adopted a simpler solution [Maciej] * Added test for !xs in xsk_tx_peek_release_desc_batch() [John] * Simplified return path in xsk_tx_peek_release_desc_batch() [John] * Dropped patch #1 in v1 that introduced lazy completions. Hopefully this is not needed when we get busy poll [Jakub] * Iterate over local variable in xskq_prod_reserve_addr_batch() for improved performance * Fixed the fallback path in xsk_tx_peek_release_desc_batch() so that it also produces a batch of descriptors, albeit by using the slower (but more general) older code. This improves the performance of the case when multiple sockets are sharing the same device and queue id. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Magnus Karlsson authored
Use the new batched xsk interfaces for the Tx path in the i40e driver to improve performance. On my machine, this yields a throughput increase of 4% for the l2fwd sample app in xdpsock. If we instead just look at the Tx part, this patch set increases throughput with above 20% for Tx. Note that I had to explicitly loop unroll the inner loop to get to this performance level, by using a pragma. It is honored by both clang and gcc and should be ignored by versions that do not support it. Using the -funroll-loops compiler command line switch on the source file resulted in a loop unrolling on a higher level that lead to a performance decrease instead of an increase. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/1605525167-14450-6-git-send-email-magnus.karlsson@gmail.com
-
Magnus Karlsson authored
Introduce batched descriptor interfaces in the xsk core code for the Tx path to be used in the driver to write a code path with higher performance. This interface will be used by the i40e driver in the next patch. Though other drivers would likely benefit from this new interface too. Note that batching is only implemented for the common case when there is only one socket bound to the same device and queue id. When this is not the case, we fall back to the old non-batched version of the function. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/1605525167-14450-5-git-send-email-magnus.karlsson@gmail.com
-
Magnus Karlsson authored
Introduce one cache line worth of padding between the consumer pointer and the flags field as well as between the flags field and the start of the descriptors in all the lockless rings. This so that the x86 HW adjacency prefetcher will not prefetch the adjacent pointer/field when only one pointer/field is going to be used. This improves throughput performance for the l2fwd sample app with 1% on my machine with HW prefetching turned on in the BIOS. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/1605525167-14450-4-git-send-email-magnus.karlsson@gmail.com
-
Magnus Karlsson authored
Remove the unnecessary access to the software ring for the AF_XDP zero-copy driver. This was used to record the length of the packet so that the driver Tx completion code could sum this up to produce the total bytes sent. This is now performed during the transmission of the packet, so no need to record this in the software ring. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/1605525167-14450-3-git-send-email-magnus.karlsson@gmail.com
-
Magnus Karlsson authored
Increment the statistics over how many Tx packets have been sent at the time of sending instead of at the time of completion. This as a completion event means that the buffer has been sent AND returned to user space. The packet always gets sent shortly after sendto() is called. The kernel might, for performance reasons, decide to not return every single buffer to user space immediately after sending, for example, only after a batch of packets have been transmitted. Incrementing the number of packets sent at completion, will in that case be confusing as if you send a single packet, the counter might show zero for a while even though the packet has been transmitted. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/1605525167-14450-2-git-send-email-magnus.karlsson@gmail.com
-
Alan Maguire authored
When operating on split BTF, btf__find_by_name[_kind] will not iterate over all types since they use btf->nr_types to show the number of types to iterate over. For split BTF this is the number of types _on top of base BTF_, so it will underestimate the number of types to iterate over, especially for vmlinux + module BTF, where the latter is much smaller. Use btf__get_nr_types() instead. Fixes: ba451366 ("libbpf: Implement basic split BTF support") Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1605437195-2175-1-git-send-email-alan.maguire@oracle.com
-
Martin KaFai Lau authored
The intention of the current check is to avoid using bpf_sk_storage in irq and nmi. Jakub pointed out that the current check cannot do that. For example, in_serving_softirq() returns true if the softirq handling is interrupted by hard irq. Fixes: 8e4597c6 ("bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP") Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201116200113.2868539-1-kafai@fb.com
-
- 16 Nov, 2020 1 commit
-
-
Santucci Pierpaolo authored
From second fragment on, IPV6FR program must stop the dissection of IPV6 fragmented packet. This is the same approach used for IPV4 fragmentation. This fixes the flow keys calculation for the upper-layer protocols. Note that according to RFC8200, the first fragment packet must include the upper-layer header. Signed-off-by: Santucci Pierpaolo <santucci@epigenesys.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/X7JUzUj34ceE2wBm@santucci.pierpaolo
-
- 14 Nov, 2020 22 commits
-
-
Jakub Kicinski authored
Shannon Nelson says: ==================== ionic updates These updates are a bit of code cleaning and a minor bit of performance tweaking. v3: convert ionic_lif_quiesce() to void v2: added void cast on call to ionic_lif_quiesce() lowered batching threshold added patch to flatten calls to ionic_lif_rx_mode added patch to change from_ndo to can_sleep ==================== Link: https://lore.kernel.org/r/20201112182208.46770-1-snelson@pensando.ioSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Shannon Nelson authored
With a few more uses of true and false in function calls, we need to give them some useful names so we can tell from the calling point what we're doing. Signed-off-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Shannon Nelson authored
Instead of having two different ways of expressing the same sleepability concept, using opposite logic, we can rework the from_ndo to can_sleep for a more consistent usage. Fixes: 1800eee1 ("net: ionic: Replace in_interrupt() usage.") Signed-off-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Shannon Nelson authored
The _ionic_lif_rx_mode() is only used once and really doesn't need to be broken out. Signed-off-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Shannon Nelson authored
We should be using the multicast sync routines for the multicast filters. Also, let's just flatten the logic a bit and pull the small unicast routine back into ionic_set_rx_mode(). Fixes: 1800eee1 ("net: ionic: Replace in_interrupt() usage.") Signed-off-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Shannon Nelson authored
We don't need to refill the rx descriptors on every napi if only a few were handled. Waiting until we can batch up a few together will save us a few Rx cycles. Signed-off-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Shannon Nelson authored
After the queues are stopped, expressly quiesce the lif. This assures that even if the queues were in an odd state, the firmware will close up everything cleanly. Signed-off-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Shannon Nelson authored
Request a link check as soon as the netdev is registered rather than waiting for the watchdog to go off in order to get the interface operational a little more quickly. Signed-off-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Shannon Nelson authored
Change the order of operations in the link_up handling to be sure that the queues are up and ready before we announce that the link is up. Signed-off-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
YueHaibing authored
Check PTR_ERR with IS_ERR to fix this. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20201112144936.54776-1-yuehaibing@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Lukas Bulwahn authored
Commit bdb7cc64 ("ipv6: Count interface receive statistics on the ingress netdev") removed all callees for ipv6_skb_idev(). Hence, since then, ipv6_skb_idev() is unused and make CC=clang W=1 warns: net/ipv6/exthdrs.c:909:33: warning: unused function 'ipv6_skb_idev' [-Wunused-function] So, remove this unused function and a -Wunused-function warning. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Link: https://lore.kernel.org/r/20201113135012.32499-1-lukas.bulwahn@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski authored
Daniel Borkmann says: ==================== pull-request: bpf-next 2020-11-14 1) Add BTF generation for kernel modules and extend BTF infra in kernel e.g. support for split BTF loading and validation, from Andrii Nakryiko. 2) Support for pointers beyond pkt_end to recognize LLVM generated patterns on inlined branch conditions, from Alexei Starovoitov. 3) Implements bpf_local_storage for task_struct for BPF LSM, from KP Singh. 4) Enable FENTRY/FEXIT/RAW_TP tracing program to use the bpf_sk_storage infra, from Martin KaFai Lau. 5) Add XDP bulk APIs that introduce a defer/flush mechanism to optimize the XDP_REDIRECT path, from Lorenzo Bianconi. 6) Fix a potential (although rather theoretical) deadlock of hashtab in NMI context, from Song Liu. 7) Fixes for cross and out-of-tree build of bpftool and runqslower allowing build for different target archs on same source tree, from Jean-Philippe Brucker. 8) Fix error path in htab_map_alloc() triggered from syzbot, from Eric Dumazet. 9) Move functionality from test_tcpbpf_user into the test_progs framework so it can run in BPF CI, from Alexander Duyck. 10) Lift hashtab key_size limit to be larger than MAX_BPF_STACK, from Florian Lehner. Note that for the fix from Song we have seen a sparse report on context imbalance which requires changes in sparse itself for proper annotation detection where this is currently being discussed on linux-sparse among developers [0]. Once we have more clarification/guidance after their fix, Song will follow-up. [0] https://lore.kernel.org/linux-sparse/CAHk-=wh4bx8A8dHnX612MsDO13st6uzAz1mJ1PaHHVevJx_ZCw@mail.gmail.com/T/ https://lore.kernel.org/linux-sparse/20201109221345.uklbp3lzgq6g42zb@ltop.local/T/ * git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (66 commits) net: mlx5: Add xdp tx return bulking support net: mvpp2: Add xdp tx return bulking support net: mvneta: Add xdp tx return bulking support net: page_pool: Add bulk support for ptr_ring net: xdp: Introduce bulking for xdp tx return path bpf: Expose bpf_d_path helper to sleepable LSM hooks bpf: Augment the set of sleepable LSM hooks bpf: selftest: Use bpf_sk_storage in FENTRY/FEXIT/RAW_TP bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP bpf: Rename some functions in bpf_sk_storage bpf: Folding omem_charge() into sk_storage_charge() selftests/bpf: Add asm tests for pkt vs pkt_end comparison. selftests/bpf: Add skb_pkt_end test bpf: Support for pointers beyond pkt_end. tools/bpf: Always run the *-clean recipes tools/bpf: Add bootstrap/ to .gitignore bpf: Fix NULL dereference in bpf_task_storage tools/bpftool: Fix build slowdown tools/runqslower: Build bpftool using HOSTCC tools/runqslower: Enable out-of-tree build ... ==================== Link: https://lore.kernel.org/r/20201114020819.29584-1-daniel@iogearbox.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Steen Hegelund authored
Add VSC8572 and VSC8574 in the PTP configuration as they also support PTP. The relevant datasheets can be found here: - VSC8572: https://www.microchip.com/wwwproducts/en/VSC8572 - VSC8574: https://www.microchip.com/wwwproducts/en/VSC8574Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Link: https://lore.kernel.org/r/20201112092250.914079-1-steen.hegelund@microchip.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Daniel Borkmann authored
Lorenzo Bianconi says: ==================== XDP bulk APIs introduce a defer/flush mechanism to return pages belonging to the same xdp_mem_allocator object (identified via the mem.id field) in bulk to optimize I-cache and D-cache since xdp_return_frame is usually run inside the driver NAPI tx completion loop. Convert mvneta, mvpp2 and mlx5 drivers to xdp_return_frame_bulk APIs. More details on benchmarks run on mlx5 can be found here: https://github.com/xdp-project/xdp-project/blob/master/areas/mem/xdp_bulk_return01.org Changes since v5: - do not keep looping over ptr_ring if the cache is full but release leftover pages running page_pool_return_page Changes since v4: - fix comments - introduce xdp_frame_bulk_init utility routine - compiler annotations for I-cache code layout - move rcu_read_lock outside fast-path - mlx5 xdp bulking code optimization Changes since v3: - align DEV_MAP_BULK_SIZE to XDP_BULK_QUEUE_SIZE - refactor page_pool_put_page_bulk to avoid code duplication Changes since v2: - move mvneta changes in a dedicated patch Changes since v1: - improve comments - rework xdp_return_frame_bulk routine logic - move count and xa fields at the beginning of xdp_frame_bulk struct - invert logic in page_pool_put_page_bulk for loop ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
-
Lorenzo Bianconi authored
Convert mlx5 driver to xdp_return_frame_bulk APIs. XDP_REDIRECT (upstream codepath): 8.9Mpps XDP_REDIRECT (upstream codepath + bulking APIs): 10.2Mpps Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/250460319fd868b7b5668fc1deca74dd42813a90.1605267335.git.lorenzo@kernel.org
-
Lorenzo Bianconi authored
Convert mvpp2 driver to xdp_return_frame_bulk APIs. XDP_REDIRECT (upstream codepath): 1.79Mpps XDP_REDIRECT (upstream codepath + bulking APIs): 1.93Mpps Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Matteo Croce <mcroce@microsoft.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/0b38c295e58e8ce251ef6b4e2187a2f457f9f7a3.1605267335.git.lorenzo@kernel.org
-
Lorenzo Bianconi authored
Convert mvneta driver to xdp_return_frame_bulk APIs. XDP_REDIRECT (upstream codepath): 275Kpps XDP_REDIRECT (upstream codepath + bulking APIs): 284Kpps Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/9af8014006d022fc0fec78cdaa71beb56999750d.1605267335.git.lorenzo@kernel.org
-
Lorenzo Bianconi authored
Introduce the capability to batch page_pool ptr_ring refill since it is usually run inside the driver NAPI tx completion loop. Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Link: https://lore.kernel.org/bpf/08dd249c9522c001313f520796faa777c4089e1c.1605267335.git.lorenzo@kernel.org
-
Lorenzo Bianconi authored
XDP bulk APIs introduce a defer/flush mechanism to return pages belonging to the same xdp_mem_allocator object (identified via the mem.id field) in bulk to optimize I-cache and D-cache since xdp_return_frame is usually run inside the driver NAPI tx completion loop. The bulk queue size is set to 16 to be aligned to how XDP_REDIRECT bulking works. The bulk is flushed when it is full or when mem.id changes. xdp_frame_bulk is usually stored/allocated on the function call-stack to avoid locking penalties. Current implementation considers only page_pool memory model. Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Link: https://lore.kernel.org/bpf/e190c03eac71b20c8407ae0fc2c399eda7835f49.1605267335.git.lorenzo@kernel.org
-
Jisheng Zhang authored
Use the devm_reset_control_get_optional() and devm_clk_get_optional() rather than open coding them. Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Link: https://lore.kernel.org/r/20201112092606.5173aa6f@xhacker.debianSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Heiner Kallweit authored
We can simplify the for() condition and eliminate variable tx_left. The change also considers that tp->cur_tx may be incremented by a racing rtl8169_start_xmit(). In addition replace the write to tp->dirty_tx and the following smp_mb() with an equivalent call to smp_store_mb(). This implicitly adds a WRITE_ONCE() to the write. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/c2e19e5e-3d3f-d663-af32-13c3374f5def@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Heiner Kallweit authored
tp->dirty_tx and tp->cur_tx may be changed by a racing rtl_tx() or rtl8169_start_xmit(). Use READ_ONCE() to annotate the races and ensure that the compiler doesn't use cached values. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/5676fee3-f6b4-84f2-eba5-c64949a371ad@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 13 Nov, 2020 9 commits
-
-
Jakub Kicinski authored
Alex Elder says: ==================== net: ipa: two fixes This small series makes two fixes to the IPA code: - While reviewing something else I found that one of the resource limits on the SDM845 used the wrong value. The first patch fixes this. The correct value allocates more resources of this type for IPA to use, and otherwise does not change behavior. - When the IPA-resident microcontroller starts up it generates an event, which triggers an AP interrupt. The event merely provides some information for logging, which we don't support. We already ignore the event, and that's harmless. So this patch explicitly ignores it rather than issuing a warning when it occurs. ==================== Link: https://lore.kernel.org/r/20201112121157.19784-1-elder@linaro.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
The IPA-resident microcontroller has the ability to log various activity in an area of IPA shared memory. When the microcontroller starts it generates an event to the AP to provide information about the log. We don't support reading this log, and we can safely ignore the event. So do that rather than treating the log info event we receive as "unsupported." Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
I have discovered that the maximum number of source packet contexts configured for SDM845 is incorrect. Fix this error. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Edward Cree says: ==================== sfc: further EF100 encap TSO features This series adds support for GRE and GRE_CSUM TSO on EF100 NICs, as well as improving the handling of UDP tunnel TSO. ==================== Link: https://lore.kernel.org/r/eda2de73-edf2-8b92-edb9-099ebda09ebc@solarflare.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Edward Cree authored
We can treat SKB_GSO_GRE almost exactly the same as UDP tunnels, except that we don't want to edit the outer UDP len (as there isn't one). For SKB_GSO_GRE_CSUM, we have to use GSO_PARTIAL as the device doesn't support offload of non-UDP outer L4 checksums. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Martin Habets <mhabets@solarflare.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
-
Edward Cree authored
By asking the HW for the correct edits, we can make UDP tunnel TSO work without needing GSO_PARTIAL. So don't specify it in our netdev->gso_partial_features. However, retain GSO_PARTIAL support, as this will be used for other protocols later. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Martin Habets <mhabets@solarflare.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
-
Edward Cree authored
Our TSO descriptors got even more fussy. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Martin Habets <mhabets@solarflare.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
-
Jakub Kicinski authored
Alex Elder says: ==================== net: ipa: GSI register consolidation This series rearranges and consolidates some GSI register definitions. Its general aim is to make things more consistent, by: - Using enumerated types to define the values held in GSI register fields - Defining field values in "gsi_reg.h", together with the definition of the register (and field) that holds them - Format enumerated type members consistently, with hexidecimal numeric values, and assignments aligned on the same column There is one checkpatch "CHECK" warning requesting a blank line; I ignored that because my intention was to group certain definitions. ==================== Link: https://lore.kernel.org/r/20201110215922.23514-1-elder@linaro.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Replace constants defined with an "_FVAL" suffix with values defined in enumerated types, to be consistent with other usage in the driver. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-