- 21 Jul, 2022 14 commits
-
-
Jakub Kicinski authored
Alex Elder says: ==================== net: ipa: move configuration data files This series moves the "ipa_data-vX.Y.c" files into a subdirectory. The first patch adds a Makefile variable containing the list of supported IPA versions, and uses it to simplify the way these files are specified. ==================== Link: https://lore.kernel.org/r/20220719150827.295248-1-elder@linaro.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Reduce the clutter in the main IPA source directory by creating a new "data" subdirectory, and locating all of the configuration data files in there. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Create a variable in the Makefile listing the IPA versions supported by the driver. Use that to create the list of configuration data object files used (rather than listing them all individually). Add a SPDX license comment. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Alex Elder says: ==================== net: ipa: small transaction updates Version 2 of this series corrects a misspelling of "outstanding" pointed out by the netdev test bots. (For some reason I don't see that when I run "checkpatch".) I found and fixed a second instance of that word being misspelled as well. This series includes three changes to the transaction code. The first adds a new transaction list that represents a distinct state that has not been maintained. The second moves a field in the transaction information structure, and reorders its initialization a bit. The third skips a function call when it is known not to be necessary. The last two are very small "leftover" patches. ==================== Link: https://lore.kernel.org/r/20220719181020.372697-1-elder@linaro.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Since commit 8797972a ("net: ipa: remove command info pool"), we don't allocate "command info" entries for command channel transactions. Fix a comment that seems to suggest we still do. (Even before that commit, the comment was out of place.) Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
When the IPA driver has completed its initialization and setup stages, it emits a brief message to the log. Add a small message that reports when it has been removed. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
In gsi_trans_free(), there's no point in ipa_gsi_trans_release() if a transaction is unused. No used TREs means no IPA layer resources to clean up. So only call ipa_gsi_trans_release() if at least one TRE was used. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
The transaction map is really associated with the transaction pool; move its definition earlier in the gsi_trans_info structure. Rearrange initialization in gsi_channel_trans_init() so it sets the tre_avail value first, then initializes the transaction pool, and finally allocating the transaction map. Update comments. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
We currently put a transaction on the pending list when it has been committed. But until the channel's doorbell rings, these transactions aren't actually "owned" by the hardware yet. Add a new "committed" state (and list), to represent transactions that have been committed but not yet sent to hardware. Define "pending" to mean committed transactions that have been sent to hardware but have not yet completed. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Create a new attribute group meant to provide a single place that defines endpoint IDs that might be needed by user space. Not all defined endpoints are presented, and only those that are defined will be made visible. The new attributes use "extended" device attributes to hold endpoint IDs, which is a little more compact and efficient. Reimplement the existing modem endpoint ID attribute files using common code. Signed-off-by: Alex Elder <elder@linaro.org> Link: https://lore.kernel.org/r/20220719191639.373249-1-elder@linaro.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Kuniyuki Iwashima authored
This patch fixes a build error reported in the link. [0] unix_connect.c: In function ‘unix_connect_test’: unix_connect.c:115:55: error: expected identifier before ‘(’ token #define offsetof(type, member) ((size_t)&((type *)0)->(member)) ^ unix_connect.c:128:12: note: in expansion of macro ‘offsetof’ addrlen = offsetof(struct sockaddr_un, sun_path) + variant->len; ^~~~~~~~ We can fix this by removing () around member, but checkpatch will complain about it, and the root cause of the build failure is that I followed the warning and fixed this in the v2 -> v3 change of the blamed commit. [1] CHECK: Macro argument 'member' may be better as '(member)' to avoid precedence issues #33: FILE: tools/testing/selftests/net/af_unix/unix_connect.c:115: +#define offsetof(type, member) ((size_t)&((type *)0)->member) To avoid this warning, let's use offsetof() defined in stddef.h instead. [0]: https://lore.kernel.org/linux-mm/202207182205.FrkMeDZT-lkp@intel.com/ [1]: https://lore.kernel.org/netdev/20220702154818.66761-1-kuniyu@amazon.com/ Fixes: e95ab1d8 ("selftests: net: af_unix: Test connect() with different netns.") Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20220720005750.16600-1-kuniyu@amazon.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jian Shen authored
It's repeated with line 1793-1795, and there isn't any other handling for it. So remove it. Signed-off-by: Jian Shen <shenjian15@huawei.com> Link: https://lore.kernel.org/r/20220719142424.4528-1-shenjian15@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-nextJakub Kicinski authored
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next: 1) Simplify nf_ct_get_tuple(), from Jackie Liu. 2) Add format to request_module() call, from Bill Wendling. 3) Add /proc/net/stats/nf_flowtable to monitor in-flight pending hardware offload objects to be processed, from Vlad Buslov. 4) Missing rcu annotation and accessors in the netfilter tree, from Florian Westphal. 5) Merge h323 conntrack helper nat hooks into single object, also from Florian. 6) A batch of update to fix sparse warnings treewide, from Florian Westphal. 7) Move nft_cmp_fast_mask() where it used, from Florian. 8) Missing const in nf_nat_initialized(), from James Yonan. 9) Use bitmap API for Maglev IPVS scheduler, from Christophe Jaillet. 10) Use refcount_inc instead of _inc_not_zero in flowtable, from Florian Westphal. 11) Remove pr_debug in xt_TPROXY, from Nathan Cancellor. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: xt_TPROXY: remove pr_debug invocations netfilter: flowtable: prefer refcount_inc netfilter: ipvs: Use the bitmap API to allocate bitmaps netfilter: nf_nat: in nf_nat_initialized(), use const struct nf_conn * netfilter: nf_tables: move nft_cmp_fast_mask to where its used netfilter: nf_tables: use correct integer types netfilter: nf_tables: add and use BE register load-store helpers netfilter: nf_tables: use the correct get/put helpers netfilter: x_tables: use correct integer types netfilter: nfnetlink: add missing __be16 cast netfilter: nft_set_bitmap: Fix spelling mistake netfilter: h323: merge nat hook pointers into one netfilter: nf_conntrack: use rcu accessors where needed netfilter: nf_conntrack: add missing __rcu annotations netfilter: nf_flow_table: count pending offload workqueue tasks net/sched: act_ct: set 'net' pointer when creating new nf_flow_table netfilter: conntrack: use correct format characters netfilter: conntrack: use fallthrough to cleanup ==================== Link: https://lore.kernel.org/r/20220720230754.209053-1-pablo@netfilter.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxJakub Kicinski authored
Saeed Mahameed says: ==================== mlx5-updates-2022-07-17 1) Add resiliency for lost completions for PTP TX port timestamp 2) Report Header-data split state via ethtool 3) Decouple HTB code from main regular TX code * tag 'mlx5-updates-2022-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: CT: Remove warning of ignore_flow_level support for non PF net/mlx5e: Add resiliency for PTP TX port timestamp net/mlx5: Expose ts_cqe_metadata_size2wqe_counter net/mlx5e: HTB, move htb functions to a new file net/mlx5e: HTB, change functions name to follow convention net/mlx5e: HTB, remove priv from htb function calls net/mlx5e: HTB, hide and dynamically allocate mlx5e_htb structure net/mlx5e: HTB, move stats and max_sqs to priv net/mlx5e: HTB, move section comment to the right place net/mlx5e: HTB, move ids to selq_params struct net/mlx5e: HTB, reduce visibility of htb functions net/mlx5e: Fix mqprio_rl handling on devlink reload net/mlx5e: Report header-data split state through ethtool ==================== Link: https://lore.kernel.org/r/20220719203529.51151-1-saeed@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 20 Jul, 2022 22 commits
-
-
Justin Stitt authored
pr_debug calls are no longer needed in this file. Pablo suggested "a patch to remove these pr_debug calls". This patch has some other beneficial collateral as it also silences multiple Clang -Wformat warnings that were present in the pr_debug calls. diff from v1 -> v2: * converted if statement one-liner style * x == NULL is now !x Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
With refcount_inc_not_zero, we'd also need a smp_rmb or similar, followed by a test of the CONFIRMED bit. However, the ct pointer is taken from skb->_nfct, its refcount must not be 0 (else, we'd already have a use-after-free bug). Use refcount_inc() instead to clarify the ct refcount is expected to be at least 1. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Christophe JAILLET authored
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them. It is less verbose and it improves the semantic. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Alex Elder authored
When a GSI channel is initially allocated, and after it has been reset, the hardware assumes its ring index is 0. And although we do initialize channels this way, the comments in the IPA code don't really explain this. For event rings, it doesn't matter what value we use initially, so using 0 is just fine. Add some information about the assumptions made by hardware above the definition of the gsi_ring structure in "gsi.h". Zero the index field for all rings (channel and event) when the ring is allocated. As a result, that function initializes all fields in the structure. Stop zeroing the index the top of gsi_channel_program(). Initially we'll use the index value set when the channel ring was allocated. And we'll explicitly zero the index value in gsi_channel_reset() before programming the hardware, adding a comment explaining why it's required. For event rings, use the index initialized by gsi_ring_alloc() rather than 0 when ringing the doorbell in gsi_evt_ring_program(). (It'll still be zero, but we won't assume that to be the case.) Use a local variable in gsi_evt_ring_program() that represents the address of the event ring's ring structure. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Oleksandr Mazur authored
For SFP port prestera driver will use kernel phylink infrastucture to configure port mode based on the module that has beed inserted Co-developed-by: Yevhen Orlov <yevhen.orlov@plvision.eu> Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu> Co-developed-by: Taras Chornyi <taras.chornyi@plvision.eu> Signed-off-by: Taras Chornyi <taras.chornyi@plvision.eu> Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrey Turkin authored
Some tools (e.g. libxdp) use that information. Signed-off-by: Andrey Turkin <andrey.turkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Merge tag 'linux-can-next-for-5.20-20220720' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== this is a pull request of 29 patches for net-next/master. The first 6 patches target the slcan driver. Dan Carpenter contributes a hardening patch, followed by 5 cleanup patches. Biju Das contributes 5 patches to prepare the sja1000 driver to support the Renesas RZ/N1 SJA1000 CAN controller. Dario Binacchi's patch for the slcan driver fixes a sleep with held spin lock. Another patch by Dario Binacchi fixes a wrong comment in the c_can driver. Pavel Pisa updates the CTU CAN FD IP core registers. Stephane Grosjean contributes 3 patches to the peak_usb driver for cleanups and support of a new MCU. The last 12 patches are by Vincent Mailhol, they fix and improve the txerr and rxerr reporting in all CAN drivers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marc Kleine-Budde authored
Vincent Mailhol says: ==================== can: error: set of fixes and improvement on txerr and rxerr reporting This series is a collection of patches targeting the CAN error counter. The series is split in three blocks (with small relation to each other). Several drivers uses the data[6] and data[7] fields (both of type u8) of the CAN error frame to report those values. However, the maximum size an u8 can hold is 255 and the error counter can exceed this value if bus-off status occurs. As such, the first nine patches of this series make sure that no drivers try to report txerr or rxerr through the CAN error frame when bus-off status is reached. can_frame::data[5..7] are defined as being "controller specific". Controller specific behaviors are not something desirable (portability issue...) The tenth patch of this series specifies how can_frame::data[5..7] should be use and remove any "controller specific" freedom. The eleventh patch adds a flag to notify though can_frame::can_id that data[6..7] were populated (in order to be consistent with other fields). Finally, the twelfth and last patch add three macro values to specify the different error counter threshold with so far was hard-coded as magic numbers in the drivers. N.B.: * patches 1 to 10 are for net (stable). * patches 11 and 12 are for net-next (but depends on patches 1 to 10). ** Changelog ** v1 -> v2: https://lore.kernel.org/all/20220712153157.83847-1-mailhol.vincent@wanadoo.fr * Fix typo in patch #10: data[7] of CAN error frames is for the RX error counter, not the TX one (this is litteraly a one byte change). ==================== As discussed take the whole series via can-next -> net-next. Link: https://lore.kernel.org/all/20220719143550.3681-1-mailhol.vincent@wanadoo.frSigned-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Vincent Mailhol authored
Currently, drivers are using magic numbers to derive the CAN error states from the error counter. Add three macro declarations to remediate this. For reference, the error-active, error-passive and bus-off are defined in ISO 11898, section 12.1.4.2 "Error counting". Although ISO 11898 does not define error-warning state, this extra value is also commonly used and is thus also added. Link: https://lore.kernel.org/all/20220719143550.3681-13-mailhol.vincent@wanadoo.frSigned-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Vincent Mailhol authored
Add a dedicated flag in uapi/linux/can/error.h to notify the userland that fields data[6] and data[7] of the CAN error frame were respectively populated with the tx and rx error counters. For all driver tree-wide, set up this flags whenever needed. Link: https://lore.kernel.org/all/20220719143550.3681-12-mailhol.vincent@wanadoo.frSigned-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Vincent Mailhol authored
Currently, data[5..7] of struct can_frame, when used as a CAN error frame, are defined as being "controller specific". Device specific behaviours are problematic because it prevents someone from writing code which is portable between devices. As a matter of fact, data[5] is never used, data[6] is always used to report TX error counter and data[7] is always used to report RX error counter. can-utils also relies on this. This patch updates the comment in the uapi header to specify that data[5] is reserved (and thus should not be used) and that data[6..7] are used for error counters. Fixes: 0d66548a ("[CAN]: Add PF_CAN core module") Link: https://lore.kernel.org/all/20220719143550.3681-11-mailhol.vincent@wanadoo.frSigned-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Vincent Mailhol authored
During bus off, the error count is greater than 255 and can not fit in a u8. Fixes: 0024d8ad ("can: usb_8dev: Add support for USB2CAN interface from 8 devices") Link: https://lore.kernel.org/all/20220719143550.3681-10-mailhol.vincent@wanadoo.frSigned-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Vincent Mailhol authored
During bus off, the error count is greater than 255 and can not fit in a u8. Fixes: 7259124e ("can: kvaser_usb: Split driver into kvaser_usb_core.c and kvaser_usb_leaf.c") Link: https://lore.kernel.org/all/20220719143550.3681-9-mailhol.vincent@wanadoo.fr CC: Jimmy Assarsson <extja@kvaser.com> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Vincent Mailhol authored
During bus off, the error count is greater than 255 and can not fit in a u8. Fixes: aec5fb22 ("can: kvaser_usb: Add support for Kvaser USB hydra family") Link: https://lore.kernel.org/all/20220719143550.3681-8-mailhol.vincent@wanadoo.fr CC: Jimmy Assarsson <extja@kvaser.com> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Vincent Mailhol authored
During bus off, the error count is greater than 255 and can not fit in a u8. Fixes: 0738eff1 ("can: Allwinner A10/A20 CAN Controller support - Kernel module") Link: https://lore.kernel.org/all/20220719143550.3681-7-mailhol.vincent@wanadoo.fr CC: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Vincent Mailhol authored
During bus off, the error count is greater than 255 and can not fit in a u8. Fixes: 57e83fb9 ("can: hi311x: Add Holt HI-311x CAN driver") Link: https://lore.kernel.org/all/20220719143550.3681-6-mailhol.vincent@wanadoo.frSigned-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Vincent Mailhol authored
During bus off, the error count is greater than 255 and can not fit in a u8. alloc_can_err_skb() already sets cf to NULL if the allocation fails [1], so the redundant cf = NULL assignment gets removed. [1] https://elixir.bootlin.com/linux/latest/source/drivers/net/can/dev/skb.c#L187 Fixes: 0a9cdcf0 ("can: slcan: extend the protocol with CAN state info") Link: https://lore.kernel.org/all/20220719143550.3681-5-mailhol.vincent@wanadoo.fr CC: Dario Binacchi <dario.binacchi@amarulasolutions.com> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Vincent Mailhol authored
During bus off, the error count is greater than 255 and can not fit in a u8. Fixes: 215db185 ("can: sja1000: Consolidate and unify state change handling") Link: https://lore.kernel.org/all/20220719143550.3681-4-mailhol.vincent@wanadoo.frSigned-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Vincent Mailhol authored
During bus off, the error count is greater than 255 and can not fit in a u8. Fixes: fd115931 ("can: add Renesas R-Car CAN driver") Link: https://lore.kernel.org/all/20220719143550.3681-3-mailhol.vincent@wanadoo.frSigned-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Vincent Mailhol authored
During bus off, the error count is greater than 255 and can not fit in a u8. Fixes: 0c78ab76 ("pch_can: Add setting TEC/REC statistics processing") Link: https://lore.kernel.org/all/20220719143550.3681-2-mailhol.vincent@wanadoo.frSigned-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queueJakub Kicinski authored
Tony Nguyen says: ==================== 1GbE Intel Wired LAN Driver Updates 2022-07-18 This series contains updates to igc driver only. Kurt Kanzenbach adds support for Qbv schedules where one queue stays open in consecutive entries. Sasha removes an unused define and field. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: igc: Remove forced_speed_duplex value igc: Remove MSI-X PBA Clear register igc: Lift TAPRIO schedule restriction ==================== Link: https://lore.kernel.org/r/20220718180109.4114540-1-anthony.l.nguyen@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Davide Caratti authored
the last caller has been removed with commit 96f5e66e ("mac80211: fix aggregation for hardware with ampdu queues"), so it's safe to remove this function. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/703d549e3088367651d92a059743f1be848d74b7.1658133689.git.dcaratti@redhat.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 19 Jul, 2022 4 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/kuba/linuxJakub Kicinski authored
Pavel Begunkov says: ==================== io_uring zerocopy send The patchset implements io_uring zerocopy send. It works with both registered and normal buffers, mixing is allowed but not recommended. Apart from usual request completions, just as with MSG_ZEROCOPY, io_uring separately notifies the userspace when buffers are freed and can be reused (see API design below), which is delivered into io_uring's Completion Queue. Those "buffer-free" notifications are not necessarily per request, but the userspace has control over it and should explicitly attaching a number of requests to a single notification. The series also adds some internal optimisations when used with registered buffers like removing page referencing. From the kernel networking perspective there are two main changes. The first one is passing ubuf_info into the network layer from io_uring (inside of an in kernel struct msghdr). This allows extra optimisations, e.g. ubuf_info caching on the io_uring side, but also helps to avoid cross-referencing and synchronisation problems. The second part is an optional optimisation removing page referencing for requests with registered buffers. Benchmarking UDP with an optimised version of the selftest (see [1]), which sends a bunch of requests, waits for completions and repeats. "+ flush" column posts one additional "buffer-free" notification per request, and just "zc" doesn't post buffer notifications at all. NIC (requests / second): IO size | non-zc | zc | zc + flush 4000 | 495134 | 606420 (+22%) | 558971 (+12%) 1500 | 551808 | 577116 (+4.5%) | 565803 (+2.5%) 1000 | 584677 | 592088 (+1.2%) | 560885 (-4%) 600 | 596292 | 598550 (+0.4%) | 555366 (-6.7%) dummy (requests / second): IO size | non-zc | zc | zc + flush 8000 | 1299916 | 2396600 (+84%) | 2224219 (+71%) 4000 | 1869230 | 2344146 (+25%) | 2170069 (+16%) 1200 | 2071617 | 2361960 (+14%) | 2203052 (+6%) 600 | 2106794 | 2381527 (+13%) | 2195295 (+4%) Previously it also brought a massive performance speedup compared to the msg_zerocopy tool (see [3]), which is probably not super interesting. There is also an additional bunch of refcounting optimisations that was omitted from the series for simplicity and as they don't change the picture drastically, they will be sent as follow up, as well as flushing optimisations closing the performance gap b/w two last columns. For TCP on localhost (with hacks enabling localhost zerocopy) and including additional overhead for receive: IO size | non-zc | zc 1200 | 4174 | 4148 4096 | 7597 | 11228 Using a real NIC 1200 bytes, zc is worse than non-zc ~5-10%, maybe the omitted optimisations will somewhat help, should look better for 4000, but couldn't test properly because of setup problems. Links: liburing (benchmark + tests): [1] https://github.com/isilence/liburing/tree/zc_v4 kernel repo: [2] https://github.com/isilence/linux/tree/zc_v4 RFC v1: [3] https://lore.kernel.org/io-uring/cover.1638282789.git.asml.silence@gmail.com/ RFC v2: https://lore.kernel.org/io-uring/cover.1640029579.git.asml.silence@gmail.com/ Net patches based: git@github.com:isilence/linux.git zc_v4-net-base or https://github.com/isilence/linux/tree/zc_v4-net-base API design overview: The series introduces an io_uring concept of notifactors. From the userspace perspective it's an entity to which it can bind one or more requests and then requesting to flush it. Flushing a notifier makes it impossible to attach new requests to it, and instructs the notifier to post a completion once all requests attached to it are completed and the kernel doesn't need the buffers anymore. Notifications are stored in notification slots, which should be registered as an array in io_uring. Each slot stores only one notifier at any particular moment. Flushing removes it from the slot and the slot automatically replaces it with a new notifier. All operations with notifiers are done by specifying an index of a slot it's currently in. When registering a notification the userspace specifies a u64 tag for each slot, which will be copied in notification completion entries as cqe::user_data. cqe::res is 0 and cqe::flags is equal to wrap around u32 sequence number counting notifiers of a slot. ==================== Link: https://lore.kernel.org/r/cover.1657643355.git.asml.silence@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Pavel Begunkov authored
Teach tcp how to use external ubuf_info provided in msghdr and also prepare it for managed frags by sprinkling skb_zcopy_downgrade_managed() when it could mix managed and not managed frags. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Pavel Begunkov authored
Teach ipv6/udp how to use external ubuf_info provided in msghdr and also prepare it for managed frags by sprinkling skb_zcopy_downgrade_managed() when it could mix managed and not managed frags. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Pavel Begunkov authored
Teach ipv4/udp how to use external ubuf_info provided in msghdr and also prepare it for managed frags by sprinkling skb_zcopy_downgrade_managed() when it could mix managed and not managed frags. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-