- 08 Jul, 2022 1 commit
-
-
Jie Wang authored
Currently NIC packet receiving performance based on page pool deteriorates occasionally. To analysis the causes of this problem page allocation stats are collected. Here are the stats when NIC rx performance deteriorates: bandwidth(Gbits/s) 16.8 6.91 rx_pp_alloc_fast 13794308 21141869 rx_pp_alloc_slow 108625 166481 rx_pp_alloc_slow_h 0 0 rx_pp_alloc_empty 8192 8192 rx_pp_alloc_refill 0 0 rx_pp_alloc_waive 100433 158289 rx_pp_recycle_cached 0 0 rx_pp_recycle_cache_full 0 0 rx_pp_recycle_ring 362400 420281 rx_pp_recycle_ring_full 6064893 9709724 rx_pp_recycle_released_ref 0 0 The rx_pp_alloc_waive count indicates that a large number of pages' numa node are inconsistent with the NIC device numa node. Therefore these pages can't be reused by the page pool. As a result, many new pages would be allocated by __page_pool_alloc_pages_slow which is time consuming. This causes the NIC rx performance fluctuations. The main reason of huge numa mismatch pages in page pool is that page pool uses alloc_pages_bulk_array to allocate original pages. This function is not suitable for page allocation in NUMA scenario. So this patch uses alloc_pages_bulk_array_node which has a NUMA id input parameter to ensure the NUMA consistent between NIC device and allocated pages. Repeated NIC rx performance tests are performed 40 times. NIC rx bandwidth is higher and more stable compared to the datas above. Here are three test stats, the rx_pp_alloc_waive count is zero and rx_pp_alloc_slow which indicates pages allocated from slow patch is relatively low. bandwidth(Gbits/s) 93 93.9 93.8 rx_pp_alloc_fast 60066264 61266386 60938254 rx_pp_alloc_slow 16512 16517 16539 rx_pp_alloc_slow_ho 0 0 0 rx_pp_alloc_empty 16512 16517 16539 rx_pp_alloc_refill 473841 481910 481585 rx_pp_alloc_waive 0 0 0 rx_pp_recycle_cached 0 0 0 rx_pp_recycle_cache_full 0 0 0 rx_pp_recycle_ring 29754145 30358243 30194023 rx_pp_recycle_ring_full 0 0 0 rx_pp_recycle_released_ref 0 0 0 Signed-off-by: Jie Wang <wangjie125@huawei.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Link: https://lore.kernel.org/r/20220705113515.54342-1-huangguangbin2@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 07 Jul, 2022 23 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski authored
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds authored
Pull networking fixes from Paolo Abeni: "Including fixes from bpf, netfilter, can, and bluetooth. Current release - regressions: - bluetooth: fix deadlock on hci_power_on_sync Previous releases - regressions: - sched: act_police: allow 'continue' action offload - eth: usbnet: fix memory leak in error case - eth: ibmvnic: properly dispose of all skbs during a failover Previous releases - always broken: - bpf: - fix insufficient bounds propagation from adjust_scalar_min_max_vals - clear page contiguity bit when unmapping pool - netfilter: nft_set_pipapo: release elements in clone from abort path - mptcp: netlink: issue MP_PRIO signals from userspace PMs - can: - rcar_canfd: fix data transmission failed on R-Car V3U - gs_usb: gs_usb_open/close(): fix memory leak Misc: - add Wenjia as SMC maintainer" * tag 'net-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits) wireguard: Kconfig: select CRYPTO_CHACHA_S390 crypto: s390 - do not depend on CRYPTO_HW for SIMD implementations wireguard: selftests: use microvm on x86 wireguard: selftests: always call kernel makefile wireguard: selftests: use virt machine on m68k wireguard: selftests: set fake real time in init r8169: fix accessing unset transport header net: rose: fix UAF bug caused by rose_t0timer_expiry usbnet: fix memory leak in error case Revert "tls: rx: move counting TlsDecryptErrors for sync" mptcp: update MIB_RMSUBFLOW in cmd_sf_destroy mptcp: fix local endpoint accounting selftests: mptcp: userspace PM support for MP_PRIO signals mptcp: netlink: issue MP_PRIO signals from userspace PMs mptcp: Acquire the subflow socket lock before modifying MP_PRIO flags mptcp: Avoid acquiring PM lock for subflow priority changes mptcp: fix locking in mptcp_nl_cmd_sf_destroy() net/mlx5e: Fix matchall police parameters validation net/sched: act_police: allow 'continue' action offload net: lan966x: hardcode the number of external ports ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrlLinus Torvalds authored
Pull pin control fixes from Linus Walleij: - Tag Intel pin control as supported in MAINTAINERS - Fix a NULL pointer exception in the Aspeed driver - Correct some NAND functions in the Sunxi A83T driver - Use the right offset for some Sunxi pins - Fix a zero base offset in the Freescale (NXP) i.MX93 - Fix the IRQ support in the STM32 driver * tag 'pinctrl-v5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: stm32: fix optional IRQ support to gpios pinctrl: imx: Add the zero base flag for imx93 pinctrl: sunxi: sunxi_pconf_set: use correct offset pinctrl: sunxi: a83t: Fix NAND function name for some pins pinctrl: aspeed: Fix potential NULL dereference in aspeed_pinmux_set_mux() MAINTAINERS: Update Intel pin control to Supported
-
Linus Torvalds authored
These are indeed "should not happen" situations, but it turns out recent changes made the 'task_is_stopped_or_trace()' case trigger (fix for that exists, is pending more testing), and the BUG_ON() makes it unnecessarily hard to actually debug for no good reason. It's been that way for a long time, but let's make it clear: BUG_ON() is not good for debugging, and should never be used in situations where you could just say "this shouldn't happen, but we can continue". Use WARN_ON_ONCE() instead to make sure it gets logged, and then just continue running. Instead of making the system basically unusuable because you crashed the machine while potentially holding some very core locks (eg this function is commonly called while holding 'tasklist_lock' for writing). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kuniyuki Iwashima authored
Commit 6dd4142f ("Merge branch 'af_unix-per-netns-socket-hash'") and commit 51bae889 ("af_unix: Put pathname sockets in the global hash table.") changed a hash table layout. Before: unix_socket_table [0 - 255] : abstract & pathname sockets [256 - 511] : unnamed sockets After: per-netns table [0 - 255] : abstract & pathname sockets [256 - 511] : unnamed sockets bsd_socket_table [0 - 255] : pathname sockets (sk_bind_node) Now, while looking up sockets, we traverse the global table for the pathname sockets and the first half of each per-netns hash table for abstract sockets, where pathname sockets are also linked. Thus, the more pathname sockets we have, the longer we take to look up abstract sockets. This characteristic has been there before the layout change, but we can improve it now. This patch changes the per-netns hash table's layout so that sockets not requiring lookup reside in the first half and do not impact the lookup of abstract sockets. per-netns table [0 - 255] : pathname & unnamed sockets [256 - 511] : abstract sockets bsd_socket_table [0 - 255] : pathname sockets (sk_bind_node) We have run a test that bind()s 100,000 abstract/pathname sockets for each, bind()s an abstract socket 100,000 times and measures the time on __unix_find_socket_byname(). The result shows that the patch makes each lookup faster. Without this patch: $ sudo ./funclatency -p 2278 --microseconds __unix_find_socket_byname.isra.44 usec : count distribution 0 -> 1 : 0 | | 2 -> 3 : 0 | | 4 -> 7 : 0 | | 8 -> 15 : 126 | | 16 -> 31 : 1438 |* | 32 -> 63 : 4150 |*** | 64 -> 127 : 9049 |******* | 128 -> 255 : 37704 |******************************* | 256 -> 511 : 47533 |****************************************| With this patch: $ sudo ./funclatency -p 3648 --microseconds __unix_find_socket_byname.isra.46 usec : count distribution 0 -> 1 : 109 | | 2 -> 3 : 318 | | 4 -> 7 : 725 | | 8 -> 15 : 2501 |* | 16 -> 31 : 3061 |** | 32 -> 63 : 4028 |*** | 64 -> 127 : 9312 |******* | 128 -> 255 : 51372 |****************************************| 256 -> 511 : 28574 |********************** | Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20220705233715.759-1-kuniyu@amazon.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Jakub Kicinski authored
Jason A. Donenfeld says: ==================== wireguard patches for 5.19-rc6 1) A few small fixups to the selftests, per usual. Of particular note is a fix for a test flake that occurred on especially fast systems that boot in less than a second. 2) An addition during this cycle of some s390 crypto interacted with the way wireguard selects dependencies, resulting in linker errors reported by the kernel test robot. So Vladis sent in a patch for that, which also required a small preparatory fix moving some Kconfig symbols around. ==================== Link: https://lore.kernel.org/r/20220707003157.526645-1-Jason@zx2c4.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Vladis Dronov authored
Select the new implementation of CHACHA20 for S390 when available. It is faster than the generic software implementation, but also prevents some linker errors in certain situations. Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/linux-kernel/202207030630.6SZVkrWf-lkp@intel.com/Signed-off-by: Vladis Dronov <vdronov@redhat.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jason A. Donenfeld authored
Various accelerated software implementation Kconfig values for S390 were mistakenly placed into drivers/crypto/Kconfig, even though they're mainly just SIMD code and live in arch/s390/crypto/ like usual. This gives them the very unusual dependency on CRYPTO_HW, which leads to problems elsewhere. This patch fixes the issue by moving the Kconfig values for non-hardware drivers into the usual place in crypto/Kconfig. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jason A. Donenfeld authored
This makes for faster tests, faster compile time, and allows us to ditch ACPI finally. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jason A. Donenfeld authored
These selftests are used for much more extensive changes than just the wireguard source files. So always call the kernel's build file, which will do something or nothing after checking the whole tree, per usual. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jason A. Donenfeld authored
This should be a bit more stable hopefully. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jason A. Donenfeld authored
Not all platforms have an RTC, and rather than trying to force one into each, it's much easier to just set a fixed time. This is necessary because WireGuard's latest handshakes parameter is returned in wallclock time, and if the system time isn't set, and the system is really fast, then this returns 0, which trips the test. Turning this on requires setting CONFIG_COMPAT_32BIT_TIME=y, as musl doesn't support settimeofday without it. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Christophe JAILLET authored
Use bitmap_empty() instead of hand-writing it. It is less verbose and it improves the semantic. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/78713a72414b99f673c3a9ec0519bb41c080935a.1657053343.git.christophe.jaillet@wanadoo.frSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Christophe JAILLET authored
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them. It is less verbose and it improves the semantic. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/d61ec77ce0b92f7539c6a144106139f8d737ec29.1657053343.git.christophe.jaillet@wanadoo.frSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Christophe JAILLET authored
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them. It is less verbose and it improves the semantic. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/521bd2a49be5d88e493bcfb63505d3df91a1c2d2.1657052743.git.christophe.jaillet@wanadoo.frSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Christophe JAILLET authored
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them. It is less verbose and it improves the semantic. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/d508f3adf7e2804f4d3793271b82b196a2ccb940.1657052562.git.christophe.jaillet@wanadoo.frSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Christophe JAILLET authored
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them. It is less verbose and it improves the semantic. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://lore.kernel.org/r/c62c1774e6a34bc64323ce526b385aa87c1ca575.1657049799.git.christophe.jaillet@wanadoo.frSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Christophe JAILLET authored
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them. It is less verbose and it improves the semantic. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://lore.kernel.org/r/717ba530215f4d7ce9fedcc73d98dba1f70d7f71.1657049636.git.christophe.jaillet@wanadoo.frSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Heiner Kallweit authored
66e4c8d9 ("net: warn if transport header was not set") added a check that triggers a warning in r8169, see [0]. The commit referenced in the Fixes tag refers to the change from which the patch applies cleanly, there's nothing wrong with this commit. It seems the actual issue (not bug, because the warning is harmless here) was introduced with bdfa4ed6 ("r8169: use Giant Send"). [0] https://bugzilla.kernel.org/show_bug.cgi?id=216157 Fixes: 8d520b4d ("r8169: work around RTL8125 UDP hw bug") Reported-by: Erhard F. <erhard_f@mailbox.org> Tested-by: Erhard F. <erhard_f@mailbox.org> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/1b2c2b29-3dc0-f7b6-5694-97ec526d51a0@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Yang Yingliang authored
Remove unnecessary spi_set_drvdata() in b53_spi_remove(), the driver_data will be set to NULL in device_unbind_cleanup() after calling ->remove(). Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220705131733.351962-1-yangyingliang@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Duoming Zhou authored
There are UAF bugs caused by rose_t0timer_expiry(). The root cause is that del_timer() could not stop the timer handler that is running and there is no synchronization. One of the race conditions is shown below: (thread 1) | (thread 2) | rose_device_event | rose_rt_device_down | rose_remove_neigh rose_t0timer_expiry | rose_stop_t0timer(rose_neigh) ... | del_timer(&neigh->t0timer) | kfree(rose_neigh) //[1]FREE neigh->dce_mode //[2]USE | The rose_neigh is deallocated in position [1] and use in position [2]. The crash trace triggered by POC is like below: BUG: KASAN: use-after-free in expire_timers+0x144/0x320 Write of size 8 at addr ffff888009b19658 by task swapper/0/0 ... Call Trace: <IRQ> dump_stack_lvl+0xbf/0xee print_address_description+0x7b/0x440 print_report+0x101/0x230 ? expire_timers+0x144/0x320 kasan_report+0xed/0x120 ? expire_timers+0x144/0x320 expire_timers+0x144/0x320 __run_timers+0x3ff/0x4d0 run_timer_softirq+0x41/0x80 __do_softirq+0x233/0x544 ... This patch changes rose_stop_ftimer() and rose_stop_t0timer() in rose_remove_neigh() to del_timer_sync() in order that the timer handler could be finished before the resources such as rose_neigh and so on are deallocated. As a result, the UAF bugs could be mitigated. Fixes: 1da177e4 ("Linux-2.6.12-rc2") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Link: https://lore.kernel.org/r/20220705125610.77971-1-duoming@zju.edu.cnSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Oliver Neukum authored
usbnet_write_cmd_async() mixed up which buffers need to be freed in which error case. v2: add Fixes tag v3: fix uninitialized buf pointer Fixes: 877bd862 ("usbnet: introduce usbnet 3 command helpers") Signed-off-by: Oliver Neukum <oneukum@suse.com> Link: https://lore.kernel.org/r/20220705125351.17309-1-oneukum@suse.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
This reverts commit 2ef8e39f, reversing changes made to e7ce9fc9. There are build warnings here which break the normal build due to -Werror. Ratheesh was nice enough to quickly follow up with fixes but didn't hit all the warnings I see on GCC 12 so to unlock net-next from taking patches let get this series out for now. Link: https://lore.kernel.org/r/20220707013201.1372433-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 06 Jul, 2022 16 commits
-
-
https://github.com/openrisc/linuxLinus Torvalds authored
Pull OpenRISC fixes from Stafford Horne: "Fixups for OpenRISC found during recent testing: - An OpenRISC irqchip fix to stop acking level interrupts which was causing issues on SMP platforms - A comment typo fix in our unwinder code" * tag 'for-linus' of https://github.com/openrisc/linux: openrisc: unwinder: Fix grammar issue in comment irqchip: or1k-pic: Undefine mask_ack for level triggered hardware
-
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/soundLinus Torvalds authored
Pull sound fixes from Takashi Iwai: "This became largish as it includes the pending ASoC fixes. Almost all changes are device-specific small fixes, while many of them are coverage for mixer issues that were detected by selftest. In addition, usual suspects for HD/USB-audio are there" * tag 'sound-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (43 commits) ALSA: cs46xx: Fix missing snd_card_free() call at probe error ALSA: usb-audio: Add quirk for Fiero SC-01 (fw v1.0.0) ALSA: usb-audio: Add quirk for Fiero SC-01 ALSA: hda/realtek: Add quirk for Clevo L140PU ALSA: usb-audio: Add quirks for MacroSilicon MS2100/MS2106 devices ASoC: madera: Fix event generation for rate controls ASoC: madera: Fix event generation for OUT1 demux ASoC: cs47l15: Fix event generation for low power mux control ASoC: cs35l41: Add ASP TX3/4 source to register patch ASoC: dapm: Initialise kcontrol data for mux/demux controls ASoC: rt711-sdca: fix kernel NULL pointer dereference when IO error ASoC: cs35l41: Correct some control names ASoC: wm5110: Fix DRE control ASoC: wm_adsp: Fix event for preloader MAINTAINERS: update ASoC Qualcomm maintainer email-id ASoC: rockchip: i2s: switch BCLK to GPIO ASoC: SOF: Intel: disable IMR boot when resuming from ACPI S4 and S5 states ASoC: SOF: pm: add definitions for S4 and S5 states ASoC: SOF: pm: add explicit behavior for ACPI S1 and S2 ASoC: SOF: Intel: hda: Fix compressed stream position tracking ...
-
Gal Pressman authored
This reverts commit 284b4d93. When using TLS device offload and coming from tls_device_reencrypt() flow, -EBADMSG error in tls_do_decryption() should not be counted towards the TLSTlsDecryptError counter. Move the counter increase back to the decrypt_internal() call site in decrypt_skb_update(). This also fixes an issue where: if (n_sgin < 1) return -EBADMSG; Errors in decrypt_internal() were not counted after the cited patch. Fixes: 284b4d93 ("tls: rx: move counting TlsDecryptErrors for sync") Cc: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Qiao Ma says: ==================== net: hinic: fix bugs about dev_get_stats These patches fixes 2 bugs of hinic driver: - fix bug that ethtool get wrong stats because of hinic_{txq|rxq}_clean_stats() is called - avoid kernel hung in hinic_get_stats64() See every patch for more information. Changes in v4: - removed meaningless u64_stats_sync protection in hinic_{txq|rxq}_get_stats - merged the third patch in v2 into first one Changes in v3: - fixes a compile warning reported by kernel test robot <lkp@intel.com> Changes in v2: - fixes another 2 bugs. (v1 is a single patch, see: https://lore.kernel.org/all/07736c2b7019b6883076a06129e06e8f7c5f7154.1656487154.git.mqaio@linux.alibaba.com/). - to fix extra bugs, hinic_dev.tx_stats/rx_stats is removed, so there is no need to use spinlock or semaphore now. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Qiao Ma authored
When using hinic device as a bond slave device, and reading device stats of master bond device, the kernel may hung. The kernel panic calltrace as follows: Kernel panic - not syncing: softlockup: hung tasks Call trace: native_queued_spin_lock_slowpath+0x1ec/0x31c dev_get_stats+0x60/0xcc dev_seq_printf_stats+0x40/0x120 dev_seq_show+0x1c/0x40 seq_read_iter+0x3c8/0x4dc seq_read+0xe0/0x130 proc_reg_read+0xa8/0xe0 vfs_read+0xb0/0x1d4 ksys_read+0x70/0xfc __arm64_sys_read+0x20/0x30 el0_svc_common+0x88/0x234 do_el0_svc+0x2c/0x90 el0_svc+0x1c/0x30 el0_sync_handler+0xa8/0xb0 el0_sync+0x148/0x180 And the calltrace of task that actually caused kernel hungs as follows: __switch_to+124 __schedule+548 schedule+72 schedule_timeout+348 __down_common+188 __down+24 down+104 hinic_get_stats64+44 [hinic] dev_get_stats+92 bond_get_stats+172 [bonding] dev_get_stats+92 dev_seq_printf_stats+60 dev_seq_show+24 seq_read_iter+964 seq_read+220 proc_reg_read+164 vfs_read+172 ksys_read+108 __arm64_sys_read+28 el0_svc_common+132 do_el0_svc+40 el0_svc+24 el0_sync_handler+164 el0_sync+324 When getting device stats from bond, kernel will call bond_get_stats(). It first holds the spinlock bond->stats_lock, and then call hinic_get_stats64() to collect hinic device's stats. However, hinic_get_stats64() calls `down(&nic_dev->mgmt_lock)` to protect its critical section, which may schedule current task out. And if system is under high pressure, the task cannot be woken up immediately, which eventually triggers kernel hung panic. Since previous patch has replaced hinic_dev.tx_stats/rx_stats with local variable in hinic_get_stats64(), there is nothing need to be protected by lock, so just removing down()/up() is ok. Fixes: edd384f6 ("net-next/hinic: Add ethtool and stats") Signed-off-by: Qiao Ma <mqaio@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Qiao Ma authored
Function hinic_get_stats64() will do two operations: 1. reads stats from every hinic_rxq/txq and accumulates them 2. calls hinic_rxq/txq_clean_stats() to clean every rxq/txq's stats For hinic_get_stats64(), it could get right data, because it sums all data to nic_dev->rx_stats/tx_stats. But it is wrong for get_drv_queue_stats(), this function will read hinic_rxq's stats, which have been cleared to zero by hinic_get_stats64(). I have observed hinic's cleanup operation by using such command: > watch -n 1 "cat ethtool -S eth4 | tail -40" Result before: ... rxq7_pkts: 1 rxq7_bytes: 90 rxq7_errors: 0 rxq7_csum_errors: 0 rxq7_other_errors: 0 ... rxq9_pkts: 11 rxq9_bytes: 726 rxq9_errors: 0 rxq9_csum_errors: 0 rxq9_other_errors: 0 ... rxq11_pkts: 0 rxq11_bytes: 0 rxq11_errors: 0 rxq11_csum_errors: 0 rxq11_other_errors: 0 Result after a few seconds: ... rxq7_pkts: 0 rxq7_bytes: 0 rxq7_errors: 0 rxq7_csum_errors: 0 rxq7_other_errors: 0 ... rxq9_pkts: 2 rxq9_bytes: 132 rxq9_errors: 0 rxq9_csum_errors: 0 rxq9_other_errors: 0 ... rxq11_pkts: 1 rxq11_bytes: 170 rxq11_errors: 0 rxq11_csum_errors: 0 rxq11_other_errors: 0 To solve this problem, we just keep every queue's total stats in their own queue (aka hinic_{rxq|txq}), and simply sum all per-queue stats every time calling hinic_get_stats64(). With that solution, there is no need to clean per-queue stats now, and there is no need to maintain global hinic_dev.{tx|rx}_stats, too. Fixes: edd384f6 ("net-next/hinic: Add ethtool and stats") Signed-off-by: Qiao Ma <mqaio@linux.alibaba.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jakub Kicinski says: ==================== tls: rx: nopad and backlog flushing This small series contains the two changes I've been working towards in the previous ~50 patches a couple of months ago. The first major change is the optional "nopad" optimization. Currently TLS 1.3 Rx performs quite poorly because it does not support the "zero-copy" or rather direct decrypt to a user space buffer. Because of TLS 1.3 record padding we don't know if a record contains data or a control message until we decrypt it. Most records will contain data, tho, so the optimization is to try the decryption hoping its data and retry if it wasn't. The performance gain from doing that is significant (~40%) but if I'm completely honest the major reason is that we call skb_cow_data() on the non-"zc" path. The next series will remove the CoW, dropping the gain to only ~10%. The second change is to flush the backlog every 128kB. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
We continuously hold the socket lock during large reads and writes. This may inflate RTT and negatively impact TCP performance. Flush the backlog periodically. I tried to pick a flush period (128kB) which gives significant benefit but the max Bps rate is not yet visibly impacted. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Add a self-test variant with TLS 1.3 nopad set. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Since optimisitic decrypt may add extra load in case of retries require socket owner to explicitly opt-in. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
We currently don't support decrypt to user buffer with TLS 1.3 because we don't know the record type and how much padding record contains before decryption. In practice data records are by far most common and padding gets used rarely so we can assume data record, no padding, and if we find out that wasn't the case - retry the crypto in place (decrypt to skb). To safeguard from user overwriting content type and padding before we can check it attach a 1B sg entry where last byte of the record will land. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
To make future patches easier to review make data_len contain the length of the data, without the tail. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Mat Martineau says: ==================== mptcp: Path manager fixes for 5.19 The MPTCP userspace path manager is new in 5.19, and these patches fix some issues in that new code. Patches 1-3 fix path manager locking issues. Patches 4 and 5 allow userspace path managers to change priority of established subflows using the existing MPTCP_PM_CMD_SET_FLAGS generic netlink command. Includes corresponding self test update. Patches 6 and 7 fix accounting of available endpoint IDs and the MPTCP_MIB_RMSUBFLOW counter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geliang Tang authored
This patch increases MPTCP_MIB_RMSUBFLOW mib counter in userspace pm destroy subflow function mptcp_nl_cmd_sf_destroy() when removing subflow. Fixes: 702c2f64 ("mptcp: netlink: allow userspace-driven subflow establishment") Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
In mptcp_pm_nl_rm_addr_or_subflow() we always mark as available the id corresponding to the just removed address. The used bitmap actually tracks only the local IDs: we must restrict the operation when a (local) subflow is removed. Fixes: a88c9e49 ("mptcp: do not block subflows creation on errors") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kishen Maloor authored
This change updates the testing sample (pm_nl_ctl) to exercise the updated MPTCP_PM_CMD_SET_FLAGS command for userspace PMs to issue MP_PRIO signals over the selected subflow. E.g. ./pm_nl_ctl set 10.0.1.2 port 47234 flags backup token 823274047 rip 10.0.1.1 rport 50003 userspace_pm.sh has a new selftest that invokes this command. Fixes: 259a834f ("selftests: mptcp: functional tests for the userspace PM type") Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Kishen Maloor <kishen.maloor@intel.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-