1. 08 Jul, 2022 1 commit
    • net: page_pool: optimize page pool page allocation in NUMA scenario · d810d367
      Jie Wang authored
      Currently, NIC packet receive performance based on the page pool
      occasionally deteriorates. To analyze the cause of this problem, page
      allocation stats were collected. Here are the stats when NIC rx
      performance deteriorates:
      
      bandwidth(Gbits/s)		16.8		6.91
      rx_pp_alloc_fast		13794308	21141869
      rx_pp_alloc_slow		108625		166481
      rx_pp_alloc_slow_ho		0		0
      rx_pp_alloc_empty		8192		8192
      rx_pp_alloc_refill		0		0
      rx_pp_alloc_waive		100433		158289
      rx_pp_recycle_cached		0		0
      rx_pp_recycle_cache_full	0		0
      rx_pp_recycle_ring		362400		420281
      rx_pp_recycle_ring_full		6064893		9709724
      rx_pp_recycle_released_ref	0		0
      
      The rx_pp_alloc_waive count indicates that a large number of pages
      reside on a NUMA node other than the NIC device's NUMA node, so they
      can't be reused by the page pool. As a result, many new pages have to be
      allocated by __page_pool_alloc_pages_slow(), which is time-consuming and
      causes the NIC rx performance fluctuations.
      
      The main reason for the large number of NUMA-mismatched pages in the
      page pool is that the page pool uses alloc_pages_bulk_array() to
      allocate its pages. This function takes no NUMA node argument, so it is
      not suitable for page allocation in a NUMA scenario. This patch therefore
      uses alloc_pages_bulk_array_node(), which takes a NUMA node id
      parameter, to keep the allocated pages on the same NUMA node as the NIC
      device.
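      To make the refill/waive mechanism concrete, here is a minimal userspace
      C model of it. It is a simulation under simplifying assumptions, not
      page_pool code: struct sim_page, sim_bulk_alloc() and sim_refill() are
      hypothetical, and a page is reduced to the NUMA node it lives on.

```c
#include <stddef.h>

/* Hypothetical, simplified model: a page is reduced to its NUMA node. */
struct sim_page {
	int nid;
};

struct sim_pool_stats {
	unsigned long refill; /* recycled pages moved back into the alloc cache */
	unsigned long waive;  /* recycled pages released for being on the wrong node */
};

/* Models the bulk allocator: every page lands on node 'alloc_nid'.
 * With alloc_pages_bulk_array() there is no node argument at all; with
 * alloc_pages_bulk_array_node() the caller chooses the node. */
static void sim_bulk_alloc(struct sim_page *pages, size_t n, int alloc_nid)
{
	for (size_t i = 0; i < n; i++)
		pages[i].nid = alloc_nid;
}

/* Models the refill check: a recycled page is reused only if it sits on the
 * pool's preferred node (the NIC's node); otherwise it is released.
 * Simplification: unlike the real refill loop, this model just classifies
 * every page it is given. */
static void sim_refill(const struct sim_page *ring, size_t n, int pool_nid,
		       struct sim_pool_stats *st)
{
	for (size_t i = 0; i < n; i++) {
		if (ring[i].nid == pool_nid)
			st->refill++;
		else
			st->waive++;
	}
}
```

      Allocating on node 1 for a pool whose NIC sits on node 0 turns every
      recycled page into a waive; allocating on node 0 makes every page
      reusable, which mirrors rx_pp_alloc_waive dropping to zero in the stats
      that follow.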
      
      The NIC rx performance test was repeated 40 times. NIC rx bandwidth is
      higher and more stable than in the data above. Below are the stats from
      three of the tests: the rx_pp_alloc_waive count is zero, and
      rx_pp_alloc_slow, which counts pages allocated from the slow path, is
      relatively low.
      
      bandwidth(Gbits/s)		93		93.9		93.8
      rx_pp_alloc_fast		60066264	61266386	60938254
      rx_pp_alloc_slow		16512		16517		16539
      rx_pp_alloc_slow_ho		0		0		0
      rx_pp_alloc_empty		16512		16517		16539
      rx_pp_alloc_refill		473841		481910		481585
      rx_pp_alloc_waive		0		0		0
      rx_pp_recycle_cached		0		0		0
      rx_pp_recycle_cache_full	0		0		0
      rx_pp_recycle_ring		29754145	30358243	30194023
      rx_pp_recycle_ring_full		0		0		0
      rx_pp_recycle_released_ref	0		0		0
      
      Signed-off-by: Jie Wang <wangjie125@huawei.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Link: https://lore.kernel.org/r/20220705113515.54342-1-huangguangbin2@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 07 Jul, 2022 23 commits
  3. 06 Jul, 2022 16 commits
    • Merge tag 'for-linus' of https://github.com/openrisc/linux · 9f09069c
      Linus Torvalds authored
      Pull OpenRISC fixes from Stafford Horne:
       "Fixups for OpenRISC found during recent testing:
      
         - An OpenRISC irqchip fix to stop acking level interrupts which was
           causing issues on SMP platforms
      
         - A comment typo fix in our unwinder code"
      
      * tag 'for-linus' of https://github.com/openrisc/linux:
        openrisc: unwinder: Fix grammar issue in comment
        irqchip: or1k-pic: Undefine mask_ack for level triggered hardware
    • Merge tag 'sound-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · c3850b3f
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "This became largish as it includes the pending ASoC fixes.
      
        Almost all changes are device-specific small fixes, while many of them
        are coverage for mixer issues that were detected by selftest. In
        addition, usual suspects for HD/USB-audio are there"
      
      * tag 'sound-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (43 commits)
        ALSA: cs46xx: Fix missing snd_card_free() call at probe error
        ALSA: usb-audio: Add quirk for Fiero SC-01 (fw v1.0.0)
        ALSA: usb-audio: Add quirk for Fiero SC-01
        ALSA: hda/realtek: Add quirk for Clevo L140PU
        ALSA: usb-audio: Add quirks for MacroSilicon MS2100/MS2106 devices
        ASoC: madera: Fix event generation for rate controls
        ASoC: madera: Fix event generation for OUT1 demux
        ASoC: cs47l15: Fix event generation for low power mux control
        ASoC: cs35l41: Add ASP TX3/4 source to register patch
        ASoC: dapm: Initialise kcontrol data for mux/demux controls
        ASoC: rt711-sdca: fix kernel NULL pointer dereference when IO error
        ASoC: cs35l41: Correct some control names
        ASoC: wm5110: Fix DRE control
        ASoC: wm_adsp: Fix event for preloader
        MAINTAINERS: update ASoC Qualcomm maintainer email-id
        ASoC: rockchip: i2s: switch BCLK to GPIO
        ASoC: SOF: Intel: disable IMR boot when resuming from ACPI S4 and S5 states
        ASoC: SOF: pm: add definitions for S4 and S5 states
        ASoC: SOF: pm: add explicit behavior for ACPI S1 and S2
        ASoC: SOF: Intel: hda: Fix compressed stream position tracking
        ...
    • Revert "tls: rx: move counting TlsDecryptErrors for sync" · a069a905
      Gal Pressman authored
      This reverts commit 284b4d93.
      
      When using TLS device offload and coming from the tls_device_reencrypt()
      flow, an -EBADMSG error in tls_do_decryption() should not be counted
      towards the TlsDecryptError counter.
      
      Move the counter increment back to the decrypt_internal() call site in
      decrypt_skb_update().
      
      This also fixes an issue where errors returned early from
      decrypt_internal(), such as:
      	if (n_sgin < 1)
      		return -EBADMSG;
      were not counted after the cited patch.
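      A minimal userspace C model of where the counter is incremented after
      this revert. All names (sim_tls_decrypt_errors, sim_decrypt_internal(),
      sim_decrypt_skb_update(), sim_device_reencrypt()) are hypothetical
      stand-ins for the kernel functions named above; only the control flow
      relevant to counting is kept.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the TlsDecryptError MIB counter. */
static unsigned long sim_tls_decrypt_errors;

/* Models tls_do_decryption(): fails with -EBADMSG on an auth failure. */
static int sim_do_decryption(bool auth_ok)
{
	return auth_ok ? 0 : -EBADMSG;
}

/* Models decrypt_internal(): may fail early, before any crypto runs. */
static int sim_decrypt_internal(int n_sgin, bool auth_ok)
{
	if (n_sgin < 1)
		return -EBADMSG;
	return sim_do_decryption(auth_ok);
}

/* Models decrypt_skb_update(): the counter is bumped at this call site,
 * so early failures inside decrypt_internal() are counted too. */
static int sim_decrypt_skb_update(int n_sgin, bool auth_ok)
{
	int err = sim_decrypt_internal(n_sgin, auth_ok);

	if (err == -EBADMSG)
		sim_tls_decrypt_errors++;
	return err;
}

/* Models the tls_device_reencrypt() flow: it does not go through the
 * decrypt_skb_update() call site, so its failures are not counted. */
static int sim_device_reencrypt(bool auth_ok)
{
	return sim_do_decryption(auth_ok);
}
```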
      
      Fixes: 284b4d93 ("tls: rx: move counting TlsDecryptErrors for sync")
      Cc: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Gal Pressman <gal@nvidia.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'hinic-dev_get_stats-fixes' · cd355d0b
      David S. Miller authored
      Qiao Ma says:
      
      ====================
      net: hinic: fix bugs about dev_get_stats
      
      These patches fix 2 bugs in the hinic driver:
      - fix a bug where ethtool gets wrong stats because hinic_{txq|rxq}_clean_stats() is called
      - avoid a kernel hang in hinic_get_stats64()
      
      See each patch for more information.
      
      Changes in v4:
      - removed meaningless u64_stats_sync protection in hinic_{txq|rxq}_get_stats
      - merged the third patch in v2 into the first one
      
      Changes in v3:
      - fix a compile warning reported by kernel test robot <lkp@intel.com>
      
      Changes in v2:
      - fix 2 more bugs (v1 is a single patch, see: https://lore.kernel.org/all/07736c2b7019b6883076a06129e06e8f7c5f7154.1656487154.git.mqaio@linux.alibaba.com/).
      - to fix the extra bugs, hinic_dev.tx_stats/rx_stats are removed, so there is no longer any need for a spinlock or semaphore.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hinic: avoid kernel hung in hinic_get_stats64() · 98f9fcde
      Qiao Ma authored
      When a hinic device is used as a bond slave device, reading the device
      stats of the bond master device may hang the kernel.
      
      The kernel panic call trace is as follows:
      Kernel panic - not syncing: softlockup: hung tasks
      Call trace:
        native_queued_spin_lock_slowpath+0x1ec/0x31c
        dev_get_stats+0x60/0xcc
        dev_seq_printf_stats+0x40/0x120
        dev_seq_show+0x1c/0x40
        seq_read_iter+0x3c8/0x4dc
        seq_read+0xe0/0x130
        proc_reg_read+0xa8/0xe0
        vfs_read+0xb0/0x1d4
        ksys_read+0x70/0xfc
        __arm64_sys_read+0x20/0x30
        el0_svc_common+0x88/0x234
        do_el0_svc+0x2c/0x90
        el0_svc+0x1c/0x30
        el0_sync_handler+0xa8/0xb0
        el0_sync+0x148/0x180
      
      And the call trace of the task that actually caused the hang is as follows:
        __switch_to+124
        __schedule+548
        schedule+72
        schedule_timeout+348
        __down_common+188
        __down+24
        down+104
        hinic_get_stats64+44 [hinic]
        dev_get_stats+92
        bond_get_stats+172 [bonding]
        dev_get_stats+92
        dev_seq_printf_stats+60
        dev_seq_show+24
        seq_read_iter+964
        seq_read+220
        proc_reg_read+164
        vfs_read+172
        ksys_read+108
        __arm64_sys_read+28
        el0_svc_common+132
        do_el0_svc+40
        el0_svc+24
        el0_sync_handler+164
        el0_sync+324
      
      When reading device stats from a bond, the kernel calls bond_get_stats(),
      which first takes the spinlock bond->stats_lock and then calls
      hinic_get_stats64() to collect the hinic device's stats.
      However, hinic_get_stats64() calls `down(&nic_dev->mgmt_lock)` to
      protect its critical section, which may schedule the current task out
      while the spinlock is still held. If the system is under high load, the
      task may not be woken up promptly, which eventually triggers the
      softlockup panic.
      
      Since the previous patch replaced hinic_dev.tx_stats/rx_stats with local
      variables in hinic_get_stats64(), nothing there needs to be protected
      by the lock, so simply removing down()/up() is fine.
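      A hypothetical userspace model of this bug class: calling a function
      that may sleep (down()) while a spinlock is held. The counter-based
      "atomic context" tracking and every sim_* name are assumptions made for
      illustration; in the kernel, the might_sleep() debugging checks enabled
      by CONFIG_DEBUG_ATOMIC_SLEEP exist to catch this kind of mistake.

```c
/* Hypothetical model: number of spinlocks currently held. */
static int sim_spinlocks_held;
/* Number of times a sleeping primitive was called with a spinlock held. */
static int sim_sleep_in_atomic;

static void sim_spin_lock(void)   { sim_spinlocks_held++; }
static void sim_spin_unlock(void) { sim_spinlocks_held--; }

/* Stand-in for down(): it may sleep, which is illegal under a spinlock. */
static void sim_down(void)
{
	if (sim_spinlocks_held > 0)
		sim_sleep_in_atomic++;
}
static void sim_up(void) { }

/* Before the fix: the driver callback takes a sleeping lock. */
static void sim_hinic_get_stats64_old(void)
{
	sim_down();
	/* ... read stats ... */
	sim_up();
}

/* After the fix: per-queue stats are summed into locals, no lock needed. */
static void sim_hinic_get_stats64_new(void)
{
	/* ... sum per-queue stats into local variables ... */
}

/* Models bond_get_stats(): slave callbacks run under bond->stats_lock. */
static void sim_bond_get_stats(void (*slave_get_stats64)(void))
{
	sim_spin_lock();
	slave_get_stats64();
	sim_spin_unlock();
}
```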
      
      Fixes: edd384f6 ("net-next/hinic: Add ethtool and stats")
      Signed-off-by: Qiao Ma <mqaio@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hinic: fix bug that ethtool get wrong stats · 67dffd3d
      Qiao Ma authored
      hinic_get_stats64() performs two operations:
      1. reads the stats from every hinic_rxq/txq and accumulates them
      2. calls hinic_rxq/txq_clean_stats() to clear every rxq/txq's stats
      
      hinic_get_stats64() itself returns correct data, because it accumulates
      everything into nic_dev->rx_stats/tx_stats.
      But get_drv_queue_stats() returns wrong data: it reads each hinic_rxq's
      stats, which may already have been cleared to zero by hinic_get_stats64().
      
      I observed hinic clearing the stats with the following command:
      > watch -n 1 "ethtool -S eth4 | tail -40"
      
      Result before:
           ...
           rxq7_pkts: 1
           rxq7_bytes: 90
           rxq7_errors: 0
           rxq7_csum_errors: 0
           rxq7_other_errors: 0
           ...
           rxq9_pkts: 11
           rxq9_bytes: 726
           rxq9_errors: 0
           rxq9_csum_errors: 0
           rxq9_other_errors: 0
           ...
           rxq11_pkts: 0
           rxq11_bytes: 0
           rxq11_errors: 0
           rxq11_csum_errors: 0
           rxq11_other_errors: 0
      
      Result after a few seconds:
           ...
           rxq7_pkts: 0
           rxq7_bytes: 0
           rxq7_errors: 0
           rxq7_csum_errors: 0
           rxq7_other_errors: 0
           ...
           rxq9_pkts: 2
           rxq9_bytes: 132
           rxq9_errors: 0
           rxq9_csum_errors: 0
           rxq9_other_errors: 0
           ...
           rxq11_pkts: 1
           rxq11_bytes: 170
           rxq11_errors: 0
           rxq11_csum_errors: 0
           rxq11_other_errors: 0
      
      To solve this problem, keep each queue's running totals in the queue
      itself (i.e. hinic_{rxq|txq}), and sum all per-queue stats each time
      hinic_get_stats64() is called.
      With this approach there is no need to clear per-queue stats anymore,
      nor to maintain the global hinic_dev.{tx|rx}_stats.
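      A minimal C model contrasting the two accounting schemes. The reduced
      two-field struct sim_rxq_stats and the sim_get_stats64_*() functions
      are hypothetical; the real driver tracks more fields per queue.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced per-queue stats. */
struct sim_rxq_stats {
	uint64_t pkts;
	uint64_t bytes;
};

struct sim_totals {
	uint64_t pkts;
	uint64_t bytes;
};

/* Old scheme: accumulate into a device-wide total, then clear each queue.
 * Anyone reading the per-queue counters afterwards (ethtool -S) sees zeros. */
static void sim_get_stats64_old(struct sim_rxq_stats *q, size_t nq,
				struct sim_totals *dev_total)
{
	for (size_t i = 0; i < nq; i++) {
		dev_total->pkts += q[i].pkts;
		dev_total->bytes += q[i].bytes;
		q[i].pkts = 0;
		q[i].bytes = 0;
	}
}

/* New scheme: queues keep running totals; every call sums them into locals
 * and leaves the per-queue counters untouched. */
static struct sim_totals sim_get_stats64_new(const struct sim_rxq_stats *q,
					     size_t nq)
{
	struct sim_totals t = { 0, 0 };

	for (size_t i = 0; i < nq; i++) {
		t.pkts += q[i].pkts;
		t.bytes += q[i].bytes;
	}
	return t;
}
```

      Because the new function only reads the queues, it can be called any
      number of times without disturbing what ethtool reports per queue.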
      
      Fixes: edd384f6 ("net-next/hinic: Add ethtool and stats")
      Signed-off-by: Qiao Ma <mqaio@linux.alibaba.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tls-rx-nopad-and-backlog-flushing' · 4874fb94
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      tls: rx: nopad and backlog flushing
      
      This small series contains the two changes I've been working
      towards in the previous ~50 patches a couple of months ago.
      
      The first major change is the optional "nopad" optimization.
      Currently TLS 1.3 Rx performs quite poorly because it does
      not support the "zero-copy" or rather direct decrypt to a user
      space buffer. Because of TLS 1.3 record padding we don't
      know if a record contains data or a control message until
      we decrypt it. Most records will contain data, tho, so the
      optimization is to try the decryption hoping it's data and
      retry if it wasn't.
      
      The performance gain from doing that is significant (~40%)
      but if I'm completely honest the major reason is that we
      call skb_cow_data() on the non-"zc" path. The next series
      will remove the CoW, dropping the gain to only ~10%.
      
      The second change is to flush the backlog every 128kB.
      ====================
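      Why the record type is unknown before decryption: in TLS 1.3 the real
      content type is encrypted inside the record, followed by optional zero
      padding (RFC 8446, section 5.2). A minimal C sketch of recovering it
      from an already-decrypted inner plaintext; tls13_parse_inner() is a
      hypothetical helper, not the kernel's implementation, which works on
      scatterlists.

```c
#include <stddef.h>
#include <stdint.h>

/* ContentType values from RFC 8446, section 5.1. */
enum {
	TLS13_CT_ALERT = 21,
	TLS13_CT_HANDSHAKE = 22,
	TLS13_CT_APPLICATION_DATA = 23,
};

/* Parses a decrypted TLSInnerPlaintext: content || type || zeros.
 * Scans backwards past the zero padding; the first non-zero byte is the
 * content type. Returns the type and stores the content length in
 * *data_len, or returns -1 if the record is all zeros (invalid). */
static int tls13_parse_inner(const uint8_t *buf, size_t len, size_t *data_len)
{
	size_t i = len;

	while (i > 0 && buf[i - 1] == 0)
		i--;
	if (i == 0)
		return -1;
	*data_len = i - 1;
	return buf[i - 1];
}
```

      With no padding, the type is simply the last byte, which is exactly
      the case the "nopad" optimization bets on.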
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tls: rx: periodically flush socket backlog · c46b0183
      Jakub Kicinski authored
      We continuously hold the socket lock during large reads and writes.
      This may inflate RTT and negatively impact TCP performance.
      Flush the backlog periodically. I tried to pick a flush period (128kB)
      that gives a significant benefit while the maximum Bps rate is not yet
      visibly impacted.
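      A hypothetical C model of the periodic flush: struct sim_read and
      sim_after_record() are illustrative names, and the plain "every 128kB"
      threshold is a simplification; the patch's exact condition may differ.
      In the kernel, sk_flush_backlog() is the helper that processes a
      socket's backlog while the socket lock is held.

```c
#include <stddef.h>

#define SIM_FLUSH_PERIOD ((size_t)128 * 1024) /* 128kB, per the commit message */

/* Hypothetical state for one large read. */
struct sim_read {
	size_t done;       /* bytes copied to the user so far */
	size_t flushed_at; /* value of 'done' at the last backlog flush */
	unsigned flushes;  /* how many times the backlog was processed */
};

/* Called after each record is copied out. Once 128kB have been read since
 * the last flush, process the socket backlog (packets that arrived while
 * the socket lock was owned) instead of waiting for the read to finish. */
static void sim_after_record(struct sim_read *r, size_t rec_len)
{
	r->done += rec_len;
	if (r->done - r->flushed_at >= SIM_FLUSH_PERIOD) {
		r->flushed_at = r->done;
		r->flushes++; /* stand-in for flushing the backlog */
	}
}
```

      For a 512kB read made of 16kB records, this model flushes four times
      instead of once at the end.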
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: tls: add selftest variant for pad · f36068a2
      Jakub Kicinski authored
      Add a self-test variant with TLS 1.3 nopad set.
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tls: rx: add sockopt for enabling optimistic decrypt with TLS 1.3 · 88527790
      Jakub Kicinski authored
      Since optimistic decrypt may add extra load in case of retries,
      require the socket owner to explicitly opt in.
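      From userspace, the opt-in is a socket option at the SOL_TLS level. A
      minimal sketch, assuming the option is the one exposed in later kernel
      UAPI headers as TLS_RX_EXPECT_NO_PAD (value 4) and that fd is a TCP
      socket already set up for kTLS receive with TLS 1.3;
      enable_tls_rx_expect_no_pad() is a hypothetical helper name.

```c
#include <errno.h>
#include <sys/socket.h>

/* Values as defined by the Linux kernel headers; only used if the libc
 * headers do not already provide them (assumption, see lead-in). */
#ifndef SOL_TLS
#define SOL_TLS 282
#endif
#ifndef TLS_RX_EXPECT_NO_PAD
#define TLS_RX_EXPECT_NO_PAD 4
#endif

/* Opts a kTLS socket into optimistic (no-padding) decryption.
 * Returns 0 on success or a negative errno value on failure. */
static int enable_tls_rx_expect_no_pad(int fd)
{
	int one = 1;

	if (setsockopt(fd, SOL_TLS, TLS_RX_EXPECT_NO_PAD, &one, sizeof(one)) < 0)
		return -errno;
	return 0;
}
```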
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tls: rx: support optimistic decrypt to user buffer with TLS 1.3 · ce61327c
      Jakub Kicinski authored
      We currently don't support decrypting to the user buffer with TLS 1.3
      because we don't know the record type or how much padding the
      record contains before decryption. In practice data records
      are by far the most common and padding is rarely used, so
      we can assume a data record with no padding and, if that turns
      out not to be the case, retry the crypto in place (decrypt
      to skb).
      
      To guard against the user overwriting the content type and padding
      before we can check them, attach a 1B sg entry where the last byte
      of the record will land.
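      A plaintext-level C model of the optimistic path with retry. Crypto is
      omitted and sim_rx_record() is hypothetical; the last byte is copied
      into a separate variable before being checked, mirroring the role of
      the extra 1B sg entry.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_CT_APPLICATION_DATA 23 /* TLS 1.3 ContentType, RFC 8446 */

/* Hypothetical outcome of receiving one record. */
struct sim_rx_result {
	bool retried;    /* true if the optimistic guess was wrong */
	size_t data_len; /* bytes of application data delivered */
};

/* 'inner' models the decrypted inner plaintext (content || type || zeros).
 * Step 1 assumes a data record with no padding: everything but the last
 * byte goes straight to the user buffer, the last byte to a private slot.
 * Step 2 (retry) does a full parse if the guess was wrong. */
static struct sim_rx_result sim_rx_record(const uint8_t *inner, size_t len,
					   uint8_t *user_buf)
{
	struct sim_rx_result res = { false, 0 };
	uint8_t tail;

	if (len == 0)
		return res;
	memcpy(user_buf, inner, len - 1);
	tail = inner[len - 1];
	if (tail == SIM_CT_APPLICATION_DATA) {
		res.data_len = len - 1; /* guess was right */
		return res;
	}
	/* Control record or padding: skip trailing zeros to find the type. */
	res.retried = true;
	size_t i = len;
	while (i > 0 && inner[i - 1] == 0)
		i--;
	if (i > 0 && inner[i - 1] == SIM_CT_APPLICATION_DATA) {
		memcpy(user_buf, inner, i - 1);
		res.data_len = i - 1;
	}
	return res;
}
```

      Only padded or non-data records pay for the second pass, which is why
      the optimization wins when such records are rare.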
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tls: rx: don't include tail size in data_len · 603380f5
      Jakub Kicinski authored
      To make future patches easier to review make data_len
      contain the length of the data, without the tail.
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mptcp-path-manager-fixes' · ae9fdf6c
      David S. Miller authored
      Mat Martineau says:
      
      ====================
      mptcp: Path manager fixes for 5.19
      
      The MPTCP userspace path manager is new in 5.19, and these patches fix
      some issues in that new code.
      
      Patches 1-3 fix path manager locking issues.
      
      Patches 4 and 5 allow userspace path managers to change priority of
      established subflows using the existing MPTCP_PM_CMD_SET_FLAGS generic
      netlink command. Includes corresponding self test update.
      
      Patches 6 and 7 fix accounting of available endpoint IDs and the
      MPTCP_MIB_RMSUBFLOW counter.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: update MIB_RMSUBFLOW in cmd_sf_destroy · d2d21f17
      Geliang Tang authored
      This patch increments the MPTCP_MIB_RMSUBFLOW MIB counter in
      mptcp_nl_cmd_sf_destroy(), the userspace PM's destroy-subflow function,
      when a subflow is removed.
      
      Fixes: 702c2f64 ("mptcp: netlink: allow userspace-driven subflow establishment")
      Signed-off-by: Geliang Tang <geliang.tang@suse.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: fix local endpoint accounting · 843b5e75
      Paolo Abeni authored
      In mptcp_pm_nl_rm_addr_or_subflow() we always mark as available
      the id corresponding to the just removed address.
      
      The used bitmap actually tracks only the local IDs: we must
      restrict the operation to the case where a (local) subflow is removed.
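      A minimal C model of the fix, assuming a 64-bit bitmap for brevity;
      struct sim_pm and the sim_* functions are hypothetical, not the
      in-kernel path manager structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the "used" bitmap: bit N set means local endpoint
 * id N is in use. Only local ids are tracked; ids must be below 64 here. */
struct sim_pm {
	uint64_t used_local_ids;
};

static void sim_mark_available(struct sim_pm *pm, unsigned int id)
{
	pm->used_local_ids &= ~(UINT64_C(1) << id);
}

/* Before the fix: any removed id is released, even a remote address id
 * that merely has the same number as a local one. */
static void sim_rm_addr_or_subflow_old(struct sim_pm *pm, unsigned int id,
				       bool local_subflow)
{
	(void)local_subflow;
	sim_mark_available(pm, id);
}

/* After the fix: only removing a local subflow touches the local-id bitmap. */
static void sim_rm_addr_or_subflow_new(struct sim_pm *pm, unsigned int id,
				       bool local_subflow)
{
	if (local_subflow)
		sim_mark_available(pm, id);
}
```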
      
      Fixes: a88c9e49 ("mptcp: do not block subflows creation on errors")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mptcp: userspace PM support for MP_PRIO signals · ca188a25
      Kishen Maloor authored
      This change updates the testing sample (pm_nl_ctl) to exercise
      the updated MPTCP_PM_CMD_SET_FLAGS command for userspace PMs to
      issue MP_PRIO signals over the selected subflow.
      
      E.g. ./pm_nl_ctl set 10.0.1.2 port 47234 flags backup token 823274047 rip 10.0.1.1 rport 50003
      
      userspace_pm.sh has a new selftest that invokes this command.
      
      Fixes: 259a834f ("selftests: mptcp: functional tests for the userspace PM type")
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>