1. 16 Nov, 2023 35 commits
  2. 15 Nov, 2023 5 commits
    • bpf: Do not allocate percpu memory at init stage · 1fda5bb6
      Yonghong Song authored
      Kirill Shutemov reported significant percpu memory consumption increase after
      booting in 288-cpu VM ([1]) due to commit 41a5db8d ("bpf: Add support for
      non-fix-size percpu mem allocation"). The percpu memory consumption is
      increased from 111MB to 969MB. The number is from /proc/meminfo.
      
      I tried to reproduce the issue with my local VM, which supports at most
      255 cpus. With 252 cpus, without the above commit, the percpu memory
      consumption immediately after boot is 57MB while with the above commit the
      percpu memory consumption is 231MB.
      
      This is not good, since percpu memory from the bpf memory allocator is not
      widely used yet. Let us change pre-allocation at the init stage to on-demand
      allocation when the verifier detects that a bpf program needs percpu
      memory. With this change, percpu memory consumption after boot can be reduced
      significantly.
      
        [1] https://lore.kernel.org/lkml/20231109154934.4saimljtqx625l3v@box.shutemov.name/
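
      The on-demand scheme described above can be illustrated with a small
      userspace sketch. This is not the kernel code: struct percpu_cache,
      percpu_cache_ensure(), verify_prog(), NR_CPUS and CHUNK_BYTES are all
      hypothetical names chosen for illustration, and calloc() stands in for
      the real percpu allocator. The point is only that nothing is allocated
      until a program is found to need percpu memory, and that the allocation
      happens once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_CPUS     4      /* assumed CPU count, for illustration only */
#define CHUNK_BYTES 4096   /* assumed per-CPU reservation size */

/* Hypothetical per-CPU cache: one chunk per CPU, allocated lazily. */
struct percpu_cache {
	void *chunk[NR_CPUS];
	bool ready;
};

/* Allocate the per-CPU chunks on first use; later calls are no-ops.
 * Returns 0 on success, -1 if an allocation fails (nothing is leaked). */
static int percpu_cache_ensure(struct percpu_cache *c)
{
	assert(c != NULL);
	if (c->ready)
		return 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		c->chunk[cpu] = calloc(1, CHUNK_BYTES);
		if (!c->chunk[cpu]) {
			/* Roll back the chunks allocated so far. */
			while (cpu-- > 0) {
				free(c->chunk[cpu]);
				c->chunk[cpu] = NULL;
			}
			return -1;
		}
	}
	c->ready = true;
	return 0;
}

/* Hypothetical verifier hook: only programs that use percpu memory
 * trigger the allocation. */
static int verify_prog(struct percpu_cache *c, bool uses_percpu_alloc)
{
	return uses_percpu_alloc ? percpu_cache_ensure(c) : 0;
}
```

      In this sketch a system that never loads a program using percpu
      memory pays no percpu cost at all, which is the effect the commit is
      after.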
      
      Fixes: 41a5db8d ("bpf: Add support for non-fix-size percpu mem allocation")
      Reported-and-tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
      Acked-by: Hou Tao <houtao1@huawei.com>
      Link: https://lore.kernel.org/r/20231111013928.948838-1-yonghong.song@linux.dev
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • net: Fix undefined behavior in netdev name allocation · 674e3180
      Gal Pressman authored
      Cited commit removed the strscpy() call and kept the snprintf() only.
      
      It is common to use 'dev->name' as the format string before a netdev is
      registered; this results in the 'res' and 'name' pointers being equal.
      According to POSIX, if copying takes place between objects that overlap
      as a result of a call to sprintf() or snprintf(), the results are
      undefined.
      
      Add back the strscpy() and use 'buf' as an intermediate buffer.
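
      The intermediate-buffer pattern can be sketched in userspace C as
      follows. This is an illustration, not the kernel function:
      format_name() and copy_name() are hypothetical names, and copy_name()
      is only a simplified stand-in for the kernel's strscpy(). The format
      string is expanded into a local 'buf' first, so the source and
      destination of snprintf() never overlap even when the caller passes the
      same pointer for both.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define IFNAMSIZ 16

/* Simplified stand-in for strscpy(): bounded copy that always
 * NUL-terminates. Returns the copied length, or -1 on truncation. */
static long copy_name(char *dst, const char *src, size_t size)
{
	size_t len;

	if (size == 0)
		return -1;
	len = strlen(src);
	if (len >= size) {
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -1;
	}
	memcpy(dst, src, len + 1);
	return (long)len;
}

/* Expand fmt (e.g. "eth%d") with unit and store the result in res.
 * fmt and res may point to the same buffer. Returns 0 on success,
 * -1 if the expanded name does not fit (res is then left untouched). */
static int format_name(char *res, const char *fmt, int unit)
{
	char buf[IFNAMSIZ];
	int n;

	assert(res != NULL && fmt != NULL);

	/* Format into a separate buffer: no overlap with fmt. */
	n = snprintf(buf, sizeof(buf), fmt, unit);
	if (n < 0 || (size_t)n >= sizeof(buf))
		return -1;

	/* Only now overwrite the destination. */
	copy_name(res, buf, IFNAMSIZ);
	return 0;
}
```

      Calling format_name(name, name, 3) with name holding "eth%d" leaves
      "eth3" in name without ever asking snprintf() to write into its own
      format string.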
      
      Fixes: 7ad17b04 ("net: trust the bitmap in __dev_alloc_name()")
      Cc: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Gal Pressman <gal@nvidia.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: don't dump stack on queue timeout · e316dd1c
      Jakub Kicinski authored
      The top syzbot report for networking (#14 for the entire kernel)
      is the queue timeout splat. We kept it around for a long time,
      because in real life it provides a pretty strong signal that
      something is wrong with the driver or the device.
      
      Removing it is also likely to break monitoring for those who
      track it as a kernel warning.
      
      Nevertheless, WARN()ings are best suited for catching kernel
      programming bugs. If a Tx queue gets starved due to a pause
      storm, priority configuration, or other weirdness - that's
      obviously a problem, but not a problem we can fix at
      the kernel level.
      
      Bite the bullet and convert the WARN() to a print.
      
      Before:
      
        NETDEV WATCHDOG: eni1np1 (netdevsim): transmit queue 0 timed out 1975 ms
        WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x39e/0x3b0
        [... completely pointless stack trace of a timer follows ...]
      
      Now:
      
        netdevsim netdevsim1 eni1np1: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 1769 ms
      
      Alternatively we could mark the drivers which syzbot has
      learned to abuse as "print-instead-of-WARN" selectively.
      
      Reported-by: syzbot+d55372214aff0faa1f1f@syzkaller.appspotmail.com
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bnxt_en-tx-improvements' · 8d5855a5
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: TX path improvements
      
      All patches in this patchset are related to improving the TX path.
      There are 2 areas of improvements:
      
      1. The TX interrupt logic currently counts the number of TX completions
      to determine the number of TX SKBs to free.  We now change it so that
      the TX completion contains the hardware consumer index
      information.  The driver will keep track of the latest hardware
      consumer index from the last TX completion and clean up all TX SKBs
      up to that index.  This scheme aligns better with future chips and
      allows xmit_more code path to be more optimized.
      
      2. The current driver logic requires an additional MSIX for each
      additional MQPRIO TX ring.  This scheme uses too many MSIX vectors if
      the user enables a large number of MQPRIO TCs.  We now use a new scheme
      that will use the same MSIX for all the MQPRIO TX rings for each
      ethtool channel.  Each ethtool TX channel can have up to 8 MQPRIO
      TX rings and now they all will share the same MSIX.
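
      The cumulative consumer index scheme from item 1 can be sketched in
      plain C as below. This is a simplified model, not the bnxt_en code:
      struct tx_ring, tx_complete(), tx_clean() and RING_SIZE are hypothetical
      names, and clearing a slot stands in for freeing an SKB. Each completion
      only records the latest hardware consumer index, and the cleanup loop
      frees every pending entry up to that index, however many packets a
      single completion covers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u                 /* assumed size, must be a power of two */
#define RING_MASK (RING_SIZE - 1)

/* Hypothetical TX ring with free-running 16-bit indexes. */
struct tx_ring {
	void *skb[RING_SIZE];  /* pending buffers, NULL when the slot is free */
	uint16_t prod;         /* next slot the driver will fill */
	uint16_t cons;         /* next slot the driver will clean */
	uint16_t hw_cons;      /* latest consumer index reported by hardware */
};

/* A TX completion carries the hardware consumer index; just record it. */
static void tx_complete(struct tx_ring *r, uint16_t hw_cons)
{
	assert(r != NULL);
	r->hw_cons = hw_cons;
}

/* Clean every entry up to the recorded hardware consumer index.
 * Returns the number of entries released. */
static unsigned int tx_clean(struct tx_ring *r)
{
	unsigned int freed = 0;

	while (r->cons != r->hw_cons) {
		r->skb[r->cons & RING_MASK] = NULL;  /* stand-in for freeing the SKB */
		r->cons++;
		freed++;
	}
	return freed;
}
```

      Because the cleanup is driven by an index rather than by a count of
      completions, one completion can retire several packets, which is what
      makes it possible to request fewer completions in the xmit_more path.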
      
      v2: Rebased
      
      v1 posted on Oct 27 2023 right before the close of net-next:
      
      https://lore.kernel.org/netdev/20231027232252.36111-1-michael.chan@broadcom.com/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Optimize xmit_more TX path · c1056a59
      Michael Chan authored
      Now that we use the cumulative consumer index scheme for TX completion,
      we don't need to have one TX completion per TX packet in the xmit_more
      code path.  Set the TX_BD_FLAGS_NO_CMPL flag if xmit_more is true.
      Fall back to one interrupt per packet if the ring is filled beyond
      bp->tx_wake_thresh.
      
      Also, move the wmb() to bnxt_txr_db_kick().  When xmit_more is true,
      we'll skip the bnxt_txr_db_kick() call and there is no need to call
      wmb() to sync the TX BD data.

      Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>