1. 19 Nov, 2023 7 commits
    • rtnetlink: introduce nlmsg_new_large and use it in rtnl_getlink · ac40916a
      Li RongQing authored
      If a PF has 256 or more VFs, the ip link command makes an allocation
      of order 3 or more, which can trigger the OOM killer when memory is
      fragmented. The memory needed for the VFs is computed in
      rtnl_vfinfo_size.
      
      So introduce nlmsg_new_large, which calls netlink_alloc_large_skb.
      That function uses vmalloc for large allocations, which avoids the
      allocation failure shown below:
      
          ip invoked oom-killer: gfp_mask=0xc2cc0(GFP_KERNEL|__GFP_NOWARN|\
      	__GFP_COMP|__GFP_NOMEMALLOC), order=3, oom_score_adj=0
          CPU: 74 PID: 204414 Comm: ip Kdump: loaded Tainted: P           OE
          Call Trace:
          dump_stack+0x57/0x6a
          dump_header+0x4a/0x210
          oom_kill_process+0xe4/0x140
          out_of_memory+0x3e8/0x790
          __alloc_pages_slowpath.constprop.116+0x953/0xc50
          __alloc_pages_nodemask+0x2af/0x310
          kmalloc_large_node+0x38/0xf0
          __kmalloc_node_track_caller+0x417/0x4d0
          __kmalloc_reserve.isra.61+0x2e/0x80
          __alloc_skb+0x82/0x1c0
          rtnl_getlink+0x24f/0x370
          rtnetlink_rcv_msg+0x12c/0x350
          netlink_rcv_skb+0x50/0x100
          netlink_unicast+0x1b2/0x280
          netlink_sendmsg+0x355/0x4a0
          sock_sendmsg+0x5b/0x60
          ____sys_sendmsg+0x1ea/0x250
          ___sys_sendmsg+0x88/0xd0
          __sys_sendmsg+0x5e/0xa0
          do_syscall_64+0x33/0x40
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
          RIP: 0033:0x7f95a65a5b70
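
The "order=3" in the report above means a request for 2^3 physically contiguous pages. As a rough illustration of why a large reply ends up there, the sketch below computes the smallest page order that covers a given buffer size. It assumes 4 KiB pages, and `toy_alloc_order` and `TOY_PAGE_SIZE` are hypothetical names made up for this example, not kernel symbols. Under that assumption, any buffer larger than 16 KiB needs order 3 (32 KiB) or more.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for illustration: 4 KiB pages. */
#define TOY_PAGE_SIZE 4096u

/*
 * Hypothetical helper (not a kernel function): return the smallest
 * order such that (TOY_PAGE_SIZE << order) >= size.
 */
static unsigned int toy_alloc_order(size_t size)
{
    unsigned int order = 0;
    size_t span = TOY_PAGE_SIZE;

    assert(size > 0);
    while (span < size) {
        span <<= 1;   /* double the contiguous span */
        order++;
    }
    return order;
}
```

For example, `toy_alloc_order(16384)` is 2 while `toy_alloc_order(16385)` is 3. A vmalloc-backed buffer only needs the pages to be virtually contiguous, so it does not depend on finding such a physically contiguous run.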
      
      Cc: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: Li RongQing <lirongqing@baidu.com>
      Link: https://lore.kernel.org/r/20231115120108.3711-1-lirongqing@baidu.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · 459a70ba
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      ice: one by one port representors creation
      
      Michal Swiatkowski says:
      
      Currently ice supports creating port representors only for VFs. For that
      use case they can be created and removed in one step.
      
      This patchset refactors the current flow to also support port
      representor creation for subfunctions and SIOV. In this case port
      representors need to be created and removed one by one. Also, they can
      be added and removed while other port representors are running.
      
      To achieve that we need to change the switchdev configuration flow.
      The first three patches are only cosmetic (renaming, removing unused
      code). The next few are preparation for the new flow. The most important
      one is "add VF representors one by one". It fully implements the new flow.
      
      A new type of port representor (for subfunctions) will be introduced in
      a follow-up patchset.
      
      * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
        ice: reserve number of CP queues
        ice: adjust switchdev rebuild path
        ice: add VF representors one by one
        ice: realloc VSI stats arrays
        ice: set Tx topology every time new repr is added
        ice: allow changing SWITCHDEV_CTRL VSI queues
        ice: return pointer to representor
        ice: make representor code generic
        ice: remove VF pointer reference in eswitch code
        ice: track port representors in xarray
        ice: use repr instead of vf->repr
        ice: track q_id in representor
        ice: remove unused control VSI parameter
        ice: remove redundant max_vsi_num variable
        ice: rename switchdev to eswitch
      ====================
      
      Link: https://lore.kernel.org/r/20231114181449.1290117-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · a49296e0
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      igc: Add support for physical + free-running timers
      
      Vinicius Costa Gomes says:
      
      The objective is to allow having functionality that depends on the
      physical timer (taprio and ETF offloads, for example) and vclocks
      operating together.
      
      The "big" missing piece is the implementation of the .getcyclesx64()
      function in igc. As i225/i226 have multiple timers, we use one of
      those timers (timer 1) as a free-running (non-adjustable) timer.
      
      The complication is that implementing only .getcyclesx64() and nothing
      else will break synchronization when using vclocks, as reading the clock
      will retrieve the free-running value but timestamps will come from the
      adjustable timer. The solution is to modify "in one go" the timestamping
      code to be able to retrieve the timestamp from the correct timer (if a
      socket is "phc_bound" to a vclock the timestamp will come from the
      free-running timer).
      
      I debated whether to apply the internal latency adjustments to the
      free-running timestamps, and decided to apply them so that the path
      delay when using vclocks is similar to the one when using the physical
      clock.
      
      One future improvement is to implement the .getcrosscycles() function.
      
      * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
        igc: Add support for PTP .getcyclesx64()
        igc: Simplify setting flags in the TX data descriptor
      ====================
      
      Link: https://lore.kernel.org/r/20231114183640.1303163-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'net-sched-cls_u32-use-proper-refcounts' · 516cba96
      Jakub Kicinski authored
      Pedro Tammela says:
      
      ====================
      net/sched: cls_u32: use proper refcounts
      
      In u32 we are open coding the refcounts of hashtables with integers,
      which is far from ideal. Update those to proper refcounts and add a
      couple of tests to tdc that exercise the refcounts explicitly.
      ====================
      
      Link: https://lore.kernel.org/r/20231114141856.974326-1-pctammela@mojatatu.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests/tc-testing: add hashtable tests for u32 · 54293e4d
      Pedro Tammela authored
      Add tests to specifically check the refcount interactions of
      hashtables created by u32. These tables should not be deleted while
      referenced, and the flush order should respect a tree-like composition.
      
      Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Link: https://lore.kernel.org/r/20231114141856.974326-3-pctammela@mojatatu.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/sched: cls_u32: replace int refcounts with proper refcounts · 6b78debe
      Pedro Tammela authored
      Proper refcounts will always produce a warning splat when something goes
      wrong, be it underflow, saturation or object resurrection. As these are
      always a source of bugs, use them in cls_u32 as a safeguard to
      prevent/catch issues. Another benefit is that the refcount API
      self-documents the code, making clear when transitions to dead are
      expected.
      
      For such an update we had to make minor adaptations to u32 to fit the
      refcount API. First we explicitly set the refcount to '1' when objects
      are created; the objects are then alive until a 1 -> 0 transition
      happens, at which point they are released appropriately.
      
      The above made clear some redundant operations in the u32 code
      around the root_ht handling, which were removed. The root_ht is created
      with a refcnt set to 1. Then, when it is associated with tcf_proto, the
      refcnt is incremented to 2. Throughout the entire code the root_ht is an
      exceptional case and can never be referenced, so the refcnt is never
      incremented or decremented again. Its lifetime is always bound to
      tcf_proto, meaning that if you delete tcf_proto the root_ht is deleted
      as well. The code made up for the fact that the root_ht refcnt is 2 by
      doing a double decrement to free it, which is not a fit for the refcount
      API.
      
      Even though refcount_t is implemented using atomics, we should observe
      a negligible control plane impact.
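
The lifecycle described above (set to 1 on creation, alive until the 1 -> 0 transition, complaints on underflow or resurrection) can be sketched in userspace with C11 atomics. This is only an illustration under stated assumptions: `struct toy_ref` and the `toy_ref_*` functions are hypothetical names, this is not the kernel's refcount_t implementation, and saturation handling is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in; NOT the kernel's refcount_t. */
struct toy_ref {
    atomic_uint count;
};

/* Set the initial count, e.g. to 1 when the object is created. */
static void toy_ref_set(struct toy_ref *r, unsigned int n)
{
    atomic_store(&r->count, n);
}

/* Take a reference. Refuses to resurrect an object whose count hit 0. */
static bool toy_ref_inc(struct toy_ref *r)
{
    unsigned int old = atomic_load(&r->count);

    do {
        if (old == 0) {
            fprintf(stderr, "toy_ref: increment on 0, object is dead\n");
            return false;
        }
        /* On failure the CAS reloads 'old' and the check is repeated. */
    } while (!atomic_compare_exchange_weak(&r->count, &old, old + 1));
    return true;
}

/* Drop a reference. Returns true only on the 1 -> 0 transition. */
static bool toy_ref_dec_and_test(struct toy_ref *r)
{
    unsigned int old = atomic_load(&r->count);

    do {
        if (old == 0) {
            fprintf(stderr, "toy_ref: underflow\n");
            return false;
        }
    } while (!atomic_compare_exchange_weak(&r->count, &old, old - 1));
    return old == 1;
}
```

With this shape, the owner frees the object exactly once, when `toy_ref_dec_and_test()` returns true. A pattern like the old root_ht handling, where the count is parked at 2 and decremented twice just to reach zero, has no natural place in such an API.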
      
      Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Link: https://lore.kernel.org/r/20231114141856.974326-2-pctammela@mojatatu.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: partial revert of the "Make timestamping selectable" series · 289354f2
      Jakub Kicinski authored
      Revert the following commits:
      
      commit acec05fb ("net_tstamp: Add TIMESTAMPING SOFTWARE and HARDWARE mask")
      commit 11d55be0 ("net: ethtool: Add a command to expose current time stamping layer")
      commit bb8645b0 ("netlink: specs: Introduce new netlink command to get current timestamp")
      commit d905f9c7 ("net: ethtool: Add a command to list available time stamping layers")
      commit aed5004e ("netlink: specs: Introduce new netlink command to list available time stamping layers")
      commit 51bdf316 ("net: Replace hwtstamp_source by timestamping layer")
      commit 0f7f463d ("net: Change the API of PHY default timestamp to MAC")
      commit 091fab12 ("net: ethtool: ts: Update GET_TS to reply the current selected timestamp")
      commit 152c75e1 ("net: ethtool: ts: Let the active time stamping layer be selectable")
      commit ee60ea6b ("netlink: specs: Introduce time stamping set command")
      
      They need more time for reviews.
      
      Link: https://lore.kernel.org/all/20231118183529.6e67100c@kernel.org/
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 18 Nov, 2023 33 commits