1. 14 Feb, 2022 17 commits
    • Merge branch 'netdev-RT' · da54d75b
      David S. Miller authored
      Sebastian Andrzej Siewior says:
      
      ====================
      net: dev: PREEMPT_RT fixups.
      
      this series removes or replaces preempt_disable() and local_irq_save()
      sections which are problematic on PREEMPT_RT.
      Patch 2 makes netif_rx() work from any context after I found suggestions
      for it in an old thread. Should that work, then the context-specific
      variants could be removed.
      
      v2…v3:
         - #2
           - Export __netif_rx() so it can be used by everyone.
           - Add a lockdep assert to check for interrupt context.
           - Update the kernel doc and mention that the skb is posted to
             backlog NAPI.
           - Use __netif_rx() also in drivers/net/*.c.
       - Added Toke's review tag and kept Eric's despite the changes
         made.
      
      v1…v2:
        - #1 and #2
          - merge patch 1 and 2 from the series (as per Toke).
          - updated patch description and corrected the first commit number (as
            per Eric).
         - #2
           - Provide netif_rx() as in v1 and additionally __netif_rx() without
             local_bh disable()+enable() for the loopback driver. __netif_rx() is
             not exported (loopback is built-in only) so it won't be used by
             other drivers. If this doesn't work then we can still export /
             define a wrapper as Eric suggested.
           - Added a comment that netif_rx() is considered legacy.
         - #3
           - Moved ____napi_schedule() into rps_ipi_queued() and
             renamed it napi_schedule_rps().
         https://lore.kernel.org/all/20220204201259.1095226-1-bigeasy@linutronix.de/
      
      v1:
         https://lore.kernel.org/all/20220202122848.647635-1-bigeasy@linutronix.de
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dev: Make rps_lock() disable interrupts. · e722db8d
      Sebastian Andrzej Siewior authored
      Disabling interrupts and in the RPS case locking input_pkt_queue is
      split into local_irq_disable() and optional spin_lock().
      
      This breaks on PREEMPT_RT because the spinlock_t typed lock can not be
      acquired with disabled interrupts.
      The sections in which the lock is acquired are usually short, in the
      sense that they do not cause long and unbounded latencies. One exception is the
      skb_flow_limit() invocation which may invoke a BPF program (and may
      require sleeping locks).
      
      By moving local_irq_disable() + spin_lock() into rps_lock(), we can keep
      interrupts disabled on !PREEMPT_RT and enabled on PREEMPT_RT kernels.
      Without RPS on a PREEMPT_RT kernel, the needed synchronisation happens
      as part of local_bh_disable() on the local CPU.
      ____napi_schedule() is only invoked if sd is from the local CPU. Replace
      it with __napi_schedule_irqoff() which already disables interrupts on
      PREEMPT_RT as needed. Move this call to rps_ipi_queued() and rename the
      function to napi_schedule_rps as suggested by Jakub.
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dev: Makes sure netif_rx() can be invoked in any context. · baebdf48
      Sebastian Andrzej Siewior authored
      Dave suggested a while ago (eleven years by now) "Let's make netif_rx()
      work in all contexts and get rid of netif_rx_ni()". Eric agreed and
      pointed out that modern devices should use netif_receive_skb() to avoid
      the overhead.
      In the meantime someone added another variant, netif_rx_any_context(),
      which behaves as suggested.
      
      netif_rx() must be invoked with disabled bottom halves to ensure that
      pending softirqs, which were raised within the function, are handled.
      netif_rx_ni() can be invoked only from process context (bottom halves
      must be enabled) because the function handles pending softirqs without
      checking if bottom halves were disabled or not.
      netif_rx_any_context() invokes one of the former functions by checking
      in_interrupt().
      
      netif_rx() could be taught to handle both cases (disabled and enabled
      bottom halves) by simply disabling bottom halves while invoking
      netif_rx_internal(). The local_bh_enable() invocation will then invoke
      pending softirqs only if the BH-disable counter drops to zero.
      
      Eric is concerned about the overhead of BH-disable+enable especially in
      regard to the loopback driver. As critical as this driver is, it will
      receive a shortcut to avoid the additional overhead which is not needed.
      
      Add a local_bh_disable() section in netif_rx() to ensure softirqs are
      handled if needed.
      Provide __netif_rx() which does not disable BH and has a lockdep assert
      to ensure that interrupts are disabled. Use this shortcut in the
      loopback driver and in drivers/net/*.c.
      Make netif_rx_ni() and netif_rx_any_context() invoke netif_rx() so they
      can be removed once there are no users left.
      
      Link: https://lkml.kernel.org/r/20100415.020246.218622820.davem@davemloft.net
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dev: Remove preempt_disable() and get_cpu() in netif_rx_internal(). · f234ae29
      Sebastian Andrzej Siewior authored
      The preempt_disable() section was introduced in commit
          cece1945 ("net: disable preemption before call smp_processor_id()")
      
      in case this function is invoked from preemptible context, because
      get_cpu() had been added later on.
      
      The get_cpu() usage was added in commit
          b0e28f1e ("net: netif_rx() must disable preemption")
      
      because ip_dev_loopback_xmit() invoked netif_rx() with enabled preemption
      causing a warning in smp_processor_id(). The function netif_rx() should
      only be invoked from an interrupt context which implies disabled
      preemption. The commit
         e30b38c2 ("ip: Fix ip_dev_loopback_xmit()")
      
      was addressing this and replaced netif_rx() with netif_rx_ni() in
      ip_dev_loopback_xmit().
      
      Based on the discussion on the list, the former patch (b0e28f1e)
      should not have been applied, only the latter (e30b38c2).
      
      Remove get_cpu() and preempt_disable() since the function is supposed to
      be invoked from context with stable per-CPU pointers. Bottom halves have
      to be disabled at this point because the function may raise softirqs
      which need to be processed.
      
      Link: https://lkml.kernel.org/r/20100415.013347.98375530.davem@davemloft.net
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ice: Simplify tracking status of RDMA support · 88f62aea
      Dave Ertman authored
      The status of support for RDMA is currently being tracked with two
      separate status flags. This is unnecessary with the current state of
      the driver.
      
      Simplify status tracking down to a single flag.
      
      Rename the helper function to denote the RDMA specific status and
      universally use the helper function to test the status bit.
      Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
      Tested-by: Leszek Kaliszczuk <leszek.kaliszczuk@intel.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ocelot-stats' · d4e7592b
      David S. Miller authored
      Colin Foster says:
      
      ====================
      use bulk reads for ocelot statistics
      
      Ocelot loops over memory regions to gather stats on different ports.
      These regions are mostly continuous, and are ordered. This patch set
      uses that information to break the stats reads into regions that can get
      read in bulk.
      
      The motivation is for general cleanup, but also for SPI. Performing two
      back-to-back reads on a SPI bus requires toggling the CS line, holding,
      re-toggling the CS line, sending 3 address bytes, sending N padding
      bytes, then actually performing the read. Bulk reads could reduce almost
      all of that overhead, but require that the reads are performed via
      regmap_bulk_read.
      
      Verified with eth0 hooked up to the CPU port:
      NIC statistics:
           Good Rx Frames: 905
           Rx Octets: 78848
           Good Tx Frames: 691
           Tx Octets: 52516
           Rx + Tx 65-127 Octet Frames: 1574
           Rx + Tx 128-255 Octet Frames: 22
           Net Octets: 131364
           Rx DMA chan 0: head_enqueue: 1
           Rx DMA chan 0: tail_enqueue: 1032
           Rx DMA chan 0: busy_dequeue: 628
           Rx DMA chan 0: good_dequeue: 905
           Tx DMA chan 0: head_enqueue: 346
           Tx DMA chan 0: tail_enqueue: 345
           Tx DMA chan 0: misqueued: 345
           Tx DMA chan 0: empty_dequeue: 346
           Tx DMA chan 0: good_dequeue: 691
           p00_rx_octets: 52516
           p00_rx_unicast: 691
           p00_rx_frames_65_to_127_octets: 691
           p00_tx_octets: 78848
           p00_tx_unicast: 905
           p00_tx_frames_65_to_127_octets: 883
           p00_tx_frames_128_255_octets: 22
           p00_tx_green_prio_0: 905
      
      And with swp2 connected to swp3 with STP enabled:
      NIC statistics:
           tx_packets: 379
           tx_bytes: 19708
           rx_packets: 1
           rx_bytes: 46
           rx_octets: 64
           rx_multicast: 1
           rx_frames_below_65_octets: 1
           rx_classified_drops: 1
           tx_octets: 44630
           tx_multicast: 387
           tx_broadcast: 290
           tx_frames_below_65_octets: 379
           tx_frames_65_to_127_octets: 294
           tx_frames_128_255_octets: 4
           tx_green_prio_0: 298
           tx_green_prio_7: 379
      NIC statistics:
           tx_packets: 1
           tx_bytes: 52
           rx_packets: 713
           rx_bytes: 34148
           rx_octets: 46982
           rx_multicast: 407
           rx_broadcast: 306
           rx_frames_below_65_octets: 399
           rx_frames_65_to_127_octets: 310
           rx_frames_128_to_255_octets: 4
           rx_classified_drops: 399
           rx_green_prio_0: 314
           tx_octets: 64
           tx_multicast: 1
           tx_frames_below_65_octets: 1
           tx_green_prio_7: 1
      
      v1 > v2: reword commit messages
      v2 > v3: correctly mark this for net-next when sending
      v3 > v4: calloc array instead of zalloc per review
      v4 > v5:
          Apply CR suggestions for whitespace
          Fix calloc / zalloc mixup
          Properly destroy workqueues
          Add third commit to split long macros
      v5 > v6:
          Fix functionality - v5 was improperly tested
          Add bugfix for ethtool mutex lock
          Remove unnecessary ethtool stats reads
      v6 > v7:
          Remove mutex bug patch that was applied via net
          Rename function based on CR
          Add missed error check
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: use bulk reads for stats · d87b1c08
      Colin Foster authored
      Create and utilize bulk regmap reads instead of single access for gathering
      stats. The background reading of statistics happens frequently, and over
      a few contiguous memory regions.
      
      High speed PCIe buses and MMIO access will probably see negligible
      performance increase. Lower speed buses like SPI and I2C could see
      significant performance increase, since the bus configuration and register
      access times account for a large percentage of data transfer time.
      Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: add ability to perform bulk reads · 40f3a5c8
      Colin Foster authored
      Regmap supports bulk register reads. Ocelot does not. This patch adds
      support for Ocelot to invoke bulk regmap reads. That will allow any driver
      that performs consecutive reads over memory regions to optimize that
      access.
      Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ocelot: align macros for consistency · 65c53595
      Colin Foster authored
      In the ocelot.h file, several read / write macros were split across
      multiple lines, while others weren't. Split all macros that exceed the 80
      character column width and match the style of the rest of the file.
      Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: remove unnecessary stat reading from ethtool · e27d785e
      Colin Foster authored
      The ocelot_update_stats function only needs to read from one port, yet it
      was updating the stats for all ports. Update it to read only the stats
      that are necessary.
      Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Add reasons for skb drops to __udp6_lib_rcv · 4cf91f82
      David Ahern authored
      Add reasons to __udp6_lib_rcv for skb drops. The only twist is that the
      NO_SOCKET takes precedence over the CSUM or other counters for that
      path (motivation behind this patch - csum counter was misleading).
      Signed-off-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'dm9051' · a1b86c5d
      David S. Miller authored
      Joseph CHAMG says:
      
      ====================
      ADD DM9051 ETHERNET DRIVER
      
      DM9051 is an SPI interface chip which needs CS/MOSI/MISO/clock lines
      plus an interrupt GPIO pin.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add dm9051 driver · 2dc95a4d
      Joseph CHAMG authored
      Add the Davicom DM9051 SPI Ethernet driver. The driver works on
      device platforms that provide an SPI master.
      Signed-off-by: Joseph CHAMG <josright123@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dt-bindings: net: Add Davicom dm9051 SPI ethernet controller · 759856e9
      Joseph CHAMG authored
      This is a new YAML-based binding file for configuring the Davicom
      DM9051 via device tree.
      Signed-off-by: Joseph CHAMG <josright123@gmail.com>
      Reviewed-by: Rob Herring <robh@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: Add comment for smc_tx_pending · 2e13bde1
      Tony Lu authored
      The previous patch introduces a lock-free version of smc_tx_work() to
      avoid unnecessary lock contention; that version is expected to be
      invoked with the lock already held. Add a comment to remind people to
      keep an eye on the locking.
      Suggested-by: Stefan Raspl <raspl@linux.ibm.com>
      Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Generate netlink notification when default IPv6 route preference changes · 806c37dd
      Kalash Nainwal authored
      Generate an RTM_NEWROUTE netlink notification when the route preference
      changes on an existing kernel-generated default route in response to
      RA messages. Currently netlink notifications are generated only when
      this route is added or deleted, but not when the route preference
      changes, which can cause userspace routing application state to go
      out of sync with the kernel.
      Signed-off-by: Kalash Nainwal <kalash@arista.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: act_police: more accurate MTU policing · 4ddc844e
      Davide Caratti authored
      In current Linux, MTU policing does not take into account that packets at
      the TC ingress have the L2 header pulled. Thus, the same TC police action
      (with the same value of tcfp_mtu) behaves differently for ingress/egress.
      In addition, the full GSO size is compared to tcfp_mtu: as a consequence,
      the policer drops GSO packets even when individual segments have the L2 +
      L3 + L4 + payload length below the configured value of tcfp_mtu.
      
      Improve the accuracy of MTU policing as follows:
       - account for mac_len for non-GSO packets at TC ingress.
       - compare MTU threshold with the segmented size for GSO packets.
      Also, add a kselftest that verifies the correct behavior.
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 13 Feb, 2022 10 commits
  3. 12 Feb, 2022 1 commit
  4. 11 Feb, 2022 12 commits