  1. 24 Mar, 2019 11 commits
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 071d08af
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      100GbE Intel Wired LAN Driver Updates 2019-03-22
      
      This series contains updates to ice driver only.
      
      Akeem enables MAC anti-spoofing by default when a new VSI is being
      created.  He fixes an issue with reclaiming VF resources back to the
      pool after reset, by freeing VF resources separately and using the
      first VF vector index to traverse the list, instead of starting at the
      last assigned vectors list.  He adds support for VF & PF promiscuous
      mode in the ice driver, fixes the PF driver so that it no longer tells
      the VF it is "not trusted" when it attempts to add more than its
      permitted additional MAC addresses, and alters how the driver gets the
      VF VSI instances: instead of using mailbox messages to retrieve VSIs,
      get them directly via the VF object in the PF data structure.
      
      Bruce fixes return values to resolve static analysis warnings, and
      makes whitespace changes to increase readability and reduce code
      wrapping.
      
      Anirudh cleans up the code by removing a function prototype that was
      never implemented, and removes an unused field in the
      ice_sched_vsi_info structure.
      
      Kiran fixes a potential divide by zero issue by adding a check.
      
      Victor cleans up the transmit scheduler by adjusting stack variable
      usage, and adds/modifies debug prints to make them more useful.
      
      Yashaswini updates the driver in VEB mode to ensure that the LAN_EN bit
      is set if all the right conditions are met.
      
      Christopher ensures the loopback enable bit is not set for prune switch
      rules, since all transmit traffic would be looped back to the internal
      switch and dropped.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tcp-rx-tx-cache' · bdaba895
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      tcp: add rx/tx cache to reduce lock contention
      
      On hosts with many cpus we can observe very serious contention
      on the spinlocks used in the mm slab layer.
      
      The following can happen quite often:
      
      1) TX path
        sendmsg() allocates one (fclone) skb on CPU A, sends a clone.
        ACK is received on CPU B, and consumes the skb that was in the retransmit
        queue.
      
      2) RX path
        network driver allocates skb on CPU C
        recvmsg() happens on CPU D, freeing the skb after it has been delivered
        to user space.
      
      In both cases, we are hitting the asymmetric alloc/free pattern
      for which slab has to drain alien caches. At 8 Mpps, this represents
      16 M alloc/free operations per second and has a huge penalty.
      
      In an interesting experiment, I tried to use a single kmem_cache for
      all skbs, i.e. in skb_init():
      
          skbuff_fclone_cache = skbuff_head_cache =
              kmem_cache_create("skbuff_fclone_cache",
                                sizeof(struct sk_buff_fclones), ...);
      
      and most of the contention disappeared, since cpus could better use
      their local slab per-cpu cache.
      
      But we can do actually better, in the following patches.
      
      TX: at ACK time, no longer free the skb but put it back in a tcp socket
          cache, so that the next sendmsg() can reuse it immediately.
      
      RX: at recvmsg() time, do not free the skb but put it in a tcp socket
          cache, so that it can be freed by the cpu feeding the incoming
          packets in BH.
      
      This increased the performance of a small RPC benchmark by about 10%
      on a host with 112 hyperthreads.
      
      v2: - Solved a race condition: sk_stream_alloc_skb() now makes sure
            the prior clone has been freed.
          - Really test rps_needed in sk_eat_skb() as claimed.
          - Fixed rps_needed use in drivers/net/tun.c
      
      v3: Added an #ifdef CONFIG_RPS to avoid a compile error (kbuild robot)
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add one skb cache for rx · 8b27dae5
      Eric Dumazet authored
      Oftentimes, recvmsg() system calls and BH handling for a particular
      TCP socket are done on different cpus.
      
      This means the incoming skb has to be allocated on one cpu,
      but freed on another.
      
      This incurs high spinlock contention in the slab layer for small RPCs,
      and a high number of cache line ping-pongs for larger packets.
      
      A full size GRO packet might use 45 page fragments, meaning
      that up to 45 put_page() calls can be involved.
      
      Moreover, performing the __kfree_skb() in the recvmsg() context
      adds latency for user applications, and increases the probability
      of trapping them in backlog processing, since the BH handler
      might find the socket owned by the user.
      
      This patch, combined with the prior one, increases the RPC
      performance by about 10% on servers with a large number of cores.
      
      (a tcp_rr workload with 10,000 flows and 112 threads reaches 9 Mpps
       instead of 8 Mpps)
      
      This also increases single bulk flow performance on 40Gbit+ links,
      since in this case there are often two cpus working in tandem:
      
       - CPU handling the NIC rx interrupts, feeding the receive queue,
        and (after this patch) freeing the skbs that were consumed.
      
       - CPU in recvmsg() system call, essentially 100% busy copying out
        data to user space.
      
      Having at most one skb in a per-socket cache has very little risk
      of memory exhaustion, and since it is protected by socket lock,
      its management is essentially free.
      
      Note that if rps/rfs is used, we do not enable this feature, because
      there is a high chance that the same cpu is handling both the
      recvmsg() system call and the TCP rx path, but that another cpu did
      the skb allocations in the device driver right before the RPS/RFS
      logic.
      
      To properly handle this case, it seems we would need to record
      on which cpu the skb was allocated, and use a different channel
      to give skbs back to that cpu.
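      
      A minimal sketch of the idea (simplified from the actual patch; the
      CONFIG_RPS guard is elided and details are illustrative):
      
          /* recvmsg() path: park the consumed skb on the socket instead
           * of freeing it, unless RPS may have allocated it on another cpu.
           */
          static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb)
          {
                  __skb_unlink(skb, &sk->sk_receive_queue);
                  if (!static_branch_unlikely(&rps_needed) &&
                      !sk->sk_rx_skb_cache) {
                          sk->sk_rx_skb_cache = skb; /* one-slot cache */
                          skb_orphan(skb);
                          return;
                  }
                  __kfree_skb(skb);
          }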
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add one skb cache for tx · 472c2e07
      Eric Dumazet authored
      On hosts with a lot of cores, RPC workloads suffer from heavy contention on slab spinlocks.
      
          20.69%  [kernel]       [k] queued_spin_lock_slowpath
           5.64%  [kernel]       [k] _raw_spin_lock
           3.83%  [kernel]       [k] syscall_return_via_sysret
           3.48%  [kernel]       [k] __entry_text_start
           1.76%  [kernel]       [k] __netif_receive_skb_core
           1.64%  [kernel]       [k] __fget
      
      For each sendmsg(), we allocate one skb, and free it when the ACK
      packet arrives.
      
      In many cases, ACK packets are handled by other cpus, and this
      unfortunately incurs heavy costs in the slab layer.
      
      This patch uses an extra pointer in the socket structure, so that we
      can try to reuse the same skb and avoid these expensive costs.
      
      We cache at most one skb per socket so this should be safe as far as
      memory pressure is concerned.
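      
      A condensed sketch of both halves of the mechanism (illustrative,
      not the verbatim diff):
      
          /* ACK path: park the skb on the socket instead of freeing it. */
          if (!sk->sk_tx_skb_cache) {
                  sk->sk_tx_skb_cache = skb;
                  return;
          }
          __kfree_skb(skb);
      
          /* sendmsg() path, in sk_stream_alloc_skb(): reuse the parked
           * skb, after checking the prior clone is gone (the v2 fix).
           */
          skb = sk->sk_tx_skb_cache;
          if (skb && !skb_cloned(skb)) {
                  sk->sk_tx_skb_cache = NULL;
                  pskb_trim(skb, 0);      /* reset payload for reuse */
                  return skb;             /* skip the slab allocation */
          }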
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: convert rps_needed and rfs_needed to new static branch api · dc05360f
      Eric Dumazet authored
      We prefer static_branch_unlikely() over static_key_false() these days.
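      
      The conversion follows the standard jump-label pattern, e.g. (the
      enable/disable side uses static_branch_inc()/static_branch_dec();
      handle_rps() is a placeholder call site):
      
          /* before: old static key API */
          struct static_key rps_needed __read_mostly;
      
          if (static_key_false(&rps_needed))
                  handle_rps();
      
          /* after: new static branch API */
          DEFINE_STATIC_KEY_FALSE(rps_needed);
      
          if (static_branch_unlikely(&rps_needed))
                  handle_rps();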
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-dev-BYPASS-for-lockless-qdisc' · 7c1508e5
      David S. Miller authored
      Paolo Abeni says:
      
      ====================
      net: dev: BYPASS for lockless qdisc
      
      This patch series is aimed at improving the xmit performance of
      lockless qdiscs in the uncontended scenario.
      
      After the lockless refactor, pfifo_fast can't leverage the BYPASS
      optimization. Due to retpolines, the overhead of the avoidable enqueue
      and dequeue operations has increased and we see measurable regressions.
      
      The first patch introduces the BYPASS code path for lockless qdisc, and the
      second one optimizes such path further. Overall this avoids up to 3 indirect
      calls per xmit packet. Detailed performance figures are reported in the 2nd
      patch.
      
       v2 -> v3:
        - qdisc_is_empty() has a const argument (Eric)
      
       v1 -> v2:
        - use really an 'empty' flag instead of 'not_empty', as
          suggested by Eric
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dev: introduce support for sch BYPASS for lockless qdisc · ba27b4cd
      Paolo Abeni authored
      With commit c5ad119f ("net: sched: pfifo_fast use skb_array")
      pfifo_fast no longer benefits from the TCQ_F_CAN_BYPASS optimization.
      Due to retpolines, the cost of the enqueue()/dequeue() pair has become
      relevant and we observe a measurable regression for the uncontended
      scenario when the packet rate is below line rate.
      
      After commit 46b1c18f ("net: sched: put back q.qlen into a
      single location") we can check for empty qdisc with a reasonably
      fast operation even for nolock qdiscs.
      
      This change extends TCQ_F_CAN_BYPASS support to nolock qdiscs.
      The new chunk of code closely mirrors the existing one for traditional
      qdiscs, leveraging a newly introduced helper to read the qdisc length
      atomically.
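      
      Schematically, the new fast path in __dev_xmit_skb() looks like this
      (simplified sketch, not the verbatim diff):
      
          if (q->flags & TCQ_F_NOLOCK) {
                  if ((q->flags & TCQ_F_CAN_BYPASS) && qdisc_is_empty(q) &&
                      qdisc_run_begin(q)) {
                          /* empty queue and seqlock acquired: transmit
                           * directly, skipping the enqueue()/dequeue()
                           * indirect calls entirely
                           */
                          qdisc_bstats_cpu_update(q, skb);
                          if (sch_direct_xmit(skb, q, dev, txq, NULL, true))
                                  __qdisc_run(q);
                          qdisc_run_end(q);
                          return NET_XMIT_SUCCESS;
                  }
                  /* otherwise, fall back to the regular enqueue path */
          }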
      
      Tested with pktgen in queue xmit mode, with pfifo_fast, a MQ
      device, and MQ root qdisc:
      
      threads         vanilla         patched
                      kpps            kpps
      1               2465            2889
      2               4304            5188
      4               7898            9589
      
      Same as above, but with a single queue device:
      
      threads         vanilla         patched
                      kpps            kpps
      1               2556            2827
      2               2900            2900
      4               5000            5000
      8               4700            4700
      
      No measurable changes in the contended scenarios, and more than 10%
      improvement in the uncontended ones.
      
       v1 -> v2:
        - rebased after flag name change
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Tested-by: Ivan Vecera <ivecera@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: add empty status flag for NOLOCK qdisc · 28cff537
      Paolo Abeni authored
      The queue is marked not empty after acquiring the seqlock, and
      it's up to the NOLOCK qdisc to update such flag on dequeue. Since
      the empty status lies on the same cache line as the seqlock, it's
      always hot in cache during the updates.
      
      This makes the empty flag update a little bit loose. Given the
      lack of synchronization between enqueue and dequeue, this is
      unavoidable.
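      
      The reader side introduced for this flag is lockless (sketch):
      
          static inline bool qdisc_is_empty(const struct Qdisc *qdisc)
          {
                  return READ_ONCE(qdisc->empty);
          }
      
      The flag is cleared (queue marked not empty) right after the seqlock
      is acquired in qdisc_run_begin(), and updated again by the dequeue
      side as the queue drains.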
      
      v2 -> v3:
       - qdisc_is_empty() has a const argument (Eric)
      
      v1 -> v2:
       - use really an 'empty' flag instead of 'not_empty', as
         suggested by Eric
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add documentation for tcp_ca_state · 576fd2f7
      Soheil Hassas Yeganeh authored
      Add documentation to the tcp_ca_state enum, since this enum is
      exposed in uapi.
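      
      For reference, the enum being documented (comments paraphrased here,
      not the exact kernel-doc text added by the patch):
      
          enum tcp_ca_state {
                  TCP_CA_Open = 0,     /* nothing bad seen, fast path */
                  TCP_CA_Disorder = 1, /* dupacks/SACKs, no loss proven yet */
                  TCP_CA_CWR = 2,      /* cwnd reduced (ECN or local congestion) */
                  TCP_CA_Recovery = 3, /* fast recovery, retransmitting losses */
                  TCP_CA_Loss = 4,     /* RTO expired, conservative loss recovery */
          };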
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Sowmini Varadhan <sowmini05@gmail.com>
      Acked-by: Sowmini Varadhan <sowmini05@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: remove conditional branches from tcp_mstamp_refresh() · e6d14070
      Eric Dumazet authored
      tcp_clock_ns() (aka ktime_get_ns()) uses a monotonic clock,
      so the checks we had in tcp_mstamp_refresh() are no longer
      relevant.
      
      This patch removes a cpu stall (when the cache line is not hot).
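      
      The helper reduces to straight-line code, roughly:
      
          static inline void tcp_mstamp_refresh(struct tcp_sock *tp)
          {
                  u64 val = tcp_clock_ns(); /* monotonic, never goes back */
      
                  tp->tcp_clock_cache = val;
                  tp->tcp_mstamp = div_u64(val, NSEC_PER_USEC);
          }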
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: Correct Cygnus/Omega PHY driver prompt · a7a01ab3
      Florian Fainelli authored
      The tristate prompt should have been replaced rather than defined
      again a few lines below; this was a rebase mistake.
      
      Fixes: 17cc9821 ("net: phy: Move Omega PHY entry to Cygnus PHY driver")
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 22 Mar, 2019 18 commits
  3. 21 Mar, 2019 11 commits
    • Merge branch 'Refactor-flower-classifier-to-remove-dependency-on-rtnl-lock' · 1d965c4d
      David S. Miller authored
      Vlad Buslov says:
      
      ====================
      Refactor flower classifier to remove dependency on rtnl lock
      
      Currently, all netlink protocol handlers for updating rules, actions
      and qdiscs are protected with a single global rtnl lock, which removes
      any possibility of parallelism. This patch set is the third step in
      removing the rtnl lock dependency from the TC rules update path.
      
      Recently, a new rtnl registration flag, RTNL_FLAG_DOIT_UNLOCKED, was
      added. TC rule update handlers (RTM_NEWTFILTER, RTM_DELTFILTER, etc.)
      are already registered with this flag and only take the rtnl lock when
      a qdisc or classifier requires it. Classifiers can indicate that their
      ops callbacks don't require the caller to hold the rtnl lock by
      setting the TCF_PROTO_OPS_DOIT_UNLOCKED flag. The goal of this change
      is to refactor the flower classifier to support unlocked execution and
      register it with the unlocked flag.
      
      This patch set implements the following changes to make the flower
      classifier concurrency-safe:
      
      - Implement reference counting for individual filters. Change fl_get to
        take reference to filter. Implement tp->ops->put callback that was
        introduced in cls API patch set to release reference to flower filter.
      
      - Use tp->lock spinlock to protect internal classifier data structures
        from concurrent modification.
      
      - Handle concurrent tcf proto deletion by returning EAGAIN, which will
        cause cls API to retry and create new proto instance or return error
        to the user (depending on message type).
      
      - Handle concurrent insertion of filter with same priority and handle by
        returning EAGAIN, which will cause cls API to lookup filter again and
        process it accordingly to netlink message flags.
      
      - Extend flower mask with reference counting and protect masks list with
        masks_lock spinlock.
      
      - Prevent concurrent mask insertion by inserting temporary value to
        masks hash table. This is necessary because mask initialization is a
        sleeping operation and cannot be done while holding tp->lock.
      
      Both chain-level and classifier-level conflicts are resolved by
      returning -EAGAIN to the cls API, which results in a restart of the
      whole operation. This retry mechanism is a consequence of the
      fine-grained locking approach used in this and previous changes in the
      series, and is necessary to allow concurrent updates on the same chain
      instance. An alternative approach would be to lock the whole chain
      while updating filters on any of its child tp's, or while adding and
      removing classifier instances from the chain. However, since the most
      CPU-intensive parts of the filter update code are in classifier code
      and its dependencies (extensions and hw offloads), such an approach
      would negate most of the gains introduced by this change and previous
      changes in the series when updating the same chain instance.
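      
      On the cls API side, the reaction to -EAGAIN is schematically a plain
      restart (illustrative sketch, not the verbatim tc_new_tfilter() code):
      
          replay:
                  /* re-lookup the chain and tcf_proto instance ... */
                  err = tp->ops->change(net, skb, tp, /* ... */);
                  if (err == -EAGAIN)
                          goto replay; /* concurrent update detected, retry */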
      
      The tcf hw offloads API is not changed by this patch set and still
      requires the caller to hold the rtnl lock. The refactored flower
      classifier tracks the rtnl lock state by means of the 'rtnl_held' flag
      provided by the cls API and obtains the lock before calling hw
      offloads. A following patch set will lift this restriction and
      refactor the cls hw offloads API to support unlocked execution.
      
      With these changes, the flower classifier is safely registered with
      the TCF_PROTO_OPS_DOIT_UNLOCKED flag in the last patch.
      
      Changes from V2 to V3:
      - Rebase on latest net-next
      
      Changes from V1 to V2:
      - Extend cover letter with explanation about retry mechanism.
      - Rebase on current net-next.
      - Patch 1:
        - Use rcu_dereference_raw() for tp->root dereference.
        - Update comment in fl_head_dereference().
      - Patch 2:
        - Remove redundant check in fl_change error handling code.
        - Add empty line between error check and new handle assignment.
      - Patch 3:
        - Refactor loop in fl_get_next_filter() to improve readability.
      - Patch 4:
        - Refactor __fl_delete() to improve readability.
      - Patch 6:
        - Fix comment in fl_check_assign_mask().
      - Patch 9:
        - Extend commit message.
        - Fix error code in comment.
      - Patch 11:
        - Fix fl_hw_replace_filter() to always release rtnl lock in error
          handlers.
      - Patch 12:
        - Don't take rtnl lock before calling __fl_destroy_filter() in
          workqueue context.
        - Extend commit message with explanation why flower still takes rtnl
          lock before calling hardware offloads API.
      
      Github: <https://github.com/vbuslov/linux/tree/unlocked-flower-cong3>
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: set unlocked flag for flower proto ops · 92149190
      Vlad Buslov authored
      Set TCF_PROTO_OPS_DOIT_UNLOCKED for the flower classifier to indicate
      that its ops callbacks don't require the caller to hold the rtnl lock.
      Don't take the rtnl lock in fl_destroy_filter_work(), which is
      executed on a workqueue instead of being called by the cls API and is
      therefore not affected by setting TCF_PROTO_OPS_DOIT_UNLOCKED. The
      rtnl mutex is still manually taken by the flower classifier before
      calling the hardware offloads API, which has not been updated for
      unlocked execution.
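      
      The registration itself boils down to one flag in the ops (abridged
      sketch):
      
          static struct tcf_proto_ops cls_fl_ops __read_mostly = {
                  .kind           = "flower",
                  .change         = fl_change,
                  .delete         = fl_delete,
                  .get            = fl_get,
                  .put            = fl_put,
                  .flags          = TCF_PROTO_OPS_DOIT_UNLOCKED,
                  .owner          = THIS_MODULE,
          };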
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: track rtnl lock state · c24e43d8
      Vlad Buslov authored
      Use the 'rtnl_held' flag to track whether the caller holds the rtnl
      lock. Propagate the flag to internal functions that need to know the
      rtnl lock state. Take the rtnl lock before calling tcf APIs that
      require it (hw offload, bind filter, etc.).
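      
      The resulting pattern inside flower is (sketch):
      
          if (!rtnl_held)
                  rtnl_lock();
          /* ... hw offload / bind-filter calls that still need rtnl ... */
          if (!rtnl_held)
                  rtnl_unlock();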
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: protect flower classifier state with spinlock · 3d81e711
      Vlad Buslov authored
      struct tcf_proto was extended with a spinlock to be used by
      classifiers instead of the global rtnl lock. Use it to protect shared
      flower classifier data structures (handle_idr, the mask hashtable and
      list) and fields of individual filters that can be accessed
      concurrently. This patch set uses tcf_proto->lock as a per-instance
      lock that protects all filters on the tcf_proto.
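      
      For example, publishing a new filter now happens under the
      per-instance lock rather than rtnl (sketch; names simplified):
      
          spin_lock(&tp->lock);
          /* update handle_idr, the mask list, and filter fields ... */
          list_add_tail_rcu(&fnew->list, &fnew->mask->filters);
          spin_unlock(&tp->lock);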
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: handle concurrent tcf proto deletion · 272ffaad
      Vlad Buslov authored
      Without rtnl lock protection, a tcf proto can be deleted concurrently.
      Check the tcf proto 'deleting' flag after taking the tcf spinlock to
      verify that no concurrent deletion is in progress. Return an EAGAIN
      error if concurrent deletion is detected, which will cause the caller
      to retry and possibly create a new instance of the tcf proto.
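      
      The check itself is short (sketch):
      
          spin_lock(&tp->lock);
          if (tp->deleting) {
                  spin_unlock(&tp->lock);
                  return -EAGAIN; /* cls API retries with a fresh instance */
          }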
      
      The retry mechanism is a consequence of the fine-grained locking
      approach used in this and previous changes in the series, and is
      necessary to allow concurrent updates on the same chain instance. An
      alternative approach would be to lock the whole chain while updating
      filters on any of its child tp's, or while adding and removing
      classifier instances from the chain. However, since the most
      CPU-intensive parts of the filter update code are in classifier code
      and its dependencies (extensions and hw offloads), such an approach
      would negate most of the gains introduced by this change and previous
      changes in the series when updating the same chain instance.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: handle concurrent filter insertion in fl_change · 9a2d9389
      Vlad Buslov authored
      Check if the user specified a handle and another filter with the same
      handle was inserted concurrently. Return EAGAIN to retry filter
      processing (in case it is an overwrite request).
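      
      Schematically, with a user-supplied handle (sketch; assumes the idr
      API is used to reserve exactly the requested value):
      
          err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
                              handle, GFP_ATOMIC);
          if (err == -ENOSPC)     /* taken by a concurrent insert */
                  err = -EAGAIN;  /* let the cls API re-lookup and retry */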
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: protect masks list with spinlock · 259e60f9
      Vlad Buslov authored
      Protect modifications of the flower masks list with a spinlock to
      remove the dependency on the rtnl lock and allow concurrent access.
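      
      The change boils down to wrapping the list updates (sketch):
      
          spin_lock(&head->masks_lock);
          list_add_tail_rcu(&newmask->list, &head->masks);
          spin_unlock(&head->masks_lock);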
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: handle concurrent mask insertion · 195c234d
      Vlad Buslov authored
      Without rtnl lock protection, masks with the same key can be inserted
      concurrently. Insert a temporary mask with reference count zero into
      the masks hashtable; this will cause any concurrent modifications to
      retry.
      
      Wait for an rcu grace period to complete after removing the temporary
      mask from the masks hashtable, to accommodate concurrent readers.
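      
      A sketch of the insertion side (simplified):
      
          /* publish a placeholder before the (sleeping) initialization;
           * refcnt == 0 tells concurrent lookups to back off with -EAGAIN
           */
          refcount_set(&newmask->refcnt, 0);
          old = rhashtable_lookup_get_insert_fast(&head->ht,
                                                  &newmask->ht_node,
                                                  mask_ht_params);
          if (old)
                  return -EAGAIN; /* someone is inserting the same key */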
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Suggested-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: add reference counter to flower mask · f48ef4d5
      Vlad Buslov authored
      Extend the fl_flow_mask structure with a reference counter to allow
      parallel modification without relying on the rtnl lock. Use the rcu
      read lock to safely look up a mask and increment its reference
      counter, in order to accommodate concurrent deletes.
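      
      The lookup side pairs rcu protection with refcount_inc_not_zero()
      (sketch):
      
          rcu_read_lock();
          mask = rhashtable_lookup_fast(&head->ht, &key, mask_ht_params);
          if (mask && !refcount_inc_not_zero(&mask->refcnt))
                  mask = NULL; /* being created or torn down: retry */
          rcu_read_unlock();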
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: track filter deletion with flag · b2552b8c
      Vlad Buslov authored
      In order to prevent double deletion of a filter by concurrent tasks
      when the rtnl lock is not used for synchronization, add a 'deleted'
      filter field. Check the value of this field when modifying filters
      and return an error if a concurrent deletion is detected.
      
      Refactor __fl_delete() to accept a pointer to a 'last' boolean as an
      argument, and to return an error code as the function return value
      instead. This is necessary to signal a concurrent filter delete to
      the caller.
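      
      The refactored signature and check, roughly:
      
          static int __fl_delete(struct tcf_proto *tp,
                                 struct cls_fl_filter *f, bool *last,
                                 struct netlink_ext_ack *extack)
          {
                  spin_lock(&tp->lock);
                  if (f->deleted) {
                          spin_unlock(&tp->lock);
                          return -ENOENT; /* lost to a concurrent delete */
                  }
                  f->deleted = true;
                  /* ... unlink the filter, set *last, drop the ref ... */
          }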
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: flower: introduce reference counting for filters · 06177558
      Vlad Buslov authored
      Extend flower filters with reference counting in order to remove the
      dependency on the rtnl lock in flower ops and allow filters to be
      modified concurrently. A reference to a flower filter can be
      taken/released concurrently as soon as it is marked as 'unlocked' by
      the last patch in this series. Use an atomic reference counter type
      to make concurrent modifications safe.
      
      Always take a reference to a flower filter while working with it:
      - Modify fl_get() to take a reference to the filter.
      - Implement the tp->put() callback as fl_put() to allow the cls API
        to release the reference taken by fl_get().
      - Modify fl_change() to assume that the caller holds a reference to
        fold and to take a reference to fnew.
      - Take a reference to the filter while using it in fl_walk().
      
      Implement helper functions to get/put the filter reference counter.
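      
      The helpers combine an idr lookup with a conditional refcount get
      (sketch; the free-path name is illustrative):
      
          static struct cls_fl_filter *__fl_get(struct cls_fl_head *head,
                                                u32 handle)
          {
                  struct cls_fl_filter *f;
      
                  rcu_read_lock();
                  f = idr_find(&head->handle_idr, handle);
                  if (f && !refcount_inc_not_zero(&f->refcnt))
                          f = NULL;
                  rcu_read_unlock();
                  return f;
          }
      
          static void __fl_put(struct cls_fl_filter *f)
          {
                  if (refcount_dec_and_test(&f->refcnt))
                          __fl_destroy_filter(f); /* illustrative free path */
          }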
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>