1. 30 Apr, 2017 22 commits
    • net/mlx5e: Optimize poll ICOSQ completion queue · 1f5b1e47
      Tariq Toukan authored
      UMR operations are more frequent and important.
      Check them first, and add a compiler branch predictor hint.
      
      According to current design, ICOSQ CQ can contain at most one
      pending CQE per napi. Poll function is optimized accordingly.
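
      As a rough illustration of the idea above (not the driver's actual code), a poll routine that handles at most one completion and hints that the UMR opcode is the common case might look like the sketch below. The `likely()` macro is a userspace stand-in for the kernel hint, and every type and function name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's branch hint (GCC/Clang builtin). */
#define likely(x) __builtin_expect(!!(x), 1)

/* Hypothetical opcodes and structures; not the driver's real definitions. */
enum icosq_opcode { OP_UMR, OP_NOP };

struct icosq_cqe {
    bool valid;                 /* true while a completion is pending */
    enum icosq_opcode opcode;
};

struct icosq_stats {
    unsigned int umr_done;
    unsigned int nop_done;
};

/*
 * Handle at most one pending completion, mirroring the "one CQE per napi"
 * design constraint. Returns the number of CQEs consumed (0 or 1).
 */
static int poll_icosq_once(struct icosq_cqe *cqe, struct icosq_stats *st)
{
    assert(cqe && st);

    if (!cqe->valid)
        return 0;

    /* UMR is checked first and marked as the expected branch. */
    if (likely(cqe->opcode == OP_UMR))
        st->umr_done++;
    else
        st->nop_done++;

    cqe->valid = false;         /* consume the completion */
    return 1;
}
```

      Because at most one completion can be pending, the sketch needs no loop; a second call on the same entry simply returns 0.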
      
      Performance:
      Single-stream packet-rate tested with pktgen.
      Packets are dropped in tc level to zoom into driver data-path.
      Larger gain is expected for larger packet sizes, as BW is higher
      and UMR posts are more frequent.
      
      ---------------------------------------------
      packet size | before    | after     | gain  |
      ---------------------------------------------
      64B         | 4,092,370 | 4,113,306 |  0.5% |
      1024B       | 3,421,435 | 3,633,819 |  6.2% |
      ---------------------------------------------
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Cc: kernel-team@fb.com
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Act on delay probe time updates · a2fa1fe5
      Hadar Hen Zion authored
      The user can change delay_first_probe_time parameter through sysctl.
      Listen to NETEVENT_DELAY_PROBE_TIME_UPDATE notifications and update the
      intervals for updating the neighbours 'used' value periodic task and
      for flow HW counters query periodic task.
      Both intervals are updated only if the new delay probe time value is
      lower than the current interval.
      
      Since the driver saves only one min interval value and not per device,
      the users will be able to set lower interval value for updating
      neighbour 'used' value periodic task but they won't be able to schedule
      a higher interval for this periodic task.
      The used interval for scheduling neighbour 'used' value periodic task is
      the minimal delay probe time parameter ever seen by the driver.
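
      The "only ever lowered" behaviour described above can be sketched as follows. This is a minimal model under stated assumptions: the structure, the function name and the time unit are hypothetical and are not taken from the driver.

```c
#include <assert.h>

/* Hypothetical driver-wide state: the smallest delay probe time seen so far. */
struct neigh_update_state {
    unsigned long min_interval;   /* time unit is an assumption */
};

/*
 * Apply a new delay probe time. The stored interval is lowered when the new
 * value is smaller and is never raised. Returns 1 if the interval changed.
 */
static int neigh_interval_update(struct neigh_update_state *st,
                                 unsigned long new_delay)
{
    assert(st);

    if (new_delay >= st->min_interval)
        return 0;                 /* higher or equal values are ignored */

    st->min_interval = new_delay;
    return 1;
}
```

      With a stored interval of 500, a request for 800 leaves it unchanged, while a later request for 200 lowers it to 200.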
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Update neighbour 'used' state using HW flow rules counters · f6dfb4c3
      Hadar Hen Zion authored
      When IP tunnel encapsulation rules are offloaded, the kernel can't see
      the traffic of the offloaded flow. The neighbour for the IP tunnel
      destination of the offloaded flow can mistakenly become STALE and be
      deleted by the kernel since its 'used' value wasn't changed.
      
      To make sure that a neighbour which is used by the HW won't become
      STALE, we proactively update the neighbour 'used' value every
      DELAY_PROBE_TIME period, when packets were matched and counted by the HW
      for one of the tunnel encap flows related to this neighbour.
      
      The periodic task that updates the used neighbours is scheduled when a
      tunnel encap rule is successfully offloaded into HW and keeps re-scheduling
      itself as long as the representor's neighbours list isn't empty.
      
      Add, remove, lookup and status change operations done over the
      representor's neighbours list or the neighbour hash entry encaps list
      are all serialized by RTNL lock.
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Add support to neighbour update flow · 232c0013
      Hadar Hen Zion authored
      In order to offload TC encap rules, the driver does a lookup for the IP
      tunnel neighbour according to the output device and the destination IP
      given by the user.
      
      To keep track of the validity state of such neighbours, we keep
      the neighbours information (pair of device pointer and destination IP)
      in a hash table maintained at the relevant egress representor and
      register to get NETEVENT_NEIGH_UPDATE events. When getting neighbour update
      netevent, we search for a match among the cached neighbours entries used for
      encapsulation.
      
      In case the neighbour isn't valid, we can't offload the flow into the
      HW. We cache the flow (requested matching and actions) in the driver and
      offload the rule later, when the neighbour is resolved and becomes
      valid.
      
      When a flow is only cached in the driver and not offloaded into HW
      yet, we use the EAGAIN return value to mark it internally; the TC ndo
      still returns success.
      
      Listen to kernel neighbour update netevents to track the validity state
      of the relevant neighbours:
      
      1. If a neighbour becomes valid, offload the related rules to HW.
      
      2. If the neighbour becomes invalid, remove the related rules from HW.
      
      3. If the neighbour MAC address was changed, update the encap header.
         Remove all the offloaded rules using the old encap header from the HW
         and insert new rules to HW with updated encap header.
      
      Access to the neighbour hash table is protected by the RTNL lock of its
      caller or by the table's spinlock.
      
      Details of the locking/synchronization among the different actions
      applied on the neighbour table:
      
      Add/remove operations - protected by RTNL lock of its caller (all TC
      commands are protected by RTNL lock). Add and remove operations are
      initiated only when the user inserts/removes a TC rule into/from the driver.
      
      Lookup/remove operations - since the lookup operation is done from
      netevent notifier block, RTNL lock can't be used (atomic context).
      Use the table's spin lock to protect lookups from TC user removal operation.
      bh is used since netevent can be called from a softirq context.
      
      Lookup/add operations - The hash table access functions are taking
      care of the protection between lookup and add operations.
      
      When adding/removing encap headers and rules to/from the HW, RTNL lock
      is used. It can happen when:
      
      1. The user inserts/removes a TC rule into/from the driver (TC commands
      are protected by the RTNL lock of its caller).
      
      2. The driver gets neighbour notification event, which reports about
      neighbour validity status change. Before adding/removing encap headers
      and rules to/from the HW, RTNL lock is taken.
      
      A neighbour hash table entry should be freed when its encap list is empty.
      Since the neighbour update netevent notification schedules a neighbour
      update work that uses the neighbour hash entry, it can't be freed
      unconditionally when the encap list becomes empty during TC delete rule flow.
      Use reference count to protect from freeing neighbour hash table entry
      while it's still in use.
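
      The reference-counting rule in the paragraph above can be illustrated with a single-threaded sketch. All names are hypothetical, the `freed` flag stands in for actually freeing memory, and a real driver would need an atomic counter; this only shows why the entry survives until the last user drops it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical neighbour hash entry; 'freed' stands in for freeing memory. */
struct neigh_entry {
    int refcnt;
    bool freed;
};

/* Take a reference, e.g. when an update work item is scheduled. */
static void neigh_entry_hold(struct neigh_entry *e)
{
    assert(e && !e->freed);
    e->refcnt++;
}

/* Drop a reference; the entry is freed only when the last one goes away. */
static void neigh_entry_release(struct neigh_entry *e)
{
    assert(e && e->refcnt > 0);
    if (--e->refcnt == 0)
        e->freed = true;
}
```

      If the entry starts with one reference owned by its encap list and the update work takes a second one, then emptying the encap list during a TC delete only drops the count to one; the entry is freed when the work releases its reference.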
      
      When the user asks to unregister a netdevice used by one of the neighbours,
      neighbour removal notification is received. Then we take a reference on the
      neighbour and don't free it until the relevant encap entries (and flows) are
      marked as invalid (not offloaded) and removed from HW.
      As long as the encap entry is still valid (checked under RTNL lock) we
      can safely access the neighbour device saved on mlx5e_neigh struct.
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Add neighbour hash table to the representors · 37b498ff
      Hadar Hen Zion authored
      Add hash table to the representors which is to be used by the next patch
      to save neighbours information in the driver.
      
      In order to offload IP tunnel encapsulation rules, the driver must find
      the tunnel dst neighbour according to the output device and the
      destination address given by the user. The next patch will cache the
      neighbors information in the driver to allow support in neigh update
      flow for tunnel encap rules.
      
      The neighbour entries are also saved in a list so we easily iterate over
      them when querying statistics in order to provide 'used' feedback to the
      kernel neighbour NUD core.
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Read neigh parameters with proper locking · 033354d5
      Hadar Hen Zion authored
      The nud_state and hardware address fields are protected by the neighbour
      lock; we should acquire it before accessing those parameters.
      
      Use this lock to avoid inconsistency between the neighbour validity state
      and its hardware address.
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Use flag to properly monitor a flow rule offloading state · 0b67a38f
      Hadar Hen Zion authored
      Instead of relying on the 'flow->rule' pointer value which can be
      valid or invalid (in case the FW returns an error while trying to offload
      the rule), monitor the rule state using a flag.
      
      In a downstream patch which adds support to IP tunneling neigh update
      flow, a TC rule could be cached in the driver and not offloaded into the
      HW. In this case, the flow handle pointer stays NULL.
      
      Check the offloaded flag to properly deal with rules which are currently
      not offloaded when querying rule statistics.
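
      A minimal sketch of the flag-based check, assuming a hypothetical `FLOW_FLAG_OFFLOADED` bit and a simplified flow structure (neither is taken from the driver):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; the real driver's flag name and value may differ. */
#define FLOW_FLAG_OFFLOADED (1u << 0)

struct flow {
    unsigned int flags;
    void *rule;                 /* may be NULL or an error value; not trusted */
    unsigned long hw_packets;   /* stand-in for a HW counter */
};

/*
 * Report statistics only for flows that are currently offloaded. The
 * decision is made from the flag, never from the 'rule' pointer value.
 */
static bool flow_get_stats(const struct flow *f, unsigned long *packets)
{
    assert(f && packets);

    if (!(f->flags & FLOW_FLAG_OFFLOADED))
        return false;           /* cached in the driver only: nothing to read */

    *packets = f->hw_packets;
    return true;
}
```

      A flow that is only cached (flag clear, `rule` still NULL) is skipped, while an offloaded flow reports its counter.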
      
      This patch doesn't add any new functionality.
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Remove output device parameter from create encap header helpers definition · 1a8552bd
      Hadar Hen Zion authored
      Passing output device parameter to the helper functions that deal with
      creation of encapsulation headers is redundant. Output device parameter
      can be defined inside those helpers, no need to pass it. Refactor the code by
      removing the parameter from the function signature.
      
      This patch doesn't change any functionality.
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Move the encap entry structure from the eswitch header · c1ae1152
      Or Gerlitz authored
      The encap entry structure isn't manipulated by the eswitch code,
      hence it can/needs to be removed from the eswitch header.
      
      Do that, and change it to have mlx5e_ prefix.
      
      This patch doesn't change any functionality.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Remove encap entry pointer from the eswitch flow attributes · 45247bf2
      Or Gerlitz authored
      Encap wise, the tc eswitch flow attribute struct needs to have
      only the encap ID which is programmed later to the HW and none
      of the higher level encap params, fix that.
      
      This patch doesn't change any functionality.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Extendable vport representor netdev private data · 1d447a39
      Saeed Mahameed authored
      Make representor netdev private data extendable by adding new struct
      "mlx5e_rep_priv" and use it as the rep netdev private data struct
      instead of directly pointing to mlx5_eswitch_rep.
      
      Added new en_rep.h header file to contain all representor related
      definitions and prototypes, and moved all representor specific logic
      into en_rep.c.
      
      Needed for downstream patches to extend representor functionality to
      support neighbour update.
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
    • Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · c08bac03
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      10GbE Intel Wired LAN Driver Updates 2017-04-29
      
      This series contains updates to ixgbe and ixgbevf only, most notable is
      the addition of XDP support to our 10GbE drivers.
      
      Paul fixes ixgbe to acquire the PHY semaphore before accessing PHY
      registers when issuing a device reset.
      
      John adds XDP support (yeah!) for ixgbe.
      
      Emil fixes an issue by flushing the MACVLAN filters on VF reset to avoid
      conflicts with other VFs that may end up using the same MAC address.  Also
      fixed a bug where ethtool -S displayed some empty fields for ixgbevf
      because it was using ixgbe_stats instead of ixgbevf_stats for
      IXGBEVF_QUEUE_STATS_LEN.
      
      Tony adds the ability to specify a zero MAC address in order to clear the
      VF's MAC address from the RAR table.  Also adds support for a new
      1000Base-T device based on x550EM_X MAC type.  Fixed an issue where the
      RSS key specified by the user would be over-written with a pre-existing
      value, so change the rss_key to a pointer so we can check to see if the
      key has a value set before attempting to set it.  Fixed the logic for
      mailbox support for getting RETA and RSS values, which are only supported
      by 82599 and x540 devices.
      
      v2: fixed up patches #2 and #3 based on feedback from Jakub and to
          address build issues when page sizes are larger than 4k
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ixgbevf: Check for RSS key before setting value · e60ae003
      Tony Nguyen authored
      The RSS key is being repopulated every time the interface is brought up
      regardless of whether there is an existing value. If the user sets the RSS
      key and the interface is brought up (e.g. reset), the user specified RSS
      key will be overwritten.
      
      This patch changes the rss_key to a pointer so we can check to see if the
      key has been populated and preserve it accordingly.
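
      The pointer check can be sketched in plain C as follows. This is an illustration under assumptions: the key size, the structure, the function name and the fill pattern are all hypothetical, and a real driver would generate a random key rather than a fixed byte pattern.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define RSS_KEY_SIZE 40   /* assumed key length, in bytes */

/* Hypothetical adapter state: a NULL pointer means "no key set yet". */
struct adapter {
    unsigned char *rss_key;
};

/*
 * Populate the RSS key only when none exists, so a key set earlier by the
 * user survives the interface being brought up again.
 * Returns 0 on success, -1 on allocation failure.
 */
static int adapter_init_rss_key(struct adapter *a)
{
    assert(a);

    if (a->rss_key)
        return 0;             /* keep the existing key */

    a->rss_key = malloc(RSS_KEY_SIZE);
    if (!a->rss_key)
        return -1;

    /* Placeholder fill; stands in for generating a random key. */
    memset(a->rss_key, 0x5a, RSS_KEY_SIZE);
    return 0;
}
```

      Calling the function on an adapter that already holds a user-supplied key leaves both the pointer and the key bytes untouched.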
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbevf: Fix errors in retrieving RETA and RSS from PF · 82fb670c
      Tony Nguyen authored
      Mailbox support for getting RETA and RSS is available for only 82599 and
      x540; a previous patch reversed the logic and these adapters were
      returning not supported.
      
      Also, the NACK check in ixgbevf_get_rss_key_locked() was checking for the
      command IXGBE_VF_GET_RETA instead of IXGBE_VF_GET_RSS_KEY.
      
      This patch corrects both issues by correcting the logic and checking for
      the right command.
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: Check for RSS key before setting value · 3dfbfc7e
      Tony Nguyen authored
      The RSS key is being repopulated every time the interface is brought up
      regardless of whether there is an existing value. If the user sets the RSS
      key and the interface is brought up (e.g. reset), the user specified RSS
      key will be overwritten.
      
      This patch changes the rss_key to a pointer so we can check to see if the
      key has been populated and preserve it accordingly.
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: Add 1000Base-T device based on X550EM_X MAC · 8dc963e1
      Paul Greenwalt authored
      Add support for new 1000Base-T device based on X550EM_X MAC
      type. All PHY operations are disabled as the PHY is controlled
      by FW.
      Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
      Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: Allow setting zero MAC address for VF · 27bdc44c
      Tony Nguyen authored
      Currently, there is no logic that allows a VF's MAC address to be removed
      from the RAR table.
      
      Allow the user to specify a zero MAC address in order to clear the VF's
      MAC address from the RAR table.  This functionality is also utilized by
      libvirt when removing VFs.
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbevf: fix size of queue stats length · f87fc447
      Emil Tantilov authored
      IXGBEVF_QUEUE_STATS_LEN is based on ixgbevf_stats, not ixgbe_stats.
      
      This change fixes a bug where ethtool -S displayed some empty fields.
      Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: clean macvlan MAC filter table on VF reset · e251ecf7
      Emil Tantilov authored
      Flush the macvlan filters on VF reset to avoid conflict with other VFs that
      may end up using the same MAC address.
      
      The main change here is the call to ixgbe_set_vf_macvlan() with index 0.
      
      Moved ixgbe_set_vf_macvlan() in front of ixgbe_vf_reset_event() to avoid
      adding a prototype.
      Reported-by: Sritej Kanakadandi Sritej Rama <skanakad@cisco.com>
      Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: delay tail write to every 'n' packets · 7379f97a
      John Fastabend authored
      Current XDP implementation hits the tail on every XDP_TX return
      code. This patch changes driver behavior to only hit the tail after
      packet processing is complete.
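
      The batching idea can be modelled in userspace as below. This is a sketch, not the ixgbe code: the ring structure, the verdict enum and the `tail_writes` counter (which stands in for the cost of a device register write) are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-packet verdicts and ring state. */
enum xdp_verdict { VERDICT_DROP, VERDICT_TX };

struct xdp_ring {
    unsigned int next_to_use;   /* descriptors queued by software */
    unsigned int tail;          /* last value "written" to the device tail */
    unsigned int tail_writes;   /* counts simulated tail register writes */
};

/*
 * Process a batch of verdicts. Transmit descriptors are queued as they are
 * seen, but the tail is written once, after the whole batch is processed.
 */
static void clean_rx_batch(struct xdp_ring *ring,
                           const enum xdp_verdict *verdicts, int n)
{
    bool xmit = false;

    assert(ring && verdicts);

    for (int i = 0; i < n; i++) {
        if (verdicts[i] == VERDICT_TX) {
            ring->next_to_use++;
            xmit = true;
        }
    }

    if (xmit) {
        ring->tail = ring->next_to_use;
        ring->tail_writes++;
    }
}
```

      For a batch containing three transmit verdicts, the sketch performs one tail write instead of three; a batch with no transmit verdicts performs none.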
      
      With this patch I can run XDP drop programs @ 14+Mpps and XDP_TX
      programs are at ~13.5Mpps.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: add support for XDP_TX action · 33fdc82f
      John Fastabend authored
      A couple of design choices were made here. First I use a new ring
      pointer structure xdp_ring[] in the adapter struct instead of
      pushing the newly allocated XDP TX rings into the tx_ring[]
      structure. This means we have to duplicate loops around rings
      in places we want to initialize both TX rings and XDP rings.
      But by making it explicit it is obvious when we are using XDP
      rings and when we are using TX rings. Further we don't have
      to do ring arithmetic which is error prone. As a proof point
      for doing this my first patches used only a single ring structure
      and introduced bugs in FCoE code and macvlan code paths.
      
      Second I am aware this is not the most optimized version of
      this code possible. I want to get baseline support in using
      the most readable format possible and then once this series
      is included I will optimize the TX path in another series
      of patches.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ixgbe: add XDP support for pass and drop actions · 92470808
      John Fastabend authored
      Basic XDP drop support for ixgbe. Uses READ_ONCE/xchg semantics on XDP
      programs instead of RCU primitives as suggested by Daniel Borkmann and
      Alex Duyck.
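
      As a loose userspace analogue of the READ_ONCE/xchg pattern, the sketch below uses C11 atomics: a writer swaps the program pointer and receives the old one, and a reader loads the pointer once. This is only an approximation; C11 `atomic_load` and `atomic_exchange` are not the kernel primitives and default to stronger ordering, and the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical program and ring types. */
struct xdp_prog { int id; };

struct rx_ring {
    _Atomic(struct xdp_prog *) prog;
};

/* Writer side: install a new program and return the previous one. */
static struct xdp_prog *ring_swap_prog(struct rx_ring *r,
                                       struct xdp_prog *new_prog)
{
    assert(r);
    return atomic_exchange(&r->prog, new_prog);
}

/* Reader side: take a single snapshot of the current program pointer. */
static struct xdp_prog *ring_read_prog(struct rx_ring *r)
{
    assert(r);
    return atomic_load(&r->prog);
}
```

      The value returned by the swap is the previously installed program, which the caller can then release once it is no longer in use.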
      
      v2: fix the build issues seen w/ XDP when page sizes are larger than 4K
          and made minor fixes based on feedback from Jakub Kicinski
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  2. 29 Apr, 2017 1 commit
  3. 28 Apr, 2017 17 commits