1. 15 Sep, 2021 40 commits
    • Merge branch 'mlxsw-Add-support-for-transceiver-modules-reset' · 5706383b
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      mlxsw: Add support for transceiver modules reset
      
      This patchset prepares mlxsw for future transceiver module changes [1]
      and adds reset support via the existing 'ETHTOOL_RESET'
      interface.
      
      Patches #1-#6 are relatively straightforward preparations.
      
      Patch #7 tracks the number of logical ports that are mapped to the
      transceiver module and the number of logical ports using it that are
      administratively up. Needed for both reset support and power mode policy
      support.
      
      Patches #8-#9 add required fields in device registers.
      
      Patch #10 implements support for ethtool_ops::reset in order to reset
      transceiver modules.
      
      [1] https://lore.kernel.org/netdev/20210824130344.1828076-1-idosch@idosch.org/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: Add support for transceiver modules reset · 49fd3b64
      Ido Schimmel authored
      Implement support for ethtool_ops::reset in order to reset transceiver
      modules. The module backing the netdev is reset when the 'ETH_RESET_PHY'
      flag is set. After a successful reset, the flag is cleared by the driver
      and other flags are ignored. This is in accordance with the interface
      documentation:
      
      "The reset() operation must clear the flags for the components which
      were actually reset. On successful return, the flags indicate the
      components which were not reset, either because they do not exist in the
      hardware or because they cannot be reset independently. The driver must
      never reset any components that were not requested."
      
      Reset is useful in order to allow a module to transition out of a fault
      state. From section 6.3.2.12 in CMIS 5.0: "Except for a power cycle, the
      only exit path from the ModuleFault state is to perform a module reset
      by taking an action that causes the ResetS transition signal to become
      TRUE (see Table 6-11)".
      
      An error is returned when the netdev is administratively up:
      
       # ip link set dev swp11 up
      
       # ethtool --reset swp11 phy
       ETHTOOL_RESET 0x40
       Cannot issue ETHTOOL_RESET: Invalid argument
      
       # ip link set dev swp11 down
      
       # ethtool --reset swp11 phy
       ETHTOOL_RESET 0x40
       Components reset:     0x40
      
      An error is returned when the module is shared by multiple ports (split
      ports) and the "phy-shared" flag is not set:
      
       # devlink port split swp11 count 4
      
       # ethtool --reset swp11s0 phy
       ETHTOOL_RESET 0x40
       Cannot issue ETHTOOL_RESET: Invalid argument
      
       # ethtool --reset swp11s0 phy-shared
       ETHTOOL_RESET 0x400000
       Components reset:     0x400000
      
       # devlink port unsplit swp11s0
      
       # ethtool --reset swp11 phy
       ETHTOOL_RESET 0x40
       Components reset:     0x40
      
      An error is also returned when one of the ports using the module is
      administratively up:
      
       # devlink port split swp11 count 4
      
       # ip link set dev swp11s1 up
      
       # ethtool --reset swp11s0 phy-shared
       ETHTOOL_RESET 0x400000
       Cannot issue ETHTOOL_RESET: Invalid argument
      
       # ip link set dev swp11s1 down
      
       # ethtool --reset swp11s0 phy-shared
       ETHTOOL_RESET 0x400000
       Components reset:     0x400000
      
      Reset is performed by writing to the "rst" bit of the PMAOS register,
      which instructs the firmware to assert the reset signal connected to the
      module for a fixed amount of time.
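
      As a rough illustration of the checks shown in the sessions above (a
      userspace model, not the driver code), the flag handling can be sketched
      as follows. The flag values match the 0x40 and 0x400000 printed by
      ethtool; reset_module(), its arguments and the do_reset callback are
      hypothetical names introduced only for this sketch.

```python
import errno

ETH_RESET_PHY = 1 << 6                  # 0x40, as printed by "ethtool --reset ... phy"
ETH_RESET_SHARED_SHIFT = 16             # "shared" variants sit 16 bits higher
ETH_RESET_PHY_SHARED = ETH_RESET_PHY << ETH_RESET_SHARED_SHIFT  # 0x400000


def reset_module(flags, ports_mapped, ports_up, do_reset=lambda: None):
    """Model of the reset checks; returns (error, remaining_flags).

    ports_mapped / ports_up are the per-module counters, and do_reset is a
    hypothetical stand-in for the register write that triggers the reset.
    """
    requested = flags & (ETH_RESET_PHY | ETH_RESET_PHY_SHARED)
    if not requested:
        return 0, flags                 # nothing we can reset; flags untouched
    if ports_up > 0:
        return -errno.EINVAL, flags     # some port using the module is up
    if ports_mapped > 1 and not (flags & ETH_RESET_PHY_SHARED):
        return -errno.EINVAL, flags     # shared module needs "phy-shared"
    do_reset()
    return 0, flags & ~requested        # clear only what was actually reset


if __name__ == "__main__":
    print(reset_module(ETH_RESET_PHY, ports_mapped=1, ports_up=1))
    print(reset_module(ETH_RESET_PHY, ports_mapped=1, ports_up=0))
    print(reset_module(ETH_RESET_PHY_SHARED, ports_mapped=4, ports_up=0))
```

      Flags other than the two PHY bits are returned unchanged, which mirrors
      the "other flags are ignored" behaviour described above.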
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: Make PMAOS pack function more generic · 8f4ebdb0
      Ido Schimmel authored
      The PMAOS register has enable bits (e.g., PMAOS.ee) that allow changing
      only a subset of the fields, which is exactly what subsequent patches
      will need to do. Instead of passing multiple arguments to its pack
      function, only pass the module index and let the rest be set by the
      different callers.
      
      No functional changes intended.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: reg: Add fields to PMAOS register · ef23841b
      Ido Schimmel authored
      The Ports Module Administrative and Operational Status (PMAOS) register
      configures and retrieves the per-module status. Extend it with fields
      required to support various module settings such as reset and power
      mode.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: Track per-module port status · 896f399b
      Ido Schimmel authored
      In the common port module core, track the number of logical ports that
      are mapped to the port module and the number of logical ports using it
      that are administratively up.
      
      This will be used by later patches to potentially veto and control
      certain operations on the module, such as reset and setting its power
      mode.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Do not return an error in mlxsw_sp_port_module_unmap() · 196bff29
      Ido Schimmel authored
      The return value is never checked. Allows us to simplify a later patch.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Do not return an error in ndo_stop() · 06277ca2
      Ido Schimmel authored
      The return value is not checked by the networking stack. Allows us to
      simplify a later patch.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: core_env: Convert 'module_info_lock' to a mutex · bd6e43f5
      Ido Schimmel authored
      After the previous patch, the lock is always taken in process context so
      it can be converted to a mutex. It is needed for future changes where we
      will need to be able to sleep when holding the lock.
      
      Convert the lock to a mutex.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: core_env: Defer handling of module temperature warning events · 163f3d2d
      Ido Schimmel authored
      Module temperature events are currently handled in softIRQ context,
      requiring the 'module_info_lock' to be a spin lock. In future patchsets
      we will need to be able to hold the lock while sleeping.
      
      Therefore, defer handling of these events using a work queue so that the
      next patch will be able to convert the lock to a mutex.
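
      The deferral pattern can be sketched in userspace as follows. This is
      only an analogy: a thread and a queue stand in for the kernel work
      queue, and on_temp_warning_event(), overheat_counter and the module
      numbers are hypothetical names for this illustration.

```python
import queue
import threading

module_info_lock = threading.Lock()   # stands in for the lock once it is a mutex
overheat_counter = {}                 # module index -> number of warnings seen
_events = queue.Queue()               # stands in for the work queue


def on_temp_warning_event(module):
    """Event path: must not block, so it only queues the work."""
    _events.put_nowait(module)


def _worker():
    """Deferred path: runs in its own thread, so it may sleep on the lock."""
    while True:
        module = _events.get()
        if module is None:            # sentinel used to stop the worker
            _events.task_done()
            break
        with module_info_lock:
            overheat_counter[module] = overheat_counter.get(module, 0) + 1
        _events.task_done()


worker = threading.Thread(target=_worker, daemon=True)
worker.start()

for m in (3, 3, 7):
    on_temp_warning_event(m)
_events.join()                        # wait until all deferred work has run
print(overheat_counter)

_events.put(None)                     # shut the worker down cleanly
worker.join()
```

      The event path never touches the lock, which is what allows the lock
      type to change without affecting the context the events arrive in.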
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: core: Remove mlxsw_core_is_initialized() · 25a91f83
      Ido Schimmel authored
      After the previous patch, the switch driver is always initialized last,
      making this function redundant.
      
      Remove it.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: core: Initialize switch driver last · 3d7a6f67
      Ido Schimmel authored
      Commit 961cf99a ("mlxsw: core: Re-order initialization sequence")
      changed the initialization sequence so that the switch driver (e.g.,
      mlxsw_spectrum) is initialized before registration with the hwmon and
      thermal subsystems.
      
      This was done in order to avoid situations where hwmon/thermal code uses
      features not supported by current firmware version, which is only
      validated as part of switch driver initialization.
      
      Later, commit b79cb787 ("mlxsw: Move fw flashing code into core.c")
      moved firmware validation and flashing code from the switch driver to
      mlxsw_core so that it is performed before driver initialization.
      
      Therefore, change the initialization sequence back to its original form.
      
      In addition to being more straightforward, it will allow us to simplify
      parts of the code in subsequent patches and future patchsets.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'devlink-delete-publidh-api' · 00135227
      David S. Miller authored
      Leon Romanovsky says:
      
      ====================
      devlink: Delete publish of single parameter API
      
      This short series removes the single parameter publish/unpublish API
      that does nothing except mimic the already existing
      devlink_params_*publish calls.
      
      In the near future, we will be able to delete devlink_params_*publish too.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Delete not-used single parameter notification APIs · c2d2f988
      Leon Romanovsky authored
      There is no need for a specific devlink_param_*publish(), because the same
      output can be achieved by using devlink_params_*publish() in the correct
      places.
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5: Publish and unpublish all devlink parameters at once · e9310aed
      Leon Romanovsky authored
      The devlink parameters were published in two steps despite being static
      and known in advance.
      
      The first step was to use devlink_params_publish(), which iterated over
      all parameters known up to that point and sent notification messages.
      In the second step, devlink_param_publish() looped over the same
      parameter list and sent notifications for the new parameters.
      
      To simplify the API, call devlink_params_publish() only after all
      parameters have been added, which removes the need to iterate over the
      parameter list again.
      
      As a side effect, this change fixes the error unwind flow in which
      parameters were not marked as unpublished.
      
      Fixes: 82e6c96f ("net/mlx5: Register to devlink ingress VLAN filter trap")
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'qdisc-visibility' · dc50b930
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      net: sched: update default qdisc visibility after Tx queue cnt changes
      
      Matthew noticed that the number of children reported by mq does not match
      the number of queues on reconfigured interfaces. For example, if mq is
      instantiated when there are 8 queues, it will always show 8 children,
      regardless of the config being changed:
      
       # ethtool -L eth0 combined 8
       # tc qdisc replace dev eth0 root handle 100: mq
       # tc qdisc show dev eth0
       qdisc mq 100: root
       qdisc pfifo_fast 0: parent 100:8 bands 3 priomap 1 2 ...
       qdisc pfifo_fast 0: parent 100:7 bands 3 priomap 1 2 ...
       qdisc pfifo_fast 0: parent 100:6 bands 3 priomap 1 2 ...
       qdisc pfifo_fast 0: parent 100:5 bands 3 priomap 1 2 ...
       qdisc pfifo_fast 0: parent 100:4 bands 3 priomap 1 2 ...
       qdisc pfifo_fast 0: parent 100:3 bands 3 priomap 1 2 ...
       qdisc pfifo_fast 0: parent 100:2 bands 3 priomap 1 2 ...
       qdisc pfifo_fast 0: parent 100:1 bands 3 priomap 1 2 ...
       # ethtool -L eth0 combined 1
       # tc qdisc show dev eth0
       qdisc mq 100: root
       qdisc pfifo_fast 0: parent 100:8 bands 3 priomap 1 2 ...
       qdisc pfifo_fast 0: parent 100:7 bands 3 priomap 1 2 ...
       qdisc pfifo_fast 0: parent 100:6 bands 3 priomap 1 2 ...
       qdisc pfifo_fast 0: parent 100:5 bands 3 priomap 1 2 ...
       qdisc pfifo_fast 0: parent 100:4 bands 3 priomap 1 2 ...
       qdisc pfifo_fast 0: parent 100:3 bands 3 priomap 1 2 ...
       qdisc pfifo_fast 0: parent 100:2 bands 3 priomap 1 2 ...
       qdisc pfifo_fast 0: parent 100:1 bands 3 priomap 1 2 ...
       # ethtool -L eth0 combined 32
       # tc qdisc show dev eth0
       qdisc mq 100: root
       qdisc pfifo_fast 0: parent 100:8 bands 3 priomap 1 2 ...
       qdisc pfifo_fast 0: parent 100:7 bands 3 priomap 1 2 ...
       qdisc pfifo_fast 0: parent 100:6 bands 3 priomap 1 2 ...
       qdisc pfifo_fast 0: parent 100:5 bands 3 priomap 1 2 ...
       qdisc pfifo_fast 0: parent 100:4 bands 3 priomap 1 2 ...
       qdisc pfifo_fast 0: parent 100:3 bands 3 priomap 1 2 ...
       qdisc pfifo_fast 0: parent 100:2 bands 3 priomap 1 2 ...
       qdisc pfifo_fast 0: parent 100:1 bands 3 priomap 1 2 ...
      
      This patchset fixes this by hashing and unhashing the default
      child qdiscs as the number of queues gets adjusted.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net: test ethtool -L vs mq · 2d6a5899
      Jakub Kicinski authored
      Add a selftest for checking mq children are visible after ethtool -L.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: add ability to change channel count · 2e367522
      Jakub Kicinski authored
      For testing visibility of mq/mqprio default children.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: update default qdisc visibility after Tx queue cnt changes · 1e080f17
      Jakub Kicinski authored
      mq / mqprio make the default child qdiscs visible. They only do
      so for the qdiscs which are within real_num_tx_queues when the
      device is registered. Depending on order of calls in the driver,
      or if user space changes config via ethtool -L the number of
      qdiscs visible under tc qdisc show will differ from the number
      of queues. This is confusing to users and potentially to system
      configuration scripts which try to make sure qdiscs have the
      right parameters.
      
      Add a new Qdisc_ops callback and make the relevant qdiscs do the right thing.
      
      Note that this uncovers the "shortcut" created by
      commit 1f27cde3 ("net: sched: use pfifo_fast for non real queues").
      The default child qdiscs beyond initial real_num_tx are always
      pfifo_fast, no matter what the sysfs setting is. Fixing this
      gets a little tricky because we'd need to keep a reference
      on whatever the default qdisc was at the time of creation.
      In practice this is likely a non-issue: the qdiscs likely have
      to be configured to non-default settings, so whatever user space
      is doing such configuration can replace the pfifos... now that
      it will see them.
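
      The intended visibility behaviour can be modelled with a small toy
      class. This is a sketch under assumptions, not the kernel
      implementation: MqModel, change_real_num_tx() and show() are
      hypothetical names, and "hashing" is reduced to membership in a set.

```python
class MqModel:
    """Toy model of mq: one default child per Tx queue, a subset visible."""

    def __init__(self, num_tx_queues, real_num_tx_queues):
        self.num_tx_queues = num_tx_queues   # children that exist
        self.visible = set()                 # 1-based minors currently "hashed"
        self.change_real_num_tx(real_num_tx_queues)

    def change_real_num_tx(self, new_real):
        """Hypothetical stand-in for the callback run on a queue count change."""
        if not 0 < new_real <= self.num_tx_queues:
            raise ValueError("queue count out of range")
        for minor in range(1, self.num_tx_queues + 1):
            if minor <= new_real:
                self.visible.add(minor)      # hash: child shows up in listings
            else:
                self.visible.discard(minor)  # unhash: child is hidden

    def show(self):
        """Visible children, highest first, like the listings above."""
        return sorted(self.visible, reverse=True)


if __name__ == "__main__":
    mq = MqModel(num_tx_queues=32, real_num_tx_queues=8)
    print(mq.show())
    mq.change_real_num_tx(1)
    print(mq.show())
    mq.change_real_num_tx(32)
    print(len(mq.show()))
```

      With this model the number of listed children always follows the
      current queue count instead of the count at instantiation time.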
      Reported-by: Matthew Massey <matthewmassey@fb.com>
      Reviewed-by: Dave Taht <dave.taht@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ibmvnic-next' · c506cc5b
      David S. Miller authored
      Sukadev Bhattiprolu says:
      
      ====================
      ibmvnic: Reuse ltb, rx, tx pools
      
      It can take a long time to free and reallocate rx and tx pools and long
      term buffer (LTB) during each reset of the VNIC. This is especially true
      when the partition (LPAR) is heavily loaded and going through a Logical
      Partition Migration (LPM). The long, drawn-out reset causes the LPAR to lose
      connectivity for extended periods of time and results in "RMC connection"
      errors and the LPM failing.
      
      What is worse is that during the LPM we could get a failover because
      of the lost connectivity. At that point, the vnic driver releases
      even the resources it has already allocated and starts over.
      
      As long as the resources we have already allocated are valid/applicable,
      we might as well hold on to them while trying to allocate the remaining
      resources. This patch set attempts to reuse the resources previously
      allocated as long as they are valid. It seems to vastly improve the
      time taken for the vnic reset and significantly reduces the chances of
      getting the RMC connection errors. We still get them occasionally,
      but that appears to be for reasons other than memory allocation delays,
      and those are still being investigated.
      
      If the backing devices for a vnic adapter are not "matched" (see "pool
      parameters" in patches 8 and 9) it is possible that we will still free
      all the resources and allocate them. If that becomes a common problem,
      we have to address it separately.
      
      Thanks to input and extensive testing from Brian King, Cris Forno,
      Dany Madden, Rick Lindsley.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Reuse tx pools when possible · bbd80930
      Sukadev Bhattiprolu authored
      Rather than releasing the tx pools on every close and reallocating
      them on open, reuse the tx pools unless the pool parameters (number
      of pools, size of each pool or size of each buffer in a pool) have
      changed.
      
      If the pool parameters changed, then release the old pools (if
      any) and allocate new ones.
      
      Specifically release tx pools, if:
      	- adapter is removed,
      	- pool parameters change during reset,
      	- we encounter an error when opening the adapter in response
      	  to a user request (in ibmvnic_open()).
      
      and don't release them:
      	- in __ibmvnic_close() or
      	- on errors in __ibmvnic_open()
      
      in the hope that we can reuse them during this or next reset.
      
      With these changes reset_tx_pools() can be dropped because its
      optimization is now included in init_tx_pools() itself.
      
      cleanup_tx_pools() releases all the skbs associated with the pool and
      is called from ibmvnic_cleanup(), which is called on every reset. Since
      we want to reuse skbs across resets, move cleanup_tx_pools() out of
      ibmvnic_cleanup() and call it only when user closes the adapter.
      
      Add two new adapter fields, ->prev_mtu, ->prev_tx_pool_size to track the
      previous values and use them to decide whether to reuse or realloc the
      pools.
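
      The reuse-or-reallocate decision can be sketched roughly as below. This
      is an illustration under assumptions, not the driver code: PoolParams,
      Adapter and the single prev_params field are hypothetical and stand in
      for the separate ->prev_* fields mentioned above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class PoolParams:
    num_pools: int    # number of pools
    pool_size: int    # buffers per pool
    buf_size: int     # bytes per buffer


@dataclass
class Adapter:
    prev_params: Optional[PoolParams] = None   # parameters of the current pools
    pools: list = field(default_factory=list)


def init_tx_pools(adapter, params):
    """Return True if pools were (re)allocated, False if they were reused."""
    if adapter.pools and adapter.prev_params == params:
        return False                           # parameters unchanged: reuse
    adapter.pools = [                          # drop old pools, build new ones
        [bytearray(params.buf_size) for _ in range(params.pool_size)]
        for _ in range(params.num_pools)
    ]
    adapter.prev_params = params
    return True


if __name__ == "__main__":
    adapter = Adapter()
    p = PoolParams(num_pools=2, pool_size=4, buf_size=64)
    print(init_tx_pools(adapter, p))           # first open: allocate
    print(init_tx_pools(adapter, p))           # reset, same parameters: reuse
    print(init_tx_pools(adapter, PoolParams(2, 4, 128)))  # changed: reallocate
```

      Comparing against the previously recorded parameters is what lets a
      reset skip the expensive free and reallocate cycle in the common case.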
      Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com>
      Reviewed-by: Dany Madden <drt@linux.ibm.com>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Reuse rx pools when possible · 489de956
      Sukadev Bhattiprolu authored
      Rather than releasing the rx pools and reallocating them on every
      reset, reuse the rx pools unless the pool parameters (number of pools,
      size of each pool or size of each buffer in a pool) have changed.
      
      If the pool parameters changed, then release the old pools (if any)
      and allocate new ones.
      
      Specifically release rx pools, if:
      	- adapter is removed,
      	- pool parameters change during reset,
      	- we encounter an error when opening the adapter in response
      	  to a user request (in ibmvnic_open()).
      
      and don't release them:
      	- in __ibmvnic_close() or
      	- on errors in __ibmvnic_open()
      
      in the hope that we can reuse them on the next reset.
      
      With these changes, reset_rx_pools() can be dropped because its optimization is
      now included in init_rx_pools() itself.
      
      cleanup_rx_pools() releases all the skbs associated with the pool and
      is called from ibmvnic_cleanup(), which is called on every reset. Since
      we want to reuse skbs across resets, move cleanup_rx_pools() out of
      ibmvnic_cleanup() and call it only when user closes the adapter.
      
      Add two new adapter fields, ->prev_rx_buf_sz, ->prev_rx_pool_size to
      keep track of the previous values and use them to decide whether to
      reuse or realloc the pools.
      Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com>
      Reviewed-by: Dany Madden <drt@linux.ibm.com>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Reuse LTB when possible · f8ac0bfa
      Sukadev Bhattiprolu authored
      Reuse the long term buffer during a reset as long as its size has
      not changed. If the size has changed, free it and allocate a new
      one of the appropriate size.
      
      When we do this, alloc_long_term_buff() and reset_long_term_buff()
      become identical. Drop reset_long_term_buff().
      Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com>
      Reviewed-by: Dany Madden <drt@linux.ibm.com>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Use bitmap for LTB map_ids · 129854f0
      Sukadev Bhattiprolu authored
      In a follow-on patch, we will reuse long term buffers when possible.
      When doing so we have to be careful to properly assign map ids. We
      can no longer assign them sequentially because a lower map id may be
      available and we could wrap at 255 and collide with an in-use map id.
      
      Instead, use a bitmap to track active map ids and to find a free map id.
      We don't need to take locks here since the map_id only changes during reset
      and at that time only the reset worker thread should be using the adapter.
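
      A minimal sketch of bitmap-based id allocation is shown below. It is
      not the driver code: MapIdBitmap is a hypothetical name, and the id
      range (0 to nbits - 1, with 255 ids by default) is an assumption chosen
      for illustration.

```python
class MapIdBitmap:
    """Track in-use map ids in an integer used as a bitmap."""

    def __init__(self, nbits=255):
        self.nbits = nbits     # assumed size of the id space
        self.bits = 0          # bit i set => map id i is in use

    def alloc(self):
        """Return the lowest free id and mark it as in use."""
        for i in range(self.nbits):
            if not self.bits & (1 << i):
                self.bits |= 1 << i
                return i
        raise RuntimeError("no free map id")

    def free(self, map_id):
        """Mark an id as free so that a later alloc() can hand it out again."""
        self.bits &= ~(1 << map_id)


if __name__ == "__main__":
    ids = MapIdBitmap()
    a, b, c = ids.alloc(), ids.alloc(), ids.alloc()
    ids.free(b)                # a lower id becomes available again
    print(ids.alloc())         # reuses the freed id instead of counting up
```

      Unlike a sequential counter, the lowest free id is always found first,
      so a reused buffer's id is never handed out twice.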
      
      Noticed this when analyzing an error Dany Madden ran into with the
      patch set.
      Reported-by: Dany Madden <drt@linux.ibm.com>
      Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com>
      Reviewed-by: Dany Madden <drt@linux.ibm.com>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: init_tx_pools move loop-invariant code · 0d1af4fa
      Sukadev Bhattiprolu authored
      In init_tx_pools() move some loop-invariant code out of the loop.
      Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com>
      Reviewed-by: Dany Madden <drt@linux.ibm.com>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Use/rename local vars in init_tx_pools · 8243c7ed
      Sukadev Bhattiprolu authored
      Use/rename local variables in init_tx_pools() for consistency with
      init_rx_pools() and for readability. Also add some comments.
      Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com>
      Reviewed-by: Dany Madden <drt@linux.ibm.com>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Use/rename local vars in init_rx_pools · 0df7b9ad
      Sukadev Bhattiprolu authored
      To make the code more readable, use/rename some local variables.
      Basically we have a set of pools, num_pools. Each pool has a set of
      buffers, pool_size and each buffer is of size buff_size.
      
      pool_size is a bit ambiguous (whether size in bytes or buffers). Add
      a comment in the header file to make it explicit.
      Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com>
      Reviewed-by: Dany Madden <drt@linux.ibm.com>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Fix up some comments and messages · 0f2bf318
      Sukadev Bhattiprolu authored
      Add/update some comments/function headers and fix up some messages.
      Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com>
      Reviewed-by: Dany Madden <drt@linux.ibm.com>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Consolidate code in replenish_rx_pool() · 38106b2c
      Sukadev Bhattiprolu authored
      For better readability, consolidate related code in replenish_rx_pool()
      and add some comments.
      Reviewed-by: Rick Lindsley <ricklind@linux.vnet.ibm.com>
      Reviewed-by: Dany Madden <drt@linux.ibm.com>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ptp-ocp-timecard-v13-fw' · 923990f6
      David S. Miller authored
      Jonathan Lemon says:
      
      ====================
      timecard updates for v13 firmware
      
      This update mainly deals with features for the TimeCard v13 firmware.
      
      The signals provided from the external SMA connectors can be steered
      to different locations, and the generated SMA signals can be chosen.
      
      Future timecard revisions will allow selectable I/O on any of the
      SMA connectors, so name the attributes appropriately, and set up
      the ABI in preparation for the new features.
      
      The update also adds support for IRIG-B and DCF formats, as well
      as NMEA output.
      
      A ts_window_adjust tunable is also provided to fine-tune the
      PHC:SYS time mapping.
      --
      v1: Earlier reviewed series was for v10 firmware, this is expanded to
          include the v13 features.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • docs: ABI: Add sysfs documentation for timecard · d7050a2b
      Jonathan Lemon authored
      This patch describes the sysfs interface implemented by the
      ptp_ocp driver, under /sys/class/timecard.
      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ptp: ocp: Add timestamp window adjustment · 1acffc6e
      Jonathan Lemon authored
      The following process is used to read the PHC clock and correlate
      the reading with the "correct" system time.
      
      - get starting timestamp
      - issue PCI write command
      - issue PCI read command
      - get ending timestamp
      - read latched sec/nsec registers
      
      The write command is posted to PCI bus and returns.  When the write
      arrives at the FPGA, the PHC time is latched into the sec/nsec registers,
      and a flag is set indicating the registers are valid.  The read command
      returns this flag, and the time retrieval proceeds.
      
      Below is a non-scaled picture of the timing diagram involved.  The
      PHC time corresponds to some SYS time between [start, end].  Userspace
      usually uses the midpoint between [start, end] to estimate the PCI
      delay and match this with the PHC time.
      
       [start] |                |
         write |-------+        |
               |        \       |
          read |----+    +----->|
               |     \          * PHC time latched into register
               |      \         |
      midpoint |       +------->|
               |                |
               |                |
               |           +----|
               |          /     |
               |<--------+      |
         [end] |                |
      
      As the diagram indicates, the PHC time is latched before the midpoint,
      so the system clock time is slightly off the real PHC time.  This shows
      up as a phase error on an oscilloscope.
      
      The workaround here is to provide a tunable which reduces (shrinks)
      the end time in the above diagram.  This in turn moves the calculated
      midpoint so the system time and PHC time are in agreement.
      
      Currently, the adjustment reduces the end time by 3/16th of the entire
      window.  E.g.:  [start, end] ==> [start, end - (3/16 * end)], which
      produces reasonably good results.
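
      As a worked example of the arithmetic, the sketch below reads "3/16th
      of the entire window" as 3/16 of (end - start), which is an assumption
      about the intent. estimate_sys_time() and its parameters are
      hypothetical names, and the timestamps are made-up nanosecond values.

```python
def estimate_sys_time(start_ns, end_ns, shrink_num=3, shrink_den=16):
    """Return (midpoint, adjusted_end) after shrinking the end of the window.

    Assumes the end timestamp is pulled back by shrink_num/shrink_den of the
    window length (end - start) before the midpoint is taken.
    """
    window = end_ns - start_ns
    adjusted_end = end_ns - window * shrink_num // shrink_den
    midpoint = start_ns + (adjusted_end - start_ns) // 2
    return midpoint, adjusted_end


if __name__ == "__main__":
    start, end = 1000, 2600                  # a 1600 ns window
    plain_midpoint = start + (end - start) // 2
    adjusted_midpoint, adjusted_end = estimate_sys_time(start, end)
    print(plain_midpoint, adjusted_midpoint, adjusted_end)
```

      For a 1600 ns window the end moves back by 300 ns, so the estimated
      midpoint moves 150 ns earlier, towards the point where the PHC time was
      actually latched.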
      
      Also reduce delays by just writing to the clock control register
      instead of performing a read/modify/write sequence, as the contents
      of the control register are known.
      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ptp: ocp: Have FPGA fold in ns adjustment for adjtime. · 6d59d4fa
      Jonathan Lemon authored
      The current implementation of adjtime uses gettime/settime to
      perform nanosecond adjustments.  This introduces additional phase
      errors due to delays.
      
      Instead, use the FPGA's ability to just apply the nanosecond
      adjustment to the clock directly.
      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ptp: ocp: Enable 4th timestamper / PPS generator · a62a56d0
      Jonathan Lemon authored
      A 4th timestamper is added which timestamps the output of the PHC.
      
      The clock nanosecond offset is not always zero, so when compared
      to other timestampers, this provides precise measurements.
      
      Also, the timestamper interrupt from the PHC can be used to generate
      a PPS signal for /dev/pps.
      
      Also allow PTP_CLK_REQ_PEROUT requests for a 1PPS output, but do
      not actually configure any output pins; this is done via sysfs.
      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a62a56d0
    • Jonathan Lemon's avatar
      ptp: ocp: Add second GNSS device · 71d7e085
      Jonathan Lemon authored
      Upcoming boards may have a second GNSS receiver, getting information
      from a different constellation than the first receiver, which provides
      some measure of anti-spoofing.
      
      Expose the sysfs attribute for this device, if detected.
      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      71d7e085
    • Jonathan Lemon's avatar
      ptp: ocp: Add NMEA output · e3516bb4
      Jonathan Lemon authored
      The timecard can provide an NMEA-0183 ZDA (time and date) output
      string on a serial port, which can be used to drive other devices.
      
      Add the NMEA resources, and the serial port as a sysfs attribute.
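
For readers unfamiliar with the ZDA sentence, the sketch below builds one in the NMEA 0183 layout `$--ZDA,hhmmss.ss,dd,mm,yyyy,zh,zm*CS`, where the checksum is the XOR of every byte between `$` and `*`. It illustrates the format only and says nothing about how the timecard generates its output; the function names and the "GP" talker ID are assumptions.

```python
from datetime import datetime, timezone


def nmea_checksum(payload):
    """XOR of all payload bytes, as two uppercase hex digits."""
    value = 0
    for byte in payload.encode("ascii"):
        value ^= byte
    return f"{value:02X}"


def format_zda(t, talker="GP"):
    """Format a datetime (expected to already be in UTC) as a ZDA sentence.

    The local zone fields are fixed at 00,00 in this sketch.
    """
    hundredths = t.microsecond // 10000
    payload = (
        f"{talker}ZDA,{t:%H%M%S}.{hundredths:02d},"
        f"{t:%d},{t:%m},{t:%Y},00,00"
    )
    return f"${payload}*{nmea_checksum(payload)}\r\n"


# Example timestamp chosen arbitrarily.
print(format_zda(datetime(2021, 9, 15, 12, 34, 56, tzinfo=timezone.utc)), end="")
```

A receiver can validate a sentence by recomputing the XOR over the text between `$` and `*` and comparing it with the two trailing hex digits.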
      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e3516bb4
    • Jonathan Lemon's avatar
      ptp: ocp: Add debugfs entry for timecard · f67bf662
      Jonathan Lemon authored
      Provide a view into the timecard internals for debugging.
      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f67bf662
    • Jonathan Lemon's avatar
      ptp: ocp: Separate the init and info logic · 065efcc5
      Jonathan Lemon authored
      On startup, parts of the FPGA need to be initialized - break these
      out into their own functions, separate from the purely informational
      blocks.
      
      On startup, distribute the UTC:TAI offset from the NMEA GNSS parser,
      if it is available.
      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      065efcc5
    • Jonathan Lemon's avatar
      ptp: ocp: Add sysfs attribute utc_tai_offset · 89260d87
      Jonathan Lemon authored
      IRIG and DCF output time in UTC, but the timecard operates
      on TAI internally.  Add an attribute node which allows adding
      an offset to these modes before output.
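
The relationship between the two timescales is a whole number of seconds, which is why a single offset attribute is enough. The sketch below shows the arithmetic only; it does not describe how the FPGA applies the value (sign convention and units are not stated in the commit), and the names are hypothetical. TAI has been 37 seconds ahead of UTC since 2017-01-01.

```python
# TAI - UTC in seconds. 37 has been in effect since 2017-01-01;
# it changes whenever a leap second is inserted.
TAI_MINUS_UTC_S = 37


def tai_to_utc(tai_seconds, utc_tai_offset=TAI_MINUS_UTC_S):
    """Convert a TAI second count to the corresponding UTC second count."""
    return tai_seconds - utc_tai_offset


def utc_to_tai(utc_seconds, utc_tai_offset=TAI_MINUS_UTC_S):
    """Inverse conversion, from UTC seconds to TAI seconds."""
    return utc_seconds + utc_tai_offset


# Arbitrary example value.
print(tai_to_utc(1_631_707_237))
```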
      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      89260d87
    • Jonathan Lemon's avatar
      ptp: ocp: Add IRIG-B output mode control · d14ee252
      Jonathan Lemon authored
      IRIG-B has several different output formats; the timecard defaults
      to using B007.  Add a control which selects different output modes.
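
As background on what a designation such as B007 encodes, the sketch below decodes the letter and three digits using the tables in IRIG Standard 200-04. It is a reading aid, not part of the driver; the function name and the returned dictionary layout are assumptions, and the commit does not say which of these formats the new control accepts.

```python
# Tables transcribed from IRIG Standard 200-04 format designations.
RATES = {"A": "1000 pps", "B": "100 pps", "D": "1 ppm",
         "E": "10 pps", "G": "10000 pps", "H": "1 pps"}
MODULATION = {0: "pulse width code (DC level shift)",
              1: "sine wave, amplitude modulated",
              2: "Manchester modulated"}
CARRIER = {0: "no carrier", 1: "100 Hz", 2: "1 kHz",
           3: "10 kHz", 4: "100 kHz", 5: "1 MHz"}
EXPRESSIONS = {
    0: ("BCD_TOY", "CF", "SBS"),
    1: ("BCD_TOY", "CF"),
    2: ("BCD_TOY",),
    3: ("BCD_TOY", "SBS"),
    4: ("BCD_TOY", "BCD_YEAR", "CF", "SBS"),
    5: ("BCD_TOY", "BCD_YEAR", "CF"),
    6: ("BCD_TOY", "BCD_YEAR"),
    7: ("BCD_TOY", "BCD_YEAR", "SBS"),
}


def decode_irig(code):
    """Decode a designation such as 'B007' into its four components."""
    code = code.upper()
    if len(code) != 4 or not code[1:].isdigit():
        raise ValueError(f"malformed IRIG designation: {code!r}")
    try:
        return {
            "rate": RATES[code[0]],
            "modulation": MODULATION[int(code[1])],
            "carrier": CARRIER[int(code[2])],
            "expressions": EXPRESSIONS[int(code[3])],
        }
    except KeyError:
        raise ValueError(f"unknown IRIG designation: {code!r}") from None


print(decode_irig("B007"))
```

Read this way, B007 is a 100 pps, DC level shift code with no carrier that carries time of year, year, and straight binary seconds.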
      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d14ee252
    • Jonathan Lemon's avatar
      ptp: ocp: Add IRIG-B and DCF blocks · 6baf2925
      Jonathan Lemon authored
      IRIG (Inter-range Instrumentation Group) timecode format on
      one of the SMA output channels is provided by the IRIG master
      FPGA block.  Enable the master when the IRIG output format is
      selected on either one of the output channels.
      
      By default, the output is in B007 format.
      
      DCF output format is provided by the DCF master block.
      
      Also enable the IRIG and DCF slaves, which parse an incoming
      signal from the external SMA connectors, and may be used to
      adjust the PHC.
      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6baf2925