1. 29 Mar, 2017 25 commits
    • Merge branch 'qed-load-unload-mfw' · c552a50e
      David S. Miller authored
      Yuval Mintz says:
      
      ====================
      qed: load/unload mfw series
      
      This series corrects the unload flow and greatly enhances its
      initialization flow in regard to interactions between driver
      and management firmware.
      
      Patch #1 makes sure unloading is done under management-firmware's
      'critical section' protection.
      
      Patches #2 - #4 move driver into using a newer scheme for loading
      in regard to the MFW; this newer scheme would help clean the device
      in case a previous instance has dirtied it [preboot, PDA, etc.].
      
      Patches #5 - #6 let driver inform management-firmware on number of
      resources which are dependent on the non-management firmware used.
      Patch #7 then uses a new resource [BDQ] instead of some set value.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: Use BDQ resource for storage protocols · d0d40a73
      Mintz, Yuval authored
      Until now, qed used some port-defined value as BDQ index for both iSCSI
      and FCoE.
      
      As management firmware now treats BDQ as a resource and tells each PF
      its BDQ-range, start using a value from that range instead.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: Utilize resource-lock based scheme · 9c8517c4
      Tomer Tayar authored
      Management firmware is used as an arbiter between the various PFs
      in matters of resources, but some of the resources that need to
      be divided are dependent on the non-management firmware used,
      so management firmware first needs to be told how many resources
      there are before trying to divide them.
      
      As part of the initialization sequence, the driver first informs
      the management firmware of the available resources under
      a dedicated resource lock, and afterwards requests the various
      resources, which might be based on the previously set values.
      Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: Support management-based resource locking · 95691c9c
      Tomer Tayar authored
      Global locking can't properly be used to synchronize between different
      PFs in all scenarios, as those instances might reside in different
      logical partitions [e.g., when a PF is assigned via PDA to some VM].
      
      The management firmware provides a generic infrastructure for
      device locks. For each 'resource', it's guaranteed it could be acquired
      by at most a single PF at any given time [or by management firmware].
      
      This patch adds the necessary logic in qed for utilizing said
      infrastructure, implementing lock/unlock internal APIs.
      Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
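The "at most a single PF at any given time" guarantee described in the commit above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `mfw_res_lock`, `res_lock_acquire()`, `res_lock_release()` and `RES_OWNER_NONE` are hypothetical names, not qed or management-firmware APIs, and the real lock state lives in firmware rather than in a local struct.

```c
#include <assert.h>
#include <stdbool.h>

#define RES_OWNER_NONE (-1) /* hypothetical marker: nobody holds the lock */

/* Stand-in for one firmware-arbitrated resource lock. */
struct mfw_res_lock {
	int owner; /* id of the PF holding the lock, or RES_OWNER_NONE */
};

static void res_lock_init(struct mfw_res_lock *lock)
{
	assert(lock);
	lock->owner = RES_OWNER_NONE;
}

/* Grant the lock only if it is currently free; returns true on success. */
static bool res_lock_acquire(struct mfw_res_lock *lock, int pf_id)
{
	if (pf_id < 0 || lock->owner != RES_OWNER_NONE)
		return false;
	lock->owner = pf_id;
	return true;
}

/* Only the current owner may release; returns true on success. */
static bool res_lock_release(struct mfw_res_lock *lock, int pf_id)
{
	if (pf_id < 0 || lock->owner != pf_id)
		return false;
	lock->owner = RES_OWNER_NONE;
	return true;
}
```

In this model a second PF calling `res_lock_acquire()` while the first still owns the lock simply gets `false` back and has to retry later, which mirrors the arbitration role the commit message assigns to management firmware.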
    • qed: Send pf-flr as part of initialization · 18a69e36
      Mintz, Yuval authored
      During HW initialization, driver would set various registers to their
      needed values - but it assumes all registers start at their reset-value,
      so there's no need to re-configure a register's default value.
      
      This assumption might be incorrect, e.g., in case of a preboot driver
      running and initializing the device prior to our driver.
      
      To overcome this, we now ask management firmware to initiate a PF-flr
      early during the initialization sequence. That would return everything
      in the PF's scope back to default and prevent previous configurations
      from still being applied.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: Move to new load request scheme · 5d24bcf1
      Tomer Tayar authored
      Management firmware is used as an arbiter between the various PFs
      in regard to loading - it causes the various PFs to load/unload
      sequentially and informs each of its appropriate role in the init.
      
      But the existing flow is too weak to handle some scenarios where
      PFs aren't properly cleaned prior to loading.
      The significant scenarios falling under these criteria:
        a. Preboot drivers in some environment can't properly unload.
        b. Unexpected driver replacement [kdump, PDA].
      
      Modern management firmware supports a more intricate loading flow,
      where the driver has the ability to overcome previous limitations.
      This moves qed into using this newer scheme.
      
      Notice new scheme is backward compatible, so new drivers would
      still be able to load properly on top of older management firmwares
      and vice versa.
      Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: hw_init() to receive parameter-struct · c0c2d0b4
      Mintz, Yuval authored
      We'll soon need additional information, so start by changing
      the infrastructure to receive the initializing variables
      via a parameter struct.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: Correct HW stop flow · 1226337a
      Tomer Tayar authored
      Management firmware is used as arbiter between different PFs
      which are loading/unloading, but in order to use the synchronization
      it offers, the contending configurations need to be applied either
      between their LOAD_REQ <-> LOAD_DONE or UNLOAD_REQ <-> UNLOAD_DONE
      management firmware commands.
      
      Existing HW stop flow utilizes 2 different functions: qed_hw_stop() and
      qed_hw_reset(), which don't abide by this requirement; most of the closure
      is done outside the scope of the unload request.
      
      This patch removes qed_hw_reset() and places the relevant stop
      functionality underneath the management firmware protection.
      Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tipc-subscription-refcount-simplifications' · 30b38236
      David S. Miller authored
      Parthasarathy Bhuvaragan says:
      
      ====================
      tipc: subscription refcount simplifications
      
      The first patch makes the subscription refcount cleanup lockless and
      the second updates the subscription refcount policy.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: adjust the policy of holding subscription kref · 7efea60d
      Ying Xue authored
      When a new subscription object is inserted into name_seq->subscriptions
      list, it's under name_seq->lock protection; when a subscription is
      deleted from the list, it's also under the same lock protection;
      similarly, when accessing a subscription by going through subscriptions
      list, the entire process is also protected by the name_seq->lock.
      
      Therefore, if subscription refcount is increased before it's inserted
      into subscriptions list, and its refcount is decreased after it's
      deleted from the list, it will be unnecessary to hold refcount at all
      before accessing subscription object which is obtained by going through
      subscriptions list under name_seq->lock protection.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
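The refcount policy described in the commit above (the list itself pins each subscription, so walkers under the lock need no extra get/put) might look roughly like this single-threaded model. All names here (`struct sub`, `struct name_seq_model`, `seq_insert()` and so on) are hypothetical; the lock is modelled as a flag only, and nothing below is taken from the TIPC sources.

```c
#include <assert.h>
#include <stddef.h>

struct sub {              /* hypothetical stand-in for a subscription */
	int refcnt;
	int events;           /* how many times the walker visited it */
	struct sub *next;
};

struct name_seq_model {   /* stand-in for name_seq and its lock */
	int locked;           /* 1 while the modelled lock is held */
	struct sub *subs;
};

static void seq_lock(struct name_seq_model *s)   { assert(!s->locked); s->locked = 1; }
static void seq_unlock(struct name_seq_model *s) { assert(s->locked);  s->locked = 0; }

/* Insert: the list takes one reference, under the lock. */
static void seq_insert(struct name_seq_model *s, struct sub *sub)
{
	seq_lock(s);
	sub->refcnt++;
	sub->next = s->subs;
	s->subs = sub;
	seq_unlock(s);
}

/* Remove: unlink first, then drop the list's reference. */
static void seq_remove(struct name_seq_model *s, struct sub *sub)
{
	struct sub **pp;

	seq_lock(s);
	for (pp = &s->subs; *pp; pp = &(*pp)->next) {
		if (*pp == sub) {
			*pp = sub->next;
			sub->next = NULL;
			sub->refcnt--;
			break;
		}
	}
	seq_unlock(s);
}

/* Walk: every listed entry is pinned by the list's reference, so no get/put. */
static int seq_notify_all(struct name_seq_model *s)
{
	struct sub *sub;
	int visited = 0;

	seq_lock(s);
	for (sub = s->subs; sub; sub = sub->next) {
		sub->events++;
		visited++;
	}
	seq_unlock(s);
	return visited;
}
```

The point of the sketch is that `seq_notify_all()` never touches `refcnt`: as long as insertion and removal adjust the count under the same lock the walker holds, a listed object cannot be freed mid-walk.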
    • tipc: advance the time of deleting subscription from subscriber->subscrp_list · 139bb36f
      Ying Xue authored
      After a subscription object is created, it's inserted into its
      subscriber's subscrp_list under subscriber lock protection;
      similarly, before it's destroyed, it should first be removed from
      its subscriber->subscrp_list. Since the subscription list is
      accessed with the subscriber lock held, all the subscriptions are valid
      for the duration of the lock. Hence in tipc_subscrb_subscrp_delete(), we
      remove subscription get/put and the extra subscriber unlock/lock.
      
      After this change, the subscription refcount cleanup is very simple
      and does not take any lock.
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • stmmac: use netif_set_real_num_{rx,tx}_queues · 589a1a2e
      Arnd Bergmann authored
      A driver must not access the two fields directly but should instead use
      the helper functions to set the values and keep a consistent internal
      state:
      
      ethernet/stmicro/stmmac/stmmac_main.c: In function 'stmmac_dvr_probe':
      ethernet/stmicro/stmmac/stmmac_main.c:4083:8: error: 'struct net_device' has no member named 'real_num_rx_queues'; did you mean 'real_num_tx_queues'?
      
      Fixes: a8f5102a ("net: stmmac: TX and RX queue priority configuration")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • soc: qcom: smd-rpm: Add msm8996 compatibility · 2b624250
      Bjorn Andersson authored
      With the RPM driver transitioned to RPMSG we can reuse the SMD-RPM
      driver on top of GLINK for 8996, without any modifications.
      Acked-by: Andy Gross <andy.gross@linaro.org>
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • soc: qcom: smd: Remove standalone driver · 395a4805
      Bjorn Andersson authored
      Remove the standalone SMD implementation as we have transitioned the
      client drivers to use the RPMSG based one.
      
      Also remove all dependencies on QCOM_SMD from Kconfig files, in order to
      keep them selectable in the absence of the removed symbol.
      Acked-by: Andy Gross <andy.gross@linaro.org>
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • soc: qcom: smd: Transition client drivers from smd to rpmsg · 5052de8d
      Bjorn Andersson authored
      By moving these client drivers to use RPMSG instead of the direct SMD
      API we can reuse them on top of the newly added GLINK wire-protocol
      support found in the 820 and 835 Qualcomm platforms.
      
      As the new (RPMSG-based) and old SMD implementations are mutually
      exclusive we have to change all client drivers in one commit, to make
      sure we have a working system before and after this transition.
      Acked-by: Andy Gross <andy.gross@linaro.org>
      Acked-by: Kalle Valo <kvalo@codeaurora.org>
      Acked-by: Marcel Holtmann <marcel@holtmann.org>
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: don't age NTF_EXT_LEARNED fdb entries · def499c9
      Roopa Prabhu authored
      vxlan driver already implicitly supports installing
      of external fdb entries with NTF_EXT_LEARNED. This
      patch just makes sure these entries are not aged
      by the vxlan driver. An external entity managing these
      entries will age them out. This is consistent with
      the use of NTF_EXT_LEARNED in the bridge driver.
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
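A rough model of the ageing rule described in the commit above: entries carrying the externally-learned flag are skipped by the driver's ageing pass. The struct, the function and the `FDB_F_EXT_LEARNED` constant are hypothetical stand-ins (the real flag is `NTF_EXT_LEARNED`), and the timestamps are plain integers rather than jiffies.

```c
#include <assert.h>
#include <stddef.h>

#define FDB_F_EXT_LEARNED 0x10 /* hypothetical local stand-in for NTF_EXT_LEARNED */

struct fdb_entry_model {       /* hypothetical, not the vxlan fdb struct */
	unsigned char flags;
	unsigned long last_used;   /* time the entry was last used */
	int valid;                 /* 0 once the entry has been aged out */
};

/*
 * Expire entries idle for at least `age_limit`, but leave externally
 * learned entries alone: whoever installed them is expected to age them.
 * Returns the number of entries expired in this pass.
 */
static size_t fdb_age(struct fdb_entry_model *tbl, size_t n,
		      unsigned long now, unsigned long age_limit)
{
	size_t i, expired = 0;

	assert(tbl != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!tbl[i].valid)
			continue;
		if (tbl[i].flags & FDB_F_EXT_LEARNED)
			continue; /* owned by an external entity */
		if (now - tbl[i].last_used >= age_limit) {
			tbl[i].valid = 0;
			expired++;
		}
	}
	return expired;
}
```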
    • Merge branch 'net-dpipe' · 2a69ca71
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      Add support for pipeline debug (dpipe)
      
      Arkadi says:
      
      While doing the hardware offloading process much of the hardware
      specifics cannot be presented. An example for such is the routing
      LPM algorithm, which differs in hardware implementation from the
      kernel software implementation. The only information the user receives
      is whether a specific route is offloaded or not, but the user cannot
      understand the underlying implementation nor get the specific statistics
      related to that process.
      
      Another example is ACL offload using TC which is commonly implemented
      using TCAM memory. Currently there is no capability to gain visibility
      into the TCAM structure and to debug suboptimal resource allocation.
      
      This patchset introduces the capability of exporting the ASIC's pipeline
      abstraction via the devlink infrastructure, which should serve as a
      complementary tool. This infrastructure allows the user to get visibility
      into the ASIC by modeling it as a set of match/action tables.
      
      The main objects defined:
      Table - abstraction for a single pipeline stage. Contains the
              available match/actions and counter availability.
      Entry - entry in a specific table with specific matches/actions
              values and dedicated counter.
      Header/field - tuples which describe the table's behavior.
      
      As an example one of the ASIC's L3 blocks will be modeled. The egress
      rif (router interface) table is the final step in the L3 pipeline
      processing which does match on the internal rif index which was
      determined before by the routing logic. The erif table determines
      whether to forward or drop the packet and updates the corresponding
      rif L3 statistics.
      
      To expose these internal resources a special metadata header will
      be introduced that describes the internal information gathered by
      the ASIC's pipeline and contains the following fields: rif_port_index,
      forward and drop.
      
      Some internal hardware resources have direct mapping to kernel
      objects. For example the rif_port_index is mapped to the net-device's
      ifindex. By providing this mapping the user gains visibility into
      the offloading process.
      
      Follow-up work will include exporting more L3 tables which will give
      visibility into the routing process.
      
      First stage is adding support for dpipe in devlink. Next add support
      in spectrum driver. Finally implement egress router interface
      (erif) table for spectrum ASIC as an example.
      
      ---
      v1->v2: Please see individual patches
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Add Support for erif table entries access · 2ba5999f
      Arkadi Sharshevsky authored
      Implement dpipe's table ops for erif table which provide:
      1. Getting the entries in the table with the associated values.
      	- match on "mlxsw_meta:erif_index"
      	- action on "mlxsw_meta:forwared_out"
      2. Synchronizing the hardware in case of enabling/disabling counters, which
         means removing erif counters from all interfaces.
      Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Add rif helper functions · fd1b9d41
      Arkadi Sharshevsky authored
      Add rif helper functions to access the rif index and the rif device's ifindex.
      These functions will be used by dpipe in order to dump the rif table.
      Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Support for counters on router interfaces · e0c0afd8
      Arkadi Sharshevsky authored
      Add support for counter allocation on router interfaces. The allocation
      depends on the counter state of the relevant table. In case counting is
      disabled or no counters are left, the counter index will be set as invalid.
      
      Also, a counter pool for router allocation is added.
      Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: reg: Add Router Interface Counter Register · ba73e97a
      Arkadi Sharshevsky authored
      The RICNT register retrieves per-port performance counters. It will be
      used to query the router interfaces' statistics.
      Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Add definition for egress rif table · d54b70fe
      Arkadi Sharshevsky authored
      Add definition for egress router interface table. This table describes
      the final part in the routing pipeline. This table matches the egress
      interface index (rif index, which is set by the previous stages and
      determines the out port) and makes the decision of forwarding the packet
      towards the L2 logic or dropping it.
      
      The metadata header is added to represent this internal information.
      The rif index field is mapped logically to netdevice ifindex.
      Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Add placeholder for dpipe · 230ead01
      Arkadi Sharshevsky authored
      Add placeholder for dpipe. Support for specific tables and headers will
      be introduced in following patches. The headers are shared between all
      mlxsw_sp instances.
      Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: reg: Add counter fields to RITR register · 0f630fcb
      Arkadi Sharshevsky authored
      Update RITR for counter support. This allows adding counters for
      ASIC's router ports.
      Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Support for pipeline debug (dpipe) · 1555d204
      Arkadi Sharshevsky authored
      The pipeline debug is used to export the pipeline abstractions for the
      main objects - tables, headers and entries. The only support for set is
      for changing the counter parameter on a specific table.
      
      The basic structures:
      
      Header - can represent a real protocol header information or internal
               metadata. Generic protocol headers like IPv4 can be shared
               between drivers. Each driver can add local headers.
      
      Field - part of a header. Can represent a protocol field or a specific ASIC
              metadata field. Hardware special metadata fields can be mapped
              to different resources, for example switch ASIC ports can have
              an internal number which from the system's point of view is mapped
              to netdevice ifindex.
      
      Match - represents a specific match rule. Can describe a match on a specific
              field or header. The header index should be specified as well
              in order to support several header instances of the same type
              (tunneling).
      
      Action - represents a specific action rule. Actions can describe operations
               on specific field values, for example set, increment, etc.,
               and header operations like add and delete.
      
      Value - represents value which can be associated with specific match or
              action.
      
      Table - represents a hardware block which can be described with match/
              action behavior. The match/action can be done on the packet's
              data or on the internal metadata that it gathered along the
              packet's traversal through the pipeline, which is vendor specific
              and should be exported in order to provide understanding of
              the ASIC's behavior.
      
      Entry - represents a single record in a specific table. The entry is
              identified by a specific combination of values for match/action.
      
      Prior to accessing the tables/entries, the drivers provide the header/
      field database, which the driver exposes to user-space. The database
      is split between the shared headers and unique headers.
      Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
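The object model listed in the commit above (header, field, match, action, table, entry) can be pictured with a few plain C structs. This is a sketch only: the `dp_*` names are hypothetical and deliberately differ from the real devlink structures, and the field widths used in any example data are illustrative guesses rather than values from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Field: part of a header (a protocol field or an ASIC metadata field). */
struct dp_field {
	const char *name;
	unsigned int bitwidth;
};

/* Header: a named group of fields; `global` marks headers shared between drivers. */
struct dp_header {
	const char *name;
	const struct dp_field *fields;
	size_t n_fields;
	int global;
};

/* Match/action target: one field of one header instance. */
struct dp_ref {
	const struct dp_header *header;
	size_t field_idx;
	unsigned int header_index; /* tells repeated headers apart, e.g. tunneling */
};

/* Entry: one record in a table, identified by its match value. */
struct dp_entry {
	unsigned long match_value;
	unsigned long action_value;
	unsigned long counter;
};

/* Table: a hardware block described by match/action behavior. */
struct dp_table {
	const char *name;
	struct dp_ref match;
	struct dp_ref action;
	int counters_enabled;
	struct dp_entry *entries;
	size_t n_entries;
};

/* Resolve a match/action reference to the field it names. */
static const struct dp_field *dp_ref_field(const struct dp_ref *ref)
{
	assert(ref->field_idx < ref->header->n_fields);
	return &ref->header->fields[ref->field_idx];
}

/* Find the entry whose match value equals `match_value`, or NULL. */
static struct dp_entry *dp_table_lookup(const struct dp_table *t,
					unsigned long match_value)
{
	size_t i;

	for (i = 0; i < t->n_entries; i++)
		if (t->entries[i].match_value == match_value)
			return &t->entries[i];
	return NULL;
}
```

An erif-style table in this model would match on a metadata field such as rif_port_index and act on a forward field, with one `dp_entry` per router interface.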
  2. 28 Mar, 2017 8 commits
    • Merge tag 'mlx5e-failsafe' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · cc628c96
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5e-failsafe 27-03-2017
      
      This series provides a fail-safe mechanism to allow safely re-configuring
      mlx5e netdevice and provides a resiliency against sporadic
      configuration failures.
      
      To enable this we do some refactoring and code reorganizing to allow
      breaking the driver's open/close flows into stages:
            open -> activate -> deactivate -> close.
      
      In addition we need to allow creating fresh HW ring resources
      (mlx5e_channels) with their own "new" set of parameters, while keeping
      the current ones running and active until the new channels are
      successfully created with the new configuration, and only then can we
      safely replace (switch) old channels with new ones.
      
      For that we introduce the mlx5e_channels object and an API to manage it:
       - channels = open_channels(new_params):
         open fresh TX/RX channels
       - activate_channels(channels):
         redirect traffic to them and attach them to the netdev
       - deactivate_channels(channels):
         stop traffic and detach from netdev
       - close_channels(channels):
         free the TX/RX HW resources of those channels
      
      With the above strategy it is straightforward to achieve the desired
      behavior of fail-safe configuration.  In pseudo code:
      
      make_new_config(new_params)
      {
      	old_channels = current_active_channels;
      	new_channels = open_channels(new_params);
      	if (!new_channels)
      		return "Failed, but current channels are still active :)";
      
      	deactivate_channels(old_channels); /* Can't fail */
      	set_hw_new_state();                /* If needed  */
      	activate_channels(new_channels);   /* Can't fail */
      	close_channels(old_channels);
      	current_active_channels = new_channels;
      
      	return "SUCCESS";
      }
      
      At the top of this series, we change the following flows to be fail-safe:
      ethtool:
         - ring parameters
         - coalesce parameters
         - tx copy break parameters
         - cqe compressing/moderation mode setting (priv flags)
      ndos:
         - tc setup
         - set features: LRO
         - change mtu
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
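The pseudo code in the cover letter above can be turned into a small compilable model. This is a sketch under stated assumptions, not the mlx5e implementation: `struct params`, `struct channels`, `struct netdev_model` and the rule that a non-positive ring count fails to open are all hypothetical, chosen only so the failure path can be exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct params { int num_rings; };      /* hypothetical configuration */

struct channels {                       /* hypothetical ring resources */
	struct params p;
	bool active;
};

struct netdev_model { struct channels *cur; };

/* The only step that may fail: allocate fresh channels for `p`. */
static struct channels *open_channels(const struct params *p)
{
	struct channels *c;

	if (p->num_rings <= 0)  /* model a configuration the HW rejects */
		return NULL;
	c = malloc(sizeof(*c));
	if (!c)
		return NULL;
	c->p = *p;
	c->active = false;
	return c;
}

static void activate_channels(struct channels *c)   { c->active = true; }
static void deactivate_channels(struct channels *c) { c->active = false; }
static void close_channels(struct channels *c)      { free(c); }

/* Returns 0 on success, -1 on failure (old channels stay active). */
static int make_new_config(struct netdev_model *dev, const struct params *new_params)
{
	struct channels *old = dev->cur;
	struct channels *fresh;

	assert(dev && new_params);
	fresh = open_channels(new_params);
	if (!fresh)
		return -1;               /* nothing was torn down yet */

	if (old)
		deactivate_channels(old);
	activate_channels(fresh);
	dev->cur = fresh;
	if (old)
		close_channels(old);
	return 0;
}
```

The design choice worth noticing is the ordering: everything that can fail happens before the old channels are touched, so a failed reconfiguration leaves `dev->cur` exactly as it was.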
    • Merge branch 'bond-link-status-fixes' · 95ed0edd
      David S. Miller authored
      Mahesh Bandewar says:
      
      ====================
      link-status fixes for mii-monitoring
      
      The mii monitoring is divided into two phases - inspect and commit. The
      inspect phase technically should not make any changes to the state and
      should defer them to the commit phase. However, we detected link state
      inconsistencies on several machines and discovered that they are the
      result of some inconsistent updates to link states and the assumption
      that you *always* get rtnl-mutex. In reality, when trylock() fails to
      acquire rtnl-mutex, the commit phase is postponed until the next mii-mon
      run. At the next round, because of the state change performed in the
      previous inspect-run, this round does not detect any changes and skips
      calling the commit phase. This results in an inconsistent state until
      the next link event happens (if it ever happens).
      
      During the commit phase, it's assumed that the speed and duplex
      fetch is always successful, but that's not always the case. However, the
      slave state is marked UP irrespective of the speed / duplex fetch operation.
      If the speed / duplex fetch operation results in insane values for either
      of these two fields, then keeping the internal link state UP is not going to
      provide fruitful results either.
      
      Please see the individual patches for more details.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: correctly update link status during mii-commit phase · b5bf0f5b
      Mahesh Bandewar authored
      bond_miimon_commit() marks the link UP after attempting to get the speed
      and duplex settings for the link. There is a possibility that
      bond_update_speed_duplex() could fail. This is another place where it
      could result in an inconsistent bonding link state.
      
      With this patch the link will be marked UP, and processed further, only
      if the retrieved speed and duplex values are sane.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: make speed, duplex setting consistent with link state · c4adfc82
      Mahesh Bandewar authored
      bond_update_speed_duplex() retrieves speed and duplex settings. There
      is a possibility of failure in retrieving these values, but the caller has
      to assume it's always successful. This leads to having inconsistent
      slave link settings. If these (speed, duplex) values cannot be
      retrieved, then keeping the link UP causes problems.
      
      The updated bond_update_speed_duplex() returns 0 on success if it
      retrieves sane values for speed and duplex. On failure it returns 1
      and marks the link down.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
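The return contract described in the commit above (0 when sane speed and duplex values were retrieved, 1 otherwise, with the link not left UP on failure) could be modelled as follows. The struct layouts, the `probe` input and the `*_MODEL` constants are hypothetical; they stand in for the ethtool query and the kernel's unknown-speed/unknown-duplex markers and are not copied from the bonding driver.

```c
#include <assert.h>

#define SPEED_UNKNOWN_MODEL  (-1)   /* hypothetical "speed not known" marker */
#define DUPLEX_UNKNOWN_MODEL 0xff   /* hypothetical "duplex not known" marker */

enum link_state_model { LINK_STATE_DOWN, LINK_STATE_UP };

struct slave_model {
	int speed;
	int duplex;                     /* 0 = half, 1 = full in this model */
	enum link_state_model link;
};

/* What the (modelled) hardware query reported. */
struct probe {
	int ok;                         /* 0 if the query itself failed */
	int speed;
	int duplex;
};

/* Returns 0 when sane values were stored, 1 otherwise. */
static int update_speed_duplex(struct slave_model *s, const struct probe *p)
{
	assert(s && p);
	s->speed = SPEED_UNKNOWN_MODEL;
	s->duplex = DUPLEX_UNKNOWN_MODEL;

	if (!p->ok || p->speed <= 0)
		return 1;
	if (p->duplex != 0 && p->duplex != 1)
		return 1;

	s->speed = p->speed;
	s->duplex = p->duplex;
	return 0;
}

/* Commit-phase caller: mark the link UP only if the fetch succeeded. */
static void commit_link_up(struct slave_model *s, const struct probe *p)
{
	if (update_speed_duplex(s, p)) {
		s->link = LINK_STATE_DOWN;
		return;
	}
	s->link = LINK_STATE_UP;
}
```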
    • bonding: improve link-status update in mii-monitoring · de77ecd4
      Mahesh Bandewar authored
      The primary issue is that mii-inspect phase updates link-state and
      expects changes to be committed during the mii-commit phase. After
      the inspect phase if it fails to acquire rtnl-mutex, the commit
      phase (bond_mii_commit) doesn't get to run. This partially updated
      state stays and makes the internal-state inconsistent.
      
      e.g. setup bond0 => slaves: eth1, eth2
      eth1 goes DOWN -> UP
         mii_monitor()
      	mii-inspect()
      	    bond_set_slave_link_state(eth1, UP, DontNotify)
      	rtnl_trylock() <- fails!
      
      Next mii-monitor round
      eth1: No change
         mii_monitor()
      	mii-inspect()
      	    eth1->link == current-status (ethtool_ops->get_link)
      	    no-change-detected
      
      End result:
          eth1:
            Link = BOND_LINK_UP
            Speed = 0xfffff  [SpeedUnknown]
            Duplex = 0xff    [DuplexUnknown]
      
      This doesn't always happen but for some unlucky machines in a large set
      of machines it creates problems.
      
      The fix for this is to avoid making changes during inspect phase and
      postpone them until acquiring the rtnl-mutex / invoking commit phase.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: split bond_set_slave_link_state into two parts · f307668b
      Mahesh Bandewar authored
      Split the function into two (a) propose (b) commit phase without
      changing the semantics for the original API.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
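The propose/commit split, and why it avoids the stale state described in the previous commit, can be sketched in a few lines. This is a simplified model with hypothetical names (`slave_link`, `propose_link_state()`, `mii_monitor()` and the `LINK_*` constants); the `got_lock` argument stands in for the outcome of the rtnl trylock.

```c
#include <assert.h>

#define LINK_DOWN      0
#define LINK_UP        1
#define LINK_NOCHANGE (-1)  /* hypothetical "nothing proposed" marker */

struct slave_link {
	int link;            /* committed state */
	int link_new_state;  /* proposed state, or LINK_NOCHANGE */
};

/* Propose: record the wanted state without touching the committed one. */
static void propose_link_state(struct slave_link *s, int state)
{
	s->link_new_state = state;
}

/* Commit: apply a pending proposal, if any. */
static void commit_link_state(struct slave_link *s)
{
	if (s->link_new_state == LINK_NOCHANGE)
		return;
	s->link = s->link_new_state;
	s->link_new_state = LINK_NOCHANGE;
}

/* The original one-shot setter keeps its semantics: propose, then commit. */
static void set_link_state(struct slave_link *s, int state)
{
	propose_link_state(s, state);
	commit_link_state(s);
}

/* Inspect: compare carrier with the committed state; returns 1 if a commit is needed. */
static int mii_inspect(struct slave_link *s, int carrier)
{
	assert(s);
	if (carrier != s->link)
		propose_link_state(s, carrier);
	else
		s->link_new_state = LINK_NOCHANGE;
	return s->link_new_state != LINK_NOCHANGE;
}

/* One monitor round; got_lock models whether the trylock succeeded. */
static void mii_monitor(struct slave_link *s, int carrier, int got_lock)
{
	if (!mii_inspect(s, carrier))
		return;
	if (!got_lock)
		return; /* committed state untouched, so the next round sees the change again */
	commit_link_state(s);
}
```

Because a failed lock attempt leaves `link` unchanged, the next round's inspect still sees a difference between carrier and committed state and retries the commit, instead of concluding that nothing changed.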
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 205ed44e
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2017-03-27
      
      This series contains updates to i40e and i40evf only.
      
      Alex updates the driver code so that we can do bulk updates of the page
      reference count instead of just incrementing it by one reference at a
      time.  Fixed an issue where we were not resetting skb back to NULL when
      we have freed it.  Cleaned up the i40e_process_skb_fields() to align with
      other Intel drivers.  Removed FCoE code, since it is not supported in any
      of the Fortville/Fortpark hardware, so there is not much point of carrying
      the code around, especially if it is broken and untested.
      
      Harshitha fixes a bug in the driver where the calculation of the RSS size
      was not taking into account the number of traffic classes enabled.
      
      Robert fixes a potential race condition during VF reset by eliminating
      IOMMU DMAR Faults caused by VF hardware and when the OS initiates a VF
      reset and before the reset is finished we modify the VF's settings.
      
      Bimmy removes a delay that is no longer needed, since it was only needed
      for preproduction hardware.
      
      Colin King fixes a null pointer dereference, where VSI was being
      dereferenced before the VSI NULL check.
      
      Jake fixes an issue with the recent addition of the "client code" to the
      driver, where we attempt to use an uninitialized variable, so correctly
      initialize the params variable by calling i40e_client_get_params().
      
      v2: dropped patch 5 of the original series from Carolyn since we need
          more documentation and justification for the added delay, so Carolyn
          is taking the time to update the patch before we re-submit it for
          kernel inclusion.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 27 Mar, 2017 7 commits