1. 27 May, 2020 22 commits
  2. 26 May, 2020 18 commits
    • Merge branch 'net-phy-mscc-miim-reduce-waiting-time-between-MDIO-transactions' · 0e348119
      David S. Miller authored
      Antoine Tenart says:
      
      ====================
      net: phy: mscc-miim: reduce waiting time between MDIO transactions
      
      This series aims at reducing the waiting time between MDIO transactions
      when using the MSCC MIIM MDIO controller.
      
      I'm not sure we need patch 4/4, and we could reasonably drop it from the
      series. I'm including the patch as it could help ensure the system
      is functional with a non-optimal configuration.
      
      We needed to improve the driver's performance because, when using a PHY
      requiring lots of register accesses (such as the VSC85xx family),
      delays would add up and end up being quite large. This caused issues
      such as slow initialization of the PHY and problems with timestamping
      operations (this feature will be sent quite soon to the mailing
      lists).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: mscc-miim: read poll when high resolution timers are disabled · a021ada2
      Antoine Tenart authored
      The driver uses a read polling mechanism to check the status of the MDIO
      bus, to know if it is ready to accept the next command. This polling
      mechanism uses usleep_range() under the hood between reads, which is fine
      as long as high resolution timers are enabled. Otherwise the delays
      end up being much longer than expected.
      
      This patch fixes this by using udelay() under the hood when
      CONFIG_HIGH_RES_TIMERS isn't enabled. This increases CPU usage.
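The trade-off described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the 10000us tick period (HZ=100), the rounding model and all `demo_*` names are hypothetical and are not taken from the driver. It compares a sleeping delay, which is rounded up to a whole timer tick when high resolution timers are off, with a busy-wait delay that always costs exactly the requested time.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed tick period for this illustration (HZ=100 -> 10000 us per tick). */
#define DEMO_TICK_US 10000u

/* Time one sleeping delay takes: exact with high resolution timers,
 * rounded up to a whole tick without them (simplified model). */
static unsigned int demo_sleep_cost_us(unsigned int req_us, bool hires)
{
	if (hires)
		return req_us;
	return ((req_us + DEMO_TICK_US - 1) / DEMO_TICK_US) * DEMO_TICK_US;
}

/* Total delay for nr_polls polls when the code always sleeps. */
static unsigned int demo_always_sleep_us(unsigned int nr_polls,
					 unsigned int delay_us, bool hires)
{
	return nr_polls * demo_sleep_cost_us(delay_us, hires);
}

/* Total delay when falling back to busy-waiting without high
 * resolution timers, as the commit message describes. */
static unsigned int demo_with_fallback_us(unsigned int nr_polls,
					  unsigned int delay_us, bool hires)
{
	assert(delay_us > 0);
	if (!hires)
		return nr_polls * delay_us;	/* busy-wait: exact, but burns CPU */
	return demo_always_sleep_us(nr_polls, delay_us, hires);
}
```

In this model, three polls of 50us cost 30000us when always sleeping without high resolution timers, against 150us with the busy-wait fallback. The price is that the CPU spins during those 150us.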
      Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: mscc-miim: improve waiting logic · d9c6de35
      Antoine Tenart authored
      The MSCC MIIM MDIO driver waits for the MDIO bus to be ready to accept
      the next command. It does so by polling the BUSY status bit until it
      indicates that the MDIO bus has completed all pending operations. This
      can take time, and the controller supports writing the next command as
      soon as there are no pending commands (which happens while the MDIO bus
      is busy completing its current command).
      
      This patch implements this improved logic by adding a helper to poll
      the PENDING status bit, and by adjusting where we should wait for the
      bus to not be busy or to not be pending.
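The difference between the two waits can be sketched with a simulated controller. This is a minimal model, not the driver code: the bit positions, the read budget and every `demo_*` name are assumptions made for the illustration. A queued command first shows PENDING and BUSY together, then only BUSY while the bus executes it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical status bits and read budget for the simulation. */
#define DEMO_STAT_PENDING (1u << 2)
#define DEMO_STAT_BUSY    (1u << 3)
#define DEMO_MAX_READS    100u

/* Simulated controller state. */
struct demo_miim {
	unsigned int pending_reads; /* status reads left with PENDING set */
	unsigned int busy_reads;    /* further reads left with only BUSY set */
	unsigned int reads;         /* number of status reads so far */
};

/* Each status read advances the simulation by one step. */
static uint32_t demo_read_status(struct demo_miim *m)
{
	m->reads++;
	if (m->pending_reads) {
		m->pending_reads--;
		return DEMO_STAT_PENDING | DEMO_STAT_BUSY;
	}
	if (m->busy_reads) {
		m->busy_reads--;
		return DEMO_STAT_BUSY;
	}
	return 0;
}

/* Poll until all bits in mask are clear, or give up. */
static int demo_wait_clear(struct demo_miim *m, uint32_t mask)
{
	unsigned int i;

	assert(mask != 0);
	for (i = 0; i < DEMO_MAX_READS; i++)
		if (!(demo_read_status(m) & mask))
			return 0;
	return -ETIMEDOUT;
}

/* Before queuing a new command: only PENDING must be clear. */
static int demo_wait_pending(struct demo_miim *m)
{
	return demo_wait_clear(m, DEMO_STAT_PENDING);
}

/* Before reading back a result: the bus must no longer be BUSY. */
static int demo_wait_ready(struct demo_miim *m)
{
	return demo_wait_clear(m, DEMO_STAT_BUSY);
}
```

With two pending steps followed by five busy steps, the PENDING wait returns after three status reads while the BUSY wait needs eight, which is the saving the commit message is after.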
      Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
      Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: mscc-miim: remove redundant timeout check · f5112c8a
      Antoine Tenart authored
      readl_poll_timeout already returns -ETIMEDOUT if the condition isn't
      satisfied, so there's no need to check the condition again after
      calling it. Remove the redundant timeout check.
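A userspace analogue makes the point concrete. The sketch below is not the kernel helper: `demo_poll_clear()`, the simulated clock and the fake register are hypothetical, and the 50us/10000us values are borrowed from the "use more reasonable delays" commit in the same series. It shows a poll helper whose return value already encodes the timeout, so the caller can return it unchanged.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_STAT_BUSY (1u << 3) /* hypothetical bit position */

typedef uint32_t (*demo_read_fn)(void *ctx);

/* Poll until (value & mask) == 0. Returns 0 on success and -ETIMEDOUT
 * once the simulated elapsed time reaches timeout_us. */
static int demo_poll_clear(demo_read_fn rd, void *ctx, uint32_t mask,
			   unsigned int sleep_us, unsigned int timeout_us,
			   uint32_t *val)
{
	unsigned int elapsed_us = 0;

	assert(sleep_us > 0);
	for (;;) {
		*val = rd(ctx);
		if (!(*val & mask))
			return 0;
		if (elapsed_us >= timeout_us)
			return -ETIMEDOUT;
		elapsed_us += sleep_us; /* simulated sleep, no real delay */
	}
}

/* Caller: no second look at val is needed, the helper's return
 * value is the whole answer. */
static int demo_wait_ready(demo_read_fn rd, void *ctx)
{
	uint32_t val;

	return demo_poll_clear(rd, ctx, DEMO_STAT_BUSY, 50, 10000, &val);
}

/* Fake register: reports BUSY for *ctx more reads, then idle. */
static uint32_t demo_fake_read(void *ctx)
{
	unsigned int *left = ctx;

	if (*left) {
		(*left)--;
		return DEMO_STAT_BUSY;
	}
	return 0;
}
```

Re-testing the BUSY bit in `demo_wait_ready()` after the call would only repeat the decision the helper has already made.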
      Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
      Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: mscc-miim: use more reasonable delays · 9513167e
      Antoine Tenart authored
      The MSCC MIIM MDIO driver uses delays to read poll a status register. I
      ran multiple tests on an Ocelot PCS120 platform, which led me to reduce
      those delays. The sleep interval between two polls is reduced from
      100us to 50us, which in almost all cases is a good value to succeed at
      the first retry. The overall timeout is also lowered, as the prior
      value was far too high; 10000us is large enough.
      Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
      Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mdiobus: add clause 45 mdiobus accessors · 90ce665c
      Russell King authored
      There is a recurring pattern throughout some of the PHY code converting
      a devad and regnum to our packed clause 45 representation. Rather than
      having this scattered around the code, let's put a common translation
      function in mdio.h, and provide some register accessors.
      
      Convert the phylib core, phylink, bcm87xx and cortina to use these.
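As an illustration of what such a translation function does, here is a standalone sketch of packing a devad and regnum into one 32-bit value and unpacking them again. The layout used (a clause 45 flag in bit 30, a 5-bit device address starting at bit 16, a 16-bit register number in the low bits) and all `demo_*` names are assumptions for this example, not a copy of the helpers added to mdio.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed packed layout for the illustration. */
#define DEMO_ADDR_C45          (1u << 30)
#define DEMO_DEVADDR_C45_SHIFT 16
#define DEMO_DEVADDR_C45_MASK  0x1fu
#define DEMO_REGADDR_C45_MASK  0xffffu

/* Pack a device address and register number into one value. */
static uint32_t demo_c45_addr(unsigned int devad, uint16_t regnum)
{
	assert(devad <= DEMO_DEVADDR_C45_MASK);
	return DEMO_ADDR_C45 |
	       ((uint32_t)devad << DEMO_DEVADDR_C45_SHIFT) | regnum;
}

/* Extract the device address from a packed value. */
static unsigned int demo_c45_devad(uint32_t addr)
{
	return (addr >> DEMO_DEVADDR_C45_SHIFT) & DEMO_DEVADDR_C45_MASK;
}

/* Extract the register number from a packed value. */
static uint16_t demo_c45_regnum(uint32_t addr)
{
	return (uint16_t)(addr & DEMO_REGADDR_C45_MASK);
}
```

Keeping the packing in one place means callers pass a (devad, regnum) pair and never spell out the shifts themselves, which is the duplication the commit removes.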
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'flow-mpls' · 8928e19a
      David S. Miller authored
      Guillaume Nault says:
      
      ====================
      flow_dissector, cls_flower: Add support for multiple MPLS Label Stack Entries
      
      Currently, the flow dissector and the Flower classifier can only handle
      the first entry of an MPLS label stack. This patch series generalises
      the code to allow parsing and matching the Label Stack Entries that
      follow.
      
      Patch 1 extends the flow dissector to parse MPLS LSEs until the Bottom
      Of Stack bit is reached. The number of parsed LSEs is capped at
      FLOW_DIS_MPLS_MAX (arbitrarily set to 7). Flower and the NFP driver
      are updated to take into account the new layout of struct
      flow_dissector_key_mpls.
      
      Patch 2 extends Flower. It defines new netlink attributes, which are
      independent from the previous MPLS ones. Mixing the old and the new
      attributes in the same filter is not allowed. For backward compatibility,
      the old attributes are used when dumping filters that don't require the
      new ones.
      
      Changes since v2:
        * Fix compilation with the new MLX5 bareudp tunnel code.
      
      Changes since v1:
        * Fix compilation of NFP driver (kbuild test robot).
        * Fix sparse warning with entropy label (kbuild test robot).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cls_flower: Support filtering on multiple MPLS Label Stack Entries · 61aec25a
      Guillaume Nault authored
      With struct flow_dissector_key_mpls now recording the first
      FLOW_DIS_MPLS_MAX labels, we can extend Flower to filter on any of
      these LSEs independently.
      
      In order to avoid creating new netlink attributes for every possible
      depth, let's define a new TCA_FLOWER_KEY_MPLS_OPTS nested attribute
      that contains the list of LSEs to match. Each LSE is represented by
      another attribute, TCA_FLOWER_KEY_MPLS_OPTS_LSE, which then contains
      the attributes representing the depth and the MPLS fields to match at
      this depth (label, TTL, etc.).
      
      For each MPLS field, the mask is always set to all-ones, as this is
      what the original API did. We could allow user configurable masks in
      the future if there is demand for more flexibility.
      
      The new API also allows specifying only an LSE depth. In that case,
      Flower only verifies that the MPLS label stack depth is greater than or
      equal to the provided depth (that is, an LSE exists at this depth).
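The per-depth matching described above, including the depth-only case, can be sketched as follows. This is a simplified model and not the Flower implementation: the structures, the 1-based depth convention and the `demo_*` names are assumptions, and only the label and TTL fields are modelled. Fields that are not set act as wildcards, and fields that are set are compared in full, mirroring the all-ones masks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MPLS_MAX 7

/* Parsed stack: one bit in used_lses per LSE that was recorded. */
struct demo_lse { uint32_t label; uint8_t ttl; };
struct demo_key_mpls {
	struct demo_lse ls[DEMO_MPLS_MAX];
	uint8_t used_lses;
};

/* One per-depth match; depth is 1-based, unset fields are wildcards. */
struct demo_lse_match {
	unsigned int depth;
	bool has_label;
	uint32_t label;
	bool has_ttl;
	uint8_t ttl;
};

static bool demo_lse_matches(const struct demo_key_mpls *key,
			     const struct demo_lse_match *m)
{
	const struct demo_lse *lse;

	assert(m->depth >= 1 && m->depth <= DEMO_MPLS_MAX);
	/* Depth-only check: an LSE must exist at this depth. */
	if (!(key->used_lses & (1u << (m->depth - 1))))
		return false;
	lse = &key->ls[m->depth - 1];
	if (m->has_label && lse->label != m->label)
		return false;
	if (m->has_ttl && lse->ttl != m->ttl)
		return false;
	return true;
}
```

A match that sets only `depth` therefore succeeds for any packet whose label stack is at least that deep, whatever the field values at that depth are.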
      
      Filters that only match on one (or more) fields of the first LSE are
      dumped using the old netlink attributes, to avoid confusing user space
      programs that don't understand the new API.
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • flow_dissector: Parse multiple MPLS Label Stack Entries · 58cff782
      Guillaume Nault authored
      The current MPLS dissector only parses the first MPLS Label Stack
      Entry (second LSE can be parsed too, but only to set a key_id).
      
      This patch adds the possibility to parse several LSEs by making
      __skb_flow_dissect_mpls() return FLOW_DISSECT_RET_PROTO_AGAIN as long
      as the Bottom Of Stack bit hasn't been seen, up to a maximum of
      FLOW_DIS_MPLS_MAX entries.
      
      FLOW_DIS_MPLS_MAX is arbitrarily set to 7. This should be enough for
      many practical purposes, without wasting too much space.
      
      To record the parsed values, flow_dissector_key_mpls is modified to
      store an array of stack entries, instead of just the values of the
      first one. A bit field, "used_lses", is also added to keep track of
      the LSEs that have been set. The objective is to avoid defining a
      new FLOW_DISSECTOR_KEY_MPLS_XX for each level of the MPLS stack.
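The parsing loop and the "used_lses" bit field described above can be sketched in standalone C. This is an illustration and not the kernel dissector: the `demo_*` types and function are hypothetical, and the 32-bit LSE layout used (20-bit label, 3-bit traffic class, Bottom Of Stack bit, 8-bit TTL) is the standard MPLS encoding. Parsing stops at the Bottom Of Stack bit, at the end of the buffer, or after the maximum number of entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MPLS_MAX 7 /* mirrors the cap of 7 from the commit message */

struct demo_lse {
	uint32_t label;
	uint8_t tc;
	uint8_t bos;
	uint8_t ttl;
};

struct demo_key_mpls {
	struct demo_lse ls[DEMO_MPLS_MAX];
	uint8_t used_lses; /* bit N set: ls[N] holds a parsed entry */
};

/* Parse big-endian LSEs from buf; returns the number of entries stored. */
static int demo_parse_mpls(const uint8_t *buf, size_t len,
			   struct demo_key_mpls *key)
{
	int depth = 0;

	assert(buf && key);
	memset(key, 0, sizeof(*key));
	while (depth < DEMO_MPLS_MAX && len >= 4) {
		uint32_t e = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
			     ((uint32_t)buf[2] << 8) | buf[3];
		struct demo_lse *lse = &key->ls[depth];

		lse->label = e >> 12;
		lse->tc = (e >> 9) & 0x7;
		lse->bos = (e >> 8) & 0x1;
		lse->ttl = e & 0xff;
		key->used_lses |= (uint8_t)(1u << depth);
		depth++;
		buf += 4;
		len -= 4;
		if (lse->bos)
			break; /* Bottom Of Stack reached */
	}
	return depth;
}
```

One array plus one bitmap is enough to describe any stack depth up to the cap, which is what makes a separate key per level unnecessary.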
      
      TC flower is adapted for the new struct flow_dissector_key_mpls layout.
      Matching on several MPLS Label Stack Entries will be added in the next
      patch.
      
      The NFP and MLX5 drivers are also adapted: nfp_flower_compile_mac() and
      mlx5's parse_tunnel() now verify that the rule only uses the first LSE
      and fail if it doesn't.
      
      Finally, the behaviour of the FLOW_DISSECTOR_KEY_MPLS_ENTROPY key is
      slightly modified. Instead of recording the first Entropy Label, it
      now records the last one. This shouldn't have any consequences since
      there doesn't seem to be any user of FLOW_DISSECTOR_KEY_MPLS_ENTROPY
      in the tree. We'd probably better do a hash of all parsed MPLS labels
      instead (excluding reserved labels) anyway. That'd give better entropy
      and would probably also simplify the code. But that's not the purpose
      of this patch, so I'm keeping that as a future possible improvement.
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'batadv-next-for-davem-20200526' of git://git.open-mesh.org/linux-merge · fb8ddaa9
      David S. Miller authored
      Simon Wunderlich says:
      
      ====================
      This cleanup patchset includes the following patches:
      
       - Fix revert dynamic lockdep key changes for batman-adv,
         by Sven Eckelmann
      
       - use rcu_replace_pointer() where appropriate, by Antonio Quartulli
      
       - Revert "disable ethtool link speed detection when auto negotiation
         off", by Sven Eckelmann
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tipc-add-some-improvements' · 6a862a44
      David S. Miller authored
      Tuong Lien says:
      
      ====================
      tipc: add some improvements
      
      This series adds some improvements to TIPC.
      
      The first patch improves TIPC broadcast performance with the 'Gap
      ACK blocks' mechanism, as was done for unicast before. The others add
      tracing and statistics support for broadcast links, plus an option
      to carry broadcast retransmissions via unicast, which might be useful
      in some cases.
      
      In addition, with the last patch the Nagle algorithm can now
      automatically 'adjust' itself depending on the specific network
      conditions a stream connection runs in.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: add test for Nagle algorithm effectiveness · 0a3e060f
      Tuong Lien authored
      When streaming in Nagle mode, we try to bundle as many small user
      messages as possible while there is one outstanding buffer, i.e. one
      not yet ACK-ed by the receiving side, which helps boost the overall
      throughput. So the algorithm's effectiveness really depends on when
      the Nagle ACK comes, or on what the specific network latency (RTT) is,
      compared to the user's message sending rate.
      
      In a bad case, where the user's sending rate is low or the network
      latency is small, there will not be many bundles, so making a Nagle ACK
      or waiting for it is not meaningful.
      For example: a user sends its messages every 100ms and the RTT is 50ms.
      Then each message requires one Nagle ACK, but only one user message is
      sent each time, without any bundling.
      
      In a better case, we may have a few bundles (e.g. with RTT = 300ms),
      but if the user sends medium-sized messages, there will not be any
      difference at all: for instance, 3 x 1000-byte data messages, if
      bundled, will still result in 3 bundles with MTU = 1500.
      
      When Nagle is ineffective, the delay in user message sending is clearly
      wasted compared to sending directly.
      
      Besides, adding Nagle ACKs consumes some processor load on both the
      sending and receiving sides.
      
      This commit adds a test on the effectiveness of the Nagle algorithm for
      an individual connection in the network on which it actually runs.
      Particularly, upon receipt of a Nagle ACK we will compare the number of
      bundles in the backlog queue to the number of user messages which would
      be sent directly without Nagle. If the ratio is good (e.g. >= 2), Nagle
      mode will be kept for further message sending. Otherwise, we will leave
      Nagle and put a 'penalty' on the connection, so it will have to spend
      more 'one-way' messages before being able to re-enter Nagle.
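The effectiveness test can be illustrated with a small standalone sketch. It is not the TIPC code: the `demo_*` functions are hypothetical, the packing model ignores header overhead, and reading the ratio as "user messages per bundle" with a threshold of 2 is an interpretation of the text above. The first function packs messages greedily into MTU-sized bundles, the second decides whether Nagle mode pays off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Greedy packing: a message joins the current bundle if it still fits
 * within the MTU, otherwise it starts a new bundle. */
static unsigned int demo_count_bundles(const unsigned int *sizes, size_t n,
				       unsigned int mtu)
{
	unsigned int bundles = 0, cur = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(sizes[i] > 0 && sizes[i] <= mtu);
		if (bundles == 0 || cur + sizes[i] > mtu) {
			bundles++;
			cur = sizes[i];
		} else {
			cur += sizes[i];
		}
	}
	return bundles;
}

/* Nagle is considered worthwhile when, on average, at least two user
 * messages end up in each bundle (assumed threshold). */
static bool demo_nagle_effective(unsigned int nr_msgs, unsigned int nr_bundles)
{
	if (nr_bundles == 0)
		return false;
	return nr_msgs >= 2 * nr_bundles;
}
```

For the example from the commit message, three 1000-byte messages with an MTU of 1500 yield three bundles, so the ratio is 1 and Nagle would be left. Ten 100-byte messages fit in a single bundle and keep Nagle on.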
      
      In addition, the 'ack-required' bit is only set when really needed, so
      the number of Nagle ACKs is reduced during Nagle mode.
      
      Testing with a benchmark showed that with the patch, there was not much
      difference in throughput for small messages, since the tool continuously
      sends messages without a break, so Nagle still takes effect.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: add support for broadcast rcv stats dumping · 03b6fefd
      Tuong Lien authored
      This commit enables dumping the statistics of a broadcast-receiver link
      like the traditional 'broadcast-link' one (which is for broadcast-
      sender). The link dumping can be triggered via netlink (e.g. the
      iproute2/tipc tool) by the link flag - 'TIPC_NLA_LINK_BROADCAST' as the
      indicator.
      
      The name of a broadcast-receiver link of a specific peer will be in the
      format: 'broadcast-link:<peer-id>'.
      
      For example:
      
      Link <broadcast-link:1001002>
        Window:50 packets
        RX packets:7841 fragments:2408/440 bundles:0/0
        TX packets:0 fragments:0/0 bundles:0/0
        RX naks:0 defs:124 dups:0
        TX naks:21 acks:0 retrans:0
        Congestion link:0  Send queue max:0 avg:0
      
      In addition, the broadcast-receiver link statistics can be reset in the
      usual way via netlink by specifying that link name in the command.
      
      Note: the 'tipc_link_name_ext()' is removed because the link name can
      now be retrieved simply via the 'l->name'.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: enable broadcast retrans via unicast · a91d55d1
      Tuong Lien authored
      In some environments, broadcast traffic is suppressed at high rates
      (i.e. there is a kind of bandwidth limit setting). With such a limit
      applied, TIPC broadcast can still run successfully. However, under
      high load some packets are dropped first and TIPC tries to retransmit
      them, but the retransmission is intentionally broadcast too, which
      makes things worse and does not help at all.
      
      This commit enables broadcast retransmission via unicast, which only
      retransmits packets to the specific peer that has really reported a gap,
      i.e. it does not broadcast to all nodes in the cluster. This prevents
      the retransmissions from being suppressed, reduces overhead on the
      other peers due to duplicates, and finally improves the overall TIPC
      broadcast performance.
      
      Note: the functionality can be turned on/off via the sysctl file:
      
      echo 1 > /proc/sys/net/tipc/bc_retruni
      echo 0 > /proc/sys/net/tipc/bc_retruni
      
      Default is '0', i.e. the broadcast retransmission still works as usual.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: add back link trace events · c6ed7a5c
      Tuong Lien authored
      In the previous commit ("tipc: add Gap ACK blocks support for broadcast
      link"), we have removed the following link trace events due to the code
      changes:
      
      - tipc_link_bc_ack
      - tipc_link_retrans
      
      This commit adds them back along with some minor changes to adapt to
      the new code.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: introduce Gap ACK blocks for broadcast link · d7626b5a
      Tuong Lien authored
      As achieved through commit 9195948f ("tipc: improve TIPC throughput
      by Gap ACK blocks"), we apply the same mechanism for the broadcast link
      as well. The 'Gap ACK blocks' data field in a 'PROTOCOL/STATE_MSG' will
      consist of two parts built for both the broadcast and unicast types:
      
       31                       16 15                        0
      +-------------+-------------+-------------+-------------+
      |  bgack_cnt  |  ugack_cnt  |            len            |
      +-------------+-------------+-------------+-------------+  -
      |            gap            |            ack            |   |
      +-------------+-------------+-------------+-------------+    > bc gacks
      :                           :                           :   |
      +-------------+-------------+-------------+-------------+  -
      |            gap            |            ack            |   |
      +-------------+-------------+-------------+-------------+    > uc gacks
      :                           :                           :   |
      +-------------+-------------+-------------+-------------+  -
      
      which is "automatically" backward-compatible.
      
      We also increase the max number of Gap ACK blocks to 128, allowing up to
      64 blocks per type (total buffer size = 516 bytes).
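The 516-byte figure follows from the layout in the diagram: a 4-byte header plus 128 blocks of 4 bytes each. The sketch below models this with plain C structures. It is an assumption-laden illustration rather than the TIPC definition: the `demo_*` names are hypothetical, the field order inside the first word is a guess, and byte order on the wire is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_GAP_ACK_BLKS 128
#define DEMO_MAX_PER_TYPE      64

/* One Gap ACK block: 16-bit ack plus 16-bit gap. */
struct demo_gap_ack {
	uint16_t ack;
	uint16_t gap;
};

/* Header (len and the two per-type counters) followed by the blocks:
 * broadcast blocks first, then unicast blocks. */
struct demo_gap_ack_blks {
	uint16_t len;
	uint8_t ugack_cnt;
	uint8_t bgack_cnt;
	struct demo_gap_ack gacks[DEMO_MAX_GAP_ACK_BLKS];
};

/* Bytes actually needed for a given number of blocks of each type. */
static size_t demo_gap_ack_blks_len(unsigned int bgack_cnt,
				    unsigned int ugack_cnt)
{
	assert(bgack_cnt <= DEMO_MAX_PER_TYPE && ugack_cnt <= DEMO_MAX_PER_TYPE);
	return offsetof(struct demo_gap_ack_blks, gacks) +
	       (bgack_cnt + ugack_cnt) * sizeof(struct demo_gap_ack);
}
```

With both counters at their maximum of 64 the computed length equals the full structure size, 4 + 128 * 4 = 516 bytes. A message carrying fewer blocks only needs the header plus the blocks in use.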
      
      Besides, the 'tipc_link_advance_transmq()' function is refactored so that
      it is applicable to both the unicast and broadcast cases now, so some
      old functions can be removed and the code is optimized.
      
      With the patch, TIPC broadcast is more robust regardless of packet loss,
      disorder, latency, etc. in the underlying network. Its performance is
      boosted significantly.
      For example, an experiment with a 5% packet loss rate gives:
      
      $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000
      real    0m 42.46s
      user    0m 1.16s
      sys     0m 17.67s
      
      Without the patch:
      
      $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000
      real    8m 27.94s
      user    0m 0.55s
      sys     0m 2.38s
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: Add EDPM mode type for user-fw compatibility · ff937b91
      Yuval Basson authored
      In older FW versions the completion flag was treated as the ack flag in
      edpm messages. Expose the FW option of setting which mode the QP is in
      by adding a flag to the qedr <-> qed API.
      
      Flag is added for backward compatibility with libqedr.
      This flag will be set by qedr after determining whether the libqedr is
      using the updated version.
      
      Fixes: f1093940 ("qed: Add support for QP verbs")
      Signed-off-by: Yuval Basson <yuval.bason@marvell.com>
      Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: tcp_v4_err() icmp skb is named icmp_skb · 23917494
      Eric Dumazet authored
      I missed the fact that tcp_v4_err() differs from tcp_v6_err().
      
      After commit 4d1a2d9e ("Rename skb to icmp_skb in tcp_v4_err()")
      the skb argument has been renamed to icmp_skb only in one function.
      
      I will in a future patch reconcile these functions to avoid
      this kind of confusion.
      
      Fixes: 45af29ca ("tcp: allow traceroute -Mtcp for unpriv users")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>