Commits · 759d095741721888b6ee51afa74e0a66ce65e974 · Kirill Smelkov / linux

29 Jun, 2019 8 commits

r8169: improve handling VLAN tag · 759d0957

Heiner Kallweit authored Jun 27, 2019

The VLAN tag is stored in the descriptor in network byte order.
Using swab16 works on little endian host systems only. Better play safe
and use ntohs or htons respectively.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

759d0957

selftests: rtnetlink: skip ipsec offload tests if netdevsim isn't present · 3099c59d

Florian Westphal authored Jun 27, 2019

running the script on systems without netdevsim now prints:

SKIP: ipsec_offload can't load netdevsim

instead of error message & failed status.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

3099c59d

Merge branch 'em_ipt-add-support-for-addrtype' · fc413885

David S. Miller authored Jun 29, 2019

Nikolay Aleksandrov says:

====================
em_ipt: add support for addrtype

We would like to be able to use the addrtype from tc for ACL rules and
em_ipt seems the best place to add support for the already existing xt
match. The biggest issue is that addrtype revision 1 (with ipv6 support)
is NFPROTO_UNSPEC and currently em_ipt can't differentiate between v4/v6
if such xt match is used because it passes the match's family instead of
the packet one. The first 3 patches make em_ipt match only on IP
traffic (currently both policy and addrtype recognize such traffic
only) and make it pass the actual packet's protocol instead of the xt
match family when it's unspecified. They also add support for NFPROTO_UNSPEC
xt matches. The last patch allows to add addrtype rules via em_ipt.
We need to keep the user-specified nfproto for dumping in order to be
compatible with libxtables, we cannot dump NFPROTO_UNSPEC as the nfproto
or we'll get an error from libxtables, thus the nfproto is limited to
ipv4/ipv6 in patch 03 and is recorded.

v3: don't use the user nfproto for matching, only for dumping, more
    information is available in the commit message in patch 03
v2: change patch 02 to set the nfproto only when unspecified and drop
    patch 04 from v1 (Eyal Birger)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

fc413885

net: sched: em_ipt: add support for addrtype matching · 0c4231c7

Nikolay Aleksandrov authored Jun 27, 2019

Allow em_ipt to use addrtype for matching. Restrict the use only to
revision 1 which has IPv6 support. Since it's a NFPROTO_UNSPEC xt match
we use the user-specified nfproto for matching, in case it's unspecified
both v4/v6 will be matched by the rule.

v2: no changes, was patch 5 in v1
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0c4231c7

net: sched: em_ipt: keep the user-specified nfproto and dump it · ba3d24d4

Nikolay Aleksandrov authored Jun 27, 2019

If we dump NFPROTO_UNSPEC as nfproto user-space libxtables can't handle
it and would exit with an error like:
"libxtables: unhandled NFPROTO in xtables_set_nfproto"
In order to avoid the error return the user-specified nfproto. If we
don't record it then the match family is used which can be
NFPROTO_UNSPEC. Even if we add support to mask NFPROTO_UNSPEC in
iproute2 we have to be compatible with older versions which would be
also be allowed to add NFPROTO_UNSPEC matches (e.g. addrtype after the
last patch).

v3: don't use the user nfproto for matching, only for dumping the rule,
    also don't allow the nfproto to be unspecified (explained above)
v2: adjust changes to missing patch, was patch 04 in v1
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ba3d24d4

net: sched: em_ipt: set the family based on the packet if it's unspecified · f4c1c40c

Nikolay Aleksandrov authored Jun 27, 2019

Set the family based on the packet if it's unspecified otherwise
protocol-neutral matches will have wrong information (e.g. NFPROTO_UNSPEC).
In preparation for using NFPROTO_UNSPEC xt matches.

v2: set the nfproto only when unspecified
Suggested-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f4c1c40c

net: sched: em_ipt: match only on ip/ipv6 traffic · 9e10edd7

Nikolay Aleksandrov authored Jun 27, 2019

Restrict matching only to ip/ipv6 traffic and make sure we can use the
headers, otherwise matches will be attempted on any protocol which can
be unexpected by the xt matches. Currently policy supports only ipv4/6.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9e10edd7

hinic: add vlan offload support · aebd17b7

Xue Chaojing authored Jun 29, 2019

This patch adds vlan offload support for the HINIC driver.
Signed-off-by: Xue Chaojing <xuechaojing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aebd17b7

28 Jun, 2019 32 commits

Merge branch 'net-sched-Add-txtime-assist-support-for-taprio' · 0a7960c7

David S. Miller authored Jun 28, 2019

Vedang Patel says:

====================
net/sched: Add txtime-assist support for taprio.

Changes in v6:
- Use _BITUL() instead of BIT() in UAPI for etf. (patch #1)
- Fix a bug reported by kbuild test bot in length_to_duration(). (patch #6)
- Remove an unused function (get_cycle_start()). (Patch #6)

Changes in v5:
- Commit message improved for the igb patch (patch #1).
- Fixed typo in commit message for etf patch (patch #2).

Changes in v4:
- Remove inline directive from functions in foo.c.
- Fix spacing in pkt_sched.h (for etf patch).

Changes in v3:
- Simplify implementation for taprio flags.
- txtime_delay can only be set if txtime-assist mode is enabled.
- txtime_delay and flags will only be visible in tc output if set by user.
- Minor changes in error reporting.

Changes in v2:
- Txtime-offload has now been renamed to txtime-assist mode.
- Renamed the offload parameter to flags.
- Removed the code which introduced the hardware offloading functionality.

Original Cover letter (with above changes included)
--------------------------------------------------

Currently, we are seeing packets being transmitted outside their
timeslices. We can confirm that the packets are being dequeued at the right
time. So, the delay is induced after the packet is dequeued, because
taprio, without any offloading, has no control of when a packet is actually
transmitted.

In order to solve this, we are making use of the txtime feature provided by
ETF qdisc. Hardware offloading needs to be supported by the ETF qdisc in
order to take advantage of this feature. The taprio qdisc will assign
txtime (in skb->tstamp) for all the packets which do not have the txtime
allocated via the SO_TXTIME socket option. For the packets which already
have SO_TXTIME set, taprio will validate whether the packet will be
transmitted in the correct interval.

In order to support this, the following parameters have been added:
- flags (taprio): This is added in order to support different offloading
  modes which will be added in the future.
- txtime-delay (taprio): This indicates the minimum time it will take for
  the packet to hit the wire after it reaches taprio_enqueue(). This is
  useful in determining whether we can transmit the packet in the remaining
  time if the gate corresponding to the packet is currently open.
- skip_skb_check (ETF): ETF currently drops any packet which does not have
  the SO_TXTIME socket option set. This check can be skipped by specifying
  this option.

Following is an example configuration:

tc qdisc replace dev $IFACE parent root handle 100 taprio \\
    num_tc 3 \\
    map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \\
    queues 1@0 1@0 1@0 \\
    base-time $BASE_TIME \\
    sched-entry S 01 300000 \\
    sched-entry S 02 300000 \\
    sched-entry S 04 400000 \\
    flags 0x1 \\
    txtime-delay 200000 \\
    clockid CLOCK_TAI

tc qdisc replace dev $IFACE parent 100:1 etf \\
    offload delta 200000 clockid CLOCK_TAI skip_skb_check

Here, the "flags" parameter is indicating that the txtime-assist mode is
enabled. Also, all the traffic classes have been assigned the same queue.
This is to prevent the traffic classes in the lower priority queues from
getting starved. Note that this configuration is specific to the i210
ethernet card. Other network cards where the hardware queues are given the
same priority, might be able to utilize more than one queue.

Following are some of the other highlights of the series:
- Fix a bug where hardware timestamping and SO_TXTIME options cannot be
  used together. (Patch 1)
- Introduces the skip_skb_check option.  (Patch 2)
- Make TxTime assist mode work with TCP packets (Patch 7).

The following changes are recommended to be done in order to get the best
performance from taprio in this mode:
ip link set dev enp1s0 mtu 1514
ethtool -K eth0 gso off
ethtool -K eth0 tso off
ethtool --set-eee eth0 eee off
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

0a7960c7

taprio: Adjust timestamps for TCP packets · 54002066

Vedang Patel authored Jun 25, 2019

When the taprio qdisc is running in "txtime offload" mode, it will
set the launchtime value (in skb->tstamp) for all the packets which do
not have the SO_TXTIME socket option. But, the TCP packets already have
this value set and it indicates the earliest departure time represented
in CLOCK_MONOTONIC clock.

We need to respect the timestamp set by the TCP subsystem. So, convert
this time to the clock which taprio is using and ensure that the packet
is not transmitted before the deadline set by TCP.
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

54002066

taprio: make clock reference conversions easier · 7ede7b03

Vedang Patel authored Jun 25, 2019

Later in this series we will need to transform from
CLOCK_MONOTONIC (used in TCP) to the clock reference used in TAPRIO.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7ede7b03

taprio: Add support for txtime-assist mode · 4cfd5779

Vedang Patel authored Jun 25, 2019

Currently, we are seeing non-critical packets being transmitted outside of
their timeslice. We can confirm that the packets are being dequeued at the
right time. So, the delay is induced in the hardware side.  The most likely
reason is the hardware queues are starving the lower priority queues.

In order to improve the performance of taprio, we will be making use of the
txtime feature provided by the ETF qdisc. For all the packets which do not
have the SO_TXTIME option set, taprio will set the transmit timestamp (set
in skb->tstamp) in this mode. TAPrio Qdisc will ensure that the transmit
time for the packet is set to when the gate is open. If SO_TXTIME is set,
the TAPrio qdisc will validate whether the timestamp (in skb->tstamp)
occurs when the gate corresponding to skb's traffic class is open.

Following two parameters added to support this mode:
- flags: used to enable txtime-assist mode. Will also be used to enable
  other modes (like hardware offloading) later.
- txtime-delay: This indicates the minimum time it will take for the packet
  to hit the wire. This is useful in determining whether we can transmit
the packet in the remaining time if the gate corresponding to the packet is
currently open.

An example configuration for enabling txtime-assist:

tc qdisc replace dev eth0 parent root handle 100 taprio \\
      num_tc 3 \\
      map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \\
      queues 1@0 1@0 1@0 \\
      base-time 1558653424279842568 \\
      sched-entry S 01 300000 \\
      sched-entry S 02 300000 \\
      sched-entry S 04 400000 \\
      flags 0x1 \\
      txtime-delay 40000 \\
      clockid CLOCK_TAI

tc qdisc replace dev $IFACE parent 100:1 etf skip_sock_check \\
      offload delta 200000 clockid CLOCK_TAI

Note that all the traffic classes are mapped to the same queue.  This is
only possible in taprio when txtime-assist is enabled. Also, note that the
ETF Qdisc is enabled with offload mode set.

In this mode, if the packet's traffic class is open and the complete packet
can be transmitted, taprio will try to transmit the packet immediately.
This will be done by setting skb->tstamp to current_time + the time delta
indicated in the txtime-delay parameter. This parameter indicates the time
taken (in software) for packet to reach the network adapter.

If the packet cannot be transmitted in the current interval or if the
packet's traffic is not currently transmitting, the skb->tstamp is set to
the next available timestamp value. This is tracked in the next_launchtime
parameter in the struct sched_entry.

The behaviour w.r.t admin and oper schedules is not changed from what is
present in software mode.

The transmit time is already known in advance. So, we do not need the HR
timers to advance the schedule and wakeup the dequeue side of taprio.  So,
HR timer won't be run when this mode is enabled.
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4cfd5779

taprio: Remove inline directive · 566af331

Vedang Patel authored Jun 25, 2019

Remove inline directive from length_to_duration(). We will let the compiler
make the decisions.
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

566af331

taprio: calculate cycle_time when schedule is installed · 037be037

Vedang Patel authored Jun 25, 2019

cycle time for a particular schedule is calculated only when it is first
installed. So, it makes sense to just calculate it once right after the
'cycle_time' parameter has been parsed and store it in cycle_time.
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

037be037

etf: Add skip_sock_check · d14d2b20

Vedang Patel authored Jun 25, 2019

Currently, etf expects a socket with SO_TXTIME option set for each packet
it encounters. So, it will drop all other packets. But, in the future
commits we are planning to add functionality where tstamp value will be set
by another qdisc. Also, some packets which are generated from within the
kernel (e.g. ICMP packets) do not have any socket associated with them.

So, this commit adds support for skip_sock_check. When this option is set,
etf will skip checking for a socket and other associated options for all
skbs.
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d14d2b20

etf: Don't use BIT() in UAPI headers. · 9903c8dc

Vedang Patel authored Jun 25, 2019

The BIT() macro isn't exported as part of the UAPI interface. So, the
compile-test to ensure they are self contained fails. So, use _BITUL()
instead.
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9903c8dc

igb: clear out skb->tstamp after reading the txtime · 1e08511d

Vedang Patel authored Jun 25, 2019

If a packet which is utilizing the launchtime feature (via SO_TXTIME socket
option) also requests the hardware transmit timestamp, the hardware
timestamp is not delivered to the userspace. This is because the value in
skb->tstamp is mistaken as the software timestamp.

Applications, like ptp4l, request a hardware timestamp by setting the
SOF_TIMESTAMPING_TX_HARDWARE socket option. Whenever a new timestamp is
detected by the driver (this work is done in igb_ptp_tx_work() which calls
igb_ptp_tx_hwtstamps() in igb_ptp.c[1]), it will queue the timestamp in the
ERR_QUEUE for the userspace to read. When the userspace is ready, it will
issue a recvmsg() call to collect this timestamp. The problem is in this
recvmsg() call. If the skb->tstamp is not cleared out, it will be
interpreted as a software timestamp and the hardware tx timestamp will not
be successfully sent to the userspace. Look at skb_is_swtx_tstamp() and the
callee function __sock_recv_timestamp() in net/socket.c for more details.
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1e08511d

Merge branch 'mirred-recurse' · 8747d82d

David S. Miller authored Jun 28, 2019

John Hurley says:

====================
Track recursive calls in TC act_mirred

These patches aim to prevent act_mirred causing stack overflow events from
recursively calling packet xmit or receive functions. Such events can
occur with poor TC configuration that causes packets to travel in loops
within the system.

Florian Westphal advises that a recursion crash and packets looping are
separate issues and should be treated as such. David Miller futher points
out that pcpu counters cannot track the precise skb context required to
detect loops. Hence these patches are not aimed at detecting packet loops,
rather, preventing stack flows arising from such loops.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

8747d82d

net: sched: protect against stack overflow in TC act_mirred · e2ca070f

John Hurley authored Jun 24, 2019

TC hooks allow the application of filters and actions to packets at both
ingress and egress of the network stack. It is possible, with poor
configuration, that this can produce loops whereby an ingress hook calls
a mirred egress action that has an egress hook that redirects back to
the first ingress etc. The TC core classifier protects against loops when
doing reclassifies but there is no protection against a packet looping
between multiple hooks and recursively calling act_mirred. This can lead
to stack overflow panics.

Add a per CPU counter to act_mirred that is incremented for each recursive
call of the action function when processing a packet. If a limit is passed
then the packet is dropped and CPU counter reset.

Note that this patch does not protect against loops in TC datapaths. Its
aim is to prevent stack overflow kernel panics that can be a consequence
of such loops.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e2ca070f

net: sched: refactor reinsert action · 720f22fe

John Hurley authored Jun 24, 2019

The TC_ACT_REINSERT return type was added as an in-kernel only option to
allow a packet ingress or egress redirect. This is used to avoid
unnecessary skb clones in situations where they are not required. If a TC
hook returns this code then the packet is 'reinserted' and no skb consume
is carried out as no clone took place.

This return type is only used in act_mirred. Rather than have the reinsert
called from the main datapath, call it directly in act_mirred. Instead of
returning TC_ACT_REINSERT, change the type to the new TC_ACT_CONSUMED
which tells the caller that the packet has been stolen by another process
and that no consume call is required.

Moving all redirect calls to the act_mirred code is in preparation for
tracking recursion created by act_mirred.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

720f22fe

ipv4: enable route flushing in network namespaces · 5cdda5f1

Christian Brauner authored Jun 24, 2019

Tools such as vpnc try to flush routes when run inside network
namespaces by writing 1 into /proc/sys/net/ipv4/route/flush. This
currently does not work because flush is not enabled in non-initial
network namespaces.
Since routes are per network namespace it is safe to enable
/proc/sys/net/ipv4/route/flush in there.

Link: https://github.com/lxc/lxd/issues/4257Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5cdda5f1

Merge tag 'batadv-next-for-davem-20190627v2' of git://git.open-mesh.org/linux-merge · 65dc5416

David S. Miller authored Jun 28, 2019

Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - fix includes for _MAX constants, atomic functions and fwdecls,
   by Sven Eckelmann (3 patches)

 - shorten multicast tt/tvlv worker spinlock section, by Linus Luessing

 - routeable multicast preparations: implement MAC multicast filtering,
   by Linus Luessing (2 patches, David Millers comments integrated)

 - remove return value checks for debugfs_create, by Greg Kroah-Hartman

 - add routable multicast optimizations, by Linus Luessing (2 patches)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

65dc5416

Merge branch 'hns3-next' · fcd71efd

David S. Miller authored Jun 28, 2019

Huazhong Tan says:

====================
net: hns3: some code optimizations & cleanups & bugfixes

[patch 01/12] fixes a TX timeout issue.

[patch 02/12 - 04/12] adds some patch related to TM module.

[patch 05/12] fixes a compile warning.

[patch 06/12] adds Asym Pause support for autoneg

[patch 07/12] optimizes the error handler for VF reset.

[patch 08/12] deals with the empty interrupt case.

[patch 09/12 - 12/12] adds some cleanups & optimizations.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

fcd71efd

net: hns3: optimize the CSQ cmd error handling · 82c8ae6e

Peng Li authored Jun 28, 2019

If CMDQ ring is full, hclge_cmd_send may return directly, but IMP still
working and HW pointer changed, SW ring pointer do not match the HW
pointer. This patch update the SW pointer every time when the space is
full, so it can work normally next time if IMP and HW still working.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

82c8ae6e

net: hns3: remove RXD_VLD check in hns3_handle_bdinfo · 289f8125

Yunsheng Lin authored Jun 28, 2019

The HNS3_RXD_VLD_B bit has already been checked in hns3_add_frag
or hns3_handle_rx_bd before calling hns3_handle_bdinfo, so when
hns3_handle_bdinfo is called, the HNS3_RXD_VLD_B bit is always
set, which makes the checking in hns3_handle_bdinfo unnecessary.

This patch removes the RXD_VLD_B checking in hns3_handle_bdinfo.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

289f8125

net: hns3: remove unused linkmode definition · 53eb60c7

Jian Shen authored Jun 28, 2019

This patch removes unused linkmode definition.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

53eb60c7

net: hns3: fix a statistics issue about l3l4 checksum error · 8b552079

Yufeng Mo authored Jun 28, 2019

The frame column is based on rx_crc_errors and rx_frame_errors. So
l3l4 checksum error should not be counted by rx_crc_errors. Instead,
l3l4 checksum error should be counted in ifconfig error column.

Fixes: d3ec4ef6 ("net: hns3: refactor the statistics updating for netdev")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8b552079

net: hns3: handle empty unknown interrupt · 9bc6ac91

Huazhong Tan authored Jun 28, 2019

Since some MSI-X interrupt's status may be cleared by hardware,
so when the driver receives the interrupt, reading
HCLGE_VECTOR0_PF_OTHER_INT_STS_REG register will get an empty
unknown interrupt. For this case, the irq handler should enable
vector0 interrupt. This patch also use dev_info() instead of
dev_dbg() in the hclge_check_event_cause(), since this information
will be useful for normal usage.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9bc6ac91

net: hns3: re-schedule reset task while VF reset fail · bbe6540e

Huazhong Tan authored Jun 28, 2019

The VF reset may fail for some probabilistic reasons,
such as wait for hardware reset timeout, wait for mailbox
response timeout, so this patch tries to re-schedule the
reset task when the number of reset failing is under
HCLGEVF_RESET_MAX_FAIL_CNT. This patch also add a function
hclgevf_reset_err_handle() to handle the reset failing.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bbe6540e

net: hns3: add Asym Pause support to fix autoneg problem · bc3781ed

Yonglong Liu authored Jun 28, 2019

Local device and link partner config auto-negotiation on both,
local device config pause frame use as: rx on/tx off,
link partner config pause frame use as: rx off/tx on.

We except the result is:
Local device:
Autonegotiate:  on
RX:             on
TX:             off
RX negotiated:  on
TX negotiated:  off

Link partner:
Autonegotiate:  on
RX:             off
TX:             on
RX negotiated:  off
TX negotiated:  on

But actually, the result of Local device and link partner is both:
Autonegotiate:  on
RX:             off
TX:             off
RX negotiated:  off
TX negotiated:  off

The root cause is that the supported flag is has only Pause,
reference to the function genphy_config_advert():
static int genphy_config_advert(struct phy_device *phydev)
{
	...
	linkmode_and(phydev->advertising, phydev->advertising,
		     phydev->supported);
	...
}
The pause frame use of link partner is rx off/tx on, so its
advertising only set the bit Asym_Pause, and the supported is
only set the bit Pause, so the result of linkmode_and(), is
rx off/tx off.

This patch adds Asym_Pause to the supported flag to fix it.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bc3781ed

net: hns3: fix a -Wformat-nonliteral compile warning · 18d219b7

Yonglong Liu authored Jun 28, 2019

When setting -Wformat=2, there is a compiler warning like this:

hclge_main.c:xxx:x: warning: format not a string literal and no
format arguments [-Wformat-nonliteral]
strs[i].desc);
^~~~

This patch adds missing format parameter "%s" to snprintf() to
fix it.

Fixes: 46a3df9f ("Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18d219b7

net: hns3: add some error checking in hclge_tm module · 04f25edb

Yunsheng Lin authored Jun 28, 2019

When hdev->tx_sch_mode is HCLGE_FLAG_VNET_BASE_SCH_MODE, the
hclge_tm_schd_mode_vnet_base_cfg calls hclge_tm_pri_schd_mode_cfg
with vport->vport_id as pri_id, which is used as index for
hdev->tm_info.tc_info, it will cause out of bound access issue
if vport_id is equal to or larger than HNAE3_MAX_TC.

Also hardware only support maximum speed of HCLGE_ETHER_MAX_RATE.

So this patch adds two checks for above cases.

Fixes: 84844054 ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04f25edb

net: hns3: change SSU's buffer allocation according to UM · 9e15be90

Yunsheng Lin authored Jun 28, 2019

Currently when there is share buffer in the SSU(storage
switching unit), the low waterline for RX private buffer is
too low to keep the hardware running. Hardware may have
processed all the packet stored in the private buffer of the
low waterline before the new packet comes, because hardware
only tell the peer send packet again when the private buffer
is under the low waterline.

So this patch only allocate RX private buffer if there is
enough buffer according to hardware user manual.

This patch also reserve some buffer for reusing when TC num
is less than or equal to 2, and change PAUSE_TRANS_GAP &
HCLGE_NON_DCB_ADDITIONAL_BUF according to hardware user
manual.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9e15be90

net: hns3: enable DCB when TC num is one and pfc_en is non-zero · ae179b2f

Yunsheng Lin authored Jun 28, 2019

Currently when TC num is one, the DCB will be disabled no matter if
pfc_en is non-zero or not.

This patch enables the DCB if pfc_en is non-zero, even when TC num
is one.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ae179b2f

net: hns3: fix __QUEUE_STATE_STACK_XOFF not cleared issue · f96315f2

Huazhong Tan authored Jun 28, 2019

When change MTU or other operations, which just calling .reset_notify
to do HNAE3_DOWN_CLIENT and HNAE3_UP_CLIENT, then
the netdev_tx_reset_queue() in the hns3_clear_all_ring() will be
ignored. So the dev_watchdog() may misdiagnose a TX timeout.

This patch separates netdev_tx_reset_queue() from
hns3_clear_all_ring(), and unifies hns3_clear_all_ring() and
hns3_force_clear_all_ring into one, since they are doing
similar things.

Fixes: 3a30964a ("net: hns3: delay ring buffer clearing during reset")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f96315f2

Merge branch 'Better-PHYLINK-compliance-for-SJA1105-DSA' · 5b18c705

David S. Miller authored Jun 28, 2019

Vladimir Oltean says:

====================
Better PHYLINK compliance for SJA1105 DSA

After discussing with Russell King, it appears this driver is making a
few confusions and not performing some checks for consistent operation.

Changes in v2:
- Removed redundant print in the phylink_validate callback (in 2/3).
====================
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

5b18c705

net: dsa: sja1105: Mark in-band AN modes not supported for PHYLINK · 9f971573

Vladimir Oltean authored Jun 28, 2019

We need a better way to signal this, perhaps in phylink_validate, but
for now just print this error message as guidance for other people
looking at this driver's code while trying to rework PHYLINK.

Cc: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9f971573

net: dsa: sja1105: Check for PHY mode mismatches with what PHYLINK reports · 39710229

Vladimir Oltean authored Jun 28, 2019

PHYLINK being designed with PHYs in mind that can change MII protocol,
for correct operation it is necessary to ensure that the PHY interface
mode stays the same (otherwise clear the supported bit mask, as
required).

Because this is just a hypothetical situation for now, we don't bother
to check whether we could actually support the new PHY interface mode.
Actually we could modify the xMII table, reset the switch and send an
updated static configuration, but adding that would just be dead code.

Cc: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

39710229

net: dsa: sja1105: Don't check state->link in phylink_mac_config · a979a0ab

Vladimir Oltean authored Jun 28, 2019

It has been pointed out that PHYLINK can call mac_config only to update
the phy_interface_type and without knowing what the AN results are.

Experimentally, when this was observed to happen, state->link was also
unset, and therefore was used as a proxy to ignore this call. However it
is also suggested that state->link is undefined for this callback and
should not be relied upon.

So let the previously-dead codepath for SPEED_UNKNOWN be called, and
update the comment to make sure the MAC's behavior is sane.

Cc: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a979a0ab

hinic: reduce rss_init stack usage · f7110b75

Arnd Bergmann authored Jun 28, 2019

On 32-bit architectures, putting an array of 256 u32 values on the
stack uses more space than the warning limit:

drivers/net/ethernet/huawei/hinic/hinic_main.c: In function 'hinic_rss_init':
drivers/net/ethernet/huawei/hinic/hinic_main.c:286:1: error: the frame size of 1068 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

I considered changing the code to use u8 values here, since that's
all the hardware supports, but dynamically allocating the array is
a more isolated fix here.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

f7110b75