Commits · 0364a8824c020f12e2d5e9fad963685b58f7574e · Kirill Smelkov / linux

22 Sep, 2016 40 commits

xen-netback: switch to threaded irq for control ring · 0364a882

Juergen Gross authored Sep 22, 2016

Instead of open coding it use the threaded irq mechanism in
xen-netback.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0364a882

net: ethernet: mediatek: get out of potential invalid pointer access · f6f7d9c0

Sean Wang authored Sep 22, 2016

Potential dangerous invalid pointer might be accessed if
the error happens when couple phy_device to net_device so
cleanup the error path.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f6f7d9c0

net: ethernet: mediatek: use [get|set]_link_ksettings · 3e60b748

Sean Wang authored Sep 22, 2016

1) use new api [get|set]_link_ksettings instead
of [get|set]_settings old ones.

2) dev->phydev is sure being ready before calling
these callbacks, so removing all the sanity check
if it is existing.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3e60b748

net: ethernet: mediatek: remove superfluous local variable for phy address · a2b2a19f

Sean Wang authored Sep 22, 2016

remove the unused variable for parsing PHY address
and the related logic for sanity test which would
be all already handled done when of_mdiobus_register
was called
Reported-by: Nelson Chang <nelson.chang@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a2b2a19f

net: ethernet: mediatek: use phydev from struct net_device · 2364c5c5

Sean Wang authored Sep 22, 2016

reuse phydev already in struct net_device instead of creating
another new one in private structure.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2364c5c5

Merge branch 'mediatek-trgmii' · 4fa01af4

David S. Miller authored Sep 22, 2016

Sean Wang says:

====================
mediatek: add support for RGMII on GMAC0 through TRGMII hardware module

By default, GMAC0 is connected to built-in switch called
MT7530 through the proprietary interface called Turbo RGMII
(TRGMII). TRGMII also supports well for RGMII as generic external
PHY uses but requires some slight changes to the setup of TRGMII
and doesn't have well support on current driver.

So this patchset
1) provides the slight changes of the setup for RGMII can work
   through TRGMII
2) adds additional setting "trgmii" as PHY_INTERFACE_MODE_TRGMII
   about phy-mode on device tree to make GMAC0 distinguish which
   mode it runs
3) changes dynamically source clock, TX/RX delay and interface
   mode on TRGMII for adapting various link

Changes since v1:
- fixed the style of comment which doesn't have a space at
   the beginning and end of comment lines
- add support for phy-mode "trgmii" as PHY_INTERFACE_MODE_TRGMII
   into linux/phy.h
- enhance the Documentation about device tree binding for trgmii
  which is applicable only for GMAC0 which uses fixed-link
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

4fa01af4

net: ethernet: mediatek: add the dts property to set if TRGMII supported on GMAC0 · b8853965

Sean Wang authored Sep 22, 2016

Add the dts property for the capability if TRGMII supported on GAMC0
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b8853965

net: ethernet: mediatek: add support for GMAC0 connecting with external PHY through TRGMII · f430dea7

Sean Wang authored Sep 22, 2016

Changing dynamically source clock, TX/RX delay and interface mode
used by TRGMII hardware module inside PHY capability polling routine
for adapting to the various speed of RGMII used by external PHY for
GMAC0.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f430dea7

net: ethernet: mediatek: add extension of phy-mode for TRGMII · 572de608

Sean Wang authored Sep 22, 2016

adds PHY-mode "trgmii" as an extension for the operation
mode of the PHY interface for PHY_INTERFACE_MODE_TRGMII.
and adds a variable trgmii inside mtk_mac as the indication
to make the difference between the MAC connected to internal
switch or connected to external PHY by the given configuration
on the board and then to perform the corresponding setup on
TRGMII hardware module.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

572de608

Merge tag 'rxrpc-rewrite-20160922-v2' of... · 60cd6e63

David S. Miller authored Sep 22, 2016

Merge tag 'rxrpc-rewrite-20160922-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Preparation for slow-start algorithm [ver #2]

Here are some patches that prepare for improvements in ACK generation and
for the implementation of the slow-start part of the protocol:

 (1) Stop storing the protocol header in the Tx socket buffers, but rather
     generate it on the fly.  This potentially saves a little space and
     makes it easier to alter the header just before transmission (the
     flags may get altered and the serial number has to be changed).

 (2) Mask off the Tx buffer annotations and add a flag to record which ones
     have already been resent.

 (3) Track RTT on a per-peer basis for use in future changes.  Tracepoints
     are added to log this.

 (4) Send PING ACKs in response to incoming calls to elicit a PING-RESPONSE
     ACK from which RTT data can be calculated.  The response also carries
     other useful information.

 (5) Expedite PING-RESPONSE ACK generation from sendmsg.  If we're actively
     using sendmsg, this allows us, under some circumstances, to avoid
     having to rely on the background work item to run to generate this
     ACK.

     This requires ktime_sub_ms() to be added.

 (6) Set the REQUEST-ACK flag on some DATA packets to elicit ACK-REQUESTED
     ACKs from which RTT data can be calculated.

 (7) Limit the use of pings and ACK requests for RTT determination.

Changes:

 (V2) Don't use the C division operator for 64-bit division.  One instance
      should use do_div() and the other should be using nsecs_to_jiffies().

      The last two patches got transposed, leading to an undefined symbol
      in one of them.
Reported-by: kbuild test robot <lkp@intel.com>
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

60cd6e63

rxrpc: Reduce the number of PING ACKs sent · fc943f67

David Howells authored Sep 22, 2016

We don't want to send a PING ACK for every new incoming call as that just
adds to the network traffic.  Instead, we send a PING ACK to the first
three that we receive and then once per second thereafter.

This could probably be made adjustable in future.
Signed-off-by: David Howells <dhowells@redhat.com>

fc943f67

rxrpc: Reduce the number of ACK-Requests sent · 0d4b103c

David Howells authored Sep 22, 2016

Reduce the number of ACK-Requests we set on DATA packets that we're sending
to reduce network traffic. We set the flag on odd-numbered DATA packets to
start off the RTT cache until we have at least three entries in it and then
probe once per second thereafter to keep it topped up.

This could be made tunable in future.

Note that from this point, the RXRPC_REQUEST_ACK flag is set on DATA
packets as we transmit them and not stored statically in the sk_buff.
Signed-off-by: David Howells <dhowells@redhat.com>

0d4b103c

Merge branch 'ftgmac100-ast2500-support' · cdd0766d

David S. Miller authored Sep 22, 2016

Joel Stanley says:

====================
ftgmac100 support for ast2500

This series adds support to the ftgmac100 driver for the Aspeed ast2400 and
ast2500 SoCs. In particular, they ensure the driver works correctly on the
ast2500 where the MAC block has seen some changes in register layout.

They have been tested on ast2400 and ast2500 systems with the NCSI stack and
with a directly attached PHY.

V2 reworks the two patches relating to PHYSTS_CHG into the one patch that
disables the interrupt instead of playing with interrupt sensitivity. I kept
patch 4 'net/faraday: Clear stale interrupts' which was first introduced to
clear the stale PHYSTS_CHG interrupt, as it helps keep us safe from unhygienic
(vendor) bootloaders.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

cdd0766d

net/faraday: Mask out PHYSTS_CHG interrupt · edcd692f

Joel Stanley authored Sep 22, 2016

The PHYSTS_CHG (the ftgmac100's PHY IRQ) is telling the system to go
look at the PHY registers for a link status change.

The interrupt was causing issues on Aspeed SoC where some board designs
had an active high configuration, some active low, and in some cases
repurposed for other functions. When misconfigured Linux would chew 100%
of CPU cycles servicing interrupts:

 [   20.280000] ftgmac100 1e660000.ethernet eth0: [ISR] = 0x200: PHYSTS_CHG
 [   20.280000] ftgmac100 1e660000.ethernet eth0: [ISR] = 0x200: PHYSTS_CHG
 [   20.280000] ftgmac100 1e660000.ethernet eth0: [ISR] = 0x200: PHYSTS_CHG
 [   20.300000] ftgmac100 1e660000.ethernet eth0: [ISR] = 0x200: PHYSTS_CHG

While in the ftgmac100 IP can be configured for high, low and edge
sensitivity the current driver always polls the PHY, so we chose to mask
out the interrupt.

See https://patchwork.ozlabs.org/patch/672099/ for more discussion.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

edcd692f

net/faraday: Configure old MDIO interface on Aspeed SoCs · e07dc63b

Joel Stanley authored Sep 22, 2016

The Aspeed SoCs have a new MDIO interface as an option in the G4 and G5
SoCs. The old one is still available, so select it in order to remain
compatible with the ftgmac100 driver.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

e07dc63b

net/faraday: Clear stale interrupts · 08c9c126

Gavin Shan authored Sep 22, 2016

There is stale interrupt (PHYSTS_CHG in ISR, bit#6 in 0x0) from
the bootloader (uboot) when enabling the MAC. The stale interrupts
aren't part of kernel and should be cleared.

This clears the stale interrupts in ISR (0x0) when enabling the MAC.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

08c9c126

net/faraday: Adapt for Aspeed SoCs · 2a0ab8eb

Joel Stanley authored Sep 22, 2016

The RXDES and TXDES registers bits in the ftgmac100 indicates EDO{R,T}R
at bit position 15 for the Faraday Tech IP. However, the version of this
IP present in the Aspeed SoCs has these bits at position 30 in the
registers.

It appers that ast2400 SoCs support both positions, with the 15th bit
marked as reserved but still functional. In the ast2500 this bit is
reused for another function, so we need a work around.

This was confirmed with engineers from Aspeed that using bit 30 is
correct for both the ast2400 and ast2500 SoCs.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

2a0ab8eb

net/faraday: Make EDO{R,T}R bits configurable · 7906a4da

Andrew Jeffery authored Sep 22, 2016

These bits are #defined at a fixed location. In order to support future
hardware that has chosen to move these bits around move the bits into a
member of the struct ftgmac100.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

7906a4da

net/faraday: Separate rx page storage from rxdesc · ada66b54

Andrew Jeffery authored Sep 22, 2016

The ftgmac100 hardware revision in e.g. the Aspeed AST2500 no longer
reserves all bits in RXDES#2 but instead uses the bottom 16 bits to
store MAC frame metadata. Avoid corruption by shifting struct page
pointers out to their own member in struct ftgmac100.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

ada66b54

rxrpc: Obtain RTT data by requesting ACKs on DATA packets · 50235c4b

David Howells authored Sep 22, 2016

In addition to sending a PING ACK to gain RTT data, we can set the
RXRPC_REQUEST_ACK flag on a DATA packet and get a REQUESTED-ACK ACK. The
ACK packet contains the serial number of the packet it is in response to,
so we can look through the Tx buffer for a matching DATA packet.

This requires that the data packets be stamped with the time of
transmission as a ktime rather than having the resend_at time in jiffies.

This further requires the resend code to do the resend determination in
ktimes and convert to jiffies to set the timer.
Signed-off-by: David Howells <dhowells@redhat.com>

50235c4b

rxrpc: Add ktime_sub_ms() · 77f2efcb

David Howells authored Sep 22, 2016

Add a ktime_sub_ms() to go with ktime_add_ms() and co. for use in AF_RXRPC
RTT determination.
Signed-off-by: David Howells <dhowells@redhat.com>

77f2efcb

rxrpc: Expedite ping response transmission · 7aa51da7

David Howells authored Sep 22, 2016

Expedite the transmission of a response to a PING ACK by sending it from
sendmsg if one is pending.  We're most likely to see a PING ACK during the
client call Tx phase as the other side may use it to determine a number of
parameters, such as the client's receive window size, the RTT and whether
the client is doing slow start (similar to RFC5681).

If we don't expedite it, it's left to the background processing thread to
transmit.
Signed-off-by: David Howells <dhowells@redhat.com>

7aa51da7

rxrpc: Send pings to get RTT data · 8e83134d

David Howells authored Sep 22, 2016

Send a PING ACK packet to the peer when we get a new incoming call from a
peer we don't have a record for.  The PING RESPONSE ACK packet will tell us
the following about the peer:

 (1) its receive window size

 (2) its MTU sizes

 (3) its support for jumbo DATA packets

 (4) if it supports slow start (similar to RFC 5681)

 (5) an estimate of the RTT

This is necessary because the peer won't normally send us an ACK until it
gets to the Rx phase and we send it a packet, but we would like to know
some of this information before we start sending packets.

A pair of tracepoints are added so that RTT determination can be observed.
Signed-off-by: David Howells <dhowells@redhat.com>

8e83134d

cxgb4: Convert to use simple_open() · 524605e5

Wei Yongjun authored Sep 21, 2016

Remove an open coded simple_open() function and replace file
operations references to the function with simple_open()
instead.

Generated by: scripts/coccinelle/api/simple_open.cocci
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

524605e5

net: dsa: qca8k: use mdio_module_driver to simplify the code · a084ab33

Wei Yongjun authored Sep 21, 2016

mdio_module_driver() makes the code simpler by eliminating
boilerplate code.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a084ab33

net: dsa: qca8k: fix non static symbol warning · fcfbfd68

Wei Yongjun authored Sep 21, 2016

Fixes the following sparse warning:

drivers/net/dsa/qca8k.c:259:22: warning:
 symbol 'qca8k_regmap_config' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fcfbfd68

Merge branch 'sctp-align' · 9ba62f95

David S. Miller authored Sep 22, 2016

Marcelo Ricardo Leitner says:

====================
Rename WORD_TRUNC/ROUND macros and use them

This patchset aims to rename these macros to a non-confusing name, as
reported by David Laight and David Miller, and to update all remaining
places to make use of it, which was 1 last remaining spot.

v3:
- Name it SCTP_PAD4 instead of SCTP_ALIGN4, as suggested by David Laight
v2:
- fixed 2nd patch summary

Details on the specific changelogs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9ba62f95

sctp: make use of SCTP_TRUNC4 macro · 4a225ce3

Marcelo Ricardo Leitner authored Sep 21, 2016

And avoid the usage of '&~3'. This is the last place still not using
the macro.
Also break the line to make it easier to read.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4a225ce3

sctp: rename WORD_TRUNC/ROUND macros · e2f036a9

Marcelo Ricardo Leitner authored Sep 21, 2016

To something more meaningful these days, specially because this is
working on packet headers or lengths and which are not tied to any CPU
arch but to the protocol itself.

So, WORD_TRUNC becomes SCTP_TRUNC4 and WORD_ROUND becomes SCTP_PAD4.
Reported-by: David Laight <David.Laight@ACULAB.COM>
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e2f036a9

Merge branch 'mlx5e-xdp' · b80b8d7a

David S. Miller authored Sep 22, 2016

Tariq Toukan says:

====================
mlx5e XDP support

This series adds XDP support in mlx5e driver.
This includes the use cases: XDP_DROP, XDP_PASS, and XDP_TX.

Single stream performance tests show 16.5 Mpps for XDP_DROP,
and 12.4 Mpps for XDP_TX, with nice scalability for multiple streams/rings.

This rate of XDP_DROP is lower than the 32 Mpps we got in previous
implementation, when Striding RQ was used.

We moved to non-Striding RQ, as some XDP_TX requirements (like headroom,
packet-per-page) cannot be satisfied with the current Striding RQ HW,
and we decided to fully support both DROP/TX.

Few directions are considered in order to enable the faster rate for XDP_DROP,
e.g a possibility for users to enable Striding RQ so they choose optimized
XDP_DROP on the price of partial XDP_TX functionality, or some HW changes.

Series generated against net-next commit:
cf714ac1 'ipvlan: Fix dependency issue'

Thanks,
Tariq

V2:
* patch 8:
 - when XDP_TX fails, call mlx5e_page_release and drop the packet.
 - update xdp_tx counter within mlx5e_xmit_xdp_frame.
   (mlx5e_xmit_xdp_frame return value becomes obsolete, change it to void)
 - drop the packet for unknown XDP return code.
* patch 9:
 - use a boolean for xdp_doorbell in SQ struct, instead of dragging it
   throughout the functions calls.
 - handle doorbell and counters within mlx5e_xmit_xdp_frame.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b80b8d7a

net/mlx5e: XDP TX xmit more · 35b510e2

Saeed Mahameed authored Sep 21, 2016

Previously we rang XDP SQ doorbell on every forwarded XDP packet.

Here we introduce a xmit more like mechanism that will queue up more
than one packet into SQ (up to RX napi budget) w/o notifying the hardware.

Once RX napi budget is consumed and we exit napi RX loop, we will
flush (doorbell) all XDP looped packets in case there are such.

XDP forward packet rate:

Comparing XDP with and w/o xmit more (bulk transmit):

RX Cores    XDP TX       XDP TX (xmit more)
---------------------------------------------------
1           6.5Mpps      12.4Mpps
2          13.2Mpps      24.2Mpps
4          25.2Mpps      36.3Mpps*
8          36.3Mpps*     36.3Mpps*

*My xmitter was limited to 36.3Mpps, so it is the bottleneck.
It seems that receive side can handle more.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

35b510e2

net/mlx5e: XDP TX forwarding support · b5503b99

Saeed Mahameed authored Sep 21, 2016

Adding support for XDP_TX forwarding from xdp program.
Using XDP, now user can loop packets out of the same port.

We create a dedicated TX SQ for each channel that will serve
XDP programs that return XDP_TX action to loop packets back to
the wire directly from the channel RQ RX path.

For that RX pages will now need to be mapped bi-directionally,
and on XDP_TX action we will sync the page back to device then
queue it into SQ for transmission.  The XDP xmit frame function will
report back to the RX path if the page was consumed (transmitted), if so,
RX path will forget about that page as if it were released to the stack.
Later on, on XDP TX completion, the page will be released back to the
page cache.

For simplicity this patch will hit a doorbell on every XDP TX packet.

Next patch will introduce a xmit more like mechanism that will
queue up more than one packet into SQ w/o notifying the hardware,
once RX napi loop is done we will hit doorbell once for all XDP TX
packets form the previous loop.  This should drastically improve
XDP TX performance.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b5503b99

net/mlx5e: Have a clear separation between different SQ types · f10b7cc7

Saeed Mahameed authored Sep 21, 2016

Make a clear separate between Regular SQ (TXQ) and ICO SQ creation,
destruction and union their mutual information structures.

Don't allocate redundant TXQ skb/wqe_info/dma_fifo arrays for ICO SQ.
And have a different SQ edge for ICO SQ than TXQ SQ, to be more
accurate.

In preparation for XDP TX support.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f10b7cc7

net/mlx5e: XDP fast RX drop bpf programs support · 86994156

Rana Shahout authored Sep 21, 2016

Add support for the BPF_PROG_TYPE_PHYS_DEV hook in mlx5e driver.

When XDP is on we make sure to change channels RQs type to
MLX5_WQ_TYPE_LINKED_LIST rather than "striding RQ" type to
ensure "page per packet".

On XDP set, we fail if HW LRO is set and request from user to turn it
off.  Since on ConnectX4-LX HW LRO is always on by default, this will be
annoying, but we prefer not to enforce LRO off from XDP set function.

Full channels reset (close/open) is required only when setting XDP
on/off.

When XDP set is called just to exchange programs, we will update
each RQ xdp program on the fly and for synchronization with current
data path RX activity of that RQ, we temporally disable that RQ and
ensure RX path is not running, quickly update and re-enable that RQ,
for that we do:
	- rq.state = disabled
	- napi_synnchronize
	- xchg(rq->xdp_prg)
	- rq.state = enabled
	- napi_schedule // Just in case we've missed an IRQ

Packet rate performance testing was done with pktgen 64B packets and on
TX side and, TC drop action on RX side compared to XDP fast drop.

CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

Comparison is done between:
	1. Baseline, Before this patch with TC drop action
	2. This patch with TC drop action
	3. This patch with XDP RX fast drop

RX Cores  Baseline(TC drop)    TC drop    XDP fast Drop
--------------------------------------------------------------
1            5.3Mpps           5.3Mpps     16.5Mpps
2           10.2Mpps          10.2Mpps     31.3Mpps
4           20.5Mpps          19.9Mpps     36.3Mpps*

*My xmitter was limited to 36.3Mpps, so it is the bottleneck.
It seems that receive side can handle more.
Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

86994156

net/mlx5e: Dynamic RQ type infrastructure · 2fc4bfb7

Saeed Mahameed authored Sep 21, 2016

Add two helper functions to allow dynamic changes of RQ type.

mlx5e_set_rq_priv_params and mlx5e_set_rq_type_params will be
used on netdev creation to determine the default RQ type.

This will be needed later for downstream patches of XDP support.
When enabling XDP we will dynamically move from striding RQ to
linked list RQ type.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2fc4bfb7

net/mlx5e: Slightly reduce hardware LRO size · e4b85508

Saeed Mahameed authored Sep 21, 2016

Before this patch LRO size was 64K, now with build_skb requires
extra room, headroom + sizeof(skb_shared_info) added to the data
buffer will make wqe size or page_frag_size slightly larger than
64K which will demand order 5 page instead of order 4 in 4K page systems.

We take those extra bytes from hardware LRO data size in order to not
increase the required page order for when hardware LRO is enabled.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e4b85508

net/mlx5e: Union RQ RX info per RQ type · 21c59685

Saeed Mahameed authored Sep 21, 2016

We have two types of RX RQs, and they use two separate sets of
info arrays and structures in RX data path function.  Today those
structures are mutually exclusive per RQ type, hence one kind is
allocated on RQ creation according to the RQ type.

For better cache locality and to minimalize the
sizeof(struct mlx5e_rq), in this patch we define them as a union.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

21c59685

net/mlx5e: Build RX SKB on demand · 1bfecfca

Saeed Mahameed authored Sep 21, 2016

For non-striding RQ configuration before this patch we had a ring
with pre-allocated SKBs and mapped the SKB->data buffers for
device.

For robustness and better RX data buffers management, we allocate a
page per packet and build_skb around it.

This patch (which is a prerequisite for XDP) will actually reduce
performance for normal stack usage, because we are now hitting a bottleneck
in the page allocator. We use the page-cache to restore or even improve
performance in comparison to the old RX scheme.

Packet rate performance testing was done with pktgen 64B packets on xmit
side and TC ingress dropping action on RX side.

CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

Comparison is done between:
 1.Baseline, before 'net/mlx5e: Build RX SKB on demand'
 2.Build SKB with RX page cache (This patch)

RX Cores  Baseline    Build SKB+page-cache    Improvement
-----------------------------------------------------------
1          4.16Mpps       5.33Mpps                28%
2          7.16Mpps      10.24Mpps                43%
4         13.61Mpps      20.51Mpps                51%
8         25.32Mpps      32.00Mpps                26%

All respective cores were 100% utilized.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1bfecfca

tcp: implement TSQ for retransmits · f9616c35

Eric Dumazet authored Sep 20, 2016

We saw sch_fq drops caused by the per flow limit of 100 packets and TCP
when dealing with large cwnd and bursts of retransmits.

Even after increasing the limit to 1000, and even after commit
10d3be56 ("tcp-tso: do not split TSO packets at retransmit time"),
we can still have these drops.

Under certain conditions, TCP can spend a considerable amount of
time queuing thousands of skbs in a single tcp_xmit_retransmit_queue()
invocation, incurring latency spikes and stalls of other softirq
handlers.

This patch implements TSQ for retransmits, limiting number of packets
and giving more chance for scheduling packets in both ways.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f9616c35

Merge branch 'mv88e6390-prep' · 0f1100c1

David S. Miller authored Sep 22, 2016

Andrew Lunn says:

====================
Preparation for mv88e6390

These two patches are a couple of preparation steps for supporting the
the MV88E6390 family of chips. This is a new generation from Marvell,
and will need more feature flags than are currently available in an
unsigned long. Expand to an unsigned long long. The MV88E6390 also
places its port registers somewhere else, so add a wrapper around port
register access.

v2:
 Rework wrappers to use mv88e6xxx_{read|write}
 Simpliy some (err < ) to (err)
Add Reviewed by tag.

v3::
 reg = reg & foo -> reg &= foo
 Fix over zealous s/ret/err
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

0f1100c1