- 09 Mar, 2017 40 commits
-
Thomas Petazzoni authored
This commit modifies the mvpp2_defaults_set() function to not do the loopback and FIFO threshold initialization, which are not needed for PPv2.2. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
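To illustrate the shape this change implies, here is a minimal sketch with made-up names (not the actual driver code): the PPv2.1-only defaults simply move behind a hardware-version check.

    /* Illustrative sketch only; names are hypothetical. */
    enum pp2_hw_version { PP21, PP22 };

    static void pp2_defaults_set_sketch(enum pp2_hw_version hw_version)
    {
            if (hw_version == PP21) {
                    /* Loopback and FIFO threshold setup would go here;
                     * PPv2.2 does not need this initialization. */
            }

            /* Defaults common to PPv2.1 and PPv2.2 follow. */
    }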
-
Thomas Petazzoni authored
The MVPP2_RXQ_CONFIG_REG register has a slightly different layout between PPv2.1 and PPv2.2, so this commit adapts the functions modifying this register to accommodate both the PPv2.1 and PPv2.2 cases. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Petazzoni authored
This commit adjusts the allocation and freeing of BM pools to support PPv2.2. This involves:
- Checking that the number of buffer pointers is a multiple of 16, as required by the hardware.
- Adjusting the size of the DMA coherent area allocated for buffer pointers. Indeed, PPv2.2 needs space for 2 pointers of 64 bits per buffer, as opposed to 2 pointers of 32 bits per buffer in PPv2.1. The size in bytes is now stored in a new field of the mvpp2_bm_pool structure.
- On PPv2.2, getting the DMA address and cookie (used for the physical address) of each buffer requires reading the MVPP22_BM_ADDR_HIGH_ALLOC register to get the high order bits of those addresses. A new utility function mvpp2_bm_bufs_get_addrs() is introduced to handle this.
- On PPv2.2, releasing a buffer requires writing the high order 32 bits of the DMA address and cookie to MVPP22_BM_PHY_VIRT_HIGH_RLS_REG.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
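As a rough illustration of the address handling described above, the sketch below reassembles 40-bit addresses from the 32-bit low words and the high-order bits read from a register; the masks, shifts and names are assumptions for the sketch, not the documented MVPP22_BM_ADDR_HIGH_ALLOC layout.

    #include <stdint.h>

    /* Hypothetical register layout: bits [7:0] hold the high bits of the
     * DMA address, bits [15:8] hold the high bits of the cookie. */
    #define BM_ADDR_HIGH_PHYS_MASK  0x000000ffu
    #define BM_ADDR_HIGH_VIRT_MASK  0x0000ff00u
    #define BM_ADDR_HIGH_VIRT_SHIFT 8

    static uint64_t bm_buf_dma_addr(uint32_t dma_lo, uint32_t addr_high_reg)
    {
            return (uint64_t)dma_lo |
                   ((uint64_t)(addr_high_reg & BM_ADDR_HIGH_PHYS_MASK) << 32);
    }

    static uint64_t bm_buf_cookie(uint32_t cookie_lo, uint32_t addr_high_reg)
    {
            uint32_t high = (addr_high_reg & BM_ADDR_HIGH_VIRT_MASK) >>
                            BM_ADDR_HIGH_VIRT_SHIFT;

            return (uint64_t)cookie_lo | ((uint64_t)high << 32);
    }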
-
Thomas Petazzoni authored
This commit adds the definition of the PPv2.2 HW descriptors, adjusts the mvpp2_tx_desc and mvpp2_rx_desc structures accordingly, and adapts the accessors to work on both PPv2.1 and PPv2.2. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Petazzoni authored
Since the format of the HW descriptors is different between PPv2.1 and PPv2.2, this commit introduces an intermediate union, which for now contains only the PPv2.1 descriptors. The bulk of the driver code only manipulates opaque mvpp2_tx_desc and mvpp2_rx_desc pointers, and the descriptors can only be accessed and modified through the accessor functions. A follow-up commit will add the descriptor definitions for PPv2.2. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
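As a simplified illustration of this layering (field names and sizes are invented for the sketch, not the hardware layout), the union wraps the version-specific layout and the rest of the driver only goes through accessors:

    #include <stdint.h>

    /* PPv2.1 layout only for now, matching this commit; PPv2.2 is added
     * to the union by a follow-up commit. */
    struct mvpp21_rx_desc_sketch {
            uint32_t status;
            uint16_t reserved;
            uint16_t data_size;
            uint32_t buf_dma_addr;
            uint32_t buf_cookie;
    };

    struct mvpp2_rx_desc_sketch {
            union {
                    struct mvpp21_rx_desc_sketch pp21;
            } u;
    };

    /* Callers never touch u.pp21 directly; they use accessors like this. */
    static uint16_t rx_desc_size_sketch(const struct mvpp2_rx_desc_sketch *desc)
    {
            return desc->u.pp21.data_size;
    }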
-
Thomas Petazzoni authored
In preparation for the introduction of PPv2.2 support in the mvpp2 driver, this commit adds a hw_version field to the struct mvpp2, and uses the .data field of the DT match table to fill it in. Having the MVPP21 and MVPP22 definitions available will allow us to start adding the conditional code needed to support PPv2.2. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
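A minimal sketch of this DT match pattern is shown below; the compatible strings and names are illustrative assumptions, not necessarily the binding's exact strings.

    #include <linux/of_device.h>
    #include <linux/platform_device.h>

    enum { MVPP21_SKETCH = 21, MVPP22_SKETCH = 22 };

    static const struct of_device_id mvpp2_match_sketch[] = {
            { .compatible = "marvell,example-pp21", .data = (void *)MVPP21_SKETCH },
            { .compatible = "marvell,example-pp22", .data = (void *)MVPP22_SKETCH },
            { /* sentinel */ }
    };

    static int mvpp2_probe_sketch(struct platform_device *pdev)
    {
            unsigned long hw_version;

            /* .data of the matched entry identifies the IP version. */
            hw_version = (unsigned long)of_device_get_match_data(&pdev->dev);
            /* ... store hw_version in the driver's private structure ... */
            return hw_version ? 0 : -ENODEV;
    }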
-
Thomas Petazzoni authored
The PPv2.2 IP has a different TX and RX descriptor layout compared to PPv2.1. In order to prepare for the introduction of PPv2.2 support in mvpp2, this commit adds accessors for the different fields of the TX and RX descriptors, and changes the code to use them. For now, the mvpp2_port argument passed to the accessors is not used, but it will be used in a follow-up commit to update the descriptor according to the version of the IP being used. Apart from the mechanical changes to use the newly introduced accessors, a few other changes, needed to use the accessors, are made:
- The mvpp2_txq_inc_put() function now takes a mvpp2_port as first argument, as it is needed to use the accessors.
- Similarly, mvpp2_bm_cookie_build() gains a mvpp2_port first argument, for the same reason.
- In mvpp2_rx_error(), instead of accessing the RX descriptor in each case of the switch, we introduce a local variable to store the packet size.
- In mvpp2_tx_frag_process() and mvpp2_tx(), instead of accessing the packet size from the TX descriptor, we use the actual value available in the function, which is used to set the TX descriptor packet size a few lines before.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Petazzoni authored
The RX descriptors of the PPv2 hardware allow several pieces of information to be stored, amongst which:
- the DMA address of the buffer in which the data has been received;
- a "cookie" field, left to the use of the driver, and not used by the hardware.
In the current implementation, the "cookie" field is used to store the virtual address of the buffer, so that in the receive completion path, we can easily get the virtual address of the buffer that corresponds to a completed RX descriptor. On PPv2.1, used on 32-bit platforms, those two fields are 32-bit wide, which is enough to store a DMA address in the first field and a virtual address in the second field. On PPv2.2, used on 64-bit platforms, these two fields have been extended to 40 bits. While 40 bits is enough to store a DMA address (as long as the DMA mask is 40 bits or lower), it is not enough to store a virtual address. Therefore, the "cookie" field can no longer be used to store the virtual address of the buffer. However, as Russell King pointed out, the RX buffers are always allocated in the kernel linear mapping, so using phys_to_virt() on the physical address of the RX buffer is possible and correct. Therefore, this commit changes the driver to use the "cookie" field to store the physical address instead of the virtual address; phys_to_virt() is used in the receive completion path to retrieve the virtual address from the physical address. It is important to realize that the DMA address and the physical address are two different things, which is why we store both in the RX descriptors. While those addresses may be identical in some situations, they remain two distinct concepts, and both addresses should be handled separately. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
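A minimal sketch of the resulting buffer setup and completion handling, assuming hypothetical helper names (not the exact driver code):

    #include <linux/dma-mapping.h>
    #include <linux/io.h>

    /* On refill: map the buffer for DMA and record its physical address
     * as the descriptor "cookie". The buffer comes from the kernel
     * linear mapping, so phys_to_virt() can undo virt_to_phys(). */
    static void fill_rx_buffer_sketch(struct device *dev, void *buf, size_t size,
                                      dma_addr_t *dma, phys_addr_t *cookie)
    {
            *dma = dma_map_single(dev, buf, size, DMA_FROM_DEVICE);
            *cookie = virt_to_phys(buf);
    }

    /* On RX completion: recover the virtual address from the cookie. */
    static void *rx_buffer_virt_sketch(phys_addr_t cookie)
    {
            return phys_to_virt(cookie);
    }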
-
Thomas Petazzoni authored
The mvpp2_txq_pend_desc_num_get() function only selects a TX queue, and reads the number of pending descriptors. It is used in only one place, in mvpp2_txq_clean(), where the TX queue has already been selected by a write to MVPP2_TXQ_NUM_REG. Therefore, this function is useless, and the caller can simply read the value of the MVPP2_TXQ_PENDING_REG register instead. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Petazzoni authored
This register is no longer used since commit edc660fa ("net: mvpp2: replace TX coalescing interrupts with hrtimer"). Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Petazzoni authored
The "buffer header" functionality is a functionality used by the hardware to split an incoming packets over multiple BM buffers if they are not large enough. However, the mvpp2 driver guarantees that a pool of BM buffers has buffers with a size large enough to store MTU-sized packets. Therefore, this functionality is completely unused, and the code can be removed, and we should never get a descriptor with bit MVPP2_RXD_BUF_HDR set. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Petazzoni authored
As indicated by Russell King, the mvpp2 driver currently uses "phys" or "phys_addr" in many places to store what really is a DMA address. This commit clarifies this by using "dma" or "dma_addr" where appropriate. This is especially important as we are going to introduce more changes where the distinction between physical address and DMA address will be key. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thomas Petazzoni authored
The Marvell PPv2 Device Tree binding was so far only used to describe the PPv2.1 network controller, used in the Marvell Armada 375. A new version of this IP block, PPv2.2 is used in the Marvell Armada 7K/8K processor. This commit extends the existing binding so that it can also be used to describe PPv2.2 hardware. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Eric Dumazet says: ==================== mlx4: order-0 allocations and page recycling As mentioned half a year ago, we better switch mlx4 driver to order-0 allocations and page recycling. This reduces vulnerability surface thanks to better skb->truesize tracking and provides better performance in most cases. (33 Gbit for one TCP flow on my lab hosts) I will provide for linux-4.13 a patch on top of this series, trying to improve data locality as described in https://www.spinics.net/lists/netdev/msg422258.html v2 provides an ethtool -S new counter (rx_alloc_pages) and code factorization, plus Tariq fix. v3 includes various fixes based on Tariq tests and feedback from Saeed and Tariq. v4 rebased on net-next for inclusion in linux-4.12, as requested by Tariq. Worth noting this patch series deletes ~250 lines of code ;) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
We should keep one way to build skbs, regardless of GRO being on or off. Note that I made sure to defer as much as possible the point where we need to pull data from the frame, so that any future prefetch() we might add is more effective. These skb attributes derive from the CQE or ring: ip_summed, csum, hash, vlan offload, hwtstamps, queue_mapping. As a bonus, this patch removes the mlx4 dependency on eth_get_headlen(), which is very often broken enough to give us headaches. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Duplicating the packet-allocation code for the GRO-on and GRO-off cases is not worth it just to avoid testing a boolean in the fast path. If this proves to be a problem, we might later use a jump label. The next patch will remove this duplicated code and ease code review. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
We need to compute the frame virtual address at different points. Do it once. The following patch will use the new va address for validate_loopback(). Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Instead of fetching dma address from rx_desc->data[0].addr, prefer using frags[0].dma + frags[0].page_offset to avoid a potential cache line miss. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
This new counter tracks the number of pages that we allocated for one port.
lpaa24:~# ethtool -S eth0 | egrep 'rx_alloc_pages|rx_packets'
     rx_packets: 306755183
     rx_alloc_pages: 932897
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Same technique as some Intel drivers, for arches where PAGE_SIZE = 4096. In most cases, pages are reused because they were consumed before we could loop around the RX ring. This brings back performance, and even improves it: a single TCP flow reaches 30 Gbit on my hosts. v2: added full memset() in mlx4_en_free_frag(), as Tariq found it was needed if we switch to large MTU, as priv->log_rx_info can dynamically be changed. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
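The core of the recycling test can be sketched as follows; this is a simplified illustration, and the real driver also checks things such as pfmemalloc pages and NUMA node locality before reusing a page.

    #include <linux/mm.h>

    /* A page can only be fed back to the RX ring if the stack no longer
     * holds references to it (i.e. it is not attached to in-flight skbs). */
    static bool rx_page_can_be_reused_sketch(struct page *page)
    {
            return page_count(page) == 1;
    }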
-
Eric Dumazet authored
Use of order-3 pages is problematic in some cases. This patch might add three kinds of regression:
1) A CPU performance regression, but we will add page recycling later and performance should come back.
2) TCP receivers could grow their receive window slightly more slowly, because the skb->len/skb->truesize ratio will decrease. This is mostly ok; we prefer being conservative to not risk OOM, and will eventually tune TCP better in the future. This is consistent with other drivers using 2048 bytes per ethernet frame.
3) Because we allocate one page per RX slot, we consume more memory for the ring buffers. XDP already had this constraint anyway.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
We will soon use order-0 pages, and frag truesize will more precisely match real sizes. In the new model, we prefer to use fragments of at most 2048 bytes, so that we can use the page-recycle technique on PAGE_SIZE=4096 arches. We will still pack as many frames as possible on arches with big pages, like PowerPC. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
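A small worked example of the packing arithmetic, with assumed page sizes, shows why 2048-byte fragments allow recycling on 4K pages while still packing many frames per page on big-page arches:

    #include <stdio.h>

    int main(void)
    {
            unsigned int frag_stride = 2048;               /* <= 2048 bytes */
            unsigned int page_4k = 4096, page_64k = 65536;

            printf("frags per 4K page:  %u\n", page_4k / frag_stride);   /* 2  */
            printf("frags per 64K page: %u\n", page_64k / frag_stride);  /* 32 */
            return 0;
    }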
-
Eric Dumazet authored
We only need to store the page and dma address. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
No need to duplicate it per RX queue / frags. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Using per frag storage for frag_prefix_size is really silly. mlx4_en_complete_rx_desc() has all needed info already. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
This is really a port attribute, no need to duplicate it per RX queue and per frag. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
No need to duplicate it for all queues and frags. num_frags & log_rx_info become u8 to save space. u8 accesses are a bit faster than u16 anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jiri Pirko says: ==================== mlxsw: cosmetics Couple of cosmetic mlxsw patches ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
The overrun ignore bit isn't supported by the device's firmware and was recently removed from the programmer's reference manual (PRM). Remove it from the driver as well. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Commit dd82364c ("mlxsw: Flip to the new dev walk API") made some small changes to the mlxsw code, but did not respect the naming conventions. Fix this now. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
__sk_dst_set() must be called while we own the socket. We can get proper lockdep coverage using lockdep_sock_is_held() and rcu_dereference_protected() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
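The pattern looks roughly like the sketch below (close to, but not necessarily identical to, the actual helper): the second argument of rcu_dereference_protected() both documents and, under lockdep, verifies that the caller owns the socket lock.

    #include <net/dst.h>
    #include <net/sock.h>

    static void sk_dst_replace_sketch(struct sock *sk, struct dst_entry *dst)
    {
            struct dst_entry *old_dst;

            old_dst = rcu_dereference_protected(sk->sk_dst_cache,
                                                lockdep_sock_is_held(sk));
            rcu_assign_pointer(sk->sk_dst_cache, dst);
            dst_release(old_dst);
    }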
-
David S. Miller authored
Jiri Pirko says: ==================== flow dissector improvements This patchset follows up on the discussion about future extensions of the flow dissector and tries to address the mentioned concerns. Some parts are cut out into sub-functions. Also, the processing of the code (ARP, MPLS) is made dependent on the user actually requiring the dissected values. This prepares the code for future extensions to dissect IPv6 ND messages, TCP flags, etc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Make the main flow_dissect function a bit smaller and move the GRE dissection into a separate function. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Align with "ip_proto_again" label used in the same function and rename vague "again" to "proto_again". Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Currently, when an unexpected element appears in the GRE header, we break out so that the L4 ports are still processed. But since the ports are then processed unconditionally, random values will certainly be dissected. Fix this by just bailing out in such situations. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Make the main flow_dissect function a bit smaller and move the MPLS dissection into a separate function. Along with that, do the MPLS header processing only in case the flow dissection user requires it. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Make the main flow_dissect function a bit smaller and move the ARP dissection into a separate function. Along with that, do the ARP header processing only in case the flow dissection user requires it. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
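The guard pattern shared by the ARP and MPLS changes can be sketched as below (the helper name is hypothetical): the per-protocol parsing only runs when the dissector user asked for the corresponding key.

    #include <linux/skbuff.h>
    #include <linux/string.h>
    #include <net/flow_dissector.h>

    static void dissect_arp_if_needed_sketch(const struct sk_buff *skb,
                                             struct flow_dissector *flow_dissector,
                                             void *target_container)
    {
            struct flow_dissector_key_arp *key_arp;

            if (!dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_ARP))
                    return;

            key_arp = skb_flow_dissector_target(flow_dissector,
                                                FLOW_DISSECTOR_KEY_ARP,
                                                target_container);
            memset(key_arp, 0, sizeof(*key_arp));
            /* ... parse the ARP header from skb into *key_arp ... */
    }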
-
Philippe Reynes authored
The ethtool api {get|set}_settings is deprecated. We move this driver to the new api {get|set}_link_ksettings. As I don't have the hardware, I'd be very pleased if someone could test this patch. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
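As an illustration of the migration (a generic sketch, not this driver's actual code; the link modes and speed shown are assumed values), the new callback fills a struct ethtool_link_ksettings instead of the legacy struct ethtool_cmd:

    #include <linux/ethtool.h>
    #include <linux/netdevice.h>

    static int sketch_get_link_ksettings(struct net_device *dev,
                                         struct ethtool_link_ksettings *cmd)
    {
            u32 supported = SUPPORTED_10baseT_Full | SUPPORTED_100baseT_Full;
            u32 advertising = ADVERTISED_100baseT_Full;

            ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.supported,
                                                    supported);
            ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.advertising,
                                                    advertising);
            cmd->base.speed = SPEED_100;
            cmd->base.duplex = DUPLEX_FULL;
            cmd->base.port = PORT_MII;
            return 0;
    }

The driver's ethtool_ops then points .get_link_ksettings (and .set_link_ksettings) at the new callbacks instead of .get_settings/.set_settings.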
-
Philippe Reynes authored
The ethtool api {get|set}_settings is deprecated. We move this driver to the new api {get|set}_link_ksettings. As I don't have the hardware, I'd be very pleased if someone could test this patch. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Tested-by: Geoff Levand <geoff@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Philippe Reynes authored
The ethtool api {get|set}_settings is deprecated. We move this driver to the new api {get|set}_link_ksettings. As I don't have the hardware, I'd be very pleased if someone could test this patch. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-