Commits · 83b1bc122cab87547731a154db5feec5b9d4807c · Kirill Smelkov / linux

06 Dec, 2018 22 commits

tun: align write-heavy flow entry members to a cache line · 83b1bc12

Li RongQing authored Dec 06, 2018

tun flow entry 'updated' fields are written when receive
every packet. Thus if a flow is receiving packets from a
particular flow entry, it'll cause false-sharing with
all the other who has looked it up, so move it in its own
cache line

and update 'queue_index' and 'update' field only when
they are changed to reduce the cache false-sharing.
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Wang Li <wangli39@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

83b1bc12

neighbor: Add extack messages for add and delete commands · 7a35a50d

David Ahern authored Dec 05, 2018

Add extack messages for failures in neigh_add and neigh_delete.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7a35a50d

tipc: fix node keep alive interval calculation · f5d6c3e5

Hoang Le authored Dec 06, 2018

When setting LINK tolerance, node timer interval will be calculated
base on the LINK with lowest tolerance.

But when calculated, the old node timer interval only updated if current
setting value (tolerance/4) less than old ones regardless of number of
links as well as links' lowest tolerance value.

This caused to two cases missing if tolerance changed as following:
Case 1:
1.1/ There is one link (L1) available in the system
1.2/ Set L1's tolerance from 1500ms => lower (i.e 500ms)
1.3/ Then, fallback to default (1500ms) or higher (i.e 2000ms)

Expected:
    node timer interval is 1500/4=375ms after 1.3

Result:
node timer interval will not being updated after changing tolerance at 1.3
since its value 1500/4=375ms is not less than 500/4=125ms at 1.2.

Case 2:
2.1/ There are two links (L1, L2) available in the system
2.2/ L1 and L2 tolerance value are 2000ms as initial
2.3/ Set L2's tolerance from 2000ms => lower 1500ms
2.4/ Disable link L2 (bring down its bearer)

Expected:
    node timer interval is 2000ms/4=500ms after 2.4

Result:
node timer interval will not being updated after disabling L2 since
its value 2000ms/4=500ms is still not less than 1500/4=375ms at 2.3
although L2 is already not available in the system.

To fix this, we start the node interval calculation by initializing it to
a value larger than any conceivable calculated value. This way, the link
with the lowest tolerance will always determine the calculated value.
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

f5d6c3e5

net: Use of_node_name_eq for node name comparisons · bf5849f1

Rob Herring authored Dec 05, 2018

Convert string compares of DT node names to use of_node_name_eq helper
instead. This removes direct access to the node name pointer.

For instances using of_node_cmp, this has the side effect of now using
case sensitive comparisons. This should not matter for any FDT based
system which all of these are.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Wingman Kwok <w-kwok2@ti.com>
Cc: Murali Karicheri <m-karicheri2@ti.com>
Cc: netdev@vger.kernel.org
Cc: linux-omap@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bf5849f1

net: netem: use a list in addition to rbtree · d66280b1

Peter Oskolkov authored Dec 04, 2018

When testing high-bandwidth TCP streams with large windows,
high latency, and low jitter, netem consumes a lot of CPU cycles
doing rbtree rebalancing.

This patch uses a linear list/queue in addition to the rbtree:
if an incoming packet is past the tail of the linear queue, it is
added there, otherwise it is inserted into the rbtree.

Without this patch, perf shows netem_enqueue, netem_dequeue,
and rb_* functions among the top offenders. With this patch,
only netem_enqueue is noticeable if jitter is low/absent.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d66280b1

Merge branch 'net-bridge-convert-multicast-to-generic-rhashtable' · 932c4417

David S. Miller authored Dec 05, 2018

Nikolay Aleksandrov says:

====================
net: bridge: convert multicast to generic rhashtable

The current bridge multicast code uses a custom rhashtable
implementation which predates the generic rhashtable API. Patch 01
converts it to use the generic kernel rhashtable which simplifies the
code a lot and removes duplicated functionality. The convert also makes
hash_elasticity obsolete as the generic rhashtable already has such
checks and has a fixed elasticity of RHT_ELASTICITY (16 currently) so we
emit a warning whenever elasticity is set and return RHT_ELASTICITY when
read (patch 03). Patch 02 converts the multicast code to use non-bh RCU
flavor as it was mixing bh and non-bh. Since now we have the generic
rhashtable which autoshrinks we can be more liberal with the default
hash maximum so patch 04 increases it to 4096 and moves it to a define in
br_private.h.

v3: add non-rcu br_mdb_get variant and use it where we have
    multicast_lock, drop special hash_max handling and just set it where
    needed and use non-bh RCU consistently (patch 02, new)
v2: send the latest version of the set which handles when IGMP snooping
    is not defined, changes are in patch 01
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

932c4417

net: bridge: increase multicast's default maximum number of entries · d08c6bc0

Nikolay Aleksandrov authored Dec 05, 2018

bridge's default hash_max was 512 which is rather conservative, now that
we're using the generic rhashtable API which autoshrinks let's increase
it to 4096 and move it to a define in br_private.h.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d08c6bc0

net: bridge: mark hash_elasticity as obsolete · cf332bca

Nikolay Aleksandrov authored Dec 05, 2018

Now that the bridge multicast uses the generic rhashtable interface we
can drop the hash_elasticity option as that is already done for us and
it's hardcoded to a maximum of RHT_ELASTICITY (16 currently). Add a
warning about the obsolete option when the hash_elasticity is set.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cf332bca

net: bridge: multicast: use non-bh rcu flavor · 4329596c

Nikolay Aleksandrov authored Dec 05, 2018

The bridge multicast code has been using a mix of RCU and RCU-bh flavors
sometimes in questionable way. Since we've moved to rhashtable just use
non-bh RCU everywhere. In addition this simplifies freeing of objects
and allows us to remove some unnecessary callback functions.

v3: new patch
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4329596c

net: bridge: convert multicast to generic rhashtable · 19e3a9c9

Nikolay Aleksandrov authored Dec 05, 2018

The bridge multicast code currently uses a custom resizable hashtable
which predates the generic rhashtable interface. It has many
shortcomings compared and duplicates functionality that is presently
available via the generic rhashtable, so this patch removes the custom
rhashtable implementation in favor of the kernel's generic rhashtable.
The hash maximum is kept and the rhashtable's size is used to do a loose
check if it's reached in which case we revert to the old behaviour and
disable further bridge multicast processing. Also now we can support any
hash maximum, doesn't need to be a power of 2.

v3: add non-rcu br_mdb_get variant and use it where multicast_lock is
    held to avoid RCU splat, drop hash_max function and just set it
    directly

v2: handle when IGMP snooping is undefined, add br_mdb_init/uninit
    placeholders
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19e3a9c9

Merge tag 'mlx5e-updates-2018-12-04' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · ba5dfaff

David S. Miller authored Dec 05, 2018

Saeed Mahameed says:

====================
mlx5e-updates-2018-12-04

This series includes updates to mlx5e netdevice driver

From Saeed, Remove trailing space of tx_pause ethtool stat
From Gal, Cleanup unused defines
From Aya, ethtool Support for configuring of RX hash fields
From Tariq, Improve ethtool private-flags code structure
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ba5dfaff

Merge branch 'u32-to-linkmode-fixes' · 7127f2fe

David S. Miller authored Dec 05, 2018

Andrew Lunn says:

====================
u32 to linkmode fixes

This patchset fixes issues found in the last patchset which converted
the phydev advertise etc, from a u32 to a linux bitmap. Most of the
issues are the result of clearing bits which should not of been
cleared. To make the API clearer, the idea from Heiner Kallweit was
used, with _mod_ to indicate the function modifies just the bits it
needs to, or _to_ to clear all bits and just set bit that need to be
set.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7127f2fe

net: phy: Fix ioctl handler when modifing MII_ADVERTISE · 9db299c7

Andrew Lunn authored Dec 05, 2018

When the MII_ADVERTISE register is modified by the IOCTL handler,
phydev->advertising needs recalculating. Use the _mod_ variant of
mii_adv_to_linkmode_adv_t so that bits outside of the advertise
registers are not cleared.

Fixes: c0ec3c27 ("net: phy: Convert u32 phydev->lp_advertising to linkmode")
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

9db299c7

net: mii: mii_lpa_mod_linkmode_lpa_t: Make use of linkmode_mod_bit helper · 6dbd0090

Andrew Lunn authored Dec 05, 2018

Replace the if else code structure with a call to the helper
linkmode_mod_bit.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

6dbd0090

net: mii: Add mii_lpa_mod_linkmode_lpa_t · d3351931

Andrew Lunn authored Dec 05, 2018

Add a _mod_ variant of mii_lpa_to_linkmode_lpa_t. Use this to fix the
genphy_read_status() where the 1G link partner features are getting
lost.

Fixes: c0ec3c27 ("net: phy: Convert u32 phydev->lp_advertising to linkmode")
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

d3351931

phy: marvell: Rename mii_lpa_to_linkmode_lpa_t · ab9cb729

Andrew Lunn authored Dec 05, 2018

Rename mii_lpa_to_linkmode_lpa_t to mii_lpa_mod_linkmode_lpa_t to
indicate it modifies the passed linkmode bitmap, without clearing any
other bits.

Also, ensure bit are clear which the lpa indicates should not be set.

Fixes: c0ec3c27 ("net: phy: Convert u32 phydev->lp_advertising to linkmode")
Suggested-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

ab9cb729

net: mii: Rename mii_stat1000_to_linkmode_lpa_t · 78a24df3

Andrew Lunn authored Dec 05, 2018

Rename mii_stat1000_to_linkmode_lpa_t to
mii_stat1000_mod_linkmode_lpa_t to indicate it modifies the passed
linkmode bitmap, without clearing any other bits.

Add a helper to set/clear bits in a linkmode.

Use this helper to ensure bit are clear which the stat1000 indicates
should not be set.

Fixes: c0ec3c27 ("net: phy: Convert u32 phydev->lp_advertising to linkmode")
Suggested-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

78a24df3

net: mii: Fix autoneg in mii_lpa_to_linkmode_lpa_t() · 5f15eed2

Andrew Lunn authored Dec 05, 2018

mii_adv_to_linkmode_adv_t() clears all bits before setting it needs to
set. This means the freshly set Autoneg gets cleared.

Change the order, and add comments about it clearing the old content
of the bitmap.

Fixes: c0ec3c27 ("net: phy: Convert u32 phydev->lp_advertising to linkmode")
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

5f15eed2

net/mlx5e: Improve ethtool private-flags code structure · 8ff57c18

Tariq Toukan authored Nov 19, 2018

Refactor the code of private-flags setter.
Replace consecutive calls to mlx5e_handle_pflag with a loop
that uses a preset set of parameters.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

8ff57c18

net/mlx5e: ethtool, Support user configuration for RX hash fields · 756c4160

Aya Levin authored Oct 23, 2018

Enable user configuration of RX hash fields that are used for traffic
spreading into RX queues. User can change built-in RSS (Receive Side
Scaling) profiles on the following traffic types: UDP4, UDP6, TCP4 and
TCP6.  This configuration effects both outer and inner headers.  Added
support for ethtool commands: ETHTOOL_SRXFH and ETHTOOL_GRXFH.

Command example respectively:
$ethtool -N eth1 rx-flow-hash tcp4 sdfn
$ethtool -n eth1 rx-flow-hash tcpp4
IP SA
IP DA
L4 bytes 0 & 1 [TCP/UDP src port]
L4 bytes 2 & 3 [TCP/UDP dst port]
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

756c4160

net/mlx5e: Move RSS params to a dedicated struct · bbeb53b8

Aya Levin authored Nov 06, 2018

Remove RSS params from params struct under channels, and introduce
a new struct with RSS configuration params under priv struct. There is
no functional change here.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

bbeb53b8

net/mlx5e: Refactor TIR configuration function · d930ac79

Aya Levin authored Oct 28, 2018

Refactor mlx5e_build_indir_tir_ctx_hash for better code re-use. TIR
stands for Transport Interface Receive, which is responsible for all
transport related operations on the receive side. Added a
static array with TIR default configuration values. This separates
configuration values from command setting, which is needed for
downstream patch.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

d930ac79

05 Dec, 2018 9 commits

net: documentation: build a directory structure for drivers · b255e500

Jakub Kicinski authored Dec 03, 2018

Documentation/networking/ is full of cryptically named files with
driver documentation.  This makes finding interesting information
at a glance really hard.  Move all those files into a directory
called device_drivers (since not all drivers are for device) and
fix up references.

RFC v0.1 -> RFC v1:
 - also add .txt suffix to the files which are missing it (Quentin)
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Henrik Austad <henrik@austad.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

b255e500

tcp: reduce POLLOUT events caused by TCP_NOTSENT_LOWAT · a74f0fa0

Eric Dumazet authored Dec 04, 2018

TCP_NOTSENT_LOWAT socket option or sysctl was added in linux-3.12
as a step to enable bigger tcp sndbuf limits.

It works reasonably well, but the following happens :

Once the limit is reached, TCP stack generates
an [E]POLLOUT event for every incoming ACK packet.

This causes a high number of context switches.

This patch implements the strategy David Miller added
in sock_def_write_space() :

 - If TCP socket has a notsent_lowat constraint of X bytes,
   allow sendmsg() to fill up to X bytes, but send [E]POLLOUT
   only if number of notsent bytes is below X/2

This considerably reduces TCP_NOTSENT_LOWAT overhead,
while allowing to keep the pipe full.

Tested:
 100 ms RTT netem testbed between A and B, 100 concurrent TCP_STREAM

A:/# cat /proc/sys/net/ipv4/tcp_wmem
4096	262144	64000000
A:/# super_netperf 100 -H B -l 1000 -- -K bbr &

A:/# grep TCP /proc/net/sockstat
TCP: inuse 203 orphan 0 tw 19 alloc 414 mem 1364904 # This is about 54 MB of memory per flow :/

A:/# vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 256220672  13532 694976    0    0    10     0   28   14  0  1 99  0  0
 2  0      0 256320016  13532 698480    0    0   512     0 715901 5927  0 10 90  0  0
 0  0      0 256197232  13532 700992    0    0   735    13 771161 5849  0 11 89  0  0
 1  0      0 256233824  13532 703320    0    0   512    23 719650 6635  0 11 89  0  0
 2  0      0 256226880  13532 705780    0    0   642     4 775650 6009  0 12 88  0  0

A:/# echo 2097152 >/proc/sys/net/ipv4/tcp_notsent_lowat

A:/# grep TCP /proc/net/sockstat
TCP: inuse 203 orphan 0 tw 19 alloc 414 mem 86411 # 3.5 MB per flow

A:/# vmstat 5 5  # check that context switches have not inflated too much.
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 260386512  13592 662148    0    0    10     0   17   14  0  1 99  0  0
 0  0      0 260519680  13592 604184    0    0   512    13 726843 12424  0 10 90  0  0
 1  1      0 260435424  13592 598360    0    0   512    25 764645 12925  0 10 90  0  0
 1  0      0 260855392  13592 578380    0    0   512     7 722943 13624  0 11 88  0  0
 1  0      0 260445008  13592 601176    0    0   614    34 772288 14317  0 10 90  0  0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a74f0fa0

Merge branch 'act_tunnel_key-support-key-less-tunnels' · 4dc88ce6

David S. Miller authored Dec 04, 2018

Or Gerlitz says:

====================
net/sched: act_tunnel_key: support key-less tunnels

This short series from Adi Nissim allows to support key-less tunnels
by the tc tunnel key actions, which is needed for some GRE use-cases.

changes from V0:
 - addresses build warning spotted by kbuild, make sure to always init
   to zero the tunnel key
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

4dc88ce6

net/sched: act_tunnel_key: Don't dump dst port if it wasn't set · 1c25324c

Adi Nissim authored Dec 02, 2018

It's possible to set a tunnel without a destination port. However,
on dump(), a zero dst port is returned to user space even if it was not
set, fix that.

Note that so far it wasn't required, b/c key less tunnels were not
supported and the UDP tunnels do require destination port.
Signed-off-by: Adi Nissim <adin@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1c25324c

net/sched: act_tunnel_key: Allow key-less tunnels · 80ef0f22

Adi Nissim authored Dec 02, 2018

Allow setting a tunnel without a tunnel key. This is required for
tunneling protocols, such as GRE, that define the key as an optional
field.
Signed-off-by: Adi Nissim <adin@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

80ef0f22

qed: fix spelling mistake "Dispalying" -> "Displaying" · d1ecf8a6

Colin Ian King authored Dec 03, 2018

There is a spelling mistake in a DP_NOTICE message, fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d1ecf8a6

net/mlx5e: Move modify tirs hash functionality · 080d1b17

Aya Levin authored Oct 23, 2018

Move modify tirs hash functionality (mlx5e_modify_tirs_hash) from
en_ethtool.c to en_main.c. This allows future use of this fuctionality
from en_fs_ethtool.c, while keeping current convention: en_ethtool.c
doesn't have an API.  There is no functional change here.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

080d1b17

net/mlx5e: Cleanup unused defines · 30543831

Gal Pressman authored Nov 07, 2018

Remove couple of defines that are no longer used.
Signed-off-by: Gal Pressman <pressmangal@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

30543831

net/mlx5e: Remove trailing space of tx_pause ethtool counter name · 8742c7eb

Saeed Mahameed authored Nov 05, 2018

tx_pause_storm_warning_events ethtool counter name has a trailing
space, remove it.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>

8742c7eb

04 Dec, 2018 9 commits

Merge branch 'mlxsw-Add-one-armed-router-support' · 55827458

David S. Miller authored Dec 04, 2018

Ido Schimmel says:

====================
mlxsw: Add one-armed router support

Up until now, when a packet was routed by the ASIC through the same
router interface (RIF) from which it ingressed from, the ASIC passed the
sole copy of the packet to the kernel. This allowed the kernel to route
the packet and also potentially generate an ICMP redirect.

There are scenarios (e.g., "one-armed router") where packets are
intentionally routed this way and are therefore not deemed as
exceptions. In such scenarios the current method of trapping packets to
the CPU is problematic, as it results in major packet loss.

This patchset solves the problem by having the ASIC forward the packet,
but also send a copy to the CPU, which gives the kernel the opportunity
to generate required exceptions.

To prevent the kernel from forwarding such packets again, the driver
marks them with 'offload_l3_fwd_mark', which causes the kernel to
consume them in ip{,6}_forward_finish().

Patch #1 renames 'offload_mr_fwd_mark' to 'offload_l3_fwd_mark'. When
set, the field indicates that a packet was already forwarded in L3
(unicast / multicast) by a capable device.

Patch #2 teaches the kernel to consume unicast packets that have
'offload_l3_fwd_mark' set.

Patch #3 changes mlxsw to mirror loopbacked (iRIF == eRIF) packets,
instead of trapping them.

Patch #4 adds a test case for above mentioned scenario.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

55827458

selftests: mlxsw: Add one-armed router test · b6f153d3

Ido Schimmel authored Dec 04, 2018

Construct a "one-armed router" topology and test that packets are
forwarded by the ASIC and that a copy of the packet is sent to the
kernel, which does not forward the packet again.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b6f153d3

mlxsw: spectrum: Mirror loopbacked packets instead of trapping them · 2f4f4494

Ido Schimmel authored Dec 04, 2018

When the ASIC detects that a unicast packet is routed through the same
router interface (RIF) from which it ingressed (iRIF == eRIF), it raises
a trap called loopback error (LBERROR).

Thus far, this trap was configured to send a sole copy of the packet to
the CPU so that ICMP redirect packets could be potentially generated by
the kernel.

This is problematic as the CPU cannot forward packets at 3.2Tb/s and
there are scenarios (e.g., "one-armed router") where iRIF == eRIF is not
an exception.

Solve this by changing the trap to send a copy of the packet to the CPU.
To prevent the kernel from forwarding the packet again, it is marked
with 'offload_l3_fwd_mark'.

The trap is configured in a trap group of its own with a dedicated
policer in order not to prevent packets trapped by other traps from
reaching the CPU.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f4f4494

net: Do not route unicast IP packets twice · f839a6c9

Ido Schimmel authored Dec 04, 2018

Packets marked with 'offload_l3_fwd_mark' were already forwarded by a
capable device and should not be forwarded again by the kernel.
Therefore, have the kernel consume them.

The check is performed in ip{,6}_forward_finish() in order to allow the
kernel to process such packets in ip{,6}_forward() and generate required
exceptions. For example, ICMP redirects.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f839a6c9

skbuff: Rename 'offload_mr_fwd_mark' to 'offload_l3_fwd_mark' · 875e8939

Ido Schimmel authored Dec 04, 2018

Commit abf4bb6b ("skbuff: Add the offload_mr_fwd_mark field") added
the 'offload_mr_fwd_mark' field to indicate that a packet has already
undergone L3 multicast routing by a capable device. The field is used to
prevent the kernel from forwarding a packet through a netdev through
which the device has already forwarded the packet.

Currently, no unicast packet is routed by both the device and the
kernel, but this is about to change by subsequent patches and we need to
be able to mark such packets, so that they will no be forwarded twice.

Instead of adding yet another field to 'struct sk_buff', we can just
rename 'offload_mr_fwd_mark' to 'offload_l3_fwd_mark', as a packet
either has a multicast or a unicast destination IP.

While at it, add a comment about both 'offload_fwd_mark' and
'offload_l3_fwd_mark'.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

875e8939

net: marvell: convert to DEFINE_SHOW_ATTRIBUTE · d9bbd6a1

Yangtao Li authored Dec 03, 2018

Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d9bbd6a1

net: qca_spi: convert to DEFINE_SHOW_ATTRIBUTE · 25079154

Yangtao Li authored Dec 03, 2018

Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Acked-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25079154

net: stmmac: convert to DEFINE_SHOW_ATTRIBUTE · fb0d9c63

Yangtao Li authored Dec 03, 2018

Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fb0d9c63

nfp: convert to DEFINE_SHOW_ATTRIBUTE · 6f6c74fa

Yangtao Li authored Dec 03, 2018

Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6f6c74fa