Commits · e54b708c5441e3aee20b9352334ff610649ac227 · Kirill Smelkov / linux

29 Nov, 2021 10 commits

net: hns3: use macro IANA_VXLAN_GPE_UDP_PORT to replace number 4790 · e54b708c

Hao Chen authored Nov 27, 2021

This patch replaces the literal 4790 with the IANA_VXLAN_GPE_UDP_PORT
macro as a cleanup.
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: vxlan: add macro definition for number of IANA VXLAN-GPE port · ed618bd8

Hao Chen authored Nov 27, 2021

Add a macro definition for the IANA-assigned VXLAN-GPE UDP port number
for generic use.
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
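
For illustration only, here is a minimal C sketch of how a named port constant reads at a use site compared with a bare 4790. The two #defines are local copies assuming the IANA-assigned values 4789 (VXLAN) and 4790 (VXLAN-GPE); classify_udp_dport() and enum tunnel_kind are hypothetical and are not part of the kernel or the hns3 driver.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the IANA-assigned UDP ports (assumed values). */
#define IANA_VXLAN_UDP_PORT     4789
#define IANA_VXLAN_GPE_UDP_PORT 4790

/* Hypothetical tunnel classification by UDP destination port. */
enum tunnel_kind { TUNNEL_NONE, TUNNEL_VXLAN, TUNNEL_VXLAN_GPE };

/* dport is in host byte order in this sketch. */
enum tunnel_kind classify_udp_dport(uint16_t dport)
{
	switch (dport) {
	case IANA_VXLAN_UDP_PORT:
		return TUNNEL_VXLAN;
	case IANA_VXLAN_GPE_UDP_PORT:
		return TUNNEL_VXLAN_GPE;
	default:
		return TUNNEL_NONE;
	}
}
```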

net: Write lock dev_base_lock without disabling bottom halves. · fd888e85

Sebastian Andrzej Siewior authored Nov 26, 2021

The writer acquires dev_base_lock with disabled bottom halves.
The reader can acquire dev_base_lock without disabling bottom halves
because there is no writer in softirq context.

On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
as a lock to ensure that resources that are protected by disabling
bottom halves remain protected.
This leads to a circular locking dependency if the lock is acquired with
disabled bottom halves (as in write_lock_bh()) and somewhere else with
enabled bottom halves (as by read_lock() in netstat_show()), followed by
disabling bottom halves (cxgb_get_stats() -> t4_wr_mbox_meat_timeout()
-> spin_lock_bh()). This is the reverse locking order.

All read_lock() invocations are from sysfs callbacks, which are not invoked
from softirq context. Therefore there is no need to disable bottom
halves while acquiring a write lock.

Acquire the write lock of dev_base_lock without disabling bottom halves.
Reported-by: Pei Zhang <pezhang@redhat.com>
Reported-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
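
The circular dependency can be illustrated with a toy lock-order tracker, loosely in the spirit of lockdep but not its actual algorithm. It assumes, as described above for PREEMPT_RT, that disabling bottom halves behaves like taking a lock (modelled as LOCK_BH); LOCK_MBOX stands in for the driver lock taken by spin_lock_bh(). All names are hypothetical, and read and write acquisitions of dev_base_lock are not distinguished.

```c
#include <assert.h>
#include <stdbool.h>

/* On PREEMPT_RT, local_bh_disable() acts like a lock: model it as LOCK_BH. */
enum { LOCK_BH, LOCK_DEV_BASE, LOCK_MBOX, NR_LOCKS };

struct order_tracker {
	bool after[NR_LOCKS][NR_LOCKS]; /* after[a][b]: b was taken while a was held */
	int held[NR_LOCKS];             /* currently held locks, in acquisition order */
	int nr_held;
	bool inversion;                 /* set once a reverse ordering is observed */
};

void track_acquire(struct order_tracker *t, int lock)
{
	for (int i = 0; i < t->nr_held; i++) {
		int h = t->held[i];

		if (t->after[lock][h])
			t->inversion = true; /* lock -> h was seen before, now h -> lock */
		t->after[h][lock] = true;
	}
	t->held[t->nr_held++] = lock;
}

void track_release_all(struct order_tracker *t)
{
	t->nr_held = 0;
}

/* Replays the writer path, then the reader path from the commit message. */
bool has_inversion(bool writer_disables_bh)
{
	struct order_tracker t = { 0 };

	/* Writer: write_lock_bh() disables BHs, then takes dev_base_lock. */
	if (writer_disables_bh)
		track_acquire(&t, LOCK_BH);
	track_acquire(&t, LOCK_DEV_BASE);
	track_release_all(&t);

	/* Reader: read_lock() in netstat_show(), then the driver's
	 * spin_lock_bh() disables BHs and takes its mailbox lock. */
	track_acquire(&t, LOCK_DEV_BASE);
	track_acquire(&t, LOCK_BH);
	track_acquire(&t, LOCK_MBOX);
	track_release_all(&t);

	return t.inversion;
}
```

With the writer disabling bottom halves first, the reader's dev_base_lock -> BH order reverses an order already recorded; once the writer stops disabling bottom halves, that BH -> dev_base_lock edge never exists.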

net/l2tp: convert tunnel rwlock_t to rcu · 07b8ca37

Tom Parkin authored Nov 26, 2021

Previously commit e02d494d ("l2tp: Convert rwlock to RCU") converted
most, but not all, rwlock instances in the l2tp subsystem to RCU.

The remaining rwlock protects the per-tunnel hashlist of sessions which
is used for session lookups in the UDP-encap data path.

Convert the remaining rwlock to rcu to improve performance of UDP-encap
tunnels.

Note that the tunnel and session, which both live on RCU-protected
lists, use slightly different approaches to incrementing their refcounts
in the various getter functions.

The tunnel has to use refcount_inc_not_zero because the tunnel shutdown
process involves dropping the refcount to zero prior to synchronizing
RCU readers (via kfree_rcu).

By contrast, the session shutdown removes the session from the list(s)
it is on, synchronizes with readers, and then decrements the session
refcount.  Since the getter functions increment the session refcount
with the RCU read lock held, we prevent getters from seeing a zero session
refcount, and therefore don't need to use refcount_inc_not_zero.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
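
The difference between the two getter styles can be sketched in userspace with C11 atomics. ref_inc_not_zero() and ref_inc() are hypothetical stand-ins for illustration; the kernel's refcount_t API additionally saturates instead of wrapping, which this sketch omits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Increment unless the count is already zero (tunnel-style getter). */
bool ref_inc_not_zero(atomic_uint *ref)
{
	unsigned int old = atomic_load(ref);

	do {
		if (old == 0)
			return false; /* object is being torn down: do not resurrect it */
		/* On failure, 'old' is reloaded and the zero check runs again. */
	} while (!atomic_compare_exchange_weak(ref, &old, old + 1));
	return true;
}

/* Plain increment (session-style getter): only safe when the caller knows
 * the count cannot be zero, e.g. because the final put happens only after
 * RCU readers have been synchronized. */
void ref_inc(atomic_uint *ref)
{
	atomic_fetch_add(ref, 1);
}
```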

Merge branch 'mvneta-next' · 275f37ea

David S. Miller authored Nov 29, 2021

Maxime Chevallier says:

====================
net: mvneta: mqprio cleanups and shaping support

This is the second version of the series that adds some improvements to the
existing mqprio implementation in mvneta, and adds support for
egress shaping offload.

The first 3 patches are minor cleanups: using the
tc_mqprio_qopt_offload structure to get access to more offloading
options, cleaning up the logic that detects whether or not we should
offload the mqprio setting, and allowing a 1-to-N mapping between TCs and
queues.

The last patch adds traffic shaping offload, using mvneta's per-queue
token buckets, allowing rates to be limited from 10 kbps up to 5 Gbps in
10 kbps increments.

This was tested only on an Armada 3720, with traffic up to 2.5Gbps.

Changes since V1: fix the build for 32-bit kernels by using the right
div helpers, as suggested by Jakub.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: Add TC traffic shaping offload · 2551dc9e

Maxime Chevallier authored Nov 26, 2021

The mvneta controller can perform per-queue token-bucket traffic
shaping. This commit adds support for configuring it through the TC mqprio
interface.

The token-bucket parameters are customisable, but the current
implementation configures them for a 10 kbps resolution of the
rate limitation, since this covers the whole range of max_rate
values from 10 kbps to 5 Gbps in 10 kbps increments.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
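
A rough sketch of the rate quantisation described above, assuming rates are given in bits per second and that a requested rate is rounded down to whole 10 kbps units, with a floor of one unit. The constants and rate_to_units() are hypothetical and do not reproduce the mvneta register layout. In kernel code, the 64-bit division would need a helper such as div_u64() on 32-bit builds.

```c
#include <assert.h>
#include <stdint.h>

#define SHAPER_RESOLUTION_BPS 10000ULL      /* 10 kbps per unit (assumed) */
#define SHAPER_MAX_BPS        5000000000ULL /* 5 Gbps ceiling (assumed) */

/* Convert a rate in bits/s to 10 kbps units. Rounds down so the programmed
 * rate never exceeds the request, but never below one unit; 0 means
 * "no limit" in this sketch (an assumed convention). */
uint32_t rate_to_units(uint64_t rate_bps)
{
	uint64_t units;

	if (rate_bps == 0)
		return 0;
	if (rate_bps > SHAPER_MAX_BPS)
		rate_bps = SHAPER_MAX_BPS;
	units = rate_bps / SHAPER_RESOLUTION_BPS;
	return units ? (uint32_t)units : 1;
}
```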

net: mvneta: Allow having more than one queue per TC · e9f7099d

Maxime Chevallier authored Nov 26, 2021

The current mqprio implementation assumed that only one queue is used
per TC. Use the offset and count parameters to allow using
multiple queues per TC. In that case, the controller uses a standard
round-robin algorithm to pick among the queues assigned to the same TC,
which share the same priority.

This only applies to VLAN priorities in ingress traffic, with each TC
corresponding to a VLAN priority.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
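
A minimal model of the offset/count layout: TC i owns queues [offset[i], offset[i] + count[i]), and the controller is modelled as cycling through them round-robin. struct tc_map and pick_rx_queue() are hypothetical; the real selection happens in hardware.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TC 8

/* TC i owns the contiguous queue range [offset[i], offset[i] + count[i]). */
struct tc_map {
	uint16_t count[MAX_TC];
	uint16_t offset[MAX_TC];
	unsigned int rr[MAX_TC]; /* per-TC round-robin cursor */
};

/* Models the round-robin choice among the queues of one TC. */
int pick_rx_queue(struct tc_map *m, unsigned int tc)
{
	unsigned int q;

	assert(tc < MAX_TC && m->count[tc] > 0);
	q = m->offset[tc] + m->rr[tc] % m->count[tc];
	m->rr[tc]++;
	return (int)q;
}
```

With count = 1 for every TC this degenerates to the previous one-queue-per-TC behaviour.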

net: mvneta: Don't force-set the offloading flag · e7ca75fe

Maxime Chevallier authored Nov 26, 2021

The qopt->hw flag is set by the TC code according to the offloading mode
requested by the user. Don't force-set it in the driver; instead, read it
to make sure we do what was asked.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: Use struct tc_mqprio_qopt_offload for MQPrio configuration · 75fa71e3

Maxime Chevallier authored Nov 26, 2021

Struct tc_mqprio_qopt_offload is a container for struct tc_mqprio_qopt
that allows passing extra parameters, such as traffic shaping settings.
This commit converts the current mqprio code to use that struct.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mdio: ipq8064: replace ioremap() with devm_ioremap() · 2f7ed29f

Yang Yingliang authored Nov 26, 2021

Use devm_ioremap() instead of ioremap() so that the mapping is released
automatically and a missing iounmap() can no longer leak it.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
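
The idea behind devm_ioremap() is that the mapping is registered with the device and released automatically when the driver detaches. The toy model below shows that bookkeeping, with resources released in reverse order of acquisition. toy_devm_add(), toy_devres_release_all() and fake_iounmap() are hypothetical and are not the kernel's devres API.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_devres {
	void (*release)(void *res);
	void *res;
	struct toy_devres *next;
};

struct toy_device {
	struct toy_devres *head; /* most recently added resource first */
};

/* Record a resource together with the callback that frees it. */
int toy_devm_add(struct toy_device *dev, void (*release)(void *), void *res)
{
	struct toy_devres *dr = malloc(sizeof(*dr));

	if (!dr)
		return -1;
	dr->release = release;
	dr->res = res;
	dr->next = dev->head;
	dev->head = dr;
	return 0;
}

/* On detach, release everything, newest first (LIFO). */
void toy_devres_release_all(struct toy_device *dev)
{
	while (dev->head) {
		struct toy_devres *dr = dev->head;

		dev->head = dr->next;
		dr->release(dr->res);
		free(dr);
	}
}

/* Stand-in for iounmap(): records how many mappings were released and
 * which one was released last. */
int nr_unmapped;
void *last_unmapped;

void fake_iounmap(void *addr)
{
	nr_unmapped++;
	last_unmapped = addr;
}
```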

27 Nov, 2021 27 commits

Merge branch 'af_unix-replace-unix_table_lock-with-per-hash-locks' · d40ce48c

Jakub Kicinski authored Nov 26, 2021

Kuniyuki Iwashima says:

====================
af_unix: Replace unix_table_lock with per-hash locks.

The hash table of AF_UNIX sockets is protected by a single big lock,
unix_table_lock.  This series replaces it with small per-hash locks.

1st -  2nd : Misc refactoring
3rd -  8th : Separate BSD/abstract address logics
9th - 11th : Prep to save a hash in each socket
12th       : Replace the big lock
13th       : Speed up autobind()

Note to maintainers:
The 12th patch adds two kinds of Sparse warnings on patchwork:

  about unix_table_double_lock/unlock()
    We can avoid these by adding two explicit acquire/release annotations,
    but unix_state_double_lock() already has the same kind of warnings.

  about unix_next_socket() and unix_seq_stop() (/proc/net/unix)
    This is because Sparse does not understand the logic in
    unix_next_socket(), which leaves a spin lock held until it returns NULL.
    tcp_seq_stop() causes a warning for the same reason.

These warnings seem reasonable, but let me know if there is any better way.
Please see [0] for details.

[0]: https://lore.kernel.org/netdev/20211117001611.74123-1-kuniyu@amazon.co.jp/
====================

Link: https://lore.kernel.org/r/20211124021431.48956-1-kuniyu@amazon.co.jp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Relax race in unix_autobind(). · 9acbc584

Kuniyuki Iwashima authored Nov 24, 2021

When we bind an AF_UNIX socket without a name specified, the kernel selects
an available one from 0x00000 to 0xFFFFF. unix_autobind() starts searching
from a number in the 'static' variable and increments it after acquiring
two locks.

If multiple processes autobind at the same time, they take the same lock
and check whether a socket in the hash list already has that name. If
not, one process takes it, and all the others end up retrying the _next_
number (or not exactly the next one, since other processes may have
incremented it in the meantime). The more sockets we autobind in parallel,
the longer the latency gets. We can avoid such a race by starting the
search for a name from a random number.

The histograms below show the latency of unix_autobind() while 64 CPUs
simultaneously autobind 1024 sockets each.

Without this patch:

usec : count distribution
0 : 1176 |*** |
2 : 3655 |*********** |
4 : 4094 |************* |
6 : 3831 |************ |
8 : 3829 |************ |
10 : 3844 |************ |
12 : 3638 |*********** |
14 : 2992 |********* |
16 : 2485 |******* |
18 : 2230 |******* |
20 : 2095 |****** |
22 : 1853 |***** |
24 : 1827 |***** |
26 : 1677 |***** |
28 : 1473 |**** |
30 : 1573 |***** |
32 : 1417 |**** |
34 : 1385 |**** |
36 : 1345 |**** |
38 : 1344 |**** |
40 : 1200 |*** |

With this patch:

usec : count distribution
0 : 1855 |****** |
2 : 6464 |********************* |
4 : 9936 |******************************** |
6 : 12107 |****************************************|
8 : 10441 |********************************** |
10 : 7264 |*********************** |
12 : 4254 |************** |
14 : 2538 |******** |
16 : 1596 |***** |
18 : 1088 |*** |
20 : 800 |** |
22 : 670 |** |
24 : 601 |* |
26 : 562 |* |
28 : 525 |* |
30 : 446 |* |
32 : 378 |* |
34 : 337 |* |
36 : 317 |* |
38 : 314 |* |
40 : 298 | |
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
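
The randomised search can be sketched as follows. A bitmap stands in for the hash-table lookup, and the caller supplies the starting point (a random number in the patched kernel, a shared counter before). autobind_pick() and format_autobind_name() are hypothetical; the name format, a NUL byte followed by five hex digits, is the one documented for autobound names in unix(7).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define AUTOBIND_NAMES 0x100000u /* names 0x00000..0xFFFFF */

/* Hypothetical "name in use" bitmap standing in for the hash lookup. */
static unsigned char used[AUTOBIND_NAMES / 8];

bool name_in_use(unsigned int n)
{
	return used[n / 8] & (1u << (n % 8));
}

void mark_name_used(unsigned int n)
{
	used[n / 8] |= (unsigned char)(1u << (n % 8));
}

/* Probe forward from 'start' with wraparound; -1 if every name is taken. */
long autobind_pick(unsigned int start)
{
	unsigned int n = start % AUTOBIND_NAMES;

	for (unsigned int tries = 0; tries < AUTOBIND_NAMES; tries++) {
		if (!name_in_use(n)) {
			mark_name_used(n);
			return n;
		}
		n = (n + 1) % AUTOBIND_NAMES;
	}
	return -1;
}

/* Abstract name: a leading NUL, then five hex digits. */
void format_autobind_name(char out[7], unsigned int n)
{
	out[0] = '\0';
	snprintf(out + 1, 6, "%05x", n & 0xFFFFFu);
}
```

Because each caller starts at an independent random point, concurrent callers rarely probe the same names, which is what shortens the latency tail in the histograms above.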

af_unix: Replace the big lock with small locks. · afd20b92

Kuniyuki Iwashima authored Nov 24, 2021

The hash table of AF_UNIX sockets is protected by a single lock. This
patch replaces it with per-hash locks.

The effect is noticeable when we handle multiple sockets simultaneously.
Here is a test result on an EC2 c5.24xlarge instance. It shows the latency
(under 10us only) of unix_insert_unbound_socket() while 64 CPUs create
1024 sockets each in parallel.

Without this patch:

nsec : count distribution
0 : 179 | |
500 : 3021 |********* |
1000 : 6271 |******************* |
1500 : 6318 |******************* |
2000 : 5828 |***************** |
2500 : 5124 |*************** |
3000 : 4426 |************* |
3500 : 3672 |*********** |
4000 : 3138 |********* |
4500 : 2811 |******** |
5000 : 2384 |******* |
5500 : 2023 |****** |
6000 : 1954 |***** |
6500 : 1737 |***** |
7000 : 1749 |***** |
7500 : 1520 |**** |
8000 : 1469 |**** |
8500 : 1394 |**** |
9000 : 1232 |*** |
9500 : 1138 |*** |
10000 : 994 |*** |

With this patch:

nsec : count distribution
0 : 1634 |**** |
500 : 13170 |****************************************|
1000 : 13156 |*************************************** |
1500 : 9010 |*************************** |
2000 : 6363 |******************* |
2500 : 4443 |************* |
3000 : 3240 |********* |
3500 : 2549 |******* |
4000 : 1872 |***** |
4500 : 1504 |**** |
5000 : 1247 |*** |
5500 : 1035 |*** |
6000 : 889 |** |
6500 : 744 |** |
7000 : 634 |* |
7500 : 498 |* |
8000 : 433 |* |
8500 : 355 |* |
9000 : 336 |* |
9500 : 284 | |
10000 : 243 | |
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
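
Below is a minimal sketch of per-bucket locking with C11 atomic_flag spinlocks. Moving a socket between buckets locks only the two buckets involved, in ascending order, so two CPUs moving sockets between the same pair of buckets cannot deadlock. The cover letter's unix_table_double_lock() presumably plays a similar role in the kernel, but this code is hypothetical and does not reproduce it.

```c
#include <assert.h>
#include <stdatomic.h>

#define TOY_HASH_SIZE 256 /* hypothetical bucket count */

/* One lock per hash bucket instead of one table-wide lock. */
struct toy_table {
	atomic_flag lock[TOY_HASH_SIZE];
	int nr_socks[TOY_HASH_SIZE]; /* stand-in for each bucket's socket list */
};

void toy_table_init(struct toy_table *t)
{
	for (int i = 0; i < TOY_HASH_SIZE; i++) {
		atomic_flag_clear(&t->lock[i]);
		t->nr_socks[i] = 0;
	}
}

void toy_bucket_lock(struct toy_table *t, unsigned int h)
{
	while (atomic_flag_test_and_set_explicit(&t->lock[h], memory_order_acquire))
		; /* spin until the holder clears the flag */
}

void toy_bucket_unlock(struct toy_table *t, unsigned int h)
{
	atomic_flag_clear_explicit(&t->lock[h], memory_order_release);
}

/* Lock two buckets in ascending order; lock once if the hashes match. */
void toy_double_lock(struct toy_table *t, unsigned int h1, unsigned int h2)
{
	if (h1 == h2) {
		toy_bucket_lock(t, h1);
		return;
	}
	if (h1 > h2) {
		unsigned int tmp = h1;

		h1 = h2;
		h2 = tmp;
	}
	toy_bucket_lock(t, h1);
	toy_bucket_lock(t, h2);
}

void toy_double_unlock(struct toy_table *t, unsigned int h1, unsigned int h2)
{
	toy_bucket_unlock(t, h1);
	if (h1 != h2)
		toy_bucket_unlock(t, h2);
}

/* Move a socket between buckets (e.g. unbound -> bound) while holding only
 * the two affected bucket locks. */
void toy_rehash(struct toy_table *t, unsigned int old_hash, unsigned int new_hash)
{
	assert(old_hash < TOY_HASH_SIZE && new_hash < TOY_HASH_SIZE);
	toy_double_lock(t, old_hash, new_hash);
	t->nr_socks[old_hash]--;
	t->nr_socks[new_hash]++;
	toy_double_unlock(t, old_hash, new_hash);
}
```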

af_unix: Save hash in sk_hash. · e6b4b873

Kuniyuki Iwashima authored Nov 24, 2021

To replace unix_table_lock with per-hash locks in the next patch, we need
to save a hash in each socket, because /proc/net/unix and BPF iterator
programs iterate over sockets while holding a hash table lock and release
it later in a different function.

Currently, we store a real/pseudo hash in struct unix_address.  However, we
do not allocate one for unbound sockets, nor should we do so just for this
purpose.  Instead, we can use sk_hash.  Then the hash field in struct
unix_address is no longer used and can be removed.

This patch also:
  - renames unix_insert_socket() to unix_insert_unbound_socket()
  - removes the redundant list argument from __unix_insert_socket() and
     unix_insert_unbound_socket()
  - uses 'unsigned int' instead of 'unsigned' in __unix_set_addr_hash()
  - removes 'inline' from unix_remove_socket() and
     unix_insert_unbound_socket().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Add helpers to calculate hashes. · f452be49

Kuniyuki Iwashima authored Nov 24, 2021

This patch adds three helper functions that calculate hashes for unbound
sockets and bound sockets with BSD/abstract addresses.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Remove UNIX_ABSTRACT() macro and test sun_path[0] instead. · 5ce7ab49

Kuniyuki Iwashima authored Nov 24, 2021

In BSD and abstract address cases, we store sockets in the hash table with
keys between 0 and UNIX_HASH_SIZE - 1. However, the hash saved in a socket
varies depending on its address type; sockets with BSD addresses always
have UNIX_HASH_SIZE in their unix_sk(sk)->addr->hash.

This exists only so that the UNIX_ABSTRACT() macro can check the address
type. The difference between the saved hashes originates from the first
byte of the address anyway, so we can test that byte directly.

This lets us keep a real hash in each socket and replace unix_table_lock
with per-hash locks in a later patch.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
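
In userspace terms, the address type can be read directly from the address, as described in unix(7): an address no longer than the sun_family field is unnamed, a leading NUL in sun_path marks an abstract address, and anything else is a pathname (BSD) address. classify_unix_addr() is a hypothetical helper, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/un.h>

enum unix_addr_kind { UNIX_ADDR_UNNAMED, UNIX_ADDR_ABSTRACT, UNIX_ADDR_PATHNAME };

/* Classify by length and by the first byte of sun_path, instead of a
 * separately stored hash value. */
enum unix_addr_kind classify_unix_addr(const struct sockaddr_un *sun, socklen_t len)
{
	if (len <= offsetof(struct sockaddr_un, sun_path))
		return UNIX_ADDR_UNNAMED;
	return sun->sun_path[0] == '\0' ? UNIX_ADDR_ABSTRACT : UNIX_ADDR_PATHNAME;
}
```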

af_unix: Allocate unix_address in unix_bind_(bsd|abstract)(). · 12f21c49

Kuniyuki Iwashima authored Nov 24, 2021

To terminate the address with '\0' in unix_bind_bsd(), add
unix_create_addr() and call it in both unix_bind_bsd() and
unix_bind_abstract().

Also, unix_bind_abstract() does not return -EEXIST.  Only
kern_path_create() and vfs_mknod() in unix_bind_bsd() can return it,
so move the last error check in unix_bind() into unix_bind_bsd().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Remove unix_mkname(). · 5c32a3ed

Kuniyuki Iwashima authored Nov 24, 2021

This patch removes unix_mkname() and postpones calculating a hash to
unix_bind_abstract().  Some BSD-specific code still remains in unix_bind(),
though; the next patch moves it into unix_bind_bsd().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Copy unix_mkname() into unix_find_(bsd|abstract)(). · d2d8c9fd

Kuniyuki Iwashima authored Nov 24, 2021

We should not call unix_mkname() before unix_find_other() and instead do
the same thing where necessary based on the address type:

  - terminating the address with '\0' in unix_find_bsd()
  - calculating the hash in unix_find_abstract().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Cut unix_validate_addr() out of unix_mkname(). · b8a58aa6

Kuniyuki Iwashima authored Nov 24, 2021

unix_mkname() tests the socket address length and family and does some
processing based on the address type.  It is called early, so some of that
work is redundant or ends up wasted.

The address length/family tests are done twice in unix_bind().  Also, the
address type is rechecked later in unix_bind() and unix_find_other(), where
we can do the same processing.  Moreover, in the BSD address case, the hash
is set to 0 but never used, which is confusing.

This patch moves the address tests out of unix_mkname(), and the following
patches move the rest into the appropriate places and finally remove
unix_mkname().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
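
A sketch of the length and family checks that the patch splits out, assuming the accepted lengths are offsetof(struct sockaddr_un, sun_path) < len <= sizeof(struct sockaddr_un). validate_unix_addr() is a hypothetical userspace rendering, not a copy of unix_validate_addr().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Reject addresses that are too short to carry a name, longer than
 * struct sockaddr_un, or not AF_UNIX. */
int validate_unix_addr(const struct sockaddr_un *sun, socklen_t len)
{
	if (len <= offsetof(struct sockaddr_un, sun_path) || len > sizeof(*sun))
		return -EINVAL;
	if (sun->sun_family != AF_UNIX)
		return -EINVAL;
	return 0;
}
```

Doing these checks once, up front, leaves the type-specific work (NUL termination for BSD addresses, hashing for abstract ones) to the code paths that actually need it.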

af_unix: Return an error as a pointer in unix_find_other(). · aed26f55

Kuniyuki Iwashima authored Nov 24, 2021

We can return an error encoded as a pointer, so there is no need to pass
an additional argument to unix_find_other().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
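
The kernel idiom behind this is ERR_PTR()/IS_ERR()/PTR_ERR(): the top MAX_ERRNO (4095) addresses are never valid pointers, so a negative errno can travel in a pointer return value. The following is a userspace re-creation for illustration; struct peer and find_peer() are hypothetical, and using -ECONNREFUSED for "not found" is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	/* Error values occupy the last MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct peer {
	int id;
};

static struct peer peers[] = { { 1 }, { 2 } };

/* Hypothetical lookup: returns the peer, or an error encoded as a pointer,
 * instead of filling an extra 'int *error' argument. */
struct peer *find_peer(int id)
{
	if (id <= 0)
		return ERR_PTR(-EINVAL);
	for (size_t i = 0; i < sizeof(peers) / sizeof(peers[0]); i++)
		if (peers[i].id == id)
			return &peers[i];
	return ERR_PTR(-ECONNREFUSED);
}
```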

af_unix: Factorise unix_find_other() based on address types. · fa39ef0e

Kuniyuki Iwashima authored Nov 24, 2021

As done in commit fa42d910 ("unix_bind(): take BSD and abstract
address cases into new helpers"), this patch moves BSD and abstract address
cases from unix_find_other() into unix_find_bsd() and unix_find_abstract().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Pass struct sock to unix_autobind(). · f7ed31f4

Kuniyuki Iwashima authored Nov 24, 2021

We do not use struct socket in unix_autobind() and pass struct sock to
unix_bind_bsd() and unix_bind_abstract().  Let's pass it to unix_autobind()
as well.

This patch also fixes the following checkpatch.pl reports:

  ERROR: do not use assignment in if condition
  #1795: FILE: net/unix/af_unix.c:1795:
  +	if (test_bit(SOCK_PASSCRED, &sock->flags) && !u->addr

  CHECK: Logical continuations should be on the previous line
  #1796: FILE: net/unix/af_unix.c:1796:
  +	if (test_bit(SOCK_PASSCRED, &sock->flags) && !u->addr
  +	    && (err = unix_autobind(sock)) != 0)
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Use offsetof() instead of sizeof(). · 755662ce

Kuniyuki Iwashima authored Nov 24, 2021

The length of an AF_UNIX socket address includes the offset of the
sun_path member of struct sockaddr_un.

Currently, the preceding member is just sun_family, whose type,
sa_family_t, resolves to unsigned short.  Therefore, the offset is
represented by sizeof(short).  However, this is unclear and fragile
against changes to struct sockaddr_storage or struct sockaddr_un.

This commit makes it clear and robust by replacing sizeof() with
offsetof().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
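
A small userspace check of the point above, assuming Linux headers: the bytes before sun_path are exactly the sa_family_t field today, but offsetof() states that directly rather than relying on it. unix_path_addrlen() is a hypothetical helper that computes the address length for a pathname including its terminating NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Address length for a pathname socket: the bytes before sun_path, plus
 * the path, plus its terminating NUL. */
socklen_t unix_path_addrlen(const char *path)
{
	return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + strlen(path) + 1);
}
```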

bridge: use __set_bit in __br_vlan_set_default_pvid · 442b03c3

Xin Long authored Nov 24, 2021

The same optimization as the one in commit cc0be1ad ("net:
bridge: Slightly optimize 'find_portno()'") is needed for the
'changed' bitmap in __br_vlan_set_default_pvid().
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/4e35f415226765e79c2a11d2c96fbf3061c486e2.1637782773.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
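
A sketch of the distinction, with hypothetical helper names: __set_bit()-style updates are a plain read-modify-write, while set_bit()-style updates must be atomic. The non-atomic form is sufficient when the bitmap is private to the caller, which is presumably the case for the 'changed' bitmap here.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Like __set_bit(): plain read-modify-write, for private bitmaps. */
void nonatomic_set_bit(unsigned int nr, unsigned long *map)
{
	map[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

/* Like set_bit(): atomic OR, needed when other CPUs may update the same
 * word concurrently. */
void atomic_set_bit_sketch(unsigned int nr, _Atomic unsigned long *map)
{
	atomic_fetch_or(&map[nr / BITS_PER_LONG], 1UL << (nr % BITS_PER_LONG));
}

int test_bit_sketch(unsigned int nr, const unsigned long *map)
{
	return (int)((map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL);
}
```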

net: ethtool: set a default driver name · bde3b0fd

Tonghao Zhang authored Nov 26, 2021

For netdevs (e.g. ifb, bareudp) that do not support ethtool ops
(e.g. .get_drvinfo), we can use the rtnl link kind as a default driver name.

An ifb netdev may also be created with a name that does not use the ifbX
prefix, so the device name is not a reliable substitute.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Hao Chen <chenhao288@hisilicon.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: Danielle Ratson <danieller@nvidia.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20211125163049.84970-1-xiangxia.m.yue@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
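
The fallback can be sketched as follows. struct toy_netdev, struct drvinfo and fill_drvinfo() are simplified, hypothetical stand-ins for the netdev, ethtool and rtnl link structures; only the decision order (driver callback first, then the rtnl kind) reflects the commit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct drvinfo {
	char driver[32];
};

struct toy_netdev {
	/* Driver-provided callback; NULL for devices such as ifb or bareudp. */
	void (*get_drvinfo)(struct toy_netdev *dev, struct drvinfo *info);
	const char *rtnl_kind; /* rtnl link kind, e.g. "ifb"; may be NULL */
};

/* Prefer the driver's own answer; otherwise fall back to the rtnl kind. */
int fill_drvinfo(struct toy_netdev *dev, struct drvinfo *info)
{
	memset(info, 0, sizeof(*info));
	if (dev->get_drvinfo) {
		dev->get_drvinfo(dev, info);
		return 0;
	}
	if (dev->rtnl_kind) {
		snprintf(info->driver, sizeof(info->driver), "%s", dev->rtnl_kind);
		return 0;
	}
	return -1; /* hypothetical "no name available" result */
}
```

Using the link kind rather than the interface name means an ifb device renamed to something like "shaper0" still reports "ifb".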

Merge branch 'selftests-net-bridge-vlan-multicast-tests' · c2e0cf08

Jakub Kicinski authored Nov 26, 2021

Nikolay Aleksandrov says:

====================
selftests: net: bridge: vlan multicast tests

This patch-set adds selftests for the new vlan multicast options that
were recently added. Most of the tests check the default values,
change options and try to verify that the changes actually take
effect. The last test checks whether the dependency between vlan_filtering
and mcast_vlan_snooping holds. The rest are pretty self-explanatory.

TEST: Vlan multicast snooping enable [ OK ]
TEST: Vlan global options existence [ OK ]
TEST: Vlan mcast_snooping global option default value [ OK ]
TEST: Vlan 10 multicast snooping control [ OK ]
TEST: Vlan mcast_querier global option default value [ OK ]
TEST: Vlan 10 multicast querier enable [ OK ]
TEST: Vlan 10 tagged IGMPv2 general query sent [ OK ]
TEST: Vlan 10 tagged MLD general query sent [ OK ]
TEST: Vlan mcast_igmp_version global option default value [ OK ]
TEST: Vlan mcast_mld_version global option default value [ OK ]
TEST: Vlan 10 mcast_igmp_version option changed to 3 [ OK ]
TEST: Vlan 10 tagged IGMPv3 general query sent [ OK ]
TEST: Vlan 10 mcast_mld_version option changed to 2 [ OK ]
TEST: Vlan 10 tagged MLDv2 general query sent [ OK ]
TEST: Vlan mcast_last_member_count global option default value [ OK ]
TEST: Vlan mcast_last_member_interval global option default value [ OK ]
TEST: Vlan 10 mcast_last_member_count option changed to 3 [ OK ]
TEST: Vlan 10 mcast_last_member_interval option changed to 200 [ OK ]
TEST: Vlan mcast_startup_query_interval global option default value [ OK ]
TEST: Vlan mcast_startup_query_count global option default value [ OK ]
TEST: Vlan 10 mcast_startup_query_interval option changed to 100 [ OK ]
TEST: Vlan 10 mcast_startup_query_count option changed to 3 [ OK ]
TEST: Vlan mcast_membership_interval global option default value [ OK ]
TEST: Vlan 10 mcast_membership_interval option changed to 200 [ OK ]
TEST: Vlan 10 mcast_membership_interval mdb entry expire [ OK ]
TEST: Vlan mcast_querier_interval global option default value [ OK ]
TEST: Vlan 10 mcast_querier_interval option changed to 100 [ OK ]
TEST: Vlan 10 mcast_querier_interval expire after outside query [ OK ]
TEST: Vlan mcast_query_interval global option default value [ OK ]
TEST: Vlan 10 mcast_query_interval option changed to 200 [ OK ]
TEST: Vlan mcast_query_response_interval global option default value [ OK ]
TEST: Vlan 10 mcast_query_response_interval option changed to 200 [ OK ]
TEST: Port vlan 10 option mcast_router default value [ OK ]
TEST: Port vlan 10 mcast_router option changed to 2 [ OK ]
TEST: Flood unknown vlan multicast packets to router port only [ OK ]
TEST: Disable multicast vlan snooping when vlan filtering is disabled [ OK ]
====================

Link: https://lore.kernel.org/r/20211125140858.3639139-1-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: add test for vlan_filtering dependency · f5a9dd58

Nikolay Aleksandrov authored Nov 25, 2021

Add a test for the dependency of mcast_vlan_snooping on vlan_filtering. If
vlan_filtering gets disabled, then mcast_vlan_snooping must be
automatically disabled as well.

TEST: Disable multicast vlan snooping when vlan filtering is disabled   [ OK ]
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: add vlan mcast_router tests · 2cd67a4e

Nikolay Aleksandrov authored Nov 25, 2021

Add tests for the new per-port/vlan mcast_router option and verify that
unknown multicast packets are flooded only to router ports.

TEST: Port vlan 10 option mcast_router default value [ OK ]
TEST: Port vlan 10 mcast_router option changed to 2 [ OK ]
TEST: Flood unknown vlan multicast packets to router port only [ OK ]
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: add vlan mcast query and query response interval tests · b4ce7b95

Nikolay Aleksandrov authored Nov 25, 2021

Add tests which change the new per-vlan mcast_query_interval and verify
that the new value is in effect, and a test that changes
mcast_query_response_interval's value.

TEST: Vlan mcast_query_interval global option default value [ OK ]
TEST: Vlan 10 mcast_query_interval option changed to 200 [ OK ]
TEST: Vlan mcast_query_response_interval global option default value [ OK ]
TEST: Vlan 10 mcast_query_response_interval option changed to 200 [ OK ]
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: add vlan mcast_querier_interval tests · 4d8610ee

Nikolay Aleksandrov authored Nov 25, 2021

Add tests which change the new per-vlan mcast_querier_interval and
verify that it is taken into account when an outside querier is present.

TEST: Vlan mcast_querier_interval global option default value [ OK ]
TEST: Vlan 10 mcast_querier_interval option changed to 100 [ OK ]
TEST: Vlan 10 mcast_querier_interval expire after outside query [ OK ]
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: add vlan mcast_membership_interval test · a45fe974

Nikolay Aleksandrov authored Nov 25, 2021

Add a test which changes the new per-vlan mcast_membership_interval and
verifies that a newly learned mdb entry would expire in that interval.

TEST: Vlan mcast_membership_interval global option default value [ OK ]
TEST: Vlan 10 mcast_membership_interval option changed to 200 [ OK ]
TEST: Vlan 10 mcast_membership_interval mdb entry expire [ OK ]
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: add vlan mcast_startup_query_count/interval tests · bdf1b2c0

Nikolay Aleksandrov authored Nov 25, 2021

Add tests which change the new per-vlan startup query count/interval
options and verify the proper number of queries are sent in the expected
interval.

TEST: Vlan mcast_startup_query_interval global option default value [ OK ]
TEST: Vlan mcast_startup_query_count global option default value [ OK ]
TEST: Vlan 10 mcast_startup_query_interval option changed to 100 [ OK ]
TEST: Vlan 10 mcast_startup_query_count option changed to 3 [ OK ]
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: add vlan mcast_last_member_count/interval tests · 3825f1fb

Nikolay Aleksandrov authored Nov 25, 2021

Add tests which verify the default values of mcast_last_member_count and
mcast_last_member_interval and also try to change them.

TEST: Vlan mcast_last_member_count global option default value [ OK ]
TEST: Vlan mcast_last_member_interval global option default value [ OK ]
TEST: Vlan 10 mcast_last_member_count option changed to 3 [ OK ]
TEST: Vlan 10 mcast_last_member_interval option changed to 200 [ OK ]
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: add vlan mcast igmp/mld version tests · 2b75e9dd

Nikolay Aleksandrov authored Nov 25, 2021

Add tests which change the new per-vlan IGMP/MLD versions and verify
that proper tagged general query packets are sent.

TEST: Vlan mcast_igmp_version global option default value [ OK ]
TEST: Vlan mcast_mld_version global option default value [ OK ]
TEST: Vlan 10 mcast_igmp_version option changed to 3 [ OK ]
TEST: Vlan 10 tagged IGMPv3 general query sent [ OK ]
TEST: Vlan 10 mcast_mld_version option changed to 2 [ OK ]
TEST: Vlan 10 tagged MLDv2 general query sent [ OK ]
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: add vlan mcast querier test · dee2cdc0

Nikolay Aleksandrov authored Nov 25, 2021

Add a test to try the new global vlan mcast_querier control and also
verify that tagged general query packets are properly generated when
querier is enabled for a single vlan.

TEST: Vlan mcast_querier global option default value [ OK ]
TEST: Vlan 10 multicast querier enable [ OK ]
TEST: Vlan 10 tagged IGMPv2 general query sent [ OK ]
TEST: Vlan 10 tagged MLD general query sent [ OK ]
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: bridge: add vlan mcast snooping control test · 71ae450f

Nikolay Aleksandrov authored Nov 25, 2021

Add the first test for bridge per-vlan multicast snooping, which checks
whether control of the global and per-vlan options works as expected;
joins and leaves are tested at each option value.

26 Nov, 2021 3 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 93d5404e

Jakub Kicinski authored Nov 26, 2021

drivers/net/ipa/ipa_main.c
  8afc7e47 ("net: ipa: separate disabling setup from modem stop")
  76b5fbcd ("net: ipa: kill ipa_modem_init()")

Duplicated include, drop one.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · c5c17547

Linus Torvalds authored Nov 26, 2021

Pull networking fixes from Jakub Kicinski:
 "Networking fixes, including fixes from netfilter.

  Current release - regressions:

   - r8169: fix incorrect mac address assignment

   - vlan: fix underflow for the real_dev refcnt when vlan creation
     fails

   - smc: avoid warning of possible recursive locking

  Current release - new code bugs:

   - vsock/virtio: suppress used length validation

   - neigh: fix crash in v6 module initialization error path

  Previous releases - regressions:

   - af_unix: fix change in behavior in read after shutdown

   - igb: fix netpoll exit with traffic, avoid warning

   - tls: fix splice_read() when starting mid-record

   - lan743x: fix deadlock in lan743x_phy_link_status_change()

   - marvell: prestera: fix bridge port operation

  Previous releases - always broken:

   - tcp_cubic: fix spurious Hystart ACK train detections for
     not-cwnd-limited flows

   - nexthop: fix refcount issues when replacing IPv6 groups

   - nexthop: fix null pointer dereference when IPv6 is not enabled

   - phylink: force link down and retrigger resolve on interface change

   - mptcp: fix delack timer length calculation and incorrect early
     clearing

   - ieee802154: handle iftypes as u32, prevent shift-out-of-bounds

   - nfc: virtual_ncidev: change default device permissions

   - netfilter: ctnetlink: fix error codes and flags used for kernel
     side filtering of dumps

   - netfilter: flowtable: fix IPv6 tunnel addr match

   - ncsi: align payload to 32-bit to fix dropped packets

   - iavf: fix deadlock and loss of config during VF interface reset

   - ice: avoid bpf_prog refcount underflow

   - ocelot: fix broken PTP over IP and PTP API violations

  Misc:

   - marvell: mvpp2: increase MTU limit when XDP enabled"

* tag 'net-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits)
  net: dsa: microchip: implement multi-bridge support
  net: mscc: ocelot: correctly report the timestamping RX filters in ethtool
  net: mscc: ocelot: set up traps for PTP packets
  net: ptp: add a definition for the UDP port for IEEE 1588 general messages
  net: mscc: ocelot: create a function that replaces an existing VCAP filter
  net: mscc: ocelot: don't downgrade timestamping RX filters in SIOCSHWTSTAMP
  net: hns3: fix incorrect components info of ethtool --reset command
  net: hns3: fix one incorrect value of page pool info when queried by debugfs
  net: hns3: add check NULL address for page pool
  net: hns3: fix VF RSS failed problem after PF enable multi-TCs
  net: qed: fix the array may be out of bound
  net/smc: Don't call clcsock shutdown twice when smc shutdown
  net: vlan: fix underflow for the real_dev refcnt
  ptp: fix filter names in the documentation
  ethtool: ioctl: fix potential NULL deref in ethtool_set_coalesce()
  nfc: virtual_ncidev: change default device permissions
  net/sched: sch_ets: don't peek at classes beyond 'nbands'
  net: stmmac: Disable Tx queues when reconfiguring the interface
  selftests: tls: test for correct proto_ops
  tls: fix replacing proto_ops
  ...

net: dsa: microchip: implement multi-bridge support · b3612ccd

Oleksij Rempel authored Nov 26, 2021

The current driver can handle only one bridge at a time.
Configuring two bridges on two different ports would end up shorting these
bridges in hardware. To reproduce it:

	ip l a name br0 type bridge
	ip l a name br1 type bridge
	ip l s dev br0 up
	ip l s dev br1 up
	ip l s lan1 master br0
	ip l s dev lan1 up
	ip l s lan2 master br1
	ip l s dev lan2 up

	Pinging on lan1 gets a response on lan2, which should not happen.

This happens because the current driver stores one global "Port VLAN
Membership" and applies it to all ports that are members of any
bridge.
To solve this issue, we need to handle each port separately.

This patch drops the global port member storage and calculates
membership dynamically depending on STP state and bridge participation.

Note: STP support was broken before this patch and should be fixed
separately.

Fixes: c2e86691 ("net: dsa: microchip: break KSZ9477 DSA driver into two files")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20211126123926.2981028-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
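
A hypothetical model of per-port membership computed on demand instead of stored globally: a forwarding user port may reach the CPU port and the other forwarding ports of its own bridge. The port layout (ports 0-3 user, port 4 CPU), struct toy_switch and port_membership() are assumptions for illustration, not the ksz9477 register layout or driver code.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PORTS 5
#define CPU_PORT  4 /* hypothetical layout: ports 0-3 user, port 4 CPU */

enum stp_state { STP_DISABLED, STP_BLOCKING, STP_LEARNING, STP_FORWARDING };

struct toy_switch {
	int bridge[NUM_PORTS];         /* bridge id per port, 0 = standalone */
	enum stp_state stp[NUM_PORTS];
};

/* Egress membership mask for one user port, derived from current state. */
uint32_t port_membership(const struct toy_switch *sw, int port)
{
	uint32_t mask = 1u << CPU_PORT; /* user ports can always reach the CPU */

	assert(port >= 0 && port < NUM_PORTS && port != CPU_PORT);
	if (sw->bridge[port] == 0 || sw->stp[port] != STP_FORWARDING)
		return mask;
	for (int p = 0; p < NUM_PORTS; p++) {
		if (p == port || p == CPU_PORT)
			continue;
		if (sw->bridge[p] == sw->bridge[port] && sw->stp[p] == STP_FORWARDING)
			mask |= 1u << p;
	}
	return mask;
}
```

In the reproduction above, lan1 (br0) and lan2 (br1) get disjoint masks, so traffic from one bridge cannot leak into the other.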
