Commits · 4a753ca5013d465694d6639fbe1378831a399bda · Kirill Smelkov / linux

09 Jan, 2023 12 commits

selftest: mptcp: exit from copyfd_io_poll() when receive SIGUSR1 · 4a753ca5

Menglong Dong authored Jan 06, 2023

For now, mptcp_connect won't exit after receiving the 'SIGUSR1' signal
if '-r' is set. Fix this by skipping poll and sleep in copyfd_io_poll()
if 'quit' is set.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4a753ca5

mptcp: add statistics for mptcp socket in use · c558246e

Menglong Dong authored Jan 06, 2023

Do the statistics of mptcp socket in use with sock_prot_inuse_add().
Therefore, we can get the count of used mptcp socket from
/proc/net/protocols:

& cat /proc/net/protocols
protocol  size sockets  memory press maxhdr  slab module     cl co di ac io in de sh ss gs se re sp bi br ha uh gp em
MPTCPv6   2048      0       0   no       0   yes  kernel      y  n  y  y  y  y  y  y  y  y  y  y  n  n  n  y  y  y  n
MPTCP     1896      1       0   no       0   yes  kernel      y  n  y  y  y  y  y  y  y  y  y  y  n  n  n  y  y  y  n
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c558246e

mptcp: rename 'sk' to 'ssk' in mptcp_token_new_connect() · 294de909

Menglong Dong authored Jan 06, 2023

'ssk' should be more appropriate to be the name of the first argument
in mptcp_token_new_connect().
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

294de909

mptcp: init sk->sk_prot in build_msk() · ade4d754

Menglong Dong authored Jan 06, 2023

The 'sk_prot' field in token KUNIT self-tests will be dereferenced in
mptcp_token_new_connect(). Therefore, init it with tcp_prot.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ade4d754

mptcp: introduce 'sk' to replace 'sock->sk' in mptcp_listen() · cfdcfeed

Menglong Dong authored Jan 06, 2023

'sock->sk' is used frequently in mptcp_listen(). Therefore, we can
introduce the 'sk' and replace 'sock->sk' with it.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cfdcfeed

mptcp: use local variable ssk in write_options · 3c976f4c

Geliang Tang authored Jan 06, 2023

The local variable 'ssk' has been defined at the beginning of the function
mptcp_write_options(), use it instead of getting 'ssk' again.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c976f4c

mptcp: use net instead of sock_net · a963853f

Geliang Tang authored Jan 06, 2023

Use the local variable 'net' instead of sock_net() in the functions where
the variable 'struct net *net' has been defined.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a963853f

mptcp: use msk_owned_by_me helper · 109cdeb8

Geliang Tang authored Jan 06, 2023

The helper msk_owned_by_me() is defined in protocol.h, so use it instead
of sock_owned_by_me().
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

109cdeb8

usbnet: optimize usbnet_bh() to reduce CPU load · fb59bf28

Leesoo Ahn authored Jan 06, 2023

The current source pushes skb into dev-done queue by calling
skb_dequeue_tail() and then pop it by skb_dequeue() to branch to
rx_cleanup state for freeing urb/skb in usbnet_bh(). It takes extra CPU
load, 2.21% (skb_queue_tail) as follows,

-   11.58%     0.26%  swapper          [k] usbnet_bh
   - 11.32% usbnet_bh
      - 6.43% skb_dequeue
           6.34% _raw_spin_unlock_irqrestore
      - 2.21% skb_queue_tail
           2.19% _raw_spin_unlock_irqrestore
      - 1.68% consume_skb
         - 0.97% kfree_skbmem
              0.80% kmem_cache_free
           0.53% skb_release_data

To reduce the extra CPU load use return values to call helper function
usb_free_skb() to free the resources instead of calling skb_queue_tail()
and skb_dequeue() for push and pop respectively.

-    7.87%     0.25%  swapper          [k] usbnet_bh
   - 7.62% usbnet_bh
      - 4.81% skb_dequeue
           4.74% _raw_spin_unlock_irqrestore
      - 1.75% consume_skb
         - 0.98% kfree_skbmem
              0.78% kmem_cache_free
           0.58% skb_release_data
        0.53% smsc95xx_rx_fixup
Signed-off-by: Leesoo Ahn <lsahn@ooseel.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

fb59bf28

Merge branch 'phy-micrel-warnings' · 9cb8bae3

David S. Miller authored Jan 09, 2023

Divya Koppera says:

====================
Fixed warnings

Fixed warnings related to PTR_ERR and initialization.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9cb8bae3

net: phy: micrel: Fix warn: passing zero to PTR_ERR · 3f88d7d1

Divya Koppera authored Jan 06, 2023

Handle the NULL pointer case

Fixes New smatch warnings:
drivers/net/phy/micrel.c:2613 lan8814_ptp_probe_once() warn: passing zero to 'PTR_ERR'

vim +/PTR_ERR +2613 drivers/net/phy/micrel.c
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Divya Koppera <Divya.Koppera@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3f88d7d1

net: phy: micrel: Fixed error related to uninitialized symbol ret · d50ede4f

Divya Koppera authored Jan 06, 2023

Initialized return variable

Fixes Old smatch warnings:
drivers/net/phy/micrel.c:1750 ksz886x_cable_test_get_status() error:
uninitialized symbol 'ret'.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Divya Koppera <Divya.Koppera@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

d50ede4f

07 Jan, 2023 13 commits

Merge branch 'net-wangxun-adjust-code-structure' · f23395b4

Jakub Kicinski authored Jan 06, 2023

Jiawen Wu says:

====================
net: wangxun: Adjust code structure

Remove useless structs 'txgbe_hw' and 'ngbe_hw' make the codes clear.
And move the same codes which sets MAC address between txgbe and ngbe
to libwx. Further more, rename struct 'wx_hw' to 'wx' and move total
adapter members to wx.
====================

Link: https://lore.kernel.org/r/20230106033853.2806007-1-jiawenwu@trustnetic.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

f23395b4

net: ngbe: Remove structure ngbe_adapter · 803df55d

Mengyuan Lou authored Jan 06, 2023

Move the total private structure to libwx.
Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

803df55d

net: txgbe: Remove structure txgbe_adapter · 270a71e6

Jiawen Wu authored Jan 06, 2023

Move the total private structure to libwx.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

270a71e6

net: wangxun: Rename private structure in libwx · 9607a3e6

Jiawen Wu authored Jan 06, 2023

In order to move the total members in struct adapter to struct wx_hw
to keep the code clean, it's a bad name of 'wx_hw' only for hardware.
Rename 'wx_hw' to 'wx', and rename the pointers at use.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

9607a3e6

net: wangxun: Move MAC address handling to libwx · 79625f45

Jiawen Wu authored Jan 06, 2023

For setting MAC address, both txgbe and ngbe drivers have the same handling
flow with different parameters. Move the same codes to libwx.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

79625f45

net: ngbe: Move defines into unified file · 92710fe6

Jiawen Wu authored Jan 06, 2023

Remove ngbe.h, move defines into ngbe_type.h file.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

92710fe6

net: txgbe: Move defines into unified file · 524f6b29

Jiawen Wu authored Jan 06, 2023

Remove txgbe.h, move defines into txgbe_type.h file.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

524f6b29

net: ngbe: Remove structure ngbe_hw · 8f727eec

Jiawen Wu authored Jan 06, 2023

Remove useless structure ngbe_hw to make the codes clear.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

8f727eec

net: txgbe: Remove structure txgbe_hw · ce2b4ad5

Jiawen Wu authored Jan 06, 2023

Remove useless structure txgbe_hw to make the codes clear.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ce2b4ad5

net: phy: micrel: Change handler interrupt for lan8814 · 7abd92a5

Horatiu Vultur authored Jan 04, 2023

The lan8814 represents a package of 4 PHYs. All of them are sharing the
same interrupt line. So when a link was going down/up or a frame was
timestamped, then the interrupt handler of all the PHYs was called.
Which is all fine and expected but the problem is the way the handler
interrupt works.
Basically if one of the PHYs timestamp a frame, then all the other 3
PHYs were polling the status of the interrupt until that PHY actually
cleared the interrupt by reading the timestamp.
The reason of polling was in case another PHY was also timestamping a
frame at the same time, it could miss this interrupt. But this is not
the right approach, because it is the interrupt controller who needs to
call the interrupt handlers again if the interrupt line is still
active.
Therefore change this such when the interrupt handler is called check
only if the interrupt is for itself, otherwise just exit. In this way
save CPU usage.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20230104194218.3785229-1-horatiu.vultur@microchip.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

7abd92a5

ethtool: Replace 0-length array with flexible array · b466a25c

Kees Cook authored Jan 05, 2023

Zero-length arrays are deprecated[1]. Replace struct ethtool_rxnfc's
"rule_locs" 0-length array with a flexible array. Detected with GCC 13,
using -fstrict-flex-arrays=3:

net/ethtool/common.c: In function 'ethtool_get_max_rxnfc_channel':
net/ethtool/common.c:558:55: warning: array subscript i is outside array bounds of '__u32[0]' {aka 'unsigned int[]'} [-Warray-bounds=]
  558 |                         .fs.location = info->rule_locs[i],
      |                                        ~~~~~~~~~~~~~~~^~~
In file included from include/linux/ethtool.h:19,
                 from include/uapi/linux/ethtool_netlink.h:12,
                 from include/linux/ethtool_netlink.h:6,
                 from net/ethtool/common.c:3:
include/uapi/linux/ethtool.h:1186:41: note: while referencing
'rule_locs'
 1186 |         __u32                           rule_locs[0];
      |                                         ^~~~~~~~~

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays

Cc: Andrew Lunn <andrew@lunn.ch>
Cc: kernel test robot <lkp@intel.com>
Cc: Oleksij Rempel <linux@rempel-privat.de>
Cc: Sean Anderson <sean.anderson@seco.com>
Cc: Alexandru Tachici <alexandru.tachici@analog.com>
Cc: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230106042844.give.885-kees@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b466a25c

net: ipv6: rpl_iptunnel: Replace 0-length arrays with flexible arrays · e8d283b6

Kees Cook authored Jan 05, 2023

Zero-length arrays are deprecated[1]. Replace struct ipv6_rpl_sr_hdr's
"segments" union of 0-length arrays with flexible arrays. Detected with
GCC 13, using -fstrict-flex-arrays=3:

In function 'rpl_validate_srh',
    inlined from 'rpl_build_state' at ../net/ipv6/rpl_iptunnel.c:96:7:
../net/ipv6/rpl_iptunnel.c:60:28: warning: array subscript <unknown> is outside array bounds of 'struct in6_addr[0]' [-Warray-bounds=]
   60 |         if (ipv6_addr_type(&srh->rpl_segaddr[srh->segments_left - 1]) &
      |                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from ../include/net/rpl.h:12,
                 from ../net/ipv6/rpl_iptunnel.c:13:
../include/uapi/linux/rpl.h: In function 'rpl_build_state':
../include/uapi/linux/rpl.h:40:33: note: while referencing 'addr'
   40 |                 struct in6_addr addr[0];
      |                                 ^~~~

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays

Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230105221533.never.711-kees@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

e8d283b6

ipv6: ioam: Replace 0-length array with flexible array · 0b5dfa35

Kees Cook authored Jan 05, 2023

Zero-length arrays are deprecated[1]. Replace struct ioam6_trace_hdr's
"data" 0-length array with a flexible array. Detected with GCC 13,
using -fstrict-flex-arrays=3:

net/ipv6/ioam6_iptunnel.c: In function 'ioam6_build_state':
net/ipv6/ioam6_iptunnel.c:194:37: warning: array subscript <unknown> is outside array bounds of '__u8[0]' {aka 'unsigned char[]'} [-Warray-bounds=]
  194 |                 tuninfo->traceh.data[trace->remlen * 4] = IPV6_TLV_PADN;
      |                 ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~
In file included from include/linux/ioam6.h:11,
                 from net/ipv6/ioam6_iptunnel.c:13:
include/uapi/linux/ioam6.h:130:17: note: while referencing 'data'
  130 |         __u8    data[0];
      |                 ^~~~

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arraysSigned-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Justin Iurman <justin.iurman@uliege.be>
Tested-by: Justin Iurman <justin.iurman@uliege.be>
Link: https://lore.kernel.org/r/20230105222115.never.661-kees@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

0b5dfa35

06 Jan, 2023 15 commits

Merge branch 'devlink-unregister' · 6bd4755c

David S. Miller authored Jan 06, 2023

Jakub Kicinski says:

====================
devlink: remove the wait-for-references on unregister

Move the registration and unregistration of the devlink instances
under their instance locks. Don't perform the netdev-style wait
for all references when unregistering the instance.

Instead the devlink instance refcount will only ensure that
the memory of the instance is not freed. All places which acquire
access to devlink instances via a reference must check that the
instance is still registered under the instance lock.

This fixes the problem of the netdev code accessing devlink
instances before they are registered.

RFC: https://lore.kernel.org/all/20221217011953.152487-1-kuba@kernel.org/
 - rewrite the cover letter
 - rewrite the commit message for patch 1
 - un-export and rename devl_is_alive
 - squash the netdevsim patches
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6bd4755c

netdevsim: move devlink registration under the instance lock · 82a3aef2

Jakub Kicinski authored Jan 05, 2023

To prevent races with netdev code accessing free devlink instances
move the registration under the devlink instance lock.
Core now waits for the instance to be registered before accessing it.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

82a3aef2

netdevsim: rename a label · 5c5ea1d0

Jakub Kicinski authored Jan 05, 2023

err_dl_unregister should unregister the devlink instance.
Looks like renaming it was missed in one of the reshufflings.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

5c5ea1d0

devlink: allow registering parameters after the instance · 1d18bb1a

Jakub Kicinski authored Jan 05, 2023

It's most natural to register the instance first and then its
subobjects. Now that we can use the instance lock to protect
the atomicity of all init - it should also be safe.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

1d18bb1a

devlink: don't require setting features before registration · 6ef8f7da

Jakub Kicinski authored Jan 05, 2023

Requiring devlink_set_features() to be run before devlink is
registered is overzealous. devlink_set_features() itself is
a leftover from old workarounds which were trying to prevent
initiating reload before probe was complete.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6ef8f7da

devlink: remove the registration guarantee of references · 9053637e

Jakub Kicinski authored Jan 05, 2023

The objective of exposing the devlink instance locks to
drivers was to let them use these locks to prevent user space
from accessing the device before it's fully initialized.
This is difficult because devlink_unregister() waits for all
references to be released, meaning that devlink_unregister()
can't itself be called under the instance lock.

To avoid this issue devlink_register() was moved after subobject
registration a while ago. Unfortunately the netdev paths get
a hold of the devlink instances _before_ they are registered.
Ideally netdev should wait for devlink init to finish (synchronizing
on the instance lock). This can't work because we don't know if the
instance will _ever_ be registered (in case of failures it may not).
The other option of returning an error until devlink_register()
is called is unappealing (user space would get a notification
netdev exist but would have to wait arbitrary amount of time
before accessing some of its attributes).

Weaken the guarantees of the devlink references.

Holding a reference will now only guarantee that the memory
of the object is around. Another way of looking at it is that
the reference now protects the object not its "registered" status.
Use devlink instance lock to synchronize unregistration.

This implies that releasing of the "main" reference of the devlink
instance moves from devlink_unregister() to devlink_free().
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9053637e

devlink: always check if the devlink instance is registered · ed539ba6

Jakub Kicinski authored Jan 05, 2023

Always check under the instance lock whether the devlink instance
is still / already registered.

This is a no-op for the most part, as the unregistration path currently
waits for all references. On the init path, however, we may temporarily
open up a race with netdev code, if netdevs are registered before the
devlink instance. This is temporary, the next change fixes it, and this
commit has been split out for the ease of review.

Note that in case of iterating over sub-objects which have their
own lock (regions and line cards) we assume an implicit dependency
between those objects existing and devlink unregistration.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ed539ba6

devlink: protect devlink->dev by the instance lock · 870c7ad4

Jakub Kicinski authored Jan 05, 2023

devlink->dev is assumed to be always valid as long as any
outstanding reference to the devlink instance exists.

In prep for weakening of the references take the instance lock.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

870c7ad4

devlink: update the code in netns move to latest helpers · 7a54a519

Jakub Kicinski authored Jan 05, 2023

devlink_pernet_pre_exit() is the only obvious place which takes
the instance lock without using the devl_ helpers. Update the code
and move the error print after releasing the reference
(having unlock and put together feels slightly idiomatic).
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

7a54a519

devlink: bump the instance index directly when iterating · d7727819

Jakub Kicinski authored Jan 05, 2023

xa_find_after() is designed to handle multi-index entries correctly.
If a xarray has two entries one which spans indexes 0-3 and one at
index 4 xa_find_after(0) will return the entry at index 4.

Having to juggle the two callbacks, however, is unnecessary in case
of the devlink xarray, as there is 1:1 relationship with indexes.

Always use xa_find() and increment the index manually.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d7727819

sysctl: expose all net/core sysctls inside netns · 6b754d7b

Mahesh Bandewar authored Jan 04, 2023

All were not visible to the non-priv users inside netns. However,
with 4ecb9009 ("sysctl: allow override of /proc/sys/net with
CAP_NET_ADMIN"), these vars are protected from getting modified.
A proc with capable(CAP_NET_ADMIN) can change the values so
not having them visible inside netns is just causing nuisance to
process that check certain values (e.g. net.core.somaxconn) and
see different behavior in root-netns vs. other-netns

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6b754d7b

Merge branch 'devlink-code-split-and-structured-instance-walk' · 3d759e9e

Jakub Kicinski authored Jan 05, 2023

Jakub Kicinski says:

====================
devlink: code split and structured instance walk

Split devlink.c into a handful of files, trying to keep the "core"
code away from all the command-specific implementations.
The core code has been quite scattered until now. Going forward we can
consider using a source file per-subobject, I think that it's quite
beneficial to newcomers (based on relative ease with which folks
contribute to ethtool vs devlink). But this series doesn't split
everything out, yet - partially due to backporting concerns,
but mostly due to lack of time. Bulk of the netlink command
handling is left in a leftover.c file.

Introduce a context structure for dumps, and use it to store
the devlink instance ID of the last dumped devlink instance.
This means we don't have to restart the walk from 0 each time.

Finally - introduce a "structured walk". A centralized dump handler
in devlink/netlink.c which walks the devlink instances, deals with
refcounting/locking, simplifying the per-object implementations quite
a bit. Inspired by the ethtool code.

v1: https://lore.kernel.org/all/20230104041636.226398-1-kuba@kernel.org/
RFC: https://lore.kernel.org/all/20221215020155.1619839-1-kuba@kernel.org/
====================

Link: https://lore.kernel.org/r/20230105040531.353563-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

3d759e9e

devlink: convert remaining dumps to the by-instance scheme · 5ce76d78

Jakub Kicinski authored Jan 04, 2023

Soon we'll have to check if a devlink instance is alive after
locking it. Convert to the by-instance dumping scheme to make
refactoring easier.

Most of the subobject code no longer has to worry about any devlink
locking / lifetime rules (the only ones that still do are the two subject
types which stubbornly use their own locking). Both dump and do callbacks
are given a devlink instance which is already locked and good-to-access
(do from the .pre_doit handler, dump from the new dump indirection).

Note that we'll now check presence of an op (e.g. for sb_pool_get)
under the devlink instance lock, that will soon be necessary anyway,
because we don't hold refs on the driver modules so the memory
in which ops live may be gone for a dead instance, after upcoming
locking changes.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

5ce76d78

devlink: add by-instance dump infra · 07f3af66

Jakub Kicinski authored Jan 04, 2023

Most dumpit implementations walk the devlink instances.
This requires careful lock taking and reference dropping.
Factor the loop out and provide just a callback to handle
a single instance dump.

Convert one user as an example, other users converted
in the next change.

Slightly inspired by ethtool netlink code.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

07f3af66

devlink: uniformly take the devlink instance lock in the dump loop · e4d5015b

Jakub Kicinski authored Jan 04, 2023

Move the lock taking out of devlink_nl_cmd_region_get_devlink_dumpit().
This way all dumps will take the instance lock in the main iteration
loop directly, making refactoring and reading the code easier.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

e4d5015b