- 01 Dec, 2020 15 commits
-
Jakub Kicinski authored
Paolo Abeni says: ==================== mptcp: avoid workqueue usage for data The current locking schema used to protect the MPTCP data-path requires the MPTCP workqueue to process incoming data, depending on the trylock result. This poses scalability limits and introduces random delays in MPTCP-level acks. With this series we use a single spinlock to protect the MPTCP data-path, removing the need for workqueue and delayed ack usage. This additionally reduces the number of atomic operations required per packet and considerably cleans up the poll/wake-up code. ==================== Link: https://lore.kernel.org/r/cover.1606413118.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Paolo Abeni authored
We have some tasks triggered by the subflow receive path which require access to the msk socket status, specifically: mptcp_clean_una() and mptcp_push_pending(). We have almost everything in place to defer such tasks to the msk release_cb when the msk sock is owned. Since the worker is no longer used to clean the acked data, for fallback sockets we need to flush the acked data explicitly. As an added bonus we can move the wake-up code into __mptcp_clean_una(), greatly simplify mptcp_poll() and move the timer update under the data lock. The worker is now used only to process and send DATA_FIN packets and to do the MPTCP-level retransmissions. Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
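The deferral pattern described above can be illustrated compactly. The following is a minimal sketch, not the actual MPTCP code: the flag names and helpers are stand-ins, and the real release_cb hook takes only the socket pointer.

    #include <net/sock.h>
    #include <linux/bitops.h>

    #define EX_CLEAN_UNA    0   /* illustrative deferred-work flag bits */
    #define EX_PUSH_PENDING 1

    static void ex_clean_una(struct sock *sk) { /* placeholder task */ }
    static void ex_push_pending(struct sock *sk) { /* placeholder task */ }

    /* Subflow receive path, data lock held: run the task directly when
     * the msk socket lock is free, otherwise record it for the release
     * callback.
     */
    static void ex_subflow_event(struct sock *sk, unsigned long *flags)
    {
        if (sock_owned_by_user(sk)) {
            set_bit(EX_CLEAN_UNA, flags);
            return;
        }
        ex_clean_una(sk);
    }

    /* Modeled on sk->sk_prot->release_cb: invoked when the owner
     * releases the socket lock; process everything deferred above.
     */
    static void ex_release_cb(struct sock *sk, unsigned long *flags)
    {
        if (test_and_clear_bit(EX_CLEAN_UNA, flags))
            ex_clean_una(sk);
        if (test_and_clear_bit(EX_PUSH_PENDING, flags))
            ex_push_pending(sk);
    }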
-
Paolo Abeni authored
By extending the data_lock scope in mptcp_incoming_options(), we can use it to protect both snd_una and wnd_end. In the typical case, we will have a single atomic operation instead of two. Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Paolo Abeni authored
Move the TX skb allocation into mptcp_sendmsg() scope, and tentatively pre-allocate a number of skbs proportional to the sendmsg() length. Use the ssk tx skb cache to prevent the subflow allocation. This allows removing the msk skb extension cache and makes the later patches possible. Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Paolo Abeni authored
Such spinlock is currently used only to protect the 'owned' flag inside the socket lock itself. With this patch, we extend its scope to protect the whole msk receive path and sk_forward_memory. Given the above, we can always move data into the msk receive queue (and OoO queue) from the subflow. We leverage the previous commit, so that we need to acquire the spinlock in the tx path only when moving fwd memory. recvmsg() must now explicitly acquire the socket spinlock when moving skbs out of sk_receive_queue. To reduce the number of lock operations required, we use a second rx queue and splice the first into the latter in mptcp_lock_sock(). Additionally, rmem-allocated memory is bulk-freed via release_cb(). Acked-by: Florian Westphal <fw@strlen.de> Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
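A minimal sketch of the two-queue splice mentioned above, with illustrative type and field names rather than the real msk layout: the spinlock-protected queue filled by the subflows is spliced into a private queue once per lock acquisition, so recvmsg() does not need the spinlock for every skb.

    #include <linux/skbuff.h>
    #include <linux/spinlock.h>

    struct ex_msk {
        spinlock_t data_lock;              /* protects receive_queue */
        struct sk_buff_head receive_queue; /* filled by the subflows */
        struct sk_buff_head rx_queue;      /* drained by recvmsg() */
    };

    /* On lock_sock(), move everything queued so far into the private
     * queue with a single lock/unlock pair.
     */
    static void ex_lock_splice(struct ex_msk *msk)
    {
        spin_lock_bh(&msk->data_lock);
        skb_queue_splice_tail_init(&msk->receive_queue, &msk->rx_queue);
        spin_unlock_bh(&msk->data_lock);
    }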
-
Paolo Abeni authored
This leverages the previous commit to reserve the wmem required for the sendmsg() operation when the msk socket lock is first acquired. Some heuristics are used to get a reasonable [over] estimation of the whole memory required. If we can't forward-allocate that amount, fall back to a reasonably small chunk; if even that is unavailable, enter the wait-for-memory path. When sendmsg() needs more memory it looks at wmem_reserved first and, if that is exhausted, moves more space from sk_forward_alloc. The reserved memory is not persistent and is released at the next socket unlock via the release_cb(). Overall this will simplify the next patch. Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
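The reservation scheme can be sketched with plain counters. All names here are hypothetical; the real code operates on sk_forward_alloc and handles page-size rounding, which this sketch omits.

    #include <linux/minmax.h>

    struct ex_wmem {
        int wmem_reserved; /* bytes reserved while the lock is held */
        int forward_alloc; /* stand-in for sk_forward_alloc */
    };

    /* At lock_sock() time: reserve as much of the estimate as possible. */
    static void ex_reserve(struct ex_wmem *w, int estimate)
    {
        int amount = min(estimate, w->forward_alloc);

        w->forward_alloc -= amount;
        w->wmem_reserved += amount;
    }

    /* In sendmsg(): consume the reservation first, topping it up from
     * forward_alloc; a false return maps to the wait-for-memory path.
     */
    static bool ex_charge(struct ex_wmem *w, int size)
    {
        if (w->wmem_reserved < size) {
            int need = size - w->wmem_reserved;

            if (w->forward_alloc < need)
                return false;
            w->forward_alloc -= need;
            w->wmem_reserved += need;
        }
        w->wmem_reserved -= size;
        return true;
    }

    /* At unlock time (release_cb): the reservation is not persistent. */
    static void ex_release(struct ex_wmem *w)
    {
        w->forward_alloc += w->wmem_reserved;
        w->wmem_reserved = 0;
    }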
-
Paolo Abeni authored
This allows invoking an additional callback under the socket spinlock. It will be used by the next patches to avoid additional spinlock contention. Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Camelia Groza says: ==================== dpaa_eth: add XDP support Enable XDP support for the QorIQ DPAA1 platforms. Implement all the current actions (DROP, ABORTED, PASS, TX, REDIRECT). No Tx batching is added at this time. Additional XDP_PACKET_HEADROOM bytes are reserved in each frame's headroom. After transmit, a reference to the xdp_frame is saved in the buffer for clean-up on confirmation, in a newly created structure for software annotations. DPAA_TX_PRIV_DATA_SIZE bytes are reserved in the buffer for storing this structure, and the XDP program is restricted from accessing them. The driver shares the egress frame queues used for XDP with the network stack. The DPAA driver is an LLTX driver, so no explicit locking is required on transmission. ==================== Link: https://lore.kernel.org/r/cover.1606322126.git.camelia.groza@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Camelia Groza authored
For XDP TX, even though we start out with correctly aligned buffers, the XDP program might change the data's alignment. For REDIRECT, we have no control over the alignment either. Create a new workaround for xdp_frame structures to verify the erratum conditions and move the data to a fresh buffer if necessary. Create a new xdp_frame for managing the new buffer and free the old one using the XDP API. Due to alignment constraints, all frames have a 256-byte headroom that is offered fully to XDP under the erratum. If the XDP program uses all of it, the data needs to be moved to make room for the xdpf backpointer. Disable the metadata support since the information can be lost. Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: Camelia Groza <camelia.groza@nxp.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Camelia Groza authored
Explicitly point that the current workaround addresses skbs. This change is in preparation for adding a workaround for XDP scenarios. Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: Camelia Groza <camelia.groza@nxp.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Camelia Groza authored
After transmission, the frame is returned on confirmation queues for cleanup. For this, store a backpointer to the xdp_frame in the private reserved area at the start of the TX buffer. No TX batching support is implemented at this time. Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: Camelia Groza <camelia.groza@nxp.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Camelia Groza authored
Use an xdp_frame structure for managing the frame. Store a backpointer to the structure at the start of the buffer before enqueueing for cleanup on TX confirmation. Reserve DPAA_TX_PRIV_DATA_SIZE bytes from the frame size shared with the XDP program for this purpose. Use the XDP API for freeing the buffer when it returns to the driver on the TX confirmation path. The frame queues are shared with the netstack. The DPAA driver is an LLTX driver, so no explicit locking is required on transmission. This approach will be reused for XDP REDIRECT. Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: Camelia Groza <camelia.groza@nxp.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
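The backpointer handling can be sketched as follows. The annotation struct is illustrative; the driver defines its own layout within the DPAA_TX_PRIV_DATA_SIZE reserved bytes.

    #include <net/xdp.h>

    /* Illustrative software annotation at the start of the Tx buffer */
    struct ex_swa {
        struct xdp_frame *xdpf;
    };

    /* Before enqueueing for transmission: stash the backpointer. */
    static void ex_save_backpointer(void *buf_start, struct xdp_frame *xdpf)
    {
        struct ex_swa *swa = buf_start;

        swa->xdpf = xdpf;
    }

    /* On the TX confirmation path: free the frame through the XDP API. */
    static void ex_tx_confirm(void *buf_start)
    {
        struct ex_swa *swa = buf_start;

        xdp_return_frame(swa->xdpf);
    }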
-
Camelia Groza authored
Implement the ndo_change_mtu callback to prevent users from setting an MTU that would permit processing of S/G frames. The maximum MTU size is dependent on the buffer size. Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: Camelia Groza <camelia.groza@nxp.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Camelia Groza authored
Implement the XDP_DROP and XDP_PASS actions. Avoid draining and reconfiguring the buffer pool at each XDP setup/teardown by increasing the frame headroom and reserving XDP_PACKET_HEADROOM bytes from the start. Since we always reserve an entire page per buffer, this change only impacts Jumbo frame scenarios, where the maximum linear frame size is reduced by 256 bytes. Multi-buffer Scatter/Gather frames are now used instead in these scenarios. Allow XDP programs to access the entire buffer. The data in the received frame's headroom can be overwritten by the XDP program. Extract the relevant fields from the headroom while they are still available, before running the XDP program. Since the headroom might be resized before the frame is passed up to the stack, remove the check for a fixed headroom value when building an skb. Allow the metadata to be updated and pass the information up the stack. Scatter/Gather frames are dropped when XDP is enabled. Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: Camelia Groza <camelia.groza@nxp.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
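A minimal sketch of the receive-side flow, not the driver code: the TX/REDIRECT actions, buffer recycling and error counting are elided, and the rxq is assumed to have been registered at init time.

    #include <linux/filter.h>
    #include <net/xdp.h>

    static u32 ex_run_xdp(struct bpf_prog *prog, struct xdp_rxq_info *rxq,
                          void *vaddr, u32 len)
    {
        struct xdp_buff xdp;
        u32 act;

        xdp.data_hard_start = vaddr;
        xdp.data = vaddr + XDP_PACKET_HEADROOM;
        xdp.data_meta = xdp.data;  /* metadata may be pushed by the prog */
        xdp.data_end = xdp.data + len;
        xdp.frame_sz = PAGE_SIZE;  /* one page per buffer */
        xdp.rxq = rxq;

        act = bpf_prog_run_xdp(prog, &xdp);
        switch (act) {
        case XDP_PASS:
            /* build the skb from xdp.data/xdp.data_end, since the
             * program may have moved them (headroom was resized)
             */
            break;
        case XDP_DROP:
        default:
            /* recycle the buffer back to the pool */
            break;
        }
        return act;
    }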
-
Camelia Groza authored
We maintain an skb backpointer in the software annotations area of Tx frames. Introduce a structure for explicit handling. Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Signed-off-by: Camelia Groza <camelia.groza@nxp.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 28 Nov, 2020 25 commits
-
Jakub Kicinski authored
Alex Elder says: ==================== net: ipa: start adding IPA v4.5 support This series starts updating the IPA code to support IPA hardware version 4.5. The first patch fixes a problem found while preparing these updates. Testing shows the code works with or without the change, and with the fix the code matches "downstream" Qualcomm code. The second patch updates the definitions for IPA register offsets and field masks to reflect the changes that come with IPA v4.5. A few register updates have been deferred until later, because making use of them involves some nontrivial code updates. One type of change that IPA v4.5 brings is expanding the range of certain configuration values. High-order bits are added in a few cases, and the third patch implements the code changes necessary to use those newly available bits. The fourth patch implements several fairly minor changes to the code required for IPA v4.5 support. The last two patches implement changes to the GSI registers used for IPA. Almost none of the registers change, but the range of memory in which most of the GSI registers are located is shifted by a fixed amount. The fifth patch updates the GSI register definitions, and the last patch implements the memory shift for IPA v4.5. ==================== Link: https://lore.kernel.org/r/20201125204522.5884-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Almost all of the GSI registers we use have different offsets starting at IPA version 4.5. Only two registers remain in their original location. In a way, though, the new register locations are not *that* different: the entire group of affected registers has simply been shifted down in memory by a fixed amount (0xd000). So, for example, the channel context 0 register that has a base offset of 0x0001c000 on "older" hardware now has a base offset of 0x0000f000. This patch aims to add support for IPA v4.5 registers at their new offsets in a way that minimizes the amount of code that needs to change. It is not ideal, but it avoids the need to maintain a nearly complete set of additional register offset definitions. The approach takes advantage of the fact that when accessing GSI registers we do not access any memory at the lower end of the "gsi" memory range (with the two exceptions already noted). In particular, we do not access anything within the bottom 0xd000 bytes of the GSI memory range. For IPA version 4.5, after we map the GSI memory, we adjust the virtual memory pointer downward by the fixed amount (0xd000). That way, register accesses using the offsets defined by the existing GSI_REG_*() macros will resolve to the proper locations for IPA version 4.5. The two registers *not* affected by this offset are accessed only in gsi_irq_setup(). There, for IPA version 4.5, we undo the general register adjustment by adding the fixed amount back to the virtual address to access these registers. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
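The mapping adjustment reduces to a few lines. This is a hedged sketch rather than the driver's actual probe code:

    #include <linux/io.h>

    #define EX_GSI_ADJUST 0xd000 /* fixed shift described above */

    /* Map the GSI register space; for IPA v4.5, bias the virtual base
     * downward so the existing GSI_REG_*() offsets resolve to the
     * relocated registers.  Nothing below virt + EX_GSI_ADJUST is ever
     * dereferenced, except the two registers in gsi_irq_setup(), which
     * add the adjustment back before access.
     */
    static void __iomem *ex_gsi_map(phys_addr_t phys, size_t size,
                                    bool v4_5)
    {
        void __iomem *virt = ioremap(phys, size);

        if (virt && v4_5)
            virt -= EX_GSI_ADJUST;
        return virt;
    }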
-
Alex Elder authored
Very few GSI register definitions change for IPA v4.5; however, as a group, their position in memory shifts by a constant amount (handled by the next commit). Add definitions and update comments in the set of GSI registers to support the changes that come with IPA v4.5. Update the logic in gsi_channel_program() to accommodate the new (expanded) PREFETCH_MODE field in the CH_C_QOS register. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Update the IPA code to make use of the updated IPA v4.5 register definitions. Generally, what this patch does if IPA v4.5 hardware is in use is:
- Ensure new registers or fields in IPA v4.5 are updated where required
- Ensure registers or fields not supported in IPA v4.5 are not examined when read, or are set to 0 when written
It does this while preserving the existing functionality for IPA versions lower than v4.5. The values to program for QSB_MAX_READS and QSB_MAX_WRITES and the source and destination resource counts are updated to be correct for all versions through v4.5 as well. Note that IPA_RESOURCE_GROUP_SRC_MAX and IPA_RESOURCE_GROUP_DST_MAX already reflect that 5 is an acceptable number of resources (which IPA v4.5 implements). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
IPA v4.5 adds a few fields to the endpoint header and extended header configuration registers that represent new high-order bits for certain offsets and sizes. Add code to incorporate these upper bits into the registers for IPA v4.5. This includes creating ipa_header_size_encoded(), which handles encoding the metadata offset field for use in the ENDP_INIT_HDR register in a way appropriate for the hardware version. This and ipa_metadata_offset_encoded() ensure the mask argument passed to u32_encode_bits() is constant. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
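The split-field encoding looks roughly like this. The masks are illustrative, not the real ENDP_INIT_HDR layout; u32_encode_bits() keeps only the bits that fit the field, so the low and high parts can be encoded independently.

    #include <linux/bitfield.h>
    #include <linux/bits.h>

    #define EX_OFST_FMASK     GENMASK(5, 0)   /* low-order bits */
    #define EX_OFST_MSB_FMASK GENMASK(17, 16) /* added by IPA v4.5 */

    static u32 ex_offset_encoded(bool v4_5, u32 offset)
    {
        /* low 6 bits go into the original field */
        u32 val = u32_encode_bits(offset & 0x3f, EX_OFST_FMASK);

        /* the new high-order bits only exist on v4.5 hardware */
        if (v4_5)
            val |= u32_encode_bits(offset >> 6, EX_OFST_MSB_FMASK);
        return val;
    }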
-
Alex Elder authored
Update "ipa_reg.h" so that register definitions support IPA hardware version 4.5, in addition to versions 3.5.1 through v4.2. Most of the register definitions are the same, but in some cases fields are added, changed, or eliminated. Updates for a few IPA v4.5 registers are more complex, and adding those definition will be deferred to separate patches. This patch only updates the register offset and field definitions, and adds informational comments. The only code change avoids accessing the backward compatibility register for IPA version 4.5 in ipa_hardware_config(). Other IPA v4.5-specific code changes will come later. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Starting with IPA v4.2 there is a GSI channel option to use an "escape buffer" instead of prefetch buffers. This should be used for all channels *except* the AP command TX channel. The logic that implements this has it backwards; fix this bug. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Marcelo Ricardo Leitner authored
Enable flowtable counter updates by setting NF_FLOWTABLE_COUNTER. Otherwise, the updates added by commit ef803b3c ("netfilter: flowtable: add counter support in HW offload") are not effective when using act_ct. While at it, now that we have the flag set, protect the call to nf_ct_acct_update() added by commit beb97d3a ("net/sched: act_ct: update nf_conn_acct for act_ct SW offload in flowtable") with a check on NF_FLOWTABLE_COUNTER, as is also done in other places. Note that this shouldn't impact performance, as these stats are only enabled when net.netfilter.nf_conntrack_acct is enabled. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: wenxu <wenxu@ucloud.cn> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Link: https://lore.kernel.org/r/481a65741261fd81b0a0813e698af163477467ec.1606415787.git.marcelo.leitner@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net. Trivial conflict in CAN: keep the net-next side plus the byteswap wrapper. Conflicts: drivers/net/can/usb/gs_usb.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Jon Maloy says: ==================== tipc: some minor improvements We add some improvements that will be useful in future commits. ==================== Link: https://lore.kernel.org/r/20201125182915.711370-1-jmaloy@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jon Maloy authored
We update the terminology in the code so that deprecated structure names and macros are replaced with those currently recommended in the user API:
    struct tipc_portid   -> struct tipc_socket_addr
    struct tipc_name     -> struct tipc_service_addr
    struct tipc_name_seq -> struct tipc_service_range
    TIPC_ADDR_ID      -> TIPC_SOCKET_ADDR
    TIPC_ADDR_NAME    -> TIPC_SERVICE_ADDR
    TIPC_ADDR_NAMESEQ -> TIPC_SERVICE_RANGE
    TIPC_CFG_SRV      -> TIPC_NODE_STATE
Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jon Maloy authored
The 32-bit node number, aka node hash or node address, is calculated based on the 128-bit node identity when it is not set explicitly by the user. In future commits we will need to perform this hash operation on peer nodes while feeling safe that we obtain the same result. We do this by interpreting the initial hash as a network byte order number. Whenever we need to use the number locally on a node we must therefore translate it to host byte order to obtain an architecture independent result. Furthermore, given the context where we use this number, we must not allow it to be zero unless the node identity also is zero. Hence, in the rare cases when the xor-ed hash value may end up as zero, we replace it with a fixed number, knowing that the code anyway is capable of handling hash collisions. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
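A hedged sketch of the scheme, using an illustrative xor-fold rather than the real TIPC hash: treat the folded result as network byte order, substitute a fixed non-zero value if a non-zero identity collapses to zero, and convert to host byte order before local use.

    #include <linux/string.h>
    #include <linux/types.h>
    #include <asm/byteorder.h>

    static u32 ex_node_hash(const u8 id[16])
    {
        __be32 hash = 0;
        int i;

        /* illustrative fold of the 128-bit identity into 32 bits */
        for (i = 0; i < 16; i += 4)
            hash ^= *(const __be32 *)(id + i);

        /* only an all-zero identity may yield a zero hash */
        if (!hash && memchr_inv(id, 0, 16))
            hash = cpu_to_be32(1); /* fixed fallback value */

        /* host byte order whenever the number is used locally */
        return be32_to_cpu(hash);
    }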
-
Jon Maloy authored
We refactor the tipc_sk_bind() function, so that the lock handling is separated from the logic. We also move some sanity tests earlier in the call chain, to the function tipc_bind(). Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Oleksij Rempel authored
After migration to the shared interrupt support, the KSZ8031 PHY with interrupt support enabled was not able to notify about link status changes. Fixes: 59ca4e58 ("net: phy: micrel: implement generic .handle_interrupt() callback") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20201127123621.31234-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Martin Schiller says: ==================== net/x25: netdev event handling ==================== Link: https://lore.kernel.org/r/20201126063557.1283-1-ms@dev.tdt.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Martin Schiller authored
Remove obsolete function x25_kill_by_device(). It's not used any more. Signed-off-by: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Martin Schiller authored
We have to take the actual link state into account to handle restart requests/confirms well. Signed-off-by: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Martin Schiller authored
1. A DTE interface changes immediately to LAPB_STATE_1 and starts sending SABM(E).
2. A DCE interface sends DM N2 times and changes to LAPB_STATE_1 afterwards if there is no response in the meantime.
Signed-off-by: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Martin Schiller authored
This patch allows layer 2 (LAPB) to react to netdev events itself and avoids the detour via layer 3 (X.25); a sketch of the event handling follows below.
1. Establish layer 2 on NETDEV_UP events, if the carrier is already up.
2. Call lapb_disconnect_request() on NETDEV_GOING_DOWN events to signal the peer that the connection will go down (only when the carrier is up).
3. When a NETDEV_DOWN event occurs, clear all queues, enter state LAPB_STATE_0 and stop all timers.
4. The NETDEV_CHANGE event makes it possible to handle carrier loss and detection. In case of carrier loss, clear all queues, enter state LAPB_STATE_0 and stop all timers. In case of carrier detection, we start timer t1 on a DCE interface, and on a DTE interface we change to state LAPB_STATE_1 and start sending SABM(E).
Signed-off-by: Martin Schiller <ms@dev.tdt.de> Acked-by: Xie He <xie.he.0141@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
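A hedged sketch of a netdev notifier implementing the four cases above; the LAPB state transitions themselves are left as comments, since they belong to the LAPB core.

    #include <linux/netdevice.h>
    #include <linux/notifier.h>

    static int ex_lapb_device_event(struct notifier_block *nb,
                                    unsigned long event, void *ptr)
    {
        struct net_device *dev = netdev_notifier_info_to_dev(ptr);

        switch (event) {
        case NETDEV_UP:
            if (netif_carrier_ok(dev)) {
                /* establish layer 2 */
            }
            break;
        case NETDEV_GOING_DOWN:
            if (netif_carrier_ok(dev)) {
                /* lapb_disconnect_request(dev) to warn the peer */
            }
            break;
        case NETDEV_DOWN:
            /* clear queues, enter LAPB_STATE_0, stop all timers */
            break;
        case NETDEV_CHANGE:
            if (!netif_carrier_ok(dev)) {
                /* carrier loss: reset to LAPB_STATE_0 */
            } else {
                /* DCE: start t1; DTE: LAPB_STATE_1, send SABM(E) */
            }
            break;
        }
        return NOTIFY_DONE;
    }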
-
Martin Schiller authored
1. Add/remove x25_link_device on NETDEV_REGISTER/UNREGISTER events, and also on NETDEV_POST_TYPE_CHANGE/NETDEV_PRE_TYPE_CHANGE. This change is needed so that the x25_neigh struct for an interface is already created when the interface shows up, and is kept regardless of whether the interface goes UP or DOWN. This is used in an upcoming commit, where the X.25 params of a neighbour will become configurable through ioctls.
2. The NETDEV_CHANGE event makes it possible to handle carrier loss and detection. If the carrier is lost, clean up everything related to this neighbour by calling x25_link_terminated().
3. Also call x25_link_terminated() for NETDEV_DOWN events and remove the call to x25_clear_forward_by_dev() in x25_route_device_down(), as this is already called by x25_kill_by_neigh(), which gets called by x25_link_terminated().
4. Do nothing for NETDEV_UP and NETDEV_GOING_DOWN events, as these will be handled in layer 2 (LAPB), and layer 3 (X.25) will be informed by layer 2 when the layer 2 link is established and the layer 3 link should be initiated.
Signed-off-by: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Ido Schimmel says: ==================== mlxsw: Update adjacency index more efficiently The device supports an operation that allows the driver to issue one request to update the adjacency index for all the routes in a given virtual router (VR) from the old index and size to new ones. This is useful in case the configuration of a certain nexthop group is updated and its adjacency index changes. Currently, the driver does not use this operation in an efficient manner: it iterates over all the routes using the nexthop group and issues an update request for the VR if it is not the same as the previous VR. Instead, this patch set tracks the VRs in which the nexthop group is used and issues one request for each VR.

Example: 8k IPv6 routes were added in an alternating manner to two VRFs. All the routes are using the same nexthop object ('nhid 1').

Before:

    Performance counter stats for 'ip nexthop replace id 1 via 2001:db8:1::2 dev swp3':

        16,385      devlink:devlink_hwmsg

        4.255933213 seconds time elapsed
        0.000000000 seconds user
        0.666923000 seconds sys

The number of EMAD transactions corresponds to the number of routes using the nexthop group.

After:

    Performance counter stats for 'ip nexthop replace id 1 via 2001:db8:1::2 dev swp3':

        3           devlink:devlink_hwmsg

        0.077655094 seconds time elapsed
        0.000000000 seconds user
        0.076698000 seconds sys

The number of EMAD transactions corresponds to the number of VRFs / VRs.

Patch set overview:
Patch #1 is a fix for a bug introduced in a previous submission. Detected by Coverity.
Patches #2 and #3 are preparations.
Patch #4 tracks the VRs a nexthop group is a member of.
Patch #5 uses the membership tracking from the previous patch to issue one update request per VR.
==================== Link: https://lore.kernel.org/r/20201125193505.1052466-1-idosch@idosch.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
The device supports an operation that allows the driver to issue one request to update the adjacency index for all the routes in a given virtual router (VR) from the old index and size to new ones. This is useful in case the configuration of a certain nexthop group is updated and its adjacency index changes. Currently, the driver does not use this operation in an efficient manner: it iterates over all the routes using the nexthop group and issues an update request for the VR if it is not the same as the previous VR. Instead, use the VR tracking added in the previous patch to update the adjacency index once for each VR currently using the nexthop group.

Example: 8k IPv6 routes were added in an alternating manner to two VRFs. All the routes are using the same nexthop object ('nhid 1').

Before:

    # perf stat -e devlink:devlink_hwmsg --filter='incoming==0' -- ip nexthop replace id 1 via 2001:db8:1::2 dev swp3

    Performance counter stats for 'ip nexthop replace id 1 via 2001:db8:1::2 dev swp3':

        16,385      devlink:devlink_hwmsg

        4.255933213 seconds time elapsed
        0.000000000 seconds user
        0.666923000 seconds sys

The number of EMAD transactions corresponds to the number of routes using the nexthop group.

After:

    # perf stat -e devlink:devlink_hwmsg --filter='incoming==0' -- ip nexthop replace id 1 via 2001:db8:1::2 dev swp3

    Performance counter stats for 'ip nexthop replace id 1 via 2001:db8:1::2 dev swp3':

        3           devlink:devlink_hwmsg

        0.077655094 seconds time elapsed
        0.000000000 seconds user
        0.076698000 seconds sys

The number of EMAD transactions corresponds to the number of VRFs / VRs. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
For each nexthop group, track in which virtual routers (VRs) the group is used. This is going to be used by the next patch to perform a more efficient adjacency index update whenever the group's adjacency index changes. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
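The tracking can be pictured with a small sketch. The types and the device-request helper are hypothetical; the real driver uses its own structures and firmware interface, and the update operation is a single old-to-new adjacency move per VR.

    #include <linux/list.h>
    #include <linux/types.h>

    struct ex_vr_entry {
        struct list_head list;
        u16 vr_id;
        int ref_count; /* routes in this VR using the group */
    };

    struct ex_nh_group {
        struct list_head vr_list; /* VRs currently using the group */
    };

    /* Placeholder for the one-per-VR device request. */
    static int ex_update_vr(u16 vr_id, u32 old_adj, u32 new_adj, u16 count)
    {
        return 0;
    }

    /* One adjacency update request per tracked VR, instead of one
     * request per route.
     */
    static int ex_mass_update(struct ex_nh_group *grp, u32 old_adj,
                              u32 new_adj, u16 count)
    {
        struct ex_vr_entry *vr;
        int err;

        list_for_each_entry(vr, &grp->vr_list, list) {
            err = ex_update_vr(vr->vr_id, old_adj, new_adj, count);
            if (err)
                return err; /* caller rolls back updated VRs */
        }
        return 0;
    }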
-
Ido Schimmel authored
In the rare case where the adjacency pointer cannot be updated for a given virtual router, rollback the operation so that virtual routers that are already using the new index will use the old one again. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
mlxsw_sp_adj_index_mass_update_vr() only needs the virtual router's identifier and protocol, so pass them directly. In a subsequent patch the caller will not have access to the pointer. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-