Commits · 720ccd9a728506ca4721b18a22a2157a9d48ed60 · Kirill Smelkov / linux

29 Jan, 2021 33 commits

nexthop: Assert the invariant that a NH group is of only one type · 720ccd9a

Petr Machata authored Jan 28, 2021

Most of the code that deals with nexthop groups relies on the fact that the
group is of exactly one well-known type. Currently there is only one type,
"mpath", but as more next-hop group types come, it becomes desirable to
have a central place where the setting is validated. Introduce such place
into nexthop_create_group(), such that the check is done before the code
that relies on that invariant is invoked.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

720ccd9a

nexthop: Introduce to struct nh_grp_entry a per-type union · b9bae61b

Petr Machata authored Jan 28, 2021

The values that a next-hop group needs to keep track of depend on the group
type. Introduce a union to separate fields specific to the mpath groups
from fields specific to other group types.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

b9bae61b

nexthop: Dispatch nexthop_select_path() by group type · 79bc55e3

Petr Machata authored Jan 28, 2021

The logic for selecting path depends on the next-hop group type. Adapt the
nexthop_select_path() to dispatch according to the group type.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

79bc55e3

nexthop: Rename nexthop_free_mpath · 5d1f0f09

David Ahern authored Jan 28, 2021

nexthop_free_mpath really should be nexthop_free_group. Rename it.
Signed-off-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

5d1f0f09

Merge branch 'net-iucv-updates-2021-01-28' · 4915a404

Jakub Kicinski authored Jan 28, 2021

Julian Wiedmann says:

====================
net/iucv: updates 2021-01-28

This reworks & simplifies the TX notification path in af_iucv, so that we
can send out SG skbs over TRANS_HIPER sockets. Also remove a noisy
WARN_ONCE() in the RX path.
====================

Link: https://lore.kernel.org/r/20210128114108.39409-1-jwi@linux.ibm.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

4915a404

net/af_iucv: build SG skbs for TRANS_HIPER sockets · 2c3b4456

Julian Wiedmann authored Jan 28, 2021

The TX path no longer falls apart when some of its SG skbs are later
linearized by lower layers of the stack. So enable the use of SG skbs
in iucv_sock_sendmsg() again.

This effectively reverts
commit dc5367bc ("net/af_iucv: don't use paged skbs for TX on HiperSockets").
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2c3b4456

net/af_iucv: don't track individual TX skbs for TRANS_HIPER sockets · 80bc97aa

Julian Wiedmann authored Jan 28, 2021

Stop maintaining the skb_send_q list for TRANS_HIPER sockets.

Not only is it extra overhead, but keeping around a list of skb clones
means that we later also have to match the ->sk_txnotify() calls
against these clones and free them accordingly.
The current matching logic (comparing the skbs' shinfo location) is
frustratingly fragile, and breaks if the skb's head is mangled in any
sort of way while passing from dev_queue_xmit() to the device's
HW queue.

Also adjust the interface for ->sk_txnotify(), to make clear that we
don't actually care about any skb internals.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

80bc97aa

net/af_iucv: count packets in the xmit path · ef6af7bd

Julian Wiedmann authored Jan 28, 2021

The TX code keeps track of all skbs that are in-flight but haven't
actually been sent out yet. For native IUCV sockets that's not a huge
deal, but with TRANS_HIPER sockets it would be much better if we
didn't need to maintain a list of skb clones.

Note that we actually only care about the _count_ of skbs in this stage
of the TX pipeline. So as prep work for removing the skb tracking on
TRANS_HIPER sockets, keep track of the skb count in a separate variable
and pair any list {enqueue, unlink} with a count {increment, decrement}.

Then replace all occurences where we currently look at the skb list's
fill level.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ef6af7bd

net/af_iucv: don't lookup the socket on TX notification · c464444f

Julian Wiedmann authored Jan 28, 2021

Whoever called iucv_sk(sk)->sk_txnotify() must already know that they're
dealing with an af_iucv socket.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

c464444f

net/af_iucv: remove WARN_ONCE on malformed RX packets · 27e9c1de

Alexander Egorenkov authored Jan 28, 2021

syzbot reported the following finding:

AF_IUCV failed to receive skb, len=0
WARNING: CPU: 0 PID: 522 at net/iucv/af_iucv.c:2039 afiucv_hs_rcv+0x174/0x190 net/iucv/af_iucv.c:2039
CPU: 0 PID: 522 Comm: syz-executor091 Not tainted 5.10.0-rc1-syzkaller-07082-g55027a88ec9f #0
Hardware name: IBM 3906 M04 701 (KVM/Linux)
Call Trace:
 [<00000000b87ea538>] afiucv_hs_rcv+0x178/0x190 net/iucv/af_iucv.c:2039
([<00000000b87ea534>] afiucv_hs_rcv+0x174/0x190 net/iucv/af_iucv.c:2039)
 [<00000000b796533e>] __netif_receive_skb_one_core+0x13e/0x188 net/core/dev.c:5315
 [<00000000b79653ce>] __netif_receive_skb+0x46/0x1c0 net/core/dev.c:5429
 [<00000000b79655fe>] netif_receive_skb_internal+0xb6/0x220 net/core/dev.c:5534
 [<00000000b796ac3a>] netif_receive_skb+0x42/0x318 net/core/dev.c:5593
 [<00000000b6fd45f4>] tun_rx_batched.isra.0+0x6fc/0x860 drivers/net/tun.c:1485
 [<00000000b6fddc4e>] tun_get_user+0x1c26/0x27f0 drivers/net/tun.c:1939
 [<00000000b6fe0f00>] tun_chr_write_iter+0x158/0x248 drivers/net/tun.c:1968
 [<00000000b4f22bfa>] call_write_iter include/linux/fs.h:1887 [inline]
 [<00000000b4f22bfa>] new_sync_write+0x442/0x648 fs/read_write.c:518
 [<00000000b4f238fe>] vfs_write.part.0+0x36e/0x5d8 fs/read_write.c:605
 [<00000000b4f2984e>] vfs_write+0x10e/0x148 fs/read_write.c:615
 [<00000000b4f29d0e>] ksys_write+0x166/0x290 fs/read_write.c:658
 [<00000000b8dc4ab4>] system_call+0xe0/0x28c arch/s390/kernel/entry.S:415
Last Breaking-Event-Address:
 [<00000000b8dc64d4>] __s390_indirect_jump_r14+0x0/0xc

Malformed RX packets shouldn't generate any warnings because
debugging info already flows to dropmon via the kfree_skb().
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

27e9c1de

Merge branch 's390-qeth-updates-2021-01-28' · 14a6daf3

Jakub Kicinski authored Jan 28, 2021

Julian Wiedmann says:

====================
s390/qeth: updates 2021-01-28

Nothing special, mostly fine-tuning and follow-on cleanups for earlier fixes.
====================

Link: https://lore.kernel.org/r/20210128112551.18780-1-jwi@linux.ibm.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

14a6daf3

s390/qeth: don't fake a TX completion interrupt after TX error · d6e51503

Julian Wiedmann authored Jan 28, 2021

When do_qdio() returns with an unexpected error, qeth_flush_buffers()
kicks off a recovery action.

In such a case there's no point in starting TX completion processing,
the device gets torn down anyway. So take a closer look at do_qdio()'s
return value, and skip the TX completion processing accordingly.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

d6e51503

s390/qeth: make cast type selection for af_iucv skbs robust · a667fee1

Julian Wiedmann authored Jan 28, 2021

As part of the TX queue selection for af_iucv skbs,
qeth_l3_get_cast_type_rcu() ends up calling qeth_get_ether_cast_type().
Which is rather fragile, since such skbs don't have a proper ETH header
and we rely on it being zeroed out in the right places. Add a separate
case for ETH_P_AF_IUCV instead that does the right thing.

When later building the HW header for such skbs, don't hard-code the
cast type but follow the same path as for other protocol types. Here
the cast type should naturally come from the skb's queue mapping.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

a667fee1

s390/qeth: pass proto to qeth_l3_get_cast_type() · c61dff3c

Julian Wiedmann authored Jan 28, 2021

qeth_l3_hard_start_xmit() already determined the skb's proto. Avoid
doing so a second time when it calls qeth_l3_get_cast_type().
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

c61dff3c

s390/qeth: remove qeth_get_ip_version() · 17f3a8b5

Julian Wiedmann authored Jan 28, 2021

Replace our home-grown helper with the more robust vlan_get_protocol().
This is pretty much a 1:1 replacement, we just need to pass around a
proper ETH_P_* everyhwere and convert the old value range.

For readability also convert the protocol checks in
qeth_l3_hard_start_xmit() to a switch statement.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

17f3a8b5

s390/qeth: clean up load/remove code for disciplines · ea12f1b3

Julian Wiedmann authored Jan 28, 2021

We have two usage patterns:
1. get & ->setup() a new discipline, or
2. ->remove() & put the currently loaded one.

Add corresponding helpers that hide the internals & error handling.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ea12f1b3

Merge branch 'net-ipa-hardware-pipeline-cleanup-fixes' · 699e4bc8

Jakub Kicinski authored Jan 28, 2021

Alex Elder says:

====================
net: ipa: hardware pipeline cleanup fixes

There is a procedure currently referred to as a "tag process" that
is performed to clear the IPA hardware pipeline--either at the time
of a modem crash, or when suspending modem GSI channels.

One thing done in this procedure is issuing a command that sends a
data packet originating from the AP->command TX endpoint, destined
for the AP<-LAN RX (default) endpoint.  And although we currently
wait for the send to complete, we do *not* wait for the packet to be
received.  But the pipeline can't be assumed clear until we have
actually received this packet.

This series addresses this by detecting when the pipeline-clearing
packet has been received, and using a completion to allow a waiter
to know when that has happened.  This uses the IPA status capability
(which sends an extra status buffer for certain packets).  It also
uses the ability to supply a "tag" with a packet, which will be
delivered with the packet's status buffer.  We tag the data packet
that's sent to clear the pipeline, and use the receipt of a status
buffer associated with a tagged packet to determine when that packet
has arrived.

"Tag status" just desribes one aspect of this procedure, so some
symbols are renamed to be more like "pipeline clear" so they better
describe the larger purpose.  Finally, two functions used in this
code don't use their arguments, so those arguments are removed.
====================

Link: https://lore.kernel.org/r/20210126185703.29087-1-elder@linaro.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

699e4bc8

net: ipa: don't pass size to ipa_cmd_transfer_add() · 070740d3

Alex Elder authored Jan 26, 2021

The only time we transfer data (rather than issuing a command) out
of the AP->command TX endpoint is when we're clearing the hardware
pipeline.  All that's needed is a "small" data buffer, and its
contents aren't even important.

For convenience, we just transfer a command structure in this case
(it's already mapped for DMA).  The TRE is added to a transaction
using ipa_cmd_ip_tag_status_add(), but we ignore the size value
provided to that function.  So just get rid of the size argument.
Signed-off-by: Alex Elder <elder@linaro.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

070740d3

net: ipa: don't pass tag value to ipa_cmd_ip_tag_status_add() · 792b75b1

Alex Elder authored Jan 26, 2021

We only send a tagged packet from the AP->command TX endpoint when
we're clearing the hardware pipeline.  And when we receive the
tagged packet we don't care what the actual tag value is.

Stop passing a tag value to ipa_cmd_ip_tag_status_add(), and just
encode 0 as the tag sent.  Fix the function that encodes the tag so
it uses the proper byte ordering.
Signed-off-by: Alex Elder <elder@linaro.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

792b75b1

net: ipa: signal when tag transfer completes · 51c48ce2

Alex Elder authored Jan 26, 2021

There are times, such as when the modem crashes, when we issue
commands to clear the IPA hardware pipeline.  These commands include
a data transfer command that delivers a small packet directly to the
default (AP<-LAN RX) endpoint.

The places that do this wait for the transactions that contain these
commands to complete, but the pipeline can't be assumed clear until
the sent packet has been *received*.

The small transfer will be delivered with a status structure, and
that status will indicate its tag is valid.  This is the only place
we send a tagged packet, so we use the tag to determine when the
pipeline clear packet has arrived.

Add a completion to the IPA structure to to be used to signal
the receipt of a pipeline clear packet.  Create a new function
ipa_cmd_pipeline_clear_wait() that will wait for that completion.

Reinitialize the completion whenever pipeline clear commands are
added to a transaction.  Extend ipa_endpoint_status_tag() to check
whether a packet whose status contains a valid tag was sent from the
AP->command TX endpoint, and if so, signal the new IPA completion.

Have all callers of ipa_cmd_pipeline_clear_add() wait for the
pipeline clear indication after the transaction that clears the
pipeline has completed.
Signed-off-by: Alex Elder <elder@linaro.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

51c48ce2

net: ipa: drop packet if status has valid tag · f6aba7b5

Alex Elder authored Jan 26, 2021

Introduce ipa_endpoint_status_tag(), which returns true if received
status indicates its tag field is valid.  The endpoint parameter is
not yet used.

Call this from ipa_status_drop_packet(), and drop the packet if the
status indicates the tag was valid.  Pass the endpoint pointer to
ipa_status_drop_packet(), and rename it ipa_endpoint_status_drop().
The endpoint will be used in the next patch.
Signed-off-by: Alex Elder <elder@linaro.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

f6aba7b5

net: ipa: minor update to handling of packet with status · 162fbc6f

Alex Elder authored Jan 26, 2021

Rearrange some comments and assignments made when handling a packet
that is received with status, aiming to improve understandability.

Use DIV_ROUND_CLOSEST() to get a better per-packet true size estimate.
Signed-off-by: Alex Elder <elder@linaro.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

162fbc6f

net: ipa: rename "tag status" symbols · aa56e3e5

Alex Elder authored Jan 26, 2021

There is a set of functions and symbols related to performing
"tag_process" immediate commands to clear the IPA pipeline.  The
name is related to one of the commands issued when doing this, but
it doesn't really convey the overall purpose of taking this action.

The purpose is to take some steps to "clear out" the hardware
pipeline, and to wait until that process completes, to ensure the
IPA hardware is in a well-defined state.

Rename these symbols to use "pipeline_clear" in their names instead.
Add some comments to explain a bit more about what's going on.
Signed-off-by: Alex Elder <elder@linaro.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

aa56e3e5

net: adjust net_device layout for cacheline usage · 28af22c6

Jesper Dangaard Brouer authored Jan 26, 2021

The current layout of net_device is not optimal for cacheline usage.

The member adj_list.lower linked list is split between cacheline 2 and 3.
The ifindex is placed together with stats (struct net_device_stats),
although most modern drivers don't update this stats member.

The members netdev_ops, mtu and hard_header_len are placed on three
different cachelines. These members are accessed for XDP redirect into
devmap, which were noticeably with perf tool. When not using the map
redirect variant (like TC-BPF does), then ifindex is also used, which is
placed on a separate fourth cacheline. These members are also accessed
during forwarding with regular network stack. The members priv_flags and
flags are on fast-path for network stack transmit path in __dev_queue_xmit
(currently located together with mtu cacheline).

This patch creates a read mostly cacheline, with the purpose of keeping the
above mentioned members on the same cacheline.

Some netdev_features_t members also becomes part of this cacheline, which is
on purpose, as function netif_skb_features() is on fast-path via
validate_xmit_skb().
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/r/161168277983.410784.12401225493601624417.stgit@firesoulSigned-off-by: Jakub Kicinski <kuba@kernel.org>

28af22c6

lan743x: fix endianness when accessing descriptors · 46251282

Alexey Denisov authored Jan 28, 2021

TX/RX descriptor ring fields are always little-endian, but conversion
wasn't performed for big-endian CPUs, so the driver failed to work.

This patch makes the driver work on big-endian CPUs. It was tested and
confirmed to work on NXP P1010 processor (PowerPC).
Signed-off-by: Alexey Denisov <rtgbnm@gmail.com>
Link: https://lore.kernel.org/r/20210128044859.280219-1-rtgbnm@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

46251282

e100: switch from 'pci_' to 'dma_' API · 4140ff1b

Christophe JAILLET authored Jan 28, 2021

The wrappers in include/linux/pci-dma-compat.h should go away.

The patch has been generated with the coccinelle script below and has been
hand modified to replace GFP_ with a correct flag.
It has been compile tested.

When memory is allocated in 'e100_alloc()', GFP_KERNEL can be used because
it is only called from the probe function and no lock is acquired.

@@
@@
-    PCI_DMA_BIDIRECTIONAL
+    DMA_BIDIRECTIONAL

@@
@@
-    PCI_DMA_TODEVICE
+    DMA_TO_DEVICE

@@
@@
-    PCI_DMA_FROMDEVICE
+    DMA_FROM_DEVICE

@@
@@
-    PCI_DMA_NONE
+    DMA_NONE

@@
expression e1, e2, e3;
@@
-    pci_alloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3;
@@
-    pci_zalloc_consistent(e1, e2, e3)
+    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

@@
expression e1, e2, e3, e4;
@@
-    pci_free_consistent(e1, e2, e3, e4)
+    dma_free_coherent(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_single(e1, e2, e3, e4)
+    dma_map_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_single(e1, e2, e3, e4)
+    dma_unmap_single(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4, e5;
@@
-    pci_map_page(e1, e2, e3, e4, e5)
+    dma_map_page(&e1->dev, e2, e3, e4, e5)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_page(e1, e2, e3, e4)
+    dma_unmap_page(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_map_sg(e1, e2, e3, e4)
+    dma_map_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_unmap_sg(e1, e2, e3, e4)
+    dma_unmap_sg(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
+    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_single_for_device(e1, e2, e3, e4)
+    dma_sync_single_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
+    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)

@@
expression e1, e2, e3, e4;
@@
-    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
+    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)

@@
expression e1, e2;
@@
-    pci_dma_mapping_error(e1, e2)
+    dma_mapping_error(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_dma_mask(e1, e2)
+    dma_set_mask(&e1->dev, e2)

@@
expression e1, e2;
@@
-    pci_set_consistent_dma_mask(e1, e2)
+    dma_set_coherent_mask(&e1->dev, e2)
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Link: https://lore.kernel.org/r/20210128210736.749724-1-christophe.jaillet@wanadoo.frSigned-off-by: Jakub Kicinski <kuba@kernel.org>

4140ff1b

net: qmi_wwan: Add pass through mode · 59e139cf

Subash Abhinov Kasiviswanathan authored Jan 25, 2021

Pass through mode is to allow packets in MAP format to be passed
on to the stack. rmnet driver can be used to process and demultiplex
these packets.

Pass through mode can be enabled when the device is in raw ip mode only.
Conversely, raw ip mode cannot be disabled when pass through mode is
enabled.

Userspace can use pass through mode in conjunction with rmnet driver
through the following steps-

1. Enable raw ip mode on qmi_wwan device
2. Enable pass through mode on qmi_wwan device
3. Create a rmnet device with qmi_wwan device as real device using netlink
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Acked-by: Bjørn Mork <bjorn@mork.no>
Link: https://lore.kernel.org/r/1611560015-20034-1-git-send-email-subashab@codeaurora.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

59e139cf

Merge branch 'net-usb-qmi_wwan-new-mux_id-sysfs-file' · bbe25b7d

Jakub Kicinski authored Jan 28, 2021

Daniele Palmas says:

====================
net: usb: qmi_wwan: new mux_id sysfs file

this patch series add a sysfs file to let userspace know which mux
id has been used to create a qmimux network interface.

I'm aware that adding new sysfs files is not usually the right path,
but my understanding is that this piece of information can't be
retrieved in any other way and its absence restricts how
userspace application (e.g. like libqmi) can take advantage of the
qmimux implementation in qmi_wwan.
====================

Link: https://lore.kernel.org/r/20210127153433.12237-1-dnlplm@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

bbe25b7d

net: qmi_wwan: document qmap/mux_id sysfs file · b4b91e24

Daniele Palmas authored Jan 27, 2021

Document qmap/mux_id sysfs file showing qmimux interface id
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

b4b91e24

net: usb: qmi_wwan: add qmap id sysfs file for qmimux interfaces · e594ad98

Daniele Palmas authored Jan 27, 2021

Add qmimux interface sysfs file qmap/mux_id to show qmap id set
during the interface creation, in order to provide a method for
userspace to associate QMI control channels to network interfaces.
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Aleksander Morgado <aleksander@aleksander.es>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

e594ad98

ipvlan: remove h from printk format specifier · d7a177ea

Tom Rix authored Jan 24, 2021

This change fixes the checkpatch warning described in this commit
commit cbacb5ab ("docs: printk-formats: Stop encouraging use of
unnecessary %h[xudi] and %hh[xudi]")

Standard integer promotion is already done and %hx and %hhx is useless
so do not encourage the use of %hh[xudi] or %h[xudi].

Cleanup output to use __func__ over explicit function strings.
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20210124190804.1964580-1-trix@redhat.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

d7a177ea

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · c358f952

Jakub Kicinski authored Jan 28, 2021

drivers/net/can/dev.c
  b552766c ("can: dev: prevent potential information leak in can_fill_info()")
  3e77f70e ("can: dev: move driver related infrastructure into separate subdir")
  0a042c6e ("can: dev: move netlink related code into seperate file")

  Code move.

drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
  57ac4a31 ("net/mlx5e: Correctly handle changing the number of queues when the interface is down")
  214baf22 ("net/mlx5e: Support HTB offload")

  Adjacent code changes

net/switchdev/switchdev.c
  20776b46 ("net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP")
  ffb68fc5 ("net: switchdev: remove the transaction structure from port object notifiers")
  bae33f2b ("net: switchdev: remove the transaction structure from port attributes")

  Transaction parameter gets dropped otherwise keep the fix.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

c358f952

Merge tag 'mlx5-updates-2021-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 24a790da

Jakub Kicinski authored Jan 28, 2021

Saeed Mahameed says:

====================
mlx5 subfunction support

Parav Pandit says:

This patchset introduces support for mlx5 subfunction (SF).

A subfunction is a lightweight function that has a parent PCI function on
which it is deployed. mlx5 subfunction has its own function capabilities
and its own resources. This means a subfunction has its own dedicated
queues(txq, rxq, cq, eq). These queues are neither shared nor stolen from
the parent PCI function.

When subfunction is RDMA capable, it has its own QP1, GID table and rdma
resources neither shared nor stolen from the parent PCI function.

A subfunction has dedicated window in PCI BAR space that is not shared
with the other subfunctions or parent PCI function. This ensures that all
class devices of the subfunction accesses only assigned PCI BAR space.

A Subfunction supports eswitch representation through which it supports tc
offloads. User must configure eswitch to send/receive packets from/to
subfunction port.

Subfunctions share PCI level resources such as PCI MSI-X IRQs with
their other subfunctions and/or with its parent PCI function.

Subfunction support is discussed in detail in RFC [1] and [2].
RFC [1] and extension [2] describes requirements, design and proposed
plumbing using devlink, auxiliary bus and sysfs for systemd/udev
support. Functionality of this patchset is best explained using real
examples further below.

overview:
--------
A subfunction can be created and deleted by a user using devlink port
add/delete interface.

A subfunction can be configured using devlink port function attribute
before its activated.

When a subfunction is activated, it results in an auxiliary device on
the host PCI device where it is deployed. A driver binds to the
auxiliary device that further creates supported class devices.

example subfunction usage sequence:
-----------------------------------
Change device to switchdev mode:
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

Add a devlink port of subfunction flavour:
$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88

Configure mac address of the port function:
$ devlink port function set ens2f0npf0sf88 hw_addr 00:00:00:00:88:88

Now activate the function:
$ devlink port function set ens2f0npf0sf88 state active

Now use the auxiliary device and class devices:
$ devlink dev show
pci/0000:06:00.0
auxiliary/mlx5_core.sf.4

$ ip link show
127: ens2f0np0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 24:8a:07:b3:d1:12 brd ff:ff:ff:ff:ff:ff
    altname enp6s0f0np0
129: p0sf88: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:88:88 brd ff:ff:ff:ff:ff:ff

$ rdma dev show
43: rdmap6s0f0: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d112 sys_image_guid 248a:0703:00b3:d112
44: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112

After use inactivate the function:
$ devlink port function set ens2f0npf0sf88 state inactive

Now delete the subfunction port:
$ devlink port del ens2f0npf0sf88

[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/
[2] https://marc.info/?l=linux-netdev&m=158555928517777&w=2

=================

* tag 'mlx5-updates-2021-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Add devlink subfunction port documentation
  devlink: Extend devlink port documentation for subfunctions
  devlink: Add devlink port documentation
  net/mlx5: SF, Port function state change support
  net/mlx5: SF, Add port add delete functionality
  net/mlx5: E-switch, Add eswitch helpers for SF vport
  net/mlx5: E-switch, Prepare eswitch to handle SF vport
  net/mlx5: SF, Add auxiliary device driver
  net/mlx5: SF, Add auxiliary device support
  net/mlx5: Introduce vhca state event notifier
  devlink: Support get and set state of port function
  devlink: Support add and delete devlink port
  devlink: Introduce PCI SF port flavour and port attribute
  devlink: Prepare code to fill multiple port function attributes
====================

Link: https://lore.kernel.org/r/20210122193658.282884-1-saeed@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

24a790da

28 Jan, 2021 7 commits

Merge tag 'net-5.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 909b447d

Linus Torvalds authored Jan 28, 2021

Pull networking fixes from Jakub Kicinski:
 "Networking fixes including fixes from can, xfrm, wireless,
  wireless-drivers and netfilter trees. Nothing scary, Intel
  WiFi-related fixes seemed most notable to the users.

  Current release - regressions:

   - dsa: microchip: ksz8795: fix KSZ8794 port map again to program the
     CPU port correctly

  Current release - new code bugs:

   - iwlwifi: pcie: reschedule in long-running memory reads

  Previous releases - regressions:

   - iwlwifi: dbg: don't try to overwrite read-only FW data

   - iwlwifi: provide gso_type to GSO packets

   - octeontx2: make sure the buffer is 128 byte aligned

   - tcp: make TCP_USER_TIMEOUT accurate for zero window probes

   - xfrm: fix wraparound in xfrm_policy_addr_delta()

   - xfrm: fix oops in xfrm_replay_advance_bmp due to a race between
     CPUs in presence of packet reorder

   - tcp: fix TLP timer not set when CA_STATE changes from DISORDER to
     OPEN

   - wext: fix NULL-ptr-dereference with cfg80211's lack of commit()

  Previous releases - always broken:

   - igc: fix link speed advertising

   - stmmac: configure EHL PSE0 GbE and PSE1 GbE to 32 bits DMA
     addressing

   - team: protect features update by RCU to avoid deadlock

   - xfrm: fix disable_xfrm sysctl when used on xfrm interfaces
     themselves

   - fec: fix temporary RMII clock reset on link up

   - can: dev: prevent potential information leak in can_fill_info()

  Misc:

   - mrp: fix bad packing of MRP test packet structures

   - uapi: fix big endian definition of ipv6_rpl_sr_hdr

   - add David Ahern to IPv4/IPv6 maintainers"

* tag 'net-5.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (86 commits)
  rxrpc: Fix memory leak in rxrpc_lookup_local
  mlxsw: spectrum_span: Do not overwrite policer configuration
  selftests: forwarding: Specify interface when invoking mausezahn
  stmmac: intel: Configure EHL PSE0 GbE and PSE1 GbE to 32 bits DMA addressing
  net: usb: cdc_ether: added support for Thales Cinterion PLSx3 modem family.
  ibmvnic: Ensure that CRQ entry read are correctly ordered
  MAINTAINERS: add missing header for bonding
  net: decnet: fix netdev refcount leaking on error path
  net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP
  can: dev: prevent potential information leak in can_fill_info()
  net: fec: Fix temporary RMII clock reset on link up
  net: lapb: Add locking to the lapb module
  team: protect features update by RCU to avoid deadlock
  MAINTAINERS: add David Ahern to IPv4/IPv6 maintainers
  net/mlx5: CT: Fix incorrect removal of tuple_nat_node from nat rhashtable
  net/mlx5e: Revert parameters on errors when changing MTU and LRO state without reset
  net/mlx5e: Revert parameters on errors when changing trust state without reset
  net/mlx5e: Correctly handle changing the number of queues when the interface is down
  net/mlx5e: Fix CT rule + encap slow path offload and deletion
  net/mlx5e: Disable hw-tc-offload when MLX5_CLS_ACT config is disabled
  ...

909b447d

Merge branch 'net-sfp-add-support-for-gpon-rtl8672-rtl9601c-and-ubiquiti-u-fiber' · 32e31b78

Jakub Kicinski authored Jan 28, 2021

Pali Rohár says:

====================
net: sfp: add support for GPON RTL8672/RTL9601C and Ubiquiti U-Fiber

This is fourth version of patches which add workarounds for
RTL8672/RTL9601C EEPROMs and Ubiquiti U-Fiber Instant SFP.
====================

Link: https://lore.kernel.org/r/20210125150228.8523-1-pali@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

32e31b78

net: sfp: add mode quirk for GPON module Ubiquiti U-Fiber Instant · f0b4f847

Pali Rohár authored Jan 25, 2021

The Ubiquiti U-Fiber Instant SFP GPON module has nonsensical information
stored in its EEPROM. It claims to support all transceiver types including
10G Ethernet. Clear all claimed modes and set only 1000baseX_Full, which is
the only one supported.

This module has also phys_id set to SFF, and the SFP subsystem currently
does not allow to use SFP modules detected as SFFs. Add exception for this
module so it can be detected as supported.

This change finally allows to detect and use SFP GPON module Ubiquiti
U-Fiber Instant on Linux system.

EEPROM content of this SFP module is (where XX is serial number):

00: 02 04 0b ff ff ff ff ff ff ff ff 03 0c 00 14 c8 ???........??.??
10: 00 00 00 00 55 42 4e 54 20 20 20 20 20 20 20 20 ....UBNT
20: 20 20 20 20 00 18 e8 29 55 46 2d 49 4e 53 54 41 .??)UF-INSTA
30: 4e 54 20 20 20 20 20 20 34 20 20 20 05 1e 00 36 NT 4 ??.6
40: 00 06 00 00 55 42 4e 54 XX XX XX XX XX XX XX XX .?..UBNTXXXXXXXX
50: 20 20 20 20 31 34 30 31 32 33 20 20 60 80 02 41 140123 `??A
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

f0b4f847

net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips · 426c6cbc

Pali Rohár authored Jan 25, 2021

The workaround for VSOL V2801F brand based GPON SFP modules added in commit
0d035bed ("net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0
workaround") works only for IDs added explicitly to the list. Since there
are rebranded modules where OEM vendors put different strings into the
vendor name field, we cannot base workaround on IDs only.

Moreover the issue which the above mentioned commit tried to work around is
generic not only to VSOL based modules, but rather to all GPON modules
based on Realtek RTL8672 and RTL9601C chips.

These include at least the following GPON modules:
* V-SOL V2801F
* C-Data FD511GX-RM0
* OPTON GP801R
* BAUDCOM BD-1234-SFM
* CPGOS03-0490 v2.0
* Ubiquiti U-Fiber Instant
* EXOT EGS1

These Realtek chips have broken EEPROM emulator which for N-byte read
operation returns just the first byte of EEPROM data, followed by N-1
zeros.

Introduce a new function, sfp_id_needs_byte_io(), which detects SFP modules
with broken EEPROM emulator based on N-1 zeros and switch to 1 byte EEPROM
reading operation.

Function sfp_i2c_read() now always uses single byte reading when it is
required and when function sfp_hwmon_probe() detects single byte access,
it disables registration of hwmon device, because in this case we cannot
reliably and atomically read 2 bytes as is required by the standard for
retrieving values from diagnostic area.

(These Realtek chips are broken in a way that violates SFP standards for
diagnostic interface. Kernel in this case simply cannot do anything less
of skipping registration of the hwmon interface.)

This patch fixes reading of EEPROM content from SFP modules based on
Realtek RTL8672 and RTL9601C chips. Diagnostic interface of EEPROM stays
broken and cannot be fixed.

Fixes: 0d035bed ("net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0 workaround")
Co-developed-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

426c6cbc

rxrpc: Fix memory leak in rxrpc_lookup_local · b8323f72

Takeshi Misawa authored Jan 28, 2021

Commit 9ebeddef ("rxrpc: rxrpc_peer needs to hold a ref on the rxrpc_local record")
Then release ref in __rxrpc_put_peer and rxrpc_put_peer_locked.

	struct rxrpc_peer *rxrpc_alloc_peer(struct rxrpc_local *local, gfp_t gfp)
	-               peer->local = local;
	+               peer->local = rxrpc_get_local(local);

rxrpc_discard_prealloc also need ref release in discarding.

syzbot report:
BUG: memory leak
unreferenced object 0xffff8881080ddc00 (size 256):
  comm "syz-executor339", pid 8462, jiffies 4294942238 (age 12.350s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 0a 00 00 00 00 c0 00 08 81 88 ff ff  ................
  backtrace:
    [<000000002b6e495f>] kmalloc include/linux/slab.h:552 [inline]
    [<000000002b6e495f>] kzalloc include/linux/slab.h:682 [inline]
    [<000000002b6e495f>] rxrpc_alloc_local net/rxrpc/local_object.c:79 [inline]
    [<000000002b6e495f>] rxrpc_lookup_local+0x1c1/0x760 net/rxrpc/local_object.c:244
    [<000000006b43a77b>] rxrpc_bind+0x174/0x240 net/rxrpc/af_rxrpc.c:149
    [<00000000fd447a55>] afs_open_socket+0xdb/0x200 fs/afs/rxrpc.c:64
    [<000000007fd8867c>] afs_net_init+0x2b4/0x340 fs/afs/main.c:126
    [<0000000063d80ec1>] ops_init+0x4e/0x190 net/core/net_namespace.c:152
    [<00000000073c5efa>] setup_net+0xde/0x2d0 net/core/net_namespace.c:342
    [<00000000a6744d5b>] copy_net_ns+0x19f/0x3e0 net/core/net_namespace.c:483
    [<0000000017d3aec3>] create_new_namespaces+0x199/0x4f0 kernel/nsproxy.c:110
    [<00000000186271ef>] unshare_nsproxy_namespaces+0x9b/0x120 kernel/nsproxy.c:226
    [<000000002de7bac4>] ksys_unshare+0x2fe/0x5c0 kernel/fork.c:2957
    [<00000000349b12ba>] __do_sys_unshare kernel/fork.c:3025 [inline]
    [<00000000349b12ba>] __se_sys_unshare kernel/fork.c:3023 [inline]
    [<00000000349b12ba>] __x64_sys_unshare+0x12/0x20 kernel/fork.c:3023
    [<000000006d178ef7>] do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
    [<00000000637076d4>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 9ebeddef ("rxrpc: rxrpc_peer needs to hold a ref on the rxrpc_local record")
Signed-off-by: Takeshi Misawa <jeliantsurux@gmail.com>
Reported-and-tested-by: syzbot+305326672fed51b205f7@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/161183091692.3506637.3206605651502458810.stgit@warthog.procyon.org.ukSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b8323f72

Merge branch 'mlxsw-various-fixes' · 924b171c

Jakub Kicinski authored Jan 28, 2021

Ido Schimmel says:

====================
mlxsw: Various fixes

Patch #1 fixes wrong invocation of mausezahn in a couple of selftests.
The tests started failing after Fedora updated their libnet package from
version 1.1.6 to 1.2.1. With the fix the tests pass regardless of libnet
version.

Patch #2 fixes an issue in the mirroring to CPU code that results in
policer configuration being overwritten.
====================

Link: https://lore.kernel.org/r/20210128144820.3280295-1-idosch@idosch.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

924b171c

mlxsw: spectrum_span: Do not overwrite policer configuration · b6f6881a

Ido Schimmel authored Jan 28, 2021

The purpose of the delayed work in the SPAN module is to potentially
update the destination port and various encapsulation parameters of SPAN
agents that point to a VLAN device or a GRE tap. The destination port
can change following the insertion of a new route, for example.

SPAN agents that point to a physical port or the CPU port are static and
never change throughout the lifetime of the SPAN agent. Therefore, skip
over them in the delayed work.

This fixes an issue where the delayed work overwrites the policer
that was set on a SPAN agent pointing to the CPU. Modifying the delayed
work to inherit the original policer configuration is error-prone, as
the same will be needed for any new parameter.

Fixes: 4039504e ("mlxsw: spectrum_span: Allow setting policer on a SPAN agent")
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

b6f6881a