- 06 Aug, 2020 1 commit
-
-
Oliver Neukum authored
The driver tries to reuse code for disconnect in case of a failed probe. If resources need to be freed after an error in probe, the netdev must not be freed because it has never been registered. Fix it by telling the helper which path we are in. Signed-off-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
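A minimal userspace sketch of the pattern described above: one teardown helper shared by the probe-error path and disconnect, with a flag telling it which path it is on. `struct demo_dev` and `demo_release()` are hypothetical; the commit does not name the driver's helper, and counters stand in for the real unregister/free calls.

```c
#include <stdbool.h>

/* Hypothetical device state; counters stand in for real teardown calls. */
struct demo_dev {
    int netdev_frees;   /* simulated unregister_netdev() + free_netdev() */
    int buffer_frees;   /* simulated release of other driver resources */
};

/* Shared teardown helper. 'in_probe' is true when called from a failed
 * probe: the netdev was never registered there, so the helper must leave
 * it alone and only release the resources common to both paths. */
static void demo_release(struct demo_dev *d, bool in_probe)
{
    d->buffer_frees++;          /* needed on both paths */
    if (!in_probe)
        d->netdev_frees++;      /* only a registered netdev is torn down here */
}
```

Passing the path explicitly keeps a single cleanup routine while making the one path-dependent step impossible to run by accident.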
-
- 05 Aug, 2020 13 commits
-
-
Stefano Brivio authored
On architectures defining _HAVE_ARCH_IPV6_CSUM, we get csum_ipv6_magic() defined by means of arch checksum.h headers. On other architectures, we actually need to include net/ip6_checksum.h to be able to use it. Without this include, building with defconfig breaks at least for s390. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: 4cb47a86 ("tunnels: PMTU discovery support for directly bridged IP packets") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Willem de Bruijn authored
The msg_zerocopy test pins the sender and receiver threads to separate cores to reduce variance between runs. But it hardcodes the cores and skips core 0, so it fails on machines with the selected cores offline, or simply fewer cores. The test mainly gives code coverage in automated runs. The throughput of zerocopy ('-z') and non-zerocopy runs is logged for manual inspection. Continue even when sched_setaffinity fails. Just log to warn anyone interpreting the data. Fixes: 07b65c5b ("test: add msg_zerocopy test") Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
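A sketch of the warn-and-continue behaviour for Linux userspace, assuming glibc's `sched_setaffinity()`/`sched_getaffinity()` and the `CPU_*` macros. `pin_to_cpu_or_warn()` and `first_allowed_cpu()` are illustrative names, not the selftest's actual code.

```c
#define _GNU_SOURCE /* must precede all #includes for cpu_set_t and sched_setaffinity() */
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Try to pin the calling thread to 'cpu'. On failure, print a warning and
 * return -1 instead of exiting, so the caller can keep running with
 * noisier timing numbers. */
static int pin_to_cpu_or_warn(int cpu)
{
    cpu_set_t mask;

    if (cpu < 0 || cpu >= CPU_SETSIZE) {
        fprintf(stderr, "warning: cpu %d out of range, not pinning\n", cpu);
        return -1;
    }
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        fprintf(stderr, "warning: cannot pin to cpu %d: %s\n",
                cpu, strerror(errno));
        return -1;
    }
    return 0;
}

/* Return the lowest-numbered CPU the calling thread may run on, or -1. */
static int first_allowed_cpu(void)
{
    cpu_set_t mask;

    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &mask))
            return cpu;
    return -1;
}
```

A sender or receiver thread would call `pin_to_cpu_or_warn()` and carry on regardless of the result; the return value is only useful for reporting.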
-
Paolo Abeni authored
Nicolas reported the following oops on some unconventional configurations:

    [ 1521.392541] BUG: kernel NULL pointer dereference, address: 00000000000000c0
    [ 1521.394189] #PF: supervisor read access in kernel mode
    [ 1521.395376] #PF: error_code(0x0000) - not-present page
    [ 1521.396607] PGD 0 P4D 0
    [ 1521.397156] Oops: 0000 [#1] SMP PTI
    [ 1521.398020] CPU: 0 PID: 22986 Comm: kworker/0:2 Not tainted 5.8.0-rc4+ #109
    [ 1521.399618] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
    [ 1521.401728] Workqueue: events mptcp_worker
    [ 1521.402651] RIP: 0010:mptcp_subflow_create_socket+0xf1/0x1c0
    [ 1521.403954] Code: 24 08 89 44 24 04 48 8b 7a 18 e8 2a 48 d4 ff 8b 44 24 04 85 c0 75 7a 48 8b 8b 78 02 00 00 48 8b 54 24 08 48 8d bb 80 00 00 00 <48> 8b 89 c0 00 00 00 48 89 8a c0 00 00 00 48 8b 8b 78 02 00 00 8b
    [ 1521.408201] RSP: 0000:ffffabc4002d3c60 EFLAGS: 00010246
    [ 1521.409433] RAX: 0000000000000000 RBX: ffffa0b9ad8c9a00 RCX: 0000000000000000
    [ 1521.411096] RDX: ffffa0b9ae78a300 RSI: 00000000fffffe01 RDI: ffffa0b9ad8c9a80
    [ 1521.412734] RBP: ffffa0b9adff2e80 R08: ffffa0b9af02d640 R09: ffffa0b9ad923a00
    [ 1521.414333] R10: ffffabc4007139f8 R11: fefefefefefefeff R12: ffffabc4002d3cb0
    [ 1521.415918] R13: ffffa0b9ad91fa58 R14: ffffa0b9ad8c9f9c R15: 0000000000000000
    [ 1521.417592] FS: 0000000000000000(0000) GS:ffffa0b9af000000(0000) knlGS:0000000000000000
    [ 1521.419490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 1521.420839] CR2: 00000000000000c0 CR3: 000000002951e006 CR4: 0000000000160ef0
    [ 1521.422511] Call Trace:
    [ 1521.423103] __mptcp_subflow_connect+0x94/0x1f0
    [ 1521.425376] mptcp_pm_create_subflow_or_signal_addr+0x200/0x2a0
    [ 1521.426736] mptcp_worker+0x31b/0x390
    [ 1521.431324] process_one_work+0x1fc/0x3f0
    [ 1521.432268] worker_thread+0x2d/0x3b0
    [ 1521.434197] kthread+0x117/0x130
    [ 1521.435783] ret_from_fork+0x22/0x30

The MPTCP protocol is trying to create a subflow for an unaccepted server socket. That is allowed by the RFC, even if subflow creation will likely fail. Unaccepted sockets still have a NULL sk_socket field; avoid the issue by failing earlier.

Reported-and-tested-by: Nicolas Rybowski <nicolas.rybowski@tessares.net>
Fixes: 7d14b0d2 ("mptcp: set correct vfs info for subflows")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Po-Hsu Lin says:

====================
selftests: rtnetlink: Fix for false-negative return values

This patchset addresses the false-negative return value issue caused by the following:

1. The return value "ret" in this script is reset to 0 at the beginning of each sub-test in rtnetlink.sh, so the rtnetlink test always passes if the last sub-test has passed.

2. The results of the two sub-tests in kci_test_encap() were not being processed, so they did not affect the final result of this test.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Po-Hsu Lin authored
kci_test_encap() is actually composed of two different sub-tests, kci_test_encap_vxlan() and kci_test_encap_fou(). Therefore we should check the results of these two in kci_test_encap() so that the script is aware of the pass/fail status. Otherwise it will generate a false-negative result like the one below:

    $ sudo ./test.sh
    PASS: policy routing
    PASS: route get
    PASS: preferred_lft addresses have expired
    PASS: promote_secondaries complete
    PASS: tc htb hierarchy
    PASS: gre tunnel endpoint
    PASS: gretap
    PASS: ip6gretap
    PASS: erspan
    PASS: ip6erspan
    PASS: bridge setup
    PASS: ipv6 addrlabel
    PASS: set ifalias 5b193daf-0a08-46d7-af2c-e7aadd422ded for test-dummy0
    PASS: vrf
    PASS: vxlan
    FAIL: can't add fou port 7777, skipping test
    PASS: macsec
    PASS: bridge fdb get
    PASS: neigh get
    $ echo $?
    0

Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Po-Hsu Lin authored
The return value "ret" is reset to 0 at the beginning of each sub-test in rtnetlink.sh, so this test always passes if the last sub-test has passed:

    $ sudo ./rtnetlink.sh
    PASS: policy routing
    PASS: route get
    PASS: preferred_lft addresses have expired
    PASS: promote_secondaries complete
    PASS: tc htb hierarchy
    PASS: gre tunnel endpoint
    PASS: gretap
    PASS: ip6gretap
    PASS: erspan
    PASS: ip6erspan
    PASS: bridge setup
    PASS: ipv6 addrlabel
    PASS: set ifalias a39ee707-e36b-41d3-802f-63179ed4d580 for test-dummy0
    PASS: vrf
    PASS: vxlan
    FAIL: can't add fou port 7777, skipping test
    PASS: macsec
    PASS: ipsec
    3,7c3,7
    < sa[0] spi=0x00000009 proto=0x32 salt=0x64636261 crypt=1
    < sa[0] key=0x31323334 35363738 39303132 33343536
    < sa[1] rx ipaddr=0x00000000 00000000 00000000 c0a87b03
    < sa[1] spi=0x00000009 proto=0x32 salt=0x64636261 crypt=1
    < sa[1] key=0x31323334 35363738 39303132 33343536
    ---
    > sa[0] spi=0x00000009 proto=0x32 salt=0x61626364 crypt=1
    > sa[0] key=0x34333231 38373635 32313039 36353433
    > sa[1] rx ipaddr=0x00000000 00000000 00000000 037ba8c0
    > sa[1] spi=0x00000009 proto=0x32 salt=0x61626364 crypt=1
    > sa[1] key=0x34333231 38373635 32313039 36353433
    FAIL: ipsec_offload incorrect driver data
    FAIL: ipsec_offload
    PASS: bridge fdb get
    PASS: neigh get
    $ echo $?
    0

Make "ret" a local variable in all sub-tests. Also, check the sub-test results in kci_test_rtnl() and return the final result for this test.

Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
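The rtnetlink.sh fixes are in shell, but the underlying rule (keep each sub-test's status local, and fold it into a final result that is never reset) can be sketched in C. The sub-test functions and `run_all()` below are hypothetical stand-ins, not the script's code.

```c
#include <stdio.h>

/* Hypothetical sub-test: returns 0 on pass, non-zero on failure. */
typedef int (*subtest_fn)(void);

/* Run every sub-test and return 1 if any of them failed. */
static int run_all(const subtest_fn *tests, const char *const *names, int n)
{
    int final = 0;                  /* accumulated result, never reset */

    for (int i = 0; i < n; i++) {
        int ret = tests[i]();       /* per-sub-test status, local to this iteration */
        printf("%s: %s\n", ret ? "FAIL" : "PASS", names[i]);
        if (ret)
            final = 1;
    }
    return final;
}

static int test_vxlan(void)  { return 0; }
static int test_fou(void)    { return 1; }  /* e.g. "can't add fou port" */
static int test_macsec(void) { return 0; }
```

With this shape a failure in the middle of the run still determines the exit status, even when the last sub-test passes.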
-
Vladimir Oltean authored
Although we can detect the chip revision 100% at runtime, it is useful to specify it in the device tree compatible string too, because otherwise there would be no way to assess the correctness of device tree bindings statically, without booting a board (only some switch versions have internal RGMII delays and/or an SGMII port).

But for testing the P/Q/R/S support, what I have is a reworked board with the SJA1105T replaced by a pin-compatible SJA1105Q, and I don't want to keep a separate device tree blob just for this one-off board. Since just the chip has been replaced, its RGMII delay setup is inherently the same (meaning: delays added by the PHY on the slave ports, and by PCB traces on the fixed-link CPU port).

For this board, I'd rather have the driver shout at me, but go ahead and use what it found even if it doesn't match what it's been told is there.

    [ 2.970826] sja1105 spi0.1: Device tree specifies chip SJA1105T but found SJA1105Q, please fix it!
    [ 2.980010] sja1105 spi0.1: Probed switch chip: SJA1105Q
    [ 3.005082] sja1105 spi0.1: Enabled switch tagging

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
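The "shout, but trust the hardware" policy can be sketched as a comparison that warns on mismatch and returns the detected chip. `pick_chip()` is a hypothetical helper that compares plain name strings; the driver's real comparison of device tree data against detected part numbers is not shown in the log.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: warn if the device tree disagrees with the detected chip,
 * then use the detected chip anyway. */
static const char *pick_chip(const char *dt_name, const char *detected)
{
    if (strcmp(dt_name, detected) != 0)
        fprintf(stderr,
                "Device tree specifies chip %s but found %s, please fix it!\n",
                dt_name, detected);
    return detected;
}
```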
-
David S. Miller authored
Xin Long says: ==================== net: fix a mcast issue for tipc udp media Patch 1 is to add a function to get the dev by source address, which will be used by Patch 2. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
Without ub->ifindex set for an IPv6 address in tipc_udp_enable(), ipv6_sock_mc_join() may make the wrong dev join the multicast address in enable_mcast(). As a result, TIPC links would never be created. So fix it by getting the right netdev and setting ub->ifindex, as is done for IPv4 addresses. Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
This adds an ip_dev_find()-like function for IPv6, used to find the dev by saddr. It will be used by the TIPC protocol, so also export it. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
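The kernel helper itself is not shown here. As a userspace analogue of "find the device that owns this IPv6 source address", the sketch below walks `getifaddrs()` and maps the matching interface name to an index with `if_nametoindex()`. `ifindex_by_ipv6()` is a hypothetical name and is not the kernel function's signature.

```c
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Userspace analogue: return the ifindex of the interface that owns the
 * IPv6 address 'addr', or 0 if no interface has it. */
static unsigned int ifindex_by_ipv6(const struct in6_addr *addr)
{
    struct ifaddrs *list, *ifa;
    unsigned int idx = 0;

    if (getifaddrs(&list) != 0)
        return 0;
    for (ifa = list; ifa; ifa = ifa->ifa_next) {
        if (!ifa->ifa_addr || ifa->ifa_addr->sa_family != AF_INET6)
            continue;
        const struct sockaddr_in6 *sin6 =
            (const struct sockaddr_in6 *)ifa->ifa_addr;
        if (memcmp(&sin6->sin6_addr, addr, sizeof(*addr)) == 0) {
            idx = if_nametoindex(ifa->ifa_name);
            break;
        }
    }
    freeifaddrs(list);
    return idx;
}
```

The resulting index is the kind of value the TIPC fix stores in ub->ifindex so that the multicast join happens on the right device.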
-
Tonghao Zhang authored
ovs_flow_tbl_destroy() is always called from an RCU callback or an error path, so there is no need to check whether rcu_read_lock or lockdep_ovsl_is_held was held. ovs_dp_cmd_fill_info() is always called with ovs_mutex held, so use rcu_dereference_ovsl instead of rcu_dereference in ovs_flow_tbl_masks_cache_size. Fixes: 9bf24f59 ("net: openvswitch: make masks cache size configurable") Cc: Eelco Chaudron <echaudro@redhat.com> Reported-by: syzbot+c0eb9e7cdde04e4eb4be@syzkaller.appspotmail.com Reported-by: syzbot+f612c02823acb02ff9bc@syzkaller.appspotmail.com Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hangbin Liu authored
This reverts commit 71130f29.

In commit 71130f29 ("vxlan: fix tos value before xmit") we wanted to make sure the tos value is filtered by RT_TOS() based on RFC1349:

       0     1     2     3     4     5     6     7
    +-----+-----+-----+-----+-----+-----+-----+-----+
    |   PRECEDENCE    |          TOS          | MBZ |
    +-----+-----+-----+-----+-----+-----+-----+-----+

But RFC1349 has been obsoleted by RFC2474. The new DSCP field is defined as:

       0     1     2     3     4     5     6     7
    +-----+-----+-----+-----+-----+-----+-----+-----+
    |          DS FIELD, DSCP           | ECN FIELD |
    +-----+-----+-----+-----+-----+-----+-----+-----+

So with

    IPTOS_TOS_MASK      0x1E
    RT_TOS(tos)         ((tos)&IPTOS_TOS_MASK)

the first 3 bits of the DSCP info will get lost.

To take all the DSCP info into account in xmit, we should revert the patch and just push all tos bits to ip_tunnel_ecn_encap(), which will handle the ECN field later.

Fixes: 71130f29 ("vxlan: fix tos value before xmit")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
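The loss of DSCP bits can be checked numerically. The sketch copies the two macro definitions quoted above and, as an assumed example value, uses the TOS byte 0xB8 (DSCP 46, Expedited Forwarding, ECN 0). `dscp_of()` and `ecn_of()` are illustrative helpers following the RFC 2474 layout.

```c
#include <stdint.h>

/* Definitions as quoted in the commit message. */
#define IPTOS_TOS_MASK 0x1E
#define RT_TOS(tos) ((tos) & IPTOS_TOS_MASK)

/* RFC 2474 layout: DSCP in the upper 6 bits, ECN in the lower 2. */
static uint8_t dscp_of(uint8_t tos) { return tos >> 2; }
static uint8_t ecn_of(uint8_t tos)  { return tos & 0x03; }
```

For 0xB8, `dscp_of(0xB8)` is 46, but `RT_TOS(0xB8)` is 0x18, whose DSCP is only 6: the three most significant DSCP bits are masked away, which is why the revert passes the full tos byte on instead.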
-
Vladimir Oltean authored
The way we define the phase (the difference between the time of the signal's rising edge and the closest integer multiple of the period), it doesn't make sense to have a phase value equal to or larger than one period. So deny such settings coming from the user. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
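A sketch of the range check, assuming a seconds/nanoseconds time representation with nanoseconds normalized below one second, similar to the one used by the PTP character-device ioctls. `struct demo_ptp_time` and `phase_is_valid()` are hypothetical names, not the kernel's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical seconds + nanoseconds pair (nsec assumed < 1000000000). */
struct demo_ptp_time {
    int64_t sec;
    uint32_t nsec;
};

/* Accept only phase < period: compare seconds first, then nanoseconds. */
static bool phase_is_valid(struct demo_ptp_time phase,
                           struct demo_ptp_time period)
{
    if (phase.sec != period.sec)
        return phase.sec < period.sec;
    return phase.nsec < period.nsec;
}
```

With a 1 s period, a 0.5 s phase is accepted, while phases of exactly 1 s or 2 s are rejected.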
-
- 04 Aug, 2020 26 commits
-
-
Christophe JAILLET authored
The wrappers in include/linux/pci-dma-compat.h should go away.

The patch has been generated with the coccinelle script below and has been hand modified to replace GFP_ with a correct flag. It has been compile tested.

When memory is allocated in 'fst_add_one()', GFP_KERNEL can be used because it is a probe function and no lock is acquired.

    @@
    @@
    - PCI_DMA_BIDIRECTIONAL
    + DMA_BIDIRECTIONAL

    @@
    @@
    - PCI_DMA_TODEVICE
    + DMA_TO_DEVICE

    @@
    @@
    - PCI_DMA_FROMDEVICE
    + DMA_FROM_DEVICE

    @@
    @@
    - PCI_DMA_NONE
    + DMA_NONE

    @@
    expression e1, e2, e3;
    @@
    - pci_alloc_consistent(e1, e2, e3)
    + dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

    @@
    expression e1, e2, e3;
    @@
    - pci_zalloc_consistent(e1, e2, e3)
    + dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

    @@
    expression e1, e2, e3, e4;
    @@
    - pci_free_consistent(e1, e2, e3, e4)
    + dma_free_coherent(&e1->dev, e2, e3, e4)

    @@
    expression e1, e2, e3, e4;
    @@
    - pci_map_single(e1, e2, e3, e4)
    + dma_map_single(&e1->dev, e2, e3, e4)

    @@
    expression e1, e2, e3, e4;
    @@
    - pci_unmap_single(e1, e2, e3, e4)
    + dma_unmap_single(&e1->dev, e2, e3, e4)

    @@
    expression e1, e2, e3, e4, e5;
    @@
    - pci_map_page(e1, e2, e3, e4, e5)
    + dma_map_page(&e1->dev, e2, e3, e4, e5)

    @@
    expression e1, e2, e3, e4;
    @@
    - pci_unmap_page(e1, e2, e3, e4)
    + dma_unmap_page(&e1->dev, e2, e3, e4)

    @@
    expression e1, e2, e3, e4;
    @@
    - pci_map_sg(e1, e2, e3, e4)
    + dma_map_sg(&e1->dev, e2, e3, e4)

    @@
    expression e1, e2, e3, e4;
    @@
    - pci_unmap_sg(e1, e2, e3, e4)
    + dma_unmap_sg(&e1->dev, e2, e3, e4)

    @@
    expression e1, e2, e3, e4;
    @@
    - pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
    + dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)

    @@
    expression e1, e2, e3, e4;
    @@
    - pci_dma_sync_single_for_device(e1, e2, e3, e4)
    + dma_sync_single_for_device(&e1->dev, e2, e3, e4)

    @@
    expression e1, e2, e3, e4;
    @@
    - pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
    + dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)

    @@
    expression e1, e2, e3, e4;
    @@
    - pci_dma_sync_sg_for_device(e1, e2, e3, e4)
    + dma_sync_sg_for_device(&e1->dev, e2, e3, e4)

    @@
    expression e1, e2;
    @@
    - pci_dma_mapping_error(e1, e2)
    + dma_mapping_error(&e1->dev, e2)

    @@
    expression e1, e2;
    @@
    - pci_set_dma_mask(e1, e2)
    + dma_set_mask(&e1->dev, e2)

    @@
    expression e1, e2;
    @@
    - pci_set_consistent_dma_mask(e1, e2)
    + dma_set_coherent_mask(&e1->dev, e2)

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christophe JAILLET authored
The wrappers in include/linux/pci-dma-compat.h should go away.

The patch has been generated with the coccinelle script below and has been hand modified to replace GFP_ with a correct flag. It has been compile tested.

When memory is allocated in 'wanxl_pci_init_one()', GFP_KERNEL can be used because it is a probe function and no lock is acquired. Moreover, just a few lines above, GFP_KERNEL is already used.

    @@
    @@
    - PCI_DMA_BIDIRECTIONAL
    + DMA_BIDIRECTIONAL

    @@
    @@
    - PCI_DMA_TODEVICE
    + DMA_TO_DEVICE

    @@
    @@
    - PCI_DMA_FROMDEVICE
    + DMA_FROM_DEVICE

    @@
    @@
    - PCI_DMA_NONE
    + DMA_NONE

    @@
    expression e1, e2, e3;
    @@
    - pci_alloc_consistent(e1, e2, e3)
    + dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

    @@
    expression e1, e2, e3;
    @@
    - pci_zalloc_consistent(e1, e2, e3)
    + dma_alloc_coherent(&e1->dev, e2, e3, GFP_)

    @@
    expression e1, e2, e3, e4;
    @@
    - pci_free_consistent(e1, e2, e3, e4)
    + dma_free_coherent(&e1->dev, e2, e3, e4)

    @@
    expression e1, e2, e3, e4;
    @@
    - pci_map_single(e1, e2, e3, e4)
    + dma_map_single(&e1->dev, e2, e3, e4)

    @@
    expression e1, e2, e3, e4;
    @@
    - pci_unmap_single(e1, e2, e3, e4)
    + dma_unmap_single(&e1->dev, e2, e3, e4)

    @@
    expression e1, e2, e3, e4, e5;
    @@
    - pci_map_page(e1, e2, e3, e4, e5)
    + dma_map_page(&e1->dev, e2, e3, e4, e5)

    @@
    expression e1, e2, e3, e4;
    @@
    - pci_unmap_page(e1, e2, e3, e4)
    + dma_unmap_page(&e1->dev, e2, e3, e4)

    @@
    expression e1, e2, e3, e4;
    @@
    - pci_map_sg(e1, e2, e3, e4)
    + dma_map_sg(&e1->dev, e2, e3, e4)

    @@
    expression e1, e2, e3, e4;
    @@
    - pci_unmap_sg(e1, e2, e3, e4)
    + dma_unmap_sg(&e1->dev, e2, e3, e4)

    @@
    expression e1, e2, e3, e4;
    @@
    - pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
    + dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)

    @@
    expression e1, e2, e3, e4;
    @@
    - pci_dma_sync_single_for_device(e1, e2, e3, e4)
    + dma_sync_single_for_device(&e1->dev, e2, e3, e4)

    @@
    expression e1, e2, e3, e4;
    @@
    - pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
    + dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)

    @@
    expression e1, e2, e3, e4;
    @@
    - pci_dma_sync_sg_for_device(e1, e2, e3, e4)
    + dma_sync_sg_for_device(&e1->dev, e2, e3, e4)

    @@
    expression e1, e2;
    @@
    - pci_dma_mapping_error(e1, e2)
    + dma_mapping_error(&e1->dev, e2)

    @@
    expression e1, e2;
    @@
    - pci_set_dma_mask(e1, e2)
    + dma_set_mask(&e1->dev, e2)

    @@
    expression e1, e2;
    @@
    - pci_set_consistent_dma_mask(e1, e2)
    + dma_set_coherent_mask(&e1->dev, e2)

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stephen Hemminger authored
If the accelerated networking SR-IOV VF device has lost carrier, use the synthetic network device, which is available as a backup path. This is a rare case, since if the VF link goes down, normally the VMBus device will also lose external connectivity. But if the communication is between two VMs on the same host, the VMBus device will still work. Reported-by: "Shah, Ashish N" <ashish.n.shah@intel.com> Fixes: 0c195567 ("netvsc: transparent VF management") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
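The path selection described above reduces to a small decision. This is a hypothetical sketch; `pick_tx_path()` and `enum tx_path` are illustrative names, not netvsc's actual transmit code.

```c
#include <stdbool.h>

/* Hypothetical transmit-path choice modelled on the description above. */
enum tx_path { TX_SYNTHETIC, TX_VF };

static enum tx_path pick_tx_path(bool vf_present, bool vf_carrier_ok)
{
    /* Use the accelerated VF only when it exists and has carrier;
     * otherwise fall back to the synthetic (VMBus) device. */
    if (vf_present && vf_carrier_ok)
        return TX_VF;
    return TX_SYNTHETIC;
}
```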
-
YueHaibing authored
Fix smatch warning:

    drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c:2419 alloc_channel() warn: passing zero to 'ERR_PTR'

setup_dpcon() should return ERR_PTR(err) instead of zero in the error handling case.

Fixes: d7f5a9d8 ("dpaa2-eth: defer probe on object allocate")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
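Why "passing zero to ERR_PTR" matters: `ERR_PTR(0)` is simply `NULL`, which `IS_ERR()` does not treat as an error, so the caller loses the errno. The sketch re-creates the error-pointer convention from include/linux/err.h in userspace (MAX_ERRNO is 4095 there too); `setup_before()` and `setup_after()` are hypothetical stand-ins for the old and fixed error paths.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
    /* Errors are encoded in the top MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

#define DEMO_EBUSY 16
static int demo_object;

/* Hypothetical error path before the fix: returns NULL (i.e. ERR_PTR(0)). */
static void *setup_before(int err) { return err ? NULL : &demo_object; }

/* Hypothetical error path after the fix: the errno travels in the pointer. */
static void *setup_after(int err) { return err ? ERR_PTR(err) : &demo_object; }
```

`setup_before(-DEMO_EBUSY)` yields a NULL that `IS_ERR()` reports as success, while `setup_after(-DEMO_EBUSY)` yields a pointer for which `IS_ERR()` is true and `PTR_ERR()` recovers -16.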
-
Stefan Roese authored
I just recently noticed that ethernet does not work anymore since v5.5 on the GARDENA smart Gateway, which is based on the AT91SAM9G25. Debugging showed that the "GEM bits" in the NCFGR register are now unconditionally accessed, which is incorrect for the !macb_is_gem() case. This patch adds the macb_is_gem() checks back to the code (in macb_mac_config() & macb_mac_link_up()), so that the GEM register bits are not accessed in this case any more. Fixes: 7897b071 ("net: macb: convert to phylink") Signed-off-by: Stefan Roese <sr@denx.de> Cc: Reto Schneider <reto.schneider@husqvarnagroup.com> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Nicolas Ferre <nicolas.ferre@microchip.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
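The shape of the fix is a guard around register bits that only exist on GEM variants of the IP. In this sketch the bit masks and `mac_config()` are hypothetical; they do not reflect the real NCFGR layout in the macb driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit masks, not the real NCFGR layout. */
#define DEMO_GEM_ONLY_BITS 0x00000C00u
#define DEMO_COMMON_SPEED  0x00000001u

static uint32_t mac_config(uint32_t ncfgr, bool is_gem, bool want_gem_mode)
{
    ncfgr |= DEMO_COMMON_SPEED;             /* valid on both MACB and GEM */
    if (is_gem) {                           /* touch GEM bits only on GEM IP */
        if (want_gem_mode)
            ncfgr |= DEMO_GEM_ONLY_BITS;
        else
            ncfgr &= ~DEMO_GEM_ONLY_BITS;
    }
    return ncfgr;
}
```

On a non-GEM MAC such as the one in the AT91SAM9G25, the GEM-only bits are left exactly as they were.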
-
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
David S. Miller authored
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Flush the cleanup xtables worker to make sure destructors have completed, from Florian Westphal.

2) iifgroup is matching erroneously, also from Florian.

3) Add selftest for meta interface matching, from Florian Westphal.

4) Move nf_ct_offload_timeout() to header, from Roi Dayan.

5) Call nf_ct_offload_timeout() from flow_offload_add() to make sure garbage collection does not evict offloaded flow, from Roi Dayan.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
A deadlock was triggered on the thunderx driver:

            CPU0                              CPU1
            ----                              ----
    [01] lock(&(&nic->rx_mode_wq_lock)->rlock);
                                      [11] lock(&(&mc->mca_lock)->rlock);
                                      [12] lock(&(&nic->rx_mode_wq_lock)->rlock);
    [02] <Interrupt> lock(&(&mc->mca_lock)->rlock);

The path for each is:

    [01] worker_thread() -> process_one_work() -> nicvf_set_rx_mode_task()
    [02] mld_ifc_timer_expire()
    [11] ipv6_add_dev() -> ipv6_dev_mc_inc() -> igmp6_group_added() ->
    [12] dev_mc_add() -> __dev_set_rx_mode() -> nicvf_set_rx_mode()

To fix it, bh needs to be disabled on [01], so that the timer on [02] can't be triggered until rx_mode_wq_lock is released. So change to use spin_lock_bh() instead of spin_lock().

Thanks to Paolo for helping with this.

v1->v2:
  - post to netdev.

Reported-by: Rafael P. <rparrazo@redhat.com>
Tested-by: Dean Nelson <dnelson@redhat.com>
Fixes: 469998c8 ("net: thunderx: prevent concurrent data re-writing by nicvf_set_rx_mode")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Stefano Brivio says:

====================
Support PMTU discovery with bridged UDP tunnels

Currently, PMTU discovery for UDP tunnels only works if packets are routed to the encapsulating interfaces, not bridged.

This results from the fact that we generally don't have valid routes to the senders we can use to relay ICMP and ICMPv6 errors, and makes PMTU discovery completely non-functional for VXLAN and GENEVE ports of both regular bridges and Open vSwitch instances.

If the sender is local, and packets are forwarded to the port by a regular bridge, all it takes is to generate a corresponding route exception on the encapsulating device. The bridge then finds the route exception carrying the PMTU value estimate as it forwards frames, and relays ICMP messages back to the socket of the local sender. Patch 1/6 fixes this case.

If the sender resides on another node, we actually need to reply to IP and IPv6 packets ourselves and send these ICMP or ICMPv6 errors back, using the same encapsulating device. Patch 2/6, based on an original idea by Florian Westphal, adds the needed functionality, while patches 3/6 and 4/6 add matching support for VXLAN and GENEVE.

Finally, 5/6 and 6/6 introduce selftests for all combinations of inner and outer IP versions, covering both VXLAN and GENEVE, with both regular bridges and Open vSwitch instances.

v2: Add helper to check for any bridge port, skip oif check for PMTU routes for bridge ports only, split IPv4 and IPv6 helpers and functions (all suggested by David Ahern)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefano Brivio authored
The new tests check that IP and IPv6 packets exceeding the local PMTU estimate, forwarded by an Open vSwitch instance from another node, result in the correct route exceptions being created, and that communication with end-to-end fragmentation, over GENEVE and VXLAN Open vSwitch ports, is now possible as a result of PMTU discovery. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefano Brivio authored
The new tests check that IP and IPv6 packets exceeding the local PMTU estimate, both locally generated and forwarded by a bridge from another node, result in the correct route exceptions being created, and that communication with end-to-end fragmentation over VXLAN and GENEVE tunnels is now possible as a result of PMTU discovery. Some of the existing setup functions aren't generic enough to simply add a namespace and a bridge to the existing routing setup. This rework is in progress and we can easily shrink this once more generic topology functions are available. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefano Brivio authored
If the interface is a bridge or Open vSwitch port, and we can't forward a packet because it exceeds the local PMTU estimate, trigger an ICMP or ICMPv6 reply to the sender, using the same interface to forward it back. If metadata collection is enabled, set destination and source addresses for the flow as if we were receiving the packet, so that Open vSwitch can match the ICMP error against the existing association. v2: Use netif_is_any_bridge_port() (David Ahern) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefano Brivio authored
If the interface is a bridge or Open vSwitch port, and we can't forward a packet because it exceeds the local PMTU estimate, trigger an ICMP or ICMPv6 reply to the sender, using the same interface to forward it back. If metadata collection is enabled, reverse destination and source addresses, so that Open vSwitch is able to match this packet against the existing, reverse flow. v2: Use netif_is_any_bridge_port() (David Ahern) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefano Brivio authored
It's currently possible to bridge Ethernet tunnels carrying IP packets directly to external interfaces without assigning them addresses and routes on the bridged network itself: this is the case for UDP tunnels bridged with a standard bridge or by Open vSwitch.

PMTU discovery is currently broken with those configurations, because the encapsulation effectively decreases the MTU of the link, and while we are able to account for this using PMTU discovery on the lower layer, we don't have a way to relay ICMP or ICMPv6 messages needed by the sender, because we don't have valid routes to it.

On the other hand, as a tunnel endpoint, we can't fragment packets as a general approach: this is for instance clearly forbidden for VXLAN by RFC 7348, section 4.3:

    VTEPs MUST NOT fragment VXLAN packets. Intermediate routers may fragment encapsulated VXLAN packets due to the larger frame size. The destination VTEP MAY silently discard such VXLAN fragments.

The same paragraph recommends that the MTU over the physical network accommodates encapsulations, but this isn't a practical option for complex topologies, especially for typical Open vSwitch use cases. Further, it states that:

    Other techniques like Path MTU discovery (see [RFC1191] and [RFC1981]) MAY be used to address this requirement as well.

Now, PMTU discovery already works for routed interfaces: we get route exceptions created by the encapsulation device as they receive ICMP Fragmentation Needed and ICMPv6 Packet Too Big messages, and we already rebuild those messages with the appropriate MTU and route them back to the sender.

Add the missing bits for bridged cases:

- checks in skb_tunnel_check_pmtu() to understand if it's appropriate to trigger a reply according to RFC 1122 section 3.2.2 for ICMP and RFC 4443 section 2.4 for ICMPv6. This function is already called by UDP tunnels.

- a new function generating those ICMP or ICMPv6 replies. We can't reuse icmp_send() and icmp6_send() as we don't see the sender as a valid destination. This doesn't need to be generic, as we don't cover any other type of ICMP errors given that we only provide an encapsulation function to the sender.

While at it, make the MTU check in skb_tunnel_check_pmtu() accurate: we might receive GSO buffers here, and the passed headroom already includes the inner MAC length, so we don't have to account for it a second time (that would imply three MAC headers on the wire, but there are just two).

This issue became visible while bridging IPv6 packets with 4500 bytes of payload over GENEVE using IPv4 with a PMTU of 4000. Given the 50 bytes of encapsulation headroom, we would advertise MTU as 3950, and we would reject fragmented IPv6 datagrams of 3958 bytes size on the wire. We're exclusively dealing with network MTU here, though, so we could get Ethernet frames up to 3964 octets in that case.

v2:
- moved skb_tunnel_check_pmtu() to ip_tunnel_core.c (David Ahern)
- split IPv4/IPv6 functions (David Ahern)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
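The numbers in the GENEVE example can be reproduced with a little arithmetic. The sketch assumes the 50 bytes of headroom break down as outer IPv4 (20) + UDP (8) + GENEVE without options (8) + inner Ethernet (14), and models the old and new checks in simplified form; `fits_old()`, `fits_new()` and `max_ipv6_fragment()` are hypothetical names, not kernel functions.

```c
#include <stdbool.h>

/* Assumed header sizes: outer IPv4 (no options), UDP, GENEVE base header
 * (no options), inner Ethernet; plus IPv6 and IPv6 fragment headers. */
enum {
    OUTER_IPV4_HLEN = 20,
    UDP_HLEN        = 8,
    GENEVE_HLEN     = 8,
    INNER_ETH_HLEN  = 14,
    IPV6_HLEN       = 40,
    IPV6_FRAG_HLEN  = 8,
};

/* Simplified old check: the whole inner frame (MAC header included) plus a
 * headroom that already counts the MAC header, so it is counted twice. */
static bool fits_old(int inner_frame_len, int pmtu, int headroom)
{
    return inner_frame_len + headroom <= pmtu;
}

/* Simplified fixed check: network-layer length only, so the inner MAC
 * header is counted once (inside headroom). */
static bool fits_new(int inner_network_len, int pmtu, int headroom)
{
    return inner_network_len + headroom <= pmtu;
}

/* Largest IPv6 fragment fitting in 'mtu': non-final fragment payloads
 * must be a multiple of 8 bytes. */
static int max_ipv6_fragment(int mtu)
{
    int payload = (mtu - IPV6_HLEN - IPV6_FRAG_HLEN) & ~7;
    return IPV6_HLEN + IPV6_FRAG_HLEN + payload;
}
```

With a PMTU of 4000 the advertised MTU is 3950, and the largest IPv6 fragment is 3944 bytes (3958 on the wire with its Ethernet header). The old comparison rejects it (3958 + 50 > 4000); the fixed one accepts it (3944 + 50 <= 4000).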
-
Stefano Brivio authored
Currently, processes sending traffic to a local bridge with an encapsulation device as a port don't get ICMP errors if they exceed the PMTU of the encapsulated link. David Ahern suggested this as a hack, but it actually looks like the correct solution: when we update the PMTU for a given destination by means of updating or creating a route exception, the encapsulation might trigger this because of PMTU discovery happening either on the encapsulation device itself, or its lower layer. This happens on bridged encapsulations only. The output interface shouldn't matter, because we already have a valid destination. Drop the output interface restriction from the associated route lookup. For UDP tunnels, we will now have a route exception created for the encapsulation itself, with an MTU value reflecting its headroom, which allows a bridge forwarding IP packets originated locally to deliver errors back to the sending socket. The behaviour is now consistent with IPv6 and verified with selftests pmtu_ipv{4,6}_br_{geneve,vxlan}{4,6}_exception introduced later in this series.

v2:
- reset output interface only for bridge ports (David Ahern)
- add and use netif_is_any_bridge_port() helper (David Ahern)

Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Merge tag 'wireless-drivers-next-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for v5.9

Second set of patches for v5.9. mt76 has most of the patches this time. Otherwise it's just smaller fixes and cleanups to other drivers.

There was a major conflict in the mt76 driver between wireless-drivers and wireless-drivers-next. I solved that by merging the former into the latter.

Major changes:

rtw88

* add support for ieee80211_ops::change_interface
* add support for enabling and disabling beacon
* add debugfs file for testing h2c

mt76

* ARP filter offload for 7663
* runtime power management for 7663
* testmode support for mfg calibration
* support for more channels
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
Use netdev_<level> in place of VELOCITY_PRT.
Use pr_<level> in place of printk(KERN_<LEVEL>.

Miscellanea:

o Add pr_fmt to prefix pr_<level> output with "via-velocity: "
o Remove now unused functions and macros
o Realign some logging lines
o Remove devname where pr_<level> is also used

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
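The `pr_fmt` item relies on the kernel convention that the `pr_<level>()` macros expand `pr_fmt(fmt)` at each call site, so defining `pr_fmt` near the top of a driver (before the printk headers) prefixes every message. Below is a userspace imitation of that mechanism that formats into a buffer with `snprintf()` instead of calling `printk()`; `demo_pr_info` is a hypothetical name, and `##__VA_ARGS__` is a GNU C extension, as in the kernel.

```c
#include <stdio.h>
#include <string.h>

/* Defined once per driver; every pr_*() call site picks it up. */
#define pr_fmt(fmt) "via-velocity: " fmt

/* Imitation of pr_info(): string-literal concatenation adds the prefix
 * at compile time, then the format is expanded into 'buf'. */
#define demo_pr_info(buf, len, fmt, ...) \
    snprintf((buf), (len), pr_fmt(fmt), ##__VA_ARGS__)
```

Because the prefix is joined to the format string at compile time, individual messages no longer need to print the device or driver name themselves.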
-
David S. Miller authored
Luo bin says: ==================== hinic: mailbox channel enhancement Add support for generating a random mailbox ID for each VF to ensure that mailbox messages from a VF are valid, and make the PF check whether a command from a VF is supported before passing it to hw. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Luo bin authored
PF should check whether the cmd from VF is supported and its content is right before passing it to hw. Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Luo bin authored
Add support for generating a random mailbox ID for each VF to ensure that the mailbox messages received by the PF come from the correct VF. Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
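The two hinic changes amount to two checks on the PF side before a VF message reaches hardware: the sender's random ID must match the one assigned to that VF, and the command must be on a supported list. The sketch is purely illustrative: `struct demo_pf`, the command values in `demo_allowed_cmds`, and the use of `rand()` as the ID source are all assumptions, not the driver's actual layout or randomness source.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_VFS 4

/* Hypothetical PF-side bookkeeping: one random ID per VF. */
struct demo_pf {
    uint32_t vf_rand_id[DEMO_MAX_VFS];
};

/* Hypothetical list of commands a VF may forward to hardware. */
static const uint8_t demo_allowed_cmds[] = { 0x01, 0x02, 0x10 };

/* Placeholder ID source; a real driver would use a proper random source. */
static void demo_assign_ids(struct demo_pf *pf)
{
    for (int i = 0; i < DEMO_MAX_VFS; i++)
        pf->vf_rand_id[i] = (uint32_t)rand();
}

static bool cmd_allowed(uint8_t cmd)
{
    for (size_t i = 0; i < sizeof(demo_allowed_cmds); i++)
        if (demo_allowed_cmds[i] == cmd)
            return true;
    return false;
}

/* Accept a VF message only if it carries that VF's ID and a supported command. */
static bool pf_accept(const struct demo_pf *pf, int vf, uint32_t rand_id,
                      uint8_t cmd)
{
    if (vf < 0 || vf >= DEMO_MAX_VFS)
        return false;
    if (pf->vf_rand_id[vf] != rand_id)
        return false;                   /* not from the VF it claims to be */
    return cmd_allowed(cmd);
}
```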
-
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git
Kalle Valo authored
The mt76 driver had major conflicts within the mt7615 directory. To make it easier for everyone, merge wireless-drivers into wireless-drivers-next and solve those conflicts.
-
David S. Miller authored
    drivers/net/ethernet/sfc/ef100_nic.c:835:3: error: 'const struct efx_nic_type' has no member named 'filter_rfs_expire_one'
      835 |  .filter_rfs_expire_one = efx_mcdi_filter_rfs_expire_one,
          |  ^~~~~~~~~~~~~~~~~~~~~
    >> drivers/net/ethernet/sfc/ef100_nic.c:835:27: error: initialization of 'void (*)(struct efx_nic *, u32)' {aka 'void (*)(struct efx_nic *, unsigned int)'} from incompatible pointer type 'bool (*)(struct efx_nic *, u32, unsigned int)' {aka '_Bool (*)(struct efx_nic *, unsigned int, unsigned int)'} [-Werror=incompatible-pointer-types]
      835 |  .filter_rfs_expire_one = efx_mcdi_filter_rfs_expire_one,
          |                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller authored
Daniel Borkmann says:

====================
pull-request: bpf-next 2020-08-04

The following pull-request contains BPF updates for your *net-next* tree.

We've added 73 non-merge commits during the last 9 day(s) which contain a total of 135 files changed, 4603 insertions(+), 1013 deletions(-).

The main changes are:

1) Implement bpf_link support for XDP. Also add LINK_DETACH operation for the BPF syscall allowing processes with BPF link FD to force-detach, from Andrii Nakryiko.

2) Add BPF iterator for map elements and to iterate all BPF programs for efficient in-kernel inspection, from Yonghong Song and Alexei Starovoitov.

3) Separate bpf_get_{stack,stackid}() helpers for perf events in BPF to avoid unwinder errors, from Song Liu.

4) Allow cgroup local storage map to be shared between programs on the same cgroup. Also extend BPF selftests with coverage, from YiFei Zhu.

5) Add BPF exception tables to ARM64 JIT in order to be able to JIT BPF_PROBE_MEM load instructions, from Jean-Philippe Brucker.

6) Follow-up fixes on BPF socket lookup in combination with reuseport group handling. Also add related BPF selftests, from Jakub Sitnicki.

7) Allow socket storage to be used in BPF_PROG_TYPE_CGROUP_SOCK-typed programs for socket create/release as well as bind functions, from Stanislav Fomichev.

8) Fix an info leak in xsk_getsockopt() when retrieving XDP stats via old struct xdp_statistics, from Peilin Ye.

9) Fix PT_REGS_RC{,_CORE}() macros in libbpf for MIPS arch, from Jerry Crunchtime.

10) Extend BPF kernel test infra with skb->family and skb->{local,remote}_ip{4,6} fields and allow user space to specify skb->dev via ifindex, from Dmitry Yakunin.

11) Fix a bpftool segfault due to missing program type name and make it more robust against such gaps in the future, from Quentin Monnet.

12) Consolidate cgroup helper functions across selftests and fix a v6 localhost resolver issue, from John Fastabend.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
David S. Miller authored
Saeed Mahameed says:

====================
mlx5-updates-2020-08-03

This patchset introduces some updates to the mlx5 driver.

1) Jakub converts mlx5 to use the new udp tunnel infrastructure, starting with a hack to allow drivers to request a static configuration of the default vxlan port, and then a patch that converts mlx5.

2) Parav implements the change_carrier ndo for VF eswitch representors, to speed up link state control of representor netdevices.

3) Alex Vesker makes a simple update to software steering to fix an issue with the push vlan action sequence.

4) Leon removes a redundant dump stack on error flow.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Edward Cree says:

====================
sfc: driver for EF100 family NICs, part 2

This series implements the data path and various other functionality for Xilinx/Solarflare EF100 NICs.

Changed from v2:
 * Improved error handling of design params (patch #3)
 * Removed 'inline' from .c file in patch #4
 * Don't report common stats to ethtool -S (patch #8)

Changed from v1:
 * Fixed build errors on CONFIG_RFS_ACCEL=n (patch #5) and 32-bit (patch #8)
 * Dropped patch #10 (ethtool ops) as it's buggy and will need a bigger rework to fix.
====================

Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Edward Cree authored
We don't yet have a .sriov_configure() to create them, though. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Edward Cree authored
We'll need it later, for VF representors. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-