- 15 Oct, 2020 28 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski authored
Daniel Borkmann says: ==================== pull-request: bpf-next 2020-10-15 The main changes are: 1) Fix register equivalence tracking in verifier, from Alexei Starovoitov. 2) Fix sockmap error path to not call bpf_prog_put() with NULL, from Alex Dewar. 3) Fix sockmap to add locking annotations to iterator, from Lorenz Bauer. 4) Fix tcp_hdr_options test to use loopback address, from Martin KaFai Lau. ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski authored
Minor conflicts in net/mptcp/protocol.h and tools/testing/selftests/net/Makefile. In both cases code was added on both sides in the same place so just keep both. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
This reverts commit 1d273fcc. Alexei points out there's nothing implying headers will be built and therefore exist under usr/include, so this fix doesn't make much sense. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Dewar authored
If bpf_prog_inc_not_zero() fails for skb_parser, then bpf_prog_put() is called unconditionally on skb_verdict, even though it may be NULL. Fix and tidy up error path. Fixes: 743df8b7 ("bpf, sockmap: Check skb_verdict and skb_parser programs explicitly") Addresses-Coverity-ID: 1497799: Null pointer dereferences (FORWARD_NULL) Signed-off-by: Alex Dewar <alex.dewar90@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20201012170952.60750-1-alex.dewar90@gmail.com
-
Martin KaFai Lau authored
The tcp_hdr_options test adds a "::eB9F" addr to the lo dev. However, this non loopback address will have a race on ipv6 dad which may lead to EADDRNOTAVAIL error from time to time. Even nodad is used in the iproute2 command, there is still a race in when the route will be added. This will then lead to ENETUNREACH from time to time. To avoid the above, this patch uses the default loopback address "::1" to do the test. Fixes: ad2f8eb0 ("bpf: selftests: Tcp header options") Reported-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201012234940.1707941-1-kafai@fb.com
-
Lorenz Bauer authored
The sparse checker currently outputs the following warnings: include/linux/rcupdate.h:632:9: sparse: sparse: context imbalance in 'sock_hash_seq_start' - wrong count at exit include/linux/rcupdate.h:632:9: sparse: sparse: context imbalance in 'sock_map_seq_start' - wrong count at exit Add the necessary __acquires and __release annotations to make the iterator locking schema palatable to sparse. Also add __must_hold for good measure. The kernel codebase uses both __acquires(rcu) and __acquires(RCU). I couldn't find any guidance which one is preferred, so I used what is easier to type out. Fixes: 03653515 ("net: Allow iterating sockmap and sockhash") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20201012091850.67452-1-lmb@cloudflare.com
-
Davide Caratti authored
nftables payload statements are used to mangle SCTP headers, but they can only replace the Internet Checksum. As a consequence, nftables rules that mangle sport/dport/vtag in SCTP headers potentially generate packets that are discarded by the receiver, unless the CRC-32C is "offloaded" (e.g the rule mangles a skb having 'ip_summed' equal to 'CHECKSUM_PARTIAL'. Fix this extending uAPI definitions and L4 checksum update function, in a way that userspace programs (e.g. nft) can instruct the kernel to compute CRC-32C in SCTP headers. Also ensure that LIBCRC32C is built if NF_TABLES is 'y' or 'm' in the kernel build configuration. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fsJakub Kicinski authored
David Howells says: ==================== rxrpc fixes Here are a couple of fixes that need to be applied on top of rxrpc patches in net-next: (1) Fix a bug in the connection bundle changes in the net-next tree. (2) Fix the loss of final ACK on socket shutdown. ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Yonghong Song authored
Commit 4fc427e0 ("ipv6_route_seq_next should increase position index") tried to fix the issue where seq_file pos is not increased if a NULL element is returned with seq_ops->next(). See bug https://bugzilla.kernel.org/show_bug.cgi?id=206283 The commit effectively does: - increase pos for all seq_ops->start() - increase pos for all seq_ops->next() For ipv6_route, increasing pos for all seq_ops->next() is correct. But increasing pos for seq_ops->start() is not correct since pos is used to determine how many items to skip during seq_ops->start(): iter->skip = *pos; seq_ops->start() just fetches the *current* pos item. The item can be skipped only after seq_ops->show() which essentially is the beginning of seq_ops->next(). For example, I have 7 ipv6 route entries, root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=4096 00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001 eth0 fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001 eth0 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001 eth0 ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000004 00000000 00000001 eth0 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo 0+1 records in 0+1 records out 1050 bytes (1.0 kB, 1.0 KiB) copied, 0.00707908 s, 148 kB/s root@arch-fb-vm1:~/net-next In the above, I specify buffer size 4096, so all records can be returned to user space with a single trip to the kernel. If I use buffer size 128, since each record size is 149, internally kernel seq_read() will read 149 into its internal buffer and return the data to user space in two read() syscalls. Then user read() syscall will trigger next seq_ops->start(). Since the current implementation increased pos even for seq_ops->start(), it will skip record #2, #4 and #6, assuming the first record is #1. root@arch-fb-vm1:~/net-next dd if=/proc/net/ipv6_route bs=128 00000000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000400 00000001 00000000 00000001 eth0 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo fe800000000000002050e3fffebd3be8 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001 eth0 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo 4+1 records in 4+1 records out 600 bytes copied, 0.00127758 s, 470 kB/s To fix the problem, create a fake pos pointer so seq_ops->start() won't actually increase seq_file pos. With this fix, the above `dd` command with `bs=128` will show correct result. Fixes: 4fc427e0 ("ipv6_route_seq_next should increase position index") Cc: Alexei Starovoitov <ast@kernel.org> Suggested-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Karsten Graul says: ==================== net/smc: fixes 2020-10-14 The first patch fixes a possible use-after-free of delayed llc events. Patch 2 corrects the number of DMB buffer sizes. And patch 3 ensures a correctly formatted return code when smc_ism_register_dmb() fails to create a new DMB. ==================== Link: https://lore.kernel.org/r/20201014174329.35791-1-kgraul@linux.ibm.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Karsten Graul authored
smc_ism_register_dmb() returns error codes set by the ISM driver which are not guaranteed to be negative or in the errno range. Such values would not be handled by ERR_PTR() and finally the return code will be used as a memory address. Fix that by using a valid negative errno value with ERR_PTR(). Fixes: 72b7f6c4 ("net/smc: unique reason code for exceeded max dmb count") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Karsten Graul authored
The SMCD_DMBE_SIZES should include all valid DMBE buffer sizes, so the correct value is 6 which means 1MB. With 7 the registration of an ISM buffer would always fail because of the invalid size requested. Fix that and set the value to 6. Fixes: c6ba7c9b ("net/smc: add base infrastructure for SMC-D and ISM") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Karsten Graul authored
When a delayed event is enqueued then the event worker will send this event the next time it is running and no other flow is currently active. The event handler is called for the delayed event, and the pointer to the event keeps set in lgr->delayed_event. This pointer is cleared later in the processing by smc_llc_flow_start(). This can lead to a use-after-free condition when the processing does not reach smc_llc_flow_start(), but frees the event because of an error situation. Then the delayed_event pointer is still set but the event is freed. Fix this by always clearing the delayed event pointer when the event is provided to the event handler for processing, and remove the code to clear it in smc_llc_flow_start(). Fixes: 555da9af ("net/smc: add event-based llc_flow framework") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
YueHaibing authored
IF CONFIG_BPFILTER_UMH is set, building fails: In file included from /usr/include/sys/socket.h:33:0, from net/bpfilter/main.c:6: /usr/include/bits/socket.h:390:10: fatal error: asm/socket.h: No such file or directory #include <asm/socket.h> ^~~~~~~~~~~~~~ compilation terminated. scripts/Makefile.userprogs:43: recipe for target 'net/bpfilter/main.o' failed make[2]: *** [net/bpfilter/main.o] Error 1 Add missing include path to fix this. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ayush Sawal authored
This patch changes the module name to "ch_ipsec" and prepends "ch_ipsec" string instead of "chcr" in all debug messages and function names. V1->V2: -Removed inline keyword from functions. -Removed CH_IPSEC prefix from pr_debug. -Used proper indentation for the continuation line of the function arguments. V2->V3: Fix the checkpatch.pl warnings. Fixes: 1b77be46 ("crypto/chcr: Moving chelsio's inline ipsec functionality to /drivers/net") Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Leon Romanovsky authored
The access of tcf_tunnel_info() produces the following splat, so fix it by dereferencing the tcf_tunnel_key_params pointer with marker that internal tcfa_liock is held. ============================= WARNING: suspicious RCU usage 5.9.0+ #1 Not tainted ----------------------------- include/net/tc_act/tc_tunnel_key.h:59 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by tc/34839: #0: ffff88828572c2a0 (&p->tcfa_lock){+...}-{2:2}, at: tc_setup_flow_action+0xb3/0x48b5 stack backtrace: CPU: 1 PID: 34839 Comm: tc Not tainted 5.9.0+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x9a/0xd0 tc_setup_flow_action+0x14cb/0x48b5 fl_hw_replace_filter+0x347/0x690 [cls_flower] fl_change+0x2bad/0x4875 [cls_flower] tc_new_tfilter+0xf6f/0x1ba0 rtnetlink_rcv_msg+0x5f2/0x870 netlink_rcv_skb+0x124/0x350 netlink_unicast+0x433/0x700 netlink_sendmsg+0x6f1/0xbd0 sock_sendmsg+0xb0/0xe0 ____sys_sendmsg+0x4fa/0x6d0 ___sys_sendmsg+0x12e/0x1b0 __sys_sendmsg+0xa4/0x120 do_syscall_64+0x2d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f1f8cd4fe57 Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 RSP: 002b:00007ffdc1e193b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1f8cd4fe57 RDX: 0000000000000000 RSI: 00007ffdc1e19420 RDI: 0000000000000003 RBP: 000000005f85aafa R08: 0000000000000001 R09: 00007ffdc1e1936c R10: 000000000040522d R11: 0000000000000246 R12: 0000000000000001 R13: 0000000000000000 R14: 00007ffdc1e1d6f0 R15: 0000000000482420 Fixes: 3ebaf6da ("net: sched: Do not assume RTNL is held in tunnel key action helpers") Fixes: 7a472814 ("net: sched: lock action when translating it to flow_action infra") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alexei Starovoitov authored
The 64-bit JEQ/JNE handling in reg_set_min_max() was clearing reg->id in either true or false branch. In the case 'if (reg->id)' check was done on the other branch the counter part register would have reg->id == 0 when called into find_equal_scalars(). In such case the helper would incorrectly identify other registers with id == 0 as equivalent and propagate the state incorrectly. Fix it by preserving ID across reg_set_min_max(). In other words any kind of comparison operator on the scalar register should preserve its ID to recognize: r1 = r2 if (r1 == 20) { #1 here both r1 and r2 == 20 } else if (r2 < 20) { #2 here both r1 and r2 < 20 } The patch is addressing #1 case. The #2 was working correctly already. Fixes: 75748837 ("bpf: Propagate scalar ranges through register assignments.") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Tested-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20201014175608.1416-1-alexei.starovoitov@gmail.com
-
David Howells authored
Fix the loss of transmission of a call's final ack when a socket gets shut down. This means that the server will retransmit the last data packet or send a ping ack and then get an ICMP indicating the port got closed. The server will then view this as a failure. Fixes: 3136ef49 ("rxrpc: Delay terminal ACK transmission on a client call") Signed-off-by: David Howells <dhowells@redhat.com>
-
David Howells authored
Fix rxrpc_unbundle_conn() to not drop the bundle usage count when cleaning up an exclusive connection. Based on the suggested fix from Hillf Danton. Fixes: 245500d8 ("rxrpc: Rewrite the client connection manager") Reported-by: syzbot+d57aaf84dd8a550e6d91@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Hillf Danton <hdanton@sina.com>
-
Pablo Neira Ayuso authored
This definition is used by the iptables legacy UAPI, restore it. Fixes: d3519cb8 ("netfilter: nf_tables: add inet ingress support") Reported-by: Jason A. Donenfeld <Jason@zx2c4.com> Tested-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
David Wilder says: ==================== ibmveth gso fix The ibmveth driver is a virtual Ethernet driver used on IBM pSeries systems. Gso packets can be sent between LPARS (virtual hosts) without segmentation, by flagging gso packets using one of two methods depending on the firmware version. Some gso packet were not correctly identified by the receiver. This patch-set corrects this issue. V2: - Added fix tags. - Byteswap the constant at compilation time. - Updated the commit message to clarify what frame validation is performed by the hypervisor. ==================== Link: https://lore.kernel.org/r/20201013232014.26044-1-dwilder@us.ibm.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
David Wilder authored
Ingress large send packets are identified by either: The IBMVETH_RXQ_LRG_PKT flag in the receive buffer or with a -1 placed in the ip header checksum. The method used depends on firmware version. Frame geometry and sufficient header validation is performed by the hypervisor eliminating the need for further header checks here. Fixes: 7b596738 ("ibmveth: set correct gso_size and gso_type") Signed-off-by: David Wilder <dwilder@us.ibm.com> Reviewed-by: Thomas Falcon <tlfalcon@linux.ibm.com> Reviewed-by: Cristobal Forno <cris.forno@ibm.com> Reviewed-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
David Wilder authored
ibmveth_rx_csum_helper() must be called after ibmveth_rx_mss_helper() as ibmveth_rx_csum_helper() may alter ip and tcp checksum values. Fixes: 66aa0678 ("ibmveth: Support to enable LSO/CSO for Trunk VEA.") Signed-off-by: David Wilder <dwilder@us.ibm.com> Reviewed-by: Thomas Falcon <tlfalcon@linux.ibm.com> Reviewed-by: Cristobal Forno <cris.forno@ibm.com> Reviewed-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Herat Ramani authored
The 4-tuple NAT offload via PEDIT always overwrites all the 4-tuple fields even if they had not been explicitly enabled. If any fields in the 4-tuple are not enabled, then the hardware overwrites the disabled fields with zeros, instead of ignoring them. So, add a parser that can translate the enabled 4-tuple PEDIT fields to one of the NAT mode combinations supported by the hardware and hence avoid overwriting disabled fields to 0. Any rule with unsupported NAT mode combination is rejected. Signed-off-by: Herat Ramani <herat@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Mathieu Desnoyers says: ==================== l3mdev icmp error route lookup fixes Here is a series of fixes for ipv4 and ipv6 which ensure the route lookup is performed on the right routing table in VRF configurations when sending TTL expired icmp errors (useful for traceroute). It includes tests for both ipv4 and ipv6. These fixes address specifically address the code paths involved in sending TTL expired icmp errors. As detailed in the individual commit messages, those fixes do not address similar icmp errors related to network namespaces and unreachable / fragmentation needed messages, which appear to use different code paths. ==================== Link: https://lore.kernel.org/r/20201012145016.2023-1-mathieu.desnoyers@efficios.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Michael Jeanson authored
The objective of the tests is to check that ICMP errors generated while crossing between VRFs are properly routed back to the source host. The first ttl test sends a ping with a ttl of 1 from h1 to h2 and parses the output of the command to check that a ttl expired error is received. The second ttl test runs traceroute from h1 to h2 and parses the output to check for a hop on r1. The mtu test sends a ping with a payload of 1450 from h1 to h2, through r1 which has an interface with a mtu of 1400 and parses the output of the command to check that a fragmentation needed error is received. [ The IPv6 MTU test still fails with the symmetric routing setup. It appears to be caused by source address selection picking ::1. Fixing this is beyond the scope of this series. ] Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Mathieu Desnoyers authored
As per RFC4443, the destination address field for ICMPv6 error messages is copied from the source address field of the invoking packet. In configurations with Virtual Routing and Forwarding tables, looking up which routing table to use for sending ICMPv6 error messages is currently done by using the destination net_device. If the source and destination interfaces are within separate VRFs, or one in the global routing table and the other in a VRF, looking up the source address of the invoking packet in the destination interface's routing table will fail if the destination interface's routing table contains no route to the invoking packet's source address. One observable effect of this issue is that traceroute6 does not work in the following cases: - Route leaking between global routing table and VRF - Route leaking between VRFs Use the source device routing table when sending ICMPv6 error messages. [ In the context of ipv4, it has been pointed out that a similar issue may exist with ICMP errors triggered when forwarding between network namespaces. It would be worthwhile to investigate whether ipv6 has similar issues, but is outside of the scope of this investigation. ] [ Testing shows that similar issues exist with ipv6 unreachable / fragmentation needed messages. However, investigation of this additional failure mode is beyond this investigation's scope. ] Link: https://tools.ietf.org/html/rfc4443Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Mathieu Desnoyers authored
As per RFC792, ICMP errors should be sent to the source host. However, in configurations with Virtual Routing and Forwarding tables, looking up which routing table to use is currently done by using the destination net_device. commit 9d1a6c4e ("net: icmp_route_lookup should use rt dev to determine L3 domain") changes the interface passed to l3mdev_master_ifindex() and inet_addr_type_dev_table() from skb_in->dev to skb_dst(skb_in)->dev. This effectively uses the destination device rather than the source device for choosing which routing table should be used to lookup where to send the ICMP error. Therefore, if the source and destination interfaces are within separate VRFs, or one in the global routing table and the other in a VRF, looking up the source host in the destination interface's routing table will fail if the destination interface's routing table contains no route to the source host. One observable effect of this issue is that traceroute does not work in the following cases: - Route leaking between global routing table and VRF - Route leaking between VRFs Preferably use the source device routing table when sending ICMP error messages. If no source device is set, fall-back on the destination device routing table. Else, use the main routing table (index 0). [ It has been pointed out that a similar issue may exist with ICMP errors triggered when forwarding between network namespaces. It would be worthwhile to investigate, but is outside of the scope of this investigation. ] [ It has also been pointed out that a similar issue exists with unreachable / fragmentation needed messages, which can be triggered by changing the MTU of eth1 in r1 to 1400 and running: ip netns exec h1 ping -s 1450 -Mdo -c1 172.16.2.2 Some investigation points to raw_icmp_error() and raw_err() as being involved in this last scenario. The focus of this patch is TTL expired ICMP messages, which go through icmp_route_lookup. Investigation of failure modes related to raw_icmp_error() is beyond this investigation's scope. ] Fixes: 9d1a6c4e ("net: icmp_route_lookup should use rt dev to determine L3 domain") Link: https://tools.ietf.org/html/rfc792Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 14 Oct, 2020 12 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfJakub Kicinski authored
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Extend nf_queue selftest to cover re-queueing, non-gso mode and delayed queueing, from Florian Westphal. 2) Clear skb->tstamp in IPVS forwarding path, from Julian Anastasov. 3) Provide netlink extended error reporting for EEXIST case. 4) Missing VLAN offload tag and proto in log target. ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxJakub Kicinski authored
Saeed Mahameed says: ==================== mlx5-updates-2020-10-12 Updates to mlx5 driver: - Cleanup fix of uininitialized pointer read - xfrm IPSec TX offload ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Tony Nguyen says: ==================== 40GbE Intel Wired LAN Driver Updates 2020-10-12 This series contains updates to i40e and e1000 drivers. Jaroslaw adds support for changing FEC on i40e if the firmware supports it. Jesse fixes a kbuild-bot warning regarding ternary operator on e1000. v2: Return -EOPNOTSUPP instead of -EINVAL when FEC settings are not supported by firmware. Remove, unneeded, done label and return errors directly in i40e_set_fec_param() for patch 1. Dropped, previous patch 2, to send to net. ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jesse Brandeburg authored
The e1000_clear_vfta function was triggering a warning in kbuild-bot testing. It's actually a bug but has no functional impact. drivers/net/ethernet/intel/e1000/e1000_hw.c:4415:58: warning: Same expression in both branches of ternary operator. [duplicateExpressionTernary] Fix this warning by removing the offending code and simplifying the routine to do exactly what it did before, no functional change. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jaroslaw Gawin authored
Starting with API version 1.10 firmware for X722 devices has ability to change FEC settings in PHY. Code added in this patch allows changing FEC settings if the capability flag indicates the device supports this feature. Signed-off-by: Jaroslaw Gawin <jaroslawx.gawin@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Cong Wang authored
GRE tunnel has its own header_ops, ipgre_header_ops, and sets it conditionally. When it is set, it assumes the outer IP header is already created before ipgre_xmit(). This is not true when we send packets through a raw packet socket, where L2 headers are supposed to be constructed by user. Packet socket calls dev_validate_header() to validate the header. But GRE tunnel does not set dev->hard_header_len, so that check can be simply bypassed, therefore uninit memory could be passed down to ipgre_xmit(). Similar for dev->needed_headroom. dev->hard_header_len is supposed to be the length of the header created by dev->header_ops->create(), so it should be used whenever header_ops is set, and dev->needed_headroom should be used when it is not set. Reported-and-tested-by: syzbot+4a2c52677a8a1aa283cb@syzkaller.appspotmail.com Cc: William Tu <u9012063@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Xie He <xie.he.0141@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Heiner Kallweit says: ==================== net: add and use function dev_fetch_sw_netstats for fetching pcpu_sw_netstats In several places the same code is used to populate rtnl_link_stats64 fields with data from pcpu_sw_netstats. Therefore factor out this code to a new function dev_fetch_sw_netstats(). ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Heiner Kallweit authored
Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/a6b816f4-bbf2-9db0-d59a-7e4e9cc808fe@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Heiner Kallweit authored
Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/5e52dc91-97b1-82b0-214b-65d404e4a2ec@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Heiner Kallweit authored
Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/93dda477-70ae-0ccf-71b4-bfebb66c9beb@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Heiner Kallweit authored
Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/050f9a83-b195-a3d6-edbd-91a59040be21@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Heiner Kallweit authored
Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/b6047017-8226-6b7e-a3cd-064e69fdfa27@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-