- 11 Sep, 2018 1 commit
-
-
Steffen Klassert authored
Since commit 222d7dbd ("net: prevent dst uses after free") skb_dst_force() might clear the dst_entry attached to the skb. The xfrm code don't expect this to happen, so we crash with a NULL pointer dereference in this case. Fix it by checking skb_dst(skb) for NULL after skb_dst_force() and drop the packet in cast the dst_entry was cleared. Fixes: 222d7dbd ("net: prevent dst uses after free") Reported-by: Tobias Hommel <netdev-list@genoetigt.de> Reported-by: Kristian Evensen <kristian.evensen@gmail.com> Reported-by: Wolfgang Walter <linux@stwm.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
-
- 04 Sep, 2018 2 commits
-
-
Sowmini Varadhan authored
We only support one offloaded xfrm (we do not have devices that can handle more than one offload), so reset crypto_done in xfrm_input() when iterating over multiple transforms in xfrm_input, so that we can invoke the appropriate x->type->input for the non-offloaded transforms Fixes: d77e38e6 ("xfrm: Add an IPsec hardware offloading API") Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
-
Sowmini Varadhan authored
A policy may have been set up with multiple transforms (e.g., ESP and ipcomp). In this situation, the ingress IPsec processing iterates in xfrm_input() and applies each transform in turn, processing the nexthdr to find any additional xfrm that may apply. This patch resets the transport header back to network header only after the last transformation so that subsequent xfrms can find the correct transport header. Fixes: 7785bba2 ("esp: Add a software GRO codepath") Suggested-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
-
- 03 Sep, 2018 1 commit
-
-
Thadeu Lima de Souza Cascardo authored
After commit d6990976 ("vti6: fix PMTU caching and reporting on xmit"), some too big skbs might be potentially passed down to __xfrm6_output, causing it to fail to transmit but not free the skb, causing a leak of skb, and consequentially a leak of dst references. After running pmtu.sh, that shows as failure to unregister devices in a namespace: [ 311.397671] unregister_netdevice: waiting for veth_b to become free. Usage count = 1 The fix is to call kfree_skb in case of transmit failures. Fixes: dd767856 ("xfrm6: Don't call icmpv6_send on local error") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
-
- 03 Aug, 2018 1 commit
-
-
Steffen Klassert authored
We don't validate the address prefix lengths in the xfrm selector we got from userspace. This can lead to undefined behaviour in the address matching functions if the prefix is too big for the given address family. Fix this by checking the prefixes and refuse SA/policy insertation when a prefix is invalid. Fixes: 1da177e4 ("Linux-2.6.12-rc2") Reported-by: Air Icy <icytxw@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
-
- 29 Jul, 2018 11 commits
-
-
Justin Pettit authored
The meter code would create an entry for each new meter. However, it would not set the meter id in the new entry, so every meter would appear to have a meter id of zero. This commit properly sets the meter id when adding the entry. Fixes: 96fbc13d ("openvswitch: Add meter infrastructure") Signed-off-by: Justin Pettit <jpettit@ovn.org> Cc: Andy Zhou <azhou@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dmitry Safonov authored
Make ABI more strict about subscribing to group > ngroups. Code doesn't check for that and it looks bogus. (one can subscribe to non-existing group) Still, it's possible to bind() to all possible groups with (-1) Cc: "David S. Miller" <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: netdev@vger.kernel.org Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eugeniy Paltsev authored
As for today STMMAC_ALIGN macro (which is used to align DMA stuff) relies on L1 line length (L1_CACHE_BYTES). This isn't correct in case of system with several cache levels which might have L1 cache line length smaller than L2 line. This can lead to sharing one cache line between DMA buffer and other data, so we can lose this data while invalidate DMA buffer before DMA transaction. Fix that by using SMP_CACHE_BYTES instead of L1_CACHE_BYTES for aligning. Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Neal Cardwell authored
For some very small BDPs (with just a few packets) there was a quantization effect where the target number of packets in flight during the super-unity-gain (1.25x) phase of gain cycling was implicitly truncated to a number of packets no larger than the normal unity-gain (1.0x) phase of gain cycling. This meant that in multi-flow scenarios some flows could get stuck with a lower bandwidth, because they did not push enough packets inflight to discover that there was more bandwidth available. This was really only an issue in multi-flow LAN scenarios, where RTTs and BDPs are low enough for this to be an issue. This fix ensures that gain cycling can raise inflight for small BDPs by ensuring that in PROBE_BW mode target inflight values with a super-unity gain are always greater than inflight values with a gain <= 1. Importantly, this applies whether the inflight value is calculated for use as a cwnd value, or as a target inflight value for the end of the super-unity phase in bbr_is_next_cycle_phase() (both need to be bigger to ensure we can probe with more packets in flight reliably). This is a candidate fix for stable releases. Fixes: 0f8782ea ("tcp_bbr: add BBR congestion control") Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Priyaranjan Jha <priyarjha@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jeremy Cline says: ==================== net: socket: Fix potential spectre v1 gadgets This fixes a pair of potential spectre v1 gadgets. Note that because the speculation window is large, the policy is to stop the speculative out-of-bounds load and not worry if the attack can be completed with a dependent load or store[0]. [0] https://marc.info/?l=linux-kernel&m=152449131114778 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jeremy Cline authored
'family' can be a user-controlled value, so sanitize it after the bounds check to avoid speculative out-of-bounds access. Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Jeremy Cline <jcline@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jeremy Cline authored
'call' is a user-controlled value, so sanitize the array index after the bounds check to avoid speculating past the bounds of the 'nargs' array. Found with the help of Smatch: net/socket.c:2508 __do_sys_socketcall() warn: potential spectre issue 'nargs' [r] (local cap) Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Jeremy Cline <jcline@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller authored
Daniel Borkmann says: ==================== pull-request: bpf 2018-07-28 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) API fixes for libbpf's BTF mapping of map key/value types in order to make them compatible with iproute2's BPF_ANNOTATE_KV_PAIR() markings, from Martin. 2) Fix AF_XDP to not report POLLIN prematurely by using the non-cached consumer pointer of the RX queue, from Björn. 3) Fix __xdp_return() to check for NULL pointer after the rhashtable lookup that retrieves the allocator object, from Taehee. 4) Fix x86-32 JIT to adjust ebp register in prologue and epilogue by 4 bytes which got removed from overall stack usage, from Wang. 5) Fix bpf_skb_load_bytes_relative() length check to use actual packet length, from Daniel. 6) Fix uninitialized return code in libbpf bpf_perf_event_read_simple() handler, from Thomas. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Anton Vasilyev authored
mdio_mux_iproc_probe() uses platform_set_drvdata() to store md pointer in device, whereas mdio_mux_iproc_remove() restores md pointer by dev_get_platdata(&pdev->dev). This leads to wrong resources release. The patch replaces getter to platform_get_drvdata. Fixes: 98bc865a ("net: mdio-mux: Add MDIO mux driver for iProc SoCs") Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Lorenzo Bianconi authored
Remove BUG_ON() from fib_compute_spec_dst routine and check in_dev pointer during flowi4 data structure initialization. fib_compute_spec_dst routine can be run concurrently with device removal where ip_ptr net_device pointer is set to NULL. This can happen if userspace enables pkt info on UDP rx socket and the device is removed while traffic is flowing Fixes: 35ebf65e ("ipv4: Create and use fib_compute_spec_dst() helper") Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Govindarajulu Varadarajan authored
When driver gets notification for mtu change, driver does not handle it for all RQs. It handles only RQ[0]. Fix is to use enic_change_mtu() interface to change mtu for vf. Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 28 Jul, 2018 5 commits
-
-
Stefan Wahren authored
As long the bh tasklet isn't scheduled once, no packet from the rx path will be handled. Since the tx path also schedule the same tasklet this situation only persits until the first packet transmission. So fix this issue by scheduling the tasklet after link reset. Link: https://github.com/raspberrypi/linux/issues/2617 Fixes: 55d7de9d ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet") Suggested-by: Floris Bos <bos@je-eigen-domein.nl> Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Hurley authored
Function nfp_flower_repr_get_type_and_port expects an enum nfp_repr_type return value but, if the repr type is unknown, returns a value of type enum nfp_flower_cmsg_port_type. This means that if FW encodes the port ID in a way the driver does not understand instead of dropping the frame driver may attribute it to a physical port (uplink) provided the port number is less than physical port count. Fix this and ensure a net_device of NULL is returned if the repr can not be determined. Fixes: 1025351a ("nfp: add flower app") Signed-off-by: John Hurley <john.hurley@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Taehee Yoo authored
bpf_parse_prog() is protected by rcu_read_lock(). so that GFP_KERNEL is not allowed in the bpf_parse_prog(). [51015.579396] ============================= [51015.579418] WARNING: suspicious RCU usage [51015.579444] 4.18.0-rc6+ #208 Not tainted [51015.579464] ----------------------------- [51015.579488] ./include/linux/rcupdate.h:303 Illegal context switch in RCU read-side critical section! [51015.579510] other info that might help us debug this: [51015.579532] rcu_scheduler_active = 2, debug_locks = 1 [51015.579556] 2 locks held by ip/1861: [51015.579577] #0: 00000000a8c12fd1 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x2e0/0x910 [51015.579711] #1: 00000000bf815f8e (rcu_read_lock){....}, at: lwtunnel_build_state+0x96/0x390 [51015.579842] stack backtrace: [51015.579869] CPU: 0 PID: 1861 Comm: ip Not tainted 4.18.0-rc6+ #208 [51015.579891] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015 [51015.579911] Call Trace: [51015.579950] dump_stack+0x74/0xbb [51015.580000] ___might_sleep+0x16b/0x3a0 [51015.580047] __kmalloc_track_caller+0x220/0x380 [51015.580077] kmemdup+0x1c/0x40 [51015.580077] bpf_parse_prog+0x10e/0x230 [51015.580164] ? kasan_kmalloc+0xa0/0xd0 [51015.580164] ? bpf_destroy_state+0x30/0x30 [51015.580164] ? bpf_build_state+0xe2/0x3e0 [51015.580164] bpf_build_state+0x1bb/0x3e0 [51015.580164] ? bpf_parse_prog+0x230/0x230 [51015.580164] ? lock_is_held_type+0x123/0x1a0 [51015.580164] lwtunnel_build_state+0x1aa/0x390 [51015.580164] fib_create_info+0x1579/0x33d0 [51015.580164] ? sched_clock_local+0xe2/0x150 [51015.580164] ? fib_info_update_nh_saddr+0x1f0/0x1f0 [51015.580164] ? sched_clock_local+0xe2/0x150 [51015.580164] fib_table_insert+0x201/0x1990 [51015.580164] ? lock_downgrade+0x610/0x610 [51015.580164] ? fib_table_lookup+0x1920/0x1920 [51015.580164] ? lwtunnel_valid_encap_type.part.6+0xcb/0x3a0 [51015.580164] ? rtm_to_fib_config+0x637/0xbd0 [51015.580164] inet_rtm_newroute+0xed/0x1b0 [51015.580164] ? rtm_to_fib_config+0xbd0/0xbd0 [51015.580164] rtnetlink_rcv_msg+0x331/0x910 [ ... ] Fixes: 3a0af8fd ("bpf: BPF for lightweight tunnel infrastructure") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Daniel Borkmann authored
The len > skb_headlen(skb) cannot be used as a maximum upper bound for the packet length since it does not have any relation to the full linear packet length when filtering is used from upper layers (e.g. in case of reuseport BPF programs) as by then skb->data, skb->len already got mangled through __skb_pull() and others. Fixes: 4e1ec56c ("bpf: add skb_load_bytes_relative helper") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com>
-
Thomas Richter authored
In linux-next tree compiling the perf tool with additional make flags EXTRA_CFLAGS="-Wp,-D_FORTIFY_SOURCE=2 -O2" causes a compiler error. It is the warning 'variable may be used uninitialized' which is treated as error: I compile it using a FEDORA 28 installation, my gcc compiler version: gcc (GCC) 8.0.1 20180324 (Red Hat 8.0.1-0.20). The file that causes the error is tools/lib/bpf/libbpf.c. [root@p23lp27] # make V=1 EXTRA_CFLAGS="-Wp,-D_FORTIFY_SOURCE=2 -O2" [...] Makefile.config:849: No openjdk development package found, please install JDK package, e.g. openjdk-8-jdk, java-1.8.0-openjdk-devel Warning: Kernel ABI header at 'tools/include/uapi/linux/if_link.h' differs from latest version at 'include/uapi/linux/if_link.h' CC libbpf.o libbpf.c: In function ‘bpf_perf_event_read_simple’: libbpf.c:2342:6: error: ‘ret’ may be used uninitialized in this function [-Werror=maybe-uninitialized] int ret; ^ cc1: all warnings being treated as errors mv: cannot stat './.libbpf.o.tmp': No such file or directory /home6/tmricht/linux-next/tools/build/Makefile.build:96: recipe for target 'libbpf.o' failed Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
- 27 Jul, 2018 4 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsecDavid S. Miller authored
Steffen Klassert says: ==================== pull request (net): ipsec 2018-07-27 1) Fix PMTU handling of vti6. We update the PMTU on the xfrm dst_entry which is not cached anymore after the flowchache removal. So update the PMTU of the original dst_entry instead. From Eyal Birger. 2) Fix a leak of kernel memory to userspace. From Eric Dumazet. 3) Fix a possible dst_entry memleak in xfrm_lookup_route. From Tommi Rantala. 4) Fix a skb leak in case we can't call nlmsg_multicast from xfrm_nlmsg_multicast. From Florian Westphal. 5) Fix a leak of a temporary buffer in the error path of esp6_input. From Zhen Lei. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Gal Pressman authored
UBSAN triggers the following undefined behaviour warnings: [...] [ 13.236124] UBSAN: Undefined behaviour in drivers/net/ethernet/amazon/ena/ena_eth_com.c:468:22 [ 13.240043] shift exponent 64 is too large for 64-bit type 'long long unsigned int' [...] [ 13.744769] UBSAN: Undefined behaviour in drivers/net/ethernet/amazon/ena/ena_eth_com.c:373:4 [ 13.748694] shift exponent 64 is too large for 64-bit type 'long long unsigned int' [...] When splitting the address to high and low, GENMASK_ULL is used to generate a bitmask with dma_addr_bits field from io_sq (in ena_com_prepare_tx and ena_com_add_single_rx_desc). The problem is that dma_addr_bits is not initialized with a proper value (besides being cleared in ena_com_create_io_queue). Assign dma_addr_bits the correct value that is stored in ena_dev when initializing the SQ. Fixes: 1738cd3e ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Gal Pressman <pressmangal@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Martin KaFai Lau authored
The current map_check_btf() in BPF_MAP_TYPE_ARRAY rejects '> map->value_size' to ensure map_seq_show_elem() will not access things beyond an array element. Yonghong suggested that using '!=' is a more correct check. The 8 bytes round_up on value_size is stored in array->elem_size. Hence, using '!=' on map->value_size is a proper check. This patch also adds new tests to check the btf array key type and value type. Two of these new tests verify the btf's value_size (the change in this patch). It also fixes two existing tests that wrongly encoded a btf's type size (pprint_test) and the value_type_id (in one of the raw_tests[]). However, that do not affect these two BTF verification tests before or after this test changes. These two tests mainly failed at array creation time after this patch. Fixes: a26ca7c9 ("bpf: btf: Add pretty print support to the basic arraymap") Suggested-by: Yonghong Song <yhs@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Taehee Yoo authored
rhashtable_lookup() can return NULL. so that NULL pointer check routine should be added. Fixes: 02b55e56 ("xdp: add MEM_TYPE_ZERO_COPY") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
- 26 Jul, 2018 7 commits
-
-
Avinash Repaka authored
Registration of a memory region(MR) through FRMR/fastreg(unlike FMR) needs a connection/qp. With a proxy qp, this dependency on connection will be removed, but that needs more infrastructure patches, which is a work in progress. As an intermediate fix, the get_mr returns EOPNOTSUPP when connection details are not populated. The MR registration through sendmsg() will continue to work even with fast registration, since connection in this case is formed upfront. This patch fixes the following crash: kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Modules linked in: CPU: 1 PID: 4244 Comm: syzkaller468044 Not tainted 4.16.0-rc6+ #361 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:rds_ib_get_mr+0x5c/0x230 net/rds/ib_rdma.c:544 RSP: 0018:ffff8801b059f890 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff8801b07e1300 RCX: ffffffff8562d96e RDX: 000000000000000d RSI: 0000000000000001 RDI: 0000000000000068 RBP: ffff8801b059f8b8 R08: ffffed0036274244 R09: ffff8801b13a1200 R10: 0000000000000004 R11: ffffed0036274243 R12: ffff8801b13a1200 R13: 0000000000000001 R14: ffff8801ca09fa9c R15: 0000000000000000 FS: 00007f4d050af700(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f4d050aee78 CR3: 00000001b0d9b006 CR4: 00000000001606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __rds_rdma_map+0x710/0x1050 net/rds/rdma.c:271 rds_get_mr_for_dest+0x1d4/0x2c0 net/rds/rdma.c:357 rds_setsockopt+0x6cc/0x980 net/rds/af_rds.c:347 SYSC_setsockopt net/socket.c:1849 [inline] SyS_setsockopt+0x189/0x360 net/socket.c:1828 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x4456d9 RSP: 002b:00007f4d050aedb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 00000000006dac3c RCX: 00000000004456d9 RDX: 0000000000000007 RSI: 0000000000000114 RDI: 0000000000000004 RBP: 00000000006dac38 R08: 00000000000000a0 R09: 0000000000000000 R10: 0000000020000380 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fffbfb36d6f R14: 00007f4d050af9c0 R15: 0000000000000005 Code: fa 48 c1 ea 03 80 3c 02 00 0f 85 cc 01 00 00 4c 8b bb 80 04 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7f 68 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 9c 01 00 00 4d 8b 7f 68 48 b8 00 00 00 00 00 RIP: rds_ib_get_mr+0x5c/0x230 net/rds/ib_rdma.c:544 RSP: ffff8801b059f890 ---[ end trace 7e1cea13b85473b0 ]--- Reported-by: syzbot+b51c77ef956678a65834@syzkaller.appspotmail.com Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tariq Toukan authored
Fix dev_change_tx_queue_len so it rolls back original value upon a failure in dev_qdisc_change_tx_queue_len. This is already done for notifirers' failures, share the code. In case of failure in dev_qdisc_change_tx_queue_len, some tx queues would still be of the new length, while they should be reverted. Currently, the revert is not done, and is marked with a TODO label in dev_qdisc_change_tx_queue_len, and should find some nice solution to do it. Yet it is still better to not apply the newly requested value. Fixes: 48bfd55e ("net_sched: plug in qdisc ops change_tx_queue_len") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Reported-by: Ran Rozenstein <ranro@mellanox.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
tangpengpeng authored
If we enable or disable xgbe flow-control by ethtool , it does't work.Because the parameter is not properly assigned,so we need to adjust the assignment order of the parameters. Fixes: c1ce2f77 ("amd-xgbe: Fix flow control setting logic") Signed-off-by: tangpengpeng <tangpengpeng@higon.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arjun Vynipadath authored
Break statements were missing for Geneve case in ndo_udp_tunnel_{add/del}, thereby raw mac matchall entries were not getting added. Fixes: c746fc0e("cxgb4: add geneve offload support for T6") Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Devlink resources registered with devlink_resource_register() have to be unregistered. Fixes: 37923ed6 ("netdevsim: Add simple FIB resource controller via devlink") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Björn Töpel authored
Polling for the ingress queues relies on reading the producer/consumer pointers of the Rx queue. Prior this commit, a cached consumer pointer could be used, instead of the actual consumer pointer and therefore report POLLIN prematurely. This patch makes sure that the non-cached consumer pointer is used instead. Reported-by: Qi Zhang <qi.z.zhang@intel.com> Tested-by: Qi Zhang <qi.z.zhang@intel.com> Fixes: c497176c ("xsk: add Rx receive functions and poll support") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Wang YanQing authored
Commit 24dea047 ("bpf, x32: remove ld_abs/ld_ind") removed the 4 /* Extra space for skb_copy_bits buffer */ from _STACK_SIZE, but it didn't fix the concerned code in emit_prologue and emit_epilogue, and this error will bring very strange kernel runtime errors. This patch fixes it. Fixes: 24dea047 ("bpf, x32: remove ld_abs/ld_ind") Reported-by: Meelis Roos <mroos@linux.ee> Bisected-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Wang YanQing <udknight@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
- 25 Jul, 2018 8 commits
-
-
Wei Yongjun authored
Fixes the following sparse warnings: net/ipv4/igmp.c:1391:6: warning: symbol '__ip_mc_inc_group' was not declared. Should it be static? Fixes: 6e2059b5 ("ipv4/igmp: init group mode as INCLUDE when join source group") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Lawrence Brakmo authored
We observed high 99 and 99.9% latencies when doing RPCs with DCTCP. The problem is triggered when the last packet of a request arrives CE marked. The reply will carry the ECE mark causing TCP to shrink its cwnd to 1 (because there are no packets in flight). When the 1st packet of the next request arrives, the ACK was sometimes delayed even though it is CWR marked, adding up to 40ms to the RPC latency. This patch insures that CWR marked data packets arriving will be acked immediately. Packetdrill script to reproduce the problem: 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0 0.000 bind(3, ..., ...) = 0 0.000 listen(3, 1) = 0 0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> 0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8> 0.110 < [ect0] . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 0.200 < [ect0] . 1:1001(1000) ack 1 win 257 0.200 > [ect01] . 1:1(0) ack 1001 0.200 write(4, ..., 1) = 1 0.200 > [ect01] P. 1:2(1) ack 1001 0.200 < [ect0] . 1001:2001(1000) ack 2 win 257 0.200 write(4, ..., 1) = 1 0.200 > [ect01] P. 2:3(1) ack 2001 0.200 < [ect0] . 2001:3001(1000) ack 3 win 257 0.200 < [ect0] . 3001:4001(1000) ack 3 win 257 0.200 > [ect01] . 3:3(0) ack 4001 0.210 < [ce] P. 4001:4501(500) ack 3 win 257 +0.001 read(4, ..., 4500) = 4500 +0 write(4, ..., 1) = 1 +0 > [ect01] PE. 3:4(1) ack 4501 +0.010 < [ect0] W. 4501:5501(1000) ack 4 win 257 // Previously the ACK sequence below would be 4501, causing a long RTO +0.040~+0.045 > [ect01] . 4:4(0) ack 5501 // delayed ack +0.311 < [ect0] . 5501:6501(1000) ack 4 win 257 // More data +0 > [ect01] . 4:4(0) ack 6501 // now acks everything +0.500 < F. 9501:9501(0) ack 4 win 257 Modified based on comments by Neal Cardwell <ncardwell@google.com> Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
dann frazier authored
Otherwise interfaces get exposed under /sys/devices/virtual, which doesn't give udev the context it needs for PCI-based predictable interface names. Signed-off-by: dann frazier <dann.frazier@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Toshiaki Makita authored
When received packets are dropped in virtio_net driver, received packets counter is incremented but bytes counter is not. As a result, for instance if we drop all packets by XDP, only received is counted and bytes stays 0, which looks inconsistent. IMHO received packets/bytes should be counted if packets are produced by the hypervisor, like what common NICs on physical machines are doing. So fix the bytes counter. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Martin KaFai Lau says: ==================== The series allows the BPF loader to figure out the btf_key_id and btf_value_id from a map's name by using BPF_ANNOTATE_KV_PAIR() similarly as in iproute2 commit f823f36012fb ("bpf: implement btf handling and map annotation"). It also removes the old 'typedef' way which requires two separate typedefs (one for the key and one for the value). By doing this, iproute2 and libbpf have one consistent way to figure out the btf_key_type_id and btf_value_type_id for a map. The first two patches are some prep/cleanup works. The last patch introduces BPF_ANNOTATE_KV_PAIR. v3: - Replace some more *int*_t and u* usages with the equivalent __[su]* in btf.c v2: - Fix the incorrect '&&' check on container_type in bpf_map_find_btf_info(). - Expose the existing static btf_type_by_id() instead of creating a new one. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Martin KaFai Lau authored
This patch introduces BPF_ANNOTATE_KV_PAIR to signal the bpf loader about the btf key_type and value_type of a bpf map. Please refer to the changes in test_btf_haskv.c for its usage. Both iproute2 and libbpf loader will then have the same convention to find out the map's btf_key_type_id and btf_value_type_id from a map's name. Fixes: 8a138aed ("bpf: btf: Add BTF support to libbpf") Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Martin KaFai Lau authored
This patch replaces [u]int32_t and [u]int64_t usage with __[su]32 and __[su]64. The same change goes for [u]int16_t and [u]int8_t. Fixes: 8a138aed ("bpf: btf: Add BTF support to libbpf") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Martin KaFai Lau authored
This patch sync the uapi btf.h to tools/ Fixes: 36fc3c8c bpf: btf: Clean up BTF_INT_BITS() in uapi btf.h Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-