- 23 Feb, 2018 5 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller authored
Alexei Starovoitov says: ==================== pull-request: bpf 2018-02-22 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) two urgent fixes for bpf_tail_call logic for x64 and arm64 JITs, from Daniel. 2) cond_resched points in percpu array alloc/free paths, from Eric. 3) lockdep and other minor fixes, from Yonghong, Arnd, Anders, Li. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andy Spencer authored
Previously, buffer descriptors containing only the frame check sequence (FCS) were skipped and not added to the skb. However, the page reference count was still incremented, leading to a memory leak. Fixing this inside gfar_add_rx_frag() is difficult due to reserved memory handling and page reuse. Instead, move the FCS handling to gfar_process_frame() and trim off the FCS before passing the skb up the networking stack. Signed-off-by: Andy Spencer <aspencer@spacex.com> Signed-off-by: Jim Gruen <jgruen@spacex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
gcc-8 has a new warning that detects overlapping input and output arguments in memcpy(). It triggers for sit_init_net() calling ipip6_tunnel_clone_6rd(), which is actually correct: net/ipv6/sit.c: In function 'sit_init_net': net/ipv6/sit.c:192:3: error: 'memcpy' source argument is the same as destination [-Werror=restrict] The problem here is that the logic detecting the memcpy() arguments finds them to be the same, but the conditional that tests for the input and output of ipip6_tunnel_clone_6rd() to be identical is not a compile-time constant. We know that netdev_priv(t->dev) is the same as t for a tunnel device, and comparing "dev" directly here lets the compiler figure out as well that 'dev == sitn->fb_tunnel_dev' when called from sit_init_net(), so it no longer warns. This code is old, so Cc stable to make sure that we don't get the warning for older kernels built with new gcc. Cc: Martin Sebor <msebor@gmail.com> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83456Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexey Kodanev authored
The following use-after-free was reported by KASan when running LTP macvtap01 test on 4.16-rc2: [10642.528443] BUG: KASAN: use-after-free in macvlan_common_newlink+0x12ef/0x14a0 [macvlan] [10642.626607] Read of size 8 at addr ffff880ba49f2100 by task ip/18450 ... [10642.963873] Call Trace: [10642.994352] dump_stack+0x5c/0x7c [10643.035325] print_address_description+0x75/0x290 [10643.092938] kasan_report+0x28d/0x390 [10643.137971] ? macvlan_common_newlink+0x12ef/0x14a0 [macvlan] [10643.207963] macvlan_common_newlink+0x12ef/0x14a0 [macvlan] [10643.275978] macvtap_newlink+0x171/0x260 [macvtap] [10643.334532] rtnl_newlink+0xd4f/0x1300 ... [10646.256176] Allocated by task 18450: [10646.299964] kasan_kmalloc+0xa6/0xd0 [10646.343746] kmem_cache_alloc_trace+0xf1/0x210 [10646.397826] macvlan_common_newlink+0x6de/0x14a0 [macvlan] [10646.464386] macvtap_newlink+0x171/0x260 [macvtap] [10646.522728] rtnl_newlink+0xd4f/0x1300 ... [10647.022028] Freed by task 18450: [10647.061549] __kasan_slab_free+0x138/0x180 [10647.111468] kfree+0x9e/0x1c0 [10647.147869] macvlan_port_destroy+0x3db/0x650 [macvlan] [10647.211411] rollback_registered_many+0x5b9/0xb10 [10647.268715] rollback_registered+0xd9/0x190 [10647.319675] register_netdevice+0x8eb/0xc70 [10647.370635] macvlan_common_newlink+0xe58/0x14a0 [macvlan] [10647.437195] macvtap_newlink+0x171/0x260 [macvtap] Commit d02fd6e7 ("macvlan: Fix one possible double free") handles the case when register_netdevice() invokes ndo_uninit() on error and as a result free the port. But 'macvlan_port_get_rtnl(dev))' check (returns dev->rx_handler_data), which was added by this commit in order to prevent double free, is not quite correct: * for macvlan it always returns NULL because 'lowerdev' is the one that was used to register rx handler (port) in macvlan_port_create() as well as to unregister it in macvlan_port_destroy(). * for macvtap it always returns a valid pointer because macvtap registers its own rx handler before macvlan_common_newlink(). Fixes: d02fd6e7 ("macvlan: Fix one possible double free") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
I recently noticed a crash on arm64 when feeding a bogus index into BPF tail call helper. The crash would not occur when the interpreter is used, but only in case of JIT. Output looks as follows: [ 347.007486] Unable to handle kernel paging request at virtual address fffb850e96492510 [...] [ 347.043065] [fffb850e96492510] address between user and kernel address ranges [ 347.050205] Internal error: Oops: 96000004 [#1] SMP [...] [ 347.190829] x13: 0000000000000000 x12: 0000000000000000 [ 347.196128] x11: fffc047ebe782800 x10: ffff808fd7d0fd10 [ 347.201427] x9 : 0000000000000000 x8 : 0000000000000000 [ 347.206726] x7 : 0000000000000000 x6 : 001c991738000000 [ 347.212025] x5 : 0000000000000018 x4 : 000000000000ba5a [ 347.217325] x3 : 00000000000329c4 x2 : ffff808fd7cf0500 [ 347.222625] x1 : ffff808fd7d0fc00 x0 : ffff808fd7cf0500 [ 347.227926] Process test_verifier (pid: 4548, stack limit = 0x000000007467fa61) [ 347.235221] Call trace: [ 347.237656] 0xffff000002f3a4fc [ 347.240784] bpf_test_run+0x78/0xf8 [ 347.244260] bpf_prog_test_run_skb+0x148/0x230 [ 347.248694] SyS_bpf+0x77c/0x1110 [ 347.251999] el0_svc_naked+0x30/0x34 [ 347.255564] Code: 9100075a d280220a 8b0a002a d37df04b (f86b694b) [...] In this case the index used in BPF r3 is the same as in r1 at the time of the call, meaning we fed a pointer as index; here, it had the value 0xffff808fd7cf0500 which sits in x2. While I found tail calls to be working in general (also for hitting the error cases), I noticed the following in the code emission: # bpftool p d j i 988 [...] 38: ldr w10, [x1,x10] 3c: cmp w2, w10 40: b.ge 0x000000000000007c <-- signed cmp 44: mov x10, #0x20 // #32 48: cmp x26, x10 4c: b.gt 0x000000000000007c 50: add x26, x26, #0x1 54: mov x10, #0x110 // #272 58: add x10, x1, x10 5c: lsl x11, x2, #3 60: ldr x11, [x10,x11] <-- faulting insn (f86b694b) 64: cbz x11, 0x000000000000007c [...] Meaning, the tests passed because commit ddb55992 ("arm64: bpf: implement bpf_tail_call() helper") was using signed compares instead of unsigned which as a result had the test wrongly passing. Change this but also the tail call count test both into unsigned and cap the index as u32. Latter we did as well in 90caccdd ("bpf: fix bpf_tail_call() x64 JIT") and is needed in addition here, too. Tested on HiSilicon Hi1616. Result after patch: # bpftool p d j i 268 [...] 38: ldr w10, [x1,x10] 3c: add w2, w2, #0x0 40: cmp w2, w10 44: b.cs 0x0000000000000080 48: mov x10, #0x20 // #32 4c: cmp x26, x10 50: b.hi 0x0000000000000080 54: add x26, x26, #0x1 58: mov x10, #0x110 // #272 5c: add x10, x1, x10 60: lsl x11, x2, #3 64: ldr x11, [x10,x11] 68: cbz x11, 0x0000000000000080 [...] Fixes: ddb55992 ("arm64: bpf: implement bpf_tail_call() helper") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 22 Feb, 2018 17 commits
-
-
Daniel Borkmann authored
Implement a retpoline [0] for the BPF tail call JIT'ing that converts the indirect jump via jmp %rax that is used to make the long jump into another JITed BPF image. Since this is subject to speculative execution, we need to control the transient instruction sequence here as well when CONFIG_RETPOLINE is set, and direct it into a pause + lfence loop. The latter aligns also with what gcc / clang emits (e.g. [1]). JIT dump after patch: # bpftool p d x i 1 0: (18) r2 = map[id:1] 2: (b7) r3 = 0 3: (85) call bpf_tail_call#12 4: (b7) r0 = 2 5: (95) exit With CONFIG_RETPOLINE: # bpftool p d j i 1 [...] 33: cmp %edx,0x24(%rsi) 36: jbe 0x0000000000000072 |* 38: mov 0x24(%rbp),%eax 3e: cmp $0x20,%eax 41: ja 0x0000000000000072 | 43: add $0x1,%eax 46: mov %eax,0x24(%rbp) 4c: mov 0x90(%rsi,%rdx,8),%rax 54: test %rax,%rax 57: je 0x0000000000000072 | 59: mov 0x28(%rax),%rax 5d: add $0x25,%rax 61: callq 0x000000000000006d |+ 66: pause | 68: lfence | 6b: jmp 0x0000000000000066 | 6d: mov %rax,(%rsp) | 71: retq | 72: mov $0x2,%eax [...] * relative fall-through jumps in error case + retpoline for indirect jump Without CONFIG_RETPOLINE: # bpftool p d j i 1 [...] 33: cmp %edx,0x24(%rsi) 36: jbe 0x0000000000000063 |* 38: mov 0x24(%rbp),%eax 3e: cmp $0x20,%eax 41: ja 0x0000000000000063 | 43: add $0x1,%eax 46: mov %eax,0x24(%rbp) 4c: mov 0x90(%rsi,%rdx,8),%rax 54: test %rax,%rax 57: je 0x0000000000000063 | 59: mov 0x28(%rax),%rax 5d: add $0x25,%rax 61: jmpq *%rax |- 63: mov $0x2,%eax [...] * relative fall-through jumps in error case - plain indirect jump as before [0] https://support.google.com/faqs/answer/7625886 [1] https://github.com/gcc-mirror/gcc/commit/a31e654fa107be968b802786d747e962c2fcdb2bSigned-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
David Howells authored
All the kernel_sendmsg() calls in rxrpc_send_data_packet() need to send both parts of the iov[] buffer, but one of them does not. Fix it so that it does. Without this, short IPv6 rxrpc DATA packets may be seen that have the rxrpc header included, but no payload. Fixes: 5a924b89 ("rxrpc: Don't store the rxrpc header in the Tx queue sk_buffs") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dan Carpenter authored
We should check "self->aq_hw" for allocation failure, and also we should free it on the error paths. Fixes: 23ee07ad ("net: aquantia: Cleanup pci functions module") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yonghong Song authored
Commit 9a3efb6b ("bpf: fix memory leak in lpm_trie map_free callback function") fixed a memory leak and removed unnecessary locks in map_free callback function. Unfortrunately, it introduced a lockdep warning. When lockdep checking is turned on, running tools/testing/selftests/bpf/test_lpm_map will have: [ 98.294321] ============================= [ 98.294807] WARNING: suspicious RCU usage [ 98.295359] 4.16.0-rc2+ #193 Not tainted [ 98.295907] ----------------------------- [ 98.296486] /home/yhs/work/bpf/kernel/bpf/lpm_trie.c:572 suspicious rcu_dereference_check() usage! [ 98.297657] [ 98.297657] other info that might help us debug this: [ 98.297657] [ 98.298663] [ 98.298663] rcu_scheduler_active = 2, debug_locks = 1 [ 98.299536] 2 locks held by kworker/2:1/54: [ 98.300152] #0: ((wq_completion)"events"){+.+.}, at: [<00000000196bc1f0>] process_one_work+0x157/0x5c0 [ 98.301381] #1: ((work_completion)(&map->work)){+.+.}, at: [<00000000196bc1f0>] process_one_work+0x157/0x5c0 Since actual trie tree removal happens only after no other accesses to the tree are possible, replacing rcu_dereference_protected(*slot, lockdep_is_held(&trie->lock)) with rcu_dereference_protected(*slot, 1) fixed the issue. Fixes: 9a3efb6b ("bpf: fix memory leak in lpm_trie map_free callback function") Reported-by: Eric Dumazet <edumazet@google.com> Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yonghong Song <yhs@fb.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Eric Dumazet authored
syszbot managed to trigger RCU detected stalls in bpf_array_free_percpu() It takes time to allocate a huge percpu map, but even more time to free it. Since we run in process context, use cond_resched() to yield cpu if needed. Fixes: a10423b8 ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
David S. Miller authored
Merge tag 'mac80211-for-davem-2018-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Various fixes across the tree, the shortlog basically says it all: cfg80211: fix cfg80211_beacon_dup -> old bug in this code cfg80211: clear wep keys after disconnection -> certain ways of disconnecting left the keys mac80211: round IEEE80211_TX_STATUS_HEADROOM up to multiple of 4 -> alignment issues with using 14 bytes mac80211: Do not disconnect on invalid operating class -> if the AP has a bogus operating class, let it be mac80211: Fix sending ADDBA response for an ongoing session -> don't send the same frame twice cfg80211: use only 1Mbps for basic rates in mesh -> interop issue with old versions of our code mac80211_hwsim: don't use WQ_MEM_RECLAIM -> it causes splats because it flushes work on a non-reclaim WQ regulatory: add NUL to request alpha2 -> nla_put_string() issue from Kees mac80211: mesh: fix wrong mesh TTL offset calculation -> protocol issue mac80211: fix a possible leak of station stats -> error path might leak memory mac80211: fix calling sleeping function in atomic context -> percpu allocations need to be made with gfp flags ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
Similar to the ancient commit a5fe8e76 ("regulatory: add NUL to alpha2"), add another byte to alpha2 in the request struct so that when we use nla_put_string(), we don't overrun anything. Fixes: 73d54c9e ("cfg80211: add regulatory netlink multicast group") Reported-by: Kees Cook <keescook@google.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Thomas Falcon authored
The login buffer is released before the driver can perform sanity checks between resources the driver requested and what firmware will provide. Don't release the login buffer until the sanity check is performed. Fixes: 34f0f4e3 ("ibmvnic: Fix login buffer memory leaks") Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Finn Thain authored
AFAIK the only version of smc9194.c with Mac support is the one in the linux-mac68k CVS repo, which never made it to the mainline. Despite that, from v2.3.45, arch/m68k/config.in listed CONFIG_SMC9194 under CONFIG_MAC. This mistake got carried over into Kconfig in v2.5.55. (See pre-git era "[PATCH] add m68k dependencies to net driver config".) Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
The result of the skb flow dissect is copied from keys to hash_keys to ensure only the intended data is hashed. The original L4 hash patch overlooked setting the addr_type for this case; add it. Fixes: bf4e0a3d ("net: ipv4: add support for ECMP hash policy choice") Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
BBR uses tcp_tso_autosize() in an attempt to probe what would be the burst sizes and to adjust cwnd in bbr_target_cwnd() with following gold formula : /* Allow enough full-sized skbs in flight to utilize end systems. */ cwnd += 3 * bbr->tso_segs_goal; But GSO can be lacking or be constrained to very small units (ip link set dev ... gso_max_segs 2) What we really want is to have enough packets in flight so that both GSO and GRO are efficient. So in the case GSO is off or downgraded, we still want to have the same number of packets in flight as if GSO/TSO was fully operational, so that GRO can hopefully be working efficiently. To fix this issue, we make tcp_tso_autosize() unaware of sk->sk_gso_max_segs Only tcp_tso_segs() has to enforce the gso_max_segs limit. Tested: ethtool -K eth0 tso off gso off tc qd replace dev eth0 root pfifo_fast Before patch: for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done 691 (ss -temoi shows cwnd is stuck around 6 ) 667 651 631 517 After patch : # for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done 1733 (ss -temoi shows cwnd is around 386 ) 1778 1746 1781 1718 Fixes: 0f8782ea ("tcp_bbr: add BBR congestion control") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
If an attempt is made to disable RX checksums, USB adapter is changed but netdev->features is not, because smsc75xx_set_features() returns a non zero value. This throws errors from netdev_rx_csum_fault() : <devname>: hw csum failure Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steve Glendinning <steve.glendinning@shawell.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jason A. Donenfeld authored
Before, if cb->start() failed, the module reference would never be put, because cb->cb_running is intentionally false at this point. Users are generally annoyed by this because they can no longer unload modules that leak references. Also, it may be possible to tediously wrap a reference counter back to zero, especially since module.c still uses atomic_inc instead of refcount_inc. This patch expands the error path to simply call module_put if cb->start() fails. Fixes: 41c87425 ("netlink: do not set cb_running if dump's start() errs") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Li Zhijian authored
test_maps contains a series of stress tests, and previously it will break the rest tests when it failed to alloc memory. ----------------------- Failed to create hashmap key=8 value=262144 'Cannot allocate memory' Failed to create hashmap key=16 value=262144 'Cannot allocate memory' Failed to create hashmap key=8 value=262144 'Cannot allocate memory' Failed to create hashmap key=8 value=262144 'Cannot allocate memory' test_maps: test_maps.c:955: run_parallel: Assertion `status == 0' failed. Aborted not ok 1..3 selftests: test_maps [FAIL] ----------------------- after this patch, the rest tests will be continue when it occurs an ENOMEM failure CC: Alexei Starovoitov <alexei.starovoitov@gmail.com> CC: Philip Li <philip.li@intel.com> Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Li Zhijian <zhijianx.li@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Anders Roxell authored
bpf builds a test program for loading BPF ELF files. Add the executable to the .gitignore list. Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Tested-by: Daniel Díaz <daniel.diaz@linaro.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Anders Roxell authored
Both glibc and the kernel have in6_* macros definitions. Build fails because it picks up wrong in6_* macro from the kernel header and not the header from glibc. Fixes build error below: clang -I. -I./include/uapi -I../../../include/uapi -Wno-compare-distinct-pointer-types \ -O2 -target bpf -emit-llvm -c test_tcpbpf_kern.c -o - | \ llc -march=bpf -mcpu=generic -filetype=obj -o .../tools/testing/selftests/bpf/test_tcpbpf_kern.o In file included from test_tcpbpf_kern.c:12: .../netinet/in.h:101:5: error: expected identifier IPPROTO_HOPOPTS = 0, /* IPv6 Hop-by-Hop options. */ ^ .../linux/in6.h:131:26: note: expanded from macro 'IPPROTO_HOPOPTS' ^ In file included from test_tcpbpf_kern.c:12: /usr/include/netinet/in.h:103:5: error: expected identifier IPPROTO_ROUTING = 43, /* IPv6 routing header. */ ^ .../linux/in6.h:132:26: note: expanded from macro 'IPPROTO_ROUTING' ^ In file included from test_tcpbpf_kern.c:12: .../netinet/in.h:105:5: error: expected identifier IPPROTO_FRAGMENT = 44, /* IPv6 fragmentation header. */ ^ Since both glibc and the kernel have in6_* macros definitions, use the one from glibc. Kernel headers will check for previous libc definitions by including include/linux/libc-compat.h. Reported-by: Daniel Díaz <daniel.diaz@linaro.org> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Tested-by: Daniel Díaz <daniel.diaz@linaro.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Arnd Bergmann authored
The only user of this variable is inside of an #ifdef, causing a warning without CONFIG_INET: net/core/filter.c: In function '____bpf_sock_ops_cb_flags_set': net/core/filter.c:3382:6: error: unused variable 'val' [-Werror=unused-variable] int val = argval & BPF_SOCK_OPS_ALL_CB_FLAGS; This replaces the #ifdef with a nicer IS_ENABLED() check that makes the code more readable and avoids the warning. Fixes: b13d8807 ("bpf: Adds field bpf_sock_ops_cb_flags to tcp_sock") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
- 21 Feb, 2018 10 commits
-
-
Tom Lendacky authored
After resuming from suspend, the PCI device support must re-enable the interrupt setting so that interrupts are actually delivered. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller authored
Daniel Borkmann says: ==================== pull-request: bpf 2018-02-20 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix a memory leak in LPM trie's map_free() callback function, where the trie structure itself was not freed since initial implementation. Also a synchronize_rcu() was needed in order to wait for outstanding programs accessing the trie to complete, from Yonghong. 2) Fix sock_map_alloc()'s error path in order to correctly propagate the -EINVAL error in case of too large allocation requests. This was just recently introduced when fixing close hooks via ULP layer, fix from Eric. 3) Do not use GFP_ATOMIC in __cpu_map_entry_alloc(). Reason is that this will not work with the recent __ptr_ring_init_queue_alloc() conversion to kvmalloc_array(), where in case of fallback to vmalloc() that GFP flag is invalid, from Jason. 4) Fix two recent syzkaller warnings: i) fix bpf_prog_array_copy_to_user() when a prog query with a big number of ids was performed where we'd otherwise trigger a warning from allocator side, ii) fix a missing mlock precharge on arraymaps, from Daniel. 5) Two fixes for bpftool in order to avoid breaking JSON output when used in batch mode, from Quentin. 6) Move a pr_debug() in libbpf in order to avoid having an otherwise uninitialized variable in bpf_program__reloc_text(), from Jeremy. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jesper Dangaard Brouer says: ==================== virtio_net: several bugs in XDP code for driver virtio_net The virtio_net driver actually violates the original memory model of XDP causing hard to debug crashes. Per request of John Fastabend, instead of removing the XDP feature I'm fixing as much as possible. While testing virtio_net with XDP_REDIRECT I found 4 different bugs. Patch-1: not enough tail-room for build_skb in receive_mergeable() only option is to disable XDP_REDIRECT in receive_mergeable() Patch-2: XDP in receive_small() basically never worked (check wrong flag) Patch-3: fix memory leak for XDP_REDIRECT in error cases Patch-4: avoid crash when ndo_xdp_xmit is called on dev not ready for XDP In the longer run, we should consider introducing a separate receive function when attaching an XDP program, and also change the memory model to be compatible with XDP when attaching an XDP prog. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jesper Dangaard Brouer authored
When a driver implements the ndo_xdp_xmit() function, there is (currently) no generic way to determine whether it is safe to call. It is e.g. unsafe to call the drivers ndo_xdp_xmit, if it have not allocated the needed XDP TX queues yet. This is the case for virtio_net, which first allocates the XDP TX queues once an XDP/bpf prog is attached (in virtnet_xdp_set()). Thus, a crash will occur for virtio_net when redirecting to another virtio_net device's ndo_xdp_xmit, which have not attached a XDP prog. The sample xdp_redirect_map tries to attach a dummy XDP prog to take this into account, but it can also easily fail if the virtio_net (or actually underlying vhost driver) have not allocated enough extra queues for the device. Allocating more queue this is currently a manual config. Hint for libvirt XML add: <driver name='vhost' queues='16'> <host mrg_rxbuf='off'/> <guest tso4='off' tso6='off' ecn='off' ufo='off'/> </driver> The solution in this patch is to check that the device have loaded an XDP/bpf prog before proceeding. This is similar to the check performed in driver ixgbe. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jesper Dangaard Brouer authored
XDP_REDIRECT calling xdp_do_redirect() can fail for multiple reasons (which can be inspected by tracepoints). The current semantics is that on failure the driver calling xdp_do_redirect() must handle freeing or recycling the page associated with this frame. This can be seen as an optimization, as drivers usually have an optimized XDP_DROP code path for frame recycling in place already. The virtio_net driver didn't handle when xdp_do_redirect() failed. This caused a memory leak as the page refcnt wasn't decremented on failures. The function __virtnet_xdp_xmit() did handle one type of failure, when the xmit queue virtqueue_add_outbuf() is full, which "hides" releasing a refcnt on the page. Instead the function __virtnet_xdp_xmit() must follow API of xdp_do_redirect(), which on errors leave it up to the caller to free the page, of the failed send operation. Fixes: 186b3c99 ("virtio-net: support XDP_REDIRECT") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jesper Dangaard Brouer authored
When configuring virtio_net to use the code path 'receive_small()', in-order to get correct XDP_REDIRECT support, I discovered TCP packets would get silently dropped when loading an XDP program action XDP_PASS. The bug seems to be that receive_small() when XDP is loaded check that hdr->hdr.flags is zero, which seems wrong as hdr.flags contains the flags VIRTIO_NET_HDR_F_* : #define VIRTIO_NET_HDR_F_NEEDS_CSUM 1 /* Use csum_start, csum_offset */ #define VIRTIO_NET_HDR_F_DATA_VALID 2 /* Csum is valid */ TCP got dropped as it had the VIRTIO_NET_HDR_F_DATA_VALID flag set. The flags that are relevant here are the VIRTIO_NET_HDR_GSO_* flags stored in hdr->hdr.gso_type. Thus, the fix is just check that none of the gso_type flags have been set. Fixes: bb91accf ("virtio-net: XDP support for small buffers") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jesper Dangaard Brouer authored
The virtio_net code have three different RX code-paths in receive_buf(). Two of these code paths can handle XDP, but one of them is broken for at least XDP_REDIRECT. Function(1): receive_big() does not support XDP. Function(2): receive_small() support XDP fully and uses build_skb(). Function(3): receive_mergeable() broken XDP_REDIRECT uses napi_alloc_skb(). The simple explanation is that receive_mergeable() is broken because it uses napi_alloc_skb(), which violates XDP given XDP assumes packet header+data in single page and enough tail room for skb_shared_info. The longer explaination is that receive_mergeable() tries to work-around and satisfy these XDP requiresments e.g. by having a function xdp_linearize_page() that allocates and memcpy RX buffers around (in case packet is scattered across multiple rx buffers). This does currently satisfy XDP_PASS, XDP_DROP and XDP_TX (but only because we have not implemented bpf_xdp_adjust_tail yet). The XDP_REDIRECT action combined with cpumap is broken, and cause hard to debug crashes. The main issue is that the RX packet does not have the needed tail-room (SKB_DATA_ALIGN(skb_shared_info)), causing skb_shared_info to overlap the next packets head-room (in which cpumap stores info). Reproducing depend on the packet payload length and if RX-buffer size happened to have tail-room for skb_shared_info or not. But to make this even harder to troubleshoot, the RX-buffer size is runtime dynamically change based on an Exponentially Weighted Moving Average (EWMA) over the packet length, when refilling RX rings. This patch only disable XDP_REDIRECT support in receive_mergeable() case, because it can cause a real crash. IMHO we should consider NOT supporting XDP in receive_mergeable() at all, because the principles behind XDP are to gain speed by (1) code simplicity, (2) sacrificing memory and (3) where possible moving runtime checks to setup time. These principles are clearly being violated in receive_mergeable(), that e.g. runtime track average buffer size to save memory consumption. In the longer run, we should consider introducing a separate receive function when attaching an XDP program, and also change the memory model to be compatible with XDP when attaching an XDP prog. Fixes: 186b3c99 ("virtio-net: support XDP_REDIRECT") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxDavid S. Miller authored
Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2018-02-20 The following pull request includes some fixes for the mlx5 core and netdevice driver. Please pull and let me know if there's any issue. -stable 4.10.y: ('net/mlx5e: Fix loopback self test when GRO is off') -stable 4.12.y: ('net/mlx5e: Specify numa node when allocating drop rq') -stable 4.13.y: ('net/mlx5e: Verify inline header size do not exceed SKB linear size') -stable 4.15.y: ('net/mlx5e: Fix TCP checksum in LRO buffers') ('net/mlx5: Fix error handling when adding flow rules') ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller authored
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains large batch with Netfilter fixes for your net tree, mostly due to syzbot report fixups and pr_err() ratelimiting, more specifically, they are: 1) Get rid of superfluous unnecessary check in x_tables before vmalloc(), we don't hit BUG there anymore, patch from Michal Hock, suggested by Andrew Morton. 2) Race condition in proc file creation in ipt_CLUSTERIP, from Cong Wang. 3) Drop socket lock that results in circular locking dependency, patch from Paolo Abeni. 4) Drop packet if case of malformed blob that makes backpointer jump in x_tables, from Florian Westphal. 5) Fix refcount leak due to race in ipt_CLUSTERIP in clusterip_config_find_get(), from Cong Wang. 6) Several patches to ratelimit pr_err() for x_tables since this can be a problem where CAP_NET_ADMIN semantics can protect us in untrusted namespace, from Florian Westphal. 7) Missing .gitignore update for new autogenerated asn1 state machine for the SNMP NAT helper, from Zhu Lingshan. 8) Missing timer initialization in xt_LED, from Paolo Abeni. 9) Do not allow negative port range in NAT, also from Paolo. 10) Lock imbalance in the xt_hashlimit rate match mode, patch from Eric Dumazet. 11) Initialize workqueue before timer in the idletimer match, from Eric Dumazet. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roman Kapl authored
So far, if the filter was too large to fit in the allocated skb, the kernel did not return any error and stopped dumping. Modify the dumper so that it returns -EMSGSIZE when a filter fails to dump and it is the first filter in the skb. If we are not first, we will get a next chance with more room. I understand this is pretty near to being an API change, but the original design (silent truncation) can be considered a bug. Note: The error case can happen pretty easily if you create a filter with 32 actions and have 4kb pages. Also recent versions of iproute try to be clever with their buffer allocation size, which in turn leads to Signed-off-by: Roman Kapl <code@rkapl.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 20 Feb, 2018 8 commits
-
-
Vlad Buslov authored
If building match list or adding existing fg fails when node is locked, function returned without unlocking it. This happened if node version changed or adding existing fg returned with EAGAIN after jumping to search_again_locked label. Fixes: bd71b08e ("net/mlx5: Support multiple updates of steering rules in parallel") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Eugenia Emantayev authored
First use of drop counters happens in esw_apply_vport_conf function, while they are allocated later in the flow. Fix that by moving esw_vport_create_drop_counters function to be called before the first use. Fixes: b8a0dbe3 ("net/mlx5e: E-switch, Add steering drop counters") Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Or Gerlitz authored
We can't allow only some of the rules sharing an FTE to ask for header re-write, add it to the conflicting action checks. Fixes: 0d235c3f ('net/mlx5: Add hash table to search FTEs in a flow-group') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Daniel Jurgens authored
The adapter uses the cache_line_128byte setting to set the bounds for end padding. On systems where the cacheline size is greater than 128B use 128B instead of the default of 64B. This results in fewer partial cacheline writes. There's a 50% chance it will pad to the end of a 256B cache line vs only 25% when using 64B. Fixes: f32f5bd2 ("net/mlx5: Configure cache line size for start and end padding") Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Gal Pressman authored
When allocating a drop rq, no numa node is explicitly set which means allocations are done on node zero. This is not necessarily the nearest numa node to the HCA, and even worse, might even be a memoryless numa node. Choose the numa_node given to us by the pci device in order to properly allocate the coherent dma memory instead of assuming zero is valid. Fixes: 556dd1b9 ("net/mlx5e: Set drop RQ's necessary parameters only") Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Or Gerlitz authored
This isn't supported when we emulate eswitch vlan push action which is the current state of things. Fixes: 8b32580d ('net/mlx5e: Add TC vlan action for SRIOV offloads') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Or Gerlitz authored
Address these sparse warnings on drivers/net/ethernet/mellanox/mlx5 [..]/core/diag/fs_tracepoint.c:99:53: warning: non-constant initializer for static object [..]/core/diag/fs_tracepoint.c:102:53: warning: non-constant initializer for static object etc Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Or Gerlitz authored
Fix these gcc warnings on drivers/net/ethernet/mellanox/mlx5: [..]/core/lib/clock.c:454:6: warning: no previous prototype for 'mlx5_init_clock' [-Wmissing-prototypes] [..]/core/lib/clock.c:510:6: warning: no previous prototype for 'mlx5_cleanup_clock' [-Wmissing-prototypes] [..]/core/en_main.c:3141:5: warning: no previous prototype for 'mlx5e_setup_tc' [-Wmissing-prototypes] Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-