- 16 Sep, 2019 6 commits
-
-
Toke Høiland-Jørgensen authored
syzbot found a crash in dev_map_hash_update_elem(), when replacing an element with a new one. Jesper correctly identified the cause of the crash as a race condition between the initial lookup in the map (which is done before taking the lock), and the removal of the old element. Rather than just add a second lookup into the hashmap after taking the lock, fix this by reworking the function logic to take the lock before the initial lookup. Fixes: 6f9d451a ("xdp: Add devmap_hash map type for looking up devices by hashed index") Reported-and-tested-by: syzbot+4e7a85b1432052e8d6f8@syzkaller.appspotmail.com Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Daniel Borkmann authored
Ciara Loftus says: ==================== This patch set contains some fixes for AF_XDP zero copy in the i40e and ixgbe drivers as well as a fix for the 'xdpsock' sample application when running in unaligned mode. Patches 1 and 2 fix a regression for the i40e and ixgbe drivers which caused the umem headroom to be added to the xdp handle twice, resulting in an incorrect value being received by the user for the case where the umem headroom is non-zero. Patch 3 fixes an issue with the xdpsock sample application whereby the start of the tx packet data (offset) was not being set correctly when the application was being run in unaligned mode. This patch set has been applied against commit a2c11b03 ("kcm: use BPF_PROG_RUN") ==================== Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Ciara Loftus authored
Preserve the offset of the address of the received descriptor, and include it in the address set for the tx descriptor, so the kernel can correctly locate the start of the packet data. Fixes: 03895e63 ("samples/bpf: add buffer recycling for unaligned chunks to xdpsock") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Ciara Loftus authored
Commit 7cbbf9f1 ("ixgbe: fix xdp handle calculations") reintroduced the addition of the umem headroom to the xdp handle in the ixgbe_zca_free, ixgbe_alloc_buffer_slow_zc and ixgbe_alloc_buffer_zc functions. However, the headroom is already added to the handle in the function ixgbe_run_xdp_zc. This commit removes the latter addition and fixes the case where the headroom is non-zero. Fixes: 7cbbf9f1 ("ixgbe: fix xdp handle calculations") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Ciara Loftus authored
Commit 4c5d9a7f ("i40e: fix xdp handle calculations") reintroduced the addition of the umem headroom to the xdp handle in the i40e_zca_free, i40e_alloc_buffer_slow_zc and i40e_alloc_buffer_zc functions. However, the headroom is already added to the handle in the function i40_run_xdp_zc. This commit removes the latter addition and fixes the case where the headroom is non-zero. Fixes: 4c5d9a7f ("i40e: fix xdp handle calculations") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Ilya Leoshkevich authored
Now that binutils and gcc support for BPF is upstream, make use of it in BPF selftests using alu32-like approach. Share as much as possible of CFLAGS calculation with clang. Fixes only obvious issues, leaving more complex ones for later: - Use gcc-provided bpf-helpers.h instead of manually defining the helpers, change bpf_helpers.h include guard to avoid conflict. - Include <linux/stddef.h> for __always_inline. - Add $(OUTPUT)/../usr/include to include path in order to use local kernel headers instead of system kernel headers when building with O=. In order to activate the bpf-gcc support, one needs to configure binutils and gcc with --target=bpf and make them available in $PATH. In particular, gcc must be installed as `bpf-gcc`, which is the default. Right now with binutils 25a2915e8dba and gcc r275589 only a handful of tests work: # ./test_progs_bpf_gcc # Summary: 7/39 PASSED, 1 SKIPPED, 98 FAILED The reason for those failures are as follows: - Build errors: - `error: too many function arguments for eBPF` for __always_inline functions read_str_var and read_map_var - must be inlining issue, and for process_l3_headers_v6, which relies on optimizing away function arguments. - `error: indirect call in function, which are not supported by eBPF` where there are no obvious indirect calls in the source calls, e.g. in __encap_ipip_none. - `error: field 'lock' has incomplete type` for fields of `struct bpf_spin_lock` type - bpf_spin_lock is re#defined by bpf-helpers.h, so its usage is sensitive to order of #includes. - `error: eBPF stack limit exceeded` in sysctl_tcp_mem. - Load errors: - Missing object files due to above build errors. - `libbpf: failed to create map (name: 'test_ver.bss')`. - `libbpf: object file doesn't contain bpf program`. - `libbpf: Program '.text' contains unrecognized relo data pointing to section 0`. - `libbpf: BTF is required, but is missing or corrupted` - no BTF support in gcc yet. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Jose E. Marchesi <jose.marchesi@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
- 06 Sep, 2019 29 commits
-
-
Sami Tolvanen authored
Instead of invoking struct bpf_prog::bpf_func directly, use the BPF_PROG_RUN macro. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Alexei Starovoitov authored
Stanislav Fomichev says: ==================== Now that test_progs is shaping into more generic test framework, let's convert sockopt tests to it. This requires adding a helper to create and join a cgroup first (test__join_cgroup). Since we already hijack stdout/stderr that shouldn't be a problem (cgroup helpers log to stderr). The rest of the patches just move sockopt tests files under prog_tests/ and do the required small adjustments. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Stanislav Fomichev authored
Move the files, adjust includes, remove entry from Makefile & .gitignore Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Stanislav Fomichev authored
Move the files, adjust includes, remove entry from Makefile & .gitignore I also added pthread_cond_wait for the server thread startup. We don't want to connect to the server that's not yet up (for some reason this existing race is now more prominent with test_progs). Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Stanislav Fomichev authored
Move the files, adjust includes, remove entry from Makefile & .gitignore Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Stanislav Fomichev authored
Move the files, adjust includes, remove entry from Makefile & .gitignore Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Stanislav Fomichev authored
Move the files, adjust includes, remove entry from Makefile & .gitignore Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Stanislav Fomichev authored
test__join_cgroup() combines the following operations that usually go hand in hand and returns cgroup fd: * setup cgroup environment (make sure cgroupfs is mounted) * mkdir cgroup * join cgroup It also marks a test as a "cgroup cleanup needed" and removes cgroup state after the test is done. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
${@:2} is BASH-specific extension, which makes link-vmlinux.sh rely on BASH. Use shift and ${@} instead to fix this issue. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: 341dfcf8 ("btf: expose BTF info through sysfs") Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller authored
Daniel Borkmann says: ==================== The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add the ability to use unaligned chunks in the AF_XDP umem. By relaxing where the chunks can be placed, it allows to use an arbitrary buffer size and place whenever there is a free address in the umem. Helps more seamless DPDK AF_XDP driver integration. Support for i40e, ixgbe and mlx5e, from Kevin and Maxim. 2) Addition of a wakeup flag for AF_XDP tx and fill rings so the application can wake up the kernel for rx/tx processing which avoids busy-spinning of the latter, useful when app and driver is located on the same core. Support for i40e, ixgbe and mlx5e, from Magnus and Maxim. 3) bpftool fixes for printf()-like functions so compiler can actually enforce checks, bpftool build system improvements for custom output directories, and addition of 'bpftool map freeze' command, from Quentin. 4) Support attaching/detaching XDP programs from 'bpftool net' command, from Daniel. 5) Automatic xskmap cleanup when AF_XDP socket is released, and several barrier/{read,write}_once fixes in AF_XDP code, from Björn. 6) Relicense of bpf_helpers.h/bpf_endian.h for future libbpf inclusion as well as libbpf versioning improvements, from Andrii. 7) Several new BPF kselftests for verifier precision tracking, from Alexei. 8) Several BPF kselftest fixes wrt endianess to run on s390x, from Ilya. 9) And more BPF kselftest improvements all over the place, from Stanislav. 10) Add simple BPF map op cache for nfp driver to batch dumps, from Jakub. 11) AF_XDP socket umem mapping improvements for 32bit archs, from Ivan. 12) Add BPF-to-BPF call and BTF line info support for s390x JIT, from Yauheni. 13) Small optimization in arm64 JIT to spare 1 insns for BPF_MOD, from Jerin. 14) Fix an error check in bpf_tcp_gen_syncookie() helper, from Petar. 15) Various minor fixes and cleanups, from Nathan, Masahiro, Masanari, Peter, Wei, Yue. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
The variable rx_process_result is being initialized with a value that is never read and is being re-assigned immediately afterwards. The assignment is redundant, so replace it with the return from function lan743x_rx_process_packet. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Simon Horman says: ==================== ravb: remove use of undocumented registers this short series cleans up the RAVB driver a little. The first patch corrects the spelling of the FBP field of SFO register. This register field is unused and should have no run-time effect. The remaining patches remove the use of undocumented registers after some consultation with the internal Renesas BSP team. Changes in v2: * Corrected mangled state of first patch * Patches 2/4 and 3/4 split out of a large patch * Accumulated acks * Tweaked changelog * Claimed authorship of all patches v1 of this series was tested on the following platforms. No behaviour change is expected in v2. * E3 Ebisu * H3 Salvator-XS (ES2.0) * M3-W Salvator-XS * M3-N Salvator-XS * RZ/G1C iW-RainboW-G23S ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Simon Horman authored
Only use the TROCR register on R-Car Gen3 as it is not present on other SoCs. Offsets used for the undocumented registers are considered reserved and should not be written to. After some internal investigation with Renesas it remains unclear why this driver accesses these fields on R-Car Gen2 but regardless of what the historical reasons are the current code is considered incorrect. Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Simon Horman authored
This patch removes the use of the undocumented BOC bit of the CCC register. Current documentation for EtherAVB (ravb) describes the offset of what the driver uses as the BOC bit as reserved and that only a value of 0 should be written. After some internal investigation with Renesas it remains unclear why this driver accesses these fields but regardless of what the historical reasons are the current code is considered incorrect. Based on work by Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com> Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Simon Horman authored
This patch removes the use of the undocumented counter registers CDCR, LCCR, CERCR, CEECR. Offsets used for undocumented registers are considered reserved and should not be written to. After some internal investigation with Renesas it remains unclear why this driver accesses these fields but regardless of what the historical reasons are the current code is considered incorrect. Based on work by Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com> Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Simon Horman authored
The field name is FBP rather than FPB. This field is unused and could equally be removed from the driver entirely. But there seems no harm in leaving as documentation of the presence of the field. Based on work by Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com> Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Huazhong Tan says: ==================== net: hns3: add some bugfixes and cleanups This patch-set includes bugfixes and cleanups for the HNS3 ethernet controller driver. [patch 01/07] fixes an error when setting VLAN offload. [patch 02/07] fixes an double free issue when setting ringparam. [patch 03/07] fixes a mis-assignment of hdev->reset_level. [patch 04/07] adds a checking for client's validity. [patch 05/07] simplifies bool variable's assignment. [patch 06/07] disables loopback when initializing. [patch 07/07] makes internal function to static. Change log: V1->V2: fixes comment from Sergei Shtylyov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Guojia Liao authored
hclge_dbg_get_m7_info is used only in the hclge_debugfs.c, so it should be declared with static. Signed-off-by: Guojia Liao <liaoguojia@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yufeng Mo authored
If the selftest and reset are performed at the same time, the loopback setting may be still in the enable state after the reset. As a result, packets cannot be sent out. This patch fixes this issue by disabling loopback in hclge_mac_init. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Guojia Liao authored
Relational and logical operators evaluate to bool, explicit conversion is overly verbose and unnecessary. Signed-off-by: Guojia Liao <liaoguojia@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Peng Li authored
HNS3 driver can only unregister client which included in hnae3_client_list. This patch adds the client node validity judgment. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huazhong Tan authored
Since hclge_get_reset_level may return HNAE3_NONE_RESET, so hdev->reset_level can not be assigned with the return value in the hclge_reset(), otherwise, it will cause the use of hdev->reset_level in hclge_reset_event get into error. Fixes: 012fcb52 ("net: hns3: activate reset timer when calling reset_event") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Huazhong Tan authored
The system will panic when change the ringparam in HNS3 drivers: [ 1459.627727] hns3 0000:bd:00.0 eth6: Changing Tx/Rx ring ds from 1024/1024 to 24/24 [ 1459.635766] hns3 0000:bd:00.0 eth6: link down [ 1459.640788] BUG: Bad page state in process ethtool pfn:203f75c18 [ 1459.646940] page:ffff7ee4ffd70600 refcount:0 mapcount:0 mapping:ffff993fff40f400 index:0x0 compound_mapcount: 0 [ 1459.656987] flags: 0x9fffe00000010200(slab|head) [ 1459.661591] raw: 9fffe00000010200 dead000000000100 dead000000000122 ffff993fff40f400 [ 1459.669302] raw: 0000000000000000 0000000080100010 00000000ffffffff 0000000000000000 [ 1459.677016] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set [ 1459.683432] bad because of flags: 0x200(slab) [ 1459.687775] Modules linked in: ib_ipoib ib_umad rpcrdma ib_iser libiscsi scsi_transport_iscsi hns_roce_hw_v2 crct10dif_ce hns3 ses hclge hnae3 hisi_hpre hisi_zip qm uacce ip_tables x_tables hisi_sas_v3_hw hisi_sas_main libsas scsi_transport_sas [ 1459.709329] CPU: 14 PID: 17244 Comm: ethtool Tainted: G O 5.3.0-rc4-00415-gc86f057 #1 [ 1459.718419] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B040.01 07/26/2019 [ 1459.727248] Call trace: [ 1459.729688] dump_backtrace+0x0/0x150 [ 1459.733335] show_stack+0x24/0x30 [ 1459.736639] dump_stack+0xa0/0xc4 [ 1459.739943] bad_page+0xf0/0x158 [ 1459.743157] free_pages_check_bad+0x84/0xa0 [ 1459.747322] __free_pages_ok+0x348/0x378 [ 1459.751228] page_frag_free+0x80/0x88 [ 1459.754877] skb_free_head+0x38/0x48 [ 1459.758436] skb_release_data+0x134/0x160 [ 1459.762427] skb_release_all+0x30/0x40 [ 1459.766158] consume_skb+0x38/0x108 [ 1459.769633] __dev_kfree_skb_any+0x58/0x68 [ 1459.773718] hns3_fini_ring+0x48/0x58 [hns3] [ 1459.777970] hns3_set_ringparam+0x2a8/0x418 [hns3] [ 1459.782741] dev_ethtool+0x5f4/0x2080 [ 1459.786390] dev_ioctl+0x190/0x3d8 [ 1459.789777] sock_do_ioctl+0xf8/0x220 [ 1459.793423] sock_ioctl+0x3bc/0x490 [ 1459.796896] do_vfs_ioctl+0xc4/0x868 [ 1459.800454] ksys_ioctl+0x8c/0xa0 [ 1459.803752] __arm64_sys_ioctl+0x28/0x38 [ 1459.807658] el0_svc_common.constprop.0+0xe0/0x1e0 [ 1459.812426] el0_svc_handler+0x34/0x90 [ 1459.816158] el0_svc+0x10/0x14 [ 1459.819220] Disabling lock debugging due to kernel taint [ 1459.825182] ------------[ cut here ]------------ Since ndo_stop will reclaim the RX's skb allocated by the driver, so the backed up ring parameter should not keep this info. Fixes: a723fb8e ("net: hns3: refine for set ring parameters") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jian Shen authored
In original codes, the VF index used incorrectly in function hclge_set_vlan_rx_offload_cfg() and hclge_set_vlan_rx_offload_cfg(). When VF id is greater than 8, for example 9, it will set the same bit with VF id 1. This patch fixes it by using vport->vport_id % HCLGE_VF_NUM_PER_CMD / HCLGE_VF_NUM_PER_BYTE as the array index, instead of vport->vport_id / HCLGE_VF_NUM_PER_CMD. Fixes: 052ece6d ("net: hns3: add ethtool related offload command") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andy Shevchenko authored
This patch amends the error and warning messages across the platform driver. It includes the following changes: - append \n to the end of messages - change pr_* macros to dev_* Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jose Abreu authored
While running stmmac selftests I found that in my 1G setup some tests were failling when running with PHY loopback enabled. It looks like when loopback is enabled the PHY will report that Link is down even though there is a valid connection. As in loopback mode the data will not be sent anywhere we can bypass the logic of checking if Link is valid thus saving unecessary reads. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Dai authored
For high speed adapter like Mellanox CX-5 card, it can reach upto 100 Gbits per second bandwidth. Currently htb already supports 64bit rate in tc utility. However police action rate and peakrate are still limited to 32bit value (upto 32 Gbits per second). Add 2 new attributes TCA_POLICE_RATE64 and TCA_POLICE_RATE64 in kernel for 64bit support so that tc utility can use them for 64bit rate and peakrate value to break the 32bit limit, and still keep the backward binary compatibility. Tested-by: David Dai <zdai@linux.vnet.ibm.com> Signed-off-by: David Dai <zdai@linux.vnet.ibm.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paul Blakey authored
Offloaded OvS datapath rules are translated one to one to tc rules, for example the following simplified OvS rule: recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2) Will be translated to the following tc rule: $ tc filter add dev dev1 ingress \ prio 1 chain 0 proto ip \ flower tcp ct_state -trk \ action ct pipe \ action goto chain 2 Received packets will first travel though tc, and if they aren't stolen by it, like in the above rule, they will continue to OvS datapath. Since we already did some actions (action ct in this case) which might modify the packets, and updated action stats, we would like to continue the proccessing with the correct recirc_id in OvS (here recirc_id(2)) where we left off. To support this, introduce a new skb extension for tc, which will be used for translating tc chain to ovs recirc_id to handle these miss cases. Last tc chain index will be set by tc goto chain action and read by OvS datapath. Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
zhong jiang authored
Continue is not needed at the bottom of a loop. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 05 Sep, 2019 5 commits
-
-
Daniel Borkmann authored
Björn Töpel says: ==================== This is a four patch series of various barrier, {READ, WRITE}_ONCE cleanups in the AF_XDP socket code. More details can be found in the corresponding commit message. Previous revisions: v1 [4] and v2 [5]. For an AF_XDP socket, most control plane operations are done under the control mutex (struct xdp_sock, mutex), but there are some places where members of the struct is read outside the control mutex. The dev, queue_id members are set in bind() and cleared at cleanup. The umem, fq, cq, tx, rx, and state member are all assigned in various places, e.g. bind() and setsockopt(). When the members are assigned, they are protected by the control mutex, but since they are read outside the mutex, a WRITE_ONCE is required to avoid store-tearing on the read-side. Prior the state variable was introduced by Ilya, the dev member was used to determine whether the socket was bound or not. However, when dev was read, proper SMP barriers and READ_ONCE were missing. In order to address the missing barriers and READ_ONCE, we start using the state variable as a point of synchronization. The state member read/write is paired with proper SMP barriers, and from this follows that the members described above does not need READ_ONCE statements if used in conjunction with state check. To summarize: The members struct xdp_sock members dev, queue_id, umem, fq, cq, tx, rx, and state were read lock-less, with incorrect barriers and missing {READ, WRITE}_ONCE. After this series umem, fq, cq, tx, rx, and state are read lock-less. When these members are updated, WRITE_ONCE is used. When read, READ_ONCE are only used when read outside the control mutex (e.g. mmap) or, not synchronized with the state member (XSK_BOUND plus smp_rmb()) [1] https://lore.kernel.org/bpf/beef16bb-a09b-40f1-7dd0-c323b4b89b17@iogearbox.net/ [2] https://lwn.net/Articles/793253/ [3] https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE [4] https://lore.kernel.org/bpf/20190822091306.20581-1-bjorn.topel@gmail.com/ [5] https://lore.kernel.org/bpf/20190826061053.15996-1-bjorn.topel@gmail.com/ v2->v3: Minor restructure of commits. Improve cover and commit messages. (Daniel) v1->v2: Removed redundant dev check. (Jonathan) ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Björn Töpel authored
When accessing the members of an XDP socket, the control mutex should be held. This commit fixes that. Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Fixes: a36b38aa ("xsk: add sock_diag interface for AF_XDP") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Björn Töpel authored
Prior the state variable was introduced by Ilya, the dev member was used to determine whether the socket was bound or not. However, when dev was read, proper SMP barriers and READ_ONCE were missing. In order to address the missing barriers and READ_ONCE, we start using the state variable as a point of synchronization. The state member read/write is paired with proper SMP barriers, and from this follows that the members described above does not need READ_ONCE if used in conjunction with state check. In all syscalls and the xsk_rcv path we check if state is XSK_BOUND. If that is the case we do a SMP read barrier, and this implies that the dev, umem and all rings are correctly setup. Note that no READ_ONCE are needed for these variable if used when state is XSK_BOUND (plus the read barrier). To summarize: The members struct xdp_sock members dev, queue_id, umem, fq, cq, tx, rx, and state were read lock-less, with incorrect barriers and missing {READ, WRITE}_ONCE. Now, umem, fq, cq, tx, rx, and state are read lock-less. When these members are updated, WRITE_ONCE is used. When read, READ_ONCE are only used when read outside the control mutex (e.g. mmap) or, not synchronized with the state member (XSK_BOUND plus smp_rmb()) Note that dev and queue_id do not need a WRITE_ONCE or READ_ONCE, due to the introduce state synchronization (XSK_BOUND plus smp_rmb()). Introducing the state check also fixes a race, found by syzcaller, in xsk_poll() where umem could be accessed when stale. Suggested-by: Hillf Danton <hdanton@sina.com> Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com Fixes: 77cd0d7b ("xsk: add support for need_wakeup flag in AF_XDP rings") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Björn Töpel authored
The umem member of struct xdp_sock is read outside of the control mutex, in the mmap implementation, and needs a WRITE_ONCE to avoid potential store-tearing. Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Fixes: 423f3832 ("xsk: add umem fill queue support and mmap") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Björn Töpel authored
Use WRITE_ONCE when doing the store of tx, rx, fq, and cq, to avoid potential store-tearing. These members are read outside of the control mutex in the mmap implementation. Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Fixes: 37b07693 ("xsk: add missing write- and data-dependency barrier") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-