- 07 Apr, 2021 3 commits
-
-
Aya Levin authored
Add reserved mapping to cover all the register in order to avoid setting arbitrary values to newer FW which implements the reserved fields. Fixes: a58837f5 ("net/mlx5e: Expose FEC feilds and related capability bit") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Raed Salem authored
The cited commit wrongly placed log_max_flow_counter field of mlx5_ifc_flow_table_prop_layout_bits, align it to the HW spec intended placement. Fixes: 16f1c5bb ("net/mlx5: Check device capability for maximum flow counters") Signed-off-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Eli Cohen authored
Make sure to modify uplink port to follow only if the uplink_follow capability is set as required by the HW spec. Failure to do so causes traffic to the uplink representor net device to cease after switching to switchdev mode. Fixes: 7d0314b1 ("net/mlx5e: Modify uplink state on interface up/down") Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
- 06 Apr, 2021 10 commits
-
-
Jakub Kicinski authored
Fix incorrect documentation. Mostly referring to other objects, likely because the text was copied and not adjusted. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Phillip Potter authored
When changing type with TUNSETLINK ioctl command, set tun->dev->addr_len to match the appropriate type, using new tun_get_addr_len utility function which returns appropriate address length for given type. Fixes a KMSAN-found uninit-value bug reported by syzbot at: https://syzkaller.appspot.com/bug?id=0766d38c656abeace60621896d705743aeefed51 Reported-by: syzbot+001516d86dbe88862cec@syzkaller.appspotmail.com Diagnosed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wong Vee Khee authored
The member 'tx_lpi_timer' is defined with __u32 datatype in the ethtool header file. Hence, we should use ethnl_update_u32() in set_eee ops. Fixes: fd77be7b ("ethtool: set EEE settings with EEE_SET request") Cc: <stable@vger.kernel.org> # 5.10.x Cc: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Guangbin Huang authored
Currently, the VF down state bit is cleared after VF sending link status request command. There is problem that when VF gets link status replied from PF, the down state bit may still set as 1. In this case, the link status replied from PF will be ignored and always set VF link status to down. To fix this problem, clear VF down state bit before VF requests link status. Fixes: e2cb1dec ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support") Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Merge tag 'linux-can-fixes-for-5.12-20210406' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2021-04-06 this is a pull request of 1 patch for net/master. The patch is by me and fixes the SPI half duplex support in the mcp251x CAN driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Guenter Roeck authored
pci_resource_start() is not a good indicator to determine if a PCI resource exists or not, since the resource may start at address 0. This is seen when trying to instantiate the driver in qemu for riscv32 or riscv64. pci 0000:00:01.0: reg 0x10: [io 0x0000-0x001f] pci 0000:00:01.0: reg 0x14: [mem 0x00000000-0x0000001f] ... pcnet32: card has no PCI IO resources, aborting Use pci_resouce_len() instead. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
Li Shuang found a NULL pointer dereference crash in her testing: [] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 [] RIP: 0010:tipc_crypto_rcv_complete+0xc8/0x7e0 [tipc] [] Call Trace: [] <IRQ> [] tipc_crypto_rcv+0x2d9/0x8f0 [tipc] [] tipc_rcv+0x2fc/0x1120 [tipc] [] tipc_udp_recv+0xc6/0x1e0 [tipc] [] udpv6_queue_rcv_one_skb+0x16a/0x460 [] udp6_unicast_rcv_skb.isra.35+0x41/0xa0 [] ip6_protocol_deliver_rcu+0x23b/0x4c0 [] ip6_input+0x3d/0xb0 [] ipv6_rcv+0x395/0x510 [] __netif_receive_skb_core+0x5fc/0xc40 This is caused by NULL returned by tipc_aead_get(), and then crashed when dereferencing it later in tipc_crypto_rcv_complete(). This might happen when tipc_crypto_rcv_complete() is called by two threads at the same time: the tmp attached by tipc_crypto_key_attach() in one thread may be released by the one attached by that in the other thread. This patch is to fix it by incrementing the tmp's refcnt before attaching it instead of calling tipc_aead_get() after attaching it. Fixes: fc1b6d6d ("tipc: introduce TIPC encryption & authentication") Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Xuan Zhuo reported that commit 3226b158 ("net: avoid 32 x truesize under-estimation for tiny skbs") brought a ~10% performance drop. The reason for the performance drop was that GRO was forced to chain sk_buff (using skb_shinfo(skb)->frag_list), which uses more memory but also cause packet consumers to go over a lot of overhead handling all the tiny skbs. It turns out that virtio_net page_to_skb() has a wrong strategy : It allocates skbs with GOOD_COPY_LEN (128) bytes in skb->head, then copies 128 bytes from the page, before feeding the packet to GRO stack. This was suboptimal before commit 3226b158 ("net: avoid 32 x truesize under-estimation for tiny skbs") because GRO was using 2 frags per MSS, meaning we were not packing MSS with 100% efficiency. Fix is to pull only the ethernet header in page_to_skb() Then, we change virtio_net_hdr_to_skb() to pull the missing headers, instead of assuming they were already pulled by callers. This fixes the performance regression, but could also allow virtio_net to accept packets with more than 128bytes of headers. Many thanks to Xuan Zhuo for his report, and his tests/help. Fixes: 3226b158 ("net: avoid 32 x truesize under-estimation for tiny skbs") Reported-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://www.spinics.net/lists/netdev/msg731397.htmlCo-Developed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: virtualization@lists.linux-foundation.org Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Lv Yunlong authored
In bcm4908_enet_dma_alloc, if callee bcm4908_dma_alloc_buf_descs() failed, it will free the ring->cpu_addr by dma_free_coherent() and return error. Then bcm4908_enet_dma_free() will be called, and free the same cpu_addr by dma_free_coherent() again. My patch set ring->cpu_addr to NULL after it is freed in bcm4908_dma_alloc_buf_descs() to avoid the double free. Fixes: 4feffead ("net: broadcom: bcm4908enet: add BCM4908 controller driver") Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marc Kleine-Budde authored
Some SPI host controllers do not support full-duplex SPI transfers. The function mcp251x_spi_trans() does a full duplex transfer. It is used in several places in the driver, where a TX half duplex transfer is sufficient. To fix support for half duplex SPI host controllers, this patch introduces a new function mcp251x_spi_write() and changes all callers that do a TX half duplex transfer to use mcp251x_spi_write(). Fixes: e0e25001 ("can: mcp251x: add support for half duplex controllers") Link: https://lore.kernel.org/r/20210330100246.1074375-1-mkl@pengutronix.de Cc: Tim Harvey <tharvey@gateworks.com> Tested-By: Tim Harvey <tharvey@gateworks.com> Reported-by: Gerhard Bertelsmann <info@gerhard-bertelsmann.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
- 05 Apr, 2021 6 commits
-
-
Tetsuo Handa authored
KMSAN found uninitialized value at batadv_tt_prepare_tvlv_local_data() [1], for commit ced72933 ("batman-adv: use CRC32C instead of CRC16 in TT code") inserted 'reserved' field into "struct batadv_tvlv_tt_data" and commit 7ea7b4a1 ("batman-adv: make the TT CRC logic VLAN specific") moved that field to "struct batadv_tvlv_tt_vlan_data" but left that field uninitialized. [1] https://syzkaller.appspot.com/bug?id=07f3e6dba96f0eb3cabab986adcd8a58b9bdbe9dReported-by: syzbot <syzbot+50ee810676e6a089487b@syzkaller.appspotmail.com> Tested-by: syzbot <syzbot+50ee810676e6a089487b@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: ced72933 ("batman-adv: use CRC32C instead of CRC16 in TT code") Fixes: 7ea7b4a1 ("batman-adv: make the TT CRC logic VLAN specific") Acked-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Salil Mehta says: ==================== Misc. fixes for hns3 driver Fixes for the miscellaneous problems found during the review of the code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Salil Mehta authored
Code to defer the reset(which caps the frequency of the reset) schedules the timer and returns. Hence, following 'else-if' looks un-necessary. Fixes: 9de0b86f ("net: hns3: Prevent to request reset frequently") Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Salil Mehta authored
This removes the left over check and assignment which is no longer used anywhere in the function and should have been removed as part of the below mentioned patch. Fixes: 012fcb52 ("net: hns3: activate reset timer when calling reset_event") Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Maciej Żenczykowski authored
Found by virtue of ipv6 raw sockets not honouring the per-socket IP{,V6}_FREEBIND setting. Based on hits found via: git grep '[.]ip_nonlocal_bind' We fix both raw ipv6 sockets to honour IP{,V6}_FREEBIND and IP{,V6}_TRANSPARENT, and we fix sctp sockets to honour IP{,V6}_TRANSPARENT (they already honoured FREEBIND), and not just the ipv6 'ip_nonlocal_bind' sysctl. The helper is defined as: static inline bool ipv6_can_nonlocal_bind(struct net *net, struct inet_sock *inet) { return net->ipv6.sysctl.ip_nonlocal_bind || inet->freebind || inet->transparent; } so this change only widens the accepted opt-outs and is thus a clean bugfix. I'm not entirely sure what 'fixes' tag to add, since this is AFAICT an ancient bug, but IMHO this should be applied to stable kernels as far back as possible. As such I'm adding a 'fixes' tag with the commit that originally added the helper, which happened in 4.19. Backporting to older LTS kernels (at least 4.9 and 4.14) would presumably require open-coding it or backporting the helper as well. Other possibly relevant commits: v4.18-rc6-1502-g83ba4645 net: add helpers checking if socket can be bound to nonlocal address v4.18-rc6-1431-gd0c1f011 net/ipv6: allow any source address for sendmsg pktinfo with ip_nonlocal_bind v4.14-rc5-271-gb71d21c2 sctp: full support for ipv6 ip_nonlocal_bind & IP_FREEBIND v4.7-rc7-1883-g9b974202 sctp: support ipv6 nonlocal bind v4.1-12247-g35a256fe ipv6: Nonlocal bind Cc: Lorenzo Colitti <lorenzo@google.com> Fixes: 83ba4645 ("net: add helpers checking if socket can be bound to nonlocal address") Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-By: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ilya Maximets authored
'struct ovs_zone_limit' has more members than initialized in ovs_ct_limit_get_default_limit(). The rest of the memory is a random kernel stack content that ends up being sent to userspace. Fix that by using designated initializer that will clear all non-specified fields. Fixes: 11efd5cb ("openvswitch: Support conntrack zone limit") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 02 Apr, 2021 3 commits
-
-
Claudiu Beznea authored
Restore CMP screener registers on resume path. Fixes: c1e85c6c ("net: macb: save/restore the remaining registers and features") Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yunjian Wang authored
The 'unlocked_driver_cb' struct field in 'bo' is not being initialized in tcf_block_offload_init(). The uninitialized 'unlocked_driver_cb' will be used when calling unlocked_driver_cb(). So initialize 'bo' to zero to avoid the issue. Addresses-Coverity: ("Uninitialized scalar variable") Fixes: 0fdcf78d ("net: use flow_indr_dev_setup_offload()") Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller authored
Alexei Starovoitov says: ==================== pull-request: bpf 2021-04-01 The following pull-request contains BPF updates for your *net* tree. We've added 11 non-merge commits during the last 8 day(s) which contain a total of 10 files changed, 151 insertions(+), 26 deletions(-). The main changes are: 1) xsk creation fixes, from Ciara. 2) bpf_get_task_stack fix, from Dave. 3) trampoline in modules fix, from Jiri. 4) bpf_obj_get fix for links and progs, from Lorenz. 5) struct_ops progs must be gpl compatible fix, from Toke. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 01 Apr, 2021 17 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queueDavid S. Miller authored
Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2021-04-01 This series contains updates to i40e driver only. Arkadiusz fixes warnings for inconsistent indentation. Magnus fixes an issue on xsk receive where single packets over time are batched rather than received immediately. Eryk corrects warnings and reporting of veb-stats. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Paolo Abeni says: ==================== mptcp: mptcp: fix deadlock in mptcp{,6}_release syzkaller has reported a few deadlock triggered by mptcp{,6}_release. These patches address the issue in the easy way - blocking the relevant, multicast related, sockopt options on MPTCP sockets. Note that later on net-next we are going to revert patch 1/2, as a part of a larger MPTCP sockopt implementation refactor ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
This change reverts commit ad98dd37 ("mptcp: provide subflow aware release function"). The latter introduced a deadlock spotted by syzkaller and is not needed anymore after the previous commit. Fixes: ad98dd37 ("mptcp: provide subflow aware release function") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
Unrolling mcast state at msk dismantel time is bug prone, as syzkaller reported: ====================================================== WARNING: possible circular locking dependency detected 5.11.0-syzkaller #0 Not tainted ------------------------------------------------------ syz-executor905/8822 is trying to acquire lock: ffffffff8d678fe8 (rtnl_mutex){+.+.}-{3:3}, at: ipv6_sock_mc_close+0xd7/0x110 net/ipv6/mcast.c:323 but task is already holding lock: ffff888024390120 (sk_lock-AF_INET6){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1600 [inline] ffff888024390120 (sk_lock-AF_INET6){+.+.}-{0:0}, at: mptcp6_release+0x57/0x130 net/mptcp/protocol.c:3507 which lock already depends on the new lock. Instead we can simply forbit any mcast-related setsockopt Fixes: 717e79c8 ("mptcp: Add setsockopt()/getsockopt() socket operations") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pavel Skripkin authored
syzbot reported memory leak in peak_usb. The problem was in case of failure after calling ->dev_init()[2] in peak_usb_create_dev()[1]. The data allocated int dev_init() wasn't freed, so simple ->dev_free() call fix this problem. backtrace: [<0000000079d6542a>] kmalloc include/linux/slab.h:552 [inline] [<0000000079d6542a>] kzalloc include/linux/slab.h:682 [inline] [<0000000079d6542a>] pcan_usb_fd_init+0x156/0x210 drivers/net/can/usb/peak_usb/pcan_usb_fd.c:868 [2] [<00000000c09f9057>] peak_usb_create_dev drivers/net/can/usb/peak_usb/pcan_usb_core.c:851 [inline] [1] [<00000000c09f9057>] peak_usb_probe+0x389/0x490 drivers/net/can/usb/peak_usb/pcan_usb_core.c:949 Reported-by: syzbot+91adee8d9ebb9193d22d@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Norman Maurer authored
Support for UDP_GRO was added in the past but the implementation for getsockopt was missed which did lead to an error when we tried to retrieve the setting for UDP_GRO. This patch adds the missing switch case for UDP_GRO Fixes: e20cf8d3 ("udp: implement GRO for plain UDP sockets.") Signed-off-by: Norman Maurer <norman_maurer@apple.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pavel Skripkin authored
syzbot reported memory leak in atusb_probe()[1]. The problem was in atusb_alloc_urbs(). Since urb is anchored, we need to release the reference to correctly free the urb backtrace: [<ffffffff82ba0466>] kmalloc include/linux/slab.h:559 [inline] [<ffffffff82ba0466>] usb_alloc_urb+0x66/0xe0 drivers/usb/core/urb.c:74 [<ffffffff82ad3888>] atusb_alloc_urbs drivers/net/ieee802154/atusb.c:362 [inline][2] [<ffffffff82ad3888>] atusb_probe+0x158/0x820 drivers/net/ieee802154/atusb.c:1038 [1] Reported-by: syzbot+28a246747e0a465127f3@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexei Starovoitov authored
Ciara Loftus says: ==================== This series fixes some issues around socket creation for AF_XDP. Patch 1 fixes a potential NULL pointer dereference in xsk_socket__create_shared. Patch 2 ensures that the umem passed to xsk_socket__create(_shared) remains unchanged in event of failure. Patch 3 makes it possible for xsk_socket__create(_shared) to succeed even if the rx and tx XDP rings have already been set up by introducing a new fields to struct xsk_umem which represent the ring setup status for the xsk which shares the fd with the umem. v3->v4: * Reduced nesting in xsk_put_ctx as suggested by Alexei. * Use bools instead of a u8 and flags to represent the ring setup status as suggested by Björn. v2->v3: * Instead of ignoring the return values of the setsockopt calls, introduce a new flag to determine whether or not to call them based on the ring setup status as suggested by Alexei. v1->v2: * Simplified restoring the _save pointers as suggested by Magnus. * Fixed the condition which determines whether to unmap umem rings when socket create fails. ==================== Acked-by: Björn Töpel <bjorn@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Ciara Loftus authored
Prior to this commit xsk_socket__create(_shared) always attempted to create the rx and tx rings for the socket. However this causes an issue when the socket being setup is that which shares the fd with the UMEM. If a previous call to this function failed with this socket after the rings were set up, a subsequent call would always fail because the rings are not torn down after the first call and when we try to set them up again we encounter an error because they already exist. Solve this by remembering whether the rings were set up by introducing new bools to struct xsk_umem which represent the ring setup status and using them to determine whether or not to set up the rings. Fixes: 1cad0788 ("libbpf: add support for using AF_XDP sockets") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210331061218.1647-4-ciara.loftus@intel.com
-
Ciara Loftus authored
If the call to xsk_socket__create fails, the user may want to retry the socket creation using the same umem. Ensure that the umem is in the same state on exit if the call fails by: 1. ensuring the umem _save pointers are unmodified. 2. not unmapping the set of umem rings that were set up with the umem during xsk_umem__create, since those maps existed before the call to xsk_socket__create and should remain in tact even in the event of failure. Fixes: 2f6324a3 ("libbpf: Support shared umems between queues and devices") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210331061218.1647-3-ciara.loftus@intel.com
-
Ciara Loftus authored
Calls to xsk_socket__create dereference the umem to access the fill_save and comp_save pointers. Make sure the umem is non-NULL before doing this. Fixes: 2f6324a3 ("libbpf: Support shared umems between queues and devices") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20210331061218.1647-2-ciara.loftus@intel.com
-
Lorenz Bauer authored
As for bpf_link, refuse creating a non-O_RDWR fd. Since program fds currently don't allow modifications this is a precaution, not a straight up bug fix. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210326160501.46234-2-lmb@cloudflare.com
-
Lorenz Bauer authored
Invoking BPF_OBJ_GET on a pinned bpf_link checks the path access permissions based on file_flags, but the returned fd ignores flags. This means that any user can acquire a "read-write" fd for a pinned link with mode 0664 by invoking BPF_OBJ_GET with BPF_F_RDONLY in file_flags. The fd can be used to invoke BPF_LINK_DETACH, etc. Fix this by refusing non-O_RDWR flags in BPF_OBJ_GET. This works because OBJ_GET by default returns a read write mapping and libbpf doesn't expose a way to override this behaviour for programs and links. Fixes: 70ed506c ("bpf: Introduce pinnable bpf_link abstraction") Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210326160501.46234-1-lmb@cloudflare.com
-
Dave Marchevsky authored
On x86 the struct pt_regs * grabbed by task_pt_regs() points to an offset of task->stack. The pt_regs are later dereferenced in __bpf_get_stack (e.g. by user_mode() check). This can cause a fault if the task in question exits while bpf_get_task_stack is executing, as warned by task_stack_page's comment: * When accessing the stack of a non-current task that might exit, use * try_get_task_stack() instead. task_stack_page will return a pointer * that could get freed out from under you. Taking the comment's advice and using try_get_task_stack() and put_task_stack() to hold task->stack refcount, or bail early if it's already 0. Incrementing stack_refcount will ensure the task's stack sticks around while we're using its data. I noticed this bug while testing a bpf task iter similar to bpf_iter_task_stack in selftests, except mine grabbed user stack, and getting intermittent crashes, which resulted in dumps like: BUG: unable to handle page fault for address: 0000000000003fe0 \#PF: supervisor read access in kernel mode \#PF: error_code(0x0000) - not-present page RIP: 0010:__bpf_get_stack+0xd0/0x230 <snip...> Call Trace: bpf_prog_0a2be35c092cb190_get_task_stacks+0x5d/0x3ec bpf_iter_run_prog+0x24/0x81 __task_seq_show+0x58/0x80 bpf_seq_read+0xf7/0x3d0 vfs_read+0x91/0x140 ksys_read+0x59/0xd0 do_syscall_64+0x48/0x120 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: fa28dcb8 ("bpf: Introduce helper bpf_get_task_stack()") Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210401000747.3648767-1-davemarchevsky@fb.com
-
Eryk Rybak authored
If veb-stats was enabled, the ethtool stats triggered a warning due to invalid size: 'unexpected stat size for veb.tc_%u_tx_packets'. This was due to an incorrect structure definition for the statistics. Structures and functions have been improved in line with requirements for the presentation of statistics, in particular for the functions: 'i40e_add_ethtool_stats' and 'i40e_add_stat_strings'. Fixes: 1510ae0b ("i40e: convert VEB TC stats to use an i40e_stats array") Signed-off-by: Eryk Rybak <eryk.roch.rybak@intel.com> Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Dave Switzer <david.switzer@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Magnus Karlsson authored
Fix so that single packets are received immediately instead of in batches of 8. If you sent 1 pps to a system, you received 8 packets every 8 seconds instead of 1 packet every second. The problem behind this was that the work_done reporting from the Tx part of the driver was broken. The work_done reporting in i40e controls not only the reporting back to the napi logic but also the setting of the interrupt throttling logic. When Tx or Rx reports that it has more to do, interrupts are throttled or coalesced and when they both report that they are done, interrupts are armed right away. If the wrong work_done value is returned, the logic will start to throttle interrupts in a situation where it should have just enabled them. This leads to the undesired batching behavior seen in user-space. Fix this by returning the correct boolean value from the Tx xsk zero-copy path. Return true if there is nothing to do or if we got fewer packets to process than we asked for. Return false if we got as many packets as the budget since there might be more packets we can process. Fixes: 3106c580 ("i40e: Use batched xsk Tx interfaces to increase performance") Reported-by: Sreedevi Joshi <sreedevi.joshi@intel.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Arkadiusz Kubalewski authored
Fixed new static analysis findings: "warn: inconsistent indenting" - introduced lately, reported with lkp and smatch build. Fixes: 4b208eaa ("i40e: Add init and default config of software based DCB") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Tested-by: Dave Switzer <david.switzer@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
- 31 Mar, 2021 1 commit
-
-
Ong Boon Leong authored
xdp_return_frame() may be called outside of NAPI context to return xdpf back to page_pool. xdp_return_frame() calls __xdp_return() with napi_direct = false. For page_pool memory model, __xdp_return() calls xdp_return_frame_no_direct() unconditionally and below false negative kernel BUG throw happened under preempt-rt build: [ 430.450355] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/3884 [ 430.451678] caller is __xdp_return+0x1ff/0x2e0 [ 430.452111] CPU: 0 PID: 3884 Comm: modprobe Tainted: G U E 5.12.0-rc2+ #45 Changes in v2: - This patch fixes the issue by making xdp_return_frame_no_direct() is only called if napi_direct = true, as recommended for better by Jesper Dangaard Brouer. Thanks! Fixes: 2539650f ("xdp: Helpers for disabling napi_direct of xdp_return_frame") Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-