- 30 Sep, 2021 37 commits
-
-
Lama Kayal authored
When performing PF reload, VF can't communicate with FW until it recovers and reloads as well. Add a warning message when performing devlink reload while VFs are still present. Thus, giving a notice of an unfavorable behavior that might occur as a result of a consequential reloads and cause interruption of VF recovery. Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Yevgeny Kliteynik authored
Add missing string value for DR_ACTION_TYP_SAMPLER action type Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Yevgeny Kliteynik authored
Allocate next steering table entry only if the remaining space requires to. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Yevgeny Kliteynik authored
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Yevgeny Kliteynik authored
Increase max supported number of actions in the same rule. Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Yevgeny Kliteynik authored
Move all the vport capabilities to a separate struct and store vport caps in XArray: SFs vport numbers will not come in the same range as VF vports, so the existing implementation of vport capabilities as a fixed size array is not suitable here. XArray is a perfect fit: it is efficient when the indices used are densely clustered. In addition to being a perfect fit as a dynamic data structure, XArray also provides locking - it uses RCU and an internal spinlock to synchronise access, so no additional protection needed. Now except for the eswitch manager vport, all other vports (including the uplink vport) are handled in the same way: when a new go-to-vport action is added, this vport's caps are loaded from the xarray. If it is the first time for this particular vport number, then its capabilities are queried from FW and filled in into the appropriate entry. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Muhammad Sammar <muhammads@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Yevgeny Kliteynik authored
Implement csum recalculation flow tables in XAarray instead of a fixed array, thus adding support for csum recalc table on any valid vport number, which enables this support for SFs. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Muhammad Sammar <muhammads@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Yevgeny Kliteynik authored
Print similar error messages when an invalid vport number is provided during action creation and during STEv0/1 creation. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Muhammad Sammar <muhammads@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Yevgeny Kliteynik authored
Currently, vport 0 capabilities are not set. To fix this, we now querying both eswitch manager and vport 0. Eswitch manager has an access to all the vports - for eswitch manager PF, all vports can be referred as other vports. The exception is embedded CPU mode, where there is vport 0 of ECPF and the PF vport 0. Here is how vport are queried: For Connect-X5/6: PF vport (0) and vports 1..n: vport number, other = true esw_manager is vport 0 (PF) For BlueField (in embedded CPU mode): ECPF vport: vport = 0, other = false PF vport (0) and 1..n: vport number, other = true esw_manager = vport 0 (ECPF) Also, note that there's no need for other_vport function parameter in dr_domain_query_vport - this value is now deduced locally in the function. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Muhammad Sammar <muhammads@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Yevgeny Kliteynik authored
SW steering defines its own macro for uplink vport number. Replace this macro with an already existing mlx5 macro. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Yevgeny Kliteynik authored
According to the HW spec, vport number is a 16-bit value. Fix vport usage all over the code to u16 data type. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Muhammad Sammar <muhammads@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski authored
drivers/net/phy/bcm7xxx.c d88fd1b5 ("net: phy: bcm7xxx: Fixed indirect MMD operations") f68d08c4 ("net: phy: bcm7xxx: Add EPHY entry for 72165") net/sched/sch_api.c b193e15a ("net: prevent user from passing illegal stab size") 69508d43 ("net_sched: Use struct_size() and flex_array_size() helpers") Both cases trivial - adjacent code additions. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds authored
Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from mac80211, netfilter and bpf. Current release - regressions: - bpf, cgroup: assign cgroup in cgroup_sk_alloc when called from interrupt - mdio: revert mechanical patches which broke handling of optional resources - dev_addr_list: prevent address duplication Previous releases - regressions: - sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootb (NULL deref) - Revert "mac80211: do not use low data rates for data frames with no ack flag", fixing broadcast transmissions - mac80211: fix use-after-free in CCMP/GCMP RX - netfilter: include zone id in tuple hash again, minimize collisions - netfilter: nf_tables: unlink table before deleting it (race -> UAF) - netfilter: log: work around missing softdep backend module - mptcp: don't return sockets in foreign netns - sched: flower: protect fl_walk() with rcu (race -> UAF) - ixgbe: fix NULL pointer dereference in ixgbe_xdp_setup - smsc95xx: fix stalled rx after link change - enetc: fix the incorrect clearing of IF_MODE bits - ipv4: fix rtnexthop len when RTA_FLOW is present - dsa: mv88e6xxx: 6161: use correct MAX MTU config method for this SKU - e100: fix length calculation & buffer overrun in ethtool::get_regs Previous releases - always broken: - mac80211: fix using stale frag_tail skb pointer in A-MSDU tx - mac80211: drop frames from invalid MAC address in ad-hoc mode - af_unix: fix races in sk_peer_pid and sk_peer_cred accesses (race -> UAF) - bpf, x86: Fix bpf mapping of atomic fetch implementation - bpf: handle return value of BPF_PROG_TYPE_STRUCT_OPS prog - netfilter: ip6_tables: zero-initialize fragment offset - mhi: fix error path in mhi_net_newlink - af_unix: return errno instead of NULL in unix_create1() when over the fs.file-max limit Misc: - bpf: exempt CAP_BPF from checks against bpf_jit_limit - netfilter: conntrack: make max chain length random, prevent guessing buckets by attackers - netfilter: nf_nat_masquerade: make async masq_inet6_event handling generic, defer conntrack walk to work queue (prevent hogging RTNL lock)" * tag 'net-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits) af_unix: fix races in sk_peer_pid and sk_peer_cred accesses net: stmmac: fix EEE init issue when paired with EEE capable PHYs net: dev_addr_list: handle first address in __hw_addr_add_ex net: sched: flower: protect fl_walk() with rcu net: introduce and use lock_sock_fast_nested() net: phy: bcm7xxx: Fixed indirect MMD operations net: hns3: disable firmware compatible features when uninstall PF net: hns3: fix always enable rx vlan filter problem after selftest net: hns3: PF enable promisc for VF when mac table is overflow net: hns3: fix show wrong state when add existing uc mac address net: hns3: fix mixed flag HCLGE_FLAG_MQPRIO_ENABLE and HCLGE_FLAG_DCB_ENABLE net: hns3: don't rollback when destroy mqprio fail net: hns3: remove tc enable checking net: hns3: do not allow call hns3_nic_net_open repeatedly ixgbe: Fix NULL pointer dereference in ixgbe_xdp_setup net: bridge: mcast: Associate the seqcount with its protecting lock. net: mdio-ipq4019: Fix the error for an optional regs resource net: hns3: fix hclge_dbg_dump_tm_pg() stack usage net: mdio: mscc-miim: Fix the mdio controller af_unix: Return errno instead of NULL in unix_create1(). ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linuxLinus Torvalds authored
Pull gpio fixes from Bartosz Golaszewski: "A single fix for the gpio-pca953x driver and two commits updating the MAINTAINERS entries for Mun Yew Tham (GPIO specific) and myself (treewide after a change in professional situation). Summary: - don't ignore I2C errors in gpio-pca953x - update MAINTAINERS entries for Mun Yew Tham and myself" * tag 'gpio-fixes-for-v5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: MAINTAINERS: Update Mun Yew Tham as Altera Pio Driver maintainer MAINTAINERS: update my email address gpio: pca953x: do not ignore i2c errors
-
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds authored
Pull rdma fixes from Jason Gunthorpe: "Not much too exciting here, although two syzkaller bugs that seem to have 9 lives may have finally been squashed. Several core bugs and a batch of driver bug fixes: - Fix compilation problems in qib and hfi1 - Do not corrupt the joined multicast group state when using SEND_ONLY - Several CMA bugs, a reference leak for listening and two syzkaller crashers - Various bug fixes for irdma - Fix a Sleeping while atomic bug in usnic - Properly sanitize kernel pointers in dmesg - Two bugs in the 64b CQE support for hns" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/hns: Add the check of the CQE size of the user space RDMA/hns: Fix the size setting error when copying CQE in clean_cq() RDMA/hfi1: Fix kernel pointer leak RDMA/usnic: Lock VF with mutex instead of spinlock RDMA/hns: Work around broken constant propagation in gcc 8 RDMA/cma: Ensure rdma_addr_cancel() happens before issuing more requests RDMA/cma: Do not change route.addr.src_addr.ss_family RDMA/irdma: Report correct WC error when there are MW bind errors RDMA/irdma: Report correct WC error when transport retry counter is exceeded RDMA/irdma: Validate number of CQ entries on create CQ RDMA/irdma: Skip CQP ring during a reset MAINTAINERS: Update Broadcom RDMA maintainers RDMA/cma: Fix listener leak in rdma_cma_listen_on_all() failure IB/cma: Do not send IGMP leaves for sendonly Multicast groups IB/qib: Fix clang confusion of NULL pointer comparison
-
Eric Dumazet authored
Jann Horn reported that SO_PEERCRED and SO_PEERGROUPS implementations are racy, as af_unix can concurrently change sk_peer_pid and sk_peer_cred. In order to fix this issue, this patch adds a new spinlock that needs to be used whenever these fields are read or written. Jann also pointed out that l2cap_sock_get_peer_pid_cb() is currently reading sk->sk_peer_pid which makes no sense, as this field is only possibly set by AF_UNIX sockets. We will have to clean this in a separate patch. This could be done by reverting b48596d1 "Bluetooth: L2CAP: Add get_peer_pid callback" or implementing what was truly expected. Fixes: 109f6e39 ("af_unix: Allow SO_PEERCRED to work across namespaces.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jann Horn <jannh@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Cc: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Eric Dumazet says: ==================== net: snmp: minor optimizations Fetching many SNMP counters on hosts with large number of cpus takes a lot of time. mptcp still uses the old non-batched fashion which is not cache friendly. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Using snmp_get_cpu_field_batch() allows for better cpu cache utilization, especially on hosts with large number of cpus. Also remove special handling when mptcp mibs where not yet allocated. I chose to use temporary storage on the stack to keep this patch simple. We might in the future use the storage allocated in netstat_seq_show(). Combined with prior patch (inlining snmp_get_cpu_field) time to fetch and output mptcp counters on a 256 cpu host [1] goes from 75 usec to 16 usec. [1] L1 cache size is 32KB, it is not big enough to hold all dataset. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
This trivial function is called ~90,000 times on 256 cpus hosts, when reading /proc/net/netstat. And this number keeps inflating. Inlining it saves many cycles. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joshua Roys authored
Add counters for XDP REDIRECT success and failure. This brings the redirect path in line with metrics gathered via the other XDP paths. Signed-off-by: Joshua Roys <roysjosh@gmail.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wong Vee Khee authored
When STMMAC is paired with Energy-Efficient Ethernet(EEE) capable PHY, and the PHY is advertising EEE by default, we need to enable EEE on the xPCS side too, instead of having user to manually trigger the enabling config via ethtool. Fixed this by adding xpcs_config_eee() call in stmmac_eee_init(). Fixes: 7617af3d ("net: pcs: Introducing support for DWC xpcs Energy Efficient Ethernet") Cc: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com> Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jason Xing authored
Originally, ixgbe driver doesn't allow the mounting of xdpdrv if the server is equipped with more than 64 cpus online. So it turns out that the loading of xdpdrv causes the "NOMEM" failure. Actually, we can adjust the algorithm and then make it work through mapping the current cpu to some xdp ring with the protect of @tx_lock. Here are some numbers before/after applying this patch with xdp-example loaded on the eth0X: As client (tx path): Before After TCP_STREAM send-64 734.14 714.20 TCP_STREAM send-128 1401.91 1395.05 TCP_STREAM send-512 5311.67 5292.84 TCP_STREAM send-1k 9277.40 9356.22 (not stable) TCP_RR send-1 22559.75 21844.22 TCP_RR send-128 23169.54 22725.13 TCP_RR send-512 21670.91 21412.56 As server (rx path): Before After TCP_STREAM send-64 1416.49 1383.12 TCP_STREAM send-128 3141.49 3055.50 TCP_STREAM send-512 9488.73 9487.44 TCP_STREAM send-1k 9491.17 9356.22 (not stable) TCP_RR send-1 23617.74 23601.60 ... Notice: the TCP_RR mode is unstable as the official document explains. I tested many times with different parameters combined through netperf. Though the result is not that accurate, I cannot see much influence on this patch. The static key is places on the hot path, but it actually shouldn't cause a huge regression theoretically. Co-developed-by: Shujin Li <lishujin@kuaishou.com> Signed-off-by: Shujin Li <lishujin@kuaishou.com> Signed-off-by: Jason Xing <xingwanli@kuaishou.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Wei Wang says: ==================== net: add new socket option SO_RESERVE_MEM This patch series introduces a new socket option SO_RESERVE_MEM. This socket option provides a mechanism for users to reserve a certain amount of memory for the socket to use. When this option is set, kernel charges the user specified amount of memory to memcg, as well as sk_forward_alloc. This amount of memory is not reclaimable and is available in sk_forward_alloc for this socket. With this socket option set, the networking stack spends less cycles doing forward alloc and reclaim, which should lead to better system performance, with the cost of an amount of pre-allocated and unreclaimable memory, even under memory pressure. With a tcp_stream test with 10 flows running on a simulated 100ms RTT link, I can see the cycles spent in __sk_mem_raise_allocated() dropping by ~0.02%. Not a whole lot, since we already have logic in sk_mem_uncharge() to only reclaim 1MB when sk_forward_alloc has more than 2MB free space. But on a system suffering memory pressure constently, the savings should be more. The first patch is the implementation of this socket option. The following 2 patches change the tcp stack to make use of this reserved memory when under memory pressure. This makes the tcp stack behavior more flexible when under memory pressure, and provides a way for user to control the distribution of the memory among its sockets. With a TCP connection on a simulated 100ms RTT link, the default throughput under memory pressure is ~500Kbps. With SO_RESERVE_MEM set to 100KB, the throughput under memory pressure goes up to ~3.5Mbps. Change since v2: - Added description for new field added in struct sock in patch 1 Change since v1: - Added performance stats in cover letter and rebased ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Wang authored
When user sets SO_RESERVE_MEM socket option, in order to utilize the reserved memory when in memory pressure state, we adjust rcv_ssthresh according to the available reserved memory for the socket, instead of using 4 * advmss always. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Wang authored
If user sets SO_RESERVE_MEM socket option, in order to fully utilize the reserved memory in memory pressure state on the tx path, we modify the logic in sk_stream_moderate_sndbuf() to set sk_sndbuf according to available reserved memory, instead of MIN_SOCK_SNDBUF, and adjust it when new data is acked. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Wang authored
This socket option provides a mechanism for users to reserve a certain amount of memory for the socket to use. When this option is set, kernel charges the user specified amount of memory to memcg, as well as sk_forward_alloc. This amount of memory is not reclaimable and is available in sk_forward_alloc for this socket. With this socket option set, the networking stack spends less cycles doing forward alloc and reclaim, which should lead to better system performance, with the cost of an amount of pre-allocated and unreclaimable memory, even under memory pressure. Note: This socket option is only available when memory cgroup is enabled and we require this reserved memory to be charged to the user's memcg. We hope this could avoid mis-behaving users to abused this feature to reserve a large amount on certain sockets and cause unfairness for others. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
struct dev_addr_list is used for device addresses, unicast addresses and multicast addresses. The first of those needs special handling of the main address - netdev->dev_addr points directly the data of the entry and drivers write to it freely, so we can't maintain it in the rbtree (for now, at least, to be fixed in net-next). Current work around sprinkles special handling of the first address on the list throughout the code but it missed the case where address is being added. First address will not be visible during subsequent adds. Syzbot found a warning where unicast addresses are modified without holding the rtnl lock, tl;dr is that team generates the same modification multiple times, not necessarily when right locks are held. In the repro we have: macvlan -> team -> veth macvlan adds a unicast address to the team. Team then pushes that address down to its memebers (veths). Next something unrelated makes team sync member addrs again, and because of the bug the addr entries get duplicated in the veths. macvlan gets removed, removes its addr from team which removes only one of the duplicated addresses from veths. This removal is done under rtnl. Next syzbot uses iptables to add a multicast addr to team (which does not hold rtnl lock). Team syncs veth addrs, but because veths' unicast list still has the duplicate it will also get sync, even though this update is intended for mc addresses. Again, uc address updates need rtnl lock, boom. Reported-by: syzbot+7a2ab2cdc14d134de553@syzkaller.appspotmail.com Fixes: 406f42fa ("net-next: When a bond have a massive amount of VLANs with IPv6 addresses, performance of changing link state, attaching a VRF, changing an IPv6 address, etc. go down dramtically.") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Russell King authored
Add support for the downshift tunable for the Marvell 88x3310 PHY. Downshift is only usable with firmware 0.3.5.0 and later. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vlad Buslov authored
Patch that refactored fl_walk() to use idr_for_each_entry_continue_ul() also removed rcu protection of individual filters which causes following use-after-free when filter is deleted concurrently. Fix fl_walk() to obtain rcu read lock while iterating and taking the filter reference and temporary release the lock while calling arg->fn() callback that can sleep. KASAN trace: [ 352.773640] ================================================================== [ 352.775041] BUG: KASAN: use-after-free in fl_walk+0x159/0x240 [cls_flower] [ 352.776304] Read of size 4 at addr ffff8881c8251480 by task tc/2987 [ 352.777862] CPU: 3 PID: 2987 Comm: tc Not tainted 5.15.0-rc2+ #2 [ 352.778980] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 352.781022] Call Trace: [ 352.781573] dump_stack_lvl+0x46/0x5a [ 352.782332] print_address_description.constprop.0+0x1f/0x140 [ 352.783400] ? fl_walk+0x159/0x240 [cls_flower] [ 352.784292] ? fl_walk+0x159/0x240 [cls_flower] [ 352.785138] kasan_report.cold+0x83/0xdf [ 352.785851] ? fl_walk+0x159/0x240 [cls_flower] [ 352.786587] kasan_check_range+0x145/0x1a0 [ 352.787337] fl_walk+0x159/0x240 [cls_flower] [ 352.788163] ? fl_put+0x10/0x10 [cls_flower] [ 352.789007] ? __mutex_unlock_slowpath.constprop.0+0x220/0x220 [ 352.790102] tcf_chain_dump+0x231/0x450 [ 352.790878] ? tcf_chain_tp_delete_empty+0x170/0x170 [ 352.791833] ? __might_sleep+0x2e/0xc0 [ 352.792594] ? tfilter_notify+0x170/0x170 [ 352.793400] ? __mutex_unlock_slowpath.constprop.0+0x220/0x220 [ 352.794477] tc_dump_tfilter+0x385/0x4b0 [ 352.795262] ? tc_new_tfilter+0x1180/0x1180 [ 352.796103] ? __mod_node_page_state+0x1f/0xc0 [ 352.796974] ? __build_skb_around+0x10e/0x130 [ 352.797826] netlink_dump+0x2c0/0x560 [ 352.798563] ? netlink_getsockopt+0x430/0x430 [ 352.799433] ? __mutex_unlock_slowpath.constprop.0+0x220/0x220 [ 352.800542] __netlink_dump_start+0x356/0x440 [ 352.801397] rtnetlink_rcv_msg+0x3ff/0x550 [ 352.802190] ? tc_new_tfilter+0x1180/0x1180 [ 352.802872] ? rtnl_calcit.isra.0+0x1f0/0x1f0 [ 352.803668] ? tc_new_tfilter+0x1180/0x1180 [ 352.804344] ? _copy_from_iter_nocache+0x800/0x800 [ 352.805202] ? kasan_set_track+0x1c/0x30 [ 352.805900] netlink_rcv_skb+0xc6/0x1f0 [ 352.806587] ? rht_deferred_worker+0x6b0/0x6b0 [ 352.807455] ? rtnl_calcit.isra.0+0x1f0/0x1f0 [ 352.808324] ? netlink_ack+0x4d0/0x4d0 [ 352.809086] ? netlink_deliver_tap+0x62/0x3d0 [ 352.809951] netlink_unicast+0x353/0x480 [ 352.810744] ? netlink_attachskb+0x430/0x430 [ 352.811586] ? __alloc_skb+0xd7/0x200 [ 352.812349] netlink_sendmsg+0x396/0x680 [ 352.813132] ? netlink_unicast+0x480/0x480 [ 352.813952] ? __import_iovec+0x192/0x210 [ 352.814759] ? netlink_unicast+0x480/0x480 [ 352.815580] sock_sendmsg+0x6c/0x80 [ 352.816299] ____sys_sendmsg+0x3a5/0x3c0 [ 352.817096] ? kernel_sendmsg+0x30/0x30 [ 352.817873] ? __ia32_sys_recvmmsg+0x150/0x150 [ 352.818753] ___sys_sendmsg+0xd8/0x140 [ 352.819518] ? sendmsg_copy_msghdr+0x110/0x110 [ 352.820402] ? ___sys_recvmsg+0xf4/0x1a0 [ 352.821110] ? __copy_msghdr_from_user+0x260/0x260 [ 352.821934] ? _raw_spin_lock+0x81/0xd0 [ 352.822680] ? __handle_mm_fault+0xef3/0x1b20 [ 352.823549] ? rb_insert_color+0x2a/0x270 [ 352.824373] ? copy_page_range+0x16b0/0x16b0 [ 352.825209] ? perf_event_update_userpage+0x2d0/0x2d0 [ 352.826190] ? __fget_light+0xd9/0xf0 [ 352.826941] __sys_sendmsg+0xb3/0x130 [ 352.827613] ? __sys_sendmsg_sock+0x20/0x20 [ 352.828377] ? do_user_addr_fault+0x2c5/0x8a0 [ 352.829184] ? fpregs_assert_state_consistent+0x52/0x60 [ 352.830001] ? exit_to_user_mode_prepare+0x32/0x160 [ 352.830845] do_syscall_64+0x35/0x80 [ 352.831445] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 352.832331] RIP: 0033:0x7f7bee973c17 [ 352.833078] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 352.836202] RSP: 002b:00007ffcbb368e28 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 352.837524] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7bee973c17 [ 352.838715] RDX: 0000000000000000 RSI: 00007ffcbb368e50 RDI: 0000000000000003 [ 352.839838] RBP: 00007ffcbb36d090 R08: 00000000cea96d79 R09: 00007f7beea34a40 [ 352.841021] R10: 00000000004059bb R11: 0000000000000246 R12: 000000000046563f [ 352.842208] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffcbb36d088 [ 352.843784] Allocated by task 2960: [ 352.844451] kasan_save_stack+0x1b/0x40 [ 352.845173] __kasan_kmalloc+0x7c/0x90 [ 352.845873] fl_change+0x282/0x22db [cls_flower] [ 352.846696] tc_new_tfilter+0x6cf/0x1180 [ 352.847493] rtnetlink_rcv_msg+0x471/0x550 [ 352.848323] netlink_rcv_skb+0xc6/0x1f0 [ 352.849097] netlink_unicast+0x353/0x480 [ 352.849886] netlink_sendmsg+0x396/0x680 [ 352.850678] sock_sendmsg+0x6c/0x80 [ 352.851398] ____sys_sendmsg+0x3a5/0x3c0 [ 352.852202] ___sys_sendmsg+0xd8/0x140 [ 352.852967] __sys_sendmsg+0xb3/0x130 [ 352.853718] do_syscall_64+0x35/0x80 [ 352.854457] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 352.855830] Freed by task 7: [ 352.856421] kasan_save_stack+0x1b/0x40 [ 352.857139] kasan_set_track+0x1c/0x30 [ 352.857854] kasan_set_free_info+0x20/0x30 [ 352.858609] __kasan_slab_free+0xed/0x130 [ 352.859348] kfree+0xa7/0x3c0 [ 352.859951] process_one_work+0x44d/0x780 [ 352.860685] worker_thread+0x2e2/0x7e0 [ 352.861390] kthread+0x1f4/0x220 [ 352.862022] ret_from_fork+0x1f/0x30 [ 352.862955] Last potentially related work creation: [ 352.863758] kasan_save_stack+0x1b/0x40 [ 352.864378] kasan_record_aux_stack+0xab/0xc0 [ 352.865028] insert_work+0x30/0x160 [ 352.865617] __queue_work+0x351/0x670 [ 352.866261] rcu_work_rcufn+0x30/0x40 [ 352.866917] rcu_core+0x3b2/0xdb0 [ 352.867561] __do_softirq+0xf6/0x386 [ 352.868708] Second to last potentially related work creation: [ 352.869779] kasan_save_stack+0x1b/0x40 [ 352.870560] kasan_record_aux_stack+0xab/0xc0 [ 352.871426] call_rcu+0x5f/0x5c0 [ 352.872108] queue_rcu_work+0x44/0x50 [ 352.872855] __fl_put+0x17c/0x240 [cls_flower] [ 352.873733] fl_delete+0xc7/0x100 [cls_flower] [ 352.874607] tc_del_tfilter+0x510/0xb30 [ 352.886085] rtnetlink_rcv_msg+0x471/0x550 [ 352.886875] netlink_rcv_skb+0xc6/0x1f0 [ 352.887636] netlink_unicast+0x353/0x480 [ 352.888285] netlink_sendmsg+0x396/0x680 [ 352.888942] sock_sendmsg+0x6c/0x80 [ 352.889583] ____sys_sendmsg+0x3a5/0x3c0 [ 352.890311] ___sys_sendmsg+0xd8/0x140 [ 352.891019] __sys_sendmsg+0xb3/0x130 [ 352.891716] do_syscall_64+0x35/0x80 [ 352.892395] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 352.893666] The buggy address belongs to the object at ffff8881c8251000 which belongs to the cache kmalloc-2k of size 2048 [ 352.895696] The buggy address is located 1152 bytes inside of 2048-byte region [ffff8881c8251000, ffff8881c8251800) [ 352.897640] The buggy address belongs to the page: [ 352.898492] page:00000000213bac35 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1c8250 [ 352.900110] head:00000000213bac35 order:3 compound_mapcount:0 compound_pincount:0 [ 352.901541] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff) [ 352.902908] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042f00 [ 352.904391] raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000 [ 352.905861] page dumped because: kasan: bad access detected [ 352.907323] Memory state around the buggy address: [ 352.908218] ffff8881c8251380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.909471] ffff8881c8251400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.910735] >ffff8881c8251480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.912012] ^ [ 352.912642] ffff8881c8251500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.913919] ffff8881c8251580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 352.915185] ================================================================== Fixes: d39d7149 ("idr: introduce idr_for_each_entry_continue_ul()") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Acked-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
The variable pin is being initialized with a value that is never read, it is being updated later on in only one case of a switch statement. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Lars-Peter Clausen authored
The macb PTP support currently implements the `gettime64` callback to allow to retrieve the hardware clock time. Update the implementation to provide the `gettimex64` callback instead. The difference between the two is that with `gettime64` a snapshot of the system clock is taken before and after invoking the callback. Whereas `gettimex64` expects the callback itself to take the snapshots. To get the time from the macb Ethernet core multiple register accesses have to be done. Only one of which will happen at the time reported by the function. This leads to a non-symmetric delay and adds a slight offset between the hardware and system clock time when using the `gettime64` method. This offset can be a few 100 nanoseconds. Switching to the `gettimex64` method allows for a more precise correlation of the hardware and system clocks and results in a lower offset between the two. On a Xilinx ZynqMP system `phc2sys` reports a delay of 1120 ns before and 300 ns after the patch. With the latter being mostly symmetric. Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Boris Sukholitko authored
The following flower filter fails to match non-PPP_IP{V6} packets wrapped in PPP_SES protocol: tc filter add dev eth0 ingress protocol ppp_ses flower \ action simple sdata hi64 The reason is that proto local variable is being set even when FLOW_DISSECT_RET_OUT_BAD status is returned. The fix is to avoid setting proto variable if the PPP protocol is unknown. Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Linus Walleij authored
We added a state variable to track whether a certain port was VLAN filtering or not, but we can just inquire the DSA core about this. Cc: Vladimir Oltean <olteanv@gmail.com> Cc: Mauri Sandberg <sandberg@mailfence.com> Cc: DENG Qingfang <dqfext@gmail.com> Cc: Alvin Šipraga <alsi@bang-olufsen.dk> Cc: Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
Syzkaller reported a false positive deadlock involving the nl socket lock and the subflow socket lock: MPTCP: kernel_bind error, err=-98 ============================================ WARNING: possible recursive locking detected 5.15.0-rc1-syzkaller #0 Not tainted -------------------------------------------- syz-executor998/6520 is trying to acquire lock: ffff8880795718a0 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x267/0x7b0 net/mptcp/protocol.c:2738 but task is already holding lock: ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline] ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x23/0x7b0 net/mptcp/protocol.c:2720 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(k-sk_lock-AF_INET); lock(k-sk_lock-AF_INET); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by syz-executor998/6520: #0: ffffffff8d176c50 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40 net/netlink/genetlink.c:802 #1: ffffffff8d176d08 (genl_mutex){+.+.}-{3:3}, at: genl_lock net/netlink/genetlink.c:33 [inline] #1: ffffffff8d176d08 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg+0x3e0/0x580 net/netlink/genetlink.c:790 #2: ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline] #2: ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x23/0x7b0 net/mptcp/protocol.c:2720 stack backtrace: CPU: 1 PID: 6520 Comm: syz-executor998 Not tainted 5.15.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_deadlock_bug kernel/locking/lockdep.c:2944 [inline] check_deadlock kernel/locking/lockdep.c:2987 [inline] validate_chain kernel/locking/lockdep.c:3776 [inline] __lock_acquire.cold+0x149/0x3ab kernel/locking/lockdep.c:5015 lock_acquire kernel/locking/lockdep.c:5625 [inline] lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5590 lock_sock_fast+0x36/0x100 net/core/sock.c:3229 mptcp_close+0x267/0x7b0 net/mptcp/protocol.c:2738 inet_release+0x12e/0x280 net/ipv4/af_inet.c:431 __sock_release net/socket.c:649 [inline] sock_release+0x87/0x1b0 net/socket.c:677 mptcp_pm_nl_create_listen_socket+0x238/0x2c0 net/mptcp/pm_netlink.c:900 mptcp_nl_cmd_add_addr+0x359/0x930 net/mptcp/pm_netlink.c:1170 genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:731 genl_family_rcv_msg net/netlink/genetlink.c:775 [inline] genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:792 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504 genl_rcv+0x24/0x40 net/netlink/genetlink.c:803 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 sock_no_sendpage+0x101/0x150 net/core/sock.c:2980 kernel_sendpage.part.0+0x1a0/0x340 net/socket.c:3504 kernel_sendpage net/socket.c:3501 [inline] sock_sendpage+0xe5/0x140 net/socket.c:1003 pipe_to_sendpage+0x2ad/0x380 fs/splice.c:364 splice_from_pipe_feed fs/splice.c:418 [inline] __splice_from_pipe+0x43e/0x8a0 fs/splice.c:562 splice_from_pipe fs/splice.c:597 [inline] generic_splice_sendpage+0xd4/0x140 fs/splice.c:746 do_splice_from fs/splice.c:767 [inline] direct_splice_actor+0x110/0x180 fs/splice.c:936 splice_direct_to_actor+0x34b/0x8c0 fs/splice.c:891 do_splice_direct+0x1b3/0x280 fs/splice.c:979 do_sendfile+0xae9/0x1240 fs/read_write.c:1249 __do_sys_sendfile64 fs/read_write.c:1314 [inline] __se_sys_sendfile64 fs/read_write.c:1300 [inline] __x64_sys_sendfile64+0x1cc/0x210 fs/read_write.c:1300 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f215cb69969 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc96bb3868 EFLAGS: 00000246 ORIG_RAX: 0000000000000028 RAX: ffffffffffffffda RBX: 00007f215cbad072 RCX: 00007f215cb69969 RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000005 RBP: 0000000000000000 R08: 00007ffc96bb3a08 R09: 00007ffc96bb3a08 R10: 0000000100000002 R11: 0000000000000246 R12: 00007ffc96bb387c R13: 431bde82d7b634db R14: 0000000000000000 R15: 0000000000000000 the problem originates from uncorrect lock annotation in the mptcp code and is only visible since commit 2dcb96ba ("net: core: Correct the sock::sk_lock.owned lockdep annotations"), but is present since the port-based endpoint support initial implementation. This patch addresses the issue introducing a nested variant of lock_sock_fast() and using it in the relevant code path. Fixes: 1729cf18 ("mptcp: create the listening socket for new port") Fixes: 2dcb96ba ("net: core: Correct the sock::sk_lock.owned lockdep annotations") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Reported-and-tested-by: syzbot+1dd53f7a89b299d59eaf@syzkaller.appspotmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geetha sowjanya authored
Adds XDP_PASS, XDP_TX, XDP_DROP and XDP_REDIRECT support for netdev PF. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kiran Kumar K authored
In case of ltype NPC_LT_LA_CPT_HDR, LA pointer is pointing to the start of cpt parse header. Since cpt parse header has veriable length padding, this will be a problem for DMAC extraction. Adding KPU profile changes to adjust the LA pointer to start at ether header in case of cpt parse header by - Adding ptr advance in pkind 58 to a fixed value 40 - Adding variable length offset 7 and mask 7 (pad len in CPT_PARSE_HDR). Also added the missing static declaration for npc_set_var_len_offset_pkind function. Signed-off-by: Kiran Kumar K <kirankumark@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Gustavo A. R. Silva authored
Make use of the struct_size() and flex_array_size() helpers instead of an open-coded version, in order to avoid any potential type mistakes or integer overflows that, in the worse scenario, could lead to heap overflows. Link: https://github.com/KSPP/linux/issues/160Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20210928193107.GA262595@embeddedorSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 29 Sep, 2021 3 commits
-
-
Mun Yew Tham authored
Update Altera Pio Driver maintainer's email from <joyce.ooi@intel.com> to <mun.yew.tham@intel.com> Signed-off-by: Mun Yew Tham <mun.yew.tham@intel.com> Acked-by: Joyce Ooi <joyce.ooi@intel.com> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
-
Bartosz Golaszewski authored
My professional situation changes soon. Update my email address. Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
-
Andrey Gusakov authored
Per gpio_chip interface, error shall be proparated to the caller. Attempt to silent diagnostics by returning zero (as written in the comment) is plain wrong, because the zero return can be interpreted by the caller as the gpio value. Cc: stable@vger.kernel.org Signed-off-by: Andrey Gusakov <andrey.gusakov@cogentembedded.com> Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
-