- 28 Mar, 2024 13 commits
-
-
Florian Westphal authored
ip_local_out() and other functions can pass skb->sk as function argument. If the skb is a fragment and reassembly happens before such function call returns, the sk must not be released. This affects skb fragments reassembled via netfilter or similar modules, e.g. openvswitch or ct_act.c, when run as part of tx pipeline. Eric Dumazet made an initial analysis of this bug. Quoting Eric: Calling ip_defrag() in output path is also implying skb_orphan(), which is buggy because output path relies on sk not disappearing. A relevant old patch about the issue was : 8282f274 ("inet: frag: Always orphan skbs inside ip_defrag()") [..] net/ipv4/ip_output.c depends on skb->sk being set, and probably to an inet socket, not an arbitrary one. If we orphan the packet in ipvlan, then downstream things like FQ packet scheduler will not work properly. We need to change ip_defrag() to only use skb_orphan() when really needed, ie whenever frag_list is going to be used. Eric suggested to stash sk in fragment queue and made an initial patch. However there is a problem with this: If skb is refragmented again right after, ip_do_fragment() will copy head->sk to the new fragments, and sets up destructor to sock_wfree. IOW, we have no choice but to fix up sk_wmem accouting to reflect the fully reassembled skb, else wmem will underflow. This change moves the orphan down into the core, to last possible moment. As ip_defrag_offset is aliased with sk_buff->sk member, we must move the offset into the FRAG_CB, else skb->sk gets clobbered. This allows to delay the orphaning long enough to learn if the skb has to be queued or if the skb is completing the reasm queue. In the former case, things work as before, skb is orphaned. This is safe because skb gets queued/stolen and won't continue past reasm engine. In the latter case, we will steal the skb->sk reference, reattach it to the head skb, and fix up wmem accouting when inet_frag inflates truesize. Fixes: 7026b1dd ("netfilter: Pass socket pointer down through okfn().") Diagnosed-by: Eric Dumazet <edumazet@google.com> Reported-by: xingwei lee <xrivendell7@gmail.com> Reported-by: yue sun <samsun1006219@gmail.com> Reported-by: syzbot+e5167d7144a62715044c@syzkaller.appspotmail.com Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240326101845.30836-1-fw@strlen.deSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Hariprasad Kelam authored
The Octeontx2 MAC block (CGX) has separate data paths (SMU and GMP) for different speeds, allowing for efficient data transfer. The previous patch which added pause frame configuration has a bug due to which pause frame feature is not working in GMP mode. This patch fixes the issue by configurating appropriate registers. Fixes: f7e086e7 ("octeontx2-af: Pause frame configuration at cgx") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240326052720.4441-1-hkelam@marvell.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Raju Lakkaraju authored
PCI11x1x Rev B0 devices might drop packets when receiving back to back frames at 2.5G link speed. Change the B0 Rev device's Receive filtering Engine FIFO threshold parameter from its hardware default of 4 to 3 dwords to prevent the problem. Rev C0 and later hardware already defaults to 3 dwords. Fixes: bb4f6bff ("net: lan743x: Add PCI11010 / PCI11414 device IDs") Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240326065805.686128-1-Raju.Lakkaraju@microchip.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Paolo Abeni authored
Justin Chen says: ==================== net: bcmasp: phy managements fixes Fix two issues. - The unimac may be put in a bad state if PHY RX clk doesn't exist during reset. Work around this by bringing the unimac out of reset during phy up. - Remove redundant phy_{suspend/resume} ==================== Link: https://lore.kernel.org/r/20240325193025.1540737-1-justin.chen@broadcom.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Justin Chen authored
phy_{suspend/resume} is redundant. It gets called from phy_{stop/start}. Fixes: 490cb412 ("net: bcmasp: Add support for ASP2.0 Ethernet controller") Signed-off-by: Justin Chen <justin.chen@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Justin Chen authored
The unimac requires the PHY RX clk during reset or it may be put into a bad state. Bring up the unimac after link up to ensure the PHY RX clk exists. Fixes: 490cb412 ("net: bcmasp: Add support for ASP2.0 Ethernet controller") Signed-off-by: Justin Chen <justin.chen@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Christian Marangi authored
On reworking and splitting the at803x driver, in splitting function of at803x PHYs it was added a NULL dereference bug where priv is referenced before it's actually allocated and then is tried to write to for the is_1000basex and is_fiber variables in the case of at8031, writing on the wrong address. Fix this by correctly setting priv local variable only after at803x_probe is called and actually allocates priv in the phydev struct. Reported-by: William Wortel <wwortel@dorpstraat.com> Cc: <stable@vger.kernel.org> Fixes: 25d2ba94 ("net: phy: at803x: move specific at8031 probe mode check to dedicated probe") Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240325190621.2665-1-ansuelsmth@gmail.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfPaolo Abeni authored
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: Patch #1 reject destroy chain command to delete device hooks in netdev family, hence, only delchain commands are allowed. Patch #2 reject table flag update interference with netdev basechain hook updates, this can leave hooks in inconsistent registration/unregistration state. Patch #3 do not unregister netdev basechain hooks if table is dormant. Otherwise, splat with double unregistration is possible. Patch #4 fixes Kconfig to allow to restore IP_NF_ARPTABLES, from Kuniyuki Iwashima. There are a more fixes still in progress on my side that need more work. * tag 'nf-24-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: arptables: Select NETFILTER_FAMILY_ARP when building arp_tables.c netfilter: nf_tables: skip netdev hook unregistration if table is dormant netfilter: nf_tables: reject table flag and netdev basechain updates netfilter: nf_tables: reject destroy command to remove basechain hooks ==================== Link: https://lore.kernel.org/r/20240328031855.2063-1-pablo@netfilter.orgSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfPaolo Abeni authored
Alexei Starovoitov says: ==================== pull-request: bpf 2024-03-27 The following pull-request contains BPF updates for your *net* tree. We've added 4 non-merge commits during the last 1 day(s) which contain a total of 5 files changed, 26 insertions(+), 3 deletions(-). The main changes are: 1) Fix bloom filter value size validation and protect the verifier against such mistakes, from Andrei. 2) Fix build due to CONFIG_KEXEC_CORE/CRASH_DUMP split, from Hari. 3) Update bpf_lsm maintainers entry, from Matt. * tag 'for-net' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: update BPF LSM designated reviewer list bpf: Protect against int overflow for stack access size bpf: Check bloom filter map value size bpf: fix warning for crash_kexec ==================== Link: https://lore.kernel.org/r/20240328012938.24249-1-alexei.starovoitov@gmail.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Kuniyuki Iwashima authored
syzkaller started to report a warning below [0] after consuming the commit 4654467d ("netfilter: arptables: allow xtables-nft only builds"). The change accidentally removed the dependency on NETFILTER_FAMILY_ARP from IP_NF_ARPTABLES. If NF_TABLES_ARP is not enabled on Kconfig, NETFILTER_FAMILY_ARP will be removed and some code necessary for arptables will not be compiled. $ grep -E "(NETFILTER_FAMILY_ARP|IP_NF_ARPTABLES|NF_TABLES_ARP)" .config CONFIG_NETFILTER_FAMILY_ARP=y # CONFIG_NF_TABLES_ARP is not set CONFIG_IP_NF_ARPTABLES=y $ make olddefconfig $ grep -E "(NETFILTER_FAMILY_ARP|IP_NF_ARPTABLES|NF_TABLES_ARP)" .config # CONFIG_NF_TABLES_ARP is not set CONFIG_IP_NF_ARPTABLES=y So, when nf_register_net_hooks() is called for arptables, it will trigger the splat below. Now IP_NF_ARPTABLES is only enabled by IP_NF_ARPFILTER, so let's restore the dependency on NETFILTER_FAMILY_ARP in IP_NF_ARPFILTER. [0]: WARNING: CPU: 0 PID: 242 at net/netfilter/core.c:316 nf_hook_entry_head+0x1e1/0x2c0 net/netfilter/core.c:316 Modules linked in: CPU: 0 PID: 242 Comm: syz-executor.0 Not tainted 6.8.0-12821-g537c2e91 #10 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:nf_hook_entry_head+0x1e1/0x2c0 net/netfilter/core.c:316 Code: 83 fd 04 0f 87 bc 00 00 00 e8 5b 84 83 fd 4d 8d ac ec a8 0b 00 00 e8 4e 84 83 fd 4c 89 e8 5b 5d 41 5c 41 5d c3 e8 3f 84 83 fd <0f> 0b e8 38 84 83 fd 45 31 ed 5b 5d 4c 89 e8 41 5c 41 5d c3 e8 26 RSP: 0018:ffffc90000b8f6e8 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000003 RCX: ffffffff83c42164 RDX: ffff888106851180 RSI: ffffffff83c42321 RDI: 0000000000000005 RBP: 0000000000000000 R08: 0000000000000005 R09: 000000000000000a R10: 0000000000000003 R11: ffff8881055c2f00 R12: ffff888112b78000 R13: 0000000000000000 R14: ffff8881055c2f00 R15: ffff8881055c2f00 FS: 00007f377bd78800(0000) GS:ffff88811b000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000496068 CR3: 000000011298b003 CR4: 0000000000770ef0 PKRU: 55555554 Call Trace: <TASK> __nf_register_net_hook+0xcd/0x7a0 net/netfilter/core.c:428 nf_register_net_hook+0x116/0x170 net/netfilter/core.c:578 nf_register_net_hooks+0x5d/0xc0 net/netfilter/core.c:594 arpt_register_table+0x250/0x420 net/ipv4/netfilter/arp_tables.c:1553 arptable_filter_table_init+0x41/0x60 net/ipv4/netfilter/arptable_filter.c:39 xt_find_table_lock+0x2e9/0x4b0 net/netfilter/x_tables.c:1260 xt_request_find_table_lock+0x2b/0xe0 net/netfilter/x_tables.c:1285 get_info+0x169/0x5c0 net/ipv4/netfilter/arp_tables.c:808 do_arpt_get_ctl+0x3f9/0x830 net/ipv4/netfilter/arp_tables.c:1444 nf_getsockopt+0x76/0xd0 net/netfilter/nf_sockopt.c:116 ip_getsockopt+0x17d/0x1c0 net/ipv4/ip_sockglue.c:1777 tcp_getsockopt+0x99/0x100 net/ipv4/tcp.c:4373 do_sock_getsockopt+0x279/0x360 net/socket.c:2373 __sys_getsockopt+0x115/0x1e0 net/socket.c:2402 __do_sys_getsockopt net/socket.c:2412 [inline] __se_sys_getsockopt net/socket.c:2409 [inline] __x64_sys_getsockopt+0xbd/0x150 net/socket.c:2409 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x4f/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x46/0x4e RIP: 0033:0x7f377beca6fe Code: 1f 44 00 00 48 8b 15 01 97 0a 00 f7 d8 64 89 02 b8 ff ff ff ff eb b8 0f 1f 44 00 00 f3 0f 1e fa 49 89 ca b8 37 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 0a c3 66 0f 1f 84 00 00 00 00 00 48 8b 15 c9 RSP: 002b:00000000005df728 EFLAGS: 00000246 ORIG_RAX: 0000000000000037 RAX: ffffffffffffffda RBX: 00000000004966e0 RCX: 00007f377beca6fe RDX: 0000000000000060 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 000000000042938a R08: 00000000005df73c R09: 00000000005df800 R10: 00000000004966e8 R11: 0000000000000246 R12: 0000000000000003 R13: 0000000000496068 R14: 0000000000000003 R15: 00000000004bc9d8 </TASK> Fixes: 4654467d ("netfilter: arptables: allow xtables-nft only builds") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Pablo Neira Ayuso authored
Skip hook unregistration when adding or deleting devices from an existing netdev basechain. Otherwise, commit/abort path try to unregister hooks which not enabled. Fixes: b9703ed4 ("netfilter: nf_tables: support for adding new devices to an existing netdev chain") Fixes: 7d937b10 ("netfilter: nf_tables: support for deleting devices in an existing netdev chain") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Pablo Neira Ayuso authored
netdev basechain updates are stored in the transaction object hook list. When setting on the table dormant flag, it iterates over the existing hooks in the basechain. Thus, skipping the hooks that are being added/deleted in this transaction, which leaves hook registration in inconsistent state. Reject table flag updates in combination with netdev basechain updates in the same batch: - Update table flags and add/delete basechain: Check from basechain update path if there are pending flag updates for this table. - add/delete basechain and update table flags: Iterate over the transaction list to search for basechain updates from the table update path. In both cases, the batch is rejected. Based on suggestion from Florian Westphal. Fixes: b9703ed4 ("netfilter: nf_tables: support for adding new devices to an existing netdev chain") Fixes: 7d937b10 ("netfilter: nf_tables: support for deleting devices in an existing netdev chain") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Pablo Neira Ayuso authored
Report EOPNOTSUPP if NFT_MSG_DESTROYCHAIN is used to delete hooks in an existing netdev basechain, thus, only NFT_MSG_DELCHAIN is allowed. Fixes: 7d937b10 ("netfilter: nf_tables: support for deleting devices in an existing netdev chain") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 27 Mar, 2024 15 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wirelessJakub Kicinski authored
Kalle Valo says: ==================== wireless fixes for v6.9-rc2 The first fixes for v6.9. Ping-Ke Shih now maintains a separate tree for Realtek drivers, document that in the MAINTAINERS. Plenty of fixes for both to stack and iwlwifi. Our kunit tests were working only on um architecture but that's fixed now. * tag 'wireless-2024-03-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (21 commits) MAINTAINERS: wifi: mwifiex: add Francesco as reviewer kunit: fix wireless test dependencies wifi: iwlwifi: mvm: include link ID when releasing frames wifi: iwlwifi: mvm: handle debugfs names more carefully wifi: iwlwifi: mvm: guard against invalid STA ID on removal wifi: iwlwifi: read txq->read_ptr under lock wifi: iwlwifi: fw: don't always use FW dump trig wifi: iwlwifi: mvm: rfi: fix potential response leaks wifi: mac80211: correctly set active links upon TTLM wifi: iwlwifi: mvm: Configure the link mapping for non-MLD FW wifi: iwlwifi: mvm: consider having one active link wifi: iwlwifi: mvm: pick the version of SESSION_PROTECTION_NOTIF wifi: mac80211: fix prep_connection error path wifi: cfg80211: fix rdev_dump_mpp() arguments order wifi: iwlwifi: mvm: disable MLO for the time being wifi: cfg80211: add a flag to disable wireless extensions wifi: mac80211: fix ieee80211_bss_*_flags kernel-doc wifi: mac80211: check/clear fast rx for non-4addr sta VLAN changes wifi: mac80211: fix mlme_link_id_dbg() MAINTAINERS: wifi: add git tree for Realtek WiFi drivers ... ==================== Link: https://lore.kernel.org/r/20240327191346.1A1EAC433C7@smtp.kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Matt Bobrowski authored
Adding myself in place of both Brendan and Florent as both have since moved on from working on the BPF LSM and will no longer be devoting their time to maintaining the BPF LSM. Signed-off-by: Matt Bobrowski <mattbobrowski@google.com> Acked-by: KP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/r/ZgMhWF_egdYF8t4D@google.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Alexei Starovoitov authored
Andrei Matei says: ==================== Check bloom filter map value size v1->v2: - prepend a patch addressing the bloom map specifically - change low-level rejection error to EFAULT, to indicate a bug ==================== Link: https://lore.kernel.org/r/20240327024245.318299-1-andreimatei1@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrei Matei authored
This patch re-introduces protection against the size of access to stack memory being negative; the access size can appear negative as a result of overflowing its signed int representation. This should not actually happen, as there are other protections along the way, but we should protect against it anyway. One code path was missing such protections (fixed in the previous patch in the series), causing out-of-bounds array accesses in check_stack_range_initialized(). This patch causes the verification of a program with such a non-sensical access size to fail. This check used to exist in a more indirect way, but was inadvertendly removed in a833a17a. Fixes: a833a17a ("bpf: Fix verification of indirect var-off stack access") Reported-by: syzbot+33f4297b5f927648741a@syzkaller.appspotmail.com Reported-by: syzbot+aafd0513053a1cbf52ef@syzkaller.appspotmail.com Closes: https://lore.kernel.org/bpf/CAADnVQLORV5PT0iTAhRER+iLBTkByCYNBYyvBSgjN1T31K+gOw@mail.gmail.com/Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Andrei Matei <andreimatei1@gmail.com> Link: https://lore.kernel.org/r/20240327024245.318299-3-andreimatei1@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrei Matei authored
This patch adds a missing check to bloom filter creating, rejecting values above KMALLOC_MAX_SIZE. This brings the bloom map in line with many other map types. The lack of this protection can cause kernel crashes for value sizes that overflow int's. Such a crash was caught by syzkaller. The next patch adds more guard-rails at a lower level. Signed-off-by: Andrei Matei <andreimatei1@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240327024245.318299-2-andreimatei1@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Hari Bathini authored
With [1], crash dump specific code is moved out of CONFIG_KEXEC_CORE and placed under CONFIG_CRASH_DUMP, where it is more appropriate. And since CONFIG_KEXEC & !CONFIG_CRASH_DUMP build option is supported with that, it led to the below warning: "WARN: resolve_btfids: unresolved symbol crash_kexec" Fix it by using the appropriate #ifdef. [1] https://lore.kernel.org/all/20240124051254.67105-1-bhe@redhat.com/Acked-by: Baoquan He <bhe@redhat.com> Fixes: 02aff848 ("crash: split crash dumping code out from kexec_core.c") Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Link: https://lore.kernel.org/r/20240319080152.36987-1-hbathini@linux.ibm.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Jakub Kicinski authored
The longest running netdevsim test, nexthop.sh, currently takes 5 min to finish. Around 260s to be exact, and 310s on a debug kernel. The default timeout in selftest is 45sec, so we need an explicit config. Give ourselves some headroom and use 10min. Commit under Fixes isn't really to "blame" but prior to that netdevsim tests weren't integrated with kselftest infra so blaming the tests themselves doesn't seem right, either. Fixes: 8ff25dac ("netdevsim: add Makefile for selftests") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Herve Codina authored
Compilation with CONFIG_GENERIC_FRAMER disabled lead to the following warnings: framer.h:184:16: warning: no previous prototype for function 'framer_get' [-Wmissing-prototypes] 184 | struct framer *framer_get(struct device *dev, const char *con_id) framer.h:184:1: note: declare 'static' if the function is not intended to be used outside of this translation unit 184 | struct framer *framer_get(struct device *dev, const char *con_id) framer.h:189:6: warning: no previous prototype for function 'framer_put' [-Wmissing-prototypes] 189 | void framer_put(struct device *dev, struct framer *framer) framer.h:189:1: note: declare 'static' if the function is not intended to be used outside of this translation unit 189 | void framer_put(struct device *dev, struct framer *framer) Add missing 'static inline' qualifiers for these functions. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202403241110.hfJqeJRu-lkp@intel.com/ Fixes: 82c944d0 ("net: wan: Add framer framework support") Cc: stable@vger.kernel.org Signed-off-by: Herve Codina <herve.codina@bootlin.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queueJakub Kicinski authored
Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-03-25 (ice, ixgbe, igc) This series contains updates to ice, ixgbe, and igc drivers. Steven fixes incorrect casting of bitmap type for ice driver. Jesse fixes memory corruption issue with suspend flow on ice. Przemek adds GFP_ATOMIC flag to avoid sleeping in IRQ context for ixgbe. Kurt Kanzenbach removes no longer valid comment on igc. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: igc: Remove stale comment about Tx timestamping ixgbe: avoid sleeping allocation in ixgbe_ipsec_vf_add_sa() ice: fix memory corruption bug with suspend and rebuild ice: Refactor FW data type and fix bitmap casting issue ==================== Link: https://lore.kernel.org/r/20240325200659.993749-1-anthony.l.nguyen@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
David Thompson authored
The mlxbf_gige driver encounters a NULL pointer exception in mlxbf_gige_open() when kdump is enabled. The sequence to reproduce the exception is as follows: a) enable kdump b) trigger kdump via "echo c > /proc/sysrq-trigger" c) kdump kernel executes d) kdump kernel loads mlxbf_gige module e) the mlxbf_gige module runs its open() as the the "oob_net0" interface is brought up f) mlxbf_gige module will experience an exception during its open(), something like: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Mem abort info: ESR = 0x0000000086000004 EC = 0x21: IABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e29a4000 [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 0000000086000004 [#1] SMP CPU: 0 PID: 812 Comm: NetworkManager Tainted: G OE 5.15.0-1035-bluefield #37-Ubuntu Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.6.0.13024 Jan 19 2024 pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : 0x0 lr : __napi_poll+0x40/0x230 sp : ffff800008003e00 x29: ffff800008003e00 x28: 0000000000000000 x27: 00000000ffffffff x26: ffff000066027238 x25: ffff00007cedec00 x24: ffff800008003ec8 x23: 000000000000012c x22: ffff800008003eb7 x21: 0000000000000000 x20: 0000000000000001 x19: ffff000066027238 x18: 0000000000000000 x17: ffff578fcb450000 x16: ffffa870b083c7c0 x15: 0000aaab010441d0 x14: 0000000000000001 x13: 00726f7272655f65 x12: 6769675f6662786c x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa870b0842398 x8 : 0000000000000004 x7 : fe5a48b9069706ea x6 : 17fdb11fc84ae0d2 x5 : d94a82549d594f35 x4 : 0000000000000000 x3 : 0000000000400100 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000066027238 Call trace: 0x0 net_rx_action+0x178/0x360 __do_softirq+0x15c/0x428 __irq_exit_rcu+0xac/0xec irq_exit+0x18/0x2c handle_domain_irq+0x6c/0xa0 gic_handle_irq+0xec/0x1b0 call_on_irq_stack+0x20/0x2c do_interrupt_handler+0x5c/0x70 el1_interrupt+0x30/0x50 el1h_64_irq_handler+0x18/0x2c el1h_64_irq+0x7c/0x80 __setup_irq+0x4c0/0x950 request_threaded_irq+0xf4/0x1bc mlxbf_gige_request_irqs+0x68/0x110 [mlxbf_gige] mlxbf_gige_open+0x5c/0x170 [mlxbf_gige] __dev_open+0x100/0x220 __dev_change_flags+0x16c/0x1f0 dev_change_flags+0x2c/0x70 do_setlink+0x220/0xa40 __rtnl_newlink+0x56c/0x8a0 rtnl_newlink+0x58/0x84 rtnetlink_rcv_msg+0x138/0x3c4 netlink_rcv_skb+0x64/0x130 rtnetlink_rcv+0x20/0x30 netlink_unicast+0x2ec/0x360 netlink_sendmsg+0x278/0x490 __sock_sendmsg+0x5c/0x6c ____sys_sendmsg+0x290/0x2d4 ___sys_sendmsg+0x84/0xd0 __sys_sendmsg+0x70/0xd0 __arm64_sys_sendmsg+0x2c/0x40 invoke_syscall+0x78/0x100 el0_svc_common.constprop.0+0x54/0x184 do_el0_svc+0x30/0xac el0_svc+0x48/0x160 el0t_64_sync_handler+0xa4/0x12c el0t_64_sync+0x1a4/0x1a8 Code: bad PC value ---[ end trace 7d1c3f3bf9d81885 ]--- Kernel panic - not syncing: Oops: Fatal exception in interrupt Kernel Offset: 0x2870a7a00000 from 0xffff800008000000 PHYS_OFFSET: 0x80000000 CPU features: 0x0,000005c1,a3332a5a Memory Limit: none ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]--- The exception happens because there is a pending RX interrupt before the call to request_irq(RX IRQ) executes. Then, the RX IRQ handler fires immediately after this request_irq() completes. The RX IRQ handler runs "napi_schedule()" before NAPI is fully initialized via "netif_napi_add()" and "napi_enable()", both which happen later in the open() logic. The logic in mlxbf_gige_open() must fully initialize NAPI before any calls to request_irq() execute. Fixes: f92e1869 ("Add Mellanox BlueField Gigabit Ethernet driver") Signed-off-by: David Thompson <davthompson@nvidia.com> Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com> Link: https://lore.kernel.org/r/20240325183627.7641-1-davthompson@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Sabrina Dubroca says: ==================== tls: recvmsg fixes The first two fixes are again related to async decrypt. The last one is unrelated but I stumbled upon it while reading the code. ==================== Link: https://lore.kernel.org/r/cover.1711120964.git.sd@queasysnail.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Sabrina Dubroca authored
At the start of tls_sw_recvmsg, we take a reference on the psock, and then call tls_rx_reader_lock. If that fails, we return directly without releasing the reference. Instead of adding a new label, just take the reference after locking has succeeded, since we don't need it before. Fixes: 4cbc325e ("tls: rx: allow only one reader at a time") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/fe2ade22d030051ce4c3638704ed58b67d0df643.1711120964.git.sd@queasysnail.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Sabrina Dubroca authored
Make sure that we don't return more bytes than we actually received if the userspace buffer was bogus. We expect to receive at least the rest of rec1, and possibly some of rec2 (currently, we don't, but that would be ok). Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/720e61b3d3eab40af198a58ce2cd1ee019f0ceb1.1711120964.git.sd@queasysnail.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Sabrina Dubroca authored
process_rx_list may not copy as many bytes as we want to the userspace buffer, for example in case we hit an EFAULT during the copy. If this happens, we should only count the bytes that were actually copied, which may be 0. Subtracting async_copy_bytes is correct in both peek and !peek cases, because decrypted == async_copy_bytes + peeked for the peek case: peek is always !ZC, and we can go through either the sync or async path. In the async case, we add chunk to both decrypted and async_copy_bytes. In the sync case, we add chunk to both decrypted and peeked. I missed that in commit 6caaf104 ("tls: fix peeking with sync+async decryption"). Fixes: 4d42cd6b ("tls: rx: fix return value for async crypto") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/1b5a1eaab3c088a9dd5d9f1059ceecd7afe888d1.1711120964.git.sd@queasysnail.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Sabrina Dubroca authored
Only MSG_PEEK needs to copy from an offset during the final process_rx_list call, because the bytes we copied at the beginning of tls_sw_recvmsg were left on the rx_list. In the KVEC case, we removed data from the rx_list as we were copying it, so there's no need to use an offset, just like in the normal case. Fixes: 692d7b5d ("tls: Fix recvmsg() to be able to peek across multiple records") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/e5487514f828e0347d2b92ca40002c62b58af73d.1711120964.git.sd@queasysnail.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 26 Mar, 2024 12 commits
-
-
Paolo Abeni authored
Jijie Shao says: ==================== There are some bugfix for the HNS3 ethernet driver ==================== Link: https://lore.kernel.org/r/20240325124311.1866197-1-shaojijie@huawei.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Jian Shen authored
Currently, loopback test may be skipped when resetting, but the test result will still show as 'PASS', because the driver doesn't set ETH_TEST_FL_FAILED flag. Fix it by setting the flag and initializating the value to UNEXECUTED. Fixes: 4c8dab1c ("net: hns3: reconstruct function hns3_self_test") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Yonglong Liu authored
The devlink reload process will access the hardware resources, but the register operation is done before the hardware is initialized. So, processing the devlink reload during initialization may lead to kernel crash. This patch fixes this by taking devl_lock during initialization. Fixes: b741269b ("net: hns3: add support for registering devlink for PF") Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Jie Wang authored
Currently, hns hardware supports more than 512 queues and the index limit in hclge_comm_tqps_update_stats is wrong. So this patch removes it. Fixes: 287db5c4 ("net: hns3: create new set of common tqp stats APIs for PF and VF reuse") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Francesco Dolcini authored
As discussed on the mailing list, add myself as mwifiex driver reviewer. Link: https://lore.kernel.org/all/20240318112830.GA9565@francesco-nb/Signed-off-by: Francesco Dolcini <francesco@dolcini.it> Acked-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://msgid.link/20240321163420.11158-1-francesco@dolcini.it
-
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfPaolo Abeni authored
Daniel Borkmann says: ==================== pull-request: bpf 2024-03-25 The following pull-request contains BPF updates for your *net* tree. We've added 17 non-merge commits during the last 12 day(s) which contain a total of 19 files changed, 184 insertions(+), 61 deletions(-). The main changes are: 1) Fix an arm64 BPF JIT bug in BPF_LDX_MEMSX implementation's offset handling found via test_bpf module, from Puranjay Mohan. 2) Various fixups to the BPF arena code in particular in the BPF verifier and around BPF selftests to match latest corresponding LLVM implementation, from Puranjay Mohan and Alexei Starovoitov. 3) Fix xsk to not assume that metadata is always requested in TX completion, from Stanislav Fomichev. 4) Fix riscv BPF JIT's kfunc parameter incompatibility between BPF and the riscv ABI which requires sign-extension on int/uint, from Pu Lehui. 5) Fix s390x BPF JIT's bpf_plt pointer arithmetic which triggered a crash when testing struct_ops, from Ilya Leoshkevich. 6) Fix libbpf's arena mmap handling which had incorrect u64-to-pointer cast on 32-bit architectures, from Andrii Nakryiko. 7) Fix libbpf to define MFD_CLOEXEC when not available, from Arnaldo Carvalho de Melo. 8) Fix arm64 BPF JIT implementation for 32bit unconditional bswap which resulted in an incorrect swap as indicated by test_bpf, from Artem Savkov. 9) Fix BPF man page build script to use silent mode, from Hangbin Liu. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: riscv, bpf: Fix kfunc parameters incompatibility between bpf and riscv abi bpf: verifier: reject addr_space_cast insn without arena selftests/bpf: verifier_arena: fix mmap address for arm64 bpf: verifier: fix addr_space_cast from as(1) to as(0) libbpf: Define MFD_CLOEXEC if not available arm64: bpf: fix 32bit unconditional bswap bpf, arm64: fix bug in BPF_LDX_MEMSX libbpf: fix u64-to-pointer cast on 32-bit arches s390/bpf: Fix bpf_plt pointer arithmetic xsk: Don't assume metadata is always requested in TX completion selftests/bpf: Add arena test case for 4Gbyte corner case selftests/bpf: Remove hard coded PAGE_SIZE macro. libbpf, selftests/bpf: Adjust libbpf, bpftool, selftests to match LLVM bpf: Clarify bpf_arena comments. MAINTAINERS: Update email address for Quentin Monnet scripts/bpf_doc: Use silent mode when exec make cmd bpf: Temporarily disable atomic operations in BPF arena ==================== Link: https://lore.kernel.org/r/20240325213520.26688-1-daniel@iogearbox.netSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Ido Schimmel authored
Locally generated IP multicast packets (such as the ones used in the test) do not perform routing and simply egress the bound device. However, as explained in commit 8bcfb4ae ("selftests: forwarding: Fix failing tests with old libnet"), old versions of libnet (used by mausezahn) do not use the "SO_BINDTODEVICE" socket option. Specifically, the library started using the option for IPv6 sockets in version 1.1.6 and for IPv4 sockets in version 1.2. This explains why on Ubuntu - which uses version 1.1.6 - the IPv4 overlay tests are failing whereas the IPv6 ones are passing. Fix by specifying the source and destination MAC of the packets which will cause mausezahn to use a packet socket instead of an IP socket. Fixes: 62199e3f ("selftests: net: Add VXLAN MDB test") Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr> Closes: https://lore.kernel.org/netdev/5bb50349-196d-4892-8ed2-f37543aa863f@alu.unizg.hr/Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20240325075030.2379513-1-idosch@nvidia.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Sergey Shtylyov authored
Since the Renesas Ethernet Switch driver was added by Yoshihiro Shimoda, I started receiving the patches to review for it -- which I was unable to do, as I don't know this hardware and don't even have the manuals for it. Fortunately, Shimoda-san has volunteered to be a reviewer for this new driver, thus let's now split the single entry into 3 per-driver entries, each with its own reviewer... Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Link: https://lore.kernel.org/r/de0ccc1d-6fc0-583f-4f80-f70e6461d62d@omp.ruSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Arınç ÜNAL authored
The MT7530 switch after reset initialises with a core clock frequency that works with a 25MHz XTAL connected to it. For 40MHz XTAL, the core clock frequency must be set to 500MHz. The mt7530_pll_setup() function is responsible of setting the core clock frequency. Currently, it runs on MT7530 with 25MHz and 40MHz XTAL. This causes MT7530 switch with 25MHz XTAL to egress and ingress frames improperly. Introduce a check to run it only on MT7530 with 40MHz XTAL. The core clock frequency is set by writing to a switch PHY's register. Access to the PHY's register is done via the MDIO bus the switch is also on. Therefore, it works only when the switch makes switch PHYs listen on the MDIO bus the switch is on. This is controlled either by the state of the ESW_P1_LED_1 pin after reset deassertion or modifying bit 5 of the modifiable trap register. When ESW_P1_LED_1 is pulled high, PHY indirect access is used. That means accessing PHY registers via the PHY indirect access control register of the switch. When ESW_P1_LED_1 is pulled low, PHY direct access is used. That means accessing PHY registers via the MDIO bus the switch is on. For MT7530 switch with 40MHz XTAL on a board with ESW_P1_LED_1 pulled high, the core clock frequency won't be set to 500MHz, causing the switch to egress and ingress frames improperly. Run mt7530_pll_setup() after PHY direct access is set on the modifiable trap register. With these two changes, all MT7530 switches with 25MHz and 40MHz, and P1_LED_1 pulled high or low, will egress and ingress frames properly. Link: https://github.com/BPI-SINOVOIP/BPI-R2-bsp/blob/4a5dd143f2172ec97a2872fa29c7c4cd520f45b5/linux-mt/drivers/net/ethernet/mediatek/gsw_mt7623.c#L1039 Fixes: b8f126a8 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Link: https://lore.kernel.org/r/20240320-for-net-mt7530-fix-25mhz-xtal-with-direct-phy-access-v1-1-d92f605f1160@arinc9.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Bjørn Mork authored
Some of the registers are aligned on a 32bit boundary, causing alignment faults on 64bit platforms. Unable to handle kernel paging request at virtual address ffffffc084a1d004 Mem abort info: ESR = 0x0000000096000061 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x21: alignment fault Data abort info: ISV = 0, ISS = 0x00000061, ISS2 = 0x00000000 CM = 0, WnR = 1, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000046ad6000 [ffffffc084a1d004] pgd=100000013ffff003, p4d=100000013ffff003, pud=100000013ffff003, pmd=0068000020a00711 Internal error: Oops: 0000000096000061 [#1] SMP Modules linked in: mtk_t7xx(+) qcserial pppoe ppp_async option nft_fib_inet nf_flow_table_inet mt7921u(O) mt7921s(O) mt7921e(O) mt7921_common(O) iwlmvm(O) iwldvm(O) usb_wwan rndis_host qmi_wwan pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mt7996e(O) mt792x_usb(O) mt792x_lib(O) mt7915e(O) mt76_usb(O) mt76_sdio(O) mt76_connac_lib(O) mt76(O) mac80211(O) iwlwifi(O) huawei_cdc_ncm cfg80211(O) cdc_ncm cdc_ether wwan usbserial usbnet slhc sfp rtc_pcf8563 nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 mt6577_auxadc mdio_i2c libcrc32c compat(O) cdc_wdm cdc_acm at24 crypto_safexcel pwm_fan i2c_gpio i2c_smbus industrialio i2c_algo_bit i2c_mux_reg i2c_mux_pca954x i2c_mux_pca9541 i2c_mux_gpio i2c_mux dummy oid_registry tun sha512_arm64 sha1_ce sha1_generic seqiv md5 geniv des_generic libdes cbc authencesn authenc leds_gpio xhci_plat_hcd xhci_pci xhci_mtk_hcd xhci_hcd nvme nvme_core gpio_button_hotplug(O) dm_mirror dm_region_hash dm_log dm_crypt dm_mod dax usbcore usb_common ptp aquantia pps_core mii tpm encrypted_keys trusted CPU: 3 PID: 5266 Comm: kworker/u9:1 Tainted: G O 6.6.22 #0 Hardware name: Bananapi BPI-R4 (DT) Workqueue: md_hk_wq t7xx_fsm_uninit [mtk_t7xx] pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : t7xx_cldma_hw_set_start_addr+0x1c/0x3c [mtk_t7xx] lr : t7xx_cldma_start+0xac/0x13c [mtk_t7xx] sp : ffffffc085d63d30 x29: ffffffc085d63d30 x28: 0000000000000000 x27: 0000000000000000 x26: 0000000000000000 x25: ffffff80c804f2c0 x24: ffffff80ca196c05 x23: 0000000000000000 x22: ffffff80c814b9b8 x21: ffffff80c814b128 x20: 0000000000000001 x19: ffffff80c814b080 x18: 0000000000000014 x17: 0000000055c9806b x16: 000000007c5296d0 x15: 000000000f6bca68 x14: 00000000dbdbdce4 x13: 000000001aeaf72a x12: 0000000000000001 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : ffffff80ca1ef6b4 x7 : ffffff80c814b818 x6 : 0000000000000018 x5 : 0000000000000870 x4 : 0000000000000000 x3 : 0000000000000000 x2 : 000000010a947000 x1 : ffffffc084a1d004 x0 : ffffffc084a1d004 Call trace: t7xx_cldma_hw_set_start_addr+0x1c/0x3c [mtk_t7xx] t7xx_fsm_uninit+0x578/0x5ec [mtk_t7xx] process_one_work+0x154/0x2a0 worker_thread+0x2ac/0x488 kthread+0xe0/0xec ret_from_fork+0x10/0x20 Code: f9400800 91001000 8b214001 d50332bf (f9000022) ---[ end trace 0000000000000000 ]--- The inclusion of io-64-nonatomic-lo-hi.h indicates that all 64bit accesses can be replaced by pairs of nonatomic 32bit access. Fix alignment by forcing all accesses to be 32bit on 64bit platforms. Link: https://forum.openwrt.org/t/fibocom-fm350-gl-support/142682/72 Fixes: 39d43904 ("net: wwan: t7xx: Add control DMA interface") Signed-off-by: Bjørn Mork <bjorn@mork.no> Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Tested-by: Liviu Dudau <liviu@dudau.co.uk> Link: https://lore.kernel.org/r/20240322144000.1683822-1-bjorn@mork.noSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
We had various syzbot reports about tcp timers firing after the corresponding netns has been dismantled. Fortunately Josef Bacik could trigger the issue more often, and could test a patch I wrote two years ago. When TCP sockets are closed, we call inet_csk_clear_xmit_timers() to 'stop' the timers. inet_csk_clear_xmit_timers() can be called from any context, including when socket lock is held. This is the reason it uses sk_stop_timer(), aka del_timer(). This means that ongoing timers might finish much later. For user sockets, this is fine because each running timer holds a reference on the socket, and the user socket holds a reference on the netns. For kernel sockets, we risk that the netns is freed before timer can complete, because kernel sockets do not hold reference on the netns. This patch adds inet_csk_clear_xmit_timers_sync() function that using sk_stop_timer_sync() to make sure all timers are terminated before the kernel socket is released. Modules using kernel sockets close them in their netns exit() handler. Also add sock_not_owned_by_me() helper to get LOCKDEP support : inet_csk_clear_xmit_timers_sync() must not be called while socket lock is held. It is very possible we can revert in the future commit 3a58f13a ("net: rds: acquire refcount on TCP sockets") which attempted to solve the issue in rds only. (net/smc/af_smc.c and net/mptcp/subflow.c have similar code) We probably can remove the check_net() tests from tcp_out_of_resources() and __tcp_close() in the future. Reported-by: Josef Bacik <josef@toxicpanda.com> Closes: https://lore.kernel.org/netdev/20240314210740.GA2823176@perftesting/ Fixes: 26abe143 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.") Fixes: 8a681736 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket") Link: https://lore.kernel.org/bpf/CANn89i+484ffqb93aQm1N-tjxxvb3WDKX0EbD7318RwRgsatjw@mail.gmail.com/Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Josef Bacik <josef@toxicpanda.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/r/20240322135732.1535772-1-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ravi Gunasekaran authored
commit e748d0fd ("net: hsr: Disable promiscuous mode in offload mode") disables promiscuous mode of slave devices while creating an HSR interface. But while deleting the HSR interface, it does not take care of it. It decreases the promiscuous mode count, which eventually enables promiscuous mode on the slave devices when creating HSR interface again. Fix this by not decrementing the promiscuous mode count while deleting the HSR interface when offload is enabled. Fixes: e748d0fd ("net: hsr: Disable promiscuous mode in offload mode") Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240322100447.27615-1-r-gunasekaran@ti.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-