- 11 Jan, 2023 2 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queueJakub Kicinski authored
Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-01-09 (ice) This series contains updates to ice driver only. Jiasheng Jiang frees allocated cmd_buf if write_buf allocation failed to prevent memory leak. Yuan Can adds check, and proper cleanup, of gnss_tty_port allocation call to avoid memory leaks. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: Add check for kzalloc ice: Fix potential memory leak in ice_gnss_tty_write() ==================== Link: https://lore.kernel.org/r/20230109225358.3478060-1-anthony.l.nguyen@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Frederick Lawler authored
While experimenting with applying noqueue to a classful queue discipline, we discovered a NULL pointer dereference in the __dev_queue_xmit() path that generates a kernel OOPS: # dev=enp0s5 # tc qdisc replace dev $dev root handle 1: htb default 1 # tc class add dev $dev parent 1: classid 1:1 htb rate 10mbit # tc qdisc add dev $dev parent 1:1 handle 10: noqueue # ping -I $dev -w 1 -c 1 1.1.1.1 [ 2.172856] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 2.173217] #PF: supervisor instruction fetch in kernel mode ... [ 2.178451] Call Trace: [ 2.178577] <TASK> [ 2.178686] htb_enqueue+0x1c8/0x370 [ 2.178880] dev_qdisc_enqueue+0x15/0x90 [ 2.179093] __dev_queue_xmit+0x798/0xd00 [ 2.179305] ? _raw_write_lock_bh+0xe/0x30 [ 2.179522] ? __local_bh_enable_ip+0x32/0x70 [ 2.179759] ? ___neigh_create+0x610/0x840 [ 2.179968] ? eth_header+0x21/0xc0 [ 2.180144] ip_finish_output2+0x15e/0x4f0 [ 2.180348] ? dst_output+0x30/0x30 [ 2.180525] ip_push_pending_frames+0x9d/0xb0 [ 2.180739] raw_sendmsg+0x601/0xcb0 [ 2.180916] ? _raw_spin_trylock+0xe/0x50 [ 2.181112] ? _raw_spin_unlock_irqrestore+0x16/0x30 [ 2.181354] ? get_page_from_freelist+0xcd6/0xdf0 [ 2.181594] ? sock_sendmsg+0x56/0x60 [ 2.181781] sock_sendmsg+0x56/0x60 [ 2.181958] __sys_sendto+0xf7/0x160 [ 2.182139] ? handle_mm_fault+0x6e/0x1d0 [ 2.182366] ? do_user_addr_fault+0x1e1/0x660 [ 2.182627] __x64_sys_sendto+0x1b/0x30 [ 2.182881] do_syscall_64+0x38/0x90 [ 2.183085] entry_SYSCALL_64_after_hwframe+0x63/0xcd ... [ 2.187402] </TASK> Previously in commit d66d6c31 ("net: sched: register noqueue qdisc"), NULL was set for the noqueue discipline on noqueue init so that __dev_queue_xmit() falls through for the noqueue case. This also sets a bypass of the enqueue NULL check in the register_qdisc() function for the struct noqueue_disc_ops. Classful queue disciplines make it past the NULL check in __dev_queue_xmit() because the discipline is set to htb (in this case), and then in the call to __dev_xmit_skb(), it calls into htb_enqueue() which grabs a leaf node for a class and then calls qdisc_enqueue() by passing in a queue discipline which assumes ->enqueue() is not set to NULL. Fix this by not allowing classes to be assigned to the noqueue discipline. Linux TC Notes states that classes cannot be set to the noqueue discipline. [1] Let's enforce that here. Links: 1. https://linux-tc-notes.sourceforge.net/tc/doc/sch_noqueue.txt Fixes: d66d6c31 ("net: sched: register noqueue qdisc") Cc: stable@vger.kernel.org Signed-off-by: Frederick Lawler <fred@cloudflare.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/r/20230109163906.706000-1-fred@cloudflare.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 10 Jan, 2023 7 commits
-
-
Hariprasad Kelam authored
resources allocated like mcam entries to support the Ntuple feature and hash tables for the tc feature are not getting freed in driver unbind. This patch fixes the issue. Fixes: 2da48943 ("octeontx2-pf: devlink params support to set mcam entry count") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Link: https://lore.kernel.org/r/20230109061325.21395-1-hkelam@marvell.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Paolo Abeni authored
Guillaume Nault says: ==================== selftests/net: Isolate l2_tos_ttl_inherit.sh in its own netns. l2_tos_ttl_inherit.sh uses a veth pair to run its tests, but only one of the veth interfaces runs in a dedicated netns. The other one remains in the initial namespace where the existing network configuration can interfere with the setup used for the tests. Isolate both veth devices in their own netns and ensure everything gets cleaned up when the script exits. Link: https://lore.kernel.org/netdev/924f1062-ab59-9b88-3b43-c44e73a30387@alu.unizg.hr/ ==================== Link: https://lore.kernel.org/r/cover.1673191942.git.gnault@redhat.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Guillaume Nault authored
Use 'set -e' and an exit handler to stop the script if a command fails and ensure the test environment is cleaned up in any case. Also, handle the case where the script is interrupted by SIGINT. The only command that's expected to fail is 'wait $ping_pid', since it's killed by the script. Handle this case with '|| true' to make it play well with 'set -e'. Finally, return the Kselftest SKIP code (4) when the script breaks because of an environment problem or a command line failure. The 0 and 1 return codes should now reliably indicate that all tests have been run (0: all tests run and passed, 1: all tests run but at least one failed, 4: test script didn't run completely). Fixes: b690842d ("selftests/net: test l2 tunnel TOS/TTL inheriting") Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Guillaume Nault authored
This selftest currently runs half in the current namespace and half in a netns of its own. Therefore, the test can fail if the current namespace is already configured with incompatible parameters (for example if it already has a veth0 interface). Adapt the script to put both ends of the veth pair in their own netns. Now veth0 is created in NS0 instead of the current namespace, while veth1 is set up in NS1 (instead of the 'testing' netns). The user visible netns names are randomised to minimise the risk of conflicts with already existing namespaces. The cleanup() function doesn't need to remove the virtual interface anymore: deleting NS0 and NS1 automatically removes the virtual interfaces they contained. We can remove $ns, which was only used to run ip commands in the 'testing' netns (let's use the builtin "-netns" option instead). However, we still need a similar functionality as ping and tcpdump now need to run in NS0. So we now have $RUN_NS0 for that. Fixes: b690842d ("selftests/net: test l2 tunnel TOS/TTL inheriting") Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Guillaume Nault authored
The ping command can run before DAD completes. In that case, ping may fail and break the selftest. We don't need DAD here since we're working on isolated device pairs. Fixes: b690842d ("selftests/net: test l2 tunnel TOS/TTL inheriting") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Heiner Kallweit authored
This reverts commit 42666b2c. This chip version seems to be very rare, but it exits in consumer devices, see linked report. Link: https://stackoverflow.com/questions/75049473/cant-setup-a-wired-network-in-archlinux-fresh-install Fixes: 42666b2c ("r8169: disable detection of chip version 36") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/42e9674c-d5d0-a65a-f578-e5c74f244739@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
The 'TCA_MPLS_LABEL' attribute is of 'NLA_U32' type, but has a validation type of 'NLA_VALIDATE_FUNCTION'. This is an invalid combination according to the comment above 'struct nla_policy': " Meaning of `validate' field, use via NLA_POLICY_VALIDATE_FN: NLA_BINARY Validation function called for the attribute. All other Unused - but note that it's a union " This can trigger the warning [1] in nla_get_range_unsigned() when validation of the attribute fails. Despite being of 'NLA_U32' type, the associated 'min'/'max' fields in the policy are negative as they are aliased by the 'validate' field. Fix by changing the attribute type to 'NLA_BINARY' which is consistent with the above comment and all other users of NLA_POLICY_VALIDATE_FN(). As a result, move the length validation to the validation function. No regressions in MPLS tests: # ./tdc.py -f tc-tests/actions/mpls.json [...] # echo $? 0 [1] WARNING: CPU: 0 PID: 17743 at lib/nlattr.c:118 nla_get_range_unsigned+0x1d8/0x1e0 lib/nlattr.c:117 Modules linked in: CPU: 0 PID: 17743 Comm: syz-executor.0 Not tainted 6.1.0-rc8 #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014 RIP: 0010:nla_get_range_unsigned+0x1d8/0x1e0 lib/nlattr.c:117 [...] Call Trace: <TASK> __netlink_policy_dump_write_attr+0x23d/0x990 net/netlink/policy.c:310 netlink_policy_dump_write_attr+0x22/0x30 net/netlink/policy.c:411 netlink_ack_tlv_fill net/netlink/af_netlink.c:2454 [inline] netlink_ack+0x546/0x760 net/netlink/af_netlink.c:2506 netlink_rcv_skb+0x1b7/0x240 net/netlink/af_netlink.c:2546 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:6109 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x5e9/0x6b0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x739/0x860 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg net/socket.c:734 [inline] ____sys_sendmsg+0x38f/0x500 net/socket.c:2482 ___sys_sendmsg net/socket.c:2536 [inline] __sys_sendmsg+0x197/0x230 net/socket.c:2565 __do_sys_sendmsg net/socket.c:2574 [inline] __se_sys_sendmsg net/socket.c:2572 [inline] __x64_sys_sendmsg+0x42/0x50 net/socket.c:2572 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Link: https://lore.kernel.org/netdev/CAO4mrfdmjvRUNbDyP0R03_DrD_eFCLCguz6OxZ2TYRSv0K9gxA@mail.gmail.com/ Fixes: 2a2ea508 ("net: sched: add mpls manipulation actions to TC") Reported-by: Wei Chen <harperchen1110@gmail.com> Tested-by: Wei Chen <harperchen1110@gmail.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/20230107171004.608436-1-idosch@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 09 Jan, 2023 9 commits
-
-
Jiasheng Jiang authored
Add the check for the return value of kzalloc in order to avoid NULL pointer dereference. Moreover, use the goto-label to share the clean code. Fixes: d6b98c8d ("ice: add write functionality for GNSS TTY") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Yuan Can authored
The ice_gnss_tty_write() return directly if the write_buf alloc failed, leaking the cmd_buf. Fix by free cmd_buf if write_buf alloc failed. Fixes: d6b98c8d ("ice: add write functionality for GNSS TTY") Signed-off-by: Yuan Can <yuancan@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Mirsad Goran Todorovac authored
Adjust size parameter in connect() to match the type of the parameter, to fix "No such file or directory" error in selftests/net/af_unix/ test_oob_unix.c:127. The existing code happens to work provided that the autogenerated pathname is shorter than sizeof (struct sockaddr), which is why it hasn't been noticed earlier. Visible from the trace excerpt: bind(3, {sa_family=AF_UNIX, sun_path="unix_oob_453059"}, 110) = 0 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fa6a6577a10) = 453060 [pid <child>] connect(6, {sa_family=AF_UNIX, sun_path="unix_oob_45305"}, 16) = -1 ENOENT (No such file or directory) BUG: The filename is trimmed to sizeof (struct sockaddr). Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Cc: Florian Westphal <fw@strlen.de> Reviewed-by: Florian Westphal <fw@strlen.de> Fixes: 314001f0 ("af_unix: Add OOB support") Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Horatiu Vultur authored
The blamed commit implemented the vcap_operations to allow to add an entry in the TCAM. One of the callbacks is to validate the supported keysets. If the TCAM lookup was not enabled, then this will return failure so no entries could be added. This doesn't make much sense, as you can enable at a later point the TCAM. Therefore change it such to allow entries in TCAM even it is not enabled. Fixes: 4426b78c ("net: lan966x: Add port keyset config and callback interface") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Jaroslav reported a recent throughput regression with virtio_net caused by blamed commit. It is unclear if DODGY GSO packets coming from user space can be accepted by GRO engine in the future with minimal changes, and if there is any expected gain from it. In the meantime, make sure to detect and flush DODGY packets. Fixes: 5eddb249 ("gro: add support of (hw)gro packets to gro stack") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-and-bisected-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com> Cc: Coco Li <lixiaoyan@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Minsuk Kang authored
Fix a use-after-free that occurs in hcd when in_urb sent from pn533_usb_send_frame() is completed earlier than out_urb. Its callback frees the skb data in pn533_send_async_complete() that is used as a transfer buffer of out_urb. Wait before sending in_urb until the callback of out_urb is called. To modify the callback of out_urb alone, separate the complete function of out_urb and ack_urb. Found by a modified version of syzkaller. BUG: KASAN: use-after-free in dummy_timer Call Trace: memcpy (mm/kasan/shadow.c:65) dummy_perform_transfer (drivers/usb/gadget/udc/dummy_hcd.c:1352) transfer (drivers/usb/gadget/udc/dummy_hcd.c:1453) dummy_timer (drivers/usb/gadget/udc/dummy_hcd.c:1972) arch_static_branch (arch/x86/include/asm/jump_label.h:27) static_key_false (include/linux/jump_label.h:207) timer_expire_exit (include/trace/events/timer.h:127) call_timer_fn (kernel/time/timer.c:1475) expire_timers (kernel/time/timer.c:1519) __run_timers (kernel/time/timer.c:1790) run_timer_softirq (kernel/time/timer.c:1803) Fixes: c46ee386 ("NFC: pn533: add NXP pn533 nfc device driver") Signed-off-by: Minsuk Kang <linuxlovemin@yonsei.ac.kr> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kees Cook authored
Zero-length arrays are deprecated[1]. Replace struct mlxsw_sp_nexthop_group_info's "nexthops" 0-length array with a flexible array. Detected with GCC 13, using -fstrict-flex-arrays=3: drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c: In function 'mlxsw_sp_nexthop_group_hash_obj': drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3278:38: warning: array subscript i is outside array bounds of 'struct mlxsw_sp_nexthop[0]' [-Warray-bounds=] 3278 | val ^= jhash(&nh->ifindex, sizeof(nh->ifindex), seed); | ^~~~~~~~~~~~ drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:2954:33: note: while referencing 'nexthops' 2954 | struct mlxsw_sp_nexthop nexthops[0]; | ^~~~~~~~ [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays Cc: Ido Schimmel <idosch@nvidia.com> Cc: Petr Machata <petrm@nvidia.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Tested-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
Commit b310de78 ("net: ipa: add IPA v4.7 support") was merged despite an unresolved comment made by Konrad Dybcio. Konrad observed that the IMEM region specified for IPA v4.7 did not match that used downstream for the SM7225 SoC. In "lagoon.dtsi" present in a Sony Xperia source tree, a ipa_smmu_ap node was defined with a "qcom,additional-mapping" property that defined the IPA IMEM area starting at offset 0x146a8000 (not 0x146a9000 that was committed). The IPA v4.7 target system used for testing uses the SM7225 SoC, so we'll adhere what the downstream code specifies is the address of the IMEM region used for IPA. Link: https://lore.kernel.org/linux-arm-msm/20221208211529.757669-1-elder@linaro.org Fixes: b310de78 ("net: ipa: add IPA v4.7 support") Tested-by: Luca Weiss <luca.weiss@fairphone.com> Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ivan T. Ivanov authored
The introduction of support for Apple board types inadvertently changed the precedence order, causing hybrid SMBIOS+DT platforms to look up the firmware using the DMI information instead of the device tree compatible to generate the board type. Revert back to the old behavior, as affected platforms use firmwares named after the DT compatible. Fixes: 7682de8b ("wifi: brcmfmac: of: Fetch Apple properties") [1] https://bugzilla.opensuse.org/show_bug.cgi?id=1206697#c13 Cc: stable@vger.kernel.org Signed-off-by: Ivan T. Ivanov <iivanov@suse.de> Reviewed-by: Hector Martin <marcan@marcan.st> Reviewed-by: Arend van Spriel <arend.vanspriel@broadcom.com> Tested-by: Peter Robinson <pbrobinson@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 07 Jan, 2023 3 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fsDavid S. Miller authored
David Howells says: ==================== rxrpc: Fix race between call connection, data transmit and call disconnect Here are patches to fix an oops[1] caused by a race between call connection, initial packet transmission and call disconnection which results in something like: kernel BUG at net/rxrpc/peer_object.c:413! when the syzbot test is run. The problem is that the connection procedure is effectively split across two threads and can get expanded by taking an interrupt, thereby adding the call to the peer error distribution list *after* it has been disconnected (say by the rxrpc socket shutting down). The easiest solution is to look at the fourth set of I/O thread conversion/SACK table expansion patches that didn't get applied[2] and take from it those patches that move call connection and disconnection into the I/O thread. Moving these things into the I/O thread means that the sequencing is managed by all being done in the same thread - and the race can no longer happen. This is preferable to introducing an extra lock as adding an extra lock would make the I/O thread have to wait for the app thread in yet another place. The changes can be considered as a number of logical parts: (1) Move all of the call state changes into the I/O thread. (2) Make client connection ID space per-local endpoint so that the I/O thread doesn't need locks to access it. (3) Move actual abort generation into the I/O thread and clean it up. If sendmsg or recvmsg want to cause an abort, they have to delegate it. (4) Offload the setting up of the security context on a connection to the thread of one of the apps that's starting a call. We don't want to be doing any sort of crypto in the I/O thread. (5) Connect calls (ie. assign them to channel slots on connections) in the I/O thread. Calls are set up by sendmsg/kafs and passed to the I/O thread to connect. Connections are allocated in the I/O thread after this. (6) Disconnect calls in the I/O thread. I've also added a patch for an unrelated bug that cropped up during testing, whereby a race can occur between an incoming call and socket shutdown. Note that whilst this fixes the original syzbot bug, another bug may get triggered if this one is fixed: INFO: rcu detected stall in corrupted rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P5792 } 2657 jiffies s: 2825 root: 0x0/T rcu: blocking rcu_node structures (internal RCU debug): It doesn't look this should be anything to do with rxrpc, though, as I've tested an additional patch[3] that removes practically all the RCU usage from rxrpc and it still occurs. It seems likely that it is being caused by something in the tunnelling setup that the syzbot test does, but there's not enough info to go on. It also seems unlikely to be anything to do with the afs driver as the test doesn't use that. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Howells authored
An incoming call can race with rxrpc socket destruction, leading to a leaked call. This may result in an oops when the call timer eventually expires: BUG: kernel NULL pointer dereference, address: 0000000000000874 RIP: 0010:_raw_spin_lock_irqsave+0x2a/0x50 Call Trace: <IRQ> try_to_wake_up+0x59/0x550 ? __local_bh_enable_ip+0x37/0x80 ? rxrpc_poke_call+0x52/0x110 [rxrpc] ? rxrpc_poke_call+0x110/0x110 [rxrpc] ? rxrpc_poke_call+0x110/0x110 [rxrpc] call_timer_fn+0x24/0x120 with a warning in the kernel log looking something like: rxrpc: Call 00000000ba5e571a still in use (1,SvAwtACK,1061d,0)! incurred during rmmod of rxrpc. The 1061d is the call flags: RECVMSG_READ_ALL, RX_HEARD, BEGAN_RX_TIMER, RX_LAST, EXPOSED, IS_SERVICE, RELEASED but no DISCONNECTED flag (0x800), so it's an incoming (service) call and it's still connected. The race appears to be that: (1) rxrpc_new_incoming_call() consults the service struct, checks sk_state and allocates a call - then pauses, possibly for an interrupt. (2) rxrpc_release_sock() sets RXRPC_CLOSE, nulls the service pointer, discards the prealloc and releases all calls attached to the socket. (3) rxrpc_new_incoming_call() resumes, launching the new call, including its timer and attaching it to the socket. Fix this by read-locking local->services_lock to access the AF_RXRPC socket providing the service rather than RCU in rxrpc_new_incoming_call(). There's no real need to use RCU here as local->services_lock is only write-locked by the socket side in two places: when binding and when shutting down. Fixes: 5e6ef4f1 ("rxrpc: Make the I/O thread take over the call and local processor work") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org
-
Angela Czubak authored
PF netdev can request AF to enable or disable reception and transmission on assigned CGX::LMAC. The current code instead of disabling or enabling 'reception and transmission' also disables/enable the LMAC. This patch fixes this issue. Fixes: 1435f66a ("octeontx2-af: CGX Rx/Tx enable/disable mbox handlers") Signed-off-by: Angela Czubak <aczubak@marvell.com> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230105160107.17638-1-hkelam@marvell.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 06 Jan, 2023 19 commits
-
-
Tung Nguyen authored
This unexpected behavior is observed: node 1 | node 2 ------ | ------ link is established | link is established reboot | link is reset up | send discovery message receive discovery message | link is established | link is established send discovery message | | receive discovery message | link is reset (unexpected) | send reset message link is reset | It is due to delayed re-discovery as described in function tipc_node_check_dest(): "this link endpoint has already reset and re-established contact with the peer, before receiving a discovery message from that node." However, commit 598411d7 has changed the condition for calling tipc_node_link_down() which was the acceptance of new media address. This commit fixes this by restoring the old and correct behavior. Fixes: 598411d7 ("tipc: make resetting of links non-atomic") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Howells authored
Move the connection setup of client calls to the I/O thread so that a whole load of locking and barrierage can be eliminated. This necessitates the app thread waiting for connection to complete before it can begin encrypting data. This also completes the fix for a race that exists between call connection and call disconnection whereby the data transmission code adds the call to the peer error distribution list after the call has been disconnected (say by the rxrpc socket getting closed). The fix is to complete the process of moving call connection, data transmission and call disconnection into the I/O thread and thus forcibly serialising them. Note that the issue may predate the overhaul to an I/O thread model that were included in the merge window for v6.2, but the timing is very much changed by the change given below. Fixes: cf37b598 ("rxrpc: Move DATA transmission into call processor work item") Reported-by: syzbot+c22650d2844392afdcfd@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
-
David Howells authored
Move the management of the client connection cache to the I/O thread rather than managing it from the namespace as an aggregate across all the local endpoints within the namespace. This will allow a load of locking to be got rid of in a future patch as only the I/O thread will be looking at the this. The downside is that the total number of cached connections on the system can get higher because the limit is now per-local rather than per-netns. We can, however, keep the number of client conns in use across the entire netfs and use that to reduce the expiration time of idle connection. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
-
David Howells authored
All the setters of call->state are now in the I/O thread and thus the state lock is now unnecessary. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
-
David Howells authored
Move the call state changes that are made in rxrpc_recvmsg() to the I/O thread. This means that, thenceforth, only the I/O thread does this and the call state lock can be removed. This requires the Rx phase to be ended when the last packet is received, not when it is processed. Since this now changes the rxrpc call state to SUCCEEDED before we've consumed all the data from it, rxrpc_kernel_check_life() mustn't say the call is dead until the recvmsg queue is empty (unless the call has failed). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
-
David Howells authored
Move all the call state changes that are made in rxrpc_sendmsg() to the I/O thread. This is a step towards removing the call state lock. This requires the switch to the RXRPC_CALL_CLIENT_AWAIT_REPLY and RXRPC_CALL_SERVER_SEND_REPLY states to be done when the last packet is decanted from ->tx_sendmsg to ->tx_buffer in the I/O thread, not when it is added to ->tx_sendmsg by sendmsg(). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
-
David Howells authored
Wrap accesses to get the state of a call from outside of the I/O thread in a single place so that the barrier needed to order wrt the error code and abort code is in just that place. Also use a barrier when setting the call state and again when reading the call state such that the auxiliary completion info (error code, abort code) can be read without taking a read lock on the call state lock. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
-
David Howells authored
Split out the functions that change the state of an rxrpc call into their own file. The idea being to remove anything to do with changing the state of a call directly from the rxrpc sendmsg() and recvmsg() paths and have all that done in the I/O thread only, with the ultimate aim of removing the state lock entirely. Moving the code out of sendmsg.c and recvmsg.c makes that easier to manage. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
-
David Howells authored
Use the information now stored in struct rxrpc_call to configure the connection bundle and thence the connection, rather than using the rxrpc_conn_parameters struct. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
-
David Howells authored
Offload the completion of the challenge/response cycle on a service connection to the I/O thread. After the RESPONSE packet has been successfully decrypted and verified by the work queue, offloading the changing of the call states to the I/O thread makes iteration over the conn's channel list simpler. Do this by marking the RESPONSE skbuff and putting it onto the receive queue for the I/O thread to collect. We put it on the front of the queue as we've already received the packet for it. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
-
David Howells authored
Make the set of connection IDs per local endpoint so that endpoints don't cause each other's connections to get dismissed. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
-
David Howells authored
Tidy up the abort generation infrastructure in the following ways: (1) Create an enum and string mapping table to list the reasons an abort might be generated in tracing. (2) Replace the 3-char string with the values from (1) in the places that use that to log the abort source. This gets rid of a memcpy() in the tracepoint. (3) Subsume the rxrpc_rx_eproto tracepoint with the rxrpc_abort tracepoint and use values from (1) to indicate the trace reason. (4) Always make a call to an abort function at the point of the abort rather than stashing the values into variables and using goto to get to a place where it reported. The C optimiser will collapse the calls together as appropriate. The abort functions return a value that can be returned directly if appropriate. Note that this extends into afs also at the points where that generates an abort. To aid with this, the afs sources need to #define RXRPC_TRACE_ONLY_DEFINE_ENUMS before including the rxrpc tracing header because they don't have access to the rxrpc internal structures that some of the tracepoints make use of. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
-
David Howells authored
Clean up connection abort, using the connection state_lock to gate access to change that state, and use an rxrpc_call_completion value to indicate the difference between local and remote aborts as these can be pasted directly into the call state. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
-
David Howells authored
Provide a means by which an event notification can be sent to a connection through such that the I/O thread can pick it up and handle it rather than doing it in a separate workqueue. This is then used to move the deferred final ACK of a call into the I/O thread rather than a separate work queue as part of the drive to do all transmission from the I/O thread. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
-
David Howells authored
Only perform call disconnection in the I/O thread to reduce the locking requirement. This is the first part of a fix for a race that exists between call connection and call disconnection whereby the data transmission code adds the call to the peer error distribution list after the call has been disconnected (say by the rxrpc socket getting closed). The fix is to complete the process of moving call connection, data transmission and call disconnection into the I/O thread and thus forcibly serialising them. Note that the issue may predate the overhaul to an I/O thread model that were included in the merge window for v6.2, but the timing is very much changed by the change given below. Fixes: cf37b598 ("rxrpc: Move DATA transmission into call processor work item") Reported-by: syzbot+c22650d2844392afdcfd@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
-
David Howells authored
Only set the abort call completion state in the I/O thread and only transmit ABORT packets from there. rxrpc_abort_call() can then be made to actually send the packet. Further, ABORT packets should only be sent if the call has been exposed to the network (ie. at least one attempted DATA transmission has occurred for it). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
-
David Howells authored
Call the rxrpc_conn_retransmit_call() directly from rxrpc_input_packet() rather than calling it via connection event handling. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
-
David Howells authored
Make the local endpoint and it's I/O thread hold a reference on a connected call until that call is disconnected. Without this, we're reliant on either the AF_RXRPC socket to hold a ref (which is dropped when the call is released) or a queued work item to hold a ref (the work item is being replaced with the I/O thread). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
-
David Howells authored
Stash the network namespace pointer in the rxrpc_local struct in addition to a pointer to the rxrpc-specific net namespace info. Use this to remove some places where the socket is passed as a parameter. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
-