• Alexandra Winter's avatar
    s390/qeth: Fix kernel panic after setting hsuid · 8a2e4d37
    Alexandra Winter authored
    Symptom:
    When the hsuid attribute is set for the first time on an IQD Layer3
    device while the corresponding network interface is already UP,
    the kernel will try to execute a napi function pointer that is NULL.
    
    Example:
    ---------------------------------------------------------------------------
    [ 2057.572696] illegal operation: 0001 ilc:1 [#1] SMP
    [ 2057.572702] Modules linked in: af_iucv qeth_l3 zfcp scsi_transport_fc sunrpc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
    nft_reject nft_ct nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c nfnetlink ghash_s390 prng xts aes_s390 des_s390 de
    s_generic sha3_512_s390 sha3_256_s390 sha512_s390 vfio_ccw vfio_mdev mdev vfio_iommu_type1 eadm_sch vfio ext4 mbcache jbd2 qeth_l2 bridge stp llc dasd_eckd_mod qeth dasd_mod
     qdio ccwgroup pkey zcrypt
    [ 2057.572739] CPU: 6 PID: 60182 Comm: stress_client Kdump: loaded Not tainted 4.18.0-541.el8.s390x #1
    [ 2057.572742] Hardware name: IBM 3931 A01 704 (LPAR)
    [ 2057.572744] Krnl PSW : 0704f00180000000 0000000000000002 (0x2)
    [ 2057.572748]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3
    [ 2057.572751] Krnl GPRS: 0000000000000004 0000000000000000 00000000a3b008d8 0000000000000000
    [ 2057.572754]            00000000a3b008d8 cb923a29c779abc5 0000000000000000 00000000814cfd80
    [ 2057.572756]            000000000000012c 0000000000000000 00000000a3b008d8 00000000a3b008d8
    [ 2057.572758]            00000000bab6d500 00000000814cfd80 0000000091317e46 00000000814cfc68
    [ 2057.572762] Krnl Code:#0000000000000000: 0000                illegal
                             >0000000000000002: 0000                illegal
                              0000000000000004: 0000                illegal
                              0000000000000006: 0000                illegal
                              0000000000000008: 0000                illegal
                              000000000000000a: 0000                illegal
                              000000000000000c: 0000                illegal
                              000000000000000e: 0000                illegal
    [ 2057.572800] Call Trace:
    [ 2057.572801] ([<00000000ec639700>] 0xec639700)
    [ 2057.572803]  [<00000000913183e2>] net_rx_action+0x2ba/0x398
    [ 2057.572809]  [<0000000091515f76>] __do_softirq+0x11e/0x3a0
    [ 2057.572813]  [<0000000090ce160c>] do_softirq_own_stack+0x3c/0x58
    [ 2057.572817] ([<0000000090d2cbd6>] do_softirq.part.1+0x56/0x60)
    [ 2057.572822]  [<0000000090d2cc60>] __local_bh_enable_ip+0x80/0x98
    [ 2057.572825]  [<0000000091314706>] __dev_queue_xmit+0x2be/0xd70
    [ 2057.572827]  [<000003ff803dd6d6>] afiucv_hs_send+0x24e/0x300 [af_iucv]
    [ 2057.572830]  [<000003ff803dd88a>] iucv_send_ctrl+0x102/0x138 [af_iucv]
    [ 2057.572833]  [<000003ff803de72a>] iucv_sock_connect+0x37a/0x468 [af_iucv]
    [ 2057.572835]  [<00000000912e7e90>] __sys_connect+0xa0/0xd8
    [ 2057.572839]  [<00000000912e9580>] sys_socketcall+0x228/0x348
    [ 2057.572841]  [<0000000091514e1a>] system_call+0x2a6/0x2c8
    [ 2057.572843] Last Breaking-Event-Address:
    [ 2057.572844]  [<0000000091317e44>] __napi_poll+0x4c/0x1d8
    [ 2057.572846]
    [ 2057.572847] Kernel panic - not syncing: Fatal exception in interrupt
    -------------------------------------------------------------------------------------------
    
    Analysis:
    There is one napi structure per out_q: card->qdio.out_qs[i].napi
    The napi.poll functions are set during qeth_open().
    
    Since
    commit 1cfef80d ("s390/qeth: Don't call dev_close/dev_open (DOWN/UP)")
    qeth_set_offline()/qeth_set_online() no longer call dev_close()/
    dev_open(). So if qeth_free_qdio_queues() cleared
    card->qdio.out_qs[i].napi.poll while the network interface was UP and the
    card was offline, they are not set again.
    
    Reproduction:
    chzdev -e $devno layer2=0
    ip link set dev $network_interface up
    echo 0 > /sys/bus/ccwgroup/devices/0.0.$devno/online
    echo foo > /sys/bus/ccwgroup/devices/0.0.$devno/hsuid
    echo 1 > /sys/bus/ccwgroup/devices/0.0.$devno/online
    -> Crash (can be enforced e.g. by af_iucv connect(), ip link down/up, ...)
    
    Note that a Completion Queue (CQ) is only enabled or disabled, when hsuid
    is set for the first time or when it is removed.
    
    Workarounds:
    - Set hsuid before setting the device online for the first time
    or
    - Use chzdev -d $devno; chzdev $devno hsuid=xxx; chzdev -e $devno;
    to set hsuid on an existing device. (this will remove and recreate the
    network interface)
    
    Fix:
    There is no need to free the output queues when a completion queue is
    added or removed.
    card->qdio.state now indicates whether the inbound buffer pool and the
    outbound queues are allocated.
    card->qdio.c_q indicates whether a CQ is allocated.
    
    Fixes: 1cfef80d ("s390/qeth: Don't call dev_close/dev_open (DOWN/UP)")
    Signed-off-by: default avatarAlexandra Winter <wintera@linux.ibm.com>
    Reviewed-by: default avatarSimon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/20240430091004.2265683-1-wintera@linux.ibm.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
    8a2e4d37
qeth_core_main.c 187 KB