• Qiao Ma's avatar
    net: hinic: avoid kernel hung in hinic_get_stats64() · 98f9fcde
    Qiao Ma authored
    When using hinic device as a bond slave device, and reading device stats
    of master bond device, the kernel may hung.
    
    The kernel panic calltrace as follows:
    Kernel panic - not syncing: softlockup: hung tasks
    Call trace:
      native_queued_spin_lock_slowpath+0x1ec/0x31c
      dev_get_stats+0x60/0xcc
      dev_seq_printf_stats+0x40/0x120
      dev_seq_show+0x1c/0x40
      seq_read_iter+0x3c8/0x4dc
      seq_read+0xe0/0x130
      proc_reg_read+0xa8/0xe0
      vfs_read+0xb0/0x1d4
      ksys_read+0x70/0xfc
      __arm64_sys_read+0x20/0x30
      el0_svc_common+0x88/0x234
      do_el0_svc+0x2c/0x90
      el0_svc+0x1c/0x30
      el0_sync_handler+0xa8/0xb0
      el0_sync+0x148/0x180
    
    And the calltrace of task that actually caused kernel hungs as follows:
      __switch_to+124
      __schedule+548
      schedule+72
      schedule_timeout+348
      __down_common+188
      __down+24
      down+104
      hinic_get_stats64+44 [hinic]
      dev_get_stats+92
      bond_get_stats+172 [bonding]
      dev_get_stats+92
      dev_seq_printf_stats+60
      dev_seq_show+24
      seq_read_iter+964
      seq_read+220
      proc_reg_read+164
      vfs_read+172
      ksys_read+108
      __arm64_sys_read+28
      el0_svc_common+132
      do_el0_svc+40
      el0_svc+24
      el0_sync_handler+164
      el0_sync+324
    
    When getting device stats from bond, kernel will call bond_get_stats().
    It first holds the spinlock bond->stats_lock, and then call
    hinic_get_stats64() to collect hinic device's stats.
    However, hinic_get_stats64() calls `down(&nic_dev->mgmt_lock)` to
    protect its critical section, which may schedule current task out.
    And if system is under high pressure, the task cannot be woken up
    immediately, which eventually triggers kernel hung panic.
    
    Since previous patch has replaced hinic_dev.tx_stats/rx_stats with local
    variable in hinic_get_stats64(), there is nothing need to be protected
    by lock, so just removing down()/up() is ok.
    
    Fixes: edd384f6 ("net-next/hinic: Add ethtool and stats")
    Signed-off-by: default avatarQiao Ma <mqaio@linux.alibaba.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    98f9fcde
hinic_main.c 37.1 KB