    IB/hfi1: Invalid NUMA node information can cause a divide by zero · 52ec8484
    Michael J. Ruhl authored
    [ Upstream commit c513de49 ]
    
    If the system BIOS does not supply NUMA node information to the
    PCI devices, the NUMA node is selected by choosing the current
    node.
    
    This can lead to the following crash:
    
    divide error: 0000 SMP
    CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G          IOE
    ------------   3.10.0-693.21.1.el7.x86_64 #1
    Hardware name: Intel Corporation S2600KP/S2600KP, BIOS
    SE5C610.86B.01.01.0005.101720141054 10/17/2014
    Workqueue: events work_for_cpu_fn
    task: ffff880174480fd0 ti: ffff880174488000 task.ti: ffff880174488000
    RIP: 0010: [<ffffffffc020ac69>] hfi1_dev_affinity_init+0x129/0x6a0 [hfi1]
    RSP: 0018:ffff88017448bbf8  EFLAGS: 00010246
    RAX: 0000000000000011 RBX: ffff88107ffba6c0 RCX: ffff88085c22e130
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880824ad0000
    RBP: ffff88017448bc48 R08: 0000000000000011 R09: 0000000000000002
    R10: ffff8808582b6ca0 R11: 0000000000003151 R12: ffff8808582b6ca0
    R13: ffff8808582b6518 R14: ffff8808582b6010 R15: 0000000000000012
    FS:  0000000000000000(0000) GS:ffff88085ec00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007efc707404f0 CR3: 0000000001a02000 CR4: 00000000001607f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Call Trace:
     hfi1_init_dd+0x14b3/0x27a0 [hfi1]
     ? pcie_capability_write_word+0x46/0x70
     ? hfi1_pcie_init+0xc0/0x200 [hfi1]
     do_init_one+0x153/0x4c0 [hfi1]
     ? sched_clock_cpu+0x85/0xc0
     init_one+0x1b5/0x260 [hfi1]
     local_pci_probe+0x4a/0xb0
     work_for_cpu_fn+0x1a/0x30
     process_one_work+0x17f/0x440
     worker_thread+0x278/0x3c0
     ? manage_workers.isra.24+0x2a0/0x2a0
     kthread+0xd1/0xe0
     ? insert_kthread_work+0x40/0x40
     ret_from_fork+0x77/0xb0
     ? insert_kthread_work+0x40/0x40
    
    If the BIOS is not supplying NUMA information:
      - set the default table count to 1 for all possible nodes
      - select node 0 (instead of the current NUMA node) to get
        consistent performance
      - generate an error indicating that the BIOS should be upgraded
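
The three fallback steps above can be sketched in user-space C. This is a minimal illustration, not the driver code: `NUM_POSSIBLE_NODES`, `NO_NODE`, `per_node_cntr`, `count_devices_per_node()`, `select_node()`, `cpus_per_device()` and the message text are all hypothetical names chosen for the example. It assumes the divide error comes from dividing by a per-node device count that was left at zero.

```c
#include <assert.h>
#include <stdio.h>

#define NUM_POSSIBLE_NODES 2   /* hypothetical: number of possible NUMA nodes */
#define NO_NODE (-1)           /* hypothetical: BIOS supplied no node for the device */

/* Devices found per NUMA node; later used as a divisor. */
static int per_node_cntr[NUM_POSSIBLE_NODES];

/*
 * Count devices per node. If any device lacks valid node information,
 * warn and fall back to a count of 1 for every possible node, so that
 * no divisor is ever zero. Returns 0 on success, -1 if the fallback ran.
 */
static int count_devices_per_node(const int *dev_nodes, int ndevs)
{
    int i, node;

    for (node = 0; node < NUM_POSSIBLE_NODES; node++)
        per_node_cntr[node] = 0;

    for (i = 0; i < ndevs; i++) {
        node = dev_nodes[i];
        if (node < 0 || node >= NUM_POSSIBLE_NODES) {
            fprintf(stderr, "invalid PCI NUMA node; the system BIOS may need to be upgraded\n");
            for (node = 0; node < NUM_POSSIBLE_NODES; node++)
                per_node_cntr[node] = 1;
            return -1;
        }
        per_node_cntr[node]++;
    }
    return 0;
}

/* Use node 0 rather than "whatever node we are running on" when unknown. */
static int select_node(int reported_node)
{
    return reported_node < 0 ? 0 : reported_node;
}

/* Split a node's CPUs among its devices; the divisor must be non-zero. */
static int cpus_per_device(int node, int cpus_on_node)
{
    assert(per_node_cntr[node] > 0);
    return cpus_on_node / per_node_cntr[node];
}
```

With the fallback in place, a device that reports `NO_NODE` is treated as sitting alone on node 0, so the division in `cpus_per_device()` always has a non-zero divisor.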

    Reviewed-by: Gary Leshner <gary.s.leshner@intel.com>
    Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
    Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
    Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
affinity.c 22.5 KB