    x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU · 295cc7eb
    Masayoshi Mizuma authored
    When a physical CPU is hot-removed, the following warning is printed
    while uncore_pci_remove() removes the uncore device:
    
      WARNING: CPU: 120 PID: 5 at arch/x86/events/intel/uncore.c:988
      uncore_pci_remove+0xf1/0x110
      ...
      CPU: 120 PID: 5 Comm: kworker/u1024:0 Not tainted 4.15.0-rc8 #1
      Workqueue: kacpi_hotplug acpi_hotplug_work_fn
      ...
      Call Trace:
      pci_device_remove+0x36/0xb0
      device_release_driver_internal+0x145/0x210
      pci_stop_bus_device+0x76/0xa0
      pci_stop_root_bus+0x44/0x60
      acpi_pci_root_remove+0x1f/0x80
      acpi_bus_trim+0x54/0x90
      acpi_bus_trim+0x2e/0x90
      acpi_device_hotplug+0x2bc/0x4b0
      acpi_hotplug_work_fn+0x1a/0x30
      process_one_work+0x141/0x340
      worker_thread+0x47/0x3e0
      kthread+0xf5/0x130
    
    When uncore_pci_remove() runs, it calls topology_phys_to_logical_pkg()
    to get the package ID it needs to clear the matching entry in
    uncore_extra_pci_dev[].dev[]. The warning is shown because
    topology_phys_to_logical_pkg() returns -1.
    
      arch/x86/events/intel/uncore.c:
      static void uncore_pci_remove(struct pci_dev *pdev)
      {
      ...
              phys_id = uncore_pcibus_to_physid(pdev->bus);
      ...
                      pkg = topology_phys_to_logical_pkg(phys_id); // returns -1
                      for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) {
                              if (uncore_extra_pci_dev[pkg].dev[i] == pdev) {
                                      uncore_extra_pci_dev[pkg].dev[i] = NULL;
                                      break;
                              }
                      }
                      WARN_ON_ONCE(i >= UNCORE_EXTRA_PCI_DEV_MAX); // <=========== HERE!!
    
    topology_phys_to_logical_pkg() looks for an initialized CPU whose
    cpuinfo_x86->phys_proc_id matches the phys_pkg argument and returns
    that CPU's logical_proc_id.
    
      arch/x86/kernel/smpboot.c:
      int topology_phys_to_logical_pkg(unsigned int phys_pkg)
      {
              int cpu;
    
              for_each_possible_cpu(cpu) {
                      struct cpuinfo_x86 *c = &cpu_data(cpu);
    
                      if (c->initialized && c->phys_proc_id == phys_pkg)
                              return c->logical_proc_id;
              }
              return -1;
      }
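A minimal userspace model of this lookup, to make the search concrete. It is a sketch, not kernel code: `struct model_cpuinfo`, `model_cpu_data[]`, `MODEL_NR_CPUS` and `model_phys_to_logical_pkg()` are hypothetical stand-ins for struct cpuinfo_x86, cpu_data(), the possible-CPU mask and topology_phys_to_logical_pkg(), and the table assumes two physical packages with two CPUs each.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the struct cpuinfo_x86 fields the lookup reads. */
struct model_cpuinfo {
	bool initialized;
	unsigned int phys_proc_id;   /* physical package ID */
	int logical_proc_id;         /* logical package ID */
};

/* Hypothetical possible-CPU table (assumption): CPUs 0-1 sit in physical
 * package 0, CPUs 2-3 in physical package 1. */
#define MODEL_NR_CPUS 4
static struct model_cpuinfo model_cpu_data[MODEL_NR_CPUS] = {
	{ true, 0, 0 }, { true, 0, 0 },
	{ true, 1, 1 }, { true, 1, 1 },
};

/* Same linear search as topology_phys_to_logical_pkg(): return the logical
 * package ID of the first initialized CPU whose physical package ID matches,
 * or -1 if no CPU matches. */
int model_phys_to_logical_pkg(unsigned int phys_pkg)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		const struct model_cpuinfo *c = &model_cpu_data[cpu];

		if (c->initialized && c->phys_proc_id == phys_pkg)
			return c->logical_proc_id;
	}
	return -1;
}
```

With this table, model_phys_to_logical_pkg(1) returns 1, and an ID that no CPU carries, such as 7, returns -1.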
    
    However, the phys_proc_id was already set to 0 by remove_siblinginfo()
    when the CPU was offlined.
    
    So, topology_phys_to_logical_pkg() cannot find the correct
    logical_proc_id and always returns -1.
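The failure can be reproduced in the same kind of userspace model. All names are hypothetical: model_old_remove_siblinginfo() keeps only the one side effect of the pre-fix remove_siblinginfo() that matters here (resetting phys_proc_id to 0), and the model assumes the offlined CPUs keep `initialized` set, so only the ID comparison fails.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant struct cpuinfo_x86 fields. */
struct model_cpuinfo {
	bool initialized;
	unsigned int phys_proc_id;
	int logical_proc_id;
};

#define MODEL_NR_CPUS 4
static struct model_cpuinfo model_cpu_data[MODEL_NR_CPUS] = {
	{ true, 0, 0 }, { true, 0, 0 },   /* physical package 0 */
	{ true, 1, 1 }, { true, 1, 1 },   /* physical package 1 */
};

/* Linear search modeled on topology_phys_to_logical_pkg(). */
int model_phys_to_logical_pkg(unsigned int phys_pkg)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		const struct model_cpuinfo *c = &model_cpu_data[cpu];

		if (c->initialized && c->phys_proc_id == phys_pkg)
			return c->logical_proc_id;
	}
	return -1;
}

/* Pre-fix offline behavior (hypothetical model): the CPU's physical
 * package ID is reset to 0 when it goes offline. */
void model_old_remove_siblinginfo(int cpu)
{
	model_cpu_data[cpu].phys_proc_id = 0;
}
```

Before the two CPUs of physical package 1 are offlined, looking up package 1 returns 1; afterwards no CPU carries phys_proc_id 1 any more and the lookup returns -1, which is the value uncore_pci_remove() then uses as 'pkg'.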
    
    As a result, uncore_pci_remove() triggers the WARN_ON_ONCE() and the
    warning is shown.
    
    Worse, the bogus 'pkg' index results in two bugs:
    
     - We dereference uncore_extra_pci_dev[] with a negative index
     - We fail to clean up a stale pointer in uncore_extra_pci_dev[][]
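The sketch below models the clearing loop from uncore_pci_remove() with a hypothetical bounds check in front of it. This is not what the commit does (the commit instead stops remove_siblinginfo() from clearing ->phys_proc_id); it only illustrates the second bug: even with the negative index rejected, the pointer for the removed device stays in the table, because the correct package is never found. `MODEL_MAX_PKGS`, `MODEL_EXTRA_PCI_DEV_MAX`, `model_extra_pci_dev[]` and `model_clear_extra_dev()` are hypothetical, and the device is an opaque `const void *` rather than a struct pci_dev.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizes for the model only. */
#define MODEL_MAX_PKGS 2
#define MODEL_EXTRA_PCI_DEV_MAX 3

struct model_pkg_extra_devs {
	const void *dev[MODEL_EXTRA_PCI_DEV_MAX];
};

static struct model_pkg_extra_devs model_extra_pci_dev[MODEL_MAX_PKGS];

/* Clear the slot holding pdev in package pkg. Unlike the excerpt from
 * uncore_pci_remove(), this hypothetical helper rejects an out-of-range pkg
 * before indexing, so pkg == -1 never reads outside the array; the stale
 * entry in the real package is simply left behind. */
bool model_clear_extra_dev(int pkg, const void *pdev)
{
	if (pkg < 0 || pkg >= MODEL_MAX_PKGS)
		return false;

	for (int i = 0; i < MODEL_EXTRA_PCI_DEV_MAX; i++) {
		if (model_extra_pci_dev[pkg].dev[i] == pdev) {
			model_extra_pci_dev[pkg].dev[i] = NULL;
			return true;
		}
	}
	return false;   /* the case the WARN_ON_ONCE() reports */
}
```

Called with pkg == -1, the helper returns false and the entry for the removed device is still present; only a correct package ID lets it clear the slot.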
    
    To fix these bugs, remove the clearing of ->phys_proc_id from remove_siblinginfo().
    
    This should not cause any problems, because ->phys_proc_id is not
    used after the CPU is hot-removed, and it is set again when the CPU
    is hot-added.
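A sketch of the post-fix behavior in the same userspace model. model_remove_siblinginfo() is hypothetical and stands for the CPU-offline path; `sibling_mask` is an invented single-word stand-in for the sibling bookkeeping that remove_siblinginfo() still tears down. The only point being modeled is that phys_proc_id now survives the offline, so the lookup still maps the removed package to its logical ID.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant struct cpuinfo_x86 fields. */
struct model_cpuinfo {
	bool initialized;
	unsigned int phys_proc_id;
	int logical_proc_id;
	unsigned long sibling_mask;   /* hypothetical sibling bookkeeping */
};

#define MODEL_NR_CPUS 4
static struct model_cpuinfo model_cpu_data[MODEL_NR_CPUS] = {
	{ true, 0, 0, 0x3 }, { true, 0, 0, 0x3 },   /* physical package 0 */
	{ true, 1, 1, 0xc }, { true, 1, 1, 0xc },   /* physical package 1 */
};

/* Linear search modeled on topology_phys_to_logical_pkg(). */
int model_phys_to_logical_pkg(unsigned int phys_pkg)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		const struct model_cpuinfo *c = &model_cpu_data[cpu];

		if (c->initialized && c->phys_proc_id == phys_pkg)
			return c->logical_proc_id;
	}
	return -1;
}

/* Post-fix offline model: sibling bookkeeping is cleared, but
 * phys_proc_id is deliberately left untouched. */
void model_remove_siblinginfo(int cpu)
{
	model_cpu_data[cpu].sibling_mask = 0;
}
```

After both CPUs of physical package 1 go offline in this model, looking up package 1 still returns logical package 1, so uncore_pci_remove() would index the correct uncore_extra_pci_dev[] entry.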

    Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
    Acked-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: yasu.isimatu@gmail.com
    Cc: <stable@vger.kernel.org>
    Fixes: 30bb9811 ("x86/topology: Avoid wasting 128k for package id array")
    Link: http://lkml.kernel.org/r/ed738d54-0f01-b38b-b794-c31dc118c207@gmail.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>