• Rafael J. Wysocki's avatar
    ACPI / dock / PCI: Synchronous handling of dock events for PCI devices · 21a31013
    Rafael J. Wysocki authored
    The interactions between the ACPI dock driver and the ACPI-based PCI
    hotplug (acpiphp) are currently problematic because of ordering
    issues during hot-remove operations.
    
    First of all, the current ACPI glue code expects that physical
    devices will always be deleted before deleting the companion ACPI
    device objects.  Otherwise, acpi_unbind_one() will fail with a
    warning message printed to the kernel log, for example:
    
    [  185.026073] usb usb5: Oops, 'acpi_handle' corrupt
    [  185.035150] pci 0000:1b:00.0: Oops, 'acpi_handle' corrupt
    [  185.035515] pci 0000:18:02.0: Oops, 'acpi_handle' corrupt
    [  180.013656]  port1: Oops, 'acpi_handle' corrupt
    
    This means, in particular, that struct pci_dev objects have to
    be deleted before the struct acpi_device objects they are "glued"
    with.
    
    Now, the following happens the during the undocking of an ACPI-based
    dock station:
     1) hotplug_dock_devices() invokes registered hotplug callbacks to
        destroy physical devices associated with the ACPI device objects
        depending on the dock station.  It calls dd->ops->handler() for
        each of those device objects.
     2) For PCI devices dd->ops->handler() points to
        handle_hotplug_event_func() that queues up a separate work item
        to execute _handle_hotplug_event_func() for the given device and
        returns immediately.  That work item will be executed later.
     3) hotplug_dock_devices() calls dock_remove_acpi_device() for each
        device depending on the dock station.  This runs acpi_bus_trim()
        for each of them, which causes the underlying ACPI device object
        to be destroyed, but the work items queued up by
        handle_hotplug_event_func() haven't been started yet.
     4) _handle_hotplug_event_func() queued up in step 2) are executed
        and cause the above failure to happen, because the PCI devices
        they handle do not have the companion ACPI device objects any
        more (those objects have been deleted in step 3).
    
    The possible breakage doesn't end here, though, because
    hotplug_dock_devices() may return before at least some of the
    _handle_hotplug_event_func() work items spawned by it have a
    chance to complete and then undock() will cause _DCK to be
    evaluated and that will cause the devices handled by the
    _handle_hotplug_event_func() to go away possibly while they are
    being accessed.
    
    This means that dd->ops->handler() for PCI devices should not point
    to handle_hotplug_event_func().  Instead, it should point to a
    function that will do the work of _handle_hotplug_event_func()
    synchronously.  For this reason, introduce such a function,
    hotplug_event_func(), and modity acpiphp_dock_ops to point to
    it as the handler.
    
    Unfortunately, however, this is not sufficient, because if the dock
    code were not changed further, hotplug_event_func() would now
    deadlock with hotplug_dock_devices() that called it, since it would
    run unregister_hotplug_dock_device() which in turn would attempt to
    acquire the dock station's hp_lock mutex already acquired by
    hotplug_dock_devices().
    
    To resolve that deadlock use the observation that
    unregister_hotplug_dock_device() won't need to acquire hp_lock
    if PCI bridges the devices on the dock station depend on are
    prevented from being removed prematurely while the first loop in
    hotplug_dock_devices() is in progress.
    
    To make that possible, introduce a mechanism by which the callers of
    register_hotplug_dock_device() can provide "init" and "release"
    routines that will be executed, respectively, during the addition
    and removal of the physical device object associated with the
    given ACPI device handle.  Make acpiphp use two new functions,
    acpiphp_dock_init() and acpiphp_dock_release(), that call
    get_bridge() and put_bridge(), respectively, on the acpiphp bridge
    holding the given device, for this purpose.
    
    In addition to that, remove the dock station's list of
    "hotplug devices" and make the dock code always walk the whole list
    of "dependent devices" instead in such a way that the loops in
    hotplug_dock_devices() and dock_event() (replacing the loops over
    "hotplug devices") will take references to the list entries that
    register_hotplug_dock_device() has been called for.  That prevents
    the "release" routines associated with those entries from being
    called while the given entry is being processed and for PCI
    devices this means that their bridges won't be removed (by a
    concurrent thread) while hotplug_event_func() handling them is
    being executed.
    
    This change is based on two earlier patches from Jiang Liu.
    
    References: https://bugzilla.kernel.org/show_bug.cgi?id=59501Reported-and-tested-by: default avatarAlexander E. Patrakov <patrakov@gmail.com>
    Tracked-down-by: default avatarJiang Liu <jiang.liu@huawei.com>
    Tested-by: default avatarIllya Klymov <xanf@xanf.me>
    Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
    Acked-by: default avatarYinghai Lu <yinghai@kernel.org>
    Cc: 3.9+ <stable@vger.kernel.org>
    21a31013
acpiphp_glue.c 32.2 KB