    firmware: fix batched requests - wake all waiters · e44565f6
    Luis R. Rodriguez authored
    The firmware cache mechanism serves two purposes; the secondary purpose is
    neither well documented nor well understood. This fixes a regression in
    that secondary purpose: batched requests on successful lookups. Without
    this fix, *any* time a batched request is triggered, the secondary
    requests that the batching mechanism was designed for appear to last
    forever and never return. This issue is present in all possible kernel
    builds, and a hard reset is required.
    
    The firmware cache is used for:
    
    1) Addressing races with file lookups during the suspend/resume cycle
       by keeping firmware in memory during the suspend/resume cycle
    
    2) Batched requests for the same file rely only on work from the first file
       lookup, which keeps the firmware in memory until the last
       release_firmware() is called
    
    Batched requests *only* take effect if secondary requests come in prior to
    the first user calling release_firmware(). The devres name used for the
    internal firmware cache serves as a hint that other requests are still
    pending. The firmware buffer data is kept in memory until the last user of
    the buffer calls release_firmware(), which serializes requests and
    delays the release until all requests are done.
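
    The buffer lifetime described above can be illustrated with a small
    userspace model. This is only a sketch under stated assumptions: the
    names fw_buf_model, model_request() and model_release() are hypothetical
    and are not the structures or functions used in firmware_class.c. It
    shows a second request for the same name reusing the first lookup's
    buffer, and the buffer being freed only on the last release.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace model; not the kernel's actual structures. */
struct fw_buf_model {
	char name[64];
	int refs;                   /* one reference per outstanding request */
	int loads;                  /* how many times the "file" was read */
	struct fw_buf_model *next;
};

static struct fw_buf_model *cache_head;

/* Return the cached buffer for @name, or create it (the first lookup). */
static struct fw_buf_model *model_request(const char *name)
{
	struct fw_buf_model *b;

	for (b = cache_head; b; b = b->next) {
		if (strcmp(b->name, name) == 0) {
			b->refs++;  /* batched: reuse the first lookup's work */
			return b;
		}
	}

	b = calloc(1, sizeof(*b));
	if (!b)
		return NULL;
	strncpy(b->name, name, sizeof(b->name) - 1);
	b->refs = 1;
	b->loads = 1;               /* only the first request reads the file */
	b->next = cache_head;
	cache_head = b;
	return b;
}

/* Drop one reference; free the buffer on the last one. Returns 1 if freed. */
static int model_release(struct fw_buf_model *b)
{
	struct fw_buf_model **pp;

	assert(b && b->refs > 0);
	if (--b->refs > 0)
		return 0;

	for (pp = &cache_head; *pp; pp = &(*pp)->next) {
		if (*pp == b) {
			*pp = b->next;
			break;
		}
	}
	free(b);
	return 1;
}
```

    Calling model_request() twice with the same name before any
    model_release() returns the same pointer with refs == 2 and loads == 1;
    a request that arrives after the last release starts a fresh lookup.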
    
    Batched requests wait for a wakeup or signal so we can rely on the first file
    fetch to write to the pending secondary requests. Commit 5b029624
    ("firmware: do not use fw_lock for fw_state protection") ported the firmware
    API to use swait, and in doing so failed to convert complete_all() to
    swake_up_all() -- it used swake_up(), losing the ability for *some* batched
    requests to take effect.
    
    We *could* fix this by just using swake_up_all(), *but* swait is now known
    to be a very special use case, so it's best to just move away from it. So we
    go back to using completions, as before commit 5b029624 ("firmware:
    do not use fw_lock for fw_state protection"), when the code used
    complete_all().
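
    The wake-all behaviour that completions provide can be sketched in
    userspace with pthreads. This is a loose analogy under stated
    assumptions, not the kernel implementation: completion_model,
    model_wait(), model_complete_all() and model_run() are hypothetical
    names, and pthread_cond_broadcast() stands in for the wake-all step.
    Replacing the broadcast with pthread_cond_signal() would wake only one
    of the already-blocked waiters, which mirrors the swake_up() problem.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_WAITERS 8

/* Hypothetical userspace stand-in for a kernel completion. */
struct completion_model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int done;
};

struct waiter_arg {
	struct completion_model *c;
	int woke;
};

/* Block until the completion has been marked done. */
static void model_wait(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Mark done and wake every waiter, in the spirit of complete_all(). */
static void model_complete_all(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = 1;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void *waiter(void *p)
{
	struct waiter_arg *w = p;

	model_wait(w->c);
	w->woke = 1;
	return NULL;
}

/* Start @n waiters, complete once, and count how many returned. */
static int model_run(int n)
{
	struct completion_model c;
	pthread_t tid[MAX_WAITERS];
	struct waiter_arg args[MAX_WAITERS];
	int i, rc, woke = 0;

	if (n < 0 || n > MAX_WAITERS)
		return -1;

	pthread_mutex_init(&c.lock, NULL);
	pthread_cond_init(&c.cond, NULL);
	c.done = 0;

	for (i = 0; i < n; i++) {
		args[i].c = &c;
		args[i].woke = 0;
		rc = pthread_create(&tid[i], NULL, waiter, &args[i]);
		assert(rc == 0);
		(void)rc;
	}

	model_complete_all(&c);

	for (i = 0; i < n; i++) {
		pthread_join(tid[i], NULL);
		woke += args[i].woke;
	}

	pthread_cond_destroy(&c.cond);
	pthread_mutex_destroy(&c.lock);
	return woke;
}
```

    With the broadcast in place, model_run(3) returns only after all three
    waiters have been released, regardless of whether they blocked before
    or after the completion was signalled.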
    
    Without this fix, it has been reported that plugging in two Intel 6260 Wifi
    cards on a system will end up enumerating the two devices only 50% of the
    time [0]. The ported swake_up() should actually have handled the case with
    two devices; however, *if more than two cards are used* the swake_up() would
    not have sufficed. This change is only part of the required fixes for
    batched requests. Another fix is provided in the next patch.
    
    This particular change should fix the cases where more than three requests
    with the same firmware name are used; otherwise batched requests will wait
    for MAX_SCHEDULE_TIMEOUT and just time out eventually.
    
    Below is a summary of tests triggering batched requests on different
    kernel builds.
    
    Before this patch:
    ============================================================================
    CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
    CONFIG_FW_LOADER_USER_HELPER=y
    
    Most common Linux distribution setup.
    
    API-type                               no-firmware-found   firmware-found
    ----------------------------------------------------------------------
    request_firmware()                     FAIL                FAIL
    request_firmware_direct()              FAIL                FAIL
    request_firmware_nowait(uevent=true)   FAIL                FAIL
    request_firmware_nowait(uevent=false)  FAIL                FAIL
    ============================================================================
    CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
    CONFIG_FW_LOADER_USER_HELPER=n
    
    Only possible if CONFIG_DELL_RBU=n and CONFIG_LEDS_LP55XX_COMMON=n, rare.
    
    API-type                               no-firmware-found   firmware-found
    ----------------------------------------------------------------------
    request_firmware()                     FAIL                FAIL
    request_firmware_direct()              FAIL                FAIL
    request_firmware_nowait(uevent=true)   FAIL                FAIL
    request_firmware_nowait(uevent=false)  FAIL                FAIL
    ============================================================================
    CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
    CONFIG_FW_LOADER_USER_HELPER=y
    
    Google Android setup.
    
    API-type                               no-firmware-found   firmware-found
    ----------------------------------------------------------------------
    request_firmware()                     FAIL                FAIL
    request_firmware_direct()              FAIL                FAIL
    request_firmware_nowait(uevent=true)   FAIL                FAIL
    request_firmware_nowait(uevent=false)  FAIL                FAIL
    ============================================================================
    
    After this patch:
    ============================================================================
    CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
    CONFIG_FW_LOADER_USER_HELPER=y
    
    Most common Linux distribution setup.
    
    API-type                               no-firmware-found   firmware-found
    ----------------------------------------------------------------------
    request_firmware()                     FAIL                OK
    request_firmware_direct()              FAIL                OK
    request_firmware_nowait(uevent=true)   FAIL                OK
    request_firmware_nowait(uevent=false)  FAIL                OK
    ============================================================================
    CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
    CONFIG_FW_LOADER_USER_HELPER=n
    
    Only possible if CONFIG_DELL_RBU=n and CONFIG_LEDS_LP55XX_COMMON=n, rare.
    
    API-type                               no-firmware-found   firmware-found
    ----------------------------------------------------------------------
    request_firmware()                     FAIL                OK
    request_firmware_direct()              FAIL                OK
    request_firmware_nowait(uevent=true)   FAIL                OK
    request_firmware_nowait(uevent=false)  FAIL                OK
    ============================================================================
    CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
    CONFIG_FW_LOADER_USER_HELPER=y
    
    Google Android setup.
    
    API-type                               no-firmware-found   firmware-found
    ----------------------------------------------------------------------
    request_firmware()                     OK                  OK
    request_firmware_direct()              FAIL                OK
    request_firmware_nowait(uevent=true)   OK                  OK
    request_firmware_nowait(uevent=false)  OK                  OK
    ============================================================================
    
    [0] https://bugzilla.kernel.org/show_bug.cgi?id=195477
    
    CC: <stable@vger.kernel.org>    [4.10+]
    Cc: Ming Lei <ming.lei@redhat.com>
    Fixes: 5b029624 ("firmware: do not use fw_lock for fw_state protection")
    Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>