1. 15 Aug, 2018 12 commits
    • Merge branch 'pci/peer-to-peer' · c689209b
      Bjorn Helgaas authored
        - Add "pci=disable_acs_redir=" parameter to disable ACS redirection for
          peer-to-peer DMA support (we don't have the peer-to-peer support yet;
          this is just one piece) (Logan Gunthorpe)
      
      * pci/peer-to-peer:
        PCI: Add ACS Redirect disable quirk for Intel Sunrise Point
        PCI: Add device-specific ACS Redirect disable infrastructure
        PCI: Convert device-specific ACS quirks from NULL termination to ARRAY_SIZE
        PCI: Add "pci=disable_acs_redir=" parameter for peer-to-peer support
        PCI: Allow specifying devices using a base bus and path of devfns
        PCI: Make specifying PCI devices in kernel parameters reusable
        PCI: Hide ACS quirk declarations inside PCI core
    • Merge branch 'pci/notes' · eadf3d32
      Bjorn Helgaas authored
        - Document ACPI description of PCI host bridges (Bjorn Helgaas)
      
      * pci/notes:
        PCI: Document ACPI description of PCI host bridges
    • Merge branch 'pci/msi' · 11c1a8e1
      Bjorn Helgaas authored
        - Set IRQCHIP_ONESHOT_SAFE for PCI MSI irqchips (Heiner Kallweit)
      
      * pci/msi:
        PCI/MSI: Set IRQCHIP_ONESHOT_SAFE for PCI-MSI irqchips
    • Merge branch 'pci/misc' · a40f72db
      Bjorn Helgaas authored
        - Mark fall-through switch cases before enabling -Wimplicit-fallthrough
          (Gustavo A. R. Silva)
      
        - Move DMA-debug PCI init from arch code to PCI core (Christoph Hellwig)
      
        - Fix pci_request_irq() usage of IRQF_ONESHOT when no handler is supplied
          (Heiner Kallweit)
      
        - Unify PCI and DMA direction #defines (Shunyong Yang)
      
        - Add PCI_DEVICE_DATA() macro (Andy Shevchenko)
      
        - Check for VPD completion before checking for timeout (Bert Kenward)
      
        - Limit Netronome NFP5000 config space size to work around erratum (Jakub
          Kicinski)
      
      * pci/misc:
        PCI: Limit config space size for Netronome NFP5000
        PCI/VPD: Check for VPD access completion before checking for timeout
        PCI: Add PCI_DEVICE_DATA() macro to fully describe device ID entry
        PCI: Unify PCI and normal DMA direction definitions
        PCI: Use IRQF_ONESHOT if pci_request_irq() called with no handler
        PCI: Call dma_debug_add_bus() for pci_bus_type from PCI core
        PCI: Mark fall-through switch cases before enabling -Wimplicit-fallthrough
      
      # Conflicts:
      #	drivers/pci/hotplug/pciehp_ctrl.c
    • Merge branch 'pci/hotplug' · c0638a45
      Bjorn Helgaas authored
        - Simplify SHPC existence/permission checks (Bjorn Helgaas)
      
        - Remove hotplug sample skeleton driver (Lukas Wunner)
      
        - Convert pciehp to threaded IRQ handling (Lukas Wunner)
      
        - Improve pciehp tolerance of missed events and initially unstable links
          (Lukas Wunner)
      
        - Clear spurious pciehp events on resume (Lukas Wunner)
      
        - Add pciehp runtime PM support, including for Thunderbolt controllers
          (Lukas Wunner)
      
        - Support interrupts from pciehp bridges in D3hot (Lukas Wunner)
      
      * pci/hotplug:
        PCI: pciehp: Deduplicate presence check on probe & resume
        PCI: pciehp: Avoid implicit fallthroughs in switch statements
        PCI: Whitelist Thunderbolt ports for runtime D3
        PCI: Whitelist native hotplug ports for runtime D3
        PCI: sysfs: Resume to D0 on function reset
        PCI: pciehp: Resume parent to D0 on config space access
        PCI: pciehp: Resume to D0 on enable/disable
        PCI: pciehp: Support interrupts sent from D3hot
        PCI: pciehp: Obey compulsory command delay after resume
        PCI: pciehp: Clear spurious events earlier on resume
        PCI: portdrv: Deduplicate PM callback iterator
        PCI: pciehp: Avoid slot access during reset
        PCI: pciehp: Always enable occupied slot on probe
        PCI: pciehp: Become resilient to missed events
        PCI: pciehp: Tolerate initially unstable link
        PCI: pciehp: Declare pciehp_enable/disable_slot() static
        PCI: pciehp: Drop enable/disable lock
        PCI: pciehp: Enable/disable exclusively from IRQ thread
        PCI: pciehp: Track enable/disable status
        PCI: pciehp: Publish to user space last on probe
        PCI: hotplug: Demidlayer registration with the core
        PCI: pciehp: Drop slot workqueue
        PCI: pciehp: Handle events synchronously
        PCI: pciehp: Stop blinking on slot enable failure
        PCI: pciehp: Convert to threaded polling
        PCI: pciehp: Convert to threaded IRQ
        PCI: pciehp: Document struct slot and struct controller
        PCI: pciehp: Declare pciehp_unconfigure_device() void
        PCI: pciehp: Drop unnecessary NULL pointer check
        PCI: pciehp: Fix unprotected list iteration in IRQ handler
        PCI: pciehp: Fix use-after-free on unplug
        PCI: hotplug: Don't leak pci_slot on registration failure
        PCI: hotplug: Delete skeleton driver
        PCI: shpchp: Separate existence of SHPC and permission to use it
    • Merge branch 'pci/enumeration' · a8bcb5e5
      Bjorn Helgaas authored
        - Work around IDT switch ACS Source Validation erratum (James
          Puthukattukaran)
      
        - Emit diagnostics for all cases of PCIe Link downtraining (Links
          operating slower than they're capable of) (Alexandru Gagniuc)
      
        - Skip VFs when configuring Max Payload Size (Myron Stowe)
      
        - Reduce Root Port Max Payload Size if necessary when hot-adding a device
          below it (Myron Stowe)
      
      * pci/enumeration:
        PCI: Match Root Port's MPS to endpoint's MPSS as necessary
        PCI: Skip MPS logic for Virtual Functions (VFs)
        PCI: Check for PCIe Link downtraining
        PCI: Workaround IDT switch ACS Source Validation erratum
    • Merge branch 'pci/dpc' · 1ca358a8
      Bjorn Helgaas authored
        - Defer DPC event handling to work queue (Keith Busch)
      
        - Use threaded IRQ for DPC bottom half (Keith Busch)
      
        - Print AER status while handling DPC events (Keith Busch)
      
      * pci/dpc:
        PCI/DPC: Remove indirection waiting for inactive link
        PCI/DPC: Use threaded IRQ for bottom half handling
        PCI/DPC: Print AER status in DPC event handling
        PCI/DPC: Remove rp_pio_status from dpc struct
        PCI/DPC: Defer event handling to work queue
        PCI/DPC: Leave interrupts enabled while handling event
    • Merge branch 'pci/aspm' · 187dacce
      Bjorn Helgaas authored
        - Use sysfs_match_string() to simplify ASPM sysfs parsing (Andy
          Shevchenko)
      
        - Remove unnecessary includes of <linux/pci-aspm.h> (Bjorn Helgaas)
      
      * pci/aspm:
        PCI: Remove unnecessary include of <linux/pci-aspm.h>
        iwlwifi: Remove unnecessary include of <linux/pci-aspm.h>
        ath9k: Remove unnecessary include of <linux/pci-aspm.h>
        igb: Remove unnecessary include of <linux/pci-aspm.h>
        PCI/ASPM: Convert to use sysfs_match_string() helper
    • Merge branch 'pci/aer' · 3c3ab37f
      Bjorn Helgaas authored
        - Decode AER errors with names similar to "lspci" (Tyler Baicar)
      
        - Expose AER statistics in sysfs (Rajat Jain)
      
        - Clear AER status bits selectively based on the type of recovery (Oza
          Pawandeep)
      
        - Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST (Alexandru
          Gagniuc)
      
        - Don't clear AER status bits if we're using the "Firmware-First"
          strategy where firmware owns the registers (Alexandru Gagniuc)
      
      * pci/aer:
        PCI/AER: Don't clear AER bits if error handling is Firmware-First
        PCI/AER: Remove duplicate PCI_EXP_AER_FLAGS definition
        PCI/portdrv: Remove pcie_portdrv_err_handler.slot_reset
        PCI/AER: Clear device status bits during ERR_COR handling
        PCI/AER: Clear device status bits during ERR_FATAL and ERR_NONFATAL
        PCI/AER: Remove ERR_FATAL code from ERR_NONFATAL path
        PCI/AER: Factor out ERR_NONFATAL status bit clearing
        PCI/AER: Clear only ERR_NONFATAL bits during non-fatal recovery
        PCI/AER: Clear only ERR_FATAL status bits during fatal recovery
        PCI/AER: Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST
        PCI/AER: Add sysfs attributes for rootport cumulative stats
        PCI/AER: Add sysfs attributes to provide AER stats and breakdown
        PCI/AER: Define aer_stats structure for AER capable devices
        PCI/AER: Move internal declarations to drivers/pci/pci.h
        PCI/AER: Adopt lspci names for AER error decoding
        PCI/AER: Expose internal API for obtaining AER information
      
      # Conflicts:
      #	drivers/pci/pci.h
    • Merge branch 'for-linus' · af863d18
      Bjorn Helgaas authored
      * for-linus:
        PCI: Fix is_added/is_busmaster race condition
        PCI: mobiveil: Avoid integer overflow in IB_WIN_SIZE
        PCI/AER: Work around use-after-free in pcie_do_fatal_recovery()
        PCI: v3-semi: Fix I/O space page leak
        PCI: mediatek: Fix I/O space page leak
        PCI: faraday: Fix I/O space page leak
        PCI: aardvark: Fix I/O space page leak
        PCI: designware: Fix I/O space page leak
        PCI: versatile: Fix I/O space page leak
        PCI: xgene: Fix I/O space page leak
        PCI: OF: Fix I/O space page leak
        PCI: endpoint: Fix NULL pointer dereference error when CONFIGFS is disabled
        PCI: hv: Disable/enable IRQs rather than BH in hv_compose_msi_msg()
        nfp: stop limiting VFs to 0
        PCI/IOV: Reset total_VFs limit after detaching PF driver
        PCI: faraday: Add missing of_node_put()
        PCI: xilinx-nwl: Add missing of_node_put()
        PCI: xilinx: Add missing of_node_put()
        PCI: endpoint: Use after free in pci_epf_unregister_driver()
        PCI: controller: dwc: Do not let PCIE_DW_PLAT_HOST default to yes
        PCI: rcar: Clean up PHY init on failure
        PCI: rcar: Shut the PHY down in failpath
        PCI: controller: Move PCI_DOMAINS selection to arch Kconfig
        PCI: Initialize endpoint library before controllers
        PCI: shpchp: Manage SHPC unconditionally on non-ACPI systems
    • PCI/AER: Don't clear AER bits if error handling is Firmware-First · 45687f96
      Alexandru Gagniuc authored
      If the platform requests Firmware-First error handling, firmware is
      responsible for reading and clearing AER status bits.  If OSPM also clears
      them, we may miss errors.  See ACPI v6.2, sec 18.3.2.5 and 18.4.
      
      This race is mostly of theoretical significance, as it is not easy to
      reasonably demonstrate it in testing.
      Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
      [bhelgaas: add similar guards to pci_cleanup_aer_uncorrect_error_status()
      and pci_aer_clear_fatal_status()]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
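      The guard described in this commit can be sketched as a small userspace
      model.  The struct, the function name, and the -EIO return value are
      assumptions made for illustration; the kernel's own helpers operate on
      struct pci_dev and real AER registers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct pci_dev. */
struct model_dev {
	bool firmware_first;	/* platform asked for Firmware-First handling */
	uint32_t uncor_status;	/* models the AER uncorrectable status register */
};

/*
 * Clear the uncorrectable status bits unless firmware owns them.
 * The real register is write-1-to-clear; the model simply zeroes it.
 */
static int model_clear_uncorrect_status(struct model_dev *dev)
{
	if (dev->firmware_first)
		return -EIO;	/* leave the bits for firmware to read and clear */

	dev->uncor_status = 0;
	return 0;
}
```

      With firmware_first set, the status word is left untouched, so firmware
      still sees every error that was logged.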
    • PCI: Limit config space size for Netronome NFP5000 · 2538fb89
      Jakub Kicinski authored
      Like the NFP4000 and NFP6000, the NFP5000 has an erratum where reading/
      writing to PCI config space addresses above 0x600 can cause the NFP to
      generate PCIe completion timeouts.
      
      Limit the NFP5000's PF's config space size to 0x600 bytes as is already
      done for the NFP4000 and NFP6000.
      
      The NFP5000's VF is 0x6003 (PCI_DEVICE_ID_NETRONOME_NFP6000_VF), the same
      device ID as the NFP6000's VF.  Thus, its config space is already limited
      by the existing use of quirk_nfp6000().
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Tony Egan <tony.egan@netronome.com>
  2. 14 Aug, 2018 5 commits
    • PCI/MSI: Set IRQCHIP_ONESHOT_SAFE for PCI-MSI irqchips · 923aa4c3
      Heiner Kallweit authored
      If flag IRQCHIP_ONESHOT_SAFE isn't set for an irqchip and we have a
      threaded interrupt with no primary handler, flag IRQF_ONESHOT needs to be
      set for the interrupt, causing some overhead in the threaded interrupt
      handler.  For a more detailed explanation, see the following comment in
      __setup_irq():
      
        The interrupt was requested with handler = NULL, so we use the default
        primary handler for it. But it does not have the oneshot flag set.  In
        combination with level interrupts this is deadly, because the default
        primary handler just wakes the thread, then the irq lines is reenabled,
        but the device still has the level irq asserted.  Rinse and repeat....
      
        While this works for edge type interrupts, we play it safe and reject
        unconditionally because we can't say for sure which type this interrupt
        really has.  The type flags are unreliable as the underlying chip
        implementation can override them.
      
      Another comment in __setup_irq() gives a hint already that this
      overhead can be avoided for PCI-MSI:
      
        Some irq chips like MSI based interrupts are per se one shot safe.  Check
        the chip flags, so we can avoid the unmask dance at the end of the
        threaded handler for those.
      
      Following this let's mark all PCI-MSI irqchips as oneshot-safe.
      
      See also discussion here:
      https://lkml.kernel.org/r/alpine.DEB.2.21.1808032136490.1658@nanos.tec.linutronix.de

      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
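      The rule quoted from __setup_irq() can be modeled in a few lines.  This
      is a userspace sketch, not the kernel code: the flag values and the
      function name are hypothetical, and only the decision logic described
      in the commit message is reproduced.

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative flag values; the kernel's real constants differ. */
#define MODEL_IRQF_ONESHOT		0x1u
#define MODEL_IRQCHIP_ONESHOT_SAFE	0x1u

typedef int (*model_handler_t)(void *);

/*
 * A threaded interrupt requested without a primary handler must carry
 * the oneshot flag, unless the irqchip declares itself oneshot-safe.
 */
static int model_check_request(model_handler_t primary,
			       unsigned int irq_flags,
			       unsigned int chip_flags)
{
	if (primary != NULL)
		return 0;	/* caller supplied its own primary handler */
	if (irq_flags & MODEL_IRQF_ONESHOT)
		return 0;	/* line stays masked until the thread finishes */
	if (chip_flags & MODEL_IRQCHIP_ONESHOT_SAFE)
		return 0;	/* e.g. MSI: no mask/unmask dance needed */
	return -EINVAL;		/* rejected, as the quoted comment explains */
}
```

      Marking the PCI-MSI irqchips as oneshot-safe corresponds to the third
      branch: the request succeeds without the oneshot overhead.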
    • PCI/VPD: Check for VPD access completion before checking for timeout · 6eaf2781
      Bert Kenward authored
      Previously we checked the timeout before checking the VPD access completion
      bit.  On a very heavily loaded system this can cause VPD access to time
      out.  Check the completion bit before checking the timeout.
      Signed-off-by: Bert Kenward <bkenward@solarflare.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
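      Why the order of the two checks matters can be shown with a small
      simulation.  Everything below is a hypothetical userspace model (a tick
      counter instead of jiffies, a struct instead of the VPD registers); a
      large step stands in for a heavily loaded system that polls rarely.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model: the access completes at tick done_at. */
struct model_vpd {
	unsigned long now;	/* current time in ticks */
	unsigned long done_at;	/* tick at which the completion bit is set */
};

static bool model_vpd_complete(const struct model_vpd *v)
{
	return v->now >= v->done_at;
}

/* Poll for completion; step is how far the clock moves between polls. */
static int model_vpd_wait(struct model_vpd *v, unsigned long deadline,
			  unsigned long step)
{
	for (;;) {
		/* Completion first: a late poll still sees a finished access. */
		if (model_vpd_complete(v))
			return 0;
		if (v->now > deadline)
			return -ETIMEDOUT;
		v->now += step;
	}
}
```

      If the two checks were swapped, a poll that arrives after the deadline
      would report a timeout even though the access had already finished.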
    • PCI: Add PCI_DEVICE_DATA() macro to fully describe device ID entry · b72ae8ca
      Andy Shevchenko authored
      There are many places in the kernel where PCI_VDEVICE() is used but is
      still inconvenient because a driver_data field has to be attached
      separately.
      
      Introduce PCI_DEVICE_DATA() macro to fully describe device ID entry in
      shortest possible form. For example,
      
        before:
      
          { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_MRFLD),
            (kernel_ulong_t) &dwc3_pci_mrfld_properties, },
      
        after:
      
          { PCI_DEVICE_DATA(INTEL, MRFLD, &dwc3_pci_mrfld_properties) },
      
      Drivers can be converted later, independently of this change.
      
      While here, remove the unused macro with the same name from Ralink wireless
      driver.
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Kalle Valo <kvalo@codeaurora.org>	# for rt2x00
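      How such a macro can expand is sketched below as a self-contained
      userspace model.  The macro body is a reconstruction for illustration,
      not a copy of include/linux/pci.h; the struct is a reduced stand-in,
      the device ID value is a placeholder, and mrfld_properties is a
      hypothetical piece of driver data.

```c
#include <stdint.h>

typedef unsigned long kernel_ulong_t;

/* Reduced userspace stand-in for struct pci_device_id. */
struct pci_device_id {
	uint32_t vendor, device;
	uint32_t subvendor, subdevice;
	uint32_t class, class_mask;
	kernel_ulong_t driver_data;
};

#define PCI_ANY_ID			(~0u)
#define PCI_VENDOR_ID_INTEL		0x8086
#define PCI_DEVICE_ID_INTEL_MRFLD	0x1234	/* placeholder, not the real ID */

/* Token pasting builds both ID names from the short vendor/device tags. */
#define PCI_DEVICE_DATA(vend, dev, data) \
	.vendor = PCI_VENDOR_ID_##vend, \
	.device = PCI_DEVICE_ID_##vend##_##dev, \
	.subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID, \
	.class = 0, .class_mask = 0, \
	.driver_data = (kernel_ulong_t)(data)

static const int mrfld_properties = 42;	/* hypothetical driver data */

static const struct pci_device_id model_ids[] = {
	{ PCI_DEVICE_DATA(INTEL, MRFLD, &mrfld_properties) },
	{ 0 }	/* terminating entry */
};
```

      The entry names the vendor and device once each, and the cast to
      kernel_ulong_t is hidden inside the macro.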
    • PCI: Match Root Port's MPS to endpoint's MPSS as necessary · 9f0e8935
      Myron Stowe authored
      In commit 27d868b5 ("PCI: Set MPS to match upstream bridge"), we made
      sure every device's MPS setting matches its upstream bridge, making it more
      likely that a hot-added device will work in a system with an optimized MPS
      configuration.
      
      Recently I've started encountering systems where the endpoint device's MPSS
      capability is less than its Root Port's current MPS value, thus the
      endpoint is not capable of matching its upstream bridge's MPS setting (see:
      bugzilla via "Link:" below).  This leaves the system vulnerable - the
      upstream Root Port could respond with larger TLPs than the device can
      handle, and the device will consider them to be 'Malformed'.
      
      One could use the "pci=pcie_bus_safe" kernel parameter to work around the
      issue, but that forces a user to supply a kernel parameter to get the
      system to function reliably and may end up limiting MPS settings of other
      unrelated sub-topologies which could benefit from maintaining their larger
      values.
      
      Augment Keith's approach to include tuning down a Root Port's MPS setting
      when its hot-added endpoint device is not capable of matching it.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=200527
      Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Jon Mason <jdmason@kudzu.us>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Sinan Kaya <okaya@kernel.org>
      Cc: Dongdong Liu <liudongdong3@huawei.com>
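      The tuning step can be sketched as follows.  This is a simplified
      userspace model with hypothetical types; payload sizes are plain byte
      counts rather than register encodings, and the handling of bridges that
      are not Root Ports is reduced to reporting failure.

```c
#include <stdbool.h>

/* Hypothetical model: payload sizes in bytes (128, 256, 512, ...). */
struct model_port {
	bool is_root_port;
	int mps;	/* current Max Payload Size */
};

struct model_endpoint {
	int mpss;	/* largest payload size the device supports */
	int mps;	/* current Max Payload Size */
};

/*
 * On hot-add, give the endpoint its upstream bridge's MPS.  If the
 * endpoint cannot go that high and the bridge is a Root Port, lower
 * the Root Port to the endpoint's limit instead.
 * Returns false when the mismatch is left unresolved.
 */
static bool model_match_mps(struct model_port *bridge,
			    struct model_endpoint *ep)
{
	if (ep->mpss < bridge->mps) {
		if (!bridge->is_root_port)
			return false;	/* only Root Ports are tuned down here */
		bridge->mps = ep->mpss;
	}
	ep->mps = bridge->mps;
	return true;
}
```

      For a Root Port at 512 bytes and an endpoint limited to 256 bytes, both
      ends settle on 256 bytes, so the Root Port no longer sends TLPs the
      device would treat as malformed.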
    • PCI: Skip MPS logic for Virtual Functions (VFs) · 3dbe97ef
      Myron Stowe authored
      PCIe r4.0, sec 9.3.5.4, "Device Control Register", shows both
      Max_Payload_Size (MPS) and Max_Read_Request_Size (MRRS) to be 'RsvdP' for
      VFs.  Just prior to the table it states:
      
        "PF and VF functionality is defined in Section 7.5.3.4 except where
         noted in Table 9-16.  For VF fields marked 'RsvdP', the PF setting
         applies to the VF."
      
      All of which implies that with respect to Max_Payload_Size Supported
      (MPSS), MPS, and MRRS values, we should not be paying any attention to the
      VF's fields, but rather only to the PF's.  Only looking at the PF's fields
      also logically makes sense as it's the sole physical interface to the PCIe
      bus.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=200527
      Fixes: 27d868b5 ("PCI: Set MPS to match upstream bridge")
      Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org # 4.3+
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Sinan Kaya <okaya@kernel.org>
      Cc: Dongdong Liu <liudongdong3@huawei.com>
      Cc: Jon Mason <jdmason@kudzu.us>
  3. 10 Aug, 2018 1 commit
    • PCI: Check for PCIe Link downtraining · 2d1ce5ec
      Alexandru Gagniuc authored
      When both ends of a PCIe Link are capable of a higher bandwidth than is
      currently in use, the Link is said to be "downtrained".  A downtrained Link
      may indicate hardware or configuration problems in the system, but it's
      hard to identify such Links from userspace.
      
      Refactor pcie_print_link_status() so it continues to always print PCIe
      bandwidth information, as several NIC drivers desire.
      
      Add a new internal __pcie_print_link_status() to emit a message only when a
      device's bandwidth is constrained by the fabric and call it from the PCI
      core for all devices, which identifies all downtrained Links.  It also
      emits messages for a few cases that are technically not downtrained, such
      as a x4 device in an open-ended x1 slot.
      Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
      [bhelgaas: changelog, move __pcie_print_link_status() declaration to
      drivers/pci/, rename pcie_check_upstream_link() to
      pcie_report_downtraining()]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
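      The comparison behind the new message can be sketched in userspace.
      The types and function names are hypothetical; the per-lane figures
      follow from the 8b/10b and 128b/130b line encodings, and the model
      ignores that the kernel looks for the slowest link on the whole path.

```c
/* Usable per-lane bandwidth in Mb/s after encoding overhead. */
static unsigned int model_lane_mbps(unsigned int speed_x10)
{
	switch (speed_x10) {
	case 25:  return 2000;	/* 2.5 GT/s, 8b/10b */
	case 50:  return 4000;	/* 5 GT/s, 8b/10b */
	case 80:  return 7877;	/* 8 GT/s, 128b/130b */
	case 160: return 15754;	/* 16 GT/s, 128b/130b */
	default:  return 0;	/* unknown speed */
	}
}

struct model_link {
	unsigned int speed_x10;	/* link speed in units of 0.1 GT/s */
	unsigned int width;	/* number of lanes */
};

static unsigned int model_bandwidth(struct model_link l)
{
	return model_lane_mbps(l.speed_x10) * l.width;
}

/* Returns 1 when the fabric offers less than the device could use. */
static int model_link_is_limited(struct model_link capable,
				 struct model_link available)
{
	return model_bandwidth(available) < model_bandwidth(capable);
}
```

      A x4 device running at 8 GT/s in a x1 slot is reported as limited by
      this check, which matches the "technically not downtrained" case the
      commit message mentions.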
  4. 09 Aug, 2018 7 commits
  5. 06 Aug, 2018 5 commits
  6. 31 Jul, 2018 10 commits
    • PCI: Unify PCI and normal DMA direction definitions · 546c596c
      Shunyong Yang authored
      Current DMA direction definitions in pci-dma-compat.h and dma-direction.h
      are mirrored in value.  Unify them to enhance readability and avoid
      possible inconsistency.
      Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Joey Zheng <yu.zheng@hxt-semitech.com>
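      The unification amounts to defining the PCI names in terms of the
      generic ones instead of repeating the numbers.  The sketch below is a
      userspace illustration; the numeric values are what the generic enum is
      understood to use and should be treated as an assumption here.

```c
/* Generic DMA directions (values assumed for this illustration). */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

/* Legacy PCI names become aliases rather than separate literals. */
#define PCI_DMA_BIDIRECTIONAL	DMA_BIDIRECTIONAL
#define PCI_DMA_TODEVICE	DMA_TO_DEVICE
#define PCI_DMA_FROMDEVICE	DMA_FROM_DEVICE
#define PCI_DMA_NONE		DMA_NONE
```

      With aliases, the two sets of names cannot drift apart if one header is
      ever changed.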
    • PCI/AER: Remove duplicate PCI_EXP_AER_FLAGS definition · 944d5859
      Bjorn Helgaas authored
      PCI_EXP_AER_FLAGS was defined twice (with identical definitions), once
      under #ifdef CONFIG_ACPI_APEI, and again at the top level.  This looks like
      my merge error from these commits:
      
        fd3362cb ("PCI/AER: Squash aerdrv_core.c into aerdrv.c")
        41cbc9eb ("PCI/AER: Squash ecrc.c into aerdrv.c")
      
      Remove the duplicate PCI_EXP_AER_FLAGS definition.
      
      Fixes: 41cbc9eb ("PCI/AER: Squash ecrc.c into aerdrv.c")
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Oza Pawandeep <poza@codeaurora.org>
    • PCI: pciehp: Deduplicate presence check on probe & resume · 4e6a1335
      Lukas Wunner authored
      On driver probe and on resume from system sleep, pciehp checks the
      Presence Detect State bit in the Slot Status register to bring up an
      occupied slot or bring down an unoccupied slot.  Both code paths are
      identical, so deduplicate them per Mika's request.
      
      On probe, an additional check is performed to disable power of an
      unoccupied slot.  This can e.g. happen if power was enabled by BIOS.
      It cannot happen once pciehp has taken control, hence is not necessary
      on resume:  The Slot Control register is set to the same value that it
      had on suspend by pci_restore_state(), so if the slot was occupied,
      power is enabled and if it wasn't, power is disabled.  Should occupancy
      have changed during the system sleep transition, power is adjusted by
      bringing up or down the slot per the paragraph above.
      
      To allow for deduplication of the presence check, move the power check
      to pcie_init().  This seems safer anyway, because right now it is
      performed while interrupts are already enabled, and although I can't
      think of a scenario where pciehp_power_off_slot() and the IRQ thread
      collide, it does feel brittle.
      
      However this means that pcie_init() may now write to the Slot Control
      register before the IRQ is requested.  If both the CCIE and HPIE bits
      happen to be set, pcie_wait_cmd() will wait for an interrupt (instead
      of polling the Command Completed bit) and eventually emit a timeout
      message.  Additionally, if a level-triggered INTx interrupt is used,
      the user may see a spurious interrupt splat.  Avoid this by disabling
      interrupts before disabling power.  (Normally the HPIE and CCIE bits
      should be clear on probe, but conceivably they may already have been
      set e.g. by BIOS.)
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
    • PCI: pciehp: Avoid implicit fallthroughs in switch statements · 8bb46b07
      Lukas Wunner authored
      Per Mika's request, add an explicit break to the last case of switch
      statements everywhere in pciehp to be more defensive towards future
      amendments.
      
      Per Gustavo's request, mark all non-empty implicit fallthroughs with a
      comment to silence warnings triggered by -Wimplicit-fallthrough=2.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
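      Both conventions look like this in practice.  The event names and the
      handler are hypothetical and unrelated to pciehp's real event handling;
      the sketch only shows the marker comment and the explicit final break.

```c
enum model_event { EV_PRESENCE, EV_LINK_UP, EV_BUTTON };

/* Hypothetical handler: returns a bitmask of the actions it took. */
static int model_handle(enum model_event ev)
{
	int actions = 0;

	switch (ev) {
	case EV_PRESENCE:
		actions |= 0x1;	/* note that a card is present */
		/* fall through */
	case EV_LINK_UP:
		actions |= 0x2;	/* bring up the slot */
		break;
	case EV_BUTTON:
		actions |= 0x4;	/* acknowledge the button press */
		break;		/* explicit break even on the last case */
	}
	return actions;
}
```

      The comment documents that the fallthrough is intended, and the final
      break keeps a case appended later from being reached by accident.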
    • PCI: Fix is_added/is_busmaster race condition · 44bda4b7
      Hari Vyas authored
      When a PCI device is detected, pdev->is_added is set to 1 and proc and
      sysfs entries are created.
      
      When the device is removed, pdev->is_added is checked to be 1, the
      device is detached, its proc and sysfs entries are removed, and
      finally pdev->is_added is set to 0.
      
      is_added and is_busmaster are bit fields in the pci_dev structure that
      share the same memory location.
      
      A strange issue was observed when repeatedly removing and rescanning a
      PCIe NVMe device using sysfs commands: the is_added flag read as zero
      instead of one while the device was being removed, so its proc and
      sysfs entries were not cleared.  A later device addition then triggered
      the warning message "proc_dir_entry" already registered.
      
      Debugging revealed a race condition between the PCI core setting the
      is_added bit in pci_bus_add_device() and the NVMe driver reset work-queue
      setting the is_busmaster bit in pci_set_master().  As these fields are not
      handled atomically, that clears the is_added bit.
      
      Move the is_added bit to a separate private flag variable and use atomic
      functions to set and retrieve the device addition state.  This avoids the
      race because is_added no longer shares a memory location with is_busmaster.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=200283
      Signed-off-by: Hari Vyas <hari.vyas@broadcom.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Lukas Wunner <lukas@wunner.de>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
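      The shape of the fix can be sketched with C11 atomics in userspace.
      The struct, the bit name, and the helpers below are hypothetical
      stand-ins; the kernel uses its own atomic bit operations rather than
      <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_DEV_ADDED	0	/* bit number within priv_flags */

struct model_pci_dev {
	/* Bit-fields share one word: writing one rewrites the whole word. */
	unsigned int is_busmaster:1;
	unsigned int other_flag:1;
	/* The added state lives in its own atomically updated word. */
	atomic_ulong priv_flags;
};

static void model_dev_assign_added(struct model_pci_dev *dev, bool added)
{
	if (added)
		atomic_fetch_or(&dev->priv_flags, 1UL << MODEL_DEV_ADDED);
	else
		atomic_fetch_and(&dev->priv_flags, ~(1UL << MODEL_DEV_ADDED));
}

static bool model_dev_is_added(struct model_pci_dev *dev)
{
	return atomic_load(&dev->priv_flags) & (1UL << MODEL_DEV_ADDED);
}
```

      A plain bit-field store is a read-modify-write of the containing word,
      which is how a concurrent is_busmaster update could wipe out is_added.
      With the flag in a separate atomic word, the two updates no longer
      touch the same storage.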
    • PCI: Whitelist Thunderbolt ports for runtime D3 · 47a8e237
      Lukas Wunner authored
      Thunderbolt controllers can be runtime suspended to D3cold to save ~1.5W.
      This requires that runtime D3 is allowed on their PCIe ports, so whitelist
      them.
      
      The 2015 BIOS cutoff that we've instituted for runtime D3 on PCIe ports
      is unnecessary on Thunderbolt because we know that even the oldest
      controller, Light Ridge (2010), is able to suspend its ports to D3 just
      fine -- specifically including its hotplug ports.  And the power saving
      should be afforded to machines even if their BIOS predates 2015.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Andreas Noever <andreas.noever@gmail.com>
    • PCI: Whitelist native hotplug ports for runtime D3 · eb3b5bf1
      Lukas Wunner authored
      Previously we blacklisted PCIe hotplug ports for runtime D3 because:
      
      (a) Ports handled by the firmware must not be transitioned to D3 by the
          OS behind the firmware's back:
          https://bugzilla.kernel.org/show_bug.cgi?id=53811
      
      (b) Ports handled natively by the OS lacked runtime D3 support in the
          pciehp driver.
      
      We've just rectified the latter, so allow users to manually enable and
      test it by passing pcie_port_pm=force on the command line.  Vendors are
      thus put in a position to validate hotplug ports for runtime D3 and
      perhaps we can someday enable it by default, but with a BIOS cutoff date.
      
      Ashok Raj tested runtime D3 on hotplug ports of a SkyLake Xeon-SP in
      2017 and encountered Hardware Error NMIs, so this feature clearly cannot
      be enabled for everyone yet:
      https://lkml.kernel.org/r/20170503180426.GA4058@otc-nc-03
      
      While at it, remove an erroneous code comment I added with 97a90aee
      ("PCI: Consolidate conditions to allow runtime PM on PCIe ports") which
      claims that parents of a hotplug port must stay awake lest interrupts
      cannot be delivered.  That has turned out to be wrong at least for
      Thunderbolt hotplug ports.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
    • PCI: sysfs: Resume to D0 on function reset · 82c3fbff
      Lukas Wunner authored
      When performing a function reset via sysfs, the device's config space is
      accessed in places such as pcie_flr() and its MMIO space is accessed e.g.
      in reset_ivb_igd(), so ensure accessibility by resuming the device to D0.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
    • PCI: pciehp: Resume parent to D0 on config space access · 4417aa45
      Lukas Wunner authored
      Ensure accessibility of a hotplug port's config space when accessed via
      sysfs by resuming its parent to D0.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
    • PCI: pciehp: Resume to D0 on enable/disable · 83503074
      Lukas Wunner authored
      pciehp's IRQ thread ensures accessibility of the port by runtime resuming
      its parent to D0.  However when the slot is enabled/disabled, the port
      itself needs to be in D0 because its secondary bus is accessed in:
      
          pciehp_check_link_status(),
          pciehp_configure_device() (both called from board_added())
      and
          pciehp_unconfigure_device() (called from remove_board()).
      
      Thus, acquire a runtime PM ref on enable/disablement of the slot.
      
      Yinghai Lu additionally discovered that some SkyLake servers feature a
      Power Controller for their PCIe hotplug ports (PCIe r3.1, sec 6.7.1.8)
      which requires the port to be in D0 when invoking
      
          pciehp_power_on_slot() (likewise called from board_added()).
      
      If slot power is turned on while in D3hot, link training later fails:
      https://lkml.kernel.org/r/20170205073454.GA253@wunner.de
      
      The spec is silent about such a requirement, but it seems prudent to
      assume that any hotplug port with a Power Controller may need this.
      
      The present commit holds a runtime PM ref whenever slot power is turned
      on and off, but it doesn't keep the port in D0 as long as slot power is
      on.  If vendors determine that's necessary, they need to amend pciehp to
      acquire a runtime PM ref in pciehp_power_on_slot() and release one in
      pciehp_power_off_slot().
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>