1. 23 Jul, 2018 21 commits
    • PCI: pciehp: Always enable occupied slot on probe · cdf6b736
      Lukas Wunner authored
      Per PCIe r4.0, sec 6.7.3.4, a "port may optionally send an MSI when
      there are hot-plug events that occur while interrupt generation is
      disabled, and interrupt generation is subsequently enabled."
      
      On probe, we currently clear all event bits in the Slot Status register
      with the notable exception of the Presence Detect Changed bit.  Thereby
      we seek to receive an interrupt for an already occupied slot once event
      notification is enabled.
      
      But because the interrupt is optional, users may have to specify the
      pciehp_force parameter on the command line, which is inconvenient.
      
      Moreover, now that pciehp's event handling has become resilient to
      missed events, a Presence Detect Changed interrupt for a slot which is
      powered on is interpreted as removal of the card.  If the slot has
      already been brought up by the BIOS, receiving such an interrupt on
      probe causes the slot to be powered off and immediately back on, which
      is likewise undesirable.
      
      Avoid both issues by making the behavior of pciehp_force the default and
      clearing the Presence Detect Changed bit on probe.
      
      Note that the stated purpose of pciehp_force per the MODULE_PARM_DESC
      ("Force pciehp, even if OSHP is missing") seems nonsensical because the
      OSHP control method is only relevant for SHPC slots according to the
      PCI Firmware specification r3.0, sec 4.8.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
    • PCI: pciehp: Become resilient to missed events · d331710e
      Lukas Wunner authored
      A hotplug port's Slot Status register does not count how often each type
      of event occurred, it only records the fact *that* an event has occurred.
      
      Previously pciehp queued a work item for each event.  But if it missed
      an event, e.g. removal of a card in-between two back-to-back insertions,
      it queued up the wrong work item or no work item at all.  Commit
      fad214b0 ("PCI: pciehp: Process all hotplug events before looking
      for new ones") sought to improve the situation by shrinking the window
      during which events may be missed.
      
      But Stefan Roese reports unbalanced Card present and Link Up events,
      suggesting that we're still missing events if they occur very rapidly.
      Bjorn Helgaas responds that he considers pciehp's event handling
      "baroque" and calls for its simplification and rationalization:
      https://lkml.kernel.org/r/20180202192045.GA53759@bhelgaas-glaptop.roam.corp.google.com
      
      It gets worse once a hotplug port is runtime suspended:  The port can
      signal an interrupt while it and its parents are in D3hot, i.e. while
      it is inaccessible.  By the time we've runtime resumed all parents to D0
      and read the port's Slot Status register, we may have missed an arbitrary
      number of events.  Event handling therefore needs to be reworked to
      become resilient to missed events.
      
      Assume that a Presence Detect Changed event has occurred.
      Consider the following truth table:
      - Slot is in OFF_STATE and is currently empty.    => Do nothing.
        (The event is trailing a Link Down or we've
        missed an insertion and subsequent removal.)
      - Slot is in OFF_STATE and is currently occupied. => Turn the slot on.
      - Slot is in ON_STATE  and is currently empty.    => Turn the slot off.
      - Slot is in ON_STATE  and is currently occupied. => Turn the slot off,
        (Be cautious and assume the card in                then back on.
        the slot isn't the same as before.)
      
      This leads to the following simple algorithm:
      1 If the slot is in ON_STATE, turn it off unconditionally.
      2 If the slot is currently occupied, turn it on.
      
      Because those actions are now carried out synchronously, rather than by
      scheduled work items, pciehp reacts to the *current* situation and
      missed events no longer matter.
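      
      The two steps can be sketched in plain userspace C.  This is a minimal
      illustration, not the driver's code: struct toy_slot, the action flags
      and toy_handle_presence_or_link_change() are hypothetical names, and
      powering the slot is reduced to a state change.

```c
#include <stdbool.h>

/* Illustrative slot states; the names mirror the commit message. */
enum slot_state { OFF_STATE, ON_STATE };

/* Bit flags describing what the handler decided to do (hypothetical). */
enum slot_action { ACT_NONE = 0, ACT_TURN_OFF = 1, ACT_TURN_ON = 2 };

struct toy_slot {
	enum slot_state state;
};

/*
 * React to a Presence Detect Changed (or Data Link Layer State Changed)
 * event based on the current occupancy, not on the event history.
 * Returns the ORed actions taken so that callers can inspect them.
 */
int toy_handle_presence_or_link_change(struct toy_slot *slot, bool occupied)
{
	int actions = ACT_NONE;

	/* Step 1: if the slot is on, turn it off unconditionally. */
	if (slot->state == ON_STATE) {
		slot->state = OFF_STATE;
		actions |= ACT_TURN_OFF;
	}

	/* Step 2: if the slot is currently occupied, turn it on. */
	if (occupied) {
		slot->state = ON_STATE;
		actions |= ACT_TURN_ON;
	}

	return actions;
}
```

      Feeding the four rows of the truth table into this function yields no
      action, on, off, and off-then-on respectively, regardless of how many
      events were missed beforehand.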
      
      Data Link Layer State Changed events can be handled identically to
      Presence Detect Changed events.  Note that in the above truth table,
      a Link Up trailing a Card present event didn't have to be accounted for:
      It is filtered out by pciehp_check_link_status().
      
      As for Attention Button Pressed events, PCIe r4.0, sec 6.7.1.5 says:
      "Once the Power Indicator begins blinking, a 5-second abort interval
      exists during which a second depression of the Attention Button cancels
      the operation."  In other words, the user can only expect the system to
      react to a button press after it starts blinking.  Missed button presses
      that occur in-between are irrelevant.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Stefan Roese <sr@denx.de>
      Cc: Mayurkumar Patel <mayurkumar.patel@intel.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
    • PCI: pciehp: Tolerate initially unstable link · 6c35a1ac
      Lukas Wunner authored
      When a device is hotplugged, Presence Detect and Link Up events often do
      not occur simultaneously, but with a lag of a few milliseconds.  Only
      the first event received is relevant, the other one can be disregarded.
      
      Moreover, Stefan Roese reports that on certain platforms, Link State and
      Presence Detect may flap for up to 100 ms before stabilizing, suggesting
      that such events should be disregarded for at least this long:
      https://lkml.kernel.org/r/20180130084121.18653-1-sr@denx.de
      
      On slot enablement, pciehp_check_link_status() waits for 100 ms per
      PCIe r4.0, sec 6.7.3.3, then probes the hotplugged device's vendor
      register for up to 1 second.
      
      If this succeeds, the link is definitely up, so ignore any Presence
      Detect or Link State events that occurred up to this point.
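      
      Ignoring those events could amount to clearing the corresponding bits
      in a mask of pending events.  A minimal sketch, assuming the pending
      events are kept in an atomic bit mask: toy_discard_flap_events() is a
      hypothetical name and C11 atomics stand in for the kernel's primitives.
      The bit values are those of the Slot Status register.

```c
#include <stdatomic.h>

#define PCI_EXP_SLTSTA_ABP   0x0001u /* Attention Button Pressed */
#define PCI_EXP_SLTSTA_PDC   0x0008u /* Presence Detect Changed */
#define PCI_EXP_SLTSTA_DLLSC 0x0100u /* Data Link Layer State Changed */

/*
 * Once the link is known to be up, drop any presence or link change
 * events that accumulated while the link was still settling.  Other
 * pending events (e.g. a button press) are left untouched.
 */
void toy_discard_flap_events(atomic_uint *pending_events)
{
	atomic_fetch_and(pending_events,
			 ~(PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC));
}
```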
      
      pciehp_check_link_status() then checks the Link Training bit in the
      Link Status register.  This is the final opportunity to detect
      inaccessibility of the device and abort slot enablement.  Any link
      or presence change that occurs afterwards will cause the slot to be
      disabled again immediately after attempting to enable it.
      
      The astute reviewer may appreciate that achieving this behavior would be
      more complicated had pciehp not just been converted to enable/disable
      the slot exclusively from the IRQ thread:  When the slot is enabled via
      sysfs, each link or presence flap would otherwise cause the IRQ thread
      to run and it would have to sense that those events belong to a
      concurrent slot enablement operation and disregard them.  It would be
      much more difficult than this mere 3-line change.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Stefan Roese <sr@denx.de>
    • PCI: pciehp: Declare pciehp_enable/disable_slot() static · 25c83b84
      Lukas Wunner authored
      No callers of pciehp_enable/disable_slot() outside of pciehp_ctrl.c
      remain, so declare the functions static.  For now this requires forward
      declarations.  Those can be eliminated by reshuffling functions once the
      ongoing effort to refactor the driver has settled.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    • PCI: pciehp: Drop enable/disable lock · 1656716d
      Lukas Wunner authored
      Previously slot enablement and disablement could happen concurrently.
      But now it's under the exclusive control of the IRQ thread, rendering
      the locking obsolete.  Drop it.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    • PCI: pciehp: Enable/disable exclusively from IRQ thread · 32a8cef2
      Lukas Wunner authored
      Besides the IRQ thread, there are several other places in the driver
      which enable or disable the slot:
      
      - pciehp_probe() enables the slot if it's occupied and the pciehp_force
        module parameter is used.
      
      - pciehp_resume() enables or disables the slot after system sleep.
      
      - pciehp_queue_pushbutton_work() enables or disables the slot after the
        5 second delay following an Attention Button press.
      
      - pciehp_sysfs_enable_slot() and pciehp_sysfs_disable_slot() enable or
        disable the slot on sysfs write.
      
      This requires locking and complicates pciehp's state machine.
      
      A simplification can be achieved by enabling and disabling the slot
      exclusively from the IRQ thread.
      
      Amend the functions listed above to request slot enable/disablement from
      the IRQ thread by either synthesizing a Presence Detect Changed event or,
      in the case of a disable user request (via sysfs or an Attention Button
      press), submitting a newly introduced force disable request.  The latter
      is needed because the slot shall be forced off despite being occupied.
      For this force disable request, avoid colliding with Slot Status register
      bits by using a bit number greater than 16.
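      
      Why a high bit avoids collisions can be shown with a simplified model
      of the thread handler.  Everything prefixed with toy_ or TOY_ is
      hypothetical, including the exact bit chosen for the request; the
      sketch only relies on the Slot Status register being 16 bits wide.

```c
#include <stdbool.h>

#define PCI_EXP_SLTSTA_PDC    0x0008u    /* Presence Detect Changed */
#define TOY_SLTSTA_BITS       0xffffu    /* the register is 16 bits wide */
#define TOY_REQ_FORCE_DISABLE (1u << 16) /* hypothetical software-only bit */

enum toy_state { TOY_OFF, TOY_ON };

/*
 * Simplified event processing: a force disable request wins over
 * occupancy, whereas a real or synthesized Presence Detect Changed
 * event makes the slot follow the current occupancy.
 */
enum toy_state toy_process_events(enum toy_state state, unsigned int events,
				  bool occupied)
{
	if (events & TOY_REQ_FORCE_DISABLE)
		return TOY_OFF;	/* forced off despite being occupied */

	if (events & PCI_EXP_SLTSTA_PDC)
		return occupied ? TOY_ON : TOY_OFF;

	return state;		/* nothing relevant happened */
}
```

      Because the request bit lies outside TOY_SLTSTA_BITS, a value read from
      the hardware register can never be mistaken for a force disable
      request.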
      
      For synchronous execution of requests (on sysfs write), wait for the
      request to finish and retrieve the result.  There can only ever be one
      sysfs write in flight due to the locking in kernfs_fop_write(), hence
      there is no risk of returning the result of a different sysfs request to
      user space.
      
      POWERON_STATE and POWEROFF_STATE are now no longer entered by the
      above-listed functions, but solely by the IRQ thread when it begins a
      power transition.  Afterwards, it moves to STATIC_STATE.  The same
      applies to canceling the Attention Button work, which likewise becomes
      an IRQ thread only operation.
      
      An immediate consequence is that POWERON_STATE and POWEROFF_STATE are
      never observed by the IRQ thread itself, only by functions called in a
      different context, such as pciehp_sysfs_enable_slot().  So remove
      handling of these states from pciehp_handle_button_press() and
      pciehp_handle_link_change(), which are exclusively called from the IRQ
      thread.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    • PCI: pciehp: Track enable/disable status · 9590192f
      Lukas Wunner authored
      handle_button_press_event() currently determines whether the slot has
      been turned on or off by looking at the Power Controller Control bit in
      the Slot Control register.  This assumes that an attention button
      implies presence of a power controller even though that's not mandated
      by the spec.  Moreover the Power Controller Control bit is unreliable
      when a power fault occurs (PCIe r4.0, sec 6.7.1.8).  This issue has
      existed since the driver was introduced in 2004.
      
      Fix by replacing STATIC_STATE with ON_STATE and OFF_STATE and tracking
      whether the slot has been turned on or off.  This is also a required
      ingredient to make pciehp resilient to missed events, which is the
      object of an upcoming commit.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    • PCI: pciehp: Publish to user space last on probe · 774d446b
      Lukas Wunner authored
      The PCI hotplug core has just been refactored to separate slot
      initialization for in-kernel use from publication to user space.
      
      Take advantage of it in pciehp by publishing to user space last on
      probe.  This will allow enable/disablement of the slot exclusively from
      the IRQ thread because the IRQ is requested after initialization for
      in-kernel use (thereby getting its unique name needed by the IRQ thread)
      but before user space is able to submit enable/disable requests.
      
      On teardown, the order is the same in reverse:  The user space interface
      is removed prior to freeing the IRQ and destroying the slot.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    • PCI: hotplug: Demidlayer registration with the core · 51bbf9be
      Lukas Wunner authored
      When a hotplug driver calls pci_hp_register(), all steps necessary for
      registration are carried out in one go, including creation of a kobject
      and addition to sysfs.  That's a problem for pciehp once it's converted
      to enable/disable the slot exclusively from the IRQ thread:  The thread
      needs to be spawned after creation of the kobject (because it uses the
      kobject's name), but before addition to sysfs (because it will handle
      enable/disable requests submitted via sysfs).
      
      pci_hp_deregister() does offer a ->release callback that's invoked
      after deletion from sysfs and before destruction of the kobject.  But
      because pci_hp_register() doesn't offer a counterpart, hotplug drivers'
      ->probe and ->remove code becomes asymmetric, which is error prone
      as recently discovered use-after-free bugs in pciehp's ->remove hook
      have shown.
      
      In a sense, this appears to be a case of the midlayer antipattern:
      
         "The core thesis of the "midlayer mistake" is that midlayers are
          bad and should not exist.  That common functionality which it is
          so tempting to put in a midlayer should instead be provided as
          library routines which can [be] used, augmented, or ignored by
          each bottom level driver independently.  Thus every subsystem
          that supports multiple implementations (or drivers) should
          provide a very thin top layer which calls directly into the
          bottom layer drivers, and a rich library of support code that
          eases the implementation of those drivers.  This library is
          available to, but not forced upon, those drivers."
              --  Neil Brown (2009), https://lwn.net/Articles/336262/
      
      The presence of midlayer traits in the PCI hotplug core might be ascribed
      to its age:  When it was introduced in February 2002, the blessings of a
      library approach might not have been well known:
      https://git.kernel.org/tglx/history/c/a8a2069f432c
      
      For comparison, the driver core does offer split functions for creating
      a kobject (device_initialize()) and addition to sysfs (device_add()) as
      an alternative to carrying out everything at once (device_register()).
      This was introduced in October 2002:
      https://git.kernel.org/tglx/history/c/8b290eb19962
      
      The odd ->release callback in the PCI hotplug core was added in 2003:
      https://git.kernel.org/tglx/history/c/69f8d663b595
      
      Clearly, a library approach would not force every hotplug driver to
      implement a ->release callback, but rather allow the driver to remove
      the sysfs files, release its data structures and finally destroy the
      kobject.  Alternatively, a driver may choose to remove everything with
      pci_hp_deregister(), then release its data structures.
      
      To this end, offer drivers pci_hp_initialize() and pci_hp_add() as a
      split-up version of pci_hp_register().  Likewise, offer pci_hp_del()
      and pci_hp_destroy() as a split-up version of pci_hp_deregister().
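      
      How a driver might order its probe and teardown around such split-up
      calls can be modeled as follows.  This is a toy sketch under stated
      assumptions: the toy_ functions and struct toy_hp_slot are hypothetical
      stand-ins that only track flags; they are not the PCI hotplug core API.

```c
#include <stdbool.h>

struct toy_hp_slot {
	bool initialized;	/* stands for: kobject created, name available */
	bool irq_requested;	/* stands for: IRQ thread spawned */
	bool published;		/* stands for: visible to user space via sysfs */
};

int toy_hp_initialize(struct toy_hp_slot *s) { s->initialized = true; return 0; }
void toy_hp_destroy(struct toy_hp_slot *s) { s->initialized = false; }

int toy_hp_add(struct toy_hp_slot *s)
{
	if (!s->initialized)
		return -1;
	s->published = true;
	return 0;
}

void toy_hp_del(struct toy_hp_slot *s) { s->published = false; }

/* The IRQ needs the slot name, but must exist before user space can act. */
int toy_request_irq(struct toy_hp_slot *s)
{
	if (!s->initialized || s->published)
		return -1;
	s->irq_requested = true;
	return 0;
}

void toy_free_irq(struct toy_hp_slot *s) { s->irq_requested = false; }

/* Probe: initialize, request the IRQ, publish to user space last. */
int toy_probe(struct toy_hp_slot *s)
{
	if (toy_hp_initialize(s))
		return -1;
	if (toy_request_irq(s))
		goto err_destroy;
	if (toy_hp_add(s))
		goto err_free_irq;
	return 0;

err_free_irq:
	toy_free_irq(s);
err_destroy:
	toy_hp_destroy(s);
	return -1;
}

/* Teardown mirrors probe in reverse order. */
void toy_remove(struct toy_hp_slot *s)
{
	toy_hp_del(s);
	toy_free_irq(s);
	toy_hp_destroy(s);
}
```

      With a single all-in-one registration call there would be no point at
      which toy_request_irq() could succeed in this model, which is the
      asymmetry the split is meant to resolve.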
      
      Eliminate the ->release callback and move its code into each driver's
      teardown routine.
      
      Declare pci_hp_deregister() void, in keeping with the usual kernel
      pattern that enablement can fail, but disablement cannot.  It only
      returned an error if the caller passed in a NULL pointer or a slot which
      has never or is no longer registered or is sharing its name with another
      slot.  Those would be bugs, so WARN about them.  Few hotplug drivers
      actually checked the return value and those that did only printed a
      useless error message to dmesg.  Remove that.
      
      For most drivers the conversion was straightforward since it doesn't
      matter whether the code in the ->release callback is executed before or
      after destruction of the kobject.  But in the case of ibmphp, it was
      unclear to me whether setting slot_cur->ctrl and slot_cur->bus_on to
      NULL needs to happen before the kobject is destroyed, so I erred on
      the side of caution and ensured that the order stays the same.  Another
      nontrivial case is pnv_php:  I've found the list and kref logic
      difficult to understand, however my impression was that it is safe to
      defer deleting the list element and dropping the references until after
      the kobject is destroyed.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>  # drivers/platform/x86
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Scott Murray <scott@spiteful.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Corentin Chary <corentin.chary@gmail.com>
      Cc: Darren Hart <dvhart@infradead.org>
      Cc: Andy Shevchenko <andy@infradead.org>
    • PCI: pciehp: Drop slot workqueue · 55a6b7a6
      Lukas Wunner authored
      Previously the slot workqueue was used to handle events and enable or
      disable the slot.  That's no longer the case as those tasks are done
      synchronously in the IRQ thread.  The slot workqueue is thus merely used
      to handle a button press after the 5 second delay and only one such work
      item may be in flight at any given time.  A separate workqueue isn't
      necessary for this simple task, so use the system workqueue instead.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    • PCI: pciehp: Handle events synchronously · 0e94916e
      Lukas Wunner authored
      Up until now, pciehp's IRQ handler schedules a work item for each event,
      which in turn schedules a work item to enable or disable the slot.  This
      double indirection was necessary because sleeping wasn't allowed in the
      IRQ handler.
      
      However, sleeping is allowed now that pciehp has been converted to
      threaded IRQ handling and polling, so handle events synchronously in
      pciehp_ist() and remove the work item infrastructure (with the
      exception of work items to handle a button press after the 5 second
      delay).
      
      For link or presence change events, move the register read to determine
      the current link or presence state behind acquisition of the slot lock
      to prevent it from becoming stale while the lock is contended.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    • PCI: pciehp: Stop blinking on slot enable failure · b0ccd9dd
      Lukas Wunner authored
      If the attention button is pressed to power on the slot AND the user
      powers on the slot via sysfs before 5 seconds have elapsed AND powering
      on the slot fails because either the slot is unoccupied OR the latch is
      open, we neglect turning off the green LED so it keeps on blinking.
      
      That's because the error path of pciehp_sysfs_enable_slot() doesn't call
      pciehp_green_led_off(), unlike pciehp_power_thread() which does.
      The bug has been present since 2004 when the driver was introduced.
      
      Fix by deduplicating common code in pciehp_sysfs_enable_slot() and
      pciehp_power_thread() into a wrapper function pciehp_enable_slot() and
      renaming the existing function to __pciehp_enable_slot().  Same for
      pciehp_disable_slot().  This will also simplify the upcoming rework of
      pciehp's event handling.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    • PCI: pciehp: Convert to threaded polling · ec07a447
      Lukas Wunner authored
      We've just converted pciehp to threaded IRQ handling, but still cannot
      sleep in pciehp_ist() because the function is also called in poll mode,
      which runs in softirq context (from a timer).
      
      Convert poll mode to a kthread so that pciehp_ist() always runs in task
      context.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
    • PCI: pciehp: Convert to threaded IRQ · 7b4ce26b
      Lukas Wunner authored
      pciehp's IRQ handler queues up a work item for each event signaled by
      the hardware.  A more modern alternative is to let a long running
      kthread service the events.  The IRQ handler's sole job is then to check
      whether the IRQ originated from the device in question, acknowledge its
      receipt to the hardware to quiesce the interrupt and wake up the kthread.
      
      One benefit is reduced latency to handle the IRQ, which is a necessity
      for realtime environments.  Another benefit is that we can make pciehp
      simpler and more robust by handling events synchronously in process
      context, rather than asynchronously by queueing up work items.  pciehp's
      usage of work items is a historic artifact, it predates the introduction
      of threaded IRQ handlers by two years.  (The former was introduced in
      2007 with commit 5d386e1a ("pciehp: Event handling rework"), the
      latter in 2009 with commit 3aa551c9 ("genirq: add threaded interrupt
      handler support").)
      
      Convert pciehp to threaded IRQ handling by retrieving the pending events
      in pciehp_isr(), saving them for later consumption by the thread handler
      pciehp_ist() and clearing them in the Slot Status register.
      
      By clearing the Slot Status (and thereby acknowledging the events) in
      pciehp_isr(), we can avoid requesting the IRQ with IRQF_ONESHOT, which
      would have the unpleasant side effect of starving devices sharing the
      IRQ until pciehp_ist() has finished.
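      
      A rough userspace model of this split, assuming C11 atomics in place of
      the kernel's primitives:  toy_isr(), toy_ist() and struct toy_ctrl are
      hypothetical stand-ins, and the Slot Status register is modeled as a
      plain variable.  The bit values are those of the Slot Status register.

```c
#include <stdatomic.h>
#include <stdint.h>

#define PCI_EXP_SLTSTA_PDC   0x0008u /* Presence Detect Changed */
#define PCI_EXP_SLTSTA_DLLSC 0x0100u /* Data Link Layer State Changed */

struct toy_ctrl {
	uint16_t slot_status;		/* stand-in for the hardware register */
	atomic_uint pending_events;	/* events saved for the thread */
};

/*
 * Hard IRQ part: read the events, acknowledge them and save them for the
 * thread.  Returns 1 if the thread should be woken, 0 if the interrupt
 * did not originate from this device.
 */
int toy_isr(struct toy_ctrl *ctrl)
{
	uint16_t events = ctrl->slot_status;

	if (!events)
		return 0;

	ctrl->slot_status = 0;	/* models clearing the Slot Status bits */
	atomic_fetch_or(&ctrl->pending_events, events);
	return 1;
}

/* Thread part: atomically fetch and reset the accumulated events. */
unsigned int toy_ist(struct toy_ctrl *ctrl)
{
	return atomic_exchange(&ctrl->pending_events, 0);
}
```

      In this model, two Presence Detect Changed events that arrive before
      toy_ist() runs collapse into a single pending bit.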
      
      pciehp_isr() does not count how many times each event occurred, but
      merely records the fact *that* an event occurred.  If the same event
      occurs a second time before pciehp_ist() is woken, that second event
      will not be recorded separately, which is problematic according to
      commit fad214b0 ("PCI: pciehp: Process all hotplug events before
      looking for new ones") because we may miss removal of a card in-between
      two back-to-back insertions.  We're about to make pciehp_ist() resilient
      to missed events.  The present commit regresses the driver's behavior
      temporarily in order to separate the changes into reviewable chunks.
      This doesn't affect regular slow-motion hotplug, only plug-unplug-plug
      operations that happen in a timespan shorter than wakeup of the IRQ
      thread.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Mayurkumar Patel <mayurkumar.patel@intel.com>
      Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
    • PCI: pciehp: Document struct slot and struct controller · 4aed1cd6
      Lukas Wunner authored
      Document the driver's data structures to lower the barrier to entry for
      contributors.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    • PCI: pciehp: Declare pciehp_unconfigure_device() void · 1d2e2673
      Lukas Wunner authored
      Since commit 0f4bd801 ("PCI: hotplug: Drop checking of
      PCI_BRIDGE_CONTROL in *_unconfigure_device()"), pciehp_unconfigure_device()
      can no longer fail, so declare it and its sole caller remove_board() void, in
      keeping with the usual kernel pattern that enablement can fail, but
      disablement cannot.  No functional change intended.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
    • PCI: pciehp: Drop unnecessary NULL pointer check · 6641311d
      Lukas Wunner authored
      pciehp_disable_slot() checks if the ctrl attribute of the slot is NULL
      and bails out if so.  However the function is not called prior to the
      attribute being set in pcie_init_slot(), and pcie_init_slot() is not
      called if ctrl is NULL.  So the check is unnecessary.  Drop it.
      
      It has been present ever since the driver was introduced in 2004, but it
      was already unnecessary back then:
      https://git.kernel.org/tglx/history/c/c16b4b14d980
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    • PCI: pciehp: Fix unprotected list iteration in IRQ handler · 1204e35b
      Lukas Wunner authored
      Commit b440bde7 ("PCI: Add pci_ignore_hotplug() to ignore hotplug
      events for a device") iterates over the devices on a hotplug port's
      subordinate bus in pciehp's IRQ handler without acquiring pci_bus_sem.
      It is thus possible for a user to cause a crash by concurrently
      manipulating the device list, e.g. by disabling slot power via sysfs
      on a different CPU or by initiating a remove/rescan via sysfs.
      
      This can't be fixed by acquiring pci_bus_sem because it may sleep.
      The simplest fix is to avoid the list iteration altogether and just
      check the ignore_hotplug flag on the port itself.  This works because
      pci_ignore_hotplug() sets the flag both on the device as well as on its
      parent bridge.
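      
      The idea can be illustrated with a toy model in which the flag is
      propagated to the parent bridge, so that the port only has to look at
      itself.  struct toy_pci_dev and both functions are hypothetical
      simplifications, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_pci_dev {
	struct toy_pci_dev *parent_bridge;	/* NULL if there is none */
	bool ignore_hotplug;
};

/* Mark a device, and the bridge above it, as ignoring hotplug events. */
void toy_ignore_hotplug(struct toy_pci_dev *dev)
{
	dev->ignore_hotplug = true;
	if (dev->parent_bridge)
		dev->parent_bridge->ignore_hotplug = true;
}

/*
 * Check performed for the hotplug port: no iteration over the devices
 * below the port, hence no list to protect and no lock to take.
 */
bool toy_port_should_ignore_events(const struct toy_pci_dev *port)
{
	return port->ignore_hotplug;
}
```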
      
      We do lose the ability to print the name of the device blocking hotplug
      in the debug message, but that's probably bearable.
      
      Fixes: b440bde7 ("PCI: Add pci_ignore_hotplug() to ignore hotplug events for a device")
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org
    • PCI: pciehp: Fix use-after-free on unplug · 281e878e
      Lukas Wunner authored
      When pciehp is unbound (e.g. on unplug of a Thunderbolt device), the
      hotplug_slot struct is deregistered and thus freed before freeing the
      IRQ.  The IRQ handler and the work items it schedules print the slot
      name referenced from the freed structure in various informational and
      debug log messages, each time resulting in a quadruple dereference of
      freed pointers (hotplug_slot -> pci_slot -> kobject -> name).
      
      At best the slot name is logged as "(null)", at worst kernel memory is
      exposed in logs or the driver crashes:
      
        pciehp 0000:10:00.0:pcie204: Slot((null)): Card not present
      
      An attacker may provoke the bug by unplugging multiple devices on a
      Thunderbolt daisy chain at once.  Unplugging can also be simulated by
      powering down slots via sysfs.  The bug is particularly easy to trigger
      in poll mode.
      
      It has been present since the driver's introduction in 2004:
      https://git.kernel.org/tglx/history/c/c16b4b14d980
      
      Fix by rearranging teardown such that the IRQ is freed first.  Run the
      work items queued by the IRQ handler to completion before freeing the
      hotplug_slot struct by draining the work queue from the ->release_slot
      callback which is invoked by pci_hp_deregister().
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org # v2.6.4
    • PCI: hotplug: Don't leak pci_slot on registration failure · 4ce64358
      Lukas Wunner authored
      If addition of sysfs files fails on registration of a hotplug slot, the
      struct pci_slot as well as the entry in the slot_list is leaked.  The
      issue has been present since the hotplug core was introduced in 2002:
      https://git.kernel.org/tglx/history/c/a8a2069f432c
      
      Perhaps the idea was that even though sysfs addition fails, the slot
      should still be usable.  But that's not how drivers use the interface,
      they abort probe if a non-zero value is returned.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org # v2.4.15+
      Cc: Greg Kroah-Hartman <greg@kroah.com>
    • PCI: hotplug: Delete skeleton driver · b4efce5c
      Lukas Wunner authored
      Ten years ago, commit 58319b80 ("PCI: Hotplug core: remove 'name'")
      dropped the name element from struct hotplug_slot but neglected to update
      the skeleton driver.
      
      That same year, commit f46753c5 ("PCI: introduce pci_slot") raised the
      number of arguments to pci_hp_register() from one to four.
      
      Fourteen years ago, historic commit 7ab60fc1 ("PCI Hotplug skeleton:
      final cleanups") removed all usages of the retval variable from
      pcihp_skel_init() but not the variable itself, provoking a compiler
      warning: https://git.kernel.org/tglx/history/c/7ab60fc1b8e7
      
      It seems fair to assume the driver hasn't been used as a template for a new
      driver in a while.  Per Bjorn's and Christoph's preference, delete it.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Christoph Hellwig <hch@lst.de>
  2. 26 Jun, 2018 2 commits
  3. 16 Jun, 2018 8 commits
    • Linux 4.18-rc1 · ce397d21
      Linus Torvalds authored
    • Merge tag 'for-linus-20180616' of git://git.kernel.dk/linux-block · 265c5596
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A collection of fixes that should go into -rc1. This contains:
      
         - bsg_open vs bsg_unregister race fix (Anatoliy)
      
         - NVMe pull request from Christoph, with fixes for regressions in
           this window, FC connect/reconnect path code unification, and a
           trace point addition.
      
         - timeout fix (Christoph)
      
         - remove a few unused functions (Christoph)
      
         - blk-mq tag_set reinit fix (Roman)"
      
      * tag 'for-linus-20180616' of git://git.kernel.dk/linux-block:
        bsg: fix race of bsg_open and bsg_unregister
        block: remov blk_queue_invalidate_tags
        nvme-fabrics: fix and refine state checks in __nvmf_check_ready
        nvme-fabrics: handle the admin-only case properly in nvmf_check_ready
        nvme-fabrics: refactor queue ready check
        blk-mq: remove blk_mq_tagset_iter
        nvme: remove nvme_reinit_tagset
        nvme-fc: fix nulling of queue data on reconnect
        nvme-fc: remove reinit_request routine
        blk-mq: don't time out requests again that are in the timeout handler
        nvme-fc: change controllers first connect to use reconnect path
        nvme: don't rely on the changed namespace list log
        nvmet: free smart-log buffer after use
        nvme-rdma: fix error flow during mapping request data
        nvme: add bio remapping tracepoint
        nvme: fix NULL pointer dereference in nvme_init_subsystem
        blk-mq: reinit q->tag_set_list entry only after grace period
      265c5596
    • Linus Torvalds's avatar
      Merge tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimental · 5e7b9212
      Linus Torvalds authored
      Pull documentation fixes from Mauro Carvalho Chehab:
       "This solves a series of broken links for files under Documentation,
        and improves a script meant to detect such broken links (see
        scripts/documentation-file-ref-check).
      
        The changes on this series are:
      
         - can.rst: fix a footnote reference;
      
         - crypto_engine.rst: Fix two parsing warnings;
      
         - Fix a lot of broken references to Documentation/*;
      
         - improve the scripts/documentation-file-ref-check script, in order
           to help detecting/fixing broken references, preventing
           false-positives.
      
        After this patch series, only 33 broken references to doc files are
        detected by scripts/documentation-file-ref-check"
      
      * tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimental: (26 commits)
        fix a series of Documentation/ broken file name references
        Documentation: rstFlatTable.py: fix a broken reference
        ABI: sysfs-devices-system-cpu: remove a broken reference
        devicetree: fix a series of wrong file references
        devicetree: fix name of pinctrl-bindings.txt
        devicetree: fix some bindings file names
        MAINTAINERS: fix location of DT npcm files
        MAINTAINERS: fix location of some display DT bindings
        kernel-parameters.txt: fix pointers to sound parameters
        bindings: nvmem/zii: Fix location of nvmem.txt
        docs: Fix more broken references
        scripts/documentation-file-ref-check: check tools/*/Documentation
        scripts/documentation-file-ref-check: get rid of false-positives
        scripts/documentation-file-ref-check: hint: dash or underline
        scripts/documentation-file-ref-check: add a fix logic for DT
        scripts/documentation-file-ref-check: accept more wildcards at filenames
        scripts/documentation-file-ref-check: fix help message
        media: max2175: fix location of driver's companion documentation
        media: v4l: fix broken video4linux docs locations
        media: dvb: point to the location of the old README.dvb-usb file
        ...
      5e7b9212
    • Linus Torvalds's avatar
      Merge tag 'fsnotify_for_v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · dbb2816f
      Linus Torvalds authored
      Pull fsnotify updates from Jan Kara:
       "fsnotify cleanups unifying handling of different watch types.
      
        This is the shortened fsnotify series from Amir with the last five
        patches pulled out. Amir has modified those patches to not change
        struct inode but obviously it's too late for those to go into this
        merge window"
      
      * tag 'fsnotify_for_v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        fsnotify: add fsnotify_add_inode_mark() wrappers
        fanotify: generalize fanotify_should_send_event()
        fsnotify: generalize send_to_group()
        fsnotify: generalize iteration of marks by object type
        fsnotify: introduce marks iteration helpers
        fsnotify: remove redundant arguments to handle_event()
        fsnotify: use type id to identify connector object type
      dbb2816f
    • Linus Torvalds's avatar
      Merge tag 'fbdev-v4.18' of git://github.com/bzolnier/linux · 644f2639
      Linus Torvalds authored
      Pull fbdev updates from Bartlomiej Zolnierkiewicz:
       "There is nothing really major here, few small fixes, some cleanups and
        dead drivers removal:
      
         - mark omapfb drivers as orphans in MAINTAINERS file (Tomi Valkeinen)
      
         - add missing module license tags to omap/omapfb driver (Arnd
           Bergmann)
      
         - add missing GPIOLIB dependency to omap2/omapfb driver (Arnd
           Bergmann)
      
         - convert savagefb, aty128fb & radeonfb drivers to use msleep & co.
           (Jia-Ju Bai)
      
         - allow COMPILE_TEST build for viafb driver (media part was reviewed
           by media subsystem Maintainer)
      
         - remove unused MERAM support from sh_mobile_lcdcfb and shmob-drm
           drivers (drm parts were acked by shmob-drm driver Maintainer)
      
         - remove unused auo_k190xfb drivers
      
         - misc cleanups (Souptick Joarder, Wolfram Sang, Markus Elfring, Andy
           Shevchenko, Colin Ian King)"
      
      * tag 'fbdev-v4.18' of git://github.com/bzolnier/linux: (26 commits)
        fb_omap2: add gpiolib dependency
        video/omap: add module license tags
        MAINTAINERS: make omapfb orphan
        video: fbdev: pxafb: match_string() conversion fixup
        video: fbdev: nvidia: fix spelling mistake: "scaleing" -> "scaling"
        video: fbdev: fix spelling mistake: "frambuffer" -> "framebuffer"
        video: fbdev: pxafb: Convert to use match_string() helper
        video: fbdev: via: allow COMPILE_TEST build
        video: fbdev: remove unused sh_mobile_meram driver
        drm: shmobile: remove unused MERAM support
        video: fbdev: sh_mobile_lcdcfb: remove unused MERAM support
        video: fbdev: remove unused auo_k190xfb drivers
        video: omap: Improve a size determination in omapfb_do_probe()
        video: sm501fb: Improve a size determination in sm501fb_probe()
        video: fbdev-MMP: Improve a size determination in path_init()
        video: fbdev-MMP: Delete an error message for a failed memory allocation in two functions
        video: auo_k190x: Delete an error message for a failed memory allocation in auok190x_common_probe()
        video: sh_mobile_lcdcfb: Delete an error message for a failed memory allocation in two functions
        video: sh_mobile_meram: Delete an error message for a failed memory allocation in sh_mobile_meram_probe()
        video: fbdev: sh_mobile_meram: Drop SUPERH platform dependency
        ...
      644f2639
    • Linus Torvalds's avatar
      Merge branch 'afs-proc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 35773c93
      Linus Torvalds authored
      Pull AFS updates from Al Viro:
       "Assorted AFS stuff - ended up in vfs.git since most of that consists
        of David's AFS-related followups to Christoph's procfs series"
      
      * 'afs-proc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        afs: Optimise callback breaking by not repeating volume lookup
        afs: Display manually added cells in dynamic root mount
        afs: Enable IPv6 DNS lookups
        afs: Show all of a server's addresses in /proc/fs/afs/servers
        afs: Handle CONFIG_PROC_FS=n
        proc: Make inline name size calculation automatic
        afs: Implement network namespacing
        afs: Mark afs_net::ws_cell as __rcu and set using rcu functions
        afs: Fix a Sparse warning in xdr_decode_AFSFetchStatus()
        proc: Add a way to make network proc files writable
        afs: Rearrange fs/afs/proc.c to remove remaining predeclarations.
        afs: Rearrange fs/afs/proc.c to move the show routines up
        afs: Rearrange fs/afs/proc.c by moving fops and open functions down
        afs: Move /proc management functions to the end of the file
      35773c93
    • Linus Torvalds's avatar
      Merge branch 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 29d6849d
      Linus Torvalds authored
      Pull compat updates from Al Viro:
       "Some biarch patches - getting rid of assorted (mis)uses of
        compat_alloc_user_space().
      
        Not much in that area this cycle..."
      
      * 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        orangefs: simplify compat ioctl handling
        signalfd: lift sigmask copyin and size checks to callers of do_signalfd4()
        vmsplice(): lift importing iovec into vmsplice(2) and compat counterpart
      29d6849d
    • Linus Torvalds's avatar
      Merge branch 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · a5b729ea
      Linus Torvalds authored
      Pull aio fixes from Al Viro:
       "Assorted AIO followups and fixes"
      
      * 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        eventpoll: switch to ->poll_mask
        aio: only return events requested in poll_mask() for IOCB_CMD_POLL
        eventfd: only return events requested in poll_mask()
        aio: mark __aio_sigset::sigmask const
      a5b729ea
  4. 15 Jun, 2018 9 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 9215310c
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Various netfilter fixlets from Pablo and the netfilter team.
      
       2) Fix regression in IPVS caused by lack of PMTU exceptions on local
          routes in ipv6, from Julian Anastasov.
      
       3) Check pskb_trim_rcsum for failure in DSA, from Zhouyang Jia.
      
       4) Don't crash on poll in TLS, from Daniel Borkmann.
      
       5) Revert SO_REUSE{ADDR,PORT} change, it regresses various things
          including Avahi mDNS. From Bart Van Assche.
      
       6) Missing of_node_put in qcom/emac driver, from Yue Haibing.
      
       7) We lack checking of the TCP checksum in one special case during SYN
          receive, from Frank van der Linden.
      
       8) Fix module init error paths of mac80211 hwsim, from Johannes Berg.
      
       9) Handle 802.1ad properly in stmmac driver, from Elad Nachman.
      
      10) Must grab HW caps before doing quirk checks in stmmac driver, from
          Jose Abreu.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits)
        net: stmmac: Run HWIF Quirks after getting HW caps
        neighbour: skip NTF_EXT_LEARNED entries during forced gc
        net: cxgb3: add error handling for sysfs_create_group
        tls: fix waitall behavior in tls_sw_recvmsg
        tls: fix use-after-free in tls_push_record
        l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()
        l2tp: reject creation of non-PPP sessions on L2TPv2 tunnels
        mlxsw: spectrum_switchdev: Fix port_vlan refcounting
        mlxsw: spectrum_router: Align with new route replace logic
        mlxsw: spectrum_router: Allow appending to dev-only routes
        ipv6: Only emit append events for appended routes
        stmmac: added support for 802.1ad vlan stripping
        cfg80211: fix rcu in cfg80211_unregister_wdev
        mac80211: Move up init of TXQs
        mac80211_hwsim: fix module init error paths
        cfg80211: initialize sinfo in cfg80211_get_station
        nl80211: fix some kernel doc tag mistakes
        hv_netvsc: Fix the variable sizes in ipsecv2 and rsc offload
        rds: avoid unenecessary cong_update in loop transport
        l2tp: clean up stale tunnel or session in pppol2tp_connect's error path
        ...
      9215310c
    • Linus Torvalds's avatar
      Merge tag 'modules-for-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux · de7f01c2
      Linus Torvalds authored
      Pull module updates from Jessica Yu:
       "Minor code cleanup and also allow sig_enforce param to be shown in
        sysfs with CONFIG_MODULE_SIG_FORCE"
      
      * tag 'modules-for-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
        module: Allow to always show the status of modsign
        module: Do not access sig_enforce directly
      de7f01c2
    • Linus Torvalds's avatar
      Merge branch 'for-linus-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml · 8d1e5133
      Linus Torvalds authored
      Pull uml updates from Richard Weinberger:
       "Minor updates for UML:
      
         - fixes for our new vector network driver by Anton
      
         - initcall cleanup by Alexander
      
         - We have a new mailinglist, sourceforge.net sucks"
      
      * 'for-linus-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
        um: Fix raw interface options
        um: Fix initialization of vector queues
        um: remove uml initcalls
        um: Update mailing list address
      8d1e5133
    • Linus Torvalds's avatar
      Merge tag 'riscv-for-linus-4.18-merge_window' of... · 6a4d4b32
      Linus Torvalds authored
      Merge tag 'riscv-for-linus-4.18-merge_window' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
      
      Pull RISC-V updates from Palmer Dabbelt:
       "This contains some small RISC-V updates I'd like to target for 4.18.
      
        They are all fairly small this time. Here's a short summary, there's
        more info in the commits/merges:
      
         - a fix to __clear_user to respect the passed arguments.
      
         - enough support for the perf subsystem to work with RISC-V's ISA
           defined performance counters.
      
         - support for sparse and cleanups suggested by it.
      
         - support for R_RISCV_32 (a relocation, not the 32-bit ISA).
      
         - some MAINTAINERS cleanups.
      
         - the addition of CONFIG_HVC_RISCV_SBI to our defconfig, as it's
           always present.
      
        I've given these a simple build+boot test"
      
      * tag 'riscv-for-linus-4.18-merge_window' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
        RISC-V: Add CONFIG_HVC_RISCV_SBI=y to defconfig
        RISC-V: Handle R_RISCV_32 in modules
        riscv/ftrace: Export _mcount when DYNAMIC_FTRACE isn't set
        riscv: add riscv-specific predefines to CHECKFLAGS
        riscv: split the declaration of __copy_user
        riscv: no __user for probe_kernel_address()
        riscv: use NULL instead of a plain 0
        perf: riscv: Add Document for Future Porting Guide
        perf: riscv: preliminary RISC-V support
        MAINTAINERS: Update Albert's email, he's back at Berkeley
        MAINTAINERS: Add myself as a maintainer for SiFive's drivers
        riscv: Fix the bug in memory access fixup code
      6a4d4b32
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 8949170c
      Linus Torvalds authored
      Pull more kvm updates from Paolo Bonzini:
       "Mostly the PPC part of the release, but also switching to Arnd's fix
        for the hyperv config issue and a typo fix.
      
        Main PPC changes:
      
         - reimplement the MMIO instruction emulation
      
         - transactional memory support for PR KVM
      
         - improve radix page table handling"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (63 commits)
        KVM: x86: VMX: redo fix for link error without CONFIG_HYPERV
        KVM: x86: fix typo at kvm_arch_hardware_setup comment
        KVM: PPC: Book3S PR: Fix failure status setting in tabort. emulation
        KVM: PPC: Book3S PR: Enable use on POWER9 bare-metal hosts in HPT mode
        KVM: PPC: Book3S PR: Don't let PAPR guest set MSR hypervisor bit
        KVM: PPC: Book3S PR: Fix failure status setting in treclaim. emulation
        KVM: PPC: Book3S PR: Fix MSR setting when delivering interrupts
        KVM: PPC: Book3S PR: Handle additional interrupt types
        KVM: PPC: Book3S PR: Enable kvmppc_get/set_one_reg_pr() for HTM registers
        KVM: PPC: Book3S: Remove load/put vcpu for KVM_GET_REGS/KVM_SET_REGS
        KVM: PPC: Remove load/put vcpu for KVM_GET/SET_ONE_REG ioctl
        KVM: PPC: Move vcpu_load/vcpu_put down to each ioctl case in kvm_arch_vcpu_ioctl
        KVM: PPC: Book3S PR: Enable HTM for PR KVM for KVM_CHECK_EXTENSION ioctl
        KVM: PPC: Book3S PR: Support TAR handling for PR KVM HTM
        KVM: PPC: Book3S PR: Add guard code to prevent returning to guest with PR=0 and Transactional state
        KVM: PPC: Book3S PR: Add emulation for tabort. in privileged state
        KVM: PPC: Book3S PR: Add emulation for trechkpt.
        KVM: PPC: Book3S PR: Add emulation for treclaim.
        KVM: PPC: Book3S PR: Restore NV regs after emulating mfspr from TM SPRs
        KVM: PPC: Book3S PR: Always fail transactions in guest privileged state
        ...
      8949170c
    • Linus Torvalds's avatar
      Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 2f3f0566
      Linus Torvalds authored
      Pull virtio updates from Michael Tsirkin:
       "virtio, vhost: features, fixes
      
         - PCI virtual function support for virtio
      
         - DMA barriers for virtio strong barriers
      
         - bugfixes"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        virtio: update the comments for transport features
        virtio_pci: support enabling VFs
        vhost: fix info leak due to uninitialized memory
        virtio_ring: switch to dma_XX barriers for rpmsg
      2f3f0566
    • Mauro Carvalho Chehab's avatar
      fix a series of Documentation/ broken file name references · 44348e8a
      Mauro Carvalho Chehab authored
      As files move around, their previous links break. Fix the
      references for them.
      Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Acked-by: Jonathan Corbet <corbet@lwn.net>
      44348e8a
    • Mauro Carvalho Chehab's avatar
      Documentation: rstFlatTable.py: fix a broken reference · 315e6bc5
      Mauro Carvalho Chehab authored
      The old HOWTO was removed a long time ago. The flat table
      version is not mentioned elsewhere, so just get rid of the
      text.
      Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Acked-by: Jonathan Corbet <corbet@lwn.net>
      315e6bc5
    • Mauro Carvalho Chehab's avatar
      ABI: sysfs-devices-system-cpu: remove a broken reference · 6ec71b20
      Mauro Carvalho Chehab authored
      This file doesn't exist anymore:
      	Documentation/cpu-freq/user-guide.txt
      
      As the ABI already points to Documentation/cpu-freq, just
      remove the broken link and the associated text.
      Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Acked-by: Jonathan Corbet <corbet@lwn.net>
      6ec71b20