1. 18 Feb, 2019 5 commits
    • genirq/affinity: Remove the leftovers of the original set support · a6a309ed
      Thomas Gleixner authored
      Now that the NVME driver is converted over to the calc_set() callback, the
      workarounds of the original set support can be removed.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bjorn Helgaas <helgaas@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: linux-nvme@lists.infradead.org
      Cc: linux-pci@vger.kernel.org
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Sumit Saxena <sumit.saxena@broadcom.com>
      Cc: Kashyap Desai <kashyap.desai@broadcom.com>
      Cc: Shivasharan Srikanteshwara <shivasharan.srikanteshwara@broadcom.com>
      Link: https://lkml.kernel.org/r/20190216172228.689834224@linutronix.de
      a6a309ed
    • nvme-pci: Simplify interrupt allocation · 612b7286
      Ming Lei authored
      The NVME PCI driver contains a tedious mechanism for interrupt
      allocation, which is necessary to adjust the number and size of interrupt
      sets to the maximum available number of interrupts which depends on the
      underlying PCI capabilities and the available CPU resources.
      
      It works around the former shortcomings of the PCI and core interrupt
      allocation mechanisms in combination with interrupt sets.
      
      The PCI interrupt allocation function accepts a maximum and a
      minimum number of interrupts to be allocated and tries to allocate as
      many as possible. This worked without driver interaction as long as there
      was only a single set of interrupts to handle.
      
      With the addition of support for multiple interrupt sets in the generic
      affinity spreading logic, which is invoked from the PCI interrupt
      allocation, the adaptive loop in the PCI interrupt allocation did not
      work for multiple interrupt sets. The reason is that depending on the
      total number of interrupts which the PCI allocation adaptive loop tries
      to allocate in each step, the number and the size of the interrupt sets
      need to be adapted as well. Due to the way the interrupt sets support was
      implemented there was no way for the PCI interrupt allocation code or the
      core affinity spreading mechanism to invoke a driver specific function
      for adapting the interrupt sets configuration.
      
      As a consequence the driver had to implement another adaptive loop around
      the PCI interrupt allocation function and call it with maximum and
      minimum interrupts set to the same value. This ensured that the
      allocation either succeeded or immediately failed without any attempt to
      adjust the number of interrupts in the PCI code.
      
      The core code now allows drivers to provide a callback to recalculate the
      number and the size of interrupt sets during PCI interrupt allocation,
      which in turn allows the PCI interrupt allocation function to be called
      in the same way as with a single set of interrupts. The PCI code handles
      the adaptive loop and the interrupt affinity spreading mechanism invokes
      the driver callback to adapt the interrupt set configuration to the
      current loop value. This replaces the adaptive loop in the driver
      completely.
      
      Implement the NVME specific callback which adjusts the interrupt sets
      configuration and remove the adaptive allocation loop.
      
      [ tglx: Simplify the callback further and restore the dropped adjustment of
        	number of sets ]
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bjorn Helgaas <helgaas@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: linux-nvme@lists.infradead.org
      Cc: linux-pci@vger.kernel.org
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Sumit Saxena <sumit.saxena@broadcom.com>
      Cc: Kashyap Desai <kashyap.desai@broadcom.com>
      Cc: Shivasharan Srikanteshwara <shivasharan.srikanteshwara@broadcom.com>
      Link: https://lkml.kernel.org/r/20190216172228.602546658@linutronix.de
      
      612b7286
    • genirq/affinity: Add new callback for (re)calculating interrupt sets · c66d4bd1
      Ming Lei authored
      The interrupt affinity spreading mechanism supports spreading out
      affinities for one or more interrupt sets. An interrupt set contains one or
      more interrupts. Each set is mapped to a specific functionality of a
      device, e.g. general I/O queues and read I/O queues of multiqueue block
      devices.
      
      The number of interrupts per set is defined by the driver. It depends on
      the total number of available interrupts for the device, which is
      determined by the PCI capabilities and the availability of underlying CPU
      resources, and the number of queues which the device provides and the
      driver wants to instantiate.
      
      The driver passes initial configuration for the interrupt allocation via a
      pointer to struct irq_affinity.
      
      Right now the allocation mechanism is complex as it requires a loop
      in the driver to determine the maximum number of interrupts which are
      provided by the PCI capabilities and the underlying CPU resources.  This
      loop would have to be replicated in every driver which wants to utilize
      this mechanism. That's unwanted code duplication and error prone.
      
      In order to move this into generic facilities it is required to have a
      mechanism, which allows the recalculation of the interrupt sets and their
      size, in the core code. As the core code does not have any knowledge about the
      underlying device, a driver specific callback is required in struct
      irq_affinity, which can be invoked by the core code. The callback gets the
      number of available interrupts as an argument, so the driver can calculate the
      corresponding number and size of interrupt sets.
      
      At the moment the struct irq_affinity pointer which is handed in from the
      driver and passed through to several core functions is marked 'const', but for
      the callback to be able to modify the data in the struct it's required to
      remove the 'const' qualifier.
      
      Add the optional callback to struct irq_affinity, which allows drivers to
      recalculate the number and size of interrupt sets and remove the 'const'
      qualifier.
      
      For simple invocations, which do not supply a callback, a default callback
      is installed, which just sets nr_sets to 1 and transfers the number of
      spreadable vectors to the set_size array at index 0.
      
      This is for now guarded by a check for nr_sets != 0 to keep the NVME driver
      working until it is converted to the callback mechanism.
      
      To make sure that the driver configuration is correct under all circumstances
      the callback is invoked even when there are no interrupts for queues left,
      i.e. the pre/post requirements already exhaust the number of available
      interrupts.
      
      At the PCI layer irq_create_affinity_masks() has to be invoked even for the
      case where the legacy interrupt is used. That ensures that the callback is
      invoked and the device driver can adjust to that situation.
      
      [ tglx: Fixed the simple case (no sets required). Moved the sanity check
        	for nr_sets after the invocation of the callback so it catches
        	broken drivers. Fixed the kernel doc comments for struct
        	irq_affinity and de-'This patch'-ed the changelog ]
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bjorn Helgaas <helgaas@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: linux-nvme@lists.infradead.org
      Cc: linux-pci@vger.kernel.org
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Sumit Saxena <sumit.saxena@broadcom.com>
      Cc: Kashyap Desai <kashyap.desai@broadcom.com>
      Cc: Shivasharan Srikanteshwara <shivasharan.srikanteshwara@broadcom.com>
      Link: https://lkml.kernel.org/r/20190216172228.512444498@linutronix.de
      
      c66d4bd1
    • genirq/affinity: Store interrupt sets size in struct irq_affinity · 9cfef55b
      Ming Lei authored
      The interrupt affinity spreading mechanism supports spreading out
      affinities for one or more interrupt sets. An interrupt set contains one
      or more interrupts. Each set is mapped to a specific functionality of a
      device, e.g. general I/O queues and read I/O queues of multiqueue block
      devices.
      
      The number of interrupts per set is defined by the driver. It depends on
      the total number of available interrupts for the device, which is
      determined by the PCI capabilities and the availability of underlying CPU
      resources, and the number of queues which the device provides and the
      driver wants to instantiate.
      
      The driver passes initial configuration for the interrupt allocation via
      a pointer to struct irq_affinity.
      
      Right now the allocation mechanism is complex as it requires a
      loop in the driver to determine the maximum number of interrupts which
      are provided by the PCI capabilities and the underlying CPU resources.
      This loop would have to be replicated in every driver which wants to
      utilize this mechanism. That's unwanted code duplication and error
      prone.
      
      In order to move this into generic facilities it is required to have a
      mechanism, which allows the recalculation of the interrupt sets and
      their size, in the core code. As the core code does not have any
      knowledge about the underlying device, a driver specific callback will
      be added to struct irq_affinity, which will be invoked by the core
      code. The callback will get the number of available interrupts as an
      argument, so the driver can calculate the corresponding number and size
      of interrupt sets.
      
      To support this, two modifications for the handling of struct irq_affinity
      are required:
      
      1) The (optional) interrupt sets size information is contained in a
         separate array of integers and struct irq_affinity contains a
         pointer to it.
      
         This is cumbersome and as the maximum number of interrupt sets is small,
         there is no reason to have separate storage. Moving the size array into
         struct irq_affinity avoids indirections and makes the code simpler.
      
      2) At the moment the struct irq_affinity pointer which is handed in from
         the driver and passed through to several core functions is marked
         'const'.
      
         With the upcoming callback to recalculate the number and size of
         interrupt sets, it's necessary to remove the 'const'
         qualifier. Otherwise the callback would not be able to update the data.
      
      Implement #1 and store the interrupt sets size in 'struct irq_affinity'.
      
      No functional change.
      
      [ tglx: Fixed the memcpy() size so it won't copy beyond the size of the
        	source. Fixed the kernel doc comments for struct irq_affinity and
        	de-'This patch'-ed the changelog ]
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bjorn Helgaas <helgaas@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: linux-nvme@lists.infradead.org
      Cc: linux-pci@vger.kernel.org
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Sumit Saxena <sumit.saxena@broadcom.com>
      Cc: Kashyap Desai <kashyap.desai@broadcom.com>
      Cc: Shivasharan Srikanteshwara <shivasharan.srikanteshwara@broadcom.com>
      Link: https://lkml.kernel.org/r/20190216172228.423723127@linutronix.de
      
      9cfef55b
    • genirq/affinity: Code consolidation · 0145c30e
      Thomas Gleixner authored
      All information and calculations in the interrupt affinity spreading code
      are strictly unsigned int, though the code uses int all over the place.
      
      Convert it over to unsigned int.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bjorn Helgaas <helgaas@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: linux-nvme@lists.infradead.org
      Cc: linux-pci@vger.kernel.org
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Sumit Saxena <sumit.saxena@broadcom.com>
      Cc: Kashyap Desai <kashyap.desai@broadcom.com>
      Cc: Shivasharan Srikanteshwara <shivasharan.srikanteshwara@broadcom.com>
      Link: https://lkml.kernel.org/r/20190216172228.336424556@linutronix.de
      0145c30e
  2. 14 Feb, 2019 1 commit
  3. 13 Feb, 2019 1 commit
  4. 10 Feb, 2019 5 commits
    • softirq: Don't skip softirq execution when softirq thread is parking · 1342d808
      Matthias Kaehlcke authored
      When a CPU is unplugged the kernel threads of this CPU are parked (see
      smpboot_park_threads()). kthread_park() is used to mark each thread as
      parked and wake it up, so it can complete the process of parking itself
      (see smpboot_thread_fn()).
      
      If local softirqs are pending on interrupt exit invoke_softirq() is called
      to process the softirqs, however it skips processing when the softirq
      kernel thread of the local CPU is scheduled to run. The softirq kthread is
      one of the threads that is parked when a CPU is unplugged. Parking the
      kthread wakes it up, however only to complete the parking process, not to
      process the pending softirqs. Hence processing of softirqs at the end of an
      interrupt is skipped, but not done elsewhere, which can result in warnings
      about pending softirqs when a CPU is unplugged:
      
      /sys/devices/system/cpu # echo 0 > cpu4/online
      [ ... ] NOHZ: local_softirq_pending 02
      [ ... ] NOHZ: local_softirq_pending 202
      [ ... ] CPU4: shutdown
      [ ... ] psci: CPU4 killed.
      
      Don't skip processing of softirqs at the end of an interrupt when the
      softirq thread of the CPU is parking.
      Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Douglas Anderson <dianders@chromium.org>
      Cc: Stephen Boyd <swboyd@chromium.org>
      Link: https://lkml.kernel.org/r/20190128234625.78241-3-mka@chromium.org
      1342d808
    • kthread: Add __kthread_should_park() · 0121805d
      Matthias Kaehlcke authored
      kthread_should_park() is used to check if the calling kthread ('current')
      should park, but there is no function to check whether an arbitrary kthread
      should be parked. The latter is required to plug a CPU hotplug race vs. a
      parking ksoftirqd thread.
      
      The new __kthread_should_park() receives a task_struct as parameter to
      check if the corresponding kernel thread should be parked.
      
      Call __kthread_should_park() from kthread_should_park() to avoid code
      duplication.
      Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Douglas Anderson <dianders@chromium.org>
      Cc: Stephen Boyd <swboyd@chromium.org>
      Link: https://lkml.kernel.org/r/20190128234625.78241-2-mka@chromium.org
      0121805d
    • proc/stat: Make the interrupt statistics more efficient · c2da3f1b
      Thomas Gleixner authored
      Waiman reported that on large systems with a large number of interrupts the
      readout of /proc/stat takes a long time to sum up the interrupt
      statistics. In principle this is not a problem, but for unknown reasons
      some enterprise quality software reads /proc/stat with a high frequency.
      
      The reason for this is that interrupt statistics are accounted per cpu. So
      the /proc/stat logic has to sum up the interrupt stats for each interrupt.
      
      The interrupt core now provides a per interrupt summary counter which can
      be used to avoid the summation loops completely except for interrupts
      marked PER_CPU which are only a small fraction of the interrupt space if at
      all.
      
      Another simplification is to iterate only over the active interrupts and
      skip the potentially large gaps in the interrupt number space and just
      print zeros for the gaps without going into the interrupt core in the first
      place.
      
      Waiman provided test results from a 4-socket IvyBridge-EX system (60-core
      120-thread, 3016 irqs) executing a test program which reads /proc/stat
      50,000 times:
      
         Before: 18.436s (sys 18.380s)
         After:   3.769s (sys  3.742s)
      Reported-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
      Reviewed-by: Waiman Long <longman@redhat.com>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Link: https://lkml.kernel.org/r/20190208135021.013828701@linutronix.de
      c2da3f1b
    • genirq: Avoid summation loops for /proc/stat · 1136b072
      Thomas Gleixner authored
      Waiman reported that on large systems with a large number of interrupts the
      readout of /proc/stat takes a long time to sum up the interrupt
      statistics. In principle this is not a problem, but for unknown reasons
      some enterprise quality software reads /proc/stat with a high frequency.
      
      The reason for this is that interrupt statistics are accounted per cpu. So
      the /proc/stat logic has to sum up the interrupt stats for each interrupt.
      
      This can be largely avoided for interrupts which are not marked as
      'PER_CPU' interrupts by simply adding a per interrupt summation counter
      which is incremented along with the per interrupt per cpu counter.
      
      The PER_CPU interrupts need to avoid that and use only per cpu accounting
      because they share the interrupt number and the interrupt descriptor and
      concurrent updates would conflict or require unwanted synchronization.
      Reported-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Waiman Long <longman@redhat.com>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Link: https://lkml.kernel.org/r/20190208135020.925487496@linutronix.de
      
      
      8<-------------
      
      v2: Undo the unintentional layout change of struct irq_desc.
      
       include/linux/irqdesc.h |    1 +
       kernel/irq/chip.c       |   12 ++++++++++--
       kernel/irq/internals.h  |    8 +++++++-
       kernel/irq/irqdesc.c    |    7 ++++++-
       4 files changed, 24 insertions(+), 4 deletions(-)
      
      1136b072
    • genirq/affinity: Move allocation of 'node_to_cpumask' to irq_build_affinity_masks() · 347253c4
      Ming Lei authored
      'node_to_cpumask' is just a temporary variable for irq_build_affinity_masks(),
      so move it into irq_build_affinity_masks().
      
      No functional change.
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: linux-nvme@lists.infradead.org
      Cc: linux-pci@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190125095347.17950-2-ming.lei@redhat.com
      347253c4
  5. 07 Feb, 2019 13 commits
  6. 06 Feb, 2019 5 commits
    • dm: don't use bio_trim() afterall · fa8db494
      Mike Snitzer authored
      bio_trim() has an early return, which makes it _not_ idempotent, if the
      offset is 0 and the bio's bi_size already matches the requested size.
      Prior to DM, all users of bio_trim() were fine with this.  But DM has
      exposed the fact that bio_trim()'s early return is incompatible with a
      cloned bio whose integrity payload must be trimmed via
      bio_integrity_trim().
      
      Fix this by reverting DM back to doing the equivalent of bio_trim() but
      in an idempotent manner (so bio_integrity_trim is always performed).
      
      Follow-on work is needed to assess what benefit bio_trim()'s early
      return is providing to its existing callers.
      Reported-by: Milan Broz <gmazyland@gmail.com>
      Fixes: 57c36519 ("dm: fix clone_bio() to trigger blk_recount_segments()")
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      fa8db494
    • dm: add memory barrier before waitqueue_active · 645efa84
      Mikulas Patocka authored
      Block core changes to switch bio-based IO accounting to be percpu had a
      side-effect of altering DM core to now rely on calling waitqueue_active
      (in both bio-based and request-based) to check if another task is in
      dm_wait_for_completion().
      
      A memory barrier is needed before calling waitqueue_active().  DM core
      doesn't piggyback on a preceding memory barrier so it must explicitly
      use its own.
      
      For more details on why using waitqueue_active() without a preceding
      barrier is unsafe, please see the comment before the waitqueue_active()
      definition in include/linux/wait.h.
      
      Add the missing memory barrier by switching to using wq_has_sleeper().
      
      Fixes: 6f757231 ("dm: remove the pending IO accounting")
      Fixes: c4576aed ("dm: fix request-based dm's use of dm_wait_for_completion")
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      645efa84
    • svcrdma: Remove max_sge check at connect time · e248aa7b
      Chuck Lever authored
      Two and a half years ago, the client was changed to use gathered
      Send for larger inline messages, in commit 655fec69 ("xprtrdma:
      Use gathered Send for large inline messages"). Several fixes were
      required because there are a few in-kernel device drivers whose
      max_sge is 3, and these were broken by the change.
      
      Apparently my memory is going, because some time later, I submitted
      commit 25fd86ec ("svcrdma: Don't overrun the SGE array in
      svc_rdma_send_ctxt"), and after that, commit f3c1fd0e ("svcrdma:
      Reduce max_send_sges"). These too incorrectly assumed in-kernel
      device drivers would have more than a few Send SGEs available.
      
      The fix for the server side is not the same. This is because the
      fundamental problem on the server is that, whether or not the client
      has provisioned a chunk for the RPC reply, the server must squeeze
      even the most complex RPC replies into a single RDMA Send. Failing
      in the send path because of Send SGE exhaustion should never be an
      option.
      
      Therefore, instead of failing when the send path runs out of SGEs,
      switch to using a bounce buffer mechanism to handle RPC replies that
      are too complex for the device to send directly. That allows us to
      remove the max_sge check to enable drivers with small max_sge to
      work again.
      Reported-by: Don Dutile <ddutile@redhat.com>
      Fixes: 25fd86ec ("svcrdma: Don't overrun the SGE array in ...")
      Cc: stable@vger.kernel.org
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      e248aa7b
    • nfsd: Fix error return values for nfsd4_clone_file_range() · e3fdc89c
      Trond Myklebust authored
      If the parameter 'count' is non-zero, nfsd4_clone_file_range() will
      currently clobber all errors returned by vfs_clone_file_range() and
      replace them with EINVAL.
      
      Fixes: 42ec3d4c ("vfs: make remap_file_range functions take and...")
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: stable@vger.kernel.org # v4.20+
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      e3fdc89c
    • ALSA: hda/ca0132 - Fix build error without CONFIG_PCI · c97617a8
      Takashi Iwai authored
      A call of pci_iounmap() without CONFIG_PCI leads to a build error
      on some architectures.  We tried to address this by adding a check of
      IS_ENABLED(CONFIG_PCI), but this still doesn't seem enough for sh.
      Ideally we should fix it globally, but it's really a corner case, so
      let's paper over it with a simpler ifdef.
      
      Fixes: 1e73359a ("ALSA: hda/ca0132 - make pci_iounmap() call conditional")
      Reported-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      c97617a8
  7. 05 Feb, 2019 3 commits
    • ALSA: compress: Fix stop handling on compressed capture streams · 4f2ab5e1
      Charles Keepax authored
      It is normal user behaviour to start, stop, then start a stream
      again without closing it. Currently this works for compressed
      playback streams but not capture ones.
      
      The states on a compressed capture stream go directly from OPEN to
      PREPARED, unlike a playback stream which moves to SETUP and waits
      for a write of data before moving to PREPARED. Currently however,
      when a stop is sent the state is set to SETUP for both types of
      streams. This leaves a capture stream in the situation where a new
      start can't be sent as that requires the state to be PREPARED and
      a new set_params can't be sent as that requires the state to be
      OPEN. The only option being to close the stream, and then reopen.
      
      Correct this issue by allowing snd_compr_drain_notify to set the
      state depending on the stream direction, as we already do in
      set_params.
      
      Fixes: 49bb6402 ("ALSA: compress_core: Add support for capture streams")
      Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      4f2ab5e1
    • virtio: drop internal struct from UAPI · 9c0644ee
      Michael S. Tsirkin authored
      There's no reason to expose struct vring_packed in UAPI - if we do we
      won't be able to change or drop it, and it's not part of any interface.
      
      Let's move it to virtio_ring.c
      
      Cc: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      9c0644ee
    • ALSA: usb-audio: Add support for new T+A USB DAC · 3bff2407
      Udo Eberhardt authored
      This patch adds the T+A VID to the generic check in order to enable
      native DSD support for T+A devices. This works with the new T+A USB
      DAC model SD3100HV and will also work with future devices which
      support the XMOS/Thesycon style DSD format.
      Signed-off-by: Udo Eberhardt <udo.eberhardt@thesycon.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      3bff2407
  8. 03 Feb, 2019 6 commits
    • Linux 5.0-rc5 · 8834f560
      Linus Torvalds authored
      8834f560
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 24b888d8
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "A few updates for x86:
      
         - Fix an unintended sign extension issue in the fault handling code
      
         - Rename the new resource control config switch so it's less
           confusing
      
         - Avoid setting up EFI info in kexec when the EFI runtime is
           disabled.
      
         - Fix the microcode version check in the AMD microcode loader so it
           only loads higher version numbers and never downgrades
      
         - Set EFER.LME in the 32bit trampoline before returning to long mode
           to handle older AMD/KVM behaviour properly.
      
         - Add Darren and Andy as x86/platform reviewers"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/resctrl: Avoid confusion over the new X86_RESCTRL config
        x86/kexec: Don't setup EFI info if EFI runtime is not enabled
        x86/microcode/amd: Don't falsely trick the late loading mechanism
        MAINTAINERS: Add Andy and Darren as arch/x86/platform/ reviewers
        x86/fault: Fix sign-extend unintended sign extension
        x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning to long mode
        x86/cpu: Add Atom Tremont (Jacobsville)
      24b888d8
    • Linus Torvalds's avatar
      Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · cc6810e3
      Linus Torvalds authored
      Pull cpu hotplug fixes from Thomas Gleixner:
       "Two fixes for the cpu hotplug machinery:
      
         - Replace the overly clever 'SMT disabled by BIOS' detection logic as
           it breaks KVM scenarios and prevents speculation control updates
           when the Hyperthreads are brought online late after boot.
      
         - Remove a redundant invocation of the speculation control update
           function"
      
      * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM
        x86/speculation: Remove redundant arch_smt_update() invocation
      cc6810e3
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 58f6d428
      Linus Torvalds authored
      Pull perf fixes from Thomas Gleixner:
       "A pile of perf updates:
      
         - Fix broken sanity check in the /proc/sys/kernel/perf_cpu_time_max_percent
           write handler
      
         - Cure a perf script crash which was caused by an uninitialized data
           structure
      
         - Highlight the hottest instruction in perf top and not a random one
      
         - Cure yet another clang issue when building perf python
      
         - Handle topology entries with no CPU correctly in the tools
      
         - Handle perf data which contains both tracepoints and performance
           counter entries correctly.
      
         - Add a missing NULL pointer check in perf ordered_events__free()"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf script: Fix crash when processing recorded stat data
        perf top: Fix wrong hottest instruction highlighted
        perf tools: Handle TOPOLOGY headers with no CPU
        perf python: Remove -fstack-clash-protection when building with some clang versions
        perf core: Fix perf_proc_update_handler() bug
        perf script: Fix crash with printing mixed trace point and other events
        perf ordered_events: Fix crash in ordered_events__free
      58f6d428
    • Linus Torvalds's avatar
      Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 89401be6
      Linus Torvalds authored
      Pull EFI fix from Thomas Gleixner:
       "The dump info for the efi page table debugging lacks a terminator
        which causes the kernel to crash when the debugfile is read"
      
      * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efi/arm64: Fix debugfs crash by adding a terminator for ptdump marker
      89401be6
    • Linus Torvalds's avatar
      Merge tag 'for-5.0-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 312b3a93
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
      
       - regression fix: transaction commit can run away due to delayed ref
         waiting heuristic, this is not necessary now because of the proper
         reservation mechanism introduced in 5.0
      
       - regression fix: potential crash due to use-before-check of an ERR_PTR
         return value
      
       - fix for transaction abort during transaction commit that needs to
         properly clean up pending block groups
      
       - fix deadlock during b-tree node/leaf splitting, when this happens on
         some of the fundamental trees, we must prevent new tree block
         allocation to re-enter indirectly via the block group flushing path
      
       - potential memory leak after errors during mount
      
      * tag 'for-5.0-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: On error always free subvol_name in btrfs_mount
        btrfs: clean up pending block groups when transaction commit aborts
        btrfs: fix potential oops in device_list_add
        btrfs: don't end the transaction for delayed refs in throttle
        Btrfs: fix deadlock when allocating tree block during leaf/node split
      312b3a93
  9. 02 Feb, 2019 1 commit