  1. 07 May, 2024 1 commit
  2. 03 May, 2024 2 commits
  3. 02 May, 2024 1 commit
  4. 29 Apr, 2024 2 commits
  5. 24 Apr, 2024 7 commits
    • gfs2: Remove and replace gfs2_glock_queue_work · 1e860444
      Andreas Gruenbacher authored
      There are no more callers of gfs2_glock_queue_work() left, so remove
      that helper.  With that, we can now rename __gfs2_glock_queue_work()
      back to gfs2_glock_queue_work() to get rid of some unnecessary clutter.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      1e860444
    • gfs2: do_xmote fixes · 9947a06d
      Andreas Gruenbacher authored
      Function do_xmote() is called with the glock spinlock held.  Commit
      86934198 added a 'goto skip_inval' statement at the beginning of the
      function that jumps further below, to where the glock spinlock is
      expected not to be held anymore.  Then it added code there that requires
      the glock spinlock to be held.  This doesn't make sense; fix this up by
      dropping and retaking the spinlock where needed.
      
      In addition, when ->lm_lock() returned an error, do_xmote() didn't fail
      the locking operation, and simply left the glock hanging; fix that as
      well.  (This is a much older error.)
      
      Fixes: 86934198 ("gfs2: Clear flags when withdraw prevents xmote")
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      9947a06d
    • gfs2: finish_xmote cleanup · 1cd28e15
      Andreas Gruenbacher authored
      Currently, function finish_xmote() takes and releases the glock
      spinlock.  However, all of its callers immediately take that spinlock
      again, so it makes more sense to take the spin lock before calling
      finish_xmote() already.
      
      With that, thaw_glock() is the only place that sets the GLF_HAVE_REPLY
      flag outside of the glock spinlock, but it also takes that spinlock
      immediately thereafter.  Change that to set the bit when the spinlock is
      already held.  This allows switching from test_and_clear_bit() to
      test_bit() and clear_bit() in glock_work_func().
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      1cd28e15
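The switch from test_and_clear_bit() to test_bit() plus clear_bit() in the commit above can be illustrated with a minimal user-space sketch. All `toy_*` names are hypothetical and are not the kernel's bit operations; the sketch only assumes that a combined atomic read-modify-write is needed when nothing serializes access, while separate test and clear steps are sufficient once the caller already holds the lock that protects the flag word.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Atomic test-and-clear: one indivisible step, usable without a lock. */
static bool toy_test_and_clear_bit(atomic_ulong *word, unsigned int bit)
{
    assert(bit < TOY_BITS_PER_LONG);
    unsigned long mask = 1UL << bit;
    /* atomic_fetch_and() returns the value held before the AND. */
    return (atomic_fetch_and(word, ~mask) & mask) != 0;
}

/* Plain test: only safe while the caller holds the lock protecting *word. */
static bool toy_test_bit(const unsigned long *word, unsigned int bit)
{
    assert(bit < TOY_BITS_PER_LONG);
    return ((*word >> bit) & 1UL) != 0;
}

/* Plain clear: only safe while the caller holds the lock protecting *word. */
static void toy_clear_bit(unsigned long *word, unsigned int bit)
{
    assert(bit < TOY_BITS_PER_LONG);
    *word &= ~(1UL << bit);
}
```

With the lock held across both steps, `toy_test_bit()` followed by `toy_clear_bit()` observes and clears the same state that the single atomic call would.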
    • gfs2: Unlock fewer glocks on unmount · a3730c5e
      Andreas Gruenbacher authored
      At unmount time, we would generally like to explicitly unlock as few
      glocks as possible for efficiency.  We are already skipping glocks that
      don't have a lock value block (LVB), but we can also skip glocks which
      are not held in DLM_LOCK_EX or DLM_LOCK_PW mode (of which gfs2 only uses
      DLM_LOCK_EX under the name LM_ST_EXCLUSIVE).
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: David Teigland <teigland@redhat.com>
      a3730c5e
    • gfs2: Fix potential glock use-after-free on unmount · d98779e6
      Andreas Gruenbacher authored
      When a DLM lockspace is released and there are still locks in that
      lockspace, DLM will unlock those locks automatically.  Commit
      fb6791d1 started exploiting this behavior to speed up filesystem
      unmount: gfs2 would simply free glocks it didn't want to unlock and then
      release the lockspace.  This didn't take the bast callbacks for
      asynchronous lock contention notifications into account, which remain
      active until a lock is unlocked or its lockspace is released.
      
      To prevent those callbacks from accessing deallocated objects, put the
      glocks that should not be unlocked on the sd_dead_glocks list, release
      the lockspace, and only then free those glocks.
      
      As an additional measure, ignore unexpected ast and bast callbacks if
      the receiving glock is dead.
      
      Fixes: fb6791d1 ("GFS2: skip dlm_unlock calls in unmount")
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: David Teigland <teigland@redhat.com>
      d98779e6
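The ordering in the fix above (park the glocks on a dead list, release the lockspace, only then free) can be modeled in user space. This is a minimal sketch with hypothetical `toy_*` names; it does not reproduce the gfs2 data structures, and the "callback" is just a function call standing in for a late ast/bast notification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a glock. */
struct toy_dead_glock {
    bool dead;                   /* set once the glock is parked */
    int callbacks_handled;       /* callbacks acted upon while alive */
    struct toy_dead_glock *next; /* link on the dead list */
};

/* Stand-in for a late callback: ignored when the glock is dead. */
static bool toy_bast(struct toy_dead_glock *gl)
{
    if (gl->dead)
        return false;
    gl->callbacks_handled++;
    return true;
}

/* Instead of freeing right away, park the glock on the dead list. */
static void toy_mark_dead(struct toy_dead_glock **dead_list,
                          struct toy_dead_glock *gl)
{
    gl->dead = true;
    gl->next = *dead_list;
    *dead_list = gl;
}

/*
 * Call only after the lockspace has been released, when no more
 * callbacks can arrive. Returns the number of glocks freed.
 */
static int toy_free_dead(struct toy_dead_glock **dead_list)
{
    int freed = 0;

    while (*dead_list) {
        struct toy_dead_glock *gl = *dead_list;

        *dead_list = gl->next;
        free(gl);
        freed++;
    }
    return freed;
}
```

Between `toy_mark_dead()` and `toy_free_dead()` the memory stays valid, so a callback that still fires touches a live object and is simply ignored.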
    • gfs2: Remove ill-placed consistency check · 59f60005
      Andreas Gruenbacher authored
      This consistency check was originally added by commit 9287c645
      ("gfs2: Fix occasional glock use-after-free").  It is ill-placed in
      gfs2_glock_free() because if it holds there, it must equally hold in
      __gfs2_glock_put() already.  Either way, the check doesn't seem
      necessary anymore.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      59f60005
    • gfs2: Fix lru_count accounting · 7a1ad9d8
      Andreas Gruenbacher authored
      Currently, gfs2_scan_glock_lru() decrements lru_count when a glock is
      moved onto the dispose list.  When such a glock is then stolen from the
      dispose list while gfs2_dispose_glock_lru() doesn't hold the lru_lock,
      lru_count will be decremented again, so the counter will eventually go
      negative.
      
      This bug has existed in one form or another since at least commit
      97cc1025 ("GFS2: Kill two daemons with one patch").
      
      Fix this by only decrementing lru_count when we actually remove a glock
      and schedule for it to be unlocked and dropped.  We also don't need to
      remove and then re-add glocks when we can just as well move them back
      onto the lru_list when necessary.
      
      In addition, return the number of glocks freed as we should, not the
      number of glocks moved onto the dispose list.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      7a1ad9d8
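The accounting rule from the commit above (decrement the counter only when a glock actually leaves the LRU, and report only glocks that were really freed) can be shown with a toy model. Everything here is a hypothetical simplification: `toy_lru_glock`, `toy_lru_count`, and the helper functions are illustrative and do not mirror the gfs2 implementation or its locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a glock on the LRU. */
struct toy_lru_glock {
    bool on_lru;     /* currently counted in toy_lru_count */
    bool on_dispose; /* parked on the dispose list by the scanner */
};

static long toy_lru_count;

static void toy_lru_add(struct toy_lru_glock *gl)
{
    if (!gl->on_lru) {
        gl->on_lru = true;
        toy_lru_count++;
    }
}

/* Scanner: park the glock for disposal WITHOUT touching the counter. */
static void toy_move_to_dispose(struct toy_lru_glock *gl)
{
    assert(gl->on_lru);
    gl->on_dispose = true;
}

/* Another user takes the glock back ("steals" it): one decrement. */
static void toy_lru_remove(struct toy_lru_glock *gl)
{
    if (gl->on_lru) {
        gl->on_lru = false;
        gl->on_dispose = false;
        toy_lru_count--;
    }
}

/* Disposer: returns 1 if the glock was really dropped, 0 if stolen. */
static int toy_dispose(struct toy_lru_glock *gl)
{
    if (!gl->on_dispose)
        return 0; /* stolen meanwhile; already accounted for */
    gl->on_dispose = false;
    gl->on_lru = false;
    toy_lru_count--;
    return 1;
}
```

Because the scanner no longer decrements, a glock that is stolen from the dispose list is counted down exactly once, so the counter cannot go negative in this model.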
  6. 09 Apr, 2024 14 commits
  7. 11 Mar, 2024 1 commit
  8. 10 Mar, 2024 7 commits
    • Linux 6.8 · e8f897f4
      Linus Torvalds authored
      e8f897f4
    • Merge tag 'trace-ring-buffer-v6.8-rc7' of... · fa4b851b
      Linus Torvalds authored
      Merge tag 'trace-ring-buffer-v6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
      
      Pull tracing fixes from Steven Rostedt:
      
       - Do not allow large strings (> 4096) as single write to trace_marker
      
         The size of a string written into trace_marker was determined by the
         size of the sub-buffer in the ring buffer. That size is dependent on
         the PAGE_SIZE of the architecture as it can be mapped into user
         space. But on PowerPC, where PAGE_SIZE is 64K, that made the limit for
         a string written into trace_marker 64K.
      
         One of the selftests looks at the size of the ring buffer sub-buffers
         and writes that plus more into the trace_marker. The write will take
         what it can and report back what it consumed so that the user space
         application (like echo) will write the rest of the string. The string
         is stored in the ring buffer and can be read via the "trace" or
         "trace_pipe" files.
      
         The reading of the ring buffer uses vsnprintf(), which uses a
         precision "%.*s" to make sure it only reads what is stored in the
         buffer, as a bug could cause the string to be non terminated.
      
         With the combination of the precision change and the PAGE_SIZE of 64K
         allowing huge strings to be added into the ring buffer, plus the test
         that would actually stress that limit, a bug was reported that the
         precision used was too big for "%.*s" as the string was close to 64K
         in size and the max precision of vsnprintf is 32K.
      
         Linus suggested not to have that precision as it could hide a bug if
         the string was again stored without a nul byte.
      
         Another issue that was brought up is that the trace_seq buffer is
         also based on PAGE_SIZE even though it is not tied to the
         architecture limit like the ring buffer sub-buffer is. Having it be
         64K * 2 is simply too big and wastes memory on systems with 64K
         page sizes. It is now hardcoded to 8K, which is what all other
         architectures with 4K PAGE_SIZE have.
      
         Finally, the write to trace_marker is now limited to 4K as there is
         no reason to write larger strings into trace_marker.
      
       - ring_buffer_wait() should not loop.
      
         The ring_buffer_wait() does not have the full context (yet) on whether
         it should loop or not. Just exit the loop as soon as it's woken up and
         let the callers decide to loop or not (they already do, so it's a bit
         redundant).
      
       - Fix shortest_full field to be the smallest amount in the ring buffer
         that a waiter is waiting for. The "shortest_full" field is updated
         when a new waiter comes in and wants to wait for a smaller amount of
         data in the ring buffer than other waiters. But after all waiters are
         woken up, it's not reset, so if another waiter comes in wanting to
         wait for more data, it will be woken up when the ring buffer has a
         smaller amount from what the previous waiters were waiting for.
      
       - The wake-up of all waiters on close is incorrectly called from
         .release() and not from .flush(), so it will never wake up any waiters
         as the .release() will not get called until all .read() calls are
         finished. And the wakeup is for the waiters in those .read() calls.
      
      * tag 'trace-ring-buffer-v6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing: Use .flush() call to wake up readers
        ring-buffer: Fix resetting of shortest_full
        ring-buffer: Fix waking up ring buffer readers
        tracing: Limit trace_marker writes to just 4K
        tracing: Limit trace_seq size to just 8K and not depend on architecture PAGE_SIZE
        tracing: Remove precision vsnprintf() check from print event
      fa4b851b
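The role of the "%.*s" precision mentioned in the pull message above can be illustrated with standard C. This is a user-space sketch using the C library's snprintf(), not the kernel's vsnprintf(), and `format_bounded()` is a hypothetical helper: the precision caps how many bytes are read from the source, so a buffer without a terminating NUL is still formatted safely.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format at most src_len bytes from src into dst. Because of the
 * precision, src does not need to be NUL-terminated. Returns the
 * number of characters that the full output requires, as snprintf()
 * does.
 */
static int format_bounded(char *dst, size_t dst_size,
                          const char *src, int src_len)
{
    assert(src_len >= 0);
    return snprintf(dst, dst_size, "%.*s", src_len, src);
}
```

The trade-off discussed in the pull message follows from this: a precision hides a missing terminator instead of exposing it, which is why the merged change drops it from the print event path.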
    • Merge tag 'phy-fixes3-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy · 210ee636
      Linus Torvalds authored
      Pull phy fixes from Vinod Koul:
      
       - fixes for the Qualcomm qmp-combo driver for the ordering of drm and
         type-c switch registration, since drivers may not probe defer after
         having registered child devices, to avoid triggering a probe deferral
         loop.
      
         This fixes internal display on Lenovo ThinkPad X13s
      
      * tag 'phy-fixes3-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
        phy: qcom-qmp-combo: fix type-c switch registration
        phy: qcom-qmp-combo: fix drm bridge registration
      210ee636
    • tracing: Use .flush() call to wake up readers · e5d7c191
      Steven Rostedt (Google) authored
      The .release() function does not get called until all readers of a file
      descriptor are finished.
      
      If a thread is blocked on reading a file descriptor in ring_buffer_wait(),
      and another thread closes the file descriptor, it will not wake up the
      other thread as ring_buffer_wake_waiters() is called by .release(), and
      that will not get called until the .read() is finished.
      
      The issue originally showed up in trace-cmd, but the readers are actually
      other processes with their own file descriptors. So calling close() would wake
      up the other tasks because they are blocked on another descriptor than the
      one that was closed(). But there's other wake ups that solve that issue.
      
      When a thread is blocked on a read, it can still hang even when another
      thread closed its descriptor.
      
      This is what the .flush() callback is for. Have the .flush() wake up the
      readers.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240308202432.107909457@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linke li <lilinke99@qq.com>
      Cc: Rabin Vincent <rabin@rab.in>
      Fixes: f3ddb74a ("tracing: Wake up ring buffer waiters on closing of the file")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      e5d7c191
    • ring-buffer: Fix resetting of shortest_full · 68282dd9
      Steven Rostedt (Google) authored
      The "shortest_full" variable is used to keep track of the waiter that is
      waiting for the smallest amount on the ring buffer before being woken up.
      When a task waits on the ring buffer, it passes in a "full" value that is
      a percentage. 0 means wake up on any data. 1-100 means wake up from 1% to
      100% full buffer.
      
      As all waiters are on the same wait queue, the wake up happens for the
      waiter with the smallest percentage.
      
      The problem is that the shortest_full on the cpu_buffer that stores the
      smallest amount doesn't get reset when all the waiters are woken up. It
      does get reset when the ring buffer is reset (echo > /sys/kernel/tracing/trace).
      
      This means that tasks may be woken up more often than they want to
      be. Instead, have the shortest_full field get reset just before waking up
      all the tasks. If the tasks wait again, they will update the shortest_full
      before sleeping.
      
      Also add locking around setting of shortest_full in the poll logic, and
      change "work" to "rbwork" to match the variable name for rb_irq_work
      structures that are used in other places.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20240308202431.948914369@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linke li <lilinke99@qq.com>
      Cc: Rabin Vincent <rabin@rab.in>
      Fixes: 2c2b0a78 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      68282dd9
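The reset described in the commit above can be sketched with a toy model of the watermark bookkeeping. The names `toy_cpu_buffer`, `toy_waiter_register()`, and `toy_writer_check()` are hypothetical, the model ignores locking and the "0 means any data" case, and it only assumes what the commit message states: waiters record the smallest requested percentage, and that minimum is cleared just before everyone is woken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified per-CPU buffer state. */
struct toy_cpu_buffer {
    int shortest_full;  /* smallest percentage requested; 0 = none */
    int percent_filled; /* current fill level of the buffer */
};

/* A waiter records the fill percentage (1-100) it wants before sleeping. */
static void toy_waiter_register(struct toy_cpu_buffer *cb, int full)
{
    assert(full >= 1 && full <= 100);
    if (cb->shortest_full == 0 || full < cb->shortest_full)
        cb->shortest_full = full;
}

/* Writer side: returns true if the waiters should be woken now. */
static bool toy_writer_check(struct toy_cpu_buffer *cb)
{
    if (cb->shortest_full == 0 ||
        cb->percent_filled < cb->shortest_full)
        return false;

    /* Reset before waking so a stale minimum cannot linger. */
    cb->shortest_full = 0;
    return true;
}
```

After a wake-up, any task that still needs more data registers again, so the minimum always reflects the waiters that are actually asleep.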
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 137e0ec0
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "KVM GUEST_MEMFD fixes for 6.8:
      
         - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY
           to avoid creating an inconsistent ABI (KVM_MEM_GUEST_MEMFD is not
           writable from userspace, so there would be no way to write to a
           read-only guest_memfd).
      
         - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly
           clear that such VMs are purely for development and testing.
      
         - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term
           plan is to support confidential VMs with deterministic private
           memory (SNP and TDX) only in the TDP MMU.
      
         - Fix a bug in a GUEST_MEMFD dirty logging test that caused false
           passes.
      
        x86 fixes:
      
         - Fix missing marking of a guest page as dirty when emulating an
           atomic access.
      
         - Check for mmu_notifier invalidation events before faulting in the
           pfn, and before acquiring mmu_lock, to avoid unnecessary work and
           lock contention with preemptible kernels (including
           CONFIG_PREEMPT_DYNAMIC in non-preemptible mode).
      
         - Disable AMD DebugSwap by default, it breaks VMSA signing and will
           be re-enabled with a better VM creation API in 6.10.
      
         - Do the cache flush of converted pages in svm_register_enc_region()
           before dropping kvm->lock, to avoid a race with unregistering of
           the same region and the consequent use-after-free issue"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        SEV: disable SEV-ES DebugSwap by default
        KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing
        KVM: SVM: Flush pages under kvm->lock to fix UAF in svm_register_enc_region()
        KVM: selftests: Add a testcase to verify GUEST_MEMFD and READONLY are exclusive
        KVM: selftests: Create GUEST_MEMFD for relevant invalid flags testcases
        KVM: x86/mmu: Restrict KVM_SW_PROTECTED_VM to the TDP MMU
        KVM: x86: Update KVM_SW_PROTECTED_VM docs to make it clear they're a WIP
        KVM: Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY
        KVM: x86: Mark target gfn of emulated atomic instruction as dirty
      137e0ec0
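The mutual exclusion between KVM_MEM_GUEST_MEMFD and KVM_MEM_READONLY listed in the merge above amounts to a flag validation check. The following is a sketch only: the `TOY_*` constants and `toy_memslot_flags_valid()` are hypothetical, and the bit positions are chosen for illustration rather than taken from the KVM UAPI headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values for this sketch; not the real UAPI values. */
#define TOY_MEM_READONLY    (UINT32_C(1) << 1)
#define TOY_MEM_GUEST_MEMFD (UINT32_C(1) << 2)

/* Reject any flag combination that sets both exclusive bits at once. */
static bool toy_memslot_flags_valid(uint32_t flags)
{
    const uint32_t both = TOY_MEM_READONLY | TOY_MEM_GUEST_MEMFD;

    return (flags & both) != both;
}
```

Rejecting the combination up front keeps the ABI consistent: as the pull message notes, userspace could never write to a read-only guest_memfd region anyway.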
    • ring-buffer: Fix waking up ring buffer readers · b3594573
      Steven Rostedt (Google) authored
      A task can wait on a ring buffer for when it fills up to a specific
      watermark. The writer will check the minimum watermark that waiters are
      waiting for and if the ring buffer is past that, it will wake up all the
      waiters.
      
      The waiters are in a wait loop, and will first check if a signal is
      pending and then check if the ring buffer is at the desired level where it
      should break out of the loop.
      
      If a file that uses a ring buffer closes, and there are threads waiting on
      the ring buffer, it needs to wake up those threads. To do this, a
      "wait_index" was used.
      
      Before entering the wait loop, the waiter will read the wait_index. On
      wakeup, it will check if the wait_index is different than when it entered
      the loop, and will exit the loop if it is. The waker will only need to
      update the wait_index before waking up the waiters.
      
      This had a couple of bugs. One trivial one and one broken by design.
      
      The trivial bug was that the waiter checked the wait_index after the
      schedule() call. It had to be checked between the prepare_to_wait() and
      the schedule() which it was not.
      
      The main bug is that the first check to set the default wait_index will
      always be outside the prepare_to_wait() and the schedule(). That's because
      the ring_buffer_wait() doesn't have enough context to know if it should
      break out of the loop.
      
      The loop itself is not needed, because all the callers of
      ring_buffer_wait() also have their own loop, as the callers have a better
      sense of what the context is to decide whether to break out of the loop
      or not.
      
      Just have the ring_buffer_wait() block once, and if it gets woken up, exit
      the function and let the callers decide what to do next.
      
      Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNSRZfg@mail.gmail.com/
      Link: https://lore.kernel.org/linux-trace-kernel/20240308202431.792933613@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linke li <lilinke99@qq.com>
      Cc: Rabin Vincent <rabin@rab.in>
      Fixes: e30f53aa ("tracing: Do not busy wait in buffer splice")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      b3594573
  9. 09 Mar, 2024 5 commits
    • Merge tag 'i2c-for-6.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 005f6f34
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "Two patches from Heiner for the i801 are targeting muxes discovered
        while working on some other features. Essentially, there is a
        reordering when adding optional slaves and proper cleanup upon
        registering a mux device.
      
        Christophe fixes the exit path in the wmt driver that was leaving the
        clocks hanging, and the last fix from Tommy avoids false error reports
        in IRQ"
      
      * tag 'i2c-for-6.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: aspeed: Fix the dummy irq expected print
        i2c: wmt: Fix an error handling path in wmt_i2c_probe()
        i2c: i801: Avoid potential double call to gpiod_remove_lookup_table
        i2c: i801: Fix using mux_pdev before it's set
      005f6f34
    • Merge tag 'firewire-fixes-6.8-final' of... · 66695e7d
      Linus Torvalds authored
      Merge tag 'firewire-fixes-6.8-final' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
      
      Pull firewire fix from Takashi Sakamoto:
       "A fix to suppress a warning about unreleased IRQ for 1394 OHCI
        hardware when disabling MSI.
      
        In Linux kernel v6.5, a PCI driver for 1394 OHCI hardware was
        optimized into the managed device resources. Edmund Raile points out
        that the change brings the warning about unreleased IRQ at the call of
        pci_disable_msi(), since the API expects that the relevant IRQ has
        already been released in advance.
      
        As long as the API is called in the .remove callback of the PCI device
        operations, the IRQ must not be maintained as part of the managed
        device resources. As a workaround, the IRQ is explicitly released in
        the .remove callback, before the call to pci_disable_msi().
      
        pci_disable_msi() is a legacy API nowadays in the PCI MSI
        implementation. I plan to replace it with the modern API during
        development for a future version of the Linux kernel. So at present I
        keep them as is"
      
      * tag 'firewire-fixes-6.8-final' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
        firewire: ohci: prevent leak of left-over IRQ on unbind
      66695e7d
    • SEV: disable SEV-ES DebugSwap by default · 5abf6dce
      Paolo Bonzini authored
      The DebugSwap feature of SEV-ES provides a way for confidential guests to use
      data breakpoints.  However, because the status of the DebugSwap feature is
      recorded in the VMSA, enabling it by default invalidates the attestation
      signatures.  In 6.10 we will introduce a new API to create SEV VMs that
      will allow enabling DebugSwap based on what the user tells KVM to do.
      Contextually, we will change the legacy KVM_SEV_ES_INIT API to never
      enable DebugSwap.
      
      For compatibility with kernels that pre-date the introduction of DebugSwap,
      as well as with those where KVM_SEV_ES_INIT will never enable it, do not enable
      the feature by default.  If anybody wants to use it, for now they can enable
      the sev_es_debug_swap_enabled module parameter, but this will result in a
      warning.
      
      Fixes: d1f85fbe ("KVM: SEV: Enable data breakpoints in SEV-ES")
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      5abf6dce
    • Merge tag 'kvm-x86-guest_memfd_fixes-6.8' of https://github.com/kvm-x86/linux into HEAD · 39fee313
      Paolo Bonzini authored
      KVM GUEST_MEMFD fixes for 6.8:
      
       - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to
         avoid creating ABI that KVM can't sanely support.
      
       - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly
         clear that such VMs are purely a development and testing vehicle, and
         come with zero guarantees.
      
       - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan
         is to support confidential VMs with deterministic private memory (SNP
         and TDX) only in the TDP MMU.
      
       - Fix a bug in a GUEST_MEMFD negative test that resulted in false passes
         when verifying that KVM_MEM_GUEST_MEMFD memslots can't be dirty logged.
      39fee313
    • Merge tag 'kvm-x86-fixes-6.8-2' of https://github.com/kvm-x86/linux into HEAD · 1b6c146d
      Paolo Bonzini authored
      KVM x86 fixes for 6.8, round 2:
      
       - When emulating an atomic access, mark the gfn as dirty in the memslot
         to fix a bug where KVM could fail to mark the slot as dirty during live
         migration, ultimately resulting in guest data corruption due to a dirty
         page not being re-copied from the source to the target.
      
       - Check for mmu_notifier invalidation events before faulting in the pfn,
         and before acquiring mmu_lock, to avoid unnecessary work and lock
         contention.  Contending mmu_lock is especially problematic on preemptible
         kernels, as KVM may yield mmu_lock in response to the contention, which
         severely degrades overall performance due to vCPUs making it difficult
         for the task that triggered invalidation to make forward progress.
      
         Note, due to another kernel bug, this fix isn't limited to preemptible
         kernels, as any kernel built with CONFIG_PREEMPT_DYNAMIC=y will yield
         contended rwlocks and spinlocks.
      
         https://lore.kernel.org/all/20240110214723.695930-1-seanjc@google.com
      1b6c146d