1. 14 May, 2020 2 commits
    • NFSv3: fix rpc receive buffer size for MOUNT call · 8eed292b
      Olga Kornievskaia authored
      Prior to commit e3d3ab64dd66 ("SUNRPC: Use au_rslack when
      computing reply buffer size"), there was enough slack in the reply
      buffer to accommodate filehandles of size 60 bytes. However, the real
      problem was that the reply buffer size for the MOUNT operation was
      not correctly calculated. The receive buffer size used the filehandle
      size for NFSv2 (32 bytes), which is much smaller than the allowed
      filehandle size for v3 mounts.
      
      Fix the reply buffer size (decode arguments size) for the MNT command.
      
      Fixes: 2c94b8ec ("SUNRPC: Use au_rslack when computing reply buffer size")
      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      8eed292b
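To illustrate the size gap described above: XDR encodes opaque data in 4-byte words, so the space a reply buffer must reserve for a filehandle depends on the protocol's maximum handle size. The sketch below assumes the usual limits of 32 bytes for NFSv2 and 64 bytes for NFSv3. `xdr_quadlen()` and `fh_fits()` are helpers written for this example only; they are not the kernel's macros or the code changed by the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NFS2_FHSIZE 32 /* assumed maximum NFSv2 filehandle size, in bytes */
#define NFS3_FHSIZE 64 /* assumed maximum NFSv3 filehandle size, in bytes */

static_assert(NFS3_FHSIZE % 4 == 0, "v3 handle size is a whole number of XDR words");

/* Number of 4-byte XDR words needed to hold `bytes` bytes of opaque data. */
static size_t xdr_quadlen(size_t bytes)
{
    return (bytes + 3) / 4;
}

/* True if a filehandle of `fh_bytes` fits in `reserved_words` XDR words. */
static bool fh_fits(size_t reserved_words, size_t fh_bytes)
{
    return xdr_quadlen(fh_bytes) <= reserved_words;
}
```

A reply area sized for the v2 handle reserves 8 words, while a 60-byte v3 handle needs 15, which is why such a reply only fit while extra slack happened to be available.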
    • SUNRPC: 'Directory with parent 'rpc_clnt' already present!' · 933496e9
      J. Bruce Fields authored
      Each rpc_client has a cl_clid which is allocated from a global ida, and
      a debugfs directory which is named after cl_clid.
      
      We're releasing the cl_clid before we free the debugfs directory named
      after it.  As soon as the cl_clid is released, that value is available
      for another newly created client.
      
      That leaves a window where another client may attempt to create a new
      debugfs directory with the same name as the not-yet-deleted debugfs
      directory from the dying client.  Symptoms are log messages like
      
      	Directory 4 with parent 'rpc_clnt' already present!
      
      Fixes: 7c4310ff ("SUNRPC: defer slow parts of rpc_free_client() to a workqueue.")
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      933496e9
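The window described in this commit can be reproduced with a small user-space model. This is only a sketch: the two arrays stand in for the global ida and for the debugfs directories, and every name here (`id_alloc`, `dir_create`, `new_client_collides`, and so on) is hypothetical rather than taken from the SUNRPC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_IDS 4

static bool id_in_use[MAX_IDS];   /* stand-in for the global ida */
static bool dir_present[MAX_IDS]; /* stand-in for directories named after an id */

/* Hand out the lowest free id, or -1 if none is left. */
static int id_alloc(void)
{
    for (int i = 0; i < MAX_IDS; i++) {
        if (!id_in_use[i]) {
            id_in_use[i] = true;
            return i;
        }
    }
    return -1;
}

static void id_release(int id)
{
    assert(id >= 0 && id < MAX_IDS);
    id_in_use[id] = false;
}

/* Returns false when a directory with this name is already present. */
static bool dir_create(int id)
{
    assert(id >= 0 && id < MAX_IDS);
    if (dir_present[id])
        return false;
    dir_present[id] = true;
    return true;
}

static void dir_remove(int id)
{
    assert(id >= 0 && id < MAX_IDS);
    dir_present[id] = false;
}

/*
 * Tear down one client and create a new one between the two teardown steps.
 * Returns true if the new client's directory creation collided.
 */
static bool new_client_collides(bool release_id_first)
{
    memset(id_in_use, 0, sizeof(id_in_use));
    memset(dir_present, 0, sizeof(dir_present));

    int old_id = id_alloc();
    dir_create(old_id);

    /* First teardown step of the dying client. */
    if (release_id_first)
        id_release(old_id);
    else
        dir_remove(old_id);

    /* A new client shows up before teardown has finished. */
    int new_id = id_alloc();
    bool created = dir_create(new_id);

    /* Second teardown step of the dying client. */
    if (release_id_first)
        dir_remove(old_id);
    else
        id_release(old_id);

    return !created;
}
```

In this model, releasing the id first lets the new client reuse the name while the old directory still exists; removing the directory first keeps the id reserved until the name is free again.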
  2. 13 May, 2020 1 commit
  3. 11 May, 2020 4 commits
  4. 10 May, 2020 8 commits
    • SUNRPC: fix use-after-free in rpc_free_client_work() · 31e9a7f3
      NeilBrown authored
      Parts of rpc_free_client() were recently moved to
      a separate rpc_free_client_work().  This introduced
      a use-after-free, as rpc_clnt_remove_pipedir() calls
      rpc_net_ns(), which uses clnt->cl_xprt after it has already
      been freed.
      So move the call to xprt_put() after the call to
      rpc_clnt_remove_pipedir().
      
      Reported-by: syzbot+22b5ef302c7c40d94ea8@syzkaller.appspotmail.com
      Fixes: 7c4310ff ("SUNRPC: defer slow parts of rpc_free_client() to a workqueue.")
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      31e9a7f3
    • Linux 5.7-rc5 · 2ef96a5b
      Linus Torvalds authored
      2ef96a5b
    • Merge tag 'x86-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c14cab26
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "A set of fixes for x86:
      
         - Ensure that direct mapping alias is always flushed when changing
           page attributes. The optimization for small ranges failed to do so
           when the virtual address was in the vmalloc or module space.
      
         - Unbreak the trace event registration for syscalls without arguments
           caused by the refactoring of the SYSCALL_DEFINE0() macro.
      
         - Move the printk in the TSC deadline timer code to a place where it
           is guaranteed to only be called once during boot and cannot be
           rearmed by clearing warn_once after boot. If it's invoked post boot
           then lockdep rightfully complains about a potential deadlock as the
           calling context is different.
      
         - A series of fixes for objtool and the ORC unwinder addressing
           a variety of small issues:
      
             - Stack offset tracking for indirect CFAs in objtool ignored
               subsequent pushes and pops
      
             - Repair the unwind hints in the register clearing entry ASM code
      
             - Make the unwinding in the low level exit to usermode code stop
               after switching to the trampoline stack. The unwind hint is no
               longer valid and the ORC unwinder emits a warning as it can't
               find the registers anymore.
      
             - Fix unwind hints in switch_to_asm() and rewind_stack_do_exit()
               which caused objtool to generate bogus ORC data.
      
             - Prevent unwinder warnings when dumping the stack of a
               non-current task as there is no way to be sure about the
               validity because the dumped stack can be a moving target.
      
             - Make the ORC unwinder behave the same way as the frame pointer
               unwinder when dumping an inactive task's stack and do not skip
               the first frame.
      
             - Prevent ORC unwinding before ORC data has been initialized
      
             - Immediately terminate unwinding when an unknown ORC entry type
               is found.
      
             - Prevent premature stop of the unwinder caused by IRET frames.
      
             - Fix another infinite loop in objtool caused by a negative
               offset which was not caught.
      
             - Address a few build warnings in the ORC unwinder and add
               missing static/ro_after_init annotations"
      
      * tag 'x86-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/unwind/orc: Move ORC sorting variables under !CONFIG_MODULES
        x86/apic: Move TSC deadline timer debug printk
        ftrace/x86: Fix trace event registration for syscalls without arguments
        x86/mm/cpa: Flush direct map alias during cpa
        objtool: Fix infinite loop in for_offset_range()
        x86/unwind/orc: Fix premature unwind stoppage due to IRET frames
        x86/unwind/orc: Fix error path for bad ORC entry type
        x86/unwind/orc: Prevent unwinding before ORC initialization
        x86/unwind/orc: Don't skip the first frame for inactive tasks
        x86/unwind: Prevent false warnings for non-current tasks
        x86/unwind/orc: Convert global variables to static
        x86/entry/64: Fix unwind hints in rewind_stack_do_exit()
        x86/entry/64: Fix unwind hints in __switch_to_asm()
        x86/entry/64: Fix unwind hints in kernel exit path
        x86/entry/64: Fix unwind hints in register clearing code
        objtool: Fix stack offset tracking for indirect CFAs
      c14cab26
    • Merge tag 'objtool-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8b000832
      Linus Torvalds authored
      Pull objtool fix from Thomas Gleixner:
       "A single fix for objtool to prevent an infinite loop in the
        jump table search which can be triggered when building the
        kernel with '-ffunction-sections'"
      
      * tag 'objtool-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        objtool: Fix infinite loop in find_jump_table()
      8b000832
    • Merge tag 'locking-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · bd2049f8
      Linus Torvalds authored
      Pull locking fix from Thomas Gleixner:
       "A single fix for the fallout of the recent futex uaccess rework.
      
        With those changes GCC9 fails to analyze arch_futex_atomic_op_inuser()
        correctly and emits a 'maybe uninitialized' warning. While we usually
        ignore compiler stupidity the conditional store is pointless anyway
        because the correct case has to store. For the fault case the extra
        store does no harm"
      
      * tag 'locking-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        ARM: futex: Address build warning
      bd2049f8
    • Merge tag 'iommu-fixes-v5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 27d2dcb1
      Linus Torvalds authored
      Pull iommu fixes from Joerg Roedel:
      
       - Race condition fixes for the AMD IOMMU driver.
      
         These are five patches fixing two race conditions around
         increase_address_space(). The first race condition was around the
         non-atomic update of the domain page-table root pointer and the
         variable containing the page-table depth (called mode). This is fixed
         now by merging page-table root and mode into one 64-bit field which
         is read/written atomically.
      
         The second race condition was around updating the page-table root
         pointer and making it public before the hardware caches were flushed.
         This could cause addresses to be mapped and returned to drivers which
         are not reachable by IOMMU hardware yet, causing IO page-faults. This
         is fixed too by adding the necessary flushes before a new page-table
         root is published.
      
         Related to the race condition fixes these patches also add a missing
         domain_flush_complete() barrier to update_domain() and a fix to bail
         out of the loop which tries to increase the address space when the
         call to increase_address_space() fails.
      
         Qian was able to trigger the race conditions under high load and
         memory pressure within a few days of testing. He confirmed that he
         has seen no issues anymore with the fixes included here.
      
       - Fix for a list-handling bug in the VirtIO IOMMU driver.
      
      * tag 'iommu-fixes-v5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/virtio: Reverse arguments to list_add
        iommu/amd: Do not flush Device Table in iommu_map_page()
        iommu/amd: Update Device Table in increase_address_space()
        iommu/amd: Call domain_flush_complete() in update_domain()
        iommu/amd: Do not loop forever when trying to increase address space
        iommu/amd: Fix race in increase_address_space()/fetch_pte()
      27d2dcb1
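The first fix in this pull, merging the page-table root and the mode into a single 64-bit field, can be sketched with C11 atomics. The layout is an assumption made for the example: the root is taken to be aligned so that its low three bits are free to carry the mode. The names `pt_root_and_mode`, `pt_publish()` and `pt_snapshot()` are hypothetical and do not come from the AMD IOMMU driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout: low 3 bits hold the page-table depth (mode). */
#define PT_MODE_MASK 0x7ULL

static _Atomic uint64_t pt_root_and_mode;

/* Publish root and mode together with a single atomic store. */
static void pt_publish(uint64_t root, unsigned int mode)
{
    assert((root & PT_MODE_MASK) == 0); /* root must leave the mode bits free */
    assert(mode <= PT_MODE_MASK);
    atomic_store(&pt_root_and_mode, root | mode);
}

/* Read both values from one atomic load, so they always belong together. */
static void pt_snapshot(uint64_t *root, unsigned int *mode)
{
    uint64_t v = atomic_load(&pt_root_and_mode);

    *root = v & ~PT_MODE_MASK;
    *mode = (unsigned int)(v & PT_MODE_MASK);
}
```

Because a reader only ever sees one stored word, it cannot pair a new root with an old depth or the reverse, which is the inconsistency two separate non-atomic updates allow.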
    • Merge tag 'block-5.7-2020-05-09' of git://git.kernel.dk/linux-block · 0a85ed6e
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - a small series fixing a use-after-free of bdi name (Christoph, Yufen)
      
       - NVMe fix for a regression with the smaller CQ update (Alexey)
      
       - NVMe fix for a hang at namespace scanning error recovery (Sagi)
      
       - fix race with blk-iocost iocg->abs_vdebt updates (Tejun)
      
      * tag 'block-5.7-2020-05-09' of git://git.kernel.dk/linux-block:
        nvme: fix possible hang when ns scanning fails during error recovery
        nvme-pci: fix "slimmer CQ head update"
        bdi: add a ->dev_name field to struct backing_dev_info
        bdi: use bdi_dev_name() to get device name
        bdi: move bdi_dev_name out of line
        vboxsf: don't use the source name in the bdi name
        iocost: protect iocg->abs_vdebt with iocg->waitq.lock
      0a85ed6e
    • gcc-10: mark more functions __init to avoid section mismatch warnings · e99332e7
      Linus Torvalds authored
      It seems that for whatever reason, gcc-10 ends up not inlining a couple
      of functions that used to be inlined before.  Even if they only have one
      single callsite - it looks like gcc may have decided that the code was
      unlikely, and not worth inlining.
      
      The code generation difference is harmless, but caused a few new section
      mismatch errors, since the (now no longer inlined) function wasn't in
      the __init section, but called other init functions:
      
         Section mismatch in reference from the function kexec_free_initrd() to the function .init.text:free_initrd_mem()
         Section mismatch in reference from the function tpm2_calc_event_log_size() to the function .init.text:early_memremap()
         Section mismatch in reference from the function tpm2_calc_event_log_size() to the function .init.text:early_memunmap()
      
      So add the appropriate __init annotation to make modpost not complain.
      In both cases there were trivially just a single callsite from another
      __init function.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e99332e7
  5. 09 May, 2020 12 commits
    • Merge tag 'riscv-for-linus-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 2e28f3b1
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
       "A smattering of fixes and cleanups:
      
         - Dead code removal.
      
         - Exporting riscv_cpuid_to_hartid_mask for modules.
      
         - Per-CPU tracking of ISA features.
      
         - Setting max_pfn correctly when probing memory.
      
         - Adding a note to the VDSO so glibc can check the kernel's version
           without a uname().
      
         - A fix to force the bootloader to initialize the boot spin tables,
           which still get used as a fallback when SBI-0.1 is enabled"
      
      * tag 'riscv-for-linus-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        RISC-V: Remove unused code from STRICT_KERNEL_RWX
        riscv: force __cpu_up_ variables to put in data section
        riscv: add Linux note to vdso
        riscv: set max_pfn to the PFN of the last page
        RISC-V: Remove N-extension related defines
        RISC-V: Add bitmap reprensenting ISA features common across CPUs
        RISC-V: Export riscv_cpuid_to_hartid_mask() API
      2e28f3b1
    • gcc-10: avoid shadowing standard library 'free()' in crypto · 1a263ae6
      Linus Torvalds authored
      gcc-10 has started warning about conflicting types for a few new
      built-in functions, particularly 'free()'.
      
      This results in warnings like:
      
         crypto/xts.c:325:13: warning: conflicting types for built-in function ‘free’; expected ‘void(void *)’ [-Wbuiltin-declaration-mismatch]
      
      because the crypto layer had its local freeing functions called
      'free()'.
      
      Gcc-10 is in the wrong here, since that function is marked 'static', and
      thus there is no chance of confusion with any standard library function
      namespace.
      
      But the simplest thing to do is to just use a different name here, and
      avoid this gcc mis-feature.
      
      [ Side note: gcc knowing about 'free()' is in itself not the
        mis-feature: the semantics of 'free()' are special enough that a
        compiler can validly do special things when seeing it.
      
        So the mis-feature here is that gcc thinks that 'free()' is some
        restricted name, and you can't shadow it as a local static function.
      
        Making the special 'free()' semantics be a function attribute rather
        than tied to the name would be the much better model ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1a263ae6
    • gcc-10: disable 'restrict' warning for now · adc71920
      Linus Torvalds authored
      gcc-10 now warns about passing aliasing pointers to functions that take
      restricted pointers.
      
      That's actually a great warning, and if we ever start using 'restrict'
      in the kernel, it might be quite useful.  But right now we don't, and it
      turns out that the only thing this warns about is an idiom where we have
      declared a few functions to be "printf-like" (which seems to make gcc
      pick up the restricted pointer thing), and then we print to the same
      buffer that we also use as an input.
      
      And people do that as an odd concatenation pattern, with code like this:
      
          #define sysfs_show_gen_prop(buffer, fmt, ...) \
              snprintf(buffer, PAGE_SIZE, "%s"fmt, buffer, __VA_ARGS__)
      
      where we have 'buffer' as both the destination of the final result, and
      as the initial argument.
      
      Yes, it's a bit questionable.  And outside of the kernel, people do have
      standard declarations like
      
          int snprintf( char *restrict buffer, size_t bufsz,
                        const char *restrict format, ... );
      
      where that output buffer is marked as a restrict pointer that cannot
      alias with any other arguments.
      
      But in the context of the kernel, that 'use snprintf() to concatenate to
      the end result' does work, and the pattern shows up in multiple places.
      And we have not marked our own version of snprintf() as taking restrict
      pointers, so the warning is incorrect for now, and gcc picks it up on
      its own.
      
      If we do start using 'restrict' in the kernel (and it might be a good
      idea if people find places where it matters), we'll need to figure out
      how to avoid this issue for snprintf and friends.  But in the meantime,
      this warning is not useful.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      adc71920
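For code that does declare its output buffer `restrict`, as the standard `snprintf()` does, the concatenation idiom discussed in this commit can be written without passing the buffer as its own input. The following is a minimal user-space sketch, not kernel code: `buf_appendf()` is a hypothetical helper, and it assumes `buf` already holds a NUL-terminated string shorter than `cap`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Append formatted text to the string already in buf (capacity cap).
 * The output pointer is buf + used, so buf never appears among the inputs.
 * Returns the resulting string length, clamped to cap - 1 on truncation.
 */
static size_t buf_appendf(char *buf, size_t cap, const char *fmt, ...)
{
    assert(buf != NULL && cap > 0);

    size_t used = strlen(buf);
    va_list ap;
    int n;

    if (used >= cap)
        return used; /* not a valid string for this capacity; leave it alone */

    va_start(ap, fmt);
    n = vsnprintf(buf + used, cap - used, fmt, ap);
    va_end(ap);

    if (n < 0)
        return used;    /* formatting error: nothing usable was appended */
    if ((size_t)n >= cap - used)
        return cap - 1; /* output was truncated to fit */
    return used + (size_t)n;
}
```

Writing at an offset keeps source and destination disjoint, so the aliasing concern behind the warning does not arise in this form.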
    • gcc-10: disable 'stringop-overflow' warning for now · 5a76021c
      Linus Torvalds authored
      This is the final array bounds warning removal for gcc-10 for now.
      
      Again, the warning is good, and we should re-enable all these warnings
      when we have converted all the legacy array declaration cases to
      flexible arrays. But in the meantime, it's just noise.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5a76021c
    • nvme: fix possible hang when ns scanning fails during error recovery · 59c7c3ca
      Sagi Grimberg authored
      When the controller is reconnecting, the host fails I/O and admin
      commands as the host cannot reach the controller. ns scanning may
      revalidate namespaces during that period and it is wrong to remove
      namespaces due to these failures as we may hang (see 205da243).
      
      One command that may fail is nvme_identify_ns_descs. Since we return
      success because the ns identification descriptor list is optional, we
      continue to compare ns identifiers in nvme_revalidate_disk, which obviously
      fails and returns -ENODEV to nvme_validate_ns, which will remove the namespace.
      
      Exactly what we don't want to happen.
      
      Fixes: 22802bf7 ("nvme: Namepace identification descriptor list is optional")
      Tested-by: Anton Eidelman <anton@lightbitslabs.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      59c7c3ca
    • nvme-pci: fix "slimmer CQ head update" · a8de6639
      Alexey Dobriyan authored
      Pre-incrementing ->cq_head can't be done in memory because an
      out-of-bounds value can be observed by another context.
      
      This devalues the space savings compared to the original code :-\
      
      	$ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux
      	add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-32 (-32)
      	Function                                     old     new   delta
      	nvme_poll_irqdisable                         464     456      -8
      	nvme_poll                                    455     447      -8
      	nvme_irq                                     388     380      -8
      	nvme_dev_disable                             955     947      -8
      
      But the code is minimal now: one read for head, one read for q_depth,
      one increment, one comparison, single instruction phase bit update and
      one write for new head.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Reported-by: John Garry <john.garry@huawei.com>
      Tested-by: John Garry <john.garry@huawei.com>
      Fixes: e2a366a4 ("nvme-pci: slimmer CQ head update")
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a8de6639
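The sequence listed in this commit message (read head, read q_depth, increment, compare, flip the phase bit, write the new head) can be sketched as follows. `struct cq` and `cq_advance_head()` are hypothetical names for this example and are not the driver's definitions; the point is only that the incremented value lives in a local variable until it is known to be in range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical completion-queue state, modelled on the fields named above. */
struct cq {
    uint16_t q_depth;  /* number of entries in the queue */
    uint16_t cq_head;  /* index of the next entry to consume */
    uint8_t  cq_phase; /* phase bit, toggled on every wrap */
};

/* Advance the head by one entry, wrapping at q_depth. */
static void cq_advance_head(struct cq *q)
{
    assert(q->q_depth > 0);

    /* Increment in a local so the stored head is never out of bounds. */
    uint32_t next = (uint32_t)q->cq_head + 1;

    if (next == q->q_depth) {
        q->cq_head = 0;
        q->cq_phase ^= 1;
    } else {
        q->cq_head = (uint16_t)next;
    }
}
```

Incrementing `cq_head` in place and correcting it afterwards would briefly store the value `q_depth`, which is the out-of-bounds state another context could observe.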
    • bdi: add a ->dev_name field to struct backing_dev_info · 6bd87eec
      Christoph Hellwig authored
      Cache a copy of the name for the life time of the backing_dev_info
      structure so that we can reference it even after unregistering.
      
      Fixes: 68f23b89 ("memcg: fix a crash in wb_workfn when a device disappears")
      Reported-by: Yufen Yu <yuyufen@huawei.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6bd87eec
    • bdi: use bdi_dev_name() to get device name · d51cfc53
      Yufen Yu authored
      Use the common interface bdi_dev_name() to get device name.
      Signed-off-by: Yufen Yu <yuyufen@huawei.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      
      Add missing <linux/backing-dev.h> include to BFQ
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d51cfc53
    • gcc-10: disable 'array-bounds' warning for now · 44720996
      Linus Torvalds authored
      This is another fine warning, related to the 'zero-length-bounds' one,
      but hitting the same historical code in the kernel.
      
      Because C didn't historically support flexible array members, we have
      code that instead uses a one-sized array, the same way we have cases of
      zero-sized arrays.
      
      The one-sized arrays come from either not wanting to use the gcc
      zero-sized array extension, or from a slight convenience-feature, where
      particularly for strings, the size of the structure now includes the
      allocation for the final NUL character.
      
      So with a "char name[1];" at the end of a structure, you can do things
      like
      
             v = my_malloc(sizeof(struct vendor) + strlen(name));
      
      and avoid the "+1" for the terminator.
      
      Yes, the modern way to do that is with a flexible array, and using
      'offsetof()' instead of 'sizeof()', and adding the "+1" by hand.  That
      also technically gets the size "more correct" in that it avoids any
      alignment (and thus padding) issues, but this is another long-term
      cleanup thing that will not happen for 5.7.
      
      So disable the warning for now, even though it's potentially quite
      useful.  Having a slew of warnings that then hide more urgent new issues
      is not an improvement.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      44720996
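The "modern way" mentioned in this commit message, a flexible array member sized with `offsetof()` plus an explicit terminator byte, looks roughly like this in plain C. This is a user-space sketch: `struct vendor` and its `id` field are assumed for illustration, and `malloc()` stands in for whichever allocator the real code uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Flexible array member instead of the legacy "char name[1];". */
struct vendor {
    int  id;
    char name[];
};

/* Allocate a vendor with room for exactly strlen(name) + 1 bytes of name. */
static struct vendor *vendor_new(int id, const char *name)
{
    assert(name != NULL);

    size_t len = strlen(name);
    /* offsetof() avoids counting tail padding; the "+ 1" is the NUL. */
    struct vendor *v = malloc(offsetof(struct vendor, name) + len + 1);

    if (v == NULL)
        return NULL;
    v->id = id;
    memcpy(v->name, name, len + 1);
    return v;
}
```

With this form the compiler knows the array has no fixed bound, so indexing past `name[0]` is no longer something a bounds warning needs to flag.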
    • gcc-10: disable 'zero-length-bounds' warning for now · 5c45de21
      Linus Torvalds authored
      This is a fine warning, but we still have a number of zero-length arrays
      in the kernel that come from the traditional gcc extension.  Yes, they
      are getting converted to flexible arrays, but in the meantime the gcc-10
      warning about zero-length bounds is very verbose, and is hiding other
      issues.
      
      I missed one actual build failure because it was hidden among hundreds
      of lines of warning.  Thankfully I caught it on the second go before
      pushing things out, but it convinced me that I really need to disable
      the new warnings for now.
      
      We'll hopefully be all done with our conversion to flexible arrays in
      the not too distant future, and we can then re-enable this warning.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5c45de21
    • Stop the ad-hoc games with -Wno-maybe-initialized · 78a5255f
      Linus Torvalds authored
      We have some rather random rules about when we accept the
      "maybe-uninitialized" warnings, and when we don't.
      
      For example, we consider it unreliable for gcc versions < 4.9, but also
      if -O3 is enabled, or if optimizing for size.  And then various kernel
      config options disabled it, because they know that they trigger that
      warning by confusing gcc sufficiently (ie PROFILE_ALL_BRANCHES).
      
      And now gcc-10 seems to be introducing a lot of those warnings too, so
      it falls under the same heading as 4.9 did.
      
      At the same time, we have a very straightforward way to _enable_ that
      warning when wanted: use "W=2" to enable more warnings.
      
      So stop playing these ad-hoc games, and just disable that warning by
      default, with the known and straight-forward "if you want to work on the
      extra compiler warnings, use W=123".
      
      Would it be great to have code that is always so obvious that it never
      confuses the compiler whether a variable is used initialized or not?
      Yes, it would.  In a perfect world, the compilers would be smarter, and
      our source code would be simpler.
      
      That's currently not the world we live in, though.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      78a5255f
    • Merge tag 'io_uring-5.7-2020-05-08' of git://git.kernel.dk/linux-block · 1d3962ae
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
      
       - Fix finish_wait() balancing in file cancelation (Xiaoguang)
      
       - Ensure early cleanup of resources in ring map failure (Xiaoguang)
      
       - Ensure IORING_OP_SPLICE does the right file mode checks (Pavel)
      
       - Remove file opening from openat/openat2/statx, it's not needed and
         messes with O_PATH
      
      * tag 'io_uring-5.7-2020-05-08' of git://git.kernel.dk/linux-block:
        io_uring: don't use 'fd' for openat/openat2/statx
        splice: move f_mode checks to do_{splice,tee}()
        io_uring: handle -EFAULT properly in io_uring_setup()
        io_uring: fix mismatched finish_wait() calls in io_uring_cancel_files()
      1d3962ae
  6. 08 May, 2020 13 commits
    • cachefiles: Fix race between read_waiter and read_copier involving op->to_do · 7bb0c533
      Lei Xue authored
      There is a potential race in fscache operation enqueuing for reading and
      copying multiple pages from cachefiles to netfs.  The problem can be seen
      easily on a heavily loaded system (for example, many processes reading files
      continually on an NFS share covered by fscache triggered this problem within
      a few minutes).
      
      The race is due to cachefiles_read_waiter() adding the op to the monitor
      to_do list and then dropping the object->work_lock spinlock before
      completing fscache_enqueue_operation().  Once the lock is dropped,
      cachefiles_read_copier() grabs the op, completes processing it, and
      makes it through fscache_retrieval_complete(), which sets the op->state to
      the final state of FSCACHE_OP_ST_COMPLETE(4).  When cachefiles_read_waiter()
      finally gets through the remainder of fscache_enqueue_operation(),
      it sees the invalid state and hits the ASSERTCMP, and the following
      oops is seen:
      [ 2259.612361] FS-Cache:
      [ 2259.614785] FS-Cache: Assertion failed
      [ 2259.618639] FS-Cache: 4 == 5 is false
      [ 2259.622456] ------------[ cut here ]------------
      [ 2259.627190] kernel BUG at fs/fscache/operation.c:70!
      ...
      [ 2259.791675] RIP: 0010:[<ffffffffc061b4cf>]  [<ffffffffc061b4cf>] fscache_enqueue_operation+0xff/0x170 [fscache]
      [ 2259.802059] RSP: 0000:ffffa0263d543be0  EFLAGS: 00010046
      [ 2259.807521] RAX: 0000000000000019 RBX: ffffa01a4d390480 RCX: 0000000000000006
      [ 2259.814847] RDX: 0000000000000000 RSI: 0000000000000046 RDI: ffffa0263d553890
      [ 2259.822176] RBP: ffffa0263d543be8 R08: 0000000000000000 R09: ffffa0263c2d8708
      [ 2259.829502] R10: 0000000000001e7f R11: 0000000000000000 R12: ffffa01a4d390480
      [ 2259.844483] R13: ffff9fa9546c5920 R14: ffffa0263d543c80 R15: ffffa0293ff9bf10
      [ 2259.859554] FS:  00007f4b6efbd700(0000) GS:ffffa0263d540000(0000) knlGS:0000000000000000
      [ 2259.875571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 2259.889117] CR2: 00007f49e1624ff0 CR3: 0000012b38b38000 CR4: 00000000007607e0
      [ 2259.904015] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 2259.918764] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 2259.933449] PKRU: 55555554
      [ 2259.943654] Call Trace:
      [ 2259.953592]  <IRQ>
      [ 2259.955577]  [<ffffffffc03a7c12>] cachefiles_read_waiter+0x92/0xf0 [cachefiles]
      [ 2259.978039]  [<ffffffffa34d3942>] __wake_up_common+0x82/0x120
      [ 2259.991392]  [<ffffffffa34d3a63>] __wake_up_common_lock+0x83/0xc0
      [ 2260.004930]  [<ffffffffa34d3510>] ? task_rq_unlock+0x20/0x20
      [ 2260.017863]  [<ffffffffa34d3ab3>] __wake_up+0x13/0x20
      [ 2260.030230]  [<ffffffffa34c72a0>] __wake_up_bit+0x50/0x70
      [ 2260.042535]  [<ffffffffa35bdcdb>] unlock_page+0x2b/0x30
      [ 2260.054495]  [<ffffffffa35bdd09>] page_endio+0x29/0x90
      [ 2260.066184]  [<ffffffffa368fc81>] mpage_end_io+0x51/0x80
      
      CPU1
      cachefiles_read_waiter()
       20 static int cachefiles_read_waiter(wait_queue_entry_t *wait, unsigned mode,
       21                                   int sync, void *_key)
       22 {
      ...
       61         spin_lock(&object->work_lock);
       62         list_add_tail(&monitor->op_link, &op->to_do);
       63         spin_unlock(&object->work_lock);
      <begin race window>
       64
       65         fscache_enqueue_retrieval(op);
      182 static inline void fscache_enqueue_retrieval(struct fscache_retrieval *op)
      183 {
      184         fscache_enqueue_operation(&op->op);
      185 }
       58 void fscache_enqueue_operation(struct fscache_operation *op)
       59 {
       60         struct fscache_cookie *cookie = op->object->cookie;
       61
       62         _enter("{OBJ%x OP%x,%u}",
       63                op->object->debug_id, op->debug_id, atomic_read(&op->usage));
       64
       65         ASSERT(list_empty(&op->pend_link));
       66         ASSERT(op->processor != NULL);
       67         ASSERT(fscache_object_is_available(op->object));
       68         ASSERTCMP(atomic_read(&op->usage), >, 0);
      <end race window>
      
      CPU2
      cachefiles_read_copier()
      168         while (!list_empty(&op->to_do)) {
      ...
      202                 fscache_end_io(op, monitor->netfs_page, error);
      203                 put_page(monitor->netfs_page);
      204                 fscache_retrieval_complete(op, 1);
      
      CPU1
       58 void fscache_enqueue_operation(struct fscache_operation *op)
       59 {
      ...
       69         ASSERTIFCMP(op->state != FSCACHE_OP_ST_IN_PROGRESS,
       70                     op->state, ==,  FSCACHE_OP_ST_CANCELLED);
      Signed-off-by: Lei Xue <carmark.dlut@gmail.com>
      Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      7bb0c533
    • NFSv4: Fix fscache cookie aux_data to ensure change_attr is included · 50eaa652
      Dave Wysochanski authored
      Commit 402cb8dd ("fscache: Attach the index key and aux data to
      the cookie") added the aux_data and aux_data_len parameters to
      fscache_acquire_cookie(), and updated the callers in the NFS client.
      In the process it modified the aux_data to include the change_attr,
      but missed adding change_attr in a couple of places where aux_data
      was used.  Specifically, when a file is opened and the change_attr
      is not added, the subsequent attempt to look up an object fails
      inside cachefiles_check_object_xattr() with -116 (ESTALE), because
      nfs_fscache_inode_check_aux() fails its memcmp on the auxdata and
      returns FSCACHE_CHECKAUX_OBSOLETE.
      
      Fix this by adding nfs_fscache_update_auxdata() to set the auxdata
      from all relevant fields in the inode, including the change_attr.
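A minimal sketch of why a missing change_attr makes the comparison fail, and why a single helper that fills every field avoids it. This is a Python model, not the kernel code: the field set (mtime, ctime, size), the byte layout, and the function names are hypothetical; only the byte-for-byte comparison and the OBSOLETE result mirror the description above.

```python
import struct


def pack_auxdata(mtime, ctime, size, change_attr=0):
    """Serialize inode fields into an opaque blob (hypothetical layout)."""
    return struct.pack("<qqqQ", mtime, ctime, size, change_attr)


def update_auxdata(inode):
    """Build auxdata from all relevant fields, including change_attr."""
    return pack_auxdata(
        inode["mtime"], inode["ctime"], inode["size"], inode["change_attr"]
    )


def check_aux(stored, inode):
    """Compare stored auxdata byte for byte, as memcmp would."""
    return "OKAY" if stored == update_auxdata(inode) else "OBSOLETE"


inode = {"mtime": 1589414400, "ctime": 1589414400, "size": 4096, "change_attr": 7}

# Path that forgot change_attr: the stored blob carries 0 in that slot.
stored_without_change_attr = pack_auxdata(
    inode["mtime"], inode["ctime"], inode["size"]
)

# Path that goes through the common helper.
stored_with_change_attr = update_auxdata(inode)
```

With these values, `check_aux(stored_without_change_attr, inode)` reports the object as obsolete even though the file has not changed, while the blob built by the common helper compares equal.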
      
      Fixes: 402cb8dd ("fscache: Attach the index key and aux data to the cookie")
      Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      50eaa652
    • NFS: Fix fscache super_cookie allocation · 15751612
      Dave Wysochanski authored
      Commit f2aedb71 ("NFS: Add fs_context support.") reworked
      NFS mount code paths for fs_context support which included
      super_block initialization.  In the process there was an extra
      return left in the code and so we never call
      nfs_fscache_get_super_cookie even if 'fsc' is given as a mount
      option.  In addition, there is an extra check inside
      nfs_fscache_get_super_cookie for NFS_OPTION_FSCACHE, which
      is unnecessary since the only caller, nfs_get_cache_cookie,
      already checks this flag.
      
      Fixes: f2aedb71 ("NFS: Add fs_context support.")
      Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      15751612
    • NFS: Fix fscache super_cookie index_key from changing after umount · d9bfced1
      Dave Wysochanski authored
      Commit 402cb8dd ("fscache: Attach the index key and aux data to
      the cookie") added the index_key and index_key_len parameters to
      fscache_acquire_cookie(), and updated the callers in the NFS client.
      One of the callers was inside nfs_fscache_get_super_cookie()
      and was changed to use the full struct nfs_fscache_key as the
      index_key.  However, a couple members of this structure contain
      pointers and thus will change each time the same NFS share is
      remounted.  Since index_key is used for fscache_cookie->key_hash
      and this subsequently is used to compare cookies, fscache with
      NFS is only effective until a umount occurs.  Any subsequent
      remount of the same share will cause a unique NFS super_block
      index_key and key_hash to be generated for the same data, so
      any prior fscache data cannot be found.  A simple reproducer
      demonstrates the problem.
      
      1. Mount share with 'fsc', create a file, drop page cache
      systemctl start cachefilesd
      mount -o vers=3,fsc 127.0.0.1:/export /mnt
      dd if=/dev/zero of=/mnt/file1.bin bs=4096 count=1
      echo 3 > /proc/sys/vm/drop_caches
      
      2. Read file into page cache and fscache, then unmount
      dd if=/mnt/file1.bin of=/dev/null bs=4096 count=1
      umount /mnt
      
      3. Remount and re-read which should come from fscache
      mount -o vers=3,fsc 127.0.0.1:/export /mnt
      echo 3 > /proc/sys/vm/drop_caches
      dd if=/mnt/file1.bin of=/dev/null bs=4096 count=1
      
      4. Check for READ ops in mountstats - there should be none
      grep READ: /proc/self/mountstats
      
      Looking at the history and the removed function, nfs_super_get_key(),
      we should only use nfs_fscache_key.key plus any uniquifier, for
      the fscache index_key.
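The effect of pointer-valued members on the key hash can be sketched as follows. This is a Python model under stated assumptions, not the NFS code: the two 64-bit "pointer" fields, the example key bytes, and the use of SHA-256 as a stand-in for the real hash are all hypothetical.

```python
import hashlib
import struct


def key_hash(index_key: bytes) -> str:
    """Stand-in for the cookie key hash (the real hash function differs)."""
    return hashlib.sha256(index_key).hexdigest()


def full_struct_key(pointer_fields, stable_key: bytes, uniquifier: bytes) -> bytes:
    """Index key built from the whole structure, pointers included."""
    return struct.pack("<2Q", *pointer_fields) + stable_key + uniquifier


def stable_index_key(stable_key: bytes, uniquifier: bytes) -> bytes:
    """Index key built only from the stable key plus any uniquifier."""
    return stable_key + uniquifier


stable_key = b"127.0.0.1:/export,vers=3"  # hypothetical stable part
uniquifier = b""

# The same share mounted twice: pointer values differ between mounts.
mount1_pointers = (0xFFFF888000001000, 0xFFFF888000002000)
mount2_pointers = (0xFFFF888000A01000, 0xFFFF888000A02000)

hash_full_1 = key_hash(full_struct_key(mount1_pointers, stable_key, uniquifier))
hash_full_2 = key_hash(full_struct_key(mount2_pointers, stable_key, uniquifier))
hash_stable_1 = key_hash(stable_index_key(stable_key, uniquifier))
hash_stable_2 = key_hash(stable_index_key(stable_key, uniquifier))
```

In this model the full-structure keys hash differently on each mount, so a lookup after remount misses, while the key restricted to the stable part and uniquifier hashes identically across mounts.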
      
      Fixes: 402cb8dd ("fscache: Attach the index key and aux data to the cookie")
      Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      d9bfced1
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · d5eeab8d
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Four minor fixes, all in drivers (qla2xxx, ibmvfc, ibmvscsi)"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: ibmvscsi: Fix WARN_ON during event pool release
        scsi: ibmvfc: Don't send implicit logouts prior to NPIV login
        scsi: qla2xxx: Delete all sessions before unregister local nvme port
        scsi: qla2xxx: Fix hang when issuing nvme disconnect-all in NPIV
      d5eeab8d
    • Merge tag 'ceph-for-5.7-rc5' of git://github.com/ceph/ceph-client · eb24fdd8
      Linus Torvalds authored
      Pull ceph fixes from Ilya Dryomov:
       "Fixes for an endianness handling bug that prevented mounts on
        big-endian arches, a spammy log message and a couple of error paths.
      
        Also included is a MAINTAINERS update"
      
      * tag 'ceph-for-5.7-rc5' of git://github.com/ceph/ceph-client:
        ceph: demote quotarealm lookup warning to a debug message
        MAINTAINERS: remove myself as ceph co-maintainer
        ceph: fix double unlock in handle_cap_export()
        ceph: fix special error code in ceph_try_get_caps()
        ceph: fix endianness bug when handling MDS session feature bits
      eb24fdd8
    • ceph: demote quotarealm lookup warning to a debug message · 12ae44a4
      Luis Henriques authored
      A misconfigured cephx can easily result in the kernel client
      flooding the logs with:
      
        ceph: Can't lookup inode 1 (err: -13)
      
      Change this message to debug level.
      
      Cc: stable@vger.kernel.org
      URL: https://tracker.ceph.com/issues/44546
      Signed-off-by: Luis Henriques <lhenriques@suse.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      12ae44a4
    • Merge tag 'char-misc-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 4334f30e
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "Here are some small driver fixes for 5.7-rc5 that resolve a number of
        minor reported issues:
      
         - mhi bus driver fixes found as people actually use the code
      
         - phy driver fixes and compat string additions
      
         - most driver fix due to link order changing when the core moved out
           of staging
      
         - mei driver fix
      
         - interconnect build warning fix
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      * tag 'char-misc-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        bus: mhi: core: Fix channel device name conflict
        bus: mhi: core: Fix typo in comment
        bus: mhi: core: Offload register accesses to the controller
        bus: mhi: core: Remove link_status() callback
        bus: mhi: core: Make sure to powerdown if mhi_sync_power_up fails
        bus: mhi: Fix parsing of mhi_flags
        mei: me: disable mei interface on LBG servers.
        phy: qualcomm: usb-hs-28nm: Prepare clocks in init
        MAINTAINERS: Add Vinod Koul as Generic PHY co-maintainer
        interconnect: qcom: Move the static keyword to the front of declaration
        most: core: use function subsys_initcall()
        bus: mhi: core: Fix a NULL vs IS_ERR check in mhi_create_devices()
        phy: qcom-qusb2: Re add "qcom,sdm845-qusb2-phy" compat string
        phy: tegra: Select USB_COMMON for usb_get_maximum_speed()
      4334f30e
    • Merge tag 'driver-core-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core · c61529f6
      Linus Torvalds authored
      Pull driver core fixes from Greg KH:
       "Here are a number of small driver core fixes for 5.7-rc5 to resolve a
        bunch of reported issues with the current tree.
      
        Biggest here are the reverts and patches from John Stultz to resolve a
        bunch of deferred probe regressions we have been seeing in 5.7-rc
        right now.
      
        Along with those are some other smaller fixes:
      
         - coredump crash fix
      
         - devlink fix for when permissive mode was enabled
      
         - amba and platform device dma_parms fixes
      
         - component error silenced for when deferred probe happens
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      * tag 'driver-core-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        regulator: Revert "Use driver_deferred_probe_timeout for regulator_init_complete_work"
        driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires
        driver core: Use dev_warn() instead of dev_WARN() for deferred_probe_timeout warnings
        driver core: Revert default driver_deferred_probe_timeout value to 0
        component: Silence bind error on -EPROBE_DEFER
        driver core: Fix handling of fw_devlink=permissive
        coredump: fix crash when umh is disabled
        amba: Initialize dma_parms for amba devices
        driver core: platform: Initialize dma_parms for platform devices
      c61529f6
    • Merge tag 'staging-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · e7a1c733
      Linus Torvalds authored
      Pull staging driver fixes from Greg KH:
       "Here are three small driver fixes for 5.7-rc5.
      
        Two of these are documentation fixes:
      
         - MAINTAINERS update due to removed driver
      
         - removing Wolfram from the ks7010 driver TODO file
      
        The other patch is a real fix:
      
         - fix gasket driver to properly check the return value of a call
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      * tag 'staging-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        staging: gasket: Check the return value of gasket_get_bar_index()
        staging: ks7010: remove me from CC list
        MAINTAINERS: remove entry after hp100 driver removal
      e7a1c733
    • Merge tag 'tty-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · cbd0e482
      Linus Torvalds authored
      Pull tty/serial fixes from Greg KH:
       "Here are three small TTY/Serial/VT fixes for 5.7-rc5:
      
         - revert for the bcm63xx driver "fix" that was incorrect
      
         - vt unicode console bugfix
      
         - xilinx_uartps console driver fix
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'tty-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        tty: xilinx_uartps: Fix missing id assignment to the console
        vt: fix unicode console freeing with a common interface
        Revert "tty: serial: bcm63xx: fix missing clk_put() in bcm63xx_uart"
      cbd0e482
    • Merge tag 'usb-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 0a0b96b2
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some small USB fixes for 5.7-rc5 to resolve some reported
        issues:
      
         - fixes for problems found by syzbot
      
         - usbfs dma mapping fix
      
         - typec bugfixes
      
         - chipidea bugfix
      
         - usb4/thunderbolt fix
      
         - new device ids/quirks
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      * tag 'usb-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: chipidea: msm: Ensure proper controller reset using role switch API
        usb: typec: mux: intel: Handle alt mode HPD_HIGH
        usb: usbfs: correct kernel->user page attribute mismatch
        usb: typec: intel_pmc_mux: Fix the property names
        USB: core: Fix misleading driver bug report
        USB: serial: qcserial: Add DW5816e support
        USB: uas: add quirk for LaCie 2Big Quadra
        thunderbolt: Check return value of tb_sw_read() in usb4_switch_op()
        USB: serial: garmin_gps: add sanity checking for data length
      0a0b96b2
    • Merge tag 'drm-fixes-2020-05-08' of git://anongit.freedesktop.org/drm/drm · 775a8e03
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Another pretty normal week. I didn't get any i915 fixes yet, so next
        week I'd expect double the usual i915, but otherwise a bunch of amdgpu
        and some scattered other fixes.
      
        hdcp:
         - fix HDCP regression
      
        amdgpu:
         - Runtime PM fixes
         - DC fix for PPC
         - Misc DC fixes
      
        virtio:
         - fix context ordering issue
      
        sun4i:
         - old gcc warning fix
      
        ingenic-drm:
         - missing module support"
      
      * tag 'drm-fixes-2020-05-08' of git://anongit.freedesktop.org/drm/drm:
        drm/amd/display: Prevent dpcd reads with passive dongles
        drm/amd/display: fix counter in wait_for_no_pipes_pending
        drm/amd/display: Update DCN2.1 DV Code Revision
        drm: Fix HDCP failures when SRM fw is missing
        sun6i: dsi: fix gcc-4.8
        drm: ingenic-drm: add MODULE_DEVICE_TABLE
        drm/virtio: create context before RESOURCE_CREATE_2D in 3D mode
        drm/amd/display: work around fp code being emitted outside of DC_FP_START/END
        drm/amdgpu/dc: Use WARN_ON_ONCE for ASSERT
        drm/amdgpu: drop redundant cg/pg ungate on runpm enter
        drm/amdgpu: move kfd suspend after ip_suspend_phase1
      775a8e03