1. 31 May, 2018 2 commits
  2. 20 Mar, 2018 9 commits
    • Miklos Szeredi's avatar
      fuse: add writeback documentation · 5ba24197
      Miklos Szeredi authored
      Document various modes of I/O supported by the fuse kernel module.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      5ba24197
    • Miklos Szeredi's avatar
      fuse: honor AT_STATX_FORCE_SYNC · bf5c1898
      Miklos Szeredi authored
      Force a refresh of attributes from the fuse server in this case.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      bf5c1898
    • Miklos Szeredi's avatar
      fuse: honor AT_STATX_DONT_SYNC · ff1b89f3
      Miklos Szeredi authored
      The description of this flag says "Don't sync attributes with the server".
      In other words: always use the attributes cached in the kernel and don't
      send network or local messages to refresh the attributes.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      ff1b89f3
    • Seth Forshee's avatar
      fuse: Restrict allow_other to the superblock's namespace or a descendant · 73f03c2b
      Seth Forshee authored
      Unprivileged users are normally restricted from mounting with the
      allow_other option by system policy, but this could be bypassed for a mount
      done with user namespace root permissions. In such cases allow_other should
      not allow users outside the userns to access the mount as doing so would
      give the unprivileged user the ability to manipulate processes it would
      otherwise be unable to manipulate. Restrict allow_other to apply to users
      in the same userns used at mount or a descendant of that namespace. Also
      export current_in_userns() for use by fuse when built as a module.
      Reviewed-by: default avatarSerge Hallyn <serge@hallyn.com>
      Signed-off-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      Signed-off-by: default avatarDongsu Park <dongsu@kinvolk.io>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      73f03c2b
    • Eric W. Biederman's avatar
      fuse: Support fuse filesystems outside of init_user_ns · 8cb08329
      Eric W. Biederman authored
      In order to support mounts from namespaces other than init_user_ns, fuse
      must translate uids and gids to/from the userns of the process servicing
      requests on /dev/fuse. This patch does that, with a couple of restrictions
      on the namespace:
      
       - The userns for the fuse connection is fixed to the namespace
         from which /dev/fuse is opened.
      
       - The namespace must be the same as s_user_ns.
      
      These restrictions simplify the implementation by avoiding the need to pass
      around userns references and by allowing fuse to rely on the checks in
      setattr_prepare for ownership changes.  Either restriction could be relaxed
      in the future if needed.
      
      For cuse the userns used is the opener of /dev/cuse.  Semantically the cuse
      support does not appear safe for unprivileged users.  Practically the
      permissions on /dev/cuse only make it accessible to the global root user.
      If something slips through the cracks in a user namespace the only users
      who will be able to use the cuse device are those users mapped into the
      user namespace.
      
      Translation in the posix acl is updated to use the uuser namespace of the
      filesystem.  Avoiding cases which might bypass this translation is handled
      in a following change.
      
      This change is stronlgy based on a similar change from Seth Forshee and
      Dongsu Park.
      
      Cc: Seth Forshee <seth.forshee@canonical.com>
      Cc: Dongsu Park <dongsu@kinvolk.io>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      8cb08329
    • Eric W. Biederman's avatar
      fuse: Fail all requests with invalid uids or gids · c9582eb0
      Eric W. Biederman authored
      Upon a cursory examinination the uid and gid of a fuse request are
      necessary for correct operation.  Failing a fuse request where those
      values are not reliable seems a straight forward and reliable means of
      ensuring that fuse requests with bad data are not sent or processed.
      
      In most cases the vfs will avoid actions it suspects will cause
      an inode write back of an inode with an invalid uid or gid.  But that does
      not map precisely to what fuse is doing, so test for this and solve
      this at the fuse level as well.
      
      Performing this work in fuse_req_init_context is cheap as the code is
      already performing the translation here and only needs to check the
      result of the translation to see if things are not representable in
      a form the fuse server can handle.
      
      [SzM] Don't zero the context for the nofail case, just keep using the
      munging version (makes sense for debugging and doesn't hurt).
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      c9582eb0
    • Eric W. Biederman's avatar
      fuse: Remove the buggy retranslation of pids in fuse_dev_do_read · dbf107b2
      Eric W. Biederman authored
      At the point of fuse_dev_do_read the user space process that initiated the
      action on the fuse filesystem may no longer exist.  The process have been
      killed or may have fired an asynchronous request and exited.
      
      If the initial process has exited, the code "pid_vnr(find_pid_ns(in->h.pid,
      fc->pid_ns)" will either return a pid of 0, or in the unlikely event that
      the pid has been reallocated it can return practically any pid.  Any pid is
      possible as the pid allocator allocates pid numbers in different pid
      namespaces independently.
      
      The only way to make translation in fuse_dev_do_read reliable is to call
      get_pid in fuse_req_init_context, and pid_vnr followed by put_pid in
      fuse_dev_do_read.  That reference counting in other contexts has been shown
      to bounce cache lines between processors and in general be slow.  So that
      is not desirable.
      
      The only known user of running the fuse server in a different pid namespace
      from the filesystem does not care what the pids are in the fuse messages so
      removing this code should not matter.
      
      Getting the translation to a server running outside of the pid namespace of
      a container can still be achieved by playing setns games at mount time.  It
      is also possible to add an option to pass a pid namespace into the fuse
      filesystem at mount time.
      
      Fixes: 5d6d3a30 ("fuse: allow server to run in different pid_ns")
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      dbf107b2
    • Szymon Lukasz's avatar
      fuse: return -ECONNABORTED on /dev/fuse read after abort · 3b7008b2
      Szymon Lukasz authored
      Currently the userspace has no way of knowing whether the fuse
      connection ended because of umount or abort via sysfs. It makes it hard
      for filesystems to free the mountpoint after abort without worrying
      about removing some new mount.
      
      The patch fixes it by returning different errors when userspace reads
      from /dev/fuse (-ENODEV for umount and -ECONNABORTED for abort).
      
      Add a new capability flag FUSE_ABORT_ERROR. If set and the connection is
      gone because of sysfs abort, reading from the device will return
      -ECONNABORTED.
      Signed-off-by: default avatarSzymon Lukasz <noh4hss@gmail.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      3b7008b2
    • Miklos Szeredi's avatar
      fuse: atomic_o_trunc should truncate pagecache · df0e91d4
      Miklos Szeredi authored
      Fuse has an "atomic_o_trunc" mode, where userspace filesystem uses the
      O_TRUNC flag in the OPEN request to truncate the file atomically with the
      open.
      
      In this mode there's no need to send a SETATTR request to userspace after
      the open, so fuse_do_setattr() checks this mode and returns.  But this
      misses the important step of truncating the pagecache.
      
      Add the missing parts of truncation to the ATTR_OPEN branch.
      Reported-by: default avatarChad Austin <chadaustin@fb.com>
      Fixes: 6ff958ed ("fuse: add atomic open+truncate support")
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Cc: <stable@vger.kernel.org>
      df0e91d4
  3. 19 Mar, 2018 1 commit
  4. 18 Mar, 2018 5 commits
    • Linus Torvalds's avatar
      Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9e1909b9
      Linus Torvalds authored
      Pull x86/pti updates from Thomas Gleixner:
       "Another set of melted spectrum updates:
      
         - Iron out the last late microcode loading issues by actually
           checking whether new microcode is present and preventing the CPU
           synchronization to run into a timeout induced hang.
      
         - Remove Skylake C2 from the microcode blacklist according to the
           latest Intel documentation
      
         - Fix the VM86 POPF emulation which traps if VIP is set, but VIF is
           not. Enhance the selftests to catch that kind of issue
      
         - Annotate indirect calls/jumps for objtool on 32bit. This is not a
           functional issue, but for consistency sake its the right thing to
           do.
      
         - Fix a jump label build warning observed on SPARC64 which uses 32bit
           storage for the code location which is casted to 64 bit pointer w/o
           extending it to 64bit first.
      
         - Add two new cpufeature bits. Not really an urgent issue, but
           provides them for both x86 and x86/kvm work. No impact on the
           current kernel"
      
      * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/microcode: Fix CPU synchronization routine
        x86/microcode: Attempt late loading only when new microcode is present
        x86/speculation: Remove Skylake C2 from Speculation Control microcode blacklist
        jump_label: Fix sparc64 warning
        x86/speculation, objtool: Annotate indirect calls/jumps for objtool on 32-bit kernels
        x86/vm86/32: Fix POPF emulation
        selftests/x86/entry_from_vm86: Add test cases for POPF
        selftests/x86/entry_from_vm86: Exit with 1 if we fail
        x86/cpufeatures: Add Intel PCONFIG cpufeature
        x86/cpufeatures: Add Intel Total Memory Encryption cpufeature
      9e1909b9
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · df4fe178
      Linus Torvalds authored
      Pull x86 fix from Thomas Gleixner:
       "A single fix for vmalloc_fault() which uses p*d_huge() unconditionally
        whether CONFIG_HUGETLBFS is set or not. In case of CONFIG_HUGETLBFS=n
        this results in a crash as p*d_huge() returns 0 in that case"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mm: Fix vmalloc_fault to use pXd_large
      df4fe178
    • Linus Torvalds's avatar
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d2149e13
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "Three fixes for irq chip drivers:
      
         - Make sure the allocations in the GIC-V3 ITS driver are large enough
           to accomodate the interrupt space
      
         - Fix a misplaced __iomem annotation which causes a splat of 26
           sparse warnings
      
         - Remove an unused function in the IMX GPCV2 driver which causes
           build warnings"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/irq-imx-gpcv2: Remove unused function
        irqchip/gic-v3-its: Ensure nr_ites >= nr_lpis
        irqchip/gic-v3-its: Fix misplaced __iomem annotations
      d2149e13
    • Linus Torvalds's avatar
      Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 23fe85ae
      Linus Torvalds authored
      Pull EFI fix from Thomas Gleixner:
       "A single fix to prevent partially initialized pointers in mixed mode
        (64bit kernel on 32bit UEFI)"
      
      * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efi/libstub/tpm: Initialize pointer variables to zero for mixed mode
      23fe85ae
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 3cd1d327
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
       "PPC:
         - fix bug leading to lost IPIs and smp_call_function_many() lockups
           on POWER9
      
        ARM:
         - locking fix
         - reset fix
         - GICv2 multi-source SGI injection fix
         - GICv2-on-v3 MMIO synchronization fix
         - make the console less verbose.
      
        x86:
         - fix device passthrough on AMD SME"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: Fix device passthrough when SME is active
        kvm: arm/arm64: vgic-v3: Tighten synchronization for guests using v2 on v3
        KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintid
        KVM: arm/arm64: Reduce verbosity of KVM init log
        KVM: arm/arm64: Reset mapped IRQs on VM reset
        KVM: arm/arm64: Avoid vcpu_load for other vcpu ioctls than KVM_RUN
        KVM: arm/arm64: vgic: Add missing irq_lock to vgic_mmio_read_pending
        KVM: PPC: Book3S HV: Fix trap number return from __kvmppc_vcore_entry
      3cd1d327
  5. 17 Mar, 2018 1 commit
    • John David Anglin's avatar
      parisc: Handle case where flush_cache_range is called with no context · 9ef0f88f
      John David Anglin authored
      Just when I had decided that flush_cache_range() was always called with
      a valid context, Helge reported two cases where the
      "BUG_ON(!vma->vm_mm->context);" was hit on the phantom buildd:
      
       kernel BUG at /mnt/sdb6/linux/linux-4.15.4/arch/parisc/kernel/cache.c:587!
       CPU: 1 PID: 3254 Comm: kworker/1:2 Tainted: G D 4.15.0-1-parisc64-smp #1 Debian 4.15.4-1+b1
       Workqueue: events free_ioctx
        IAOQ[0]: flush_cache_range+0x164/0x168
        IAOQ[1]: flush_cache_page+0x0/0x1c8
        RP(r2): unmap_page_range+0xae8/0xb88
       Backtrace:
        [<00000000404a6980>] unmap_page_range+0xae8/0xb88
        [<00000000404a6ae0>] unmap_single_vma+0xc0/0x188
        [<00000000404a6cdc>] zap_page_range_single+0x134/0x1f8
        [<00000000404a702c>] unmap_mapping_range+0x1cc/0x208
        [<0000000040461518>] truncate_pagecache+0x98/0x108
        [<0000000040461624>] truncate_setsize+0x9c/0xb8
        [<00000000405d7f30>] put_aio_ring_file+0x80/0x100
        [<00000000405d803c>] aio_free_ring+0x8c/0x290
        [<00000000405d82c0>] free_ioctx+0x80/0x180
        [<0000000040284e6c>] process_one_work+0x21c/0x668
        [<00000000402854c4>] worker_thread+0x20c/0x778
        [<0000000040291d44>] kthread+0x2d4/0x2e0
        [<0000000040204020>] end_fault_vector+0x20/0xc0
      
      This indicates that we need to handle the no context case in
      flush_cache_range() as we do in flush_cache_mm().
      
      In thinking about this, I realized that we don't need to flush the TLB
      when there is no context.  So, I added context checks to the large flush
      cases in flush_cache_mm() and flush_cache_range().  The large flush case
      occurs frequently in flush_cache_mm() and the change should improve fork
      performance.
      
      The v2 version of this change removes the BUG_ON from flush_cache_page()
      by skipping the TLB flush when there is no context.  I also added code
      to flush the TLB in flush_cache_mm() and flush_cache_range() when we
      have a context that's not current.  Now all three routines handle TLB
      flushes in a similar manner.
      Signed-off-by: default avatarJohn David Anglin <dave.anglin@bell.net>
      Cc: stable@vger.kernel.org # 4.9+
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      9ef0f88f
  6. 16 Mar, 2018 17 commits
  7. 15 Mar, 2018 5 commits
    • Eric W. Biederman's avatar
      fs: Teach path_connected to handle nfs filesystems with multiple roots. · 95dd7758
      Eric W. Biederman authored
      On nfsv2 and nfsv3 the nfs server can export subsets of the same
      filesystem and report the same filesystem identifier, so that the nfs
      client can know they are the same filesystem.  The subsets can be from
      disjoint directory trees.  The nfsv2 and nfsv3 filesystems provides no
      way to find the common root of all directory trees exported form the
      server with the same filesystem identifier.
      
      The practical result is that in struct super s_root for nfs s_root is
      not necessarily the root of the filesystem.  The nfs mount code sets
      s_root to the root of the first subset of the nfs filesystem that the
      kernel mounts.
      
      This effects the dcache invalidation code in generic_shutdown_super
      currently called shrunk_dcache_for_umount and that code for years
      has gone through an additional list of dentries that might be dentry
      trees that need to be freed to accomodate nfs.
      
      When I wrote path_connected I did not realize nfs was so special, and
      it's hueristic for avoiding calling is_subdir can fail.
      
      The practical case where this fails is when there is a move of a
      directory from the subtree exposed by one nfs mount to the subtree
      exposed by another nfs mount.  This move can happen either locally or
      remotely.  With the remote case requiring that the move directory be cached
      before the move and that after the move someone walks the path
      to where the move directory now exists and in so doing causes the
      already cached directory to be moved in the dcache through the magic
      of d_splice_alias.
      
      If someone whose working directory is in the move directory or a
      subdirectory and now starts calling .. from the initial mount of nfs
      (where s_root == mnt_root), then path_connected as a heuristic will
      not bother with the is_subdir check.  As s_root really is not the root
      of the nfs filesystem this heuristic is wrong, and the path may
      actually not be connected and path_connected can fail.
      
      The is_subdir function might be cheap enough that we can call it
      unconditionally.  Verifying that will take some benchmarking and
      the result may not be the same on all kernels this fix needs
      to be backported to.  So I am avoiding that for now.
      
      Filesystems with snapshots such as nilfs and btrfs do something
      similar.  But as the directory tree of the snapshots are disjoint
      from one another and from the main directory tree rename won't move
      things between them and this problem will not occur.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarAl Viro <viro@ZenIV.linux.org.uk>
      Fixes: 397d425d ("vfs: Test for and handle paths that are unreachable from their mnt_root")
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      95dd7758
    • Rodrigo Vivi's avatar
      Merge tag 'gvt-fixes-2018-03-15' of https://github.com/intel/gvt-linux into drm-intel-fixes · 05b429a8
      Rodrigo Vivi authored
      gvt-fixes-2018-03-15
      
      - Two warnings fix for runtime pm and usr copy (Xiong, Zhenyu)
      - OA context fix for vGPU profiling (Min)
      - privilege batch buffer reloc fix (Fred)
      Signed-off-by: default avatarRodrigo Vivi <rodrigo.vivi@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180315100023.5n5a74afky6qinoh@zhen-hp.sh.intel.com
      05b429a8
    • David S. Miller's avatar
      sparc64: Fix regression in pmdp_invalidate(). · cfb61b5e
      David S. Miller authored
      pmdp_invalidate() was changed to update the pmd atomically
      (to not lose dirty/access bits) and return the original pmd
      value.
      
      However, in doing so, we lost a lot of the essential work that
      set_pmd_at() does, namely to update hugepage mapping counts and
      queuing up the batched TLB flush entry.
      
      Thus we were not flushing entries out of the TLB when making
      such PMD changes.
      
      Fix this by abstracting the accounting work of set_pmd_at() out into a
      separate function, and call it from pmdp_establish().
      
      Fixes: a8e654f0 ("sparc64: update pmdp_invalidate() to return old pmd value")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cfb61b5e
    • Paolo Bonzini's avatar
      Merge tag 'kvm-ppc-fixes-4.16-2' of... · 52be7a46
      Paolo Bonzini authored
      Merge tag 'kvm-ppc-fixes-4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-master
      
      Fix for PPC KVM for 4.16
      
      - Fix bug leading to lost IPIs on POWER9 and hence to other CPUs reporting
        lockups in smp_call_function_many().
      52be7a46
    • Paolo Bonzini's avatar
      Merge tag 'kvm-arm-fixes-for-v4.16-2' of... · bb9b4dbe
      Paolo Bonzini authored
      Merge tag 'kvm-arm-fixes-for-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
      
      kvm/arm fixes for 4.16, take 2
      
      - Peace of mind locking fix in vgic_mmio_read_pending
      - Allow hw-mapped interrupts to be reset when the VM resets
      - Fix GICv2 multi-source SGI injection
      - Fix MMIO synchronization for GICv2 on v3 emulation
      - Remove excess verbosity on the console
      bb9b4dbe