1. 02 Apr, 2020 28 commits
    • Merge tag 'vfs-5.7-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 7db83c07
      Linus Torvalds authored
      Pull hibernation fix from Darrick Wong:
       "Fix a regression where we broke the userspace hibernation driver by
        disallowing writes to the swap device"
      
      * tag 'vfs-5.7-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        hibernate: Allow uswsusp to write to swap
    • Merge tag 'iomap-5.7-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 35a9fafe
      Linus Torvalds authored
      Pull iomap updates from Darrick Wong:
       "We're fixing tracepoints and comments in this cycle, so there
        shouldn't be any surprises here.
      
        I anticipate sending a second pull request next week with a single bug
        fix for readahead, but it's still undergoing QA.
      
        Summary:
      
         - Fix a broken tracepoint
      
         - Fix a broken comment"
      
      * tag 'iomap-5.7-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        iomap: fix comments in iomap_dio_rw
        iomap: Remove pgoff from tracepoints
    • Merge branch 'work.dotdot1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 9c577491
      Linus Torvalds authored
      Pull vfs pathwalk sanitizing from Al Viro:
       "Massive pathwalk rewrite and cleanups.
      
        Several iterations have been posted; hopefully this thing is getting
        readable and understandable now. Pretty much all parts of pathname
        resolutions are affected...
      
        The branch is identical to what has sat in -next, except for commit
        message in "lift all calls of step_into() out of follow_dotdot/
        follow_dotdot_rcu", crediting Qian Cai for reporting the bug; only
        commit message changed there."
      
      * 'work.dotdot1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (69 commits)
        lookup_open(): don't bother with fallbacks to lookup+create
        atomic_open(): no need to pass struct open_flags anymore
        open_last_lookups(): move complete_walk() into do_open()
        open_last_lookups(): lift O_EXCL|O_CREAT handling into do_open()
        open_last_lookups(): don't abuse complete_walk() when all we want is unlazy
        open_last_lookups(): consolidate fsnotify_create() calls
        take post-lookup part of do_last() out of loop
        link_path_walk(): sample parent's i_uid and i_mode for the last component
        __nd_alloc_stack(): make it return bool
        reserve_stack(): switch to __nd_alloc_stack()
        pick_link(): take reserving space on stack into a new helper
        pick_link(): more straightforward handling of allocation failures
        fold path_to_nameidata() into its only remaining caller
        pick_link(): pass it struct path already with normal refcounting rules
        fs/namei.c: kill follow_mount()
        non-RCU analogue of the previous commit
        helper for mount rootwards traversal
        follow_dotdot(): be lazy about changing nd->path
        follow_dotdot_rcu(): be lazy about changing nd->path
        follow_dotdot{,_rcu}(): massage loops
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace · d987ca1c
      Linus Torvalds authored
      Pull exec/proc updates from Eric Biederman:
       "This contains two significant pieces of work: the work to sort out
        proc_flush_task, and the work to solve a deadlock between strace and
        exec.
      
        Fixing proc_flush_task so that it no longer requires a persistent
        mount makes improvements to proc possible. The removal of the
        persistent mount solves an old regression that caused the hidepid
        mount option to only work on remount not on mount. The regression was
        found and reported by the Android folks. This further allows Alexey
        Gladkov's work making proc mount options specific to an individual
        mount of proc to move forward.
      
        The work on exec starts solving a long standing issue with exec that
        it takes mutexes of blocking userspace applications, which makes exec
        extremely deadlock prone. For the moment this adds a second mutex with
        a narrower scope that handles all of the easy cases, which makes the
        tricky cases easy to spot. With a little luck the code to solve those
        deadlocks will be ready by next merge window"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (25 commits)
        signal: Extend exec_id to 64bits
        pidfd: Use new infrastructure to fix deadlocks in execve
        perf: Use new infrastructure to fix deadlocks in execve
        proc: io_accounting: Use new infrastructure to fix deadlocks in execve
        proc: Use new infrastructure to fix deadlocks in execve
        kernel/kcmp.c: Use new infrastructure to fix deadlocks in execve
        kernel: doc: remove outdated comment cred.c
        mm: docs: Fix a comment in process_vm_rw_core
        selftests/ptrace: add test cases for dead-locks
        exec: Fix a deadlock in strace
        exec: Add exec_update_mutex to replace cred_guard_mutex
        exec: Move exec_mmap right after de_thread in flush_old_exec
        exec: Move cleanup of posix timers on exec out of de_thread
        exec: Factor unshare_sighand out of de_thread and call it separately
        exec: Only compute current once in flush_old_exec
        pid: Improve the comment about waiting in zap_pid_ns_processes
        proc: Remove the now unnecessary internal mount of proc
        uml: Create a private mount of proc for mconsole
        uml: Don't consult current to find the proc_mnt in mconsole_proc
        proc: Use a list of inodes to flush from proc
        ...
    • lookup_open(): don't bother with fallbacks to lookup+create · 99a4a90c
      Al Viro authored
      We fall back to lookup+create (instead of atomic_open) in several cases:
      	1) we don't have write access to filesystem and O_TRUNC is
      present in the flags.  It's not something we want ->atomic_open() to
      see - it just might go ahead and truncate the file.  However, we can
      pass it the flags sans O_TRUNC - eventually do_open() will call
      handle_truncate() anyway.
      	2) we have O_CREAT | O_EXCL and we can't write to parent.
      That's going to be an error, of course, but we want to know _which_
      error should that be - might be EEXIST (if file exists), might be
      EACCES or EROFS.  Simply stripping O_CREAT (and checking if we see
      ENOENT) would suffice, if not for O_EXCL.  However, we used to have
      ->atomic_open() fully responsible for rejecting O_CREAT | O_EXCL
      on existing file and just stripping O_CREAT would've disarmed
      those checks.  With nothing downstream to catch the problem -
      FMODE_OPENED used to be "don't bother with EEXIST checks,
      ->atomic_open() has done those".  Now EEXIST checks downstream
      are skipped only if FMODE_CREATED is set - FMODE_OPENED alone
      is not enough.  That has eliminated the need to fall back onto
      lookup+create path in this case.
      	3) O_WRONLY or O_RDWR when we have no write access to
      filesystem, with nothing else objectionable.  Fallback is
      (and had always been) pointless.
      
      IOW, we don't really need that fallback; all we need in such
      cases is to trim O_TRUNC and O_CREAT properly.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
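The flag trimming that the lookup_open() commit message argues for can be sketched in userspace terms. This is a rough model and not the kernel code: `trim_open_flags`, `fs_writable` and `may_create` are hypothetical names introduced for illustration, and Python's `os.O_*` constants stand in for the kernel's open flags.

```python
import os


def trim_open_flags(flags: int, fs_writable: bool, may_create: bool) -> int:
    """Drop flags that must not be acted on, rather than taking a separate
    lookup+create fallback path (hypothetical helper, illustration only)."""
    if not fs_writable:
        # Case 1: the open itself must never truncate a file on a filesystem
        # we cannot write to, so O_TRUNC is hidden from it.
        flags &= ~os.O_TRUNC
    if flags & os.O_CREAT and not (fs_writable and may_create):
        # Case 2: creation would fail anyway. Strip O_CREAT but keep O_EXCL,
        # so a pre-existing file can still be reported as EEXIST downstream.
        flags &= ~os.O_CREAT
    # Case 3 (plain O_WRONLY / O_RDWR) needs no trimming at all.
    return flags
```

In this model only the flags handed to the open step change; which error is finally reported is still decided by the later checks the message describes.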
    • atomic_open(): no need to pass struct open_flags anymore · d489cf9a
      Al Viro authored
      argument had been unused since 1643b43f (lookup_open(): lift the
      "fallback to !O_CREAT" logics from atomic_open()) back in 2016
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • Al Viro's avatar
      ff326a32
    • open_last_lookups(): lift O_EXCL|O_CREAT handling into do_open() · b94e0b32
      Al Viro authored
      Currently path_openat() has "EEXIST on O_EXCL|O_CREAT" checks done on one
      of the ways out of open_last_lookups().  There are 4 cases:
      	1) the last component is . or ..; check is not done.
      	2) we had FMODE_OPENED or FMODE_CREATED set while in lookup_open();
      check is not done.
      	3) symlink to be traversed is found; check is not done (nor
      should it be)
      	4) everything else: check done (before complete_walk(), even).
      
      In case (1) O_EXCL|O_CREAT ends up failing with -EISDIR - that's
      	open("/tmp/.", O_CREAT|O_EXCL, 0600)
      Note that in the same conditions
      	open("/tmp", O_CREAT|O_EXCL, 0600)
      would have yielded EEXIST.  Either error is allowed, but switching to
      -EEXIST in these cases would've been more consistent.
      
      Case (2) is more subtle; first of all, if we have FMODE_CREATED set, the
      object hadn't existed prior to the call.  The check should not be done in
      such a case.  The rest is problematic, though - we have
      	FMODE_OPENED set (i.e. it went through ->atomic_open() and got
      successfully opened there)
      	FMODE_CREATED is *NOT* set
      	O_CREAT and O_EXCL are both set.
      Any such case is a bug - either we failed to set FMODE_CREATED when we
      had, in fact, created an object (no such instances in the tree) or
      we have opened a pre-existing file despite having had both O_CREAT and
      O_EXCL passed.  One of those was, in fact, caught (and fixed) while
      sorting out this mess (gfs2 on cold dcache).  And in such situations
      we should fail with EEXIST.
      
      Note that for (1) and (4) FMODE_CREATED is not set - for (1) there's nothing
      in handle_dots() to set it, for (4) we'd explicitly checked that.
      
      And (1), (2) and (4) are exactly the cases when we leave the loop in
      the caller, with do_open() called immediately after that loop.  IOW, we
      can move the check over there, and make it
      
      	If we have O_CREAT|O_EXCL and after successful pathname resolution
      FMODE_CREATED is *not* set, we must have run into a preexisting file and
      should fail with EEXIST.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
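The user-visible side of the O_CREAT|O_EXCL rule above can be probed from userspace. The sketch below assumes a POSIX system; `try_exclusive_create` is a helper name made up for this example. It checks only the cases the message states outright (a new file succeeds, a pre-existing file or directory fails with EEXIST). The trailing "/." case is recorded but not relied on, since per the message its errno depends on whether the kernel includes this change.

```python
import errno
import os
import tempfile


def try_exclusive_create(path):
    """Return None if open(path, O_CREAT|O_EXCL, 0600) succeeds,
    otherwise the symbolic errno name of the failure."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL, 0o600)
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
    os.close(fd)
    return None


with tempfile.TemporaryDirectory() as d:
    fresh = os.path.join(d, "fresh")
    results = {
        "new file": try_exclusive_create(fresh),            # object did not exist
        "existing file": try_exclusive_create(fresh),       # now it does
        "existing directory": try_exclusive_create(d),      # like open("/tmp", ...)
        "directory via '.'": try_exclusive_create(os.path.join(d, ".")),
    }

for case, outcome in results.items():
    print(f"{case}: {outcome or 'created'}")
```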
    • Al Viro's avatar
    • Al Viro's avatar
      f7bb959d
    • take post-lookup part of do_last() out of loop · c5971b8c
      Al Viro authored
      now we can have open_last_lookups() directly from the loop in
      path_openat() - the rest of do_last() never returns a symlink
      to follow, so we can bloody well leave the loop first.
      
      Rename the rest of that thing from do_last() to do_open() and
      make it return an int.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • Al Viro's avatar
    • __nd_alloc_stack(): make it return bool · 60ef60c7
      Al Viro authored
      ... and adjust the caller (reserve_stack()).  Rename to nd_alloc_stack(),
      while we are at it.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • reserve_stack(): switch to __nd_alloc_stack() · 4542576b
      Al Viro authored
      expand the call of nd_alloc_stack() into it (and don't
      recheck the depth on the second call)
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • Al Viro's avatar
      49055906
    • pick_link(): more straightforward handling of allocation failures · aef9404d
      Al Viro authored
      pick_link() needs to push onto stack; we start with using two-element
      array embedded into struct nameidata and the first time we need
      more than that we switch to separately allocated array.
      
      Allocation can fail, of course, and handling of that would be simple
      enough - we need to drop 'link' and bugger off.  However, the things
      get more complicated in RCU mode.  There we must do GFP_ATOMIC
      allocation.  If that fails, we try to switch to non-RCU mode and
      repeat the allocation.
      
      To switch to non-RCU mode we need to grab references to 'link' and
      to everything in nameidata.  The latter is done by unlazy_walk();
      the former - legitimize_path().  'link' must go first - after
      unlazy_walk() we are out of RCU-critical period and it's too
      late to call legitimize_path() since the references in link->mnt
      and link->dentry might be pointing to freed and reused memory.
      
      So we do legitimize_path(), then unlazy_walk().  And that's where
      it gets too subtle: what to do if the former fails?  We MUST
      do path_put(link) to avoid leaks.  And we can't do that under
      rcu_read_lock().  Solution in mainline was to empty the nameidata
      manually, drop out of RCU mode and then do path_put().
      
      In effect, we open-code the things eventual terminate_walk()
      would've done on error in RCU mode.  That looks badly out of place
      and confusing.  We could add a comment along the lines of the
      explanation above, but... there's a simpler solution.  Call
      unlazy_walk() even if legitimize_path() fails.  It will take
      us out of RCU mode, so we'll be able to do path_put(link).
      
      Yes, it will do unnecessary work - attempt to grab references
      on the stuff in nameidata, only to have them dropped as soon
      as we return the error to upper layer and get terminate_walk()
      called there.  So what?  We are thoroughly off the fast path
      by that point - we had GFP_ATOMIC allocation fail, we had
      ->d_seq or mount_lock mismatch and we are about to try walking
      the same path from scratch in non-RCU mode.  Which will need
      to do the same allocation, this time with GFP_KERNEL, so it will
      be able to apply memory pressure for blocking stuff.
      
      Compared to that the cost of several lockref_get_not_dead()
      is noise.  And the logics become much easier to understand
      that way.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • Al Viro's avatar
      c99687a0
    • pick_link(): pass it struct path already with normal refcounting rules · 84f0cd9e
      Al Viro authored
      step_into() tries to avoid grabbing and dropping mount references
      on the steps that do not involve crossing mountpoints (which is
      obviously the majority of cases).  So it uses a local struct path
      with unusual refcounting rules - path.mnt is pinned if and only if
      it's not equal to nd->path.mnt.
      
      We used to have similar beasts all over the place and we had quite
      a few bugs crop up in their handling - it's easy to get confused
      when changing e.g. cleanup on failure exits (or adding a new check,
      etc.)
      
      Now that's mostly gone - the step_into() instance (which is what
      we need them for) is the only one left.  It is exposed to mount
      traversal and it's (shortly) seen by pick_link().  Since pick_link()
      needs to store it in link stack, where the normal rules apply,
      it has to make sure that mount is pinned regardless of nd->path.mnt
      value.  That's done on all calls of pick_link() and very early
      in those.  Let's do that in the caller (step_into()) instead -
      that way fewer places need to be aware of such struct path
      instances.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs/namei.c: kill follow_mount() · 19f6028a
      Al Viro authored
      The only remaining caller (path_pts()) should be using follow_down()
      anyway.  And clean path_pts() a bit.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • non-RCU analogue of the previous commit · 2aa38470
      Al Viro authored
      new helper: choose_mountpoint().  Wrapper around choose_mountpoint_rcu(),
      similar to lookup_mnt() vs. __lookup_mnt().  follow_dotdot() switched to
      it.  Now we don't grab mount_lock exclusive anymore; note that the
      primitive used for non-RCU mount traversals in the other direction (lookup_mnt())
      doesn't bother with that either - it uses mount_lock seqcount instead.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • helper for mount rootwards traversal · 7ef482fa
      Al Viro authored
      The loops in follow_dotdot{,_rcu}() are doing the same thing:
      we have a mount and we want to find out how far up the chain
      of mounts do we need to go.
      
      We follow the chain of mounts until we find one that is not
      directly overmounting the root of another mount.  If such
      a mount is found, we want the location it's mounted upon.
      If we run out of chain (i.e. get to a mount that is not
      mounted on anything else) or run into process' root, we
      report failure.
      
      On success, we want (in RCU case) d_seq of resulting location
      sampled or (in non-RCU case) references to that location
      acquired.
      
      This commit introduces such primitive for RCU case and
      switches follow_dotdot_rcu() to it; non-RCU case will
      go in the next commit.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
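The rootwards traversal described in the helper commit above can be modelled in a few lines. This is a toy sketch under stated assumptions, not the kernel helper: the `Mount` class, its fields and the `(mount, path)` tuples are invented for illustration, a mountpoint path of "/" stands for "the root of the parent mount", and nothing here models d_seq sampling or reference counting.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(eq=False)  # compare mounts by identity, like distinct kernel objects
class Mount:
    name: str
    parent: Optional["Mount"] = None  # None: not mounted on anything else
    mountpoint: str = "/"             # path inside `parent`; "/" = parent's root


Location = Tuple[Mount, str]


def choose_mountpoint(m: Mount, process_root: Location) -> Optional[Location]:
    """Walk up the chain of mounts; return the location the chain is mounted
    upon, or None if we run out of chain or hit the process' root."""
    while m.parent is not None:
        mountpoint = m.mountpoint
        m = m.parent
        if (m, mountpoint) == process_root:
            break                      # ran into the process' root: failure
        if mountpoint != "/":
            return (m, mountpoint)     # not overmounting a root: found it
    return None


rootfs = Mount("rootfs")
a = Mount("a", parent=rootfs, mountpoint="/mnt")
b = Mount("b", parent=a, mountpoint="/")   # b overmounts the root of a
```

With this layout, starting from `b` skips over `a` (whose root is overmounted) and lands on `/mnt` in `rootfs`, unless the process' root is the root of `a`.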
    • follow_dotdot(): be lazy about changing nd->path · 165200d6
      Al Viro authored
      Change nd->path only after the loop is done and only in case we hadn't
      ended up finding ourselves in root.  Same for NO_XDEV check.
      
      That separates the "check how far back do we need to go through the
      mount stack" logics from the rest of .. traversal.
      
      NOTE: path_get/path_put introduced here are temporary.  They will
      go away later in the series.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • follow_dotdot_rcu(): be lazy about changing nd->path · efe772d6
      Al Viro authored
      Change nd->path only after the loop is done and only in case we hadn't
      ended up finding ourselves in root.  Same for NO_XDEV check.  Don't
      recheck mount_lock on each step either.
      
      That separates the "check how far back do we need to go through the
      mount stack" logics from the rest of .. traversal.
      
      Note that the sequence for d_seq/d_inode here is
      	* sample mount_lock seqcount
      ...
      	* sample d_seq
      	* fetch d_inode
      	* verify mount_lock seqcount
      The last step makes sure that d_inode value we'd got matches d_seq -
      the dentry is guaranteed to have been a mountpoint through the
      entire thing, so its d_inode must have been stable.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • follow_dotdot{,_rcu}(): massage loops · 12487f30
      Al Viro authored
      The logic in both of them is the same:
      	while true
      		if in process' root	// uncommon
      			break
      		if *not* in mount root	// normal case
      			find the parent
      			return
      		if at absolute root	// very uncommon
      			break
      		move to underlying mountpoint
      	report that we are in root
      
      Pull the common path out of the loop:
      	if in process' root		// uncommon
      		goto in_root
      	if unlikely(in mount root)
      		while true
      			if at absolute root
      				goto in_root
      			move to underlying mountpoint
      			if in process' root
      				goto in_root
      			if *not* in mount root
      				break
      	find the parent	// we are not in mount root
      	return
      in_root:
      	report that we are in root
      
      The reason for that transformation is that we get to keep the
      common path straight *and* get a separate block for "move
      through underlying mountpoints", which will allow us to sanitize
      NO_XDEV handling there.  What's more, the pared-down loops
      will be easier to deal with - in particular, non-RCU case
      has no need to grab mount_lock and rewriting it to the
      form that wouldn't do that is a non-trivial change.  Better
      do that with less stuff getting in the way...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
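The two loop shapes from the "massage loops" message can be compared on a small model to check that the transformation preserves behaviour. This is only a sketch. `Loc`, `on_root` and both function names are invented here. A location is reduced to "which mount in the chain" plus "is it that mount's root". The inner loop is taken to exit once we are no longer in a mount root, which is what the "we are not in mount root" comment on the final step implies.

```python
from itertools import product
from typing import List, NamedTuple


class Loc(NamedTuple):
    mount: int            # index in the mount chain; 0 is the absolute root mount
    at_mount_root: bool   # True if this location is the root of its mount


def underlying(loc: Loc, on_root: List[bool]) -> Loc:
    """Mountpoint of mount `loc.mount` inside mount `loc.mount - 1`.
    on_root[i] says whether mount i sits on the root of mount i - 1."""
    return Loc(loc.mount - 1, on_root[loc.mount])


def dotdot_original(loc: Loc, proc_root: Loc, on_root: List[bool]):
    while True:
        if loc == proc_root:           # in process' root
            break
        if not loc.at_mount_root:      # normal case
            return ("parent_of", loc)
        if loc.mount == 0:             # at absolute root
            break
        loc = underlying(loc, on_root)
    return ("in_root", None)


def dotdot_massaged(loc: Loc, proc_root: Loc, on_root: List[bool]):
    if loc == proc_root:
        return ("in_root", None)
    if loc.at_mount_root:              # the unlikely branch
        while True:
            if loc.mount == 0:
                return ("in_root", None)
            loc = underlying(loc, on_root)
            if loc == proc_root:
                return ("in_root", None)
            if not loc.at_mount_root:
                break
    return ("parent_of", loc)          # we are not in mount root


def all_cases(max_mounts: int = 4):
    """Every mount chain up to max_mounts long, with every start and root."""
    for n in range(1, max_mounts + 1):
        for flags in product([False, True], repeat=n - 1):
            on_root = [False, *flags]  # entry 0 is unused: mount 0 has no parent
            locs = [Loc(m, r) for m in range(n) for r in (False, True)]
            for start, proc_root in product(locs, locs):
                yield start, proc_root, on_root


mismatches = [c for c in all_cases() if dotdot_original(*c) != dotdot_massaged(*c)]
print("mismatches:", len(mismatches))
```

The exhaustive comparison is cheap because the model has very few states; it says nothing about locking or RCU, which are the parts the later commits in the series actually change.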
    • lift all calls of step_into() out of follow_dotdot/follow_dotdot_rcu · c2df1968
      Al Viro authored
      lift step_into() into handle_dots() (where they merge with each other);
      have follow_... return dentry and pass inode/seq to the caller.
      
      [braino fix folded; kudos to Qian Cai <cai@lca.pw> for reporting it]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 919dce24
      Linus Torvalds authored
      Pull rdma updates from Jason Gunthorpe:
       "The majority of the patches are cleanups, refactorings and clarity
        improvements.
      
        This cycle saw some more activity from Syzkaller, I think we are now
        clean on all but one of those bugs, including the long standing and
        obnoxious rdma_cm locking design defect. Continue to see many drivers
        getting cleanups, with a few new user visible features.
      
        Summary:
      
         - Various driver updates for siw, bnxt_re, rxe, efa, mlx5, hfi1
      
         - Lots of cleanup patches for hns
      
         - Convert more places to use refcount
      
         - Aggressively lock the RDMA CM code that syzkaller says isn't
           working
      
         - Work to clarify ib_cm
      
         - Use the new ib_device lifecycle model in bnxt_re
      
         - Fix mlx5's MR cache which seems to be failing more often with the
           new ODP code
      
         - mlx5 'dynamic uar' and 'tx steering' user interfaces"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (144 commits)
        RDMA/bnxt_re: make bnxt_re_ib_init static
        IB/qib: Delete struct qib_ivdev.qp_rnd
        RDMA/hns: Fix uninitialized variable bug
        RDMA/hns: Modify the mask of QP number for CQE of hip08
        RDMA/hns: Reduce the maximum number of extend SGE per WQE
        RDMA/hns: Reduce PFC frames in congestion scenarios
        RDMA/mlx5: Add support for RDMA TX flow table
        net/mlx5: Add support for RDMA TX steering
        IB/hfi1: Call kobject_put() when kobject_init_and_add() fails
        IB/hfi1: Fix memory leaks in sysfs registration and unregistration
        IB/mlx5: Move to fully dynamic UAR mode once user space supports it
        IB/mlx5: Limit the scope of struct mlx5_bfreg_info to mlx5_ib
        IB/mlx5: Extend QP creation to get uar page index from user space
        IB/mlx5: Extend CQ creation to get uar page index from user space
        IB/mlx5: Expose UAR object and its alloc/destroy commands
        IB/hfi1: Get rid of a warning
        RDMA/hns: Remove redundant judgment of qp_type
        RDMA/hns: Remove redundant assignment of wc->smac when polling cq
        RDMA/hns: Remove redundant qpc setup operations
        RDMA/hns: Remove meaningless prints
        ...
    • Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 50a5de89
      Linus Torvalds authored
      Pull hmm updates from Jason Gunthorpe:
       "This series focuses on corner case bug fixes and general clarity
        improvements to hmm_range_fault(). It arose from a review of
        hmm_range_fault() by Christoph, Ralph and myself.
      
        hmm_range_fault() is being used by these 'SVM' style drivers to
        non-destructively read the page tables. It is very similar to
        get_user_pages() except that the output is an array of PFNs and
        per-pfn flags, and it has various modes of reading.
      
        This is necessary before RDMA ODP can be converted, as we don't want
        to have weird corner case regressions, which is still a forward-looking
        item. Ralph has a nice tester for this routine, but it is
        waiting for feedback from the selftests maintainers.
      
        Summary:
      
         - 9 bug fixes
      
         - Allow pgmap to track the 'owner' of a DEVICE_PRIVATE - in this case
           the owner tells the driver if it can understand the DEVICE_PRIVATE
           page or not. Use this to resolve a bug in nouveau where it could
           touch DEVICE_PRIVATE pages from other drivers.
      
         - Remove a bunch of dead, redundant or unused code and flags
      
         - Clarity improvements to hmm_range_fault()"
      
      * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (25 commits)
        mm/hmm: return error for non-vma snapshots
        mm/hmm: do not set pfns when returning an error code
        mm/hmm: do not unconditionally set pfns when returning EBUSY
        mm/hmm: use device_private_entry_to_pfn()
        mm/hmm: remove HMM_FAULT_SNAPSHOT
        mm/hmm: remove unused code and tidy comments
        mm/hmm: return the fault type from hmm_pte_need_fault()
        mm/hmm: remove pgmap checking for devmap pages
        mm/hmm: check the device private page owner in hmm_range_fault()
        mm: simplify device private page handling in hmm_range_fault
        mm: handle multiple owners of device private pages in migrate_vma
        memremap: add an owner field to struct dev_pagemap
        mm: merge hmm_vma_do_fault into into hmm_vma_walk_hole_
        mm/hmm: don't handle the non-fault case in hmm_vma_walk_hole_()
        mm/hmm: simplify hmm_vma_walk_hugetlb_entry()
        mm/hmm: remove the unused HMM_FAULT_ALLOW_RETRY flag
        mm/hmm: don't provide a stub for hmm_range_fault()
        mm/hmm: do not check pmd_protnone twice in hmm_vma_handle_pmd()
        mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling
        mm/hmm: return -EFAULT when setting HMM_PFN_ERROR on requested valid pages
        ...
    • Merge tag 'xarray-5.7' of git://git.infradead.org/users/willy/linux-dax · 193bc55b
      Linus Torvalds authored
      Pull XArray updates from Matthew Wilcox:
      
       - Fix two bugs which affected multi-index entries larger than 2^26
         indices
      
       - Fix some documentation
      
       - Remove unused IDA macros
      
       - Add a small optimisation for tiny configurations
      
       - Fix a bug which could cause an RCU walker to terminate a marked walk
         early
      
      * tag 'xarray-5.7' of git://git.infradead.org/users/willy/linux-dax:
        xarray: Fix early termination of xas_for_each_marked
        radix tree test suite: Support kmem_cache alignment
        XArray: Optimise xas_sibling() if !CONFIG_XARRAY_MULTI
        ida: remove abandoned macros
        XArray: Fix incorrect comment in header file
        XArray: Fix xas_pause for large multi-index entries
        XArray: Fix xa_find_next for large multi-index entries
  2. 01 Apr, 2020 12 commits
    • Merge tag 'linux-kselftest-kunit-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest · 668f1e92
      Linus Torvalds authored
      Pull kunit updates from Shuah Khan:
       "This kunit update consists of:
      
         - debugfs support for displaying kunit test suite results.
      
           This is especially useful for module-loaded tests to allow
           disentangling of test result display from other dmesg events.
           CONFIG_KUNIT_DEBUGFS enables/disables the debugfs support.
      
         - Several fixes and improvements to kunit framework and tool"
      
      * tag 'linux-kselftest-kunit-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        kunit: tool: add missing test data file content
        kunit: update documentation to describe debugfs representation
        kunit: subtests should be indented 4 spaces according to TAP
        kunit: add log test
        kunit: add debugfs /sys/kernel/debug/kunit/<suite>/results display
        Documentation: kunit: Make the KUnit documentation less UML-specific
        Fix linked-list KUnit test when run multiple times
        kunit: kunit_tool: Allow .kunitconfig to disable config items
        kunit: Always print actual pointer values in asserts
        kunit: add --make_options
        kunit: Run all KUnit tests through allyesconfig
        kunit: kunit_parser: make parser more robust
    • Merge tag 'linux-kselftest-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest · 397a9794
      Linus Torvalds authored
      Pull kselftest update from Shuah Khan:
       "This kselftest update consists of:
      
         - resctrl_tests for resctrl file system. resctrl isn't included in
           the default TARGETS list in kselftest Makefile. It can be run
           manually.
      
         - Kselftest harness improvements.
      
         - Kselftest framework and individual test fixes to support runs on
           Kernel CI rings and other environments that use relocatable build
           and install features.
      
         - Minor cleanups and typo fixes"
      
      * tag 'linux-kselftest-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (25 commits)
        selftests: enforce local header dependency in lib.mk
        selftests: Fix memfd to support relocatable build (O=objdir)
        selftests: Fix seccomp to support relocatable build (O=objdir)
        selftests/harness: Handle timeouts cleanly
        selftests/harness: Move test child waiting logic
        selftests: android: Fix custom install from skipping test progs
        selftests: android: ion: Fix ionmap_test compile error
        selftests: Fix kselftest O=objdir build from cluttering top level objdir
        selftests/seccomp: Adjust test fixture counts
        selftests/ftrace: Fix typo in trigger-multihist.tc
        selftests/timens: Remove duplicated include <time.h>
        selftests/resctrl: fix spelling mistake "Errror" -> "Error"
        selftests/resctrl: Add the test in MAINTAINERS
        selftests/resctrl: Disable MBA and MBM tests for AMD
        selftests/resctrl: Use cache index3 id for AMD schemata masks
        selftests/resctrl: Add vendor detection mechanism
        selftests/resctrl: Add Cache Allocation Technology (CAT) selftest
        selftests/resctrl: Add Cache QoS Monitoring (CQM) selftest
        selftests/resctrl: Add MBA test
        selftests/resctrl: Add MBM test
        ...
    • Merge tag 'for-5.7/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm · ffc1c20c
      Linus Torvalds authored
      Pull device mapper updates from Mike Snitzer:
      
       - Add DM writecache "cleaner" policy feature that allows cache to be
         flushed while userspace monitors for completion to then decommission
         use of caching.
      
       - Optimize DM writecache superblock writing and also yield CPU while
         initializing writecache on large PMEM devices to avoid CPU stalls.
      
       - Various fixes to DM integrity target while preparing for the ability
         to resize a DM integrity device. In addition to resize support, add
         optional discard support with the "allow_discards" feature.
      
       - Fix DM clone target's discard handling and overflow bugs which could
         cause data corruption.
      
       - Fix memory leak in destructor for DM verity FEC support.
      
       - Fix DM zoned target's redundant increment of nr_rnd_zones.
      
       - Small cleanup in DM crypt to use crypt_integrity_aead() helper.
      
      * tag 'for-5.7/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm clone metadata: Fix return type of dm_clone_nr_of_hydrated_regions()
        dm clone: Add missing casts to prevent overflows and data corruption
        dm clone: Add overflow check for number of regions
        dm clone: Fix handling of partial region discards
        dm writecache: add cond_resched to avoid CPU hangs
        dm integrity: improve discard in journal mode
        dm integrity: add optional discard support
        dm integrity: allow resize of the integrity device
        dm integrity: factor out get_provided_data_sectors()
        dm integrity: don't replay journal data past the end of the device
        dm integrity: remove sector type casts
        dm integrity: fix a crash with unusually large tag size
        dm zoned: remove duplicate nr_rnd_zones increase in dmz_init_zone()
        dm verity fec: fix memory leak in verity_fec_dtr
        dm writecache: optimize superblock write
        dm writecache: implement gradual cleanup
        dm writecache: implement the "cleaner" policy
        dm writecache: do direct write if the cache is full
        dm integrity: print device name in integrity_metadata() error message
        dm crypt: use crypt_integrity_aead() helper
      ffc1c20c
    • Merge tag 'drm-next-2020-04-01' of git://anongit.freedesktop.org/drm/drm · f365ab31
      Linus Torvalds authored
      Pull drm updates from Dave Airlie:
       "This is the main drm pull request for 5.7-rc1.
      
        Highlights:
      
         - i915 enables Tigerlake by default
      
         - i915 and amdgpu have initial OLED backlight support
      
           [ Jani Nikula pipes up and points out that we've had a bunch of
             "initial support" code for a long time already, but only now
             Lyude made it actually work on real world machines ]
      
         - vmwgfx adds support to enable OpenGL 4 userspace
      
         - zero length arrays are mostly removed.
      
        Detailed summary:
      
        new driver:
         - tidss: TI Keystone platform display subsystem
      
        core:
         - new drm device warn macros
         - mode config valid for memory constrained devices
         - bridge bus format negotiation
         - consolidated fake vblank event handling
         - dma_alloc related cleanups
         - drop get_crtc callback
         - dp: DP1.4 EDID corruption test
         - EDID CEA detailed timings improvements
         - relicense some code to dual GPL2/MIT
         - convert core vblank support to per-crtc support
         - rework drm_global_mutex
         - bridge rework to allow omap_dss custom driver removal
         - remove drm_fb_helper connector interfaces
         - zero-length array removal
      
        scheduler:
         - support for modifying the sched list
         - revert job distribution optimization
         - helper to pick least loaded scheduler
         - race condition fix
      
        mst:
         - various fixes
         - remove register_connector callback
      
        i915:
         - uapi to allow userspace-specified CS ring buffer sizes
         - Tigerlake enablement patches + Tigerlake enabled by default
         - new sysfs entries for engine properties
         - display/logging refactors
         - eDP/DP fixes for DPCD
         - Gen7 back to aliasing-ppgtt
         - Gen8+ irq refactor
         - Avoid globals
         - GEM locking fixes and simplifications
         - Ice Lake and Elkhart Lake fixes and workarounds
         - Baytrail/Haswell instability fix
         - GVT - VFIO edid better support
      
        amdgpu:
         - Rework VM update handling in preparation for HMM support
         - drm load/unload removal fixups
         - USB-C PD firmware updates
         - HDCP srm support
         - Navi/renoir PM watermark fixes
         - OLED panel support
         - Optimize debugging vram access
         - Use BACO for runtime pm
         - DC clock programming optimizations and fixes
         - PSP fw loading sequence updates
         - Drop DRIVER_USE_AGP
         - Remove legacy drm load and unload callbacks
         - ACP Kconfig fix
         - Lots of fixes across the driver
      
        amdkfd:
         - runtime pm support
         - more gfx config details in amdgpu
      
        radeon:
         - drop DRIVER_USE_AGP
      
        vmwgfx:
         - Disable DMA when SEV encryption in use
         - Shader Model 5 support - needed for GL4 support
      
        msm:
         - DPU resource manager refactor
         - dpu using atomic global state
      
        mediatek:
         - MT8183 DPI support
      
        etnaviv:
         - out-of-bounds read fix
         - expose feature flags for GC400 STM32MP1 SoC
         - runtime suspend entry fix
         - dma32 zone fix
      
        hisilicon:
         - mode selection fixes
      
        meson:
         - YUV420 support
      
        lima:
         - add support for heap buffers
      
        tinydrm:
         - removal of owner field
         - explicit DT dependency removal
         - YAML schema conversion
      
        tegra:
         - misc cleanups
      
        tidss:
         - new driver
      
        virtio:
         - better batching of notifications to host
         - memory handling reworked
         - shmem + gpu context fixes
      
        hibmc:
         - add gamma_set support
         - improve DPMS support
      
        pl111:
         - Integrator IM-PD1 support
      
        sun4i:
         - LVDS support for A20 + A33
         - DSI panel handling improvements"
      
      * tag 'drm-next-2020-04-01' of git://anongit.freedesktop.org/drm/drm: (1537 commits)
        drm/i915/display: Fix mode private_flags comparison at atomic_check
        drm/i915/gt: Stage the transfer of the virtual breadcrumb
        drm/i915/gt: Select the deepest available parking mode for rc6
        drm/i915: Avoid live-lock with i915_vma_parked()
        drm/i915/gt: Treat idling as a RPS downclock event
        drm/i915/gt: Cancel a hung context if already closed
        drm/i915: Use explicit flag to mark unreachable intel_context
        drm/amdgpu: don't try to reserve training bo for sriov (v2)
        drm/amdgpu/smu11: add support for SMU AC/DC interrupts
        drm/amdgpu/swSMU: handle manual AC/DC notifications
        drm/amdgpu/swSMU: handle DC controlled by GPIO for navi1x
        drm/amdgpu/swSMU: set AC/DC mode based on the current system state (v2)
        drm/amdgpu/swSMU: correct the bootup power source for Navi1X (v2)
        drm/amdgpu/swSMU: use the smu11 power source helper for navi1x
        drm/amdgpu/smu11: add a helper to set the power source
        drm/amd/swSMU: add callback to set AC/DC power source (v2)
        drm/scheduler: fix rare NULL ptr race
        drm/amdgpu: fix the coverage issue to clear ArcVPGRs
        drm/amd/display: Fix pageflip event race condition for DCN.
        drm/[radeon|amdgpu]: Remove HAINAN board from max_sclk override check
        ...
      f365ab31
    • Merge tag 'mailbox-v5.7' of git://git.linaro.org/landing-teams/working/fujitsu/integration · 4646de87
      Linus Torvalds authored
      Pull mailbox updates from Jassi Brar:
      
       - imx: add support for i.MX8/8X to existing driver
      
       - mediatek: drop the atomic execution feature, add flush
      
       - allwinner: new 'msgbox' controller driver
      
       - armada: misc: drop redundant error print
      
       - bcm: misc: catch error in probe and snprintf buffer overflow
      
      * tag 'mailbox-v5.7' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
        mailbox: imx: add SCU MU support
        mailbox: imx: restructure code to make easy for new MU
        dt-bindings: mailbox: imx-mu: add SCU MU support
        mailbox: mediatek: remove implementation related to atomic_exec
        mailbox: mediatek: implement flush function
        dt-binding: gce: remove atomic_exec in mboxes property
        maillbox: bcm-flexrm-mailbox: handle cmpl_pool dma allocation failure
        mailbox: sun6i-msgbox: Add a new mailbox driver
        dt-bindings: mailbox: Add a binding for the sun6i msgbox
        mailbox: bcm-pdc: Use scnprintf() for avoiding potential buffer overflow
        mailbox:armada-37xx-rwtm:remove duplicate print in armada_37xx_mbox_probe()
      4646de87
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid · c101e9bb
      Linus Torvalds authored
      Pull HID updates from Jiri Kosina:
      
       - Logitech HID++ protocol support improvement from Filipe Laíns
      
       - probe fix for Logitech-G* devices from Hans de Goede
      
       - a few other small code cleanups and support for new device IDs
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
        HID: rmi: Simplify an error handling path in 'rmi_hid_read_block()'
        HID: intel-ish-hid: hbm.h: Replace zero-length array with flexible-array member
        HID: intel-ish-hid: ishtp-dev.h: Replace zero-length array with flexible-array member
        HID: Add driver fixing Glorious PC Gaming Race mouse report descriptor
        HID: lg-g15: Do not fail the probe when we fail to disable F# emulation
        HID: appleir: Use devm_kzalloc() instead of kzalloc()
        HID: appleir: Remove unnecessary goto label
        HID: logitech-dj: add support for the static device in the Powerplay mat/receiver
        HID: mcp2221: add usb to i2c-smbus host bridge
        HID: logitech-dj: add debug msg when exporting a HID++ report descriptors
        HID: quirks: Remove ITE 8595 entry from hid_have_special_driver
      c101e9bb
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial · 69c1fd97
      Linus Torvalds authored
      Pull trivial tree updates from Jiri Kosina:
       "My attempt to revitalize trivial queue I've been neglecting for years
        (what a disaster that was for this world, right? :) ) with patches
        collected from backlog that were still relevant and not applied
        elsewhere in the meantime"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
        err.h: remove deprecated PTR_RET for good
        blk-mq: Fix typo in comment
        x86/boot: Fix comment spelling
        sh: mach-highlander: Fix comment spelling
        s390/dasd: Fix comment spelling
        mfd: wm8994: Fix comment spelling
        docs: Add reference in binfmt-misc.rst
        genirq: fix kerneldoc comment for irq_desc
        drm/amdgpu: fix two documentation mismatch issues
        HID: fix Kconfig word ordering
        list/hashtable: minor documentation corrections.
      69c1fd97
    • Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 72f35423
      Linus Torvalds authored
      Pull crypto updates from Herbert Xu:
       "API:
         - Fix out-of-sync IVs in self-test for IPsec AEAD algorithms
      
        Algorithms:
         - Use formally verified implementation of x86/curve25519
      
        Drivers:
         - Enhance hwrng support in caam
      
         - Use crypto_engine for skcipher/aead/rsa/hash in caam
      
         - Add Xilinx AES driver
      
         - Add uacce driver
      
         - Register zip engine to uacce in hisilicon
      
         - Add support for OCTEON TX CPT engine in marvell"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (162 commits)
        crypto: af_alg - bool type cosmetics
        crypto: arm[64]/poly1305 - add artifact to .gitignore files
        crypto: caam - limit single JD RNG output to maximum of 16 bytes
        crypto: caam - enable prediction resistance in HRWNG
        bus: fsl-mc: add api to retrieve mc version
        crypto: caam - invalidate entropy register during RNG initialization
        crypto: caam - check if RNG job failed
        crypto: caam - simplify RNG implementation
        crypto: caam - drop global context pointer and init_done
        crypto: caam - use struct hwrng's .init for initialization
        crypto: caam - allocate RNG instantiation descriptor with GFP_DMA
        crypto: ccree - remove duplicated include from cc_aead.c
        crypto: chelsio - remove set but not used variable 'adap'
        crypto: marvell - enable OcteonTX cpt options for build
        crypto: marvell - add the Virtual Function driver for CPT
        crypto: marvell - add support for OCTEON TX CPT engine
        crypto: marvell - create common Kconfig and Makefile for Marvell
        crypto: arm/neon - memzero_explicit aes-cbc key
        crypto: bcm - Use scnprintf() for avoiding potential buffer overflow
        crypto: atmel-i2c - Fix wakeup fail
        ...
      72f35423
    • x86: start using named parameters for low-level uaccess asms · 890f0b0d
      Linus Torvalds authored
      This is partly for readability - using named arguments instead of
      numbered ones makes it much more obvious just what is going on.  Using
      "%[efault]" instead of "%4" for the special -EFAULT constant just means
      that you don't have to count the arguments to see what's up.
      
      But the motivation for all this cleanup is that when we'll start to
      conditionally use "asm goto" even for the __get_user_asm() case, the
      argument numbers will depend on whether we have an error output, or an
      error label we can just directly jump to.
      
      So this moves us towards named arguments for the same reason that we
      have to use named arguments for the asms that use SET_CC(): numbering
      will eventually become similarly unreliable and depends on whether we
      can use particular compiler features or not.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      890f0b0d
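
      To illustrate the readability point made in the commit above, here is a
      minimal sketch contrasting numbered and named operands in GCC-style
      extended asm.  The functions add_numbered() and add_named() are
      hypothetical and are not taken from the kernel's uaccess code; the asm
      is only used on x86, with a plain C fallback elsewhere.

```c
/* Adds b to a using numbered operands: %0 is the in/out sum, %1 is b. */
static int add_numbered(int a, int b)
{
	int sum = a;
#if defined(__x86_64__) || defined(__i386__)
	__asm__("addl %1, %0" : "+r"(sum) : "r"(b) : "cc");
#else
	sum += b;	/* portable fallback for non-x86 builds */
#endif
	return sum;
}

/* Same operation with named operands: no need to count positions. */
static int add_named(int a, int b)
{
	int sum = a;
#if defined(__x86_64__) || defined(__i386__)
	__asm__("addl %[addend], %[sum]"
		: [sum] "+r"(sum)
		: [addend] "r"(b)
		: "cc");
#else
	sum += b;	/* portable fallback for non-x86 builds */
#endif
	return sum;
}
```

      In the named form, adding or removing an operand (such as an error
      output) does not renumber the remaining ones, which is the property the
      commit message relies on.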
    • x86: get rid of 'rtype' argument to __get_user_asm() macro · 7da63b3d
      Linus Torvalds authored
      This is the exact same thing as 36807856 ("x86: get rid of 'rtype'
      argument to __put_user_goto() macro") except it's about __get_user_asm()
      rather than __put_user_goto().
      
      The reasons are the same: having the low-level asm access the argument
      with a different size than the compiler thinks it does is fundamentally
      wrong.
      
      But unlike the __put_user_goto() case, we actually did tell the compiler
      that we used a bigger variable (either long or long long), and then only
      filled in the low bits, and ended up "fixing" this by casting the result
      to the proper pointer type.
      
      That's because we needed to use a non-qualified type (the user pointer
      might be a const pointer!), and that makes this a bit more painful.  Our
      '__inttype()' macro used to be lazy and only differentiate between "fits
      in a register" or "needs two registers".
      
      So this fix had to also make that '__inttype()' macro more precise.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7da63b3d
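
      As a rough illustration of what a "more precise" size-based type
      selection can look like, the sketch below picks the smallest unsigned
      integer type that fits its argument, using the GCC/Clang builtin
      __builtin_choose_expr().  The macro names TYPEFITS_SKETCH and
      INTTYPE_SKETCH are assumptions made for this example; it is not
      presented as the kernel's actual '__inttype()' definition.

```c
#include <stddef.h>

/* Hypothetical helper: yield an 'unsigned type' zero if x fits, else 'otherwise'. */
#define TYPEFITS_SKETCH(x, type, otherwise) \
	__builtin_choose_expr(sizeof(x) <= sizeof(type), (unsigned type)0, otherwise)

/* Hypothetical macro: smallest unqualified unsigned type that can hold x. */
#define INTTYPE_SKETCH(x) __typeof__(			\
	TYPEFITS_SKETCH(x, char,			\
	  TYPEFITS_SKETCH(x, short,			\
	    TYPEFITS_SKETCH(x, int,			\
	      TYPEFITS_SKETCH(x, long, 0ULL)))))

/* The source is const, but the selected scratch type is not, so it can be assigned. */
static size_t scratch_size_for_const_short(void)
{
	const short src = -2;
	INTTYPE_SKETCH(src) tmp;

	tmp = (INTTYPE_SKETCH(src))src;
	(void)tmp;
	return sizeof(tmp);
}
```

      Because the result is built from plain constants rather than from the
      argument's own type, any const qualifier on the argument is dropped
      while its size is preserved.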
    • x86: get rid of 'rtype' argument to __put_user_goto() macro · 36807856
      Linus Torvalds authored
      The 'rtype' argument goes back to pre-git (and pre-BK) times, and comes
      from the fact that we used to not necessarily have the same type sizes
      for the arguments of the inline asm as we did for the actual accesses we
      did.
      
      So 'rtype' is the 'register type' - the override of the register size in
      the inline asm when it doesn't match the actual size of the variable we
      use as the output argument (for when you used "put_user()" on an "int"
      value that was assigned to a byte-sized user space access etc).
      
      That mismatch doesn't actually exist any more, and should probably never
      have existed in the first place.  It's a horrid bug just waiting to
      happen (using more - or less - of the variable that the compiler
      expected us to use).
      
      I think we had some odd casting going on to hide the effects of that
      oddity after-the-fact, but those are long gone, and these days we should
      always have the right size value in the first place, using things like
      
              __typeof__(*(ptr)) __pu_val = (x);
      
      and gcc should thus have the right register size without any manual
      'rtype' games.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      36807856
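
      The __typeof__ idiom quoted in the commit above can be tried in
      isolation.  The sketch below uses a hypothetical STORE_SIZED_SKETCH()
      macro (not the kernel's put_user()) to show that the temporary takes the
      pointee's type, so the compiler already knows the access size and no
      separate register-size override is needed.

```c
/* Hypothetical macro: convert x to the pointee type first, then store it. */
#define STORE_SIZED_SKETCH(x, ptr)				\
	do {							\
		__typeof__(*(ptr)) val_ = (x);			\
		*(ptr) = val_;					\
	} while (0)

/* Stores an int through a byte-sized pointer and returns what was stored. */
static unsigned char store_int_into_byte(int value)
{
	unsigned char slot = 0;

	STORE_SIZED_SKETCH(value, &slot);
	return slot;
}
```

      Here the conversion from int to unsigned char happens in ordinary C
      before the store, which mirrors the "right size value in the first
      place" argument of the commit message.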
    • signal: Extend exec_id to 64bits · d1e7fd64
      Eric W. Biederman authored
      Replace the 32bit exec_id with a 64bit exec_id to make it impossible
      to wrap the exec_id counter.  With care an attacker can cause exec_id
      wrap and send arbitrary signals to a newly exec'd parent.  This
      bypasses the signal sending checks if the parent changes their
      credentials during exec.
      
      The severity of this problem can be seen in that, in my limited testing
      of a 32bit exec_id, it can take as little as 19s to exec 65536 times.
      Which means that it can take as little as 14 days to wrap a 32bit
      exec_id.  Adam Zabrocki has succeeded wrapping the self_exec_id in 7
      days.  Even my slower timing is in the uptime of a typical server.
      Which means self_exec_id is simply a speed bump today, and if exec
      gets noticeably faster self_exec_id won't even be a speed bump.
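
      The 14-day figure follows from the numbers quoted above: 2^32 execs at
      65536 execs per 19s is 65536 batches of 19s, or 1245184 seconds.  A
      small sketch of that arithmetic, with hypothetical helper names, also
      makes it easy to compare against a 64bit counter at the same rate:

```c
#include <stdint.h>

#define SECONDS_PER_BATCH 19u	/* time for 65536 execs, from the text above */
#define SECONDS_PER_DAY   86400u

/* Seconds needed to exec 2^bits times; valid for 16 <= bits <= 64. */
static uint64_t seconds_to_wrap(unsigned int bits)
{
	uint64_t batches = (uint64_t)1 << (bits - 16);	/* 2^bits / 65536 */

	return batches * SECONDS_PER_BATCH;
}

/* Whole days needed to wrap a counter of the given width. */
static uint64_t days_to_wrap(unsigned int bits)
{
	return seconds_to_wrap(bits) / SECONDS_PER_DAY;
}
```

      With these assumptions days_to_wrap(32) evaluates to 14, matching the
      estimate in the text.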
      
      Extending self_exec_id to 64bits introduces a problem on 32bit
      architectures where reading self_exec_id is no longer atomic and can
      take two read instructions.  Which means that it is possible to hit
      a window where the read value of exec_id does not match the written
      value.  So with very lucky timing after this change this still
      remains exploitable.
      
      I have updated the update of exec_id on exec to use WRITE_ONCE
      and the read of exec_id in do_notify_parent to use READ_ONCE
      to make it clear that there is no locking between these two
      locations.
      
      Link: https://lore.kernel.org/kernel-hardening/20200324215049.GA3710@pi3.com.pl
      Fixes: 2.3.23pre2
      Cc: stable@vger.kernel.org
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      d1e7fd64
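
      The lockless update and read described in the commit above can be
      sketched in userspace roughly as follows.  The macros are simplified
      volatile-cast stand-ins, not the kernel's READ_ONCE()/WRITE_ONCE(), and
      struct task_sketch with its helpers is a hypothetical reduction of the
      real task structure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins: force exactly one access through a volatile lvalue. */
#define WRITE_ONCE_SKETCH(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE_SKETCH(x)       (*(const volatile __typeof__(x) *)&(x))

struct task_sketch {
	struct task_sketch *parent;
	uint64_t self_exec_id;		/* bumped on every exec */
	uint64_t parent_exec_id;	/* parent's self_exec_id seen by the child */
};

/* Exec side: publish the new id with a single marked store. */
static void exec_sketch(struct task_sketch *me)
{
	WRITE_ONCE_SKETCH(me->self_exec_id, me->self_exec_id + 1);
}

/* Notify side: compare against the parent's id with a single marked load. */
static bool parent_has_execed(const struct task_sketch *child)
{
	return child->parent_exec_id !=
	       READ_ONCE_SKETCH(child->parent->self_exec_id);
}
```

      As the commit message notes, marking the accesses documents the missing
      locking but does not make a 64bit load atomic on 32bit architectures.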