1. 21 Dec, 2022 1 commit
    • pnode: terminate at peers of source · 11933cf1
      Christian Brauner authored
      The propagate_mnt() function handles mount propagation when creating
      mounts and propagates the source mount tree @source_mnt to all
      applicable nodes of the destination propagation mount tree headed by
      @dest_mnt.
      
      Unfortunately it contains a bug where it fails to terminate at peers of
      @source_mnt when looking up copies of the source mount that become
      masters for copies of the source mount tree mounted on top of slaves in
      the destination propagation tree causing a NULL dereference.
      
      Once the mechanics of the bug are understood it's easy to trigger.
      Because of unprivileged user namespaces it is available to unprivileged
      users.
      
      While fixing this bug we've gotten confused multiple times due to
      unclear terminology or missing concepts. So let's start this with some
      clarifications:
      
      * The terms "master" or "peer" denote a shared mount. A shared mount
        belongs to a peer group.
      
      * A peer group is a set of shared mounts that propagate to each other.
        They are identified by a peer group id. The peer group id is available
        in @shared_mnt->mnt_group_id.
        Shared mounts within the same peer group have the same peer group id.
        The peers in a peer group can be reached via @shared_mnt->mnt_share.
      
      * The terms "slave mount" or "dependent mount" denote a mount that
        receives propagation from a peer in a peer group. IOW, shared mounts
        may have slave mounts and slave mounts have shared mounts as their
        master. Slave mounts of a given peer in a peer group are listed on
        that peer's slave list available at @shared_mnt->mnt_slave_list.
      
      * The term "master mount" denotes a mount in a peer group. IOW, it
        denotes a shared mount or a peer mount in a peer group. The term
        "master mount" - or "master" for short - is mostly used when talking
        in the context of slave mounts that receive propagation from a master
        mount. A master mount of a slave identifies the closest peer group a
        slave mount receives propagation from. The master mount of a slave can
        be identified via @slave_mount->mnt_master. Different slaves may point
        to different masters in the same peer group.
      
      * Multiple peers in a peer group can have non-empty ->mnt_slave_lists.
        Non-empty ->mnt_slave_lists of peers don't intersect. Consequently, to
        ensure all slave mounts of a peer group are visited the
        ->mnt_slave_lists of all peers in a peer group have to be walked.
      
      * Slave mounts point to a peer in the closest peer group they receive
        propagation from via @slave_mnt->mnt_master (see above). Together with
        these peers they form a propagation group (see below). The closest
        peer group can thus be identified through the peer group id
        @slave_mnt->mnt_master->mnt_group_id of the peer/master that a slave
        mount receives propagation from.
      
      * A shared-slave mount is a slave mount to a peer group pg1 while also
        a peer in another peer group pg2. IOW, a peer group may receive
        propagation from another peer group.
      
        If a peer group pg1 is a slave to another peer group pg2 then all
        peers in peer group pg1 point to the same peer in peer group pg2 via
        ->mnt_master. IOW, all peers in peer group pg1 appear on the same
        ->mnt_slave_list. IOW, they cannot be slaves to different peer groups.
      
      * A pure slave mount is a slave mount that is a slave to a peer group
        but is not a peer in another peer group.
      
      * A propagation group denotes the set of mounts consisting of a single
        peer group pg1 and all slave mounts and shared-slave mounts that point
        to a peer in that peer group via ->mnt_master. IOW, all slave mounts
        such that @slave_mnt->mnt_master->mnt_group_id is equal to
        @shared_mnt->mnt_group_id.
      
        The concept of a propagation group makes it easier to talk about a
        single propagation level in a propagation tree.
      
        For example, in propagate_mnt() the immediate peers of @dest_mnt and
        all slaves of @dest_mnt's peer group form a propagation group propg1.
        So a shared-slave mount that is a slave in propg1 and that is a peer
        in another peer group pg2 forms another propagation group propg2
        together with all slaves that point to that shared-slave mount in
        their ->mnt_master.
      
      * A propagation tree refers to all mounts that receive propagation
        starting from a specific shared mount.
      
        For example, for propagate_mnt() @dest_mnt is the start of a
        propagation tree. The propagation tree encompasses all mounts that
        receive propagation from @dest_mnt's peer group down to the leaves.
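
      The terminology above can be condensed into a few predicates. The
      following is a minimal user-space sketch, not kernel code: struct
      toy_mount and every toy_*() helper are hypothetical names that model
      only ->mnt_group_id and ->mnt_master as described in the list above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the kernel's struct mount: only the fields named above. */
struct toy_mount {
	int mnt_group_id;             /* peer group id; 0 means "not shared" */
	struct toy_mount *mnt_master; /* peer we receive propagation from, or NULL */
};

/* Peers are shared mounts carrying the same peer group id. */
int toy_peers(const struct toy_mount *a, const struct toy_mount *b)
{
	return a->mnt_group_id != 0 && a->mnt_group_id == b->mnt_group_id;
}

/* A shared-slave is a peer in one group and a slave to another. */
int toy_is_shared_slave(const struct toy_mount *m)
{
	return m->mnt_group_id != 0 && m->mnt_master != NULL;
}

/* A pure slave has a master but is not a peer in any group. */
int toy_is_pure_slave(const struct toy_mount *m)
{
	return m->mnt_group_id == 0 && m->mnt_master != NULL;
}

/*
 * @m belongs to the propagation group of @shared if it is a peer of
 * @shared or a slave whose ->mnt_master is in @shared's peer group.
 */
int toy_in_propagation_group(const struct toy_mount *m,
			     const struct toy_mount *shared)
{
	assert(shared->mnt_group_id != 0); /* only shared mounts head a group */
	if (toy_peers(m, shared))
		return 1;
	return m->mnt_master &&
	       m->mnt_master->mnt_group_id == shared->mnt_group_id;
}
```

      With two peers in group 216, a shared-slave in group 300 whose master
      is one of those peers, and a pure slave of that shared-slave, the pure
      slave is in the propagation group of group 300 but not in that of
      group 216. It is one propagation level further down.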
      
      With that out of the way let's get to the actual algorithm.
      
      We know that @dest_mnt is guaranteed to be a pure shared mount or a
      shared-slave mount. This is guaranteed by a check in
      attach_recursive_mnt(). So propagate_mnt() will first propagate the
      source mount tree to all peers in @dest_mnt's peer group:
      
      for (n = next_peer(dest_mnt); n != dest_mnt; n = next_peer(n)) {
              ret = propagate_one(n);
              if (ret)
                      goto out;
      }
      
      Notice that the peer propagation loop of propagate_mnt() doesn't
      propagate to @dest_mnt itself. @source_mnt is mounted on @dest_mnt
      directly in attach_recursive_mnt() after we have propagated to the
      destination propagation tree.
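
      The shape of that loop can be tried out in user space. This is a
      sketch under stated assumptions: struct toy_mount, toy_next_peer() and
      toy_propagate_to_peers() are hypothetical names, and the peer list is
      modelled as a plain circular singly linked list through ->mnt_share.

```c
#include <assert.h>
#include <stddef.h>

/* Toy mount: peers are linked into a circular list through ->mnt_share. */
struct toy_mount {
	struct toy_mount *mnt_share; /* next peer; points to itself if alone */
	int visited;                 /* set when the loop propagates to us */
};

/* Stand-in for next_peer(): follow the circular peer list by one step. */
struct toy_mount *toy_next_peer(struct toy_mount *m)
{
	return m->mnt_share;
}

/*
 * Same shape as the peer loop above: start at the peer after @dest_mnt and
 * stop once the ring leads back to it, so @dest_mnt itself is never visited.
 * Returns the number of peers visited.
 */
int toy_propagate_to_peers(struct toy_mount *dest_mnt)
{
	struct toy_mount *n;
	int count = 0;

	assert(dest_mnt->mnt_share != NULL);
	for (n = toy_next_peer(dest_mnt); n != dest_mnt; n = toy_next_peer(n)) {
		n->visited = 1; /* the kernel calls propagate_one(n) here */
		count++;
	}
	return count;
}
```

      For a ring of three peers the loop visits the two peers other than
      @dest_mnt, and for a mount that is alone in its peer group it visits
      nothing.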
      
      The mount that will be mounted on top of @dest_mnt is @source_mnt. This
      copy was created earlier even before we entered attach_recursive_mnt()
      and doesn't concern us a lot here.
      
      It's just important to notice that when propagate_mnt() is called
      @source_mnt will not yet have been mounted on top of @dest_mnt. Thus,
      @source_mnt->mnt_parent will either still point to @source_mnt or - in
      the case @source_mnt is moved and thus already attached - still to its
      former parent.
      
      For each peer @M in @dest_mnt's peer group propagate_one() will create a
      new copy of the source mount tree and mount that copy @child on @M such
      that @child->mnt_parent points to @M after propagate_one() returns.
      
      propagate_one() will stash the last destination propagation node @M in
      @last_dest and the last copy it created for the source mount tree in
      @last_source.
      
      Hence, if we call into propagate_one() again for the next destination
      propagation node @M, @last_dest will point to the previous destination
      propagation node and @last_source will point to the previous copy of the
      source mount tree that was mounted on @last_dest.
      
      Each new copy of the source mount tree is created from the previous copy
      of the source mount tree. This will become important later.
      
      The peer loop in propagate_mnt() is straightforward. We iterate through
      the peers copying and updating @last_source and @last_dest as we go
      through them and mount each copy of the source mount tree @child on a
      peer @M in @dest_mnt's peer group.
      
      After propagate_mnt() has handled the peers in @dest_mnt's peer group
      it will propagate the source mount tree down the propagation tree that
      @dest_mnt's peer group propagates to:
      
      for (m = next_group(dest_mnt, dest_mnt); m;
                      m = next_group(m, dest_mnt)) {
              /* everything in that slave group */
              n = m;
              do {
                      ret = propagate_one(n);
                      if (ret)
                              goto out;
                      n = next_peer(n);
              } while (n != m);
      }
      
      The next_group() helper will recursively walk the destination
      propagation tree, descending into each propagation group of the
      propagation tree.
      
      The important part is that it takes care to propagate the source mount
      tree to all peers in the peer group of a propagation group before it
      propagates to the slaves to those peers in the propagation group. IOW,
      it creates and mounts copies of the source mount tree that become
      masters before it creates and mounts copies of the source mount tree
      that become slaves to these masters.
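
      The ordering guarantee can be illustrated with a small recursive walk.
      This is deliberately not the kernel's next_group(), which is iterative
      and works on the real list heads. It is a simplified sketch in which
      struct toy_mount, the fixed-size slaves[] array and toy_visit_group()
      are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_SLAVES 4

struct toy_mount {
	struct toy_mount *mnt_share;              /* next peer, circular */
	struct toy_mount *mnt_master;             /* NULL if not a slave */
	struct toy_mount *slaves[TOY_MAX_SLAVES]; /* stand-in for ->mnt_slave_list */
	int nr_slaves;
	int order;                                /* visit order, 0 = not visited */
};

/*
 * Visit the peer group that @first belongs to, then descend into the slaves
 * of each peer. @skip_first mirrors the fact that @dest_mnt itself is not
 * propagated to by propagate_mnt().
 */
void toy_visit_group(struct toy_mount *first, int skip_first, int *counter)
{
	struct toy_mount *m = first;
	int i;

	do { /* peers (future masters) first */
		if (!(skip_first && m == first))
			m->order = ++*counter;
		m = m->mnt_share;
	} while (m != first);

	do { /* then everything that is a slave to those peers */
		assert(m->nr_slaves <= TOY_MAX_SLAVES);
		for (i = 0; i < m->nr_slaves; i++) {
			/* peers of an already visited slave group are skipped */
			if (m->slaves[i]->order == 0)
				toy_visit_group(m->slaves[i], 0, counter);
		}
		m = m->mnt_share;
	} while (m != first);
}
```

      In this model every visited slave ends up with a larger ->order than
      the mount its ->mnt_master points to, which is the property the
      paragraph above relies on.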
      
      It is important to remember that propagating the source mount tree to
      each mount @M in the destination propagation tree simply means that we
      create and mount new copies @child of the source mount tree on @M such
      that @child->mnt_parent points to @M.
      
      Since we know that each node @M in the destination propagation tree
      headed by @dest_mnt's peer group will be overmounted with a copy of the
      source mount tree, and since we know that the propagation properties of
      each copy of the source mount tree we create and mount at @M will mostly
      mirror the propagation properties of @M, we can use that information to
      create and mount the copies of the source mount tree that become masters
      before their slaves.
      
      The easy case is always when @M and @last_dest are peers in a peer group
      of a given propagation group. In that case we know that we can simply
      copy @last_source without having to figure out what the master for the
      new copy @child of the source mount tree needs to be as we've done that
      in a previous call to propagate_one().
      
      The hard case is when we're dealing with a slave mount or a shared-slave
      mount @M in a destination propagation group that we need to create and
      mount a copy of the source mount tree on.
      
      For each propagation group in the destination propagation tree that we
      propagate the source mount tree to, we want to make sure that the copies
      @child of the source mount tree we create and mount on slaves @M pick an
      earlier copy of the source mount tree that we mounted on a master @M of
      the destination propagation group as their master. This is a mouthful
      but as far as we can tell that's the core of it all.
      
      But, if we keep track of the masters in the destination propagation tree
      @M we can use the information to find the correct master for each copy
      of the source mount tree we create and mount at the slaves in the
      destination propagation tree @M.
      
      Let's walk through the base case as that's still fairly easy to grasp.
      
      If we're dealing with the first slave in the propagation group that
      @dest_mnt is in then we haven't yet marked any masters in the
      destination propagation tree.
      
      We know the master for the first slave to @dest_mnt's peer group is
      simply @dest_mnt. So we expect this algorithm to yield a copy of the
      source mount tree that was mounted on a peer in @dest_mnt's peer group
      as the master for the copy of the source mount tree we want to mount at
      the first slave @M:
      
      for (n = m; ; n = p) {
              p = n->mnt_master;
              if (p == dest_master || IS_MNT_MARKED(p))
                      break;
      }
      
      For the first slave we walk the destination propagation tree all the way
      up to a peer in @dest_mnt's peer group. IOW, the propagation hierarchy
      can be walked by walking up the @mnt->mnt_master hierarchy of the
      destination propagation tree @M. We will ultimately find a peer in
      @dest_mnt's peer group and thus ultimately @dest_mnt->mnt_master.
      
      Btw, here the assumption we listed at the beginning becomes important.
      Namely, that peers in a peer group pg1 that are slaves in another peer
      group pg2 appear on the same ->mnt_slave_list. IOW, all slaves who are
      peers in peer group pg1 point to the same peer in peer group pg2 via
      their ->mnt_master. Otherwise the termination condition in the code
      above would be wrong and next_group() would be broken too.
      
      So the first iteration sets:
      
      n = m;
      p = n->mnt_master;
      
      such that @p now points to a peer or @dest_mnt itself. We walk up one
      more level since we don't have any marked mounts. So we end up with:
      
      n = dest_mnt;
      p = dest_mnt->mnt_master;
      
      If @dest_mnt's peer group is not a slave to another peer group then @p
      is now NULL. If @dest_mnt's peer group is a slave to another peer group
      then @p now points to @dest_mnt->mnt_master, which is a master outside
      the propagation tree we're dealing with.
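
      The upward walk can be replayed outside the kernel. The following is a
      sketch only: struct toy_mount, its ->marked flag (standing in for
      IS_MNT_MARKED()) and toy_closest_master() are hypothetical, and the
      assert() is a guard that exists only in this toy.

```c
#include <assert.h>
#include <stddef.h>

struct toy_mount {
	struct toy_mount *mnt_master; /* NULL if not a slave */
	int marked;                   /* stand-in for IS_MNT_MARKED() */
};

/*
 * Walk up from @m until the master @p is either @dest_master or a marked
 * mount. On return *@n is the mount directly below @p on that path.
 */
struct toy_mount *toy_closest_master(struct toy_mount *m,
				     struct toy_mount *dest_master,
				     struct toy_mount **n)
{
	struct toy_mount *p;

	for (*n = m; ; *n = p) {
		p = (*n)->mnt_master;
		/* toy guard: the kernel relies on always reaching @dest_master */
		assert(p == dest_master || p != NULL);
		if (p == dest_master || p->marked)
			break;
	}
	return p;
}
```

      For the first slave of an unmarked @dest_mnt whose peer group has no
      master, this returns NULL with *@n set to @dest_mnt, matching the base
      case described above. Once @dest_mnt is marked the walk stops one level
      earlier and returns @dest_mnt itself.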
      
      Now we need to figure out the master for the copy of the source mount
      tree we're about to create and mount on the first slave of @dest_mnt's
      peer group:
      
      do {
              struct mount *parent = last_source->mnt_parent;
              if (last_source == first_source)
                      break;
              done = parent->mnt_master == p;
              if (done && peers(n, parent))
                      break;
              last_source = last_source->mnt_master;
      } while (!done);
      
      We know that @last_source->mnt_parent points to @last_dest and
      @last_dest is the last peer in @dest_mnt's peer group we propagated to
      in the peer loop in propagate_mnt().
      
      Consequently, @last_source is the last copy we created and mounted on
      that last peer in @dest_mnt's peer group. So @last_source is the master
      we want to pick.
      
      We know that @last_source->mnt_parent->mnt_master points to
      @last_dest->mnt_master. We also know that @last_dest->mnt_master is
      either NULL or points to a master outside of the destination propagation
      tree and so does @p. Hence:
      
      done = parent->mnt_master == p;
      
      is trivially true in the base condition.
      
      We also know that for the first slave mount of @dest_mnt's peer group
      @last_dest either points to @dest_mnt itself because it was
      initialized to:
      
      last_dest = dest_mnt;
      
      at the beginning of propagate_mnt() or it will point to a peer of
      @dest_mnt in its peer group. In both cases it is guaranteed that on the
      first iteration @n and @parent are peers (Please note the check for
      peers here as that's important.):
      
      if (done && peers(n, parent))
              break;
      
      So, as we expected, we select @last_source, which refers to the last
      copy of the source mount tree we mounted on the last peer in @dest_mnt's
      peer group, as the master of the first slave in @dest_mnt's peer group.
      The rest is taken care of by clone_mnt(last_source, ...). We'll skip
      over that part, otherwise this becomes a blogpost.
      
      At the end of propagate_one() we now mark @m->mnt_master as the first
      master in the destination propagation tree that is distinct from
      @dest_mnt->mnt_master. IOW, we mark @dest_mnt itself as a master.
      
      By marking @dest_mnt or one of its peers we are able to easily find it
      again when we later look up masters for other copies of the source mount
      tree that we mount on slaves @M of @dest_mnt's peer group. This, in
      turn, allows us to find again the master we selected for the copies of
      the source mount tree we mounted on masters in the destination
      propagation tree.
      
      The important part is to realize that the code makes use of the fact
      that the last copy of the source mount tree stashed in @last_source was
      mounted on top of the previous destination propagation node @last_dest.
      What this means is that @last_source allows us to walk the destination
      propagation hierarchy the same way each destination propagation node @M
      does.
      
      If we take @last_source, which is the copy of @source_mnt we have
      mounted on @last_dest in the previous iteration of propagate_one(), then
      we know @last_source->mnt_parent points to @last_dest but we also know
      that as we walk through the destination propagation tree
      @last_source->mnt_master will point to an earlier copy of the source
      mount tree we mounted on an earlier destination propagation node @M.
      
      IOW, @last_source->mnt_parent will be our hook into the destination
      propagation tree and each consecutive @last_source->mnt_master will lead
      us to an earlier propagation node @M via
      @last_source->mnt_master->mnt_parent.
      
      Hence, by walking up @last_source->mnt_master, each of which is mounted
      on a node that is a master @M in the destination propagation tree we can
      also walk up the destination propagation hierarchy.
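
      The mirroring can be made concrete with a toy model. This is a sketch
      with hypothetical names (struct toy_mount, toy_dest_path()): it follows
      ->mnt_master on the copies and collects the destination node each copy
      is mounted on via ->mnt_parent.

```c
#include <assert.h>
#include <stddef.h>

struct toy_mount {
	struct toy_mount *mnt_master; /* earlier copy this copy is a slave of */
	struct toy_mount *mnt_parent; /* destination node the copy is mounted on */
};

/*
 * Follow ->mnt_master from @last_source and record, for every copy on the
 * way, the destination node it is mounted on. Returns the number of nodes
 * stored in @out (at most @max).
 */
size_t toy_dest_path(const struct toy_mount *last_source,
		     const struct toy_mount **out, size_t max)
{
	size_t nr = 0;

	assert(out != NULL);
	while (last_source && nr < max) {
		out[nr++] = last_source->mnt_parent;
		last_source = last_source->mnt_master;
	}
	return nr;
}
```

      If the destination nodes form a chain in which each is a slave of the
      previous one and a copy is mounted on each, the collected nodes form
      that same chain from the bottom up: each entry's ->mnt_master is the
      next entry.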
      
      So, for each new destination propagation node @M we use the previous
      copy of @last_source and the fact it's mounted on the previous
      propagation node @last_dest via @last_source->mnt_master->mnt_parent to
      determine what the master of the new copy of @last_source needs to be.
      
      The goal is to find the _closest_ master that the new copy of the source
      mount tree we are about to create and mount on a slave @M in the
      destination propagation tree needs to pick. IOW, we want to find a
      suitable master in the propagation group.
      
      As the propagation structure of the source mount propagation tree we
      create mirrors the propagation structure of the destination propagation
      tree we can find @M's closest master - i.e., a marked master - which is
      a peer in the closest peer group that @M receives propagation from. We
      store that closest master of @M in @p as before and record the slave to
      that master in @n.
      
      We then search for this master @p via @last_source by walking up the
      master hierarchy starting from the last copy of the source mount tree
      stored in @last_source that we created and mounted on the previous
      destination propagation node @M.
      
      We will try to find the master by walking @last_source->mnt_master and
      by comparing @last_source->mnt_master->mnt_parent->mnt_master to @p. If
      we find @p then we can figure out what earlier copy of the source mount
      tree needs to be the master for the new copy of the source mount tree
      we're about to create and mount at the current destination propagation
      node @M.
      
      If @last_source->mnt_master->mnt_parent and @n are peers then we know
      that the closest master they receive propagation from is
      @last_source->mnt_master->mnt_parent->mnt_master. If not then the
      closest immediate peer group that they receive propagation from must be
      one level higher up.
      
      This builds on the earlier clarification at the beginning that all peers
      in a peer group which are slaves of other peer groups all point to the
      same ->mnt_master, i.e., appear on the same ->mnt_slave_list, of the
      closest peer group that they receive propagation from.
      
      However, terminating the walk has corner cases.
      
      If the closest marked master for a given destination node @M cannot be
      found by walking up the master hierarchy via @last_source->mnt_master
      then we need to terminate the walk when we encounter @source_mnt again.
      
      This isn't an arbitrary termination. It simply means that the new copy
      of the source mount tree we're about to create has a copy of the source
      mount tree we created and mounted on a peer in @dest_mnt's peer group as
      its master. IOW, @source_mnt is the peer in the closest peer group that
      the new copy of the source mount tree receives propagation from.
      
      We absolutely have to stop at @source_mnt because
      @last_source->mnt_master either points outside the propagation
      hierarchy we're dealing with or it is NULL because @source_mnt isn't a
      shared-slave.
      
      So continuing the walk past @source_mnt would cause a NULL dereference
      via @last_source->mnt_master->mnt_parent. And so we have to stop the
      walk when we encounter @source_mnt again.
      
      One scenario where this can happen is when we first handled a series of
      slaves of @dest_mnt's peer group and then encounter peers in a new peer
      group that is a slave to @dest_mnt's peer group. We handle them and then
      we encounter another slave mount to @dest_mnt that is a pure slave to
      @dest_mnt's peer group. That pure slave will have a peer in @dest_mnt's
      peer group as its master. Consequently, the new copy of the source mount
      tree will need to have @source_mnt as its master. So we walk the
      propagation hierarchy all the way up to @source_mnt based on
      @last_source->mnt_master.
      
      So terminate on @source_mnt, easy peasy. Except that the check misses
      something that the rest of the algorithm already handles.
      
      If @dest_mnt has peers in its peer group the peer loop in
      propagate_mnt():
      
      for (n = next_peer(dest_mnt); n != dest_mnt; n = next_peer(n)) {
              ret = propagate_one(n);
              if (ret)
                      goto out;
      }
      
      will consecutively update @last_source with each previous copy of the
      source mount tree we created and mounted at the previous peer in
      @dest_mnt's peer group. So after that loop terminates @last_source will
      point to whatever copy of the source mount tree was created and mounted
      on the last peer in @dest_mnt's peer group.
      
      Furthermore, if there is even a single additional peer in @dest_mnt's
      peer group then @last_source will __not__ point to @source_mnt anymore
      because, as we mentioned above, @dest_mnt isn't even handled in this
      loop but directly in attach_recursive_mnt(). So it can't even
      accidentally come last in that peer loop.
      
      So the first time we handle a slave mount @M of @dest_mnt's peer group
      the copy of the source mount tree we create will make the __last copy of
      the source mount tree we created and mounted on the last peer in
      @dest_mnt's peer group the master of the new copy of the source mount
      tree we create and mount on the first slave of @dest_mnt's peer group__.
      
      But this means that the termination condition that checks for
      @source_mnt is wrong. The @source_mnt cannot be found anymore by
      propagate_one(). Instead it will find the last copy of the source mount
      tree we created and mounted for the last peer of @dest_mnt's peer group
      again. And that is a peer of @source_mnt not @source_mnt itself.
      
      IOW, we fail to terminate the loop correctly and ultimately dereference
      @last_source->mnt_master->mnt_parent. When @source_mnt's peer group
      isn't slave to another peer group then @last_source->mnt_master is NULL
      causing the splat below.
      
      For example, assume @dest_mnt is a pure shared mount and has three peers
      in its peer group:
      
      ===================================================================================
                                               mount-id   mount-parent-id   peer-group-id
      ===================================================================================
      (@dest_mnt) mnt_master[216]              309        297               shared:216
          \
           (@source_mnt) mnt_master[218]:      609        609               shared:218
      
      (1) mnt_master[216]:                     607        605               shared:216
          \
           (P1) mnt_master[218]:               624        607               shared:218
      
      (2) mnt_master[216]:                     576        574               shared:216
          \
           (P2) mnt_master[218]:               625        576               shared:218
      
      (3) mnt_master[216]:                     545        543               shared:216
          \
           (P3) mnt_master[218]:               626        545               shared:218
      
      After this sequence has been processed @last_source will point to (P3),
      the copy generated for the third peer in @dest_mnt's peer group we
      handled. So the copy of the source mount tree (P4) we create and mount
      on the first slave of @dest_mnt's peer group:
      
      ===================================================================================
                                               mount-id   mount-parent-id   peer-group-id
      ===================================================================================
          mnt_master[216]                      309        297               shared:216
         /
        /
      (S0) mnt_slave                           483        481               master:216
        \
         \    (P3) mnt_master[218]             626        545               shared:218
          \  /
           \/
          (P4) mnt_slave                       627        483               master:218
      
      will pick the last copy of the source mount tree (P3) as its master, not
      @source_mnt.
      
      When walking the propagation hierarchy via @last_source's master
      hierarchy we encounter (P3) but never @source_mnt itself.
      
      We can fix this in multiple ways:
      
      (1) By setting @last_source to @source_mnt after we processed the peers
          in @dest_mnt's peer group right after the peer loop in
          propagate_mnt().
      
      (2) By changing the termination condition that relies on finding exactly
          @source_mnt to finding a peer of @source_mnt.
      
      (3) By only moving @last_source when we actually venture into a new peer
          group or some clever variant thereof.
      
      The first two options are minimally invasive and what we want as a fix.
      The third option is more intrusive but something we'd like to explore in
      the near future.
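
      To make the difference between the old termination check and option
      (2) tangible, here is a user-space sketch of the master lookup loop.
      Everything in it is an assumption made for illustration: struct
      toy_mount, toy_peers() and toy_pick_master() are hypothetical, the
      NULL return replaces the kernel's NULL dereference, and the caller is
      expected to hand-build the state rather than derive it from a real
      propagation walk.

```c
#include <assert.h>
#include <stddef.h>

struct toy_mount {
	int mnt_group_id;             /* 0 means "not shared" */
	struct toy_mount *mnt_master;
	struct toy_mount *mnt_parent;
};

int toy_peers(const struct toy_mount *a, const struct toy_mount *b)
{
	return a->mnt_group_id != 0 && a->mnt_group_id == b->mnt_group_id;
}

/*
 * Same shape as the do/while loop quoted earlier. @use_peer_check selects
 * between the old termination test (exactly @first_source) and option (2)
 * (any peer of @first_source). Returns the copy picked as master, or NULL
 * where the toy walk runs off the top of the hierarchy; that is the point
 * at which the kernel would dereference a NULL ->mnt_master.
 */
struct toy_mount *toy_pick_master(struct toy_mount *last_source,
				  struct toy_mount *first_source,
				  struct toy_mount *n, struct toy_mount *p,
				  int use_peer_check)
{
	int done;

	do {
		struct toy_mount *parent;

		if (!last_source)
			return NULL; /* toy guard, not present in the kernel */
		parent = last_source->mnt_parent;
		if (use_peer_check ? toy_peers(last_source, first_source)
				   : last_source == first_source)
			break;
		assert(parent != NULL);
		done = parent->mnt_master == p;
		if (done && toy_peers(n, parent))
			break;
		last_source = last_source->mnt_master;
	} while (!done);
	return last_source;
}
```

      Consider a hand-built state where the walk starts at a slave copy
      whose master is a peer copy of @source_mnt, and where no ->mnt_parent
      on the way has @p as its master. The exact-match check then walks past
      the peer copy and returns NULL, while the peer check stops at the peer
      copy and returns it. In the base case discussed earlier both variants
      pick the same copy.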
      
      This passes all LTP tests and specifically the mount propagation
      testsuite part of it. It also holds up against all known reproducers of
      this issue.
      
      Final words.
      First, this is a clever but __worryingly__ underdocumented algorithm.
      There isn't a single detailed comment to be found in next_group(),
      propagate_one() or anywhere else in that file for that matter. This has
      been a giant pain to understand and work through and a bug like this is
      insanely difficult to fix without a detailed understanding of what's
      happening. Let's not talk about the amount of time that was sunk into
      fixing this.
      
      Second, all the cool kids with access to
      unshare --mount --user --map-root --propagation=unchanged
      are going to have a lot of fun. IOW, this is triggerable by unprivileged
      users while namespace_lock() is held.
      
      [  115.848393] BUG: kernel NULL pointer dereference, address: 0000000000000010
      [  115.848967] #PF: supervisor read access in kernel mode
      [  115.849386] #PF: error_code(0x0000) - not-present page
      [  115.849803] PGD 0 P4D 0
      [  115.850012] Oops: 0000 [#1] PREEMPT SMP PTI
      [  115.850354] CPU: 0 PID: 15591 Comm: mount Not tainted 6.1.0-rc7 #3
      [  115.850851] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
      VirtualBox 12/01/2006
      [  115.851510] RIP: 0010:propagate_one.part.0+0x7f/0x1a0
      [  115.851924] Code: 75 eb 4c 8b 05 c2 25 37 02 4c 89 ca 48 8b 4a 10
      49 39 d0 74 1e 48 3b 81 e0 00 00 00 74 26 48 8b 92 e0 00 00 00 be 01
      00 00 00 <48> 8b 4a 10 49 39 d0 75 e2 40 84 f6 74 38 4c 89 05 84 25 37
      02 4d
      [  115.853441] RSP: 0018:ffffb8d5443d7d50 EFLAGS: 00010282
      [  115.853865] RAX: ffff8e4d87c41c80 RBX: ffff8e4d88ded780 RCX: ffff8e4da4333a00
      [  115.854458] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8e4d88ded780
      [  115.855044] RBP: ffff8e4d88ded780 R08: ffff8e4da4338000 R09: ffff8e4da43388c0
      [  115.855693] R10: 0000000000000002 R11: ffffb8d540158000 R12: ffffb8d5443d7da8
      [  115.856304] R13: ffff8e4d88ded780 R14: 0000000000000000 R15: 0000000000000000
      [  115.856859] FS:  00007f92c90c9800(0000) GS:ffff8e4dfdc00000(0000)
      knlGS:0000000000000000
      [  115.857531] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  115.858006] CR2: 0000000000000010 CR3: 0000000022f4c002 CR4: 00000000000706f0
      [  115.858598] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  115.859393] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  115.860099] Call Trace:
      [  115.860358]  <TASK>
      [  115.860535]  propagate_mnt+0x14d/0x190
      [  115.860848]  attach_recursive_mnt+0x274/0x3e0
      [  115.861212]  path_mount+0x8c8/0xa60
      [  115.861503]  __x64_sys_mount+0xf6/0x140
      [  115.861819]  do_syscall_64+0x5b/0x80
      [  115.862117]  ? do_faccessat+0x123/0x250
      [  115.862435]  ? syscall_exit_to_user_mode+0x17/0x40
      [  115.862826]  ? do_syscall_64+0x67/0x80
      [  115.863133]  ? syscall_exit_to_user_mode+0x17/0x40
      [  115.863527]  ? do_syscall_64+0x67/0x80
      [  115.863835]  ? do_syscall_64+0x67/0x80
      [  115.864144]  ? do_syscall_64+0x67/0x80
      [  115.864452]  ? exc_page_fault+0x70/0x170
      [  115.864775]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      [  115.865187] RIP: 0033:0x7f92c92b0ebe
      [  115.865480] Code: 48 8b 0d 75 4f 0c 00 f7 d8 64 89 01 48 83 c8 ff
      c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00
      00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 42 4f 0c 00 f7 d8 64 89
      01 48
      [  115.866984] RSP: 002b:00007fff000aa728 EFLAGS: 00000246 ORIG_RAX:
      00000000000000a5
      [  115.867607] RAX: ffffffffffffffda RBX: 000055a77888d6b0 RCX: 00007f92c92b0ebe
      [  115.868240] RDX: 000055a77888d8e0 RSI: 000055a77888e6e0 RDI: 000055a77888e620
      [  115.868823] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
      [  115.869403] R10: 0000000000001000 R11: 0000000000000246 R12: 000055a77888e620
      [  115.869994] R13: 000055a77888d8e0 R14: 00000000ffffffff R15: 00007f92c93e4076
      [  115.870581]  </TASK>
      [  115.870763] Modules linked in: nft_fib_inet nft_fib_ipv4
      nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
      nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6
      nf_defrag_ipv4 ip_set rfkill nf_tables nfnetlink qrtr snd_intel8x0
      sunrpc snd_ac97_codec ac97_bus snd_pcm snd_timer intel_rapl_msr
      intel_rapl_common snd vboxguest intel_powerclamp video rapl joydev
      soundcore i2c_piix4 wmi fuse zram xfs vmwgfx crct10dif_pclmul
      crc32_pclmul crc32c_intel polyval_clmulni polyval_generic
      drm_ttm_helper ttm e1000 ghash_clmulni_intel serio_raw ata_generic
      pata_acpi scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath
      [  115.875288] CR2: 0000000000000010
      [  115.875641] ---[ end trace 0000000000000000 ]---
      [  115.876135] RIP: 0010:propagate_one.part.0+0x7f/0x1a0
      [  115.876551] Code: 75 eb 4c 8b 05 c2 25 37 02 4c 89 ca 48 8b 4a 10
      49 39 d0 74 1e 48 3b 81 e0 00 00 00 74 26 48 8b 92 e0 00 00 00 be 01
      00 00 00 <48> 8b 4a 10 49 39 d0 75 e2 40 84 f6 74 38 4c 89 05 84 25 37
      02 4d
      [  115.878086] RSP: 0018:ffffb8d5443d7d50 EFLAGS: 00010282
      [  115.878511] RAX: ffff8e4d87c41c80 RBX: ffff8e4d88ded780 RCX: ffff8e4da4333a00
      [  115.879128] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8e4d88ded780
      [  115.879715] RBP: ffff8e4d88ded780 R08: ffff8e4da4338000 R09: ffff8e4da43388c0
      [  115.880359] R10: 0000000000000002 R11: ffffb8d540158000 R12: ffffb8d5443d7da8
      [  115.880962] R13: ffff8e4d88ded780 R14: 0000000000000000 R15: 0000000000000000
      [  115.881548] FS:  00007f92c90c9800(0000) GS:ffff8e4dfdc00000(0000)
      knlGS:0000000000000000
      [  115.882234] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  115.882713] CR2: 0000000000000010 CR3: 0000000022f4c002 CR4: 00000000000706f0
      [  115.883314] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  115.883966] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fixes: f2ebb3a9 ("smarter propagate_mnt()")
      Fixes: 5ec0811d ("propogate_mnt: Handle the first propogated copy being a slave")
      Cc: <stable@vger.kernel.org>
      Reported-by: Ditang Chen <ditang.c@gmail.com>
      Signed-off-by: Seth Forshee (Digital Ocean) <sforshee@kernel.org>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
      ---
      If there are no big objections I'll get this to Linus rather sooner than later.
      11933cf1
  2. 11 Dec, 2022 3 commits
  3. 10 Dec, 2022 10 commits
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm · 296a7b7e
      Linus Torvalds authored
      Pull ARM fix from Russell King:
       "One further ARM fix for 6.1 from Wang Kefeng, fixing up the handling
        for kfence faults"
      
      * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
        ARM: 9278/1: kfence: only handle translation faults
      296a7b7e
    • Tejun Heo's avatar
      memcg: fix possible use-after-free in memcg_write_event_control() · 4a7ba45b
      Tejun Heo authored
      memcg_write_event_control() accesses the dentry->d_name of the specified
      control fd to route the write call.  As a cgroup interface file can't be
      renamed, it's safe to access d_name as long as the specified file is a
      regular cgroup file.  Also, as these cgroup interface files can't be
      removed before the directory, it's safe to access the parent too.
      
      Prior to 347c4a87 ("memcg: remove cgroup_event->cft"), there was a
      call to __file_cft() which verified that the specified file is a regular
      cgroupfs file before further accesses.  The cftype pointer returned from
      __file_cft() was no longer necessary and the commit inadvertently dropped
      the file type check with it, allowing any file to slip through.  With the
      invariants broken, the d_name and parent accesses can now race against
      renames and removals of arbitrary files and cause use-after-frees.
      
      Fix the bug by resurrecting the file type check in __file_cft().  Now that
      cgroupfs is implemented through kernfs, checking the file operations needs
      to go through a layer of indirection.  Instead, let's check the superblock
      and dentry type.
      
      Link: https://lkml.kernel.org/r/Y5FRm/cfcKPGzWwl@slm.duckdns.org
      Fixes: 347c4a87 ("memcg: remove cgroup_event->cft")
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Jann Horn <jannh@google.com>
      Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: <stable@vger.kernel.org>	[3.14+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      4a7ba45b
    • Muchun Song's avatar
      MAINTAINERS: update Muchun Song's email · a501788a
      Muchun Song authored
      I'm moving to the @linux.dev account.  Map my old addresses and update it
      to my new address.
      
      Link: https://lkml.kernel.org/r/20221208115548.85244-1-songmuchun@bytedance.com
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      a501788a
    • John Starks's avatar
      mm/gup: fix gup_pud_range() for dax · fcd0ccd8
      John Starks authored
      For dax pud, pud_huge() returns true on x86. So the function works as long
      as hugetlb is configured. However, dax doesn't depend on hugetlb.
      Commit 414fd080 ("mm/gup: fix gup_pmd_range() for dax") fixed
      devmap-backed huge PMDs, but missed devmap-backed huge PUDs. Fix this as
      well.
      
      This fixes the below kernel panic:
      
      general protection fault, probably for non-canonical address 0x69e7c000cc478: 0000 [#1] SMP
      	< snip >
      Call Trace:
      <TASK>
      get_user_pages_fast+0x1f/0x40
      iov_iter_get_pages+0xc6/0x3b0
      ? mempool_alloc+0x5d/0x170
      bio_iov_iter_get_pages+0x82/0x4e0
      ? bvec_alloc+0x91/0xc0
      ? bio_alloc_bioset+0x19a/0x2a0
      blkdev_direct_IO+0x282/0x480
      ? __io_complete_rw_common+0xc0/0xc0
      ? filemap_range_has_page+0x82/0xc0
      generic_file_direct_write+0x9d/0x1a0
      ? inode_update_time+0x24/0x30
      __generic_file_write_iter+0xbd/0x1e0
      blkdev_write_iter+0xb4/0x150
      ? io_import_iovec+0x8d/0x340
      io_write+0xf9/0x300
      io_issue_sqe+0x3c3/0x1d30
      ? sysvec_reschedule_ipi+0x6c/0x80
      __io_queue_sqe+0x33/0x240
      ? fget+0x76/0xa0
      io_submit_sqes+0xe6a/0x18d0
      ? __fget_light+0xd1/0x100
      __x64_sys_io_uring_enter+0x199/0x880
      ? __context_tracking_enter+0x1f/0x70
      ? irqentry_exit_to_user_mode+0x24/0x30
      ? irqentry_exit+0x1d/0x30
      ? __context_tracking_exit+0xe/0x70
      do_syscall_64+0x3b/0x90
      entry_SYSCALL_64_after_hwframe+0x61/0xcb
      RIP: 0033:0x7fc97c11a7be
      	< snip >
      </TASK>
      ---[ end trace 48b2e0e67debcaeb ]---
      RIP: 0010:internal_get_user_pages_fast+0x340/0x990
      	< snip >
      Kernel panic - not syncing: Fatal exception
      Kernel Offset: disabled
      
      Link: https://lkml.kernel.org/r/1670392853-28252-1-git-send-email-ssengar@linux.microsoft.com
      Fixes: 414fd080 ("mm/gup: fix gup_pmd_range() for dax")
      Signed-off-by: John Starks <jostarks@microsoft.com>
      Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      fcd0ccd8
    • Liam Howlett's avatar
      mmap: fix do_brk_flags() modifying obviously incorrect VMAs · 6c28ca64
      Liam Howlett authored
      Add more sanity checks to the VMA that do_brk_flags() will expand.  Ensure
      the VMA matches basic merge requirements within the function before
      calling can_vma_merge_after().
      
      Drop the duplicate checks from vm_brk_flags() since they will be enforced
      later.
      
      The old code would expand file VMAs on brk(), which is functionally
      wrong and also dangerous in terms of locking because the brk() path
      isn't designed for file VMAs and therefore doesn't lock the file
      mapping.  Checking can_vma_merge_after() ensures that new anonymous
      VMAs can't be merged into file VMAs.
      
      See https://lore.kernel.org/linux-mm/CAG48ez1tJZTOjS_FjRZhvtDA-STFmdw8PEizPDwMGFd_ui0Nrw@mail.gmail.com/
      
      Link: https://lkml.kernel.org/r/20221205192304.1957418-1-Liam.Howlett@oracle.com
      Fixes: 2e7ce7d3 ("mm/mmap: change do_brk_flags() to expand existing VMA and add do_brk_munmap()")
      Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Suggested-by: Jann Horn <jannh@google.com>
      Cc: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      6c28ca64
    • David Hildenbrand's avatar
      mm/swap: fix SWP_PFN_BITS with CONFIG_PHYS_ADDR_T_64BIT on 32bit · 630dc25e
      David Hildenbrand authored
      We use "unsigned long" to store a PFN in the kernel and phys_addr_t to
      store a physical address.
      
      On a 64bit system, both are 64bit wide.  However, on a 32bit system, the
      latter might be 64bit wide.  This is, for example, the case on x86 with
      PAE: phys_addr_t and PTEs are 64bit wide, while "unsigned long" only spans
      32bit.
      
      The current definition of SWP_PFN_BITS without MAX_PHYSMEM_BITS misses
      that case, and assumes that the maximum PFN is limited by a 32bit
      phys_addr_t.  This implies that SWP_PFN_BITS will currently only be able
      to cover 4 GiB - 1 on any 32bit system with 4k page size, which is wrong.
      
      Let's rely on the number of bits in phys_addr_t instead, but make sure to
      not exceed the maximum swap offset, to not make the BUILD_BUG_ON() in
      is_pfn_swap_entry() unhappy.  Note that swp_entry_t is effectively an
      unsigned long and the maximum swap offset shares that value with the swap
      type.
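
      The arithmetic behind this can be sketched in plain userspace C. This
      is only an illustration, not the kernel definition: the helper names
      are hypothetical, the "old" rule is modelled as the width of "unsigned
      long" minus the page shift, and the number of bits left for the swap
      offset (26 in the usage note) is an assumed value.

```c
/* Hypothetical model of the old limit: PFN bits derived from the
 * width of "unsigned long" (an assumption based on the text above). */
static int pfn_bits_old(int bits_per_long, int page_shift)
{
    return bits_per_long - page_shift;
}

/* Hypothetical model of the fix: derive the PFN bits from the width of
 * phys_addr_t, but never exceed the bits available for the swap offset. */
static int pfn_bits_new(int phys_addr_bits, int page_shift, int swp_offset_bits)
{
    int wanted = phys_addr_bits - page_shift;

    return wanted < swp_offset_bits ? wanted : swp_offset_bits;
}

/* Bytes of physical memory addressable with the given number of PFN bits. */
static unsigned long long covered_bytes(int pfn_bits, int page_shift)
{
    return 1ULL << (pfn_bits + page_shift);
}

/* Returns 1 if the PFN survives masking to pfn_bits, 0 if bits are lost. */
static int pfn_fits(unsigned long long pfn, int pfn_bits)
{
    return (pfn >> pfn_bits) == 0;
}
```

      With a 32bit "unsigned long" and 4k pages, the old rule yields 20 PFN
      bits, which covers exactly 4 GiB, so the highest PFN of an 8 GiB machine
      no longer fits. With a 64bit phys_addr_t and an assumed 26 offset bits,
      the capped value of 26 bits is enough for that PFN.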
      
      For example, on an 8 GiB x86 PAE system with a kernel config based on
      Debian 11.5 (-> CONFIG_FLATMEM=y, CONFIG_X86_PAE=y), we will currently
      fail removing migration entries (remove_migration_ptes()), because
      mm/page_vma_mapped.c:check_pte() will fail to identify a PFN match as
      swp_offset_pfn() wrongly masks off PFN bits.  For example,
      split_huge_page_to_list()->...->remap_page() will leave migration entries
      in place and continue to unlock the page.
      
      Later, when we stumble over these migration entries (e.g., via
      /proc/self/pagemap), pfn_swap_entry_to_page() will BUG_ON() because these
      migration entries shouldn't exist anymore and the page was unlocked.
      
      [   33.067591] kernel BUG at include/linux/swapops.h:497!
      [   33.067597] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
      [   33.067602] CPU: 3 PID: 742 Comm: cow Tainted: G            E      6.1.0-rc8+ #16
      [   33.067605] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
      [   33.067606] EIP: pagemap_pmd_range+0x644/0x650
      [   33.067612] Code: 00 00 00 00 66 90 89 ce b9 00 f0 ff ff e9 ff fb ff ff 89 d8 31 db e8 48 c6 52 00 e9 23 fb ff ff e8 61 83 56 00 e9 b6 fe ff ff <0f> 0b bf 00 f0 ff ff e9 38 fa ff ff 3e 8d 74 26 00 55 89 e5 57 31
      [   33.067615] EAX: ee394000 EBX: 00000002 ECX: ee394000 EDX: 00000000
      [   33.067617] ESI: c1b0ded4 EDI: 00024a00 EBP: c1b0ddb4 ESP: c1b0dd68
      [   33.067619] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010246
      [   33.067624] CR0: 80050033 CR2: b7a00000 CR3: 01bbbd20 CR4: 00350ef0
      [   33.067625] Call Trace:
      [   33.067628]  ? madvise_free_pte_range+0x720/0x720
      [   33.067632]  ? smaps_pte_range+0x4b0/0x4b0
      [   33.067634]  walk_pgd_range+0x325/0x720
      [   33.067637]  ? mt_find+0x1d6/0x3a0
      [   33.067641]  ? mt_find+0x1d6/0x3a0
      [   33.067643]  __walk_page_range+0x164/0x170
      [   33.067646]  walk_page_range+0xf9/0x170
      [   33.067648]  ? __kmem_cache_alloc_node+0x2a8/0x340
      [   33.067653]  pagemap_read+0x124/0x280
      [   33.067658]  ? default_llseek+0x101/0x160
      [   33.067662]  ? smaps_account+0x1d0/0x1d0
      [   33.067664]  vfs_read+0x90/0x290
      [   33.067667]  ? do_madvise.part.0+0x24b/0x390
      [   33.067669]  ? debug_smp_processor_id+0x12/0x20
      [   33.067673]  ksys_pread64+0x58/0x90
      [   33.067675]  __ia32_sys_ia32_pread64+0x1b/0x20
      [   33.067680]  __do_fast_syscall_32+0x4c/0xc0
      [   33.067683]  do_fast_syscall_32+0x29/0x60
      [   33.067686]  do_SYSENTER_32+0x15/0x20
      [   33.067689]  entry_SYSENTER_32+0x98/0xf1
      
      Decrease the indentation level of SWP_PFN_BITS and SWP_PFN_MASK to keep it
      readable and consistent.
      
      [david@redhat.com: rely on sizeof(phys_addr_t) and min_t() instead]
        Link: https://lkml.kernel.org/r/20221206105737.69478-1-david@redhat.com
      [david@redhat.com: use "int" for comparison, as we're only comparing numbers < 64]
        Link: https://lkml.kernel.org/r/1f157500-2676-7cef-a84e-9224ed64e540@redhat.com
      Link: https://lkml.kernel.org/r/20221205150857.167583-1-david@redhat.com
      Fixes: 0d206b5d ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      630dc25e
    • Hugh Dickins's avatar
      tmpfs: fix data loss from failed fallocate · 44bcabd7
      Hugh Dickins authored
      Fix tmpfs data loss when the fallocate system call is interrupted by a
      signal, or fails for some other reason.  The partial folio handling in
      shmem_undo_range() forgot to consider this unfalloc case, and was liable
      to erase or truncate out data which had already been committed earlier.
      
      It turns out that none of the partial folio handling there is appropriate
      for the unfalloc case, which just wants to proceed to removal of whole
      folios: which find_get_entries() provides, even when partially covered.
      
      Original patch by Rui Wang.
      
      Link: https://lore.kernel.org/linux-mm/33b85d82.7764.1842e9ab207.Coremail.chenguoqic@163.com/
      Link: https://lkml.kernel.org/r/a5dac112-cf4b-7af-a33-f386e347fd38@google.com
      Fixes: b9a8a419 ("truncate,shmem: Handle truncates that split large folios")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reported-by: Guoqi Chen <chenguoqic@163.com>
        Link: https://lore.kernel.org/all/20221101032248.819360-1-kernel@hev.cc/
      Cc: Rui Wang <kernel@hev.cc>
      Cc: Huacai Chen <chenhuacai@loongson.cn>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
      Cc: <stable@vger.kernel.org>	[5.17+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      44bcabd7
    • Michal Hocko's avatar
      kselftests: cgroup: update kmem test precision tolerance · de16d6e4
      Michal Hocko authored
      1813e51e ("memcg: increase MEMCG_CHARGE_BATCH to 64") has changed
      the batch size while this test case has been left behind. This has led
      to a test failure reported by test bot:
      not ok 2 selftests: cgroup: test_kmem # exit=1
      
      Update the tolerance for the pcp charges to reflect the
      MEMCG_CHARGE_BATCH change to fix this.
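
      As a rough illustration of why the tolerance scales with the batch
      size, the sketch below assumes that each CPU may hold up to one batch
      of pre-charged pages that the counters being compared do not yet
      agree on. The helper names and the example values in the usage note
      are hypothetical and not taken from the selftest itself.

```c
/* Hypothetical helper: worst-case number of bytes that may sit in
 * per-CPU charge caches, assuming one full batch of pages per CPU. */
static long pcp_charge_tolerance(long page_size, long charge_batch, long nr_cpus)
{
    return page_size * charge_batch * nr_cpus;
}

/* Returns 1 if two counters differ by no more than the tolerance. */
static int within_tolerance(long a, long b, long tolerance)
{
    long diff = a > b ? a - b : b - a;

    return diff <= tolerance;
}
```

      For example, with 4096-byte pages and 8 CPUs, a batch of 32 pages gives
      a tolerance of 1 MiB while a batch of 64 pages gives 2 MiB, so a check
      sized for the smaller batch can fail once the batch is doubled.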
      
      [akpm@linux-foundation.org: update comments, per Roman]
      Link: https://lkml.kernel.org/r/Y4m8Unt6FhWKC6IH@dhcp22.suse.cz
      Fixes: 1813e51e ("memcg: increase MEMCG_CHARGE_BATCH to 64")
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: kernel test robot <yujie.liu@intel.com>
        Link: https://lore.kernel.org/oe-lkp/202212010958.c1053bd3-yujie.liu@intel.com
      Acked-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
      Tested-by: Yujie Liu <yujie.liu@intel.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Michal Koutný" <mkoutny@suse.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      de16d6e4
    • Jason A. Donenfeld's avatar
      mm: do not BUG_ON missing brk mapping, because userspace can unmap it · f5ad5083
      Jason A. Donenfeld authored
      The following program will trigger the BUG_ON that this patch removes,
      because the user can munmap() mm->brk:
      
        #include <sys/syscall.h>
        #include <sys/mman.h>
        #include <assert.h>
        #include <unistd.h>
      
        static void *brk_now(void)
        {
          return (void *)syscall(SYS_brk, 0);
        }
      
        static void brk_set(void *b)
        {
          assert(syscall(SYS_brk, b) != -1);
        }
      
        int main(int argc, char *argv[])
        {
          void *b = brk_now();
          brk_set(b + 4096);
          assert(munmap(b - 4096, 4096 * 2) == 0);
          brk_set(b);
          return 0;
        }
      
      Compile that with musl, since glibc actually uses brk(), and then
      execute it, and it'll hit this splat:
      
        kernel BUG at mm/mmap.c:229!
        invalid opcode: 0000 [#1] PREEMPT SMP
        CPU: 12 PID: 1379 Comm: a.out Tainted: G S   U             6.1.0-rc7+ #419
        RIP: 0010:__do_sys_brk+0x2fc/0x340
        Code: 00 00 4c 89 ef e8 04 d3 fe ff eb 9a be 01 00 00 00 4c 89 ff e8 35 e0 fe ff e9 6e ff ff ff 4d 89 a7 20>
        RSP: 0018:ffff888140bc7eb0 EFLAGS: 00010246
        RAX: 0000000000000000 RBX: 00000000007e7000 RCX: ffff8881020fe000
        RDX: ffff8881020fe001 RSI: ffff8881955c9b00 RDI: ffff8881955c9b08
        RBP: 0000000000000000 R08: ffff8881955c9b00 R09: 00007ffc77844000
        R10: 0000000000000000 R11: 0000000000000001 R12: 00000000007e8000
        R13: 00000000007e8000 R14: 00000000007e7000 R15: ffff8881020fe000
        FS:  0000000000604298(0000) GS:ffff88901f700000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000603fe0 CR3: 000000015ba9a005 CR4: 0000000000770ee0
        PKRU: 55555554
        Call Trace:
         <TASK>
         do_syscall_64+0x2b/0x50
         entry_SYSCALL_64_after_hwframe+0x46/0xb0
        RIP: 0033:0x400678
        Code: 10 4c 8d 41 08 4c 89 44 24 10 4c 8b 01 8b 4c 24 08 83 f9 2f 77 0a 4c 8d 4c 24 20 4c 01 c9 eb 05 48 8b>
        RSP: 002b:00007ffc77863890 EFLAGS: 00000212 ORIG_RAX: 000000000000000c
        RAX: ffffffffffffffda RBX: 000000000040031b RCX: 0000000000400678
        RDX: 00000000004006a1 RSI: 00000000007e6000 RDI: 00000000007e7000
        RBP: 00007ffc77863900 R08: 0000000000000000 R09: 00000000007e6000
        R10: 00007ffc77863930 R11: 0000000000000212 R12: 00007ffc77863978
        R13: 00007ffc77863988 R14: 0000000000000000 R15: 0000000000000000
         </TASK>
      
      Instead, just return the old brk value if the original mapping has been
      removed.
      
      [akpm@linux-foundation.org: fix changelog, per Liam]
      Link: https://lkml.kernel.org/r/20221202162724.2009-1-Jason@zx2c4.com
      Fixes: 2e7ce7d3 ("mm/mmap: change do_brk_flags() to expand existing VMA and add do_brk_munmap()")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Reviewed-by: SeongJae Park <sj@kernel.org>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Jann Horn <jannh@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      f5ad5083
    • Matti Vaittinen's avatar
      mailmap: update Matti Vaittinen's email address · 38f1d4ae
      Matti Vaittinen authored
      The email backend used by ROHM keeps labeling patches as spam.  This can
      result in missing the patches.
      
      Switch my mail address from a company mail to a personal one.
      
      Link: https://lkml.kernel.org/r/8f4498b66fedcbded37b3b87e0c516e659f8f583.1669912977.git.mazziesaccount@gmail.com
      Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com>
      Suggested-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Cc: Anup Patel <anup@brainfault.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Atish Patra <atishp@atishpatra.org>
      Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Ben Widawsky <bwidawsk@kernel.org>
      Cc: Bjorn Andersson <andersson@kernel.org>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Colin Ian King <colin.i.king@gmail.com>
      Cc: Kirill Tkhai <tkhai@ya.ru>
      Cc: Qais Yousef <qyousef@layalina.io>
      Cc: Vasily Averin <vasily.averin@linux.dev>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      38f1d4ae
  4. 09 Dec, 2022 5 commits
  5. 08 Dec, 2022 21 commits
    • Linus Torvalds's avatar
      Merge tag 'block-6.1-2022-12-08' of git://git.kernel.dk/linux · 859c73d4
      Linus Torvalds authored
      Pull block fix from Jens Axboe:
       "A small fix for initializing the NVMe quirks before initializing the
        subsystem"
      
      * tag 'block-6.1-2022-12-08' of git://git.kernel.dk/linux:
        nvme initialize core quirks before calling nvme_init_subsystem
      859c73d4
    • Linus Torvalds's avatar
      Merge tag 'io_uring-6.1-2022-12-08' of git://git.kernel.dk/linux · af145500
      Linus Torvalds authored
      Pull io_uring fix from Jens Axboe:
       "A single small fix for an issue related to ordering between
        cancelation and current->io_uring teardown"
      
      * tag 'io_uring-6.1-2022-12-08' of git://git.kernel.dk/linux:
        io_uring: Fix a null-ptr-deref in io_tctx_exit_cb()
      af145500
    • Linus Torvalds's avatar
      Merge tag 'net-6.1-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 010b6761
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bluetooth, can and netfilter.
      
        Current release - new code bugs:
      
         - bonding: ipv6: correct address used in Neighbour Advertisement
           parsing (src vs dst typo)
      
         - fec: properly scope IRQ coalesce setup during link up to supported
           chips only
      
        Previous releases - regressions:
      
         - Bluetooth fixes for fake CSR clones (knockoffs):
             - re-add ERR_DATA_REPORTING quirk
             - fix crash when device is replugged
      
         - Bluetooth:
             - silence a user-triggerable dmesg error message
             - L2CAP: fix u8 overflow, oob access
             - correct vendor codec definition
             - fix support for Read Local Supported Codecs V2
      
         - ti: am65-cpsw: fix RGMII configuration at SPEED_10
      
         - mana: fix race on per-CQ variable NAPI work_done
      
        Previous releases - always broken:
      
         - af_unix: diag: fetch user_ns from in_skb in unix_diag_get_exact(),
           avoid null-deref
      
         - af_can: fix NULL pointer dereference in can_rcv_filter
      
         - can: slcan: fix UAF with a freed work
      
         - can: can327: flush TX_work on ldisc .close()
      
         - macsec: add missing attribute validation for offload
      
         - ipv6: avoid use-after-free in ip6_fragment()
      
         - nft_set_pipapo: actually validate intervals in fields after the
           first one
      
         - mvneta: prevent oob access in mvneta_config_rss()
      
         - ipv4: fix incorrect route flushing when table ID 0 is used, or when
           source address is deleted
      
         - phy: mxl-gpy: add workaround for IRQ bug on GPY215B and GPY215C"
      
      * tag 'net-6.1-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits)
        net: dsa: sja1105: avoid out of bounds access in sja1105_init_l2_policing()
        s390/qeth: fix use-after-free in hsci
        macsec: add missing attribute validation for offload
        net: mvneta: Fix an out of bounds check
        net: thunderbolt: fix memory leak in tbnet_open()
        ipv6: avoid use-after-free in ip6_fragment()
        net: plip: don't call kfree_skb/dev_kfree_skb() under spin_lock_irq()
        net: phy: mxl-gpy: add MDINT workaround
        net: dsa: mv88e6xxx: accept phy-mode = "internal" for internal PHY ports
        xen/netback: don't call kfree_skb() under spin_lock_irqsave()
        dpaa2-switch: Fix memory leak in dpaa2_switch_acl_entry_add() and dpaa2_switch_acl_entry_remove()
        ethernet: aeroflex: fix potential skb leak in greth_init_rings()
        tipc: call tipc_lxc_xmit without holding node_read_lock
        can: esd_usb: Allow REC and TEC to return to zero
        can: can327: flush TX_work on ldisc .close()
        can: slcan: fix freed work crash
        can: af_can: fix NULL pointer dereference in can_rcv_filter
        net: dsa: sja1105: fix memory leak in sja1105_setup_devlink_regions()
        ipv4: Fix incorrect route flushing when table ID 0 is used
        ipv4: Fix incorrect route flushing when source address is deleted
        ...
      010b6761
    • Linus Torvalds's avatar
      Merge tag 'for-linus-2022120801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid · ce19275f
      Linus Torvalds authored
      Pull HID fixes from Jiri Kosina:
       "A regression fix for handling Logitech HID++ devices and memory
        corruption fixes:
      
         - regression fix (revert) for catch-all handling of Logitech HID++
           Bluetooth devices; there are devices that turn out not to work with
           this, and the root cause is yet to be properly understood. So we
           are dropping it for now, and it will be revisited for 6.2 or 6.3
           (Benjamin Tissoires)
      
         - memory corruption fix in HID core (ZhangPeng)
      
         - memory corruption fix in hid-lg4ff (Anastasia Belova)
      
         - Kconfig fix for I2C_HID (Benjamin Tissoires)
      
         - a few device-id specific quirks that piggy-back on top of the
           important fixes above"
      
      * tag 'for-linus-2022120801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
        Revert "HID: logitech-hidpp: Enable HID++ for all the Logitech Bluetooth devices"
        Revert "HID: logitech-hidpp: Remove special-casing of Bluetooth devices"
        HID: usbhid: Add ALWAYS_POLL quirk for some mice
        HID: core: fix shift-out-of-bounds in hid_report_raw_event
        HID: uclogic: Add HID_QUIRK_HIDINPUT_FORCE quirk
        HID: fix I2C_HID not selected when I2C_HID_OF_ELAN is
        HID: hid-lg4ff: Add check for empty lbuf
        HID: ite: Enable QUIRK_TOUCHPAD_ON_OFF_REPORT on Acer Aspire Switch V 10
        HID: uclogic: Fix frame templates for big endian architectures
      ce19275f
    • Linus Torvalds's avatar
      Merge tag 'soc-fixes-6.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · f3e84166
      Linus Torvalds authored
      Pull ARM SoC fix from Arnd Bergmann:
       "One last build fix came in, addressing a link failure when building
        without CONFIG_OUTER_CACHE"
      
      * tag 'soc-fixes-6.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
        ARM: at91: fix build for SAMA5D3 w/o L2 cache
      f3e84166
    • Benjamin Tissoires's avatar
      Revert "HID: logitech-hidpp: Enable HID++ for all the Logitech Bluetooth devices" · a9d9e46c
      Benjamin Tissoires authored
      This reverts commit 532223c8.
      
      As reported in [0], hid-logitech-hidpp now binds on all bluetooth mice,
      but there are corner cases where hid-logitech-hidpp just gives up on
      the mouse. This leaves the end user with a dead mouse.
      
      Given that we are at -rc8, we are definitely too late to find a proper
      fix. We already identified 2 issues less than 24 hours after the bug
      report. One is that ->match() was never designed to be used anywhere else
      than in hid-generic, and the other is that hid-logitech-hidpp has corner
      cases where it gives up on devices it is not supposed to.
      
      So we have no choice but to postpone this patch to the next kernel release.
      
      [0] https://lore.kernel.org/linux-input/CAJZ5v0g-_o4AqMgNwihCb0jrwrcJZfRrX=jv8aH54WNKO7QB8A@mail.gmail.com/
      
      Reported-by: Rafael J . Wysocki <rjw@rjwysocki.net>
      Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      a9d9e46c
    • Benjamin Tissoires's avatar
      Revert "HID: logitech-hidpp: Remove special-casing of Bluetooth devices" · 40f2432b
      Benjamin Tissoires authored
      This reverts commit 8544c812.
      
      We need to revert commit 532223c8 ("HID: logitech-hidpp: Enable HID++
      for all the Logitech Bluetooth devices") because that commit might make
      hid-logitech-hidpp bind on mice that are not well enough supported by
      hid-logitech-hidpp, and the end result is that the probe of those mice
      is now returning -ENODEV, leaving the end user with a dead mouse.
      
      Given that commit 8544c812 ("HID: logitech-hidpp: Remove special-casing
      of Bluetooth devices") is a direct dependency of 532223c8, revert it
      too.
      
      Note that this revert is also adapted to account for commit 908d325e
      ("HID: logitech-hidpp: Detect hi-res scrolling support"), re-adding
      support for the devices that were removed by that commit too.
      
      I have locally an MX Master and I tested this device with that revert,
      ensuring we still have high-res scrolling.
      
      Reported-by: Rafael J . Wysocki <rjw@rjwysocki.net>
      Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      40f2432b
    • Linus Torvalds's avatar
      Merge tag 'loongarch-fixes-6.1-3' of... · 7f043b76
      Linus Torvalds authored
      Merge tag 'loongarch-fixes-6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
      
      Pull LoongArch fixes from Huacai Chen:
       "Export smp_send_reschedule() for modules use, fix a huge page entry
        update issue, and add documents for booting description"
      
      * tag 'loongarch-fixes-6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
        docs/zh_CN: Add LoongArch booting description's translation
        docs/LoongArch: Add booting description
        LoongArch: mm: Fix huge page entry update for virtual machine
        LoongArch: Export symbol for function smp_send_reschedule()
      7f043b76
    • Linus Torvalds's avatar
      Merge tag 'for-linus-xsa-6.1-rc9b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · a4c3a07e
      Linus Torvalds authored
      Pull xen fix from Juergen Gross:
       "A single fix for the recent security issue XSA-423"
      
      * tag 'for-linus-xsa-6.1-rc9b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/netback: fix build warning
      a4c3a07e
    • Linus Torvalds's avatar
      Merge tag 'gpio-fixes-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · 306ba240
      Linus Torvalds authored
      Pull gpio fixes from Bartosz Golaszewski:
      
       - fix a memory leak in gpiolib core
      
       - fix reference leaks in gpio-amd8111 and gpio-rockchip
      
      * tag 'gpio-fixes-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        gpio/rockchip: fix refcount leak in rockchip_gpiolib_register()
        gpio: amd8111: Fix PCI device reference count leak
        gpiolib: fix memory leak in gpiochip_setup_dev()
      306ba240
    • Linus Torvalds's avatar
      Merge tag 'ata-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata · 57fb3f66
      Linus Torvalds authored
      Pull ATA fix from Damien Le Moal:
      
       - Avoid a NULL pointer dereference in the libahci platform code that
         can happen on initialization when a device tree does not specify
         names for the adapter clocks (from Anders)
      
      * tag 'ata-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
        ata: libahci_platform: ahci_platform_find_clk: oops, NULL pointer
      57fb3f66
    • Tejun Heo's avatar
      memcg: Fix possible use-after-free in memcg_write_event_control() · fbf83212
      Tejun Heo authored
      memcg_write_event_control() accesses the dentry->d_name of the specified
      control fd to route the write call.  As a cgroup interface file can't be
      renamed, it's safe to access d_name as long as the specified file is a
      regular cgroup file.  Also, as these cgroup interface files can't be
      removed before the directory, it's safe to access the parent too.
      
      Prior to 347c4a87 ("memcg: remove cgroup_event->cft"), there was a
      call to __file_cft() which verified that the specified file is a regular
      cgroupfs file before further accesses.  The cftype pointer returned from
      __file_cft() was no longer necessary, and the commit inadvertently
      dropped the file type check along with it, allowing any file to slip
      through. With the invariants broken, the d_name and parent accesses can
      now race against renames and removals of arbitrary files and cause
      use-after-frees.
      
      Fix the bug by resurrecting the file type check in __file_cft().  Now
      that cgroupfs is implemented through kernfs, checking the file
      operations needs to go through a layer of indirection.  Instead, let's
      check the superblock and dentry type.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: 347c4a87 ("memcg: remove cgroup_event->cft")
      Cc: stable@kernel.org # v3.14+
      Reported-by: Jann Horn <jannh@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fbf83212
    • Radu Nicolae Pirea (OSS)'s avatar
      net: dsa: sja1105: avoid out of bounds access in sja1105_init_l2_policing() · f8bac7f9
      Radu Nicolae Pirea (OSS) authored
      The SJA1105 family has 45 L2 policing table entries
      (SJA1105_MAX_L2_POLICING_COUNT) and SJA1110 has 110
      (SJA1110_MAX_L2_POLICING_COUNT). Keeping the table structure but
      accounting for the difference in port count (5 in SJA1105 vs 10 in
      SJA1110) does not fully explain the difference. Rather, the SJA1110 also
      has L2 ingress policers for multicast traffic. If a packet is classified
      as multicast, it will be processed by the policer index 99 + SRCPORT.
      
      The sja1105_init_l2_policing() function initializes all L2 policers such
      that they don't interfere with normal packet reception by default. To
      share code between SJA1105 and SJA1110, the index of the multicast
      policer for each port is always calculated. Since that index is out of
      bounds for SJA1105 but in bounds for SJA1110, a bounds check is then
      performed before it is used.
      
      The code fails to do the proper thing when determining what to do with the
      multicast policer of port 0 on SJA1105 (ds->num_ports = 5). The "mcast"
      index will be equal to 45, which is also equal to
      table->ops->max_entry_count (SJA1105_MAX_L2_POLICING_COUNT). So it passes
      through the check. But at the same time, SJA1105 doesn't have multicast
      policers. So the code programs the SHARINDX field of an out-of-bounds
      element in the L2 Policing table of the static config.
      
      The comparison between index 45 and a table of 45 entries should have
      caused the code to skip this policer index on SJA1105, since its memory
      wasn't even allocated.
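
The off-by-one described above can be illustrated outside the driver. This is only a sketch: every `sketch_`/`SKETCH_` name is hypothetical, and the per-port factor of 9 is an assumption chosen so that port 0 with 5 ports yields index 45, as the message states. The `<=` form stands in for a check that lets index 45 through a 45-entry table, and the `<` form for one that rejects it.

```c
#include <stdbool.h>
#include <stddef.h>

/* 45 is the SJA1105 entry count quoted in the commit message. */
#define SKETCH_SJA1105_ENTRIES   45
/* Assumption for illustration only: 5 ports * 9 + port 0 == 45. */
#define SKETCH_POLICERS_PER_PORT 9

/* Index of the (possibly nonexistent) multicast policer for a port. */
static size_t sketch_mcast_index(size_t num_ports, size_t port)
{
	return num_ports * SKETCH_POLICERS_PER_PORT + port;
}

/* Off-by-one: accepts index == entry_count, one past the last element. */
static bool sketch_check_buggy(size_t index, size_t entry_count)
{
	return index <= entry_count;
}

/* Valid indices of an N-entry table are 0 .. N-1. */
static bool sketch_check_fixed(size_t index, size_t entry_count)
{
	return index < entry_count;
}
```

With 5 ports, `sketch_mcast_index(5, 0)` is 45; the buggy check accepts it for a 45-entry table while the strict comparison does not.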
      
      With enough bad luck, the out-of-bounds write could even overwrite other
      valid kernel data, but in this case, the issue was detected using KASAN.
      
      Kernel log:
      
      sja1105 spi5.0: Probed switch chip: SJA1105Q
      ==================================================================
      BUG: KASAN: slab-out-of-bounds in sja1105_setup+0x1cbc/0x2340
      Write of size 8 at addr ffffff880bd57708 by task kworker/u8:0/8
      ...
      Workqueue: events_unbound deferred_probe_work_func
      Call trace:
      ...
      sja1105_setup+0x1cbc/0x2340
      dsa_register_switch+0x1284/0x18d0
      sja1105_probe+0x748/0x840
      ...
      Allocated by task 8:
      ...
      sja1105_setup+0x1bcc/0x2340
      dsa_register_switch+0x1284/0x18d0
      sja1105_probe+0x748/0x840
      ...
      
      Fixes: 38fbe91f ("net: dsa: sja1105: configure the multicast policers, if present")
      CC: stable@vger.kernel.org # 5.15+
      Signed-off-by: Radu Nicolae Pirea (OSS) <radu-nicolae.pirea@oss.nxp.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Link: https://lore.kernel.org/r/20221207132347.38698-1-radu-nicolae.pirea@oss.nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f8bac7f9
    • Alexandra Winter's avatar
      s390/qeth: fix use-after-free in hsci · ebaaadc3
      Alexandra Winter authored
      KASAN found that addr was dereferenced after br2dev_event_work was freed.
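
The class of bug reported here, reading through a pointer into an object after that object has been freed, can be modelled in user space. The sketch below is not the qeth code: `sketch_event_work` and `sketch_worker` are hypothetical names, and it only shows the ordering that avoids the problem, namely freeing the work item after the last use of any pointer into it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a queued work item carrying a MAC address. */
struct sketch_event_work {
	unsigned char addr[6];
};

/* Consumes and frees the work item; copies the address out first. */
static int sketch_worker(struct sketch_event_work *work, unsigned char out[6])
{
	const unsigned char *addr = work->addr; /* points into *work */

	memcpy(out, addr, sizeof(work->addr));  /* last use of addr */
	free(work);                             /* only now release the item */
	return 0;
}
```

Swapping the `memcpy()` and `free()` lines would reproduce the pattern KASAN flags: `addr` would then point into freed memory.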
      
      ==================================================================
      BUG: KASAN: use-after-free in qeth_l2_br2dev_worker+0x5ba/0x6b0
      Read of size 1 at addr 00000000fdcea440 by task kworker/u760:4/540
      CPU: 17 PID: 540 Comm: kworker/u760:4 Tainted: G            E      6.1.0-20221128.rc7.git1.5aa3bed4ce83.300.fc36.s390x+kasan #1
      Hardware name: IBM 8561 T01 703 (LPAR)
      Workqueue: 0.0.8000_event qeth_l2_br2dev_worker
      Call Trace:
       [<000000016944d4ce>] dump_stack_lvl+0xc6/0xf8
       [<000000016942cd9c>] print_address_description.constprop.0+0x34/0x2a0
       [<000000016942d118>] print_report+0x110/0x1f8
       [<0000000167a7bd04>] kasan_report+0xfc/0x128
       [<000000016938d79a>] qeth_l2_br2dev_worker+0x5ba/0x6b0
       [<00000001673edd1e>] process_one_work+0x76e/0x1128
       [<00000001673ee85c>] worker_thread+0x184/0x1098
       [<000000016740718a>] kthread+0x26a/0x310
       [<00000001672c606a>] __ret_from_fork+0x8a/0xe8
       [<00000001694711da>] ret_from_fork+0xa/0x40
      Allocated by task 108338:
       kasan_save_stack+0x40/0x68
       kasan_set_track+0x36/0x48
       __kasan_kmalloc+0xa0/0xc0
       qeth_l2_switchdev_event+0x25a/0x738
       atomic_notifier_call_chain+0x9c/0xf8
       br_switchdev_fdb_notify+0xf4/0x110
       fdb_notify+0x122/0x180
       fdb_add_entry.constprop.0.isra.0+0x312/0x558
       br_fdb_add+0x59e/0x858
       rtnl_fdb_add+0x58a/0x928
       rtnetlink_rcv_msg+0x5f8/0x8d8
       netlink_rcv_skb+0x1f2/0x408
       netlink_unicast+0x570/0x790
       netlink_sendmsg+0x752/0xbe0
       sock_sendmsg+0xca/0x110
       ____sys_sendmsg+0x510/0x6a8
       ___sys_sendmsg+0x12a/0x180
       __sys_sendmsg+0xe6/0x168
       __do_sys_socketcall+0x3c8/0x468
       do_syscall+0x22c/0x328
       __do_syscall+0x94/0xf0
       system_call+0x82/0xb0
      Freed by task 540:
       kasan_save_stack+0x40/0x68
       kasan_set_track+0x36/0x48
       kasan_save_free_info+0x4c/0x68
       ____kasan_slab_free+0x14e/0x1a8
       __kasan_slab_free+0x24/0x30
       __kmem_cache_free+0x168/0x338
       qeth_l2_br2dev_worker+0x154/0x6b0
       process_one_work+0x76e/0x1128
       worker_thread+0x184/0x1098
       kthread+0x26a/0x310
       __ret_from_fork+0x8a/0xe8
       ret_from_fork+0xa/0x40
      Last potentially related work creation:
       kasan_save_stack+0x40/0x68
       __kasan_record_aux_stack+0xbe/0xd0
       insert_work+0x56/0x2e8
       __queue_work+0x4ce/0xd10
       queue_work_on+0xf4/0x100
       qeth_l2_switchdev_event+0x520/0x738
       atomic_notifier_call_chain+0x9c/0xf8
       br_switchdev_fdb_notify+0xf4/0x110
       fdb_notify+0x122/0x180
       fdb_add_entry.constprop.0.isra.0+0x312/0x558
       br_fdb_add+0x59e/0x858
       rtnl_fdb_add+0x58a/0x928
       rtnetlink_rcv_msg+0x5f8/0x8d8
       netlink_rcv_skb+0x1f2/0x408
       netlink_unicast+0x570/0x790
       netlink_sendmsg+0x752/0xbe0
       sock_sendmsg+0xca/0x110
       ____sys_sendmsg+0x510/0x6a8
       ___sys_sendmsg+0x12a/0x180
       __sys_sendmsg+0xe6/0x168
       __do_sys_socketcall+0x3c8/0x468
       do_syscall+0x22c/0x328
       __do_syscall+0x94/0xf0
       system_call+0x82/0xb0
      Second to last potentially related work creation:
       kasan_save_stack+0x40/0x68
       __kasan_record_aux_stack+0xbe/0xd0
       kvfree_call_rcu+0xb2/0x760
       kernfs_unlink_open_file+0x348/0x430
       kernfs_fop_release+0xc2/0x320
       __fput+0x1ae/0x768
       task_work_run+0x1bc/0x298
       exit_to_user_mode_prepare+0x1a0/0x1a8
       __do_syscall+0x94/0xf0
       system_call+0x82/0xb0
      The buggy address belongs to the object at 00000000fdcea400
       which belongs to the cache kmalloc-96 of size 96
      The buggy address is located 64 bytes inside of
       96-byte region [00000000fdcea400, 00000000fdcea460)
      The buggy address belongs to the physical page:
      page:000000005a9c26e8 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xfdcea
      flags: 0x3ffff00000000200(slab|node=0|zone=1|lastcpupid=0x1ffff)
      raw: 3ffff00000000200 0000000000000000 0000000100000122 000000008008cc00
      raw: 0000000000000000 0020004100000000 ffffffff00000001 0000000000000000
      page dumped because: kasan: bad access detected
      Memory state around the buggy address:
       00000000fdcea300: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
       00000000fdcea380: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
      >00000000fdcea400: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
                                                 ^
       00000000fdcea480: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
       00000000fdcea500: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
      ==================================================================
      
      Fixes: f7936b7b ("s390/qeth: Update MACs of LEARNING_SYNC device")
      Reported-by: Thorsten Winkler <twinkler@linux.ibm.com>
      Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
      Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
      Reviewed-by: Thorsten Winkler <twinkler@linux.ibm.com>
      Link: https://lore.kernel.org/r/20221207105304.20494-1-wintera@linux.ibm.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ebaaadc3
    • Emeel Hakim's avatar
      macsec: add missing attribute validation for offload · 38099024
      Emeel Hakim authored
      Add missing attribute validation for IFLA_MACSEC_OFFLOAD
      to the netlink policy.
      
      Fixes: 791bb3fc ("net: macsec: add support for specifying offload upon link creation")
      Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
      Link: https://lore.kernel.org/r/20221207101618.989-1-ehakim@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      38099024
    • Dan Carpenter's avatar
      net: mvneta: Fix an out of bounds check · cdd97383
      Dan Carpenter authored
      In an earlier commit, I added a bounds check to prevent an out of bounds
      read and a WARN().  On further discussion and consideration, that check
      was probably too aggressive.  Instead of returning -EINVAL, a better fix
      would be to just prevent the out of bounds read but continue the process.
      
      Background: The value of "pp->rxq_def" is a number between 0 and 7 by
      default, or even higher depending on the value of "rxq_number", which is
      a module parameter. If the value is greater than or equal to the number
      of available CPUs, then it will trigger the WARN() in cpu_max_bits_warn().
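
One way to "prevent the out of bounds read but continue" is to bounds-check the queue number before using it as a CPU index, and to fall back to a default otherwise. A minimal user-space sketch, assuming a hypothetical `sketch_elect_cpu()` helper and a plain `online[]` array in place of the kernel's CPU mask; it is not the driver's actual code:

```c
#include <stdbool.h>

/*
 * Hypothetical model: online[i] tells whether CPU i is online and
 * nr_cpus is the length of that array. Returns the CPU to use.
 */
static unsigned int sketch_elect_cpu(unsigned int rxq_def,
				     const bool *online,
				     unsigned int nr_cpus)
{
	unsigned int elected = 0; /* fallback when rxq_def is unusable */

	/* Check the range first so online[] is never read out of bounds. */
	if (rxq_def < nr_cpus && online[rxq_def])
		elected = rxq_def;

	return elected;
}
```

The short-circuit `&&` matters here: the array is indexed only after the range test has passed, so an oversized `rxq_def` degrades to the fallback instead of failing the whole operation.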
      
      Fixes: e8b4fc13 ("net: mvneta: Prevent out of bounds read in mvneta_config_rss()")
      Signed-off-by: Dan Carpenter <error27@gmail.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/Y5A7d1E5ccwHTYPf@kadam
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      cdd97383
    • Zhengchao Shao's avatar
      net: thunderbolt: fix memory leak in tbnet_open() · ed14e590
      Zhengchao Shao authored
      When tb_ring_alloc_rx() fails in tbnet_open(), the ida allocated in
      tb_xdomain_alloc_out_hopid() is not released. Add
      tb_xdomain_release_out_hopid() to the error path to release it.
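
The leak pattern is generic: a resource acquired early must be released on every later failure path. The sketch below models it in user space with a simple counter. All `sketch_` names are hypothetical stand-ins, not the thunderbolt API, and the returned ID value is arbitrary.

```c
#include <stdbool.h>

/* Number of hop IDs currently held; stands in for the ida state. */
static int sketch_hopids_in_use;

static int sketch_alloc_hopid(void)
{
	sketch_hopids_in_use++;
	return 8; /* arbitrary ID for illustration */
}

static void sketch_release_hopid(int hopid)
{
	(void)hopid;
	sketch_hopids_in_use--;
}

/* Models ring allocation, which may fail. */
static bool sketch_alloc_ring(bool succeed)
{
	return succeed;
}

/* Returns 0 on success, -1 on failure; no hop ID stays held on failure. */
static int sketch_open(bool ring_ok)
{
	int hopid = sketch_alloc_hopid();

	if (!sketch_alloc_ring(ring_ok)) {
		sketch_release_hopid(hopid); /* undo the earlier allocation */
		return -1;
	}
	return 0;
}
```

Without the release call in the failure branch, each failed open would leave `sketch_hopids_in_use` one higher than before.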
      
      Fixes: 180b0689 ("thunderbolt: Allow multiple DMA tunnels over a single XDomain connection")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20221207015001.1755826-1-shaozhengchao@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ed14e590
    • Francesco Dolcini's avatar
      Revert "ARM: dts: imx7: Fix NAND controller size-cells" · ef19964d
      Francesco Dolcini authored
      This reverts commit 753395ea.
      
      It introduced a boot regression on colibri-imx7, and potentially any
      other i.MX7 boards with MTD partition list generated into the fdt by
      U-Boot.
      
      While the commit we are reverting here is not obviously wrong, it fixes
      only a dt binding checker warning that is non-functional, while it
      introduces a boot regression and there is no obvious fix ready.
      
      Fixes: 753395ea ("ARM: dts: imx7: Fix NAND controller size-cells")
      Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
      Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Acked-by: Marek Vasut <marex@denx.de>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/all/Y4dgBTGNWpM6SQXI@francesco-nb.int.toradex.com/
      Link: https://lore.kernel.org/all/20221205144917.6514168a@xps-13/
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      ef19964d
    • Yanteng Si's avatar
      docs/zh_CN: Add LoongArch booting description's translation · 1385313d
      Yanteng Si authored
      Translate ../loongarch/booting.rst into Chinese.
      Suggested-by: Xiaotian Wu <wuxiaotian@loongson.cn>
      Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      1385313d
    • Yanteng Si's avatar
      docs/LoongArch: Add booting description · 38eb496d
      Yanteng Si authored
      1. Describe the information passed from the bootloader to the kernel.
      2. Describe the meaning and values of the kernel image header fields.
      Suggested-by: Xiaotian Wu <wuxiaotian@loongson.cn>
      Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      38eb496d
    • Huacai Chen's avatar
      LoongArch: mm: Fix huge page entry update for virtual machine · b681604e
      Huacai Chen authored
      In a virtual machine (guest mode), the tlbwr instruction cannot write the
      last entry of the MTLB, so we need to make it non-present with invtlb and
      then write it with tlbfill. This also simplifies the whole logic.
      Signed-off-by: Rui Wang <wangrui@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      b681604e