1. 01 Mar, 2024 7 commits
    • Christian Brauner's avatar
      libfs: improve path_from_stashed() · e9c5263c
      Christian Brauner authored
      Right now we pass a bunch of info that is fs specific which doesn't make
      a lot of sense and it bleeds fs sepcific details into the generic
      helper. nsfs and pidfs have slightly different needs when initializing
      inodes. Add simple operations that are stashed in sb->s_fs_info that
      both can implement. This also allows us to get rid of cleaning up
      references in the caller. All in all path_from_stashed() becomes way
      simpler.
      Signed-off-by: default avatarChristian Brauner <brauner@kernel.org>
      e9c5263c
    • Christian Brauner's avatar
      libfs: add stashed_dentry_prune() · 2558e3b2
      Christian Brauner authored
      Both pidfs and nsfs use a memory location to stash a dentry for reuse by
      concurrent openers. Right now two custom
      dentry->d_prune::{ns,pidfs}_prune_dentry() methods are needed that do
      the same thing. The only thing that differs is that they need to get to
      the memory location to store or retrieve the dentry from differently.
      Fix that by remember the stashing location for the dentry in
      dentry->d_fsdata which allows us to retrieve it in dentry->d_prune. That
      in turn makes it possible to add a common helper that pidfs and nsfs can
      both use.
      
      Link: https://lore.kernel.org/r/CAHk-=wg8cHY=i3m6RnXQ2Y2W8psicKWQEZq1=94ivUiviM-0OA@mail.gmail.comSigned-off-by: default avatarChristian Brauner <brauner@kernel.org>
      2558e3b2
    • Christian Brauner's avatar
      libfs: improve path_from_stashed() helper · 159a0d9f
      Christian Brauner authored
      In earlier patches we moved both nsfs and pidfs to path_from_stashed().
      The helper currently tries to add and stash a new dentry if a reusable
      dentry couldn't be found and returns EAGAIN if it lost the race to stash
      the dentry. The caller can use EAGAIN to retry.
      
      The helper and the two filesystems be written in a way that makes
      returning EAGAIN unnecessary. To do this we need to change the
      dentry->d_prune() implementation of nsfs and pidfs to not simply replace
      the stashed dentry with NULL but to use a cmpxchg() and only replace
      their own dentry.
      
      Then path_from_stashed() can then be changed to not just stash a new
      dentry when no dentry is currently stashed but also when an already dead
      dentry is stashed. If another task managed to install a dentry in the
      meantime it can simply be reused. Pack that into a loop and call it a
      day.
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Link: https://lore.kernel.org/r/CAHk-=wgtLF5Z5=15-LKAczWm=-tUjHO+Bpf7WjBG+UU3s=fEQw@mail.gmail.comSigned-off-by: default avatarChristian Brauner <brauner@kernel.org>
      159a0d9f
    • Christian Brauner's avatar
      pidfs: convert to path_from_stashed() helper · b28ddcc3
      Christian Brauner authored
      Moving pidfds from the anonymous inode infrastructure to a separate tiny
      in-kernel filesystem similar to sockfs, pipefs, and anon_inodefs causes
      selinux denials and thus various userspace components that make heavy
      use of pidfds to fail as pidfds used anon_inode_getfile() which aren't
      subject to any LSM hooks. But dentry_open() is and that would cause
      regressions.
      
      The failures that are seen are selinux denials. But the core failure is
      dbus-broker. That cascades into other services failing that depend on
      dbus-broker. For example, when dbus-broker fails to start polkit and all
      the others won't be able to work because they depend on dbus-broker.
      
      The reason for dbus-broker failing is because it doesn't handle failures
      for SO_PEERPIDFD correctly. Last kernel release we introduced
      SO_PEERPIDFD (and SCM_PIDFD). SO_PEERPIDFD allows dbus-broker and polkit
      and others to receive a pidfd for the peer of an AF_UNIX socket. This is
      the first time in the history of Linux that we can safely authenticate
      clients in a race-free manner.
      
      dbus-broker immediately made use of this but messed up the error
      checking. It only allowed EINVAL as a valid failure for SO_PEERPIDFD.
      That's obviously problematic not just because of LSM denials but because
      of seccomp denials that would prevent SO_PEERPIDFD from working; or any
      other new error code from there.
      
      So this is catching a flawed implementation in dbus-broker as well. It
      has to fallback to the old pid-based authentication when SO_PEERPIDFD
      doesn't work no matter the reasons otherwise it'll always risk such
      failures. So overall that LSM denial should not have caused dbus-broker
      to fail. It can never assume that a feature released one kernel ago like
      SO_PEERPIDFD can be assumed to be available.
      
      So, the next fix separate from the selinux policy update is to try and
      fix dbus-broker at [3]. That should make it into Fedora as well. In
      addition the selinux reference policy should also be updated. See [4]
      for that. If Selinux is in enforcing mode in userspace and it encounters
      anything that it doesn't know about it will deny it by default. And the
      policy is entirely in userspace including declaring new types for stuff
      like nsfs or pidfs to allow it.
      
      For now we continue to raise S_PRIVATE on the inode if it's a pidfs
      inode which means things behave exactly like before.
      
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=2265630
      Link: https://github.com/fedora-selinux/selinux-policy/pull/2050
      Link: https://github.com/bus1/dbus-broker/pull/343 [3]
      Link: https://github.com/SELinuxProject/refpolicy/pull/762 [4]
      Reported-by: default avatarNathan Chancellor <nathan@kernel.org>
      Link: https://lore.kernel.org/r/20240222190334.GA412503@dev-arch.thelio-3990X
      Link: https://lore.kernel.org/r/20240218-neufahrzeuge-brauhaus-fb0eb6459771@braunerSigned-off-by: default avatarChristian Brauner <brauner@kernel.org>
      b28ddcc3
    • Christian Brauner's avatar
      nsfs: convert to path_from_stashed() helper · 1fa08aec
      Christian Brauner authored
      Use the newly added path_from_stashed() helper for nsfs.
      
      Link: https://lore.kernel.org/r/20240218-neufahrzeuge-brauhaus-fb0eb6459771@braunerSigned-off-by: default avatarChristian Brauner <brauner@kernel.org>
      1fa08aec
    • Christian Brauner's avatar
      libfs: add path_from_stashed() · 07fd7c32
      Christian Brauner authored
      Add a helper for both nsfs and pidfs to reuse an already stashed dentry
      or to add and stash a new dentry.
      
      Link: https://lore.kernel.org/r/20240218-neufahrzeuge-brauhaus-fb0eb6459771@braunerSigned-off-by: default avatarChristian Brauner <brauner@kernel.org>
      07fd7c32
    • Christian Brauner's avatar
      pidfd: add pidfs · cb12fd8e
      Christian Brauner authored
      This moves pidfds from the anonymous inode infrastructure to a tiny
      pseudo filesystem. This has been on my todo for quite a while as it will
      unblock further work that we weren't able to do simply because of the
      very justified limitations of anonymous inodes. Moving pidfds to a tiny
      pseudo filesystem allows:
      
      * statx() on pidfds becomes useful for the first time.
      * pidfds can be compared simply via statx() and then comparing inode
        numbers.
      * pidfds have unique inode numbers for the system lifetime.
      * struct pid is now stashed in inode->i_private instead of
        file->private_data. This means it is now possible to introduce
        concepts that operate on a process once all file descriptors have been
        closed. A concrete example is kill-on-last-close.
      * file->private_data is freed up for per-file options for pidfds.
      * Each struct pid will refer to a different inode but the same struct
        pid will refer to the same inode if it's opened multiple times. In
        contrast to now where each struct pid refers to the same inode. Even
        if we were to move to anon_inode_create_getfile() which creates new
        inodes we'd still be associating the same struct pid with multiple
        different inodes.
      
      The tiny pseudo filesystem is not visible anywhere in userspace exactly
      like e.g., pipefs and sockfs. There's no lookup, there's no complex
      inode operations, nothing. Dentries and inodes are always deleted when
      the last pidfd is closed.
      
      We allocate a new inode for each struct pid and we reuse that inode for
      all pidfds. We use iget_locked() to find that inode again based on the
      inode number which isn't recycled. We allocate a new dentry for each
      pidfd that uses the same inode. That is similar to anonymous inodes
      which reuse the same inode for thousands of dentries. For pidfds we're
      talking way less than that. There usually won't be a lot of concurrent
      openers of the same struct pid. They can probably often be counted on
      two hands. I know that systemd does use separate pidfd for the same
      struct pid for various complex process tracking issues. So I think with
      that things actually become way simpler. Especially because we don't
      have to care about lookup. Dentries and inodes continue to be always
      deleted.
      
      The code is entirely optional and fairly small. If it's not selected we
      fallback to anonymous inodes. Heavily inspired by nsfs which uses a
      similar stashing mechanism just for namespaces.
      
      Link: https://lore.kernel.org/r/20240213-vfs-pidfd_fs-v1-2-f863f58cfce1@kernel.orgSigned-off-by: default avatarChristian Brauner <brauner@kernel.org>
      cb12fd8e
  2. 28 Feb, 2024 1 commit
  3. 21 Feb, 2024 1 commit
  4. 10 Feb, 2024 2 commits
  5. 07 Feb, 2024 2 commits
  6. 06 Feb, 2024 3 commits
  7. 02 Feb, 2024 7 commits
  8. 21 Jan, 2024 17 commits