1. 12 Nov, 2015 27 commits
  2. 09 Oct, 2015 4 commits
    • Andy Lutomirski's avatar
      fs: Treat foreign mounts as nosuid · d2347aea
      Andy Lutomirski authored
      If a process gets access to a mount from a different user
      namespace, that process should not be able to take advantage of
      setuid files or selinux entrypoints from that filesystem.  Prevent
      this by treating mounts from other mount namespaces and those not
      owned by current_user_ns() or an ancestor as nosuid.
      
      This will make it safer to allow more complex filesystems to be
      mounted in non-root user namespaces.
      
      This does not remove the need for MNT_LOCK_NOSUID.  The setuid,
      setgid, and file capability bits can no longer be abused if code in
      a user namespace were to clear nosuid on an untrusted filesystem,
      but this patch, by itself, is insufficient to protect the system
      from abuse of files that, when execed, would increase MAC privilege.
      
      As a more concrete explanation, any task that can manipulate a
      vfsmount associated with a given user namespace already has
      capabilities in that namespace and all of its descendents.  If they
      can cause a malicious setuid, setgid, or file-caps executable to
      appear in that mount, then that executable will only allow them to
      elevate privileges in exactly the set of namespaces in which they
      are already privileges.
      
      On the other hand, if they can cause a malicious executable to
      appear with a dangerous MAC label, running it could change the
      caller's security context in a way that should not have been
      possible, even inside the namespace in which the task is confined.
      
      As a hardening measure, this would have made CVE-2014-5207 much
      more difficult to exploit.
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      d2347aea
    • Seth Forshee's avatar
      mtd: Check permissions towards mtd block device inode when mounting · 81901738
      Seth Forshee authored
      Unprivileged users should not be able to mount mtd block devices
      when they lack sufficient privileges towards the block device
      inode.  Update mount_mtd() to validate that the user has the
      required access to the inode at the specified path. The check
      will be skipped for CAP_SYS_ADMIN, so privileged mounts will
      continue working as before.
      Signed-off-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      81901738
    • Seth Forshee's avatar
      block_dev: Check permissions towards block device inode when mounting · 02241271
      Seth Forshee authored
      Unprivileged users should not be able to mount block devices when
      they lack sufficient privileges towards the block device inode.
      Update blkdev_get_by_path() to validate that the user has the
      required access to the inode at the specified path. The check
      will be skipped for CAP_SYS_ADMIN, so privileged mounts will
      continue working as before.
      Signed-off-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      02241271
    • Seth Forshee's avatar
      block_dev: Support checking inode permissions in lookup_bdev() · 054e3188
      Seth Forshee authored
      When looking up a block device by path no permission check is
      done to verify that the user has access to the block device inode
      at the specified path. In some cases it may be necessary to
      check permissions towards the inode, such as allowing
      unprivileged users to mount block devices in user namespaces.
      
      Add an argument to lookup_bdev() to optionally perform this
      permission check. A value of 0 skips the permission check and
      behaves the same as before. A non-zero value specifies the mask
      of access rights required towards the inode at the specified
      path. The check is always skipped if the user has CAP_SYS_ADMIN.
      
      All callers of lookup_bdev() currently pass a mask of 0, so this
      patch results in no functional change. Subsequent patches will
      add permission checks where appropriate.
      Signed-off-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      054e3188
  3. 25 Sep, 2015 4 commits
    • Seth Forshee's avatar
      Smack: Add support for unprivileged mounts from user namespaces · 36979551
      Seth Forshee authored
      Security labels from unprivileged mounts cannot be trusted.
      Ideally for these mounts we would assign the objects in the
      filesystem the same label as the inode for the backing device
      passed to mount. Unfortunately it's currently impossible to
      determine which inode this is from the LSM mount hooks, so we
      settle for the label of the process doing the mount.
      
      This label is assigned to s_root, and also to smk_default to
      ensure that new inodes receive this label. The transmute property
      is also set on s_root to make this behavior more explicit, even
      though it is technically not necessary.
      
      If a filesystem has existing security labels, access to inodes is
      permitted if the label is the same as smk_root, otherwise access
      is denied. The SMACK64EXEC xattr is completely ignored.
      
      Explicit setting of security labels continues to require
      CAP_MAC_ADMIN in init_user_ns.
      
      Altogether, this ensures that filesystem objects are not
      accessible to subjects which cannot already access the backing
      store, that MAC is not violated for any objects in the fileystem
      which are already labeled, and that a user cannot use an
      unprivileged mount to gain elevated MAC privileges.
      
      sysfs, tmpfs, and ramfs are already mountable from user
      namespaces and support security labels. We can't rule out the
      possibility that these filesystems may already be used in mounts
      from user namespaces with security lables set from the init
      namespace, so failing to trust lables in these filesystems may
      introduce regressions. It is safe to trust labels from these
      filesystems, since the unprivileged user does not control the
      backing store and thus cannot supply security labels, so an
      explicit exception is made to trust labels from these
      filesystems.
      Signed-off-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      Acked-by: default avatarCasey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      36979551
    • Seth Forshee's avatar
      fs: Limit file caps to the user namespace of the super block · a8c473e9
      Seth Forshee authored
      Capability sets attached to files must be ignored except in the
      user namespaces where the mounter is privileged, i.e. s_user_ns
      and its descendants. Otherwise a vector exists for gaining
      privileges in namespaces where a user is not already privileged.
      
      Add a new helper function, in_user_ns(), to test whether a user
      namespace is the same as or a descendant of another namespace.
      Use this helper to determine whether a file's capability set
      should be applied to the caps constructed during exec.
      Acked-by: default avatarSerge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      a8c473e9
    • Eric W. Biederman's avatar
      userns: Simpilify MNT_NODEV handling. · 4aceccd6
      Eric W. Biederman authored
      - Consolidate the testing if a device node may be opened in a new
        function may_open_dev.
      
      - Move the check for allowing access to device nodes on filesystems
        not mounted in the initial user namespace from mount time to open
        time and include it in may_open_dev.
      
      This set of changes removes the implicit adding of MNT_NODEV which
      simplifies the logic in fs/namespace.c and removes a potentially
      problematic difference in how normal and unprivileged mount
      namespaces work.  This is a user visible change in behavior for
      remount in unpriviliged mount namespaces but is unlikely to cause
      problems for existing software.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      4aceccd6
    • Seth Forshee's avatar
      fs: Add user namesapace member to struct super_block · b2bc2ff7
      Seth Forshee authored
      Initially this will be used to eliminate the implicit MNT_NODEV
      flag for mounts from user namespaces. In the future it will also
      be used for translating ids and checking capabilities for
      filesystems mounted from user namespaces.
      
      s_user_ns is initialized in alloc_super() and is generally set to
      current_user_ns(). To avoid security and corruption issues, two
      additional mount checks are also added:
      
       - do_new_mount() gains a check that the user has CAP_SYS_ADMIN
         in current_user_ns().
      
       - sget() will fail with EBUSY when the filesystem it's looking
         for is already mounted from another user namespace.
      
      proc requires some special handling. The user namespace of
      current isn't appropriate when forking as a result of clone (2)
      with CLONE_NEWPID|CLONE_NEWUSER, as it will set s_user_ns to the
      namespace of the parent and make proc unmountable in the new user
      namespace. Instead, the user namespace which owns the new pid
      namespace is used. sget_userns() is allowed to allow passing in
      a namespace other than that of current, and sget becomes a
      wrapper around sget_userns() which passes current_user_ns().
      
      Changes to original version of this patch
        * Documented @user_ns in sget_userns, alloc_super and fs.h
        * Kept an blank line in fs.h
        * Removed unncessary include of user_namespace.h from fs.h
        * Tweaked the location of get_user_ns and put_user_ns so
          the security modules can (if they wish) depend on it.
        -- EWB
      Signed-off-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      b2bc2ff7
  4. 12 Sep, 2015 5 commits
    • Linus Torvalds's avatar
      Linux 4.3-rc1 · 6ff33f39
      Linus Torvalds authored
      6ff33f39
    • Linus Torvalds's avatar
      Merge tag 'cris-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jesper/cris · 6917b51d
      Linus Torvalds authored
      Pull CRIS updates from Jesper Nilsson:
       "Mostly removal of old cruft of which we can use a generic version, or
        fixes for code not commonly run in the cris port, but also additions
        to enable some good debug"
      
      * tag 'cris-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jesper/cris: (25 commits)
        CRISv10: delete unused lib/dmacopy.c
        CRISv10: delete unused lib/old_checksum.c
        CRIS: fix switch_mm() lockdep splat
        CRISv32: enable LOCKDEP_SUPPORT
        CRIS: add STACKTRACE_SUPPORT
        CRISv32: annotate irq enable in idle loop
        CRISv32: add support for irqflags tracing
        CRIS: UAPI: use generic types.h
        CRIS: UAPI: use generic shmbuf.h
        CRIS: UAPI: use generic msgbuf.h
        CRIS: UAPI: use generic socket.h
        CRIS: UAPI: use generic sembuf.h
        CRIS: UAPI: use generic sockios.h
        CRIS: UAPI: use generic auxvec.h
        CRIS: UAPI: use generic headers via Kbuild
        CRIS: UAPI: fix elf.h export
        CRIS: don't make asm/elf.h depend on asm/user.h
        CRIS: UAPI: fix ptrace.h
        CRISv32: Squash compile warnings for axisflashmap
        CRISv32: Add GPIO driver to the default configs
        ...
      6917b51d
    • Linus Torvalds's avatar
      blk: rq_data_dir() should not return a boolean · 10fbd36e
      Linus Torvalds authored
      rq_data_dir() returns either READ or WRITE (0 == READ, 1 == WRITE), not
      a boolean value.
      
      Now, admittedly the "!= 0" doesn't really change the value (0 stays as
      zero, 1 stays as one), but it's not only redundant, it confuses gcc, and
      causes gcc to warn about the construct
      
          switch (rq_data_dir(req)) {
              case READ:
                  ...
              case WRITE:
                  ...
      
      that we have in a few drivers.
      
      Now, the gcc warning is silly and stupid (it seems to warn not about the
      switch value having a different type from the case statements, but about
      _any_ boolean switch value), but in this case the code itself is silly
      and stupid too, so let's just change it, and get rid of warnings like
      this:
      
        drivers/block/hd.c: In function ‘hd_request’:
        drivers/block/hd.c:630:11: warning: switch condition has boolean value [-Wswitch-bool]
           switch (rq_data_dir(req)) {
      
      The odd '!= 0' came in when "cmd_flags" got turned into a "u64" in
      commit 5953316d ("block: make rq->cmd_flags be 64-bit") and is
      presumably because the old code (that just did a logical 'and' with 1)
      would then end up making the type of rq_data_dir() be u64 too.
      
      But if we want to retain the old regular integer type, let's just cast
      the result to 'int' rather than use that rather odd '!= 0'.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      10fbd36e
    • Linus Torvalds's avatar
      Merge branch 'writeback-plugging' · e1df8b0a
      Linus Torvalds authored
      Fix up the writeback plugging introduced in commit d353d758
      ("writeback: plug writeback at a high level") that then caused problems
      due to the unplug happening with a spinlock held.
      
      * writeback-plugging:
        writeback: plug writeback in wb_writeback() and writeback_inodes_wb()
        Revert "writeback: plug writeback at a high level"
      e1df8b0a
    • Linus Torvalds's avatar
      writeback: plug writeback in wb_writeback() and writeback_inodes_wb() · 505a666e
      Linus Torvalds authored
      We had to revert the pluggin in writeback_sb_inodes() because the
      wb->list_lock is held, but we could easily plug at a higher level before
      taking that lock, and unplug after releasing it.  This does that.
      
      Chris will run performance numbers, just to verify that this approach is
      comparable to the alternative (we could just drop and re-take the lock
      around the blk_finish_plug() rather than these two commits.
      
      I'd have preferred waiting for actual performance numbers before picking
      one approach over the other, but I don't want to release rc1 with the
      known "sleeping function called from invalid context" issue, so I'll
      pick this cleanup version for now.  But if the numbers show that we
      really want to plug just at the writeback_sb_inodes() level, and we
      should just play ugly games with the spinlock, we'll switch to that.
      
      Cc: Chris Mason <clm@fb.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      505a666e