- 12 Nov, 2015 27 commits
-
-
Seth Forshee authored
Priveleged operations should be allowed on loop devices within a devloop mount by root within the user namespace which owns the mount. Stash away the namespace at mount time and allow CAP_SYS_ADMIN within this namespace to perform priveleged operations on loop devices. Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
Add limited capability for use of loop devices in non-root containers via a loopfs psuedo fs. When mounted this filesystem will contain only a loop-control device node. This can be used to request free loop devices which will be "owned" by that mount. Device nodes appear automatically for these devices, and the same device will not be given to another loopfs mount. Privileged loop ioctls (for encrypted loop) will be allowed within the namespace which mounted the loopfs. Privileged block ioctls are not permitted, so features such as partitions are not supported for unprivileged users. Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
loop_clr_fd defers teardown if the loop device itself is referenced but proceeds when a partition of the loop device is open. When this happens partition scanning will fail, leaving behind the block devices for the loop device's partitions. Prevent this by returning -EBUSY when attempting to tear down the loop device while any partition block device is open. Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
Unprivileged users are normally restricted from mounting with the allow_other option by system policy, but this could be bypassed for a mount done with user namespace root permissions. In such cases allow_other should not allow users outside the userns to access the mount as doing so would give the unprivileged user the ability to manipulate processes it would otherwise be unable to manipulate. Restrict allow_other to apply to users in the same userns used at mount or a descendant of that namespace. Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
Update fuse to translate uids and gids to/from the user namspace of the process servicing requests on /dev/fuse. Any ids which do not map into the namespace will result in errors. inodes will also be marked bad when unmappable ids are received from the userspace fuse process. Currently no use cases are known for letting the userspace fuse daemon switch namespaces after opening /dev/fuse. Given this fact, and in order to keep the implementation as simple as possible and ease security auditing, the user namespace from which /dev/fuse is opened is used for all id translations. This is required to be the same namespace as s_user_ns to maintain behavior consistent with other filesystems which can be mounted in user namespaces. For cuse the namespace used for the connection is also simply current_user_ns() at the time /dev/cuse is opened. Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
If the userspace process servicing fuse requests is running in a pid namespace then pids passed via the fuse fd need to be translated relative to that namespace. Capture the pid namespace in use when the filesystem is mounted and use this for pid translation. Since no use case currently exists for changing namespaces all translations are done relative to the pid namespace in use when /dev/fuse is opened. Mounting or /dev/fuse IO from another namespace will return errors. Requests from processes whose pid cannot be translated into the target namespace are not permitted, except for requests allocated via fuse_get_req_nofail_nopages. For no-fail requests in.h.pid will be 0 if the pid translation fails. File locking changes based on previous work done by Eric Biederman. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
-
Seth Forshee authored
Support unprivileged mounting of ext4 volumes from user namespaces. This requires the following changes: - Perform all uid and gid conversions to/from disk relative to s_user_ns. In many cases this will already be handled by the vfs helper functions. This also requires updates to handle cases where ids may not map into s_user_ns. - Update most capability checks to check for capabilities in s_user_ns rather than init_user_ns. These mostly reflect changes to the filesystem that a user in s_user_ns could already make externally by virtue of having write access to the backing device. - Restrict unsafe options in either the mount options or the ext4 superblock. Currently the only concerning option is errors=panic, and this is made to require CAP_SYS_ADMIN in init_user_ns. - Verify that unprivileged users have the required access to the journal device at the path passed via the journal_path mount option. Note that for the journal_path and the journal_dev mount options, and for external journal devices specified in the ext4 superblock, devcgroup restrictions will be enforced by __blkdev_get(), (via blkdev_get_by_dev()), ensuring that the user has been granted appropriate access to the block device. - Set the FS_USERNS_MOUNT flag on the filesystem types supported by ext4. sysfs attributes for ext4 mounts remain writable only by real root. Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
A privileged user in a super block's s_user_ns is privileged towards that file system and thus should be allowed to set file capabilities. The file capabilities will not be trusted outside of s_user_ns, so an unprivileged user cannot use this to gain privileges in a user namespace where they are not already privileged. Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
A user privileged towards s_user_ns is privileged towards the superblock and should be allowed to perform privileged quota operations on that superblock. Convert check_quotactl_permission to check for CAP_SYS_ADMIN in s_user_ns for these operations. Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
Superblock level remounts are currently restricted to global CAP_SYS_ADMIN, as is the path for changing the root mount to read only on umount. Loosen both of these permission checks to also allow CAP_SYS_ADMIN in any namespace which is privileged towards the userns which originally mounted the filesystem. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
-
Seth Forshee authored
The user in control of a super block should be allowed to freeze and thaw it. Relax the restrictions on the FIFREEZE and FITHAW ioctls to require CAP_SYS_ADMIN in s_user_ns. Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
Privileged users in the namespace which controls a super block should not be prevented from creating hard links. Expand the check in may_linkat() to allow CAP_FOWNER in s_user_ns to set any hardlink. Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
Expand the check in should_remove_suid() to keep privileges for CAP_FSETID in s_user_ns rather than init_user_ns. Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
The mounter of a filesystem should be privileged towards the inodes of that filesystem. Extend the checks in inode_owner_or_capable() and capable_wrt_inode_uidgid() to permit access by users priviliged in the user namespace of the inode's superblock. Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
The EVM HMAC should be calculated using the on disk user and group ids, so the k[ug]ids in the inode must be translated relative to the s_user_ns of the inode's super block. Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
ids in on-disk ACLs should be converted to s_user_ns instead of init_user_ns as is done now. This introduces the possibility for id mappings to fail, and when this happens syscalls will return EOVERFLOW. Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
When dealing with mounts from user namespaces quota user ids must be translated relative to s_user_ns, especially when they will be stored on disk. For purposes of the in-kernel hash table using init_user_ns is still okay and may help reduce hash collisions, so continue using init_user_ns there. These changes introduce the possibility that the conversion operations in struct qtree_fmt_operations could fail if an id cannot be legitimately converted. Change these operations to return int rather than void, and update the callers to handle failures. Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
Add checks to inode_change_ok to verify that uid and gid changes will map into the superblock's user namespace. If they do not fail with -EOVERFLOW. This cannot be overriden with ATTR_FORCE. Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
For filesystems mounted from a user namespace on-disk ids should be translated relative to s_users_ns rather than init_user_ns. When an id in the filesystem doesn't exist in s_user_ns the associated id in the inode will be set to INVALID_[UG]ID, which turns these into de facto "nobody" ids. This actually maps pretty well into the way most code already works, and those places where it didn't were fixed in previous patches. Moving forward vfs code needs to be careful to handle instances where ids in inodes may be invalid. Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
Using INVALID_[UG]ID for the LSM file creation context doesn't make sense, so return an error if the inode passed to set_create_file_as() has an invalid id. Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
Filesystem uids which don't map into a user namespace may result in inode->i_uid being INVALID_UID. A symlink and its parent could have different owners in the filesystem can both get mapped to INVALID_UID, which may result in following a symlink when this would not have otherwise been permitted when protected symlinks are enabled. Add a new helper function, uid_valid_eq(), and use this to validate that the ids in may_follow_link() are both equal and valid. Also add an equivalent helper for gids, which is currently unused. Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
The SMACK64, SMACK64EXEC, and SMACK64MMAP labels are all handled differently in untrusted mounts. This is confusing and potentically problematic. Change this to handle them all the same way that SMACK64 is currently handled; that is, read the label from disk and check it at use time. For SMACK64 and SMACK64MMAP access is denied if the label does not match smk_root. To be consistent with suid, a SMACK64EXEC label which does not match smk_root will still allow execution of the file but will not run with the label supplied in the xattr. Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
All current callers of in_userns pass current_user_ns as the first argument. Simplify by replacing in_userns with current_in_userns which checks whether current_user_ns is in the namespace supplied as an argument. Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
Security labels from unprivileged mounts in user namespaces must be ignored. Force superblocks from user namespaces whose labeling behavior is to use xattrs to use mountpoint labeling instead. For the mountpoint label, default to converting the current task context into a form suitable for file objects, but also allow the policy writer to specify a different label through policy transition rules. Pieced together from code snippets provided by Stephen Smalley. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
-
- 09 Oct, 2015 4 commits
-
-
Andy Lutomirski authored
If a process gets access to a mount from a different user namespace, that process should not be able to take advantage of setuid files or selinux entrypoints from that filesystem. Prevent this by treating mounts from other mount namespaces and those not owned by current_user_ns() or an ancestor as nosuid. This will make it safer to allow more complex filesystems to be mounted in non-root user namespaces. This does not remove the need for MNT_LOCK_NOSUID. The setuid, setgid, and file capability bits can no longer be abused if code in a user namespace were to clear nosuid on an untrusted filesystem, but this patch, by itself, is insufficient to protect the system from abuse of files that, when execed, would increase MAC privilege. As a more concrete explanation, any task that can manipulate a vfsmount associated with a given user namespace already has capabilities in that namespace and all of its descendents. If they can cause a malicious setuid, setgid, or file-caps executable to appear in that mount, then that executable will only allow them to elevate privileges in exactly the set of namespaces in which they are already privileges. On the other hand, if they can cause a malicious executable to appear with a dangerous MAC label, running it could change the caller's security context in a way that should not have been possible, even inside the namespace in which the task is confined. As a hardening measure, this would have made CVE-2014-5207 much more difficult to exploit. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
Unprivileged users should not be able to mount mtd block devices when they lack sufficient privileges towards the block device inode. Update mount_mtd() to validate that the user has the required access to the inode at the specified path. The check will be skipped for CAP_SYS_ADMIN, so privileged mounts will continue working as before. Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
Unprivileged users should not be able to mount block devices when they lack sufficient privileges towards the block device inode. Update blkdev_get_by_path() to validate that the user has the required access to the inode at the specified path. The check will be skipped for CAP_SYS_ADMIN, so privileged mounts will continue working as before. Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
Seth Forshee authored
When looking up a block device by path no permission check is done to verify that the user has access to the block device inode at the specified path. In some cases it may be necessary to check permissions towards the inode, such as allowing unprivileged users to mount block devices in user namespaces. Add an argument to lookup_bdev() to optionally perform this permission check. A value of 0 skips the permission check and behaves the same as before. A non-zero value specifies the mask of access rights required towards the inode at the specified path. The check is always skipped if the user has CAP_SYS_ADMIN. All callers of lookup_bdev() currently pass a mask of 0, so this patch results in no functional change. Subsequent patches will add permission checks where appropriate. Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
-
- 25 Sep, 2015 4 commits
-
-
Seth Forshee authored
Security labels from unprivileged mounts cannot be trusted. Ideally for these mounts we would assign the objects in the filesystem the same label as the inode for the backing device passed to mount. Unfortunately it's currently impossible to determine which inode this is from the LSM mount hooks, so we settle for the label of the process doing the mount. This label is assigned to s_root, and also to smk_default to ensure that new inodes receive this label. The transmute property is also set on s_root to make this behavior more explicit, even though it is technically not necessary. If a filesystem has existing security labels, access to inodes is permitted if the label is the same as smk_root, otherwise access is denied. The SMACK64EXEC xattr is completely ignored. Explicit setting of security labels continues to require CAP_MAC_ADMIN in init_user_ns. Altogether, this ensures that filesystem objects are not accessible to subjects which cannot already access the backing store, that MAC is not violated for any objects in the fileystem which are already labeled, and that a user cannot use an unprivileged mount to gain elevated MAC privileges. sysfs, tmpfs, and ramfs are already mountable from user namespaces and support security labels. We can't rule out the possibility that these filesystems may already be used in mounts from user namespaces with security lables set from the init namespace, so failing to trust lables in these filesystems may introduce regressions. It is safe to trust labels from these filesystems, since the unprivileged user does not control the backing store and thus cannot supply security labels, so an explicit exception is made to trust labels from these filesystems. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
-
Seth Forshee authored
Capability sets attached to files must be ignored except in the user namespaces where the mounter is privileged, i.e. s_user_ns and its descendants. Otherwise a vector exists for gaining privileges in namespaces where a user is not already privileged. Add a new helper function, in_user_ns(), to test whether a user namespace is the same as or a descendant of another namespace. Use this helper to determine whether a file's capability set should be applied to the caps constructed during exec. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
-
Eric W. Biederman authored
- Consolidate the testing if a device node may be opened in a new function may_open_dev. - Move the check for allowing access to device nodes on filesystems not mounted in the initial user namespace from mount time to open time and include it in may_open_dev. This set of changes removes the implicit adding of MNT_NODEV which simplifies the logic in fs/namespace.c and removes a potentially problematic difference in how normal and unprivileged mount namespaces work. This is a user visible change in behavior for remount in unpriviliged mount namespaces but is unlikely to cause problems for existing software. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
-
Seth Forshee authored
Initially this will be used to eliminate the implicit MNT_NODEV flag for mounts from user namespaces. In the future it will also be used for translating ids and checking capabilities for filesystems mounted from user namespaces. s_user_ns is initialized in alloc_super() and is generally set to current_user_ns(). To avoid security and corruption issues, two additional mount checks are also added: - do_new_mount() gains a check that the user has CAP_SYS_ADMIN in current_user_ns(). - sget() will fail with EBUSY when the filesystem it's looking for is already mounted from another user namespace. proc requires some special handling. The user namespace of current isn't appropriate when forking as a result of clone (2) with CLONE_NEWPID|CLONE_NEWUSER, as it will set s_user_ns to the namespace of the parent and make proc unmountable in the new user namespace. Instead, the user namespace which owns the new pid namespace is used. sget_userns() is allowed to allow passing in a namespace other than that of current, and sget becomes a wrapper around sget_userns() which passes current_user_ns(). Changes to original version of this patch * Documented @user_ns in sget_userns, alloc_super and fs.h * Kept an blank line in fs.h * Removed unncessary include of user_namespace.h from fs.h * Tweaked the location of get_user_ns and put_user_ns so the security modules can (if they wish) depend on it. -- EWB Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
-
- 12 Sep, 2015 5 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/jesper/crisLinus Torvalds authored
Pull CRIS updates from Jesper Nilsson: "Mostly removal of old cruft of which we can use a generic version, or fixes for code not commonly run in the cris port, but also additions to enable some good debug" * tag 'cris-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jesper/cris: (25 commits) CRISv10: delete unused lib/dmacopy.c CRISv10: delete unused lib/old_checksum.c CRIS: fix switch_mm() lockdep splat CRISv32: enable LOCKDEP_SUPPORT CRIS: add STACKTRACE_SUPPORT CRISv32: annotate irq enable in idle loop CRISv32: add support for irqflags tracing CRIS: UAPI: use generic types.h CRIS: UAPI: use generic shmbuf.h CRIS: UAPI: use generic msgbuf.h CRIS: UAPI: use generic socket.h CRIS: UAPI: use generic sembuf.h CRIS: UAPI: use generic sockios.h CRIS: UAPI: use generic auxvec.h CRIS: UAPI: use generic headers via Kbuild CRIS: UAPI: fix elf.h export CRIS: don't make asm/elf.h depend on asm/user.h CRIS: UAPI: fix ptrace.h CRISv32: Squash compile warnings for axisflashmap CRISv32: Add GPIO driver to the default configs ...
-
Linus Torvalds authored
rq_data_dir() returns either READ or WRITE (0 == READ, 1 == WRITE), not a boolean value. Now, admittedly the "!= 0" doesn't really change the value (0 stays as zero, 1 stays as one), but it's not only redundant, it confuses gcc, and causes gcc to warn about the construct switch (rq_data_dir(req)) { case READ: ... case WRITE: ... that we have in a few drivers. Now, the gcc warning is silly and stupid (it seems to warn not about the switch value having a different type from the case statements, but about _any_ boolean switch value), but in this case the code itself is silly and stupid too, so let's just change it, and get rid of warnings like this: drivers/block/hd.c: In function ‘hd_request’: drivers/block/hd.c:630:11: warning: switch condition has boolean value [-Wswitch-bool] switch (rq_data_dir(req)) { The odd '!= 0' came in when "cmd_flags" got turned into a "u64" in commit 5953316d ("block: make rq->cmd_flags be 64-bit") and is presumably because the old code (that just did a logical 'and' with 1) would then end up making the type of rq_data_dir() be u64 too. But if we want to retain the old regular integer type, let's just cast the result to 'int' rather than use that rather odd '!= 0'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
Fix up the writeback plugging introduced in commit d353d758 ("writeback: plug writeback at a high level") that then caused problems due to the unplug happening with a spinlock held. * writeback-plugging: writeback: plug writeback in wb_writeback() and writeback_inodes_wb() Revert "writeback: plug writeback at a high level"
-
Linus Torvalds authored
We had to revert the pluggin in writeback_sb_inodes() because the wb->list_lock is held, but we could easily plug at a higher level before taking that lock, and unplug after releasing it. This does that. Chris will run performance numbers, just to verify that this approach is comparable to the alternative (we could just drop and re-take the lock around the blk_finish_plug() rather than these two commits. I'd have preferred waiting for actual performance numbers before picking one approach over the other, but I don't want to release rc1 with the known "sleeping function called from invalid context" issue, so I'll pick this cleanup version for now. But if the numbers show that we really want to plug just at the writeback_sb_inodes() level, and we should just play ugly games with the spinlock, we'll switch to that. Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Neil Brown <neilb@suse.de> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-