1. 23 Sep, 2016 7 commits
    • netns: move {inc,dec}_net_namespaces into #ifdef · 2ed6afde
      Arnd Bergmann authored
      With the newly enforced limit on the number of namespaces,
      we get a build warning if CONFIG_NET_NS is disabled:
      
      net/core/net_namespace.c:273:13: error: 'dec_net_namespaces' defined but not used [-Werror=unused-function]
      net/core/net_namespace.c:268:24: error: 'inc_net_namespaces' defined but not used [-Werror=unused-function]
      
      This moves the two added functions inside the #ifdef that guards
      their callers.
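
As a rough illustration of the fix described above (not the kernel's actual code), the sketch below keeps two static helpers inside the same #ifdef as their only callers, so a build with the option switched off never sees an unused static function. CONFIG_DEMO_NS, DEMO_MAX_NS, and every demo_ name are hypothetical stand-ins invented for this example.

```c
/* Hypothetical config switch; comment it out to build the "disabled" variant. */
#define CONFIG_DEMO_NS 1

#ifdef CONFIG_DEMO_NS
/* Hypothetical limit on the number of live namespaces. */
#define DEMO_MAX_NS 2

static int demo_ns_count;

/* Static helpers live inside the #ifdef, next to their only callers. */
static int inc_demo_namespaces(void)
{
    if (demo_ns_count >= DEMO_MAX_NS)
        return -1;              /* limit reached */
    demo_ns_count++;
    return 0;
}

static void dec_demo_namespaces(void)
{
    demo_ns_count--;
}

int demo_create_ns(void)
{
    return inc_demo_namespaces();
}

void demo_destroy_ns(void)
{
    dec_demo_namespaces();
}
#else
/* With the feature disabled there are no callers, and no helpers to warn about. */
int demo_create_ns(void)
{
    return -1;
}

void demo_destroy_ns(void)
{
}
#endif
```

Had the two static helpers been defined outside the #ifdef, the disabled build would define them without using them, which is what -Wunused-function reports.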
      
      Fixes: 70328660 ("netns: Add a limit on the number of net namespaces")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • nsfs: Simplify __ns_get_path · 213b067c
      Eric W. Biederman authored
      Move mntget from the very beginning of __ns_get_path to
      the success path of __ns_get_path, and remove the mntput
      calls.
      
      This removes the possibility that there will be a mntget/mntput
      pair if __ns_get_path has to retry, and generally simplifies the code.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • Merge branch 'nsfs-ioctls' into HEAD · 78725596
      Eric W. Biederman authored
      From: Andrey Vagin <avagin@openvz.org>
      
      Each namespace has an owning user namespace, and currently there is no
      way to discover these relationships.

      Pid and user namespaces are hierarchical. There is no way to discover
      their parent-child relationships either.

      Why might we want to know the relationships between namespaces?
      
      One use would be visualization, in order to understand the running
      system.  Another would be to answer the question: what capability does
      process X have to perform operations on a resource governed by namespace
      Y?
      
      One more use-case (which is usually called abnormal) is checkpoint/restart.
      In CRIU we are going to dump and restore nested namespaces.
      
      There was a discussion [1] about which interface to choose to determine
      the relationships between namespaces.

      Eric suggested adding two ioctl-s [2]:
      > Grumble, Grumble.  I think this may actually a case for creating ioctls
      > for these two cases.  Now that random nsfs file descriptors are bind
      > mountable the original reason for using proc files is not as pressing.
      >
      > One ioctl for the user namespace that owns a file descriptor.
      > One ioctl for the parent namespace of a namespace file descriptor.
      
      Here is an implementation of these ioctl-s.
      
      $ man man7/namespaces.7
      ...
      Since  Linux  4.X,  the  following  ioctl(2)  calls are supported for
      namespace file descriptors.  The correct syntax is:
      
            fd = ioctl(ns_fd, ioctl_type);
      
      where ioctl_type is one of the following:
      
      NS_GET_USERNS
            Returns a file descriptor that refers to an owning user
            namespace.
      
      NS_GET_PARENT
            Returns  a  file descriptor that refers to a parent namespace.
            This ioctl(2) can be used for pid  and  user  namespaces.  For
            user namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
            meaning.
      
      In addition to generic ioctl(2) errors, the following  specific  ones
      can occur:
      
      EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.
      
      EPERM  The  requested  namespace  is outside of the current namespace
            scope.
      
      [1] https://lkml.org/lkml/2016/7/6/158
      [2] https://lkml.org/lkml/2016/7/9/101
      
      Changes for v2:
      * don't return ENOENT for init_user_ns and init_pid_ns. There is nothing
        outside of the init namespace, so we can return EPERM in this case too.
        > The fewer special cases the easier the code is to get
        > correct, and the easier it is to read. // Eric
      
      Changes for v3:
      * rename ns->get_owner() to ns->owner(). get_* usually means that it
        grabs a reference.
      
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
      Cc: "W. Trevor King" <wking@tremily.us>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
    • tools/testing: add a test to check nsfs ioctl-s · 6ad92bf6
      Andrey Vagin authored
      There are two new ioctl-s:
      One ioctl for the user namespace that owns a file descriptor.
      One ioctl for the parent namespace of a namespace file descriptor.
      
      The test checks that these ioctl-s work and that they handle the case
      where a target namespace is outside of the current process's namespace.
      Signed-off-by: Andrei Vagin <avagin@openvz.org>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • nsfs: add ioctl to get a parent namespace · a7306ed8
      Andrey Vagin authored
      Pid and user namespaces are hierarchical. There is no way to discover
      parent-child relationships.

      In the future we will use this interface to dump and restore nested
      namespaces.
      Acked-by: Serge Hallyn <serge@hallyn.com>
      Signed-off-by: Andrei Vagin <avagin@openvz.org>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • nsfs: add ioctl to get an owning user namespace for ns file descriptor · 6786741d
      Andrey Vagin authored
      Each namespace has an owning user namespace, and currently there is no
      way to discover these relationships.

      Understanding namespace relationships allows us to answer the question:
      what capability does process X have to perform operations on a resource
      governed by namespace Y?
      
      After a long discussion, Eric W. Biederman proposed to use ioctl-s for
      this purpose.
      
      The NS_GET_USERNS ioctl returns a file descriptor to an owning user
      namespace.
      It returns EPERM if a target namespace is outside of a current user
      namespace.
      
      v2: rename parent to relative
      
      v3: Add a missing mntput when returning -EAGAIN --EWB
      Acked-by: Serge Hallyn <serge@hallyn.com>
      Link: https://lkml.org/lkml/2016/7/6/158
      Signed-off-by: Andrei Vagin <avagin@openvz.org>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • kernel: add a helper to get an owning user namespace for a namespace · bcac25a5
      Andrey Vagin authored
      Return -EPERM if an owning user namespace is outside of the process's
      current user namespace.
      
      v2: In the first version ns_get_owner returned ENOENT for init_user_ns.
          This special case was removed from this version. There is nothing
          outside of init_user_ns, so we can return EPERM.
      v3: rename ns->get_owner() to ns->owner(). get_* usually means that it
      grabs a reference.
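
The rule stated above can be modelled in userspace roughly as follows. This is only a sketch of the described behaviour, not the kernel helper itself: struct demo_userns and demo_ns_get_owner are hypothetical names, and the model assumes that "inside the current user namespace" means the owner is the caller's user namespace or one of its descendants.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model of a user namespace: only the parent link. */
struct demo_userns {
    struct demo_userns *parent;     /* NULL for the initial namespace */
};

/*
 * Walk from the owning namespace towards the root. If the caller's
 * namespace is found on the way, the owner is visible: store it in
 * *out and return 0. Otherwise (including a NULL owner, i.e. nothing
 * above the initial namespace) return -EPERM, with no special case.
 */
int demo_ns_get_owner(struct demo_userns *owner,
                      const struct demo_userns *current_ns,
                      struct demo_userns **out)
{
    for (const struct demo_userns *p = owner; p != NULL; p = p->parent) {
        if (p == current_ns) {
            *out = owner;
            return 0;
        }
    }
    return -EPERM;
}
```

In this model a caller in a child namespace asking about something owned by the initial namespace gets -EPERM, which mirrors the v2 note about dropping the ENOENT special case.
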
      Acked-by: Serge Hallyn <serge@hallyn.com>
      Signed-off-by: Andrei Vagin <avagin@openvz.org>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
  2. 22 Sep, 2016 8 commits
  3. 31 Aug, 2016 1 commit
  4. 08 Aug, 2016 12 commits
  5. 07 Aug, 2016 10 commits
    • Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 857953d7
      Linus Torvalds authored
      Pull more block fixes from Jens Axboe:
       "As mentioned in the pull the other day, a few more fixes for this
        round, all related to the bio op changes in this series.
      
        Two fixes, and then a cleanup, renaming bio->bi_rw to bio->bi_opf.  I
        wanted to do that change right after or right before -rc1, so that
        risk of conflict was reduced.  I just rebased the series on top of
        current master, and no new ->bi_rw usage has snuck in"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        block: rename bio bi_rw to bi_opf
        target: iblock_execute_sync_cache() should use bio_set_op_attrs()
        mm: make __swap_writepage() use bio_set_op_attrs()
        block/mm: make bdev_ops->rw_page() take a bool for read/write
    • Merge tag 'drm-for-v4.8-zpos' of git://people.freedesktop.org/~airlied/linux · 635a4ba1
      Linus Torvalds authored
      Pull drm zpos property support from Dave Airlie:
       "This tree was waiting on some media stuff I hadn't had time to get a
        stable branchpoint off, so I just waited until it was all in your tree
        first.
      
        It's been around a bit on the list and shouldn't affect anything
        outside adding the generic API and moving some ARM drivers to using
        it"
      
      * tag 'drm-for-v4.8-zpos' of git://people.freedesktop.org/~airlied/linux:
        drm: rcar: use generic code for managing zpos plane property
        drm/exynos: use generic code for managing zpos plane property
        drm: sti: use generic zpos for plane
        drm: add generic zpos property
    • block: rename bio bi_rw to bi_opf · 1eff9d32
      Jens Axboe authored
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portions. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokenness linger,
      rename the member, to force old and out-of-tree code to break
      at compile time instead of at runtime.
      
      No intended functional changes in this commit.
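
To make the breakage described above concrete, here is a small userspace model, offered as a sketch rather than the kernel's definitions. It packs an operation code into the top bits of one word and flags into the low bits; the demo_ names and the 3-bit operation width are assumptions chosen only for illustration.

```c
#include <limits.h>

/* Hypothetical operation codes and flag for this model. */
enum demo_op { DEMO_OP_READ = 0, DEMO_OP_WRITE = 1 };

#define DEMO_OP_BITS   3    /* assumed width of the op field */
#define DEMO_OP_SHIFT  (sizeof(unsigned int) * CHAR_BIT - DEMO_OP_BITS)
#define DEMO_FLAG_SYNC (1u << 0)

struct demo_bio {
    unsigned int bi_opf;    /* op in the high bits, flags in the low bits */
};

/* Accessor-style setter: places op and flags in their own bit ranges. */
void demo_set_op_attrs(struct demo_bio *bio, unsigned int op, unsigned int flags)
{
    bio->bi_opf = (op << DEMO_OP_SHIFT) | flags;
}

/* Extract the operation code from the high bits. */
unsigned int demo_bio_op(const struct demo_bio *bio)
{
    return bio->bi_opf >> DEMO_OP_SHIFT;
}

/* Extract the flags from the low bits. */
unsigned int demo_bio_flags(const struct demo_bio *bio)
{
    return bio->bi_opf & ((1u << DEMO_OP_SHIFT) - 1);
}
```

In this model, code that assigns a "write" value straight into the field, the way older code treated write as a low flag bit, ends up with an operation code of zero, which reads back as a read. Renaming the member turns that silent runtime mistake into a compile error for code that still refers to the old name.
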
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • target: iblock_execute_sync_cache() should use bio_set_op_attrs() · 31c64f78
      Jens Axboe authored
      The original commit missed this function; it needs to mark the bio as a
      write flush.
      
      Cc: Mike Christie <mchristi@redhat.com>
      Fixes: e742fc32 ("target: use bio op accessors")
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • mm: make __swap_writepage() use bio_set_op_attrs() · ba13e83e
      Jens Axboe authored
      Cleaner than manipulating bio->bi_rw flags directly.
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block/mm: make bdev_ops->rw_page() take a bool for read/write · c11f0c0b
      Jens Axboe authored
      Commit abf54548 changed it from an 'rw' flags type to the
      newer ops based interface, but now we're effectively leaking
      some bdev internals to the rest of the kernel. Since we only
      care about whether it's a read or a write at that level, just
      pass in a bool 'is_write' parameter instead.
      
      Then we can also move op_is_write() and friends back under
      CONFIG_BLOCK protection.
      Reviewed-by: Mike Christie <mchristi@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • Merge tag 'doc-4.8-fixes' of git://git.lwn.net/linux · 52ddb7e9
      Linus Torvalds authored
      Pull documentation fixes from Jonathan Corbet:
       "Three fixes for the docs build, including removing an annoying warning
        on 'make help' if sphinx isn't present"
      
      * tag 'doc-4.8-fixes' of git://git.lwn.net/linux:
        DocBook: use DOCBOOKS="" to ignore DocBooks instead of IGNORE_DOCBOOKS=1
        Documenation: update cgroup's document path
        Documentation/sphinx: do not warn about missing tools in 'make help'
    • Merge tag 'binfmt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/binfmt_misc · e9d488c3
      Linus Torvalds authored
      Pull binfmt_misc update from James Bottomley:
       "This update is to allow architecture emulation containers to function
        such that the emulation binary can be housed outside the container
        itself.  The container and fs parts both have acks from relevant
        experts.
      
        To use the new feature you have to add an F option to your binfmt_misc
        configuration"
      
      From the docs:
       "The usual behaviour of binfmt_misc is to spawn the binary lazily when
        the misc format file is invoked.  However, this doesn't work very well
        in the face of mount namespaces and changeroots, so the F mode opens
        the binary as soon as the emulation is installed and uses the opened
        image to spawn the emulator, meaning it is always available once
        installed, regardless of how the environment changes"
      
      * tag 'binfmt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/binfmt_misc:
        binfmt_misc: add F option description to documentation
        binfmt_misc: add persistent opened binary handler for containers
        fs: add filp_clone_open API
    • fs: return EPERM on immutable inode · 337684a1
      Eryu Guan authored
      In most cases, EPERM is returned on immutable inode, and there're only a
      few places returning EACCES. I noticed this when running LTP on
      overlayfs, setxattr03 failed due to unexpected EACCES on immutable
      inode.
      
      So converting all EACCES to EPERM on immutable inode.
      Acked-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · fe64f328
      Linus Torvalds authored
      Pull more vfs updates from Al Viro:
       "Assorted cleanups and fixes.
      
        In the "trivial API change" department - ->d_compare() losing 'parent'
        argument"
      
      * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        cachefiles: Fix race between inactivating and culling a cache object
        9p: use clone_fid()
        9p: fix braino introduced in "9p: new helper - v9fs_parent_fid()"
        vfs: make dentry_needs_remove_privs() internal
        vfs: remove file_needs_remove_privs()
        vfs: fix deadlock in file_remove_privs() on overlayfs
        get rid of 'parent' argument of ->d_compare()
        cifs, msdos, vfat, hfs+: don't bother with parent in ->d_compare()
        affs ->d_compare(): don't bother with ->d_inode
        fold _d_rehash() and __d_rehash() together
        fold dentry_rcuwalk_invalidate() into its only remaining caller
  6. 06 Aug, 2016 2 commits
    • Merge tag 'xfs-rmap-for-linus-4.8-rc1' of... · 0cbbc422
      Linus Torvalds authored
      Merge tag 'xfs-rmap-for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
      
      Pull more xfs updates from Dave Chinner:
       "This is the second part of the XFS updates for this merge cycle, and
        contains the new reverse block mapping feature for XFS.
      
        Reverse mapping allows us to track the owner of a specific block on
        disk precisely.  It is implemented as a set of btrees (one per
        allocation group) that track the owners of allocated extents.
        Effectively it is a "used space tree" that is updated when we allocate
        or free extents.  i.e. it is coherent with the free space btrees we
        already maintain and never overlaps with them.
      
        This reverse mapping infrastructure is the building block of several
        upcoming features - reflink, copy-on-write data, dedupe, online
        metadata and data scrubbing, highly accurate bad sector/data loss
        reporting to users, and significantly improved reconstruction of
        damaged and corrupted filesystems.  There's a lot of new stuff coming
        along in the next couple of cycles, and it all builds on the rmap
        infrastructure.
      
        As such, it's a huge chunk of new code with new on-disk format
        features and internal infrastructure.  It warns at mount time as an
        experimental feature and that it may eat data (as we do with all new
        on-disk features until they stabilise).  We have not released
        userspace support for it yet - userspace support currently requires
        download from Darrick's xfsprogs repo and build from source, so the
        access to this feature is really developer/tester only at this point.
        Initial userspace support will be released at the same time as the
        kernel with this code in it is released.
      
        The new rmap enabled code regresses 3 xfstests - all are ENOSPC
        related corner cases, one of which Darrick posted a fix for a few
        hours ago.  The other two are fixed by infrastructure that is part of
        the upcoming reflink patchset.  This new ENOSPC infrastructure
        requires an on-disk format tweak to keep mount times in
        check - we need to keep an on-disk count of allocated rmapbt blocks so
        we don't have to scan the entire btrees at mount time to count them.
      
        This is currently being tested and will be part of the fixes sent in
        the next week or two so users will not be exposed to this change"
      
      * tag 'xfs-rmap-for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (52 commits)
        xfs: move (and rename) the deferred bmap-free tracepoints
        xfs: collapse single use static functions
        xfs: remove unnecessary parentheses from log redo item recovery functions
        xfs: remove the extents array from the rmap update done log item
        xfs: in btree_lshift, only allocate temporary cursor when needed
        xfs: remove unnecesary lshift/rshift key initialization
        xfs: remove the get*keys and update_keys btree ops pointers
        xfs: enable the rmap btree functionality
        xfs: don't update rmapbt when fixing agfl
        xfs: disable XFS_IOC_SWAPEXT when rmap btree is enabled
        xfs: add rmap btree block detection to log recovery
        xfs: add rmap btree geometry feature flag
        xfs: propagate bmap updates to rmapbt
        xfs: enable the xfs_defer mechanism to process rmaps to update
        xfs: log rmap intent items
        xfs: create rmap update intent log items
        xfs: add rmap btree insert and delete helpers
        xfs: convert unwritten status of reverse mappings
        xfs: remove an extent from the rmap btree
        xfs: add an extent to the rmap btree
        ...
    • Merge branch 'work.const-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 835c92d4
      Linus Torvalds authored
      Pull qstr constification updates from Al Viro:
       "Fairly self-contained bunch - surprising lot of places passes struct
        qstr * as an argument when const struct qstr * would suffice; it
        complicates analysis for no good reason.
      
        I'd prefer to feed that separately from the assorted fixes (those are
        in #for-linus and with somewhat trickier topology)"
      
      * 'work.const-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        qstr: constify instances in adfs
        qstr: constify instances in lustre
        qstr: constify instances in f2fs
        qstr: constify instances in ext2
        qstr: constify instances in vfat
        qstr: constify instances in procfs
        qstr: constify instances in fuse
        qstr constify instances in fs/dcache.c
        qstr: constify instances in nfs
        qstr: constify instances in ocfs2
        qstr: constify instances in autofs4
        qstr: constify instances in hfs
        qstr: constify instances in hfsplus
        qstr: constify instances in logfs
        qstr: constify dentry_init_security