1. 16 Mar, 2020 1 commit
  2. 12 Mar, 2020 2 commits
    • cgroup: Restructure release_agent_path handling · e7b20d97
      Tejun Heo authored
      cgrp->root->release_agent_path is protected by both cgroup_mutex and
      release_agent_path_lock and readers can hold either one. The
      dual-locking scheme was introduced while breaking a locking dependency
      issue around cgroup_mutex but doesn't make sense anymore given that
      the only remaining reader which uses cgroup_mutex is
      cgroup1_release_agent().
      
      This patch updates cgroup1_release_agent() to use
      release_agent_path_lock so that release_agent_path is always protected
      only by release_agent_path_lock.
      
      While at it, convert strlen() based empty string checks to direct
      tests on the first character as suggested by Linus.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
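The empty-string test mentioned in the commit message above can be illustrated in plain userspace C. This is only a sketch, not the kernel code: both helper names are hypothetical, and the point is simply that looking at the first character gives the same answer as a strlen() based check without walking the whole string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length-based check, scans the entire string. */
bool agent_path_is_set_strlen(const char *path)
{
	assert(path != NULL);
	return strlen(path) > 0;
}

/* Hypothetical helper: equivalent check that reads only the first byte. */
bool agent_path_is_set(const char *path)
{
	assert(path != NULL);
	return path[0] != '\0';
}
```

A caller that wants to skip an unset agent would test `agent_path_is_set(path)` before doing any further work with the path.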
    • Merge branch 'for-5.6-fixes' into for-5.7 · a09833f7
      Tejun Heo authored
  3. 04 Mar, 2020 2 commits
    • cgroup1: don't call release_agent when it is "" · 2e5383d7
      Tycho Andersen authored
      Older (and maybe current) versions of systemd set release_agent to "" when
      shutting down, but do not set notify_on_release to 0.
      
      Since 64e90a8a ("Introduce STATIC_USERMODEHELPER to mediate
      call_usermodehelper()"), we filter out such calls when the user mode helper
      path is "". However, when used in conjunction with an actual (i.e. non "")
      STATIC_USERMODEHELPER, the path is never "", so the real usermode helper
      will be called with argv[0] == "".
      
      Let's avoid this by not invoking the release_agent when it is "".
      Signed-off-by: Tycho Andersen <tycho@tycho.ws>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: fix psi_show() crash on 32bit ino archs · 190ecb19
      Qian Cai authored
      Similar to the commit d7495343 ("cgroup: fix incorrect
      WARN_ON_ONCE() in cgroup_setup_root()"), cgroup_id(root_cgrp) does not
      equal 1 on 32bit ino archs, which triggers all sorts of issues with
      psi_show() on s390x. For example,
      
       BUG: KASAN: slab-out-of-bounds in collect_percpu_times+0x2d0/
       Read of size 4 at addr 000000001e0ce000 by task read_all/3667
       collect_percpu_times+0x2d0/0x798
       psi_show+0x7c/0x2a8
       seq_read+0x2ac/0x830
       vfs_read+0x92/0x150
       ksys_read+0xe2/0x188
       system_call+0xd8/0x2b4
      
      Fix it by using cgroup_ino().
      
      Fixes: 74321038 ("cgroup: use cgrp->kn->id as the cgroup ID")
      Signed-off-by: Qian Cai <cai@lca.pw>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org # v5.5
  4. 12 Feb, 2020 17 commits
    • selftests/cgroup: add tests for cloning into cgroups · 9bd5910d
      Christian Brauner authored
      Expand the cgroup test-suite to include tests for CLONE_INTO_CGROUP.
      This adds the following tests:
      - CLONE_INTO_CGROUP manages to clone a process directly into a correctly
        delegated cgroup
      - CLONE_INTO_CGROUP fails to clone a process into a cgroup that has been
        removed after we've opened an fd to it
      - CLONE_INTO_CGROUP fails to clone a process into an invalid domain
        cgroup
      - CLONE_INTO_CGROUP adheres to the no internal process constraint
      - CLONE_INTO_CGROUP works with the freezer feature
      
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: cgroups@vger.kernel.org
      Cc: linux-kselftest@vger.kernel.org
      Acked-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • clone3: allow spawning processes into cgroups · ef2c41cf
      Christian Brauner authored
      This adds support for creating a process in a different cgroup than its
      parent. Callers can limit and account processes and threads right from
      the moment they are spawned:
      - A service manager can directly spawn new services into dedicated
        cgroups.
      - A process can be directly created in a frozen cgroup and will be
        frozen as well.
      - The initial accounting jitter experienced by process supervisors and
        daemons is eliminated with this.
      - Threaded applications or even thread implementations can choose to
        create a specific cgroup layout where each thread is spawned
        directly into a dedicated cgroup.
      
      This feature is limited to the unified hierarchy. Callers need to pass
      a directory file descriptor for the target cgroup. The caller can
      choose to pass an O_PATH file descriptor. All usual migration
      restrictions apply, i.e. there can be no processes in inner nodes. In
      general, creating a process directly in a target cgroup adheres to all
      migration restrictions.
      
      One of the biggest advantages of this feature is that CLONE_INTO_CGROUP does
      not need to grab the write side of the cgroup_threadgroup_rwsem.
      This global lock makes moving tasks/threads around super expensive. With
      clone3() this lock is avoided.
      
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: cgroups@vger.kernel.org
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
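As a rough illustration of the interface described in the commit above, the following userspace sketch issues a fork-like clone3() call with CLONE_INTO_CGROUP. Several things here are assumptions rather than part of the commit text: `spawn_into_cgroup()` and `struct clone3_args` are hypothetical names, the struct is a local mirror of the kernel's `struct clone_args` so the example builds against older headers, and the fallback syscall number 435 is the generic one and may not hold on every architecture.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef CLONE_INTO_CGROUP
#define CLONE_INTO_CGROUP 0x200000000ULL /* flag value from the uapi header */
#endif
#ifndef SYS_clone3
#define SYS_clone3 435 /* assumption: generic syscall number for old headers */
#endif

/* Local mirror of struct clone_args including the trailing cgroup field. */
struct clone3_args {
	uint64_t flags, pidfd, child_tid, parent_tid, exit_signal;
	uint64_t stack, stack_size, tls, set_tid, set_tid_size;
	uint64_t cgroup; /* directory fd of the target cgroup */
};

/*
 * Hypothetical helper: create a child directly inside the cgroup that
 * cgroup_fd refers to. Returns the child's PID in the parent, 0 in the
 * child, and -1 with errno set on failure.
 */
pid_t spawn_into_cgroup(int cgroup_fd)
{
	struct clone3_args args;

	/* Eleven 64-bit fields; the kernel checks the size we pass in. */
	assert(sizeof(args) == 88);

	if (cgroup_fd < 0) {
		errno = EBADF; /* reject obviously invalid descriptors early */
		return -1;
	}

	memset(&args, 0, sizeof(args));
	args.flags = CLONE_INTO_CGROUP;
	args.exit_signal = SIGCHLD; /* behave like fork() for wait() purposes */
	args.cgroup = (uint64_t)cgroup_fd;

	return (pid_t)syscall(SYS_clone3, &args, sizeof(args));
}
```

A caller would first open the target cgroup directory on the unified hierarchy, for example with `open(path, O_PATH | O_DIRECTORY)`, and pass the resulting descriptor. On kernels without this feature the call is expected to fail, so the return value should always be checked.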
    • cgroup: add cgroup_may_write() helper · f3553220
      Christian Brauner authored
      Add a cgroup_may_write() helper which we can use in the
      CLONE_INTO_CGROUP patch series to verify that we can write to the
      destination cgroup.
      
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: cgroups@vger.kernel.org
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: refactor fork helpers · 5a5cf5cb
      Christian Brauner authored
      This refactors the fork helpers so they can be easily modified in the
      next patches. The patch just moves the cgroup threadgroup rwsem grab and
      release into the helpers. They don't need to be directly exposed in fork.c.
      
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: cgroups@vger.kernel.org
      Acked-by: Michal Koutný <mkoutny@suse.com>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: add cgroup_get_from_file() helper · 17703097
      Christian Brauner authored
      Add a helper cgroup_get_from_file(). The helper will be used in
      subsequent patches to retrieve a cgroup while holding a reference to the
      struct file it was taken from.
      
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: cgroups@vger.kernel.org
      Acked-by: Michal Koutný <mkoutny@suse.com>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: unify attach permission checking · 6df970e4
      Christian Brauner authored
      The core codepaths to check whether a process can be attached to a
      cgroup are the same for threads and thread-group leaders. Only a small
      piece of code verifying that source and destination cgroup are in the
      same domain differentiates the thread permission checking from
      thread-group leader permission checking.
      Since cgroup_migrate_vet_dst() only matters for cgroup2 - it is a noop on
      cgroup1 - we can move it out of cgroup_attach_task().
      All checks can now be consolidated into a new helper
      cgroup_attach_permissions() callable from both cgroup_procs_write() and
      cgroup_threads_write().
      
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: cgroups@vger.kernel.org
      Acked-by: Michal Koutný <mkoutny@suse.com>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cpuset: Make cpuset hotplug synchronous · a49e4629
      Prateek Sood authored
      Convert cpuset_hotplug_workfn() into a synchronous call for the cpu hotplug
      path. For the memory hotplug path it still gets queued as a work item.
      
      Since cpuset_hotplug_workfn() can be made synchronous for cpu hotplug
      path, it is not required to wait for cpuset hotplug while thawing
      processes.
      Signed-off-by: Prateek Sood <prsood@codeaurora.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup.c: Use built-in RCU list checking · 3010c5b9
      Madhuparna Bhowmik authored
      list_for_each_entry_rcu has built-in RCU and lock checking.
      Pass cond argument to list_for_each_entry_rcu() to silence
      false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
      by default.
      
      Even though the function css_next_child() already checks if
      cgroup_mutex or rcu_read_lock() is held using
      cgroup_assert_mutex_or_rcu_locked(), there is a need to pass
      cond to list_for_each_entry_rcu() to avoid false positive
      lockdep warning.
      Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
      Acked-by: Michal Koutný <mkoutny@suse.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • kselftest/cgroup: add cgroup destruction test · 04189382
      Suren Baghdasaryan authored
      Add new test to verify that a cgroup with dead processes can be destroyed.
      The test spawns a child process which allocates and touches 100MB of RAM
      to ensure prolonged exit. Subsequently it kills the child, waits until
      the cgroup containing the child is empty and destroys the cgroup.
      Signed-off-by: Suren Baghdasaryan <surenb@google.com>
      [mkoutny@suse.com: Fix typo in test_cgcore_destroy comment]
      Acked-by: Michal Koutný <mkoutny@suse.com>
      Signed-off-by: Michal Koutný <mkoutny@suse.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: Clean up css_set task traversal · f43caa2a
      Michal Koutný authored
      css_task_iter stores pointer to head of each iterable list, this dates
      back to commit 0f0a2b4f ("cgroup: reorganize css_task_iter") when we
      did not store cur_cset. Let us utilize list heads directly in cur_cset
      and streamline css_task_iter_advance_css_set a bit. No functional
      change is intended.
      Signed-off-by: Michal Koutný <mkoutny@suse.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: Iterate tasks that did not finish do_exit() · 9c974c77
      Michal Koutný authored
      PF_EXITING is set earlier than actual removal from css_set when a task
      is exiting. This can confuse cgroup.procs readers who see no PF_EXITING
      tasks, however, rmdir is checking against css_set membership so it can
      transiently fail with EBUSY.
      
      Fix this by listing tasks that weren't unlinked from css_set active
      lists.
      It may happen that other users of the task iterator (without
      CSS_TASK_ITER_PROCS) spot a PF_EXITING task before cgroup_exit(). This
      is equal to the state before commit c03cd773 ("cgroup: Include dying
      leaders with live threads in PROCS iterations") but it may be reviewed
      later.
      Reported-by: Suren Baghdasaryan <surenb@google.com>
      Fixes: c03cd773 ("cgroup: Include dying leaders with live threads in PROCS iterations")
      Signed-off-by: Michal Koutný <mkoutny@suse.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: cgroup_procs_next should increase position index · 2d4ecb03
      Vasily Averin authored
      If seq_file .next function does not change position index,
      read after some lseek can generate unexpected output:
      
      1) dd bs=1 skips the output of every 2nd element
      $ dd if=/sys/fs/cgroup/cgroup.procs bs=8 count=1
      2
      3
      4
      5
      1+0 records in
      1+0 records out
      8 bytes copied, 0,000267297 s, 29,9 kB/s
      [test@localhost ~]$ dd if=/sys/fs/cgroup/cgroup.procs bs=1 count=8
      2
      4 <<< NB! 3 was skipped
      6 <<<    ... and 5 too
      8 <<<    ... and 7
      8+0 records in
      8+0 records out
      8 bytes copied, 5,2123e-05 s, 153 kB/s
      
       This happens because __cgroup_procs_start() makes an extra
       cgroup_procs_next() call.
      
      2) read after lseek beyond end of file generates whole last line.
      3) read after lseek into middle of last line generates
      expected rest of last line and unexpected whole line once again.
      
      Additionally, the patch removes an extra position index change in
      __cgroup_procs_start().
      
      Cc: stable@vger.kernel.org
      https://bugzilla.kernel.org/show_bug.cgi?id=206283
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup-v1: cgroup_pidlist_next should update position index · db8dd969
      Vasily Averin authored
      If seq_file .next function does not change position index,
      read after some lseek can generate unexpected output.
      
       # mount | grep cgroup
       # dd if=/mnt/cgroup.procs bs=1  # normal output
      ...
      1294
      1295
      1296
      1304
      1382
      584+0 records in
      584+0 records out
      584 bytes copied
      
      dd: /mnt/cgroup.procs: cannot skip to specified offset
      83  <<< generates end of last line
      1383  <<< ... and whole last line once again
      0+1 records in
      0+1 records out
      8 bytes copied
      
      dd: /mnt/cgroup.procs: cannot skip to specified offset
      1386  <<< generates last line anyway
      0+1 records in
      0+1 records out
      5 bytes copied
      
      https://bugzilla.kernel.org/show_bug.cgi?id=206283
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
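The role of the position index in the two seq_file fixes above can be sketched with a small userspace model. This is a deliberate simplification and not the kernel's seq_file implementation: `struct iter_model` and the `model_*` functions are hypothetical, records are just the numbers 0..n-1, and each emulated read() restarts from the saved position, which is why a .next callback that forgets to advance the position makes readers see the same records again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of a seq_file-style iterator over records 0..n-1. */
struct iter_model {
	size_t n;      /* number of records */
	bool bump_pos; /* true: .next advances *pos; false: models the bug */
};

/* .start: map a saved position back to a record, or -1 at end of file. */
long model_start(const struct iter_model *m, size_t pos)
{
	return pos < m->n ? (long)pos : -1;
}

/* .next: step to the following record and, if well behaved, update *pos. */
long model_next(const struct iter_model *m, long cur, size_t *pos)
{
	if (m->bump_pos)
		(*pos)++;
	return (size_t)(cur + 1) < m->n ? cur + 1 : -1;
}

/*
 * Emulate repeated read() calls that return at most `chunk` records each.
 * Every call restarts from the saved position. Returns records written.
 */
size_t model_read_all(const struct iter_model *m, size_t chunk,
		      long *out, size_t out_cap)
{
	size_t pos = 0, written = 0;

	assert(chunk > 0);
	while (written < out_cap) {
		long cur = model_start(m, pos);

		if (cur < 0)
			break; /* end of file */
		for (size_t i = 0; i < chunk && cur >= 0 && written < out_cap; i++) {
			out[written++] = cur;
			cur = model_next(m, cur, &pos);
		}
	}
	return written;
}
```

With `bump_pos` set, the model yields each record exactly once regardless of the chunk size. With it cleared, every emulated read starts over from position 0, so the output repeats until the caller's buffer is full.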
    • linux/pipe_fs_i.h: fix kernel-doc warnings after @wait was split · 0bf999f9
      Randy Dunlap authored
      Fix kernel-doc warnings in struct pipe_inode_info after @wait was
      split into @rd_wait and @wr_wait.
      
        include/linux/pipe_fs_i.h:66: warning: Function parameter or member 'rd_wait' not described in 'pipe_inode_info'
        include/linux/pipe_fs_i.h:66: warning: Function parameter or member 'wr_wait' not described in 'pipe_inode_info'
      
      Fixes: 0ddad21d ("pipe: use exclusive waits when reading or writing")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'kbuild-fixes-v5.6' of... · f2850dd5
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - fix memory corruption in scripts/kallsyms
      
       - fix the vmlinux link stage to correctly update compile.h
      
      * tag 'kbuild-fixes-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        kbuild: fix mismatch between .version and include/generated/compile.h
        scripts/kallsyms: fix memory corruption caused by write over-run
    • Merge tag 'dax-fixes-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 359c92c0
      Linus Torvalds authored
      Pull dax fixes from Dan Williams:
       "A fix for an xfstest failure, a cleanup, and an update that removes an
        fsdax dependency on block devices.
      
        Summary:
      
         - Fix RWF_NOWAIT writes to properly return -EAGAIN
      
         - Clean up an unused helper
      
         - Update dax_writeback_mapping_range to not need a block_device
           argument"
      
      * tag 'dax-fixes-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        dax: pass NOWAIT flag to iomap_apply
        dax: Get rid of fs_dax_get_by_host() helper
        dax: Pass dax_dev instead of bdev to dax_writeback_mapping_range()
    • Merge tag 'trace-v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 61a75954
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "Various fixes:
      
         - Fix an uninitialized variable
      
         - Fix compile bug to bootconfig userspace tool (in tools directory)
      
         - Suppress some error messages of bootconfig userspace tool
      
         - Remove unneeded CONFIG_LIBXBC from bootconfig
      
         - Allocate bootconfig xbc_nodes dynamically. To ease complaints about
           taking up static memory at boot up
      
         - Use of parse_args() to parse bootconfig instead of strstr() usage
           Prevents issues of double quotes containing the interested string
      
         - Fix missing ring_buffer_nest_end() on synthetic event error path
      
         - Return zero not -EINVAL on soft disabled synthetic event (soft
           disabling must be the same as hard disabling, which returns zero)
      
         - Consolidate synthetic event code (remove duplicate code)"
      
      * tag 'trace-v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Consolidate trace() functions
        tracing: Don't return -EINVAL when tracing soft disabled synth events
        tracing: Add missing nest end to synth_event_trace_start() error case
        tools/bootconfig: Suppress non-error messages
        bootconfig: Allocate xbc_nodes array dynamically
        bootconfig: Use parse_args() to find bootconfig and '--'
        tracing/kprobe: Fix uninitialized variable bug
        bootconfig: Remove unneeded CONFIG_LIBXBC
        tools/bootconfig: Fix wrong __VA_ARGS__ usage
  5. 11 Feb, 2020 5 commits
  6. 10 Feb, 2020 10 commits
  7. 09 Feb, 2020 3 commits
    • Merge tag 'zonefs-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs · 380a129e
      Linus Torvalds authored
      Pull new zonefs file system from Damien Le Moal:
       "Zonefs is a very simple file system exposing each zone of a zoned
        block device as a file.
      
        Unlike a regular file system with native zoned block device support
        (e.g. f2fs or the on-going btrfs effort), zonefs does not hide the
        sequential write constraint of zoned block devices from the user. As a
        result, zonefs is not a POSIX compliant file system. Its goal is to
        simplify the implementation of zoned block devices support in
        applications by replacing raw block device file accesses with a richer
        file based API, avoiding relying on direct block device file ioctls
        which may be more obscure to developers.
      
        One example of this approach is the implementation of LSM
        (log-structured merge) tree structures (such as used in RocksDB and
        LevelDB) on zoned block devices by allowing SSTables to be stored in a
        zone file similarly to a regular file system rather than as a range of
        sectors of a zoned device. The introduction of the higher level
        construct "one file is one zone" can help reduce the amount of
        changes needed in the application while at the same time allowing the
        use of zoned block devices with various programming languages other
        than C.
      
        Zonefs IO management implementation uses the new iomap generic code.
        Zonefs has been successfully tested using a functional test suite
        (available with zonefs userland format tool on github) and a prototype
        implementation of LevelDB on top of zonefs"
      
      * tag 'zonefs-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
        zonefs: Add documentation
        fs: New zonefs file system
    • irqchip/gic-v4.1: Avoid 64bit division for the sake of 32bit ARM · 490d332e
      Marc Zyngier authored
      In order to allow the GICv4 code to link properly on 32bit ARM,
      make sure we don't use 64bit divisions when it isn't strictly
      necessary.
      
      Fixes: 4e6437f1 ("irqchip/gic-v4.1: Ensure L2 vPE table is allocated at RD level")
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Zenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
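One common way to avoid a 64-bit division, shown here only as a generic sketch and not as the change this commit actually makes, is to replace a division by a power of two with a right shift. On 32-bit ARM a plain 64-bit `/` typically compiles to a call into a compiler support routine that the kernel does not link against, which is why such code fails to link. The helper name `div_pow2_u64()` is hypothetical, and `__builtin_ctz()` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: divide a 64-bit value by a power-of-two divisor
 * using a shift instead of the `/` operator.
 */
uint64_t div_pow2_u64(uint64_t n, uint32_t divisor)
{
	/* The shift trick is only valid for non-zero powers of two. */
	assert(divisor != 0 && (divisor & (divisor - 1)) == 0);

	/* Count trailing zero bits to get log2 of the divisor. */
	return n >> __builtin_ctz(divisor);
}
```

When the divisor is not a power of two, kernel code usually relies on dedicated helpers such as `div_u64()` from `<linux/math64.h>` instead.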
    • Merge tag '5.6-rc-smb3-plugfest-patches' of git://git.samba.org/sfrench/cifs-2.6 · d1ea35f4
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "13 cifs/smb3 patches, most from testing at the SMB3 plugfest this week:
      
         - Important fix for multichannel and for modefromsid mounts.
      
         - Two reconnect fixes
      
         - Addition of SMB3 change notify support
      
         - Backup tools fix
      
         - A few additional minor debug improvements (tracepoints and
           additional logging found useful during testing this week)"
      
      * tag '5.6-rc-smb3-plugfest-patches' of git://git.samba.org/sfrench/cifs-2.6:
        smb3: Add defines for new information level, FileIdInformation
        smb3: print warning once if posix context returned on open
        smb3: add one more dynamic tracepoint missing from strict fsync path
        cifs: fix mode bits from dir listing when mounted with modefromsid
        cifs: fix channel signing
        cifs: add SMB3 change notification support
        cifs: make multichannel warning more visible
        cifs: fix soft mounts hanging in the reconnect code
        cifs: Add tracepoints for errors on flush or fsync
        cifs: log warning message (once) if out of disk space
        cifs: fail i/o on soft mounts if sessionsetup errors out
        smb3: fix problem with null cifs super block with previous patch
        SMB3: Backup intent flag missing from some more ops