    cgroup: implement delayed destruction for cgroup_pidlist · b1a21367
    Tejun Heo authored
    
    
    Currently, pidlists are reference counted from the file open and
    release methods.  This means that holding onto an open file may waste
    memory and that reads may return very stale data.  Neither is
    critical because pidlists are keyed and shared per namespace and the
    user isn't supposed to leave a large delay between open and reads.
    
    cgroup is planned to be converted to use kernfs and it'd be best if we
    can stick to just the seq_file operations - start, next, stop and
    show.  This can be achieved by loading the pidlist on demand from
    start and releasing it with a time delay from stop, so that
    consecutive reads don't end up reloading the pidlist on each
    iteration.  This would remove the need for hooking into open and
    release while also avoiding the issues with holding onto a pidlist
    for too long.
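
A minimal userspace sketch of the scheme described above: load on demand from start, arm a delayed release from stop, and let a later start cancel it. This is not kernel code. Every name here (`pidlist_model`, `model_start`, `model_stop`, `model_tick`, `RELEASE_DELAY`) is hypothetical, time is a plain tick counter, and the pids are placeholder values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model; none of these names come from the kernel. */
#define RELEASE_DELAY 10        /* ticks to keep the list after stop */

struct pidlist_model {
	int *pids;              /* NULL while the list is unloaded */
	int nr;                 /* number of entries in pids */
	long expires;           /* tick at which release is due, if armed */
	bool armed;             /* true while a delayed release is pending */
	int loads;              /* how many times the list was (re)built */
};

/* start: cancel any pending release, then load the list if it is absent. */
static int model_start(struct pidlist_model *pl, int nr)
{
	assert(nr > 0);
	pl->armed = false;
	if (!pl->pids) {
		pl->pids = malloc(sizeof(*pl->pids) * nr);
		if (!pl->pids)
			return -1;
		for (int i = 0; i < nr; i++)
			pl->pids[i] = 100 + i;  /* placeholder pids */
		pl->nr = nr;
		pl->loads++;
	}
	return 0;
}

/* stop: do not free right away, only arm the delayed release. */
static void model_stop(struct pidlist_model *pl, long now)
{
	pl->armed = true;
	pl->expires = now + RELEASE_DELAY;
}

/* Timer stand-in: free the list once an armed release has expired. */
static void model_tick(struct pidlist_model *pl, long now)
{
	if (pl->armed && now >= pl->expires) {
		free(pl->pids);
		pl->pids = NULL;
		pl->nr = 0;
		pl->armed = false;
	}
}
```

In this model, a second `model_start()` that arrives before the deadline reuses the loaded list and leaves `loads` unchanged, while one that arrives after `model_tick()` has fired rebuilds it.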
    
    This patch implements delayed release of pidlists.  As pidlists could
    be lingering on cgroup removal waiting for the timer to expire, the
    cgroup free path needs to queue the destruction work items
    immediately and flush them.  As those work items are self-destroying,
    each work item can't be flushed directly.  A new workqueue -
    cgroup_pidlist_destroy_wq - is added to serve as the flush domain.
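
A rough userspace illustration of why the queue, rather than each item, is the thing to flush. The work items free themselves when they run, so a caller that kept a pointer to one could not safely wait on it; holding only the queue avoids that. The types and functions below (`work_model`, `wq_model`, `wq_queue`, `wq_flush`) are hypothetical stand-ins, not the kernel workqueue API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a workqueue used purely as a flush domain. */
struct work_model {
	struct work_model *next;
	int *destroyed;         /* counter bumped when the item runs */
};

struct wq_model {
	struct work_model *head; /* pending items, newest first */
};

/* Queue a self-destroying item; the caller keeps no pointer to it. */
static int wq_queue(struct wq_model *wq, int *destroyed)
{
	struct work_model *w;

	assert(destroyed != NULL);
	w = malloc(sizeof(*w));
	if (!w)
		return -1;
	w->destroyed = destroyed;
	w->next = wq->head;
	wq->head = w;
	return 0;
}

/*
 * Run every pending item.  Each item is unlinked first and then frees
 * itself, so after this returns no item pointer is valid any more and
 * the queue is empty.
 */
static void wq_flush(struct wq_model *wq)
{
	while (wq->head) {
		struct work_model *w = wq->head;

		wq->head = w->next;
		(*w->destroyed)++;
		free(w);        /* the item destroys itself */
	}
}
```

A free path in this model would call `wq_queue()` for each lingering list and then `wq_flush()` once, never touching an individual item afterwards.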
    
    Note that this patch just adds delayed release on top of the current
    implementation and doesn't change where pidlist is loaded and
    released.  Following patches will make those changes.
    
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Acked-by: Li Zefan <lizefan@huawei.com>
cgroup.c 148 KB