Commit 60056060 authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'locking-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:
 "The biggest change to core locking facilities in this cycle is the
  introduction of local_lock_t - this primitive comes from the -rt
  project and identifies CPU-local locking dependencies normally handled
  opaquely beind preempt_disable() or local_irq_save/disable() critical
  sections.

  The generated code on mainline kernels doesn't change as a result, but
  still there are benefits: improved debugging and better documentation
  of data structure accesses.

  The new local_lock_t primitives are introduced and then utilized in a
  couple of kernel subsystems. No change in functionality is intended.

  There's also other smaller changes and cleanups"

* tag 'locking-core-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  zram: Use local lock to protect per-CPU data
  zram: Allocate struct zcomp_strm as per-CPU memory
  connector/cn_proc: Protect send_msg() with a local lock
  squashfs: Make use of local lock in multi_cpu decompressor
  mm/swap: Use local_lock for protection
  radix-tree: Use local_lock for protection
  locking: Introduce local_lock()
  locking/lockdep: Replace zero-length array with flexible-array
  locking/rtmutex: Remove unused rt_mutex_cmpxchg_relaxed()
parents 2227e5b2 19f545b6
......@@ -13,6 +13,7 @@ The kernel provides a variety of locking primitives which can be divided
into two categories:
- Sleeping locks
- CPU local locks
- Spinning locks
This document conceptually describes these lock types and provides rules
......@@ -44,9 +45,23 @@ Sleeping lock types:
On PREEMPT_RT kernels, these lock types are converted to sleeping locks:
- local_lock
- spinlock_t
- rwlock_t
CPU local locks
---------------
- local_lock
On non-PREEMPT_RT kernels, local_lock functions are wrappers around
preemption and interrupt disabling primitives. Contrary to other locking
mechanisms, disabling preemption or interrupts are pure CPU local
concurrency control mechanisms and not suited for inter-CPU concurrency
control.
Spinning locks
--------------
......@@ -67,6 +82,7 @@ can have suffixes which apply further protections:
_irqsave/restore() Save and disable / restore interrupt disabled state
=================== ====================================================
Owner semantics
===============
......@@ -139,6 +155,56 @@ implementation, thus changing the fairness:
writer from starving readers.
local_lock
==========
local_lock provides a named scope to critical sections which are protected
by disabling preemption or interrupts.
On non-PREEMPT_RT kernels local_lock operations map to the preemption and
interrupt disabling and enabling primitives:
=========================== ======================
local_lock(&llock) preempt_disable()
local_unlock(&llock) preempt_enable()
local_lock_irq(&llock) local_irq_disable()
local_unlock_irq(&llock) local_irq_enable()
local_lock_save(&llock) local_irq_save()
local_lock_restore(&llock) local_irq_save()
=========================== ======================
The named scope of local_lock has two advantages over the regular
primitives:
- The lock name allows static analysis and is also a clear documentation
of the protection scope while the regular primitives are scopeless and
opaque.
- If lockdep is enabled the local_lock gains a lockmap which allows to
validate the correctness of the protection. This can detect cases where
e.g. a function using preempt_disable() as protection mechanism is
invoked from interrupt or soft-interrupt context. Aside of that
lockdep_assert_held(&llock) works as with any other locking primitive.
local_lock and PREEMPT_RT
-------------------------
PREEMPT_RT kernels map local_lock to a per-CPU spinlock_t, thus changing
semantics:
- All spinlock_t changes also apply to local_lock.
local_lock usage
----------------
local_lock should be used in situations where disabling preemption or
interrupts is the appropriate form of concurrency control to protect
per-CPU data structures on a non PREEMPT_RT kernel.
local_lock is not suitable to protect against preemption or interrupts on a
PREEMPT_RT kernel due to the PREEMPT_RT specific spinlock_t semantics.
raw_spinlock_t and spinlock_t
=============================
......@@ -258,10 +324,82 @@ implementation, thus changing semantics:
PREEMPT_RT caveats
==================
local_lock on RT
----------------
The mapping of local_lock to spinlock_t on PREEMPT_RT kernels has a few
implications. For example, on a non-PREEMPT_RT kernel the following code
sequence works as expected::
local_lock_irq(&local_lock);
raw_spin_lock(&lock);
and is fully equivalent to::
raw_spin_lock_irq(&lock);
On a PREEMPT_RT kernel this code sequence breaks because local_lock_irq()
is mapped to a per-CPU spinlock_t which neither disables interrupts nor
preemption. The following code sequence works perfectly correct on both
PREEMPT_RT and non-PREEMPT_RT kernels::
local_lock_irq(&local_lock);
spin_lock(&lock);
Another caveat with local locks is that each local_lock has a specific
protection scope. So the following substitution is wrong::
func1()
{
local_irq_save(flags); -> local_lock_irqsave(&local_lock_1, flags);
func3();
local_irq_restore(flags); -> local_lock_irqrestore(&local_lock_1, flags);
}
func2()
{
local_irq_save(flags); -> local_lock_irqsave(&local_lock_2, flags);
func3();
local_irq_restore(flags); -> local_lock_irqrestore(&local_lock_2, flags);
}
func3()
{
lockdep_assert_irqs_disabled();
access_protected_data();
}
On a non-PREEMPT_RT kernel this works correctly, but on a PREEMPT_RT kernel
local_lock_1 and local_lock_2 are distinct and cannot serialize the callers
of func3(). Also the lockdep assert will trigger on a PREEMPT_RT kernel
because local_lock_irqsave() does not disable interrupts due to the
PREEMPT_RT-specific semantics of spinlock_t. The correct substitution is::
func1()
{
local_irq_save(flags); -> local_lock_irqsave(&local_lock, flags);
func3();
local_irq_restore(flags); -> local_lock_irqrestore(&local_lock, flags);
}
func2()
{
local_irq_save(flags); -> local_lock_irqsave(&local_lock, flags);
func3();
local_irq_restore(flags); -> local_lock_irqrestore(&local_lock, flags);
}
func3()
{
lockdep_assert_held(&local_lock);
access_protected_data();
}
spinlock_t and rwlock_t
-----------------------
These changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels
The changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels
have a few implications. For example, on a non-PREEMPT_RT kernel the
following code sequence works as expected::
......@@ -282,9 +420,61 @@ local_lock mechanism. Acquiring the local_lock pins the task to a CPU,
allowing things like per-CPU interrupt disabled locks to be acquired.
However, this approach should be used only where absolutely necessary.
A typical scenario is protection of per-CPU variables in thread context::
raw_spinlock_t
--------------
struct foo *p = get_cpu_ptr(&var1);
spin_lock(&p->lock);
p->count += this_cpu_read(var2);
This is correct code on a non-PREEMPT_RT kernel, but on a PREEMPT_RT kernel
this breaks. The PREEMPT_RT-specific change of spinlock_t semantics does
not allow to acquire p->lock because get_cpu_ptr() implicitly disables
preemption. The following substitution works on both kernels::
struct foo *p;
migrate_disable();
p = this_cpu_ptr(&var1);
spin_lock(&p->lock);
p->count += this_cpu_read(var2);
On a non-PREEMPT_RT kernel migrate_disable() maps to preempt_disable()
which makes the above code fully equivalent. On a PREEMPT_RT kernel
migrate_disable() ensures that the task is pinned on the current CPU which
in turn guarantees that the per-CPU access to var1 and var2 are staying on
the same CPU.
The migrate_disable() substitution is not valid for the following
scenario::
func()
{
struct foo *p;
migrate_disable();
p = this_cpu_ptr(&var1);
p->val = func2();
While correct on a non-PREEMPT_RT kernel, this breaks on PREEMPT_RT because
here migrate_disable() does not protect against reentrancy from a
preempting task. A correct substitution for this case is::
func()
{
struct foo *p;
local_lock(&foo_lock);
p = this_cpu_ptr(&var1);
p->val = func2();
On a non-PREEMPT_RT kernel this protects against reentrancy by disabling
preemption. On a PREEMPT_RT kernel this is achieved by acquiring the
underlying per-CPU spinlock.
raw_spinlock_t on RT
--------------------
Acquiring a raw_spinlock_t disables preemption and possibly also
interrupts, so the critical section must avoid acquiring a regular
......@@ -325,22 +515,25 @@ Lock type nesting rules
The most basic rules are:
- Lock types of the same lock category (sleeping, spinning) can nest
arbitrarily as long as they respect the general lock ordering rules to
prevent deadlocks.
- Lock types of the same lock category (sleeping, CPU local, spinning)
can nest arbitrarily as long as they respect the general lock ordering
rules to prevent deadlocks.
- Sleeping lock types cannot nest inside CPU local and spinning lock types.
- Sleeping lock types cannot nest inside spinning lock types.
- CPU local and spinning lock types can nest inside sleeping lock types.
- Spinning lock types can nest inside sleeping lock types.
- Spinning lock types can nest inside all lock types
These constraints apply both in PREEMPT_RT and otherwise.
The fact that PREEMPT_RT changes the lock category of spinlock_t and
rwlock_t from spinning to sleeping means that they cannot be acquired while
holding a raw spinlock. This results in the following nesting ordering:
rwlock_t from spinning to sleeping and substitutes local_lock with a
per-CPU spinlock_t means that they cannot be acquired while holding a raw
spinlock. This results in the following nesting ordering:
1) Sleeping locks
2) spinlock_t and rwlock_t
2) spinlock_t, rwlock_t, local_lock
3) raw_spinlock_t and bit spinlocks
Lockdep will complain if these constraints are violated, both in
......
......@@ -37,19 +37,16 @@ static void zcomp_strm_free(struct zcomp_strm *zstrm)
if (!IS_ERR_OR_NULL(zstrm->tfm))
crypto_free_comp(zstrm->tfm);
free_pages((unsigned long)zstrm->buffer, 1);
kfree(zstrm);
zstrm->tfm = NULL;
zstrm->buffer = NULL;
}
/*
* allocate new zcomp_strm structure with ->tfm initialized by
* backend, return NULL on error
* Initialize zcomp_strm structure with ->tfm initialized by backend, and
* ->buffer. Return a negative value on error.
*/
static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
static int zcomp_strm_init(struct zcomp_strm *zstrm, struct zcomp *comp)
{
struct zcomp_strm *zstrm = kmalloc(sizeof(*zstrm), GFP_KERNEL);
if (!zstrm)
return NULL;
zstrm->tfm = crypto_alloc_comp(comp->name, 0, 0);
/*
* allocate 2 pages. 1 for compressed data, plus 1 extra for the
......@@ -58,9 +55,9 @@ static struct zcomp_strm *zcomp_strm_alloc(struct zcomp *comp)
zstrm->buffer = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
if (IS_ERR_OR_NULL(zstrm->tfm) || !zstrm->buffer) {
zcomp_strm_free(zstrm);
zstrm = NULL;
return -ENOMEM;
}
return zstrm;
return 0;
}
bool zcomp_available_algorithm(const char *comp)
......@@ -113,12 +110,13 @@ ssize_t zcomp_available_show(const char *comp, char *buf)
struct zcomp_strm *zcomp_stream_get(struct zcomp *comp)
{
return *get_cpu_ptr(comp->stream);
local_lock(&comp->stream->lock);
return this_cpu_ptr(comp->stream);
}
void zcomp_stream_put(struct zcomp *comp)
{
put_cpu_ptr(comp->stream);
local_unlock(&comp->stream->lock);
}
int zcomp_compress(struct zcomp_strm *zstrm,
......@@ -159,17 +157,15 @@ int zcomp_cpu_up_prepare(unsigned int cpu, struct hlist_node *node)
{
struct zcomp *comp = hlist_entry(node, struct zcomp, node);
struct zcomp_strm *zstrm;
int ret;
if (WARN_ON(*per_cpu_ptr(comp->stream, cpu)))
return 0;
zstrm = per_cpu_ptr(comp->stream, cpu);
local_lock_init(&zstrm->lock);
zstrm = zcomp_strm_alloc(comp);
if (IS_ERR_OR_NULL(zstrm)) {
ret = zcomp_strm_init(zstrm, comp);
if (ret)
pr_err("Can't allocate a compression stream\n");
return -ENOMEM;
}
*per_cpu_ptr(comp->stream, cpu) = zstrm;
return 0;
return ret;
}
int zcomp_cpu_dead(unsigned int cpu, struct hlist_node *node)
......@@ -177,10 +173,8 @@ int zcomp_cpu_dead(unsigned int cpu, struct hlist_node *node)
struct zcomp *comp = hlist_entry(node, struct zcomp, node);
struct zcomp_strm *zstrm;
zstrm = *per_cpu_ptr(comp->stream, cpu);
if (!IS_ERR_OR_NULL(zstrm))
zstrm = per_cpu_ptr(comp->stream, cpu);
zcomp_strm_free(zstrm);
*per_cpu_ptr(comp->stream, cpu) = NULL;
return 0;
}
......@@ -188,7 +182,7 @@ static int zcomp_init(struct zcomp *comp)
{
int ret;
comp->stream = alloc_percpu(struct zcomp_strm *);
comp->stream = alloc_percpu(struct zcomp_strm);
if (!comp->stream)
return -ENOMEM;
......
......@@ -5,8 +5,11 @@
#ifndef _ZCOMP_H_
#define _ZCOMP_H_
#include <linux/local_lock.h>
struct zcomp_strm {
/* The members ->buffer and ->tfm are protected by ->lock. */
local_lock_t lock;
/* compression/decompression buffer */
void *buffer;
struct crypto_comp *tfm;
......@@ -14,7 +17,7 @@ struct zcomp_strm {
/* dynamic per-device compression frontend */
struct zcomp {
struct zcomp_strm * __percpu *stream;
struct zcomp_strm __percpu *stream;
const char *name;
struct hlist_node node;
};
......
......@@ -18,6 +18,7 @@
#include <linux/pid_namespace.h>
#include <linux/cn_proc.h>
#include <linux/local_lock.h>
/*
* Size of a cn_msg followed by a proc_event structure. Since the
......@@ -38,25 +39,31 @@ static inline struct cn_msg *buffer_to_cn_msg(__u8 *buffer)
static atomic_t proc_event_num_listeners = ATOMIC_INIT(0);
static struct cb_id cn_proc_event_id = { CN_IDX_PROC, CN_VAL_PROC };
/* proc_event_counts is used as the sequence number of the netlink message */
static DEFINE_PER_CPU(__u32, proc_event_counts) = { 0 };
/* local_event.count is used as the sequence number of the netlink message */
struct local_event {
local_lock_t lock;
__u32 count;
};
static DEFINE_PER_CPU(struct local_event, local_event) = {
.lock = INIT_LOCAL_LOCK(lock),
};
static inline void send_msg(struct cn_msg *msg)
{
preempt_disable();
local_lock(&local_event.lock);
msg->seq = __this_cpu_inc_return(proc_event_counts) - 1;
msg->seq = __this_cpu_inc_return(local_event.count) - 1;
((struct proc_event *)msg->data)->cpu = smp_processor_id();
/*
* Preemption remains disabled during send to ensure the messages are
* ordered according to their sequence numbers.
* local_lock() disables preemption during send to ensure the messages
* are ordered according to their sequence numbers.
*
* If cn_netlink_send() fails, the data is not sent.
*/
cn_netlink_send(msg, 0, CN_IDX_PROC, GFP_NOWAIT);
preempt_enable();
local_unlock(&local_event.lock);
}
void proc_fork_connector(struct task_struct *task)
......
......@@ -8,6 +8,7 @@
#include <linux/slab.h>
#include <linux/percpu.h>
#include <linux/buffer_head.h>
#include <linux/local_lock.h>
#include "squashfs_fs.h"
#include "squashfs_fs_sb.h"
......@@ -21,6 +22,7 @@
struct squashfs_stream {
void *stream;
local_lock_t lock;
};
void *squashfs_decompressor_create(struct squashfs_sb_info *msblk,
......@@ -41,6 +43,7 @@ void *squashfs_decompressor_create(struct squashfs_sb_info *msblk,
err = PTR_ERR(stream->stream);
goto out;
}
local_lock_init(&stream->lock);
}
kfree(comp_opts);
......@@ -75,12 +78,16 @@ void squashfs_decompressor_destroy(struct squashfs_sb_info *msblk)
int squashfs_decompress(struct squashfs_sb_info *msblk, struct buffer_head **bh,
int b, int offset, int length, struct squashfs_page_actor *output)
{
struct squashfs_stream __percpu *percpu =
(struct squashfs_stream __percpu *) msblk->stream;
struct squashfs_stream *stream = get_cpu_ptr(percpu);
int res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
struct squashfs_stream *stream;
int res;
local_lock(&msblk->stream->lock);
stream = this_cpu_ptr(msblk->stream);
res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
offset, length, output);
put_cpu_ptr(stream);
local_unlock(&msblk->stream->lock);
if (res < 0)
ERROR("%s decompression failed, data probably corrupt\n",
......
......@@ -171,7 +171,7 @@ static inline bool idr_is_empty(const struct idr *idr)
*/
static inline void idr_preload_end(void)
{
preempt_enable();
local_unlock(&radix_tree_preloads.lock);
}
/**
......
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_LOCAL_LOCK_H
#define _LINUX_LOCAL_LOCK_H
#include <linux/local_lock_internal.h>
/**
* local_lock_init - Runtime initialize a lock instance
*/
#define local_lock_init(lock) __local_lock_init(lock)
/**
* local_lock - Acquire a per CPU local lock
* @lock: The lock variable
*/
#define local_lock(lock) __local_lock(lock)
/**
* local_lock_irq - Acquire a per CPU local lock and disable interrupts
* @lock: The lock variable
*/
#define local_lock_irq(lock) __local_lock_irq(lock)
/**
* local_lock_irqsave - Acquire a per CPU local lock, save and disable
* interrupts
* @lock: The lock variable
* @flags: Storage for interrupt flags
*/
#define local_lock_irqsave(lock, flags) \
__local_lock_irqsave(lock, flags)
/**
* local_unlock - Release a per CPU local lock
* @lock: The lock variable
*/
#define local_unlock(lock) __local_unlock(lock)
/**
* local_unlock_irq - Release a per CPU local lock and enable interrupts
* @lock: The lock variable
*/
#define local_unlock_irq(lock) __local_unlock_irq(lock)
/**
* local_unlock_irqrestore - Release a per CPU local lock and restore
* interrupt flags
* @lock: The lock variable
* @flags: Interrupt flags to restore
*/
#define local_unlock_irqrestore(lock, flags) \
__local_unlock_irqrestore(lock, flags)
#endif
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_LOCAL_LOCK_H
# error "Do not include directly, include linux/local_lock.h"
#endif
#include <linux/percpu-defs.h>
#include <linux/lockdep.h>
typedef struct {
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
struct task_struct *owner;
#endif
} local_lock_t;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
# define LL_DEP_MAP_INIT(lockname) \
.dep_map = { \
.name = #lockname, \
.wait_type_inner = LD_WAIT_CONFIG, \
}
#else
# define LL_DEP_MAP_INIT(lockname)
#endif
#define INIT_LOCAL_LOCK(lockname) { LL_DEP_MAP_INIT(lockname) }
#define __local_lock_init(lock) \
do { \
static struct lock_class_key __key; \
\
debug_check_no_locks_freed((void *)lock, sizeof(*lock));\
lockdep_init_map_wait(&(lock)->dep_map, #lock, &__key, 0, LD_WAIT_CONFIG);\
} while (0)
#ifdef CONFIG_DEBUG_LOCK_ALLOC
static inline void local_lock_acquire(local_lock_t *l)
{
lock_map_acquire(&l->dep_map);
DEBUG_LOCKS_WARN_ON(l->owner);
l->owner = current;
}
static inline void local_lock_release(local_lock_t *l)
{
DEBUG_LOCKS_WARN_ON(l->owner != current);
l->owner = NULL;
lock_map_release(&l->dep_map);
}
#else /* CONFIG_DEBUG_LOCK_ALLOC */
static inline void local_lock_acquire(local_lock_t *l) { }
static inline void local_lock_release(local_lock_t *l) { }
#endif /* !CONFIG_DEBUG_LOCK_ALLOC */
#define __local_lock(lock) \
do { \
preempt_disable(); \
local_lock_acquire(this_cpu_ptr(lock)); \
} while (0)
#define __local_lock_irq(lock) \
do { \
local_irq_disable(); \
local_lock_acquire(this_cpu_ptr(lock)); \
} while (0)
#define __local_lock_irqsave(lock, flags) \
do { \
local_irq_save(flags); \
local_lock_acquire(this_cpu_ptr(lock)); \
} while (0)
#define __local_unlock(lock) \
do { \
local_lock_release(this_cpu_ptr(lock)); \
preempt_enable(); \
} while (0)
#define __local_unlock_irq(lock) \
do { \
local_lock_release(this_cpu_ptr(lock)); \
local_irq_enable(); \
} while (0)
#define __local_unlock_irqrestore(lock, flags) \
do { \
local_lock_release(this_cpu_ptr(lock)); \
local_irq_restore(flags); \
} while (0)
......@@ -16,11 +16,20 @@
#include <linux/spinlock.h>
#include <linux/types.h>
#include <linux/xarray.h>
#include <linux/local_lock.h>
/* Keep unconverted code working */
#define radix_tree_root xarray
#define radix_tree_node xa_node
struct radix_tree_preload {
local_lock_t lock;
unsigned nr;
/* nodes->parent points to next preallocated node */
struct radix_tree_node *nodes;
};
DECLARE_PER_CPU(struct radix_tree_preload, radix_tree_preloads);
/*
* The bottom two bits of the slot determine how the remaining bits in the
* slot are interpreted:
......@@ -245,7 +254,7 @@ int radix_tree_tagged(const struct radix_tree_root *, unsigned int tag);
static inline void radix_tree_preload_end(void)
{
preempt_enable();
local_unlock(&radix_tree_preloads.lock);
}
void __rcu **idr_get_free(struct radix_tree_root *root,
......
......@@ -337,6 +337,7 @@ extern void activate_page(struct page *);
extern void mark_page_accessed(struct page *);
extern void lru_add_drain(void);
extern void lru_add_drain_cpu(int cpu);
extern void lru_add_drain_cpu_zone(struct zone *zone);
extern void lru_add_drain_all(void);
extern void rotate_reclaimable_page(struct page *page);
extern void deactivate_file_page(struct page *page);
......
......@@ -470,7 +470,7 @@ struct lock_trace {
struct hlist_node hash_entry;
u32 hash;
u32 nr_entries;
unsigned long entries[0] __aligned(sizeof(unsigned long));
unsigned long entries[] __aligned(sizeof(unsigned long));
};
#define LOCK_TRACE_SIZE_IN_LONGS \
(sizeof(struct lock_trace) / sizeof(unsigned long))
......
......@@ -141,7 +141,6 @@ static void fixup_rt_mutex_waiters(struct rt_mutex *lock)
* set up.
*/
#ifndef CONFIG_DEBUG_RT_MUTEXES
# define rt_mutex_cmpxchg_relaxed(l,c,n) (cmpxchg_relaxed(&l->owner, c, n) == c)
# define rt_mutex_cmpxchg_acquire(l,c,n) (cmpxchg_acquire(&l->owner, c, n) == c)
# define rt_mutex_cmpxchg_release(l,c,n) (cmpxchg_release(&l->owner, c, n) == c)
......@@ -202,7 +201,6 @@ static inline bool unlock_rt_mutex_safe(struct rt_mutex *lock,
}
#else
# define rt_mutex_cmpxchg_relaxed(l,c,n) (0)
# define rt_mutex_cmpxchg_acquire(l,c,n) (0)
# define rt_mutex_cmpxchg_release(l,c,n) (0)
......
......@@ -20,6 +20,7 @@
#include <linux/kernel.h>
#include <linux/kmemleak.h>
#include <linux/percpu.h>
#include <linux/local_lock.h>
#include <linux/preempt.h> /* in_interrupt() */
#include <linux/radix-tree.h>
#include <linux/rcupdate.h>
......@@ -27,7 +28,6 @@
#include <linux/string.h>
#include <linux/xarray.h>
/*
* Radix tree node cache.
*/
......@@ -58,12 +58,10 @@ struct kmem_cache *radix_tree_node_cachep;
/*
* Per-cpu pool of preloaded nodes
*/
struct radix_tree_preload {
unsigned nr;
/* nodes->parent points to next preallocated node */
struct radix_tree_node *nodes;
DEFINE_PER_CPU(struct radix_tree_preload, radix_tree_preloads) = {
.lock = INIT_LOCAL_LOCK(lock),
};
static DEFINE_PER_CPU(struct radix_tree_preload, radix_tree_preloads) = { 0, };
EXPORT_PER_CPU_SYMBOL_GPL(radix_tree_preloads);
static inline struct radix_tree_node *entry_to_node(void *ptr)
{
......@@ -332,14 +330,14 @@ static __must_check int __radix_tree_preload(gfp_t gfp_mask, unsigned nr)
*/
gfp_mask &= ~__GFP_ACCOUNT;
preempt_disable();
local_lock(&radix_tree_preloads.lock);
rtp = this_cpu_ptr(&radix_tree_preloads);
while (rtp->nr < nr) {
preempt_enable();
local_unlock(&radix_tree_preloads.lock);
node = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
if (node == NULL)
goto out;
preempt_disable();
local_lock(&radix_tree_preloads.lock);
rtp = this_cpu_ptr(&radix_tree_preloads);
if (rtp->nr < nr) {
node->parent = rtp->nodes;
......@@ -381,7 +379,7 @@ int radix_tree_maybe_preload(gfp_t gfp_mask)
if (gfpflags_allow_blocking(gfp_mask))
return __radix_tree_preload(gfp_mask, RADIX_TREE_PRELOAD_SIZE);
/* Preloading doesn't help anything with this gfp mask, skip it */
preempt_disable();
local_lock(&radix_tree_preloads.lock);
return 0;
}
EXPORT_SYMBOL(radix_tree_maybe_preload);
......@@ -1470,7 +1468,7 @@ EXPORT_SYMBOL(radix_tree_tagged);
void idr_preload(gfp_t gfp_mask)
{
if (__radix_tree_preload(gfp_mask, IDR_PRELOAD_SIZE))
preempt_disable();
local_lock(&radix_tree_preloads.lock);
}
EXPORT_SYMBOL(idr_preload);
......
......@@ -2243,15 +2243,11 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
* would succeed.
*/
if (cc->order > 0 && last_migrated_pfn) {
int cpu;
unsigned long current_block_start =
block_start_pfn(cc->migrate_pfn, cc->order);
if (last_migrated_pfn < current_block_start) {
cpu = get_cpu();
lru_add_drain_cpu(cpu);
drain_local_pages(cc->zone);
put_cpu();
lru_add_drain_cpu_zone(cc->zone);
/* No more flushing until we migrate again */
last_migrated_pfn = 0;
}
......
......@@ -35,6 +35,7 @@
#include <linux/uio.h>
#include <linux/hugetlb.h>
#include <linux/page_idle.h>
#include <linux/local_lock.h>
#include "internal.h"
......@@ -44,14 +45,32 @@
/* How many pages do we try to swap or page in/out together? */
int page_cluster;
static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);
static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
static DEFINE_PER_CPU(struct pagevec, lru_deactivate_file_pvecs);
static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
static DEFINE_PER_CPU(struct pagevec, lru_lazyfree_pvecs);
/* Protecting only lru_rotate.pvec which requires disabling interrupts */
struct lru_rotate {
local_lock_t lock;
struct pagevec pvec;
};
static DEFINE_PER_CPU(struct lru_rotate, lru_rotate) = {
.lock = INIT_LOCAL_LOCK(lock),
};
/*
* The following struct pagevec are grouped together because they are protected
* by disabling preemption (and interrupts remain enabled).
*/
struct lru_pvecs {
local_lock_t lock;
struct pagevec lru_add;
struct pagevec lru_deactivate_file;
struct pagevec lru_deactivate;
struct pagevec lru_lazyfree;
#ifdef CONFIG_SMP
static DEFINE_PER_CPU(struct pagevec, activate_page_pvecs);
struct pagevec activate_page;
#endif
};
static DEFINE_PER_CPU(struct lru_pvecs, lru_pvecs) = {
.lock = INIT_LOCAL_LOCK(lock),
};
/*
* This path almost never happens for VM activity - pages are normally
......@@ -254,11 +273,11 @@ void rotate_reclaimable_page(struct page *page)
unsigned long flags;
get_page(page);
local_irq_save(flags);
pvec = this_cpu_ptr(&lru_rotate_pvecs);
local_lock_irqsave(&lru_rotate.lock, flags);
pvec = this_cpu_ptr(&lru_rotate.pvec);
if (!pagevec_add(pvec, page) || PageCompound(page))
pagevec_move_tail(pvec);
local_irq_restore(flags);
local_unlock_irqrestore(&lru_rotate.lock, flags);
}
}
......@@ -293,7 +312,7 @@ static void __activate_page(struct page *page, struct lruvec *lruvec,
#ifdef CONFIG_SMP
static void activate_page_drain(int cpu)
{
struct pagevec *pvec = &per_cpu(activate_page_pvecs, cpu);
struct pagevec *pvec = &per_cpu(lru_pvecs.activate_page, cpu);
if (pagevec_count(pvec))
pagevec_lru_move_fn(pvec, __activate_page, NULL);
......@@ -301,19 +320,21 @@ static void activate_page_drain(int cpu)
static bool need_activate_page_drain(int cpu)
{
return pagevec_count(&per_cpu(activate_page_pvecs, cpu)) != 0;
return pagevec_count(&per_cpu(lru_pvecs.activate_page, cpu)) != 0;
}
void activate_page(struct page *page)
{
page = compound_head(page);
if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
struct pagevec *pvec = &get_cpu_var(activate_page_pvecs);
struct pagevec *pvec;
local_lock(&lru_pvecs.lock);
pvec = this_cpu_ptr(&lru_pvecs.activate_page);
get_page(page);
if (!pagevec_add(pvec, page) || PageCompound(page))
pagevec_lru_move_fn(pvec, __activate_page, NULL);
put_cpu_var(activate_page_pvecs);
local_unlock(&lru_pvecs.lock);
}
}
......@@ -335,9 +356,12 @@ void activate_page(struct page *page)
static void __lru_cache_activate_page(struct page *page)
{
struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
struct pagevec *pvec;
int i;
local_lock(&lru_pvecs.lock);
pvec = this_cpu_ptr(&lru_pvecs.lru_add);
/*
* Search backwards on the optimistic assumption that the page being
* activated has just been added to this pagevec. Note that only
......@@ -357,7 +381,7 @@ static void __lru_cache_activate_page(struct page *page)
}
}
put_cpu_var(lru_add_pvec);
local_unlock(&lru_pvecs.lock);
}
/*
......@@ -385,7 +409,7 @@ void mark_page_accessed(struct page *page)
} else if (!PageActive(page)) {
/*
* If the page is on the LRU, queue it for activation via
* activate_page_pvecs. Otherwise, assume the page is on a
* lru_pvecs.activate_page. Otherwise, assume the page is on a
* pagevec, mark it active and it'll be moved to the active
* LRU on the next drain.
*/
......@@ -404,12 +428,14 @@ EXPORT_SYMBOL(mark_page_accessed);
static void __lru_cache_add(struct page *page)
{
struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
struct pagevec *pvec;
local_lock(&lru_pvecs.lock);
pvec = this_cpu_ptr(&lru_pvecs.lru_add);
get_page(page);
if (!pagevec_add(pvec, page) || PageCompound(page))
__pagevec_lru_add(pvec);
put_cpu_var(lru_add_pvec);
local_unlock(&lru_pvecs.lock);
}
/**
......@@ -593,30 +619,30 @@ static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
*/
void lru_add_drain_cpu(int cpu)
{
struct pagevec *pvec = &per_cpu(lru_add_pvec, cpu);
struct pagevec *pvec = &per_cpu(lru_pvecs.lru_add, cpu);
if (pagevec_count(pvec))
__pagevec_lru_add(pvec);
pvec = &per_cpu(lru_rotate_pvecs, cpu);
pvec = &per_cpu(lru_rotate.pvec, cpu);
if (pagevec_count(pvec)) {
unsigned long flags;
/* No harm done if a racing interrupt already did this */
local_irq_save(flags);
local_lock_irqsave(&lru_rotate.lock, flags);
pagevec_move_tail(pvec);
local_irq_restore(flags);
local_unlock_irqrestore(&lru_rotate.lock, flags);
}
pvec = &per_cpu(lru_deactivate_file_pvecs, cpu);
pvec = &per_cpu(lru_pvecs.lru_deactivate_file, cpu);
if (pagevec_count(pvec))
pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL);
pvec = &per_cpu(lru_deactivate_pvecs, cpu);
pvec = &per_cpu(lru_pvecs.lru_deactivate, cpu);
if (pagevec_count(pvec))
pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
pvec = &per_cpu(lru_lazyfree_pvecs, cpu);
pvec = &per_cpu(lru_pvecs.lru_lazyfree, cpu);
if (pagevec_count(pvec))
pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL);
......@@ -641,11 +667,14 @@ void deactivate_file_page(struct page *page)
return;
if (likely(get_page_unless_zero(page))) {
struct pagevec *pvec = &get_cpu_var(lru_deactivate_file_pvecs);
struct pagevec *pvec;
local_lock(&lru_pvecs.lock);
pvec = this_cpu_ptr(&lru_pvecs.lru_deactivate_file);
if (!pagevec_add(pvec, page) || PageCompound(page))
pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL);
put_cpu_var(lru_deactivate_file_pvecs);
local_unlock(&lru_pvecs.lock);
}
}
......@@ -660,12 +689,14 @@ void deactivate_file_page(struct page *page)
void deactivate_page(struct page *page)
{
if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
struct pagevec *pvec;
local_lock(&lru_pvecs.lock);
pvec = this_cpu_ptr(&lru_pvecs.lru_deactivate);
get_page(page);
if (!pagevec_add(pvec, page) || PageCompound(page))
pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
put_cpu_var(lru_deactivate_pvecs);
local_unlock(&lru_pvecs.lock);
}
}
......@@ -680,19 +711,30 @@ void mark_page_lazyfree(struct page *page)
{
if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
!PageSwapCache(page) && !PageUnevictable(page)) {
struct pagevec *pvec = &get_cpu_var(lru_lazyfree_pvecs);
struct pagevec *pvec;
local_lock(&lru_pvecs.lock);
pvec = this_cpu_ptr(&lru_pvecs.lru_lazyfree);
get_page(page);
if (!pagevec_add(pvec, page) || PageCompound(page))
pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL);
put_cpu_var(lru_lazyfree_pvecs);
local_unlock(&lru_pvecs.lock);
}
}
void lru_add_drain(void)
{
lru_add_drain_cpu(get_cpu());
put_cpu();
local_lock(&lru_pvecs.lock);
lru_add_drain_cpu(smp_processor_id());
local_unlock(&lru_pvecs.lock);
}
void lru_add_drain_cpu_zone(struct zone *zone)
{
local_lock(&lru_pvecs.lock);
lru_add_drain_cpu(smp_processor_id());
drain_local_pages(zone);
local_unlock(&lru_pvecs.lock);
}
#ifdef CONFIG_SMP
......@@ -743,11 +785,11 @@ void lru_add_drain_all(void)
for_each_online_cpu(cpu) {
struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
if (pagevec_count(&per_cpu(lru_add_pvec, cpu)) ||
pagevec_count(&per_cpu(lru_rotate_pvecs, cpu)) ||
pagevec_count(&per_cpu(lru_deactivate_file_pvecs, cpu)) ||
pagevec_count(&per_cpu(lru_deactivate_pvecs, cpu)) ||
pagevec_count(&per_cpu(lru_lazyfree_pvecs, cpu)) ||
if (pagevec_count(&per_cpu(lru_pvecs.lru_add, cpu)) ||
pagevec_count(&per_cpu(lru_rotate.pvec, cpu)) ||
pagevec_count(&per_cpu(lru_pvecs.lru_deactivate_file, cpu)) ||
pagevec_count(&per_cpu(lru_pvecs.lru_deactivate, cpu)) ||
pagevec_count(&per_cpu(lru_pvecs.lru_lazyfree, cpu)) ||
need_activate_page_drain(cpu)) {
INIT_WORK(work, lru_add_drain_per_cpu);
queue_work_on(cpu, mm_percpu_wq, work);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment