Commit 595b28fb authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'locking-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Thomas Gleixner:

 - Move futex code into kernel/futex/ and split up the kitchen sink into
   seperate files to make integration of sys_futex_waitv() simpler.

 - Add a new sys_futex_waitv() syscall which allows to wait on multiple
   futexes.

   The main use case is emulating Windows' WaitForMultipleObjects which
   allows Wine to improve the performance of Windows Games. Also native
   Linux games can benefit from this interface as this is a common wait
   pattern for this kind of applications.

 - Add context to ww_mutex_trylock() to provide a path for i915 to
   rework their eviction code step by step without making lockdep upset
   until the final steps of rework are completed. It's also useful for
   regulator and TTM to avoid dropping locks in the non contended path.

 - Lockdep and might_sleep() cleanups and improvements

 - A few improvements for the RT substitutions.

 - The usual small improvements and cleanups.

* tag 'locking-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
  locking: Remove spin_lock_flags() etc
  locking/rwsem: Fix comments about reader optimistic lock stealing conditions
  locking: Remove rcu_read_{,un}lock() for preempt_{dis,en}able()
  locking/rwsem: Disable preemption for spinning region
  docs: futex: Fix kernel-doc references
  futex: Fix PREEMPT_RT build
  futex2: Documentation: Document sys_futex_waitv() uAPI
  selftests: futex: Test sys_futex_waitv() wouldblock
  selftests: futex: Test sys_futex_waitv() timeout
  selftests: futex: Add sys_futex_waitv() test
  futex,arm: Wire up sys_futex_waitv()
  futex,x86: Wire up sys_futex_waitv()
  futex: Implement sys_futex_waitv()
  futex: Simplify double_lock_hb()
  futex: Split out wait/wake
  futex: Split out requeue
  futex: Rename mark_wake_futex()
  futex: Rename: match_futex()
  futex: Rename: hb_waiter_{inc,dec,pending}()
  futex: Split out PI futex
  ...
parents 91e1c99e f98a3dcc
......@@ -1352,7 +1352,19 @@ Mutex API reference
Futex API reference
===================
.. kernel-doc:: kernel/futex.c
.. kernel-doc:: kernel/futex/core.c
:internal:
.. kernel-doc:: kernel/futex/futex.h
:internal:
.. kernel-doc:: kernel/futex/pi.c
:internal:
.. kernel-doc:: kernel/futex/requeue.c
:internal:
.. kernel-doc:: kernel/futex/waitwake.c
:internal:
Further reading
......
......@@ -1396,7 +1396,19 @@ Riferimento per l'API dei Mutex
Riferimento per l'API dei Futex
===============================
.. kernel-doc:: kernel/futex.c
.. kernel-doc:: kernel/futex/core.c
:internal:
.. kernel-doc:: kernel/futex/futex.h
:internal:
.. kernel-doc:: kernel/futex/pi.c
:internal:
.. kernel-doc:: kernel/futex/requeue.c
:internal:
.. kernel-doc:: kernel/futex/waitwake.c
:internal:
Approfondimenti
......
.. SPDX-License-Identifier: GPL-2.0
======
futex2
======
:Author: André Almeida <andrealmeid@collabora.com>
futex, or fast user mutex, is a set of syscalls to allow userspace to create
performant synchronization mechanisms, such as mutexes, semaphores and
conditional variables in userspace. C standard libraries, like glibc, uses it
as a means to implement more high level interfaces like pthreads.
futex2 is a followup version of the initial futex syscall, designed to overcome
limitations of the original interface.
User API
========
``futex_waitv()``
-----------------
Wait on an array of futexes, wake on any::
futex_waitv(struct futex_waitv *waiters, unsigned int nr_futexes,
unsigned int flags, struct timespec *timeout, clockid_t clockid)
struct futex_waitv {
__u64 val;
__u64 uaddr;
__u32 flags;
__u32 __reserved;
};
Userspace sets an array of struct futex_waitv (up to a max of 128 entries),
using ``uaddr`` for the address to wait for, ``val`` for the expected value
and ``flags`` to specify the type (e.g. private) and size of futex.
``__reserved`` needs to be 0, but it can be used for future extension. The
pointer for the first item of the array is passed as ``waiters``. An invalid
address for ``waiters`` or for any ``uaddr`` returns ``-EFAULT``.
If userspace has 32-bit pointers, it should do a explicit cast to make sure
the upper bits are zeroed. ``uintptr_t`` does the tricky and it works for
both 32/64-bit pointers.
``nr_futexes`` specifies the size of the array. Numbers out of [1, 128]
interval will make the syscall return ``-EINVAL``.
The ``flags`` argument of the syscall needs to be 0, but it can be used for
future extension.
For each entry in ``waiters`` array, the current value at ``uaddr`` is compared
to ``val``. If it's different, the syscall undo all the work done so far and
return ``-EAGAIN``. If all tests and verifications succeeds, syscall waits until
one of the following happens:
- The timeout expires, returning ``-ETIMEOUT``.
- A signal was sent to the sleeping task, returning ``-ERESTARTSYS``.
- Some futex at the list was woken, returning the index of some waked futex.
An example of how to use the interface can be found at ``tools/testing/selftests/futex/functional/futex_waitv.c``.
Timeout
-------
``struct timespec *timeout`` argument is an optional argument that points to an
absolute timeout. You need to specify the type of clock being used at
``clockid`` argument. ``CLOCK_MONOTONIC`` and ``CLOCK_REALTIME`` are supported.
This syscall accepts only 64bit timespec structs.
Types of futex
--------------
A futex can be either private or shared. Private is used for processes that
shares the same memory space and the virtual address of the futex will be the
same for all processes. This allows for optimizations in the kernel. To use
private futexes, it's necessary to specify ``FUTEX_PRIVATE_FLAG`` in the futex
flag. For processes that doesn't share the same memory space and therefore can
have different virtual addresses for the same futex (using, for instance, a
file-backed shared memory) requires different internal mechanisms to be get
properly enqueued. This is the default behavior, and it works with both private
and shared futexes.
Futexes can be of different sizes: 8, 16, 32 or 64 bits. Currently, the only
supported one is 32 bit sized futex, and it need to be specified using
``FUTEX_32`` flag.
......@@ -28,6 +28,7 @@ place where this information is gathered.
media/index
sysfs-platform_profile
vduse
futex2
.. only:: subproject and html
......
......@@ -7737,6 +7737,7 @@ M: Ingo Molnar <mingo@redhat.com>
R: Peter Zijlstra <peterz@infradead.org>
R: Darren Hart <dvhart@infradead.org>
R: Davidlohr Bueso <dave@stgolabs.net>
R: André Almeida <andrealmeid@collabora.com>
L: linux-kernel@vger.kernel.org
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking/core
......@@ -7744,7 +7745,7 @@ F: Documentation/locking/*futex*
F: include/asm-generic/futex.h
F: include/linux/futex.h
F: include/uapi/linux/futex.h
F: kernel/futex.c
F: kernel/futex/*
F: tools/perf/bench/futex*
F: tools/testing/selftests/futex/
......
......@@ -462,3 +462,4 @@
446 common landlock_restrict_self sys_landlock_restrict_self
# 447 reserved for memfd_secret
448 common process_mrelease sys_process_mrelease
449 common futex_waitv sys_futex_waitv
......@@ -38,7 +38,7 @@
#define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5)
#define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800)
#define __NR_compat_syscalls 449
#define __NR_compat_syscalls 450
#endif
#define __ARCH_WANT_SYS_CLONE
......
......@@ -903,6 +903,8 @@ __SYSCALL(__NR_landlock_add_rule, sys_landlock_add_rule)
__SYSCALL(__NR_landlock_restrict_self, sys_landlock_restrict_self)
#define __NR_process_mrelease 448
__SYSCALL(__NR_process_mrelease, sys_process_mrelease)
#define __NR_futex_waitv 449
__SYSCALL(__NR_futex_waitv, sys_futex_waitv)
/*
* Please add new compat syscalls above this comment and update
......
......@@ -124,18 +124,13 @@ static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
__ticket_spin_unlock(lock);
}
static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
unsigned long flags)
{
arch_spin_lock(lock);
}
#define arch_spin_lock_flags arch_spin_lock_flags
#ifdef ASM_SUPPORTED
static __always_inline void
arch_read_lock_flags(arch_rwlock_t *lock, unsigned long flags)
arch_read_lock(arch_rwlock_t *lock)
{
unsigned long flags = 0;
__asm__ __volatile__ (
"tbit.nz p6, p0 = %1,%2\n"
"br.few 3f\n"
......@@ -157,13 +152,8 @@ arch_read_lock_flags(arch_rwlock_t *lock, unsigned long flags)
: "p6", "p7", "r2", "memory");
}
#define arch_read_lock_flags arch_read_lock_flags
#define arch_read_lock(lock) arch_read_lock_flags(lock, 0)
#else /* !ASM_SUPPORTED */
#define arch_read_lock_flags(rw, flags) arch_read_lock(rw)
#define arch_read_lock(rw) \
do { \
arch_rwlock_t *__read_lock_ptr = (rw); \
......@@ -186,8 +176,10 @@ do { \
#ifdef ASM_SUPPORTED
static __always_inline void
arch_write_lock_flags(arch_rwlock_t *lock, unsigned long flags)
arch_write_lock(arch_rwlock_t *lock)
{
unsigned long flags = 0;
__asm__ __volatile__ (
"tbit.nz p6, p0 = %1, %2\n"
"mov ar.ccv = r0\n"
......@@ -210,9 +202,6 @@ arch_write_lock_flags(arch_rwlock_t *lock, unsigned long flags)
: "ar.ccv", "p6", "p7", "r2", "r29", "memory");
}
#define arch_write_lock_flags arch_write_lock_flags
#define arch_write_lock(rw) arch_write_lock_flags(rw, 0)
#define arch_write_trylock(rw) \
({ \
register long result; \
......
......@@ -19,9 +19,6 @@
#include <asm/qrwlock.h>
#define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
#define arch_write_lock_flags(lock, flags) arch_write_lock(lock)
#define arch_spin_relax(lock) cpu_relax()
#define arch_read_relax(lock) cpu_relax()
#define arch_write_relax(lock) cpu_relax()
......
......@@ -23,21 +23,6 @@ static inline void arch_spin_lock(arch_spinlock_t *x)
continue;
}
static inline void arch_spin_lock_flags(arch_spinlock_t *x,
unsigned long flags)
{
volatile unsigned int *a;
a = __ldcw_align(x);
while (__ldcw(a) == 0)
while (*a == 0)
if (flags & PSW_SM_I) {
local_irq_enable();
local_irq_disable();
}
}
#define arch_spin_lock_flags arch_spin_lock_flags
static inline void arch_spin_unlock(arch_spinlock_t *x)
{
volatile unsigned int *a;
......
......@@ -123,27 +123,6 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
}
}
static inline
void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags)
{
unsigned long flags_dis;
while (1) {
if (likely(__arch_spin_trylock(lock) == 0))
break;
local_save_flags(flags_dis);
local_irq_restore(flags);
do {
HMT_low();
if (is_shared_processor())
splpar_spin_yield(lock);
} while (unlikely(lock->slock != 0));
HMT_medium();
local_irq_restore(flags_dis);
}
}
#define arch_spin_lock_flags arch_spin_lock_flags
static inline void arch_spin_unlock(arch_spinlock_t *lock)
{
__asm__ __volatile__("# arch_spin_unlock\n\t"
......
......@@ -67,14 +67,6 @@ static inline void arch_spin_lock(arch_spinlock_t *lp)
arch_spin_lock_wait(lp);
}
static inline void arch_spin_lock_flags(arch_spinlock_t *lp,
unsigned long flags)
{
if (!arch_spin_trylock_once(lp))
arch_spin_lock_wait(lp);
}
#define arch_spin_lock_flags arch_spin_lock_flags
static inline int arch_spin_trylock(arch_spinlock_t *lp)
{
if (!arch_spin_trylock_once(lp))
......
......@@ -453,3 +453,4 @@
446 i386 landlock_restrict_self sys_landlock_restrict_self
447 i386 memfd_secret sys_memfd_secret
448 i386 process_mrelease sys_process_mrelease
449 i386 futex_waitv sys_futex_waitv
......@@ -370,6 +370,7 @@
446 common landlock_restrict_self sys_landlock_restrict_self
447 common memfd_secret sys_memfd_secret
448 common process_mrelease sys_process_mrelease
449 common futex_waitv sys_futex_waitv
#
# Due to a historical design error, certain syscalls are numbered differently
......
......@@ -248,7 +248,7 @@ static inline int modeset_lock(struct drm_modeset_lock *lock,
if (ctx->trylock_only) {
lockdep_assert_held(&ctx->ww_ctx);
if (!ww_mutex_trylock(&lock->mutex))
if (!ww_mutex_trylock(&lock->mutex, NULL))
return -EBUSY;
else
return 0;
......
......@@ -145,7 +145,7 @@ static inline int regulator_lock_nested(struct regulator_dev *rdev,
mutex_lock(&regulator_nesting_mutex);
if (ww_ctx || !ww_mutex_trylock(&rdev->mutex)) {
if (!ww_mutex_trylock(&rdev->mutex, ww_ctx)) {
if (rdev->mutex_owner == current)
rdev->ref_cnt++;
else
......
......@@ -47,8 +47,6 @@ extern int debug_locks_off(void);
# define locking_selftest() do { } while (0)
#endif
struct task_struct;
#ifdef CONFIG_LOCKDEP
extern void debug_show_all_locks(void);
extern void debug_show_held_locks(struct task_struct *task);
......
......@@ -173,7 +173,7 @@ static inline int dma_resv_lock_slow_interruptible(struct dma_resv *obj,
*/
static inline bool __must_check dma_resv_trylock(struct dma_resv *obj)
{
return ww_mutex_trylock(&obj->lock);
return ww_mutex_trylock(&obj->lock, NULL);
}
/**
......
......@@ -111,8 +111,8 @@ static __always_inline void might_resched(void)
#endif /* CONFIG_PREEMPT_* */
#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
extern void ___might_sleep(const char *file, int line, int preempt_offset);
extern void __might_sleep(const char *file, int line, int preempt_offset);
extern void __might_resched(const char *file, int line, unsigned int offsets);
extern void __might_sleep(const char *file, int line);
extern void __cant_sleep(const char *file, int line, int preempt_offset);
extern void __cant_migrate(const char *file, int line);
......@@ -129,7 +129,7 @@ extern void __cant_migrate(const char *file, int line);
* supposed to.
*/
# define might_sleep() \
do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
do { __might_sleep(__FILE__, __LINE__); might_resched(); } while (0)
/**
* cant_sleep - annotation for functions that cannot sleep
*
......@@ -168,10 +168,9 @@ extern void __cant_migrate(const char *file, int line);
*/
# define non_block_end() WARN_ON(current->non_block_count-- == 0)
#else
static inline void ___might_sleep(const char *file, int line,
int preempt_offset) { }
static inline void __might_sleep(const char *file, int line,
int preempt_offset) { }
static inline void __might_resched(const char *file, int line,
unsigned int offsets) { }
static inline void __might_sleep(const char *file, int line) { }
# define might_sleep() do { might_resched(); } while (0)
# define cant_sleep() do { } while (0)
# define cant_migrate() do { } while (0)
......
......@@ -481,23 +481,6 @@ do { \
#endif /* CONFIG_LOCK_STAT */
#ifdef CONFIG_LOCKDEP
/*
* On lockdep we dont want the hand-coded irq-enable of
* _raw_*_lock_flags() code, because lockdep assumes
* that interrupts are not re-enabled during lock-acquire:
*/
#define LOCK_CONTENDED_FLAGS(_lock, try, lock, lockfl, flags) \
LOCK_CONTENDED((_lock), (try), (lock))
#else /* CONFIG_LOCKDEP */
#define LOCK_CONTENDED_FLAGS(_lock, try, lock, lockfl, flags) \
lockfl((_lock), (flags))
#endif /* CONFIG_LOCKDEP */
#ifdef CONFIG_PROVE_LOCKING
extern void print_irqtrace_events(struct task_struct *curr);
#else
......
......@@ -21,7 +21,7 @@ enum lockdep_wait_type {
LD_WAIT_SPIN, /* spin loops, raw_spinlock_t etc.. */
#ifdef CONFIG_PROVE_RAW_LOCK_NESTING
LD_WAIT_CONFIG, /* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
LD_WAIT_CONFIG, /* preemptible in PREEMPT_RT, spinlock_t etc.. */
#else
LD_WAIT_CONFIG = LD_WAIT_SPIN,
#endif
......
......@@ -124,6 +124,7 @@
#if !defined(CONFIG_PREEMPT_RT)
#define PREEMPT_LOCK_OFFSET PREEMPT_DISABLE_OFFSET
#else
/* Locks on RT do not disable preemption */
#define PREEMPT_LOCK_OFFSET 0
#endif
......
......@@ -30,31 +30,16 @@ do { \
#ifdef CONFIG_DEBUG_SPINLOCK
extern void do_raw_read_lock(rwlock_t *lock) __acquires(lock);
#define do_raw_read_lock_flags(lock, flags) do_raw_read_lock(lock)
extern int do_raw_read_trylock(rwlock_t *lock);
extern void do_raw_read_unlock(rwlock_t *lock) __releases(lock);
extern void do_raw_write_lock(rwlock_t *lock) __acquires(lock);
#define do_raw_write_lock_flags(lock, flags) do_raw_write_lock(lock)
extern int do_raw_write_trylock(rwlock_t *lock);
extern void do_raw_write_unlock(rwlock_t *lock) __releases(lock);
#else
#ifndef arch_read_lock_flags
# define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
#endif
#ifndef arch_write_lock_flags
# define arch_write_lock_flags(lock, flags) arch_write_lock(lock)
#endif
# define do_raw_read_lock(rwlock) do {__acquire(lock); arch_read_lock(&(rwlock)->raw_lock); } while (0)
# define do_raw_read_lock_flags(lock, flags) \
do {__acquire(lock); arch_read_lock_flags(&(lock)->raw_lock, *(flags)); } while (0)
# define do_raw_read_trylock(rwlock) arch_read_trylock(&(rwlock)->raw_lock)
# define do_raw_read_unlock(rwlock) do {arch_read_unlock(&(rwlock)->raw_lock); __release(lock); } while (0)
# define do_raw_write_lock(rwlock) do {__acquire(lock); arch_write_lock(&(rwlock)->raw_lock); } while (0)
# define do_raw_write_lock_flags(lock, flags) \
do {__acquire(lock); arch_write_lock_flags(&(lock)->raw_lock, *(flags)); } while (0)
# define do_raw_write_trylock(rwlock) arch_write_trylock(&(rwlock)->raw_lock)
# define do_raw_write_unlock(rwlock) do {arch_write_unlock(&(rwlock)->raw_lock); __release(lock); } while (0)
#endif
......
......@@ -157,8 +157,7 @@ static inline unsigned long __raw_read_lock_irqsave(rwlock_t *lock)
local_irq_save(flags);
preempt_disable();
rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
LOCK_CONTENDED_FLAGS(lock, do_raw_read_trylock, do_raw_read_lock,
do_raw_read_lock_flags, &flags);
LOCK_CONTENDED(lock, do_raw_read_trylock, do_raw_read_lock);
return flags;
}
......@@ -184,8 +183,7 @@ static inline unsigned long __raw_write_lock_irqsave(rwlock_t *lock)
local_irq_save(flags);
preempt_disable();
rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
LOCK_CONTENDED_FLAGS(lock, do_raw_write_trylock, do_raw_write_lock,
do_raw_write_lock_flags, &flags);
LOCK_CONTENDED(lock, do_raw_write_trylock, do_raw_write_lock);
return flags;
}
......
......@@ -2037,7 +2037,7 @@ static inline int _cond_resched(void) { return 0; }
#endif /* !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC) */
#define cond_resched() ({ \
___might_sleep(__FILE__, __LINE__, 0); \
__might_resched(__FILE__, __LINE__, 0); \
_cond_resched(); \
})
......@@ -2045,18 +2045,37 @@ extern int __cond_resched_lock(spinlock_t *lock);
extern int __cond_resched_rwlock_read(rwlock_t *lock);
extern int __cond_resched_rwlock_write(rwlock_t *lock);
#define MIGHT_RESCHED_RCU_SHIFT 8
#define MIGHT_RESCHED_PREEMPT_MASK ((1U << MIGHT_RESCHED_RCU_SHIFT) - 1)
#ifndef CONFIG_PREEMPT_RT
/*
* Non RT kernels have an elevated preempt count due to the held lock,
* but are not allowed to be inside a RCU read side critical section
*/
# define PREEMPT_LOCK_RESCHED_OFFSETS PREEMPT_LOCK_OFFSET
#else
/*
* spin/rw_lock() on RT implies rcu_read_lock(). The might_sleep() check in
* cond_resched*lock() has to take that into account because it checks for
* preempt_count() and rcu_preempt_depth().
*/
# define PREEMPT_LOCK_RESCHED_OFFSETS \
(PREEMPT_LOCK_OFFSET + (1U << MIGHT_RESCHED_RCU_SHIFT))
#endif
#define cond_resched_lock(lock) ({ \
___might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET);\
__might_resched(__FILE__, __LINE__, PREEMPT_LOCK_RESCHED_OFFSETS); \
__cond_resched_lock(lock); \
})
#define cond_resched_rwlock_read(lock) ({ \
__might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET); \
__might_resched(__FILE__, __LINE__, PREEMPT_LOCK_RESCHED_OFFSETS); \
__cond_resched_rwlock_read(lock); \
})
#define cond_resched_rwlock_write(lock) ({ \
__might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET); \
__might_resched(__FILE__, __LINE__, PREEMPT_LOCK_RESCHED_OFFSETS); \
__cond_resched_rwlock_write(lock); \
})
......
......@@ -177,7 +177,6 @@ do { \
#ifdef CONFIG_DEBUG_SPINLOCK
extern void do_raw_spin_lock(raw_spinlock_t *lock) __acquires(lock);
#define do_raw_spin_lock_flags(lock, flags) do_raw_spin_lock(lock)
extern int do_raw_spin_trylock(raw_spinlock_t *lock);
extern void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock);
#else
......@@ -188,18 +187,6 @@ static inline void do_raw_spin_lock(raw_spinlock_t *lock) __acquires(lock)
mmiowb_spin_lock();
}
#ifndef arch_spin_lock_flags
#define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
#endif
static inline void
do_raw_spin_lock_flags(raw_spinlock_t *lock, unsigned long *flags) __acquires(lock)
{
__acquire(lock);
arch_spin_lock_flags(&lock->raw_lock, *flags);
mmiowb_spin_lock();
}
static inline int do_raw_spin_trylock(raw_spinlock_t *lock)
{
int ret = arch_spin_trylock(&(lock)->raw_lock);
......
......@@ -108,16 +108,7 @@ static inline unsigned long __raw_spin_lock_irqsave(raw_spinlock_t *lock)
local_irq_save(flags);
preempt_disable();
spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
/*
* On lockdep we dont want the hand-coded irq-enable of
* do_raw_spin_lock_flags() code, because lockdep assumes
* that interrupts are not re-enabled during lock-acquire:
*/
#ifdef CONFIG_LOCKDEP
LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
#else
do_raw_spin_lock_flags(lock, &flags);
#endif
return flags;
}
......
......@@ -62,7 +62,6 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
#define arch_spin_is_locked(lock) ((void)(lock), 0)
/* for sched/core.c and kernel_lock.c: */
# define arch_spin_lock(lock) do { barrier(); (void)(lock); } while (0)
# define arch_spin_lock_flags(lock, flags) do { barrier(); (void)(lock); } while (0)
# define arch_spin_unlock(lock) do { barrier(); (void)(lock); } while (0)
# define arch_spin_trylock(lock) ({ barrier(); (void)(lock); 1; })
#endif /* DEBUG_SPINLOCK */
......
......@@ -58,6 +58,7 @@ struct mq_attr;
struct compat_stat;
struct old_timeval32;
struct robust_list_head;
struct futex_waitv;
struct getcpu_cache;
struct old_linux_dirent;
struct perf_event_attr;
......@@ -610,7 +611,7 @@ asmlinkage long sys_waitid(int which, pid_t pid,
asmlinkage long sys_set_tid_address(int __user *tidptr);
asmlinkage long sys_unshare(unsigned long unshare_flags);
/* kernel/futex.c */
/* kernel/futex/syscalls.c */
asmlinkage long sys_futex(u32 __user *uaddr, int op, u32 val,
const struct __kernel_timespec __user *utime,
u32 __user *uaddr2, u32 val3);
......@@ -623,6 +624,10 @@ asmlinkage long sys_get_robust_list(int pid,
asmlinkage long sys_set_robust_list(struct robust_list_head __user *head,
size_t len);
asmlinkage long sys_futex_waitv(struct futex_waitv *waiters,
unsigned int nr_futexes, unsigned int flags,
struct __kernel_timespec __user *timeout, clockid_t clockid);
/* kernel/hrtimer.c */
asmlinkage long sys_nanosleep(struct __kernel_timespec __user *rqtp,
struct __kernel_timespec __user *rmtp);
......
......@@ -28,12 +28,10 @@
#ifndef CONFIG_PREEMPT_RT
#define WW_MUTEX_BASE mutex
#define ww_mutex_base_init(l,n,k) __mutex_init(l,n,k)
#define ww_mutex_base_trylock(l) mutex_trylock(l)
#define ww_mutex_base_is_locked(b) mutex_is_locked((b))
#else
#define WW_MUTEX_BASE rt_mutex
#define ww_mutex_base_init(l,n,k) __rt_mutex_init(l,n,k)
#define ww_mutex_base_trylock(l) rt_mutex_trylock(l)
#define ww_mutex_base_is_locked(b) rt_mutex_base_is_locked(&(b)->rtmutex)
#endif
......@@ -339,17 +337,8 @@ ww_mutex_lock_slow_interruptible(struct ww_mutex *lock,
extern void ww_mutex_unlock(struct ww_mutex *lock);
/**
* ww_mutex_trylock - tries to acquire the w/w mutex without acquire context
* @lock: mutex to lock
*
* Trylocks a mutex without acquire context, so no deadlock detection is
* possible. Returns 1 if the mutex has been acquired successfully, 0 otherwise.
*/
static inline int __must_check ww_mutex_trylock(struct ww_mutex *lock)
{
return ww_mutex_base_trylock(&lock->base);
}
extern int __must_check ww_mutex_trylock(struct ww_mutex *lock,
struct ww_acquire_ctx *ctx);
/***
* ww_mutex_destroy - mark a w/w mutex unusable
......
......@@ -880,8 +880,11 @@ __SYSCALL(__NR_memfd_secret, sys_memfd_secret)
#define __NR_process_mrelease 448
__SYSCALL(__NR_process_mrelease, sys_process_mrelease)
#define __NR_futex_waitv 449
__SYSCALL(__NR_futex_waitv, sys_futex_waitv)
#undef __NR_syscalls
#define __NR_syscalls 449
#define __NR_syscalls 450
/*
* 32 bit systems traditionally used different
......
......@@ -43,6 +43,31 @@
#define FUTEX_CMP_REQUEUE_PI_PRIVATE (FUTEX_CMP_REQUEUE_PI | \
FUTEX_PRIVATE_FLAG)
/*
* Flags to specify the bit length of the futex word for futex2 syscalls.
* Currently, only 32 is supported.
*/
#define FUTEX_32 2
/*
* Max numbers of elements in a futex_waitv array
*/
#define FUTEX_WAITV_MAX 128
/**
* struct futex_waitv - A waiter for vectorized wait
* @val: Expected value at uaddr
* @uaddr: User address to wait on
* @flags: Flags for this waiter
* @__reserved: Reserved member to preserve data alignment. Should be 0.
*/
struct futex_waitv {
__u64 val;
__u64 uaddr;
__u32 flags;
__u32 __reserved;
};
/*
* Support for robust futexes: the kernel cleans up held futexes at
* thread exit time.
......
......@@ -59,7 +59,7 @@ obj-$(CONFIG_FREEZER) += freezer.o
obj-$(CONFIG_PROFILING) += profile.o
obj-$(CONFIG_STACKTRACE) += stacktrace.o
obj-y += time/
obj-$(CONFIG_FUTEX) += futex.o
obj-$(CONFIG_FUTEX) += futex/
obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
obj-$(CONFIG_SMP) += smp.o
ifneq ($(CONFIG_SMP),y)
......
This source diff could not be displayed because it is too large. You can view the blob instead.
# SPDX-License-Identifier: GPL-2.0
obj-y += core.o syscalls.o pi.o requeue.o waitwake.o
This diff is collapsed.
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _FUTEX_H
#define _FUTEX_H
#include <linux/futex.h>
#include <linux/sched/wake_q.h>
#ifdef CONFIG_PREEMPT_RT
#include <linux/rcuwait.h>
#endif
#include <asm/futex.h>
/*
* Futex flags used to encode options to functions and preserve them across
* restarts.
*/
#ifdef CONFIG_MMU
# define FLAGS_SHARED 0x01
#else
/*
* NOMMU does not have per process address space. Let the compiler optimize
* code away.
*/
# define FLAGS_SHARED 0x00
#endif
#define FLAGS_CLOCKRT 0x02
#define FLAGS_HAS_TIMEOUT 0x04
#ifdef CONFIG_HAVE_FUTEX_CMPXCHG
#define futex_cmpxchg_enabled 1
#else
extern int __read_mostly futex_cmpxchg_enabled;
#endif
#ifdef CONFIG_FAIL_FUTEX
extern bool should_fail_futex(bool fshared);
#else
static inline bool should_fail_futex(bool fshared)
{
return false;
}
#endif
/*
* Hash buckets are shared by all the futex_keys that hash to the same
* location. Each key may have multiple futex_q structures, one for each task
* waiting on a futex.
*/
struct futex_hash_bucket {
atomic_t waiters;
spinlock_t lock;
struct plist_head chain;
} ____cacheline_aligned_in_smp;
/*
* Priority Inheritance state:
*/
struct futex_pi_state {
/*
* list of 'owned' pi_state instances - these have to be
* cleaned up in do_exit() if the task exits prematurely:
*/
struct list_head list;
/*
* The PI object:
*/
struct rt_mutex_base pi_mutex;
struct task_struct *owner;
refcount_t refcount;
union futex_key key;
} __randomize_layout;
/**
* struct futex_q - The hashed futex queue entry, one per waiting task
* @list: priority-sorted list of tasks waiting on this futex
* @task: the task waiting on the futex
* @lock_ptr: the hash bucket lock
* @key: the key the futex is hashed on
* @pi_state: optional priority inheritance state
* @rt_waiter: rt_waiter storage for use with requeue_pi
* @requeue_pi_key: the requeue_pi target futex key
* @bitset: bitset for the optional bitmasked wakeup
* @requeue_state: State field for futex_requeue_pi()
* @requeue_wait: RCU wait for futex_requeue_pi() (RT only)
*
* We use this hashed waitqueue, instead of a normal wait_queue_entry_t, so
* we can wake only the relevant ones (hashed queues may be shared).
*
* A futex_q has a woken state, just like tasks have TASK_RUNNING.
* It is considered woken when plist_node_empty(&q->list) || q->lock_ptr == 0.
* The order of wakeup is always to make the first condition true, then
* the second.
*
* PI futexes are typically woken before they are removed from the hash list via
* the rt_mutex code. See futex_unqueue_pi().
*/
struct futex_q {
struct plist_node list;
struct task_struct *task;
spinlock_t *lock_ptr;
union futex_key key;
struct futex_pi_state *pi_state;
struct rt_mutex_waiter *rt_waiter;
union futex_key *requeue_pi_key;
u32 bitset;
atomic_t requeue_state;
#ifdef CONFIG_PREEMPT_RT
struct rcuwait requeue_wait;
#endif
} __randomize_layout;
extern const struct futex_q futex_q_init;
enum futex_access {
FUTEX_READ,
FUTEX_WRITE
};
extern int get_futex_key(u32 __user *uaddr, bool fshared, union futex_key *key,
enum futex_access rw);
extern struct hrtimer_sleeper *
futex_setup_timer(ktime_t *time, struct hrtimer_sleeper *timeout,
int flags, u64 range_ns);
extern struct futex_hash_bucket *futex_hash(union futex_key *key);
/**
* futex_match - Check whether two futex keys are equal
* @key1: Pointer to key1
* @key2: Pointer to key2
*
* Return 1 if two futex_keys are equal, 0 otherwise.
*/
static inline int futex_match(union futex_key *key1, union futex_key *key2)
{
return (key1 && key2
&& key1->both.word == key2->both.word
&& key1->both.ptr == key2->both.ptr
&& key1->both.offset == key2->both.offset);
}
extern int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags,
struct futex_q *q, struct futex_hash_bucket **hb);
extern void futex_wait_queue(struct futex_hash_bucket *hb, struct futex_q *q,
struct hrtimer_sleeper *timeout);
extern void futex_wake_mark(struct wake_q_head *wake_q, struct futex_q *q);
extern int fault_in_user_writeable(u32 __user *uaddr);
extern int futex_cmpxchg_value_locked(u32 *curval, u32 __user *uaddr, u32 uval, u32 newval);
extern int futex_get_value_locked(u32 *dest, u32 __user *from);
extern struct futex_q *futex_top_waiter(struct futex_hash_bucket *hb, union futex_key *key);
extern void __futex_unqueue(struct futex_q *q);
extern void __futex_queue(struct futex_q *q, struct futex_hash_bucket *hb);
extern int futex_unqueue(struct futex_q *q);
/**
* futex_queue() - Enqueue the futex_q on the futex_hash_bucket
* @q: The futex_q to enqueue
* @hb: The destination hash bucket
*
* The hb->lock must be held by the caller, and is released here. A call to
* futex_queue() is typically paired with exactly one call to futex_unqueue(). The
* exceptions involve the PI related operations, which may use futex_unqueue_pi()
* or nothing if the unqueue is done as part of the wake process and the unqueue
* state is implicit in the state of woken task (see futex_wait_requeue_pi() for
* an example).
*/
static inline void futex_queue(struct futex_q *q, struct futex_hash_bucket *hb)
__releases(&hb->lock)
{
__futex_queue(q, hb);
spin_unlock(&hb->lock);
}
extern void futex_unqueue_pi(struct futex_q *q);
extern void wait_for_owner_exiting(int ret, struct task_struct *exiting);
/*
* Reflects a new waiter being added to the waitqueue.
*/
static inline void futex_hb_waiters_inc(struct futex_hash_bucket *hb)
{
#ifdef CONFIG_SMP
atomic_inc(&hb->waiters);
/*
* Full barrier (A), see the ordering comment above.
*/
smp_mb__after_atomic();
#endif
}
/*
* Reflects a waiter being removed from the waitqueue by wakeup
* paths.
*/
static inline void futex_hb_waiters_dec(struct futex_hash_bucket *hb)
{
#ifdef CONFIG_SMP
atomic_dec(&hb->waiters);
#endif
}
static inline int futex_hb_waiters_pending(struct futex_hash_bucket *hb)
{
#ifdef CONFIG_SMP
/*
* Full barrier (B), see the ordering comment above.
*/
smp_mb();
return atomic_read(&hb->waiters);
#else
return 1;
#endif
}
extern struct futex_hash_bucket *futex_q_lock(struct futex_q *q);
extern void futex_q_unlock(struct futex_hash_bucket *hb);
extern int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
union futex_key *key,
struct futex_pi_state **ps,
struct task_struct *task,
struct task_struct **exiting,
int set_waiters);
extern int refill_pi_state_cache(void);
extern void get_pi_state(struct futex_pi_state *pi_state);
extern void put_pi_state(struct futex_pi_state *pi_state);
extern int fixup_pi_owner(u32 __user *uaddr, struct futex_q *q, int locked);
/*
* Express the locking dependencies for lockdep:
*/
static inline void
double_lock_hb(struct futex_hash_bucket *hb1, struct futex_hash_bucket *hb2)
{
if (hb1 > hb2)
swap(hb1, hb2);
spin_lock(&hb1->lock);
if (hb1 != hb2)
spin_lock_nested(&hb2->lock, SINGLE_DEPTH_NESTING);
}
static inline void
double_unlock_hb(struct futex_hash_bucket *hb1, struct futex_hash_bucket *hb2)
{
spin_unlock(&hb1->lock);
if (hb1 != hb2)
spin_unlock(&hb2->lock);
}
/* syscalls */
extern int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags, u32
val, ktime_t *abs_time, u32 bitset, u32 __user
*uaddr2);
extern int futex_requeue(u32 __user *uaddr1, unsigned int flags,
u32 __user *uaddr2, int nr_wake, int nr_requeue,
u32 *cmpval, int requeue_pi);
extern int futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
ktime_t *abs_time, u32 bitset);
/**
* struct futex_vector - Auxiliary struct for futex_waitv()
* @w: Userspace provided data
* @q: Kernel side data
*
* Struct used to build an array with all data need for futex_waitv()
*/
struct futex_vector {
struct futex_waitv w;
struct futex_q q;
};
extern int futex_wait_multiple(struct futex_vector *vs, unsigned int count,
struct hrtimer_sleeper *to);
extern int futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset);
extern int futex_wake_op(u32 __user *uaddr1, unsigned int flags,
u32 __user *uaddr2, int nr_wake, int nr_wake2, int op);
extern int futex_unlock_pi(u32 __user *uaddr, unsigned int flags);
extern int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int trylock);
#endif /* _FUTEX_H */
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
......@@ -4671,7 +4671,7 @@ print_lock_invalid_wait_context(struct task_struct *curr,
/*
* Verify the wait_type context.
*
* This check validates we takes locks in the right wait-type order; that is it
* This check validates we take locks in the right wait-type order; that is it
* ensures that we do not take mutexes inside spinlocks and do not attempt to
* acquire spinlocks inside raw_spinlocks and the sort.
*
......@@ -5366,7 +5366,7 @@ int __lock_is_held(const struct lockdep_map *lock, int read)
struct held_lock *hlock = curr->held_locks + i;
if (match_held_lock(hlock, lock)) {
if (read == -1 || hlock->read == read)
if (read == -1 || !!hlock->read == read)
return LOCK_STATE_HELD;
return LOCK_STATE_NOT_HELD;
......
......@@ -94,6 +94,9 @@ static inline unsigned long __owner_flags(unsigned long owner)
return owner & MUTEX_FLAGS;
}
/*
* Returns: __mutex_owner(lock) on failure or NULL on success.
*/
static inline struct task_struct *__mutex_trylock_common(struct mutex *lock, bool handoff)
{
unsigned long owner, curr = (unsigned long)current;
......@@ -348,13 +351,16 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
{
bool ret = true;
rcu_read_lock();
lockdep_assert_preemption_disabled();
while (__mutex_owner(lock) == owner) {
/*
* Ensure we emit the owner->on_cpu, dereference _after_
* checking lock->owner still matches owner. If that fails,
* owner might point to freed memory. If it still matches,
* the rcu_read_lock() ensures the memory stays valid.
* checking lock->owner still matches owner. And we already
* disabled preemption which is equal to the RCU read-side
* crital section in optimistic spinning code. Thus the
* task_strcut structure won't go away during the spinning
* period
*/
barrier();
......@@ -374,7 +380,6 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
cpu_relax();
}
rcu_read_unlock();
return ret;
}
......@@ -387,19 +392,25 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
struct task_struct *owner;
int retval = 1;
lockdep_assert_preemption_disabled();
if (need_resched())
return 0;
rcu_read_lock();
/*
* We already disabled preemption which is equal to the RCU read-side
* crital section in optimistic spinning code. Thus the task_strcut
* structure won't go away during the spinning period.
*/
owner = __mutex_owner(lock);
/*
* As lock holder preemption issue, we both skip spinning if task is not
* on cpu or its cpu is preempted
*/
if (owner)
retval = owner->on_cpu && !vcpu_is_preempted(task_cpu(owner));
rcu_read_unlock();
/*
* If lock->owner is not set, the mutex has been released. Return true
......@@ -736,6 +747,44 @@ __ww_mutex_lock(struct mutex *lock, unsigned int state, unsigned int subclass,
return __mutex_lock_common(lock, state, subclass, NULL, ip, ww_ctx, true);
}
/**
* ww_mutex_trylock - tries to acquire the w/w mutex with optional acquire context
* @ww: mutex to lock
* @ww_ctx: optional w/w acquire context
*
* Trylocks a mutex with the optional acquire context; no deadlock detection is
* possible. Returns 1 if the mutex has been acquired successfully, 0 otherwise.
*
* Unlike ww_mutex_lock, no deadlock handling is performed. However, if a @ctx is
* specified, -EALREADY handling may happen in calls to ww_mutex_trylock.
*
* A mutex acquired with this function must be released with ww_mutex_unlock.
*/
int ww_mutex_trylock(struct ww_mutex *ww, struct ww_acquire_ctx *ww_ctx)
{
if (!ww_ctx)
return mutex_trylock(&ww->base);
MUTEX_WARN_ON(ww->base.magic != &ww->base);
/*
* Reset the wounded flag after a kill. No other process can
* race and wound us here, since they can't have a valid owner
* pointer if we don't have any locks held.
*/
if (ww_ctx->acquired == 0)
ww_ctx->wounded = 0;
if (__mutex_trylock(&ww->base)) {
ww_mutex_set_context_fastpath(ww, ww_ctx);
mutex_acquire_nest(&ww->base.dep_map, 0, 1, &ww_ctx->dep_map, _RET_IP_);
return 1;
}
return 0;
}
EXPORT_SYMBOL(ww_mutex_trylock);
#ifdef CONFIG_DEBUG_LOCK_ALLOC
void __sched
mutex_lock_nested(struct mutex *lock, unsigned int subclass)
......
......@@ -446,19 +446,26 @@ static __always_inline void rt_mutex_adjust_prio(struct task_struct *p)
}
/* RT mutex specific wake_q wrappers */
static __always_inline void rt_mutex_wake_q_add(struct rt_wake_q_head *wqh,
struct rt_mutex_waiter *w)
static __always_inline void rt_mutex_wake_q_add_task(struct rt_wake_q_head *wqh,
struct task_struct *task,
unsigned int wake_state)
{
if (IS_ENABLED(CONFIG_PREEMPT_RT) && w->wake_state != TASK_NORMAL) {
if (IS_ENABLED(CONFIG_PREEMPT_RT) && wake_state == TASK_RTLOCK_WAIT) {
if (IS_ENABLED(CONFIG_PROVE_LOCKING))
WARN_ON_ONCE(wqh->rtlock_task);
get_task_struct(w->task);
wqh->rtlock_task = w->task;
get_task_struct(task);
wqh->rtlock_task = task;
} else {
wake_q_add(&wqh->head, w->task);
wake_q_add(&wqh->head, task);
}
}
static __always_inline void rt_mutex_wake_q_add(struct rt_wake_q_head *wqh,
struct rt_mutex_waiter *w)
{
rt_mutex_wake_q_add_task(wqh, w->task, w->wake_state);
}
static __always_inline void rt_mutex_wake_up_q(struct rt_wake_q_head *wqh)
{
if (IS_ENABLED(CONFIG_PREEMPT_RT) && wqh->rtlock_task) {
......
......@@ -59,8 +59,7 @@ static __always_inline int rwbase_read_trylock(struct rwbase_rt *rwb)
* set.
*/
for (r = atomic_read(&rwb->readers); r < 0;) {
/* Fully-ordered if cmpxchg() succeeds, provides ACQUIRE */
if (likely(atomic_try_cmpxchg(&rwb->readers, &r, r + 1)))
if (likely(atomic_try_cmpxchg_acquire(&rwb->readers, &r, r + 1)))
return 1;
}
return 0;
......@@ -148,6 +147,7 @@ static void __sched __rwbase_read_unlock(struct rwbase_rt *rwb,
{
struct rt_mutex_base *rtm = &rwb->rtmutex;
struct task_struct *owner;
DEFINE_RT_WAKE_Q(wqh);
raw_spin_lock_irq(&rtm->wait_lock);
/*
......@@ -158,9 +158,12 @@ static void __sched __rwbase_read_unlock(struct rwbase_rt *rwb,
*/
owner = rt_mutex_owner(rtm);
if (owner)
wake_up_state(owner, state);
rt_mutex_wake_q_add_task(&wqh, owner, state);
/* Pairs with the preempt_enable in rt_mutex_wake_up_q() */
preempt_disable();
raw_spin_unlock_irq(&rtm->wait_lock);
rt_mutex_wake_up_q(&wqh);
}
static __always_inline void rwbase_read_unlock(struct rwbase_rt *rwb,
......@@ -183,7 +186,7 @@ static inline void __rwbase_write_unlock(struct rwbase_rt *rwb, int bias,
/*
* _release() is needed in case that reader is in fast path, pairing
* with atomic_try_cmpxchg() in rwbase_read_trylock(), provides RELEASE
* with atomic_try_cmpxchg_acquire() in rwbase_read_trylock().
*/
(void)atomic_add_return_release(READER_BIAS - bias, &rwb->readers);
raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
......
......@@ -56,7 +56,6 @@
*
* A fast path reader optimistic lock stealing is supported when the rwsem
* is previously owned by a writer and the following conditions are met:
* - OSQ is empty
* - rwsem is not currently writer owned
* - the handoff isn't set.
*/
......@@ -485,7 +484,7 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
/*
* Limit # of readers that can be woken up per wakeup call.
*/
if (woken >= MAX_READERS_WAKEUP)
if (unlikely(woken >= MAX_READERS_WAKEUP))
break;
}
......@@ -577,6 +576,24 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
return true;
}
/*
* The rwsem_spin_on_owner() function returns the following 4 values
* depending on the lock owner state.
* OWNER_NULL : owner is currently NULL
* OWNER_WRITER: when owner changes and is a writer
* OWNER_READER: when owner changes and the new owner may be a reader.
* OWNER_NONSPINNABLE:
* when optimistic spinning has to stop because either the
* owner stops running, is unknown, or its timeslice has
* been used up.
*/
enum owner_state {
OWNER_NULL = 1 << 0,
OWNER_WRITER = 1 << 1,
OWNER_READER = 1 << 2,
OWNER_NONSPINNABLE = 1 << 3,
};
#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
/*
* Try to acquire write lock before the writer has been put on wait queue.
......@@ -617,7 +634,10 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
}
preempt_disable();
rcu_read_lock();
/*
* Disable preemption is equal to the RCU read-side crital section,
* thus the task_strcut structure won't go away.
*/
owner = rwsem_owner_flags(sem, &flags);
/*
* Don't check the read-owner as the entry may be stale.
......@@ -625,30 +645,12 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
if ((flags & RWSEM_NONSPINNABLE) ||
(owner && !(flags & RWSEM_READER_OWNED) && !owner_on_cpu(owner)))
ret = false;
rcu_read_unlock();
preempt_enable();
lockevent_cond_inc(rwsem_opt_fail, !ret);
return ret;
}
/*
* The rwsem_spin_on_owner() function returns the following 4 values
* depending on the lock owner state.
* OWNER_NULL : owner is currently NULL
* OWNER_WRITER: when owner changes and is a writer
* OWNER_READER: when owner changes and the new owner may be a reader.
* OWNER_NONSPINNABLE:
* when optimistic spinning has to stop because either the
* owner stops running, is unknown, or its timeslice has
* been used up.
*/
enum owner_state {
OWNER_NULL = 1 << 0,
OWNER_WRITER = 1 << 1,
OWNER_READER = 1 << 2,
OWNER_NONSPINNABLE = 1 << 3,
};
#define OWNER_SPINNABLE (OWNER_NULL | OWNER_WRITER | OWNER_READER)
static inline enum owner_state
......@@ -670,12 +672,13 @@ rwsem_spin_on_owner(struct rw_semaphore *sem)
unsigned long flags, new_flags;
enum owner_state state;
lockdep_assert_preemption_disabled();
owner = rwsem_owner_flags(sem, &flags);
state = rwsem_owner_state(owner, flags);
if (state != OWNER_WRITER)
return state;
rcu_read_lock();
for (;;) {
/*
* When a waiting writer set the handoff flag, it may spin
......@@ -693,7 +696,9 @@ rwsem_spin_on_owner(struct rw_semaphore *sem)
* Ensure we emit the owner->on_cpu, dereference _after_
* checking sem->owner still matches owner, if that fails,
* owner might point to free()d memory, if it still matches,
* the rcu_read_lock() ensures the memory stays valid.
* our spinning context already disabled preemption which is
* equal to RCU read-side crital section ensures the memory
* stays valid.
*/
barrier();
......@@ -704,7 +709,6 @@ rwsem_spin_on_owner(struct rw_semaphore *sem)
cpu_relax();
}
rcu_read_unlock();
return state;
}
......@@ -878,12 +882,11 @@ static inline bool rwsem_optimistic_spin(struct rw_semaphore *sem)
static inline void clear_nonspinnable(struct rw_semaphore *sem) { }
static inline int
static inline enum owner_state
rwsem_spin_on_owner(struct rw_semaphore *sem)
{
return 0;
return OWNER_NONSPINNABLE;
}
#define OWNER_NULL 1
#endif
/*
......@@ -1095,9 +1098,16 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
* In this case, we attempt to acquire the lock again
* without sleeping.
*/
if (wstate == WRITER_HANDOFF &&
rwsem_spin_on_owner(sem) == OWNER_NULL)
if (wstate == WRITER_HANDOFF) {
enum owner_state owner_state;
preempt_disable();
owner_state = rwsem_spin_on_owner(sem);
preempt_enable();
if (owner_state == OWNER_NULL)
goto trylock_again;
}
/* Block until there are no active lockers. */
for (;;) {
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment