1. 01 Nov, 2018 5 commits
    • locking/atomics: Check generated headers are up-to-date · 8d325880
      Mark Rutland authored
      Now that all the generated atomic headers are in place, it would be good
      to ensure that:
      
      a) the headers are kept up-to-date whenever the scripts change.
      
      b) developers don't directly modify the generated headers.
      
      To ensure both of these properties, let's add a Kbuild step to check
      that the generated headers are up-to-date.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: catalin.marinas@arm.com
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linuxdrivers@attotech.com
      Cc: dvyukov@google.com
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: arnd@arndb.de
      Cc: aryabinin@virtuozzo.com
      Cc: glider@google.com
      Link: http://lkml.kernel.org/r/20180904104830.2975-6-mark.rutland@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/atomics: Switch to generated instrumentation · aa525d06
      Mark Rutland authored
      As a step towards ensuring the atomic* APIs are consistent, let's switch
      to wrappers generated by gen-atomic-instrumented.sh, using the same table
      used to generate the fallbacks and atomic-long wrappers.
      
      These are checked in rather than generated with Kbuild, since:
      
      * This allows inspection of the atomics with git grep and ctags on a
        pristine tree, which Linus strongly prefers being able to do.
      
      * The fallbacks are not affected by machine details or configuration
        options, so it is not necessary to regenerate them to take these into
        account.
      
      * These are included by files required *very* early in the build process
        (e.g. for generating bounds.h), and we'd rather not complicate the
        top-level Kbuild file with dependencies.
      
      Generating the atomic headers means that the instrumented wrappers will
      remain in sync with the rest of the atomic APIs, and we gain all the
      ordering variants of each atomic without having to manually expand
      them all.
      
      The KASAN checks are automatically generated based on the function
      parameters defined in atomics.tbl. Note that try_cmpxchg() now correctly
      treats 'old' as a parameter that may be written to, and not only read as
      the hand-written instrumentation assumed.
      
      Other than the change to try_cmpxchg(), existing code should not be
      affected by this patch. The patch introduces instrumentation for all
      optional atomics (and ordering variants), along with the ifdeffery this
      requires, enabling other architectures to make use of the instrumented
      atomics.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: catalin.marinas@arm.com
      Cc: linuxdrivers@attotech.com
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Link: http://lkml.kernel.org/r/20180904104830.2975-5-mark.rutland@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/atomics: Switch to generated atomic-long · b5d47ef9
      Mark Rutland authored
      As a step towards ensuring the atomic* APIs are consistent, let's switch
      to wrappers generated by gen-atomic-long.sh, using the same table that
      gen-atomic-fallback.sh uses to fill in gaps in the atomic_* and
      atomic64_* APIs.
      
      These are checked in rather than generated with Kbuild, since:
      
      * This allows inspection of the atomics with git grep and ctags on a
        pristine tree, which Linus strongly prefers being able to do.
      
      * The fallbacks are not affected by machine details or configuration
        options, so it is not necessary to regenerate them to take these into
        account.
      
      * These are included by files required *very* early in the build process
        (e.g. for generating bounds.h), and we'd rather not complicate the
        top-level Kbuild file with dependencies.
      
      Other than *_INIT() and *_cond_read_acquire(), all API functions are
      implemented as static inline C functions, ensuring consistent type
      promotion and/or truncation without requiring explicit casts to be
      applied to parameters or return values.
      
      Since we typedef atomic_long_t to either atomic_t or atomic64_t, we know
      these types are equivalent, and don't require explicit casts between
      them. However, as the try_cmpxchg*() functions take a pointer for the
      'old' parameter, which may be an int or s64, an explicit cast is
      generated for this.
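
      For illustration, on a 64-bit configuration the generated wrappers look
      roughly like this (a sketch, not the exact generated header):

        static __always_inline long
        atomic_long_fetch_add(long i, atomic_long_t *v)
        {
                return atomic64_fetch_add(i, v);
        }

        static __always_inline bool
        atomic_long_try_cmpxchg(atomic_long_t *v, long *old, long new)
        {
                /* the only explicit cast: 'old' is passed as a pointer */
                return atomic64_try_cmpxchg(v, (s64 *)old, new);
        }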
      
      There should be no functional change as a result of this patch (i.e.
      existing code should not be affected). However, this introduces a number
      of functions into the atomic_long_* API, bringing it into line with the
      atomic_* and atomic64_* APIs.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: catalin.marinas@arm.com
      Cc: linuxdrivers@attotech.com
      Cc: dvyukov@google.com
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: aryabinin@virtuozzo.com
      Cc: glider@google.com
      Link: http://lkml.kernel.org/r/20180904104830.2975-4-mark.rutland@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/atomics: Switch to generated fallbacks · 9fa45070
      Mark Rutland authored
      As a step to ensuring the atomic* APIs are consistent, switch to fallbacks
      generated by gen-atomic-fallback.sh.
      
      These are checked in rather than generated with Kbuild, since:
      
      * This allows inspection of the atomics with git grep and ctags on a
        pristine tree, which Linus strongly prefers being able to do.
      
      * The fallbacks are not affected by machine details or configuration
        options, so it is not necessary to regenerate them to take these into
        account.
      
      * These are included by files required *very* early in the build process
        (e.g. for generating bounds.h), and we'd rather not complicate the
        top-level Kbuild file with dependencies.
      
      The new fallback header should be equivalent to the old fallbacks in
      <linux/atomic.h>, but:
      
      * It is formatted a little differently due to scripting ensuring things
        are more regular than they used to be.
      
      * Fallbacks are now expanded in-place as static inline functions rather
        than macros.
      
      * The prototypes for fallbacks are arranged consistently, with the return
        type on a separate line, to try to keep to a sensible line length (see
        the sketch below).
      
      There should be no functional change as a result of this patch.
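
      For illustration, an ordering-variant fallback now takes roughly this
      form (a sketch of the generated shape, not verbatim output):

        #ifndef atomic_add_return_acquire
        static inline int
        atomic_add_return_acquire(int i, atomic_t *v)
        {
                int ret = atomic_add_return_relaxed(i, v);
                __atomic_acquire_fence();
                return ret;
        }
        #define atomic_add_return_acquire atomic_add_return_acquire
        #endif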
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: catalin.marinas@arm.com
      Cc: linuxdrivers@attotech.com
      Cc: dvyukov@google.com
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: arnd@arndb.de
      Cc: aryabinin@virtuozzo.com
      Cc: glider@google.com
      Link: http://lkml.kernel.org/r/20180904104830.2975-3-mark.rutland@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/atomics: Add common header generation files · ace9bad4
      Mark Rutland authored
      To minimize repetition, to allow for future rework, and to ensure
      regularity of the various atomic APIs, we'd like to automatically
      generate (the bulk of) a number of headers related to atomics.
      
      This patch adds the infrastructure to do so, leaving actual conversion
      of headers to subsequent patches. This infrastructure consists of:
      
      * atomics.tbl - a table describing the functions in the atomics API,
        with names, prototypes, and metadata describing the variants that
        exist (e.g. fetch/return, acquire/release/relaxed). Note that the
        return type is dependent on the particular variant.
      
      * atomic-tbl.sh - a library of routines useful for dealing with
        atomics.tbl (e.g. querying which variants exist, or generating
        argument/parameter lists for a given function variant).
      
      * gen-atomic-fallback.sh - a script which generates a header of
        fallbacks, covering cases where architectures omit certain functions
        (e.g. omitting relaxed variants).
      
      * gen-atomic-long.sh - a script which generates wrappers providing the
        atomic_long API atop of the relevant atomic or atomic64 API,
        ensuring the APIs are consistent.
      
      * gen-atomic-instrumented.sh - a script which generates atomic* wrappers
        atop of arch_atomic* functions, with automatically generated KASAN
        instrumentation.
      
      * fallbacks/* - a set of fallback implementations for atomics, which
        should be used when no implementation of a given atomic is provided.
        These are used by gen-atomic-fallback.sh to generate fallbacks, and
        these are also used by other scripts to determine the set of optional
        atomics (as required to generate preprocessor guards correctly).
      
        Fallbacks may use the following variables:
      
        ${atomic}     atomic prefix: atomic/atomic64/atomic_long, which can be
      		used to derive the atomic type, and to prefix functions
      
        ${int}        integer type: int/s64/long
      
        ${pfx}        variant prefix, e.g. fetch_
      
        ${name}       base function name, e.g. add
      
        ${sfx}        variant suffix, e.g. _return
      
        ${order}      order suffix, e.g. _relaxed
      
        ${atomicname} full name, e.g. atomic64_fetch_add_relaxed
      
        ${ret}        return type of the function, e.g. void
      
        ${retstmt}    a return statement (with a trailing space), unless the
                      variant returns void
      
        ${params}     parameter list for the function declaration, e.g.
                      "int i, atomic_t *v"
      
        ${args}       argument list for invoking the function, e.g. "i, v"
      
        ... for clarity, ${ret}, ${retstmt}, ${params}, and ${args} are
        open-coded for fallbacks where these do not vary, or are critical to
        understanding the logic of the fallback.
      
      The MAINTAINERS entry for the atomic infrastructure is updated to cover
      the new scripts.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: catalin.marinas@arm.com
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linuxdrivers@attotech.com
      Cc: dvyukov@google.com
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: arnd@arndb.de
      Cc: aryabinin@virtuozzo.com
      Cc: glider@google.com
      Link: http://lkml.kernel.org/r/20180904104830.2975-2-mark.rutland@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 19 Oct, 2018 2 commits
    • locking/lockdep: Make global debug_locks* variables read-mostly · 01a14bda
      Waiman Long authored
      Make the frequently used lockdep global variable debug_locks read-mostly.
      As debug_locks_silent is sometimes used together with debug_locks,
      it is also made read-mostly so that they can be close together.
      
      With false cacheline sharing, cacheline contention problems can happen
      depending on what gets put into the same cacheline as debug_locks.
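
      The change itself is small; with the annotation the declarations look
      roughly like this (a sketch of the pattern):

        /* lib/debug_locks.c */
        int debug_locks __read_mostly = 1;
        int debug_locks_silent __read_mostly;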
      Signed-off-by: Waiman Long <longman@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/1539913518-15598-2-git-send-email-longman@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Fix debug_locks off performance problem · 9506a742
      Waiman Long authored
      It was found that when debug_locks was turned off because of a problem
      found by the lockdep code, the system performance could drop quite
      significantly when the lock_stat code was also configured into the
      kernel. For instance, parallel kernel build time on a 4-socket x86-64
      server nearly doubled.
      
      Further analysis into the cause of the slowdown traced back to the
      frequent call to debug_locks_off() from the __lock_acquired() function
      probably due to some inconsistent lockdep states with debug_locks
      off. The debug_locks_off() function did an unconditional atomic xchg
      to write a 0 value into debug_locks which had already been set to 0.
      This led to severe cacheline contention in the cacheline that held
      debug_locks.  As debug_locks is being referenced in quite a few different
      places in the kernel, this greatly slowed down system performance.
      
      To prevent that thrashing of the debug_locks cacheline, lock_acquired()
      and lock_contended() now check the state of debug_locks before
      proceeding. The debug_locks_off() function is also modified to check
      debug_locks before calling __debug_locks_off().
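
      The essence of the change is to test the flag before doing the atomic
      write, roughly as follows (a sketch, not the exact diff):

        int debug_locks_off(void)
        {
                /* only do the atomic xchg if debugging is still enabled */
                if (debug_locks && __debug_locks_off()) {
                        if (!debug_locks_silent) {
                                console_verbose();
                                return 1;
                        }
                }
                return 0;
        }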
      Signed-off-by: Waiman Long <longman@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/1539913518-15598-1-git-send-email-longman@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 17 Oct, 2018 2 commits
    • locking/pvqspinlock: Extend node size when pvqspinlock is configured · 0fa809ca
      Waiman Long authored
      The qspinlock code supports up to 4 levels of slowpath nesting using
      four per-CPU mcs_spinlock structures. For 64-bit architectures, they
      fit nicely in one 64-byte cacheline.
      
      The para-virtualized (PV) qspinlock code, however, needs to store more
      information in the per-CPU node structure than there is space for. It
      uses a trick: the extra information spills into a second cacheline.
      So PV qspinlock needs to access two extra cachelines for its information
      whereas the native qspinlock code only needs one extra cacheline.
      
      Freshly added counter profiling of the qspinlock code, however, revealed
      that it was very rare to use more than two levels of slowpath nesting.
      So it doesn't make sense to penalize PV qspinlock code in order to have
      four mcs_spinlock structures in the same cacheline to optimize for a case
      in the native qspinlock code that rarely happens.
      
      Extend the per-CPU node structure with two more long words when PV
      qspinlocks are configured, so that it can hold the extra data the PV
      code needs.
      
      As a result, the PV qspinlock code will enjoy the same benefit of using
      just one extra cacheline like the native counterpart, for most cases.
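
      The shape of the change is roughly the following (a sketch; field and
      name details are approximate):

        struct qnode {
                struct mcs_spinlock mcs;
        #ifdef CONFIG_PARAVIRT_SPINLOCKS
                long reserved[2];       /* room for the extra PV node state */
        #endif
        };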
      
      [ mingo: Minor changelog edits. ]
      Signed-off-by: Waiman Long <longman@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/1539697507-28084-2-git-send-email-longman@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/qspinlock_stat: Count instances of nested lock slowpaths · 1222109a
      Waiman Long authored
      Queued spinlock supports up to 4 levels of lock slowpath nesting -
      user context, soft IRQ, hard IRQ and NMI. However, we are not sure how
      often the nesting happens.
      
      So add 3 more per-CPU stat counters to track the number of instances where
      the nesting index goes to 1, 2 and 3 respectively.
      
      On a dual-socket 64-core 128-thread Zen server, the following were the
      new stat counter values under different circumstances:
      
               State                         slowpath   index1   index2   index3
               -----                         --------   ------   ------   -------
        After bootup                         1,012,150    82       0        0
        After parallel build + perf-top    125,195,009    82       0        0
      
      So the chance of having more than 2 levels of nesting is extremely low.
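
      Conceptually the accounting amounts to something like this in the
      slowpath (a sketch with hypothetical counter names; the real code uses
      the qspinlock stat framework):

        static DEFINE_PER_CPU(unsigned long, lock_nest_count[3]); /* hypothetical */

        /* idx is the nesting level chosen for this CPU's MCS node (0..3) */
        if (idx > 0)
                this_cpu_inc(lock_nest_count[idx - 1]);   /* index1/2/3 */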
      
      [ mingo: Minor changelog edits. ]
      Signed-off-by: Waiman Long <longman@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/1539697507-28084-1-git-send-email-longman@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 16 Oct, 2018 6 commits
  5. 10 Oct, 2018 1 commit
  6. 09 Oct, 2018 2 commits
  7. 06 Oct, 2018 4 commits
  8. 05 Oct, 2018 1 commit
  9. 04 Oct, 2018 8 commits
    • x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops · 494b5168
      Nadav Amit authored
      As described in:
      
        77b0bf55: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
      
      GCC's inlining heuristics are broken with common asm() patterns used in
      kernel code, resulting in the effective disabling of inlining.
      
      The workaround is to set an assembly macro and call it from the inline
      assembly block. As a result GCC considers the inline assembly block as
      a single instruction. (Which it isn't, but that's the best we can get.)
      
      In this patch we wrap the paravirt call section tricks in a macro,
      to hide them from GCC.
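
      The general shape of the trick, reduced to a toy example (macro and
      section names are hypothetical, not the actual paravirt macros):

        /* Defined in assembly (e.g. via macros.S), so GCC never sees the body: */
        asm(".macro EMIT_ANNOTATION\n\t"
            ".pushsection .discard.example, \"a\"\n\t"
            ".long 0\n\t"
            ".popsection\n"
            ".endm");

        static inline void annotated_op(void)
        {
                /* GCC now counts one short line; the assembler expands the macro. */
                asm volatile("EMIT_ANNOTATION");
        }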
      
      The effect of the patch is more aggressive inlining, which also
      causes a size increase of the kernel:
      
            text     data     bss      dec     hex  filename
        18147336 10226688 2957312 31331336 1de1408  ./vmlinux before
        18162555 10226288 2957312 31346155 1de4deb  ./vmlinux after (+14819)
      
      The number of static text symbols (non-inlined functions) goes down:
      
        Before: 40053
        After:  39942 (-111)
      
      [ mingo: Rewrote the changelog. ]
      Tested-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Link: http://lkml.kernel.org/r/20181003213100.189959-8-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs · f81f8ad5
      Nadav Amit authored
      As described in:
      
        77b0bf55: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
      
      GCC's inlining heuristics are broken with common asm() patterns used in
      kernel code, resulting in the effective disabling of inlining.
      
      The workaround is to set an assembly macro and call it from the inline
      assembly block. As a result GCC considers the inline assembly block as
      a single instruction. (Which it isn't, but that's the best we can get.)
      
      This patch increases the kernel size:
      
            text     data     bss      dec     hex  filename
        18146889 10225380 2957312 31329581 1de0d2d  ./vmlinux before
        18147336 10226688 2957312 31331336 1de1408  ./vmlinux after (+1755)
      
      But enables more aggressive inlining (and probably better branch decisions).
      
      The number of static text symbols in vmlinux is much lower:
      
       Before: 40218
       After:  40053 (-165)
      
      The assembly code gets harder to read due to the extra macro layer.
      
      [ mingo: Rewrote the changelog. ]
      Tested-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20181003213100.189959-7-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs · 77f48ec2
      Nadav Amit authored
      As described in:
      
        77b0bf55: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
      
      GCC's inlining heuristics are broken with common asm() patterns used in
      kernel code, resulting in the effective disabling of inlining.
      
      The workaround is to set an assembly macro and call it from the inline
      assembly block - i.e. to macrify the affected block.
      
      As a result GCC considers the inline assembly block as a single instruction.
      
      This patch handles the LOCK prefix, allowing more aggressive inlining:
      
            text     data     bss      dec     hex  filename
        18140140 10225284 2957312 31322736 1ddf270  ./vmlinux before
        18146889 10225380 2957312 31329581 1de0d2d  ./vmlinux after (+6845)
      
      This is the reduction in non-inlined functions:
      
        Before: 40286
        After:  40218 (-68)
      Tested-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20181003213100.189959-6-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/refcount: Work around GCC inlining bug · 9e1725b4
      Nadav Amit authored
      As described in:
      
        77b0bf55: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
      
      GCC's inlining heuristics are broken with common asm() patterns used in
      kernel code, resulting in the effective disabling of inlining.
      
      The workaround is to set an assembly macro and call it from the inline
      assembly block. As a result GCC considers the inline assembly block as
      a single instruction. (Which it isn't, but that's the best we can get.)
      
      This patch allows GCC to inline simple functions such as __get_seccomp_filter().
      
      To no-one's surprise the result is that GCC performs more aggressive (read: correct)
      inlining decisions in these scenarios, which reduces the kernel size and presumably
      also speeds it up:
      
            text     data     bss      dec     hex  filename
        18140970 10225412 2957312 31323694 1ddf62e  ./vmlinux before
        18140140 10225284 2957312 31322736 1ddf270  ./vmlinux after (-958)
      
      16 fewer static text symbols:
      
         Before: 40302
          After: 40286 (-16)
      
      these got inlined instead.
      
      Functions such as kref_get(), free_user(), fuse_file_get() now get inlined. Hurray!
      
      [ mingo: Rewrote the changelog. ]
      Tested-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20181003213100.189959-5-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/objtool: Use asm macros to work around GCC inlining bugs · c06c4d80
      Nadav Amit authored
      As described in:
      
        77b0bf55: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
      
      GCC's inlining heuristics are broken with common asm() patterns used in
      kernel code, resulting in the effective disabling of inlining.
      
      In the case of objtool the resulting borkage can be significant, since all the
      annotations of objtool are discarded during linkage and never inlined,
      yet GCC bogusly considers most functions affected by objtool annotations
      as 'too large'.
      
      The workaround is to set an assembly macro and call it from the inline
      assembly block. As a result GCC considers the inline assembly block as
      a single instruction. (Which it isn't, but that's the best we can get.)
      
      This increases the kernel size slightly:
      
            text     data     bss      dec     hex filename
        18140829 10224724 2957312 31322865 1ddf2f1 ./vmlinux before
        18140970 10225412 2957312 31323694 1ddf62e ./vmlinux after (+829)
      
      The number of static text symbols (i.e. non-inlined functions) is reduced:
      
        Before:  40321
        After:   40302 (-19)
      
      [ mingo: Rewrote the changelog. ]
      Tested-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Christopher Li <sparse@chrisli.org>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-sparse@vger.kernel.org
      Link: http://lkml.kernel.org/r/20181003213100.189959-4-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • kbuild/Makefile: Prepare for using macros in inline assembly code to work... · 77b0bf55
      Nadav Amit authored
      kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs
      
      Using macros in inline assembly allows us to work around bugs
      in GCC's inlining decisions.
      
      Compile macros.S and use it to assemble all C files.
      Currently only x86 will use it.
      
      Background:
      
      The inlining pass of GCC doesn't include an assembler, so it's not aware
      of basic properties of the generated code, such as its size in bytes,
      or that there are such things as discontinuous blocks of code and data
      due to the newfangled linker feature called 'sections' ...
      
      Instead GCC uses a lazy and fragile heuristic: it does a linear count of
      certain syntactic and whitespace elements in inlined assembly block source
      code, such as a count of new-lines and semicolons (!), as a poor substitute
      for "code size and complexity".
      
      Unsurprisingly this heuristic falls over and breaks its neck with certain
      common types of kernel code that use inline assembly, such as the frequent
      practice of putting useful information into alternative sections.
      
      As a result of this fresh, 20+ years old GCC bug, GCC's inlining decisions
      are effectively disabled for inlined functions that make use of such asm()
      blocks, because GCC thinks those sections of code are "large" - when in
      reality they often result in just a very low number of machine
      instructions.
      
      This absolute lack of inlining prowess when GCC comes across such asm()
      blocks both increases generated kernel code size and causes performance
      overhead, which is particularly noticeable on paravirt kernels, which make
      frequent use of these inlining facilities in an attempt to stay out of the
      way when running on baremetal hardware.
      
      Instead of fixing the compiler we use a workaround: we set an assembly macro
      and call it from the inlined assembly block. As a result GCC considers the
      inline assembly block as a single instruction. (Which it often isn't but I digress.)
      
      This uglifies and bloats the source code - for example just the refcount
      related changes have this impact:
      
       Makefile                 |    9 +++++++--
       arch/x86/Makefile        |    7 +++++++
       arch/x86/kernel/macros.S |    7 +++++++
       scripts/Kbuild.include   |    4 +++-
       scripts/mod/Makefile     |    2 ++
       5 files changed, 26 insertions(+), 3 deletions(-)
      
      Yay readability and maintainability, it's not like assembly code is hard to read
      and maintain ...
      
      We also hope that GCC will eventually get fixed, but we are not holding
      our breath for that. Yet we are optimistic, it might still happen, any decade now.
      
      [ mingo: Wrote new changelog describing the background. ]
      Tested-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Marek <michal.lkml@markovi.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kbuild@vger.kernel.org
      Link: http://lkml.kernel.org/r/20181003213100.189959-3-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • kbuild/arch/xtensa: Define LINKER_SCRIPT for the linker script · 35e76b99
      Nadav Amit authored
      Define the LINKER_SCRIPT when building the linker script, as is done
      in other architectures. This is required because upcoming Makefile changes
      would otherwise break things.
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Acked-by: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michal Marek <michal.lkml@markovi.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-xtensa@linux-xtensa.org
      Link: http://lkml.kernel.org/r/20181003213100.189959-2-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Ingo Molnar · c0554d2d
  10. 03 Oct, 2018 3 commits
  11. 02 Oct, 2018 6 commits