- 10 Sep, 2020 3 commits
-
-
Ahmed S. Darwish authored
Latch sequence counters are a multiversion concurrency control mechanism where the seqcount_t counter even/odd value is used to switch between two copies of protected data. This allows the seqcount_t read path to safely interrupt its write side critical section (e.g. from NMIs). Initially, latch sequence counters were implemented as a single write function above plain seqcount_t: raw_write_seqcount_latch(). The read side was expected to use plain seqcount_t raw_read_seqcount(). A specialized latch read function, raw_read_seqcount_latch(), was later added. It became the standardized way for latch read paths. Due to the dependent load, it has one read memory barrier less than the plain seqcount_t raw_read_seqcount() API. Only raw_write_seqcount_latch() and raw_read_seqcount_latch() should be used with latch sequence counters. Having *unique* read and write path APIs means that latch sequence counters are actually a data type of their own -- just inappropriately overloading plain seqcount_t. Introduce seqcount_latch_t. This adds type-safety and ensures that only the correct latch-safe APIs are to be used. Not to break bisection, let the latch APIs also accept plain seqcount_t or seqcount_raw_spinlock_t. After converting all call sites to seqcount_latch_t, only that new data type will be allowed. References: 9b0fd802 ("seqcount: Add raw_write_seqcount_latch()") References: 7fc26327 ("seqlock: Introduce raw_read_seqcount_latch()") References: aadd6e5c ("time/sched_clock: Use raw_read_seqcount_latch()") Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200827114044.11173-4-a.darwish@linutronix.de
-
Ahmed S. Darwish authored
Commit eef1a429 ("mm/swap.c: piggyback lru_add_drain_all() calls") implemented an optimization mechanism to exit the to-be-started LRU drain operation (name it A) if another drain operation *started and finished* while (A) was blocked on the LRU draining mutex. This was done through a seqcount_t latch, which is an abuse of its semantics: 1. seqcount_t latching should be used for the purpose of switching between two storage places with sequence protection to allow interruptible, preemptible, writer sections. The referenced optimization mechanism has absolutely nothing to do with that. 2. The used raw_write_seqcount_latch() has two SMP write memory barriers to insure one consistent storage place out of the two storage places available. A full memory barrier is required instead: to guarantee that the pagevec counter stores visible by local CPU are visible to other CPUs -- before loading the current drain generation. Beside the seqcount_t API abuse, the semantics of a latch sequence counter was force-fitted into the referenced optimization. What was meant is to track "generations" of LRU draining operations, where "global lru draining generation = x" implies that all generations 0 < n <= x are already *scheduled* for draining -- thus nothing needs to be done if the current generation number n <= x. Remove the conceptually-inappropriate seqcount_t latch usage. Manually implement the referenced optimization using a counter and SMP memory barriers. Note, while at it, use the non-atomic variant of cpumask_set_cpu(), __cpumask_set_cpu(), due to the already existing mutex protection. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/87y2pg9erj.fsf@vostro.fn.ogness.net
-
Ahmed S. Darwish authored
sched_clock uses seqcount_t latching to switch between two storage places protected by the sequence counter. This allows it to have interruptible, NMI-safe, seqcount_t write side critical sections. Since 7fc26327 ("seqlock: Introduce raw_read_seqcount_latch()"), raw_read_seqcount_latch() became the standardized way for seqcount_t latch read paths. Due to the dependent load, it has one read memory barrier less than the currently used raw_read_seqcount() API. Use raw_read_seqcount_latch() for the suspend path. Commit aadd6e5c ("time/sched_clock: Use raw_read_seqcount_latch()") missed changing that instance of raw_read_seqcount(). References: 1809bfa4 ("timers, sched/clock: Avoid deadlock during read from NMI") Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200715092345.GA231464@debian-buster-darwi.lab.linutronix.de
-
- 26 Aug, 2020 35 commits
-
-
Boqun Feng authored
Add a test case shows that USED_IN_*_READ and ENABLE_*_READ can cause deadlock too. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-20-boqun.feng@gmail.com
-
Boqun Feng authored
Add two self test cases for the following case: P0: P1: P2: <in irq handler> spin_lock_irq(&slock) read_lock(&rwlock) write_lock_irq(&rwlock) read_lock(&rwlock) spin_lock(&slock) , which is a deadlock, as the read_lock() on P0 cannot get the lock because of the fairness. P0: P1: P2: <in irq handler> spin_lock(&slock) read_lock(&rwlock) write_lock(&rwlock) read_lock(&rwlock) spin_lock_irq(&slock) , which is not a deadlock, as the read_lock() on P0 can get the lock because it could use the unfair fastpass. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-19-boqun.feng@gmail.com
-
Boqun Feng authored
This reverts commit d82fed75. Since we now could handle mixed read-write deadlock detection well, the self tests could be detected as expected, no need to use this work-around. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-18-boqun.feng@gmail.com
-
Boqun Feng authored
Add those four test cases: 1. X --(ER)--> Y --(ER)--> Z --(ER)--> X is deadlock. 2. X --(EN)--> Y --(SR)--> Z --(ER)--> X is deadlock. 3. X --(EN)--> Y --(SR)--> Z --(SN)--> X is not deadlock. 4. X --(ER)--> Y --(SR)--> Z --(EN)--> X is not deadlock. Those self testcases are valuable for the development of supporting recursive read related deadlock detection. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-17-boqun.feng@gmail.com
-
Boqun Feng authored
Now since we can handle recursive read related irq inversion deadlocks correctly, uncomment the irq_read_recursion2 and add more testcases. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-16-boqun.feng@gmail.com
-
Boqun Feng authored
Currently, the chainkey of a lock chain is a hash sum of the class_idx of all the held locks, the read/write status are not taken in to consideration while generating the chainkey. This could result into a problem, if we have: P1() { read_lock(B); lock(A); } P2() { lock(A); read_lock(B); } P3() { lock(A); write_lock(B); } , and P1(), P2(), P3() run one by one. And when running P2(), lockdep detects such a lock chain A -> B is not a deadlock, then it's added in the chain cache, and then when running P3(), even if it's a deadlock, we could miss it because of the hit of chain cache. This could be confirmed by self testcase "chain cached mixed R-L/L-W ". To resolve this, we use concept "hlock_id" to generate the chainkey, the hlock_id is a tuple (hlock->class_idx, hlock->read), which fits in a u16 type. With this, the chainkeys are different is the lock sequences have the same locks but different read/write status. Besides, since we use "hlock_id" to generate chainkeys, the chain_hlocks array now store the "hlock_id"s rather than lock_class indexes. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-15-boqun.feng@gmail.com
-
Boqun Feng authored
As our chain cache doesn't differ read/write locks, so even we can detect a read-lock/lock-write deadlock in check_noncircular(), we can still be fooled if a read-lock/lock-read case(which is not a deadlock) comes first. So introduce this test case to test specific to the chain cache behavior on detecting recursive read lock related deadlocks. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-14-boqun.feng@gmail.com
-
Boqun Feng authored
Since we have all the fundamental to handle recursive read locks, we now add them into the dependency graph. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-13-boqun.feng@gmail.com
-
Boqun Feng authored
Currently, in safe->unsafe detection, lockdep misses the fact that a LOCK_ENABLED_IRQ_*_READ usage and a LOCK_USED_IN_IRQ_*_READ usage may cause deadlock too, for example: P1 P2 <irq disabled> write_lock(l1); <irq enabled> read_lock(l2); write_lock(l2); <in irq> read_lock(l1); Actually, all of the following cases may cause deadlocks: LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_* LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_* LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_*_READ LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_*_READ To fix this, we need to 1) change the calculation of exclusive_mask() so that READ bits are not dropped and 2) always call usage() in mark_lock_irq() to check usage deadlocks, even when the new usage of the lock is READ. Besides, adjust usage_match() and usage_acculumate() to recursive read lock changes. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-12-boqun.feng@gmail.com
-
Boqun Feng authored
check_redundant() will report redundancy if it finds a path could replace the about-to-add dependency in the BFS search. With recursive read lock changes, we certainly need to change the match function for the check_redundant(), because the path needs to match not only the lock class but also the dependency kinds. For example, if the about-to-add dependency @prev -> @next is A -(SN)-> B, and we find a path A -(S*)-> .. -(*R)->B in the dependency graph with __bfs() (for simplicity, we can also say we find an -(SR)-> path from A to B), we can not replace the dependency with that path in the BFS search. Because the -(SN)-> dependency can make a strong path with a following -(S*)-> dependency, however an -(SR)-> path cannot. Further, we can replace an -(SN)-> dependency with a -(EN)-> path, that means if we find a path which is stronger than or equal to the about-to-add dependency, we can report the redundancy. By "stronger", it means both the start and the end of the path are not weaker than the start and the end of the dependency (E is "stronger" than S and N is "stronger" than R), so that we can replace the dependency with that path. To make sure we find a path whose start point is not weaker than the about-to-add dependency, we use a trick: the ->only_xr of the root (start point) of __bfs() is initialized as @prev-> == 0, therefore if @prev is E, __bfs() will pick only -(E*)-> for the first dependency, otherwise, __bfs() can pick -(E*)-> or -(S*)-> for the first dependency. To make sure we find a path whose end point is not weaker than the about-to-add dependency, we replace the match function for __bfs() check_redundant(), we check for the case that either @next is R (anything is not weaker than it) or the end point of the path is N (which is not weaker than anything). Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-11-boqun.feng@gmail.com
-
Boqun Feng authored
Currently, lockdep only has limit support for deadlock detection for recursive read locks. This patch support deadlock detection for recursive read locks. The basic idea is: We are about to add dependency B -> A in to the dependency graph, we use check_noncircular() to find whether we have a strong dependency path A -> .. -> B so that we have a strong dependency circle (a closed strong dependency path): A -> .. -> B -> A , which doesn't have two adjacent dependencies as -(*R)-> L -(S*)->. Since A -> .. -> B is already a strong dependency path, so if either B -> A is -(E*)-> or A -> .. -> B is -(*N)->, the circle A -> .. -> B -> A is strong, otherwise not. So we introduce a new match function hlock_conflict() to replace the class_equal() for the deadlock check in check_noncircular(). Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-10-boqun.feng@gmail.com
-
Boqun Feng authored
The "match" parameter of __bfs() is used for checking whether we hit a match in the search, therefore it should return a boolean value rather than an integer for better readability. This patch then changes the return type of the function parameter and the match functions to bool. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-9-boqun.feng@gmail.com
-
Boqun Feng authored
Now we have four types of dependencies in the dependency graph, and not all the pathes carry real dependencies (the dependencies that may cause a deadlock), for example: Given lock A and B, if we have: CPU1 CPU2 ============= ============== write_lock(A); read_lock(B); read_lock(B); write_lock(A); (assuming read_lock(B) is a recursive reader) then we have dependencies A -(ER)-> B, and B -(SN)-> A, and a dependency path A -(ER)-> B -(SN)-> A. In lockdep w/o recursive locks, a dependency path from A to A means a deadlock. However, the above case is obviously not a deadlock, because no one holds B exclusively, therefore no one waits for the other to release B, so who get A first in CPU1 and CPU2 will run non-blockingly. As a result, dependency path A -(ER)-> B -(SN)-> A is not a real/strong dependency that could cause a deadlock. From the observation above, we know that for a dependency path to be real/strong, no two adjacent dependencies can be as -(*R)-> -(S*)->. Now our mission is to make __bfs() traverse only the strong dependency paths, which is simple: we record whether we only have -(*R)-> for the previous lock_list of the path in lock_list::only_xr, and when we pick a dependency in the traverse, we 1) filter out -(S*)-> dependency if the previous lock_list only has -(*R)-> dependency (i.e. ->only_xr is true) and 2) set the next lock_list::only_xr to true if we only have -(*R)-> left after we filter out dependencies based on 1), otherwise, set it to false. With this extension for __bfs(), we now need to initialize the root of __bfs() properly (with a correct ->only_xr), to do so, we introduce some helper functions, which also cleans up a little bit for the __bfs() root initialization code. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-8-boqun.feng@gmail.com
-
Boqun Feng authored
To add recursive read locks into the dependency graph, we need to store the types of dependencies for the BFS later. There are four types of dependencies: * Exclusive -> Non-recursive dependencies: EN e.g. write_lock(prev) held and try to acquire write_lock(next) or non-recursive read_lock(next), which can be represented as "prev -(EN)-> next" * Shared -> Non-recursive dependencies: SN e.g. read_lock(prev) held and try to acquire write_lock(next) or non-recursive read_lock(next), which can be represented as "prev -(SN)-> next" * Exclusive -> Recursive dependencies: ER e.g. write_lock(prev) held and try to acquire recursive read_lock(next), which can be represented as "prev -(ER)-> next" * Shared -> Recursive dependencies: SR e.g. read_lock(prev) held and try to acquire recursive read_lock(next), which can be represented as "prev -(SR)-> next" So we use 4 bits for the presence of each type in lock_list::dep. Helper functions and macros are also introduced to convert a pair of locks into lock_list::dep bit and maintain the addition of different types of dependencies. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-7-boqun.feng@gmail.com
-
Boqun Feng authored
lock_list::distance is always not greater than MAX_LOCK_DEPTH (which is 48 right now), so a u16 will fit. This patch reduces the size of lock_list::distance to save space, so that we can introduce other fields to help detect recursive read lock deadlocks without increasing the size of lock_list structure. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-6-boqun.feng@gmail.com
-
Boqun Feng authored
Currently, __bfs() will do a breadth-first search in the dependency graph and visit each lock class in the graph exactly once, so for example, in the following graph: A ---------> B | ^ | | +----------> C a __bfs() call starts at A, will visit B through dependency A -> B and visit C through dependency A -> C and that's it, IOW, __bfs() will not visit dependency C -> B. This is OK for now, as we only have strong dependencies in the dependency graph, so whenever there is a traverse path from A to B in __bfs(), it means A has strong dependencies to B (IOW, B depends on A strongly). So no need to visit all dependencies in the graph. However, as we are going to add recursive-read lock into the dependency graph, as a result, not all the paths mean strong dependencies, in the same example above, dependency A -> B may be a weak dependency and traverse A -> C -> B may be a strong dependency path. And with the old way of __bfs() (i.e. visiting every lock class exactly once), we will miss the strong dependency path, which will result into failing to find a deadlock. To cure this for the future, we need to find a way for __bfs() to visit each dependency, rather than each class, exactly once in the search until we find a match. The solution is simple: We used to mark lock_class::lockdep_dependency_gen_id to indicate a class has been visited in __bfs(), now we change the semantics a little bit: we now mark lock_class::lockdep_dependency_gen_id to indicate _all the dependencies_ in its lock_{after,before} have been visited in the __bfs() (note we only take one direction in a __bfs() search). In this way, every dependency is guaranteed to be visited until we find a match. Note: the checks in mark_lock_accessed() and lock_accessed() are removed, because after this modification, we may call these two functions on @source_entry of __bfs(), which may not be the entry in "list_entries" Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-5-boqun.feng@gmail.com
-
Boqun Feng authored
__bfs() could return four magic numbers: 1: search succeeds, but none match. 0: search succeeds, find one match. -1: search fails because of the cq is full. -2: search fails because a invalid node is found. This patch cleans things up by using a enum type for the return value of __bfs() and its friends, this improves the code readability of the code, and further, could help if we want to extend the BFS. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-4-boqun.feng@gmail.com
-
Boqun Feng authored
This patch add the documentation piece for the reasoning of deadlock detection related to recursive read lock. The following sections are added: * Explain what is a recursive read lock, and what deadlock cases they could introduce. * Introduce the notations for different types of dependencies, and the definition of strong paths. * Proof for a closed strong path is both sufficient and necessary for deadlock detections with recursive read locks involved. The proof could also explain why we call the path "strong" Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-3-boqun.feng@gmail.com
-
Boqun Feng authored
On the archs using QUEUED_RWLOCKS, read_lock() is not always a recursive read lock, actually it's only recursive if in_interrupt() is true. So change the annotation accordingly to catch more deadlocks. Note we used to treat read_lock() as pure recursive read locks in lib/locking-seftest.c, and this is useful, especially for the lockdep development selftest, so we keep this via a variable to force switching lock annotation for read_lock(). Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200807074238.1632519-2-boqun.feng@gmail.com
-
Marta Rybczynska authored
Fix issues with local_locks documentation: - fix function names, local_lock.h has local_unlock_irqrestore(), not local_lock_irqrestore() - fix mapping table, local_unlock_irqrestore() maps to local_irq_restore(), not _save() Signed-off-by: Marta Rybczynska <rybczynska@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/CAApg2=SKxQ3Sqwj6TZnV-0x0cKLXFKDaPvXT4N15MPDMKq724g@mail.gmail.com
-
Randy Dunlap authored
Fix kernel-doc warnings in <linux/seqlock.h>. ../include/linux/seqlock.h:152: warning: Incorrect use of kernel-doc format: * seqcount_LOCKNAME_init() - runtime initializer for seqcount_LOCKNAME_t ../include/linux/seqlock.h:164: warning: Incorrect use of kernel-doc format: * SEQCOUNT_LOCKTYPE() - Instantiate seqcount_LOCKNAME_t and helpers ../include/linux/seqlock.h:229: warning: Function parameter or member 'seq_name' not described in 'SEQCOUNT_LOCKTYPE_ZERO' ../include/linux/seqlock.h:229: warning: Function parameter or member 'assoc_lock' not described in 'SEQCOUNT_LOCKTYPE_ZERO' ../include/linux/seqlock.h:229: warning: Excess function parameter 'name' description in 'SEQCOUNT_LOCKTYPE_ZERO' ../include/linux/seqlock.h:229: warning: Excess function parameter 'lock' description in 'SEQCOUNT_LOCKTYPE_ZERO' ../include/linux/seqlock.h:695: warning: duplicate section name 'NOTE' Demote kernel-doc notation for the macros "seqcount_LOCKNAME_init()" and "SEQCOUNT_LOCKTYPE()"; scripts/kernel-doc does not handle them correctly. Rename function parameters in SEQCNT_LOCKNAME_ZERO() documentation to match the macro's argument names. Change the macro name in the documentation to SEQCOUNT_LOCKTYPE_ZERO() to match the macro's name. For raw_write_seqcount_latch(), rename the second NOTE: to NOTE2: to prevent a kernel-doc warning. However, the generated output is not quite as nice as it could be for this. Fix a typo: s/LOCKTYPR/LOCKTYPE/ Fixes: 0efc94c5 ("seqcount: Compress SEQCNT_LOCKNAME_ZERO()") Fixes: e4e9ab3f ("seqlock: Fold seqcount_LOCKNAME_init() definition") Fixes: a8772dcc ("seqlock: Fold seqcount_LOCKNAME_t definition") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200817000200.20993-1-rdunlap@infradead.org
-
Peter Zijlstra authored
David requested means to obtain the old/previous value from the refcount API for tracing purposes. Duplicate (most of) the API as __refcount*() with an additional 'int *' argument into which, if !NULL, the old value will be stored. Requested-by: David Howells <dhowells@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200729111120.GA2638@hirez.programming.kicks-ass.net
-
Peter Zijlstra authored
Such that we might easily find seqcount_LOCKTYPE_t and seqcount_LOCKTYPE_init(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200729161938.GB2678@hirez.programming.kicks-ass.net
-
Peter Zijlstra authored
The lockdep tracepoints are under the lockdep recursion counter, this has a bunch of nasty side effects: - TRACE_IRQFLAGS doesn't work across the entire tracepoint - RCU-lockdep doesn't see the tracepoints either, hiding numerous "suspicious RCU usage" warnings. Pull the trace_lock_*() tracepoints completely out from under the lockdep recursion handling and completely rely on the trace level recusion handling -- also, tracing *SHOULD* not be taking locks in any case. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Marco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.782688941@infradead.org
-
Nicholas Piggin authored
Problem: raw_local_irq_save(); // software state on local_irq_save(); // software state off ... local_irq_restore(); // software state still off, because we don't enable IRQs raw_local_irq_restore(); // software state still off, *whoopsie* existing instances: - lock_acquire() raw_local_irq_save() __lock_acquire() arch_spin_lock(&graph_lock) pv_wait() := kvm_wait() (same or worse for Xen/HyperV) local_irq_save() - trace_clock_global() raw_local_irq_save() arch_spin_lock() pv_wait() := kvm_wait() local_irq_save() - apic_retrigger_irq() raw_local_irq_save() apic->send_IPI() := default_send_IPI_single_phys() local_irq_save() Possible solutions: A) make it work by enabling the tracing inside raw_*() B) make it work by keeping tracing disabled inside raw_*() C) call it broken and clean it up now Now, given that the only reason to use the raw_* variant is because you don't want tracing. Therefore A) seems like a weird option (although it can be done). C) is tempting, but OTOH it ends up converting a _lot_ of code to raw just because there is one raw user, this strips the validation/tracing off for all the other users. So we pick B) and declare any code that ends up doing: raw_local_irq_save() local_irq_save() lockdep_assert_irqs_disabled(); broken. AFAICT this problem has existed forever, the only reason it came up is because commit: 859d069e ("lockdep: Prepare for NMI IRQ state tracking") changed IRQ tracing vs lockdep recursion and the first instance is fairly common, the other cases hardly ever happen. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [rewrote changelog] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Marco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200723105615.1268126-1-npiggin@gmail.com
-
Peter Zijlstra authored
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Paul Burton <paulburton@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200826101653.GE1362448@hirez.programming.kicks-ass.net
-
Peter Zijlstra authored
Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lkml.kernel.org/r/20200821085348.664425120@infradead.org
-
Peter Zijlstra authored
Cc: Nick Hu <nickhu@andestech.com> Cc: Greentime Hu <green.hu@gmail.com> Cc: Vincent Chen <deanbo422@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Marco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.604899379@infradead.org
-
Peter Zijlstra authored
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Marco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.546087214@infradead.org
-
Peter Zijlstra authored
Unused remnants Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Marco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.487040689@infradead.org
-
Peter Zijlstra authored
Remove trace_cpu_idle() from the arch_cpu_idle() implementations and put it in the generic code, right before disabling RCU. Gets rid of more trace_*_rcuidle() users. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Marco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.428433395@infradead.org
-
Peter Zijlstra authored
This allows moving the leave_mm() call into generic code before rcu_idle_enter(). Gets rid of more trace_*_rcuidle() users. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Marco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.369441600@infradead.org
-
Peter Zijlstra authored
Lots of things take locks, due to a wee bug, rcu_lockdep didn't notice that the locking tracepoints were using RCU. Push rcu_idle_{enter,exit}() as deep as possible into the idle paths, this also resolves a lot of _rcuidle()/RCU_NONIDLE() usage. Specifically, sched_clock_idle_wakeup_event() will use ktime which will use seqlocks which will tickle lockdep, and stop_critical_timings() uses lock. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Marco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.310943801@infradead.org
-
Peter Zijlstra authored
Match the pattern elsewhere in this file. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Marco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.251340558@infradead.org
-
Peter Zijlstra authored
Sven reported that commit a21ee605 ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables") caused trouble on s390 because their this_cpu_*() primitives disable preemption which then lands back tracing. On the one hand, per-cpu ops should use preempt_*able_notrace() and raw_local_irq_*(), on the other hand, we can trivialy use raw_cpu_*() ops for this. Fixes: a21ee605 ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables") Reported-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200821085348.192346882@infradead.org
-
- 23 Aug, 2020 2 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds authored
Pull powerpc fixes from Michael Ellerman: - Add perf support for emitting extended registers for power10. - A fix for CPU hotplug on pseries, where on large/loaded systems we may not wait long enough for the CPU to be offlined, leading to crashes. - Addition of a raw cputable entry for Power10, which is not required to boot, but is required to make our PMU setup work correctly in guests. - Three fixes for the recent changes on 32-bit Book3S to move modules into their own segment for strict RWX. - A fix for a recent change in our powernv PCI code that could lead to crashes. - A change to our perf interrupt accounting to avoid soft lockups when using some events, found by syzkaller. - A change in the way we handle power loss events from the hypervisor on pseries. We no longer immediately shut down if we're told we're running on a UPS. - A few other minor fixes. Thanks to Alexey Kardashevskiy, Andreas Schwab, Aneesh Kumar K.V, Anju T Sudhakar, Athira Rajeev, Christophe Leroy, Frederic Barrat, Greg Kurz, Kajol Jain, Madhavan Srinivasan, Michael Neuling, Michael Roth, Nageswara R Sastry, Oliver O'Halloran, Thiago Jung Bauermann, Vaidyanathan Srinivasan, Vasant Hegde. * tag 'powerpc-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/perf/hv-24x7: Move cpumask file to top folder of hv-24x7 driver powerpc/32s: Fix module loading failure when VMALLOC_END is over 0xf0000000 powerpc/pseries: Do not initiate shutdown when system is running on UPS powerpc/perf: Fix soft lockups due to missed interrupt accounting powerpc/powernv/pci: Fix possible crash when releasing DMA resources powerpc/pseries/hotplug-cpu: wait indefinitely for vCPU death powerpc/32s: Fix is_module_segment() when MODULES_VADDR is defined powerpc/kasan: Fix KASAN_SHADOW_START on BOOK3S_32 powerpc/fixmap: Fix the size of the early debug area powerpc/pkeys: Fix build error with PPC_MEM_KEYS disabled powerpc/kernel: Cleanup machine check function declarations powerpc: Add POWER10 raw mode cputable entry powerpc/perf: Add extended regs support for power10 platform powerpc/perf: Add support for outputting extended regs in perf intr_regs powerpc: Fix P10 PVR revision in /proc/cpuinfo for SMT4 cores
-