- 24 Jun, 2004 40 commits
-
-
Andrew Morton authored
From: Ralf Baechle <ralf@linux-mips.org> Forward port of the 2.4 driver with changes required for 2.6. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: Ralf Baechle <ralf@linux-mips.org> MIPS update: - Further conversion of MIPS kernel configuration to reverse dependencies. - Support for the PMC-Sierra Yosemite evaluation board. - Merge arch/mips/mm-32 and arch/mips/mm-32 into arch/mips/mm. - Partial support for the R8000 now that I finally have clearance for the documentation previously covered by NDA. - Make distclean fixes. - Regenerate default configuration files against latest Kconfig files. - Fix handling of data bus errors in modules. - Make R4000 bug probing more bullet proof. - Rewrite semaphore code folloing the PPC implementation to no longer manipulate 2 32-bit quantities atomically using 64-bit instructions. Occasionally this did cause problems due to struct semaphore not having sufficient alignment. - Make sys_pipe() code bullet proof against gcc 3.5 over-optimization. - Fix possibly exploitable bug in IRIX compatibility statvfs(2). - Make sched_clock() an outline function. - Support for the MIPS 24K and 25K processors. - Make functions static that aren't needed anywhere else. - Factor out some more generic MIPS SMP code. - Factor out common part of the GT-64240 code. - Ocelot C now uses the generic MV-64340 interrupt handler code. - Factor out common board support code - More cleanup and bug fixes for the NEC VR41xx code. - Start cleanup of hazard handling as required for MIPS32/64 V2 processors. - Enforce minimal kmalloc alignment of 8 byte so 64-bit registers can be stored into fields without exceptions. - Speeling and warning fixes. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: Alan Cox <alan@redhat.com> OSS avoids the Dell lockup by not hitting the problem register (which apparently breaks resume on a Sony laptop). ALSA keeps a flag and uses pci subvendor info to clear it for problem Dell laptops. Unfortunately there is at least one other Dell laptop which is affected. This adds its sub id's [Patch from Dan Williams @ Red Hat slightly reformatted by me] Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: Ralf Baechle <ralf@linux-mips.org> Fix HAL2 audio driver for the SGI A2 audio subsystem and rewrite large parts of it to finally work. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: Anton Blanchard <anton@samba.org> ../include/linux/swap.h:extern int nr_swap_pages; /* XXX: shouldn't this be ulong? --hch */ Sounds like it should be too me. Some of the code checks for nr_swap_pages < 0 so I made it a long instead. I had to fix up the ppc64 show_mem() (Im guessing there will be other trivial changes required in other 64bit archs, I can find and fix those if you want). I also noticed that the ppc64 show_mem() used ints to store page counts. We can overflow that, so make them unsigned long. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
We use per-cpu counters for the system-wide pagecache accounting. The counters spill into the global nr_pagecache atomic_t when they underflow or overflow. Hence it is possible, under weird circumstances, for nr_pagecache to go negative. Anton says he has hit this. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: "Antonino A. Daplas" <adaplas@hotpop.com> Signed-off-by: Antonino Daplas <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: Christoph Lameter <christoph@graphe.net> Attached a patch to support a variety of PCI based serial and parallel port I/O ports (typically labeled 222N-2 or 9835). I think this should go into 2.6.0 since it has been out there for a long time and is just some additional driver support that somehow fell through the cracks in 2.4.X. Tim Waugh submitted it in the 2.4.X series. See also http://winterwolf.co.uk/pciioSigned-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
OK, the pending abs() disaster has hit: drivers/usb/class/audio.c:404: warning: static declaration of 'abs' follows non-static declaration This is due to the declaration in kernel.h. AFAIK there's not even a matching definition for that. The patch implements abs() as a macro in kernel.h and kills off various private implementations. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: Manfred Spraul <manfred@colorfullife.com> slab.c contains too many inline functions: - some functions that are not performance critical were inlined. Waste of text size. - The debug code relies on __builtin_return_address(0) to keep track of the callers. According to rmk, gcc didn't inline some functions as expected and that resulted in useless debug output. This was probably caused by the large debug-only inline functions. The attached patche removes most inline functions: - the empty on release/huge on debug inline functions were replaced with empty macros on release/normal functions on debug. - spurious inline statements were removed. The code is down to 6 inline functions: three one-liners for struct abstractions, one for a might_sleep_if test and two for the performance critical __cache_alloc / __cache_free functions. Note: If an embedded arch wants to save a few bytes by uninlining __cache_{free,alloc}: The right way to do that is to fold the functions into kmem_cache_xy and then replace kmalloc with kmem_cache_alloc(kmem_find_general_cachep(),). Signed-Off: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: Manfred Spraul <manfred@colorfullife.com> Reversing the patches that made all caches hw cacheline aligned had an unintended side effect on the kmalloc caches: Before they had the SLAB_HWCACHE_ALIGN flag set, now it's clear. This breaks one sgi driver - it expects aligned caches. Additionally I think it's the right thing to do: It costs virtually nothing (the caches are power-of-two sized) and could reduce false sharing. Additionally, the patch adds back the documentation for the SLAB_HWCACHE_ALIGN flag. Signed-Off: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: William Lee Irwin III <wli@holomorphy.com> Based on Arjan van de Ven's idea, with guidance and testing from James Bottomley. The physical ordering of pages delivered to the IO subsystem is strongly related to the order in which fragments are subdivided from larger blocks of memory tracked by the page allocator. Consider a single MAX_ORDER block of memory in isolation acted on by a sequence of order 0 allocations in an otherwise empty buddy system. Subdividing the block beginning at the highest addresses will yield all the pages of the block in reverse, and subdividing the block begining at the lowest addresses will yield all the pages of the block in physical address order. Empirical tests demonstrate this ordering is preserved, and that changing the order of subdivision so that the lowest page is split off first resolves the sglist merging difficulties encountered by driver authors at Adaptec and others in James Bottomley's testing. James found that before this patch, there were 40 merges out of about 32K segments. Afterward, there were 24007 merges out of 19513 segments, for a merge rate of about 55%. Merges of 128 segments, the maximum allowed, were observed afterward, where beforehand they never occurred. It also improves dbench on my workstation and works fine there. Signed-off-by: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
Use the more SMP-friendly prepare_to_wait()/finish_wait() in wait_event() and friends. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: "Adam J. Richter" <adam@yggdrasil.com> Replace the use of a global spinlock with the per-inode ->i_lock. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
Some people want the dentry and inode caches shrink harder, others want them shrunk more reluctantly. The patch adds /proc/sys/vm/vfs_cache_pressure, which tunes the vfs cache versus pagecache scanning pressure. - at vfs_cache_pressure=0 we don't shrink dcache and icache at all. - at vfs_cache_pressure=100 there is no change in behaviour. - at vfs_cache_pressure > 100 we reclaim dentries and inodes harder. The number of megabytes of slab left after a slocate.cron on my 256MB test box: vfs_cache_pressure=100000 33480 vfs_cache_pressure=10000 61996 vfs_cache_pressure=1000 104056 vfs_cache_pressure=200 166340 vfs_cache_pressure=100 190200 vfs_cache_pressure=50 206168 Of course, this just left more directory and inode pagecache behind instead of vfs cache. Interestingly, on this machine the entire slocate run fits into pagecache, but not into VFS caches. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
The shrink_zone() logic can, under some circumstances, cause far too many pages to be reclaimed. Say, we're scanning at high priority and suddenly hit a large number of reclaimable pages on the LRU. Change things so we bale out when SWAP_CLUSTER_MAX pages have been reclaimed. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
We've been futzing with the scan rates of the inactive and active lists far too much, and it's still not right (Anton reports interrupt-off times of over a second). - We have this logic in there from 2.4.early (at least) which tries to keep the inactive list 1/3rd the size of the active list. Or something. I really cannot see any logic behind this, so toss it out and change the arithmetic in there so that all pages on both lists have equal scan rates. - Chunk the work up so we never hold interrupts off for more that 32 pages worth of scanning. - Make the per-zone scan-count accumulators unsigned long rather than atomic_t. Mainly because atomic_t's could conceivably overflow, but also because access to these counters is racy-by-design anyway. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
Move all the data structure declarations, macros and variable definitions to less surprising places. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: long <tlnguyen@snoqualmie.dp.intel.com> MSI support for x86_64 is currently disabled in the kernel 2.6.x. Below is the patch, which provides a fix and reenable it. In addition, the patch provides a info message during kernel boot if configuring vector-base indexing. Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: William Lee Irwin III <wli@holomorphy.com> The following patch makes irqaction's ->mask a cpumask as it was intended to be and wraps up the rest of the sweep. Only struct irqaction is usefully greppable, so there may be some assignments to ->mask missing still. This removes more code than it adds. From: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: William Lee Irwin III <wli@holomorphy.com> The cpumask patches broke alpha's build, even without the irqaction patch, largely centering around cpu_possible_map. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: Rusty Russell <rusty@rustcorp.com.au> Paul Jackson's cpumask tour-de-force allows us to get rid of those stupid temporaries which we used to hold CPU_MASK_ALL to hand them to functions. This used to break NR_CPUS > BITS_PER_LONG. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: Paul Jackson <pj@sgi.com> Tweak cpumask.h comments, spacing: - Add comments for cpu_present_map macros: num_present_cpus() and cpu_present() - Remove comments for obsolete macros: cpu_set_online(), cpu_set_offline() - Reorder a few comment lines, to match the code and confuse readers of this patch - Tabify one chunk of code Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: Paul Jackson <pj@sgi.com> Make use of for_each_cpu_mask() macro to simplify and optimize a couple of sparc64 per-CPU loops. Optimize a bit of cpumask code for asm-i386/mach-es7000 Convert physids_complement() to use both args in the files include/asm-i386/mpspec.h, include/asm-x86_64/mpspec.h. Remove cpumask hack from asm-x86_64/topology.h routine pcibus_to_cpumask(). Clarify and slightly optimize several cpumask manipulations in kernel/sched.c Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: Paul Jackson <pj@sgi.com> Now that the emulation of the obsolete cpumask macros is no longer needed, remove it from cpumask.h Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
include/asm/smp.h:55:1: warning: "cpu_possible" redefined include/asm/smp.h:54:1: warning: "cpu_online" redefined Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: Paul Jackson <pj@sgi.com> Remove by recoding other uses of the obsolete cpumask const, coerce and promote macros. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: Paul Jackson <pj@sgi.com> Remove by recoding i386 uses of the obsolete cpumask const, coerce and promote macros. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: Paul Jackson <pj@sgi.com> With the cpumask rewrite in the previous patch, these various include/asm-*/cpumask*.h headers are no longer used. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: Paul Jackson <pj@sgi.com> Major rewrite of cpumask to use a single implementation, as a struct-wrapped bitmap. This patch leaves some 26 include/asm-*/cpumask*.h header files orphaned - to be removed next patch. Some nine cpumask macros for const variants and to coerce and promote between an unsigned long and a cpumask are obsolete. Simple emulation wrappers are provided in this patch for these obsolete macros, which can be removed once each of the 3 archs (i386, ppc64, x86_64) using them are recoded in follow-on patches to not need them. The CPU_MASK_ALL macro now avoids leaving possible garbage one bits in any unused portion of the high word. An inproved comment lists all available operators, for convenient browsing. From: Mikael Pettersson <mikpe@csd.uu.se> 2.6.7-rc3-mm1 changed CPU_MASK_NONE into something that isn't a valid rvalue (it only works inside struct initializers). This caused compile-time errors in perfctr in UP x86 builds. From: Arnd Bergmann <arnd@arndb.de> cpumask-5-10-rewrite-cpumaskh-single-bitmap-based from 2.6.7-rc3-mm1 causes include2/asm/smp.h:54:1: warning: "cpu_online" redefined Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Mikael Pettersson <mikpe@csd.uu.se> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: Paul Jackson <pj@sgi.com> These bitmap improvements make it a suitable basis for fully supporting cpumask_t and nodemask_t. Inline macros with compile-time checks enable generating tight code on both small and large systems (large meaning cpumask_t requires more than one unsigned long's worth of bits). The existing bitmap_<op> macros in lib/bitmap.c are renamed to __bitmap_<op>, and wrappers for each bitmap_<op> are exposed in include/linux/bitmap.h This patch _includes_ Bill Irwins rewrite of the bitmap_shift operators to not require a fixed length intermediate bitmap. Improved comments list each available operator for easy browsing. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: Paul Jackson <pj@sgi.com> Document the bitmap bit model and handling of unused bits. Tighten up bitmap so it does not generate nonzero bits in the unused tail if it is not given any on input. Add intersects, subset, xor and andnot operators. Change bitmap_complement to take two operands. Add a couple of missing 'const' qualifiers on bitops test_bit and bitmap_equal args. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: Paul Jackson <pj@sgi.com> This patch makes cpu_present_map a real map for all configurations, instead of a constant for non-SMP. It also moves the definition of cpu_present_map out of kernel/cpu.c into kernel/sched.c, because cpu.c isn't compiled into non-SMP kernels. The pattern is that each of the possible, present and online cpu maps are actual kernel global cpumask_t variables, for all configurations. They are documented in include/linux/cpumask.h. Some of the UP (NR_CPUS=1) code cheats, and hardcodes the assumption that the single bit position of these maps is always set, as an optimization. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: Dipankar Sarma <dipankar@in.ibm.com> This patch changes the call_rcu() API and avoids passing an argument to the callback function as suggested by Rusty. Instead, it is assumed that the user has embedded the rcu head into a structure that is useful in the callback and the rcu_head pointer is passed to the callback. The callback can use container_of() to get the pointer to its structure and work with it. Together with the rcu-singly-link patch, it reduces the rcu_head size by 50%. Considering that we use these in things like struct dentry and struct dst_entry, this is good savings in space. An example : struct my_struct { struct rcu_head rcu; int x; int y; }; void my_rcu_callback(struct rcu_head *head) { struct my_struct *p = container_of(head, struct my_struct, rcu); free(p); } void my_delete(struct my_struct *p) { ... call_rcu(&p->rcu, my_rcu_callback); ... } Signed-Off-By: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: Dipankar Sarma <dipankar@in.ibm.com> This reduces the RCU head size by using a singly linked to maintain them. The ordering of the callbacks is still maintained as before by using a tail pointer for the next list. Signed-Off-By : Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: Manfred Spraul <manfred@colorfullife.com> Step three for reducing cacheline trashing within rcupdate.c: Cleanup and code move from <linux/rcupdate.h> to kernel/rcupdate.c: Remove internal details from the header file. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: Manfred Spraul <manfred@colorfullife.com> Step two for reducing cacheline trashing within rcupdate.c: rcu_process_callbacks always acquires rcu_ctrlblk.state.mutex and calls rcu_start_batch, even if the batch is already running or already scheduled to run. This can be avoided with a sequence lock: A sequence lock allows to read the current batch number and next_pending atomically. If next_pending is already set, then there is no need to acquire the global mutex. This means that for each grace period, there will be - one write access to the rcu_ctrlblk.batch cacheline - lots of read accesses to rcu_ctrlblk.batch (3-10*cpus_online()). Behavior similar to the jiffies cacheline, shouldn't be a problem. - cpus_online()+1 write accesses to rcu_ctrlblk.state, all of them starting with spin_lock(&rcu_ctrlblk.state.mutex). For large enough cpus_online() this will be a problem, but all except two of the spin_lock calls only protect the rcu_cpu_mask bitmap, thus a hierarchical bitmap would allow to split the write accesses to multiple cachelines. Tested on an 8-way with reaim. Unfortunately it probably won't help with Jack Steiner's 'ls' test since in this test only one cpu generates rcu entries. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
From: Manfred Spraul <manfred@colorfullife.com> Below is the one of my patches from my rcu lock update. Jack Steiner tested the first one on a 512p and it resolved the rcu cache line trashing. All were tested on osdl with STP. Step one for reducing cacheline trashing within rcupdate.c: The current code uses the rcu_cpu_mask bitmap both for keeping track of the cpus that haven't gone through a quiescent state and for checking if a cpu should look for quiescent states. The bitmap is frequently changed and the check is done by polling - together this causes cache line trashing. If it's cheaper to access a (mostly) read-only cacheline than a cacheline that is frequently dirtied, then it's possible to reduce the trashing by splitting the rcu_cpu_mask bitmap into two cachelines: The patch adds a generation counter and moves it into a separate cacheline. This allows to removes all accesses to rcu_cpumask (in the read-write cacheline) from rcu_pending and at least 50% of the accesses from rcu_check_quiescent_state. rcu_pending and all but one call per cpu to rcu_check_quiescent_state access the read-only cacheline. Probably not enough for 512p, but it's a start, just for 128 byte more memory use, without slowing down rcu grace periods. Obviously the read-only cacheline is not really read-only: it's written once per grace period to indicate that a new grace period is running. Tests on an 8-way Pentium III with reaim showed some improvement: oprofile hits: Reference: http://khack.osdl.org/stp/293075/ Hits % 23741 0.0994 rcu_pending 19057 0.0798 rcu_check_quiescent_state 6530 0.0273 rcu_check_callbacks Patched: http://khack.osdl.org/stp/293076/ 8291 0.0579 rcu_pending 5475 0.0382 rcu_check_quiescent_state 3604 0.0252 rcu_check_callbacks The total runtime differs between both runs, thus the % number must be compared: Around 50% faster. I've uninlined rcu_pending for the test. Tested with reaim and kernbench. Description: - per-cpu quiescbatch and qs_pending fields introduced: quiescbatch contains the number of the last quiescent period that the cpu has seen and qs_pending is set if the cpu has not yet reported the quiescent state for the current period. With these two fields a cpu can test if it should report a quiescent state without having to look at the frequently written rcu_cpu_mask bitmap. - curbatch split into two fields: rcu_ctrlblk.batch.completed and rcu_ctrlblk.batch.cur. This makes it possible to figure out if a grace period is running (completed != cur) without accessing the rcu_cpu_mask bitmap. - rcu_ctrlblk.maxbatch removed and replaced with a true/false next_pending flag: next_pending=1 means that another grace period should be started immediately after the end of the current period. Previously, this was achieved by maxbatch: curbatch==maxbatch means don't start, curbatch!= maxbatch means start. A flag improves the readability: The only possible values for maxbatch were curbatch and curbatch+1. - rcu_ctrlblk split into two cachelines for better performance. - common code from rcu_offline_cpu and rcu_check_quiescent_state merged into cpu_quiet. - rcu_offline_cpu: replace spin_lock_irq with spin_lock_bh, there are no accesses from irq context (and there are accesses to the spinlock with enabled interrupts from tasklet context). - rcu_restart_cpu introduced, s390 should call it after changing nohz: Theoretically the global batch counter could wrap around and end up at RCU_quiescbatch(cpu). Then the cpu would not look for a quiescent state and rcu would lock up. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-