- 18 Aug, 2010 17 commits
-
-
Nick Piggin authored
fs: scale files_lock

Improve scalability of files_lock by adding per-cpu, per-sb files lists, protected with an lglock. The lglock provides fast access to the per-cpu lists to add and remove files. It also provides a snapshot of all the per-cpu lists (although this is very slow).

One difficulty with this approach is that a file can be removed from the list by another CPU. We must track which per-cpu list the file is on with a new variable in the file struct (packed into a hole on 64-bit archs). Scalability could suffer if files are frequently removed from a different CPU's list.

However, loads with frequent removal of files imply a short interval between adding and removing the files, and the scheduler attempts to avoid moving processes too far away. Also, even in the case of cross-CPU removal, the hardware has much more opportunity to parallelise cacheline transfers with N cachelines than with 1.

A worst-case test of 1 CPU allocating files subsequently being freed by N CPUs degenerates to contending on a single lock, which is no worse than before. When more than one CPU is allocating files, even if they are always freed by different CPUs, there will be more parallelism than in the single-lock case.

Testing results:

On a 2 socket, 8 core opteron, I measure the number of times the lock is taken to remove the file, the number of times it is removed by the same CPU that added it, and the number of times it is removed by the same node that added it.

Booting:     locks=  25049 cpu-hits=  23174 (92.5%) node-hits=  23945 (95.6%)
kbuild -j16  locks=2281913 cpu-hits=2208126 (96.8%) node-hits=2252674 (98.7%)
dbench 64    locks=4306582 cpu-hits=4287247 (99.6%) node-hits=4299527 (99.8%)

So a file is removed from the same CPU it was added by over 90% of the time. It remains within the same node 95% of the time.

Tim Chen ran some numbers for a 64 thread Nehalem system performing a compile.

                throughput
2.6.34-rc2      24.5
+patch          24.9

                us      sys     idle    IO wait (in %)
2.6.34-rc2      51.25   28.25   17.25   3.25
+patch          53.75   18.5    19      8.75

So significantly less CPU time spent in kernel code, higher idle time and slightly higher throughput.

Single threaded performance difference was within the noise of microbenchmarks. That is not to say a penalty does not exist: the code is larger and more memory accesses are required, so it will be slightly slower.

Cc: linux-kernel@vger.kernel.org
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Nick Piggin authored
lglock: introduce special lglock and brlock spin locks

This patch introduces "local-global" locks (lglocks). These can be used to:

 - Provide fast exclusive access to per-CPU data, with exclusive access to another CPU's data allowed but possibly subject to contention, and to provide very slow exclusive access to all per-CPU data.
 - Or to provide very fast and scalable read serialisation, and to provide very slow exclusive serialisation of data (not necessarily per-CPU data).

Brlocks are also implemented as a short-hand notation for the latter use case.

Thanks to Paul for local/global naming convention.

Cc: linux-kernel@vger.kernel.org
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
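The local/global idea described in this message can be illustrated with a small user-space sketch. This is not the kernel implementation: the struct, the function names and NR_SLOTS are hypothetical stand-ins, and C11 atomic_flag spinlocks replace the kernel's per-CPU spinlocks. It only shows the shape of the scheme, where the local path takes a single slot's lock and the global path takes every slot's lock in a fixed order.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_SLOTS 4  /* hypothetical stand-in for the number of CPUs */

/* One spinlock per slot; a real lglock would keep these in per-CPU storage. */
struct lglock_sketch {
    atomic_flag slot[NR_SLOTS];
};

static void lg_init(struct lglock_sketch *lg)
{
    for (int i = 0; i < NR_SLOTS; i++)
        atomic_flag_clear(&lg->slot[i]);
}

/* Returns true if the slot lock was free and is now held by the caller. */
static bool lg_local_trylock(struct lglock_sketch *lg, int i)
{
    return !atomic_flag_test_and_set_explicit(&lg->slot[i],
                                              memory_order_acquire);
}

/* Fast path: exclusive access to one slot's data only. */
static void lg_local_lock(struct lglock_sketch *lg, int i)
{
    while (!lg_local_trylock(lg, i))
        ;  /* spin until the slot lock is released */
}

static void lg_local_unlock(struct lglock_sketch *lg, int i)
{
    atomic_flag_clear_explicit(&lg->slot[i], memory_order_release);
}

/* Slow path: take every slot lock, always in index order, so that two
 * global lockers cannot deadlock against each other. */
static void lg_global_lock(struct lglock_sketch *lg)
{
    for (int i = 0; i < NR_SLOTS; i++)
        lg_local_lock(lg, i);
}

static void lg_global_unlock(struct lglock_sketch *lg)
{
    for (int i = 0; i < NR_SLOTS; i++)
        lg_local_unlock(lg, i);
}
```

In this sketch the cost asymmetry is visible directly: a local lock touches one flag, while a global lock touches NR_SLOTS of them, which matches the "fast local, very slow global" trade-off the message describes.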
-
Nick Piggin authored
tty: fix fu_list abuse

tty code abuses fu_list, which causes a bug in remount,ro handling.

If a tty device node is opened on a filesystem, then the last link to the inode removed, the filesystem will be allowed to be remounted readonly. This is because fs_may_remount_ro does not find the 0 link tty inode on the file sb list (because the tty code incorrectly removed it to use for its own purpose). This can result in a filesystem with errors after it is marked "clean".

Taking the idea from Christoph's initial patch, allocate a tty private struct at file->private_data and put our required list fields in there, linking file and tty. This makes tty nodes behave the same way as other device nodes, avoids meddling with the vfs, and avoids this bug.

The error handling is not trivial in the tty code, so for this bugfix, I take the simple approach of using __GFP_NOFAIL and don't worry about memory errors. This is not a problem because our allocator doesn't fail small allocs as a rule anyway. So proper error handling is left as an exercise for tty hackers.

[ Arguably filesystem's device inode would ideally be divorced from the driver's pseudo inode when it is opened, but in practice it's not clear whether that will ever be worth implementing. ]

Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Nick Piggin authored
fs: cleanup files_lock locking Lock tty_files with a new spinlock, tty_files_lock; provide helpers to manipulate the per-sb files list; unexport the files_lock spinlock. Cc: linux-kernel@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Nick Piggin authored
fs: remove extra lookup in __lookup_hash

Optimize lookup for create operations, where no dentry should often be common-case. In cases where it is not, such as unlink, the added overhead is much smaller than the removed.

Also, move comments about __d_lookup racyness to the __d_lookup call site. d_lookup is intuitive; __d_lookup is what needs commenting. So in that same vein, add kerneldoc comments to __d_lookup and clean up some of the comments:

 - We are interested in how the RCU lookup works here, particularly with renames. Make that explicit, and point to the document where it is explained in more detail.
 - RCU is pretty standard now, and macros make implementations pretty mindless. If we want to know about RCU barrier details, we look in RCU code.
 - Delete some boring legacy comments because we don't care much about how the code used to work, more about the interesting parts of how it works now. So comments about lazy LRU may be interesting, but would better be done in the LRU or refcount management code.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Nick Piggin authored
fs: fs_struct rwlock to spinlock struct fs_struct.lock is an rwlock with the read-side used to protect root and pwd members while taking references to them. Taking a reference to a path typically requires just 2 atomic ops, so the critical section is very small. Parallel read-side operations would have cacheline contention on the lock, the dentry, and the vfsmount cachelines, so the rwlock is unlikely to ever give a real parallelism increase. Replace it with a spinlock to avoid one or two atomic operations in typical path lookup fastpath. Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Nick Piggin authored
apparmor: use task path helpers Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Nick Piggin authored
fs: dentry allocation consolidation There are 2 duplicate copies of code in dentry allocation in path lookup. Consolidate them into a single function. Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Nick Piggin authored
fs: fix do_lookup false negative In do_lookup, if we initially find no dentry, we take the directory i_mutex and re-check the lookup. If we find a dentry there, then we revalidate it if needed. However if that revalidate asks for the dentry to be invalidated, we return -ENOENT from do_lookup. What should happen instead is an attempt to allocate and lookup a new dentry. This is probably not noticed because it is rare. It is only reached if a concurrent create races in first (in which case, the dentry probably won't be invalidated anyway), or if the racy __d_lookup has failed due to a false-negative (which is very rare). Fix this by removing code and have it use the normal reval path. Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Andreas Gruenbacher authored
Limit the maximum number of mb_cache entries depending on the number of hash buckets: if the only limit to the number of cache entries is the available memory the hash chains can grow very long, taking a long time to search. At least partially solves https://bugzilla.lustre.org/show_bug.cgi?id=22771. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
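A minimal sketch of the idea in this message, assuming a cap that is a fixed multiple of the bucket count. The struct, the helper names and the MAX_AVG_CHAIN factor are all hypothetical and are not taken from the mb_cache code; the point is only that bounding the entry count by the bucket count also bounds the average hash chain length.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tuning knob: largest average chain length we tolerate. */
#define MAX_AVG_CHAIN 16

/* Hypothetical bookkeeping for a hash-based cache. */
struct cache_sketch {
    size_t bucket_count;  /* number of hash buckets */
    size_t entry_count;   /* entries currently cached */
};

/* The entry limit scales with the number of buckets, not with free memory. */
static size_t cache_max_entries(const struct cache_sketch *c)
{
    return c->bucket_count * MAX_AVG_CHAIN;
}

/* Account for one new entry; refuse once the limit is reached so that
 * the caller can evict or skip caching instead of growing the chains. */
static bool cache_try_account(struct cache_sketch *c)
{
    if (c->entry_count >= cache_max_entries(c))
        return false;
    c->entry_count++;
    return true;
}
```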
-
Al Viro authored
we want the assignment to err done inside the if () to be visible after it, so (re)declaring err inside if () body is wrong. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
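The class of bug described here can be shown with a short standalone example. The function names and the error value are hypothetical and unrelated to the patched file; the sketch only contrasts an inner declaration that shadows err with a plain assignment to the outer variable.

```c
#include <stdbool.h>

/* Hypothetical helper that reports failure with a negative error code. */
static int do_step(bool fail)
{
    return fail ? -5 : 0;
}

/* Buggy: the inner declaration shadows the outer err, so the error code
 * is stored in a variable that disappears at the closing brace. */
static int run_shadowed(bool fail)
{
    int err = 0;

    if (fail) {
        int err = do_step(fail);  /* new variable, not the outer one */
        (void)err;
    }
    return err;  /* the outer err was never assigned: still 0 */
}

/* Fixed: assign to the outer err so the value is visible after the if (). */
static int run_fixed(bool fail)
{
    int err = 0;

    if (fail)
        err = do_step(fail);
    return err;
}
```

Compilers can flag this pattern with -Wshadow, which is one way to catch it before it reaches review.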
-
Al Viro authored
... not harmless in this case - we have a string in the end of buffer already. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Christoph Hellwig authored
These flags aren't real I/O types, but tell ll_rw_block to always lock the buffer instead of giving up on a failed trylock. Instead add a new write_dirty_buffer helper that implements this semantic and use it from the existing SWRITE* callers. Note that the ll_rw_block code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which this patch fixes. In the ufs code clean up the helper that used to call ll_rw_block to mirror sync_dirty_buffer, which is the function it implements for compound buffers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Christoph Hellwig authored
Instead of abusing a buffer_head flag just add a variant of sync_dirty_buffer which allows passing the exact type of write flag required. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Jan Kara authored
generic_acl_set didn't update the ctime of the file when its permission was changed.

Steps to reproduce:
 # touch aaa
 # stat -c %Z aaa
 1275289822
 # setfacl -m 'u::x,g::x,o::x' aaa
 # stat -c %Z aaa
 1275289822         <- unchanged

But, according to the spec of the ctime, vfs must update it.

Port of ext3 patch by Miao Xie <miaox@cn.fujitsu.com>.

CC: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Alexander Shishkin authored
Commit 77b8a75f introduced a warning at fs/inode.c:692 unlock_new_inode(), caused by unlock_new_inode() being called on existing inodes as well. This patch changes setup_inode() to only call unlock_new_inode() for I_NEW inodes. Signed-off-by: Alexander Shishkin <virtuoso@slind.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Sergey Senozhatsky authored
reiserfs_evict_inode calls end_writeback twice, hitting kernel BUG at fs/inode.c:298 because inode->i_state is I_CLEAR already. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 16 Aug, 2010 5 commits
-
-
Linus Torvalds authored
-
Linus Torvalds authored
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
  gcc-4.6: ACPI: fix unused but set variables in ACPI
  ACPI thermal: make procfs I/F depend on CONFIG_ACPI_PROCFS
  ACPI video: make procfs I/F depend on CONFIG_ACPI_PROCFS
  ACPI processor: remove deprecated ACPI procfs I/F
  ACPI power_resource: remove unused procfs I/F
  ACPI: remove deprecated ACPI procfs I/F
  ACPI: introduce drivers/acpi/sysfs.c
  ACPI: introduce module parameter acpi.aml_debug_output
  ACPI: introduce drivers/acpi/debugfs.c
  ACPI, APEI, ERST debug support
  ACPI, APEI, Manage GHES as platform devices
  ACPI, APEI, Rename CPER and GHES severity constants
  ACPI, APEI, Fix a typo of error path of apei_resources_request
  ACPI / ACPICA: Fix reference counting problems with GPE handlers
  ACPI: Add the check of ADR flag in course of finding ACPI handle for PCI device
  ACPI / Sleep: Drop acpi_suspend_finish()
  ACPI / Sleep: Consolidate suspend and hibernation routines
  ACPI / Wakeup: Simplify enabling of wakeup devices
  ACPI / Sleep: Rework enabling wakeup devices
  ACPI / Sleep: Free NVS copy if suspending of devices fails

Fixed up totally buggered "ACPI: fix unused but set variables in ACPI" patch that doesn't even compile in the merge.

Thanks to Sedat Dilek <sedat.dilek@googlemail.com> for noticing the breakage before I even pulled. And a big "Grrr.." at Len for not even bothering to compile the tree before asking me to pull.
-
Linus Torvalds authored
* git://git.infradead.org/iommu-2.6:
  intel-iommu: Fix 32-bit build warning with __cmpxchg()
  intr-remap: allow disabling source id checking
-
Linus Torvalds authored
* git://git.infradead.org/mtd-2.6:
  mtd/nand_ids: Fix buswidth
  mtd/m25p80: fix test for end of loop
  mtd/m25p80: retlen is never NULL
  MIPS: Fix gen_nand probe structures contents
  gen_nand: Test if nr_chips field is valid
  BFIN: Fix gen_nand probe structures contents
  nand/denali: move all hardware initialization work to denali_hw_init
  nand/denali: Add a page check in denali_read_page & denali_read_page_raw
  nand/denali: use cpu_relax() while waiting for hardware interrupt
  nand/denali: change read_status function method
  nand/denali: Fixed check patch warnings
  ARM: Fix gen_nand probe structures contents
  mtd/nand_base: fix kernel-doc warnings & typos
  nand/denali: use dev_xx debug function to replace nand_dbg_print and some printk
  nand/denali: Fixed handle ECC error bugs
  nand/denali: use iowrite32() to replace denali_write32()
  nand/denali: Fixed probe function bugs
-
Linus Torvalds authored
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  arch/tile: don't validate CROSS_COMPILE needlessly
  arch/tile: export only COMMAND_LINE_SIZE to userspace.
  arch/tile: rename ARCH_KMALLOC_MINALIGN to ARCH_DMA_MINALIGN
  arch/tile: Rename the hweight() implementations to __arch_hweight()
  arch/tile: extend syscall ABI to set r1 on return as well.
  arch/tile: Various cleanups.
  arch/tile: support backtracing on TILE-Gx
  arch/tile: Fix a couple of issues with the COMPAT code for TILE-Gx.
  arch/tile: Use separate, better minsec values for clocksource and sched_clock.
  arch/tile: correct a bug in freeing bootmem by VA for the optional second initrd.
  arch: tile: mm: pgtable.c: Removed duplicated #include
  arch: tile: kernel/proc.c Removed duplicated #include
  Add fanotify syscalls to <asm-generic/unistd.h>.
  arch/tile: support new kunmap_atomic() naming convention.
  tile: remove unused ISA_DMA_THRESHOLD define

Conflicts in arch/tile/configs/tile_defconfig (pick the mainline version with the reduced defconfig).
-
- 15 Aug, 2010 18 commits
-
-
Chris Metcalf authored
With this change, the arch/tile Makefile will only check for a valid combination of CROSS_COMPILE vs "uname -m" for a few common targets that are typically the ones we get wrong (vmlinux, all, and modules). The change handles the case of an empty "make" goal like "make all". Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
-
Linus Torvalds authored
This commit makes the stack guard page somewhat less visible to user space. It does this by:

 - not showing the guard page in /proc/<pid>/maps

   It looks like lvm-tools will actually read /proc/self/maps to figure out where all its mappings are, and effectively do a specialized "mlockall()" in user space. By not showing the guard page as part of the mapping (by just adding PAGE_SIZE to the start for grows-up pages), lvm-tools ends up not being aware of it.

 - by also teaching the _real_ mlock() functionality not to try to lock the guard page.

   That would just expand the mapping down to create a new guard page, so there really is no point in trying to lock it in place.

It would perhaps be nice to show the guard page specially in /proc/<pid>/maps (or at least mark grow-down segments some way), but let's not open ourselves up to more breakage by user space from programs that depend on the exact details of the 'maps' file.

Special thanks to Henrique de Moraes Holschuh for diving into lvm-tools source code to see what was going on with the whole new warning.

Reported-and-tested-by: François Valenduc <francois.valenduc@tvcablenet.be>
Reported-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
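The address adjustment mentioned for /proc/<pid>/maps can be sketched as below. This is an illustration under stated assumptions, not the kernel code: the struct, the field names and the page size constant are hypothetical, and the sketch assumes the guard page is the lowest page of a stack mapping that grows down (the message itself says "grows-up pages").

```c
#include <stdbool.h>

/* Hypothetical page size; the kernel uses the architecture's PAGE_SIZE. */
#define PAGE_SIZE_SKETCH 4096UL

/* Hypothetical, heavily reduced stand-in for a memory mapping. */
struct vma_sketch {
    unsigned long start;  /* lowest address of the mapping */
    unsigned long end;    /* one past the highest address */
    bool grows_down;      /* stack-like mapping that expands downwards */
};

/* Start address to report to user space: skip the guard page so that
 * tools parsing the maps output never see it or try to lock it. */
static unsigned long maps_visible_start(const struct vma_sketch *vma)
{
    unsigned long start = vma->start;

    if (vma->grows_down)
        start += PAGE_SIZE_SKETCH;  /* assumed: guard page is the first page */
    return start;
}
```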
-
Linus Torvalds authored
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: sound/usb/format: silence uninitialized variable warnings
  MAINTAINERS: Add Ian Lartey as comaintaner for Wolfson devices
  MAINTAINERS: Make Wolfson entry also cover CODEC drivers
  ASoC: Only tweak WM8994 chip configuration on devices up to rev D
  ASoC: Optimise DSP performance for WM8994
  ALSA: hda - Fix dynamic ADC change working again
  ALSA: hda - Restrict PCM parameters per ELD information over HDMI
  sound: oss: sh_dac_audio.c removed duplicated #include
-
Linus Torvalds authored
* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
  intel_idle: recognize Lincroft Atom Processor
  intel_idle: no longer EXPERIMENTAL
  intel_idle: disable module support
  intel_idle: add support for Westmere-EX
  intel_idle: delete power_policy modparam, and choose substate functions
  intel_idle: delete substates DEBUG modparam
-
Chris Metcalf authored
This fixes a failure in "make headers_check" for tile. I hadn't realized this file was exported to userspace by default. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
-
Chris Metcalf authored
See commit a6eb9fe1. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
-
Takashi Iwai authored
-
Takashi Iwai authored
-
Dan Carpenter authored
Gcc complains that ret might be used uninitialized:

sound/usb/format.c: In function ‘snd_usb_parse_audio_format’:
sound/usb/format.c:354: warning: ‘ret’ may be used uninitialized in this function
sound/usb/format.c:354: note: ‘ret’ was declared here
sound/usb/format.c:414: warning: ‘ret’ may be used uninitialized in this function
sound/usb/format.c:414: note: ‘ret’ was declared here

I suppose it could be uninitialized if there is ever a UAC_VERSION_3 released. Anyway this patch is worthwhile if only to silence the gcc warning.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Daniel Mack <daniel@caiaq.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
-
Len Brown authored
Conflicts: drivers/acpi/debug.c Signed-off-by: Len Brown <len.brown@intel.com>
-
Andi Kleen authored
Some minor improvements in error handling, but overall it was mostly dead code. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
-
Zhang Rui authored
Mark the ACPI thermal procfs I/F deprecated, because /sys/class/thermal/ is already available and has been working for years w/o any problem. The ACPI thermal procfs I/F will be removed in 2.6.37. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
-
Zhang Rui authored
Mark ACPI video driver procfs I/F deprecated, including:
 /proc/acpi/video/*/info
 /proc/acpi/video/*/DOS
 /proc/acpi/video/*/ROM
 /proc/acpi/video/*/POST
 /proc/acpi/video/*/POST_info
 /proc/acpi/video/*/*/info
 /proc/acpi/video/*/*/state
 /proc/acpi/video/*/*/EDID
and
 /proc/acpi/video/*/*/brightness,
because
 1. we already have the sysfs I/F /sys/class/backlight/ as the replacement of /proc/acpi/video/*/*/brightness.
 2. the other procfs I/F is not useful for userspace.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
-
Zhang Rui authored
Remove deprecated ACPI processor procfs I/F, including:
 /proc/acpi/processor/CPUX/power
 /proc/acpi/processor/CPUX/limit
 /proc/acpi/processor/CPUX/info

/proc/acpi/processor/CPUX/throttling still exists, as we don't have sysfs I/F available for now.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
-
Zhang Rui authored
Remove unused ACPI power procfs I/F. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
-
Zhang Rui authored
Remove deprecated ACPI procfs I/F, including
 /proc/acpi/debug_layer
 /proc/acpi/debug_level
 /proc/acpi/info
 /proc/acpi/dsdt
 /proc/acpi/fadt
 /proc/acpi/sleep
because the sysfs I/F is already available and has been working well for years.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
-
Zhang Rui authored
Introduce drivers/acpi/sysfs.c.

code for ACPI sysfs I/F, including
#ifdef ACPI_DEBUG
 /sys/module/acpi/parameters/debug_layer
 /sys/module/acpi/parameters/debug_level
 /sys/module/acpi/parameters/trace_method_name
 /sys/module/acpi/parameters/trace_debug_layer
 /sys/module/acpi/parameters/trace_debug_level
 /sys/module/acpi/parameters/trace_state
#endif
 /sys/module/acpi/parameters/acpica_version
 /sys/firmware/acpi/tables/
 /sys/firmware/acpi/interrupts/
is moved to this file.

No function change in this patch.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
-
Len Brown authored
-