- 29 Dec, 2003 40 commits
-
-
Andrew Morton authored
From: Manfred Spraul <manfred@colorfullife.com> Attached is the lockless semop patch. I did another test run with idle=poll on a Pentium III, and the result remained unchanged: 99.9% direct fast path, 0.1% race with wakeup against writing the final result code: http://khack.osdl.org/stp/282936/environment/proc/slabinfo That means there is no immediate need to add the two-stage implementation to finish_wait. The patch reduces the spinlock operations on the semaphore array spinlock by 1/3.
-
Andrew Morton authored
From: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Currently, a writev() to a pipe/fifo can be interleaved with data from other processes doing writes even when the request size is <= PIPE_BUF. These writes should in fact be atomic. The readv() side is also updated so that it behaves the same as read(). And it is faster.

readv/writev version of bw_pipe in LMbench:

2.6.0-test9-bk12
    hirofumi@devron (i686-pc-linux-gnu)[1010]$ ./bw_pipe -m 4096 -M 5
    Pipe bandwidth: 45.53 MB/sec
    hirofumi@devron (i686-pc-linux-gnu)[1009]$ ./bw_pipe -m 1024 -M 5
    Pipe bandwidth: 20.08 MB/sec

2.6.0-test9-bk12 + patch
    hirofumi@devron (i686-pc-linux-gnu)[1001]$ ./bw_pipe -m 4096 -M 5
    Pipe bandwidth: 65.98 MB/sec
    hirofumi@devron (i686-pc-linux-gnu)[1002]$ ./bw_pipe -m 1024 -M 5
    Pipe bandwidth: 32.19 MB/sec
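A minimal user-space sketch of the call shape this commit is about, assuming a POSIX system. The helper name `pipe_writev_roundtrip` is hypothetical and not part of the patch. It gathers two buffers into one writev() whose total size is at most PIPE_BUF, which is the case the commit says must reach the pipe as one atomic unit. A single process cannot show interleaving, so this only illustrates the request itself.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write strings a and b to a fresh pipe with a single
 * writev(), then read the data back into out.
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t pipe_writev_roundtrip(const char *a, const char *b,
                              char *out, size_t outlen)
{
    int fds[2];
    struct iovec iov[2];
    ssize_t written, got;
    size_t total = strlen(a) + strlen(b);

    /* Only requests of at most PIPE_BUF bytes fall under the atomicity rule. */
    if (total > PIPE_BUF || total > outlen)
        return -1;
    if (pipe(fds) != 0)
        return -1;

    iov[0].iov_base = (void *)a;
    iov[0].iov_len = strlen(a);
    iov[1].iov_base = (void *)b;
    iov[1].iov_len = strlen(b);

    /* One gathered write covering both buffers. */
    written = writev(fds[1], iov, 2);
    close(fds[1]);
    if (written != (ssize_t)total) {
        close(fds[0]);
        return -1;
    }

    got = read(fds[0], out, outlen);
    close(fds[0]);
    return got;
}
```

To observe the behaviour the commit fixes, several writer processes would have to issue such requests concurrently against the same pipe; that is left out of this sketch.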
-
Andrew Morton authored
From: Manfred Spraul <manfred@colorfullife.com> The memmove implementation of i386 is not optimized: it uses movsb, which is far slower than movsd. The optimization is trivial: if dest is less than source, then call memcpy(). markw tried it on a 4xXeon with dbt2, it saved around 300 million cpu ticks in cache_flusharray():

oprofile, GLOBAL_POWER_EVENTS, count 100k

Before:
c0144ed1 <cache_flusharray>: /* cache_flusharray total: 21823 0.0165 */
     6 4.5e-06 :c0144f8e: cmp %esi,%ebx
    11 8.3e-06 :c0144f90: jae c0144f9e <cache_flusharray+0xcd>
     3 2.3e-06 :c0144f92: mov %ebx,%edi
  7305 0.0055  :c0144f94: repz movsb %ds:(%esi),%es:(%edi)
   201 1.5e-04 :c0144f96: add $0x10,%esp

After:
c0144f1d <cache_flusharray>: /* cache_flusharray total: 17959 0.0136 */
  1270 9.6e-04 :c0144f1d: push %ebp
[snip]
     6 4.6e-06 :c0144fdc: cmp %esi,%ebx
    13 9.9e-06 :c0144fde: jae c0145000 <cache_flusharray+0xe3>
     2 1.5e-06 :c0144fe0: mov %edx,%eax
     1 7.6e-07 :c0144fe2: mov %ebx,%edi
    11 8.4e-06 :c0144fe4: shr $0x2,%eax
     1 7.6e-07 :c0144fe7: mov %eax,%ecx
  4129 0.0031  :c0144fe9: repz movsl %ds:(%esi),%es:(%edi)
   261 2.0e-04 :c0144feb: test $0x2,%dl
    27 2.1e-05 :c0144fee: je c0144ff2 <cache_flusharray+0xd5>
               :c0144ff0: movsw %ds:(%esi),%es:(%edi)
    95 7.2e-05 :c0144ff2: test $0x1,%dl
    96 7.3e-05 :c0144ff5: je c0144ff8 <cache_flusharray+0xdb>
               :c0144ff7: movsb %ds:(%esi),%es:(%edi)
   121 9.2e-05 :c0144ff8: add $0x1c,%esp
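The dispatch described above can be sketched in portable C. This is an illustration, not the i386 code from the patch: `sketch_memmove` and `copy_forward` are hypothetical names, and `copy_forward` stands in for the kernel's forward-copying memcpy() (ISO C memcpy() is not defined for overlapping buffers, so the sketch does not call it). When dest lies below src, a forward copy never overwrites bytes it still has to read, which is why the fast path is safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a forward-copying memcpy(); the kernel's version copies
 * whole words (rep movsl) where this sketch copies bytes. */
static void *copy_forward(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dest;
}

/* Illustrative memmove: take the forward path whenever dest < src. */
void *sketch_memmove(void *dest, const void *src, size_t n)
{
    unsigned char *d;
    const unsigned char *s;

    if ((uintptr_t)dest < (uintptr_t)src)
        return copy_forward(dest, src, n);

    /* dest is at or above src: copy from the end so that overlapping
     * source bytes are read before they are overwritten. */
    d = (unsigned char *)dest + n;
    s = (const unsigned char *)src + n;
    while (n--)
        *--d = *--s;
    return dest;
}
```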
-
Andrew Morton authored
From: jbarnes@sgi.com (Jesse Barnes) Now that we have a proper NODES_SHIFT value, we need to use it to define ZONE_SHIFT; otherwise we'll spill over 8 bits if we have more than 85 nodes.
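One way to read the 85-node figure, sketched under stated assumptions: each node has three zones, and a (node, zone) pair is stored as a flat index `node * 3 + zone` in an 8-bit field. Neither assumption is spelled out in the commit message, and the names below are hypothetical. Under them, 85 nodes give a largest index of 254, which fits, while 86 nodes give 257, which does not.

```c
#include <assert.h>

#define SKETCH_ZONES_PER_NODE 3u /* assumption: three zones per node */

/* Flat index of a (node, zone) pair, assuming a node-major layout. */
unsigned int sketch_zone_index(unsigned int node, unsigned int zone)
{
    assert(zone < SKETCH_ZONES_PER_NODE);
    return node * SKETCH_ZONES_PER_NODE + zone;
}

/* Nonzero if every index for nr_nodes nodes fits in a field of `bits`
 * bits (bits is assumed to be less than 32). */
int sketch_index_fits(unsigned int nr_nodes, unsigned int bits)
{
    unsigned int max_index;

    if (nr_nodes == 0)
        return 1;
    max_index = sketch_zone_index(nr_nodes - 1, SKETCH_ZONES_PER_NODE - 1);
    return max_index < (1u << bits);
}
```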
-
Andrew Morton authored
From: Paul Jackson <pj@sgi.com> The patch is needed to build with NR_CPUS > 256. Without this fix, you get compile errors:

    include/linux/cpumask.h: In function `next_online_cpu':
    include/linux/cpumask.h:56: structure has no member named `val'
-
Andrew Morton authored
From: Zwane Mwaikambo <zwane@arm.linux.org.uk> Make the test unconditional - we can always run it now we have fixmap support.
-
Andrew Morton authored
The seq_file conversion of /proc/pid/maps caused altered behaviour with respect to 2.4.22. Before the conversion, spaces and tabs in filenames were displayed verbatim. After the conversion they are escaped as \040, etc. Also, if the mmapped file has been unlinked the output appears as

    40017000-40018000 rw-p 00000000 03:02 1425800 /home/akpm/foo\040(deleted)

instead of

    40017000-40018000 rw-p 00000000 03:02 1425800 /home/akpm/foo (deleted)

This could break applications which parse /proc/pid/maps (one person has reported this). The patch restores the 2.4.20 behaviour.
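The difference between the two outputs comes down to which characters get octal-escaped. A minimal sketch, assuming an escaper driven by a caller-supplied set of characters; `escape_path` is a hypothetical name and this is not the kernel's seq_file code. With space in the set, the example path gains `\040`; with space left out, it is printed verbatim.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy path into out, replacing every character found in esc with a
 * backslash followed by three octal digits.
 * Returns the length written (excluding the NUL), or 0 if out is too small.
 */
size_t escape_path(char *out, size_t outlen, const char *path, const char *esc)
{
    size_t used = 0;

    if (outlen == 0)
        return 0;

    for (; *path; path++) {
        unsigned char c = (unsigned char)*path;

        if (strchr(esc, c)) {
            /* Four output characters plus room for the terminator. */
            if (outlen - used < 5)
                return 0;
            snprintf(out + used, 5, "\\%03o", (unsigned int)c);
            used += 4;
        } else {
            if (outlen - used < 2)
                return 0;
            out[used++] = (char)c;
        }
    }
    out[used] = '\0';
    return used;
}
```

Calling it with the set `" \t\n\\"` reproduces the escaped form shown above, while a set that omits space and tab leaves `/home/akpm/foo (deleted)` unchanged.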
-
Andrew Morton authored
From: "Bryan O'Sullivan" <bos@pathscale.com> The current version of modpost breaks if invoked from outside the build tree. This patch fixes that, and simplifies the code a bit while it's at it.
-
Andrew Morton authored
From: john stultz <johnstul@us.ibm.com> The patch arranges for each timesource type to have a name, and uses that to tell the user which timesource is in use at bootup time.
-
Andrew Morton authored
zone->refill_counter is only there to provide decent levels of work batching: don't call refill_inactive_zone() just for a couple of pages. But the logic in there allows it to build up to huge values and it can overflow (go negative) which will disable refilling altogether until it wraps positive again. Just reset it to zero whenever we decide to do some refilling.
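The batching idea can be sketched as follows. This is a simplified user-space model, not the vmscan code: `struct sketch_zone`, `take_refill_batch` and the `batch` threshold are hypothetical. Work is accumulated until it reaches the threshold, and the counter is then reset to zero instead of being reduced, so it cannot keep growing across calls.

```c
#include <assert.h>

/* Simplified stand-in for the per-zone state. */
struct sketch_zone {
    long refill_counter;
};

/*
 * Account nr_pages of pending refill work. Returns the amount of work to
 * do now, or 0 while the accumulated total is still below batch.
 */
long take_refill_batch(struct sketch_zone *zone, long nr_pages, long batch)
{
    long count;

    zone->refill_counter += nr_pages;
    if (zone->refill_counter < batch)
        return 0;

    count = zone->refill_counter;
    /* Reset rather than subtract: the counter starts from zero again
     * every time refilling is triggered. */
    zone->refill_counter = 0;
    return count;
}
```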
-
Andrew Morton authored
From: Bjorn Helgaas <bjorn.helgaas@hp.com> uart_set_options() can dereference a null pointer. This happens if you specify a console that hasn't previously been set up by early_serial_setup(). For example, on ia64, the HCDP typically tells us about line 0, so we call early_serial_setup() for it. If the user specifies "console=ttyS3", we machine-check when trying to follow the uninitialized port->ops pointer. It's not entirely clear to me whether we should return 0 or -ENODEV or something. The advantage of returning zero is that if the user specifies "console=ttyS0" and we just lack the HCDP, the console doesn't work as early as usual, but it does start working after the serial driver detects the port (though the baud/parity/etc from the command line are lost). Returning -ENODEV seems to prevent it from ever working.
-
Andrew Morton authored
From: Brian Gerst <bgerst@didntduck.org> The current code disables sysenter when first entering vm86 mode, but does not disable it again when coming back to a vm86 task after a task switch.
-
Andrew Morton authored
From: Adrian Bunk <bunk@fs.tum.de> Allow the kernel to be built with `-Os'. It requires CONFIG_EMBEDDED. This is to make it "hard to get at" because one gcc version (3.2.x I think) from RH9 generates crashy kernels with this option set.
-
Andrew Morton authored
From: Manfred Spraul <manfred@colorfullife.com> Fixes a race between proc_pid_lookup and sys_exit.
- The inodes and dentries for /proc/<pid>/whatever are cached in the dentry cache. d_revalidate is used to protect against stale data: d_revalidate returns invalid if the task exited. Additionally, sys_exit flushes the dentries for the task that died - otherwise the dentries would stay around until they arrive at the end of the LRU, which could take some time. But there is one race:
  - proc_pid_lookup finds a task and prepares new dentries for it. It must drop all locks for that operation.
  - the process exits, and the /proc/ dentries are flushed. Nothing happens, because they are not yet in the hash tables.
  - proc_pid_lookup adds the task to the dentry cache. Result: dentry of a dead task in the hash tables.
  The patch fixes that problem by flushing again if proc_pid_lookup notices that the thread exited while it created the dentry. The patch should go in, but it's not critical.
- task->proc_dentry must be the dentry of /proc/<pid>. That way sys_exit can flush the whole subtree at exit time. proc_task_lookup is a direct copy of proc_pid_lookup and handles /proc/<>/task/<pid>. It contains the lines that set task->proc_dentry. This is bogus, and must be removed. This hunk is much more critical, because it creates a de-facto dentry leak (they are recovered after flushing real dentries from the cache).
-
Andrew Morton authored
From: Russell King <rmk@arm.linux.org.uk> This oops has been caused by the need to register the class before registering any objects against it. Unfortunately, the class needs to be registered asynchronously in a separate thread to avoid driver model deadlock with yenta with cardbus cards inserted or standard PCMCIA cards not being detected correctly due to a race. I think the only real solution is to remove the class_device_create_file calls from all socket drivers. This is just a simple commenting out of the calls, and should be suitable for the remainder of the -test kernels. Due to the number of cases that we're encountering with PCMCIA, I'm beginning to wonder if the driver model could be fixed to be more kind to PCMCIA by avoiding some of these ordering dependencies. None of this would be a problem if the driver model would allow PCI device drivers to register PCI devices while their probe or remove functions were executing.
-
Andrew Morton authored
From: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
- use "select" instead of "depend"
- remove the unused SMB_NLS
- remove unneeded "default y" of CONFIG_NLS
- revert the position of the nls menu (the middle of the filesystem menus is strange)
- fix "#ifdef CONFIG_NLS" on UDF (should this add new one to Kconfig?)
-
Andrew Morton authored
This fixes the recently-reported "fsstress memory leak" problem. It has been there since November 2002. shrink_dcache() has a heuristic to prevent the dcache (and hence icache) from getting shrunk too far: it refuses to allow the dcache to shrink below 2*nr_used. Problem is, _all_ non-leaf dentries (directories) count as used. So when you have really deep directory hierarchies (fsstress creates these), nr_used is really high, and there is no upper bound to the amount of pinned dcache. The patch just rips out the heuristic. This means that dcache (and hence icache (and hence pagecache)) will be shrunk more aggressively. This could be a problem, and tons of testing is needed - a new heuristic may be needed. However I am not able to reproduce the problem which caused me to add this heuristic in the first place. Simple testcase: run a huge `dd' while running a concurrent `watch -n1 cat /proc/meminfo'. The program text for `cat' gets loaded from disk once per second.
-
Andrew Morton authored
It is doing a set_fs(KERNEL_DS) for no obvious reason. Spotted by margitsw@t-online.de (Margit Schubert-While)
-
Andrew Morton authored
Sometimes kjournald has to refile a huge number of buffers, because someone else wrote them out beforehand - they are all clean. This happens under a lock and scheduling latencies of 88 milliseconds on a 2.7GHz CPU were observed. The patch forward-ports a little bit of the 2.4 low-latency patch to fix this problem. Worst-case on ext3 is now sub-half-millisecond, except for when the RCU dentry reaping softirq cuts in :(
-
Andrew Morton authored
It calls __init functions anyway.
-
Andrew Morton authored
The cdrom driver does an order-4 allocation and the open will fail if that allocation does not succeed. This happened to me on an unstressed 900MB machine. So add the __GFP_REPEAT flag in there - this will cause the page allocator to keep on freeing pages until the allocation succeeds. It can in theory livelock but in practice I expect it is OK: the user should just stop running dbench or whatever it is which is gobbling all the memory and the mount/open will then succeed.
-
Andrew Morton authored
This tunable refers to the amount of free memory which the VM will attempt to sustain. It is mainly needed for atomic allocations (eg, networking receive). It is currently hardwired to 1024k, which is far too large for small machines and too small for large machines. Rework it to be 128k on tiny machines and 16M on huge machines.
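A sketch of how such a bound could be computed. The 128k floor and 16M ceiling come from the text above; the square-root scaling rule and the names `sketch_min_free_kbytes` and `lowmem_kbytes` are assumptions for illustration, since the commit message does not say how the value grows between the two limits.

```c
#include <assert.h>

/* Naive integer square root; slow but adequate for a sketch. */
static unsigned long long naive_isqrt(unsigned long long x)
{
    unsigned long long r = 0;

    while ((r + 1) * (r + 1) <= x)
        r++;
    return r;
}

/*
 * Free-memory target in kilobytes for a machine with lowmem_kbytes of
 * low memory. Assumed rule: sqrt(lowmem_kbytes * 16), clamped to the
 * 128k..16M range named in the commit message.
 */
unsigned long sketch_min_free_kbytes(unsigned long long lowmem_kbytes)
{
    unsigned long long v = naive_isqrt(lowmem_kbytes * 16);

    if (v < 128)
        v = 128;
    if (v > 16384)
        v = 16384;
    return (unsigned long)v;
}
```

Under this assumed rule a 16 MB machine gets 512k, a 1 GB machine gets 4 MB, and only very small or very large machines hit the clamps.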
-
Andrew Morton authored
It turns out that the int_sqrt() function in oom_kill.c gets it wrong. But fb_sqrt() in fbmon.c gets its math right. Move that function into lib/int_sqrt.c, and consolidate. (oom_kill.c fix from Thomas Schlichter <schlicht@uni-mannheim.de>)
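For reference, a common shift-and-subtract integer square root is sketched below. It is not copied from the patch, and `sketch_int_sqrt` is a hypothetical name; it only illustrates the kind of routine being consolidated. It returns the floor of the square root using additions, subtractions and shifts.

```c
#include <assert.h>
#include <limits.h>

/* Floor of the square root of x, computed one result bit at a time. */
unsigned long sketch_int_sqrt(unsigned long x)
{
    unsigned long res = 0;
    /* Highest power of four representable in an unsigned long. */
    unsigned long bit = 1UL << (sizeof(unsigned long) * CHAR_BIT - 2);

    /* Start from the largest power of four that does not exceed x. */
    while (bit > x)
        bit >>= 2;

    while (bit != 0) {
        if (x >= res + bit) {
            /* This bit belongs in the result. */
            x -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return res;
}
```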
-
Andrew Morton authored
From: Benjamin Herrenschmidt <benh@kernel.crashing.org> I needed those for the G5 on ppc64, so here they are. I was only able to test the SMBUS stuff, though.
-
Andrew Morton authored
From: Matt Tolentino <metolent@snoqualmie.dp.intel.com> Attached is a patch that enables EFI boot-up support in ia32 kernels. In order to continue to determine whether the kernel should initialize using EFI tables, I've temporarily added a check on the LOADER_TYPE boot parameter. Although I haven't requested that elilo be assigned an id for this yet, I've used this to determine whether the kernel should use the EFI initialization path as well as a check to see if the EFI_SYSTAB boot parameter contains anything. If someone has a better suggestion for determining this, I'm open... This patch also uses the existing ioremapping functions to map the efi tables into kernel virtual address space. I've added an option such that I could use Dave Hansen's boot_ioremap() before paging_init(). After paging_init, I then remap the efi memmap using bt_ioremap for use later. This has eliminated the need for several functions...thanks for the suggestions and thanks for your help Dave. Still this could use a look-see.
-
Andrew Morton authored
From: long <tlnguyen@snoqualmie.dp.intel.com> Add support for Message Signalled Interrupt delivery on ia32. With a fix from Zwane Mwaikambo <zwane@arm.linux.org.uk>
-
Andrew Morton authored
        text  data   bss   dec   hex filename
Before: 4674  1040  4100  9814  2656 kernel/futex.o
After:  4098  1176  4100  9374  249e kernel/futex.o
-
Andrew Morton authored
From: Chris Wright <chrisw@osdl.org> Fix for CAN-2003-0461: /proc/tty/driver/serial in Linux 2.4.x reveals the exact number of characters used in serial links, which could allow local users to obtain potentially sensitive information such as the length of passwords.
-
Andrew Morton authored
From: Chris Wright <chrisw@osdl.org> Fix for CAN-2003-0501: The /proc filesystem in Linux allows local users to obtain sensitive information by opening various entries in /proc/self before executing a setuid program, which causes the program to fail to change the ownership and permissions of those entries.
-
Andrew Morton authored
From: Chris Wright <chrisw@osdl.org> Fix for CAN-2003-0462: A race condition in the way env_start and env_end pointers are initialized in the execve system call and used in fs/proc/base.c on Linux 2.4 allows local users to cause a denial of service (crash).
-
Andrew Morton authored
From: Chris Wright <chrisw@osdl.org> Use the new steal_locks helper to steal the locks from the old files struct left from unshare_files() when the new unshared struct files gets used.
-
Andrew Morton authored
From: Chris Wright <chrisw@osdl.org> Add steal_locks helper for use in conjunction with unshare_files to make sure POSIX file lock semantics aren't broken due to unshare_files.
-
Andrew Morton authored
From: Chris Wright <chrisw@osdl.org> Use unshare_files during binary loading to eliminate potential leak of the binary's fd installed during execve(). As is, this breaks binfmt_som.c
-
Andrew Morton authored
From: Chris Wright <chrisw@osdl.org> Introduce unshare_files as a helper for use during execve to eliminate potential leak of the execve'd binary's fd.
-
Linus Torvalds authored
bk://kernel.bkbits.net/davem/compat-aio-2.6 into home.osdl.org:/home/torvalds/v2.5/linux
-
David S. Miller authored
into kernel.bkbits.net:/home/davem/compat-aio-2.5
-
Linus Torvalds authored
bk://bk.arm.linux.org.uk/linux-2.6-exp into home.osdl.org:/home/torvalds/v2.5/linux
-
Linus Torvalds authored
bk://kernel.bkbits.net/davem/sparc-2.5 into home.osdl.org:/home/torvalds/v2.5/linux
-
Linus Torvalds authored
bk://linuxusb.bkbits.net/usb-devel-2.6 into home.osdl.org:/home/torvalds/v2.5/linux
-
David S. Miller authored
into kernel.bkbits.net:/home/davem/sparc-2.5
-