- 29 Dec, 2003 40 commits
-
-
Andrew Morton authored
proc_kill_inodes() walks the s_files list, playing with ->f_dentry. But there is a window in which __fput() will leave a file on that list with a null f_dentry and f_vfsmnt. I'm not sure it was ever confirmed that this fixed the reported oops, but it seems much better to set those fields to null _after_ removing the filp from the list. (Actually, there's no need to null those pointers out at all. But whatever; it caught a bug).
-
Andrew Morton authored
From: William Lee Irwin III <wli@holomorphy.com> Our accounting of minor faults versus major faults is currently quite wrong. To fix it up we need to propagate the actual fault type back to the higher-level code. Repurpose the currently-unused third arg to ->nopage for this.
-
Andrew Morton authored
From: James Morris <jmorris@redhat.com> The patch below removes the CLONE_FILES flag from the kernel_thread() call which starts init. This is to prevent other kernel threads from sharing file descriptors opened by init (try 'lsof /dev/initctl' on a 2.6 system :-). The reason this patch is being proposed is so that usermode helper apps launched via kernel threads (e.g. modprobe, hotplug) do not then inherit any such file descriptors. This is not a problem in itself so far (other than being messy), but it is a problem for SELinux, which will otherwise need to grant access to /dev/initctl by modprobe and hotplug, a somewhat undesirable scenario. As far as I can tell, there is no reason why init needs to be spawned with CLONE_FILES. Please let me know if there are any objections to the change, which I would like to propose for 2.6.0+ as a cleanup.
-
Andrew Morton authored
From: Aniket Malatpure <aniket@sgi.com> Adds support for the IOC4 IDE part.
-
Andrew Morton authored
From: Paul Jackson <pj@sgi.com>
This patch is a followup to one from Bill Irwin. On Nov 17, he had consolidated the half-dozen chunks of code that displayed cpumasks in /proc/irq/prof_cpu_mask and /proc/irq/<pid>/smp_affinity into a single routine, which he called format_cpumask(). I believe that Andrew Morton has accepted Bill's patch into his 2.6.0-test10-mm1 patch set as the "format_cpumask" patch. I hope that the following patch will replace Bill's patch. I look forward to Bill's feedback on this patch.
The following patch carries Bill's work further:
1) It also consolidates the input side (write syscalls).
2) It adopts a new format, same on input and output.
3) The core routines work for any multi-word bitmask, not just cpumasks.
4) The core routines avoid overrunning their output buffers.
Note esp. for David Mosberger: The small patch I sent you and the linux-ia64 list yesterday entitled: "check user access ok writing /proc/irq/<pid>/smp_affinity" for arch ia64 only is _separate_ from the following patch. Neither presumes the other. However, they do collide on one line. Last one in is a Monkey's Uncle and will need an updated patch from me (or otherwise need to resolve the one obvious collision).
Details of the following patch:
Both the display and input of cpumasks on 9 arch's are consolidated into a single pair of routines, which use the same format for input and output, as recommended by Tony Luck. The two common routines work on any multi-word bitmask (array of unsigned longs). A pair of trivial inline wrappers cpumask_snprintf() and cpumask_parse() hide this generality for the common case of cpumask input and output.
My real motivation for consolidating this code will become visible later - when I seek to add a nodemask_t that resembles cpumask_t (just a different length). These common underlying routines will be used there as well, following up on a suggestion of Christoph Hellwig that I investigate implementing nodemask_t as an ADT sharing infrastructure with cpumask_t. However, I believe that this patch stands on its own merit, consolidating a couple hundred lines of duplicated code, and making the cpumask display format usable on very large systems.
There are two exceptions to the consolidation - the alpha and sparc64 arch's manipulate bare unsigned longs, not cpumask_t's, on input (write syscall), and do stuff that was more funky than I could make sense of. So the input side of these two arch's was left as-is. I'd welcome someone with access to either of these systems to provide additional patches.
The new format consists of multiple 32 bit words, separated by commas, displayed and input in hex. The following comment from this patch describes this format further:
 * The ascii representation of multi-word bit masks displays each
 * 32bit word in hex (not zero filled), and for masks longer than
 * one word, uses a comma separator between words. Words are
 * displayed in big-endian order most significant first. And hex
 * digits within a word are also in big-endian order, of course.
 *
 * Examples:
 * A mask with just bit 0 set displays as "1".
 * A mask with just bit 127 set displays as "80000000,0,0,0".
 * A mask with just bit 64 set displays as "1,0,0".
 * A mask with bits 0, 1, 2, 4, 8, 16, 32 and 64 set displays
 * as "1,1,10117". The first "1" is for bit 64, the second
 * for bit 32, the third for bit 16, and so forth, to the
 * "7", which is for bits 2, 1 and 0.
 * A mask with bits 32 through 39 set displays as "ff,0".
The essential reason for adding the comma breaks was to make the long masks from our (SGI's) big 512 CPU systems parsable by humans. An unbroken string of 128 hex digits is pretty difficult to read. For those who are compiling systems with CONFIG_NR_CPUS of 32 or less, there should be no visible change in format.
There are of course a thousand possible output formats that meet similar criteria. If someone wants to lobby for and seek consensus behind another such format, that's fine. Now that the format is consolidated into a single pair of routines, it should be easy to adapt whatever we choose.
Internally, the display routine uses snprintf to track the remaining space in its output buffer, to avoid the risk of overrunning it.
A new file, lib/mask.c, is added to the lib directory, to hold the two common routines. I anticipate adding a few more common routines for generic support of multi-word bit masks to lib/mask.c, in subsequent patches that will add a nodemask_t type as an ADT sharing implementation with cpumask_t.
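The display format described in this message can be illustrated with a small userspace sketch. This is not the code from lib/mask.c: the function name `mask_snprintf_sketch`, the use of `uint32_t` words (the patch works on arrays of unsigned longs) and the return convention are assumptions made for illustration. It prints 32-bit words in hex, most significant first, comma separated, without zero fill, and only ever hands snprintf the space that is still free in the buffer.

```c
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not the kernel routine. words[0] holds bits 0..31.
 * Returns the length the complete string needs, in the style of snprintf();
 * the output is always NUL terminated when buflen > 0.
 */
static int mask_snprintf_sketch(char *buf, size_t buflen,
                                const uint32_t *words, size_t nwords)
{
    size_t len = 0;

    if (buflen > 0)
        buf[0] = '\0';

    /* Drop leading all-zero words, but always keep the lowest word. */
    while (nwords > 1 && words[nwords - 1] == 0)
        nwords--;

    for (size_t i = nwords; i-- > 0; ) {
        /* Offer snprintf only the room that remains, never more. */
        size_t room = len < buflen ? buflen - len : 0;
        int n = snprintf(room ? buf + len : NULL, room, "%s%x",
                         i == nwords - 1 ? "" : ",", (unsigned)words[i]);
        if (n < 0)
            return n;
        len += (size_t)n;
    }
    return (int)len;
}
```

Called on a four-word mask with only bit 127 set, the sketch produces the "80000000,0,0,0" string from the comment above; with a buffer that is too small the output is truncated rather than overrun.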
-
Andrew Morton authored
From: Paul Jackson <pj@sgi.com> Push the cpumask implementation from linux/cpumask.h into asm/cpumask.h, so that ia64 can do special things without breaking sparc64. 1) Each arch has its own include/asm-<arch>/cpumask.h file 2) That arch-specific header file can include <asm-generic/cpumask.h>, if it wants to make use of the generic cpumask implementation. 3) Using code should continue to include linux/cpumask.h, which in turn includes asm/cpumask.h. Some common implementation independent cpumask related items, such as the cpu_online_map, are declared directly in linux/cpumask.h.
-
Andrew Morton authored
From: Will Dyson <will_dyson@pobox.com> Add documentation and comments to lib/parser.c and include/linux/parser.h
-
Andrew Morton authored
From: Alan Cox <alan@redhat.com> Capability elevation bug in 2.6.0 IDE. Long fixed in 2.4.x, trivial to cure
-
Andrew Morton authored
From: Alan Cox <alan@redhat.com> IDE core code had the mmio==2 (ioremap) mode supported but two small changes had been missed for ide-dma.c. Without this fix mmio IDE controllers bomb if you have plenty of memory as it uses request_mem_region on an ioremap return.
-
Andrew Morton authored
From: Peter Chubb <peterc@gelato.unsw.edu.au> If you try to disable IDE DMA from Kconfig, you'll end up with an undefined symbol, ide_hwif_setup_dma(). The attached rather ugly patch fixes the problem by defining a dummy function.
-
Andrew Morton authored
From: Peter Chubb <peterc@gelato.unsw.edu.au> The PIIX5 IDE controller on I2000 IA64 boxen using the 460GX chipset will hang on startup if an ordinary harddrive is plugged into it (it seems to work for the LSI120 and the CDROM drives). This is because the 460GX chipset contains a PCI expansion bridge that works like the 450NX PXB, and has the same PCI ID (but a later revision). The PIIX driver, to work around interactions between PIIX4 and the 450NX PXB, tries to disable DMA. Unfortunately, the way it tries to disable DMA doesn't work, and the higher layers think that DMA is still on, and so timeout waiting for DMA, and then hang on bootup. A simple workaround is to tighten the check for the buggy chipset, as in the attached patch. However, someone with more time (and who actually *understands* the IDE subsystem) needs to fix the real bug as well.
-
Andrew Morton authored
From: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>, Stuart Hayes <stuart_hayes@dell.com>
- Check drive's write protect bit, try to return appropriate errors when attempting to write a write-protected tape.
- Moved "idetape_read_position" call in idetape_chrdev_open after the "wait_ready" call.
- Added IDETAPE_MEDIUM_PRESENT flag so driver would know not to rewind tape after ejecting it.
- Fixed bug with ide_abort_pipeline: it was deleting stages from tape->next_stage to end, instead of from new_last_stage->next (tape->next_stage was set to NULL by idetape_discard_read_pipeline before calling!).
- Made improvements to idetape_wait_ready.
- Added a few comments here and there.
- Made MTOFFL unlock tape drive door before attempting to eject.
- Added fixes to get Seagate STT3401A Travan working:
  - Handle drives that don't support 0-length reads/writes.
  - Increased timeout (retension takes ~10 minutes before irq is returned).
  - Fixed request mode page packet command byte 3.
Also remove code depending on NO_LONGER_REQUIRED to match 2.4.x (me).
-
Andrew Morton authored
From: Arun Sharma <arun.sharma@intel.com> - Several instances where we were using pid_t instead of uid_t - If the caller passed a NULL `oldact' pointer into sys_sigprocmask then don't try to write the old sigmask there.
-
Andrew Morton authored
From: gleb@nbase.co.il (Gleb Natapov) The fops->write() implementations in the various watchdog drivers are inconsistent. Some of them return the number of bytes written while others return 1. I think the correct implementation should always return the number of bytes written (we examine the whole buffer, after all); otherwise "echo V > /dev/watchdog" doesn't work as expected (it doesn't stop the watchdog).
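A minimal userspace sketch of the behaviour argued for here, under stated assumptions: `struct wdt_sketch`, `wdt_write_sketch` and the reset-on-every-write flag are hypothetical stand-ins, not code from any particular driver. The handler scans the whole buffer for the magic 'V' and reports the whole length as consumed. One plausible reading of the reported failure is that `echo V` writes "V\n"; a handler that returns 1 makes the caller retry with the leftover "\n", and that second write clears the flag again.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical driver state for illustration only. */
struct wdt_sketch {
    int expect_close;   /* set when the magic 'V' was seen in the last write */
};

/*
 * Sketch of a watchdog write handler. Every byte of the buffer is
 * examined, so every byte is reported as written.
 */
static ssize_t wdt_write_sketch(struct wdt_sketch *wdt,
                                const char *buf, size_t len)
{
    /* Assumed behaviour: each write starts from a cleared flag. */
    wdt->expect_close = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 'V')
            wdt->expect_close = 1;
    }

    /* A real driver would also ping the hardware at this point. */
    return (ssize_t)len;
}
```

With this return value a single write of "V\n" leaves the flag set, because the caller has no remainder to resubmit.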
-
Andrew Morton authored
From: Olaf Hering <olh@suse.de> We need to update `offset' here so that the subsequent push_pad() (which uses `offset') will do the right thing.
-
Andrew Morton authored
From: Valdis.Kletnieks@vt.edu Nick wrote a nice as-iosched.txt file, but apparently nobody updated the kernel-parameters.txt file...
-
Andrew Morton authored
From: Jeremy Fitzhardinge <jeremy@goop.org> I've been getting quite a lot of people mailing me about this CPU. It seems Toshiba has released a machine with it. It would be nice if this patch gets into a kernel soonish. It's very low-impact.
-
Andrew Morton authored
- Add missing PCI ID - Forward-port IRQ routing workaround from 2.4.
-
Andrew Morton authored
From: corbet@lwn.net (Jonathan Corbet) This converts all architectures' /proc/interrupts implementation over to seq_file. We need this for SMP machines with ridiculous numbers of CPUs and if you convert one arch, you have to convert them all...
-
Andrew Morton authored
From: Adrian Bunk <bunk@fs.tum.de> The legacy eicon driver in drivers/isdn/eicon is the old one and will be removed as soon as all features have moved to the new driver. In any case, this old driver was never meant to be built as anything other than a module.
-
Andrew Morton authored
From: Adrian Bunk <bunk@fs.tum.de> the issue below is only a minor documentation fix, but it has confused me when configuring a kernel for such a card.
-
Andrew Morton authored
From: Ingo Molnar <mingo@elte.hu> Clarify a comment in the CPU scheduler.
-
Andrew Morton authored
From: Martin Hicks <mort@wildopensource.com> Once NR_CPUS exceeds about 300 ext2 and ext3 will not compile, because the percpu counters in the superblocks are so huge that they cannot be kmalloced. Fix this by converting the percpu_counter mechanism to use alloc_percpu() rather than an NR_CPUS-sized array.
-
Andrew Morton authored
From: Manfred Spraul <manfred@colorfullife.com> Attached is the lockless semop patch. I did another test run with idle=poll on a Pentium III, and it remained unchanged: 99.9% direct fast path, 0.1% race with wakeup against writing the final result code: http://khack.osdl.org/stp/282936/environment/proc/slabinfo That means there is no immediate need to add the two-stage implementation to finish_wait. It reduces the spinlock operations on the semaphore array spinlock by 1/3.
-
Andrew Morton authored
From: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Current writev() of pipe/fifo can be interleaved with data from other processes doing writes even when the request size is <= PIPE_BUF. These writes should in fact be atomic. readv() is handled the same way, so that it behaves like read(). And it is faster.
readv/writev version of bw_pipe in LMbench
2.6.0-test9-bk12
hirofumi@devron (i686-pc-linux-gnu)[1010]$ ./bw_pipe -m 4096 -M 5
Pipe bandwidth: 45.53 MB/sec
hirofumi@devron (i686-pc-linux-gnu)[1009]$ ./bw_pipe -m 1024 -M 5
Pipe bandwidth: 20.08 MB/sec
2.6.0-test9-bk12 + patch
hirofumi@devron (i686-pc-linux-gnu)[1001]$ ./bw_pipe -m 4096 -M 5
Pipe bandwidth: 65.98 MB/sec
hirofumi@devron (i686-pc-linux-gnu)[1002]$ ./bw_pipe -m 1024 -M 5
Pipe bandwidth: 32.19 MB/sec
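For readers unfamiliar with the call in question, here is a small userspace sketch of a gathered write to a pipe. The helper name `write_record` and the two-part record layout are made up for illustration; `writev()` and `struct iovec` are the standard POSIX interfaces. The point of the patch, as described above, is that such a request of at most PIPE_BUF bytes should reach the pipe as one unit instead of being interleaved with other writers.

```c
#include <sys/types.h>
#include <sys/uio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: submit a header and a body with a single writev()
 * call and return its result (bytes written, or -1 on error).
 */
static ssize_t write_record(int fd, const char *hdr, const char *body)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)hdr;      /* writev() only reads from these */
    iov[0].iov_len  = strlen(hdr);
    iov[1].iov_base = (void *)body;
    iov[1].iov_len  = strlen(body);

    return writev(fd, iov, 2);
}
```

A reader on the other end of the pipe sees the header and body as one contiguous run of bytes when the combined length stays within PIPE_BUF and the kernel treats the request atomically.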
-
Andrew Morton authored
From: Manfred Spraul <manfred@colorfullife.com> The memmove implementation of i386 is not optimized: it uses movsb, which is far slower than movsd. The optimization is trivial: if dest is less than source, then call memcpy(). markw tried it on a 4xXeon with dbt2; it saved around 300 million cpu ticks in cache_flusharray():
oprofile, GLOBAL_POWER_EVENTS, count 100k
Before:
c0144ed1 <cache_flusharray>: /* cache_flusharray total: 21823 0.0165 */
     6 4.5e-06 :c0144f8e: cmp %esi,%ebx
    11 8.3e-06 :c0144f90: jae c0144f9e <cache_flusharray+0xcd>
     3 2.3e-06 :c0144f92: mov %ebx,%edi
  7305 0.0055  :c0144f94: repz movsb %ds:(%esi),%es:(%edi)
   201 1.5e-04 :c0144f96: add $0x10,%esp
After:
c0144f1d <cache_flusharray>: /* cache_flusharray total: 17959 0.0136 */
  1270 9.6e-04 :c0144f1d: push %ebp
[snip]
     6 4.6e-06 :c0144fdc: cmp %esi,%ebx
    13 9.9e-06 :c0144fde: jae c0145000 <cache_flusharray+0xe3>
     2 1.5e-06 :c0144fe0: mov %edx,%eax
     1 7.6e-07 :c0144fe2: mov %ebx,%edi
    11 8.4e-06 :c0144fe4: shr $0x2,%eax
     1 7.6e-07 :c0144fe7: mov %eax,%ecx
  4129 0.0031  :c0144fe9: repz movsl %ds:(%esi),%es:(%edi)
   261 2.0e-04 :c0144feb: test $0x2,%dl
    27 2.1e-05 :c0144fee: je c0144ff2 <cache_flusharray+0xd5>
               :c0144ff0: movsw %ds:(%esi),%es:(%edi)
    95 7.2e-05 :c0144ff2: test $0x1,%dl
    96 7.3e-05 :c0144ff5: je c0144ff8 <cache_flusharray+0xdb>
               :c0144ff7: movsb %ds:(%esi),%es:(%edi)
   121 9.2e-05 :c0144ff8: add $0x1c,%esp
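The reasoning behind "if dest is less than source, then call memcpy()" is that an ascending copy never overwrites source bytes it has not read yet when the destination starts below the source. The sketch below shows that direction choice in portable C. It is an illustration, not the i386 kernel code: `memmove_sketch` is a hypothetical name, and the forward branch is spelled out as a loop because calling the C library's memcpy() on overlapping buffers is undefined in userspace, whereas the kernel patch can rely on its own forward-copying memcpy().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the direction choice made by memmove(). */
static void *memmove_sketch(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if ((uintptr_t)d < (uintptr_t)s) {
        /* dest below src: ascending copy is safe, the fast path applies. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* dest above src: copy from the end so overlap is not clobbered. */
        for (size_t i = n; i-- > 0; )
            d[i] = s[i];
    }
    return dest;
}
```

Only the first branch corresponds to the case the patch hands to memcpy(); the descending branch keeps the slower path for overlapping moves toward higher addresses.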
-
Andrew Morton authored
From: jbarnes@sgi.com (Jesse Barnes) Now that we have a proper NODES_SHIFT value, we need to use it to define ZONE_SHIFT otherwise we'll spill over 8 bits if we have more than 85 nodes.
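The figure of 85 nodes is consistent with a simple count, assuming (this is not stated in the message) that each node contributes three zones and that the combined zone index has to fit in 8 bits: 85 * 3 = 255 still fits, 86 * 3 = 258 does not. The helper names below are hypothetical and only reproduce that arithmetic.

```c
#include <stddef.h>

/* Smallest number of bits able to hold the indices 0 .. count-1. */
static unsigned bits_for_count(unsigned long count)
{
    unsigned bits = 0;

    while (bits < 8 * sizeof(count) && (1UL << bits) < count)
        bits++;
    return bits;
}

/*
 * Bits needed for a per-page zone index when every node has
 * zones_per_node zones (3 is an assumption used in the text above).
 */
static unsigned zone_index_bits(unsigned long nodes,
                                unsigned long zones_per_node)
{
    return bits_for_count(nodes * zones_per_node);
}
```

Under that assumption `zone_index_bits(85, 3)` stays at 8 while `zone_index_bits(86, 3)` needs 9, which is the spill the message warns about.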
-
Andrew Morton authored
From: Paul Jackson <pj@sgi.com> The patch is needed to build NR_CPUS > 256. Without this fix, you get compile errors: include/linux/cpumask.h: In function `next_online_cpu': include/linux/cpumask.h:56: structure has no member named `val'
-
Andrew Morton authored
From: Zwane Mwaikambo <zwane@arm.linux.org.uk> Make the test unconditional - we can always run it now we have fixmap support.
-
Andrew Morton authored
The seq_file conversion of /proc/pid/maps caused altered behaviour with respect to 2.4.22. Before the conversion, spaces and tabs in filenames were displayed verbatim. After the conversion they are escaped as \040, etc. Also, if the mmapped file has been unlinked the output appears as
40017000-40018000 rw-p 00000000 03:02 1425800 /home/akpm/foo\040(deleted)
instead of
40017000-40018000 rw-p 00000000 03:02 1425800 /home/akpm/foo (deleted)
This could break applications which parse /proc/pid/maps (one person has reported this). The patch restores the 2.4.20 behaviour.
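To make the difference concrete, here is a userspace sketch of octal escaping driven by a caller-supplied set of characters. It is not the kernel's path-printing code; `escape_path_sketch` and its return convention are hypothetical. Passing a set that contains the space character reproduces the `\040` output quoted above, and passing a set without it leaves the name verbatim.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: copy `path` into `out`, replacing every character
 * found in `esc` with a backslash and three octal digits.
 * Returns the output length, or -1 if `out` is too small.
 */
static int escape_path_sketch(char *out, size_t outlen,
                              const char *path, const char *esc)
{
    size_t len = 0;

    if (outlen == 0)
        return -1;

    for (; *path; path++) {
        if (strchr(esc, *path)) {
            /* Needs four characters plus the terminating NUL. */
            if (len + 4 >= outlen)
                return -1;
            snprintf(out + len, outlen - len, "\\%03o",
                     (unsigned)(unsigned char)*path);
            len += 4;
        } else {
            if (len + 1 >= outlen)
                return -1;
            out[len++] = *path;
        }
    }
    out[len] = '\0';
    return (int)len;
}
```

Which characters the restored behaviour still escapes is not stated in the message, so the escape set is left as a parameter here.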
-
Andrew Morton authored
From: "Bryan O'Sullivan" <bos@pathscale.com> The current version of modpost breaks if invoked from outside the build tree. This patch fixes that, and simplifies the code a bit while it's at it.
-
Andrew Morton authored
From: john stultz <johnstul@us.ibm.com> The patch arranges for each timesource type to have a name, and uses that to tell the user which timesource is in use at bootup time.
-
Andrew Morton authored
zone->refill_counter is only there to provide decent levels of work batching: don't call refill_inactive_zone() just for a couple of pages. But the logic in there allows it to build up to huge values and it can overflow (go negative) which will disable refilling altogether until it wraps positive again. Just reset it to zero whenever we decide to do some refilling.
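The batching idea and the fix can be sketched in a few lines. Everything here is hypothetical (the struct, the function name, and the threshold and cap values); it only illustrates the pattern described above: accumulate work until a batch is worth processing, cap the batch, and reset the counter to zero instead of letting a remainder keep growing.

```c
/* Hypothetical values, chosen only for the example. */
#define REFILL_THRESHOLD_SKETCH 32L
#define REFILL_CAP_SKETCH       128L

struct zone_sketch {
    long refill_counter;
};

/*
 * Account for nr_new units of pending work. Returns how many pages to
 * refill now, or 0 if the batch is still too small to bother with.
 */
static long refill_due(struct zone_sketch *zone, long nr_new)
{
    long count;

    zone->refill_counter += nr_new;
    count = zone->refill_counter;
    if (count < REFILL_THRESHOLD_SKETCH)
        return 0;

    /* Reset outright so the counter cannot build up without bound. */
    zone->refill_counter = 0;
    return count > REFILL_CAP_SKETCH ? REFILL_CAP_SKETCH : count;
}
```

Subtracting only the capped amount would leave the excess in the counter, which is the kind of build-up the message describes; resetting to zero bounds it.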
-
Andrew Morton authored
From: Bjorn Helgaas <bjorn.helgaas@hp.com> uart_set_options() can dereference a null pointer. This happens if you specify a console that hasn't previously been set up by early_serial_setup(). For example, on ia64, the HCDP typically tells us about line 0, so we call early_serial_setup() for it. If the user specifies "console=ttyS3", we machine-check when trying to follow the uninitialized port->ops pointer. It's not entirely clear to me whether we should return 0 or -ENODEV or something. The advantage of returning zero is that if the user specifies "console=ttyS0" and we just lack the HCDP, the console doesn't work as early as usual, but it does start working after the serial driver detects the port (though the baud/parity/etc from the command line are lost). Returning -ENODEV seems to prevent it from ever working.
-
Andrew Morton authored
From: Brian Gerst <bgerst@didntduck.org> The current code disables sysenter when first entering vm86 mode, but does not disable it again when coming back to a vm86 task after a task switch.
-
Andrew Morton authored
From: Adrian Bunk <bunk@fs.tum.de> Allow the kernel to be built with `-Os'. It requires CONFIG_EMBEDDED. This is to make it "hard to get at" because one gcc version (3.2.x I think) from RH9 generates crashy kernels with this option set.
-
Andrew Morton authored
From: Manfred Spraul <manfred@colorfullife.com> Fixes a race between proc_pid_lookup and sys_exit.
- The inodes and dentries for /proc/<pid>/whatever are cached in the dentry cache. d_revalidate is used to protect against stale data: d_revalidate returns invalid if the task exited. Additionally, sys_exit flushes the dentries for the task that died - otherwise the dentries would stay around until they arrive at the end of the LRU, which could take some time. But there is one race:
  - proc_pid_lookup finds a task and prepares new dentries for it. It must drop all locks for that operation.
  - the process exits, and the /proc/ dentries are flushed. Nothing happens, because they are not yet in the hash tables.
  - proc_pid_lookup adds the task to the dentry cache. Result: dentry of a dead task in the hash tables.
  The patch fixes that problem by flushing again if proc_pid_lookup notices that the thread exited while it created the dentry. The patch should go in, but it's not critical.
- task->proc_dentry must be the dentry of /proc/<pid>. That way sys_exit can flush the whole subtree at exit time. proc_task_lookup is a direct copy of proc_pid_lookup and handles /proc/<>/task/<pid>. It contains the lines that set task->proc_dentry. This is bogus, and must be removed. This hunk is much more critical, because it creates a de-facto dentry leak (they are recovered after flushing real dentries from the cache).
-
Andrew Morton authored
From: Russell King <rmk@arm.linux.org.uk> This oops has been caused by the need to register the class before registering any objects against it. Unfortunately, the class needs to be registered asynchronously in a separate thread to avoid driver model deadlock with yenta with cardbus cards inserted or standard PCMCIA cards not being detected correctly due to a race. I think the only real solution is to remove the class_device_create_file calls from all socket drivers. This is just a simple commenting out of the calls, and should be suitable for the remainder of the -test kernels. Due to the number of cases that we're encountering with PCMCIA, I'm beginning to wonder if the driver model could be fixed to be more kind to PCMCIA by avoiding some of these ordering dependencies. None of this would be a problem if the driver model would allow PCI device drivers to register PCI devices while their probe or remove functions were executing.
-
Andrew Morton authored
From: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
- use "select" instead of "depend"
- remove the unused SMB_NLS
- remove unneeded "default y" of CONFIG_NLS
- revert to position of nls menu (middle of filesystem menus is strange)
- fix "#ifdef CONFIG_NLS" on UDF (should this add new one to Kconfig?)
-
Andrew Morton authored
This fixes the recently-reported "fsstress memory leak" problem. It has been there since November 2002. shrink_dcache() has a heuristic to prevent the dcache (and hence icache) from getting shrunk too far: it refuses to allow the dcache to shrink below 2*nr_used. Problem is, _all_ non-leaf dentries (directories) count as used. So when you have really deep directory hierarchies (fsstress creates these), nr_used is really high, and there is no upper bound to the amount of pinned dcache. The patch just rips out the heuristic. This means that dcache (and hence icache (and hence pagecache)) will be shrunk more aggressively. This could be a problem, and tons of testing is needed - a new heuristic may be needed. However I am not able to reproduce the problem which caused me to add this heuristic in the first place. Simple testcase: run a huge `dd' while running a concurrent `watch -n1 cat /proc/meminfo'. The program text for `cat' gets loaded from disk once per second.
-