- 28 May, 2002 21 commits
-
-
Jens Axboe authored
o blk_get_request() and blk_put_request() need exporting
o blk_max_pfn is used by BLOCK_BOUNCE_ANY, which modular SCSI needs
-
Robert Love authored
> Hmm. That patch does not compile. "p->cpu" does not exist, it's > "p->thread_info->cpu". Tssk. Ouch, I am bad. Sorry. Make the ChangeLog entry something really defamatory. Robert Love
-
Robert Love authored
This is William Irwin's algorithmically O(1) version of count_active_tasks (which is currently O(n) for n total tasks on the system). I like it a lot: we become O(1) because now we count uninterruptible tasks, so we can return (nr_uninterruptible + nr_running). It does not introduce any overhead or hurt the case for small n, so I have no complaints. This copy has a small optimization over the original posting, but is otherwise the same thing wli posted earlier. I have tested to make sure this returns accurate results and that the kernel profile improves.
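The counting idea above can be sketched in user-space C. This is only an illustration under stated assumptions: `struct task_counts`, `task_init()` and `task_set_state()` are hypothetical names, not the kernel's. Instead of walking every task, two counters are adjusted on each state change, so the load sampler only has to add them.

```c
/* Hypothetical stand-ins for scheduler bookkeeping; not the kernel's code. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE };

struct task_counts {
    unsigned long nr_running;          /* runnable tasks */
    unsigned long nr_uninterruptible;  /* tasks in uninterruptible sleep */
};

/* A new task starts out runnable. */
void task_init(struct task_counts *c, enum task_state *state)
{
    *state = TASK_RUNNING;
    c->nr_running++;
}

/* Adjust the counters on every state change: O(1) per transition. */
void task_set_state(struct task_counts *c, enum task_state *state,
                    enum task_state next)
{
    if (*state == TASK_RUNNING)
        c->nr_running--;
    else if (*state == TASK_UNINTERRUPTIBLE)
        c->nr_uninterruptible--;

    if (next == TASK_RUNNING)
        c->nr_running++;
    else if (next == TASK_UNINTERRUPTIBLE)
        c->nr_uninterruptible++;

    *state = next;
}

/* O(1): no walk over the task list, just a sum of two counters. */
unsigned long count_active_tasks(const struct task_counts *c)
{
    return c->nr_running + c->nr_uninterruptible;
}
```

The cost moves from the sampler to the state transitions, each of which touches at most two counters.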
-
Ivan Kokshaysky authored
Previously assigned resources are perfectly valid - just silently ignore them.
-
Robert Love authored
Attached patch adds output of rt_priority and policy to /proc/<pid>/stat. This will not break compatibility with existing applications and will allow ps(1) and friends to display pertinent scheduling information.
-
Jan-Benedict Glaw authored
Please apply this patch to let binfmt_em86.c compile again.
-
Linus Torvalds authored
-
Ivan Kokshaysky authored
As pointed out by Russell King, resource name pointers of the secondary PCI buses are left uninitialized in the non-x86 PCI allocation path. Assigning these pointers in pci_add_new_bus() fixes the problem.
-
Linus Torvalds authored
http://fbdev.bkbits.net:8080/fbdev-2.5 into home.transmeta.com:/home/torvalds/v2.5/linux
-
Martin Dalecki authored
- Eliminate all usages of the obscure QUEUE_EMPTY macro.
- Eliminate all unnecessary checks for RQ_INACTIVE; this can't happen during the time we run the request strategy routine of a single major number block device. Perhaps the still remaining usage in scsi and i2o_block.c should be killed as well, since the upper ll_rw_blk layer shouldn't pass inactive requests down. Those are all places where we have deeply buried and hidden major number indexed arrays. Let's deal with them slowly...
-
Martin Dalecki authored
Since apparently nobody else has cared thus far, and since I'm using this driver, well, here it comes:
- Adjust the airo wireless LAN card driver for the fact that modules don't export symbols by default any longer.
- Make some stuff which obviously should be static there static as well. (Plenty of code in Linux actually deserves a review for this far too common bug...)
-
Martin Dalecki authored
- Replace ide_delay_50m with mdelay(50). There is absolutely no reason we should behave differently depending on whether IDECS support is enabled or not.
- Kill last parameter of ide_register_hw(). It should return a pointer to the interface registered later.
- pdc202xx patches by Bartłomiej Żołnierkiewicz.
- ServerWorks chipset support cleanup by Andrej Panin.
- Temporarily move ide_setup_ports to main.c and unfold it in ide-pnp.c.
-
Robert Love authored
This fixes three locations in net/ where per-CPU data could bite us under preemption. This is the result of an audit I did and should constitute all of the unsafe code in net/. In net/core/skbuff.c I did not have to introduce any code - just rearrange the grabbing of smp_processor_id() to be in the interrupt off region. Pretty clean fixes. Note in the future we can use put_cpu() and get_cpu() to grab the CPU# safely. I will send a patch to Marcelo so we can have a 2.4 version (which doesn't do the preempt stuff), too...
-
Robert Love authored
This adds an optimization to set_cpus_allowed: if the task is not running, there is no sense in kicking the migration_threads into action, we just need to update task->cpu. This was suggested by Mike Kravetz. Besides being an optimization, this would prevent any future race between set_cpus_allowed and the migration_threads.
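A rough user-space sketch of the fast path described above. It assumes a hypothetical `struct task_sketch` and helper names, and it is not the kernel's set_cpus_allowed(): if the task is not running and its current CPU is no longer allowed, only its recorded CPU is updated and no migration thread is involved.

```c
/* Hypothetical, simplified task descriptor; not the kernel's task_struct. */
struct task_sketch {
    int cpu;                     /* CPU the task last ran on */
    unsigned long cpus_allowed;  /* bitmask of permitted CPUs */
    int running;                 /* nonzero if the task is currently runnable */
};

/* Lowest set bit of a nonzero mask, i.e. the first permitted CPU. */
int first_allowed_cpu(unsigned long mask)
{
    int cpu = 0;

    while (!(mask & 1UL)) {
        mask >>= 1;
        cpu++;
    }
    return cpu;
}

/*
 * Returns -1 for an empty mask, 1 when a migration thread would have to be
 * woken to move the task, and 0 when nothing further is needed.
 */
int set_cpus_allowed_sketch(struct task_sketch *p, unsigned long new_mask)
{
    if (new_mask == 0)
        return -1;

    p->cpus_allowed = new_mask;

    if (new_mask & (1UL << p->cpu))
        return 0;                /* current CPU is still permitted */

    if (!p->running) {
        /* Fast path: the task is not running, so just retarget it. */
        p->cpu = first_allowed_cpu(new_mask);
        return 0;
    }

    return 1;                    /* running task: needs real migration */
}
```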
-
Robert Love authored
This adds documentation about the O(1) scheduler to Documentation/. The new scheduler is complicated and providing future scheduler hackers some background seems a Good Thing to me. Specifically:
- add Documentation/sched-coding.txt: an overview of the functions, magic numbers, and variables in the scheduler as well as (most importantly) a review of the locking semantics.
- add Documentation/sched-design.txt: an edited version of Ingo's initial email to lkml about his scheduler. Goes over the design, implementation, and goals of the scheduler. I tried to edit it where needed to bring it in line with the scheduler as it is today.
- modify kernel/sched.c: update your copyright and add a change entry for the new scheduler.
-
Robert Love authored
The attached trivial patch simply changes the printk debug statement in do_exit when preempt_count!=0 to say "note" instead of "error" and log at KERN_INFO in lieu of KERN_ERR. I want to keep the message around a bit, but people get too paranoid when things like nfsd legitimately exit with a preempt_count=1.
-
James Simmons authored
-
James Simmons authored
-
Linus Torvalds authored
-
James Simmons authored
-
James Simmons authored
-
- 27 May, 2002 19 commits
-
-
Andrew Morton authored
This makes sure that sys_sync() will terminate. It counts up the number of dirty pages in the machine and will refuse to write out more than 1.25 times this number of pages. This function is called twice on the sys_sync() path, so the kernel will actually write 2.5x the number of initially-dirty pages before giving up.
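The bound above is simple arithmetic, sketched here with hypothetical helper names that are not taken from the patch. Integer division is an assumption of this sketch; the message does not say how the 1.25 factor is computed.

```c
/* Cap for a single writeback pass: 1.25 times the dirty pages counted up front. */
unsigned long writeback_budget(unsigned long nr_dirty)
{
    return nr_dirty + nr_dirty / 4;
}

/*
 * The function is called twice on the sys_sync() path, so the worst case
 * is two full budgets, even if pages are redirtied continuously.
 */
unsigned long sync_worst_case(unsigned long nr_dirty)
{
    return 2 * writeback_budget(nr_dirty);
}
```

For 1000 initially dirty pages this gives a cap of 1250 pages per pass and 2500 pages overall, matching the 2.5x figure quoted above.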
-
Andrew Morton authored
It might reduce pagemap_lru_lock hold times a little, and is more consistent. I think all global page accounting is now inside page_states[].
-
Andrew Morton authored
Factor out some similar code in page_alloc.c
-
Andrew Morton authored
For historical reasons, ext3 has a private BH state bit which has global scope. This patch moves it inside ext3.
-
Andrew Morton authored
Patch from Anton Blanchard which replaces printk(KERN_FOO __FUNCTION__ ": msg"); with printk(KERN_FOO "%s: msg", __FUNCTION__); in ext3.
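A small user-space illustration of the second form, using snprintf() as a stand-in for printk() and the standard `__func__` in place of `__FUNCTION__`; `report_error()` is a hypothetical name. The function name is passed as an argument because it is a variable, not a string literal, so compilers that treat it that way will not accept string pasting.

```c
#include <stdio.h>

/* Formats "<function name>: msg" into buf and returns snprintf()'s result. */
int report_error(char *buf, size_t size)
{
    /* Pass the function name via %s instead of concatenating it. */
    return snprintf(buf, size, "%s: msg", __func__);
}
```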
-
Andrew Morton authored
Fixes all the goto spaghetti in generic_file_write() and turns it into something which humans can understand. Andi tells me that gcc3 does a decent job of relocating blocks out of line anyway. This patch gives the compiler a helping hand with appropriate use of likely() and unlikely().
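For readers unfamiliar with the hints, here is a minimal sketch of how likely() and unlikely() are commonly defined on top of GCC/Clang's `__builtin_expect`. `copy_chunk()` is a hypothetical helper for illustration, not part of generic_file_write().

```c
#include <stddef.h>

/* Branch hints; these rely on the GCC/Clang __builtin_expect builtin. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Copies len bytes; the error path is marked as the cold one. */
long copy_chunk(char *dst, const char *src, size_t len)
{
    size_t i;

    if (unlikely(dst == NULL || src == NULL))
        return -1;               /* rare: the compiler may move this out of line */

    for (i = 0; i < len; i++)
        dst[i] = src[i];

    return (long)len;
}
```

The hints do not change behaviour; they only tell the compiler which path to lay out as the straight-line one.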
-
Andrew Morton authored
Random cleanup: remove the mem_map_t typedef. Just use 'struct page' everywhere.
-
Andrew Morton authored
An implementation of directory-synchronous mounts.
I sent this out some months ago and it didn't generate a lot of interest. Later we had one of the usual cheery exchanges with Wietse Venema (postfix development) and he agreed that directory synchronous mounts were something that he could use, and that there was benefit in implementing them in Linux. If you choose to apply this I'll push the 2.4 patch.
Patch against e2fsprogs-1.26: http://www.zip.com.au/~akpm/linux/dirsync/e2fsprogs-1.26.patch
Patch against util-linux-2.11n: http://www.zip.com.au/~akpm/linux/dirsync/util-linux-2.11n.patch
The kernel patch includes implementations for ext2 and ext3. It's pretty simple.
- When dirsync is in operation against a directory, the following operations are synchronous within that directory: create, link, unlink, symlink, mkdir, rmdir, mknod, rename (synchronous if either the source or dest directory is dirsync).
- dirsync is a subset of sync. So `mount -o sync' or `chattr +S' give you everything which `mount -o dirsync' or `chattr +D' gives, plus synchronous file writes.
- ext2's inode.i_attr_flags is unused, and is removed.
- mount /dev/foo /mnt/bar -o dirsync works as expected.
- An ext2 or ext3 directory tree can be set dirsync with `chattr +D -R'.
- dirsync is maintained as new directories are created under a `chattr +D' directory. Like `chattr +S'.
- Other filesystems can trivially be taught about dirsync. It's just a matter of replacing `IS_SYNC(inode)' with `IS_DIRSYNC(inode)' in the directory update functions. IS_SYNC will still be honoured when IS_DIRSYNC is used.
- Non-directory files do not have their dirsync flag propagated. So an S_ISREG file which is created inside a dirsync directory will not have its dirsync bit set. chattr needs to do this as well.
- There was a bit of version skew between e2fsprogs' idea of the inode flags and the kernel's. That is sorted out here.
- `lsattr' shows the dirsync flag as "D". The letter "D" was previously being used for Compressed_Dirty_File. I changed Compressed_Dirty_File to use "Z". Is that OK?
The mount(2) manpage needs to be taught about MS_DIRSYNC.
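The "dirsync is a subset of sync" rule can be sketched as two flag tests. The flag values, `struct inode_sketch` and the helper functions below are assumptions made for illustration; only the macro names IS_SYNC and IS_DIRSYNC come from the message above.

```c
/* Hypothetical flag values, for illustration only. */
#define S_SYNC    0x1u   /* all writes are synchronous */
#define S_DIRSYNC 0x2u   /* directory updates are synchronous */

struct inode_sketch {
    unsigned int i_flags;
};

#define IS_SYNC(inode)    (((inode)->i_flags & S_SYNC) != 0)
/* Subset rule: a sync inode also counts as dirsync. */
#define IS_DIRSYNC(inode) (((inode)->i_flags & (S_SYNC | S_DIRSYNC)) != 0)

/* create, link, unlink, symlink, mkdir, rmdir, mknod in this directory. */
int dir_update_is_sync(const struct inode_sketch *dir)
{
    return IS_DIRSYNC(dir);
}

/* rename is synchronous if either the source or the dest directory is dirsync. */
int rename_is_sync(const struct inode_sketch *src_dir,
                   const struct inode_sketch *dst_dir)
{
    return IS_DIRSYNC(src_dir) || IS_DIRSYNC(dst_dir);
}

/* Regular file writes are only synchronous under full sync. */
int file_write_is_sync(const struct inode_sketch *file)
{
    return IS_SYNC(file);
}
```

Because IS_DIRSYNC also tests the sync bit, a directory update function that switches from IS_SYNC to IS_DIRSYNC keeps honouring `mount -o sync' and `chattr +S'.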
-
Andrew Morton authored
Spot the difference:
  aops.readpage
  aops.readpages
  aops.writepage
  aops.writeback_mapping
The patch renames `writeback_mapping' to `writepages'
-
Andrew Morton authored
Turn on multipage no-buffers reads for ext3.
-
Andrew Morton authored
Multipage BIO writeout from the pagecache. It's pretty much the same as multipage reads. It falls back to buffers if things got complex. The write case is a little more complex because it handles pages which have buffers and pages which do not. If the page didn't have buffers this code does not add them.
-
Andrew Morton authored
Implements BIO-based multipage reads into the pagecache, and turns this on for ext2.
CPU load for `cat large_file > /dev/null' is reduced by approximately 15%. Similar reductions for tiobench with a single thread. (Earlier claims of 25% were exaggerated - they were measured with slab debug enabled. But 15% isn't bad for a load which is dominated by copy_*_user costs).
With 2, 4 and 8 tiobench threads, throughput is increased as well, which was unexpected. It's due to request queue weirdness. (Generally the request queueing is doing bad things under certain workloads - that's a separate issue.)
BIOs of up to 64 kbytes are assembled and submitted for readahead and for single-page reads. So the work involved in reading 32 pages has gone from:
- allocate and attach 32 buffer_heads
- submit 32 buffer_heads
- allocate 32 bios
- submit 32 bios
to:
- allocate 2 bios
- submit 2 bios
These pages never have buffers attached. Buffers will be attached later if the application writes to these pages (file overwrite).
The first version of this code (in the "delayed allocation" patches) tries to handle everything - bios which start mid-page, bios which end mid-page and pages which are covered by multiple bios. It is very complex code and in fact appears to be incorrect: out-of-order BIO completion could cause a page to come unlocked at the wrong time.
This implementation is much simpler: if things get complex, it just falls back to the buffer-based block_read_full_page(), which isn't going away, and which understands all that complexity. There's no point in doing this in two places.
This code will bypass the buffer layer for
- fully-mapped pages which are on-disk contiguous.
- fully unmapped pages (holes)
- partially unmapped pages, where the unmappedness is at the end of the page (end-of-file).
and everything else falls back to buffers.
This means that with blocksize == PAGE_CACHE_SIZE, 100% of pages are handed direct to BIO. With a heavy 10-minute dbench run on 4k PAGE_CACHE_SIZE and 1k blocks, 95% of pages were handed direct to BIO. Almost all of the other 5% were passed to block_read_full_page() because they were already partially uptodate from an earlier sub-page write().
This ratio will fall if PAGE_CACHE_SIZE/blocksize is greater than four. But if that's the case, CPU efficiency is far from the main concern - there are significant seek and bandwidth problems just at 4 blocks per page.
This code will stress out the block layer somewhat - RAID0 doesn't like multipage BIOs, and there are probably others. RAID0 seems to struggle along - readahead fails but read falls back to single-page reads, which succeed. Such problems may be worked around by setting MPAGE_BIO_MAX_SIZE to PAGE_CACHE_SIZE in fs/mpage.c.
It is trivial to enable multipage reads for many other filesystems. We can do that after completion of external testing of ext2.
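The "32 pages become 2 bios" figure follows from the 64 kbyte BIO limit if pages are 4 kbytes, which is an assumption of this sketch (the 4k figure is only mentioned for the dbench run). The names below are hypothetical and the pages are assumed to be contiguous on disk.

```c
#define SKETCH_PAGE_BYTES    4096u           /* assumed page size */
#define SKETCH_BIO_MAX_BYTES (64u * 1024u)   /* BIOs of up to 64 kbytes */

/* Number of BIOs needed to read nr_pages contiguous pages. */
unsigned int bios_needed(unsigned int nr_pages)
{
    unsigned int pages_per_bio = SKETCH_BIO_MAX_BYTES / SKETCH_PAGE_BYTES;

    /* Round up: a partially filled BIO still has to be allocated. */
    return (nr_pages + pages_per_bio - 1) / pages_per_bio;
}
```

With these values a BIO holds 16 pages, so bios_needed(32) is 2, against 32 buffer_heads plus 32 bios on the old path.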
-
Andrew Morton authored
Relax the requirements on the writeback_mapping a_op. This function is passed the number of pages which it should write. The current fs-writeback.c code will get confused if the address_space writes back more pages than it was asked to. With this change the address_space may write more pages than required if that is convenient. Extent-based filesystems may wish to do this.
-
Andrew Morton authored
Pages which are under writeout to swap are locked, and not PageWriteback(). So page allocators do not throttle against them in shrink_caches(). This causes enormous list scans and general coma under really heavy swapout loads. One fix would be to teach shrink_cache() to wait on PG_locked for swap pages. The other approach is to set both PG_locked and PG_writeback for swap pages so they can be handled in the same manner as file-backed pages in shrink_cache(). This patch takes the latter approach.
-
Andrew Morton authored
Fix bug in the loop driver. When presented with a multipage BIO, loop is overindexing the first page in the BIO rather than advancing to the second page. It scribbles on the backing file and/or on kernel memory. This happens with multipage BIO-based pagecache I/O and presumably with O_DIRECT also. The fix is much-needed with the multipage-BIO patches - using that code on loop-backed filesystems has rather messy results.
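The class of bug described above can be illustrated with a generic multi-segment copy. `struct io_segment` and `copy_segments()` are hypothetical and unrelated to the loop driver's real data structures; the point is only that each segment is read from its own buffer, not from the first buffer at an ever-growing offset.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one page-sized piece of a multipage request. */
struct io_segment {
    const char *data;
    size_t len;
};

/* Copies up to dst_len bytes from the segments into dst; returns bytes copied. */
size_t copy_segments(char *dst, size_t dst_len,
                     const struct io_segment *segs, size_t nr_segs)
{
    size_t done = 0;
    size_t i;

    for (i = 0; i < nr_segs && done < dst_len; i++) {
        size_t n = segs[i].len;

        if (n > dst_len - done)
            n = dst_len - done;

        /*
         * Correct: advance to segment i's own buffer. Reading from
         * segs[0].data + done instead would run past the first buffer.
         */
        memcpy(dst + done, segs[i].data, n);
        done += n;
    }
    return done;
}
```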
-
Andrew Morton authored
The set_page_dirty() in the ext3_writepage() failure path isn't right. set_page_dirty() will alter buffer states - it's a "whole page" dirtying. __set_page_dirty_buffers() is emitting warnings when it refuses to set dirty a non-uptodate buffer against a partially-mapped page. All we want to do in there is to move the page back onto mapping->dirty_pages, without altering the state of its buffers.
-
Andrew Morton authored
Fix bug in block_truncate_page().
When buffers are attached to an uptodate page, they are marked as being uptodate, to preserve buffer/page state coherency. Dirtiness is handled in the same way.
But block_truncate_page() assumes that a buffer which is unmapped and uptodate is over a hole. That's not the case, and the net effect is that block_truncate_page() is failing to zero the block outside the truncation point.
This only happens if the page has a disk mapping but has no attached buffers on entry to block_truncate_page(). That's never the case in current kernels, so the problem does not exhibit (it _does_ exhibit with direct-to-BIO bypass-the-buffers I/O).
There are actually three possible states of buffer mappedness:
- Buffer has a disk mapping (buffer_mapped(bh) == true)
- buffer is over a hole (buffer_mapped(bh) == false)
- don't know. Need to run get_block() (buffer_mapped(bh) == false)
This ambiguity could be resolved by adding another buffer state bit (BH_mapping_state_known?) but given that we already elide the get_block calls for the common case (buffer outside i_size) it is unlikely that the complexity is worthwhile.
-
Andrew Morton authored
- Fix the fix to the fix to the sector_t printing in buffer_io_error()
- A few microoptimisations in buffer.c. Replace:
      set_buffer_foo(bh);
  with
      if (!buffer_foo(bh))
          set_buffer_foo(bh);
  when buffer_fooness is likely. To avoid the buslocked rmw, and to avoid dirtying a cacheline.
- export write_mapping_buffers() - filesystems which put buffers on mapping->private_list need this function for I/O scheduling reasons.
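The test-before-set microoptimisation can be sketched with C11 atomics. `struct bh_sketch`, the `BH_FOO` bit and `set_buffer_foo_if_clear()` are hypothetical; only the buffer_foo()/set_buffer_foo() naming follows the placeholder used in the message.

```c
#include <stdatomic.h>

#define BH_FOO 0x1u   /* hypothetical state bit */

struct bh_sketch {
    atomic_uint b_state;
};

/* Plain load: no locked bus cycle, and the cacheline can stay shared. */
int buffer_foo(struct bh_sketch *bh)
{
    return (atomic_load_explicit(&bh->b_state, memory_order_relaxed) & BH_FOO) != 0;
}

/* Atomic read-modify-write: always takes the cacheline exclusively. */
void set_buffer_foo(struct bh_sketch *bh)
{
    atomic_fetch_or(&bh->b_state, BH_FOO);
}

/* Skip the expensive write when the bit is, as expected, already set. */
void set_buffer_foo_if_clear(struct bh_sketch *bh)
{
    if (!buffer_foo(bh))
        set_buffer_foo(bh);
}
```

The check only pays off when the bit is usually set already; otherwise it adds a load in front of the same atomic operation.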
-
Rusty Russell authored
I'm sick of searching my mail archives to find that email addr.
-