- 18 Jun, 2002 28 commits
-
-
Matthew Wilcox authored
This is actually part of the work I've been doing to remove BHs, but it stands by itself.
-
Andi Kleen authored
This patch streamlines poll and select by adding fast paths for the common case where only a small number of descriptors is passed. The majority of polls/selects seem to be of this nature. The main saving comes from not allocating two pages for the wait queue and table, but using stack allocation (up to 256 bytes) when only a few descriptors are needed. This makes it as fast as 2.0 again, and even a bit faster, because the wait queue page allocation is avoided too (except when the drivers overflow it). select also skips much faster over big holes and avoids the separate pass that determines the maximum number of descriptors in the bitmap. A typical Linux system saves a considerable amount of unswappable memory with this patch, because it usually has 10+ daemons hanging around in poll or select, each with two pages allocated for data and wait queue. Some other cleanups.
-
Andi Kleen authored
x86-64 needs its own special declaration of jiffies_64. Prepare for this by moving the jiffies_64 declaration from kernel/timer.c down into each architecture.
-
Andi Kleen authored
x86_64 core updates. - Make it compile again (switch_to macros etc., add a dummy suspend.h) - Re-enable the strength-reduce optimization - Fix ramdisk (patch from Mikael Pettersson) - Some merges from i386 - Reimplement lazy iobitmap allocation, based on bcrl's idea - Fix IPC 32-bit emulation to actually work and move it into its own file - New fixed mtrr.c from DaveJ, ported from 2.4 and re-enabled - Move tlbstate into the PDA - Add some changes that got lost during the last merge - New memset that seems to actually work - Align signal handler stack frames to 16 bytes - Some more minor bugfixes.
-
Andrew Morton authored
Heaven knows why, but that's what the Open Group says, and returning -EFAULT causes 2.5 to fail one of the Linux Test Project tests: [ENOMEM] The addresses in the range starting at addr and continuing for len bytes are outside the range allowed for the address space of a process, or specify one or more pages that are not mapped. 2.4 has it right, but 2.5 doesn't.
-
Andrew Morton authored
Reduce the radix tree nodes from 128 slots to 64. - The main reason for this is that on 64-bit/4k page machines, the slab allocator has decided that radix tree nodes will require an order-1 allocation. Shrinking the nodes to 64 slots pulls that back to an order-0 allocation. - On x86 we get fifteen 64-slot nodes per page rather than seven 128-slot nodes, for a modest memory saving. - Halving the node size will approximately halve the memory use in the worrisome really-large, really-sparse file case. Of course, the downside is longer tree walks. Each level of the tree covers six bits of pagecache index rather than seven. As ever, I am guided by Anton's profiling on the 12- and 32-way PPC boxes. radix_tree_lookup() is currently down in the noise floor. Now, there is one special case: one file which is really big and which is accessed in a random manner and which is accessed very heavily: the blockdev mapping. We _are_ showing some locking cost in __find_get_block (used to be __get_hash_table) and in its call to find_get_page(). I have a bunch of patches which introduce a generic per-cpu buffer LRU, and which remove ext2's private bitmap buffer LRUs. I expect these patches to wipe the blockdev mapping lookup lock contention off the map, but I'm awaiting test results from Anton before deciding whether those patches are worth submitting.
-
Andrew Morton authored
Renames the buffer_head lookup function `get_hash_table' to `find_get_block'. get_hash_table() is too generic a name. Plus it doesn't even use a hash any more.
-
Andrew Morton authored
One weakness introduced when the buffer LRU went away was that GFP_NOFS allocations became equivalent to GFP_NOIO, because all writeback goes via writepage/writepages, which requires entry into the filesystem. However, now that swapout no longer calls bmap(), we can honour GFP_NOFS's intent for swapcache pages. So if the allocation request specifies __GFP_IO and !__GFP_FS, we can wait on swapcache pages and we can perform swapcache writeout. This should strengthen the VM somewhat.
-
Andrew Morton authored
The set_page_buffers() and clear_page_buffers() macros are each used in only one place. Fold them into their callers.
-
Andrew Morton authored
highmem.h includes bio.h, so just about every compilation unit in the kernel gets to process bio.h. The patch moves the BIO-related functions out of highmem.h and into bio-related headers. The nested include is removed and all files which need to include bio.h now do so.
-
Andrew Morton authored
alloc_buffer_head() does not need the additional argument - GFP_NOFS is always correct.
-
Andrew Morton authored
Clean up ext3's journal_try_to_free_buffers(). Now that the releasepage() a_op is non-blocking and need not perform I/O, this function becomes much simpler.
-
Andrew Morton authored
bio_copy is doing: vfrom = kmap_atomic(bv->bv_page, KM_BIO_IRQ); vto = kmap_atomic(bbv->bv_page, KM_BIO_IRQ); which, if I understand atomic kmaps correctly, is incorrect: both source and dest will get the same pte. The patch creates separate atomic kmap slots for the source and destination of this copy.
-
Andrew Morton authored
Fix the loop driver for loop-on-blockdev setups. When presented with a multipage BIO, loop_make_request overindexes the first page and corrupts kernel memory. Fix it to walk the individual pages. BTW, I suspect the IV handling in loop may be incorrect for multipage BIOs. Should we not be recalculating the IV for each page in the BIOs, or incrementing the offset by the size of the preceding pages, or such?
-
Andrew Morton authored
This patch changes the swap I/O handling. The objectives are: - Remove swap special-casing - Stop using buffer_heads -> direct-to-BIO - Make S_ISREG swapfiles more robust. I've spent quite some time with swap. The first patches converted swap to use block_read/write_full_page(). These were discarded because they are still using buffer_heads, and a reasonable amount of otherwise unnecessary infrastructure had to be added to the swap code just to make it look like a regular fs. So this code just has a custom direct-to-BIO path for swap, which seems to be the most comfortable approach. A significant thing here is the introduction of "swap extents". A swap extent is a simple data structure which maps a range of swap pages onto a range of disk sectors. It is simply:

struct swap_extent {
	struct list_head list;
	pgoff_t start_page;
	pgoff_t nr_pages;
	sector_t start_block;
};

At swapon time (for an S_ISREG swapfile), each block in the file is bmapped() and the block numbers are parsed to generate the device's swap extent list. This extent list is quite compact - a 512 megabyte swapfile generates about 130 nodes in the list. That's about 4 kbytes of storage. The conversion from filesystem blocksize blocks into PAGE_SIZE blocks is performed at swapon time. At swapon time (for an S_ISBLK swapfile), we install a single swap extent which describes the entire device. The advantages of the swap extents are: 1: We never have to run bmap() (ie: read from disk) at swapout time. So S_ISREG swapfiles are now just as robust as S_ISBLK swapfiles. 2: All the differences between S_ISBLK swapfiles and S_ISREG swapfiles are handled at swapon time. During normal operation, we just don't care. Both types of swapfiles are handled the same way. 3: The extent lists always operate in PAGE_SIZE units. So the problems of going from fs blocksize to PAGE_SIZE are handled at swapon time and normal operating code doesn't need to care.
4: Because we don't have to fiddle with different blocksizes, we can go direct-to-BIO for swap_readpage() and swap_writepage(). This introduces the kernel-wide invariant "anonymous pages never have buffers attached", which cleans some things up nicely. All those block_flushpage() calls in the swap code simply go away. 5: The kernel no longer has to allocate both buffer_heads and BIOs to perform swapout. Just a BIO. 6: It permits us to perform swapcache writeout and throttling for GFP_NOFS allocations (a later patch). (Well, there is one sort of anon page which can have buffers: the pages which are cast adrift in truncate_complete_page() because do_invalidatepage() failed. But these pages are never added to swapcache, and nobody except the VM LRU has to deal with them). The swapfile parser in setup_swap_extents() will attempt to extract the largest possible number of PAGE_SIZE-sized and PAGE_SIZE-aligned chunks of disk from the S_ISREG swapfile. Any stray blocks (due to file discontiguities) are simply discarded - we never swap to those. If an S_ISREG swapfile is found to have any unmapped blocks (file holes) then the swapon attempt will fail. The extent list can be quite large (hundreds of nodes for a gigabyte S_ISREG swapfile). It needs to be consulted once for each page within swap_readpage() and swap_writepage(). Hence there is a risk that we could blow significant amounts of CPU walking that list. However I have implemented a "where we found the last block" cache, which is used as the starting point for the next search. Empirical testing indicates that this is wildly effective - the average length of the list walk in map_swap_page() is 0.3 iterations per page, with a 130-element list. It _could_ be that some workloads do start suffering long walks in that code, and perhaps a tree would be needed there. But I doubt that, and if this is happening then it means that we're seeking all over the disk for swap I/O, and the list walk is the least of our problems. 
rw_swap_page_nolock() now takes a page*, not a kernel virtual address. It has been renamed to rw_swap_page_sync() and it takes care of locking and unlocking the page itself. Which is all a much better interface. Support for type 0 swap has been removed. Current versions of mkswap(8) seem to never produce v0 swap unless you explicitly ask for it, so I doubt if this will affect anyone. If you _do_ have a type 0 swapfile, swapon will fail and the message "version 0 swap is no longer supported. Use mkswap -v1 /dev/sdb3" is printed. We can remove that code for real later on. Really, all that swapfile header parsing should be pushed out to userspace. This code always uses single-page BIOs for swapin and swapout. I have an additional patch which converts swap to use mpage_writepages(), so we swap out in 16-page BIOs. It works fine, but I don't intend to submit that. There just doesn't seem to be any significant advantage to it. I can't see anything in sys_swapon()/sys_swapoff() which needs the lock_kernel() calls, so I deleted them. If you ftruncate an S_ISREG swapfile to a shorter size while it is in use, subsequent swapout will destroy the filesystem. It was always thus, but it is much, much easier to do now. Not really a kernel problem, but swapon(8) should not be allowing the kernel to use swapfiles which are modifiable by unprivileged users.
-
Andrew Morton authored
Convert swap pages so that they are PageWriteback and !PageLocked while under writeout, like all other block-backed pages. (Network filesystems aren't doing this yet - their pages are still locked while under writeout)
-
Andrew Morton authored
buffer_insert_list() is showing up on Anton's graphs. It'll be via ext2's mark_buffer_dirty_inode() against indirect blocks. If the buffer is already on an inode queue, we know that it is on the correct inode's queue so we don't need to re-add it.
-
Andrew Morton authored
The request queue was increased from 256 slots to 512 in 2.5.20. The throughput of `dbench 128' on Randy's 384 megabyte machine fell 40%. We do need to understand why that happened, and what we can learn from it. But in the meanwhile I'd suggest that we go back to 256 slots so that this known problem doesn't impact people's evaluation and tuning of 2.5 performance.
-
Andrew Morton authored
mark_buffer_dirty() is showing up on Anton's graphs. Avoiding the buslocked RMW if the buffer is already dirty should fix that up.
-
Andrew Morton authored
- If grab_cache_page_nowait() is to be called while holding a lock on a different page, it must perform memory allocations with GFP_NOFS. Otherwise it could come back onto the locked page (if it's dirty) and deadlock. Also tidy this function up a bit - the checks in there were overly paranoid. - In a few places, look to see if we can avoid a buslocked cycle and dirtying of a cacheline.
-
Andrew Morton authored
Remove unneeded do_update_atime(), and convert update_atime() to C.
-
Andrew Morton authored
Stephen and Neil Brown recently worked this out. It's a rare situation which only affects data=journal mode. Fix problem in data=journal mode where writeback could be left pending on a journaled, deleted disk block. If that block then gets reallocated, we can end up with an alias in which the old data can be written back to disk over the new. Thanks to Neil Brown for spotting this and coming up with the initial fix.
-
Andrew Morton authored
Adds five sysctls for tuning the writeback behaviour: dirty_async_ratio, dirty_background_ratio, dirty_sync_ratio, dirty_expire_centisecs and dirty_writeback_centisecs. These are described in Documentation/filesystems/proc.txt. They are basically the traditional knobs which we've always had. We are accreting a ton of obsolete sysctl numbers under /proc/sys/vm/. I didn't recycle these - just marked them unused and removed the obsolete documentation.
-
Rusty Russell authored
Ironically enough, both were written by me. Fixed thus.
-
Linus Torvalds authored
-
Andi Kleen authored
Add change_page_attr to change page attributes for the kernel linear map. Fix AGP driver to use change_page_attr for the AGP buffer. Clean up AGP driver a bit (only tested on i386/VIA+AMD) Change ioremap_nocache to use change_page_attr to avoid mappings with conflicting caching attributes.
-
Stelian Pop authored
This exports the pci_bus_type symbol to modules, needed by (at least) the recent changes in pcmcia/cardbus.c.
-
Linus Torvalds authored
-
- 17 Jun, 2002 12 commits
-
-
Linus Torvalds authored
-
Rusty Russell authored
This was done by inspection; is it OK, Anton? It's very simple:
-
Linus Torvalds authored
into home.transmeta.com:/home/torvalds/v2.5/linux
-
Rusty Russell authored
This patch removes the concept of "logical" CPU numbers, in preparation for CPU hotplugging.
-
Mikael Pettersson authored
Summary: 2.5.17 broke initrd on x86. Fix below. Why: Kai's patch in 2.5.17 to move x86-specific options from the Makefile to arch/i386/boot/Makefile unfortunately lost the fact that the original "#export RAMDISK = -DRAMDISK=512" statement was commented out. (I suspect a typo.) RAMDISK has been obsolete since 1.3.something, and uncommenting it has "interesting" effects since the ram_size field has a very different meaning now. The patch below reverts the statement to its pre-2.5.17 state. Perhaps it should be removed altogether?
-
Linus Torvalds authored
into home.transmeta.com:/home/torvalds/v2.5/linux
-
David S. Miller authored
-
Russell King authored
-
David S. Miller authored
-
David S. Miller authored
-
Benjamin LaHaise authored
This patch splits fput into fput and __fput. __fput is needed by aio to construct a mechanism for performing a deferred fput during I/O completion, which typically occurs in interrupt context.
-
Benjamin LaHaise authored
This adds support for wait queue function callbacks, which are used by aio to build async read / write operations on top of existing wait queues at points that would normally block a process.
-