- 29 Jul, 2002 28 commits
-
-
Hugh Dickins authored
Update Doc and remove FIXME comment from fork.c now that accounting is right.
-
Hugh Dickins authored
do_mmap_pgoff's (file == NULL) check was incorrect: it caused shared MAP_ANONYMOUS objects to be counted twice (again in shmem_file_setup), and again on fork(), whereas the equivalent shared /dev/zero objects were correctly counted. Conversely, a private read-only file mapping was (correctly) not counted, but was still not counted when mprotected to writable. mprotect_fixup had pointless "charged = 0" changes; now it does vm_enough_memory checking when a private mapping is first made writable (but later we may want to refine behaviour on a noreserve mapping). Also changed the correct (flags & MAP_SHARED) test in do_mmap_pgoff to the equivalent (vm_flags & VM_SHARED) test, because by that stage do_mmap_pgoff is dealing with vm_flags rather than the input flags.
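The charging rules this change arrives at can be summarised in a minimal userspace sketch. The SK_VM_* bits are hypothetical stand-ins (SK_VM_ACCOUNT plays the role of a "charged already" bit, loosely modelled on the kernel's VM_ACCOUNT), and the functions only model how many pages to charge, not vm_enough_memory itself.

```c
/* Hypothetical flag bits for this sketch only; the kernel's real VM_*
 * values live in include/linux/mm.h. */
#define SK_VM_WRITE   0x1UL
#define SK_VM_SHARED  0x2UL
#define SK_VM_ACCOUNT 0x4UL   /* "this mapping has already been charged" */

/* Pages mmap should charge for a new mapping; marks it accounted if so. */
static unsigned long sk_mmap_charge(unsigned long *vm_flags, unsigned long pages)
{
    /* Test vm_flags, not the MAP_* input flags. Shared mappings are never
     * charged here: shared anonymous memory is charged once, by its shmem
     * object, so charging again would double count. */
    if (*vm_flags & SK_VM_SHARED)
        return 0;
    if (*vm_flags & SK_VM_WRITE) {
        *vm_flags |= SK_VM_ACCOUNT;
        return pages;
    }
    return 0;   /* private read-only: nothing to charge yet */
}

/* Pages mprotect should charge when a mapping is given new_flags. */
static unsigned long sk_mprotect_charge(unsigned long *vm_flags,
                                        unsigned long new_flags,
                                        unsigned long pages)
{
    unsigned long charge = 0;

    /* Charge only the first time a private mapping becomes writable. */
    if ((new_flags & SK_VM_WRITE) &&
        !(*vm_flags & (SK_VM_SHARED | SK_VM_ACCOUNT))) {
        new_flags |= SK_VM_ACCOUNT;
        charge = pages;
    }
    /* The shared and accounted bits carry over from the existing mapping. */
    *vm_flags = new_flags | (*vm_flags & (SK_VM_SHARED | SK_VM_ACCOUNT));
    return charge;
}
```

A private read-only mapping that is made writable, then read-only, then writable again is charged exactly once, on the first transition.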
-
Hugh Dickins authored
Remove vm_unacct_vma function: it's only used in one place, which can do it better by using vm_unacct_memory directly.
-
Hugh Dickins authored
do_mmap_pgoff clears MAP_NORESERVE from vm_flags when VM accounts strictly: but it's not in vm_flags, it's in flags (and tested there).
-
Hugh Dickins authored
There is no point in do_mremap clearing MAP_NORESERVE from its flags: it has already validated that only the MREMAP_ flags can be set, and it has no use for MAP_NORESERVE in the code that follows anyway.
-
Hugh Dickins authored
shmem_notify_change and shmem_file_write must guard against an overly large loff_t overflowing when it is shifted into an unsigned long for vm_enough_memory. Rename SHMEM_MAX_BLOCKS to SHMEM_MAX_INDEX (to avoid confusion with 512-byte blocks), and define SHMEM_MAX_BYTES from it. But the 2.5 vmtruncate lacked the s_maxbytes error handling which shmem_notify_change now expects: bring it in from the -dj tree. shmem_file_write error handling needs a closer look later on.
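A minimal sketch of the ordering that matters here: validate the 64-bit size against the limit first, and only then shift it down to a page count that fits in unsigned long. SK_MAX_BYTES is a hypothetical stand-in for SHMEM_MAX_BYTES, 4 KB pages are assumed, and the function is illustrative rather than the shmem code.

```c
#include <errno.h>
#include <stdint.h>

#define SK_PAGE_SHIFT 12
/* Hypothetical limit standing in for SHMEM_MAX_BYTES. */
#define SK_MAX_BYTES ((int64_t)1 << 40)

/* Convert a requested size (a signed 64-bit loff_t) into pages, rounding up.
 * Out-of-range sizes are rejected *before* the shift, so a huge value can
 * never wrap when the result is narrowed to unsigned long. */
static int sk_size_to_pages(int64_t newsize, unsigned long *pages)
{
    if (newsize < 0)
        return -EINVAL;
    if (newsize > SK_MAX_BYTES)
        return -EFBIG;
    *pages = (unsigned long)((newsize + (1 << SK_PAGE_SHIFT) - 1) >> SK_PAGE_SHIFT);
    return 0;
}
```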
-
Hugh Dickins authored
Repeated overnight kernel builds in tmpfs showed insane Committed_AS by morning. The main bug was that shmem_file_write was passing (newsize-oldsize)>>PAGE_SHIFT to vm_enough_memory, but it has to be ((newsize>>PAGE_SHIFT)-(oldsize>>PAGE_SHIFT)): imagine a series of 1k writes, each of which the old expression charges as 0 pages. But actually, if we're going to do strict accounting, then we should round up to the next page, not down: use the VM_ACCT macro throughout (it needs an unusual mix of PAGE_CACHE_SIZE with PAGE_SHIFT); and we must count one page for a long symlink.
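A worked version of the arithmetic as a small userspace sketch, assuming 4 KB pages. SK_VM_ACCT only mirrors the round-up idea behind VM_ACCT (the exact kernel definition is not reproduced here), and sk_charge_appends is a hypothetical helper that sums the per-write charge over a series of appends to an empty file.

```c
#define SK_PAGE_SHIFT 12
#define SK_PAGE_SIZE  (1UL << SK_PAGE_SHIFT)
/* Bytes -> pages, rounding up (the idea behind shmem's VM_ACCT). */
#define SK_VM_ACCT(size) (((size) + SK_PAGE_SIZE - 1) >> SK_PAGE_SHIFT)

/* Buggy: shifts the byte difference, so every sub-page write charges 0. */
static unsigned long sk_delta_buggy(unsigned long oldsize, unsigned long newsize)
{
    return (newsize - oldsize) >> SK_PAGE_SHIFT;
}

/* Consistent, but rounds down: a partly used last page is never charged. */
static unsigned long sk_delta_floor(unsigned long oldsize, unsigned long newsize)
{
    return (newsize >> SK_PAGE_SHIFT) - (oldsize >> SK_PAGE_SHIFT);
}

/* Strict: charge every page the file touches, including a partial last one. */
static unsigned long sk_delta_acct(unsigned long oldsize, unsigned long newsize)
{
    return SK_VM_ACCT(newsize) - SK_VM_ACCT(oldsize);
}

/* Total charge for `count` appends of `chunk` bytes each to an empty file. */
static unsigned long sk_charge_appends(unsigned long (*delta)(unsigned long, unsigned long),
                                       unsigned long chunk, int count)
{
    unsigned long size = 0, total = 0;

    for (int i = 0; i < count; i++) {
        total += delta(size, size + chunk);
        size += chunk;
    }
    return total;
}
```

For nine 1 KB appends (9216 bytes, three pages), sk_delta_buggy charges 0 pages, sk_delta_floor charges 2, and sk_delta_acct charges 3, which matches the pages the file actually occupies.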
-
Christoph Hellwig authored
Currently there is no way to find out the effective object size of a slab cache. XFS has lots of IRIX-derived code that wants to do zalloc()-style allocations on zones (which are implemented as slab caches in XFS/Linux) and thus needs to know it. There are three ways to implement it: a) implement kmem_cache_zalloc; b) make the XFS zone a struct of kmem_cache_t and a size variable; c) implement kmem_cache_size. The current XFS tree does a), but I absolutely don't like it, as it encourages people to use kmem_cache_zalloc for new code instead of thinking about how to utilize slab object reuse. b) would be easy, but I guess kmem_cache_size is useful enough to get into the kernel. Here's the patch:
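Option c) keeps zeroing on the caller's side. A userspace sketch of that shape, where struct sk_cache and sk_cache_size() are hypothetical stand-ins for a slab cache and the proposed kmem_cache_size(), and sk_zone_zalloc() is an illustrative XFS-style wrapper, not actual XFS code:

```c
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for a slab cache: only the object size matters here. */
struct sk_cache {
    size_t objsize;
};

/* Plays the role of the proposed kmem_cache_size(). */
static size_t sk_cache_size(const struct sk_cache *cachep)
{
    return cachep->objsize;
}

static void *sk_cache_alloc(struct sk_cache *cachep)
{
    return malloc(cachep->objsize);
}

/* Zeroing allocation built on the size accessor, so the slab API itself
 * does not need to grow a zalloc entry point. */
static void *sk_zone_zalloc(struct sk_cache *zone)
{
    void *obj = sk_cache_alloc(zone);

    if (obj)
        memset(obj, 0, sk_cache_size(zone));
    return obj;
}
```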
-
Linus Torvalds authored
actual implementation and avoid confusion.
-
Dave Hansen authored
I just duplicated the method used in drivers/net/tulip/de2104x.c
-
Linus Torvalds authored
-
David Howells authored
This should do the trick.
-
Jens Axboe authored
-
Paul Mackerras authored
I found a situation where page->index for a pagetable page can be set to 0 instead of the correct value. This means that ptep_to_address will return the wrong answer. The problem occurs when remap_pmd_range calls pte_alloc_map and pte_alloc_map needs to allocate a new pte page, because remap_pmd_range has masked off the top bits of the address (to avoid overflow in the computation of `end'), and it passes the masked address to pte_alloc_map. Now we presumably don't need to get from the physical pages mapped by remap_page_range back to the ptes mapping them. But we could easily map some normal pages using ptes in that pagetable page subsequently, and when we call ptep_to_address on their ptes it will give the wrong answer. The patch below fixes the problem. There is a more general question this brings up - some of the procedures which iterate over ranges of ptes will do the wrong thing if the end of the address range is too close to ~0UL, while others are OK. Is this a problem in practice? On i386, ppc, and the 64-bit architectures it isn't since user addresses can't go anywhere near ~0UL, but what about arm or m68k for instance? And BTW, being able to go from a pte pointer to the mm and virtual address that that pte maps is an extremely useful thing on ppc, since it will enable me to do MMU hash-table management at set_pte (and ptep_*) time and thus avoid the extra traversal of the pagetables that I am currently doing in flush_tlb_*. So if you do decide to back out rmap, please leave in the hooks for setting page->mapping and page->index on pagetable pages.
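A small userspace sketch of the address arithmetic involved, assuming i386-style two-level paging with a pte page covering 4 MB (PGDIR_SHIFT = 22); the SK_ names are hypothetical. It shows why handing the masked offset to the pte allocation loses the top bits, and one overflow-tolerant way to clamp a range end near ~0UL. The comparison trick resembles the pgd_addr_end() helper of later kernels and is not part of this patch.

```c
#include <limits.h>

/* Assumption: i386-style two-level paging, one pte page per 4 MB region. */
#define SK_PGDIR_SHIFT 22
#define SK_PGDIR_SIZE  (1UL << SK_PGDIR_SHIFT)
#define SK_PGDIR_MASK  (~(SK_PGDIR_SIZE - 1))

/* Base of the region a pte page maps; what page->index should record. */
static unsigned long sk_pte_page_base(unsigned long address)
{
    return address & SK_PGDIR_MASK;
}

/* The bug: masking first keeps only the offset within the region,
 * so the computed base is always 0. */
static unsigned long sk_masked_pte_page_base(unsigned long address)
{
    return sk_pte_page_base(address & ~SK_PGDIR_MASK);
}

/* Naive clamp to the next region boundary: in the last region below
 * ULONG_MAX the boundary wraps to 0 and the range looks empty. */
static unsigned long sk_naive_range_end(unsigned long addr, unsigned long end)
{
    unsigned long boundary = (addr + SK_PGDIR_SIZE) & SK_PGDIR_MASK;
    return boundary < end ? boundary : end;
}

/* Overflow-tolerant clamp: subtracting 1 makes a wrapped boundary of 0
 * compare as ULONG_MAX, i.e. beyond any end. */
static unsigned long sk_range_end(unsigned long addr, unsigned long end)
{
    unsigned long boundary = (addr + SK_PGDIR_SIZE) & SK_PGDIR_MASK;
    return (boundary - 1 < end - 1) ? boundary : end;
}
```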
-
Paul Mackerras authored
include/linux/timer.h needs to include <linux/stddef.h> to get the definition of NULL.
-
Adam J. Richter authored
linux-2.5.28/drivers/block_dev.c has a new do_open that broke initial ramdisk support, because it now requires devices that "manually" set bdev->bd_openers to set bdev->bd_inode->i_size as well. The following single line patch, suggested by Russell King, fixes the problem. There does not appear to be anyone acting as maintainer for rd.c, so I posted to lkml yesterday to ask if anyone objected to my submitting the patch to you, and I also emailed the message to Russell King and Al Viro. Nobody has complained. I have been running the patch for almost a day without problems.
-
Linus Torvalds authored
bk://bk.arm.linux.org.uk:14691 into home.transmeta.com:/home/torvalds/v2.5/linux
-
Russell King authored
Al Viro pointed out there was a fair bit of redundancy here. We remove many include files from the serial layer, leaving those which are necessary for it to build. This has been posted to lkml, and no one complained. This cset also adds a missing include of asm/io.h to 8250_pci.c (unfortunately I've lost the name of the reporter, sorry).
-
Russell King authored
-
Linus Torvalds authored
bk://jfs.bkbits.net/linux-2.5 into home.transmeta.com:/home/torvalds/v2.5/linux
-
Dave Kleikamp authored
jfs_rmdir and jfs_unlink have always called d_delete, but it hasn't caused a problem until 2.5.28. The call is an artifact of the 2.2 kernel, which had gone unnoticed in 2.4 and 2.5.
-
Linus Torvalds authored
-
Christoph Hellwig authored
These were totally unused for a long time. It's interesting how many files include swapctl.h, though.
-
David Woodhouse authored
I did this ages ago but never submitted it because I never got round to testing it. I still haven't tested it, but it ought to work, and the code is definitely broken without it...
-
Linus Torvalds authored
Noticed by Zwane Mwaikambo.
-
David Woodhouse authored
-
Russell King authored
-
Russell King authored
8250_pci.c contains some old compatibility cruft for when __devexit wasn't defined by the generic kernel. It is now, so it's gone.
-
- 28 Jul, 2002 12 commits
-
-
Andrew Morton authored
I removed the PF_INVALIDATE debug check from buffercache leaks, too. It's non-functional - the flag should have been set across truncate_inode_pages(), not invalidate_inode_pages().
-
Linus Torvalds authored
-
Linus Torvalds authored
used. This fixes a lockup in synchronize_irq() on x86.
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Ingo Molnar authored
the attached patch is a comment update of sched.c and it also does a small cleanup in migration_thread().
-
Matthew Dharm authored
Modified the MODE_SENSE write-protect test in sd.c to issue a SCSI request whose request_bufflen matches the amount of data the MODE_SENSE command requests.
-
Matthew Dharm authored
Fixed one of the INQUIRY commands used for probing SCSI devices. This badly formed command was trapped by the usb-storage driver's BUG_ON(), which is designed to stop commands with a badly formed transfer_length field.
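Both SCSI fixes above come down to keeping a request's buffer length consistent with the allocation length encoded in the command. A sketch for 6-byte CDBs, where byte 4 carries the allocation length for INQUIRY and MODE SENSE(6); the helper names are hypothetical, and sk_cdb6_len_matches() only illustrates the kind of consistency the usb-storage BUG_ON guards, not its actual test.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_INQUIRY     0x12  /* SCSI INQUIRY opcode */
#define SK_MODE_SENSE6 0x1a  /* SCSI MODE SENSE(6) opcode */

/* Fill a 6-byte CDB; byte 4 is the allocation length, the most data the
 * device may return. */
static void sk_build_cdb6(uint8_t cdb[6], uint8_t opcode, uint8_t page,
                          uint8_t alloc_len)
{
    cdb[0] = opcode;
    cdb[1] = 0;
    cdb[2] = page;       /* MODE SENSE page code; 0 for standard INQUIRY data */
    cdb[3] = 0;
    cdb[4] = alloc_len;
    cdb[5] = 0;          /* control byte */
}

/* Illustrative check: the buffer supplied with the request must be exactly
 * as large as the CDB's allocation length. */
static int sk_cdb6_len_matches(const uint8_t cdb[6], size_t request_bufflen)
{
    return (size_t)cdb[4] == request_bufflen;
}
```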
-
Andrew Morton authored
Audit put_page() uses of pages that may be in the page cache. Use page_cache_release() instead.
-
Ingo Molnar authored
the attached patch does the set_thread_area parameter simplification - it also cleans up some other TLS issues, it removes the tls_* fields from the thread_struct, and removes the now unused page-granularity flag.
-
Andrew Morton authored
This patch allows the raw driver to be built as a kernel module. It also cleans up a bunch of stuff, C99ifies the initialisers, gives lots of symbols static scope, etc. The module is unloadable when there are zero bindings. The current ioctl() interface has no way of undoing a binding - it only allows bindings to be overwritten. So I overloaded a bind to major=0,minor=0 to mean "undo the binding". I'll update the raw(8) manpage for that. generic_file_direct_IO has been exported to modules. The call to invalidate_inode_pages2() has been moved out of all generic_file_direct_IO() callers and into generic_file_direct_IO() itself, mainly to avoid exporting invalidate_inode_pages2() to modules.
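A userspace sketch of the "bind to 0:0 to unbind" convention. The request layout and the RAW_SETBIND ioctl number are mirrored from <linux/raw.h> as an assumption (the SK_ copies keep the sketch buildable without that header), and the control device path mentioned in the comment is illustrative.

```c
#include <stdint.h>
#include <sys/ioctl.h>

/* Assumed mirror of struct raw_config_request from <linux/raw.h>. */
struct sk_raw_config_request {
    int raw_minor;
    uint64_t block_major;
    uint64_t block_minor;
};

/* Assumed to match RAW_SETBIND in <linux/raw.h>. */
#define SK_RAW_SETBIND _IO(0xac, 0)

/* Binding a raw minor to block device 0:0 removes its existing binding. */
static struct sk_raw_config_request sk_raw_unbind_request(int raw_minor)
{
    struct sk_raw_config_request rq = { raw_minor, 0, 0 };
    return rq;
}

/* Issue the request on an fd for the raw control device (for example
 * /dev/rawctl); returns ioctl()'s result, -1 with errno set on failure. */
static int sk_raw_unbind(int ctl_fd, int raw_minor)
{
    struct sk_raw_config_request rq = sk_raw_unbind_request(raw_minor);
    return ioctl(ctl_fd, SK_RAW_SETBIND, &rq);
}
```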
-
Andrew Morton authored
This patch is a performance and correctness update to the direct-IO code: O_DIRECT and the raw driver. It mainly affects IO against blockdevs. The direct_io code was returning -EINVAL for a filesystem hole. Change it to clear the userspace page instead. There were a few restrictions and weirdnesses wrt blocksize and alignments. The code has been reworked so we now lay out maximum-sized BIOs at any sector alignment. Because of this, the raw driver has been altered to set the blockdev's soft blocksize to the minimum possible at open() time, typically 512 bytes. There are now no performance disadvantages to using small blocksizes, and this gives the finest possible alignment. There is no API here for setting or querying the soft blocksize of the raw driver (there never was, really), which could conceivably be a problem. If it is, we can permit BLKBSZSET and BLKBSZGET against the fd which /dev/raw/rawN returned, but that would require that blk_ioctl() be exported to modules again.

This code is wickedly quick. Here's an oprofile of a single 500MHz PIII reading from four (old) scsi disks (two aic7xxx controllers) via the raw driver. Aggregate throughput is 72 megabytes/second:

    c013363c     24  0.0896492  __set_page_dirty_buffers
    c021b8cc     24  0.0896492  ahc_linux_isr
    c012b5dc     25  0.0933846  kmem_cache_free
    c014d894     26  0.09712    dio_bio_complete
    c01cc78c     26  0.09712    number
    c0123bd4     40  0.149415   follow_page
    c01eed8c     46  0.171828   end_that_request_first
    c01ed410     49  0.183034   blk_recount_segments
    c01ed574     65  0.2428     blk_rq_map_sg
    c014db38     85  0.317508   do_direct_IO
    c021b090     90  0.336185   ahc_linux_run_device_queue
    c010bb78    236  0.881551   timer_interrupt
    c01052d8  25354  94.707     poll_idle

A testament to the efficiency of the 2.5 block layer. And against four IDE disks on an HPT374 controller.
Throughput is 120 megabytes/sec:

    c01eed8c     80  0.292462   end_that_request_first
    c01fe850     87  0.318052   hpt3xx_intrproc
    c01ed574    123  0.44966    blk_rq_map_sg
    c01f8f10    141  0.515464   ata_select
    c014db38    153  0.559333   do_direct_IO
    c010bb78    235  0.859107   timer_interrupt
    c01f9144    281  1.02727    ata_irq_enable
    c01ff990    290  1.06017    udma_pci_init
    c01fe878    308  1.12598    hpt3xx_maskproc
    c02006f8    379  1.38554    idedisk_do_request
    c02356a0    609  2.22637    pci_conf1_read
    c01ff8dc    611  2.23368    udma_pci_start
    c01ff950    922  3.37062    udma_pci_irq_status
    c01f8fac   1002  3.66308    ata_status
    c01ff26c   1059  3.87146    ata_start_dma
    c01feb70   1141  4.17124    hpt374_udma_stop
    c01f9228   3072  11.2305    ata_out_regfile
    c01052d8  15193  55.5422    poll_idle

Not so good.

One problem which has been identified with O_DIRECT is the cost of repeated calls into the mapping's get_block() callback. Not a big problem with ext2, but other filesystems have more complex get_block implementations. So what I have done is to require that callers of generic_direct_IO() implement the new `get_blocks()' interface. This is a small extension to get_block(). It gets passed another argument which indicates the maximum number of blocks which should be mapped, and it returns the number of blocks which it did map in bh_result->b_size. This allows the fs to map up to 4G of disk (or of hole) in a single get_block() invocation. There are some other caveats and requirements of get_blocks() which are documented in the comment block over fs/direct_io.c:get_more_blocks(). Possibly, get_blocks() will be the 2.6 kernel's way of doing gang block mapping. It certainly allows good speedups. But it doesn't allow the fs to return a scatter list of blocks - it only understands linear chunks of disk. I think that's really all it _should_ do. I'll let get_blocks() sit for a while and wait for some feedback. If it is sufficient and nobody objects too much, I shall convert all get_block() instances in the kernel to be get_blocks() instances.
And I'll teach readahead (at least) to use the get_blocks() extension. Delayed-allocation writeback could use get_blocks(), as could block_prepare_write() for blocksize < PAGE_CACHE_SIZE. There's no mileage in using it in mpage_writepages(), because all our filesystems are syncalloc and nobody uses MAP_SHARED for much. It will be tricky to use get_blocks() for writes, because if a ton of blocks have been mapped into the file and then something goes wrong, the kernel needs to either remove those blocks from the file or zero them out. The direct_io code zeroes them out. btw, some time ago you mentioned that some drivers and/or hardware may get upset if there are multiple simultaneous IOs in progress against the same block. Well, the raw driver has always allowed that to happen, and O_DIRECT writes to blockdevs do as well now.
todo:
1) The driver will probably explode if someone runs BLKBSZSET while IO is in progress. Need to use bdclaim() somewhere.
2) readv() and writev() need to become direct_io-aware. At present we're doing stop-and-wait for each segment when performing readv/writev against the raw driver and O_DIRECT blockdevs.
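To illustrate why a get_blocks()-style callback saves calls, here is a userspace sketch over a hypothetical extent table. sk_get_blocks() maps up to max_blocks starting at iblock and returns the run length directly (the kernel interface reports it in bh_result->b_size instead); holes come back unmapped together with their length, so a caller can zero-fill them rather than fail. None of these names are kernel APIs.

```c
#include <stddef.h>

/* Hypothetical extent: logical blocks [start, start+len) live at disk
 * blocks [disk, disk+len). Logical blocks not covered are holes. */
struct sk_extent { unsigned long start, len, disk; };

struct sk_mapping { unsigned long disk, count; int mapped; };

/* Map up to max_blocks (must be >= 1) starting at iblock. */
static struct sk_mapping sk_get_blocks(const struct sk_extent *ext, size_t n,
                                       unsigned long iblock, unsigned long max_blocks)
{
    struct sk_mapping m = { 0, max_blocks, 0 };  /* default: hole up to the cap */

    for (size_t i = 0; i < n; i++) {
        unsigned long end = ext[i].start + ext[i].len;

        if (iblock >= ext[i].start && iblock < end) {
            unsigned long avail = end - iblock;
            m.disk = ext[i].disk + (iblock - ext[i].start);
            m.count = avail < max_blocks ? avail : max_blocks;
            m.mapped = 1;
            return m;
        }
        /* A later extent bounds the hole that starts at iblock. */
        if (ext[i].start > iblock && ext[i].start - iblock < m.count)
            m.count = ext[i].start - iblock;
    }
    return m;
}

/* Count the lookups needed to walk blocks [0, nblocks), asking for at most
 * max_per_call blocks (>= 1) each time. */
static unsigned long sk_count_calls(const struct sk_extent *ext, size_t n,
                                    unsigned long nblocks, unsigned long max_per_call)
{
    unsigned long calls = 0;

    for (unsigned long b = 0; b < nblocks; calls++) {
        unsigned long want = nblocks - b < max_per_call ? nblocks - b : max_per_call;
        b += sk_get_blocks(ext, n, b, want).count;
    }
    return calls;
}
```

With a 100-block extent, a 50-block hole and a 50-block extent, walking 200 blocks one at a time takes 200 lookups, whereas allowing up to 1024 blocks per call takes 3.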
-