1. 18 Jun, 2002 14 commits
    • [PATCH] direct-to-BIO I/O for swapcache pages · 88c4650a
      Andrew Morton authored
      This patch changes the swap I/O handling.  The objectives are:
      
      - Remove swap special-casing
      - Stop using buffer_heads -> direct-to-BIO
      - Make S_ISREG swapfiles more robust.
      
      I've spent quite some time with swap.  The first patches converted swap to
      use block_read/write_full_page().  These were discarded because they are
      still using buffer_heads, and a reasonable amount of otherwise unnecessary
      infrastructure had to be added to the swap code just to make it look like a
      regular fs.  So this code just has a custom direct-to-BIO path for swap,
      which seems to be the most comfortable approach.
      
      A significant thing here is the introduction of "swap extents".  A swap
      extent is a simple data structure which maps a range of swap pages onto a
      range of disk sectors.  It is simply:
      
      	struct swap_extent {
      		struct list_head list;
      		pgoff_t start_page;
      		pgoff_t nr_pages;
      		sector_t start_block;
      	};
      
      At swapon time (for an S_ISREG swapfile), each block in the file is bmapped()
      and the block numbers are parsed to generate the device's swap extent list.
      This extent list is quite compact - a 512 megabyte swapfile generates about
      130 nodes in the list.  That's about 4 kbytes of storage.  The conversion
      from filesystem blocksize blocks into PAGE_SIZE blocks is performed at swapon
      time.
      
      At swapon time (for an S_ISBLK swapfile), we install a single swap extent
      which describes the entire device.
      
      The advantages of the swap extents are:
      
      1: We never have to run bmap() (ie: read from disk) at swapout time.  So
         S_ISREG swapfiles are now just as robust as S_ISBLK swapfiles.
      
      2: All the differences between S_ISBLK swapfiles and S_ISREG swapfiles are
         handled at swapon time.  During normal operation, we just don't care.
         Both types of swapfiles are handled the same way.
      
      3: The extent lists always operate in PAGE_SIZE units.  So the problems of
         going from fs blocksize to PAGE_SIZE are handled at swapon time and normal
         operating code doesn't need to care.
      
      4: Because we don't have to fiddle with different blocksizes, we can go
         direct-to-BIO for swap_readpage() and swap_writepage().  This introduces
         the kernel-wide invariant "anonymous pages never have buffers attached",
         which cleans some things up nicely.  All those block_flushpage() calls in
         the swap code simply go away.
      
      5: The kernel no longer has to allocate both buffer_heads and BIOs to
         perform swapout.  Just a BIO.
      
      6: It permits us to perform swapcache writeout and throttling for
         GFP_NOFS allocations (a later patch).
      
      (Well, there is one sort of anon page which can have buffers: the pages which
      are cast adrift in truncate_complete_page() because do_invalidatepage()
      failed.  But these pages are never added to swapcache, and nobody except the
      VM LRU has to deal with them).
      
      The swapfile parser in setup_swap_extents() will attempt to extract the
      largest possible number of PAGE_SIZE-sized and PAGE_SIZE-aligned chunks of
      disk from the S_ISREG swapfile.  Any stray blocks (due to file
      discontiguities) are simply discarded - we never swap to those.
      
      If an S_ISREG swapfile is found to have any unmapped blocks (file holes) then
      the swapon attempt will fail.
      
      The extent list can be quite large (hundreds of nodes for a gigabyte S_ISREG
      swapfile).  It needs to be consulted once for each page within
      swap_readpage() and swap_writepage().  Hence there is a risk that we could
      blow significant amounts of CPU walking that list.  However I have
      implemented a "where we found the last block" cache, which is used as the
      starting point for the next search.  Empirical testing indicates that this is
      wildly effective - the average length of the list walk in map_swap_page() is
      0.3 iterations per page, with a 130-element list.
      
      It _could_ be that some workloads do start suffering long walks in that code,
      and perhaps a tree would be needed there.  But I doubt that, and if this is
      happening then it means that we're seeking all over the disk for swap I/O,
      and the list walk is the least of our problems.
      
      rw_swap_page_nolock() now takes a page*, not a kernel virtual address.  It
      has been renamed to rw_swap_page_sync() and it takes care of locking and
      unlocking the page itself.  Which is all a much better interface.
      
      Support for type 0 swap has been removed.  Current versions of mkswap(8) seem
      to never produce v0 swap unless you explicitly ask for it, so I doubt if this
      will affect anyone.  If you _do_ have a type 0 swapfile, swapon will fail and
      the message
      
      	version 0 swap is no longer supported. Use mkswap -v1 /dev/sdb3
      
      is printed.  We can remove that code for real later on.  Really, all that
      swapfile header parsing should be pushed out to userspace.
      
      This code always uses single-page BIOs for swapin and swapout.  I have an
      additional patch which converts swap to use mpage_writepages(), so we swap
      out in 16-page BIOs.  It works fine, but I don't intend to submit that.
      There just doesn't seem to be any significant advantage to it.
      
      I can't see anything in sys_swapon()/sys_swapoff() which needs the
      lock_kernel() calls, so I deleted them.
      
      If you ftruncate an S_ISREG swapfile to a shorter size while it is in use,
      subsequent swapout will destroy the filesystem.  It was always thus, but it
      is much, much easier to do now.  Not really a kernel problem, but swapon(8)
      should not be allowing the kernel to use swapfiles which are modifiable by
      unprivileged users.
    • [PATCH] leave swapcache pages unlocked during writeout · 3ab86fb0
      Andrew Morton authored
      Convert swap pages so that they are PageWriteback and !PageLocked while
      under writeout, like all other block-backed pages.  (Network
      filesystems aren't doing this yet - their pages are still locked while
      under writeout.)
    • [PATCH] mark_buffer_dirty_inode() speedup · 43967af3
      Andrew Morton authored
      buffer_insert_list() is showing up on Anton's graphs.  It'll be via
      ext2's mark_buffer_dirty_inode() against indirect blocks.  If the
      buffer is already on an inode queue, we know that it is on the correct
      inode's queue so we don't need to re-add it.
    • [PATCH] go back to 256 requests per queue · 374cac7a
      Andrew Morton authored
      The request queue was increased from 256 slots to 512 in 2.5.20.  The
      throughput of `dbench 128' on Randy's 384 megabyte machine fell 40%.
      
      We do need to understand why that happened, and what we can learn from
      it.  But in the meanwhile I'd suggest that we go back to 256 slots so
      that this known problem doesn't impact people's evaluation and tuning
      of 2.5 performance.
    • [PATCH] mark_buffer_dirty() speedup · 7a1a7f5b
      Andrew Morton authored
      mark_buffer_dirty() is showing up on Anton's graphs.  Avoiding the
      buslocked RMW if the buffer is already dirty should fix that up.
    • [PATCH] grab_cache_page_nowait deadlock fix · 85bfa7dc
      Andrew Morton authored
      - If grab_cache_page_nowait() is to be called while holding a lock on
        a different page, it must perform memory allocations with GFP_NOFS.
        Otherwise it could come back onto the locked page (if it's dirty) and
        deadlock.
      
        Also tidy this function up a bit - the checks in there were overly
        paranoid.
      
      - In a few of places, look to see if we can avoid a buslocked cycle
        and dirtying of a cacheline.
    • [PATCH] update_atime cleanup · 386b1f74
      Andrew Morton authored
      Remove unneeded do_update_atime(), and convert update_atime() to C.
    • [PATCH] ext3 corruption fix · afb51f81
      Andrew Morton authored
      Stephen and Neil Brown recently worked this out.  It's a
      rare situation which only affects data=journal mode.
      
      Fix problem in data=journal mode where writeback could be left pending on a
      journaled, deleted disk block.  If that block then gets reallocated, we can
      end up with an alias in which the old data can be written back to disk over
      the new.  Thanks to Neil Brown for spotting this and coming up with the
      initial fix.
    • [PATCH] writeback tunables · e3e529bf
      Andrew Morton authored
      Adds five sysctls for tuning the writeback behaviour:
      
      	dirty_async_ratio
      	dirty_background_ratio
      	dirty_sync_ratio
      	dirty_expire_centisecs
      	dirty_writeback_centisecs
      
      these are described in Documentation/filesystems/proc.txt.  They are
      basically the traditional knobs which we've always had...
      
      We are accreting a ton of obsolete sysctl numbers under /proc/sys/vm/.
      I didn't recycle these - just mark them unused and remove the obsolete
      documentation.
    • [PATCH] Net updates / CPU hotplug infrastructure missed merge · 88bccfb7
      Rusty Russell authored
      Ironically enough, both were written by me.
      
      Fixed thus.
    • Merge · 1dbe77d3
      Linus Torvalds authored
    • [PATCH] change_page_attr and AGP update · c8712aeb
      Andi Kleen authored
      Add change_page_attr to change page attributes for the kernel linear map.
      
      Fix AGP driver to use change_page_attr for the AGP buffer.
      
      Clean up AGP driver a bit (only tested on i386/VIA+AMD)
      
      Change ioremap_nocache to use change_page_attr to avoid mappings with
      conflicting caching attributes.
    • [PATCH] export pci_bus_type to modules. · 68d6275b
      Stelian Pop authored
      This exports the pci_bus_type symbol to modules, needed by (at least)
      the recent changes in pcmcia/cardbus.c.
    • Linus Torvalds · baf74405
  2. 17 Jun, 2002 17 commits
  3. 16 Jun, 2002 5 commits
    • Linux kernel 2.5.22 · d9083ea2
      Linus Torvalds authored
    • [PATCH] scheduler whitespace/comment merge from -ac · 48fc1713
      Robert Love authored
      Attached patch brings over the sane bits from 2.4-ac: i.e. if Linus
      merges this and Alan merges your patch minus my complaints, the two
      trees will be in sync...
    • Linus Torvalds
    • [PATCH] 2.5.21 ide 92 · 78929a18
      Martin Dalecki authored
       - Finally unify task_in_intr and task_mulin_intr.  One less crucial code
         path to watch out for, but quite a dangerous step in itself.  PIO reading
         is functional again.  The next step will be the unification of the write
         path, of course.
      
       - Introduce a small helper for the execution of task file commands which
         basically just send a simple command down to the drive.
      
       - Add a buffer parameter to ide_raw_taskfile, allowing us to unify the
         handling of ioctl and normal ide_raw_taskfile requests.
      
       - Fix some small function pointer type mismatches.
      
      Apply more host chip controller cleanups by Bartlomiej:
      
           - move setting drive->current_speed from *_tune_chipset()
             to ide_config_drive_speed()
      
          cmd64x.c:
      	- convert cmd64x_tuneproc() to use ata-timing library
      	- clean cmd64x_tune_chipset() and cmd680_tune_chipset()
      
          hpt366.c:
      	- remove empty timings table
      
          it8172.c:
      	- kill prototypes
      	- update to new udma_setup() scheme
      
          - misc cleanups
    • Fix smbfs debug macros · 47496445
      Linus Torvalds authored
  4. 15 Jun, 2002 4 commits
    • cardbus.c: · 9f64c00f
      Linus Torvalds authored
        Set up CardBus cards correctly: initialize them fully
        before calling device_register(), and make sure to tell
        the world that it's a PCI-like bus.
    • [PATCH] suspend-to-{ram/disk} cleanups/fixes for 2.5.21 · ef8e826c
      Pavel Machek authored
      This kills Sysrq-D support (it did not work anyway, and complicated
      the code).
      
      Adds resume support to i8259A (otherwise interrupts will not work
      after S3).
      
      HAVE_NEW_DEVICE_MODEL is always true in 2.5, so we should define
      it.  S3 can't work properly without that.  Also limit the toshiba
      workaround to S1.  (This hides the lack of i8259A support for me.)
      
      Fixes compilation, and kills the ugly hacks around <asm/suspend.h>
      being included twice.
    • William Lee Irwin III · 3f52c652
    • [PATCH] 2.5.21 - hdlc drivers fixes · d5ba0bf6
      François Romieu authored
      - (leak) memory allocated in dscc4_found1() isn't freed by the caller in
        the error path.  dscc4_free1() is now in charge of this duty.
      - (style) code factored into dscc4_remove_one after use of dscc4_free1().