  1. 31 Aug, 2003 1 commit
    • [PATCH] vmscan: zone pressure calculation fix · b25bb608
      Andrew Morton authored
      Off-by-one in balance_pgdat(): `priority' can never go negative.  It causes
      the scanning priority thresholds to be quite wrong and kswapd tends to go
      berserk when there is a lot of mapped memory around.
  2. 21 Aug, 2003 1 commit
  3. 20 Aug, 2003 1 commit
    • [PATCH] vmscan: give dirty referenced pages another pass · d55158b5
      Andrew Morton authored
      In a further attempt to prevent dirty pages from being written out from the
      LRU, don't write them if they were referenced.  This gives those pages
      another trip around the inactive list.  So more of them are written via
      balance_dirty_pages().
      
      It speeds up an untar-of-five-kernel trees by 5% on a 256M box, presumably
      because balance_dirty_pages() has better IO patterns.
      
      It largely fixes the problem which Gerrit talked about at the kernel summit:
      the individual writepage()s of dirty pages coming off the tail of the LRU are
      reduced by 83% in their database workload.
      
      I'm a bit worried that it increases scanning and OOM possibilities under
      nutty VM stress cases, but nothing untoward has been noted during its four
      weeks in -mm, so...
  4. 19 Aug, 2003 2 commits
    • [PATCH] async write errors: use flags in address space · fcad2b42
      Andrew Morton authored
      From: Oliver Xymoron <oxymoron@waste.org>
      
      This patch just saves a few bytes in the inode by turning mapping->gfp_mask
      into an unsigned long mapping->flags.
      
      The mapping's gfp mask is placed in the 16 high bits of mapping->flags and
      two of the remaining 16 bits are used for tracking EIO and ENOSPC errors.
      
      This leaves 14 bits in the mapping for future use.  They should be accessed
      with the atomic bitops.
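      The packing described above can be sketched in plain C. The names below (AS_EIO, AS_ENOSPC, the shift) are illustrative stand-ins rather than the patch's exact identifiers, and the kernel would manipulate these bits with atomic bitops:

```c
#include <assert.h>

/* Sketch: gfp mask in the 16 high bits of mapping->flags, error state
 * in two of the low bits.  The kernel would use atomic bitops (set_bit,
 * test_and_clear_bit); plain operations keep the sketch short. */
#define AS_EIO        0                 /* async EIO seen on this mapping */
#define AS_ENOSPC     1                 /* async ENOSPC seen */
#define AS_GFP_SHIFT 16                 /* gfp mask occupies bits 16..31 */

static unsigned long pack_mapping_flags(unsigned int gfp_mask)
{
    return (unsigned long)gfp_mask << AS_GFP_SHIFT;
}

static unsigned int mapping_gfp_mask(unsigned long flags)
{
    return (unsigned int)(flags >> AS_GFP_SHIFT) & 0xffffu;
}

static void mapping_set_error_bit(unsigned long *flags, int bit)
{
    *flags |= 1UL << bit;               /* atomic set_bit() in the kernel */
}
```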
    • [PATCH] async write errors: report truncate and io errors on · fe7e689f
      Andrew Morton authored
      From: Oliver Xymoron <oxymoron@waste.org>
      
      These patches add the infrastructure for reporting asynchronous write errors
      to block devices to userspace.  Errors which are detected during pdflush or VM
      writeout are reported at the next fsync, fdatasync, or msync on the given
      file, and on close if the error occurs in time.
      
      We do this by propagating any errors into page->mapping->error when they are
      detected.  In fsync(), msync(), fdatasync() and close() we return that error
      and zero it out.
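      The report-and-clear behaviour can be sketched as follows (the struct and helper names are stand-ins for illustration, not the patch's actual identifiers):

```c
#include <assert.h>
#include <errno.h>

/* Sketch: pdflush/VM writeout records an async write error on the
 * mapping; the next fsync/fdatasync/msync/close returns it and zeroes
 * it, so the error is reported exactly once. */
struct address_space_sketch {
    int error;                          /* 0, -EIO or -ENOSPC */
};

static void writeout_failed(struct address_space_sketch *mapping, int err)
{
    mapping->error = err;               /* detected at IO completion */
}

static int fsync_check_error(struct address_space_sketch *mapping)
{
    int err = mapping->error;
    mapping->error = 0;                 /* report once, then clear */
    return err;
}
```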
      
      
      The Open Group say close() _may_ fail if an I/O error occurred while reading
      from or writing to the file system.  Well, in this implementation close() can
      return -EIO or -ENOSPC.  And in that case it will succeed, not fail - perhaps
      that is what they meant.
      
      
      There are three patches in this series and testing has only been performed
      with all three applied.
  5. 18 Aug, 2003 1 commit
    • [PATCH] cpumask_t: allow more than BITS_PER_LONG CPUs · bf8cb61f
      Andrew Morton authored
      From: William Lee Irwin III <wli@holomorphy.com>
      
      Contributions from:
      	Jan Dittmer <jdittmer@sfhq.hn.org>
      	Arnd Bergmann <arnd@arndb.de>
      	"Bryan O'Sullivan" <bos@serpentine.com>
      	"David S. Miller" <davem@redhat.com>
      	Badari Pulavarty <pbadari@us.ibm.com>
      	"Martin J. Bligh" <mbligh@aracnet.com>
      	Zwane Mwaikambo <zwane@linuxpower.ca>
      
      It has been tested on x86, sparc64, x86_64, ia64 (I think), ppc and ppc64.
      
      cpumask_t enables systems with NR_CPUS > BITS_PER_LONG to utilize all their
      cpus by creating an abstract data type dedicated to representing cpu
      bitmasks, similar to fd sets from userspace, and sweeping the appropriate
      code to update callers to the access API.  The fd set-like structure is
      according to Linus' own suggestion; the macro calling convention to ambiguate
      representations with minimal code impact is my own invention.
      
      Specifically, a new set of inline functions for manipulating arbitrary-width
      bitmaps is introduced with a relatively simple implementation, in tandem with
      a new data type representing bitmaps of width NR_CPUS, cpumask_t, whose
      accessor functions are defined in terms of the bitmap manipulation inlines.
      This bitmap ADT found an additional use in i386 arch code handling sparse
      physical APIC ID's, which was convenient to use in this case as the
      accounting structure was required to be wider to accommodate the physids
      consumed by larger numbers of cpus.
      
      For the sake of simplicity and low code impact, these cpu bitmasks are passed
      primarily by value; however, an additional set of accessors along with an
      auxiliary data type with const call-by-reference semantics is provided to
      address performance concerns raised in connection with very large systems,
      such as SGI's larger models, where copying and call-by-value overhead would
      be prohibitive.  Few (if any) users of the call-by-reference API are
      immediately introduced.
      
      Also, in order to avoid calling convention overhead on architectures where
      structures are required to be passed by value, NR_CPUS <= BITS_PER_LONG is
      special-cased so that cpumask_t falls back to an unsigned long and the
      accessors perform the usual bit twiddling on unsigned longs as opposed to
      arrays thereof.  Audits were done with the structure overhead in-place,
      restoring this special-casing only afterward so as to ensure a more complete
      API conversion while undergoing the majority of its end-user exposure in -mm.
      More -mm's were shipped after its restoration to be sure that was tested,
      too.
      
      The immediate users of this functionality are Sun sparc64 systems, SGI mips64
      and ia64 systems, and IBM ia32, ppc64, and s390 systems.  Of these, only the
      ppc64 machines needing the functionality have yet to be released; all others
      have had systems requiring it for full functionality for at least 6 months,
      and in some cases, since the initial Linux port to the affected architecture.
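      The width special case can be sketched like this. It is a simplified shape, not the kernel's exact macros, and NR_CPUS and BITS_PER_LONG are fixed here purely for illustration:

```c
#include <assert.h>

#define NR_CPUS       4     /* assumption for this sketch */
#define BITS_PER_LONG 64    /* assumption: 64-bit architecture */

#if NR_CPUS <= BITS_PER_LONG
/* small systems: the mask degenerates to a bare unsigned long and the
 * accessors are ordinary bit twiddling, with no structure overhead */
typedef unsigned long cpumask_t;
#define cpu_set(cpu, mask)   ((mask) |= 1UL << (cpu))
#define cpu_isset(cpu, mask) (((mask) >> (cpu)) & 1UL)
#else
/* large systems: an fd_set-like array of longs wide enough for NR_CPUS */
typedef struct {
    unsigned long bits[(NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG];
} cpumask_t;
#define cpu_set(cpu, mask) \
    ((mask).bits[(cpu) / BITS_PER_LONG] |= 1UL << ((cpu) % BITS_PER_LONG))
#define cpu_isset(cpu, mask) \
    (((mask).bits[(cpu) / BITS_PER_LONG] >> ((cpu) % BITS_PER_LONG)) & 1UL)
#endif
```

Either representation answers the same accessor calls, which is what lets the special case be restored without disturbing callers.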
  6. 01 Aug, 2003 4 commits
    • [PATCH] vmscan: use zone_pressure for page unmapping · 14d927a3
      Andrew Morton authored
      From: Nikita Danilov <Nikita@Namesys.COM>
      
      Use zone->pressure (rather than scanning priority) to determine when to
      start reclaiming mapped pages in refill_inactive_zone().  When using
      priority every call to try_to_free_pages() starts with scanning parts of
      active list and skipping mapped pages (because reclaim_mapped evaluates to
      0 on low priorities) no matter how high memory pressure is.
    • [PATCH] vmscan: decaying average of zone pressure · ecbeb4b2
      Andrew Morton authored
      From: Nikita Danilov <Nikita@Namesys.COM>
      
      The vmscan logic at present will scan the inactive list with increasing
      priority until a threshold is triggered.  At that threshold we start
      unmapping pages from pagetables.
      
      The problem is that each time someone calls into this code, the priority is
      initially low, so some mapped pages will be refiled even though we really
      should be unmapping them now.
      
      Nikita's patch adds the `pressure' field to struct zone.  It is a decaying
      average of the zone's memory pressure and allows us to start unmapping pages
      immediately on entry to page reclaim, based on measurements which were made
      in earlier reclaim attempts.
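      The commit doesn't spell out the formula, but a decaying average of this kind is typically an exponential moving average; a hedged sketch, with a weighting chosen for illustration:

```c
#include <assert.h>

/* Sketch: each reclaim pass folds its observed pressure into the zone's
 * running value, so the next entry to page reclaim can start unmapping
 * immediately when earlier passes found the zone under pressure.  The
 * 7/8 old, 1/8 new weighting is illustrative, not taken from the patch. */
static unsigned int update_zone_pressure(unsigned int old_pressure,
                                         unsigned int sample)
{
    return (old_pressure * 7 + sample) / 8;
}
```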
    • [PATCH] fix kswapd throttling · 00401a44
      Andrew Morton authored
      kswapd currently takes a throttling nap even if it freed all the pages it
      was asked to free.
      
      Change it so we only throttle if reclaim is not being sufficiently
      successful.
    • [PATCH] kswapd can free too much memory · f76a4338
      Andrew Morton authored
      We need to subtract the number of freed slab pages from the number of pages
      to free, not add it.
  7. 12 Jul, 2003 1 commit
    • [PATCH] asm-generic/div64.h breakage · ed08e6df
      Bernardo Innocenti authored
       - __div64_32(): remove __attribute_pure__ qualifier from the prototype
         since this function obviously clobbers memory through &(n);
      
       - do_div(): add a check to ensure (n) is type-compatible with uint64_t;
      
       - as_update_iohist(): Use sector_div() instead of do_div().
         (Whether the result of the addition should always be stored in 64bits
         regardless of CONFIG_LBD is still being discussed, therefore it's
         unaddressed here);
      
       - Fix all places where do_div() was being called with a bad divisor argument.
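      A userspace sketch of the do_div() contract the patch tightens. The compile-time width check shown here is one way to express it; the kernel's actual check is typeof-based, and this sketch relies on GCC statement expressions just as asm-generic/div64.h does:

```c
#include <assert.h>
#include <stdint.h>

/* do_div(n, base): (n) must be a 64-bit lvalue; it is divided in place
 * and the macro evaluates to the 32-bit remainder.  The sizeof check
 * rejects a wrongly-typed (n) at compile time, in the spirit of the fix. */
#define do_div(n, base) ({                                    \
        uint32_t __base = (base);                             \
        uint32_t __rem;                                       \
        (void)sizeof(char[sizeof(n) == 8 ? 1 : -1]);          \
        __rem = (uint32_t)((n) % __base);                     \
        (n) /= __base;                                        \
        __rem;                                                \
    })
```

Passing a 32-bit variable as (n) now fails to compile instead of silently truncating, which is the class of bad-caller bug the last bullet refers to.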
  8. 14 Jun, 2003 1 commit
    • [PATCH] NUMA fixes · 1d292c60
      Andrew Morton authored
      From: Anton Blanchard <anton@samba.org>
      
      
      Anton has been testing odd setups:
      
      /* node 0 - no cpus, no memory */
      /* node 1 - 1 cpu, no memory */
      /* node 2 - 0 cpus, 1GB memory */
      /* node 3 - 3 cpus, 3GB memory */
      
      Two things tripped so far.  Firstly the ppc64 debug check for invalid cpus
      in cpu_to_node().  Fix that in kernel/sched.c:node_nr_running_init().
      
      The other problem concerned nodes with memory but no cpus.  kswapd tries to
      set_cpus_allowed(0) and bad things happen.  So we only set cpu affinity
      for kswapd if there are cpus in the node.
  9. 06 Jun, 2003 1 commit
    • [PATCH] Don't let processes be scheduled on CPU-less nodes (1/3) · 2eb57dd2
      Andrew Morton authored
      From: Matthew Dobson <colpatch@us.ibm.com>
      
      sched_best_cpu schedules processes on nodes based on node_nr_running.  For
      CPU-less nodes, this is always 0, and thus sched_best_cpu tends to migrate
      tasks to these nodes, which eventually get remigrated elsewhere.
      
      This patch adds include/linux/topology.h, and modifies all includes of
      asm/topology.h to linux/topology.h.  A subsequent patch in this series adds
      helper functions to linux/topology.h to ensure processes are only migrated
      to nodes with CPUs.
      
      Test compiled and booted by Andrew Theurer (habanero@us.ibm.com) on both
      x440 and ppc64.
  10. 22 May, 2003 1 commit
  11. 07 May, 2003 1 commit
    • [PATCH] account for slab reclaim in try_to_free_pages() · f31fd780
      Andrew Morton authored
      try_to_free_pages() currently fails to notice that it successfully freed slab
      pages via shrink_slab().  So it can keep looping and eventually call
      out_of_memory(), even though there's a lot of memory now free.
      
      And even if it doesn't do that, it can free too much memory.
      
      The patch changes try_to_free_pages() so that it will notice freed slab pages
      and will return when enough memory has been freed via shrink_slab().
      
      Many options were considered, but most of them were unacceptably inaccurate,
      intrusive or sleazy.  I ended up putting the accounting into a stack-local
      structure which is pointed to by current->reclaim_state.
      
      One reason for this is that we can cleanly resurrect the current->local_pages
      pool by putting it into struct reclaim_state.
      
      (current->local_pages was removed because the per-cpu page pools in the page
      allocator largely duplicate its function.  But it is still possible for
      interrupt-time allocations to steal just-freed pages, so we might want to put
      it back some time.)
  12. 30 Apr, 2003 1 commit
    • [PATCH] zone accounting race fix · 98605ba9
      Andrew Morton authored
      Fix a bug identified by Nikita Danilov: refill_inactive_zone() is deferring
      the update of zone->nr_inactive and zone->nr_active for too long - it needs
      to be consistent whenever zone->lock is not held.
  13. 20 Apr, 2003 3 commits
    • [PATCH] don't shrink slab for highmem allocations · 5a08774a
      Andrew Morton authored
      From: William Lee Irwin III <wli@holomorphy.com>
      
      If one's goal is to free highmem pages, shrink_slab() is an ineffective
      method of recovering them, as slab pages are all ZONE_NORMAL or ZONE_DMA.
      Hence, this "FIXME: do not do for zone highmem".  Presumably this is a
      question of policy, as highmem allocations may be satisfied by reaping slab
      pages and handing them back; but the FIXME says what we should do.
    • [PATCH] implement __GFP_REPEAT, __GFP_NOFAIL, __GFP_NORETRY · 75908778
      Andrew Morton authored
      This is a cleanup patch.
      
      There are quite a lot of places in the kernel which will infinitely retry a
      memory allocation.
      
      Generally, they get it wrong.  Some do yield(), the semantics of which have
      changed over time.  Some do schedule(), which can lock up if the caller is
      SCHED_FIFO/RR.  Some do schedule_timeout(), etc.
      
      And often it is unnecessary, because the page allocator will do the retry
      internally anyway.  But we cannot rely on that - this behaviour may change
      (-aa and -rmap kernels do not do this, for instance).
      
      So it is good to formalise and to centralise this operation.  If an
      allocation specifies __GFP_REPEAT then the page allocator must infinitely
      retry the allocation.
      
      The semantics of __GFP_REPEAT are "try harder".  The allocation _may_ fail
      (the 2.4 -aa and -rmap VM's do not retry infinitely by default).
      
      The semantics of __GFP_NOFAIL are "cannot fail".  It is a no-op in this VM,
      but needs to be honoured (or fix up the callers) if the VM is changed to not
      retry infinitely by default.
      
      The semantics of __GFP_NORETRY are "try once, don't loop".  This isn't used
      at present (although perhaps it should be, in swapoff).  It is mainly for
      completeness.
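      The three retry policies can be sketched as a hedged userspace model. The flag values, the allocator stub, and the retry bounds are all illustrative, not the kernel's:

```c
#include <assert.h>
#include <stddef.h>

#define __GFP_REPEAT  0x1u   /* try harder: retry many times, may fail */
#define __GFP_NOFAIL  0x2u   /* cannot fail: loop until success */
#define __GFP_NORETRY 0x4u   /* try once, don't loop */

/* stubbed single attempt so the policy loop can be exercised */
static int attempts_until_success;
static void *try_alloc_once(void)
{
    return (--attempts_until_success <= 0) ? (void *)0x1 : NULL;
}

static void *alloc_pages_sketch(unsigned int gfp_mask)
{
    int tries = 0;
    for (;;) {
        void *page = try_alloc_once();
        if (page)
            return page;
        if (gfp_mask & __GFP_NORETRY)
            return NULL;                    /* one shot only */
        if (gfp_mask & __GFP_NOFAIL)
            continue;                       /* must not fail: keep going */
        ++tries;
        if (!(gfp_mask & __GFP_REPEAT) && tries >= 3)
            return NULL;                    /* default: bounded retries */
        if ((gfp_mask & __GFP_REPEAT) && tries >= 100)
            return NULL;                    /* "try harder" may still fail */
    }
}
```

Centralising the loop here is the point of the cleanup: callers stop open-coding yield()/schedule() retry loops and just state their policy in the gfp mask.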
    • [PATCH] Clean up various buffer-head dependencies · cda55f33
      Andrew Morton authored
      From: William Lee Irwin III <wli@holomorphy.com>
      
      Remove page_has_buffers() from various functions, document the dependencies
      on buffer_head.h from other files besides filemap.c, and s/this file/core VM/
      in filemap.c
  14. 09 Apr, 2003 1 commit
    • [PATCH] Replace the radix-tree rwlock with a spinlock · 8e98702b
      Andrew Morton authored
      Spinlocks don't have a buslocked unlock and are faster.
      
      On a P4, time to write a 4M file with 4M one-byte-write()s:
      
      Before:
      	0.72s user 5.47s system 99% cpu 6.227 total
      	0.76s user 5.40s system 100% cpu 6.154 total
      	0.77s user 5.38s system 100% cpu 6.146 total
      
      After:
      	1.09s user 4.92s system 99% cpu 6.014 total
      	0.74s user 5.28s system 99% cpu 6.023 total
      	1.03s user 4.97s system 100% cpu 5.991 total
  15. 28 Mar, 2003 2 commits
    • [PATCH] permit page unmapping if !CONFIG_SWAP · 09efe93d
      Andrew Morton authored
      From: Hugh Dickins <hugh@veritas.com>
      
      Raised #endif CONFIG_SWAP in shrink_list, it was excluding
      try_to_unmap of file pages.  Suspect !CONFIG_MMU relied on
      that to suppress try_to_unmap, added SWAP_FAIL stub for it.
    • [PATCH] remove SWAP_ERROR · 255373b8
      Andrew Morton authored
      From: Hugh Dickins <hugh@veritas.com>
      
      Delete unused SWAP_ERROR and non-existent page_over_rsslimit().
  16. 15 Feb, 2003 1 commit
    • [PATCH] blk_congestion_wait tuning and lockup fix · ecc3f712
      Andrew Morton authored
      blk_congestion_wait() will currently not wait if there are no write requests
      in flight.  Which is a potential problem if all the dirty data is against NFS
      filesystems.
      
      For write(2) traffic against NFS, things work nicely, because writers
      throttle in nfs_wait_on_requests().  But for MAP_SHARED dirtyings we need to
      avoid spinning in balance_dirty_pages().  So allow callers to fall through to
      the explicit sleep in that case.
      
      This will also fix a weird lockup which the reiser4 developers report.  In
      that case they have managed to have _all_ inodes against a superblock in
      locked state, yet there are no write requests in flight.  Taking a nap in
      blk_congestion_wait() in this case will yield the CPU to the threads which
      are trying to write out pages.
      
      Also tune up the sleep durations in various callers - 250 milliseconds seems
      rather long.
  17. 12 Feb, 2003 1 commit
  18. 11 Feb, 2003 1 commit
    • Sanitize kernel daemon signal handling and process naming. · 43fea1be
      Linus Torvalds authored
      Add a name argument to daemonize() (va_arg) to avoid all the
      kernel threads having to duplicate the name setting over and
      over again.
      
      Make daemonize() disable all signals by default, and add an
      "allow_signal()" function to let daemons say they explicitly
      want to support a signal.
      
      Make flush_signal() take the signal lock, so that callers do
      not need to.
  19. 06 Feb, 2003 1 commit
  20. 04 Feb, 2003 2 commits
    • [PATCH] Remove __ from topology macros · 8c4ea5db
      Andrew Morton authored
      Patch from Matthew Dobson <colpatch@us.ibm.com>
      
      When I originally wrote the patches implementing the in-kernel topology
      macros, they were meant to be called as a second layer of functions,
      sans underbars.  This additional layer was deemed unnecessary and
      summarily dropped.  As such, carrying around (and typing!) all these
      extra underbars is quite pointless.  Here's a patch to nip this in the
      (sorta) bud.  The macros only appear in 16 files so far, most of them
      being the definitions themselves.
    • [PATCH] remove __GFP_HIGHIO · 3ac8c845
      Andrew Morton authored
      Patch From: Hugh Dickins <hugh@veritas.com>
      
      Recently noticed that __GFP_HIGHIO has played no real part since bounce
      buffering was converted to mempool in 2.5.12: so this patch (over 2.5.58-mm1)
      removes it and GFP_NOHIGHIO and SLAB_NOHIGHIO.
      
      Also removes GFP_KSWAPD, in 2.5 same as GFP_KERNEL; leaves GFP_USER, which
      can be a useful comment, even though in 2.5 same as GFP_KERNEL.
      
      One anomaly needs comment: strictly, if there's no __GFP_HIGHIO, then
      GFP_NOHIGHIO translates to GFP_NOFS; but GFP_NOFS looks wrong in the block
      layer, and if you follow them down, you find that GFP_NOFS and GFP_NOIO
      behave the same way in mempool_alloc - so I've used the less surprising
      GFP_NOIO to replace GFP_NOHIGHIO.
  21. 21 Dec, 2002 2 commits
    • [PATCH] Give kswapd writeback higher priority than pdflush · e386771c
      Andrew Morton authored
      The `low latency page reclaim' design works by preventing page
      allocators from blocking on request queues (and by preventing them from
      blocking against writeback of individual pages, but that is immaterial
      here).
      
      This has a problem under some situations.  pdflush (or a write(2)
      caller) could be saturating the queue with highmem pages.  This
      prevents anyone from writing back ZONE_NORMAL pages.  We end up doing
      enormous amounts of scanning.
      
      A test case is to mmap(MAP_SHARED) almost all of a 4G machine's memory,
      then kill the mmapping applications.  The machine instantly goes from
      0% of memory dirty to 95% or more.  pdflush kicks in and starts writing
      the least-recently-dirtied pages, which are all highmem.  The queue is
      congested so nobody will write back ZONE_NORMAL pages.  kswapd chews
      50% of the CPU scanning past dirty ZONE_NORMAL pages and page reclaim
      efficiency (pages_reclaimed/pages_scanned) falls to 2%.
      
      So this patch changes the policy for kswapd.  kswapd may use all of a
      request queue, and is prepared to block on request queues.
      
      What will now happen in the above scenario is:
      
      1: The page allocator scans some pages, fails to reclaim enough
         memory and takes a nap in blk_congestion_wait().
      
      2: kswapd() will scan the ZONE_NORMAL LRU and will start writing
         back pages.  (These pages will be rotated to the tail of the
         inactive list at IO-completion interrupt time).
      
         This writeback will saturate the queue with ZONE_NORMAL pages.
         Conveniently, pdflush will avoid the congested queues.  So we end up
         writing the correct pages.
      
      In this test, kswapd CPU utilisation falls from 50% to 2%, page reclaim
      efficiency rises from 2% to 40% and things are generally a lot happier.
      
      
      The downside is that kswapd may now do a lot less page reclaim,
      increasing page allocation latency, causing more direct reclaim,
      increasing lock contention in the VM, etc.  But I have not been able to
      demonstrate that in testing.
      
      
      The other problem is that there is only one kswapd, and there are lots
      of disks.  That is a generic problem - without being able to co-opt
      user processes we don't have enough threads to keep lots of disks saturated.
      
      One fix for this would be to add an additional "really congested"
      threshold in the request queues, so kswapd can still perform
      nonblocking writeout.  This gives kswapd priority over pdflush while
      allowing kswapd to feed many disk queues.  I doubt if this will be
      called for.
    • [PATCH] fix a page dirtying race in vmscan.c · 985babe8
      Andrew Morton authored
      There's a small window in which another CPU could dirty the page after
      we've cleaned it, and before we've moved it to mapping->dirty_pages.
      The end result is a dirty page on mapping->locked_pages, which is
      wrong.
      
      So take mapping->page_lock before clearing the dirty bit.
  22. 14 Dec, 2002 5 commits
    • [PATCH] remove a vm debug check · d8259d09
      Andrew Morton authored
      This ad-hoc assertion is no longer true.  If all zones are in the `all
      unreclaimable' state it can trigger when testing with a tiny amount
      of physical memory.
    • [PATCH] remove PF_SYNC · 577c516f
      Andrew Morton authored
      current->flags:PF_SYNC was a hack I added because I didn't want to
      change all ->writepage implementations.
      
      It's foul.  And it means that if someone happens to run direct page
      reclaim within the context of (say) sys_sync, the writepage invocations
      from the VM will be treated as "data integrity" operations, not "memory
      cleansing" operations, which would cause latency.
      
      So the patch removes PF_SYNC and adds an extra arg to a_ops->writepage.
      It is the `writeback_control' structure which contains the full context
      information about why writepage was called.
      
      The initial version of this patch just passed in a bare `int sync', but
      the XFS team need more info so they can perform writearound from within
      page reclaim.
      
      The patch also adds writeback_control.for_reclaim, so writepage
      implementations can inspect that to work out the call context rather
      than peeking at current->flags:PF_MEMALLOC.
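      The call-context plumbing might look roughly like this (field names beyond for_reclaim, and the return values, are illustrative):

```c
#include <assert.h>

/* Sketch: writepage implementations receive a writeback_control and can
 * ask "why was I called?" instead of peeking at current->flags. */
struct writeback_control {
    int sync_mode;      /* data-integrity sync vs. background cleansing */
    int for_reclaim;    /* set when invoked from page reclaim */
};

static int example_writepage(struct writeback_control *wbc)
{
    if (wbc->for_reclaim)
        return 1;       /* reclaim context: e.g. perform writearound */
    if (wbc->sync_mode)
        return 2;       /* data-integrity writeback: must not skip */
    return 0;           /* background memory-cleansing writeback */
}
```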
    • [PATCH] vm accounting fixes and addition · c720c50a
      Andrew Morton authored
      - /proc/vmstat:pageoutrun and /proc/vmstat:allocstall are always
        identical.  Rework this so that
      
        - "allocstall" is the number of times a page allocator ran direct reclaim
      
        - "pageoutrun" is the number of times kswapd ran page reclaim
      
      - Add a new stat: "pgrotated".  The number of pages which were
        rotated to the tail of the LRU for immediate reclaim by
        rotate_reclaimable_page().
      
      - Document things a bit.
    • [PATCH] Remove fail_writepage, redux · 3e9afe4c
      Andrew Morton authored
      fail_writepage() does not work.  Its activate_page() call cannot
      activate the page because it is not on the LRU.
      
      So perform that function (more efficiently) in the VM.  Remove
      fail_writepage() and, if the filesystem does not implement
      ->writepage() then activate the page from shrink_list().
      
      A special case is tmpfs, which does have a writepage, but which
      sometimes wants to activate the pages anyway.  The most important case
      is when there is no swap online and we don't want to keep all those
      pages on the inactive list.  So just as a tmpfs special-case, allow
      writepage() to return WRITEPAGE_ACTIVATE, and handle that in the VM.
      
      Also, the whole idea of allowing ->writepage() to return -EAGAIN, and
      handling that in the caller has been reverted.  If a writepage()
      implementation wants to back out and not write the page, it must
      redirty the page, unlock it and return zero.  (This is Hugh's preferred
      way).
      
      And remove the now-unneeded shmem_writepages() - shmem inodes are
      marked as `memory backed' so it will not be called.
      
      And remove the test for non-null ->writepage() in generic_file_mmap().
      Memory-backed files _are_ mmappable, and they do not have a
      writepage().  It just isn't called.
      
      So the locking rules for writepage() are unchanged.  They are:
      
      - Called with the page locked
      - Returns with the page unlocked
      - Must redirty the page itself if it wasn't all written.
      
      But there is a new, special, hidden, undocumented, secret hack for
      tmpfs: writepage may return WRITEPAGE_ACTIVATE to tell the VM to move
      the page to the active list.  The page must be kept locked in this one
      case.
    • [PATCH] Fix rmap locking for CONFIG_SWAP=n · c7d7f43a
      Andrew Morton authored
      The pte_chain_unlock() needs to be outside the ifdef.
  23. 03 Dec, 2002 4 commits
    • [PATCH] Move unreleasable pages onto the active list · 1c0f3462
      Andrew Morton authored
      With some workloads a large number of pages coming off the LRU are
      pinned blockdev pagecache - things like ext2 group descriptors, pages
      which have buffers in the per-cpu buffer LRUs, etc.
      
      They keep churning around the inactive list, reducing the overall page
      reclaim effectiveness.
      
      So move these pages onto the active list.
    • [PATCH] Special-case fail_writepage() in page reclaim · 32b51ef2
      Andrew Morton authored
      Pages from memory-backed filesystems are supposed to be moved up onto
      the active list, but that's not working because fail_writepage() is
      called when the page is not on the LRU.
      
      So look for this case in page reclaim and handle it there.
      
      And it's more efficient: the VM knows more about what is going on, and
      it later leads to the removal of fail_writepage().
    • [PATCH] Move reclaimable pages to the tail of the inactive list on · 3b0db538
      Andrew Morton authored
      The patch addresses some search complexity failures which occur when
      there is a large amount of dirty data on the inactive list.
      
      Normally we attempt to write out those pages and then move them to the
      head of the inactive list.  But this goes against page aging, and means
      that the page has to traverse the entire list again before it can be
      reclaimed.
      
      But the VM really wants to reclaim that page - it has reached the tail
      of the LRU.
      
      So what we do in this patch is to mark the page as needing reclamation,
      and then start I/O.  In the IO completion handler we check to see if
      the page is still probably reclaimable and if so, move it to the tail of
      the inactive list, where it can be reclaimed immediately.
      
      Under really heavy swap-intensive loads this increases the page reclaim
      efficiency (pages reclaimed/pages scanned) from 10% to 25%.  Which is
      OK for that sort of load.  Not great, but OK.
      
      This code path takes the LRU lock once per page.  I didn't bother
      playing games with batching up the locking work - it's a rare code
      path, and the machine has plenty of CPU to spare when this is
      happening.
    • [PATCH] Remove the final per-page throttling site in the VM · 3139a3ec
      Andrew Morton authored
      This removes the last remnant of the 2.4 way of throttling page
      allocators: the wait_on_page_writeback() against mapped-or-swapcache
      pages.
      
      I did this because:
      
      a) It's not used much.
      b) It's already causing big latencies
      c) With Jens' large-queue stuff, it can cause huuuuuuuuge latencies.
         Like: ninety seconds.
      
      So kill it, and rely on blk_congestion_wait() to slow the allocator
      down to match the rate at which the IO system can retire writes.
  24. 26 Nov, 2002 1 commit
    • [PATCH] reduced latency in dentry and inode cache shrinking · 23e77b64
      Andrew Morton authored
      Shrinking a huge number of dentries or inodes can hold dcache_lock or
      inode_lock for a long time.  Not only does this hold off preemption -
      holding those locks basically shuts down the whole VFS.
      
      A neat fix for all such caches is to chunk the work up at the
      shrink_slab() level.
      
      I made the chunksize pretty small, for scalability reasons - avoid
      holding the lock for too long so another CPU can come in, acquire it
      and go off to do some work.