Commits · ea66b69c3ec03a6762c0949af67123d8a278f49d · Kirill Smelkov / linux

04 Jul, 2002 24 commits

[PATCH] fix invalidate_inode_pages2() race · ea66b69c

Andrew Morton authored Jul 04, 2002

Fix a buglet in invalidate_list_pages2(): there is a small window in
which writeback could start against the page before this function locks
it.

The patch closes the race by performing the PageWriteback test inside
PageLocked.

Testing PageWriteback inside PageLocked is "definitive" - when a page
is locked, writeback cannot start against it.

ea66b69c

[PATCH] JBD commit callback capability · 8b00e4fa

Andrew Morton authored Jul 04, 2002

This is a patch which Stephen has applied to ext3's 2.4 repository.
Originally written by Andreas, generalised somewhat by Stephen.

Add jbd callback mechanism, requested for InterMezzo. We allow the jbd's
client to request notification when a given handle's IO finally commits to
disk, so that clients can manage their own writeback state asynchronously.
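
The callback idea can be sketched in userspace C. This is only an illustration of "register a function against a handle's transaction, run it when the transaction commits"; every `demo_` name, the fixed-size callback table and the `error` argument are assumptions and not the jbd API.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_CALLBACKS 8

/* Hypothetical callback type: invoked once the transaction is on disk. */
typedef void (*demo_commit_cb)(void *arg, int error);

/* Hypothetical stand-in for a journal transaction. */
struct demo_transaction {
    struct {
        demo_commit_cb fn;
        void *arg;
    } cb[DEMO_MAX_CALLBACKS];
    size_t nr_cb;
};

/* Ask to be notified when the transaction commits. */
static int demo_register_callback(struct demo_transaction *t,
                                  demo_commit_cb fn, void *arg)
{
    assert(fn != NULL);
    if (t->nr_cb == DEMO_MAX_CALLBACKS)
        return -1;              /* table full */
    t->cb[t->nr_cb].fn = fn;
    t->cb[t->nr_cb].arg = arg;
    t->nr_cb++;
    return 0;
}

/* Called by the (simulated) journal after the commit IO completes. */
static void demo_commit(struct demo_transaction *t, int error)
{
    for (size_t i = 0; i < t->nr_cb; i++)
        t->cb[i].fn(t->cb[i].arg, error);
    t->nr_cb = 0;               /* each callback fires once */
}

/* Example client callback: count successful commits. */
static void demo_count_commit(void *arg, int error)
{
    if (!error)
        ++*(int *)arg;
}
```

A client would register `demo_count_commit` while its handle is open and then learn about the commit asynchronously, without polling the journal.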

8b00e4fa

[PATCH] ext3 truncate fix · 66c1d66f

Andrew Morton authored Jul 04, 2002

Forward-port of a fix which Stephen has applied to ext3's 2.4 CVS tree.

Fix for a rare problem seen under stress in data=journal mode: if we
have to restart a truncate transaction while traversing the inode's
direct blocks, we need to deal with bh==NULL in ext3_clear_blocks.

66c1d66f

[PATCH] combine generic_writepages() and mpage_writepages() · c0902cac

Andrew Morton authored Jul 04, 2002

generic_writepages and mpage_writepages are basically identical,
except one calls ->writepage() and the other calls mpage_writepage().
This duplication is irritating.

The patch folds generic_writepages() into mpage_writepages().  It does
this rather kludgily: if the get_block argument to mpage_writepages()
is NULL then use ->writepage().

Can't think of a better way, really - we could go for a fully-blown
write_actor_t thing, but that would be overly elaborate and would not
allow mpage_writepage() to be inlined inside mpage_writepages(), which
is rather desirable.
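
A rough userspace sketch of the "NULL get_block means use ->writepage()" dispatch follows. The types and `demo_` functions are hypothetical stand-ins, not the kernel's definitions; only the shape of the branch is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; records which path wrote it. */
struct demo_page {
    int id;
    int written_by;
};

typedef int (*demo_get_block_t)(struct demo_page *page);
typedef int (*demo_writepage_t)(struct demo_page *page);

enum { DEMO_VIA_WRITEPAGE = 1, DEMO_VIA_MPAGE = 2 };

/* Stand-in for a filesystem's ->writepage(). */
static int demo_writepage(struct demo_page *page)
{
    page->written_by = DEMO_VIA_WRITEPAGE;
    return 0;
}

/* Stand-in for a filesystem's get_block callback. */
static int demo_get_block(struct demo_page *page)
{
    (void)page;
    return 0;
}

/* Stand-in for mpage_writepage(): map the page, then write it. */
static int demo_mpage_writepage(struct demo_page *page,
                                demo_get_block_t get_block)
{
    int err = get_block(page);
    if (err)
        return err;
    page->written_by = DEMO_VIA_MPAGE;
    return 0;
}

/* One loop serves both callers: a NULL get_block selects ->writepage(). */
static int demo_writepages(struct demo_page *pages, size_t n,
                           demo_writepage_t writepage,
                           demo_get_block_t get_block)
{
    assert(get_block != NULL || writepage != NULL);
    for (size_t i = 0; i < n; i++) {
        int err = get_block ? demo_mpage_writepage(&pages[i], get_block)
                            : writepage(&pages[i]);
        if (err)
            return err;
    }
    return 0;
}
```

Because the choice is a plain branch rather than an indirect call through an actor, the compiler remains free to inline the mpage path into the loop.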

c0902cac

[PATCH] fix a writeback race · 2ab9665b

Andrew Morton authored Jul 04, 2002

Fixes a bug in generic_writepages() and its cut-n-paste-cousin,
mpage_writepages().

The code was clearing PageDirty and then baling out if it discovered
the page was under writeback.  Which would cause the dirty bit to be
lost.

It's a very small window, but reversing the order so PageDirty is only
cleared when we know for-sure that IO will be started fixes it up.
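
The ordering problem can be illustrated with a single-threaded sketch. The flag names and both functions are hypothetical, and the real code also involves the page lock; the point is only which test comes first.

```c
#include <assert.h>
#include <stdbool.h>

enum { DEMO_PG_DIRTY = 1 << 0, DEMO_PG_WRITEBACK = 1 << 1 };

struct demo_page {
    unsigned flags;
};

/* Old order: the dirty bit is cleared before we know IO will start. */
static bool demo_start_io_buggy(struct demo_page *p)
{
    bool was_dirty = (p->flags & DEMO_PG_DIRTY) != 0;

    p->flags &= ~(unsigned)DEMO_PG_DIRTY;
    if (!was_dirty || (p->flags & DEMO_PG_WRITEBACK))
        return false;           /* bail out: dirty bit already lost */
    p->flags |= DEMO_PG_WRITEBACK;
    return true;
}

/* New order: clear dirty only once we are certain IO will be started. */
static bool demo_start_io_fixed(struct demo_page *p)
{
    assert(p != 0);
    if (p->flags & DEMO_PG_WRITEBACK)
        return false;           /* page stays dirty for a later pass */
    if (!(p->flags & DEMO_PG_DIRTY))
        return false;
    p->flags &= ~(unsigned)DEMO_PG_DIRTY;
    p->flags |= DEMO_PG_WRITEBACK;
    return true;
}
```

With a page that is both dirty and under writeback, the first variant returns with the dirty bit gone, while the second leaves it set so the data is written later.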

2ab9665b

[PATCH] suppress more allocation failure warnings · 193ae036

Andrew Morton authored Jul 04, 2002

The `page allocation failure' warning in __alloc_pages() is being a
pain.  But I'm persisting with it...

The patch renames PF_RADIX_TREE to PF_NOWARN, and uses it in a few
places where allocation failures are known to happen.  These code
paths are well-tested now and suppressing the warning is OK.

193ae036

[PATCH] always update page->flags atomically · a2b41d23

Andrew Morton authored Jul 04, 2002

move_from_swap_cache() and move_to_swap_cache() are playing with
page->flags nonatomically.  The page is on the LRU at the time and
another CPU could be altering page->flags concurrently.

The patch converts those functions to use atomic operations.
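
As an illustration of what "atomic" buys here, the sketch below uses C11 atomics as a userspace stand-in for the kernel's bit operations. The flag names and helpers are hypothetical. Each helper is a single atomic read-modify-write, so a concurrent update of a different bit in the same word cannot be lost.

```c
#include <assert.h>
#include <stdatomic.h>

enum {
    DEMO_PG_LOCKED     = 1 << 0,
    DEMO_PG_REFERENCED = 1 << 1,
    DEMO_PG_DIRTY      = 1 << 2
};

struct demo_page {
    atomic_uint flags;
};

/* Atomically set one or more bits; other bits are left untouched. */
static void demo_set_flags(struct demo_page *p, unsigned mask)
{
    atomic_fetch_or(&p->flags, mask);
}

/* Atomically clear only the bits in mask. */
static void demo_clear_flags(struct demo_page *p, unsigned mask)
{
    atomic_fetch_and(&p->flags, ~mask);
}

static int demo_test_flag(struct demo_page *p, unsigned bit)
{
    assert(bit != 0);
    return (atomic_load(&p->flags) & bit) != 0;
}
```

By contrast, a plain `flags = (flags & ~mask) | bits` is a load followed by a store, and anything another CPU writes in between is overwritten.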

It also rationalises the number of bits which are cleared.  It's not
really clear to me what page flags we really want to set to a known
state in there.

It had no right to go clearing PG_arch_1.  I'm now clearing PG_arch_1
inside rmqueue() which is still a bit presumptuous.

btw: shmem uses PAGE_CACHE_SIZE and swapper_space uses PAGE_SIZE.  I've
been carefully maintaining the distinction, but it looks like shmem
will break if we ever do make these values different.


Also, __add_to_page_cache() was performing a non-atomic RMW against
page->flags, under the assumption that it was a newly allocated page
which no other CPU would look at.  Not true - this function is used for
moving anon pages into swapcache.  Those anon pages are on the LRU -
other CPUs can be performing operations against page->flags while
__add_to_swap_cache is stomping on them.  This had me running around in
circles for two days.

So let's move the initialisation of the page state into rmqueue(),
where the page really is new (could do it in page_cache_alloc,
perhaps).

The SetPageLocked() in __add_to_page_cache() is also rather curious.
Seems OK for both pagecache and swapcache so I covered that with a
comment.


2.4 has the same problem.  Basically, add_to_swap_cache() can stomp on
another CPU's manipulation of page->flags.  After a quick review of the
code there, it is barely conceivable that a concurrent refill_inactive()
could get its PG_referenced and PG_active bits scribbled on.  Rather
unlikely because swap_out() will probably see PageActive() and bale
out.

Also, mark_dirty_kiobuf() could have its PG_dirty bit accidentally
cleared (but try_to_swap_out() sets it again later).

But there may be other code paths.  Really, I think this needs fixing
in 2.4 - it's horrid.

a2b41d23

[PATCH] Use __GFP_HIGH in mpage_writepages() · a263b647

Andrew Morton authored Jul 04, 2002

In mpage_writepage(), use __GFP_HIGH when allocating the BIO: writeback
is a memory reclaim function and is entitled to dip into the page
reserves to get its IO underway.

a263b647

[PATCH] resurrect __GFP_HIGH · 371151c9

Andrew Morton authored Jul 04, 2002

This patch reinstates __GFP_HIGH functionality.

__GFP_HIGH means "able to dip into the emergency pools".  However,
somewhere along the line this got broken.  __GFP_HIGH ceased to do
anything.  Instead, !__GFP_WAIT is used to tell the page allocator to
try harder.

__GFP_HIGH makes sense.  The concepts of "unable to sleep" and "should
try harder" are quite separate, and overloading !__GFP_WAIT to mean
"should access emergency pools" seems wrong.

This patch fixes a problem in mempool_alloc().  mempool_alloc() tries
the first allocation with __GFP_WAIT cleared.  If that fails, it tries
again with __GFP_WAIT enabled (if the caller can support __GFP_WAIT).
So it is currently performing an atomic allocation first, even though
the caller said that they're prepared to go in and call the page
stealer.

I thought this was a mempool bug, but Ingo said:

> no, it's not GFP_ATOMIC. The important difference is __GFP_HIGH, which
> triggers the intrusive highprio allocation mode. Otherwise gfp_nowait is
> just a nonblocking allocation of the same type as the original gfp_mask.
> ...
> what i've added is a bit more subtle allocation method, with both
> performance and balancing-correctness in mind:
>
> 1. allocate via gfp_mask, but nonblocking
> 2. if failure => try to get from the pool if the pool is 'full enough'.
> 3. if failure => allocate with gfp_mask [which might block]
>
> there is performance data that this method improves bounce-IO performance
> significantly, because even under VM pressure (when gfp_mask would block)
> we can still use up to 50% of the memory pool without blocking (and
> without endangering deadlock-free allocation). Ie. the memory pool is also
> a fast 'frontside cache' of memory elements.

Ingo was assuming that __GFP_HIGH was still functional.  It isn't, and the
mempool design wants it.
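
The three-step strategy Ingo describes can be modelled as below. This is a simplified simulation, not mempool code: the structures, the `demo_` names and the "more than half of min_nr" threshold (read from the "up to 50%" remark) are assumptions, and waiting for elements to be returned to the pool is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pool bookkeeping. */
struct demo_pool {
    int min_nr;     /* elements the pool was created with */
    int curr_nr;    /* reserved elements currently available */
};

/* Hypothetical description of what the page allocator could do right now. */
struct demo_allocator {
    bool nonblocking_ok;    /* would a non-sleeping allocation succeed? */
    bool blocking_ok;       /* would a sleeping allocation succeed? */
};

enum demo_source {
    DEMO_SRC_NONE,
    DEMO_SRC_NONBLOCKING,
    DEMO_SRC_POOL,
    DEMO_SRC_BLOCKING
};

static enum demo_source demo_mempool_alloc(struct demo_pool *pool,
                                           const struct demo_allocator *a,
                                           bool can_wait)
{
    assert(pool->curr_nr >= 0 && pool->curr_nr <= pool->min_nr);

    /* 1. allocate via gfp_mask, but nonblocking */
    if (a->nonblocking_ok)
        return DEMO_SRC_NONBLOCKING;

    /* 2. take from the pool if it is 'full enough' */
    if (pool->curr_nr > pool->min_nr / 2) {
        pool->curr_nr--;
        return DEMO_SRC_POOL;
    }

    /* 3. allocate with the caller's gfp_mask, which might block */
    if (can_wait && a->blocking_ok)
        return DEMO_SRC_BLOCKING;

    return DEMO_SRC_NONE;
}
```

Under memory pressure the first step fails, so up to half of the pool is handed out without sleeping before the caller is made to block.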

371151c9

[PATCH] set_page_dirty() in mark_dirty_kiobuf() · 9bd6f86b

Andrew Morton authored Jul 04, 2002

Yet another SetPageDirty/set_page_dirty bugfix: mark_dirty_kiobuf needs
to run set_page_dirty() so the page goes onto its mapping's dirty_pages
list.

9bd6f86b

[PATCH] check for O_DIRECT capability in open(), not write() · 6ef5d4bb

Andrew Morton authored Jul 04, 2002

For O_DIRECT opens we're currently checking that the fs supports
O_DIRECT at write(2)-time.

This is a forward-port of Andrea's patch which moves the check to
open() time.  Seems more sensible.

6ef5d4bb

[PATCH] set TASK_RUNNING in yield() · b5b6fa52

Andrew Morton authored Jul 04, 2002

It seems that the yield() macro requires state TASK_RUNNING, but
practically none of the callers remember to do that.

The patch turns yield() into a real function which sets state
TASK_RUNNING before scheduling.

b5b6fa52

[PATCH] set TASK_RUNNING in cond_resched() · b2bd3a26

Andrew Morton authored Jul 04, 2002

do_select() does set_current_state(TASK_INTERRUPTIBLE) then calls
__pollwait() which calls __get_free_page() and the cond_resched() which
I added to the pagecache reclaim code never returns.

The patch makes cond_resched() more useful by setting current->state to
TASK_RUNNING before scheduling.
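
A toy model of the hang and the fix, under the assumption that the scheduler removes any task whose state is not TASK_RUNNING from the runqueue. All `demo_` names are hypothetical and the model ignores wakeups entirely.

```c
#include <assert.h>

enum demo_state { DEMO_TASK_RUNNING, DEMO_TASK_INTERRUPTIBLE };

struct demo_task {
    enum demo_state state;
    int need_resched;
    int on_runqueue;
};

/* Toy scheduler: a task that is not TASK_RUNNING is descheduled for good
 * unless something wakes it, which this model never does. */
static void demo_schedule(struct demo_task *t)
{
    if (t->state != DEMO_TASK_RUNNING)
        t->on_runqueue = 0;
    t->need_resched = 0;
}

/* Old behaviour: reschedule in whatever state the caller left us. */
static void demo_cond_resched_old(struct demo_task *t)
{
    if (t->need_resched)
        demo_schedule(t);
}

/* Patched behaviour: force TASK_RUNNING before scheduling. */
static void demo_cond_resched_new(struct demo_task *t)
{
    assert(t != 0);
    if (t->need_resched) {
        t->state = DEMO_TASK_RUNNING;
        demo_schedule(t);
    }
}
```

A caller that has already set TASK_INTERRUPTIBLE, as do_select() does, drops off the runqueue with the old variant and stays runnable with the new one.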

b2bd3a26

[PATCH] add new list_splice_init() · f42e6ed8

Andrew Morton authored Jul 04, 2002

A little cleanup: Most callers of list_splice() immediately
reinitialise the source list_head after calling list_splice().

So create a new list_splice_init() which does all that.
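
A minimal userspace sketch of the helper, built on a cut-down circular doubly-linked list in the style of the kernel's list.h. The function bodies here are written for illustration and are assumed, not copied from the patch.

```c
#include <assert.h>

struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static int list_empty(const struct list_head *h)
{
    return h->next == h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Join 'list' onto the front of 'head'.  'list' itself is left pointing
 * at entries it no longer owns, so the caller must reinitialise it. */
static void list_splice(struct list_head *list, struct list_head *head)
{
    if (list_empty(list))
        return;

    struct list_head *first = list->next;
    struct list_head *last = list->prev;
    struct list_head *at = head->next;

    first->prev = head;
    head->next = first;
    last->next = at;
    at->prev = last;
}

/* Splice and reinitialise the source in one call. */
static void list_splice_init(struct list_head *list, struct list_head *head)
{
    assert(list != head);
    list_splice(list, head);
    init_list_head(list);
}
```

After `list_splice_init(&src, &dst)` the source head is a valid empty list again and can be reused immediately.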

f42e6ed8

[PATCH] shmem fixes · e7c89646

Andrew Morton authored Jul 04, 2002

A shmem cleanup/bugfix patch from Hugh Dickins.

- Minor: in try_to_unuse(), only wait on writeout if we actually
  started new writeout.  Otherwise, there is no need because a
  wait_on_page_writeback() has already been executed against this page.
  And it's locked, so no new writeback can start.

- Minor: in shmem_unuse_inode(): remove all the
  wait_on_page_writeback() logic.  We already did that in
  try_to_unuse(), and the page is locked so no new writeback can start.

- Less minor: add a missing page_cache_release() to
  shmem_get_page_locked() in the uncommon case where the page was found
  to be under writeout.

e7c89646

[PATCH] remove swap_get_block() · b6a7f088

Andrew Morton authored Jul 04, 2002

Patch from Christoph Hellwig removes swap_get_block().

I was sort-of hanging onto this function because it is a standard
get_block function, and perhaps it could be used to make swap use
the regular filesystem I/O functions.  We don't want to do that, so
kill it.

b6a7f088

[PATCH] pdflush cleanup · f0e10c64

Andrew Morton authored Jul 04, 2002

Writeback/pdflush cleanup patch from Steven Augart

* Exposes nr_pdflush_threads as /proc/sys/vm/nr_pdflush_threads, read-only.

  (I like this - I expect that management of the pdflush thread pool
  will be important for many-spindle machines, and this is a neat way
  of getting at the info).

* Adds minimum and maximum checking to the five writable pdflush
  and fs-writeback  parameters.

* Minor indentation fix in sysctl.c

* mm/pdflush.c now includes linux/writeback.h, which prototypes
  pdflush_operation.  This is so that the compiler can
  automatically check that the prototype matches the definition.

* Adds a few comments to existing code.
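
The minimum/maximum checking can be pictured as a range test applied before a new value is stored. The sketch below is a standalone illustration with hypothetical names; whether the real sysctl handler rejects or ignores an out-of-range write is not stated above, and rejecting with -EINVAL is an assumption.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical writable tunable with an allowed range. */
struct demo_tunable {
    int value;
    int min;
    int max;
};

/* Store new_value only if it lies within [min, max]. */
static int demo_tunable_write(struct demo_tunable *t, int new_value)
{
    assert(t->min <= t->max);
    if (new_value < t->min || new_value > t->max)
        return -EINVAL;         /* out of range: keep the old value */
    t->value = new_value;
    return 0;
}
```

For a percentage-style parameter bounded to 0..100, writing 150 would leave the previous setting in place.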

f0e10c64

[PATCH] misc cleanups and fixes · 06be3a5e

Andrew Morton authored Jul 04, 2002

- Comment and documentation fixlets

- Remove some unneeded fields from swapper_inode (these are a
  leftover from when I had swap using the filesystem IO functions).

- fix a printk bug in pci/pool.c: when dma_addr_t is 64 bit it
  generates a compile warning, and will print out garbage.  Cast it to
  unsigned long long.

- Convert some writeback #defines into enums (Steven Augart)
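
The cast in the pool.c fix follows a common pattern: when a typedef may be 32 or 64 bits depending on configuration, widen it to unsigned long long and print with %llx. The sketch uses snprintf() in place of printk(), and `demo_dma_addr_t` is a hypothetical stand-in for dma_addr_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for dma_addr_t; it could equally be a 32-bit type. */
typedef uint64_t demo_dma_addr_t;

/* Format the address the same way regardless of its configured width. */
static int demo_format_dma(char *buf, size_t len, demo_dma_addr_t addr)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "dma %llx", (unsigned long long)addr);
}
```

Passing the value uncast to a narrower format specifier would mismatch the variadic argument size, which is what produced the warning and the garbage output.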

06be3a5e

[PATCH] debug check for leaked blockdev buffers · 5226cca6

Andrew Morton authored Jul 04, 2002

Having just fiddled with the refcounts of blockdev buffers, I want some
way of assuring that the code is correct and is not leaking
buffer_heads.

There's no easy way to do this: if a blockdev page has pinned buffers
then truncate_complete_page just cuts it loose and we leak memory.

The patch adds a bit of debug code to catch these leaks.  This code,
PF_RADIX_TREE and buffer_error() need to be removed later on.

5226cca6

[PATCH] Remove ext3's buffer_head cache · 34cb9226

Andrew Morton authored Jul 04, 2002

Removes ext3's open-coded inode and allocation bitmap LRUs.

This patch includes a cleanup to ext3_new_block().  The local variables
`bh', `bh2', `i', `j', `k' and `tmp' have been renamed to something
more palatable.

34cb9226

[PATCH] Remove ext2's buffer_head cache · 7ef751c5

Andrew Morton authored Jul 04, 2002

Remove ext2's open-coded bitmap LRUs.  Core kernel does this for it now.

7ef751c5

[PATCH] per-cpu buffer_head cache · e7ae11b6

Andrew Morton authored Jul 04, 2002

ext2 and ext3 implement a custom LRU cache of buffer_heads - the eight
most-recently-used inode bitmap buffers and the eight MRU block bitmap
buffers.

I don't like them, for a number of reasons:

- The code is duplicated between filesystems

- The functionality is unavailable to other filesystems

- The LRU only applies to bitmap buffers.  And not, say, indirects.

- The LRUs are subtly dependent upon lock_super() for protection:
  without lock_super protection a bitmap could be evicted and freed
  while in use.

  And removing this dependence on lock_super() gets us one step on
  the way toward getting that semaphore out of the ext2 block allocator -
  it causes significant contention under some loads and should be a
  spinlock.

- The LRUs pin 64 kbytes per mounted filesystem.

Now, we could just delete those LRUs and rely on the VM to manage the
memory.  But that would introduce significant lock contention in
__find_get_block - the blockdev mapping's private_lock and page_lock
are heavily used.

So this patch introduces a transparent per-CPU bh lru which is hidden
inside __find_get_block(), __getblk() and __bread().  It is designed to
shorten code paths and to reduce lock contention.  It uses a seven-slot
LRU.  It achieves a 99% hit rate in `dbench 64'.  It provides benefit
to all filesystems.
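
A small sketch of a fixed-size, move-to-front LRU of the kind described, for one CPU. It is a simplified illustration with hypothetical names: the real code also has to manage buffer reference counts and keep each CPU's array private, neither of which is modelled here.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_LRU_SIZE 7

/* Hypothetical stand-in for a buffer_head, keyed by block number only. */
struct demo_bh {
    unsigned long block;
};

/* One of these per CPU; slot[0] is the most recently used entry. */
struct demo_lru {
    struct demo_bh *slot[DEMO_LRU_SIZE];
};

/* Look a block up; on a hit, move the entry to the front. */
static struct demo_bh *demo_lru_lookup(struct demo_lru *lru,
                                       unsigned long block)
{
    for (int i = 0; i < DEMO_LRU_SIZE; i++) {
        struct demo_bh *bh = lru->slot[i];

        if (bh && bh->block == block) {
            for (; i > 0; i--)
                lru->slot[i] = lru->slot[i - 1];
            lru->slot[0] = bh;
            return bh;
        }
    }
    return NULL;    /* miss: caller falls back to the slow lookup */
}

/* Install a buffer at the front; returns the evicted entry, if any. */
static struct demo_bh *demo_lru_install(struct demo_lru *lru,
                                        struct demo_bh *bh)
{
    assert(bh != NULL);
    struct demo_bh *evicted = lru->slot[DEMO_LRU_SIZE - 1];

    for (int i = DEMO_LRU_SIZE - 1; i > 0; i--)
        lru->slot[i] = lru->slot[i - 1];
    lru->slot[0] = bh;
    return evicted;
}
```

A hit costs a short array scan and touches no shared lock, which is where the reduction in contention comes from.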

The next patches remove the open-coded LRUs from ext2 and ext3.

Taken together, these patches are a code cleanup (300-400 lines gone),
and they reduce lock contention.  Anton tested these patches on the
32-way and demonstrated a throughput improvement of up to 15% on
RAM-only dbench runs.  See http://samba.org/~anton/linux/2.5.24/dbench/

Most of this benefit is from avoiding find_get_page() on the blockdev
mapping.  Because the generic LRU copes with indirect blocks as well as
bitmaps.

e7ae11b6

[PATCH] Fix 3c59x driver for some 3c566B's · fbaf74c8

Andrew Morton authored Jul 04, 2002

Fix from Rahul Karnik and Donald Becker - some new 3c566B mini-PCI NICs
refuse to power up the transceiver unless we tickle an undocumented bit
in an undocumented register.  They worked this out by before-and-after
diffing of the register contents when it was set up by the Windows
driver.

fbaf74c8

[PATCH] handle BIO allocation failures in swap_writepage() · 27c02b00

Andrew Morton authored Jul 04, 2002

If allocation of a BIO for swap writeout fails, mark the page dirty
again to save it from eviction.

27c02b00

20 Jun, 2002 16 commits
- Linux version 2.5.24 · 2ffe5f2f
  Linus Torvalds authored Jun 20, 2002
  
  2ffe5f2f
- Fix up various odds and ends after big merges.. · 24255033
  Linus Torvalds authored Jun 20, 2002
  
  24255033
- [PATCH] Another RAID-5 XOR assembly fix · 4376dc0e
  Andi Kleen authored Jun 20, 2002
```
The last changes did trigger another latent bug in the inline assembly.
akpm noticed it because he compiles his kernels with frame pointers.
```
  4376dc0e
- Merge http://gkernel.bkbits.net/irda-2.5 · e79952fd
  Linus Torvalds authored Jun 20, 2002
```
into home.transmeta.com:/home/torvalds/v2.5/linux
```
  e79952fd
- [PATCH] uninline elv_next_request() · 3ba0f211
  Jens Axboe authored Jun 20, 2002
```
Uninline elv_next_request() and move it to elevator.c, where it belongs.
Because of CURRENT declaration, this actually saves lots of space.  From
Andrew.
```
  3ba0f211
- [PATCH] namespace.c - compiler warning · 1856a951
  Robert Kuebel authored Jun 19, 2002
```
init_rootfs() (from ramfs) doesn't appear in any header file.  I didn't
see any that looked like a good home, so let's put a prototype at the top
of fs/namespace.c.  The only use of this function is in namespace.c.
```
  1856a951
- [PATCH] 3c509.c - 2_2 · 0dcc1e61
  Robert Kuebel authored Jun 19, 2002
```
This patch makes sure the 3c509 module license is always GPL.  Currently
the MODULE_LICENSE() macro is only used when CONFIG_ISAPNP or
CONFIG_ISAPNP_MODULE is defined.  I have moved MODULE_LICENSE() to the
#ifdef MODULE section at the bottom of 3c509.c.

Same is true for the MODULE_DEVICE_TABLE() macro.
```
  0dcc1e61
- [PATCH] remove unnecessary parentheses from expand() · f3fa4e6a
  William Lee Irwin III authored Jun 19, 2002
```
Not sure why I forgot to do this, but here is a small bit of tidying up
of some leftover parentheses from the memlist macro removal. The
parentheses are just noise and should go.
```
  f3fa4e6a
- [PATCH] remove unnecessary headers from mm_page_alloc.c · fcec5d01
  William Lee Irwin III authored Jun 19, 2002
```
page_alloc.c does not use either slab.h or swapctl.h. This removes the
inclusion of those headers from page_alloc.c
```
  fcec5d01
- [PATCH] beautify nr_free_pages() · 6d3c546f
  William Lee Irwin III authored Jun 19, 2002
```
nr_free_pages() is overly verbose. The following is perhaps clearer and
gets to the point with fewer lines of code and inside of 80 columns.
```
  6d3c546f
- [PATCH] Re: convert BAD_RANGE() to an inline function · f6adf918
  William Lee Irwin III authored Jun 19, 2002
  
  f6adf918
- [PATCH] Re: TRIVIAL: William Lee Irwin III: buddy system comment · 9a89d96d
  William Lee Irwin III authored Jun 19, 2002
  
  9a89d96d
- [PATCH] Consolidate sys_pause · 021ba6a0
  Stephen Rothwell authored Jun 19, 2002
```
14 of our 17 architectures define sys_pause exactly the same
way.  The other three don't define it at all.  I assume glibc
translates pause() into sigsuspend() or something.
```
  021ba6a0
- [PATCH] Trivial TAP_TUN patch to remove minmax macros · f72a32ff
  Rusty Russell authored Jun 19, 2002
```
In favour of those now in kernel.h..
```
  f72a32ff
- [PATCH] trivial: reiserfs whitespace · 2f8e8c5b
  Rusty Russell authored Jun 19, 2002
  
  2f8e8c5b
- [PATCH] Typo in radeonfb.c printk() · 4722056e
  James Mayer authored Jun 19, 2002
  
  4722056e