  1. 29 Jun, 2004 2 commits
  2. 22 May, 2004 2 commits
    • [PATCH] slab: consolidate panic code · b33a7bad
      Andrew Morton authored
      Many places do:
      
      	if (kmem_cache_create(...) == NULL)
      		panic(...);
      
      We can consolidate all that by passing another flag to kmem_cache_create()
      which says "panic if it doesn't work".
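
      With the new flag (SLAB_PANIC in later mainline; the exact name is an
      assumption here, and "foo_cache"/struct foo are illustrative), a call
      site collapses to something like:

      	kmem_cache_t *cachep;

      	/* the allocator panics on failure, so callers drop the NULL check */
      	cachep = kmem_cache_create("foo_cache", sizeof(struct foo),
      				   0, SLAB_PANIC, NULL, NULL);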
      b33a7bad
    • [PATCH] revert recent swapcache handling changes · e74193ad
      Andrew Morton authored
      Go back to the 2.6.5 concepts, with rmap additions.  In particular:
      
      - Implement Andrea's flavour of page_mapping().  This function opaquely does
        the right thing for pagecache pages, anon pages and for swapcache pages.
      
        The critical thing here is that page_mapping() returns &swapper_space for
        swapcache pages without actually requiring the storage at page->mapping. 
        This frees page->mapping for the anonmm/anonvma metadata.
      
      - Andrea and Hugh placed the pagecache index of swapcache pages into
        page->private rather than page->index.  So add new page_index() function
        which hides this.
      
      - Make swapper_space.set_page_dirty() again point at
        __set_page_dirty_buffers().  If we don't do that, a bare set_page_dirty()
        will fall through to __set_page_dirty_buffers(), which is silly.
      
        This way, __set_page_dirty_buffers() can continue to use page->mapping.
        It should never go near anon or swapcache pages.
      
      - Give swapper_space a ->set_page_dirty address_space_operation method, so
        that set_page_dirty() will not fall through to __set_page_dirty_buffers()
        for swapcache pages.  That function is not set up to handle them.
      
      
      The main effect of these changes is that swapcache pages are treated more
      similarly to pagecache pages.  And we are again tagging swapcache pages as
      dirty in their radix tree, which is a requirement if we later wish to
      implement swapcache writearound based on tagged radix-tree walks.
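
      A hedged sketch of the two helpers described above (flag and field names
      follow this text, not necessarily the exact patch):

      	static inline struct address_space *page_mapping(struct page *page)
      	{
      		if (PageSwapCache(page))
      			return &swapper_space;	/* no storage needed in page->mapping */
      		if (PageAnon(page))
      			return NULL;		/* ->mapping holds anon rmap metadata */
      		return page->mapping;
      	}

      	static inline pgoff_t page_index(struct page *page)
      	{
      		if (PageSwapCache(page))
      			return page->private;	/* pagecache index kept here */
      		return page->index;
      	}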
      e74193ad
  3. 21 May, 2004 1 commit
    • [PATCH] getblk() BUG removal · bd105032
      Andrew Morton authored
      We keep on getting BUG()s from isofs_read_super() because it passes an insane
      blocksize to bread().  See http://bugme.osdl.org/show_bug.cgi?id=2735 for
      example.
      
      I don't know what's up with isofs, but going BUG in there seems a bit rude.
      Change it to emit a bunch of diagnostics and a backtrace, then return a NULL
      bh.
      
      Most callers of getblk() don't expect it to fail, so they'll oops anyway.  But
      isofs does actually check for a NULL return.  This way, the machine stays up
      and we get better debug diagnostics.
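
      Roughly, the BUG() in the getblk() path turns into diagnostics plus a NULL
      return (a sketch, not the exact hunk; the size check is approximate):

      	if (size & (bdev_hardsect_size(bdev) - 1) || size < 512) {
      		printk(KERN_ERR "getblk(): bogus block size %d requested\n", size);
      		dump_stack();
      		return NULL;		/* was: BUG() */
      	}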
      bd105032
  4. 19 May, 2004 2 commits
    • [PATCH] Fix overzealous use of online cpu iterators · 2cb2f31f
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      The IA64 hotplug CPU merge seems to have included some core changes: in
      particular the recalc_bh_state() needs to sum for all (including offline)
      cpus, since we don't empty the counters on CPU down.  The totals printed by
      /proc/stat (the first loop) should include offline cpus, too (apparently
      printing out the per-cpu lines for offline cpus confuses top).
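
      A sketch of the summing side of the fix (structure and variable names
      approximate those in fs/buffer.c):

      	static void recalc_bh_state(void)
      	{
      		int i, tot = 0;

      		/* include offline CPUs: their counters are not drained on CPU-down */
      		for_each_cpu(i)
      			tot += per_cpu(bh_accounting, i).nr;
      		buffer_heads_over_limit = (tot > max_buffer_heads);
      	}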
      2cb2f31f
    • [PATCH] blk_run_page() race fix · 66a759eb
      Andrew Morton authored
      blk_run_page() is incorrectly using page->mapping, which makes it racy against
      removal from swapcache.
      
      Make block_sync_page() use page_mapping(), and remove blk_run_page(), which
      only had one caller.
      66a759eb
  5. 14 May, 2004 4 commits
    • [PATCH] Revisited: ia64-cpu-hotplug-cpu_present.patch · fda94eff
      Andrew Morton authored
      From: Paul Jackson <pj@sgi.com>
      
      With a hotplug capable kernel, there is a requirement to distinguish a
      possible CPU from one actually present.  The set of possible CPU numbers
      doesn't change during a single system boot, but the set of present CPUs
      changes as CPUs are physically inserted into or removed from a system.  The
      cpu_possible_map does not change once initialized at boot, but the
      cpu_present_map changes dynamically as CPUs are inserted or removed.
      
      
      Paul Jackson <pj@sgi.com> provided an expanded explanation:
      
      
      Ashok's cpu hot plug patch adds a cpu_present_map, resulting in the following
      cpu maps being available.  All the following maps are fixed size bitmaps of
      size NR_CPUS.
      
      #ifdef CONFIG_HOTPLUG_CPU
      	cpu_possible_map - map with all NR_CPUS bits set
      	cpu_present_map - map with bit 'cpu' set iff cpu is populated
      	cpu_online_map - map with bit 'cpu' set iff cpu available to scheduler
      #else
      	cpu_possible_map - map with bit 'cpu' set iff cpu is populated
      	cpu_present_map - copy of cpu_possible_map
      	cpu_online_map - map with bit 'cpu' set iff cpu available to scheduler
      #endif
      
      In either case, NR_CPUS is fixed at compile time, as the static size of these
      bitmaps.  The cpu_possible_map is fixed at boot time, as the set of CPU ids
      that might ever be plugged in at any time during the life of that system
      boot.  The cpu_present_map is dynamic(*), representing which CPUs
      are currently plugged in.  And cpu_online_map is the dynamic subset of
      cpu_present_map, indicating those CPUs available for scheduling.
      
      If HOTPLUG is enabled, then cpu_possible_map is forced to have all NR_CPUS
      bits set, otherwise it is just the set of CPUs that ACPI reports present at
      boot.
      
      If HOTPLUG is enabled, then cpu_present_map varies dynamically, depending on
      what ACPI reports as currently plugged in, otherwise cpu_present_map is just a
      copy of cpu_possible_map.
      
      (*) Well, cpu_present_map is dynamic in the hotplug case.  If not hotplug,
          it's the same as cpu_possible_map, hence fixed at boot.
      fda94eff
    • [PATCH] blk_run_page(): we don't trust bh->b_page · 4e36c118
      Andrew Morton authored
      We don't trust bh->b_page to point to the right thing across all filesystems,
      so revert this bit.
      4e36c118
    • [PATCH] Add blk_run_page() · e059d5da
      Andrew Morton authored
      From: Andrea Arcangeli <andrea@suse.de>
      
      From: Jens Axboe
      
      Add blk_run_page() API.  This is so that we can pass the target page all the
      way down to (for example) the swap unplug function.  So swap can work out
      which blockdevs back this particular page.
      e059d5da
    • [PATCH] filtered wakeups: apply to buffer_head functions · 70d1f017
      Andrew Morton authored
      From: William Lee Irwin III <wli@holomorphy.com>
      
      This patch implements wake-one semantics for buffer_head wakeups in a single
      step.  The buffer_head being waited on is passed to the waiter's wakeup
      function by the waker, and the wakeup function compares it to a pointer
      stored in its on-stack structure, also checking the readiness of the bit
      there.  Wake-one semantics are achieved by using WQ_FLAG_EXCLUSIVE in the
      codepaths waiting to acquire the bit for mutual exclusion.
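
      A hedged sketch of the mechanism (identifiers are illustrative, not the
      patch's): the waker passes the buffer_head as the wake key, and each
      waiter's wakeup function only fires when the key matches its own bh and the
      bit it is waiting for has cleared:

      	struct bh_wait {
      		struct buffer_head	*bh;
      		wait_queue_t		wait;	/* wait.func = bh_wake_function */
      	};

      	static int bh_wake_function(wait_queue_t *wait, unsigned mode,
      				    int sync, void *key)
      	{
      		struct bh_wait *w = container_of(wait, struct bh_wait, wait);

      		if (w->bh != key || buffer_locked(w->bh))
      			return 0;	/* not our buffer, or bit still set: keep sleeping */
      		return autoremove_wake_function(wait, mode, sync, key);
      	}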
      70d1f017
  6. 22 Apr, 2004 1 commit
    • [PATCH] writeback livelock fix · 1ed73535
      Andrew Morton authored
      If a filesystem's ->writepage implementation repeatedly refuses to write the
      page (it keeps on redirtying it instead) (reiserfs seems to do this) then the
      writeback logic can get stuck repeately trying to write the same page.
      
      Fix that up by correctly setting wbc->pages_skipped, to tell the writeback
      logic that things aren't working out.
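
      For example, a ->writepage that decides not to write should account for
      that (a sketch; fs_wants_to_defer() is a hypothetical predicate):

      	if (fs_wants_to_defer(page)) {
      		wbc->pages_skipped++;	/* tell writeback we made no progress here */
      		set_page_dirty(page);
      		unlock_page(page);
      		return 0;
      	}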
      1ed73535
  7. 21 Apr, 2004 1 commit
    • [PATCH] lockfs - vfs bits · 137718ec
      Andrew Morton authored
      From: Christoph Hellwig <hch@lst.de>
      
      These are the generic lockfs bits.  Basically it takes the XFS freezing
      statemachine into the VFS.  It's all behind the kernel-doc documented
      freeze_bdev and thaw_bdev interfaces.
      
      Based on an older patch from Chris Mason.
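
      A hedged usage sketch of the interfaces (prototypes approximated from the
      description):

      	struct super_block *sb;

      	sb = freeze_bdev(bdev);		/* sync and freeze the fs on this bdev */
      	/* ... take the snapshot / do the offline work ... */
      	thaw_bdev(bdev, sb);		/* unfreeze, let writes resume */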
      137718ec
  8. 17 Apr, 2004 2 commits
    • [PATCH] remove buffer_error() · 4f990f49
      Andrew Morton authored
      From: Jeff Garzik <jgarzik@pobox.com>
      
      It was debug code, no longer required.
      4f990f49
    • [PATCH] kill submit_{bh,bio} return value · 01d86f02
      Andrew Morton authored
      From: Jeff Garzik <jgarzik@pobox.com>
      
      Nobody ever checks the return value of submit_bh(), and submit_bh() is the
      only caller that checks the submit_bio() return value.
      
      This changes the kernel I/O submission path -- a fast path -- so this
      cleanup is also a microoptimization.
      01d86f02
  9. 12 Apr, 2004 10 commits
    • [PATCH] rmap 2 anon and swapcache · 4875a601
      Andrew Morton authored
      From: Hugh Dickins <hugh@veritas.com>
      
      Tracking anonymous pages by anon_vma,pgoff or mm,address needs a
      pointer,offset pair in struct page: mapping,index the natural choice.  But
      swapcache uses those for &swapper_space,swp_entry_t.
      
      It's trivial to separate swapcache from pagecache with radix tree; most of
      swapper_space is actually unused, just a fiction to pretend swap like file;
      and page->private is a good place to keep swp_entry_t, now that swap never
      uses bufferheads.
      
      Define PG_anon bit, page_add_rmap SetPageAnon and put an oopsable address in
      page->mapping to test that we're not confused by it.  Define
      page_mapping(page) macro to give NULL when PageAnon, whatever may be in
      page->mapping.  Define PG_swapcache bit, deduce swapper_space from that in
      the few places we need it.
      
      add_to_swap_cache now distinct from add_to_page_cache.  Separating the caches
      somewhat simplifies the tmpfs swizzling in swap_state.c, now the page can
      briefly be in both caches.
      
      The rmap method remains pte chains, no change to that yet.  But one small
      functional difference: the use of PageAnon implies that a page truncated
      while still mapped will no longer be found and freed (swapped out) by
      try_to_unmap, will only be freed by exit or munmap.  But normally pages are
      unmapped by vmtruncate: this should only affect nonlinear mappings, and a
      later patch not in this batch will fix that.
      4875a601
    • [PATCH] per-backing dev unplugging · 6d27f67b
      Andrew Morton authored
      From: Jens Axboe <axboe@suse.de>,
            Chris Mason,
            me, others.
      
      The global unplug list causes horrid spinlock contention on many-disk
      many-CPU setups - throughput is worse than halved.
      
      The other problem with the global unplugging is of course that it will cause
      the unplugging of queues which are unrelated to the I/O upon which the caller
      is about to wait.
      
      So what we do to solve these problems is to remove the global unplug and set
      up the infrastructure under which the VFS can tell the block layer to unplug
      only those queues which are relevant to the page or buffer_head which is
      about to be waited upon.
      
      We do this via the very appropriate address_space->backing_dev_info structure.
      
      Most of the complexity is in devicemapper, MD and swapper_space, because for
      these backing devices, multiple queues may need to be unplugged to complete a
      page/buffer I/O.  In each case we ensure that data structures are in place to
      permit us to identify all the lower-level queues which contribute to the
      higher-level backing_dev_info.  Each contributing queue is told to unplug in
      response to a higher-level unplug.
      
      To simplify things in various places we also introduce the concept of a
      "synchronous BIO": it is tagged with BIO_RW_SYNC.  The block layer will
      perform an immediate unplug when it sees one of these go past.
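
      For example, a caller that is about to wait on the I/O can mark the bio
      synchronous so the relevant queue is unplugged immediately (a sketch):

      	bio->bi_rw |= (1 << BIO_RW_SYNC);	/* block layer unplugs at once */
      	submit_bio(rw, bio);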
      6d27f67b
    • [PATCH] reiserfs: data=ordered support · bb0d9672
      Andrew Morton authored
      From: Chris Mason <mason@suse.com>
      
      reiserfs data=ordered support.
      bb0d9672
    • [PATCH] laptop mode · 93d33a48
      Andrew Morton authored
      From: Bart Samwel <bart@samwel.tk>
      
      Adds /proc/sys/vm/laptop-mode: a special knob which says "this is a laptop".
      In this mode the kernel will attempt to avoid spinning disks up.
      
      Algorithm: the idea is to hold dirty data in memory for a long time, but to
      flush everything which has been accumulated if the disk happens to spin up
      for other reasons.
      
      - Whenever a disk request completes (read or write), schedule a timer a few
        seconds hence.  If the timer was already pending, reset it to a few seconds
        hence.
      
      - When the timer expires, write back the whole world.  We use
        sync_filesystems() for this because it will force ext3 journal commits as
        well.
      
      - In balance_dirty_pages(), kick off background writeback when we hit the
        high threshold (dirty_ratio), not when we hit the low threshold.  This has
        the effect of causing "lumpy" writeback which is something I spent a year
        fixing, but in laptop mode, it is desirable.
      
      - In try_to_free_pages(), only kick pdflush if the VM is getting into
        distress: we want to keep scanning for clean pages, deferring writeback.
      
      - In page reclaim, avoid writing back the odd random dirty page off the
        LRU: only start I/O if the scanning is working harder.
      
      The effect is to perform a sync() a few seconds after all I/O has ceased.
      
      The value which was written into /proc/sys/vm/laptop-mode determines, in
      seconds, the delay between the final I/O and the flush.
      
      Additionally, the patch adds tools which help answer the question "why the
      heck does my disk spin up all the time?".  The user may set
      /proc/sys/vm/block_dump to a non-zero value and the kernel will print out
      information which will identify the process which is performing disk reads or
      which is dirtying pagecache.
      
      The user should probably disable syslogd before setting block_dump.
      93d33a48
    • [PATCH] don't allow background writes to hide dirty buffers · bd134f27
      Andrew Morton authored
      If pdflush hits a locked-and-clean buffer in __block_write_full_page() it
      will just pass over the buffer.  Typically the buffer is an ext3 data=ordered
      buffer which is being written by kjournald, but a similar thing can happen
      with blockdev buffers and ll_rw_block().
      
      This is bad because the buffer is still under I/O and a subsequent fsync's
      fdatawait() needs to know about it.
      
      It is not practical to tag the page for writeback - only the submitter of the
      I/O can do that, because the submitter has control of the end_io handler.
      
      So instead, redirty the page so a subsequent fsync's fdatawrite() will wait on
      the underway I/O.
      
      There is a risk that pdflush::background_writeout() will lock up, repeatedly
      trying and failing to write the same page.  This is prevented by ensuring
      that background_writeout() always throttles when it made no progress.
      bd134f27
    • [PATCH] stop using the address_space dirty_pages list · 1d7d3304
      Andrew Morton authored
      Move everything over to walking the radix tree via the PAGECACHE_TAG_DIRTY
      tag.  Remove address_space.dirty_pages.
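
      Writeback then finds dirty pages by asking the tree directly, along these
      lines (a sketch; the pagevec setup is elided):

      	nr = radix_tree_gang_lookup_tag(&mapping->page_tree, (void **)pages,
      					index, PAGEVEC_SIZE, PAGECACHE_TAG_DIRTY);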
      1d7d3304
    • [PATCH] tag writeback pages as such in their radix tree · 40c8348e
      Andrew Morton authored
      Arrange for under-writeback pages to be marked thus in their pagecache radix
      tree.
      40c8348e
    • [PATCH] tag dirty pages as such in the radix tree · 8ece6262
      Andrew Morton authored
      Arrange for all dirty pagecache pages to be tagged as dirty within their
      radix tree.
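
      Dirtying a pagecache page now also tags its slot in the tree, roughly
      (tree locking elided):

      	if (!TestSetPageDirty(page))	/* newly dirtied: tag its tree slot too */
      		radix_tree_tag_set(&mapping->page_tree, page->index,
      				   PAGECACHE_TAG_DIRTY);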
      8ece6262
    • [PATCH] make the pagecache lock irq-safe. · 89261aab
      Andrew Morton authored
      Intro to these patches:
      
      - Major surgery against the pagecache, radix-tree and writeback code.  This
        work is to address the O_DIRECT-vs-buffered data exposure horrors which
        we've been struggling with for months.
      
        As a side-effect, 32 bytes are saved from struct inode and eight bytes
        are removed from struct page.  At a cost of approximately 2.5 bits per page
        in the radix tree nodes on 4k pagesize, assuming the pagecache is densely
        populated.  Not all pages are pagecache; other pages gain the full 8 byte
        saving.
      
        This change will break any arch code which is using page->list and will
        also break any arch code which is using page->lru of memory which was
        obtained from slab.
      
        The basic problem which we (mainly Daniel McNeil) have been struggling
        with is in getting a really reliable fsync() across the page lists while
        other processes are performing writeback against the same file.  It's like
        juggling four bars of wet soap with your eyes shut while someone is
        whacking you with a baseball bat.  Daniel pretty much has the problem
        plugged but I suspect that's just because we don't have testcases to
        trigger the remaining problems.  The complexity and additional locking
        which those patches add is worrisome.
      
        So the approach taken here is to remove the page lists altogether and
        replace the list-based writeback and wait operations with in-order
        radix-tree walks.
      
        The radix-tree code has been enhanced to support "tagging" of pages, for
        later searches for pages which have a particular tag set.  This means that
        we can ask the radix tree code "find me the next 16 dirty pages starting at
        pagecache index N" and it will do that in O(log64(N)) time.
      
        This affects I/O scheduling potentially quite significantly.  It is no
        longer the case that the kernel will submit pages for I/O in the order in
        which the application dirtied them.  We instead submit them in file-offset
        order all the time.
      
        This is likely to be advantageous when applications are seeking all over
        a large file randomly writing small amounts of data.  I haven't performed
        much benchmarking, but tiobench random write throughput seems to be
        increased by 30%.  Other tests appear to be unaltered.  dbench may have got
        10-20% quicker, but it's variable.
      
        There is one large file which everyone seeks all over randomly writing
        small amounts of data: the blockdev mapping which caches filesystem
        metadata.  The kernel's IO submission patterns for this are now ideal.
      
      
        Because writeback and wait-for-writeback use a tree walk instead of a
        list walk they are no longer livelockable.  This probably means that we no
        longer need to hold i_sem across O_SYNC writes and perhaps fsync() and
        fdatasync().  This may be beneficial for databases: multiple processes
        writing and syncing different parts of the same file at the same time can
        now all submit and wait upon writes to just their own little bit of the
        file, so we can get a lot more data into the queues.
      
        It is trivial to implement a part-file-fdatasync() as well, so
        applications can say "sync the file from byte N to byte M", and multiple
        applications can do this concurrently.  This is easy for ext2 filesystems,
        but probably needs lots of work for data-journalled filesystems and XFS and
        it probably doesn't offer much benefit over an i_semless O_SYNC write.
      
      
        These patches can end up making ext3 (even) slower:
      
      	for i in 1 2 3 4
      	do
      		dd if=/dev/zero of=$i bs=1M count=2000 &
      	done          
      
        runs awfully slow on SMP.  This is, yet again, because all the file
        blocks are jumbled up and the per-file linear writeout causes tons of
        seeking.  The above test runs sweetly on UP because on UP we don't
        allocate blocks to different files in parallel.
      
        Mingming and Badari are working on getting block reservation working for
        ext3 (preallocation on steroids).  That should fix ext3 up.
      
      
      This patch:
      
      - Later, we'll need to access the radix trees from inside disk I/O
        completion handlers.  So make mapping->page_lock irq-safe.  And rename it
        to tree_lock to reliably break any missed conversions.
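
      A hedged sketch of what the conversion looks like at call sites:

      	/* process context: was spin_lock(&mapping->page_lock) */
      	spin_lock_irq(&mapping->tree_lock);
      	radix_tree_insert(&mapping->page_tree, offset, page);
      	spin_unlock_irq(&mapping->tree_lock);

      	/* and, later in the series, from I/O completion context: */
      	spin_lock_irqsave(&mapping->tree_lock, flags);
      	radix_tree_tag_clear(&mapping->page_tree, page->index,
      			     PAGECACHE_TAG_WRITEBACK);
      	spin_unlock_irqrestore(&mapping->tree_lock, flags);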
      89261aab
    • [PATCH] Fix race between ll_rw_block() and block_write_full_page() · c2179a48
      Andrew Morton authored
      Fix a race which was identified by Daniel McNeil <daniel@osdl.org>
      
      If a buffer_head is under I/O due to JBD's ordered data writeout (which uses
      ll_rw_block()) then either filemap_fdatawrite() or filemap_fdatawait() need
      to wait on the buffer's existing I/O.
      
      Presently neither will do so, because __block_write_full_page() will not
      actually submit any I/O and will hence not mark the page as being under
      writeback.
      
      The best-performing fix would be to somehow mark the page as being under
      writeback and defer waiting for the ll_rw_block-initiated I/O until
      filemap_fdatawait()-time.  But this is hard, because in
      __block_write_full_page() we do not have control of the buffer_head's end_io
      handler.  Possibly we could make JBD call into end_buffer_async_write(), but
      that gets nasty.
      
      This patch makes __block_write_full_page() wait for any buffer_head I/O to
      complete before inspecting the buffer_head state.  It only does this in the
      case where __block_write_full_page() was called for a "data-integrity" write:
      (wbc->sync_mode != WB_SYNC_NONE).
      
      Probably it doesn't matter, because kjournald is currently submitting (or has
      already submitted) all dirty buffers anyway.
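
      The core of the change in __block_write_full_page(), roughly (the
      else-branch bookkeeping is illustrative):

      	if (wbc->sync_mode != WB_SYNC_NONE) {
      		/* data-integrity write: wait for I/O that ll_rw_block() started */
      		lock_buffer(bh);
      	} else if (test_set_buffer_locked(bh)) {
      		/* best-effort writeback: don't wait, leave this buffer for later */
      		redirty_page = 1;
      	}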
      c2179a48
  10. 19 Mar, 2004 1 commit
    • [PATCH] Hotplug CPUs: Other CPU_DEAD Notifiers · 279ce7b2
      Rusty Russell authored
      Various files keep per-cpu caches which need to be freed/moved when a
      CPU goes down.  All under CONFIG_HOTPLUG_CPU ifdefs.
      
      scsi.c: drain dead cpu's scsi_done_q onto this cpu.
      
      buffer.c: brelse the bh_lrus queue for dead cpu.
      
      timer.c: migrate timers from dead cpu, being careful of lock order vs
      	__mod_timer.
      
      radix_tree.c: free dead cpu's radix_tree_preloads
      
      page_alloc.c: empty dead cpu's nr_pagecache_local into nr_pagecache, and
      	free pages on cpu's local cache.
      
      slab.c: stop reap_timer for dead cpu, adjust each cache's free limit, and
      	free each slab cache's per-cpu block.
      
      swap.c: drain dead cpu's lru_add_pvecs into ours, and empty its committed_space
      	counter into global counter.
      
      dev.c: drain device queues from dead cpu into this one.
      
      flow.c: drain dead cpu's flow cache.
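
      Each of these follows the same CPU notifier pattern; a hedged sketch for
      the buffer.c case (handler names approximate):

      	static int buffer_cpu_notify(struct notifier_block *self,
      				     unsigned long action, void *hcpu)
      	{
      		if (action == CPU_DEAD)		/* CONFIG_HOTPLUG_CPU only */
      			buffer_exit_cpu((long)hcpu);	/* brelse the dead CPU's bh_lrus */
      		return NOTIFY_OK;
      	}

      	static struct notifier_block buffer_nb = {
      		.notifier_call = buffer_cpu_notify,
      	};
      	/* registered at init time with register_cpu_notifier(&buffer_nb) */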
      279ce7b2
  11. 06 Mar, 2004 3 commits
    • [PATCH] Fix nobh_prepare_write() race · b12088bf
      Andrew Morton authored
      Dave Kleikamp <shaggy@austin.ibm.com> points out a race between
      nobh_prepare_write() and end_buffer_read_sync().  end_buffer_read_sync()
      calls unlock_buffer(), waking the nobh_prepare_write() thread, which
      immediately frees the buffer_head.  end_buffer_read_sync() then calls
      put_bh() which decrements b_count for the already freed structure.  The
      SLAB_DEBUG code detects the slab corruption.
      
      We fix this by giving nobh_prepare_write() a private buffer_head end_io
      handler which doesn't touch the buffer's contents after unlocking it.
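
      A hedged sketch of such a handler (name assumed): once the buffer is
      unlocked the waiter may free it, so nothing may touch it afterwards:

      	static void nobh_end_buffer_read(struct buffer_head *bh, int uptodate)
      	{
      		if (uptodate)
      			set_buffer_uptodate(bh);
      		else
      			clear_buffer_uptodate(bh);
      		unlock_buffer(bh);	/* bh may be freed from this point on */
      		/* note: no put_bh(bh) here, unlike end_buffer_read_sync() */
      	}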
      b12088bf
    • [PATCH] CONFIG_LBD fixes · d67c0fd5
      Andrew Morton authored
      From: Eric Sandeen <sandeen@sgi.com>
      
      Several functions in buffer.c are using unsigned long where they should be
      using sector_t.
      
      Also, use pgoff_t in several places so it is easier to tell what is being used
      as a pagecache index, what is being used as a disk index and what is being
      used as an offset-into-page.
      d67c0fd5
    • [PATCH] fastcall / regparm fixes · 20e39386
      Andrew Morton authored
      From: Gerd Knorr <kraxel@suse.de>
      
      Current gcc versions error out if a function's declaration and definition
      disagree about the register passing convention.
      
      The patch adds a new `fastcall' declaration primitive, and uses that in all
      the FASTCALL functions which we could find.  A number of inconsistencies were
      fixed up along the way.
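
      A sketch of the idea (the macro expansion shown is an i386 assumption, and
      my_helper is hypothetical): both the declaration and the definition carry
      the annotation, so gcc sees the same calling convention in both places:

      	/* roughly: #define fastcall __attribute__((regparm(3))) on i386 */

      	fastcall void my_helper(unsigned long arg);	/* declaration */

      	fastcall void my_helper(unsigned long arg)	/* definition agrees */
      	{
      		/* ... */
      	}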
      20e39386
  12. 18 Feb, 2004 1 commit
    • [PATCH] Remove More Unneccessary CPU Notifiers · 79caa7d5
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      Three more removed CPU notifiers extracted from the hotplug CPU patch.
      
      kernel/softirq.c: the tasklet cpu preparation callback is useless:
      the vectors are already initialized to NULL.  Even with the hotplug
      CPU patches, they're of little or no use.
      
      fs/buffer.c: once again, they are already initialized to zero.
      
      mm/page_alloc.c: once again, already initialized to zero.
      79caa7d5
  13. 20 Jan, 2004 1 commit
  14. 19 Jan, 2004 3 commits
    • [PATCH] Use for_each_cpu() Where It's Meant To Be · 012061cc
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      Some places use cpu_online() where they should be using cpu_possible, most
      commonly for tallying statistics.  This makes no difference without hotplug
      CPU.
      
      Use the for_each_cpu() macro in those places, providing good examples (and
      making the external hotplug CPU patch smaller).
      012061cc
    • [PATCH] make try_to_free_pages walk zonelist · d5d4042d
      Andrew Morton authored
      From: Rik van Riel <riel@surriel.com>
      
      In 2.6.0 both __alloc_pages() and the corresponding wakeup_kswapd()s walk
      all zones in the zone list, possibly spanning multiple nodes in a low numa
      factor system like AMD64.
      
      Also, if lower_zone_protection is set in /proc, then it may be possible
      that kswapd never cleans out data in zones further down the zonelist and
      try_to_free_pages needs to do that.
      
      However, in 2.6.0 try_to_free_pages() only frees pages in the pgdat the
      first zone in the zonelist belongs to.
      
      This is probably the wrong behaviour, since both the page allocator and the
      kswapd wakeup free things from all zones on the zonelist.  The following
      patch makes try_to_free_pages() consistent with the allocator, by passing
      the zonelist as an argument and freeing pages from all zones in the list.
      
      I do not have any numa systems myself, so I have only tested it on my own
      little smp box.  Testing on NUMA systems may be useful, though the patch
      really only should have an impact in those rare cases where kswapd can't
      keep up with allocations...
      
      As a side effect, the patch shrinks the kernel by 2 lines and replaces some
      subtle magic by a simpler array walk.
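
      The interface change, roughly (prototype and caller approximated):

      	/* int try_to_free_pages(struct zone **zones, unsigned int gfp_mask,
      	 *                       unsigned int order);                        */

      	/* caller side, e.g. in __alloc_pages(): hand over the whole zonelist */
      	try_to_free_pages(zonelist->zones, gfp_mask, order);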
      d5d4042d
    • [PATCH] bdev: use correct mapping's i_sem · 54df7662
      Andrew Morton authored
      From: viro@parcelfarce.linux.theplanet.co.uk <viro@parcelfarce.linux.theplanet.co.uk>
      
      In a bunch of places we used file->f_dentry->d_inode->i_sem to protect
      fdatasync et al.  Replaced with the correct file->f_mapping->host->i_sem - the
      object we are protecting is address_space, so we want an exclusion that would
      work for redirected ->i_mapping.  For normal files (not coda, not bdev) it's
      all the same, of course - there we have
      
       	file->f_mapping->host == file->f_dentry->d_inode
      
      and the change above is an equivalent transformation.
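
      In code terms the swap is (sketch):

      	/* before */
      	down(&file->f_dentry->d_inode->i_sem);

      	/* after: lock the inode that owns the (possibly redirected) mapping */
      	down(&file->f_mapping->host->i_sem);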
      54df7662
  15. 30 Dec, 2003 1 commit
  16. 29 Sep, 2003 1 commit
  17. 19 Aug, 2003 3 commits
    • [PATCH] async write errors: fix spurious fs truncate errors · e89061de
      Andrew Morton authored
      From: Oliver Xymoron <oxymoron@waste.org>
      
      Currently, a writepage() which detects that it is writing outside i_size (due
      to concurrent truncate) will abandon the write, returning -EIO.
      
      The return value will bogusly cause an error to be recorded in the
      address_space.  So convert all those writepage() instances to return zero in
      this case.
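
      So each ->writepage instance ends up with something along these lines
      (end_index is derived from i_size in the surrounding code):

      	if (page->index >= end_index) {		/* concurrent truncate won the race */
      		unlock_page(page);
      		return 0;			/* was: return -EIO */
      	}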
      e89061de
    • [PATCH] async write errors: use flags in address space · fcad2b42
      Andrew Morton authored
      From: Oliver Xymoron <oxymoron@waste.org>
      
      This patch just saves a few bytes in the inode by turning mapping->gfp_mask
      into an unsigned long mapping->flags.
      
      The mapping's gfp mask is placed in the 16 high bits of mapping->flags and
      two of the remaining 16 bits are used for tracking EIO and ENOSPC errors.
      
      This leaves 14 bits in the mapping for future use.  They should be accessed
      with the atomic bitops.
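
      A hedged sketch of the encoding (bit positions and helper names are
      assumptions, not the patch's):

      	#define AS_EIO		0	/* async EIO seen on this mapping */
      	#define AS_ENOSPC	1	/* async ENOSPC seen on this mapping */

      	static inline int mapping_gfp_mask(struct address_space *mapping)
      	{
      		return (int)(mapping->flags >> 16);	/* gfp mask in the high 16 bits */
      	}

      	static inline void mapping_set_error(struct address_space *mapping, int error)
      	{
      		if (error == -ENOSPC)
      			set_bit(AS_ENOSPC, &mapping->flags);
      		else if (error)
      			set_bit(AS_EIO, &mapping->flags);
      	}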
      fcad2b42
    • [PATCH] async write errors: report truncate and io errors on · fe7e689f
      Andrew Morton authored
      From: Oliver Xymoron <oxymoron@waste.org>
      
      These patches add the infrastructure for reporting asynchronous write errors
      to block devices to userspace.  Errors which are detected due to pdflush or VM
      writeout are reported at the next fsync, fdatasync, or msync on the given
      file, and on close if the error occurs in time.
      
      We do this by propagating any errors into page->mapping->error when they are
      detected.  In fsync(), msync(), fdatasync() and close() we return that error
      and zero it out.
      
      
      The Open Group say close() _may_ fail if an I/O error occurred while reading
      from or writing to the file system.  Well, in this implementation close() can
      return -EIO or -ENOSPC.  And in that case it will succeed, not fail - perhaps
      that is what they meant.
      
      
      There are three patches in this series and testing has only been performed
      with all three applied.
      fe7e689f
  18. 06 Aug, 2003 1 commit
    • [PATCH] remove PF_READAHEAD · 7ec6fb01
      Andrew Morton authored
      The problem with PF_READAHEAD is that if someone does a non-GFP_ATOMIC memory
      allocation we can enter page reclaim and then call writepage, while
      PF_READAHEAD is set.  The block layer then drops writes or the wrong reads on
      the floor.  It can cause data loss.
      
      A fix is complex (well, intrusive).  Given that the readahead code is now
      skipping the entire readahead attempt if the queue is congested, the setting
      of PF_READAHEAD probably is not doing anything useful anyway, so simply
      remove it.
      7ec6fb01