Commits · 4a4c6811f4fb8aa7f59fbb04c678e48d080e1071 · nexedi / linux

29 Oct, 2002 7 commits

[PATCH] permit direct IO with finer-than-fs-blocksize alignments · 4a4c6811

Andrew Morton authored Oct 28, 2002

Mainly from Badari Pulavarty

Traditionally we have only supported O_DIRECT I/O at an alignment and
granularity which matches the underlying filesystem.  That typically
means that all IO must be 4k-aligned and a multiple of 4k in size.

Here, we relax that so that direct I/O happens with (typically)
512-byte alignment and multiple-of-512-byte size.

The tricky part is when a write starts and/or ends partway through a
filesystem block which has just been added.  We need to zero out the
parts of that block which lie outside the written region.

We handle that by putting appropriately sized parts of the ZERO_PAGE
into separate BIOs.
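
The zeroing rule above can be sketched in userspace C. Given a direct write covering [pos, pos + len) and a power-of-two filesystem block size, it computes the leading and trailing byte ranges of the affected blocks that the write leaves uncovered. The names `struct zero_ranges` and `compute_zero_ranges()` are hypothetical. This is not the direct-io.c code, and deciding whether a block was newly added (and so needs zeroing at all) is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: byte ranges of newly allocated fs blocks that a
 * direct write [pos, pos + len) leaves uncovered.  If the write spans
 * several blocks, the head range belongs to the first block and the
 * tail range to the last one. */
struct zero_ranges {
	uint64_t head_start, head_len;	/* [first block start, pos) */
	uint64_t tail_start, tail_len;	/* [pos + len, last block end) */
};

static struct zero_ranges compute_zero_ranges(uint64_t pos, uint64_t len,
					      uint64_t blksz)
{
	struct zero_ranges z;
	uint64_t mask = blksz - 1;
	uint64_t end = pos + len;	/* one past the last written byte */

	assert(blksz > 0 && (blksz & mask) == 0);	/* power of two */
	assert(len > 0);

	z.head_start = pos & ~mask;
	z.head_len = pos - z.head_start;	/* 0 when pos is aligned */
	z.tail_start = end;
	z.tail_len = (end & mask) ? blksz - (end & mask) : 0;
	return z;
}
```

In the kernel, each nonzero range would be filled from ZERO_PAGE, and only for a block that was just added; zeroing part of an existing block would destroy data already on disk.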

The generic_direct_IO() function has been changed so that the
filesystem must pass in the address of the block_device against which
the IO is to be performed.  I'd have preferred to not do this, but we
do need that info at that time so that alignment checks can be
performed.

If the filesystem passes in a NULL block_device pointer then we fall
back to the old behaviour - must align with the fs blocksize.
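
A minimal sketch of the alignment rule as a predicate, assuming a single user buffer rather than an iovec. The function name and the convention that `hardsect == 0` means "no block_device was passed in" are assumptions made for illustration; the real check lives in generic_direct_IO() and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: with a known hardware sector size (hardsect != 0)
 * the file offset, length and user buffer address need only be multiples
 * of it; without one we fall back to the filesystem block size. */
static bool dio_request_aligned(uint64_t offset, size_t len, uintptr_t buf,
				unsigned fs_blksz, unsigned hardsect)
{
	unsigned unit = hardsect ? hardsect : fs_blksz;
	uint64_t mask = (uint64_t)unit - 1;

	assert(unit > 0 && (unit & (unit - 1)) == 0);	/* power of two */
	return (offset & mask) == 0 && (len & mask) == 0 &&
	       (buf & mask) == 0;
}
```

For example, a 1 KiB transfer at file offset 512 passes on a device with 512-byte sectors but fails the old fs-blocksize rule on a 4 KiB-block filesystem.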

There is no trivial way for userspace to know what the minimum
alignment is - it depends on what bdev_hardsect_size() says about the
device.  It is _usually_ 512 bytes, but not always.  This introduces
the risk that someone will develop and test applications which work
fine on their hardware, but will fail on someone else's hardware.

It is possible to query the hardsect size using the BLKSSZGET ioctl
against the backing block device.  This can be performed at runtime or
at application installation time.
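
A minimal userspace sketch of that query. BLKSSZGET and its `int` result are the real Linux ioctl interface; the helper name `query_hardsect_size()` is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>	/* BLKSSZGET */
#include <sys/ioctl.h>
#include <unistd.h>

/* Return the hardware sector size of the block device at `path`, or -1
 * if it cannot be opened or does not answer BLKSSZGET (for example,
 * because it is not a block device). */
static int query_hardsect_size(const char *path)
{
	int size = -1;
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (ioctl(fd, BLKSSZGET, &size) < 0)
		size = -1;
	close(fd);
	return size;
}
```

An installer could run this against the device backing the target filesystem (say, `/dev/sda`) and then align its buffers, file offsets and transfer sizes to the returned value, for instance allocating buffers with posix_memalign(). Opening a raw block device usually requires elevated privileges.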

[PATCH] restructure direct-io to suit bio_add_page · a9577554

Andrew Morton authored Oct 28, 2002

The direct IO code was initially designed to allocate a known-sized
BIO, to fill it with pages and to then send it off.

Then along came bio_add_page().  Really, it broke direct-io.c - it
meant that the direct-IO BIO assembly code no longer had a-priori
knowledge of whether a page would fit into the current BIO.

Our attempts to rework the initial design to play well with
bio_add_page() really weren't adequate.  The code was getting more and
more twisty and we kept finding corner-cases which failed.

So this patch redesigns the BIO assembly and submission path of the
direct-IO code so that it better suits the bio_add_page() semantics.

It introduces another layer in the assembly phase: the 'cur_page' which
is cached in the dio structure.

The function which walks the file mapping, do_direct_IO(), simply emits a
sequence of (page,offset,len,sector) quads into the next layer down:
submit_page_section().

submit_page_section() is responsible for looking for a merge of the new
quad against the previous page section (same page).  If no merge is
possible it passes the currently-cached page down to the next level,
dio_send_cur_page().

dio_send_cur_page() will try to add the current page to the current
BIO.  If that fails, the current BIO is submitted for IO and we open a
new one.

So it's all nicely layered.  The assembly of sections-of-page into the
current page closely mirrors the assembly of sections-of-BIO into the
current BIO.

At both of these levels everything is done in a "deferred" manner: try
to merge a new request onto the currently-cached one.  If that fails
then send the currently-cached request and then cache this one instead.
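
The deferred merge-or-send pattern can be sketched with a toy model. `submit_section()`, `send_cur_page()` and `flush_cur()` are simplified, hypothetical stand-ins for submit_page_section() and dio_send_cur_page(): the BIO layer is reduced to a counter, and section lengths are assumed to be multiples of 512 bytes so that the sector arithmetic stays exact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One section of a page, plus where it goes on disk. */
struct section {
	const void *page;
	unsigned offset, len;		/* bytes within the page */
	unsigned long long sector;	/* 512-byte units on disk */
};

/* Hypothetical model of the cached "cur_page" state in struct dio. */
struct dio_model {
	struct section cur;
	bool have_cur;
	int pages_sent;		/* counts send_cur_page() calls */
};

static void send_cur_page(struct dio_model *d)
{
	/* The real layer would try bio_add_page() here and submit the
	 * current BIO if the page does not fit; this model just counts. */
	d->pages_sent++;
	d->have_cur = false;
}

static void submit_section(struct dio_model *d, struct section s)
{
	/* Merge if contiguous in memory (same page, next offset) and on
	 * disk (next sector); assumes cur.len is a multiple of 512. */
	if (d->have_cur && d->cur.page == s.page &&
	    d->cur.offset + d->cur.len == s.offset &&
	    d->cur.sector + d->cur.len / 512 == s.sector) {
		d->cur.len += s.len;
		return;
	}
	if (d->have_cur)
		send_cur_page(d);	/* no merge: send the cached section */
	d->cur = s;			/* and cache the new one instead */
	d->have_cur = true;
}

static void flush_cur(struct dio_model *d)
{
	if (d->have_cur)
		send_cur_page(d);
}
```

The same shape repeats one level down: per the description above, dio_send_cur_page() tries to add the cached page to the current BIO and submits that BIO only when the add fails.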

Some variables have been renamed to more closely represent their usage.

Some thought has been put into ownership of the various state variables
within `struct dio'.  We were updating and inspecting these in various
places in a rather hard-to-follow manner.  So things have been reworked
so that particular functions "own" particular parts of the dio
structure.  Violators have been exterminated and commentary has been
added to describe this ownership.

The handling of file holes has been simplified.

As a consequence of all this, the code is clearer and simpler than it
used to be, and it now passes the modified-for-O_DIRECT fsx-linux
testing again.

[PATCH] invalidate_inode_pages fixes · caa2f807

Andrew Morton authored Oct 28, 2002

Two fixes here.

First:

Fixes a BUG() which occurs if you try to perform O_DIRECT IO against a
blockdev which has an fs mounted on it.  (We should be able to do
that).

What happens is that do_invalidatepage() ends up calling
discard_buffer() on buffers which it couldn't strip.  That clears
buffer_mapped() against useful things like the superblock buffer_head.
The next submit_bh() goes BUG over the write of an unmapped buffer.

So just run try_to_release_page() (aka try_to_free_buffers()) on the
invalidate path.


Second:

The invalidate_inode_pages() functions are best-effort pagecache
shrinkers.  They are used against pages inside i_size and are not
supposed to throw away dirty data.

However it is possible for another CPU to run set_page_dirty() against
one of these pages after invalidate_inode_pages() has decided that it
is clean.  This could happen if someone was performing O_DIRECT IO
against a file which was also mapped with MAP_SHARED.

So recheck the dirty state of the page inside the mapping->page_lock
and back out if the page has just been marked dirty.
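
The recheck can be sketched as a userspace analogue, with a pthread mutex standing in for mapping->page_lock. `struct cache_page`, `model_set_page_dirty()` and `try_invalidate()` are hypothetical names; the point is only that the clean/dirty decision made before taking the lock is repeated while holding it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical page model; page_lock stands in for mapping->page_lock. */
struct cache_page {
	atomic_bool dirty;
	bool in_cache;		/* protected by page_lock */
};

static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;

/* Another CPU dirtying the page (e.g. through a MAP_SHARED mapping). */
static void model_set_page_dirty(struct cache_page *p)
{
	pthread_mutex_lock(&page_lock);
	atomic_store(&p->dirty, true);
	pthread_mutex_unlock(&page_lock);
}

/* Best-effort invalidation: the unlocked check may be stale by the time
 * the lock is taken, so the dirty state is checked again under the lock
 * and the page is left alone if it was dirtied in between. */
static bool try_invalidate(struct cache_page *p)
{
	bool removed = false;

	if (atomic_load(&p->dirty))	/* early, unlocked decision */
		return false;

	pthread_mutex_lock(&page_lock);
	if (!atomic_load(&p->dirty)) {	/* recheck under the lock */
		p->in_cache = false;
		removed = true;
	}
	pthread_mutex_unlock(&page_lock);
	return removed;
}
```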

This will also prevent the remove_from_page_cache() BUG which will occur
if someone marks the page dirty between the clear_page_dirty() and
remove_from_page_cache() calls in truncate_complete_page().

[PATCH] libfs a_ops correctness · 303c9cf6

Andrew Morton authored Oct 28, 2002

simple_prepare_write() currently memsets the entire page.  It only
needs to clear the parts which are outside the to-be-written region.
This change makes no difference to performance - that memset was just a
cache preload for the copy_from_user() in generic_file_write().  But
it's more correct.

Also, mark the page dirty in simple_commit_write(), not in
simple_prepare_write(), because the page's contents are changed after
prepare_write().  This doesn't matter in practice, but it sets a bad
example.

Also, add a flush_dcache_page() to simple_prepare_write().  Again, not
really needed because the page cannot be mapped into pagetables if it
is not uptodate.  But it is example code and should not be missing such
things.
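
A userspace sketch of the corrected prepare/commit split, assuming 4 KiB pages and a simplified page structure. `struct lib_page` and the `model_*` functions are hypothetical, not the libfs code: only the bytes outside [from, to) are zeroed, and the dirty bit is set at commit time.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define LIB_PAGE_SIZE 4096	/* assumption: 4 KiB pages */

struct lib_page {
	char data[LIB_PAGE_SIZE];
	bool uptodate;
	bool dirty;
};

/* prepare_write analogue: for a page that is not yet uptodate, zero only
 * the bytes outside [from, to); the caller then copies user data in.
 * (The kernel version also calls flush_dcache_page() at this point.) */
static void model_prepare_write(struct lib_page *p, unsigned from,
				unsigned to)
{
	assert(from <= to && to <= LIB_PAGE_SIZE);
	if (!p->uptodate) {
		memset(p->data, 0, from);
		memset(p->data + to, 0, LIB_PAGE_SIZE - to);
	}
}

/* commit_write analogue: the contents changed after prepare, so this is
 * where the page is marked dirty (and, in this model, uptodate). */
static void model_commit_write(struct lib_page *p)
{
	p->uptodate = true;
	p->dirty = true;
}
```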

[PATCH] move ramfs a_ops into libfs · 3ee477f0

Andrew Morton authored Oct 28, 2002

From Bill Irwin.

Abstract out ramfs readpage(), prepare_write(), and commit_write()
operations.

Ram-backed filesystems are going to be doing a lot of zero-filled read
and write operations.  So in this patch, ramfs' implementations are
moved to libfs in anticipation of other callers.

[PATCH] blkdev_get_block fix · f596aeef

Andrew Morton authored Oct 28, 2002

Patch from Hugh Dickins <hugh@veritas.com>

Fix premature -EIO from blkdev_get_block: make bdget initialize
bd_block_size consistently with bd_inode->i_blkbits (assigned by
new_inode).  Otherwise, a subsequent set_blocksize can find that
bd_block_size doesn't need updating and skip updating i_blkbits,
leaving the two inconsistent.
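
The consistency problem can be modelled with two fields. The structure and functions below are hypothetical simplifications of bdget() and set_blocksize(): the early return in `model_set_blocksize()` is safe only if `bd_block_size` and `i_blkbits` already agree, which is what initialising one from the other guarantees.

```c
#include <assert.h>

/* Hypothetical model of the two fields that must agree. */
struct model_bdev {
	unsigned bd_block_size;	/* bytes */
	unsigned i_blkbits;	/* log2 of the block size, on bd_inode */
};

static unsigned blksize_bits_of(unsigned size)
{
	unsigned bits = 0;

	while ((1u << bits) < size)
		bits++;
	return bits;
}

/* Fixed initialisation: derive bd_block_size from i_blkbits. */
static void model_bdget_init(struct model_bdev *b, unsigned i_blkbits)
{
	b->i_blkbits = i_blkbits;
	b->bd_block_size = 1u << i_blkbits;
}

/* set_blocksize analogue: skips the update when the size looks
 * unchanged, leaving i_blkbits stale if the fields disagreed. */
static void model_set_blocksize(struct model_bdev *b, unsigned size)
{
	if (b->bd_block_size != size) {
		b->bd_block_size = size;
		b->i_blkbits = blksize_bits_of(size);
	}
}
```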

[PATCH] fix dmi compile warning · ba3d6419

Andrew Morton authored Oct 28, 2002

Local variable `data' is only used for debugging.

28 Oct, 2002 33 commits

[PATCH] remove LVM1 leftovers from the tree · 2829a935

Christoph Hellwig authored Oct 28, 2002

Now that the device-mapper has hit the tree, there's no reason to keep
the non-compiling LVM1 code and its various hacks to other files
around.  This patch removes them.

[PATCH] ide-{disk,cd,...} got separate block_device_operations · adf283f2

Alexander Viro authored Oct 28, 2002

	* first application of the fact that block device methods are
per-disk and not per-major: IDE subdrivers got block_device_operations
of their own, the redirects in ide.c are gone, and so are a number of
IDE subdriver methods.

[PATCH] ide-taskfile ioctls prototype cleanup · f3da61af

Alexander Viro authored Oct 28, 2002

	* ide_..._ioctl() never use two of their five arguments (inode and
file).  Arguments removed.

[PATCH] IO counters - per-disk part · 5ddfdaad

Alexander Viro authored Oct 28, 2002

[PATCH] IO counters - per-partition part · 2103a00b

Alexander Viro authored Oct 28, 2002

	This chunk and the next one basically do the equivalent of sard, but
the right way: counters are exported per-disk in driverfs, as attributes
of disk or partition nodes.

[PATCH] dasd fixes · 5cdeb2cc

Alexander Viro authored Oct 28, 2002

[PATCH] block_device_operations always picked from gendisk · eaa0bfbd

Alexander Viro authored Oct 28, 2002

	* do_open() cleaned up
	* we always pick block_device_operations from gendisk->fops now
	* register_blkdev() just stores the name of driver, nothing more
	* ->bd_op and ->bd_queue removed - we have that in gendisk
	* get_blkfops() is gone
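
A toy illustration of per-disk dispatch, in which every disk carries its own operations table and open() goes through it rather than through a per-major lookup. All names here are hypothetical, and the structures are far smaller than the real gendisk and block_device_operations.

```c
#include <assert.h>
#include <stddef.h>

struct model_gendisk;

/* Hypothetical, minimal operations table. */
struct model_bdops {
	int (*open)(struct model_gendisk *disk);
};

/* Hypothetical, minimal disk: it owns a pointer to its operations. */
struct model_gendisk {
	const char *name;
	const struct model_bdops *fops;
	int opens;
};

static int generic_open(struct model_gendisk *disk)
{
	disk->opens++;
	return 0;
}

static const struct model_bdops disk_ops = { .open = generic_open };

/* do_open analogue: the methods come from the disk itself. */
static int model_do_open(struct model_gendisk *disk)
{
	assert(disk != NULL);
	if (!disk->fops || !disk->fops->open)
		return 0;	/* no open method: nothing to do */
	return disk->fops->open(disk);
}
```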

[PATCH] saner initialization order in IDE (gendisks allocated slightly earlier) · 0b0f135d

Alexander Viro authored Oct 28, 2002

	* we move allocation of gendisks in ide-probe to the moment when
queues are set up, so everything that wants to feed requests into one of
the IDE queues can safely set ->rq_disk

[PATCH] removed a bunch of gratuitous kdev_t uses · 5e40b913

Alexander Viro authored Oct 28, 2002

[PATCH] presto cache keyed by superblock instead of kdev_t · 71f73bd1

Alexander Viro authored Oct 28, 2002

[PATCH] r/o state moved to gendisks · 4d466c1f

Alexander Viro authored Oct 28, 2002

[PATCH] randomness made per-disk · 1bec5152

Alexander Viro authored Oct 28, 2002

	* per-major array eliminated, every disk is a separate source of
randomness

[PATCH] removed a bunch of gratuitous ->rq_dev uses · 288ed82d

Alexander Viro authored Oct 28, 2002

[PATCH] blk_dev[] is gone · d5f24b98

Alexander Viro authored Oct 28, 2002

	* remove blk_dev[]
	* removed BLK_DEFAULT_QUEUE
	* moved definition of CURRENT into drivers that used it
	* removed definition of QUEUE from headers

[PATCH] mtdblock_ro fixes (based on patch from rmk) · 996395f2

Alexander Viro authored Oct 28, 2002

	* compile fixes
	* switched to private queue
	* set ->queue

[PATCH] swim3.c cleanup · 0ffbe56a

Alexander Viro authored Oct 28, 2002

	* killed uses of CURRENT and QUEUE

[PATCH] dasd.c · 0fe11f99

Alexander Viro authored Oct 28, 2002

	* switched to private queues
	* set ->queue

[PATCH] ftl.c fix · 042e7783

Alexander Viro authored Oct 28, 2002

	* killed remaining CURRENT

[PATCH] xd.c · 7473f096

Alexander Viro authored Oct 28, 2002

	* switched to private queues
	* set ->queue and ->private_data
	* switched to use of ->bd_disk and ->rq_disk
	* cleaned up

[PATCH] hd.c · 9bc1bce3

Alexander Viro authored Oct 28, 2002

	* switched to private queues
	* set ->queue and ->private_data
	* switched to use of ->bd_disk and ->rq_disk
	* folded recalibrate[] and special_op[] into hd_info[]
	* switched to passing pointers instead of indices
	* cleaned up

[PATCH] mtdblock (based on a patch from rmk) · ac8fc838

Alexander Viro authored Oct 28, 2002

	* switched to private queues
	* set ->queue

[PATCH] nftl · 836d901c

Alexander Viro authored Oct 28, 2002

	* switched to private queues
	* set ->queue and ->private_data
	* switched to use of ->bd_disk and ->rq_disk
	* fixed the problem with request_module() from open()
	* cleaned up

[PATCH] ps2esdi · e8782ed7

Alexander Viro authored Oct 28, 2002

	* switched to private queues
	* set ->queue and ->private_data
	* switched to use of ->bd_disk and ->rq_disk
	* somewhat cleaned up

[PATCH] xpram · dc931f53

Alexander Viro authored Oct 28, 2002

	* switched to private queues
	* set ->queue and ->private_data
	* switched to use of ->bd_disk

[PATCH] z2ram · c14f2880

Alexander Viro authored Oct 28, 2002

	* switched to private queues
	* set ->queue
	* cleaned up

[PATCH] rd · 624a75e8

Alexander Viro authored Oct 28, 2002

	* switched to private queues
	* set ->queue
	* cleaned up

[PATCH] A couple of compile fixes · 7bc7ae9e

Alexander Viro authored Oct 28, 2002

Merge mulgrave.(none):/home/jejb/BK/linux-2.5 · 7d9f8b15

James Bottomley authored Oct 28, 2002

into mulgrave.(none):/home/jejb/BK/scsi-for-linus-2.5

patch up scsi mismerge · 4046ce17

James Bottomley authored Oct 28, 2002

Massive merge by hand for changes duplicated independently · 7f40154f

James Bottomley authored Oct 28, 2002

by Doug Ledford, hch and alan.

ia-64 kcore changes broke i386. Guess who gets the shaft? · 43756209

Linus Torvalds authored Oct 28, 2002

[PATCH] sparc64 read_barrier_depends fix · c45026dc

Andrew Morton authored Oct 28, 2002

From Dipankar

I missed sparc64 when I broke up read_barrier_depends in -mm and sent it
to Linus.  Please apply this to your tree until Linus is back and I can
fix it.

[PATCH] RCU idle detection fix · 3bf97e49

Andrew Morton authored Oct 28, 2002

Patch from Dipankar Sarma <dipankar@in.ibm.com>

RCU treats an idle CPU as being in a quiescent state (and hence as
holding no references to RCU-protected data).  The check for idle CPUs
was broken when interrupt counters were changed to use
thread_info->preempt_count.

Martin's 32-CPU machine with many idle CPUs was not completing any RCU
grace period because RCU was forever waiting for idle CPUs to context
switch.  Had the idle check worked, this would not have happened.  With
no RCU grace periods completing, the dentries were getting "freed" (as
the dentry stats showed) but were not being returned to the slab.  This
would not show up on systems that are generally busy, because context
switches would then happen on all CPUs and the per-CPU quiescent state
counter would get incremented during each context switch.
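
The failure mode can be sketched with a toy grace-period detector. A CPU counts as quiescent if it has context-switched since a snapshot, or if it is idle and not handling an interrupt; the `idle_check_works` flag simulates the broken idle check. `struct cpu_state`, `cpu_quiescent()` and `grace_period_done()` are hypothetical names, and this is not the kernel's RCU implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS 4

/* Hypothetical per-CPU state for a toy grace-period detector. */
struct cpu_state {
	unsigned long ctxsw;	/* bumped on every context switch */
	bool idle;		/* running the idle task */
	bool in_irq;		/* currently handling an interrupt */
};

/* Quiescent if the CPU has context-switched since the snapshot, or if
 * it is idle and not inside an interrupt handler (when that check works). */
static bool cpu_quiescent(const struct cpu_state *c, unsigned long snap,
			  bool idle_check_works)
{
	if (c->ctxsw != snap)
		return true;
	return idle_check_works && c->idle && !c->in_irq;
}

/* The grace period ends only when every CPU has been quiescent. */
static bool grace_period_done(const struct cpu_state cpus[NCPUS],
			      const unsigned long snaps[NCPUS],
			      bool idle_check_works)
{
	for (int i = 0; i < NCPUS; i++)
		if (!cpu_quiescent(&cpus[i], snaps[i], idle_check_works))
			return false;
	return true;
}
```

With the idle check disabled, the grace period completes only once every idle CPU has also context-switched, which an idle CPU may never do; that matches the stall described above.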
