fs/ext3/inode.c · fb796d31fc22175fe702c3890effb77b60c08605 · Kirill Smelkov / linux

[PATCH] permit direct IO with finer-than-fs-blocksize alignments · 4a4c6811

Andrew Morton authored Oct 28, 2002

Mainly from Badari Pulavarty

Traditionally we have only supported O_DIRECT I/O at an alignment and
granularity which matches the underlying filesystem.  That typically
means that all IO must be 4k-aligned and a multiple of 4k in size.

Here, we relax that so that direct I/O happens with (typically)
512-byte alignment and multiple-of-512-byte size.

The tricky part is when a write starts and/or ends partway through a
filesystem block which has just been added.  We need to zero out the
parts of that block which lie outside the written region.

We handle that by putting appropriately-sized parts of the ZERO_PAGE
into sepatate BIOs.

The generic_direct_IO() function has been changed so that the
filesystem must pass in the address of the block_device against which
the IO is to be performed.  I'd have preferred to not do this, but we
do need that info at that time so that alignment checks can be
performed.

If the filesystem passes in a NULL block_device pointer then we fall
back to the old behaviour - must align with the fs blocksize.

There is no trivial way for userspace to know what the minimum
alignment is - it depends on what bdev_hardsect_size() says about the
device.  It is _usually_ 512 bytes, but not always.  This introduces
the risk that someone will develop and test applications which work
fine on their hardware, but will fail on someone else's hardware.

It is possible to query the hardsect size using the BLKSSZGET ioctl
against the backing block device.  This can be performed at runtime or
at application installation time.

4a4c6811

inode.c 83.4 KB

Replace inode.c