[PATCH] permit direct IO with finer-than-fs-blocksize alignments
Mainly from Badari Pulavarty

Traditionally we have only supported O_DIRECT I/O at an alignment and granularity which matches the underlying filesystem. That typically means that all IO must be 4k-aligned and a multiple of 4k in size.

Here, we relax that so that direct I/O can be performed with (typically) 512-byte alignment and a multiple-of-512-byte size.

The tricky part is when a write starts and/or ends partway through a filesystem block which has just been added. We need to zero out the parts of that block which lie outside the written region. We handle that by putting appropriately-sized parts of the ZERO_PAGE into separate BIOs.

The generic_direct_IO() function has been changed so that the filesystem must pass in the address of the block_device against which the IO is to be performed. I'd have preferred not to do this, but we do need that info at that time so that alignment checks can be performed.

If the filesystem passes in a NULL block_device pointer then we fall back to the old behaviour: IO must be aligned to the fs blocksize.

There is no trivial way for userspace to know what the minimum alignment is: it depends on what bdev_hardsect_size() says about the device. It is _usually_ 512 bytes, but not always. This introduces the risk that someone will develop and test applications which work fine on their hardware, but fail on someone else's hardware.

It is possible to query the hardsect size using the BLKSSZGET ioctl against the backing block device. This can be performed at runtime or at application installation time.
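A minimal userspace sketch of the points above, in C. It assumes, as the patch's alignment checks appear to, that the file offset, the user buffer address and the transfer length must each be a multiple of the device's hardware sector size. The names query_sector_size(), dio_aligned(), dio_zero_ranges() and struct zero_ranges are hypothetical helpers for illustration, not kernel or libc APIs. dio_zero_ranges() only shows the arithmetic of which bytes of a newly added block would need zeroing; the kernel does the actual zeroing with ZERO_PAGE BIOs, not with code like this.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKSSZGET */

/* Hypothetical helper: ask the block device for its logical (hardware)
 * sector size with the BLKSSZGET ioctl. Returns bytes, or -1 on error. */
int query_sector_size(const char *bdev_path)
{
    int fd = open(bdev_path, O_RDONLY);
    if (fd < 0)
        return -1;

    int ssz = 0;
    int rc = ioctl(fd, BLKSSZGET, &ssz);   /* BLKSSZGET fills an int */
    close(fd);
    return rc < 0 ? -1 : ssz;
}

/* Hypothetical helper: the assumed O_DIRECT rule, i.e. buffer address,
 * file offset and length must all be multiples of the sector size. */
bool dio_aligned(uintptr_t buf, off_t offset, size_t len, unsigned int ssz)
{
    /* Sector sizes are powers of two, so a mask test suffices. */
    assert(ssz != 0 && (ssz & (ssz - 1)) == 0);
    uintptr_t mask = ssz - 1;

    return (buf & mask) == 0 &&
           ((uint64_t)offset & mask) == 0 &&
           (len & mask) == 0;
}

/* Hypothetical result type: byte ranges to zero around a sub-block write. */
struct zero_ranges {
    uint64_t head_start, head_len;   /* before the write, in its first block */
    uint64_t tail_start, tail_len;   /* after the write, in its last block */
};

/* Assumes the first and last fs blocks touched by [pos, pos + len) were
 * just allocated, so any bytes of them outside the write must be zeroed. */
struct zero_ranges dio_zero_ranges(uint64_t pos, uint64_t len, uint64_t blksz)
{
    struct zero_ranges z;
    uint64_t end = pos + len;

    z.head_start = pos - (pos % blksz);     /* start of the first block */
    z.head_len = pos % blksz;               /* bytes before the write */
    z.tail_start = end;                     /* first byte after the write */
    z.tail_len = (end % blksz) ? blksz - (end % blksz) : 0;
    return z;
}
```

An application could call query_sector_size() on the backing device once, at install time or at startup, then allocate its buffers with posix_memalign() using that size and check each request with dio_aligned(), rather than hard-coding 512. For example, a 1024-byte write at offset 512 into a fresh 4096-byte block needs bytes 0..511 and 1536..4095 of that block zeroed.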