    [PATCH] permit direct IO with finer-than-fs-blocksize alignments · 4a4c6811
    Andrew Morton authored
    Mainly from Badari Pulavarty
    
    Traditionally we have only supported O_DIRECT I/O at an alignment and
    granularity which matches the underlying filesystem.  That typically
    means that all IO must be 4k-aligned and a multiple of 4k in size.
    
    Here, we relax that so that direct I/O happens with (typically)
    512-byte alignment and multiple-of-512-byte size.
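    
    As a rough illustration of the relaxed rule (not the code in this
    patch), the check on a request can be sketched as below.  The helper
    name is hypothetical, and treating the user buffer address the same
    way as the file offset and length is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: returns true when the
 * buffer address, file offset and length are all multiples of `align`.
 * `align` is assumed to be a power of two (e.g. 512 or 4096).
 */
bool dio_request_aligned(const void *buf, uint64_t offset, size_t len,
                         unsigned int align)
{
    uint64_t mask = (uint64_t)align - 1;

    /* OR the three values together so one mask test covers them all. */
    return (((uint64_t)(uintptr_t)buf | offset | (uint64_t)len) & mask) == 0;
}
```

    With align == 4096 a 1024-byte write at offset 512 is rejected; with
    align == 512 the same request passes.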
    
    The tricky part is when a write starts and/or ends partway through a
    filesystem block which has just been added.  We need to zero out the
    parts of that block which lie outside the written region.
    
    We handle that by putting appropriately-sized parts of the ZERO_PAGE
    into separate BIOs.
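    
    The arithmetic behind the zeroing can be sketched as follows.  This is
    only an illustration: the struct and function names are made up, the
    block size is assumed to be a power of two, and both the first and
    the last block of the write are assumed to be newly added.

```c
#include <stdint.h>

/* Hypothetical description of the two byte ranges that need zeroing. */
struct zero_ranges {
    uint64_t head_start;  /* file offset where the leading zeroes begin  */
    uint64_t head_len;    /* bytes to zero before the written region     */
    uint64_t tail_start;  /* file offset where the trailing zeroes begin */
    uint64_t tail_len;    /* bytes to zero after the written region      */
};

/*
 * Given a write of `len` bytes at file offset `pos`, work out which
 * parts of the first and last filesystem blocks lie outside the write.
 * Assumes `blocksize` is a power of two and that both edge blocks are new.
 */
struct zero_ranges dio_zero_ranges(uint64_t pos, uint64_t len,
                                   uint32_t blocksize)
{
    uint64_t mask = (uint64_t)blocksize - 1;
    uint64_t end = pos + len;
    struct zero_ranges z;

    z.head_start = pos & ~mask;   /* start of the block containing pos */
    z.head_len = pos & mask;      /* gap between block start and pos   */
    z.tail_start = end;
    z.tail_len = (end & mask) ? blocksize - (end & mask) : 0;
    return z;
}
```

    For a 1024-byte write at offset 512 into a fresh 4k block, this gives
    512 leading and 2560 trailing bytes to fill from the ZERO_PAGE.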
    
    The generic_direct_IO() function has been changed so that the
    filesystem must pass in the address of the block_device against which
    the IO is to be performed.  I'd have preferred to not do this, but we
    do need that info at that time so that alignment checks can be
    performed.
    
    If the filesystem passes in a NULL block_device pointer then we fall
    back to the old behaviour - must align with the fs blocksize.
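    
    The fallback rule can be pictured with the sketch below.  It is not
    the kernel code: struct fake_bdev is a hypothetical stand-in for
    struct block_device, and its hardsect_size field stands in for what
    bdev_hardsect_size() would report.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct block_device. */
struct fake_bdev {
    unsigned int hardsect_size;  /* what bdev_hardsect_size() would say */
};

/*
 * Pick the minimum alignment for a direct I/O request:
 * no block device supplied -> old behaviour, align to the fs blocksize;
 * otherwise                -> align to the device's hard sector size.
 */
unsigned int dio_min_alignment(const struct fake_bdev *bdev,
                               unsigned int fs_blocksize)
{
    if (bdev == NULL)
        return fs_blocksize;
    return bdev->hardsect_size;
}
```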
    
    There is no trivial way for userspace to know what the minimum
    alignment is - it depends on what bdev_hardsect_size() says about the
    device.  It is _usually_ 512 bytes, but not always.  This introduces
    the risk that someone will develop and test applications which work
    fine on their hardware, but will fail on someone else's hardware.
    
    It is possible to query the hardsect size using the BLKSSZGET ioctl
    against the backing block device.  This can be performed at runtime or
    at application installation time.
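    
    A minimal userspace sketch of that query follows.  The function name
    is hypothetical and error handling is reduced to returning -1; the
    device path is whatever block device backs the filesystem.

```c
#include <fcntl.h>
#include <linux/fs.h>    /* BLKSSZGET */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the sector size reported by BLKSSZGET
 * for the block device at `dev_path`, or -1 if the device cannot be
 * opened or the ioctl fails (for example, the path is not a block
 * device).
 */
int query_sector_size(const char *dev_path)
{
    int size = 0;
    int fd = open(dev_path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (ioctl(fd, BLKSSZGET, &size) < 0)
        size = -1;
    close(fd);
    return size;
}
```

    An application could call this once at startup and round its O_DIRECT
    buffer addresses, offsets and lengths to the returned value.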