[PATCH] page writeback locking update
- Fixes a performance problem - callers of prepare_write/commit_write, etc are locking pages, which synchronises them behind writeback, which also locks these pages. Significant slowdowns for some workloads. - So pages are no longer locked while under writeout. Introduce a new PG_writeback and associated infrastructure to support this design change. - Pages which are under read I/O still use PageLocked. Pages which are under write I/O have PageWriteback() true. I considered creating Page_IO instead of PageWriteback, and marking both readin and writeout pages as PageIO(). So pages are unlocked during both read and write. There just doesn't seem a need to do this - nobody ever needs unblocking access to a page which is under read I/O. - Pages under swapout (brw_page) are PageLocked, not PageWriteback. So their treatment is unchangeded. It's not obvious that pages which are under swapout actually need the more asynchronous behaviour of PageWriteback. I was setting the swapout pages PageWriteback and unlocking them prior to submitting the buffers in brw_page(). This led to deadlocks on the exit_mmap->zap_page_range->free_swap_and_cache path. These functions call block_flushpage under spinlock. If the page is unlocked but has locked buffers, block_flushpage->discard_buffer() sleeps. Under spinlock. So that will need fixing if for some reason we want swapout to use PageWriteback. Kernel has called block_flushpage() under spinlock for a long time. It is assuming that a locked page will never have locked buffers. This appears to be true, but it's ugly. - Adds new function wait_on_page_writeback(). Renames wait_on_page() to wait_on_page_locked() to remind people that they need to call the appropriate one. - Renames filemap_fdatasync() to filemap_fdatawrite(). It's more accurate - "sync" implies, if anything, writeout and wait. (fsync, msync) Or writeout. it's not clear. - Subtly changes the filemap_fdatawrite() internals - this function used to do a lock_page() - it waited for any other user of the page to let go before submitting new I/O against a page. It has been changed to simply skip over any pages which are currently under writeback. This is the right thing to do for memory-cleansing reasons. But it's the wrong thing to do for data consistency operations (eg, fsync()). For those operations we must ensure that all data which was dirty *at the time of the system call* are tight on disk before the call returns. So all places which care about this have been converted to do: filemap_fdatawait(mapping); /* Wait for current writeback */ filemap_fdatawrite(mapping); /* Write all dirty pages */ filemap_fdatawait(mapping); /* Wait for I/O to complete */ - Fixes a truncate_inode_pages problem - truncate currently will block when it hits a locked page, so it ends up getting into lockstep behind writeback and all of the file is pointlessly written back. One fix for this is for truncate to simply walk the page list in the opposite direction from writeback. I chose to use a separate cleansing pass. It is more CPU-intensive, but it is surer and clearer. This is because there is no reason why the per-address_space ->vm_writeback and ->writeback_mapping functions *have* to perform writeout in ->dirty_pages order. They may choose to do something totally different. (set_page_dirty() is an a_op now, so address_spaces could almost privatise the whole dirty-page handling thing. Except truncate_inode_pages and invalidate_inode_pages assume that the pages are on the address_space lists. hmm. So making truncate_inode_pages and invalidate_inode_pages a_ops would make some sense).
Showing
Please register or sign in to comment