Commit 56a8c8eb authored by Hugh Dickins's avatar Hugh Dickins Committed by Linus Torvalds

tmpfs: do not allocate pages on read

Mikulas asked in "Do we still need commit a0ee5ec5 ('tmpfs: allocate
on read when stacked')?" in [1]

Lukas noticed this unusual behavior of loop device backed by tmpfs in [2].

Normally, shmem_file_read_iter() copies the ZERO_PAGE when reading
holes; but if it looks like it might be a read for "a stacking
filesystem", it allocates actual pages to the page cache, and even marks
them as dirty.  And reads from the loop device do satisfy the test that
is used.

This oddity was added for an old version of unionfs, to help to limit
its usage to the limited size of the tmpfs mount involved; but about the
same time as the tmpfs mod went in (2.6.25), unionfs was reworked to
proceed differently; and the mod kept just in case others needed it.

Do we still need it? I cannot answer with more certainty than "Probably
not".  It's nasty enough that we really should try to delete it; but if
a regression is reported somewhere, then we might have to revert later.

It's not quite as simple as just removing the test (as Mikulas did):
xfstests generic/013 hung because splice from tmpfs failed on page not
up-to-date and page mapping unset.  That can be fixed just by marking
the ZERO_PAGE as Uptodate, which of course it is: do so in
pagecache_init() - it might be useful to others than tmpfs.

My intention, though, was to stop using the ZERO_PAGE here altogether:
surely iov_iter_zero() is better for this case? Sadly not: it relies on
clear_user(), and the x86 clear_user() is slower than its copy_user() [3].

But while we are still using the ZERO_PAGE, let's stop dirtying its
struct page cacheline with unnecessary get_page() and put_page().

Link: https://lore.kernel.org/linux-mm/alpine.LRH.2.02.2007210510230.6959@file01.intranet.prod.int.rdu2.redhat.com/ [1]
Link: https://lore.kernel.org/linux-mm/20211126075100.gd64odg2bcptiqeb@work/ [2]
Link: https://lore.kernel.org/lkml/2f5ca5e4-e250-a41c-11fb-a7f4ebc7e1c9@google.com/ [3]
Link: https://lkml.kernel.org/r/90bc5e69-9984-b5fa-a685-be55f2b64b@google.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
Reported-by: default avatarMikulas Patocka <mpatocka@redhat.com>
Reported-by: default avatarLukas Czerner <lczerner@redhat.com>
Acked-by: default avatarDarrick J. Wong <djwong@kernel.org>
Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
Cc: Zdenek Kabelac <zkabelac@redhat.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
parent bc786390
...@@ -1054,6 +1054,12 @@ void __init pagecache_init(void) ...@@ -1054,6 +1054,12 @@ void __init pagecache_init(void)
init_waitqueue_head(&folio_wait_table[i]); init_waitqueue_head(&folio_wait_table[i]);
page_writeback_init(); page_writeback_init();
/*
* tmpfs uses the ZERO_PAGE for reading holes: it is up-to-date,
* and splice's page_cache_pipe_buf_confirm() needs to see that.
*/
SetPageUptodate(ZERO_PAGE(0));
} }
/* /*
......
...@@ -2499,19 +2499,10 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) ...@@ -2499,19 +2499,10 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
struct address_space *mapping = inode->i_mapping; struct address_space *mapping = inode->i_mapping;
pgoff_t index; pgoff_t index;
unsigned long offset; unsigned long offset;
enum sgp_type sgp = SGP_READ;
int error = 0; int error = 0;
ssize_t retval = 0; ssize_t retval = 0;
loff_t *ppos = &iocb->ki_pos; loff_t *ppos = &iocb->ki_pos;
/*
* Might this read be for a stacking filesystem? Then when reading
* holes of a sparse file, we actually need to allocate those pages,
* and even mark them dirty, so it cannot exceed the max_blocks limit.
*/
if (!iter_is_iovec(to))
sgp = SGP_CACHE;
index = *ppos >> PAGE_SHIFT; index = *ppos >> PAGE_SHIFT;
offset = *ppos & ~PAGE_MASK; offset = *ppos & ~PAGE_MASK;
...@@ -2520,6 +2511,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) ...@@ -2520,6 +2511,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
pgoff_t end_index; pgoff_t end_index;
unsigned long nr, ret; unsigned long nr, ret;
loff_t i_size = i_size_read(inode); loff_t i_size = i_size_read(inode);
bool got_page;
end_index = i_size >> PAGE_SHIFT; end_index = i_size >> PAGE_SHIFT;
if (index > end_index) if (index > end_index)
...@@ -2530,15 +2522,13 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) ...@@ -2530,15 +2522,13 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
break; break;
} }
error = shmem_getpage(inode, index, &page, sgp); error = shmem_getpage(inode, index, &page, SGP_READ);
if (error) { if (error) {
if (error == -EINVAL) if (error == -EINVAL)
error = 0; error = 0;
break; break;
} }
if (page) { if (page) {
if (sgp == SGP_CACHE)
set_page_dirty(page);
unlock_page(page); unlock_page(page);
if (PageHWPoison(page)) { if (PageHWPoison(page)) {
...@@ -2578,9 +2568,10 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) ...@@ -2578,9 +2568,10 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
*/ */
if (!offset) if (!offset)
mark_page_accessed(page); mark_page_accessed(page);
got_page = true;
} else { } else {
page = ZERO_PAGE(0); page = ZERO_PAGE(0);
get_page(page); got_page = false;
} }
/* /*
...@@ -2593,7 +2584,8 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) ...@@ -2593,7 +2584,8 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
index += offset >> PAGE_SHIFT; index += offset >> PAGE_SHIFT;
offset &= ~PAGE_MASK; offset &= ~PAGE_MASK;
put_page(page); if (got_page)
put_page(page);
if (!iov_iter_count(to)) if (!iov_iter_count(to))
break; break;
if (ret < nr) { if (ret < nr) {
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment