Commit 01408c49 authored by NeilBrown's avatar NeilBrown Committed by Linus Torvalds

[PATCH] Prepare for __copy_from_user_inatomic to not zero missed bytes

The problem is that when we write to a file, the copy from userspace to
pagecache is first done with preemption disabled, so if the source address is
not immediately available the copy fails *and* *zeros* *the* *destination*.

This is a problem because a concurrent read (which admittedly is an odd thing
to do) might see zeros rather that was there before the write, or what was
there after, or some mixture of the two (any of these being a reasonable thing
to see).

If the copy did fail, it will immediately be retried with preemption
re-enabled so any transient problem with accessing the source won't cause an
error.

The first copying does not need to zero any uncopied bytes, and doing so
causes the problem.  It uses copy_from_user_atomic rather than copy_from_user
so the simple expedient is to change copy_from_user_atomic to *not* zero out
bytes on failure.

The first of these two patches prepares for the change by fixing two places
which assume copy_from_user_atomic does zero the tail.  The two usages are
very similar pieces of code which copy from a userspace iovec into one or more
page-cache pages.  These are changed to remove the assumption.

The second patch changes __copy_from_user_inatomic* to not zero the tail.
Once these are accepted, I will look at similar patches of other architectures
where this is important (ppc, mips and sparc being the ones I can find).

This patch:

There is a problem with __copy_from_user_inatomic zeroing the tail of the
buffer in the case of an error.  As it is called in atomic context, the error
may be transient, so it results in zeros being written where maybe they
shouldn't be.

In the usage in filemap, this opens a window for a well timed read to see data
(zeros) which is not consistent with any ordering of reads and writes.

Most cases where __copy_from_user_inatomic is called, a failure results in
__copy_from_user being called immediately.  As long as the latter zeros the
tail, the former doesn't need to.  However in *copy_from_user_iovec
implementations (in both filemap and ntfs/file), it is assumed that
copy_from_user_inatomic will zero the tail.

This patch removes that assumption, so that after this patch it will
be safe for copy_from_user_inatomic to not zero the tail.

This patch also adds some commentary to filemap.h and asm-i386/uaccess.h.

After this patch, all architectures that might disable preempt when
kmap_atomic is called need to have their __copy_from_user_inatomic* "fixed".
This includes
 - powerpc
 - i386
 - mips
 - sparc
Signed-off-by: default avatarNeil Brown <neilb@suse.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
parent 5f507d9e
...@@ -1358,7 +1358,7 @@ static inline size_t ntfs_copy_from_user(struct page **pages, ...@@ -1358,7 +1358,7 @@ static inline size_t ntfs_copy_from_user(struct page **pages,
goto out; goto out;
} }
static size_t __ntfs_copy_from_user_iovec(char *vaddr, static size_t __ntfs_copy_from_user_iovec_inatomic(char *vaddr,
const struct iovec *iov, size_t iov_ofs, size_t bytes) const struct iovec *iov, size_t iov_ofs, size_t bytes)
{ {
size_t total = 0; size_t total = 0;
...@@ -1376,10 +1376,6 @@ static size_t __ntfs_copy_from_user_iovec(char *vaddr, ...@@ -1376,10 +1376,6 @@ static size_t __ntfs_copy_from_user_iovec(char *vaddr,
bytes -= len; bytes -= len;
vaddr += len; vaddr += len;
if (unlikely(left)) { if (unlikely(left)) {
/*
* Zero the rest of the target like __copy_from_user().
*/
memset(vaddr, 0, bytes);
total -= left; total -= left;
break; break;
} }
...@@ -1420,11 +1416,13 @@ static inline void ntfs_set_next_iovec(const struct iovec **iovp, ...@@ -1420,11 +1416,13 @@ static inline void ntfs_set_next_iovec(const struct iovec **iovp,
* pages (out to offset + bytes), to emulate ntfs_copy_from_user()'s * pages (out to offset + bytes), to emulate ntfs_copy_from_user()'s
* single-segment behaviour. * single-segment behaviour.
* *
* We call the same helper (__ntfs_copy_from_user_iovec()) both when atomic and * We call the same helper (__ntfs_copy_from_user_iovec_inatomic()) both
* when not atomic. This is ok because __ntfs_copy_from_user_iovec() calls * when atomic and when not atomic. This is ok because
* __copy_from_user_inatomic() and it is ok to call this when non-atomic. In * __ntfs_copy_from_user_iovec_inatomic() calls __copy_from_user_inatomic()
* fact, the only difference between __copy_from_user_inatomic() and * and it is ok to call this when non-atomic.
* __copy_from_user() is that the latter calls might_sleep(). And on many * Infact, the only difference between __copy_from_user_inatomic() and
* __copy_from_user() is that the latter calls might_sleep() and the former
* should not zero the tail of the buffer on error. And on many
* architectures __copy_from_user_inatomic() is just defined to * architectures __copy_from_user_inatomic() is just defined to
* __copy_from_user() so it makes no difference at all on those architectures. * __copy_from_user() so it makes no difference at all on those architectures.
*/ */
...@@ -1441,14 +1439,18 @@ static inline size_t ntfs_copy_from_user_iovec(struct page **pages, ...@@ -1441,14 +1439,18 @@ static inline size_t ntfs_copy_from_user_iovec(struct page **pages,
if (len > bytes) if (len > bytes)
len = bytes; len = bytes;
kaddr = kmap_atomic(*pages, KM_USER0); kaddr = kmap_atomic(*pages, KM_USER0);
copied = __ntfs_copy_from_user_iovec(kaddr + ofs, copied = __ntfs_copy_from_user_iovec_inatomic(kaddr + ofs,
*iov, *iov_ofs, len); *iov, *iov_ofs, len);
kunmap_atomic(kaddr, KM_USER0); kunmap_atomic(kaddr, KM_USER0);
if (unlikely(copied != len)) { if (unlikely(copied != len)) {
/* Do it the slow way. */ /* Do it the slow way. */
kaddr = kmap(*pages); kaddr = kmap(*pages);
copied = __ntfs_copy_from_user_iovec(kaddr + ofs, copied = __ntfs_copy_from_user_iovec_inatomic(kaddr + ofs,
*iov, *iov_ofs, len); *iov, *iov_ofs, len);
/*
* Zero the rest of the target like __copy_from_user().
*/
memset(kaddr + ofs + copied, 0, len - copied);
kunmap(*pages); kunmap(*pages);
if (unlikely(copied != len)) if (unlikely(copied != len))
goto err_out; goto err_out;
......
...@@ -458,6 +458,12 @@ __copy_to_user(void __user *to, const void *from, unsigned long n) ...@@ -458,6 +458,12 @@ __copy_to_user(void __user *to, const void *from, unsigned long n)
* *
* If some data could not be copied, this function will pad the copied * If some data could not be copied, this function will pad the copied
* data to the requested size using zero bytes. * data to the requested size using zero bytes.
*
* An alternate version - __copy_from_user_inatomic() - may be called from
* atomic context and will fail rather than sleep. In this case the
* uncopied bytes will *NOT* be padded with zeros. See fs/filemap.h
* for explanation of why this is needed.
* FIXME this isn't implimented yet EMXIF
*/ */
static __always_inline unsigned long static __always_inline unsigned long
__copy_from_user_inatomic(void *to, const void __user *from, unsigned long n) __copy_from_user_inatomic(void *to, const void __user *from, unsigned long n)
......
...@@ -1892,7 +1892,7 @@ int remove_suid(struct dentry *dentry) ...@@ -1892,7 +1892,7 @@ int remove_suid(struct dentry *dentry)
EXPORT_SYMBOL(remove_suid); EXPORT_SYMBOL(remove_suid);
size_t size_t
__filemap_copy_from_user_iovec(char *vaddr, __filemap_copy_from_user_iovec_inatomic(char *vaddr,
const struct iovec *iov, size_t base, size_t bytes) const struct iovec *iov, size_t base, size_t bytes)
{ {
size_t copied = 0, left = 0; size_t copied = 0, left = 0;
...@@ -1908,12 +1908,8 @@ __filemap_copy_from_user_iovec(char *vaddr, ...@@ -1908,12 +1908,8 @@ __filemap_copy_from_user_iovec(char *vaddr,
vaddr += copy; vaddr += copy;
iov++; iov++;
if (unlikely(left)) { if (unlikely(left))
/* zero the rest of the target like __copy_from_user */
if (bytes)
memset(vaddr, 0, bytes);
break; break;
}
} }
return copied - left; return copied - left;
} }
......
...@@ -16,15 +16,23 @@ ...@@ -16,15 +16,23 @@
#include <linux/uaccess.h> #include <linux/uaccess.h>
size_t size_t
__filemap_copy_from_user_iovec(char *vaddr, __filemap_copy_from_user_iovec_inatomic(char *vaddr,
const struct iovec *iov, const struct iovec *iov,
size_t base, size_t base,
size_t bytes); size_t bytes);
/* /*
* Copy as much as we can into the page and return the number of bytes which * Copy as much as we can into the page and return the number of bytes which
* were sucessfully copied. If a fault is encountered then clear the page * were sucessfully copied. If a fault is encountered then clear the page
* out to (offset+bytes) and return the number of bytes which were copied. * out to (offset+bytes) and return the number of bytes which were copied.
*
* NOTE: For this to work reliably we really want copy_from_user_inatomic_nocache
* to *NOT* zero any tail of the buffer that it failed to copy. If it does,
* and if the following non-atomic copy succeeds, then there is a small window
* where the target page contains neither the data before the write, nor the
* data after the write (it contains zero). A read at this time will see
* data that is inconsistent with any ordering of the read and the write.
* (This has been detected in practice).
*/ */
static inline size_t static inline size_t
filemap_copy_from_user(struct page *page, unsigned long offset, filemap_copy_from_user(struct page *page, unsigned long offset,
...@@ -60,13 +68,15 @@ filemap_copy_from_user_iovec(struct page *page, unsigned long offset, ...@@ -60,13 +68,15 @@ filemap_copy_from_user_iovec(struct page *page, unsigned long offset,
size_t copied; size_t copied;
kaddr = kmap_atomic(page, KM_USER0); kaddr = kmap_atomic(page, KM_USER0);
copied = __filemap_copy_from_user_iovec(kaddr + offset, iov, copied = __filemap_copy_from_user_iovec_inatomic(kaddr + offset, iov,
base, bytes); base, bytes);
kunmap_atomic(kaddr, KM_USER0); kunmap_atomic(kaddr, KM_USER0);
if (copied != bytes) { if (copied != bytes) {
kaddr = kmap(page); kaddr = kmap(page);
copied = __filemap_copy_from_user_iovec(kaddr + offset, iov, copied = __filemap_copy_from_user_iovec_inatomic(kaddr + offset, iov,
base, bytes); base, bytes);
if (bytes - copied)
memset(kaddr + offset + copied, 0, bytes - copied);
kunmap(page); kunmap(page);
} }
return copied; return copied;
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment