Commit 122e7943 authored by Linus Torvalds

Merge tag 'mm-hotfixes-stable-2023-07-28-15-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull hotfixes from Andrew Morton:
 "11 hotfixes. Five are cc:stable and the remainder address post-6.4
  issues or aren't considered serious enough to justify backporting"

* tag 'mm-hotfixes-stable-2023-07-28-15-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/memory-failure: fix hardware poison check in unpoison_memory()
  proc/vmcore: fix signedness bug in read_from_oldmem()
  mailmap: update remaining active codeaurora.org email addresses
  mm: lock VMA in dup_anon_vma() before setting ->anon_vma
  mm: fix memory ordering for mm_lock_seq and vm_lock_seq
  scripts/spelling.txt: remove 'thead' as a typo
  mm/pagewalk: fix EFI_PGT_DUMP of espfix area
  shmem: minor fixes to splice-read implementation
  tmpfs: fix Documentation of noswap and huge mount options
  Revert "um: Use swap() to make code cleaner"
  mm/damon/core-test: initialise context before test in damon_test_set_attrs()
parents 20d3f241 6c54312f
@@ -84,8 +84,6 @@ nr_inodes  The maximum number of inodes for this instance. The default
            is half of the number of your physical RAM pages, or (on a
            machine with highmem) the number of lowmem RAM pages,
            whichever is the lower.
-noswap     Disables swap. Remounts must respect the original settings.
-           By default swap is enabled.
 =========  ============================================================

 These parameters accept a suffix k, m or g for kilo, mega and giga and
@@ -99,36 +97,31 @@ mount with such options, since it allows any user with write access to
 use up all the memory on the machine; but enhances the scalability of
 that instance in a system with many CPUs making intensive use of it.

+tmpfs blocks may be swapped out, when there is a shortage of memory.
+tmpfs has a mount option to disable its use of swap:
+
+======  ===========================================================
+noswap  Disables swap. Remounts must respect the original settings.
+        By default swap is enabled.
+======  ===========================================================
+
 tmpfs also supports Transparent Huge Pages which requires a kernel
 configured with CONFIG_TRANSPARENT_HUGEPAGE and with huge supported for
 your system (has_transparent_hugepage(), which is architecture specific).
 The mount options for this are:

-======  ============================================================
-huge=0  never: disables huge pages for the mount
-huge=1  always: enables huge pages for the mount
-huge=2  within_size: only allocate huge pages if the page will be
-        fully within i_size, also respect fadvise()/madvise() hints.
-huge=3  advise: only allocate huge pages if requested with
-        fadvise()/madvise()
-======  ============================================================
+================ ==============================================================
+huge=never       Do not allocate huge pages. This is the default.
+huge=always      Attempt to allocate huge page every time a new page is needed.
+huge=within_size Only allocate huge page if it will be fully within i_size.
+                 Also respect madvise(2) hints.
+huge=advise      Only allocate huge page if requested with madvise(2).
+================ ==============================================================

-There is a sysfs file which you can also use to control system wide THP
-configuration for all tmpfs mounts, the file is:
-
-/sys/kernel/mm/transparent_hugepage/shmem_enabled
-
-This sysfs file is placed on top of THP sysfs directory and so is registered
-by THP code. It is however only used to control all tmpfs mounts with one
-single knob. Since it controls all tmpfs mounts it should only be used either
-for emergency or testing purposes. The values you can set for shmem_enabled are:
-
-==  ============================================================
--1  deny: disables huge on shm_mnt and all mounts, for
-    emergency use
--2  force: enables huge on shm_mnt and all mounts, w/o needing
-    option, for testing
-==  ============================================================
+See also Documentation/admin-guide/mm/transhuge.rst, which describes the
+sysfs file /sys/kernel/mm/transparent_hugepage/shmem_enabled: which can
+be used to deny huge pages on all tmpfs mounts in an emergency, or to
+force huge pages on all tmpfs mounts for testing.

 tmpfs has a mount option to set the NUMA memory allocation policy for
 all files in that instance (if CONFIG_NUMA is enabled) - which can be
......
@@ -3,7 +3,6 @@
  * Copyright (C) 2002 - 2008 Jeff Dike (jdike@{addtoit,linux.intel}.com)
  */

-#include <linux/minmax.h>
 #include <unistd.h>
 #include <errno.h>
 #include <fcntl.h>
@@ -51,7 +50,7 @@ static struct pollfds all_sigio_fds;

 static int write_sigio_thread(void *unused)
 {
-	struct pollfds *fds;
+	struct pollfds *fds, tmp;
 	struct pollfd *p;
 	int i, n, respond_fd;
 	char c;
@@ -78,7 +77,9 @@ static int write_sigio_thread(void *unused)
 					       "write_sigio_thread : "
 					       "read on socket failed, "
 					       "err = %d\n", errno);
-				swap(current_poll, next_poll);
+				tmp = current_poll;
+				current_poll = next_poll;
+				next_poll = tmp;
 				respond_fd = sigio_private[1];
 			}
 			else {
......
@@ -132,7 +132,7 @@ ssize_t read_from_oldmem(struct iov_iter *iter, size_t count,
 			 u64 *ppos, bool encrypted)
 {
 	unsigned long pfn, offset;
-	size_t nr_bytes;
+	ssize_t nr_bytes;
 	ssize_t read = 0, tmp;
 	int idx;
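The hunk above changes only the type of `nr_bytes`. As a minimal sketch of why that can matter, assume the function later compares a signed return value against `nr_bytes` (that comparison is not shown in this diff). On platforms where `size_t` and `ssize_t` have the same width, a negative error code compared against a `size_t` is first converted to a very large unsigned value, so a "less than" check does not fire. The helper names below are hypothetical and are not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t (POSIX) */

/*
 * Hypothetical helpers, not kernel code. "ret" stands for a value that is
 * either a byte count or a negative errno-style code.
 */

/* Old typing: with a size_t operand, ret is converted to unsigned first. */
static int is_short_read_unsigned(ssize_t ret, size_t nr_bytes)
{
	/* The cast spells out the conversion that otherwise happens implicitly. */
	return (size_t)ret < nr_bytes;
}

/* New typing: both operands are signed, so a negative ret compares as small. */
static int is_short_read_signed(ssize_t ret, ssize_t nr_bytes)
{
	return ret < nr_bytes;
}

/* Self-check: -14 plays the role of a negative error code. */
static void demo_signedness(void)
{
	assert(!is_short_read_unsigned(-14, 4096));	/* error goes unnoticed */
	assert(is_short_read_signed(-14, 4096));	/* error is caught */
	assert(is_short_read_unsigned(100, 4096));	/* genuine short read */
	assert(is_short_read_signed(100, 4096));
}
```

Calling `demo_signedness()` from a test harness exercises both variants; only the signed version treats the negative value as a failed or short read.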
......
@@ -641,8 +641,14 @@ static inline void vma_numab_state_free(struct vm_area_struct *vma) {}
  */
 static inline bool vma_start_read(struct vm_area_struct *vma)
 {
-	/* Check before locking. A race might cause false locked result. */
-	if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
+	/*
+	 * Check before locking. A race might cause false locked result.
+	 * We can use READ_ONCE() for the mm_lock_seq here, and don't need
+	 * ACQUIRE semantics, because this is just a lockless check whose result
+	 * we don't rely on for anything - the mm_lock_seq read against which we
+	 * need ordering is below.
+	 */
+	if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq))
 		return false;

 	if (unlikely(down_read_trylock(&vma->vm_lock->lock) == 0))
@@ -653,8 +659,13 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	 * False unlocked result is impossible because we modify and check
 	 * vma->vm_lock_seq under vma->vm_lock protection and mm->mm_lock_seq
 	 * modification invalidates all existing locks.
+	 *
+	 * We must use ACQUIRE semantics for the mm_lock_seq so that if we are
+	 * racing with vma_end_write_all(), we only start reading from the VMA
+	 * after it has been unlocked.
+	 * This pairs with RELEASE semantics in vma_end_write_all().
 	 */
-	if (unlikely(vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))) {
+	if (unlikely(vma->vm_lock_seq == smp_load_acquire(&vma->vm_mm->mm_lock_seq))) {
 		up_read(&vma->vm_lock->lock);
 		return false;
 	}
@@ -676,7 +687,7 @@ static bool __is_vma_write_locked(struct vm_area_struct *vma, int *mm_lock_seq)
 	 * current task is holding mmap_write_lock, both vma->vm_lock_seq and
 	 * mm->mm_lock_seq can't be concurrently modified.
 	 */
-	*mm_lock_seq = READ_ONCE(vma->vm_mm->mm_lock_seq);
+	*mm_lock_seq = vma->vm_mm->mm_lock_seq;
 	return (vma->vm_lock_seq == *mm_lock_seq);
 }
@@ -688,7 +699,13 @@ static inline void vma_start_write(struct vm_area_struct *vma)
 		return;

 	down_write(&vma->vm_lock->lock);
-	vma->vm_lock_seq = mm_lock_seq;
+	/*
+	 * We should use WRITE_ONCE() here because we can have concurrent reads
+	 * from the early lockless pessimistic check in vma_start_read().
+	 * We don't really care about the correctness of that early check, but
+	 * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy.
+	 */
+	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
 	up_write(&vma->vm_lock->lock);
 }
@@ -702,7 +719,7 @@ static inline bool vma_try_start_write(struct vm_area_struct *vma)
 	if (!down_write_trylock(&vma->vm_lock->lock))
 		return false;

-	vma->vm_lock_seq = mm_lock_seq;
+	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
 	up_write(&vma->vm_lock->lock);
 	return true;
 }
......
@@ -514,6 +514,20 @@ struct vm_area_struct {
 	};

 #ifdef CONFIG_PER_VMA_LOCK
+	/*
+	 * Can only be written (using WRITE_ONCE()) while holding both:
+	 *  - mmap_lock (in write mode)
+	 *  - vm_lock->lock (in write mode)
+	 * Can be read reliably while holding one of:
+	 *  - mmap_lock (in read or write mode)
+	 *  - vm_lock->lock (in read or write mode)
+	 * Can be read unreliably (using READ_ONCE()) for pessimistic bailout
+	 * while holding nothing (except RCU to keep the VMA struct allocated).
+	 *
+	 * This sequence counter is explicitly allowed to overflow; sequence
+	 * counter reuse can only lead to occasional unnecessary use of the
+	 * slowpath.
+	 */
 	int vm_lock_seq;
 	struct vma_lock *vm_lock;
@@ -679,6 +693,20 @@ struct mm_struct {
 					  * by mmlist_lock
 					  */
 #ifdef CONFIG_PER_VMA_LOCK
+		/*
+		 * This field has lock-like semantics, meaning it is sometimes
+		 * accessed with ACQUIRE/RELEASE semantics.
+		 * Roughly speaking, incrementing the sequence number is
+		 * equivalent to releasing locks on VMAs; reading the sequence
+		 * number can be part of taking a read lock on a VMA.
+		 *
+		 * Can be modified under write mmap_lock using RELEASE
+		 * semantics.
+		 * Can be read with no other protection when holding write
+		 * mmap_lock.
+		 * Can be read with ACQUIRE semantics if not holding write
+		 * mmap_lock.
+		 */
 		int mm_lock_seq;
 #endif
......
@@ -76,8 +76,14 @@ static inline void mmap_assert_write_locked(struct mm_struct *mm)
 static inline void vma_end_write_all(struct mm_struct *mm)
 {
 	mmap_assert_write_locked(mm);
-	/* No races during update due to exclusive mmap_lock being held */
-	WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1);
+	/*
+	 * Nobody can concurrently modify mm->mm_lock_seq due to exclusive
+	 * mmap_lock being held.
+	 * We need RELEASE semantics here to ensure that preceding stores into
+	 * the VMA take effect before we unlock it with this store.
+	 * Pairs with ACQUIRE semantics in vma_start_read().
+	 */
+	smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1);
 }
 #else
 static inline void vma_end_write_all(struct mm_struct *mm) {}
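The RELEASE/ACQUIRE pairing described in the comments above can be sketched in userspace with C11 atomics. This is only an analogy under stated assumptions: `memory_order_release` and `memory_order_acquire` stand in for `smp_store_release()` and `smp_load_acquire()`, the kernel memory model differs in detail, and every `demo_*` name and struct below is hypothetical rather than taken from the kernel.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical miniature types; the field names only echo the kernel's. */
struct demo_mm {
	atomic_int mm_lock_seq;
};

struct demo_vma {
	int vm_lock_seq;	/* equals mm_lock_seq while "write-locked" */
	int payload;		/* data the writer updates under the lock */
};

/* Writer: bump the counter with RELEASE so earlier stores are ordered first. */
static void demo_end_write_all(struct demo_mm *mm)
{
	int seq = atomic_load_explicit(&mm->mm_lock_seq, memory_order_relaxed);

	atomic_store_explicit(&mm->mm_lock_seq, seq + 1, memory_order_release);
}

/* Reader: the ACQUIRE load pairs with the release store above. */
static int demo_vma_is_write_locked(struct demo_vma *vma, struct demo_mm *mm)
{
	return vma->vm_lock_seq ==
	       atomic_load_explicit(&mm->mm_lock_seq, memory_order_acquire);
}

/* Single-threaded walk-through of the lock and unlock sequence. */
static void demo_seq(void)
{
	struct demo_mm mm;
	struct demo_vma vma = { .vm_lock_seq = 0, .payload = 0 };

	atomic_init(&mm.mm_lock_seq, 1);
	vma.vm_lock_seq = 1;		/* "write-lock" the VMA */
	vma.payload = 42;		/* modify it while locked */
	assert(demo_vma_is_write_locked(&vma, &mm));

	demo_end_write_all(&mm);	/* release store unlocks every VMA */
	assert(!demo_vma_is_write_locked(&vma, &mm));
	assert(vma.payload == 42);
}
```

In a concurrent setting, a reader whose acquire load observes the incremented counter is also guaranteed to observe the writer's earlier store to `payload`; the single-threaded `demo_seq()` only checks the counter logic, not the ordering itself.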
......
@@ -320,25 +320,25 @@ static void damon_test_update_monitoring_result(struct kunit *test)

 static void damon_test_set_attrs(struct kunit *test)
 {
-	struct damon_ctx ctx;
+	struct damon_ctx *c = damon_new_ctx();
 	struct damon_attrs valid_attrs = {
 		.min_nr_regions = 10, .max_nr_regions = 1000,
 		.sample_interval = 5000, .aggr_interval = 100000,};
 	struct damon_attrs invalid_attrs;

-	KUNIT_EXPECT_EQ(test, damon_set_attrs(&ctx, &valid_attrs), 0);
+	KUNIT_EXPECT_EQ(test, damon_set_attrs(c, &valid_attrs), 0);

 	invalid_attrs = valid_attrs;
 	invalid_attrs.min_nr_regions = 1;
-	KUNIT_EXPECT_EQ(test, damon_set_attrs(&ctx, &invalid_attrs), -EINVAL);
+	KUNIT_EXPECT_EQ(test, damon_set_attrs(c, &invalid_attrs), -EINVAL);

 	invalid_attrs = valid_attrs;
 	invalid_attrs.max_nr_regions = 9;
-	KUNIT_EXPECT_EQ(test, damon_set_attrs(&ctx, &invalid_attrs), -EINVAL);
+	KUNIT_EXPECT_EQ(test, damon_set_attrs(c, &invalid_attrs), -EINVAL);

 	invalid_attrs = valid_attrs;
 	invalid_attrs.aggr_interval = 4999;
-	KUNIT_EXPECT_EQ(test, damon_set_attrs(&ctx, &invalid_attrs), -EINVAL);
+	KUNIT_EXPECT_EQ(test, damon_set_attrs(c, &invalid_attrs), -EINVAL);
 }

 static struct kunit_case damon_test_cases[] = {
......
@@ -2487,7 +2487,7 @@ int unpoison_memory(unsigned long pfn)
 		goto unlock_mutex;
 	}

-	if (!folio_test_hwpoison(folio)) {
+	if (!PageHWPoison(p)) {
 		unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
 				 pfn, &unpoison_rs);
 		goto unlock_mutex;
......
@@ -615,6 +615,7 @@ static inline int dup_anon_vma(struct vm_area_struct *dst,
 	 * anon pages imported.
 	 */
 	if (src->anon_vma && !dst->anon_vma) {
+		vma_start_write(dst);
 		dst->anon_vma = src->anon_vma;
 		return anon_vma_clone(dst, src);
 	}
......
@@ -48,8 +48,11 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	if (walk->no_vma) {
 		/*
 		 * pte_offset_map() might apply user-specific validation.
+		 * Indeed, on x86_64 the pmd entries set up by init_espfix_ap()
+		 * fit its pmd_bad() check (_PAGE_NX set and _PAGE_RW clear),
+		 * and CONFIG_EFI_PGT_DUMP efi_mm goes so far as to walk them.
 		 */
-		if (walk->mm == &init_mm)
+		if (walk->mm == &init_mm || addr >= TASK_SIZE)
 			pte = pte_offset_kernel(pmd, addr);
 		else
 			pte = pte_offset_map(pmd, addr);
......
@@ -2796,7 +2796,8 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
 		if (*ppos >= i_size_read(inode))
 			break;

-		error = shmem_get_folio(inode, *ppos / PAGE_SIZE, &folio, SGP_READ);
+		error = shmem_get_folio(inode, *ppos / PAGE_SIZE, &folio,
+					SGP_READ);
 		if (error) {
 			if (error == -EINVAL)
 				error = 0;
@@ -2805,7 +2806,9 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
 		if (folio) {
 			folio_unlock(folio);

-			if (folio_test_hwpoison(folio)) {
+			if (folio_test_hwpoison(folio) ||
+			    (folio_test_large(folio) &&
+			     folio_test_has_hwpoisoned(folio))) {
 				error = -EIO;
 				break;
 			}
@@ -2841,7 +2844,7 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
 			folio_put(folio);
 			folio = NULL;
 		} else {
-			n = splice_zeropage_into_pipe(pipe, *ppos, len);
+			n = splice_zeropage_into_pipe(pipe, *ppos, part);
 		}

 		if (!n)
......
@@ -1541,7 +1541,6 @@ temeprature||temperature
 temorary||temporary
 temproarily||temporarily
 temperture||temperature
-thead||thread
 theads||threads
 therfore||therefore
 thier||their
......