1. 18 Apr, 2023 40 commits
    • selftests/mm: allow uffd test to skip properly with no privilege · f9da2426
      Peter Xu authored
      Allow a unit test to be skipped properly when it lacks privilege (e.g. the
      sigbus and events tests).
      
      [colin.i.king@gmail.com: fix spelling mistake "priviledge" -> "privilege"]
        Link: https://lkml.kernel.org/r/20230414081506.1678998-1-colin.i.king@gmail.com
      Link: https://lkml.kernel.org/r/20230412164520.329163-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: workaround no way to detect uffd-minor + wp · 4df9cefa
      Peter Xu authored
      Userfaultfd minor+wp mode was added very recently.  On old kernels the
      test fails at ioctl(UFFDIO_CONTINUE), which is mysterious.
      Unfortunately there's no feature bit to detect this support.
      
      Add a hack that uses WP_UNPOPULATED to detect whether the feature
      exists, since WP_UNPOPULATED was merged right after minor+wp.
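
      A minimal sketch of that detection idea, assuming the caller has
      already completed the UFFDIO_API handshake and holds the feature mask
      the kernel returned.  The helper name is hypothetical, and the fallback
      bit values are meant to mirror the uapi <linux/userfaultfd.h>
      definitions; the selftest itself may structure this differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fallbacks for old headers; values assumed to match <linux/userfaultfd.h>. */
#ifndef UFFD_FEATURE_WP_HUGETLBFS_SHMEM
#define UFFD_FEATURE_WP_HUGETLBFS_SHMEM (1ULL << 12)
#endif
#ifndef UFFD_FEATURE_WP_UNPOPULATED
#define UFFD_FEATURE_WP_UNPOPULATED (1ULL << 13)
#endif

/*
 * Hypothetical helper: 'features' is the uffdio_api.features value filled
 * in by the kernel.  There is no dedicated bit for minor+wp, so the
 * presence of WP_UNPOPULATED (merged right after it) is used as a proxy.
 */
static bool uffd_minor_wp_probably_supported(uint64_t features)
{
	return (features & UFFD_FEATURE_WP_UNPOPULATED) != 0;
}
```

      A test that needs minor+wp could call this helper and report a skip
      instead of failing later at UFFDIO_CONTINUE.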
      
      Link: https://lkml.kernel.org/r/20230412164517.329152-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: move zeropage test into uffd unit tests · c3315502
      Peter Xu authored
      Simplify it a bit along the way, e.g., drop the never-used offset field
      (which was always the 1st page, so offset=0).
      
      Introduce uffd_register_with_ioctls() out of uffd_register() to expose
      the returned uffdio_register.ioctls.  Check that automatically when
      testing UFFDIO_ZEROPAGE on different types of memory (and kernels).
      
      Link: https://lkml.kernel.org/r/20230412164404.328815-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: move uffd sig/events tests into uffd unit tests · 73c1ea93
      Peter Xu authored
      Move the two tests into the unit test, and convert them into 20 standalone
      tests:
      
        - events test on all 5 mem types, with wp on/off
        - signal test on all 5 mem types, with wp on/off
      
        Testing sigbus on anon... done
        Testing sigbus on shmem... done
        Testing sigbus on shmem-private... done
        Testing sigbus on hugetlb... done
        Testing sigbus on hugetlb-private... done
        Testing sigbus-wp on anon... done
        Testing sigbus-wp on shmem... done
        Testing sigbus-wp on shmem-private... done
        Testing sigbus-wp on hugetlb... done
        Testing sigbus-wp on hugetlb-private... done
        Testing events on anon... done
        Testing events on shmem... done
        Testing events on shmem-private... done
        Testing events on hugetlb... done
        Testing events on hugetlb-private... done
        Testing events-wp on anon... done
        Testing events-wp on shmem... done
        Testing events-wp on shmem-private... done
        Testing events-wp on hugetlb... done
        Testing events-wp on hugetlb-private... done
      
      It'll also remove a lot of global references along the way,
      e.g. test_uffdio_wp will be replaced with the wp value passed in.
      
      Link: https://lkml.kernel.org/r/20230412164400.328798-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: move uffd minor test to unit test · 62515b5f
      Peter Xu authored
      This moves the minor test to the new unit test.
      
      Rewrite the content check with char* operations to avoid fiddling with
      my_bcmp().
      
      Drop global vars test_uffdio_minor and test_collapse; just assume they
      are always tested in common code for now.
      
      OTOH make this single test into five tests:
      
        - minor test on [shmem, hugetlb] with wp=false
        - minor test on [shmem, hugetlb] with wp=true
        - minor test + collapse on shmem only
      
      One thing to mention is that we used to test COLLAPSE+WP, but that
      doesn't sound right at all.  It's possible it was silently broken but
      went unnoticed because COLLAPSE is not part of the default test suite.
      
      Make the MADV_COLLAPSE test fail-able (by skipping it on failure),
      because it's not guaranteed to succeed anyway.
      
      Drop a bunch of useless code after the move, because the unit test always
      uses an aligned number of pages and has nothing to do with n_cpus.
      
      Link: https://lkml.kernel.org/r/20230412164357.328779-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: move uffd pagemap test to unit test · 8bda424f
      Peter Xu authored
      Move it over and split it into two tests, one for pagemap and a separate
      one for the new WP_UNPOPULATED.
      
      The thp pagemap test (with MADV_HUGEPAGE) wasn't really working.  Let's
      just drop it (since it never really worked anyway..) and leave that for
      later.
      
      Link: https://lkml.kernel.org/r/20230412164352.328733-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: add framework for uffd-unit-test · 16a45b57
      Peter Xu authored
      Add a framework in preparation for moving unit tests from uffd-stress.c
      into uffd-unit-tests.c.  The goal is to allow detection of uffd features
      for each test, and also to loop over the specified types of memory that a
      test supports.
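
      The shape of such a framework can be sketched roughly as below.  This
      is an illustration only: every name here (MEM_*, struct test_case,
      run_tests(), always_pass()) is hypothetical and not taken from
      uffd-unit-tests.c.  Each test declares which memory types it supports
      and which feature bits it needs, and the runner skips rather than fails
      when the kernel lacks a feature.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical memory-type bits; the real test may name these differently. */
enum {
	MEM_ANON            = 1 << 0,
	MEM_SHMEM           = 1 << 1,
	MEM_SHMEM_PRIVATE   = 1 << 2,
	MEM_HUGETLB         = 1 << 3,
	MEM_HUGETLB_PRIVATE = 1 << 4,
	MEM_NR              = 5,
};

enum test_result { TEST_PASS, TEST_SKIP, TEST_FAIL };

struct test_case {
	const char *name;
	unsigned int mem_types;       /* bitmask of MEM_* the test supports */
	uint64_t required_features;   /* feature bits the kernel must offer */
	enum test_result (*fn)(unsigned int mem_type);
};

struct test_summary { int pass, skip, fail; };

static const char *mem_name(unsigned int mem_type)
{
	switch (mem_type) {
	case MEM_ANON:            return "anon";
	case MEM_SHMEM:           return "shmem";
	case MEM_SHMEM_PRIVATE:   return "shmem-private";
	case MEM_HUGETLB:         return "hugetlb";
	case MEM_HUGETLB_PRIVATE: return "hugetlb-private";
	default:                  return "unknown";
	}
}

/* Stand-in test body used only to exercise the runner. */
static enum test_result always_pass(unsigned int mem_type)
{
	(void)mem_type;
	return TEST_PASS;
}

/* Run every test on every memory type it supports, skipping on missing features. */
static struct test_summary run_tests(const struct test_case *tests, int n,
				     uint64_t kernel_features)
{
	struct test_summary s = { 0, 0, 0 };

	for (int i = 0; i < n; i++) {
		for (int bit = 0; bit < MEM_NR; bit++) {
			unsigned int mem = 1u << bit;

			if (!(tests[i].mem_types & mem))
				continue;
			printf("Testing %s on %s... ", tests[i].name, mem_name(mem));
			if ((kernel_features & tests[i].required_features) !=
			    tests[i].required_features) {
				printf("skipped [feature missing]\n");
				s.skip++;
				continue;
			}
			switch (tests[i].fn(mem)) {
			case TEST_PASS: printf("done\n");    s.pass++; break;
			case TEST_SKIP: printf("skipped\n"); s.skip++; break;
			default:        printf("failed\n");  s.fail++; break;
			}
		}
	}
	return s;
}
```

      Keeping the memory-type mask and the feature mask in the test table,
      rather than in globals, is what lets a single binary walk all
      combinations and print one pass/skip/fail summary at the end.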
      
      Link: https://lkml.kernel.org/r/20230412164348.328710-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: allow allocate_area() to fail properly · be39fec4
      Peter Xu authored
      Mostly to detect hugetlb allocation errors and skip hugetlb tests when
      pages are not allocated.
      
      Link: https://lkml.kernel.org/r/20230412164345.328659-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: let uffd_handle_page_fault() take wp parameter · 0210c43e
      Peter Xu authored
      Make the handler optionally apply the WP bit when resolving either
      missing or minor page faults.  This moves towards removing the global
      test_uffdio_wp from the common code.
      
      Link: https://lkml.kernel.org/r/20230412164341.328618-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: rename uffd_stats to uffd_args · 50834084
      Peter Xu authored
      Prepare for adding more fields into the struct.
      
      Link: https://lkml.kernel.org/r/20230412164337.328607-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Suggested-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: drop global hpage_size in uffd tests · 265818ef
      Peter Xu authored
      hpage_size was wrongly used.  Sometimes it means hugetlb default size,
      sometimes it was used as thp size.
      
      Remove the global variable and use the right one at each place.
      
      Link: https://lkml.kernel.org/r/20230412164333.328596-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: drop global mem_fd in uffd tests · c5cb9036
      Peter Xu authored
      Drop it by creating the memfd dynamically in the tests.
      
      Link: https://lkml.kernel.org/r/20230412164331.328584-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: UFFDIO_API test · d5433ce8
      Peter Xu authored
      Add one simple test for UFFDIO_API.  With that, I also added a bunch of
      small but handy helpers along the way.
      
      Link: https://lkml.kernel.org/r/20230412164257.328375-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: uffd_open_{dev|sys}() · 78391f64
      Peter Xu authored
      Provide two helpers to open an uffd handle.  Drop the error checks around
      SKIPs because it's inside an errexit() anyway, which IMHO doesn't really
      help much if the test will not continue.
      
      Link: https://lkml.kernel.org/r/20230412164254.328335-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: uffd_[un]register() · c4277cb6
      Peter Xu authored
      Add two helpers to register/unregister with a uffd.  Use them to drop
      duplicated code.
      
      This patch also drops assert_expected_ioctls_present() and
      get_expected_ioctls().  Reasons:
      
        - It would take a lot of effort to pass test_type==HUGETLB into it
          from the callers, so dropping it is the simplest way to get rid of
          another global var.
      
        - The ioctls returned by UFFDIO_REGISTER are hardly useful at all,
          because any app can already detect kernel support for any ioctl via
          its corresponding UFFD_FEATURE_*.  The check here is mostly for
          sanity, and probably no user app will ever use it.
      
        - It's not friendly to one future goal of the uffd tests, which is to
          run on old kernels.  The problem is that get_expected_ioctls()
          compiles against UFFD_API_RANGE_IOCTLS, a value that depends on
          where the test is compiled rather than reflecting what the kernel
          underneath supports.  That means it reports false negatives on old
          kernels, which is not what we want.
      
      So let's make our lives easier.
      
      [peterx@redhat.com; tools/testing/selftests/mm/hugepage-mremap.c: add headers]
        Link: https://lkml.kernel.org/r/ZDxrvZh/cw357D8P@x1n
      Link: https://lkml.kernel.org/r/20230412164247.328293-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: split uffd tests into uffd-stress and uffd-unit-tests · 686a8bb7
      Peter Xu authored
      In many ways it's weird and unwanted to keep all the tests in the same
      userfaultfd.c, at least in its current form.
      
      For example, it doesn't make much sense to run the stress test once for
      each way of creating a userfaultfd handle (either via syscall or the
      /dev/ node).  Running the whole stress test twice is a waste of time,
      because the stress paths are the same and only the open path differs.
      
      It's also just weird to need to manually specify different types of memory
      to run all unit tests for the userfaultfd interface.  We should be able to
      just run a single program and that should go through all functional uffd
      tests without running the stress test at all.  The stress test was more
      for torturing and finding race conditions.  We don't want to wait for
      stress to finish just to run a functional regression test.
      
      When we start to pile up more things on top of the same file and same
      functions, things get a bit chaotic and the code becomes harder to
      maintain, with tons of global variables.
      
      This patch creates a new test, uffd-unit-tests, to hold userfaultfd unit
      tests in the future; it is currently empty.
      
      Meanwhile rename the old userfaultfd.c test to uffd-stress.c.
      
      Link: https://lkml.kernel.org/r/20230412164244.328270-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: create uffd-common.[ch] · 33be4e89
      Peter Xu authored
      Move common utility functions into uffd-common.[ch] files from the
      original userfaultfd.c.  This prepares for a split of userfaultfd.c into
      two tests: one to only cover the old but powerful stress test, the other
      one covers all the functional tests.
      
      This move is kind of a brute-force effort for now, with light
      touch-ups, but nothing should really change.  There are chances to
      optimize more, but let's leave that for later.
      
      Link: https://lkml.kernel.org/r/20230412164241.328259-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: drop test_uffdio_zeropage_eexist · 618aeb5d
      Peter Xu authored
      The idea was to flip this var in the alarm handler from time to time to
      test -EEXIST of UFFDIO_ZEROPAGE.  But firstly, it's only used in the
      zeropage test, so it was probably only used once; meanwhile we passed
      "retry==false", so it never got tested anyway.
      
      Drop both sides so we always test UFFDIO_ZEROPAGE retries if has_zeropage
      is set (!hugetlb).
      
      One more thing to do is to UFFDIO_REGISTER the alias buffer too,
      because otherwise the test won't even pass!  We were just lucky that this
      test never really got run at all.
      
      Link: https://lkml.kernel.org/r/20230412164238.328238-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: test UFFDIO_ZEROPAGE only when !hugetlb · 4af9ff29
      Peter Xu authored
      Make the check as simple as "test_type == TEST_HUGETLB" because that's the
      only mem that doesn't support ZEROPAGE.
      
      Link: https://lkml.kernel.org/r/20230412164234.328168-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: reuse pagemap_get_entry() in vm_util.h · 366e93c4
      Peter Xu authored
      Meanwhile drop pagemap_read_vaddr().
      
      Link: https://lkml.kernel.org/r/20230412164231.328157-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: use PM_* macros in vm_utils.h · 9f74696b
      Peter Xu authored
      We've got the macros in uffd-stress.c; move them over and use them in
      vm_util.h.
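
      For readers unfamiliar with the pagemap flags, here is a small sketch
      of how such macros are typically used to decode one 64-bit
      /proc/<pid>/pagemap entry.  The bit positions follow the kernel's
      pagemap documentation; PM_PFN_MASK and the helper functions are
      illustrative names, not necessarily what vm_util.h provides.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits of a pagemap entry, per the kernel pagemap documentation. */
#define PM_SOFT_DIRTY     (1ULL << 55)
#define PM_MMAP_EXCLUSIVE (1ULL << 56)
#define PM_UFFD_WP        (1ULL << 57)
#define PM_FILE           (1ULL << 61)
#define PM_SWAP           (1ULL << 62)
#define PM_PRESENT        (1ULL << 63)
/* Illustrative name: bits 0-54 hold the PFN when the page is present. */
#define PM_PFN_MASK       ((1ULL << 55) - 1)

/* Byte offset of the entry for 'vaddr' inside the pagemap file. */
static uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
	return (vaddr / page_size) * sizeof(uint64_t);
}

/* True if the entry is marked write-protected by userfaultfd. */
static bool entry_is_uffd_wp(uint64_t entry)
{
	return (entry & PM_UFFD_WP) != 0;
}

/* PFN stored in the entry, or 0 if the page is not present in RAM. */
static uint64_t entry_pfn(uint64_t entry)
{
	return (entry & PM_PRESENT) ? (entry & PM_PFN_MASK) : 0;
}
```

      A caller would pread() eight bytes at pagemap_offset() from
      /proc/self/pagemap and pass the value to these helpers; note that
      unprivileged readers may see the PFN field zeroed.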
      
      Link: https://lkml.kernel.org/r/20230412164227.328145-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: merge default_huge_page_size() into one · bd4d67e7
      Peter Xu authored
      There are already three identical definitions of
      default_huge_page_size().  Move it into vm_util.[ch].
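
      As a rough illustration of what such a helper does, the sketch below
      extracts the default huge page size from /proc/meminfo-style text.  It
      takes the text as a string so it can be exercised without touching
      /proc; the function name and that signature are assumptions, and the
      shared helper in vm_util.c may read the file directly instead.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the default huge page size in bytes from
 * /proc/meminfo-style text, or 0 if no "Hugepagesize:" line is found.
 */
static unsigned long parse_default_huge_page_size(const char *meminfo)
{
	const char *key = "Hugepagesize:";
	const char *p = strstr(meminfo, key);
	unsigned long kb;

	if (!p || sscanf(p + strlen(key), "%lu kB", &kb) != 1)
		return 0;
	return kb << 10;	/* the kernel reports the value in kB */
}
```

      Returning 0 when the line is absent lets callers treat "no hugetlb
      support" as a skip condition rather than an error.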
      
      Link: https://lkml.kernel.org/r/20230412164223.328134-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: link vm_util.c always · 4b54f5a7
      Peter Xu authored
      We do have plenty of files that want to link against vm_util.c.  Just make
      it simple by linking it always.
      
      Link: https://lkml.kernel.org/r/20230412164220.328123-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: use TEST_GEN_PROGS where proper · aef6fde7
      Peter Xu authored
      TEST_GEN_PROGS and TEST_GEN_FILES are used inconsistently in the
      mm/Makefile to specify programs that need to be built.  Logically, these
      binaries should all fall into TEST_GEN_PROGS.
      
      Replace those TEST_GEN_FILES with TEST_GEN_PROGS, so that we can reference
      all the tests easily later.
      
      [peterx@redhat.com: tools/testing/selftests/mm/Makefile: don't wipe out TEST_GEN_PROGS]
        Link: https://lkml.kernel.org/r/ZDxrvZh/cw357D8P@x1n
      Link: https://lkml.kernel.org/r/20230412164218.328104-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: merge util.h into vm_util.h · af605d26
      Peter Xu authored
      There are two util headers under the mm/ kselftests.  Merge one into
      the other.  It turns out util.h is the easier one to move.
      
      When merging, drop PAGE_SIZE / PAGE_SHIFT because they're unnecessary
      wrappers around page_size() / page_shift(); meanwhile rename those two to
      psize() and pshift() so as not to conflict with existing definitions in
      some test files that include vm_util.h.
      
      Link: https://lkml.kernel.org/r/20230412164120.327731-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: dump a summary in run_vmtests.sh · c7c55fc4
      Peter Xu authored
      Dump a summary after running whatever tests were specified.  Useful for
      human runners to identify any kind of failure (besides the exit code).
      
      Link: https://lkml.kernel.org/r/20230412164117.327720-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: update .gitignore with two missing tests · c14ef378
      Peter Xu authored
      Patch series "selftests/mm: Split / Refactor userfault test", v2.
      
      This patchset splits userfaultfd.c into two tests:
      
        - uffd-stress: the "vanilla", old and powerful stress test
        - uffd-unit-tests: all the unit tests will be moved here
      
      This has been on my todo list for a long time but I never did it for real.
      The uffd test is growing into a small and cute monster.  I've started to
      notice that it's getting harder to maintain such a test and keep it useful.
      
      A few issues I found when looking at the userfaultfd test:
      
        - We have a bunch of unit tests in userfaultfd.c, but they can only be
          run after a stress test.  There is no way to skip the stress part.
      
        - We can only run a unit test for one memory type at a time.  If we
          want to do a quick smoke test to check for regressions, there's no
          good way.  The best option currently is
          "bash ./run_vmtests.sh -t userfaultfd", thanks to the most recent
          changes to run_vmtests.sh on tagging.  Still, that always runs the
          stress tests and makes it hard to see what's wrong.
      
        - It's hard to add a new unit test to userfaultfd.c; we don't really
          know what's happening until we've read most of the file.
      
        - We did a bunch of useless tests, e.g. we ran the whole stress test
          suite twice just to verify both the syscall and /dev/userfaultfd.
          They both use userfaultfd_new() to create the handle, so everything
          should really be the same underneath.  One simple unit test should
          cover that!
      
        - We have tens of global variables in one file, shared by all the
          tests.  Some of them are not suitable as global vars from a
          maintenance pov.  It forces every unit test to consider how these
          vars affect the stress test and vice versa, which is logically
          unnecessary.
      
        - The userfaultfd test is not friendly to old kernels.  Mostly it only
          works on the latest kernel tree.  It's preferable for it to run on
          all kernels and properly report what's missing.
      
      I'll stop here, I feel like I can still list some..
      
      This patchset should resolve all the issues above, and actually we can do
      even more on top.  I only stopped when I found I already had 29 patches
      and 2000+ LOC of changes.  That's already a terrible enough patchset, so
      we should move in small steps.
      
      After the whole set is applied, "./run_vmtests.sh -t userfaultfd" looks
      like this:
      
      ===8<===
      vm.nr_hugepages = 1024
      -------------------------
      running ./uffd-unit-tests
      -------------------------
      Testing UFFDIO_API (with syscall)... done
      Testing UFFDIO_API (with /dev/userfaultfd)... done
      Testing register-ioctls on anon... done
      Testing register-ioctls on shmem... done
      Testing register-ioctls on shmem-private... done
      Testing register-ioctls on hugetlb... done
      Testing register-ioctls on hugetlb-private... done
      Testing zeropage on anon... done
      Testing zeropage on shmem... done
      Testing zeropage on shmem-private... done
      Testing zeropage on hugetlb... done
      Testing zeropage on hugetlb-private... done
      Testing pagemap on anon... done
      Testing wp-unpopulated on anon... done
      Testing minor on shmem... done
      Testing minor on hugetlb... done
      Testing minor-wp on shmem... done
      Testing minor-wp on hugetlb... done
      Testing minor-collapse on shmem... done
      Testing sigbus on anon... done
      Testing sigbus on shmem... done
      Testing sigbus on shmem-private... done
      Testing sigbus on hugetlb... done
      Testing sigbus on hugetlb-private... done
      Testing sigbus-wp on anon... done
      Testing sigbus-wp on shmem... done
      Testing sigbus-wp on shmem-private... done
      Testing sigbus-wp on hugetlb... done
      Testing sigbus-wp on hugetlb-private... done
      Testing events on anon... done
      Testing events on shmem... done
      Testing events on shmem-private... done
      Testing events on hugetlb... done
      Testing events on hugetlb-private... done
      Testing events-wp on anon... done
      Testing events-wp on shmem... done
      Testing events-wp on shmem-private... done
      Testing events-wp on hugetlb... done
      Testing events-wp on hugetlb-private... done
      Userfaults unit tests: pass=39, skip=0, fail=0 (total=39)
      [PASS]
      --------------------------------
      running ./uffd-stress anon 20 16
      --------------------------------
      nr_pages: 5120, nr_pages_per_cpu: 640
      bounces: 15, mode: rnd racing ver poll, userfaults: 345 missing (26+48+61+102+30+12+59+7) 1596 wp (120+139+317+346+215+67+306+86)
      [...]
      [PASS]
      ------------------------------------
      running ./uffd-stress hugetlb 128 32
      ------------------------------------
      nr_pages: 64, nr_pages_per_cpu: 8
      bounces: 31, mode: rnd racing ver poll, userfaults: 29 missing (6+6+6+5+4+2+0+0) 104 wp (20+19+22+18+7+12+5+1)
      [...]
      [PASS]
      --------------------------------------------
      running ./uffd-stress hugetlb-private 128 32
      --------------------------------------------
      nr_pages: 64, nr_pages_per_cpu: 8
      bounces: 31, mode: rnd racing ver poll, userfaults: 33 missing (12+9+7+0+5+0+0+0) 111 wp (24+25+14+14+11+17+5+1)
      [...]
      [PASS]
      ---------------------------------
      running ./uffd-stress shmem 20 16
      ---------------------------------
      nr_pages: 5120, nr_pages_per_cpu: 640
      bounces: 15, mode: rnd racing ver poll, userfaults: 247 missing (15+17+34+60+81+37+3+0) 2038 wp (180+114+276+400+381+318+165+204)
      [...]
      [PASS]
      -----------------------------------------
      running ./uffd-stress shmem-private 20 16
      -----------------------------------------
      nr_pages: 5120, nr_pages_per_cpu: 640
      bounces: 15, mode: rnd racing ver poll, userfaults: 235 missing (52+29+55+56+13+9+16+5) 2849 wp (218+406+461+531+328+284+430+191)
      [...]
      [PASS]
      SUMMARY: PASS=6 SKIP=0 FAIL=0
      ===8<===
      
      The output may differ if some features are missing (e.g., hugetlb not
      allocated, an old kernel, or insufficient privilege for the uffd handle),
      but each difference should be reported with a clear reason.  E.g., I
      tried to run the unit test on my Fedora kernel and it gives me:
      
      ===8<===
      UFFDIO_API (with syscall)... failed [reason: UFFDIO_API should fail with wrong api but didn't]
      UFFDIO_API (with /dev/userfaultfd)... skipped [reason: cannot open userfaultfd handle]
      zeropage on anon... done
      zeropage on shmem... done
      zeropage on shmem-private... done
      zeropage-hugetlb on hugetlb... done
      zeropage-hugetlb on hugetlb-private... done
      pagemap on anon... pagemap on anon... pagemap on anon... done
      wp-unpopulated on anon... skipped [reason: feature missing]
      minor on shmem... done
      minor on hugetlb... done
      minor-wp on shmem... skipped [reason: feature missing]
      minor-wp on hugetlb... skipped [reason: feature missing]
      minor-collapse on shmem... done
      sigbus on anon... skipped [reason: possible lack of priviledge]
      sigbus on shmem... skipped [reason: possible lack of priviledge]
      sigbus on shmem-private... skipped [reason: possible lack of priviledge]
      sigbus on hugetlb... skipped [reason: possible lack of priviledge]
      sigbus on hugetlb-private... skipped [reason: possible lack of priviledge]
      sigbus-wp on anon... skipped [reason: possible lack of priviledge]
      sigbus-wp on shmem... skipped [reason: possible lack of priviledge]
      sigbus-wp on shmem-private... skipped [reason: possible lack of priviledge]
      sigbus-wp on hugetlb... skipped [reason: possible lack of priviledge]
      sigbus-wp on hugetlb-private... skipped [reason: possible lack of priviledge]
      events on anon... skipped [reason: possible lack of priviledge]
      events on shmem... skipped [reason: possible lack of priviledge]
      events on shmem-private... skipped [reason: possible lack of priviledge]
      events on hugetlb... skipped [reason: possible lack of priviledge]
      events on hugetlb-private... skipped [reason: possible lack of priviledge]
      events-wp on anon... skipped [reason: possible lack of priviledge]
      events-wp on shmem... skipped [reason: possible lack of priviledge]
      events-wp on shmem-private... skipped [reason: possible lack of priviledge]
      events-wp on hugetlb... skipped [reason: possible lack of priviledge]
      events-wp on hugetlb-private... skipped [reason: possible lack of priviledge]
      Userfaults unit tests: pass=9, skip=24, fail=1 (total=34)
      ===8<===
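
      As a hedged illustration of how such per-test lines and the final summary
      could be produced, here is a minimal userspace sketch.  It is not the
      code in uffd-unit-tests.c; all sk_* names are hypothetical and only the
      output shape is modelled on the first log above.

```c
#include <stdio.h>

/* Hypothetical outcome of one unit test. */
enum sk_result { SK_PASS, SK_SKIP, SK_FAIL };

/* Hypothetical running totals for the summary line. */
struct sk_counts {
	int pass;
	int skip;
	int fail;
};

/* Print one "Testing <name> on <mem>... <outcome>" line and count it. */
static void sk_report(struct sk_counts *c, const char *name, const char *mem,
		      enum sk_result res, const char *reason)
{
	printf("Testing %s on %s... ", name, mem);
	switch (res) {
	case SK_PASS:
		c->pass++;
		printf("done\n");
		break;
	case SK_SKIP:
		c->skip++;
		printf("skipped [reason: %s]\n", reason);
		break;
	case SK_FAIL:
		c->fail++;
		printf("failed [reason: %s]\n", reason);
		break;
	}
}

/* Every test lands in exactly one bucket. */
static int sk_total(const struct sk_counts *c)
{
	return c->pass + c->skip + c->fail;
}

/* Print the summary line in the same shape as the log. */
static void sk_summary(const struct sk_counts *c)
{
	printf("Userfaults unit tests: pass=%d, skip=%d, fail=%d (total=%d)\n",
	       c->pass, c->skip, c->fail, sk_total(c));
}
```

      Each test reports exactly one outcome, so pass + skip + fail always
      equals the total in the summary line (9 + 24 + 1 = 34 in the Fedora run).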
      
      Patch layout:
      
      - Revert "userfaultfd: don't fail on unrecognized features"
      
        Something I found when I got the UFFDIO_API test below.  Axel, I still
        propose to revert it as a whole, but feel free to continue the discussion
        from the original patch thread.
      
      - selftests/mm: Update .gitignore with two missing tests
      - selftests/mm: Dump a summary in run_vmtests.sh
      - selftests/mm: Merge util.h into vm_util.h
      - selftests/mm: Use TEST_GEN_PROGS where proper
      - selftests/mm: Link vm_util.c always
      - selftests/mm: Merge default_huge_page_size() into one
      - selftests/mm: Use PM_* macros in vm_utils.h
      - selftests/mm: Reuse pagemap_get_entry() in vm_util.h
      - selftests/mm: Test UFFDIO_ZEROPAGE only when !hugetlb
      - selftests/mm: Drop test_uffdio_zeropage_eexist
      
        Up to here, these are assorted cleanups.  I wanted to keep going, but
        I found that it may take a few more days to split the test.  Hence I
        did the split starting from the next patch, so we have something
        working first.
      
      - selftests/mm: Create uffd-common.[ch]
      - selftests/mm: Split uffd tests into uffd-stress and uffd-unit-tests
      
        These do the major brute-force split of common code into
        uffd-common.[ch], which is so far the common base for the stress and
        unit tests.  Then a new unit test is created.
      
      - selftests/mm: uffd_[un]register()
      - selftests/mm: uffd_open_{dev|sys}()
      - selftests/mm: UFFDIO_API test
      
        This patch sits here to start writing the first unit test, for
        UFFDIO_API, along with detection of userfaultfd privileges.
      
      - selftests/mm: Drop global mem_fd in uffd tests
      - selftests/mm: Drop global hpage_size in uffd tests
      - selftests/mm: Rename uffd_stats to uffd_args
      - selftests/mm: Let uffd_handle_page_fault() takes wp parameter
      - selftests/mm: Allow allocate_area() to fail properly
      
        Some further cleanups; without them I found it hard to move the tests.
      
      - selftests/mm: Add framework for uffd-unit-test
      
        The major patch, which provides the framework for most of the remaining
        unit tests.
      
      - selftests/mm: Move uffd pagemap test to unit test
      - selftests/mm: Move uffd minor test to unit test
      - selftests/mm: Move uffd sig/events tests into uffd unit tests
      - selftests/mm: Move zeropage test into uffd unit tests
      
        Move the unit tests and fit them into the new file.
      
      - selftests/mm: Workaround no way to detect uffd-minor + wp
      - selftests/mm: Allow uffd test to skip properly with no privilege
      - selftests/mm: Drop sys/dev test in uffd-stress test
      - selftests/mm: Add shmem-private test to uffd-stress
      
        A bunch of changes to improve error reporting, and add
        shmem-private to the stress test, which was long missing.
      
      - selftests/mm: Add uffdio register ioctls test
      
        One more patch to test uffdio_register.ioctls.
      
      
      This patch (of 30):
      
      Update .gitignore with two missing tests.
      
      Link: https://lkml.kernel.org/r/20230412163922.327282-1-peterx@redhat.com
      Link: https://lkml.kernel.org/r/20230412164114.327709-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c14ef378
    • Haifeng Xu's avatar
      mm/vmscan: simplify shrink_node() · 54c4fe08
      Haifeng Xu authored
      The difference between sc->nr_reclaimed and nr_reclaimed is computed three
      times.  Introduce a new variable to record the value, so it only needs to
      be computed once.
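
      A minimal userspace sketch of the idea, not the actual mm/vmscan.c
      change: the struct, the result type, and the batch-based continue check
      are hypothetical stand-ins.  The point is only that the delta is
      computed once and then reused by every consumer.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct scan_control. */
struct scan_control_sketch {
	unsigned long nr_reclaimed;	/* running total across the reclaim pass */
};

/* Hypothetical bundle of the values that consume the delta. */
struct node_pass_sketch {
	unsigned long reported;		/* value handed to pressure accounting */
	bool reclaimable;		/* did this pass reclaim anything? */
	bool keep_going;		/* hypothetical "continue reclaim" check */
};

static struct node_pass_sketch
summarize_node_pass(const struct scan_control_sketch *sc,
		    unsigned long nr_reclaimed_before, unsigned long batch)
{
	/* Compute the difference once instead of at every use site. */
	unsigned long nr_node_reclaimed = sc->nr_reclaimed - nr_reclaimed_before;

	struct node_pass_sketch res = {
		.reported = nr_node_reclaimed,
		.reclaimable = nr_node_reclaimed != 0,
		.keep_going = nr_node_reclaimed >= batch,
	};

	return res;
}
```

      With sc->nr_reclaimed at 150 and a snapshot of 100 taken before the
      pass, all three consumers see the same delta of 50.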
      
      Link: https://lkml.kernel.org/r/20230411061757.12041-1-haifeng.xu@shopee.com
      Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      54c4fe08
    • Pankaj Raghav's avatar
      mpage: use folios in bio end_io handler · 09a607c9
      Pankaj Raghav authored
      Use folios in the bio end_io handler.  This conversion does the
      appropriate handling on the folios in the respective end_io callback and
      removes the call to page_endio(), which is soon to be removed.
      
      Link: https://lkml.kernel.org/r/20230411122920.30134-4-p.raghav@samsung.com
      Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Martin Brandenburg <martin@omnibond.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Marshall <hubcap@omnibond.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      09a607c9
    • Pankaj Raghav's avatar
      mpage: split submit_bio and bio end_io handler for reads and writes · f0d6ca46
      Pankaj Raghav authored
      Split the submit_bio() and bio end_io handler for reads and writes similar
      to other aops.
      
      This is a prep patch before we convert end_io handlers to use folios.
      
      Link: https://lkml.kernel.org/r/20230411122920.30134-3-p.raghav@samsung.com
      Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Martin Brandenburg <martin@omnibond.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Marshall <hubcap@omnibond.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      f0d6ca46
    • Pankaj Raghav's avatar
      orangefs: use folios in orangefs_readahead · cd01049d
      Pankaj Raghav authored
      Patch series "remove page_endio()", v3.
      
      It was decided to remove page_endio() as per the previous RFC
      discussion[1] of this series and move that functionality into the callers
      themselves.  One side benefit of doing that is that the callers have been
      modified to work directly on folios, as page_endio() already worked on
      folios.
      
      As Christoph is doing ZRAM cleanups[4] which will get rid of page_endio()
      function usage, I removed the final patch that removes page_endio()[5].  I
      will send it separately after rc-1 once the zram cleanups are merged.
      
      mpage changes were tested with simple boot testing and by running a fio
      workload on an ext2 filesystem.  orangefs was tested by Mike Marshall (no
      code changes since he tested).
      
      
      This patch (of 3):
      
      Convert orangefs_readahead() from using struct page to struct folio.  This
      conversion removes the call to page_endio() which is soon to be removed,
      and simplifies the final page handling.
      
      The page error flags are not required to be set in the error case as
      orangefs doesn't depend on them.
      
      Link: https://lkml.kernel.org/r/20230411122920.30134-1-p.raghav@samsung.com
      Link: https://lkml.kernel.org/r/20230411122920.30134-2-p.raghav@samsung.com
      Link: https://lore.kernel.org/linux-mm/ZBHcl8Pz2ULb4RGD@infradead.org/ [1]
      Link: https://lore.kernel.org/linux-mm/20230322135013.197076-1-p.raghav@samsung.com/ [2]
      Link: https://lore.kernel.org/linux-mm/8adb0770-6124-e11f-2551-6582db27ed32@samsung.com/ [3]
      Link: https://lore.kernel.org/linux-block/20230404150536.2142108-1-hch@lst.de/T/#t [4]
      Link: https://lore.kernel.org/lkml/20230403132221.94921-6-p.raghav@samsung.com/ [5]
      Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Mike Marshall <hubcap@omnibond.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Martin Brandenburg <martin@omnibond.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      cd01049d
    • David Hildenbrand's avatar
      mm/huge_memory: conditionally call maybe_mkwrite() and drop pte_wrprotect() in... · 1462c52e
      David Hildenbrand authored
      mm/huge_memory: conditionally call maybe_mkwrite() and drop pte_wrprotect() in __split_huge_pmd_locked()
      
      No need to call maybe_mkwrite() to then wrprotect if the source PMD was not
      writable.
      
      It's worth noting that this now allows for PTEs to be writable even if
      the source PMD was not writable: if vma->vm_page_prot includes write
      permissions.
      
      As documented in commit 931298e1 ("mm/userfaultfd: rely on
      vma->vm_page_prot in uffd_wp_range()"), any mechanism that intends to
      have pages wrprotected (COW, writenotify, mprotect, uffd-wp, softdirty,
      ...) has to properly adjust vma->vm_page_prot upfront, to not include
      write permissions. If vma->vm_page_prot includes write permissions, the
      PTE/PMD can be writable as default.
      
      This now mimics the handling in mm/migrate.c:remove_migration_pte() and in
      mm/huge_memory.c:remove_migration_pmd(), which has been in place for a
      long time (except that 96a9c287 ("mm/migrate: fix wrongly apply write
      bit after mkdirty on sparc64") temporarily changed it).
      
      Link: https://lkml.kernel.org/r/20230411142512.438404-7-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      1462c52e
    • David Hildenbrand's avatar
      mm/huge_memory: revert "Partly revert "mm/thp: carry over dirty bit when thp splits on pmd"" · 5436d655
      David Hildenbrand authored
      This reverts commit 624a2c94 ("Partly revert "mm/thp: carry over dirty
      bit when thp splits on pmd"") and the fixup in commit e833bc50
      ("mm/thp: re-apply mkdirty for small pages after split").
      
      Now that sparc64 mkdirty handling is fixed and no longer sets a PTE/PMD
      writable that shouldn't be writable, let's revert the temporary fix and
      remove the stale comment.
      
      The mkdirty mm selftest still passes with this change on sparc64.
      
      Note that loongarch handling was fixed in commit bf2f34a5 ("LoongArch:
      Set _PAGE_DIRTY only if _PAGE_WRITE is set in {pmd,pte}_mkdirty()").
      
      Link: https://lkml.kernel.org/r/20230411142512.438404-6-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      5436d655
    • David Hildenbrand's avatar
      mm/migrate: revert "mm/migrate: fix wrongly apply write bit after mkdirty on sparc64" · 3c811f78
      David Hildenbrand authored
      This reverts commit 96a9c287 ("mm/migrate: fix wrongly apply write bit
      after mkdirty on sparc64").
      
      Now that sparc64 mkdirty handling is fixed and no longer sets a PTE/PMD
      writable that shouldn't be writable, let's revert the temporary fix.
      
      The mkdirty mm selftest still passes with this change on sparc64.
      
      Note that loongarch handling was fixed in commit bf2f34a5 ("LoongArch:
      Set _PAGE_DIRTY only if _PAGE_WRITE is set in {pmd,pte}_mkdirty()").
      
      Link: https://lkml.kernel.org/r/20230411142512.438404-5-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      3c811f78
    • David Hildenbrand's avatar
      sparc/mm: don't unconditionally set HW writable bit when setting PTE dirty on 64bit · fa2e71a6
      David Hildenbrand authored
      On sparc64, there is no HW modified bit, therefore, SW tracks via a SW bit
      if the PTE is dirty via pte_mkdirty().  However, pte_mkdirty() currently
      also unconditionally sets the HW writable bit, which is wrong.
      
      pte_mkdirty() is not supposed to make a PTE actually writable, unless the
      SW writable bit -- pte_write() -- indicates that the PTE is not
      write-protected.  Fortunately, sparc64 also defines a SW writable bit.
      
      For example, this already turned into a problem in the context of THP
      splitting as documented in commit 624a2c94 ("Partly revert "mm/thp:
      carry over dirty bit when thp splits on pmd""), and for page migration, as
      documented in commit 96a9c287 ("mm/migrate: fix wrongly apply write
      bit after mkdirty on sparc64").
      
      Also, we might want to use the dirty PTE bit in the context of KSM with
      shared zeropage [1], whereby setting the page writable would be
      problematic.
      
      More generally, any code that might end up setting a PTE/PMD dirty
      inside a VMA without write permissions is possibly broken.
      
      Before this commit (sun4u in QEMU):
      	root@debian:~/linux/tools/testing/selftests/mm# ./mkdirty
      	# [INFO] detected THP size: 8192 KiB
      	TAP version 13
      	1..6
      	# [INFO] PTRACE write access
      	not ok 1 SIGSEGV generated, page not modified
      	# [INFO] PTRACE write access to THP
      	not ok 2 SIGSEGV generated, page not modified
      	# [INFO] Page migration
      	ok 3 SIGSEGV generated, page not modified
      	# [INFO] Page migration of THP
      	ok 4 SIGSEGV generated, page not modified
      	# [INFO] PTE-mapping a THP
      	ok 5 SIGSEGV generated, page not modified
      	# [INFO] UFFDIO_COPY
      	not ok 6 SIGSEGV generated, page not modified
      	Bail out! 3 out of 6 tests failed
      	# Totals: pass:3 fail:3 xfail:0 xpass:0 skip:0 error:0
      
      Tests #3, #4 and #5 have passed ever since we added some MM workarounds,
      but the underlying issue remains.
      
      Let's fix the remaining issues and prepare for reverting the workarounds
      by setting the HW writable bit only if both the SW dirty bit and the SW
      writable bit are set.
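
      The rule can be sketched in plain C.  This is an illustration only, not
      the sparc64 implementation (which uses patched ASM and machine-dependent
      masks): the SK_* bit positions and sk_* helpers below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical bit layout, chosen only for this sketch. */
#define SK_PAGE_SW_DIRTY	(UINT64_C(1) << 0)	/* SW-tracked dirty bit */
#define SK_PAGE_SW_WRITE	(UINT64_C(1) << 1)	/* SW writable bit */
#define SK_PAGE_HW_WRITE	(UINT64_C(1) << 2)	/* HW writable bit */

/* Set the HW writable bit only when both SW bits are already set. */
static uint64_t sk_mkhwwrite_if_allowed(uint64_t pte)
{
	const uint64_t need = SK_PAGE_SW_DIRTY | SK_PAGE_SW_WRITE;

	if ((pte & need) == need)
		pte |= SK_PAGE_HW_WRITE;
	return pte;
}

/* Marking a PTE dirty must not make a write-protected PTE writable. */
static uint64_t sk_pte_mkdirty(uint64_t pte)
{
	return sk_mkhwwrite_if_allowed(pte | SK_PAGE_SW_DIRTY);
}

/* Marking a PTE writable enables HW writes only once it is also dirty. */
static uint64_t sk_pte_mkwrite(uint64_t pte)
{
	return sk_mkhwwrite_if_allowed(pte | SK_PAGE_SW_WRITE);
}
```

      In this model, sk_pte_mkdirty() on a write-protected PTE leaves the HW
      writable bit clear, so a later write still faults, which is what the
      mkdirty selftest checks for.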
      
      We have to move pte_dirty() and pte_write() up. The code patching
      mechanism and handling constants > 22bit is a bit special on sparc64.
      
      The ASM logic in pte_mkdirty() and pte_mkwrite() matches the logic in
      pte_mkold() to create the mask depending on the machine type. The ASM
      logic in __pte_mkhwwrite() matches the logic in pte_present(), just
      using an "or" instead of an "and" instruction.
      
      With this commit (sun4u in QEMU):
      	root@debian:~/linux/tools/testing/selftests/mm# ./mkdirty
      	# [INFO] detected THP size: 8192 KiB
      	TAP version 13
      	1..6
      	# [INFO] PTRACE write access
      	ok 1 SIGSEGV generated, page not modified
      	# [INFO] PTRACE write access to THP
      	ok 2 SIGSEGV generated, page not modified
      	# [INFO] Page migration
      	ok 3 SIGSEGV generated, page not modified
      	# [INFO] Page migration of THP
      	ok 4 SIGSEGV generated, page not modified
      	# [INFO] PTE-mapping a THP
      	ok 5 SIGSEGV generated, page not modified
      	# [INFO] UFFDIO_COPY
      	ok 6 SIGSEGV generated, page not modified
      	# Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0
      
      This handling seems to have been in place forever.
      
      [1] https://lkml.kernel.org/r/533a7c3d-3a48-b16b-b421-6e8386e0b142@redhat.com
      
      Link: https://lkml.kernel.org/r/20230411142512.438404-4-david@redhat.com
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      fa2e71a6
    • David Hildenbrand's avatar
      selftests/mm: mkdirty: test behavior of (pte|pmd)_mkdirty on VMAs without write permissions · 9eac40fc
      David Hildenbrand authored
      Let's add some tests that trigger (pte|pmd)_mkdirty on VMAs without write
      permissions.  If an architecture implementation is wrong, we might
      accidentally set the PTE/PMD writable and allow for write access in a VMA
      without write permissions.
      
      The tests include reproducers for the two issues recently discovered
      and worked-around in core-MM for now:
      
      (1) commit 624a2c94 ("Partly revert "mm/thp: carry over dirty
          bit when thp splits on pmd"")
      (2) commit 96a9c287 ("mm/migrate: fix wrongly apply write bit
          after mkdirty on sparc64")
      
      In addition, there are some other tests that reveal further issues.
      
      All tests pass under x86_64:
      	./mkdirty
      	# [INFO] detected THP size: 2048 KiB
      	TAP version 13
      	1..6
      	# [INFO] PTRACE write access
      	ok 1 SIGSEGV generated, page not modified
      	# [INFO] PTRACE write access to THP
      	ok 2 SIGSEGV generated, page not modified
      	# [INFO] Page migration
      	ok 3 SIGSEGV generated, page not modified
      	# [INFO] Page migration of THP
      	ok 4 SIGSEGV generated, page not modified
      	# [INFO] PTE-mapping a THP
      	ok 5 SIGSEGV generated, page not modified
      	# [INFO] UFFDIO_COPY
      	ok 6 SIGSEGV generated, page not modified
      	# Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0
      
      But some fail on sparc64:
      	./mkdirty
      	# [INFO] detected THP size: 8192 KiB
      	TAP version 13
      	1..6
      	# [INFO] PTRACE write access
      	not ok 1 SIGSEGV generated, page not modified
      	# [INFO] PTRACE write access to THP
      	not ok 2 SIGSEGV generated, page not modified
      	# [INFO] Page migration
      	ok 3 SIGSEGV generated, page not modified
      	# [INFO] Page migration of THP
      	ok 4 SIGSEGV generated, page not modified
      	# [INFO] PTE-mapping a THP
      	ok 5 SIGSEGV generated, page not modified
      	# [INFO] UFFDIO_COPY
      	not ok 6 SIGSEGV generated, page not modified
      	Bail out! 3 out of 6 tests failed
      	# Totals: pass:3 fail:3 xfail:0 xpass:0 skip:0 error:0
      
      Reverting both above commits makes all tests fail on sparc64:
      	./mkdirty
      	# [INFO] detected THP size: 8192 KiB
      	TAP version 13
      	1..6
      	# [INFO] PTRACE write access
      	not ok 1 SIGSEGV generated, page not modified
      	# [INFO] PTRACE write access to THP
      	not ok 2 SIGSEGV generated, page not modified
      	# [INFO] Page migration
      	not ok 3 SIGSEGV generated, page not modified
      	# [INFO] Page migration of THP
      	not ok 4 SIGSEGV generated, page not modified
      	# [INFO] PTE-mapping a THP
      	not ok 5 SIGSEGV generated, page not modified
      	# [INFO] UFFDIO_COPY
      	not ok 6 SIGSEGV generated, page not modified
      	Bail out! 6 out of 6 tests failed
      	# Totals: pass:0 fail:6 xfail:0 xpass:0 skip:0 error:0
      
      The tests are useful to detect other problematic archs, to verify new
      arch fixes, and to stop such issues from reappearing in the future.
      
      For now, we don't add any hugetlb tests.
      
      Link: https://lkml.kernel.org/r/20230411142512.438404-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      9eac40fc
    • David Hildenbrand's avatar
      selftests/mm: reuse read_pmd_pagesize() in COW selftest · d6e61afb
      David Hildenbrand authored
      Patch series "mm: (pte|pmd)_mkdirty() should not unconditionally allow for
      write access".
      
      This is the follow-up on [1], adding selftests (testing for known issues
      we added workarounds for and other issues that haven't been fixed yet),
      fixing sparc64, reverting the workarounds, and performing one cleanup.
      
      The patch from [1] was modified slightly (updated/extended patch
      description, dropped one unnecessary NOP instruction from the ASM in
      __pte_mkhwwrite()).
      
      Retested on x86_64 and sparc64 (sun4u in QEMU).
      
      I scanned most architectures to make sure their (pte|pmd)_mkdirty()
      handling is correct.  To be sure, we can run the selftests and find out if
      other architectures are still affected (loongarch was fixed recently as
      well).
      
      Based on master for now. I don't expect surprises regarding mm-tress, but
      I can rebase if there are any problems.
      
      
      This patch (of 6):
      
      The COW selftest can deal with THP not being configured.  So move error
      handling of read_pmd_pagesize() into the callers such that we can reuse it
      in the COW selftest.
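
      A hedged sketch of the resulting pattern, not the vm_util.c code itself:
      the helper below is hypothetical and takes the path as a parameter so it
      can be exercised anywhere.  It returns 0 instead of exiting when the
      size cannot be read, and leaves the decision to the caller.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a decimal size from a sysfs-style file.
 * Returns 0 if the file is missing or unreadable; it never exits, so
 * each caller picks its own policy.
 */
static uint64_t read_size_file_sketch(const char *path)
{
	char buf[32];
	ssize_t num_read;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return 0;

	num_read = read(fd, buf, sizeof(buf) - 1);
	close(fd);
	if (num_read < 1)
		return 0;

	buf[num_read] = '\0';
	return strtoull(buf, NULL, 10);
}
```

      A caller that requires THP can treat 0 as a fatal error, while a test
      that copes with THP being unavailable can treat 0 as "skip the THP
      cases".  On Linux the PMD THP size is exposed at
      /sys/kernel/mm/transparent_hugepage/hpage_pmd_size when THP is
      configured.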
      
      Link: https://lkml.kernel.org/r/20230411142512.438404-1-david@redhat.com
      Link: https://lkml.kernel.org/r/20221212130213.136267-1-david@redhat.com [1]
      Link: https://lkml.kernel.org/r/20230411142512.438404-2-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      d6e61afb
    • Christoph Hellwig's avatar
      zram: return errors from read_from_bdev_sync · 1e9460d1
      Christoph Hellwig authored
      Propagate read errors to the caller instead of dropping them on the floor,
      and stop returning the somewhat dangerous 1 on success from
      read_from_bdev*.
      
      Link: https://lkml.kernel.org/r/20230411171459.567614-18-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      1e9460d1
    • Christoph Hellwig's avatar
      zram: fix synchronous reads · 4e3c87b9
      Christoph Hellwig authored
      Currently nothing waits for the synchronous reads before accessing the
      data.  Switch them to an on-stack bio and submit_bio_wait to make sure the
      I/O has actually completed when the work item has been flushed.  This also
      removes the call to page_endio that would unlock a page that has never
      been locked.
      
      Drop the partial_io/sync flag, as chaining only makes sense for the
      asynchronous reads of the entire page.
      
      Link: https://lkml.kernel.org/r/20230411171459.567614-17-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      4e3c87b9
    • Christoph Hellwig's avatar
      zram: don't return errors from read_from_bdev_async · 0cd97a03
      Christoph Hellwig authored
      bio_alloc will never return a NULL bio when it is allowed to sleep, and
      adding a single page to bio with a single vector also can't fail, so
      switch to the asserting __bio_add_page variant and drop the error returns.
      
      Link: https://lkml.kernel.org/r/20230411171459.567614-16-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      0cd97a03