Commit 52526ca7 authored by Muhammad Usama Anjum, committed by Andrew Morton

fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs

The PAGEMAP_SCAN IOCTL on the pagemap file can be used to get, and optionally
clear, information about page table entries. The following operations are
supported by this IOCTL:
- Scan an address range and return the memory ranges matching the provided
  criteria. This is performed when an output buffer is specified.
- Write-protect the pages. PM_SCAN_WP_MATCHING is used to write-protect
  the pages of interest. PM_SCAN_CHECK_WPASYNC aborts the operation if
  pages that are not async write-protected are found. PM_SCAN_WP_MATCHING
  can be used with or without PM_SCAN_CHECK_WPASYNC.
- Both operations can be combined into one atomic operation that returns
  and write-protects the matching pages.

The following flags about pages are currently supported:
- PAGE_IS_WPALLOWED - Page has async write-protection enabled
- PAGE_IS_WRITTEN - Page has been written to since it was last write-protected
- PAGE_IS_FILE - Page is file-backed
- PAGE_IS_PRESENT - Page is present in memory
- PAGE_IS_SWAPPED - Page is swapped out
- PAGE_IS_PFNZERO - Page has the zero PFN
- PAGE_IS_HUGE - Page is THP- or hugetlb-backed

This IOCTL can be extended to get information about more PTE bits. The
entire user-supplied address range [start, end) is scanned until either
the user-provided buffer is full or max_pages pages have been found.
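
For illustration only (not part of this patch), here is a minimal userspace
sketch that scans a hypothetical range of its own address space for written
pages. It assumes the UAPI additions in the diff below, and that the ioctl
returns the number of page_region entries filled in on success:

	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>
	#include <sys/ioctl.h>
	#include <linux/fs.h>	/* PAGEMAP_SCAN, struct pm_scan_arg, PAGE_IS_* */

	int main(void)
	{
		struct page_region regions[32];
		struct pm_scan_arg arg;
		int fd, n, i;

		fd = open("/proc/self/pagemap", O_RDONLY);
		if (fd < 0)
			return 1;

		memset(&arg, 0, sizeof(arg));
		arg.size = sizeof(arg);
		arg.start = 0x7f0000000000ULL;	/* hypothetical range */
		arg.end   = 0x7f0000100000ULL;
		arg.vec = (unsigned long)regions;
		arg.vec_len = 32;
		arg.category_mask = PAGE_IS_WRITTEN;	/* match only written pages */
		arg.return_mask = PAGE_IS_WRITTEN;

		n = ioctl(fd, PAGEMAP_SCAN, &arg);	/* >= 0: regions filled in */
		for (i = 0; i < n; i++)
			printf("written: 0x%llx-0x%llx\n",
			       (unsigned long long)regions[i].start,
			       (unsigned long long)regions[i].end);

		close(fd);
		return n < 0 ? 1 : 0;
	}

Setting arg.flags = PM_SCAN_WP_MATCHING in the same call would turn this
into the combined atomic operation described above, write-protecting the
matched pages as they are reported.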

[akpm@linux-foundation.org: update it for "mm: hugetlb: add huge page size param to set_huge_pte_at()"]
[akpm@linux-foundation.org: fix CONFIG_HUGETLB_PAGE=n warning]
[arnd@arndb.de: hide unused pagemap_scan_backout_range() function]
  Link: https://lkml.kernel.org/r/20230927060257.2975412-1-arnd@kernel.org
[sfr@canb.auug.org.au: fix "fs/proc/task_mmu: hide unused pagemap_scan_backout_range() function"]
  Link: https://lkml.kernel.org/r/20230928092223.0625c6bf@canb.auug.org.au
Link: https://lkml.kernel.org/r/20230821141518.870589-3-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Miroslaw <emmir@google.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Paul Gofman <pgofman@codeweavers.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yun Zhou <yun.zhou@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
parent d61ea1cb
(The fs/proc/task_mmu.c diff is collapsed in this view and not reproduced here.)
include/linux/hugetlb.h
@@ -280,6 +280,7 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
 		unsigned long cp_flags);
 
 bool is_hugetlb_entry_migration(pte_t pte);
+bool is_hugetlb_entry_hwpoisoned(pte_t pte);
 void hugetlb_unshare_all_pmds(struct vm_area_struct *vma);
 
 #else /* !CONFIG_HUGETLB_PAGE */
include/linux/userfaultfd_k.h
@@ -221,6 +221,13 @@ static inline vm_fault_t handle_userfault(struct vm_fault *vmf,
 	return VM_FAULT_SIGBUS;
 }
 
+static inline long uffd_wp_range(struct vm_area_struct *vma,
+				 unsigned long start, unsigned long len,
+				 bool enable_wp)
+{
+	return false;
+}
+
 static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
 						   struct vm_userfaultfd_ctx vm_ctx)
 {
include/uapi/linux/fs.h
@@ -305,4 +305,63 @@ typedef int __bitwise __kernel_rwf_t;
 #define RWF_SUPPORTED	(RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
 			 RWF_APPEND)
 
+/* Pagemap ioctl */
+#define PAGEMAP_SCAN	_IOWR('f', 16, struct pm_scan_arg)
+
+/* Bitmasks provided in pm_scan_args masks and reported in page_region.categories. */
+#define PAGE_IS_WPALLOWED	(1 << 0)
+#define PAGE_IS_WRITTEN		(1 << 1)
+#define PAGE_IS_FILE		(1 << 2)
+#define PAGE_IS_PRESENT		(1 << 3)
+#define PAGE_IS_SWAPPED		(1 << 4)
+#define PAGE_IS_PFNZERO		(1 << 5)
+#define PAGE_IS_HUGE		(1 << 6)
+
+/*
+ * struct page_region - Page region with flags
+ * @start:	Start of the region
+ * @end:	End of the region (exclusive)
+ * @categories:	PAGE_IS_* category bitmask for the region
+ */
+struct page_region {
+	__u64 start;
+	__u64 end;
+	__u64 categories;
+};
+
+/* Flags for PAGEMAP_SCAN ioctl */
+#define PM_SCAN_WP_MATCHING	(1 << 0)	/* Write protect the pages matched. */
+#define PM_SCAN_CHECK_WPASYNC	(1 << 1)	/* Abort the scan when a non-WP-enabled page is found. */
+
+/*
+ * struct pm_scan_arg - Pagemap ioctl argument
+ * @size:		Size of the structure
+ * @flags:		Flags for the IOCTL
+ * @start:		Starting address of the region
+ * @end:		Ending address of the region
+ * @walk_end:		Address where the scan stopped (written by kernel).
+ *			walk_end == end (with address tags cleared) means the
+ *			scan completed over the entire range.
+ * @vec:		Address of page_region struct array for output
+ * @vec_len:		Length of the page_region struct array
+ * @max_pages:		Optional limit for number of returned pages (0 = disabled)
+ * @category_inverted:	PAGE_IS_* categories which values match if 0 instead of 1
+ * @category_mask:	Skip pages for which any category doesn't match
+ * @category_anyof_mask: Skip pages for which no category matches
+ * @return_mask:	PAGE_IS_* categories that are to be reported in `page_region`s returned
+ */
+struct pm_scan_arg {
+	__u64 size;
+	__u64 flags;
+	__u64 start;
+	__u64 end;
+	__u64 walk_end;
+	__u64 vec;
+	__u64 vec_len;
+	__u64 max_pages;
+	__u64 category_inverted;
+	__u64 category_mask;
+	__u64 category_anyof_mask;
+	__u64 return_mask;
+};
+
 #endif /* _UAPI_LINUX_FS_H */
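
To make the three mask fields concrete, here is a small illustrative sketch
(not the kernel's code; page_matches is a hypothetical helper name) of how a
page's category bitmask is checked against them, following the field
descriptions above:

	#include <stdbool.h>
	#include <linux/fs.h>	/* struct pm_scan_arg, __u64 (as added above) */

	/*
	 * Hypothetical helper: returns true when a page whose PAGE_IS_*
	 * bits are in "categories" would be reported by the scan.
	 */
	static bool page_matches(__u64 categories, const struct pm_scan_arg *arg)
	{
		/* Flip the bits listed in category_inverted, so 0 counts as a match. */
		categories ^= arg->category_inverted;

		/* Every bit of category_mask must be set after inversion... */
		if ((categories & arg->category_mask) != arg->category_mask)
			return false;

		/* ...and at least one bit of category_anyof_mask, when non-zero. */
		if (arg->category_anyof_mask &&
		    !(categories & arg->category_anyof_mask))
			return false;

		return true;
	}

Only the bits selected by return_mask end up in page_region.categories for
the ranges that are reported.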
mm/hugetlb.c
@@ -5044,7 +5044,7 @@ bool is_hugetlb_entry_migration(pte_t pte)
 		return false;
 }
 
-static bool is_hugetlb_entry_hwpoisoned(pte_t pte)
+bool is_hugetlb_entry_hwpoisoned(pte_t pte)
 {
 	swp_entry_t swp;
 
@@ -6266,7 +6266,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		}
 
 		entry = huge_pte_clear_uffd_wp(entry);
-		set_huge_pte_at(mm, haddr, ptep, entry);
+		set_huge_pte_at(mm, haddr, ptep, entry,
+				huge_page_size(hstate_vma(vma)));
 		/* Fallthrough to CoW */
 	}