Linus Torvalds authored
In commit 70e806e4 ("mm: Do early cow for pinned pages during fork() for ptes") we write-protected the PTE before doing the page pinning check, in order to avoid a race with concurrent fast-GUP pinning (which doesn't take the mm semaphore or the page table lock).

That trick doesn't actually work - it doesn't handle memory ordering properly, and doing so would be prohibitively expensive.

It also isn't really needed. While we're moving in the direction of allowing and supporting page pinning without marking the pinned area with MADV_DONTFORK, the fact is that we've never really supported this kind of odd "concurrent fork() and page pinning", and doing the serialization on a pte level is just wrong.

We can add serialization with a per-mm sequence counter, so we know how to solve that race properly, but we'll do that at a more appropriate time. Right now this just removes the write protect games.

It also turns out that the write protect games actually break on Power, as reported by Aneesh Kumar:

 "Architecture like ppc64 expects set_pte_at to be not used for updating a valid pte. This is further explained in commit 56eecdb9 ("mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit")"

and the code triggered a warning there:

  WARNING: CPU: 0 PID: 30613 at arch/powerpc/mm/pgtable.c:185 set_pte_at+0x2a8/0x3a0 arch/powerpc/mm/pgtable.c:185
  Call Trace:
    copy_present_page mm/memory.c:857 [inline]
    copy_present_pte mm/memory.c:899 [inline]
    copy_pte_range mm/memory.c:1014 [inline]
    copy_pmd_range mm/memory.c:1092 [inline]
    copy_pud_range mm/memory.c:1127 [inline]
    copy_p4d_range mm/memory.c:1150 [inline]
    copy_page_range+0x1f6c/0x2cc0 mm/memory.c:1212
    dup_mmap kernel/fork.c:592 [inline]
    dup_mm+0x77c/0xab0 kernel/fork.c:1355
    copy_mm kernel/fork.c:1411 [inline]
    copy_process+0x1f00/0x2740 kernel/fork.c:2070
    _do_fork+0xc4/0x10b0 kernel/fork.c:2429

Link: https://lore.kernel.org/lkml/CAHk-=wiWr+gO0Ro4LvnJBMs90OiePNyrE3E+pJvc9PzdBShdmw@mail.gmail.com/
Link: https://lore.kernel.org/linuxppc-dev/20201008092541.398079-1-aneesh.kumar@linux.ibm.com/
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Kirill Shutemov <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
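
Note on the per-mm sequence counter: this commit does not add it, and the message does not say what it will look like. Purely as an illustration of the general idea, here is a minimal userspace sketch using C11 atomics. Everything in it is an assumption made for the example: struct mm_sketch, fork_seq, pin_count, and the helper names are hypothetical and are not kernel interfaces. The fork side makes the counter odd while it copies; the lockless pin side samples the counter, takes its pin, then re-checks the counter and backs out if a copy started or finished in between.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-address-space state; NOT the kernel's mm_struct. */
struct mm_sketch {
    atomic_uint fork_seq;  /* odd while a fork-style copy is in progress */
    atomic_int  pin_count; /* stand-in for a page's pin count */
};

static void mm_sketch_init(struct mm_sketch *mm)
{
    atomic_init(&mm->fork_seq, 0);
    atomic_init(&mm->pin_count, 0);
}

/* Fork side: bump the counter before and after copying the page tables. */
static void fork_copy_begin(struct mm_sketch *mm)
{
    atomic_fetch_add(&mm->fork_seq, 1); /* counter becomes odd */
}

static void fork_copy_end(struct mm_sketch *mm)
{
    atomic_fetch_add(&mm->fork_seq, 1); /* counter becomes even again */
}

/*
 * Lockless pin side: returns true if the pin was taken with no copy
 * overlapping it. On false the pin has been dropped again and the caller
 * is expected to retry on a slower, properly locked path.
 */
static bool fast_pin(struct mm_sketch *mm)
{
    unsigned int seq = atomic_load(&mm->fork_seq);

    if (seq & 1)
        return false; /* a copy is already running */

    atomic_fetch_add(&mm->pin_count, 1);

    if (atomic_load(&mm->fork_seq) != seq) {
        /* A copy started (or started and finished) meanwhile: undo. */
        atomic_fetch_sub(&mm->pin_count, 1);
        return false;
    }
    return true;
}
```

The sketch uses the default sequentially consistent atomics for simplicity. The point is only that the serialization lives in one counter per address space rather than in per-pte write-protect tricks.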
f3c64eda