1. 20 Apr, 2023 1 commit
    • Merge tag 'mm-hotfixes-stable-2023-04-19-16-36' of... · cb085634
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2023-04-19-16-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull misc fixes from Andrew Morton:
       "22 hotfixes.
      
        19 are cc:stable and the remainder address issues which were
        introduced during this merge cycle, or aren't considered suitable for
        -stable backporting.
      
        19 are for MM and the remainder are for other subsystems"
      
      * tag 'mm-hotfixes-stable-2023-04-19-16-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits)
        nilfs2: initialize unused bytes in segment summary blocks
        mm: page_alloc: skip regions with hugetlbfs pages when allocating 1G pages
        mm/mmap: regression fix for unmapped_area{_topdown}
        maple_tree: fix mas_empty_area() search
        maple_tree: make maple state reusable after mas_empty_area_rev()
        mm: kmsan: handle alloc failures in kmsan_ioremap_page_range()
        mm: kmsan: handle alloc failures in kmsan_vmap_pages_range_noflush()
        tools/Makefile: do missed s/vm/mm/
        mm: fix memory leak on mm_init error handling
        mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock
        kernel/sys.c: fix and improve control flow in __sys_setres[ug]id()
        Revert "userfaultfd: don't fail on unrecognized features"
        writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs
        maple_tree: fix a potential memory leak, OOB access, or other unpredictable bug
        tools/mm/page_owner_sort.c: fix TGID output when cull=tg is used
        mailmap: update jtoppins' entry to reference correct email
        mm/mempolicy: fix use-after-free of VMA iterator
        mm/huge_memory.c: warn with pr_warn_ratelimited instead of VM_WARN_ON_ONCE_FOLIO
        mm/mprotect: fix do_mprotect_pkey() return on error
        mm/khugepaged: check again on anon uffd-wp during isolation
        ...
      cb085634
  2. 19 Apr, 2023 11 commits
  3. 18 Apr, 2023 20 commits
    • nilfs2: initialize unused bytes in segment summary blocks · ef832747
      Ryusuke Konishi authored
      Syzbot still reports uninit-value in nilfs_add_checksums_on_logs() for
      KMSAN enabled kernels after applying commit 73970316 ("nilfs2:
      initialize "struct nilfs_binfo_dat"->bi_pad field").
      
      This is because the unused bytes at the end of each block in segment
      summaries are not initialized.  So this fixes the issue by padding the
      unused bytes with null bytes.
      
      Link: https://lkml.kernel.org/r/20230417173513.12598-1-konishi.ryusuke@gmail.com
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Reported-by: syzbot+048585f3f4227bb2b49b@syzkaller.appspotmail.com
        Link: https://syzkaller.appspot.com/bug?extid=048585f3f4227bb2b49b
      Cc: Alexander Potapenko <glider@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      ef832747
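
A minimal userspace sketch of the padding idea described above, not the nilfs2 code itself: once the used part of a block is known, the remainder is zero-filled so that a checksum over the whole block never reads uninitialized bytes. `BLOCK_SIZE` and `pad_block_tail()` are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 4096 /* hypothetical block size for this sketch */

/*
 * Zero the unused tail of a block. 'used' is the number of bytes that
 * already hold valid data; it is clamped to the block size.
 * Returns the number of bytes that were padded.
 */
static size_t pad_block_tail(unsigned char *block, size_t used)
{
	if (used > BLOCK_SIZE)
		used = BLOCK_SIZE;
	memset(block + used, 0, BLOCK_SIZE - used);
	return BLOCK_SIZE - used;
}
```

Calling `pad_block_tail(buf, n)` after writing `n` bytes of summary data leaves bytes `n` through `BLOCK_SIZE - 1` as null bytes and the first `n` bytes untouched.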
    • mm: page_alloc: skip regions with hugetlbfs pages when allocating 1G pages · 4d73ba5f
      Mel Gorman authored
      A bug was reported by Yuanxi Liu where allocating 1G pages at runtime
      takes an excessive amount of time for large amounts of memory.  Further
      testing of huge page allocation showed that the cost is linear, i.e. if
      allocating 1G pages in batches of 10 then the time to allocate
      nr_hugepages from 10->20->30->etc increases linearly even though 10 pages
      are allocated at each step.  Profiles indicated that much of the time is
      spent checking the validity within already existing huge pages and then
      attempting a migration that fails after isolating the range, draining
      pages and a whole lot of other useless work.
      
      Commit eb14d4ee ("mm,page_alloc: drop unnecessary checks from
      pfn_range_valid_contig") removed two checks, one which ignored huge pages
      for contiguous allocations as huge pages can sometimes migrate.  While
      there may be value in migrating a 2M page to satisfy a 1G allocation, it's
      potentially expensive if the 1G allocation fails and it's pointless to try
      moving a 1G page for a new 1G allocation or scan the tail pages for valid
      PFNs.
      
      Reintroduce the PageHuge check and assume any contiguous region with
      hugetlbfs pages is unsuitable for a new 1G allocation.
      
      The hpagealloc test allocates huge pages in batches and reports the
      average latency per page over time.  This test happens just after boot
      when fragmentation is not an issue.  Units are in milliseconds.
      
      hpagealloc
                                     6.3.0-rc6              6.3.0-rc6              6.3.0-rc6
                                       vanilla   hugeallocrevert-v1r1   hugeallocsimple-v1r2
      Min       Latency       26.42 (   0.00%)        5.07 (  80.82%)       18.94 (  28.30%)
      1st-qrtle Latency      356.61 (   0.00%)        5.34 (  98.50%)       19.85 (  94.43%)
      2nd-qrtle Latency      697.26 (   0.00%)        5.47 (  99.22%)       20.44 (  97.07%)
      3rd-qrtle Latency      972.94 (   0.00%)        5.50 (  99.43%)       20.81 (  97.86%)
      Max-1     Latency       26.42 (   0.00%)        5.07 (  80.82%)       18.94 (  28.30%)
      Max-5     Latency       82.14 (   0.00%)        5.11 (  93.78%)       19.31 (  76.49%)
      Max-10    Latency      150.54 (   0.00%)        5.20 (  96.55%)       19.43 (  87.09%)
      Max-90    Latency     1164.45 (   0.00%)        5.53 (  99.52%)       20.97 (  98.20%)
      Max-95    Latency     1223.06 (   0.00%)        5.55 (  99.55%)       21.06 (  98.28%)
      Max-99    Latency     1278.67 (   0.00%)        5.57 (  99.56%)       22.56 (  98.24%)
      Max       Latency     1310.90 (   0.00%)        8.06 (  99.39%)       26.62 (  97.97%)
      Amean     Latency      678.36 (   0.00%)        5.44 *  99.20%*       20.44 *  96.99%*
      
                         6.3.0-rc6   6.3.0-rc6   6.3.0-rc6
                           vanilla   revert-v1   hugeallocfix-v2
      Duration User           0.28        0.27        0.30
      Duration System       808.66       17.77       35.99
      Duration Elapsed      830.87       18.08       36.33
      
      The vanilla kernel is poor, taking up to 1.3 seconds to allocate a huge
      page and almost 14 minutes in total to run the test.  Reverting the
      problematic commit reduces the worst case to 8ms, and with this patch the
      worst case is 26ms.  This patch fixes the main issue by skipping huge
      pages but leaves the page_count() check out because a page with an
      elevated count potentially can migrate.
      
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=217022
      Link: https://lkml.kernel.org/r/20230414141429.pwgieuwluxwez3rj@techsingularity.net
      Fixes: eb14d4ee ("mm,page_alloc: drop unnecessary checks from pfn_range_valid_contig")
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Reported-by: Yuanxi Liu <y.liu@naruida.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      4d73ba5f
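
The reintroduced check can be pictured with a simplified userspace model. This is a sketch under stated assumptions, not the kernel's pfn_range_valid_contig(): `struct mock_page` and `range_valid_contig()` are hypothetical, and the other validity checks the real function performs are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the property this sketch needs. */
struct mock_page {
	bool huge; /* models PageHuge(): the page belongs to a hugetlbfs page */
};

/*
 * Return true only if no page in [start, start + nr) belongs to a
 * hugetlbfs page. Rejecting such a range up front avoids the isolate,
 * drain and migrate work described in the commit message.
 */
static bool range_valid_contig(const struct mock_page *pages,
			       size_t start, size_t nr)
{
	for (size_t i = start; i < start + nr; i++) {
		if (pages[i].huge)
			return false;
	}
	return true;
}
```

A caller scanning for a 1G candidate would skip any range for which this returns false and move on to the next aligned range.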
    • mm/mmap: regression fix for unmapped_area{_topdown} · 58c5d0d6
      Liam R. Howlett authored
      The maple tree limits the gap returned to a window that specifically fits
      what was asked.  This may not be optimal in the case of switching search
      directions or a gap that does not satisfy the requested space for other
      reasons.  Fix the search by retrying the operation and limiting the search
      window in the rare occasion that a conflict occurs.
      
      Link: https://lkml.kernel.org/r/20230414185919.4175572-1-Liam.Howlett@oracle.com
      Fixes: 3499a131 ("mm/mmap: use maple tree for unmapped_area{_topdown}")
      Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Reported-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      58c5d0d6
    • maple_tree: fix mas_empty_area() search · 06e8fd99
      Liam R. Howlett authored
      The internal function mas_awalk() was incorrectly skipping the last
      entry in a node, which could potentially be NULL.  This is only a problem
      for the left-most node in the tree - otherwise that NULL would not exist.
      
      Fix mas_awalk() by using the metadata to obtain the end of the node for
      the loop and the logical pivot as opposed to the raw pivot value.
      
      Link: https://lkml.kernel.org/r/20230414145728.4067069-2-Liam.Howlett@oracle.com
      Fixes: 54a611b6 ("Maple Tree: add new data structure")
      Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Reported-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      06e8fd99
    • maple_tree: make maple state reusable after mas_empty_area_rev() · fad8e429
      Liam R. Howlett authored
      Stop using maple state min/max for the range by passing through pointers
      for those values.  This will allow the maple state to be reused without
      resetting.
      
      Also add some logic to fail out early on searching with invalid
      arguments.
      
      Link: https://lkml.kernel.org/r/20230414145728.4067069-1-Liam.Howlett@oracle.com
      Fixes: 54a611b6 ("Maple Tree: add new data structure")
      Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Reported-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      fad8e429
    • mm: kmsan: handle alloc failures in kmsan_ioremap_page_range() · fdea03e1
      Alexander Potapenko authored
      Similarly to kmsan_vmap_pages_range_noflush(), kmsan_ioremap_page_range()
      must also properly handle allocation/mapping failures.  In the case of
      such, it must clean up the already created metadata mappings and return an
      error code, so that the error can be propagated to ioremap_page_range(). 
      Without doing so, KMSAN may silently fail to bring the metadata for the
      page range into a consistent state, which will result in user-visible
      crashes when trying to access them.
      
      Link: https://lkml.kernel.org/r/20230413131223.4135168-2-glider@google.com
      Fixes: b073d7f8 ("mm: kmsan: maintain KMSAN metadata for page operations")
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Reported-by: Dipanjan Das <mail.dipanjan.das@gmail.com>
        Link: https://lore.kernel.org/linux-mm/CANX2M5ZRrRA64k0hOif02TjmY9kbbO2aCBPyq79es34RXZ=cAw@mail.gmail.com/
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      fdea03e1
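
The "clean up what was already created, then return an error" pattern described above can be sketched in userspace as follows. This is an illustrative model only and does not reproduce the KMSAN code: `map_one()`, `unmap_one()`, `map_range()`, `mapped[]` and the `fail_at` knob are all hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SLOTS 8

static bool mapped[NR_SLOTS]; /* hypothetical per-page mapping state */
static int fail_at = -1;      /* slot at which mapping fails, -1 for never */

/* Map one slot, or fail with -ENOMEM at the slot selected by fail_at. */
static int map_one(size_t i)
{
	if (fail_at >= 0 && i == (size_t)fail_at)
		return -ENOMEM;
	mapped[i] = true;
	return 0;
}

static void unmap_one(size_t i)
{
	mapped[i] = false;
}

/*
 * Map slots [0, nr). On failure, undo every mapping created so far and
 * propagate the error, so the caller never sees a half-mapped range.
 */
static int map_range(size_t nr)
{
	size_t i;
	int err = 0;

	if (nr > NR_SLOTS)
		return -EINVAL;

	for (i = 0; i < nr; i++) {
		err = map_one(i);
		if (err)
			goto undo;
	}
	return 0;

undo:
	while (i-- > 0)
		unmap_one(i);
	return err;
}
```

The key property is that a failure leaves the state exactly as it was before the call, which is what allows the error to be propagated safely to the caller.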
    • mm: kmsan: handle alloc failures in kmsan_vmap_pages_range_noflush() · 47ebd031
      Alexander Potapenko authored
      As reported by Dipanjan Das, when KMSAN is used together with kernel fault
      injection (or, generally, even without the latter), calls to kcalloc() or
      __vmap_pages_range_noflush() may fail, leaving the metadata mappings for
      the virtual mapping in an inconsistent state.  When these metadata
      mappings are accessed later, the kernel crashes.
      
      To address the problem, we return a non-zero error code from
      kmsan_vmap_pages_range_noflush() in the case of any allocation/mapping
      failure inside it, and make vmap_pages_range_noflush() return an error if
      KMSAN fails to allocate the metadata.
      
      This patch also removes KMSAN_WARN_ON() from vmap_pages_range_noflush(),
      as these allocation failures are not fatal anymore.
      
      Link: https://lkml.kernel.org/r/20230413131223.4135168-1-glider@google.com
      Fixes: b073d7f8 ("mm: kmsan: maintain KMSAN metadata for page operations")
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Reported-by: Dipanjan Das <mail.dipanjan.das@gmail.com>
        Link: https://lore.kernel.org/linux-mm/CANX2M5ZRrRA64k0hOif02TjmY9kbbO2aCBPyq79es34RXZ=cAw@mail.gmail.com/
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      47ebd031
    • tools/Makefile: do missed s/vm/mm/ · a1014824
      SeongJae Park authored
      Commit 799fb82a ("tools/vm: rename tools/vm to tools/mm") missed
      renaming 'vm' in 'tools/Makefile' to 'mm'.  As a result, 'make clean'
      under 'tools/' directory fails as below:
      
          $ make -C tools clean
            DESCEND vm
          make[1]: Entering directory '/linux/tools/vm'
          make[1]: *** No rule to make target 'clean'.  Stop.
          make[1]: Leaving directory '/linux/tools/vm'
          make: *** [Makefile:173: vm_clean] Error 2
          make: Leaving directory '/linux/tools'
      
      Do the missed rename.
      
      Link: https://lkml.kernel.org/r/20230415203110.13858-1-sj@kernel.org
      Fixes: 799fb82a ("tools/vm: rename tools/vm to tools/mm")
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Reported-by: Ricardo Pardini <ricardo@pardini.net>
        Link: https://lore.kernel.org/linux-mm/20230415202454.13558-1-sj@kernel.org/
      Tested-by: Ricardo Pardini <ricardo@pardini.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      a1014824
    • mm: fix memory leak on mm_init error handling · b20b0368
      Mathieu Desnoyers authored
      Commit f1a79412 ("mm: convert mm's rss stats into percpu_counter")
      introduced a memory leak by missing a call to destroy_context() when a
      percpu_counter fails to allocate.
      
      Before introducing the per-cpu counter allocations, init_new_context() was
      the last call that could fail in mm_init(), and thus there was no need to
      ever invoke destroy_context() in the error paths.  Adding the following
      percpu counter allocations adds error paths after init_new_context(),
      which means its associated destroy_context() needs to be called when
      percpu counters fail to allocate.
      
      Link: https://lkml.kernel.org/r/20230330133822.66271-1-mathieu.desnoyers@efficios.com
      Fixes: f1a79412 ("mm: convert mm's rss stats into percpu_counter")
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Shakeel Butt <shakeelb@google.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      b20b0368
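
The ordering problem can be illustrated with a small goto-based unwind. This is a userspace sketch, not the actual mm_init(): the `mock_*` functions, `context_live` and the `counters_fail` knob are hypothetical stand-ins for init_new_context(), destroy_context() and the percpu counter allocation.

```c
#include <errno.h>
#include <stdbool.h>

static bool context_live;  /* models state set up by init_new_context() */
static bool counters_fail; /* test knob: make the counter allocation fail */

static int mock_init_context(void)
{
	context_live = true;
	return 0;
}

static void mock_destroy_context(void)
{
	context_live = false;
}

static int mock_alloc_counters(void)
{
	return counters_fail ? -ENOMEM : 0;
}

/*
 * Because a step that can fail now follows the context setup, the error
 * path for that step must tear the context down again.
 */
static int mock_mm_init(void)
{
	int err;

	err = mock_init_context();
	if (err)
		goto fail;

	err = mock_alloc_counters();
	if (err)
		goto fail_counters;

	return 0;

fail_counters:
	mock_destroy_context(); /* the call whose absence caused the leak */
fail:
	return err;
}
```

Each label undoes exactly the steps that succeeded before the failing one, so adding a new fallible step at the end requires a new label that releases everything set up before it.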
    • mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock · 1007843a
      Tetsuo Handa authored
      syzbot is reporting a circular locking dependency which involves the
      zonelist_update_seq seqlock [1], because this lock is also checked by
      memory allocation requests which do not need to be retried.
      
      One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler.
      
        CPU0
        ----
        __build_all_zonelists() {
          write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
          // e.g. timer interrupt handler runs at this moment
            some_timer_func() {
              kmalloc(GFP_ATOMIC) {
                __alloc_pages_slowpath() {
                  read_seqbegin(&zonelist_update_seq) {
                    // spins forever because zonelist_update_seq.seqcount is odd
                  }
                }
              }
            }
          // e.g. timer interrupt handler finishes
          write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
        }
      
      This deadlock scenario can be easily eliminated by not calling
      read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation
      requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation
      requests.  But Michal Hocko does not know whether we should go with this
      approach.
      
      Another deadlock scenario which syzbot is reporting is a race between
      kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer() with
      port->lock held and printk() from __build_all_zonelists() with
      zonelist_update_seq held.
      
        CPU0                                   CPU1
        ----                                   ----
        pty_write() {
          tty_insert_flip_string_and_push_buffer() {
                                               __build_all_zonelists() {
                                                 write_seqlock(&zonelist_update_seq);
                                                 build_zonelists() {
                                                   printk() {
                                                     vprintk() {
                                                       vprintk_default() {
                                                         vprintk_emit() {
                                                           console_unlock() {
                                                             console_flush_all() {
                                                               console_emit_next_record() {
                                                                 con->write() = serial8250_console_write() {
            spin_lock_irqsave(&port->lock, flags);
            tty_insert_flip_string() {
              tty_insert_flip_string_fixed_flag() {
                __tty_buffer_request_room() {
                  tty_buffer_alloc() {
                    kmalloc(GFP_ATOMIC | __GFP_NOWARN) {
                      __alloc_pages_slowpath() {
                        zonelist_iter_begin() {
                          read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd
                                                                   spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
            spin_unlock_irqrestore(&port->lock, flags);
                                                                   // message is printed to console
                                                                   spin_unlock_irqrestore(&port->lock, flags);
                                                                 }
                                                               }
                                                             }
                                                           }
                                                         }
                                                       }
                                                     }
                                                   }
                                                 }
                                                 write_sequnlock(&zonelist_update_seq);
                                               }
          }
        }
      
      This deadlock scenario can be eliminated by
      
        preventing interrupt context from calling kmalloc(GFP_ATOMIC)
      
      and
      
        preventing printk() from calling console_flush_all()
      
      while zonelist_update_seq.seqcount is odd.
      
      Since Petr Mladek thinks that __build_all_zonelists() can become a
      candidate for deferring printk() [2], let's address this problem by
      
        disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC)
      
      and
      
        disabling synchronous printk() in order to avoid console_flush_all().
      
      As a side effect of minimizing the time during which
      zonelist_update_seq.seqcount is odd by disabling synchronous printk(),
      latency at read_seqbegin(&zonelist_update_seq) for both
      !__GFP_DIRECT_RECLAIM and __GFP_DIRECT_RECLAIM allocation requests will
      be reduced.  From a lockdep perspective, however, not calling
      read_seqbegin(&zonelist_update_seq) from interrupt context (i.e. not
      recording an unnecessary locking dependency) is still preferable, even
      if we don't allow calling kmalloc(GFP_ATOMIC) inside the
      write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq)
      section.
      
      Link: https://lkml.kernel.org/r/8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKURA.ne.jp
      Fixes: 3d36424b ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
      Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2]
      Reported-by: syzbot <syzbot+223c7461c58c58a4cb10@syzkaller.appspotmail.com>
        Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1]
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
      Cc: John Ogness <john.ogness@linutronix.de>
      Cc: Patrick Daly <quic_pdaly@quicinc.com>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      1007843a
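
The odd/even rule that both deadlock scenarios hinge on can be modeled in a few lines. This single-threaded sketch only illustrates the parity rule; it is not the kernel seqlock implementation and has no memory barriers or locking. `struct mock_seqcount` and the `mock_*` helpers are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of a sequence counter: odd means a write is in progress. */
struct mock_seqcount {
	unsigned int sequence;
};

/* Entering the write section makes the count odd. */
static void mock_write_begin(struct mock_seqcount *s)
{
	s->sequence++;
}

/* Leaving the write section makes the count even again. */
static void mock_write_end(struct mock_seqcount *s)
{
	s->sequence++;
}

/*
 * A reader waits for as long as the count is odd. Instead of spinning,
 * this sketch only reports whether a reader arriving now would have to wait.
 */
static bool mock_reader_would_spin(const struct mock_seqcount *s)
{
	return (s->sequence & 1) != 0;
}
```

If the reader runs on the same CPU as the writer, for example from an interrupt handler that fires between `mock_write_begin()` and `mock_write_end()`, the writer can never reach the end of its section and the reader waits forever. That is why the fix keeps interrupts disabled for the duration of the write section.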
    • kernel/sys.c: fix and improve control flow in __sys_setres[ug]id() · 659c0ce1
      Ondrej Mosnacek authored
      Linux Security Modules (LSMs) that implement the "capable" hook will
      usually emit an access denial message to the audit log whenever they
      "block" the current task from using the given capability based on their
      security policy.
      
      The occurrence of a denial is used as an indication that the given task
      has attempted an operation that requires the given access permission, so
      the callers of functions that perform LSM permission checks must take care
      to avoid calling them too early (before it is decided if the permission is
      actually needed to perform the requested operation).
      
      The __sys_setres[ug]id() functions violate this convention by first
      calling ns_capable_setid() and only then checking if the operation
      requires the capability or not.  It means that any caller that has the
      capability granted by DAC (task's capability set) but not by MAC (LSMs)
      will generate a "denied" audit record, even if it is doing an operation
      for which the capability is not required.
      
      Fix this by reordering the checks such that ns_capable_setid() is checked
      last and -EPERM is returned immediately if it returns false.
      
      While there, also do two small optimizations:
      * move the capability check before prepare_creds() and
      * bail out early in case of a no-op.
      
      Link: https://lkml.kernel.org/r/20230217162154.837549-1-omosnace@redhat.com
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      659c0ce1
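
The reordered control flow can be sketched in userspace as below. This is a simplified model, not the kernel's __sys_setresuid(): `struct mock_cred`, `mock_capable_setid()` and `mock_setresuid()` are hypothetical, credentials are plain integers, and the call counter stands in for the audit record an LSM would emit. As with setresuid(2), an argument of -1 means "leave unchanged".

```c
#include <errno.h>
#include <stdbool.h>

typedef unsigned int mock_uid_t;
#define MOCK_NO_CHANGE ((mock_uid_t)-1)

struct mock_cred {
	mock_uid_t uid, euid, suid;
};

static int capable_calls;   /* how often the audited capability check ran */
static bool capable_result; /* test knob: what the check answers */

static bool mock_capable_setid(void)
{
	capable_calls++;
	return capable_result;
}

static bool is_one_of(mock_uid_t id, const struct mock_cred *c)
{
	return id == c->uid || id == c->euid || id == c->suid;
}

static int mock_setresuid(struct mock_cred *c, mock_uid_t r, mock_uid_t e,
			  mock_uid_t s)
{
	bool r_new = r != MOCK_NO_CHANGE && r != c->uid;
	bool e_new = e != MOCK_NO_CHANGE && e != c->euid;
	bool s_new = s != MOCK_NO_CHANGE && s != c->suid;

	/* No-op: nothing would change, so return before any check. */
	if (!(r_new || e_new || s_new))
		return 0;

	/* The capability is needed only for an id outside the current set. */
	bool need_cap = (r_new && !is_one_of(r, c)) ||
			(e_new && !is_one_of(e, c)) ||
			(s_new && !is_one_of(s, c));

	/* The audited check comes last and only runs when it is required. */
	if (need_cap && !mock_capable_setid())
		return -EPERM;

	if (r != MOCK_NO_CHANGE)
		c->uid = r;
	if (e != MOCK_NO_CHANGE)
		c->euid = e;
	if (s != MOCK_NO_CHANGE)
		c->suid = s;
	return 0;
}
```

With this ordering, switching the effective id to the saved id never touches the capability check, so no spurious "denied" record can be produced for it.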
    • Merge tag 'mmc-v6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · af67688d
      Linus Torvalds authored
      Pull MMC fixes from Ulf Hansson:
       "MMC host:
      
         - sdhci_am654: Fix support for UHS-I SDR12 and SDR25 speed modes
      
        MEMSTICK:
      
         - Fix memory leak if card device never gets registered"
      
      * tag 'mmc-v6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        memstick: fix memory leak if card device is never registered
        mmc: sdhci_am654: Set HIGH_SPEED_ENA for SDR12 and SDR25
      af67688d
    • Merge tag 'arm-fixes-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · bbab2531
      Linus Torvalds authored
      Pull ARM SoC fixes from Arnd Bergmann:
       "There are a number of updates for devicetree files for Qualcomm,
        Rockchips, and NXP i.MX platforms, addressing mistakes in the DT
        contents:
      
         - Wrong GPIO polarity on some boards
      
         - Lower SD card interface speed for better stability
      
         - Incorrect power supply, clock, pmic, cache properties
      
         - Disable broken hbr3 on sc7280-herobrine
      
         - Devicetree warning fixes
      
        The only other changes are:
      
         - A regression fix for the Amlogic performance monitoring unit
           driver, along with two related DT changes.
      
         - imx_v6_v7_defconfig enables PCI support again.
      
         - Trivial fixes for tee, optee and psci firmware drivers, addressing
           compiler warning and error output"
      
      * tag 'arm-fixes-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (32 commits)
        firmware/psci: demote suspend-mode warning to info level
        arm64: dts: qcom: sc7280: remove hbr3 support on herobrine boards
        ARM: imx_v6_v7_defconfig: Fix unintentional disablement of PCI
        arm64: dts: rockchip: correct panel supplies on some rk3326 boards
        arm64: dts: rockchip: use just "port" in panel on RockPro64
        arm64: dts: rockchip: use just "port" in panel on Pinebook Pro
        ARM: dts: imx6ull-colibri: Remove unnecessary #address-cells/#size-cells
        ARM: dts: imx7d-remarkable2: Remove unnecessary #address-cells/#size-cells
        arm64: dts: imx8mp-verdin: correct off-on-delay
        arm64: dts: imx8mm-verdin: correct off-on-delay
        arm64: dts: imx8mm-evk: correct pmic clock source
        arm64: dts: qcom: sc8280xp-pmics: fix pon compatible and registers
        arm64: dts: rockchip: Remove non-existing pwm-delay-us property
        arm64: dts: rockchip: Add clk_rtc_32k to Anbernic xx3 Devices
        tee: Pass a pointer to virt_to_page()
        perf/amlogic: adjust register offsets
        arm64: dts: meson-g12-common: resolve conflict between canvas & pmu
        arm64: dts: meson-g12-common: specify full DMC range
        arm64: dts: imx8mp: fix address length for LCDIF2
        riscv: dts: canaan: drop invalid spi-max-frequency
        ...
      bbab2531
    • LoongArch: module: set section addresses to 0x0 · 93eb1215
      Huacai Chen authored
      The got*, plt* and .text.ftrace_trampoline sections specified for
      LoongArch have non-zero addresses. Non-zero section addresses in a
      relocatable ELF would confuse GDB when it tries to compute the section
      offsets and it ends up printing wrong symbol addresses. Therefore, set
      them to zero, which mirrors the change in commit 5d8591bc
      ("arm64 module: set plt* section addresses to 0x0").
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Guo Ren <guoren@kernel.org>
      Signed-off-by: Chong Qiao <qiaochong@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      93eb1215
    • LoongArch: Mark 3 symbol exports as non-GPL · dce5ea1d
      Huacai Chen authored
      vm_map_base, empty_zero_page and invalid_pmd_table could be accessed
      widely by some out-of-tree non-GPL but important file systems or drivers
      (e.g. OpenZFS). Let's use EXPORT_SYMBOL() instead of EXPORT_SYMBOL_GPL()
      to export them, so as to avoid build errors.
      
      1, Details about vm_map_base:
      
      This is a LoongArch-specific symbol and may be referenced through macros
      PCI_IOBASE, VMALLOC_START and VMALLOC_END.
      
      2, Details about empty_zero_page:
      
      As it stands today, only 3 architectures export empty_zero_page as a GPL
      symbol: IA64, LoongArch and MIPS. LoongArch gets the GPL export by
      inheriting from MIPS, and the MIPS export was first introduced in commit
      497d2adc ("[MIPS] Export empty_zero_page for sake of the ext4
      module."). The IA64 export was similar: commit a7d57ecf ("[IA64]
      Export three symbols for module use") did so for kvm.
      
      In both IA64 and MIPS, the export of empty_zero_page was done for
      satisfying some in-kernel component built as module (kvm and ext4
      respectively), and given its reasonably low-level nature, GPL is a
      reasonable choice. But looking at the bigger picture it is evident most
      other architectures do not regard it as GPL, so in effect the symbol
      probably should not be treated as such, in favor of consistency.
      
      3, Details about invalid_pmd_table:
      
      Keep consistency with invalid_pte_table and make it accessible to some
      modules.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: WANG Xuerui <git@xen0n.name>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      dce5ea1d
    • LoongArch: Enable PG when wakeup from suspend · 1c1378a4
      Huacai Chen authored
      Some firmware does not enable PG when waking up from suspend, so do it
      in the kernel. This can improve compatibility for the boot kernel.
      
      Signed-off-by: Baoqi Zhang <zhangbaoqi@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      1c1378a4
    • LoongArch: Fix _CONST64_(x) as unsigned · 6637775c
      Qing Zhang authored
      Addresses should all be of unsigned type to avoid unnecessary conversions.
      
      Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      6637775c
    • LoongArch: Fix build error if CONFIG_SUSPEND is not set · 1cf62488
      Huacai Chen authored
      We can see the following build error on LoongArch if CONFIG_SUSPEND is
      not set:
      
        ld: drivers/acpi/sleep.o: in function 'acpi_pm_prepare':
        sleep.c:(.text+0x2b8): undefined reference to 'loongarch_wakeup_start'
      
      Here is the call trace:
      
        acpi_pm_prepare()
          __acpi_pm_prepare()
            acpi_sleep_prepare()
              acpi_get_wakeup_address()
                loongarch_wakeup_start()
      
      Root cause: loongarch_wakeup_start() is defined in
      arch/loongarch/power/suspend_asm.S which is only built under
      CONFIG_SUSPEND. In order to fix the build error, just let
      acpi_get_wakeup_address() return 0 if CONFIG_SUSPEND is not set.
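
      The approach described above can be sketched as follows. This is a
      minimal illustration, not the actual patch: the function and symbol
      names come from the commit message, while the exact placement of the
      #ifdef and the return type are assumptions.

```c
#include <assert.h>

#ifdef CONFIG_SUSPEND
/* Only declared when the suspend code (and thus the symbol) is built. */
extern void loongarch_wakeup_start(void);
#endif

/* Sketch: return the wakeup entry address, or 0 when suspend is not built. */
static inline unsigned long acpi_get_wakeup_address(void)
{
#ifdef CONFIG_SUSPEND
    return (unsigned long)loongarch_wakeup_start;
#else
    /* No reference to loongarch_wakeup_start, so the link succeeds. */
    return 0UL;
#endif
}
```

      Built without CONFIG_SUSPEND defined, the function compiles to a plain
      "return 0" and no undefined reference is emitted.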
      
      Fixes: 366bb35a ("LoongArch: Add suspend (ACPI S3) support")
      Reviewed-by: WANG Xuerui <git@xen0n.name>
      Reported-by: Randy Dunlap <rdunlap@infradead.org>
      Link: https://lore.kernel.org/all/11215033-fa3c-ecb1-2fc0-e9aeba47be9b@infradead.org/
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      1cf62488
    • Huacai Chen's avatar
      LoongArch: Fix probing of the CRC32 feature · df830336
      Huacai Chen authored
      Not all LoongArch processors support CRC32 instructions. This feature
      is indicated by CPUCFG1.CRC32 (Bit25), but the bit was wrongly defined
      in previous versions of the ISA manual (and therefore in loongarch.h
      as well). The CRC32 feature is currently set unconditionally, so fix
      the probing.

      Also expose the CRC32 feature in /proc/cpuinfo.
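
      As a rough sketch of what probing a feature bit looks like, assuming
      the CPUCFG1 word has already been read into a 32-bit value: the bit
      position (25) is taken from the message above, while the macro and
      function names below are hypothetical and not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name; bit 25 is the position stated in the commit message. */
#define EXAMPLE_CPUCFG1_CRC32 (UINT32_C(1) << 25)

/* Report CRC32 support only when the CPUCFG1 word advertises it. */
static bool example_cpu_has_crc32(uint32_t cpucfg1)
{
    return (cpucfg1 & EXAMPLE_CPUCFG1_CRC32) != 0;
}
```

      The point of the fix is that the capability is derived from the
      configuration word instead of being assumed for every processor.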
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      df830336
    • Huacai Chen's avatar
      LoongArch: Make WriteCombine configurable for ioremap() · 16c52e50
      Huacai Chen authored
      LoongArch maintains cache coherency in hardware, but when paired with
      LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar
      to WriteCombine) is out of the scope of the cache coherency mechanism for
      PCIe devices (this is a PCIe protocol violation, which may be fixed in
      newer chipsets).

      This means WUC can currently only be used for write-only memory regions,
      so this option is disabled by default, making WUC silently fall back to
      SUC for ioremap(). You can enable this option if the kernel is guaranteed
      to run on hardware without this bug.

      The kernel parameter writecombine=on/off can be used to override the
      Kconfig option.
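
      The fallback logic can be pictured with a small user-space sketch. All
      identifiers below are hypothetical stand-ins. Only the "on"/"off"
      values, the default-off behaviour, and the WUC-to-SUC fallback come from
      the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum example_cache_attr { EXAMPLE_SUC, EXAMPLE_WUC };

/* Default mirrors the Kconfig option being disabled. */
static bool example_wc_enabled = false;

/* Parse a writecombine=on/off style value; returns 0 on success, -1 otherwise. */
static int example_parse_writecombine(const char *arg)
{
    if (strcmp(arg, "on") == 0)
        example_wc_enabled = true;
    else if (strcmp(arg, "off") == 0)
        example_wc_enabled = false;
    else
        return -1;
    return 0;
}

/* Attribute a write-combine mapping request ends up with. */
static enum example_cache_attr example_wc_attr(void)
{
    return example_wc_enabled ? EXAMPLE_WUC : EXAMPLE_SUC;
}
```

      With the default setting a write-combine request silently maps as SUC,
      and the boot parameter flips the behaviour in either direction.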
      
      Cc: stable@vger.kernel.org
      Suggested-by: WANG Xuerui <kernel@xen0n.name>
      Reviewed-by: WANG Xuerui <kernel@xen0n.name>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      16c52e50
  4. 17 Apr, 2023 1 commit
    • Chuck Lever's avatar
      SUNRPC: Fix failures of checksum Kunit tests · d5142519
      Chuck Lever authored
      Scott reports that when the new GSS krb5 Kunit tests are built as
      a separate module and loaded, the RFC 6803 and RFC 8009 checksum
      tests all fail, even though they pass when run under kunit.py.
      
      It appears that passing a buffer backed by static const memory to
      gss_krb5_checksum() is a problem. A printk in checksum_case() shows
      the correct plaintext, but by the time the buffer has been converted
      to a scatterlist and arrives at checksummer(), it contains all
      zeroes.
      
      Replacing this buffer with one that is dynamically allocated fixes
      the issue.
      Reported-by: Scott Mayhew <smayhew@redhat.com>
      Fixes: 02142b2c ("SUNRPC: Add checksum KUnit tests for the RFC 6803 encryption types")
      Tested-by: Scott Mayhew <smayhew@redhat.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      d5142519
  5. 16 Apr, 2023 7 commits
    • Linus Torvalds's avatar
      Linux 6.3-rc7 · 6a8f57ae
      Linus Torvalds authored
      6a8f57ae
    • Peter Xu's avatar
      Revert "userfaultfd: don't fail on unrecognized features" · 2ff559f3
      Peter Xu authored
      This is a proposal to revert commit 914eedcb.
      
      I found this when writing a simple UFFDIO_API test to be the first unit
      test in this set.  Two things break with the commit:

        - The UFFDIO_API version check was lost.  According to the man page,
        the kernel should reject ioctl(UFFDIO_API) if uffdio_api.api != 0xaa.
        This check is needed if the api version is extended in the future;
        otherwise a user app won't be able to identify a new kernel.

        - The feature flag checks were removed, which means UFFDIO_API with a
        feature that does not exist will also succeed.  According to the man
        page, we should (and it makes sense to) reject ioctl(UFFDIO_API) if
        unknown features are passed in.
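
      The two checks listed above amount to the following validation, shown
      here as a stand-alone sketch and not as the kernel code. The 0xaa
      version value comes from the message. The function name, the
      supported-feature mask parameter, and the use of -EINVAL as the
      rejection code are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* API version value quoted in the commit message. */
#define EXAMPLE_UFFD_API UINT64_C(0xAA)

/*
 * Hypothetical helper: accept a UFFDIO_API request only when the version
 * matches and every requested feature bit is in the supported mask.
 */
static int example_check_uffdio_api(uint64_t api, uint64_t features,
                                    uint64_t supported)
{
    if (api != EXAMPLE_UFFD_API)
        return -EINVAL;      /* unknown API version */
    if (features & ~supported)
        return -EINVAL;      /* at least one unrecognized feature bit */
    return 0;
}
```

      Rejecting unknown bits is what lets user space distinguish a kernel
      that really supports a feature from one that merely ignored the request.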
      
      Link: https://lore.kernel.org/r/20220722201513.1624158-1-axelrasmussen@google.com
      Link: https://lkml.kernel.org/r/20230412163922.327282-2-peterx@redhat.com
      Fixes: 914eedcb ("userfaultfd: don't fail on unrecognized features")
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      2ff559f3
    • Baokun Li's avatar
      writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs · 1ba1199e
      Baokun Li authored
      KASAN reports a null-ptr-deref:
      ==================================================================
      BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
      Write of size 8 at addr 0000000000000000 by task sync/943
      CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
      Call Trace:
       <TASK>
       dump_stack_lvl+0x7f/0xc0
       print_report+0x2ba/0x340
       kasan_report+0xc4/0x120
       kasan_check_range+0x1b7/0x2e0
       __kasan_check_write+0x24/0x40
       bdi_split_work_to_wbs+0x5c5/0x7b0
       sync_inodes_sb+0x195/0x630
       sync_inodes_one_sb+0x3a/0x50
       iterate_supers+0x106/0x1b0
       ksys_sync+0x98/0x160
      [...]
      ==================================================================
      
      The race that causes the above issue is as follows:
      
                 cpu1                     cpu2
      -------------------------|-------------------------
      inode_switch_wbs
       INIT_WORK(&isw->work, inode_switch_wbs_work_fn)
       queue_rcu_work(isw_wq, &isw->work)
       // queue_work async
        inode_switch_wbs_work_fn
         wb_put_many(old_wb, nr_switched)
          percpu_ref_put_many
           ref->data->release(ref)
           cgwb_release
            queue_work(cgwb_release_wq, &wb->release_work)
            // queue_work async
             &wb->release_work
             cgwb_release_workfn
                                  ksys_sync
                                   iterate_supers
                                    sync_inodes_one_sb
                                     sync_inodes_sb
                                      bdi_split_work_to_wbs
                                       kmalloc(sizeof(*work), GFP_ATOMIC)
                                       // alloc memory failed
              percpu_ref_exit
               ref->data = NULL
               kfree(data)
                                       wb_get(wb)
                                        percpu_ref_get(&wb->refcnt)
                                         percpu_ref_get_many(ref, 1)
                                          atomic_long_add(nr, &ref->data->count)
                                           atomic64_add(i, v)
                                           // trigger null-ptr-deref
      
      bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all
      wbs.  If the allocation of new work fails, the on-stack fallback will be
      used and the reference count of the current wb is increased afterwards.
      If cgroup writeback membership switches occur before getting the reference
      count and the current wb is released as old_wb, then calling wb_get() or
      wb_put() will trigger the null pointer dereference above.
      
      This issue was introduced in v4.3-rc7 (see fix tag1).  Both the
      sync_inodes_sb() and __writeback_inodes_sb_nr() calls to
      bdi_split_work_to_wbs() can trigger this issue.  For scenarios called via
      sync_inodes_sb(), commit 7fc5854f ("writeback: synchronize sync(2)
      against cgroup writeback membership switches") originally reduced the
      possibility of the issue by adding wb_switch_rwsem, but a change in
      v5.14-rc1 (see fix tag2) removed the
      "inode_io_list_del_locked(inode, old_wb)" call from
      inode_switch_wbs_work_fn() so that wb->state contains WB_has_dirty_io.
      Thus old_wb is not skipped when traversing wbs in
      bdi_split_work_to_wbs(), and the issue becomes easily reproducible again.
      
      To solve this problem, percpu_ref_exit() is called under RCU protection to
      avoid the race between cgwb_release_workfn() and bdi_split_work_to_wbs().
      Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(),
      and skip the current wb if wb_tryget() fails because the wb has already
      been shut down.
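
      The "try to get, skip on failure" pattern can be illustrated in user
      space with C11 atomics. This is a simplified analogue only: it uses a
      plain atomic counter in place of percpu_ref, does not model RCU, and
      every identifier below is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct example_wb {
    atomic_long refcnt;   /* 0 means the wb has already been shut down */
};

/* Take a reference only if the object is still live. */
static bool example_wb_tryget(struct example_wb *wb)
{
    long old = atomic_load(&wb->refcnt);

    while (old > 0) {
        /* On failure the CAS reloads 'old', so the liveness test repeats. */
        if (atomic_compare_exchange_weak(&wb->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

static void example_wb_put(struct example_wb *wb)
{
    atomic_fetch_sub(&wb->refcnt, 1);
}

/* Walk all wbs, skipping dead ones; returns how many were given work. */
static size_t example_split_work(struct example_wb *wbs, size_t n)
{
    size_t queued = 0;

    for (size_t i = 0; i < n; i++) {
        if (!example_wb_tryget(&wbs[i]))
            continue;          /* already shut down: never touch it */
        queued++;              /* stand-in for issuing the writeback work */
        example_wb_put(&wbs[i]);
    }
    return queued;
}
```

      Unlike an unconditional get, the tryget variant never increments a
      counter that has already dropped to zero, so a released wb is simply
      skipped.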
      
      Link: https://lkml.kernel.org/r/20230410130826.1492525-1-libaokun1@huawei.com
      Fixes: b817525a ("writeback: bdi_writeback iteration must not skip dying ones")
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Hou Tao <houtao1@huawei.com>
      Cc: yangerkun <yangerkun@huawei.com>
      Cc: Zhang Yi <yi.zhang@huawei.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      1ba1199e
    • Peng Zhang's avatar
      maple_tree: fix a potential memory leak, OOB access, or other unpredictable bug · 1f5f12ec
      Peng Zhang authored
      In mas_alloc_nodes(), "node->node_count = 0" is meant to initialize the
      node_count field of a new node, but the node may not be new.  It may be
      a node that already existed and whose node_count has a value, in which
      case setting it to 0 causes a memory leak.  mas->alloc->total will then
      be greater than the actual number of nodes in the linked list, which may
      cause many other errors, for example out-of-bounds access in
      mas_pop_node(), and mas_pop_node() may return addresses that should not
      be used.  Fix it by initializing node_count only for new nodes.

      Also remove an if-else statement to simplify the code.
      
      Link: https://lkml.kernel.org/r/20230411041005.26205-1-zhangpeng.00@bytedance.com
      Fixes: 54a611b6 ("Maple Tree: add new data structure")
      Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
      Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      1f5f12ec
    • Steve Chou's avatar
      tools/mm/page_owner_sort.c: fix TGID output when cull=tg is used · 92357568
      Steve Chou authored
      When the cull option is used with the 'tg' flag, fprintf() prints pid
      instead of tgid.  Print tgid instead.
      
      Link: https://lkml.kernel.org/r/20230411034929.2071501-1-steve_chou@pesi.com.tw
      Fixes: 9c8a0a8e ("tools/vm/page_owner_sort.c: support for user-defined culling rules")
      Signed-off-by: Steve Chou <steve_chou@pesi.com.tw>
      Cc: Jiajian Ye <yejiajian2018@email.szu.edu.cn>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      92357568
    • Jonathan Toppins's avatar
      mailmap: update jtoppins' entry to reference correct email · d2c115ba
      Jonathan Toppins authored
      Link: https://lkml.kernel.org/r/d79bc6eaf65e68bd1c2a1e1510ab6291ce5926a6.1681162487.git.jtoppins@redhat.com
      Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
      Cc: Colin Ian King <colin.i.king@gmail.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Kirill Tkhai <tkhai@ya.ru>
      Cc: Konrad Dybcio <konrad.dybcio@linaro.org>
      Cc: Qais Yousef <qyousef@layalina.io>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      d2c115ba
    • Liam R. Howlett's avatar
      mm/mempolicy: fix use-after-free of VMA iterator · f4e9e0e6
      Liam R. Howlett authored
      set_mempolicy_home_node() iterates over a list of VMAs and calls
      mbind_range() on each VMA, which also iterates over the singular list of
      the VMA passed in and potentially splits the VMA.  Since the VMA iterator
      is not passed through, set_mempolicy_home_node() may now point to a stale
      node in the VMA tree.  This can result in a UAF as reported by syzbot.
      
      Avoid the stale maple tree node by passing the VMA iterator through to the
      underlying call to split_vma().
      
      mbind_range() is also overly complicated, since there are two calling
      functions and one already handles iterating over the VMAs.  Simplify
      mbind_range() to only handle merging and splitting of the VMAs.
      
      Align the new loop in do_mbind() and existing loop in
      set_mempolicy_home_node() to use the reduced mbind_range() function.  This
      allows for a single location of the range calculation and avoids
      constantly looking up the previous VMA (since this is a loop over the
      VMAs).
      
      Link: https://lore.kernel.org/linux-mm/000000000000c93feb05f87e24ad@google.com/
      Fixes: 66850be5 ("mm/mempolicy: use vma iterator & maple state instead of vma linked list")
      Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Reported-by: syzbot+a7c1ec5b1d71ceaa5186@syzkaller.appspotmail.com
      Link: https://lkml.kernel.org/r/20230410152205.2294819-1-Liam.Howlett@oracle.com
      Tested-by: syzbot+a7c1ec5b1d71ceaa5186@syzkaller.appspotmail.com
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      f4e9e0e6