1. 05 Jun, 2021 7 commits
    • hugetlb: pass head page to remove_hugetlb_page() · 0c5da357
      Naoya Horiguchi authored
      When memory_failure() or soft_offline_page() is called on a tail page of
      some hugetlb page, a "BUG: unable to handle page fault" error can be
      triggered.
      
      remove_hugetlb_page() dereferences page->lru, so it assumes that the
      page points to a head page, but one of its callers,
      dissolve_free_huge_page(), provides remove_hugetlb_page() with 'page',
      which could be a tail page.  So pass 'head' to it instead.
      
      Link: https://lkml.kernel.org/r/20210526235257.2769473-1-nao.horiguchi@gmail.com
      Fixes: 6eb4e88a ("hugetlb: create remove_hugetlb_page() to separate functionality")
      Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
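
A minimal userspace sketch of the head-versus-tail problem described in the hugetlb commit above. It is not kernel code: `struct toy_page`, the `toy_*` helpers, and the list implementation are all hypothetical stand-ins, and the only assumption carried over from the commit message is that the list linkage (`lru`) is meaningful on the head page only, so the caller has to resolve the head before unlinking.

```c
#include <assert.h>
#include <stddef.h>

/* Toy circular doubly-linked list, loosely modelled on a kernel list. */
struct list_node { struct list_node *prev, *next; };

static void toy_list_init(struct list_node *h) { h->prev = h->next = h; }

static void toy_list_add(struct list_node *n, struct list_node *h)
{
    n->next = h->next;
    n->prev = h;
    h->next->prev = n;
    h->next = n;
}

static void toy_list_del(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
}

static int toy_list_empty(const struct list_node *h) { return h->next == h; }

/* Hypothetical stand-in for struct page; NOT the kernel layout. */
struct toy_page {
    struct toy_page *head;  /* NULL on the head page, else points at it */
    struct list_node lru;   /* only valid on the head page */
};

static struct toy_page *toy_compound_head(struct toy_page *page)
{
    return page->head ? page->head : page;
}

/* Unlinks a huge page; the caller must pass the head page. */
static void toy_remove_hugetlb_page(struct toy_page *head)
{
    assert(head->head == NULL);  /* a tail page here would be the bug */
    toy_list_del(&head->lru);
}

/* Fixed call pattern: resolve the head first, then pass 'head' down. */
static void toy_dissolve_free_huge_page(struct toy_page *page)
{
    struct toy_page *head = toy_compound_head(page);

    toy_remove_hugetlb_page(head);
}
```

Calling `toy_dissolve_free_huge_page()` with any tail page of the toy compound page unlinks the head from its list, which is the behaviour the fix restores.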
    • drivers/base/memory: fix trying offlining memory blocks with memory holes on aarch64 · 92813053
      David Hildenbrand authored
      offline_pages() properly checks for memory holes and bails out.
      However, we do a page_zone(pfn_to_page(start_pfn)) before calling
      offline_pages() when offlining a memory block.
      
      We should not unconditionally call page_zone(pfn_to_page(start_pfn)) on
      aarch64 in offlining code, otherwise we can trigger a BUG when hitting a
      memory hole:
      
         kernel BUG at include/linux/mm.h:1383!
         Internal error: Oops - BUG: 0 [#1] SMP
         Modules linked in: loop processor efivarfs ip_tables x_tables ext4 mbcache jbd2 dm_mod igb nvme i2c_algo_bit mlx5_core i2c_core nvme_core firmware_class
         CPU: 13 PID: 1694 Comm: ranbug Not tainted 5.12.0-next-20210524+ #4
         Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
         pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
         pc : memory_subsys_offline+0x1f8/0x250
         lr : memory_subsys_offline+0x1f8/0x250
         Call trace:
           memory_subsys_offline+0x1f8/0x250
           device_offline+0x154/0x1d8
           online_store+0xa4/0x118
           dev_attr_store+0x44/0x78
           sysfs_kf_write+0xe8/0x138
           kernfs_fop_write_iter+0x26c/0x3d0
           new_sync_write+0x2bc/0x4f8
           vfs_write+0x718/0xc88
           ksys_write+0xf8/0x1e0
           __arm64_sys_write+0x74/0xa8
           invoke_syscall.constprop.0+0x78/0x1e8
           do_el0_svc+0xe4/0x298
           el0_svc+0x20/0x30
           el0_sync_handler+0xb0/0xb8
           el0_sync+0x178/0x180
         Kernel panic - not syncing: Oops - BUG: Fatal exception
         SMP: stopping secondary CPUs
         Kernel Offset: disabled
         CPU features: 0x00000251,20000846
         Memory Limit: none
      
      If nr_vmemmap_pages is set, we know that we are dealing with hotplugged
      memory that doesn't have any holes.  So call
      page_zone(pfn_to_page(start_pfn)) only when really necessary -- when
      nr_vmemmap_pages is set and we actually adjust the present pages.
      
      Link: https://lkml.kernel.org/r/20210526075226.5572-1-david@redhat.com
      Fixes: a08a2ae3 ("mm,memory_hotplug: allocate memmap from the added memory range")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reported-by: Qian Cai (QUIC) <quic_qiancai@quicinc.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_alloc: fix counting of free pages after take off from buddy · bac9c6fa
      Ding Hui authored
      Recently we found that a lot of MemFree is left in /proc/meminfo
      after soft-offlining a lot of pages, which is not quite correct.
      
      Before Oscar's rework of soft offline for free pages [1], if we soft
      offline free pages, these pages are left in buddy with the HWPoison
      flag, and NR_FREE_PAGES is not updated immediately.  So the difference
      between NR_FREE_PAGES and the real number of available free pages is
      also large at the beginning.
      
      However, with the workload running, when we subsequently catch a
      HWPoison page in any alloc function, we remove it from buddy, update
      NR_FREE_PAGES and try again, so NR_FREE_PAGES gets closer and closer
      to the real number of available free pages (regardless of
      unpoison_memory()).
      
      Now, for offlined free pages, after a successful call to
      take_page_off_buddy(), the page no longer belongs to the buddy
      allocator and will not be used any more, but we missed accounting
      NR_FREE_PAGES in this situation, and there is no chance for it to be
      updated later.
      
      Do the update in take_page_off_buddy() like rmqueue() does, but avoid
      double counting if someone has already called set_migratetype_isolate()
      on the page.
      
      [1]: commit 06be6ff3 ("mm,hwpoison: rework soft offline for free pages")
      
      Link: https://lkml.kernel.org/r/20210526075247.11130-1-dinghui@sangfor.com.cn
      Fixes: 06be6ff3 ("mm,hwpoison: rework soft offline for free pages")
      Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
      Suggested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Acked-by: David Hildenbrand <david@redhat.com>
      Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
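
The accounting rule from the mm/page_alloc commit above can be sketched in userspace as follows. This is an illustration only: `struct toy_zone`, `enum toy_migratetype`, and `toy_account_page_taken_off()` are hypothetical names, and the sketch assumes that a page in an isolated block has already been subtracted from the free counter, which is the double-counting case the commit message says must be avoided.

```c
#include <assert.h>

/* Hypothetical, reduced set of migrate types for illustration. */
enum toy_migratetype { TOY_MIGRATE_MOVABLE, TOY_MIGRATE_ISOLATE };

/* Hypothetical stand-in for the per-zone NR_FREE_PAGES counter. */
struct toy_zone { long nr_free_pages; };

/*
 * Account for one page that was just taken off the free lists.
 * Pages in an isolated block are assumed to have been subtracted
 * already, so they are skipped to avoid counting them twice.
 */
static void toy_account_page_taken_off(struct toy_zone *zone,
                                       enum toy_migratetype mt)
{
    if (mt == TOY_MIGRATE_ISOLATE)
        return;

    assert(zone->nr_free_pages > 0);
    zone->nr_free_pages -= 1;
}
```

With this rule the counter drops by one for an ordinary free page and stays unchanged for a page whose block was isolated beforehand.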
    • mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests() · 04f7ce3f
      Gerald Schaefer authored
      In pmd/pud_advanced_tests(), the vaddr is aligned up to the next pmd/pud
      entry, and so it does not match the given pmdp/pudp and (aligned down)
      pfn any more.
      
      For s390, this results in memory corruption, because the IDTE
      instruction used e.g.  in xxx_get_and_clear() will take the vaddr for
      some calculations, in combination with the given pmdp.  It will then end
      up with a wrong table origin, ending on ...ff8, and some of those
      wrongly set low-order bits will also select a wrong pagetable level for
      the index addition.  IDTE could therefore invalidate (or 0x20) something
      outside of the page tables, depending on the wrongly picked index, which
      in turn depends on the random vaddr.
      
      As a result, we sometimes see "BUG task_struct (Not tainted): Padding
      overwritten" on s390, where one 0x5a padding value got overwritten with
      0x7a.
      
      Fix this by aligning down, similar to how the pmd/pud_aligned pfns are
      calculated.
      
      Link: https://lkml.kernel.org/r/20210525130043.186290-2-gerald.schaefer@linux.ibm.com
      Fixes: a5c3b9ff ("mm/debug_vm_pgtable: add tests validating advanced arch page table helpers")
      Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: <stable@vger.kernel.org>	[5.9+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
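
To make the alignment issue in the debug_vm_pgtable commit above concrete, here is a small userspace sketch. The 2 MiB `TOY_PMD_SIZE` and every `toy_*` helper are assumptions chosen for illustration; the actual entry size depends on the architecture and configuration, and this is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed size covered by one PMD entry (2 MiB); illustration only. */
#define TOY_PMD_SIZE (UINT64_C(1) << 21)

static bool toy_is_pow2(uint64_t x) { return x && (x & (x - 1)) == 0; }

/* Round down to the start of the entry that contains addr. */
static uint64_t toy_align_down(uint64_t addr, uint64_t size)
{
    assert(toy_is_pow2(size));
    return addr & ~(size - 1);
}

/* Start of the NEXT entry: the "aligned up" address that no longer
 * matches the entry addr lives in. */
static uint64_t toy_next_entry(uint64_t addr, uint64_t size)
{
    return toy_align_down(addr, size) + size;
}

/* True if both addresses are covered by the same entry. */
static bool toy_same_entry(uint64_t a, uint64_t b, uint64_t size)
{
    return toy_align_down(a, size) == toy_align_down(b, size);
}
```

For an address such as 0x12345678, aligning down yields 0x12200000, which is still covered by the same entry, while the next-entry address 0x12400000 is not. That mismatch between the address and the entry pointer is what the commit removes.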
    • pid: take a reference when initializing `cad_pid` · 0711f0d7
      Mark Rutland authored
      During boot, kernel_init_freeable() initializes `cad_pid` to the init
      task's struct pid.  Later on, we may change `cad_pid` via a sysctl, and
      when this happens proc_do_cad_pid() will increment the refcount on the
      new pid via get_pid(), and will decrement the refcount on the old pid
      via put_pid().  As we never called get_pid() when we initialized
      `cad_pid`, we decrement a reference we never incremented, and can
      therefore free the init task's struct pid early.  As there can be dangling
      references to the struct pid, we can later encounter a use-after-free
      (e.g.  when delivering signals).
      
      This was spotted when fuzzing v5.13-rc3 with Syzkaller, but seems to
      have been around since the conversion of `cad_pid` to struct pid in
      commit 9ec52099 ("[PATCH] replace cad_pid by a struct pid") from the
      pre-KASAN stone age of v2.6.19.
      
      Fix this by getting a reference to the init task's struct pid when we
      assign it to `cad_pid`.
      
      Full KASAN splat below.
      
         ==================================================================
         BUG: KASAN: use-after-free in ns_of_pid include/linux/pid.h:153 [inline]
         BUG: KASAN: use-after-free in task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509
         Read of size 4 at addr ffff23794dda0004 by task syz-executor.0/273
      
         CPU: 1 PID: 273 Comm: syz-executor.0 Not tainted 5.12.0-00001-g9aef892b2d15 #1
         Hardware name: linux,dummy-virt (DT)
         Call trace:
          ns_of_pid include/linux/pid.h:153 [inline]
          task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509
          do_notify_parent+0x308/0xe60 kernel/signal.c:1950
          exit_notify kernel/exit.c:682 [inline]
          do_exit+0x2334/0x2bd0 kernel/exit.c:845
          do_group_exit+0x108/0x2c8 kernel/exit.c:922
          get_signal+0x4e4/0x2a88 kernel/signal.c:2781
          do_signal arch/arm64/kernel/signal.c:882 [inline]
          do_notify_resume+0x300/0x970 arch/arm64/kernel/signal.c:936
          work_pending+0xc/0x2dc
      
         Allocated by task 0:
          slab_post_alloc_hook+0x50/0x5c0 mm/slab.h:516
          slab_alloc_node mm/slub.c:2907 [inline]
          slab_alloc mm/slub.c:2915 [inline]
          kmem_cache_alloc+0x1f4/0x4c0 mm/slub.c:2920
          alloc_pid+0xdc/0xc00 kernel/pid.c:180
          copy_process+0x2794/0x5e18 kernel/fork.c:2129
          kernel_clone+0x194/0x13c8 kernel/fork.c:2500
          kernel_thread+0xd4/0x110 kernel/fork.c:2552
          rest_init+0x44/0x4a0 init/main.c:687
          arch_call_rest_init+0x1c/0x28
          start_kernel+0x520/0x554 init/main.c:1064
          0x0
      
         Freed by task 270:
          slab_free_hook mm/slub.c:1562 [inline]
          slab_free_freelist_hook+0x98/0x260 mm/slub.c:1600
          slab_free mm/slub.c:3161 [inline]
          kmem_cache_free+0x224/0x8e0 mm/slub.c:3177
          put_pid.part.4+0xe0/0x1a8 kernel/pid.c:114
          put_pid+0x30/0x48 kernel/pid.c:109
          proc_do_cad_pid+0x190/0x1b0 kernel/sysctl.c:1401
          proc_sys_call_handler+0x338/0x4b0 fs/proc/proc_sysctl.c:591
          proc_sys_write+0x34/0x48 fs/proc/proc_sysctl.c:617
          call_write_iter include/linux/fs.h:1977 [inline]
          new_sync_write+0x3ac/0x510 fs/read_write.c:518
          vfs_write fs/read_write.c:605 [inline]
          vfs_write+0x9c4/0x1018 fs/read_write.c:585
          ksys_write+0x124/0x240 fs/read_write.c:658
          __do_sys_write fs/read_write.c:670 [inline]
          __se_sys_write fs/read_write.c:667 [inline]
          __arm64_sys_write+0x78/0xb0 fs/read_write.c:667
          __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
          invoke_syscall arch/arm64/kernel/syscall.c:49 [inline]
          el0_svc_common.constprop.1+0x16c/0x388 arch/arm64/kernel/syscall.c:129
          do_el0_svc+0xf8/0x150 arch/arm64/kernel/syscall.c:168
          el0_svc+0x28/0x38 arch/arm64/kernel/entry-common.c:416
          el0_sync_handler+0x134/0x180 arch/arm64/kernel/entry-common.c:432
          el0_sync+0x154/0x180 arch/arm64/kernel/entry.S:701
      
         The buggy address belongs to the object at ffff23794dda0000
          which belongs to the cache pid of size 224
         The buggy address is located 4 bytes inside of
          224-byte region [ffff23794dda0000, ffff23794dda00e0)
         The buggy address belongs to the page:
         page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4dda0
         head:(____ptrval____) order:1 compound_mapcount:0
         flags: 0x3fffc0000010200(slab|head)
         raw: 03fffc0000010200 dead000000000100 dead000000000122 ffff23794d40d080
         raw: 0000000000000000 0000000000190019 00000001ffffffff 0000000000000000
         page dumped because: kasan: bad access detected
      
         Memory state around the buggy address:
          ffff23794dd9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
          ffff23794dd9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
         >ffff23794dda0000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                            ^
          ffff23794dda0080: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
          ffff23794dda0100: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00
         ==================================================================
      
      Link: https://lkml.kernel.org/r/20210524172230.38715-1-mark.rutland@arm.com
      Fixes: 9ec52099 ("[PATCH] replace cad_pid by a struct pid")
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Christian Brauner <christian@brauner.io>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
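
The refcount imbalance described in the `cad_pid` commit above can be modelled in a few lines of userspace C. Everything here is a hypothetical model rather than the kernel implementation: `struct toy_pid` tracks only a count and a "freed" flag, and the `take_ref` parameter exists solely to contrast the buggy and the fixed initialization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a refcounted struct pid. */
struct toy_pid {
    int count;   /* number of references held */
    bool freed;  /* set once the last reference is dropped */
};

static struct toy_pid *toy_get_pid(struct toy_pid *pid)
{
    if (pid) {
        assert(!pid->freed);
        pid->count++;
    }
    return pid;
}

static void toy_put_pid(struct toy_pid *pid)
{
    if (!pid)
        return;
    assert(pid->count > 0);
    if (--pid->count == 0)
        pid->freed = true;  /* the kernel would free the object here */
}

static struct toy_pid *toy_cad_pid;

/* Boot-time assignment. take_ref=false models the bug, true the fix. */
static void toy_init_cad_pid(struct toy_pid *init_pid, bool take_ref)
{
    toy_cad_pid = take_ref ? toy_get_pid(init_pid) : init_pid;
}

/* Sysctl-style update: reference the new pid, drop the old one. */
static void toy_set_cad_pid(struct toy_pid *new_pid)
{
    struct toy_pid *old = toy_cad_pid;

    toy_cad_pid = toy_get_pid(new_pid);
    toy_put_pid(old);
}
```

If the init task's pid starts with a single reference held by the task itself, the buggy initialization lets the first update drop that count to zero while the task still uses the pid; with the reference taken at initialization, the count returns to one.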
    • kfence: use TASK_IDLE when awaiting allocation · 8fd0e995
      Marco Elver authored
      Since wait_event() uses TASK_UNINTERRUPTIBLE by default, waiting for an
      allocation counts towards load.  However, for KFENCE, this does not make
      any sense, since there is no busy work we're awaiting.
      
      Instead, use TASK_IDLE via wait_event_idle() to not count towards load.
      
      BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1185565
      Link: https://lkml.kernel.org/r/20210521083209.3740269-1-elver@google.com
      Fixes: 407f1d8c ("kfence: await for allocation using wait_event")
      Signed-off-by: Marco Elver <elver@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: <stable@vger.kernel.org>	[5.12+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
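
A rough userspace model of why the kfence commit above switches to TASK_IDLE. It assumes that TASK_IDLE is TASK_UNINTERRUPTIBLE combined with a "no load" flag and that the load calculation skips tasks carrying that flag; the numeric flag values and all `toy_*` names are illustrative, not taken from this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative state bits; the numeric values are assumptions. */
#define TOY_TASK_UNINTERRUPTIBLE 0x0002u
#define TOY_TASK_NOLOAD          0x0400u
#define TOY_TASK_IDLE (TOY_TASK_UNINTERRUPTIBLE | TOY_TASK_NOLOAD)

/* An uninterruptible sleeper counts towards load unless flagged NOLOAD. */
static bool toy_contributes_to_load(unsigned int state)
{
    return (state & TOY_TASK_UNINTERRUPTIBLE) &&
           !(state & TOY_TASK_NOLOAD);
}

/* Count how many of the given sleeping tasks add to the load figure. */
static size_t toy_count_load(const unsigned int *states, size_t n)
{
    size_t load = 0;

    assert(states != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (toy_contributes_to_load(states[i]))
            load++;
    return load;
}
```

In this model a waiter sleeping in `TOY_TASK_IDLE` is still uninterruptible but is left out of the load count, which matches the intent stated in the commit message.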
    • Revert "MIPS: make userspace mapping young by default" · 50c25ee9
      Thomas Bogendoerfer authored
      This reverts commit f685a533.
      
      The MIPS cache flush logic needs to know whether the mapping was already
      established to decide how to flush caches.  This is done by checking the
      valid bit in the PTE.  The commit above breaks this logic by setting the
      valid bit in the PTE in new mappings, which causes kernel crashes.
      
      Link: https://lkml.kernel.org/r/20210526094335.92948-1-tsbogend@alpha.franken.de
      Fixes: f685a533 ("MIPS: make userspace mapping young by default")
      Reported-by: Zhou Yanjie <zhouyanjie@wanyeetech.com>
      Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Huang Pei <huangpei@loongson.cn>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 04 Jun, 2021 3 commits
    • Merge tag 'sound-5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 16f0596f
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "A couple of small fixes are found in the ALSA core side at this time;
        a fix in the new LED handling code and a long-standing (and likely no
        one would notice) ioctl bug.
      
        The rest are usual HD-audio fixes, mostly device-specific quirks but
        also one major regression fix that was introduced in 5.13"
      
      * tag 'sound-5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda: update the power_state during the direct-complete
        ALSA: timer: Fix master timer notification
        ALSA: control led: fix memory leak in snd_ctl_led_register
        ALSA: hda: Fix for mute key LED for HP Pavilion 15-CK0xx
        ALSA: hda/cirrus: Set Initial DMIC volume to -26 dB
        ALSA: hda: Fix a regression in Capture Switch mixer read
        ALSA: hda: Add AlderLake-M PCI ID
    • Merge tag 'drm-fixes-2021-06-04-1' of git://anongit.freedesktop.org/drm/drm · 3a3c5ab3
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Two big regression reverts in here, one for fbdev and one i915.
        Otherwise it's mostly amdgpu display fixes, and tegra fixes.
      
        fb:
         - revert broken fb_defio patch
      
        amdgpu:
         - Display fixes
         - FRU EEPROM error handling fix
         - RAS fix
         - PSP fix
         - Releasing pinned BO fix
      
        i915:
         - Revert conversion to io_mapping_map_user() which led to BUG_ON()
         - Fix check for error valued returns in a selftest
      
        tegra:
         - SOR power domain race condition fix
         - build warning fix
         - runtime pm ref leak fix
         - modifier fix"
      
      * tag 'drm-fixes-2021-06-04-1' of git://anongit.freedesktop.org/drm/drm:
        amd/display: convert DRM_DEBUG_ATOMIC to drm_dbg_atomic
        drm/amdgpu: make sure we unpin the UVD BO
        drm/amd/amdgpu:save psp ring wptr to avoid attack
        drm/amd/display: Fix potential memory leak in DMUB hw_init
        drm/amdgpu: Don't query CE and UE errors
        drm/amd/display: Fix overlay validation by considering cursors
        drm/amdgpu: refine amdgpu_fru_get_product_info
        drm/amdgpu: add judgement for dc support
        drm/amd/display: Fix GPU scaling regression by FS video support
        drm/amd/display: Allow bandwidth validation for 0 streams.
        Revert "i915: use io_mapping_map_user"
        drm/i915/selftests: Fix return value check in live_breadcrumbs_smoketest()
        Revert "fb_defio: Remove custom address_space_operations"
        drm/tegra: Correct DRM_FORMAT_MOD_NVIDIA_SECTOR_LAYOUT
        drm/tegra: sor: Fix AUX device reference leak
        drm/tegra: Get ref for DP AUX channel, not its ddc adapter
        drm/tegra: Fix shift overflow in tegra_shared_plane_atomic_update
        drm/tegra: sor: Fully initialize SOR before registration
        gpu: host1x: Split up client initalization and registration
        drm/tegra: sor: Do not leak runtime PM reference
    • Merge tag 'drm/tegra/for-5.13-rc5' of ssh://git.freedesktop.org/git/tegra/linux into drm-fixes · 37e2f2e8
      Dave Airlie authored
      drm/tegra: Fixes for v5.13-rc5
      
      The most important change here fixes a race condition that causes either
      HDA or (more frequently) display to malfunction because they race for
      enabling the SOR power domain at probe time.
      
      Other than that, there are a couple of fixes for build warnings
      introduced in v5.13 as well as some minor fixes, such as reference leak
      plugs.
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Thierry Reding <thierry.reding@gmail.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210603144624.788861-1-thierry.reding@gmail.com
  3. 03 Jun, 2021 11 commits
  4. 02 Jun, 2021 19 commits