1. 04 Oct, 2022 15 commits
    • f2fs: fix to detect corrupted meta ino · fcc2d8cc
      Chao Yu authored
      The ino of a dirent or orphan inode may be corrupted in a fuzzed image.
      If the corrupted ino happens to equal a meta ino (meta_ino, node_ino or
      compress_ino), callers of f2fs_iget() in the call paths below get the
      meta inode directly, which is not allowed. Add a sanity check to detect
      such cases.
      
      case #1
      - recover_dentry
       - __f2fs_find_entry
       - f2fs_iget_retry
      
      case #2
      - recover_orphan_inode
       - f2fs_iget_retry
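
The kind of check described above can be sketched in user-space C. This is only an illustration, not the patch itself: the struct, both function names, and the idea of returning -1 are hypothetical stand-ins for the kernel's superblock info and error handling.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the reserved inode numbers that
 * the filesystem tracks; this is not the kernel's real layout. */
struct sb_reserved_inos {
	uint32_t node_ino;
	uint32_t meta_ino;
	uint32_t compress_ino;
};

/* Return true when ino names one of the internal meta inodes. */
static bool is_reserved_meta_ino(const struct sb_reserved_inos *sb,
				 uint32_t ino)
{
	return ino == sb->node_ino || ino == sb->meta_ino ||
	       ino == sb->compress_ino;
}

/* Lookup path for an ino read from a dirent or orphan block: reject a
 * reserved ino instead of handing the meta inode to the caller.
 * Returns 0 when the ino is acceptable, -1 when it looks corrupted. */
static int checked_iget(const struct sb_reserved_inos *sb, uint32_t ino)
{
	if (is_reserved_meta_ino(sb, ino))
		return -1;	/* corrupted on-disk ino */
	return 0;
}
```

The point of the design is that the check sits on the lookup paths that take an ino from on-disk data, so a fuzzed dirent or orphan entry cannot alias an internal inode.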
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to account FS_CP_DATA_IO correctly · d80afefb
      Chao Yu authored
      f2fs_inode_info.cp_task was introduced for FS_CP_DATA_IO accounting
      since commit b0af6d49 ("f2fs: add app/fs io stat").
      
      However, cp_task usage coverage has been increased due to below
      commits:
      commit 040d2bb3 ("f2fs: fix to avoid deadloop if data_flush is on")
      commit 186857c5 ("f2fs: fix potential recursive call when enabling data_flush")
      
      As a result, if the data_flush mount option is on and a data flush is
      triggered from the background, the IO from that data flush is
      incorrectly accounted as checkpoint IO.
      
      In order to fix this issue, this patch splits cp_task into two:
      a) cp_task: used for IO accounting
      b) wb_task: used to avoid deadlock
      
      Fixes: 040d2bb3 ("f2fs: fix to avoid deadloop if data_flush is on")
      Fixes: 186857c5 ("f2fs: fix potential recursive call when enabling data_flush")
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: code clean and fix a type error · 544b53da
      Zhang Qilong authored
      ERROR: code indent should use tabs where possible
      ERROR: spaces required around that ':'
      ERROR: incorrect tab
      
      Found several code style errors while reviewing the code and fixed them.
      There is no functional change.
      Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: add "c_len" into trace_f2fs_update_extent_tree_range for compressed file · a834aa3e
      Zhang Qilong authored
      trace_f2fs_update_extent_tree_range could not record the compressed
      block length of a cluster in a compressed file, so add it.
      Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to do sanity check on summary info · c6ad7fd1
      Chao Yu authored
      As Wenqing Liu reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=216456
      
      BUG: KASAN: use-after-free in recover_data+0x63ae/0x6ae0 [f2fs]
      Read of size 4 at addr ffff8881464dcd80 by task mount/1013
      
      CPU: 3 PID: 1013 Comm: mount Tainted: G        W          6.0.0-rc4 #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
      Call Trace:
       dump_stack_lvl+0x45/0x5e
       print_report.cold+0xf3/0x68d
       kasan_report+0xa8/0x130
       recover_data+0x63ae/0x6ae0 [f2fs]
       f2fs_recover_fsync_data+0x120d/0x1fc0 [f2fs]
       f2fs_fill_super+0x4665/0x61e0 [f2fs]
       mount_bdev+0x2cf/0x3b0
       legacy_get_tree+0xed/0x1d0
       vfs_get_tree+0x81/0x2b0
       path_mount+0x47e/0x19d0
       do_mount+0xce/0xf0
       __x64_sys_mount+0x12c/0x1a0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      The root cause is that, in the fuzzed image, the SSA table is corrupted:
      ofs_in_node is larger than ADDRS_PER_PAGE(), resulting in an out-of-range
      access on a 4k-size page.
      
      - recover_data
       - do_recover_data
        - check_index_in_prev_nodes
         - f2fs_data_blkaddr
      
      This patch adds a sanity check on summary info in the recovery and GC
      flows, both of which rely on it.
      
      After patch:
      [   29.310883] F2FS-fs (loop0): Inconsistent ofs_in_node:65286 in summary, ino:0, nid:6, max:1018
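
A bounds check of this shape can be sketched in user-space C as follows. The function names and the -1 return value are hypothetical; only the message format and the sample values (ofs_in_node 65286, max 1018, ino 0, nid 6) are taken from the log line above.

```c
#include <stdbool.h>
#include <stdio.h>

/* An offset is usable only if it indexes inside the node's address
 * array, i.e. it is strictly below the per-node maximum. */
static bool summary_ofs_is_valid(unsigned int ofs_in_node,
				 unsigned int max_addrs)
{
	return ofs_in_node < max_addrs;
}

/* Validate a summary entry before it is used as an array index.
 * Returns 0 when the entry is consistent, -1 otherwise. */
static int check_summary(unsigned int ofs_in_node, unsigned int max_addrs,
			 unsigned int ino, unsigned int nid)
{
	if (!summary_ofs_is_valid(ofs_in_node, max_addrs)) {
		fprintf(stderr,
			"Inconsistent ofs_in_node:%u in summary, ino:%u, nid:%u, max:%u\n",
			ofs_in_node, ino, nid, max_addrs);
		return -1;
	}
	return 0;
}
```

Rejecting the entry before the lookup is what turns the out-of-range read into a reported inconsistency.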
      
      Cc: stable@vger.kernel.org
      Reported-by: Wenqing Liu <wenqingliu0120@gmail.com>
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: port to vfs{g,u}id_t and associated helpers · 1e8a9191
      Christian Brauner authored
      A while ago we introduced a dedicated vfs{g,u}id_t type in commit
      1e5267cd ("mnt_idmapping: add vfs{g,u}id_t"). We already switched
      over a good part of the VFS. Ultimately we will remove all legacy
      idmapped mount helpers that operate only on k{g,u}id_t in favor of the
      new type safe helpers that operate on vfs{g,u}id_t.
      
      Cc: Seth Forshee (Digital Ocean) <sforshee@kernel.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jaegeuk Kim <jaegeuk@kernel.org>
      Cc: Chao Yu <chao@kernel.org>
      Cc: linux-f2fs-devel@lists.sourceforge.net
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to do sanity check on destination blkaddr during recovery · 0ef4ca04
      Chao Yu authored
      As Wenqing Liu reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=216456
      
      loop5: detected capacity change from 0 to 131072
      F2FS-fs (loop5): recover_inode: ino = 6, name = hln, inline = 1
      F2FS-fs (loop5): recover_data: ino = 6 (i_size: recover) err = 0
      F2FS-fs (loop5): recover_inode: ino = 6, name = hln, inline = 1
      F2FS-fs (loop5): recover_data: ino = 6 (i_size: recover) err = 0
      F2FS-fs (loop5): recover_inode: ino = 6, name = hln, inline = 1
      F2FS-fs (loop5): recover_data: ino = 6 (i_size: recover) err = 0
      F2FS-fs (loop5): Bitmap was wrongly set, blk:5634
      ------------[ cut here ]------------
      WARNING: CPU: 3 PID: 1013 at fs/f2fs/segment.c:2198
      RIP: 0010:update_sit_entry+0xa55/0x10b0 [f2fs]
      Call Trace:
       <TASK>
       f2fs_do_replace_block+0xa98/0x1890 [f2fs]
       f2fs_replace_block+0xeb/0x180 [f2fs]
       recover_data+0x1a69/0x6ae0 [f2fs]
       f2fs_recover_fsync_data+0x120d/0x1fc0 [f2fs]
       f2fs_fill_super+0x4665/0x61e0 [f2fs]
       mount_bdev+0x2cf/0x3b0
       legacy_get_tree+0xed/0x1d0
       vfs_get_tree+0x81/0x2b0
       path_mount+0x47e/0x19d0
       do_mount+0xce/0xf0
       __x64_sys_mount+0x12c/0x1a0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      If we enable CONFIG_F2FS_CHECK_FS config, it will trigger a kernel panic
      instead of warning.
      
      The root cause is that, in the fuzzed image, the SIT table is
      inconsistent with the inode mapping table, which triggers this warning
      during the SIT table update.

      This patch introduces a new flag, DATA_GENERIC_ENHANCE_UPDATE. With this
      flag, the data block recovery flow can check the validity of the
      destination blkaddr in the SIT table and skip f2fs_replace_block() to
      avoid an inconsistent status.
      
      Cc: stable@vger.kernel.org
      Reported-by: Wenqing Liu <wenqingliu0120@gmail.com>
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: let FI_OPU_WRITE override FADVISE_COLD_BIT · f3b23c78
      Weichao Guo authored
      Cold files may be fragmented due to SSR. Defragmentation is needed
      because sequential reads are the dominant access pattern for these
      files. FI_OPU_WRITE should override FADVISE_COLD_BIT so that
      defragmentation does not fail.
      Signed-off-by: Weichao Guo <guoweichao@oppo.com>
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix race condition on setting FI_NO_EXTENT flag · 07725adc
      Zhang Qilong authored
      The following scenarios exist.
      process A:               process B:
      ->f2fs_drop_extent_tree  ->f2fs_update_extent_cache_range
                                ->f2fs_update_extent_tree_range
                                 ->write_lock
       ->set_inode_flag
                                 ->is_inode_flag_set
                                 ->__free_extent_tree // Shouldn't
                                                      // have been
                                                      // cleaned up
                                                      // here
        ->write_lock
      
      In this case, the "FI_NO_EXTENT" flag is set by another process
      between f2fs_update_extent_tree_range and is_inode_flag_set. This
      leads to clearing the whole extent tree, which should not have
      happened. Fix it by moving the flag setting inside the write_lock
      range.
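
The locking rule behind this fix (change the flag only while holding the same lock the updater holds when it tests the flag) can be sketched in user-space C. Everything here is a simplified, hypothetical model: a C11 atomic_flag spinlock stands in for the extent tree lock, a bool for FI_NO_EXTENT, and a counter for the cached extent nodes.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of an extent tree guarded by one lock. */
struct extent_tree {
	atomic_flag lock;	/* stands in for the tree's write lock */
	bool no_extent;		/* models the FI_NO_EXTENT flag */
	int node_cnt;		/* models the cached extent nodes */
};

static void tree_lock(struct extent_tree *et)
{
	while (atomic_flag_test_and_set_explicit(&et->lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */
}

static void tree_unlock(struct extent_tree *et)
{
	atomic_flag_clear_explicit(&et->lock, memory_order_release);
}

/* Dropper: the flag is set and the tree emptied in one critical
 * section, so no updater can observe the flag half-way through. */
static void drop_extent_tree(struct extent_tree *et)
{
	tree_lock(et);
	et->no_extent = true;
	et->node_cnt = 0;
	tree_unlock(et);
}

/* Updater: tests the flag under the same lock before touching the
 * tree. Returns true when a node was added. */
static bool update_extent_tree(struct extent_tree *et)
{
	bool updated = false;

	tree_lock(et);
	if (!et->no_extent) {
		et->node_cnt++;
		updated = true;
	}
	tree_unlock(et);
	return updated;
}
```

Because both sides take the lock before reading or writing the flag, the interleaving shown in the diagram, where the flag changes between the updater's lock acquisition and its flag test, cannot occur in this model.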
      
      Fixes: 5f281fab ("f2fs: disable extent_cache for fcollapse/finsert inodes")
      Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: remove redundant check in f2fs_sanity_check_cluster · 9df6d6f9
      Zhang Qilong authored
      "compressed" has already been checked at the entry of
      f2fs_sanity_check_cluster, so remove the redundant check
      for better performance.
      Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: add static init_idisk_time function to reduce the code · 049ea86c
      Zhang Qilong authored
      We can use an inner function to init the disk time
      of f2fs_inode_info to remove redundant code.
      Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix typo · d382e369
      Yonggil Song authored
      Fix typo in f2fs.h
      Detected by Jaeyoon Choi
      Signed-off-by: Yonggil Song <yonggil.song@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix wrong dirty page count when race between mmap and fallocate. · 9b7eadd9
      Shuqi Zhang authored
      This is a BUG_ON issue as follows when running xfstest-generic-503:
      WARNING: CPU: 21 PID: 1385 at fs/f2fs/inode.c:762 f2fs_evict_inode+0x847/0xaa0
      Modules linked in:
      CPU: 21 PID: 1385 Comm: umount Not tainted 5.19.0-rc5+ #73
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014
      
      Call Trace:
      evict+0x129/0x2d0
      dispose_list+0x4f/0xb0
      evict_inodes+0x204/0x230
      generic_shutdown_super+0x5b/0x1e0
      kill_block_super+0x29/0x80
      kill_f2fs_super+0xe6/0x140
      deactivate_locked_super+0x44/0xc0
      deactivate_super+0x79/0x90
      cleanup_mnt+0x114/0x1a0
      __cleanup_mnt+0x16/0x20
      task_work_run+0x98/0x100
      exit_to_user_mode_prepare+0x3d0/0x3e0
      syscall_exit_to_user_mode+0x12/0x30
      do_syscall_64+0x42/0x80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      Function flow analysis when BUG occurs:
      f2fs_fallocate                    mmap
                                        do_page_fault
                                          pte_spinlock  // ---lock_pte
                                          do_wp_page
                                            wp_page_shared
                                              pte_unmap_unlock   // unlock_pte
                                                do_page_mkwrite
                                                f2fs_vm_page_mkwrite
                                                  down_read(invalidate_lock)
                                                  lock_page
                                                  if (PageMappedToDisk(page))
                                                    goto out;
                                                  // set_page_dirty  --NOT RUN
                                                  out: up_read(invalidate_lock);
                                              finish_mkwrite_fault // unlock_pte
      f2fs_collapse_range
        down_write(i_mmap_sem)
        truncate_pagecache
          unmap_mapping_pages
            i_mmap_lock_write // down_write(i_mmap_rwsem)
              ......
              zap_pte_range
                pte_offset_map_lock // ---lock_pte
                 set_page_dirty
                  f2fs_dirty_data_folio
                    if (!folio_test_dirty(folio)) {
                                              fault_dirty_shared_page
                                                set_page_dirty
                                                  f2fs_dirty_data_folio
                                                    if (!folio_test_dirty(folio)) {
                                                      filemap_dirty_folio
                                                      f2fs_update_dirty_folio // ++
                                                    }
                                                  unlock_page
                      filemap_dirty_folio
                      f2fs_update_dirty_folio // page count++
                    }
                pte_unmap_unlock  // --unlock_pte
            i_mmap_unlock_write  // up_write(i_mmap_rwsem)
        truncate_inode_pages
        up_write(i_mmap_sem)
      
      When a race happens between mmap-do_page_fault-wp_page_shared and
      fallocate-truncate_pagecache-zap_pte_range, zap_pte_range calls
      set_page_dirty without the page lock. Besides, though
      truncate_pagecache holds the i_mmap and pte locks, wp_page_shared calls
      fault_dirty_shared_page without either. In this case, two threads race
      in the f2fs_dirty_data_folio function. The page is set dirty only ONCE,
      but the count is added TWICE by calling filemap_dirty_folio.
      Thus the dirty page count does not match the real number of dirty pages.

      The following solution handles the case where the race happens without
      any lock. Since folio_test_set_dirty in filemap_dirty_folio is atomic,
      judging by its return value is not at risk of a race.
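
The idea of counting only when an atomic test-and-set reports a clean-to-dirty transition can be sketched with C11 atomics. This is a user-space model under stated assumptions, not the kernel code: struct page_model, mark_dirty(), dirty_data_page() and the dirty_pages counter are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a page with an atomic dirty bit. */
struct page_model {
	atomic_bool dirty;
};

/* Hypothetical global counter of dirty pages. */
static atomic_long dirty_pages;

/* Atomically set the dirty bit. Returns true only for the one caller
 * that actually flipped the page from clean to dirty. */
static bool mark_dirty(struct page_model *p)
{
	return !atomic_exchange(&p->dirty, true);
}

/* Dirty a page and account it. The counter is bumped based on the
 * return value of the atomic operation, not on a separate
 * "is it dirty yet?" test that two racing callers could both pass. */
static bool dirty_data_page(struct page_model *p)
{
	if (mark_dirty(p)) {
		atomic_fetch_add(&dirty_pages, 1);
		return true;
	}
	return false;
}
```

With a separate check followed by a set, two unlocked callers can both see "clean" and both increment; deciding on the exchange's return value leaves exactly one winner.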
      Signed-off-by: Shuqi Zhang <zhangshuqi3@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: use COMPRESS_MAPPING to get compress cache mapping · 173cdf2c
      Zhang Qilong authored
      Just use the defined COMPRESS_MAPPING to get the compress cache
      mapping instead of accessing the name directly.
      Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: return the tmp_ptr directly in __bitmap_ptr · 280dfeae
      Zhang Qilong authored
      Just return tmp_ptr here; there is no need to dereference the
      checkpoint pointer again.
      Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  2. 13 Sep, 2022 5 commits
  3. 30 Aug, 2022 5 commits
  4. 29 Aug, 2022 2 commits
  5. 28 Aug, 2022 13 commits
    • Linux 6.0-rc3 · b90cb105
      Linus Torvalds authored
    • Merge tag 'mm-hotfixes-stable-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm · b467192e
      Linus Torvalds authored
      Pull more hotfixes from Andrew Morton:
       "Seventeen hotfixes.  Mostly memory management things.
      
        Ten patches are cc:stable, addressing pre-6.0 issues"
      
      * tag 'mm-hotfixes-stable-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        .mailmap: update Luca Ceresoli's e-mail address
        mm/mprotect: only reference swap pfn page if type match
        squashfs: don't call kmalloc in decompressors
        mm/damon/dbgfs: avoid duplicate context directory creation
        mailmap: update email address for Colin King
        asm-generic: sections: refactor memory_intersects
        bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem
        ocfs2: fix freeing uninitialized resource on ocfs2_dlm_shutdown
        Revert "memcg: cleanup racy sum avoidance code"
        mm/zsmalloc: do not attempt to free IS_ERR handle
        binder_alloc: add missing mmap_lock calls when using the VMA
        mm: re-allow pinning of zero pfns (again)
        vmcoreinfo: add kallsyms_num_syms symbol
        mailmap: update Guilherme G. Piccoli's email addresses
        writeback: avoid use-after-free after removing device
        shmem: update folio if shmem_replace_page() updates the page
        mm/hugetlb: avoid corrupting page->mapping in hugetlb_mcopy_atomic_pte
    • Merge tag 'bitmap-6.0-rc3' of github.com:/norov/linux · 373eff57
      Linus Torvalds authored
      Pull bitmap fixes from Yury Norov:
       "Fix the reported issues, and implements the suggested improvements,
        for the version of the cpumask tests [1] that was merged with commit
        c41e8866 ("lib/test: introduce cpumask KUnit test suite").
      
        These changes include fixes for the tests, and better alignment with
        the KUnit style guidelines"
      
      * tag 'bitmap-6.0-rc3' of github.com:/norov/linux:
        lib/cpumask_kunit: add tests file to MAINTAINERS
        lib/cpumask_kunit: log mask contents
        lib/test_cpumask: follow KUnit style guidelines
        lib/test_cpumask: fix cpu_possible_mask last test
        lib/test_cpumask: drop cpu_possible_mask full test
    • .mailmap: update Luca Ceresoli's e-mail address · 0ebafe2e
      Luca Ceresoli authored
      My Bootlin address is preferred from now on.
      
      Link: https://lkml.kernel.org/r/20220826130515.3011951-1-luca.ceresoli@bootlin.com
      Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Atish Patra <atishp@atishpatra.org>
      Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
      Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/mprotect: only reference swap pfn page if type match · 3d2f78f0
      Peter Xu authored
      Yu Zhao reported a bug after the commit "mm/swap: Add swp_offset_pfn() to
      fetch PFN from swap entry" added a check in swp_offset_pfn() for swap type [1]:
      
        kernel BUG at include/linux/swapops.h:117!
        CPU: 46 PID: 5245 Comm: EventManager_De Tainted: G S         O L 6.0.0-dbg-DEV #2
        RIP: 0010:pfn_swap_entry_to_page+0x72/0xf0
        Code: c6 48 8b 36 48 83 fe ff 74 53 48 01 d1 48 83 c1 08 48 8b 09 f6
        c1 01 75 7b 66 90 48 89 c1 48 8b 09 f6 c1 01 74 74 5d c3 eb 9e <0f> 0b
        48 ba ff ff ff ff 03 00 00 00 eb ae a9 ff 0f 00 00 75 13 48
        RSP: 0018:ffffa59e73fabb80 EFLAGS: 00010282
        RAX: 00000000ffffffe8 RBX: 0c00000000000000 RCX: ffffcd5440000000
        RDX: 1ffffffffff7a80a RSI: 0000000000000000 RDI: 0c0000000000042b
        RBP: ffffa59e73fabb80 R08: ffff9965ca6e8bb8 R09: 0000000000000000
        R10: ffffffffa5a2f62d R11: 0000030b372e9fff R12: ffff997b79db5738
        R13: 000000000000042b R14: 0c0000000000042b R15: 1ffffffffff7a80a
        FS:  00007f549d1bb700(0000) GS:ffff99d3cf680000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000440d035b3180 CR3: 0000002243176004 CR4: 00000000003706e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         <TASK>
         change_pte_range+0x36e/0x880
         change_p4d_range+0x2e8/0x670
         change_protection_range+0x14e/0x2c0
         mprotect_fixup+0x1ee/0x330
         do_mprotect_pkey+0x34c/0x440
         __x64_sys_mprotect+0x1d/0x30
      
      It triggers because pfn_swap_entry_to_page() could be called upon e.g. a
      genuine swap entry.
      
      Fix it by only calling it when it's a write migration entry where the page*
      is used.
      
      [1] https://lore.kernel.org/lkml/CAOUHufaVC2Za-p8m0aiHw6YkheDcrO-C3wRGixwDS32VTS+k1w@mail.gmail.com/
      
      Link: https://lkml.kernel.org/r/20220823221138.45602-1-peterx@redhat.com
      Fixes: 6c287605 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive")
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reported-by: Yu Zhao <yuzhao@google.com>
      Tested-by: Yu Zhao <yuzhao@google.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • squashfs: don't call kmalloc in decompressors · 1f13dff0
      Phillip Lougher authored
      The decompressors may be called while in an atomic section.  So move the
      kmalloc() out of this path, and into the "page actor" init function.
      
      This fixes a regression introduced by commit
      f268eedd ("squashfs: extend "page actor" to handle missing pages")
      
      Link: https://lkml.kernel.org/r/20220822215430.15933-1-phillip@squashfs.org.uk
      Fixes: f268eedd ("squashfs: extend "page actor" to handle missing pages")
      Reported-by: Chris Murphy <lists@colorremedies.com>
      Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/damon/dbgfs: avoid duplicate context directory creation · d26f6070
      Badari Pulavarty authored
      When user tries to create a DAMON context via the DAMON debugfs interface
      with a name of an already existing context, the context directory creation
      fails but a new context is created and added in the internal data
      structure, due to absence of the directory creation success check.  As a
      result, memory could leak and DAMON cannot be turned on.  An example test
      case is as below:
      
          # cd /sys/kernel/debug/damon/
          # echo "off" >  monitor_on
          # echo paddr > target_ids
          # echo "abc" > mk_context
          # echo "abc" > mk_context
          # echo $$ > abc/target_ids
          # echo "on" > monitor_on  <<< fails
      
      Return value of 'debugfs_create_dir()' is expected to be ignored in
      general, but this is an exceptional case as DAMON feature is depending
      on the debugfs functionality and it has the potential duplicate name
      issue.  This commit therefore fixes the issue by checking the directory
      creation failure and immediately return the error in the case.
      
      Link: https://lkml.kernel.org/r/20220821180853.2400-1-sj@kernel.org
      Fixes: 75c1c2b5 ("mm/damon/dbgfs: support multiple contexts")
      Signed-off-by: Badari Pulavarty <badari.pulavarty@intel.com>
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Cc: <stable@vger.kernel.org>	[ 5.15.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mailmap: update email address for Colin King · ac733f65
      Colin Ian King authored
      Colin King is working on kernel janitorial fixes in his spare time and
      using his Intel email is confusing.  Use his gmail account as the default
      email address.
      
      Link: https://lkml.kernel.org/r/20220817212753.101109-1-colin.i.king@gmail.com
      Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • asm-generic: sections: refactor memory_intersects · 0c7d7cc2
      Quanyang Wang authored
      There are two problems with the current code of memory_intersects:
      
      First, it doesn't check whether the region (begin, end) falls inside the
      region (virt, vend), that is (virt < begin && vend > end).
      
      The second problem is if vend is equal to begin, it will return true but
      this is wrong since vend (virt + size) is not the last address of the
      memory region but (virt + size -1) is.  The wrong determination will
      trigger the misreporting when the function check_for_illegal_area calls
      memory_intersects to check if the dma region intersects with stext region.
      
      The misreporting is as below (stext is at 0x80100000):
       WARNING: CPU: 0 PID: 77 at kernel/dma/debug.c:1073 check_for_illegal_area+0x130/0x168
       DMA-API: chipidea-usb2 e0002000.usb: device driver maps memory from kernel text or rodata [addr=800f0000] [len=65536]
       Modules linked in:
       CPU: 1 PID: 77 Comm: usb-storage Not tainted 5.19.0-yocto-standard #5
       Hardware name: Xilinx Zynq Platform
        unwind_backtrace from show_stack+0x18/0x1c
        show_stack from dump_stack_lvl+0x58/0x70
        dump_stack_lvl from __warn+0xb0/0x198
        __warn from warn_slowpath_fmt+0x80/0xb4
        warn_slowpath_fmt from check_for_illegal_area+0x130/0x168
        check_for_illegal_area from debug_dma_map_sg+0x94/0x368
        debug_dma_map_sg from __dma_map_sg_attrs+0x114/0x128
        __dma_map_sg_attrs from dma_map_sg_attrs+0x18/0x24
        dma_map_sg_attrs from usb_hcd_map_urb_for_dma+0x250/0x3b4
        usb_hcd_map_urb_for_dma from usb_hcd_submit_urb+0x194/0x214
        usb_hcd_submit_urb from usb_sg_wait+0xa4/0x118
        usb_sg_wait from usb_stor_bulk_transfer_sglist+0xa0/0xec
        usb_stor_bulk_transfer_sglist from usb_stor_bulk_srb+0x38/0x70
        usb_stor_bulk_srb from usb_stor_Bulk_transport+0x150/0x360
        usb_stor_Bulk_transport from usb_stor_invoke_transport+0x38/0x440
        usb_stor_invoke_transport from usb_stor_control_thread+0x1e0/0x238
        usb_stor_control_thread from kthread+0xf8/0x104
        kthread from ret_from_fork+0x14/0x2c
      
      Refactor memory_intersects to fix the two problems above.
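
The intersection test that addresses both problems can be sketched in user-space C. This is a sketch under assumptions rather than the kernel helper itself: it takes integer addresses instead of pointers, and it treats both regions as half-open ranges, [begin, end) for the section and [virt, virt + size) for the object.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true when the object [virt, virt + size) shares at least one
 * byte with the section [begin, end). */
static bool memory_intersects(uintptr_t begin, uintptr_t end,
			      uintptr_t virt, size_t size)
{
	uintptr_t vend = virt + size;	/* one past the object's last byte */

	/* Overlap exists when each range starts before the other ends.
	 * This also covers an object that fully encloses the section,
	 * and it rejects the case where vend merely touches begin. */
	return virt < end && vend > begin;
}
```

With the values from the warning above (addr 0x800f0000, len 65536, stext at 0x80100000), vend equals begin, so this test reports no intersection.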
      
      Before commit 1d7db834 ("dma-debug: use memory_intersects()
      directly"), memory_intersects was called only by printk_late_init:
      
      printk_late_init -> init_section_intersects ->memory_intersects.
      
      There were few places where memory_intersects was called.
      
      When commit 1d7db834 ("dma-debug: use memory_intersects()
      directly") was merged and CONFIG_DMA_API_DEBUG is enabled, the DMA
      subsystem uses it to check for an illegal area and the calltrace above
      is triggered.
      
      [akpm@linux-foundation.org: fix nearby comment typo]
      Link: https://lkml.kernel.org/r/20220819081145.948016-1-quanyang.wang@windriver.com
      Fixes: 97955936 ("asm/sections: add helpers to check for section data")
      Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Thierry Reding <treding@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem · dd0ff4d1
      Liu Shixin authored
      The vmemmap pages are marked by kmemleak when allocated from memblock.
      Remove them from kmemleak when freeing the page.  Otherwise, when we
      reuse the page, kmemleak may report such an error and then stop working.
      
       kmemleak: Cannot insert 0xffff98fb6eab3d40 into the object search tree (overlaps existing)
       kmemleak: Kernel memory leak detector disabled
       kmemleak: Object 0xffff98fb6be00000 (size 335544320):
       kmemleak:   comm "swapper", pid 0, jiffies 4294892296
       kmemleak:   min_count = 0
       kmemleak:   count = 0
       kmemleak:   flags = 0x1
       kmemleak:   checksum = 0
       kmemleak:   backtrace:
      
      Link: https://lkml.kernel.org/r/20220819094005.2928241-1-liushixin2@huawei.com
      Fixes: f41f2ed4 (mm: hugetlb: free the vmemmap pages associated with each HugeTLB page)
      Signed-off-by: Liu Shixin <liushixin2@huawei.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • ocfs2: fix freeing uninitialized resource on ocfs2_dlm_shutdown · 550842cc
      Heming Zhao authored
      After commit 0737e01d ("ocfs2: ocfs2_mount_volume does cleanup job
      before return error"), a failure in any procedure after ocfs2_dlm_init()
      triggers a crash when ocfs2_dlm_shutdown() is called.

      For example, in local mount mode no dlm resource is initialized.  If
      ocfs2_mount_volume() fails in ocfs2_find_slot(), error handling calls
      ocfs2_dlm_shutdown(), which then runs the dlm resource cleanup and
      triggers a kernel crash.

      The fix bypasses uninitialized resources in ocfs2_dlm_shutdown().
      
      Link: https://lkml.kernel.org/r/20220815085754.20417-1-heming.zhao@suse.com
      Fixes: 0737e01d ("ocfs2: ocfs2_mount_volume does cleanup job before return error")
      Signed-off-by: Heming Zhao <heming.zhao@suse.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Revert "memcg: cleanup racy sum avoidance code" · dbb16df6
      Shakeel Butt authored
      This reverts commit 96e51ccf.
      
      Recently we started running the kernel with rstat infrastructure on
      production traffic and began to see negative memcg stats values.
      In particular, the 'sock' stat is the one we observed having a negative
      value.
      
      $ grep "sock " /mnt/memory/job/memory.stat
      sock 253952
      total_sock 18446744073708724224
      
      Re-run after a couple of seconds
      
      $ grep "sock " /mnt/memory/job/memory.stat
      sock 253952
      total_sock 53248
      
      For now we are only seeing this issue on large machines (256 CPUs) and
      only with the 'sock' stat.  I think the networking stack increases the
      stat on one cpu and decreases it on another cpu much more often.  So
      this negative sock is due to the rstat flusher flushing the stats on the
      CPU that has seen the decrement of sock but missing the CPU that has the
      increments.  A typical race condition.

      For easy stable backport, a revert is the simplest solution.  For a
      long term solution, I am thinking of two directions.  The first is to
      reduce the race window by optimizing the rstat flusher.  The second is,
      if the reader sees a negative stat value, to force a flush and restart
      the stat collection.  Basically retry, but limited.
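
The huge total_sock value above is what a small negative sum looks like when printed as an unsigned 64-bit number. The sketch below shows that wrap-around and one way to hide it, assuming (this is an assumption, not stated in the message) that the racy-sum avoidance being restored clamps a transiently negative sum to zero. Both function names are hypothetical.

```c
#include <stdint.h>

/* What a reader prints when a negative sum is exposed as unsigned:
 * the value wraps modulo 2^64. */
static uint64_t stat_unclamped(int64_t raw_sum)
{
	return (uint64_t)raw_sum;
}

/* Clamp a transiently negative sum to zero before exposing it, so a
 * flush that saw the decrements but not yet the increments does not
 * show up as an enormous counter. */
static uint64_t stat_for_display(int64_t raw_sum)
{
	if (raw_sum < 0)
		raw_sum = 0;
	return (uint64_t)raw_sum;
}
```

For instance, a raw sum of -827392 wraps to 18446744073708724224, the total_sock value in the first sample above, while the clamped variant reports 0 until the missing increments are flushed.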
      
      Link: https://lkml.kernel.org/r/20220817172139.3141101-1-shakeelb@google.com
      Fixes: 96e51ccf ("memcg: cleanup racy sum avoidance code")
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Cc: "Michal Koutný" <mkoutny@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Yosry Ahmed <yosryahmed@google.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: <stable@vger.kernel.org>	[5.15]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/zsmalloc: do not attempt to free IS_ERR handle · a5d21721
      Sergey Senozhatsky authored
      zs_malloc() now returns ERR_PTR values as handles, which zram accidentally
      can pass to zs_free().  Another bad scenario is when zcomp_compress()
      fails - handle has default -ENOMEM value, and zs_free() will try to free
      that "pointer value".
      
      Add the missing check and make sure that zs_free() bails out when
      ERR_PTR() is passed to it.
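
The guard described above can be sketched in user-space C. It mirrors the kernel convention that the top 4095 values of the address space encode error codes, but the function names are hypothetical, and malloc/free stand in for the pool allocator.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Largest errno value that can be encoded in a pointer-sized handle. */
#define MAX_ERRNO 4095

/* True when the handle is zero or encodes an error code such as
 * -ENOMEM, i.e. it is not something that was ever allocated. */
static bool handle_is_err_or_null(uintptr_t handle)
{
	return handle == 0 || handle >= (uintptr_t)-MAX_ERRNO;
}

/* Free a handle, bailing out early on error-encoded or null handles.
 * Returns true when memory was actually released. */
static bool pool_free(uintptr_t handle)
{
	if (handle_is_err_or_null(handle))
		return false;

	free((void *)handle);
	return true;
}
```

Placing the check inside the free routine protects every caller at once, including paths where the handle still holds its default error value.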
      
      Link: https://lkml.kernel.org/r/20220816050906.2583956-1-senozhatsky@chromium.org
      Fixes: c7e6f17b ("zsmalloc: zs_malloc: return ERR_PTR on failure")
      Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>