1. 09 Nov, 2021 1 commit
    • Hyeong-Jun Kim's avatar
      f2fs: invalidate META_MAPPING before IPU/DIO write · e3b49ea3
      Hyeong-Jun Kim authored
      Encrypted pages during GC are read and cached in META_MAPPING.
      However, due to cached pages in META_MAPPING, there is an issue where
      newly written pages are lost by IPU or DIO writes.
      
      Thread A - f2fs_gc()            Thread B
      /* phase 3 */
      down_write(i_gc_rwsem)
      ra_data_block()       ---- (a)
      up_write(i_gc_rwsem)
                                      f2fs_direct_IO() :
                                       - down_read(i_gc_rwsem)
                                       - __blockdev_direct_io()
                                       - get_data_block_dio_write()
                                       - f2fs_dio_submit_bio()  ---- (b)
                                       - up_read(i_gc_rwsem)
      /* phase 4 */
      down_write(i_gc_rwsem)
      move_data_block()     ---- (c)
      up_write(i_gc_rwsem)
      
      (a) In phase 3 of f2fs_gc(), up-to-date page is read from storage and
          cached in META_MAPPING.
      (b) In thread B, writing new data by IPU or DIO write on same blkaddr as
          read in (a). cached page in META_MAPPING become out-dated.
      (c) In phase 4 of f2fs_gc(), out-dated page in META_MAPPING is copied to
          new blkaddr. In conclusion, the newly written data in (b) is lost.
      
      To address this issue, invalidating pages in META_MAPPING before IPU or
      DIO write.
      
      Fixes: 6aa58d8a ("f2fs: readahead encrypted block during GC")
      Signed-off-by: default avatarHyeong-Jun Kim <hj514.kim@samsung.com>
      Reviewed-by: default avatarChao Yu <chao@kernel.org>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      e3b49ea3
  2. 29 Oct, 2021 2 commits
  3. 27 Oct, 2021 1 commit
  4. 26 Oct, 2021 5 commits
    • Fengnan Chang's avatar
      f2fs: compress: fix overwrite may reduce compress ratio unproperly · b368cc5e
      Fengnan Chang authored
      when overwrite only first block of cluster, since cluster is not full, it
      will call f2fs_write_raw_pages when f2fs_write_multi_pages, and cause the
      whole cluster become uncompressed eventhough data can be compressed.
      this may will make random write bench score reduce a lot.
      
      root# dd if=/dev/zero of=./fio-test bs=1M count=1
      
      root# sync
      
      root# echo 3 > /proc/sys/vm/drop_caches
      
      root# f2fs_io get_cblocks ./fio-test
      
      root# dd if=/dev/zero of=./fio-test bs=4K count=1 oflag=direct conv=notrunc
      
      w/o patch:
      root# f2fs_io get_cblocks ./fio-test
      189
      
      w/ patch:
      root# f2fs_io get_cblocks ./fio-test
      192
      Signed-off-by: default avatarFengnan Chang <changfengnan@vivo.com>
      Signed-off-by: default avatarChao Yu <chao@kernel.org>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      b368cc5e
    • Chao Yu's avatar
      f2fs: multidevice: support direct IO · 71f2c820
      Chao Yu authored
      Commit 3c62be17 ("f2fs: support multiple devices") missed
      to support direct IO for multiple device feature, this patch
      adds to support the missing part of multidevice feature.
      
      In addition, for multiple device image, we should be aware of
      any issued direct write IO rather than just buffered write IO,
      so that fsync and syncfs can issue a preflush command to the
      device where direct write IO goes, to persist user data for
      posix compliant.
      Signed-off-by: default avatarChao Yu <chao@kernel.org>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      71f2c820
    • Daeho Jeong's avatar
      f2fs: introduce fragment allocation mode mount option · 6691d940
      Daeho Jeong authored
      Added two options into "mode=" mount option to make it possible for
      developers to simulate filesystem fragmentation/after-GC situation
      itself. The developers use these modes to understand filesystem
      fragmentation/after-GC condition well, and eventually get some
      insights to handle them better.
      
      "fragment:segment": f2fs allocates a new segment in ramdom position.
      		With this, we can simulate the after-GC condition.
      "fragment:block" : We can scatter block allocation with
      		"max_fragment_chunk" and "max_fragment_hole" sysfs
      		nodes. f2fs will allocate 1..<max_fragment_chunk>
      		blocks in a chunk and make a hole in the length of
      		1..<max_fragment_hole> by turns	in a newly allocated
      		free segment. Plus, this mode implicitly enables
      		"fragment:segment" option for more randomness.
      Reviewed-by: default avatarChao Yu <chao@kernel.org>
      Signed-off-by: default avatarDaeho Jeong <daehojeong@google.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      6691d940
    • Qing Wang's avatar
      f2fs: replace snprintf in show functions with sysfs_emit · 84eab2a8
      Qing Wang authored
      coccicheck complains about the use of snprintf() in sysfs show functions.
      
      Fix the following coccicheck warning:
      fs/f2fs/sysfs.c:198:12-20: WARNING: use scnprintf or sprintf.
      fs/f2fs/sysfs.c:247:8-16: WARNING: use scnprintf or sprintf.
      
      Use sysfs_emit instead of scnprintf or sprintf makes more sense.
      Signed-off-by: default avatarQing Wang <wangqing@vivo.com>
      Reviewed-by: default avatarChao Yu <chao@kernel.org>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      84eab2a8
    • Daeho Jeong's avatar
      f2fs: include non-compressed blocks in compr_written_block · 09631cf3
      Daeho Jeong authored
      Need to include non-compressed blocks in compr_written_block to
      estimate average compression ratio more accurately.
      
      Fixes: 5ac443e2 ("f2fs: add sysfs nodes to get runtime compression stat")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDaeho Jeong <daehojeong@google.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      09631cf3
  5. 11 Oct, 2021 2 commits
  6. 28 Sep, 2021 1 commit
  7. 20 Sep, 2021 2 commits
  8. 16 Sep, 2021 5 commits
    • Chao Yu's avatar
      f2fs: avoid attaching SB_ACTIVE flag during mount · c02599f2
      Chao Yu authored
      Quoted from [1]
      
      "I do remember that I've added this code back then because otherwise
      orphan cleanup was losing updates to quota files. But you're right
      that now I don't see how that could be happening and it would be nice
      if we could get rid of this hack"
      
      [1] https://lore.kernel.org/linux-ext4/99cce8ca-e4a0-7301-840f-2ace67c551f3@huawei.com/T/#m04990cfbc4f44592421736b504afcc346b2a7c00
      
      Related fix in ext4 by
      commit 72ffb49a ("ext4: do not set SB_ACTIVE in ext4_orphan_cleanup()").
      
      f2fs has the same hack implementation in
      - f2fs_recover_orphan_inodes()
      - f2fs_recover_fsync_data()
      
      Let's get rid of this hack as well in f2fs.
      
      Cc: Zhang Yi <yi.zhang@huawei.com>
      Cc: Jan Kara <jack@suse.cz>
      Acked-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarChao Yu <chao@kernel.org>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      c02599f2
    • Chao Yu's avatar
      f2fs: quota: fix potential deadlock · a5c00422
      Chao Yu authored
      As Yi Zhuang reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=214299
      
      There is potential deadlock during quota data flush as below:
      
      Thread A:			Thread B:
      f2fs_dquot_acquire
      down_read(&sbi->quota_sem)
      				f2fs_write_checkpoint
      				block_operations
      				f2fs_look_all
      				down_write(&sbi->cp_rwsem)
      f2fs_quota_write
      f2fs_write_begin
      __do_map_lock
      f2fs_lock_op
      down_read(&sbi->cp_rwsem)
      				__need_flush_qutoa
      				down_write(&sbi->quota_sem)
      
      This patch changes block_operations() to use trylock, if it fails,
      it means there is potential quota data updater, in this condition,
      let's flush quota data first and then trylock again to check dirty
      status of quota data.
      
      The side effect is: in heavy race condition (e.g. multi quota data
      upaters vs quota data flusher), it may decrease the probability of
      synchronizing quota data successfully in checkpoint() due to limited
      retry time of quota flush.
      Reported-by: default avatarYi Zhuang <zhuangyi1@huawei.com>
      Signed-off-by: default avatarChao Yu <chao@kernel.org>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      a5c00422
    • Jaegeuk Kim's avatar
      f2fs: should use GFP_NOFS for directory inodes · 92d602bc
      Jaegeuk Kim authored
      We use inline_dentry which requires to allocate dentry page when adding a link.
      If we allow to reclaim memory from filesystem, we do down_read(&sbi->cp_rwsem)
      twice by f2fs_lock_op(). I think this should be okay, but how about stopping
      the lockdep complaint [1]?
      
      f2fs_create()
       - f2fs_lock_op()
       - f2fs_do_add_link()
        - __f2fs_find_entry
         - f2fs_get_read_data_page()
         -> kswapd
          - shrink_node
           - f2fs_evict_inode
            - f2fs_lock_op()
      
      [1]
      
      fs_reclaim
      ){+.+.}-{0:0}
      :
      kswapd0:        lock_acquire+0x114/0x394
      kswapd0:        __fs_reclaim_acquire+0x40/0x50
      kswapd0:        prepare_alloc_pages+0x94/0x1ec
      kswapd0:        __alloc_pages_nodemask+0x78/0x1b0
      kswapd0:        pagecache_get_page+0x2e0/0x57c
      kswapd0:        f2fs_get_read_data_page+0xc0/0x394
      kswapd0:        f2fs_find_data_page+0xa4/0x23c
      kswapd0:        find_in_level+0x1a8/0x36c
      kswapd0:        __f2fs_find_entry+0x70/0x100
      kswapd0:        f2fs_do_add_link+0x84/0x1ec
      kswapd0:        f2fs_mkdir+0xe4/0x1e4
      kswapd0:        vfs_mkdir+0x110/0x1c0
      kswapd0:        do_mkdirat+0xa4/0x160
      kswapd0:        __arm64_sys_mkdirat+0x24/0x34
      kswapd0:        el0_svc_common.llvm.17258447499513131576+0xc4/0x1e8
      kswapd0:        do_el0_svc+0x28/0xa0
      kswapd0:        el0_svc+0x24/0x38
      kswapd0:        el0_sync_handler+0x88/0xec
      kswapd0:        el0_sync+0x1c0/0x200
      kswapd0:
      -> #1
      (
      &sbi->cp_rwsem
      ){++++}-{3:3}
      :
      kswapd0:        lock_acquire+0x114/0x394
      kswapd0:        down_read+0x7c/0x98
      kswapd0:        f2fs_do_truncate_blocks+0x78/0x3dc
      kswapd0:        f2fs_truncate+0xc8/0x128
      kswapd0:        f2fs_evict_inode+0x2b8/0x8b8
      kswapd0:        evict+0xd4/0x2f8
      kswapd0:        iput+0x1c0/0x258
      kswapd0:        do_unlinkat+0x170/0x2a0
      kswapd0:        __arm64_sys_unlinkat+0x4c/0x68
      kswapd0:        el0_svc_common.llvm.17258447499513131576+0xc4/0x1e8
      kswapd0:        do_el0_svc+0x28/0xa0
      kswapd0:        el0_svc+0x24/0x38
      kswapd0:        el0_sync_handler+0x88/0xec
      kswapd0:        el0_sync+0x1c0/0x200
      
      Cc: stable@vger.kernel.org
      Fixes: bdbc90fa ("f2fs: don't put dentry page in pagecache into highmem")
      Reviewed-by: default avatarChao Yu <chao@kernel.org>
      Reviewed-by: default avatarStanley Chu <stanley.chu@mediatek.com>
      Reviewed-by: default avatarLight Hsieh <light.hsieh@mediatek.com>
      Tested-by: default avatarLight Hsieh <light.hsieh@mediatek.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      92d602bc
    • Linus Torvalds's avatar
      Merge tag 'hyperv-fixes-signed-20210915' of... · ff1ffd71
      Linus Torvalds authored
      Merge tag 'hyperv-fixes-signed-20210915' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
      
      Pull hyperv fixes from Wei Liu:
      
       - Fix kernel crash caused by uio driver (Vitaly Kuznetsov)
      
       - Remove on-stack cpumask from HV APIC code (Wei Liu)
      
      * tag 'hyperv-fixes-signed-20210915' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
        x86/hyperv: remove on-stack cpumask from hv_send_ipi_mask_allbutself
        asm-generic/hyperv: provide cpumask_to_vpset_noself
        Drivers: hv: vmbus: Fix kernel crash upon unbinding a device from uio_hv_generic driver
      ff1ffd71
    • Linus Torvalds's avatar
      Merge tag 'rtc-5.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux · 453fa43c
      Linus Torvalds authored
      Pull RTC fix from Alexandre Belloni:
       "Fix a locking issue in the cmos rtc driver"
      
      * tag 'rtc-5.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
        rtc: cmos: Disable irq around direct invocation of cmos_interrupt()
      453fa43c
  9. 15 Sep, 2021 8 commits
    • Linus Torvalds's avatar
      qnx4: avoid stringop-overread errors · b7213ffa
      Linus Torvalds authored
      The qnx4 directory entries are 64-byte blocks that have different
      contents depending on the a status byte that is in the last byte of the
      block.
      
      In particular, a directory entry can be either a "link info" entry with
      a 48-byte name and pointers to the real inode information, or an "inode
      entry" with a smaller 16-byte name and the full inode information.
      
      But the code was written to always just treat the directory name as if
      it was part of that "inode entry", and just extend the name to the
      longer case if the status byte said it was a link entry.
      
      That work just fine and gives the right results, but now that gcc is
      tracking data structure accesses much more, the code can trigger a
      compiler error about using up to 48 bytes (the long name) in a structure
      that only has that shorter name in it:
      
         fs/qnx4/dir.c: In function ‘qnx4_readdir’:
         fs/qnx4/dir.c:51:32: error: ‘strnlen’ specified bound 48 exceeds source size 16 [-Werror=stringop-overread]
            51 |                         size = strnlen(de->di_fname, size);
               |                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~
         In file included from fs/qnx4/qnx4.h:3,
                          from fs/qnx4/dir.c:16:
         include/uapi/linux/qnx4_fs.h:45:25: note: source object declared here
            45 |         char            di_fname[QNX4_SHORT_NAME_MAX];
               |                         ^~~~~~~~
      
      which is because the source code doesn't really make this whole "one of
      two different types" explicit.
      
      Fix this by introducing a very explicit union of the two types, and
      basically explaining to the compiler what is really going on.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b7213ffa
    • Linus Torvalds's avatar
      sparc: avoid stringop-overread errors · fc7c028d
      Linus Torvalds authored
      The sparc mdesc code does pointer games with 'struct mdesc_hdr', but
      didn't describe to the compiler how that header is then followed by the
      data that the header describes.
      
      As a result, gcc is now unhappy since it does stricter pointer range
      tracking, and doesn't understand about how these things work.  This
      results in various errors like:
      
          arch/sparc/kernel/mdesc.c: In function ‘mdesc_node_by_name’:
          arch/sparc/kernel/mdesc.c:647:22: error: ‘strcmp’ reading 1 or more bytes from a region of size 0 [-Werror=stringop-overread]
            647 |                 if (!strcmp(names + ep[ret].name_offset, name))
                |                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      which are easily avoided by just describing 'struct mdesc_hdr' better,
      and making the node_block() helper function look into that unsized
      data[] that follows the header.
      
      This makes the sparc64 build happy again at least for my cross-compiler
      version (gcc version 11.2.1).
      
      Link: https://lore.kernel.org/lkml/CAHk-=wi4NW3NC0xWykkw=6LnjQD6D_rtRtxY9g8gQAJXtQMi8A@mail.gmail.com/
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fc7c028d
    • Linus Torvalds's avatar
      Merge branch 'absolute-pointer' (patches from Guenter) · d6efd3f1
      Linus Torvalds authored
      Merge absolute_pointer macro series from Guenter Roeck:
       "Kernel test builds currently fail for several architectures with error
        messages such as the following.
      
        drivers/net/ethernet/i825xx/82596.c: In function 'i82596_probe':
        arch/m68k/include/asm/string.h:72:25: error:
              '__builtin_memcpy' reading 6 bytes from a region of size 0
                      [-Werror=stringop-overread]
      
        Such warnings may be reported by gcc 11.x for string and memory
        operations on fixed addresses if gcc's builtin functions are used for
        those operations.
      
        This series introduces absolute_pointer() to fix the problem.
        absolute_pointer() disassociates a pointer from its originating symbol
        type and context, and thus prevents gcc from making assumptions about
        pointers passed to memory operations"
      
      * emailed patches from Guenter Roeck <linux@roeck-us.net>:
        alpha: Use absolute_pointer to define COMMAND_LINE
        alpha: Move setup.h out of uapi
        net: i825xx: Use absolute_pointer for memcpy from fixed memory location
        compiler.h: Introduce absolute_pointer macro
      d6efd3f1
    • Guenter Roeck's avatar
      alpha: Use absolute_pointer to define COMMAND_LINE · ebdc20d7
      Guenter Roeck authored
      alpha:allmodconfig fails to build with the following error
      when using gcc 11.x.
      
        arch/alpha/kernel/setup.c: In function 'setup_arch':
        arch/alpha/kernel/setup.c:493:13: error:
      	'strcmp' reading 1 or more bytes from a region of size 0
      
      Avoid the problem by declaring COMMAND_LINE as absolute_pointer().
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ebdc20d7
    • Guenter Roeck's avatar
      alpha: Move setup.h out of uapi · 3cb8b153
      Guenter Roeck authored
      Most of the contents of setup.h have no value for userspace
      applications.  The file was probably moved to uapi accidentally.
      
      Keep the file in uapi to define the alpha-specific COMMAND_LINE_SIZE.
      Move all other defines to arch/alpha/include/asm/setup.h.
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3cb8b153
    • Guenter Roeck's avatar
      net: i825xx: Use absolute_pointer for memcpy from fixed memory location · dff2d131
      Guenter Roeck authored
      gcc 11.x reports the following compiler warning/error.
      
        drivers/net/ethernet/i825xx/82596.c: In function 'i82596_probe':
        arch/m68k/include/asm/string.h:72:25: error:
      	'__builtin_memcpy' reading 6 bytes from a region of size 0 [-Werror=stringop-overread]
      
      Use absolute_pointer() to work around the problem.
      
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Reviewed-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dff2d131
    • Guenter Roeck's avatar
      compiler.h: Introduce absolute_pointer macro · f6b5f1a5
      Guenter Roeck authored
      absolute_pointer() disassociates a pointer from its originating symbol
      type and context. Use it to prevent compiler warnings/errors such as
      
        drivers/net/ethernet/i825xx/82596.c: In function 'i82596_probe':
        arch/m68k/include/asm/string.h:72:25: error:
      	'__builtin_memcpy' reading 6 bytes from a region of size 0 [-Werror=stringop-overread]
      
      Such warnings may be reported by gcc 11.x for string and memory
      operations on fixed addresses.
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Reviewed-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f6b5f1a5
    • Masami Hiramatsu's avatar
      tools/bootconfig: Define memblock_free_ptr() to fix build error · 80be5998
      Masami Hiramatsu authored
      The lib/bootconfig.c file is shared with the 'bootconfig' tooling, and
      as a result, the changes incommit 77e02cf5 ("memblock: introduce
      saner 'memblock_free_ptr()' interface") need to also be reflected in the
      tooling header file.
      
      So define the new memblock_free_ptr() wrapper, and remove unused __pa()
      and memblock_free().
      
      Fixes: 77e02cf5 ("memblock: introduce saner 'memblock_free_ptr()' interface")
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      80be5998
  10. 14 Sep, 2021 5 commits
    • Huang Rui's avatar
      drm/ttm: fix type mismatch error on sparc64 · 3ca706c1
      Huang Rui authored
      On sparc64, __fls() returns an "int", but the drm TTM code expected it
      to be "unsigned long" as on x86.  As a result, on sparc (and arc, and
      m68k) you get build errors because 'min()' checks that the types match.
      
      As suggested by Linus, it can use min_t instead of min to force the type
      to be "unsigned int".
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarHuang Rui <ray.huang@amd.com>
      Reviewed-by: default avatarChristian König <christian.koenig@amd.com>
      Cc: Alex Deucher <alexdeucher@gmail.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3ca706c1
    • Linus Torvalds's avatar
      memblock: introduce saner 'memblock_free_ptr()' interface · 77e02cf5
      Linus Torvalds authored
      The boot-time allocation interface for memblock is a mess, with
      'memblock_alloc()' returning a virtual pointer, but then you are
      supposed to free it with 'memblock_free()' that takes a _physical_
      address.
      
      Not only is that all kinds of strange and illogical, but it actually
      causes bugs, when people then use it like a normal allocation function,
      and it fails spectacularly on a NULL pointer:
      
         https://lore.kernel.org/all/20210912140820.GD25450@xsang-OptiPlex-9020/
      
      or just random memory corruption if the debug checks don't catch it:
      
         https://lore.kernel.org/all/61ab2d0c-3313-aaab-514c-e15b7aa054a0@suse.cz/
      
      I really don't want to apply patches that treat the symptoms, when the
      fundamental cause is this horribly confusing interface.
      
      I started out looking at just automating a sane replacement sequence,
      but because of this mix or virtual and physical addresses, and because
      people have used the "__pa()" macro that can take either a regular
      kernel pointer, or just the raw "unsigned long" address, it's all quite
      messy.
      
      So this just introduces a new saner interface for freeing a virtual
      address that was allocated using 'memblock_alloc()', and that was kept
      as a regular kernel pointer.  And then it converts a couple of users
      that are obvious and easy to test, including the 'xbc_nodes' case in
      lib/bootconfig.c that caused problems.
      Reported-by: default avatarkernel test robot <oliver.sang@intel.com>
      Fixes: 40caa127 ("init: bootconfig: Remove all bootconfig data when the init memory is removed")
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      77e02cf5
    • Vasily Averin's avatar
      ipc: remove memcg accounting for sops objects in do_semtimedop() · 6a4746ba
      Vasily Averin authored
      Linus proposes to revert an accounting for sops objects in
      do_semtimedop() because it's really just a temporary buffer
      for a single semtimedop() system call.
      
      This object can consume up to 2 pages, syscall is sleeping
      one, size and duration can be controlled by user, and this
      allocation can be repeated by many thread at the same time.
      
      However Shakeel Butt pointed that there are much more popular
      objects with the same life time and similar memory
      consumption, the accounting of which was decided to be
      rejected for performance reasons.
      
      Considering at least 2 pages for task_struct and 2 pages for
      the kernel stack, a back of the envelope calculation gives a
      footprint amplification of <1.5 so this temporal buffer can be
      safely ignored.
      
      The factor would IMO be interesting if it was >> 2 (from the
      PoV of excessive (ab)use, fine-grained accounting seems to be
      currently unfeasible due to performance impact).
      
      Link: https://lore.kernel.org/lkml/90e254df-0dfe-f080-011e-b7c53ee7fd20@virtuozzo.com/
      Fixes: 18319498 ("memcg: enable accounting of ipc resources")
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarMichal Koutný <mkoutny@suse.com>
      Acked-by: default avatarShakeel Butt <shakeelb@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6a4746ba
    • Michael Ellerman's avatar
      powerpc/boot: Fix build failure since GCC 4.9 removal · 1619b69e
      Michael Ellerman authored
      Stephen reported that the build was broken since commit
      6d2ef226 ("compiler_attributes.h: drop __has_attribute() support for
      gcc4"), with errors such as:
      
        include/linux/compiler_attributes.h:296:5: warning: "__has_attribute" is not defined, evaluates to 0 [-Wundef]
          296 | #if __has_attribute(__warning__)
              |     ^~~~~~~~~~~~~~~
        make[2]: *** [arch/powerpc/boot/Makefile:225: arch/powerpc/boot/crt0.o] Error 1
      
      But we expect __has_attribute() to always be defined now that we've
      stopped using GCC 4.
      
      Linus debugged it to the point of reading the GCC sources, and noticing
      that the problem is that __has_attribute() is not defined when
      preprocessing assembly files, which is what we're doing here.
      
      Our assembly files don't include, or need, compiler_attributes.h, but
      they are getting it unconditionally from the -include in BOOT_CFLAGS,
      which is then added in its entirety to BOOT_AFLAGS.
      
      That -include was added in commit 77433830 ("powerpc: boot: include
      compiler_attributes.h") so that we'd have "fallthrough" and other
      attributes defined for the C files in arch/powerpc/boot. But it's not
      needed for assembly files.
      
      The minimal fix is to move the addition to BOOT_CFLAGS of -include
      compiler_attributes.h until after we've copied BOOT_CFLAGS into
      BOOT_AFLAGS. That avoids including compiler_attributes.h for asm files,
      but makes no other change to BOOT_CFLAGS or BOOT_AFLAGS.
      Reported-by: default avatarStephen Rothwell <sfr@canb.auug.org.au>
      Debugged-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Tested-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1619b69e
    • Chris Wilson's avatar
      rtc: cmos: Disable irq around direct invocation of cmos_interrupt() · 13be2efc
      Chris Wilson authored
      As previously noted in commit 66e4f4a9 ("rtc: cmos: Use
      spin_lock_irqsave() in cmos_interrupt()"):
      
      <4>[  254.192378] WARNING: inconsistent lock state
      <4>[  254.192384] 5.12.0-rc1-CI-CI_DRM_9834+ #1 Not tainted
      <4>[  254.192396] --------------------------------
      <4>[  254.192400] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
      <4>[  254.192409] rtcwake/5309 [HC0[0]:SC0[0]:HE1:SE1] takes:
      <4>[  254.192429] ffffffff8263c5f8 (rtc_lock){?...}-{2:2}, at: cmos_interrupt+0x18/0x100
      <4>[  254.192481] {IN-HARDIRQ-W} state was registered at:
      <4>[  254.192488]   lock_acquire+0xd1/0x3d0
      <4>[  254.192504]   _raw_spin_lock+0x2a/0x40
      <4>[  254.192519]   cmos_interrupt+0x18/0x100
      <4>[  254.192536]   rtc_handler+0x1f/0xc0
      <4>[  254.192553]   acpi_ev_fixed_event_detect+0x109/0x13c
      <4>[  254.192574]   acpi_ev_sci_xrupt_handler+0xb/0x28
      <4>[  254.192596]   acpi_irq+0x13/0x30
      <4>[  254.192620]   __handle_irq_event_percpu+0x43/0x2c0
      <4>[  254.192641]   handle_irq_event_percpu+0x2b/0x70
      <4>[  254.192661]   handle_irq_event+0x2f/0x50
      <4>[  254.192680]   handle_fasteoi_irq+0x9e/0x150
      <4>[  254.192693]   __common_interrupt+0x76/0x140
      <4>[  254.192715]   common_interrupt+0x96/0xc0
      <4>[  254.192732]   asm_common_interrupt+0x1e/0x40
      <4>[  254.192750]   _raw_spin_unlock_irqrestore+0x38/0x60
      <4>[  254.192767]   resume_irqs+0xba/0xf0
      <4>[  254.192786]   dpm_resume_noirq+0x245/0x3d0
      <4>[  254.192811]   suspend_devices_and_enter+0x230/0xaa0
      <4>[  254.192835]   pm_suspend.cold.8+0x301/0x34a
      <4>[  254.192859]   state_store+0x7b/0xe0
      <4>[  254.192879]   kernfs_fop_write_iter+0x11d/0x1c0
      <4>[  254.192899]   new_sync_write+0x11d/0x1b0
      <4>[  254.192916]   vfs_write+0x265/0x390
      <4>[  254.192933]   ksys_write+0x5a/0xd0
      <4>[  254.192949]   do_syscall_64+0x33/0x80
      <4>[  254.192965]   entry_SYSCALL_64_after_hwframe+0x44/0xae
      <4>[  254.192986] irq event stamp: 43775
      <4>[  254.192994] hardirqs last  enabled at (43775): [<ffffffff81c00c42>] asm_sysvec_apic_timer_interrupt+0x12/0x20
      <4>[  254.193023] hardirqs last disabled at (43774): [<ffffffff81aa691a>] sysvec_apic_timer_interrupt+0xa/0xb0
      <4>[  254.193049] softirqs last  enabled at (42548): [<ffffffff81e00342>] __do_softirq+0x342/0x48e
      <4>[  254.193074] softirqs last disabled at (42543): [<ffffffff810b45fd>] irq_exit_rcu+0xad/0xd0
      <4>[  254.193101]
                        other info that might help us debug this:
      <4>[  254.193107]  Possible unsafe locking scenario:
      
      <4>[  254.193112]        CPU0
      <4>[  254.193117]        ----
      <4>[  254.193121]   lock(rtc_lock);
      <4>[  254.193137]   <Interrupt>
      <4>[  254.193142]     lock(rtc_lock);
      <4>[  254.193156]
                         *** DEADLOCK ***
      
      <4>[  254.193161] 6 locks held by rtcwake/5309:
      <4>[  254.193174]  #0: ffff888104861430 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x5a/0xd0
      <4>[  254.193232]  #1: ffff88810f823288 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xe7/0x1c0
      <4>[  254.193282]  #2: ffff888100cef3c0 (kn->active#285
      <7>[  254.192706] i915 0000:00:02.0: [drm:intel_modeset_setup_hw_state [i915]] [CRTC:51:pipe A] hw state readout: disabled
      <4>[  254.193307] ){.+.+}-{0:0}, at: kernfs_fop_write_iter+0xf0/0x1c0
      <4>[  254.193333]  #3: ffffffff82649fa8 (system_transition_mutex){+.+.}-{3:3}, at: pm_suspend.cold.8+0xce/0x34a
      <4>[  254.193387]  #4: ffffffff827a2108 (acpi_scan_lock){+.+.}-{3:3}, at: acpi_suspend_begin+0x47/0x70
      <4>[  254.193433]  #5: ffff8881019ea178 (&dev->mutex){....}-{3:3}, at: device_resume+0x68/0x1e0
      <4>[  254.193485]
                        stack backtrace:
      <4>[  254.193492] CPU: 1 PID: 5309 Comm: rtcwake Not tainted 5.12.0-rc1-CI-CI_DRM_9834+ #1
      <4>[  254.193514] Hardware name: Google Soraka/Soraka, BIOS MrChromebox-4.10 08/25/2019
      <4>[  254.193524] Call Trace:
      <4>[  254.193536]  dump_stack+0x7f/0xad
      <4>[  254.193567]  mark_lock.part.47+0x8ca/0xce0
      <4>[  254.193604]  __lock_acquire+0x39b/0x2590
      <4>[  254.193626]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
      <4>[  254.193660]  lock_acquire+0xd1/0x3d0
      <4>[  254.193677]  ? cmos_interrupt+0x18/0x100
      <4>[  254.193716]  _raw_spin_lock+0x2a/0x40
      <4>[  254.193735]  ? cmos_interrupt+0x18/0x100
      <4>[  254.193758]  cmos_interrupt+0x18/0x100
      <4>[  254.193785]  cmos_resume+0x2ac/0x2d0
      <4>[  254.193813]  ? acpi_pm_set_device_wakeup+0x1f/0x110
      <4>[  254.193842]  ? pnp_bus_suspend+0x10/0x10
      <4>[  254.193864]  pnp_bus_resume+0x5e/0x90
      <4>[  254.193885]  dpm_run_callback+0x5f/0x240
      <4>[  254.193914]  device_resume+0xb2/0x1e0
      <4>[  254.193942]  ? pm_dev_err+0x25/0x25
      <4>[  254.193974]  dpm_resume+0xea/0x3f0
      <4>[  254.194005]  dpm_resume_end+0x8/0x10
      <4>[  254.194030]  suspend_devices_and_enter+0x29b/0xaa0
      <4>[  254.194066]  pm_suspend.cold.8+0x301/0x34a
      <4>[  254.194094]  state_store+0x7b/0xe0
      <4>[  254.194124]  kernfs_fop_write_iter+0x11d/0x1c0
      <4>[  254.194151]  new_sync_write+0x11d/0x1b0
      <4>[  254.194183]  vfs_write+0x265/0x390
      <4>[  254.194207]  ksys_write+0x5a/0xd0
      <4>[  254.194232]  do_syscall_64+0x33/0x80
      <4>[  254.194251]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      <4>[  254.194274] RIP: 0033:0x7f07d79691e7
      <4>[  254.194293] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
      <4>[  254.194312] RSP: 002b:00007ffd9cc2c768 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      <4>[  254.194337] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f07d79691e7
      <4>[  254.194352] RDX: 0000000000000004 RSI: 0000556ebfc63590 RDI: 000000000000000b
      <4>[  254.194366] RBP: 0000556ebfc63590 R08: 0000000000000000 R09: 0000000000000004
      <4>[  254.194379] R10: 0000556ebf0ec2a6 R11: 0000000000000246 R12: 0000000000000004
      
      which breaks S3-resume on fi-kbl-soraka presumably as that's slow enough
      to trigger the alarm during the suspend.
      
      Fixes: 6950d046 ("rtc: cmos: Replace spin_lock_irqsave with spin_lock in hard IRQ")
      References: 66e4f4a9 ("rtc: cmos: Use spin_lock_irqsave() in cmos_interrupt()"):
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Xiaofei Tan <tanxiaofei@huawei.com>
      Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Reviewed-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: default avatarAlexandre Belloni <alexandre.belloni@bootlin.com>
      Link: https://lore.kernel.org/r/20210305122140.28774-1-chris@chris-wilson.co.uk
      13be2efc
  11. 13 Sep, 2021 8 commits