  1. 09 Apr, 2021 8 commits
    • erofs: support decompress big pcluster for lz4 backend · 598162d0
      Gao Xiang authored
      Prior to big pcluster, there was only one compressed page, so it was
      easy to map. However, when big pcluster is enabled, more work
      needs to be done to handle multiple compressed pages. In detail,
      
       - (maptype 0) if there is only one compressed page + no need
         to copy inplace I/O, just map it directly as we did before;
      
       - (maptype 1) if there are more compressed pages + no need to
         copy inplace I/O, vmap such compressed pages instead;
      
       - (maptype 2) if inplace I/O needs to be copied, use per-CPU
         buffers for decompression then.
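
The three cases above can be sketched as a small decision function. This is only an illustration under stated assumptions: the enum names, `choose_maptype()` and its parameters are hypothetical and are not taken from the commit itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mapping strategies, numbered like the maptype values in the list above. */
enum maptype {
	MAP_DIRECT = 0,	/* single compressed page, mapped directly */
	MAP_VMAP   = 1,	/* several compressed pages, mapped with vmap */
	MAP_PERCPU = 2,	/* inplace I/O copied into a per-CPU buffer */
};

/*
 * Hypothetical helper: pick a maptype from the number of compressed
 * pages and whether inplace I/O has to be copied first.
 */
enum maptype choose_maptype(size_t nr_compressed_pages,
			    bool need_inplace_copy)
{
	assert(nr_compressed_pages >= 1);

	if (need_inplace_copy)
		return MAP_PERCPU;
	if (nr_compressed_pages == 1)
		return MAP_DIRECT;
	return MAP_VMAP;
}
```

For instance, `choose_maptype(4, false)` yields `MAP_VMAP`, while any call with `need_inplace_copy` set yields `MAP_PERCPU` regardless of the page count.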
      
      Another thing is how to detect whether inplace decompression is
      feasible (it's still quite easy for non-big pclusters). Apart from
      the inplace margin calculation, the inplace I/O page reuse order also
      needs to be considered for each compressed page. Currently, if the
      compressed page is the xth page, it shouldn't be reused as [0 ...
      nrpages_out - nrpages_in + x], otherwise a full copy will be triggered.
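
A minimal sketch of the reuse-order check described above, leaving out the inplace margin calculation. It assumes that pages are identified by pointer equality, that `nrpages_in <= nrpages_out`, and that the upper bound of the range is exclusive (so the xth compressed page may still sit at output index `nrpages_out - nrpages_in + x`). That reading of the notation is an assumption, and `inplace_order_ok()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check: in[] holds the compressed pages and out[] the
 * output pages; identical pointers mean one page serves both roles
 * (inplace I/O). Returns false when a full copy would be needed.
 */
bool inplace_order_ok(void *const in[], size_t nrpages_in,
		      void *const out[], size_t nrpages_out)
{
	assert(nrpages_in <= nrpages_out);

	for (size_t x = 0; x < nrpages_in; ++x) {
		/* j < nrpages_out - nrpages_in + x, written without subtraction */
		for (size_t j = 0; j + nrpages_in < nrpages_out + x; ++j)
			if (out[j] == in[x])
				return false;
	}
	return true;
}
```

With four output pages, reusing the last two as the compressed pages passes the check, whereas reusing the second output page as the first compressed page does not.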
      
      Although there are some extra optimization ideas for this, I'd like
      to make big pcluster work correctly first; it can obviously be
      further optimized later since it has nothing to do with the on-disk
      format at all.
      
      Link: https://lore.kernel.org/r/20210407043927.10623-10-xiang@kernel.org
      Acked-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
    • erofs: support parsing big pcluster compact indexes · b86269f4
      Gao Xiang authored
      Different from non-compact indexes, several lclusters are packed
      into the compact form at once and a unique base blkaddr is stored for
      each pack, so each lcluster index takes less space on average
      (e.g. 2 bytes for COMPACT_2B). That is also why the BIG_PCLUSTER
      switch should be consistent for compact head0/1.
      
      Prior to big pcluster, the size of all pclusters was 1 lcluster.
      Therefore, when a new HEAD lcluster was scanned, blkaddr would be
      bumped by 1 lcluster. However, that way doesn't work anymore for
      big pcluster since we actually don't know the compressed size of
      pclusters in advance (before reading CBLKCNT lcluster).
      
      So, instead, let blkaddr of each pack be the first pcluster blkaddr
      with a valid CBLKCNT, in detail,
      
       1) if CBLKCNT starts at the pack, this first valid pcluster is
          itself, e.g.
        _____________________________________________________________
       |_CBLKCNT0_|_NONHEAD_| .. |_HEAD_|_CBLKCNT1_| ... |_HEAD_| ...
       ^ = blkaddr base          ^ += CBLKCNT0           ^ += CBLKCNT1
      
       2) if CBLKCNT doesn't start at the pack, the first valid pcluster
          is the next pcluster, e.g.
        _________________________________________________________
       |_NONHEAD_| .. |_HEAD_|_CBLKCNT0_| ... |_HEAD_|_HEAD_| ...
                      ^ = blkaddr base        ^ += CBLKCNT0
                                                     ^ += 1
      
      When a CBLKCNT is found, blkaddr will be increased by CBLKCNT
      lclusters; if a new HEAD is found immediately instead, blkaddr is
      bumped by 1 (see the diagrams above).
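
The accumulation rule can be sketched with a simplified model of one pack. The types and `head_blkaddr()` below are hypothetical and do not reflect the real on-disk encoding; each entry is just tagged as HEAD, NONHEAD or CBLKCNT, as in the diagrams.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified lcluster kinds, named after the labels in the diagrams. */
enum lc_type { LC_HEAD, LC_NONHEAD, LC_CBLKCNT };

struct lc_index {
	enum lc_type type;
	unsigned int cblkcnt;	/* only meaningful for LC_CBLKCNT */
};

/*
 * Hypothetical helper: return the blkaddr of the pcluster whose HEAD
 * sits at pack[target], given the base blkaddr stored for the pack.
 */
unsigned int head_blkaddr(const struct lc_index *pack, size_t target,
			  unsigned int base)
{
	unsigned int blkaddr = base;
	unsigned int cur_blks = 0;	/* size of the pcluster being walked */
	int started = 0;		/* reached the first valid pcluster? */

	assert(pack[target].type == LC_HEAD);

	for (size_t i = 0; i <= target; ++i) {
		switch (pack[i].type) {
		case LC_CBLKCNT:
			/* 1st non-head lcluster: compressed block count */
			cur_blks = pack[i].cblkcnt;
			started = 1;
			break;
		case LC_HEAD:
			/* skip the pcluster just walked (1 if no CBLKCNT) */
			if (started)
				blkaddr += cur_blks;
			cur_blks = 1;
			started = 1;
			break;
		case LC_NONHEAD:
			break;
		}
	}
	return blkaddr;
}
```

For a pack shaped like case 2 (NONHEAD, HEAD, CBLKCNT0 = 4, NONHEAD, HEAD, HEAD) with base 100, the three HEADs resolve to 100, 104 and 105 respectively.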
      
      Also note that if CBLKCNT is at the end of the pack, instead of
      storing delta1 (the distance to the next HEAD lcluster) as normal
      NONHEADs do, it still stores the compressed block count (delta0),
      since delta1 can be calculated indirectly but the block count can't.
      
      Adjust decoding logic to fit big pcluster compact indexes as well.
      
      Link: https://lore.kernel.org/r/20210407043927.10623-9-xiang@kernel.org
      Acked-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
    • erofs: support parsing big pcluster compress indexes · cec6e93b
      Gao Xiang authored
      When the INCOMPAT_BIG_PCLUSTER sb feature is enabled, legacy compress
      indexes will also have the same on-disk header as compact indexes to
      keep per-file configurations instead of leaving it zeroed.
      
      If ADVISE_BIG_PCLUSTER is set for a file, CBLKCNT will be loaded for each
      pcluster in this file by parsing the 1st non-head lcluster.
      
      Link: https://lore.kernel.org/r/20210407043927.10623-8-xiang@kernel.org
      Acked-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
    • erofs: adjust per-CPU buffers according to max_pclusterblks · 4fea63f7
      Gao Xiang authored
      Adjust per-CPU buffers on demand now that the big pcluster definition
      is available. Also, bail out on unsupported pcluster sizes according
      to Z_EROFS_PCLUSTER_MAX_SIZE.
      
      Link: https://lore.kernel.org/r/20210407043927.10623-7-xiang@kernel.org
      Acked-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
    • erofs: add big physical cluster definition · 5404c330
      Gao Xiang authored
      Big pcluster indicates that the size of compressed data for each
      pcluster is no longer fixed at the block size, but could be more than
      1 block (more accurately, 1 lcluster).
      
      When the big pcluster feature is enabled for head0/1, delta0 of the 1st
      non-head lcluster index will keep the block count of this pcluster in
      lcluster size instead of 1. If the pcluster has no non-head lcluster
      index, its compressed size should be 1 lcluster.
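
As a rough sketch of this rule, the compressed size of a pcluster could be derived as follows. `pcluster_lclusters()` and its parameters are hypothetical; the sketch ignores any flag bits in the real on-disk field and assumes a stored count is at least 1.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: compressed size of a pcluster in lclusters.
 * has_nonhead: whether the pcluster has any non-head lcluster index;
 * delta0: the delta0 field of its 1st non-head lcluster index.
 */
unsigned int pcluster_lclusters(bool has_nonhead, unsigned int delta0)
{
	if (!has_nonhead)
		return 1;	/* a lone HEAD: exactly 1 lcluster */

	assert(delta0 >= 1);	/* assumption: count is never zero */
	return delta0;
}
```

A pcluster made of a single HEAD lcluster is therefore 1 lcluster long, while one whose first non-head index stores 4 in delta0 spans 4 lclusters of compressed data.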
      
      Also note that BIG_PCLUSTER feature reuses COMPR_CFGS feature since
      it depends on COMPR_CFGS and will be released together.
      
      Link: https://lore.kernel.org/r/20210407043927.10623-6-xiang@kernel.org
      Acked-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
    • erofs: fix up inplace I/O pointer for big pcluster · 81382f5f
      Gao Xiang authored
      When picking up inplace I/O pages, they should be traversed in reverse
      order, in alignment with the traversal order of file-backed online pages.
      Also, index should be updated together when preloading compressed pages.
      
      Previously, only page-sized pclustersize was supported, so this was
      not a problem. Also rename `compressedpages' to `icpage_ptr' to reflect
      its functionality.
      
      Link: https://lore.kernel.org/r/20210407043927.10623-5-xiang@kernel.org
      Acked-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
    • erofs: introduce physical cluster slab pools · 9f6cc76e
      Gao Xiang authored
      Since multiple pcluster sizes could be used at once, the number of
      compressed pages will become a variable factor. It's necessary to
      introduce slab pools rather than a single slab cache now.
      
      This limits the pclustersize to 1M (Z_EROFS_PCLUSTER_MAX_SIZE), and
      gets rid of the obsolete EROFS_FS_CLUSTER_PAGE_LIMIT, which has no
      use now.
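
To illustrate why several pools are needed, the sketch below picks the smallest pool able to hold a given number of compressed pages. The pool capacities, the 4096-byte page size and `pick_pcluster_pool()` are assumptions made for illustration; only the 1M upper limit comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ			4096u		/* assumed page size */
#define PCLUSTER_MAX_SIZE	(1024u * 1024u)	/* 1M limit from the text */
#define PCLUSTER_MAX_PAGES	(PCLUSTER_MAX_SIZE / PAGE_SZ)

/* Illustrative pool capacities in pages; the real set may differ. */
static const unsigned int pool_maxpages[] = {
	1, 4, 16, 64, 128, PCLUSTER_MAX_PAGES,
};

#define NR_POOLS (sizeof(pool_maxpages) / sizeof(pool_maxpages[0]))

/*
 * Hypothetical helper: index of the smallest pool that can hold
 * nr_pages compressed pages, or -1 if the pcluster is too large.
 */
int pick_pcluster_pool(unsigned int nr_pages)
{
	assert(nr_pages >= 1);

	for (size_t i = 0; i < NR_POOLS; ++i)
		if (nr_pages <= pool_maxpages[i])
			return (int)i;
	return -1;
}
```

Under these assumptions a 17-page pcluster lands in the 64-page pool (index 3), and anything above 256 pages is rejected.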
      
      Link: https://lore.kernel.org/r/20210407043927.10623-4-xiang@kernel.org
      Acked-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
    • erofs: introduce multipage per-CPU buffers · 52488734
      Gao Xiang authored
      To deal with the cases in which inplace decompression is infeasible
      for some inplace I/O, per-CPU buffers were introduced to get rid of page
      allocation latency and thrashing for low-latency decompression
      algorithms such as lz4.
      
      For the big pcluster feature, introduce multipage per-CPU buffers to
      keep such inplace I/O pclusters temporarily as well, but note that
      per-CPU pages are only virtually consecutive.
      
      When a new big pcluster fs is mounted, its max pclustersize will be
      read and per-CPU buffers can be grown if needed. Shrinking adjustable
      per-CPU buffers is more complex (because we don't know if such a size
      is still being used), so currently just release them all when unloading.
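
The grow-only policy can be sketched in user space as follows. This is not the kernel implementation: `struct pcpu_buf`, both function names and the use of `realloc()` as a stand-in for mapping virtually consecutive pages are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SZ 4096u	/* assumed page size */

/* Hypothetical stand-in for one CPU's buffer, sized in pages. */
struct pcpu_buf {
	void *ptr;
	unsigned int nrpages;
};

/* Grow buf to at least nrpages; never shrinks. Returns 0 or -1. */
int pcpu_buf_growsize(struct pcpu_buf *buf, unsigned int nrpages)
{
	void *p;

	assert(buf != NULL);
	if (nrpages <= buf->nrpages)
		return 0;	/* already large enough: keep as is */

	p = realloc(buf->ptr, (size_t)nrpages * PAGE_SZ);
	if (!p)
		return -1;	/* old buffer stays valid */
	buf->ptr = p;
	buf->nrpages = nrpages;
	return 0;
}

/* Release everything at once, as done when unloading. */
void pcpu_buf_release(struct pcpu_buf *buf)
{
	free(buf->ptr);
	buf->ptr = NULL;
	buf->nrpages = 0;
}
```

Mounting a filesystem with a smaller max pclustersize after a larger one leaves the buffer untouched, which sidesteps the question of whether the larger size is still in use.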
      
      Link: https://lore.kernel.org/r/20210409190630.19569-1-xiang@kernel.org
      Acked-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
  2. 07 Apr, 2021 1 commit
    • erofs: reserve physical_clusterbits[] · 54e0b6c8
      Gao Xiang authored
      The formal big pcluster design is actually more powerful / flexible than
      the previous idea, whose pclustersize was fixed as power-of-2 blocks,
      which was obviously inefficient and space-wasting. Instead, pclustersize
      can now be set independently for each pcluster, so various pcluster
      sizes can also be used together in one file if mkfs wants (for example,
      according to data type and/or compression ratio).
      
      Let's get rid of the previous physical_clusterbits[] setting (also note
      that the corresponding on-disk fields are still 0 for now). Therefore,
      head1/2 can be used for at most 2 different algorithms in one file and
      again pclustersize is now independent of these.
      
      Link: https://lore.kernel.org/r/20210407043927.10623-2-xiang@kernel.org
      Acked-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
  3. 03 Apr, 2021 1 commit
  4. 29 Mar, 2021 10 commits
  5. 28 Mar, 2021 9 commits
    • Linux 5.12-rc5 · a5e13c6d
      Linus Torvalds authored
    • Merge tag 'perf-tools-fixes-for-v5.12-2020-03-28' of... · f9e2bb42
      Linus Torvalds authored
      Merge tag 'perf-tools-fixes-for-v5.12-2020-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tooling fixes from Arnaldo Carvalho de Melo:
      
       - Avoid write of uninitialized memory when generating PERF_RECORD_MMAP*
         records.
      
       - Fix 'perf top' BPF support related crash with perf_event_paranoid=3 +
         kptr_restrict.
      
       - Validate raw event with sysfs exported format bits.
      
       - Fix waitpid on SIGCHLD delivery bugs in 'perf daemon'.
      
       - Change to use bash for daemon test on Debian, where the default is
         dash and thus fails for use of bashisms in this test.
      
       - Fix memory leak in vDSO found using ASAN.
      
       - Remove now useless (due to the fact that BPF now supports static
         vars) failing sub test "BPF relocation checker".
      
       - Fix auxtrace queue conflict.
      
       - Sync linux/kvm.h with the kernel sources.
      
      * tag 'perf-tools-fixes-for-v5.12-2020-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
        perf test: Change to use bash for daemon test
        perf record: Fix memory leak in vDSO found using ASAN
        perf test: Remove now useless failing sub test "BPF relocation checker"
        perf daemon: Return from kill functions
        perf daemon: Force waipid for all session on SIGCHLD delivery
        perf top: Fix BPF support related crash with perf_event_paranoid=3 + kptr_restrict
        perf pmu: Validate raw event with sysfs exported format bits
        perf synthetic events: Avoid write of uninitialized memory when generating PERF_RECORD_MMAP* records
        tools headers UAPI: Sync linux/kvm.h with the kernel sources
        perf synthetic-events: Fix uninitialized 'kernel_thread' variable
        perf auxtrace: Fix auxtrace queue conflict
    • Merge tag 'auxdisplay-for-linus-v5.12-rc6' of git://github.com/ojeda/linux · 3fef15f8
      Linus Torvalds authored
      Pull auxdisplay fix from Miguel Ojeda:
       "Remove in_interrupt() usage (Sebastian Andrzej Siewior)"
      
      * tag 'auxdisplay-for-linus-v5.12-rc6' of git://github.com/ojeda/linux:
        auxdisplay: Remove in_interrupt() usage.
    • Merge tag 'x86-urgent-2021-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 36a14638
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Two fixes:
      
         - Fix build failure on Ubuntu with new GCC packages that turn
           on -fcf-protection
      
         - Fix SME memory encryption PTE encoding bug - AFAICT the code
           worked on 4K page sizes (level 1) but had the wrong shift at
           higher page level orders (level 2 and higher)"
      
      * tag 'x86-urgent-2021-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/build: Turn off -fcf-protection for realmode targets
        x86/mem_encrypt: Correct physical address calculation in __set_clr_pte_enc()
    • Merge tag 'locking-urgent-2021-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 47fbbc94
      Linus Torvalds authored
      Pull locking fix from Ingo Molnar:
       "Fix the non-debug mutex_lock_io_nested() method to map to
        mutex_lock_io() instead of mutex_lock().
      
        Right now nothing uses this API explicitly, but this is an
        accident waiting to happen"
      
      * tag 'locking-urgent-2021-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/mutex: Fix non debug version of mutex_lock_io_nested()
    • Merge tag '5.12-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6 · 81b1d39f
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Five cifs/smb3 fixes, two for stable.
      
        Includes an important fix for encryption and an ACL fix, as well as a
        fix for possible reflink data corruption"
      
      * tag '5.12-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6:
        smb3: fix cached file size problems in duplicate extents (reflink)
        cifs: Silently ignore unknown oplock break handle
        cifs: revalidate mapping when we open files for SMB1 POSIX
        cifs: Fix chmod with modefromsid when an older ACE already exists.
        cifs: Adjust key sizes and key generation routines for AES256 encryption
    • Merge tag 'io_uring-5.12-2021-03-27' of git://git.kernel.dk/linux-block · b44d1ddc
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
      
       - Use thread info versions of flag testing, as discussed last week.
      
       - The series enabling PF_IO_WORKER to just take signals, instead of
         needing to special case that they do not in a bunch of places. Ends
         up being pretty trivial to do, and then we can revert all the special
         casing we're currently doing.
      
       - Kill dead pointer assignment
      
       - Fix hashed part of async work queue trace
      
       - Fix sign extension issue for IORING_OP_PROVIDE_BUFFERS
      
       - Fix a link completion ordering regression in this merge window
      
       - Cancellation fixes
      
      * tag 'io_uring-5.12-2021-03-27' of git://git.kernel.dk/linux-block:
        io_uring: remove unsued assignment to pointer io
        io_uring: don't cancel extra on files match
        io_uring: don't cancel-track common timeouts
        io_uring: do post-completion chore on t-out cancel
        io_uring: fix timeout cancel return code
        Revert "signal: don't allow STOP on PF_IO_WORKER threads"
        Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing"
        Revert "kernel: treat PF_IO_WORKER like PF_KTHREAD for ptrace/signals"
        Revert "signal: don't allow sending any signals to PF_IO_WORKER threads"
        kernel: stop masking signals in create_io_thread()
        io_uring: handle signals for IO threads like a normal thread
        kernel: don't call do_exit() for PF_IO_WORKER threads
        io_uring: maintain CQE order of a failed link
        io-wq: fix race around pending work on teardown
        io_uring: do ctx sqd ejection in a clear context
        io_uring: fix provide_buffers sign extension
        io_uring: don't skip file_end_write() on reissue
        io_uring: correct io_queue_async_work() traces
        io_uring: don't use {test,clear}_tsk_thread_flag() for current
    • Merge tag 'block-5.12-2021-03-27' of git://git.kernel.dk/linux-block · abed516e
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - Fix regression from this merge window with the xarray partition
         change, which allowed partition counts that overflow the u8 that
         holds the partition number (Ming)
      
       - Fix zone append warning (Johannes)
      
       - Segmentation count fix for multipage bvecs (David)
      
       - Partition scan fix (Chris)
      
      * tag 'block-5.12-2021-03-27' of git://git.kernel.dk/linux-block:
        block: don't create too many partitions
        block: support zone append bvecs
        block: recalculate segment count for multi-segment discards correctly
        block: clear GD_NEED_PART_SCAN later in bdev_disk_changed
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · e8cfe8fa
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Seven fixes, all in drivers (qla2xxx, mpt3sas, qedi, target,
        ibmvscsi).
      
        The most serious are the target pscsi oom and the qla2xxx revert which
        can otherwise cause a use after free"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: target: pscsi: Clean up after failure in pscsi_map_sg()
        scsi: target: pscsi: Avoid OOM in pscsi_map_sg()
        scsi: mpt3sas: Fix error return code of mpt3sas_base_attach()
        scsi: qedi: Fix error return code of qedi_alloc_global_queues()
        scsi: Revert "qla2xxx: Make sure that aborted commands are freed"
        scsi: ibmvfc: Make ibmvfc_wait_for_ops() MQ aware
        scsi: ibmvfc: Fix potential race in ibmvfc_wait_for_ops()
  6. 27 Mar, 2021 11 commits