    Merge tag 'zstd-for-linus-v5.16' of git://github.com/terrelln/linux · c8c10954
    Linus Torvalds authored
    Pull zstd update from Nick Terrell:
     "Update to zstd-1.4.10.
    
      Add myself as the maintainer of zstd and update the zstd version in
      the kernel, which is now 4 years out of date, to a much more recent
      zstd release. This includes bug fixes, much more extensive fuzzing,
      and performance improvements. It also generates the kernel zstd
      automatically from upstream zstd, so it is easier to keep the zstd
      version up to date, and we don't fall so far out of date again.
    
      This includes 5 commits that update the zstd library version:
    
       - Adds a new kernel-style wrapper around zstd.
    
         This wrapper API is functionally equivalent to the subset of the
         current zstd API that is currently used. The wrapper API changes to
         be kernel style so that the symbols don't collide with zstd's
         symbols. The update to zstd-1.4.10 maintains the same API and
         preserves the semantics, so that none of the callers need to be
         updated. All callers are updated in the commit, because there are
         zero functional changes.
    
       - Adds an indirection for `lib/decompress_unzstd.c` so it doesn't
         depend on the layout of `lib/zstd/` to include every source file.
         This allows the next patch to be automatically generated.
    
       - Imports the zstd-1.4.10 source code. This commit is automatically
         generated from upstream zstd (https://github.com/facebook/zstd).
    
       - Adds me (terrelln@fb.com) as the maintainer of `lib/zstd`.
    
       - Fixes a newly added build warning for clang.
    
      The discussion around this patchset has been pretty long, so I've
      included a FAQ-style summary of the history of the patchset, and why
      we are taking this approach.
    
      Why do we need to update?
      -------------------------
    
      The zstd version in the kernel is based off of zstd-1.3.1, which
      was released August 20, 2017. Since then zstd has seen many bug fixes
      and performance improvements. And, importantly, upstream zstd is
      continuously fuzzed by OSS-Fuzz, and bug fixes aren't backported to
      older versions. So the only way to sanely get these fixes is to keep
      up to date with upstream zstd.
    
      There are no known security issues that affect the kernel, but we need
      to be able to update in case there are. And while there are no known
      security issues, there are relevant bug fixes. For example the problem
      with large kernel decompression has been fixed upstream for over 2
      years [1].
    
      Additionally the performance improvements for kernel use cases are
      significant. Measured for x86_64 on my Intel i9-9900k @ 3.6 GHz:
    
       - BtrFS zstd compression at levels 1 and 3 is 5% faster
    
       - BtrFS zstd decompression+read is 15% faster
    
       - SquashFS zstd decompression+read is 15% faster
    
       - F2FS zstd compression+write at level 3 is 8% faster
    
       - F2FS zstd decompression+read is 20% faster
    
       - ZRAM decompression+read is 30% faster
    
       - Kernel zstd decompression is 35% faster
    
       - Initramfs zstd decompression+build is 5% faster
    
      On top of this, there are significant performance improvements coming
      down the line in the next zstd release, and the new automated update
      patch generation will allow us to pull them easily.
    
      How is the update patch generated?
      ----------------------------------
    
      The first two patches are preparation for updating the zstd version.
      Then the 3rd patch in the series imports upstream zstd into the
      kernel. This patch is automatically generated from upstream. A script
      makes the necessary changes and imports it into the kernel. The
      changes are:
    
       - Replace all libc dependencies with kernel replacements and rewrite
         includes.
    
       - Remove unnecessary portability macros like: #if defined(_MSC_VER).
    
       - Use the kernel xxhash instead of bundling it.
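
      As an illustration only (this is not the upstream script), the three
      changes above can be sketched in Python. The names `kernelize`,
      `rewrite_include`, `strip_macro_blocks`, the contents of `INCLUDE_MAP`,
      and the sample input are all assumptions made for this example. The
      macro removal only handles simple blocks without #else or #elif.

```python
import re

# Hypothetical mapping from bundled/libc headers to kernel headers.
# The real script's mapping is not shown in this message; this is an assumption.
INCLUDE_MAP = {
    "<string.h>": "<linux/string.h>",
    "<stddef.h>": "<linux/types.h>",
    '"xxhash.h"': "<linux/xxhash.h>",  # use the kernel xxhash, not a bundled copy
}

INCLUDE_RE = re.compile(r'^(\s*#\s*include\s+)(<[^>]+>|"[^"]+")\s*$')


def rewrite_include(line):
    """Rewrite one #include line if its header appears in INCLUDE_MAP."""
    m = INCLUDE_RE.match(line)
    if m and m.group(2) in INCLUDE_MAP:
        return m.group(1) + INCLUDE_MAP[m.group(2)]
    return line


def strip_macro_blocks(lines, macro):
    """Drop `#if defined(macro)` ... `#endif` blocks (no #else/#elif support).

    Nested conditionals inside a removed block are counted so that the
    matching #endif closes the block.
    """
    start = re.compile(
        r'^\s*#\s*if\s+defined\s*\(\s*' + re.escape(macro) + r'\s*\)\s*$'
    )
    out = []
    depth = 0  # greater than 0 while inside a block being removed
    for line in lines:
        stripped = line.strip()
        if depth == 0:
            if start.match(line):
                depth = 1
                continue
            out.append(line)
        elif re.match(r'#\s*if', stripped):      # #if, #ifdef, #ifndef
            depth += 1
        elif re.match(r'#\s*endif', stripped):
            depth -= 1
    return out


def kernelize(source):
    """Apply the macro removal and include rewriting to one source file."""
    lines = strip_macro_blocks(source.splitlines(), "_MSC_VER")
    return "\n".join(rewrite_include(line) for line in lines)


SAMPLE = """\
#include <string.h>
#include "xxhash.h"
#if defined(_MSC_VER)
#  pragma warning(disable : 4127)
#endif
size_t f(void);
"""

if __name__ == "__main__":
    print(kernelize(SAMPLE))
```

      Running the sketch on the sample prints the two includes rewritten to
      their kernel equivalents, with the _MSC_VER block removed.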
    
      This automation gets tested every commit by upstream's continuous
      integration. When we cut a new zstd release, we will submit a patch to
      the kernel to update the zstd version in the kernel.
    
      The automated process makes it easy to keep the kernel version of zstd
      up to date. The current zstd in the kernel shares the guts of the
      code, but has a lot of API and minor changes to work in the kernel.
      This is because at the time upstream zstd was not ready to be used in
      the kernel environment as-is. But, since then upstream zstd has
      evolved to support being used in the kernel as-is.
    
      Why are we updating in one big patch?
      -------------------------------------
    
      The 3rd patch in the series is very large. This is because it is
      restructuring the code, so it both deletes the existing zstd, and
      re-adds the new structure. Future updates will be directly
      proportional to the changes in upstream zstd since the last import.
      They will admittedly be large, as zstd is an actively developed
      project, and has hundreds of commits between every release. However,
      there is no other great alternative.
    
      One option ruled out is to replay every upstream zstd commit. This is
      not feasible for several reasons:
    
       - There are over 3500 upstream commits since the zstd version in the
         kernel.
    
       - The automation to automatically generate the kernel update was only
         added recently, so older commits cannot easily be imported.
    
       - Not every upstream zstd commit builds.
    
       - Only zstd releases are "supported", and individual commits may have
         bugs that were fixed before a release.
    
      Another option to reduce the patch size would be to first reorganize
      to the new file structure, and then apply the patch. However, the
      current kernel zstd is formatted with clang-format to be more
      "kernel-like". But, the new method imports zstd as-is, without
      additional formatting, to allow for closer correlation with upstream,
      and easier debugging. So the patch wouldn't be any smaller.
    
      It also doesn't make sense to import upstream zstd commit by commit
      going forward. Upstream zstd doesn't support production use cases
      running off the development branch. We have a lot of post-commit
      fuzzing that catches many bugs, so individual commits may be buggy,
      but fixed before a release. So going forward, I intend to import every
      (important) zstd release into the Kernel.
    
      So, while it isn't ideal, updating in one big patch is the only path
      I see forward.
    
      Who is responsible for this code?
      ---------------------------------
    
      I am. This patchset adds me as the maintainer for zstd. Previously,
      there was no tree for zstd patches. Because of that, there were
      several patches that either got ignored, or took a long time to merge,
      since it wasn't clear which tree should pick them up. I'm officially
      stepping up as maintainer, and setting up my tree as the path through
      which zstd patches get merged. I'll make sure that patches to the
      kernel zstd get ported upstream, so they aren't erased when the next
      version update happens.
    
      How is this code tested?
      ------------------------
    
      I tested every caller of zstd on x86_64 (BtrFS, ZRAM, SquashFS, F2FS,
      Kernel, InitRAMFS). I also tested Kernel & InitRAMFS on i386 and
      aarch64. I checked both performance and correctness.
    
      Also, thanks to many people in the community who have tested these
      patches locally.
    
      Lastly, this code will bake in linux-next before being merged into
      v5.16.
    
      Why update to zstd-1.4.10 when zstd-1.5.0 has been released?
      ------------------------------------------------------------
    
      This patchset has been outstanding since 2020, and zstd-1.4.10 was the
      latest release when it was created. Since the update patch is
      automatically generated from upstream, I could generate it from
      zstd-1.5.0.
    
      However, there were some large stack usage regressions in zstd-1.5.0,
      which are only fixed in the latest development branch. And the latest
      development branch contains some new code that needs to bake in the
      fuzzer before I would feel comfortable releasing to the kernel.
    
      Once this patchset has been merged, and we've released zstd-1.5.1, we
      can update the kernel to zstd-1.5.1, and exercise the update process.
    
      You may notice that zstd-1.4.10 doesn't exist upstream. This release
      is an artificial release based off of zstd-1.4.9, with some fixes for
      the kernel backported from the development branch. I will tag the
      zstd-1.4.10 release after this patchset is merged, so the Linux Kernel
      is running a known version of zstd that can be debugged upstream.
    
      Why was a wrapper API added?
      ----------------------------
    
      The first versions of this patchset migrated the kernel to the
      upstream zstd API. It first added a shim API that supported the new
      upstream API with the old code, then updated callers to use the new
      shim API, then transitioned to the new code and deleted the shim API.
      However, Christoph Hellwig suggested that we transition to a kernel
      style API, and hide zstd's upstream API behind that. This is because
      zstd's upstream API supports many other use cases, and does not
      follow the kernel style guide, while the kernel API is focused on the
      kernel's use cases, and follows the kernel style guide.
    
      Where is the previous discussion?
      ---------------------------------
    
      Links for the discussions of the previous versions of the patch set
      are below. The largest changes in the design of the patchset are driven by
      the discussions in v11, v5, and v1. Sorry for the mix of links, I
      couldn't find most of the threads on lkml.org"
    
    Link: https://lkml.org/lkml/2020/9/29/27 [1]
    Link: https://www.spinics.net/lists/linux-crypto/msg58189.html [v12]
    Link: https://lore.kernel.org/linux-btrfs/20210430013157.747152-1-nickrterrell@gmail.com/ [v11]
    Link: https://lore.kernel.org/lkml/20210426234621.870684-2-nickrterrell@gmail.com/ [v10]
    Link: https://lore.kernel.org/linux-btrfs/20210330225112.496213-1-nickrterrell@gmail.com/ [v9]
    Link: https://lore.kernel.org/linux-f2fs-devel/20210326191859.1542272-1-nickrterrell@gmail.com/ [v8]
    Link: https://lkml.org/lkml/2020/12/3/1195 [v7]
    Link: https://lkml.org/lkml/2020/12/2/1245 [v6]
    Link: https://lore.kernel.org/linux-btrfs/20200916034307.2092020-1-nickrterrell@gmail.com/ [v5]
    Link: https://www.spinics.net/lists/linux-btrfs/msg105783.html [v4]
    Link: https://lkml.org/lkml/2020/9/23/1074 [v3]
    Link: https://www.spinics.net/lists/linux-btrfs/msg105505.html [v2]
    Link: https://lore.kernel.org/linux-btrfs/20200916034307.2092020-1-nickrterrell@gmail.com/ [v1]
    Signed-off-by: Nick Terrell <terrelln@fb.com>
    Tested-by: Paul Jones <paul@pauljones.id.au>
    Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
    Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v13.0.0 on x86-64
    Tested-by: Jean-Denis Girard <jd.girard@sysnux.pf>
    
    * tag 'zstd-for-linus-v5.16' of git://github.com/terrelln/linux:
      lib: zstd: Add cast to silence clang's -Wbitwise-instead-of-logical
      MAINTAINERS: Add maintainer entry for zstd
      lib: zstd: Upgrade to latest upstream zstd version 1.4.10
      lib: zstd: Add decompress_sources.h for decompress_unzstd
      lib: zstd: Add kernel-specific API