1. 15 Jan, 2020 9 commits
    • Anatol Pomazau's avatar
      dm mpath: Add timeout mechanism for queue_if_no_path · be240ff5
      Anatol Pomazau authored
      Add a configurable timeout mechanism to disable queue_if_no_path without
      assistance from userspace multipathd.  This reimplements multipathd's
      no_path_retry mechanism in kernel space.  This is motivated by the
      desire to prevent processes from hanging indefinitely waiting for IO
      in cases where multipathd might be unable to respond (after a failure
      or for whatever reason).
      
      Despite replicating userspace multipathd's policy configuration in
      kernel space, it is important to prevent IOs from hanging forever,
      waiting for userspace that may be incapable of behaving correctly.
      
      Use of the provided "queue_if_no_path_timeout_secs" dm-multipath
      module parameter is optional.  This timeout mechanism is disabled by
      default (by being set to 0).
      Signed-off-by: default avatarAnatol Pomazau <anatol@google.com>
      Co-developed-by: default avatarGabriel Krisman Bertazi <krisman@collabora.com>
      Signed-off-by: default avatarGabriel Krisman Bertazi <krisman@collabora.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      be240ff5
    • Mikulas Patocka's avatar
      dm thin: change data device's flush_bio to be member of struct pool · f06c03d1
      Mikulas Patocka authored
      With commit fe64369163c5 ("dm thin: don't allow changing data device
      during thin-pool load") it is now possible to re-parent the data
      device's flush_bio from the pool_c to pool structure.  Doing so offers
      improved lifetime guarantees for the flush_bio so that the call to
      dm_pool_register_pre_commit_callback can now be done safely from
      pool_ctr().
      
      Depends-on: fe64369163c5 ("dm thin: don't allow changing data device during thin-pool load")
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      f06c03d1
    • Mikulas Patocka's avatar
      dm thin: don't allow changing data device during thin-pool reload · 873937e7
      Mikulas Patocka authored
      The existing code allows changing the data device when the thin-pool
      target is reloaded.
      
      This capability is not required and only complicates device lifetime
      guarantees. This can cause crashes like the one reported here:
      	https://bugzilla.redhat.com/show_bug.cgi?id=1788596
      where the kernel tries to issue a flush bio located in a structure that
      was already freed.
      
      Take the first step to simplifying the thin-pool's data device lifetime
      by disallowing changing it. Like the thin-pool's metadata device, the
      data device is now set in pool_create() and it cannot be changed for a
      given thin-pool.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      873937e7
    • Mike Snitzer's avatar
      dm thin: fix use-after-free in metadata_pre_commit_callback · a4a8d286
      Mike Snitzer authored
      dm-thin uses struct pool to hold the state of the pool. There may be
      multiple pool_c's pointing to a given pool, each pool_c represents a
      loaded target. pool_c's may be created and destroyed arbitrarily and the
      pool contains a reference count of pool_c's pointing to it.
      
      Since commit 694cfe7f ("dm thin: Flush data device before
      committing metadata") a pointer to pool_c is passed to
      dm_pool_register_pre_commit_callback and this function stores it in
      pmd->pre_commit_context. If this pool_c is freed, but pool is not
      (because there is another pool_c referencing it), we end up in a
      situation where pmd->pre_commit_context structure points to freed
      pool_c. It causes a crash in metadata_pre_commit_callback.
      
      Fix this by moving the dm_pool_register_pre_commit_callback() from
      pool_ctr() to pool_preresume(). This way the in-core thin-pool metadata
      is only ever armed with callback data whose lifetime matches the
      active thin-pool target.
      
      In should be noted that this fix preserves the ability to load a
      thin-pool table that uses a different data block device (that contains
      the same data) -- though it is unclear if that capability is still
      useful and/or needed.
      
      Fixes: 694cfe7f ("dm thin: Flush data device before committing metadata")
      Cc: stable@vger.kernel.org
      Reported-by: default avatarZdenek Kabelac <zkabelac@redhat.com>
      Reported-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      a4a8d286
    • Mike Snitzer's avatar
      dm thin metadata: use pool locking at end of dm_pool_metadata_close · 44d8ebf4
      Mike Snitzer authored
      Ensure that the pool is locked during calls to __commit_transaction and
      __destroy_persistent_data_objects.  Just being consistent with locking,
      but reality is dm_pool_metadata_close is called once pool is being
      destroyed so access to pool shouldn't be contended.
      
      Also, use pmd_write_lock_in_core rather than __pmd_write_lock in
      dm_pool_commit_metadata and rename __pmd_write_lock to
      pmd_write_lock_in_core -- there was no need for the alias.
      
      In addition, verify that the pool is locked in __commit_transaction().
      
      Fixes: 873f258b ("dm thin metadata: do not write metadata if no changes occurred")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      44d8ebf4
    • Mikulas Patocka's avatar
      dm writecache: fix incorrect flush sequence when doing SSD mode commit · aa950920
      Mikulas Patocka authored
      When committing state, the function writecache_flush does the following:
      1. write metadata (writecache_commit_flushed)
      2. flush disk cache (writecache_commit_flushed)
      3. wait for data writes to complete (writecache_wait_for_ios)
      4. increase superblock seq_count
      5. write the superblock
      6. flush disk cache
      
      It may happen that at step 3, when we wait for some write to finish, the
      disk may report the write as finished, but the write only hit the disk
      cache and it is not yet stored in persistent storage. At step 5 we write
      the superblock - it may happen that the superblock is written before the
      write that we waited for in step 3. If the machine crashes, it may result
      in incorrect data being returned after reboot.
      
      In order to fix the bug, we must swap steps 2 and 3 in the above sequence,
      so that we first wait for writes to complete and then flush the disk
      cache.
      
      Fixes: 48debafe ("dm: add writecache target")
      Cc: stable@vger.kernel.org # 4.18+
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      aa950920
    • Milan Broz's avatar
      dm crypt: fix benbi IV constructor crash if used in authenticated mode · 4ea9471f
      Milan Broz authored
      If benbi IV is used in AEAD construction, for example:
        cryptsetup luksFormat <device> --cipher twofish-xts-benbi --key-size 512 --integrity=hmac-sha256
      the constructor uses wrong skcipher function and crashes:
      
       BUG: kernel NULL pointer dereference, address: 00000014
       ...
       EIP: crypt_iv_benbi_ctr+0x15/0x70 [dm_crypt]
       Call Trace:
        ? crypt_subkey_size+0x20/0x20 [dm_crypt]
        crypt_ctr+0x567/0xfc0 [dm_crypt]
        dm_table_add_target+0x15f/0x340 [dm_mod]
      
      Fix this by properly using crypt_aead_blocksize() in this case.
      
      Fixes: ef43aa38 ("dm crypt: add cryptographic data integrity protection (authenticated encryption)")
      Cc: stable@vger.kernel.org # v4.12+
      Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=941051Reported-by: default avatarJerad Simpson <jbsimpson@gmail.com>
      Signed-off-by: default avatarMilan Broz <gmazyland@gmail.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      4ea9471f
    • Milan Broz's avatar
      dm crypt: Implement Elephant diffuser for Bitlocker compatibility · bbb16584
      Milan Broz authored
      Add experimental support for BitLocker encryption with CBC mode and
      additional Elephant diffuser.
      
      The mode was used in older Windows systems and it is provided mainly
      for compatibility reasons. The userspace support to activate these
      devices is being added to cryptsetup utility.
      
      Read-write activation of such a device is very simple, for example:
        echo <password> | cryptsetup bitlkOpen bitlk_image.img test
      
      The Elephant diffuser uses two rotations in opposite direction for
      data (Diffuser A and B) and also XOR operation with Sector key over
      the sector data; Sector key is derived from additional key data. The
      original public documentation is available here:
        http://download.microsoft.com/download/0/2/3/0238acaf-d3bf-4a6d-b3d6-0a0be4bbb36e/bitlockercipher200608.pdf
      
      The dm-crypt implementation is embedded to "elephant" IV (similar to
      tcw IV construction).
      
      Because we cannot modify original bio data for write (before
      encryption), an additional internal flag to pre-process data is
      added.
      Signed-off-by: default avatarMilan Broz <gmazyland@gmail.com>
      Reviewed-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      bbb16584
    • Joe Thornber's avatar
      dm space map common: fix to ensure new block isn't already in use · 4feaef83
      Joe Thornber authored
      The space-maps track the reference counts for disk blocks allocated by
      both the thin-provisioning and cache targets.  There are variants for
      tracking metadata blocks and data blocks.
      
      Transactionality is implemented by never touching blocks from the
      previous transaction, so we can rollback in the event of a crash.
      
      When allocating a new block we need to ensure the block is free (has
      reference count of 0) in both the current and previous transaction.
      Prior to this fix we were doing this by searching for a free block in
      the previous transaction, and relying on a 'begin' counter to track
      where the last allocation in the current transaction was.  This
      'begin' field was not being updated in all code paths (eg, increment
      of a data block reference count due to breaking sharing of a neighbour
      block in the same btree leaf).
      
      This fix keeps the 'begin' field, but now it's just a hint to speed up
      the search.  Instead the current transaction is searched for a free
      block, and then the old transaction is double checked to ensure it's
      free.  Much simpler.
      
      This fixes reports of sm_disk_new_block()'s BUG_ON() triggering when
      DM thin-provisioning's snapshots are heavily used.
      Reported-by: default avatarEric Wheeler <dm-devel@lists.ewheeler.net>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      4feaef83
  2. 07 Jan, 2020 10 commits
  3. 05 Jan, 2020 7 commits
    • Linus Torvalds's avatar
      Linux 5.5-rc5 · c79f46a2
      Linus Torvalds authored
      c79f46a2
    • Linus Torvalds's avatar
      Merge tag 'riscv/for-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 768fc661
      Linus Torvalds authored
      Pull RISC-V fixes from Paul Walmsley:
       "Several fixes for RISC-V:
      
         - Fix function graph trace support
      
         - Prefix the CSR IRQ_* macro names with "RV_", to avoid collisions
           with macros elsewhere in the Linux kernel tree named "IRQ_TIMER"
      
         - Use __pa_symbol() when computing the physical address of a kernel
           symbol, rather than __pa()
      
         - Mark the RISC-V port as supporting GCOV
      
        One DT addition:
      
         - Describe the L2 cache controller in the FU540 DT file
      
        One documentation update:
      
         - Add patch acceptance guideline documentation"
      
      * tag 'riscv/for-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        Documentation: riscv: add patch acceptance guidelines
        riscv: prefix IRQ_ macro names with an RV_ namespace
        clocksource: riscv: add notrace to riscv_sched_clock
        riscv: ftrace: correct the condition logic in function graph tracer
        riscv: dts: Add DT support for SiFive L2 cache controller
        riscv: gcov: enable gcov for RISC-V
        riscv: mm: use __pa_symbol for kernel symbols
      768fc661
    • Paul Walmsley's avatar
      Documentation: riscv: add patch acceptance guidelines · 0e194d9d
      Paul Walmsley authored
      Formalize, in kernel documentation, the patch acceptance policy for
      arch/riscv.  In summary, it states that as maintainers, we plan to
      only accept patches for new modules or extensions that have been
      frozen or ratified by the RISC-V Foundation.
      
      We've been following these guidelines for the past few months.  In the
      meantime, we've received quite a bit of feedback that it would be
      helpful to have these guidelines formally documented.
      
      Based on a suggestion from Matthew Wilcox, we also add a link to this
      file to Documentation/process/index.rst, to make this document easier
      to find.  The format of this document has also been changed to align
      to the format outlined in the maintainer entry profiles, in accordance
      with comments from Jon Corbet and Dan Williams.
      Signed-off-by: default avatarPaul Walmsley <paul.walmsley@sifive.com>
      Reviewed-by: default avatarPalmer Dabbelt <palmerdabbelt@google.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Krste Asanovic <krste@berkeley.edu>
      Cc: Andrew Waterman <waterman@eecs.berkeley.edu>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      0e194d9d
    • Paul Walmsley's avatar
      riscv: prefix IRQ_ macro names with an RV_ namespace · 2f3035da
      Paul Walmsley authored
      "IRQ_TIMER", used in the arch/riscv CSR header file, is a sufficiently
      generic macro name that it's used by several source files across the
      Linux code base.  Some of these other files ultimately include the
      arch/riscv CSR include file, causing collisions.  Fix by prefixing the
      RISC-V csr.h IRQ_ macro names with an RV_ prefix.
      
      Fixes: a4c3733d ("riscv: abstract out CSR names for supervisor vs machine mode")
      Reported-by: default avatarOlof Johansson <olof@lixom.net>
      Acked-by: default avatarOlof Johansson <olof@lixom.net>
      Signed-off-by: default avatarPaul Walmsley <paul.walmsley@sifive.com>
      2f3035da
    • Zong Li's avatar
      clocksource: riscv: add notrace to riscv_sched_clock · 9d05c18e
      Zong Li authored
      When enabling ftrace graph tracer, it gets the tracing clock in
      ftrace_push_return_trace().  Eventually, it invokes riscv_sched_clock()
      to get the clock value.  If riscv_sched_clock() isn't marked with
      'notrace', it will call ftrace_push_return_trace() and cause infinite
      loop.
      
      The result of failure as follow:
      
      command: echo function_graph >current_tracer
      [   46.176787] Unable to handle kernel paging request at virtual address ffffffe04fb38c48
      [   46.177309] Oops [#1]
      [   46.177478] Modules linked in:
      [   46.177770] CPU: 0 PID: 256 Comm: $d Not tainted 5.5.0-rc1 #47
      [   46.177981] epc: ffffffe00035e59a ra : ffffffe00035e57e sp : ffffffe03a7569b0
      [   46.178216]  gp : ffffffe000d29b90 tp : ffffffe03a756180 t0 : ffffffe03a756968
      [   46.178430]  t1 : ffffffe00087f408 t2 : ffffffe03a7569a0 s0 : ffffffe03a7569f0
      [   46.178643]  s1 : ffffffe00087f408 a0 : 0000000ac054cda4 a1 : 000000000087f411
      [   46.178856]  a2 : 0000000ac054cda4 a3 : 0000000000373ca0 a4 : ffffffe04fb38c48
      [   46.179099]  a5 : 00000000153e22a8 a6 : 00000000005522ff a7 : 0000000000000005
      [   46.179338]  s2 : ffffffe03a756a90 s3 : ffffffe00032811c s4 : ffffffe03a756a58
      [   46.179570]  s5 : ffffffe000d29fe0 s6 : 0000000000000001 s7 : 0000000000000003
      [   46.179809]  s8 : 0000000000000003 s9 : 0000000000000002 s10: 0000000000000004
      [   46.180053]  s11: 0000000000000000 t3 : 0000003fc815749c t4 : 00000000000efc90
      [   46.180293]  t5 : ffffffe000d29658 t6 : 0000000000040000
      [   46.180482] status: 0000000000000100 badaddr: ffffffe04fb38c48 cause: 000000000000000f
      Signed-off-by: default avatarZong Li <zong.li@sifive.com>
      Reviewed-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      [paul.walmsley@sifive.com: cleaned up patch description]
      Fixes: 92e0d143 ("clocksource/drivers/riscv_timer: Provide the sched_clock")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaul Walmsley <paul.walmsley@sifive.com>
      9d05c18e
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 36487907
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "17 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        hexagon: define ioremap_uc
        ocfs2: fix the crash due to call ocfs2_get_dlm_debug once less
        ocfs2: call journal flush to mark journal as empty after journal recovery when mount
        mm/hugetlb: defer freeing of huge pages if in non-task context
        mm/gup: fix memory leak in __gup_benchmark_ioctl
        mm/oom: fix pgtables units mismatch in Killed process message
        fs/posix_acl.c: fix kernel-doc warnings
        hexagon: work around compiler crash
        hexagon: parenthesize registers in asm predicates
        fs/namespace.c: make to_mnt_ns() static
        fs/nsfs.c: include headers for missing declarations
        fs/direct-io.c: include fs/internal.h for missing prototype
        mm: move_pages: return valid node id in status if the page is already on the target node
        memcg: account security cred as well to kmemcg
        kcov: fix struct layout for kcov_remote_arg
        mm/zsmalloc.c: fix the migrated zspage statistics.
        mm/memory_hotplug: shrink zones when offlining memory
      36487907
    • Linus Torvalds's avatar
      Merge tag 'apparmor-pr-2020-01-04' of... · a125bcda
      Linus Torvalds authored
      Merge tag 'apparmor-pr-2020-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor
      
      Pull apparmor fixes from John Johansen:
      
       - performance regression: only get a label reference if the fast path
         check fails
      
       - fix aa_xattrs_match() may sleep while holding a RCU lock
      
       - fix bind mounts aborting with -ENOMEM
      
      * tag 'apparmor-pr-2020-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
        apparmor: fix aa_xattrs_match() may sleep while holding a RCU lock
        apparmor: only get a label reference if the fast path check fails
        apparmor: fix bind mounts aborting with -ENOMEM
      a125bcda
  4. 04 Jan, 2020 14 commits