1. 22 Jan, 2019 28 commits
  2. 16 Jan, 2019 12 commits
    • Linux 4.19.16 · 9c5931b6
      Greg Kroah-Hartman authored
    • Btrfs: use nofs context when initializing security xattrs to avoid deadlock · 7a1b9b76
      Filipe Manana authored
      commit 827aa18e upstream.
      
      When initializing the security xattrs, we are holding a transaction
      handle, so we need to use a GFP_NOFS context in order to avoid a deadlock
      with reclaim in case it's triggered.
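
For illustration only (not part of the patch): a minimal userspace model of a scoped "nofs" context, loosely imitating the save/restore shape of the kernel's memalloc_nofs_save() and memalloc_nofs_restore(). Every model_* name and the MODEL_NOFS flag are hypothetical; the sketch only shows that allocations made inside the scope must not recurse into filesystem reclaim, and that nested scopes restore correctly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task allocation flags. */
#define MODEL_NOFS 0x1u
static _Thread_local unsigned int task_flags;

/* Enter a no-filesystem-reclaim scope; returns the previous NOFS bit. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = task_flags & MODEL_NOFS;

	task_flags |= MODEL_NOFS;
	return old;
}

/* Leave the scope, restoring whatever state the caller had before. */
static void model_nofs_restore(unsigned int old)
{
	task_flags = (task_flags & ~MODEL_NOFS) | old;
}

/* An allocation may recurse into filesystem reclaim only outside a scope. */
static bool model_alloc_may_enter_fs(void)
{
	return !(task_flags & MODEL_NOFS);
}

/* Models work done while a transaction handle is held. */
static bool model_init_security_xattrs(void)
{
	unsigned int old = model_nofs_save();
	bool reclaim_allowed = model_alloc_may_enter_fs(); /* false in scope */

	model_nofs_restore(old);
	return reclaim_allowed;
}
```

In this model, model_init_security_xattrs() always reports that reclaim into the filesystem was disallowed while it ran, and the caller's own state is unchanged afterwards.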
      
      Fixes: 39a27ec1 ("btrfs: use GFP_KERNEL for xattr and acl allocations")
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: fix deadlock when enabling quotas due to concurrent snapshot creation · 79aa5c0d
      Filipe Manana authored
      commit 9a6f209e upstream.
      
      If the quota enable and snapshot creation ioctls are called concurrently
      we can get into a deadlock where the task enabling quotas will deadlock
      on the fs_info->qgroup_ioctl_lock mutex because it attempts to lock it
      twice, or the task creating a snapshot tries to commit the transaction
      while the task enabling quota waits for the former task to commit the
      transaction while holding the mutex. The following time diagrams show how
      both cases happen.
      
      First scenario:
      
                 CPU 0                                    CPU 1
      
       btrfs_ioctl()
        btrfs_ioctl_quota_ctl()
         btrfs_quota_enable()
          mutex_lock(fs_info->qgroup_ioctl_lock)
          btrfs_start_transaction()
      
                                                   btrfs_ioctl()
                                                    btrfs_ioctl_snap_create_v2
                                                     create_snapshot()
                                                      --> adds snapshot to the
                                                          list pending_snapshots
                                                          of the current
                                                          transaction
      
          btrfs_commit_transaction()
           create_pending_snapshots()
             create_pending_snapshot()
              qgroup_account_snapshot()
               btrfs_qgroup_inherit()
                 mutex_lock(fs_info->qgroup_ioctl_lock)
                  --> deadlock, mutex already locked
                      by this task at
                      btrfs_quota_enable()
      
      Second scenario:
      
                 CPU 0                                    CPU 1
      
       btrfs_ioctl()
        btrfs_ioctl_quota_ctl()
         btrfs_quota_enable()
          mutex_lock(fs_info->qgroup_ioctl_lock)
          btrfs_start_transaction()
      
                                                   btrfs_ioctl()
                                                    btrfs_ioctl_snap_create_v2
                                                     create_snapshot()
                                                      --> adds snapshot to the
                                                          list pending_snapshots
                                                          of the current
                                                          transaction
      
                                                      btrfs_commit_transaction()
                                                       --> waits for task at
                                                           CPU 0 to release
                                                           its transaction
                                                           handle
      
          btrfs_commit_transaction()
           --> sees another task started
               the transaction commit first
           --> releases its transaction
               handle
           --> waits for the transaction
               commit to be completed by
               the task at CPU 1
      
                                                       create_pending_snapshot()
                                                        qgroup_account_snapshot()
                                                         btrfs_qgroup_inherit()
                                                          mutex_lock(fs_info->qgroup_ioctl_lock)
                                                           --> deadlock, task at CPU 0
                                                               has the mutex locked but
                                                               it is waiting for us to
                                                               finish the transaction
                                                               commit
      
      So fix this by setting the quota enabled flag in fs_info after committing
      the transaction at btrfs_quota_enable(). This ends up serializing quota
      enable and snapshot creation as if the snapshot creation happened just
      before the quota enable request. The quota rescan task, scheduled after
      committing the transaction in btrfs_quota_enable(), will do the accounting.
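
For illustration only (not part of the patch): a single-threaded model of the first scenario. It assumes that the snapshot path skips qgroup work while the quota flag is unset, which is what the description above implies. All model_* names and struct fields are hypothetical, and a boolean stands in for the real mutex so that the self-deadlock can be recorded instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

struct model_fs {
	bool ioctl_lock_held;   /* stands in for fs_info->qgroup_ioctl_lock */
	bool quota_enabled;     /* stands in for the quota enabled flag */
	bool snapshot_pending;  /* a snapshot queued in the current transaction */
	bool self_deadlock;     /* set when the lock holder tries to relock */
};

/* Models the qgroup inherit step, which needs the ioctl lock. */
static void model_qgroup_inherit(struct model_fs *fs)
{
	if (!fs->quota_enabled)
		return;                   /* quotas not visible yet: nothing to do */
	if (fs->ioctl_lock_held) {
		fs->self_deadlock = true; /* a real mutex would block forever here */
		return;
	}
	/* ... inherit accounting would run here ... */
}

/* Models the commit, which processes pending snapshots. */
static void model_commit(struct model_fs *fs)
{
	if (fs->snapshot_pending) {
		model_qgroup_inherit(fs);
		fs->snapshot_pending = false;
	}
}

/* flag_after_commit selects the ordering described by the fix. */
static bool model_quota_enable(struct model_fs *fs, bool flag_after_commit)
{
	fs->ioctl_lock_held = true;
	if (!flag_after_commit)
		fs->quota_enabled = true;
	model_commit(fs);
	if (flag_after_commit)
		fs->quota_enabled = true;
	fs->ioctl_lock_held = false;
	return !fs->self_deadlock;
}
```

With flag_after_commit set to false the model records the relock attempt; with true, the pending snapshot is committed as if quotas were still disabled and the flag becomes visible only afterwards.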
      
      Fixes: 6426c7ad ("btrfs: qgroup: Fix qgroup accounting when creating snapshot")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: fix access to available allocation bits when starting balance · 829431a2
      Filipe Manana authored
      commit 5a8067c0 upstream.
      
      The available allocation bits members from struct btrfs_fs_info are
      protected by a sequence lock, and when starting balance we access them
      incorrectly in two different ways:
      
      1) In the read sequence lock loop at btrfs_balance() we use the values we
         read from fs_info->avail_*_alloc_bits and we can immediately do actions
         that have side effects and can not be undone (printing a message and
         jumping to a label). This is wrong because a retry might be needed, so
         our actions must not have side effects and must be repeatable as long
         as read_seqretry() returns a non-zero value. In other words, we were
         essentially ignoring the sequence lock;
      
      2) Right below the read sequence lock loop, we were reading the values
         from avail_metadata_alloc_bits and avail_data_alloc_bits without any
         protection from concurrent writers, that is, reading them outside of
         the read sequence lock critical section.
      
      So fix this by making sure we only read the available allocation bits
      while in a read sequence lock critical section and that what we do in the
      critical section is repeatable (has nothing that can not be undone) so
      that any eventual retry that is needed is handled properly.
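
For illustration only (not part of the patch): a minimal userspace sketch of the read pattern described above, using C11 atomics in place of the kernel seqlock API. All model_* names are hypothetical. Inside the retry loop the reader only copies values; any decision with side effects is taken after the loop, from the snapshot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_bits {
	atomic_uint seq;        /* even: stable, odd: write in progress */
	atomic_ulong data_bits; /* stands in for avail_data_alloc_bits */
	atomic_ulong meta_bits; /* stands in for avail_metadata_alloc_bits */
};

/* Wait until no writer is active and return the sequence value seen. */
static unsigned int model_read_begin(struct model_bits *b)
{
	unsigned int s;

	while ((s = atomic_load(&b->seq)) & 1u)
		;
	return s;
}

/* True means a writer interfered and the read must be repeated. */
static bool model_read_retry(struct model_bits *b, unsigned int start)
{
	return atomic_load(&b->seq) != start;
}

/* Single-writer update: the sequence is odd while values change. */
static void model_write(struct model_bits *b, unsigned long data,
			unsigned long meta)
{
	atomic_fetch_add(&b->seq, 1);
	atomic_store(&b->data_bits, data);
	atomic_store(&b->meta_bits, meta);
	atomic_fetch_add(&b->seq, 1);
}

/* Returns true when any available bit lies outside the allowed mask. */
static bool model_has_bits_outside(struct model_bits *b, unsigned long allowed)
{
	unsigned long data, meta;
	unsigned int seq;

	do {
		seq = model_read_begin(b);
		/* Only plain copies here: safe to repeat any number of times. */
		data = atomic_load(&b->data_bits);
		meta = atomic_load(&b->meta_bits);
	} while (model_read_retry(b, seq));

	/* Side effects (messages, jumps to labels) belong after the loop. */
	return ((data | meta) & ~allowed) != 0;
}
```

The caller would print its message or take its error path based on the return value, never from inside the loop.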
      
      Fixes: de98ced9 ("Btrfs: use seqlock to protect fs_info->avail_{data, metadata, system}_alloc_bits")
      Fixes: 14506127 ("btrfs: fix a bogus warning when converting only data or metadata")
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: compat: Don't pull syscall number from regs in arm_compat_syscall · 6c9a2046
      Will Deacon authored
      commit 53290432 upstream.
      
      The syscall number may have been changed by a tracer, so we should pass
      the actual number in from the caller instead of pulling it from the
      saved r7 value directly.
      
      Cc: <stable@vger.kernel.org>
      Cc: Pi-Hsun Shih <pihsun@chromium.org>
      Reviewed-by: Dave Martin <Dave.Martin@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
    • KVM: arm/arm64: Fix VMID alloc race by reverting to lock-less · 4f14f446
      Christoffer Dall authored
      commit fb544d1c upstream.
      
      We recently addressed a VMID generation race by introducing a read/write
      lock around accesses and updates to the vmid generation values.
      
      However, kvm_arch_vcpu_ioctl_run() also calls need_new_vmid_gen() but
      does so without taking the read lock.
      
      As far as I can tell, this can lead to the same kind of race:
      
        VM 0, VCPU 0                    VM 0, VCPU 1
        ------------                    ------------
        update_vttbr (vmid 254)
                                        update_vttbr (vmid 1) // roll over
                                        read_lock(kvm_vmid_lock);
                                        force_vm_exit()
        local_irq_disable
        need_new_vmid_gen == false // because vmid gen matches

        enter_guest (vmid 254)
                                        kvm_arch.vttbr = <PGD>:<VMID 1>
                                        read_unlock(kvm_vmid_lock);

                                        enter_guest (vmid 1)
      
      Which results in running two VCPUs in the same VM with different VMIDs
      and (even worse) other VCPUs from other VMs could now allocate clashing
      VMID 254 from the new generation as long as VCPU 0 is not exiting.
      
      Attempt to solve this by making sure vttbr is updated before another CPU
      can observe the updated VMID generation.
      
      Cc: stable@vger.kernel.org
      Fixes: f0cf47d9 ("KVM: arm/arm64: Close VMID generation race")
      Reviewed-by: Julien Thierry <julien.thierry@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sunrpc: use-after-free in svc_process_common() · 44e7bab3
      Vasily Averin authored
      commit d4b09acf upstream.
      
      If a node has NFSv4.1+ mounts inside several net namespaces,
      this can lead to a use-after-free in svc_process_common():
      
      svc_process_common()
              /* Setup reply header */
              rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr(rqstp); <<< HERE
      
      svc_process_common() can use an incorrect rqstp->rq_xprt, because
      its caller, bc_svc_process(), takes it from serv->sv_bc_xprt.
      The problem is that serv is a global structure, but sv_bc_xprt
      is assigned per net namespace.
      
      According to Trond, the whole "let's set up rqstp->rq_xprt
      for the back channel" is nothing but a giant hack in order
      to work around the fact that svc_process_common() uses it
      to find the xpt_ops, and perform a couple of (meaningless
      for the back channel) tests of xpt_flags.
      
      All we really need in svc_process_common() is to be able to run
      rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr()
      
      J. Bruce Fields points out that this xpo_prep_reply_hdr() call
      is an awfully roundabout way just to do "svc_putnl(resv, 0);"
      in the tcp case.

      This patch no longer initializes rqstp->rq_xprt in bc_svc_process();
      it now calls svc_process_common() with rqstp->rq_xprt = NULL.

      To adjust the reply header, svc_process_common() just checks
      rqstp->rq_prot and calls svc_tcp_prep_reply_hdr() in the tcp case.

      To handle the rqstp->rq_xprt = NULL case in functions called from
      svc_process_common(), the patch introduces a net namespace pointer,
      svc_rqst->rq_bc_net, and adjusts the SVC_NET() definition.
      Some other functions were also adapted to handle the described case properly.
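
For illustration only (not part of the patch): a userspace sketch of a namespace lookup that tolerates rq_xprt being NULL by falling back to rq_bc_net. The structs are reduced stand-ins that keep only the fields needed here, and model_svc_net() is a hypothetical helper, not the kernel's SVC_NET() macro.

```c
#include <assert.h>
#include <stddef.h>

struct net { int id; };                  /* placeholder for a net namespace */
struct svc_xprt { struct net *xpt_net; };
struct svc_rqst {
	struct svc_xprt *rq_xprt;  /* NULL for back channel requests */
	struct net *rq_bc_net;     /* namespace to use when rq_xprt is NULL */
};

/* Pick the transport's namespace if there is one, else the back channel's. */
static struct net *model_svc_net(const struct svc_rqst *rqstp)
{
	return rqstp->rq_xprt ? rqstp->rq_xprt->xpt_net : rqstp->rq_bc_net;
}
```

A fore channel request resolves through its transport, while a back channel request with rq_xprt = NULL resolves through rq_bc_net and never dereferences a stale transport pointer.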

      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Cc: stable@vger.kernel.org
      Fixes: 23c20ecd ("NFS: callback up - users counting cleanup")
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      v2: added lost extern svc_tcp_prep_reply_hdr()
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: page_mapped: don't assume compound page is huge or THP · 160f79c0
      Jan Stancek authored
      commit 8ab88c71 upstream.
      
      LTP proc01 testcase has been observed to rarely trigger crashes
      on arm64:
          page_mapped+0x78/0xb4
          stable_page_flags+0x27c/0x338
          kpageflags_read+0xfc/0x164
          proc_reg_read+0x7c/0xb8
          __vfs_read+0x58/0x178
          vfs_read+0x90/0x14c
          SyS_read+0x60/0xc0
      
      The issue is that page_mapped() assumes that if a compound page is not
      huge, then it must be THP.  But if this is a 'normal' compound page
      (COMPOUND_PAGE_DTOR), then the following loop can keep running (for
      HPAGE_PMD_NR iterations) until it tries to read from memory that isn't
      mapped and triggers a panic:
      
              for (i = 0; i < hpage_nr_pages(page); i++) {
                      if (atomic_read(&page[i]._mapcount) >= 0)
                              return true;
              }
      
      I could replicate this on x86 (v4.20-rc4-98-g60b54823) only
      with a custom kernel module [1] which:
       - allocates compound page (PAGEC) of order 1
       - allocates 2 normal pages (COPY), which are initialized to 0xff (to
         satisfy _mapcount >= 0)
       - 2 PAGEC page structs are copied to address of first COPY page
       - second page of COPY is marked as not present
       - call to page_mapped(COPY) now triggers fault on access to 2nd COPY
         page at offset 0x30 (_mapcount)
      
      [1] https://github.com/jstancek/reproducers/blob/master/kernel/page_mapped_crash/repro.c
      
      Fix the loop to iterate for "1 << compound_order" pages.
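
For illustration only (not part of the patch): a userspace sketch of a loop bounded by "1 << order" rather than by a fixed THP-sized count. struct model_page and model_page_mapped() are hypothetical; the visited counter exists only so the bound can be checked.

```c
#include <assert.h>
#include <stdbool.h>

struct model_page { int mapcount; };  /* -1 means "not mapped" */

/*
 * Scan exactly the pages that belong to a compound page of the given
 * order, and report how many entries were looked at.
 */
static bool model_page_mapped(const struct model_page *head,
			      unsigned int order, unsigned int *visited)
{
	unsigned int nr = 1u << order;

	*visited = 0;
	for (unsigned int i = 0; i < nr; i++) {
		(*visited)++;
		if (head[i].mapcount >= 0)
			return true;
	}
	return false;
}
```

For an order-1 compound page the loop touches at most two entries, so it cannot run past the end of the page structs the way a fixed HPAGE_PMD_NR bound can.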
      
      Kirill said "IIRC, sound subsystem can produce custom mapped compound
      pages".
      
      Link: http://lkml.kernel.org/r/c440d69879e34209feba21e12d236d06bc0a25db.1543577156.git.jstancek@redhat.com
      Fixes: e1534ae9 ("mm: differentiate page_mapped() from page_mapcount() for compound pages")
      Signed-off-by: Jan Stancek <jstancek@redhat.com>
      Debugged-by: Laszlo Ersek <lersek@redhat.com>
      Suggested-by: "Kirill A. Shutemov" <kirill@shutemov.name>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix special inode number checks in __ext4_iget() · 5dc41af3
      Theodore Ts'o authored
      commit 191ce178 upstream.
      
      The check for special (reserved) inode numbers in __ext4_iget()
      was broken by commit 8a363970 ("ext4: avoid declaring fs
      inconsistent due to invalid file handles").  This was caused by a
      botched reversal of the sense of the flag now known as
      EXT4_IGET_SPECIAL (when it was previously named EXT4_IGET_NORMAL).
      Fix the logic appropriately.
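
For illustration only (not part of the patch): a userspace sketch of the intended sense of the flag, where reserved inode numbers are rejected unless the caller explicitly asks for a special inode. The MODEL_* constants and model_ino_rejected() are hypothetical; the root inode number 2 and a first non-reserved inode of 11 are assumed values for this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_IGET_SPECIAL 0x1u  /* caller explicitly wants a reserved inode */
#define MODEL_ROOT_INO     2u
#define MODEL_FIRST_INO    11u   /* assumed first non-reserved inode */

/* Returns true when the lookup should be refused. */
static bool model_ino_rejected(uint32_t ino, unsigned int flags,
			       uint32_t inodes_count)
{
	bool reserved = ino < MODEL_FIRST_INO && ino != MODEL_ROOT_INO;

	/* Reserved numbers are only valid for callers that ask for them. */
	if (!(flags & MODEL_IGET_SPECIAL) && reserved)
		return true;
	/* Out-of-range numbers are never valid, whatever the flags say. */
	return ino < MODEL_ROOT_INO || ino > inodes_count;
}
```

With the sense inverted, a normal lookup of inode 5 would be accepted and a special lookup refused, which is the kind of mistake the message describes.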
      
      Fixes: 8a363970 ("ext4: avoid declaring fs inconsistent...")
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: track writeback errors using the generic tracking infrastructure · bb80ad0d
      Theodore Ts'o authored
      commit 95cb6713 upstream.
      
      We are already using mapping_set_error() in fs/ext4/page_io.c, so all we
      need to do is to use file_check_and_advance_wb_err() when handling
      fsync() requests in ext4_sync_file().

      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: use ext4_write_inode() when fsyncing w/o a journal · da38a1b4
      Theodore Ts'o authored
      commit ad211f3e upstream.
      
      In no-journal mode, we previously used __generic_file_fsync().
      This triggers a lockdep warning, and in addition, it's not safe to
      depend on the inode writeback mechanism in the case of ext4.  We can
      solve both problems by calling ext4_write_inode() directly.

      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: avoid kernel warning when writing the superblock to a dead device · 01db6e5c
      Theodore Ts'o authored
      commit e8680786 upstream.
      
      The xfstests generic/475 test switches the underlying device with
      dm-error while running a stress test.  This results in a large number
      of file system errors, and since we can't lock the buffer head when
      marking the superblock dirty in the ext4_grp_locked_error() case, it's
      possible for the superblock to be !buffer_uptodate() without
      buffer_write_io_error() being true.
      
      We need to set buffer_uptodate() before we call mark_buffer_dirty() or
      this will trigger a WARN_ON.  It's safe to do this since the
      superblock must have been properly read into memory or the mount would
      not have succeeded.  So if buffer_uptodate() is not set, we can
      safely assume that this happened due to a failed attempt to write the
      superblock.

      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>