1. 30 May, 2018 40 commits
    • Alexander Shishkin's avatar
      intel_th: Use correct method of finding hub · 80fceaf3
      Alexander Shishkin authored
      [ Upstream commit 9ad57708 ]
      
      Since commit 8edc514b ("intel_th: Make SOURCE devices children of the
      root device") the hub is not the parent of SOURCE devices any more, so the
      new helper function should be used for that instead of always using the
      parent. The intel_th_set_output() path, however, still uses the old
      logic, leading to the hub driver structure being aliased with something
      else, like struct pci_driver or struct acpi_driver, and an incorrect call
      to an address inferred from that, potentially resulting in a crash.
      
      Fixes: 8edc514b ("intel_th: Make SOURCE devices children of the root device")
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      80fceaf3
    • Sebastian Andrzej Siewior's avatar
      iommu/amd: Take into account that alloc_dev_data() may return NULL · 1366b31d
      Sebastian Andrzej Siewior authored
      [ Upstream commit 39ffe395 ]
      
      find_dev_data() does not check whether the return value alloc_dev_data()
      is NULL. This was okay once because the pointer was returned once as-is.
      Since commit df3f7a6e ("iommu/amd: Use is_attach_deferred
      call-back") the pointer may be used within find_dev_data() so a NULL
      check is required.
      
      Cc: Baoquan He <bhe@redhat.com>
      Fixes: df3f7a6e ("iommu/amd: Use is_attach_deferred call-back")
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1366b31d
    • Anilkumar Kolli's avatar
      ath10k: advertize beacon_int_min_gcd · 6bc2bf60
      Anilkumar Kolli authored
      [ Upstream commit 8ebee73b ]
      
      This patch fixes regression caused by 0c317a02
      ("cfg80211: support virtual interfaces with different beacon intervals"),
      with this change cfg80211 expects the driver to advertize
      'beacon_int_min_gcd' to support different beacon intervals in multivap
      scenario. This support is added for, QCA988X/QCA99X0/QCA9984/QCA4019.
      
      Verifed AP + mesh bring up on QCA9984 with beacon interval 100msec and
      1000msec respectively.
      Frimware: firmware-5.bin_10.4-3.5.3-00053
      
      Fixes: 0c317a02 ("cfg80211: support virtual interfaces with different beacon intervals")
      Signed-off-by: default avatarAnilkumar Kolli <akolli@codeaurora.org>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6bc2bf60
    • Harry Morris's avatar
      ieee802154: ca8210: fix uninitialised data read · 9c222c49
      Harry Morris authored
      [ Upstream commit 86674a97 ]
      
      In ca8210_test_int_user_write() a user can request the transfer of a
      frame with a length field (command.length) that is longer than the
      actual buffer provided (len). In this scenario the driver will copy
      the buffer contents into the uninitialised command[] buffer, then
      transfer <data.length> bytes over the SPI even though only <len> bytes
      had been populated, potentially leaking sensitive kernel memory.
      
      Also the first 6 bytes of the command buffer must be initialised in case
      a malformed, short packet is written and the uninitialised bytes are
      read in ca8210_test_check_upstream.
      Reported-by: default avatarDomen Puncer Kugler <domen.puncer@samsung.com>
      Signed-off-by: default avatarHarry Morris <h.morris@cascoda.com>
      Tested-by: default avatarHarry Morris <h.morris@cascoda.com>
      Signed-off-by: default avatarStefan Schmidt <stefan@osg.samsung.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9c222c49
    • Michael Ellerman's avatar
      powerpc/mpic: Check if cpu_possible() in mpic_physmask() · c3a2a878
      Michael Ellerman authored
      [ Upstream commit 0834d627 ]
      
      In mpic_physmask() we loop over all CPUs up to 32, then get the hard
      SMP processor id of that CPU.
      
      Currently that's possibly walking off the end of the paca array, but
      in a future patch we will change the paca array to be an array of
      pointers, and in that case we will get a NULL for missing CPUs and
      oops. eg:
      
        Unable to handle kernel paging request for data at address 0x88888888888888b8
        Faulting instruction address: 0xc00000000004e380
        Oops: Kernel access of bad area, sig: 11 [#1]
        ...
        NIP .mpic_set_affinity+0x60/0x1a0
        LR  .irq_do_set_affinity+0x48/0x100
      
      Fix it by checking the CPU is possible, this also fixes the code if
      there are gaps in the CPU numbering which probably never happens on
      mpic systems but who knows.
      Debugged-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c3a2a878
    • Lenny Szubowicz's avatar
      ACPI: acpi_pad: Fix memory leak in power saving threads · fc2de796
      Lenny Szubowicz authored
      [ Upstream commit 8b29d29a ]
      
      Fix once per second (round_robin_time) memory leak of about 1 KB in
      each acpi_pad kernel idling thread that is activated.
      
      Found by testing with kmemleak.
      Signed-off-by: default avatarLenny Szubowicz <lszubowi@redhat.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc2de796
    • Aaro Koskinen's avatar
      drivers: macintosh: rack-meter: really fix bogus memsets · d023498f
      Aaro Koskinen authored
      [ Upstream commit e283655b ]
      
      We should zero an array using sizeof instead of number of elements.
      
      Fixes the following compiler (GCC 7.3.0) warnings:
      
      drivers/macintosh/rack-meter.c: In function 'rackmeter_do_pause':
      drivers/macintosh/rack-meter.c:157:2: warning: 'memset' used with length equal to number of elements without multiplication by element size [-Wmemset-elt-size]
      drivers/macintosh/rack-meter.c:158:2: warning: 'memset' used with length equal to number of elements without multiplication by element size [-Wmemset-elt-size]
      
      Fixes: 4f7bef7a ("drivers: macintosh: rack-meter: fix bogus memsets")
      Reported-by: default avatarStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: default avatarAaro Koskinen <aaro.koskinen@iki.fi>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d023498f
    • Dan Carpenter's avatar
      xen/acpi: off by one in read_acpi_id() · 8effa218
      Dan Carpenter authored
      [ Upstream commit c37a3c94 ]
      
      If acpi_id is == nr_acpi_bits, then we access one element beyond the end
      of the acpi_psd[] array or we set one bit beyond the end of the bit map
      when we do __set_bit(acpi_id, acpi_id_present);
      
      Fixes: 59a56802 ("xen/acpi-processor: C and P-state driver that uploads said data to hypervisor.")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: default avatarJoao Martins <joao.m.martins@oracle.com>
      Reviewed-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8effa218
    • David Howells's avatar
      rxrpc: Don't treat call aborts as conn aborts · 637b9b18
      David Howells authored
      [ Upstream commit 57b0c9d4 ]
      
      If a call-level abort is received for the previous call to complete on a
      connection channel, then that abort is queued for the connection processor
      to handle.  Unfortunately, the connection processor then assumes without
      checking that the abort is connection-level (ie. callNumber is 0) and
      distributes it over all active calls on that connection, thereby
      incorrectly aborting them.
      
      Fix this by discarding aborts aimed at a completed call.
      
      Further, discard all packets aimed at a call that's complete if there's
      currently an active call on a channel, since the DATA packets associated
      with the new call automatically terminate the old call.
      
      Fixes: 18bfeba5 ("rxrpc: Perform terminal call ACK/ABORT retransmission from conn processor")
      Reported-by: default avatarMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      637b9b18
    • David Howells's avatar
      rxrpc: Fix Tx ring annotation after initial Tx failure · 4a9fabcd
      David Howells authored
      [ Upstream commit 03877bf6 ]
      
      rxrpc calls have a ring of packets that are awaiting ACK or retransmission
      and a parallel ring of annotations that tracks the state of those packets.
      If the initial transmission of a packet on the underlying UDP socket fails
      then the packet annotation is marked for resend - but the setting of this
      mark accidentally erases the last-packet mark also stored in the same
      annotation slot.  If this happens, a call won't switch out of the Tx phase
      when all the packets have been transmitted.
      
      Fix this by retaining the last-packet mark and only altering the packet
      state.
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4a9fabcd
    • Qu Wenruo's avatar
      btrfs: qgroup: Fix root item corruption when multiple same source snapshots... · 204bfcda
      Qu Wenruo authored
      btrfs: qgroup: Fix root item corruption when multiple same source snapshots are created with quota enabled
      
      [ Upstream commit 4d31778a ]
      
      When multiple pending snapshots referring to the same source subvolume
      are executed, enabled quota will cause root item corruption, where root
      items are using old bytenr (no backref in extent tree).
      
      This can be triggered by fstests btrfs/152.
      
      The cause is when source subvolume is still dirty, extra commit
      (simplied transaction commit) of qgroup_account_snapshot() can skip
      dirty roots not recorded in current transaction, making root item of
      source subvolume not updated.
      
      Fix it by forcing recording source subvolume in current transaction
      before qgroup sub-transaction commit.
      Reported-by: default avatarJustin Maggard <jmaggard@netgear.com>
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      204bfcda
    • Jeff Mahoney's avatar
      btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers · de00d572
      Jeff Mahoney authored
      [ Upstream commit 8a5a916d ]
      
      While running btrfs/011, I hit the following lockdep splat.
      
      This is the important bit:
         pcpu_alloc+0x1ac/0x5e0
         __percpu_counter_init+0x4e/0xb0
         btrfs_init_fs_root+0x99/0x1c0 [btrfs]
         btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
         resolve_indirect_refs+0x130/0x830 [btrfs]
         find_parent_nodes+0x69e/0xff0 [btrfs]
         btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
         btrfs_find_all_roots+0x50/0x70 [btrfs]
         btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
         btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
      
      The percpu_counter_init call in btrfs_alloc_subvolume_writers
      uses GFP_KERNEL, which we can't do during transaction commit.
      
      This switches it to GFP_NOFS.
      
      ========================================================
      WARNING: possible irq lock inversion dependency detected
      4.12.14-kvmsmall #8 Tainted: G        W
      --------------------------------------------------------
      kswapd0/50 just changed the state of lock:
       (&delayed_node->mutex){+.+.-.}, at: [<ffffffffc06994fa>] __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
      but this lock took another, RECLAIM_FS-unsafe lock in the past:
       (pcpu_alloc_mutex){+.+.+.}
      
      and interrupts could create inverse lock ordering between them.
      
      other info that might help us debug this:
      Chain exists of:
        &delayed_node->mutex --> &found->groups_sem --> pcpu_alloc_mutex
      
       Possible interrupt unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(pcpu_alloc_mutex);
                                     local_irq_disable();
                                     lock(&delayed_node->mutex);
                                     lock(&found->groups_sem);
        <Interrupt>
          lock(&delayed_node->mutex);
      
       *** DEADLOCK ***
      
      2 locks held by kswapd0/50:
       #0:  (shrinker_rwsem){++++..}, at: [<ffffffff811dc11f>] shrink_slab+0x7f/0x5b0
       #1:  (&type->s_umount_key#30){+++++.}, at: [<ffffffff8126dec6>] trylock_super+0x16/0x50
      
      the shortest dependencies between 2nd lock and 1st lock:
         -> (pcpu_alloc_mutex){+.+.+.} ops: 4904 {
            HARDIRQ-ON-W at:
                                __mutex_lock+0x4e/0x8c0
                                pcpu_alloc+0x1ac/0x5e0
                                alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                                __do_tune_cpucache+0x2c/0x220
                                do_tune_cpucache+0x26/0xc0
                                enable_cpucache+0x6d/0xf0
                                kmem_cache_init_late+0x42/0x75
                                start_kernel+0x343/0x4cb
                                x86_64_start_kernel+0x127/0x134
                                secondary_startup_64+0xa5/0xb0
            SOFTIRQ-ON-W at:
                                __mutex_lock+0x4e/0x8c0
                                pcpu_alloc+0x1ac/0x5e0
                                alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                                __do_tune_cpucache+0x2c/0x220
                                do_tune_cpucache+0x26/0xc0
                                enable_cpucache+0x6d/0xf0
                                kmem_cache_init_late+0x42/0x75
                                start_kernel+0x343/0x4cb
                                x86_64_start_kernel+0x127/0x134
                                secondary_startup_64+0xa5/0xb0
            RECLAIM_FS-ON-W at:
                                   __kmalloc+0x47/0x310
                                   pcpu_extend_area_map+0x2b/0xc0
                                   pcpu_alloc+0x3ec/0x5e0
                                   alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                                   __do_tune_cpucache+0x2c/0x220
                                   do_tune_cpucache+0x26/0xc0
                                   enable_cpucache+0x6d/0xf0
                                   __kmem_cache_create+0x1bf/0x390
                                   create_cache+0xba/0x1b0
                                   kmem_cache_create+0x1f8/0x2b0
                                   ksm_init+0x6f/0x19d
                                   do_one_initcall+0x50/0x1b0
                                   kernel_init_freeable+0x201/0x289
                                   kernel_init+0xa/0x100
                                   ret_from_fork+0x3a/0x50
            INITIAL USE at:
                               __mutex_lock+0x4e/0x8c0
                               pcpu_alloc+0x1ac/0x5e0
                               alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                               setup_cpu_cache+0x2f/0x1f0
                               __kmem_cache_create+0x1bf/0x390
                               create_boot_cache+0x8b/0xb1
                               kmem_cache_init+0xa1/0x19e
                               start_kernel+0x270/0x4cb
                               x86_64_start_kernel+0x127/0x134
                               secondary_startup_64+0xa5/0xb0
          }
          ... key      at: [<ffffffff821d8e70>] pcpu_alloc_mutex+0x70/0xa0
          ... acquired at:
         pcpu_alloc+0x1ac/0x5e0
         __percpu_counter_init+0x4e/0xb0
         btrfs_init_fs_root+0x99/0x1c0 [btrfs]
         btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
         resolve_indirect_refs+0x130/0x830 [btrfs]
         find_parent_nodes+0x69e/0xff0 [btrfs]
         btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
         btrfs_find_all_roots+0x50/0x70 [btrfs]
         btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
         btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
         transaction_kthread+0x176/0x1b0 [btrfs]
         kthread+0x102/0x140
         ret_from_fork+0x3a/0x50
      
        -> (&fs_info->commit_root_sem){++++..} ops: 1566382 {
           HARDIRQ-ON-W at:
                              down_write+0x3e/0xa0
                              cache_block_group+0x287/0x420 [btrfs]
                              find_free_extent+0x106c/0x12d0 [btrfs]
                              btrfs_reserve_extent+0xd8/0x170 [btrfs]
                              cow_file_range.isra.66+0x133/0x470 [btrfs]
                              run_delalloc_range+0x121/0x410 [btrfs]
                              writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
                              __extent_writepage+0x19a/0x360 [btrfs]
                              extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
                              extent_writepages+0x4d/0x60 [btrfs]
                              do_writepages+0x1a/0x70
                              __filemap_fdatawrite_range+0xa7/0xe0
                              btrfs_rename+0x5ee/0xdb0 [btrfs]
                              vfs_rename+0x52a/0x7e0
                              SyS_rename+0x351/0x3b0
                              do_syscall_64+0x79/0x1e0
                              entry_SYSCALL_64_after_hwframe+0x42/0xb7
           HARDIRQ-ON-R at:
                              down_read+0x35/0x90
                              caching_thread+0x57/0x560 [btrfs]
                              normal_work_helper+0x1c0/0x5e0 [btrfs]
                              process_one_work+0x1e0/0x5c0
                              worker_thread+0x44/0x390
                              kthread+0x102/0x140
                              ret_from_fork+0x3a/0x50
           SOFTIRQ-ON-W at:
                              down_write+0x3e/0xa0
                              cache_block_group+0x287/0x420 [btrfs]
                              find_free_extent+0x106c/0x12d0 [btrfs]
                              btrfs_reserve_extent+0xd8/0x170 [btrfs]
                              cow_file_range.isra.66+0x133/0x470 [btrfs]
                              run_delalloc_range+0x121/0x410 [btrfs]
                              writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
                              __extent_writepage+0x19a/0x360 [btrfs]
                              extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
                              extent_writepages+0x4d/0x60 [btrfs]
                              do_writepages+0x1a/0x70
                              __filemap_fdatawrite_range+0xa7/0xe0
                              btrfs_rename+0x5ee/0xdb0 [btrfs]
                              vfs_rename+0x52a/0x7e0
                              SyS_rename+0x351/0x3b0
                              do_syscall_64+0x79/0x1e0
                              entry_SYSCALL_64_after_hwframe+0x42/0xb7
           SOFTIRQ-ON-R at:
                              down_read+0x35/0x90
                              caching_thread+0x57/0x560 [btrfs]
                              normal_work_helper+0x1c0/0x5e0 [btrfs]
                              process_one_work+0x1e0/0x5c0
                              worker_thread+0x44/0x390
                              kthread+0x102/0x140
                              ret_from_fork+0x3a/0x50
           INITIAL USE at:
                             down_write+0x3e/0xa0
                             cache_block_group+0x287/0x420 [btrfs]
                             find_free_extent+0x106c/0x12d0 [btrfs]
                             btrfs_reserve_extent+0xd8/0x170 [btrfs]
                             cow_file_range.isra.66+0x133/0x470 [btrfs]
                             run_delalloc_range+0x121/0x410 [btrfs]
                             writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
                             __extent_writepage+0x19a/0x360 [btrfs]
                             extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
                             extent_writepages+0x4d/0x60 [btrfs]
                             do_writepages+0x1a/0x70
                             __filemap_fdatawrite_range+0xa7/0xe0
                             btrfs_rename+0x5ee/0xdb0 [btrfs]
                             vfs_rename+0x52a/0x7e0
                             SyS_rename+0x351/0x3b0
                             do_syscall_64+0x79/0x1e0
                             entry_SYSCALL_64_after_hwframe+0x42/0xb7
         }
         ... key      at: [<ffffffffc0729578>] __key.61970+0x0/0xfffffffffff9aa88 [btrfs]
         ... acquired at:
         cache_block_group+0x287/0x420 [btrfs]
         find_free_extent+0x106c/0x12d0 [btrfs]
         btrfs_reserve_extent+0xd8/0x170 [btrfs]
         btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
         btrfs_create_tree+0xbb/0x2a0 [btrfs]
         btrfs_create_uuid_tree+0x37/0x140 [btrfs]
         open_ctree+0x23c0/0x2660 [btrfs]
         btrfs_mount+0xd36/0xf90 [btrfs]
         mount_fs+0x3a/0x160
         vfs_kern_mount+0x66/0x150
         btrfs_mount+0x18c/0xf90 [btrfs]
         mount_fs+0x3a/0x160
         vfs_kern_mount+0x66/0x150
         do_mount+0x1c1/0xcc0
         SyS_mount+0x7e/0xd0
         do_syscall_64+0x79/0x1e0
         entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
       -> (&found->groups_sem){++++..} ops: 2134587 {
          HARDIRQ-ON-W at:
                            down_write+0x3e/0xa0
                            __link_block_group+0x34/0x130 [btrfs]
                            btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
                            open_ctree+0x2054/0x2660 [btrfs]
                            btrfs_mount+0xd36/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            btrfs_mount+0x18c/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            do_mount+0x1c1/0xcc0
                            SyS_mount+0x7e/0xd0
                            do_syscall_64+0x79/0x1e0
                            entry_SYSCALL_64_after_hwframe+0x42/0xb7
          HARDIRQ-ON-R at:
                            down_read+0x35/0x90
                            btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
                            open_ctree+0x207b/0x2660 [btrfs]
                            btrfs_mount+0xd36/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            btrfs_mount+0x18c/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            do_mount+0x1c1/0xcc0
                            SyS_mount+0x7e/0xd0
                            do_syscall_64+0x79/0x1e0
                            entry_SYSCALL_64_after_hwframe+0x42/0xb7
          SOFTIRQ-ON-W at:
                            down_write+0x3e/0xa0
                            __link_block_group+0x34/0x130 [btrfs]
                            btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
                            open_ctree+0x2054/0x2660 [btrfs]
                            btrfs_mount+0xd36/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            btrfs_mount+0x18c/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            do_mount+0x1c1/0xcc0
                            SyS_mount+0x7e/0xd0
                            do_syscall_64+0x79/0x1e0
                            entry_SYSCALL_64_after_hwframe+0x42/0xb7
          SOFTIRQ-ON-R at:
                            down_read+0x35/0x90
                            btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
                            open_ctree+0x207b/0x2660 [btrfs]
                            btrfs_mount+0xd36/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            btrfs_mount+0x18c/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            do_mount+0x1c1/0xcc0
                            SyS_mount+0x7e/0xd0
                            do_syscall_64+0x79/0x1e0
                            entry_SYSCALL_64_after_hwframe+0x42/0xb7
          INITIAL USE at:
                           down_write+0x3e/0xa0
                           __link_block_group+0x34/0x130 [btrfs]
                           btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
                           open_ctree+0x2054/0x2660 [btrfs]
                           btrfs_mount+0xd36/0xf90 [btrfs]
                           mount_fs+0x3a/0x160
                           vfs_kern_mount+0x66/0x150
                           btrfs_mount+0x18c/0xf90 [btrfs]
                           mount_fs+0x3a/0x160
                           vfs_kern_mount+0x66/0x150
                           do_mount+0x1c1/0xcc0
                           SyS_mount+0x7e/0xd0
                           do_syscall_64+0x79/0x1e0
                           entry_SYSCALL_64_after_hwframe+0x42/0xb7
        }
        ... key      at: [<ffffffffc0729488>] __key.59101+0x0/0xfffffffffff9ab78 [btrfs]
        ... acquired at:
         find_free_extent+0xcb4/0x12d0 [btrfs]
         btrfs_reserve_extent+0xd8/0x170 [btrfs]
         btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
         __btrfs_cow_block+0x110/0x5b0 [btrfs]
         btrfs_cow_block+0xd7/0x290 [btrfs]
         btrfs_search_slot+0x1f6/0x960 [btrfs]
         btrfs_lookup_inode+0x2a/0x90 [btrfs]
         __btrfs_update_delayed_inode+0x65/0x210 [btrfs]
         btrfs_commit_inode_delayed_inode+0x121/0x130 [btrfs]
         btrfs_evict_inode+0x3fe/0x6a0 [btrfs]
         evict+0xc4/0x190
         __dentry_kill+0xbf/0x170
         dput+0x2ae/0x2f0
         SyS_rename+0x2a6/0x3b0
         do_syscall_64+0x79/0x1e0
         entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      -> (&delayed_node->mutex){+.+.-.} ops: 5580204 {
         HARDIRQ-ON-W at:
                          __mutex_lock+0x4e/0x8c0
                          btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
                          btrfs_update_inode+0x83/0x110 [btrfs]
                          btrfs_dirty_inode+0x62/0xe0 [btrfs]
                          touch_atime+0x8c/0xb0
                          do_generic_file_read+0x818/0xb10
                          __vfs_read+0xdc/0x150
                          vfs_read+0x8a/0x130
                          SyS_read+0x45/0xa0
                          do_syscall_64+0x79/0x1e0
                          entry_SYSCALL_64_after_hwframe+0x42/0xb7
         SOFTIRQ-ON-W at:
                          __mutex_lock+0x4e/0x8c0
                          btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
                          btrfs_update_inode+0x83/0x110 [btrfs]
                          btrfs_dirty_inode+0x62/0xe0 [btrfs]
                          touch_atime+0x8c/0xb0
                          do_generic_file_read+0x818/0xb10
                          __vfs_read+0xdc/0x150
                          vfs_read+0x8a/0x130
                          SyS_read+0x45/0xa0
                          do_syscall_64+0x79/0x1e0
                          entry_SYSCALL_64_after_hwframe+0x42/0xb7
         IN-RECLAIM_FS-W at:
                             __mutex_lock+0x4e/0x8c0
                             __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
                             btrfs_evict_inode+0x22c/0x6a0 [btrfs]
                             evict+0xc4/0x190
                             dispose_list+0x35/0x50
                             prune_icache_sb+0x42/0x50
                             super_cache_scan+0x139/0x190
                             shrink_slab+0x262/0x5b0
                             shrink_node+0x2eb/0x2f0
                             kswapd+0x2eb/0x890
                             kthread+0x102/0x140
                             ret_from_fork+0x3a/0x50
         INITIAL USE at:
                         __mutex_lock+0x4e/0x8c0
                         btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
                         btrfs_update_inode+0x83/0x110 [btrfs]
                         btrfs_dirty_inode+0x62/0xe0 [btrfs]
                         touch_atime+0x8c/0xb0
                         do_generic_file_read+0x818/0xb10
                         __vfs_read+0xdc/0x150
                         vfs_read+0x8a/0x130
                         SyS_read+0x45/0xa0
                         do_syscall_64+0x79/0x1e0
                         entry_SYSCALL_64_after_hwframe+0x42/0xb7
       }
       ... key      at: [<ffffffffc072d488>] __key.56935+0x0/0xfffffffffff96b78 [btrfs]
       ... acquired at:
         __lock_acquire+0x264/0x11c0
         lock_acquire+0xbd/0x1e0
         __mutex_lock+0x4e/0x8c0
         __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
         btrfs_evict_inode+0x22c/0x6a0 [btrfs]
         evict+0xc4/0x190
         dispose_list+0x35/0x50
         prune_icache_sb+0x42/0x50
         super_cache_scan+0x139/0x190
         shrink_slab+0x262/0x5b0
         shrink_node+0x2eb/0x2f0
         kswapd+0x2eb/0x890
         kthread+0x102/0x140
         ret_from_fork+0x3a/0x50
      
      stack backtrace:
      CPU: 1 PID: 50 Comm: kswapd0 Tainted: G        W        4.12.14-kvmsmall #8 SLE15 (unreleased)
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
      Call Trace:
       dump_stack+0x78/0xb7
       print_irq_inversion_bug.part.38+0x19f/0x1aa
       check_usage_forwards+0x102/0x120
       ? ret_from_fork+0x3a/0x50
       ? check_usage_backwards+0x110/0x110
       mark_lock+0x16c/0x270
       __lock_acquire+0x264/0x11c0
       ? pagevec_lookup_entries+0x1a/0x30
       ? truncate_inode_pages_range+0x2b3/0x7f0
       lock_acquire+0xbd/0x1e0
       ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
       __mutex_lock+0x4e/0x8c0
       ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
       ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
       ? btrfs_evict_inode+0x1f6/0x6a0 [btrfs]
       __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
       btrfs_evict_inode+0x22c/0x6a0 [btrfs]
       evict+0xc4/0x190
       dispose_list+0x35/0x50
       prune_icache_sb+0x42/0x50
       super_cache_scan+0x139/0x190
       shrink_slab+0x262/0x5b0
       shrink_node+0x2eb/0x2f0
       kswapd+0x2eb/0x890
       kthread+0x102/0x140
       ? mem_cgroup_shrink_node+0x2c0/0x2c0
       ? kthread_create_on_node+0x40/0x40
       ret_from_fork+0x3a/0x50
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      Reviewed-by: default avatarLiu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      de00d572
    • Filipe Manana's avatar
      Btrfs: fix copy_items() return value when logging an inode · 92efba91
      Filipe Manana authored
      [ Upstream commit 8434ec46 ]
      
      When logging an inode, at tree-log.c:copy_items(), if we call
      btrfs_next_leaf() at the loop which checks for the need to log holes, we
      need to make sure copy_items() returns the value 1 to its caller and
      not 0 (on success). This is because the path the caller passed was
      released and is now different from what is was before, and the caller
      expects a return value of 0 to mean both success and that the path
      has not changed, while a return value of 1 means both success and
      signals the caller that it can not reuse the path, it has to perform
      another tree search.
      
      Even though this is a case that should not be triggered on normal
      circumstances or very rare at least, its consequences can be very
      unpredictable (especially when replaying a log tree).
      
      Fixes: 16e7549f ("Btrfs: incompatible format change to remove hole extents")
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      92efba91
    • Qu Wenruo's avatar
      btrfs: tests/qgroup: Fix wrong tree backref level · d7255626
      Qu Wenruo authored
      [ Upstream commit 3c0efdf0 ]
      
      The extent tree of the test fs is like the following:
      
       BTRFS info (device (null)): leaf 16327509003777336587 total ptrs 1 free space 3919
        item 0 key (4096 168 4096) itemoff 3944 itemsize 51
                extent refs 1 gen 1 flags 2
                tree block key (68719476736 0 0) level 1
                                                 ^^^^^^^
                ref#0: tree block backref root 5
      
      And it's using an empty tree for fs tree, so there is no way that its
      level can be 1.
      
      For REAL (created by mkfs) fs tree backref with no skinny metadata, the
      result should look like:
      
       item 3 key (30408704 EXTENT_ITEM 4096) itemoff 3845 itemsize 51
               refs 1 gen 4 flags TREE_BLOCK
               tree block key (256 INODE_ITEM 0) level 0
                                                 ^^^^^^^
               tree block backref root 5
      
      Fix the level to 0, so it won't break later tree level checker.
      
      Fixes: faa2dbf0 ("Btrfs: add sanity tests for new qgroup accounting code")
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d7255626
    • Nicholas Piggin's avatar
      powerpc/64s: sreset panic if there is no debugger or crash dump handlers · 27a913cc
      Nicholas Piggin authored
      [ Upstream commit d40b6768 ]
      
      system_reset_exception does most of its own crash handling now,
      invoking the debugger or crash dumps if they are registered. If not,
      then it goes through to die() to print stack traces, and then is
      supposed to panic (according to comments).
      
      However after die() prints oopses, it does its own handling which
      doesn't allow system_reset_exception to panic (e.g., it may just
      kill the current process). This patch causes sreset exceptions to
      return from die after it prints messages but before acting.
      
      This also stops die from invoking the debugger on 0x100 crashes.
      system_reset_exception similarly calls the debugger. It had been
      thought this was harmless (because if the debugger was disabled,
      neither call would fire, and if it was enabled the first call
      would return). However in some cases like xmon 'X' command, the
      debugger returns 0, which currently causes it to be entered
      again (first in system_reset_exception, then in die), which is
      confusing.
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      27a913cc
    • Florian Fainelli's avatar
      net: bgmac: Correctly annotate register space · 305f25c1
      Florian Fainelli authored
      [ Upstream commit 16a1c064 ]
      
      All the members: base, idm_base and nicpm_base should be annotated with
      __iomem since they are pointers to register space. This fixes a bunch of
      sparse reported warnings.
      
      Fixes: f6a95a24 ("net: ethernet: bgmac: Add platform device support")
      Fixes: dd5c5d03 ("net: ethernet: bgmac: add NS2 support")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      305f25c1
    • Florian Fainelli's avatar
      net: bgmac: Fix endian access in bgmac_dma_tx_ring_free() · 435290f7
      Florian Fainelli authored
      [ Upstream commit 60d6e6f0 ]
      
      bgmac_dma_tx_ring_free() assigns the ctl1 word which is a litle endian
      32-bit word without using proper accessors, fix this, and because a
      length cannot be negative, use unsigned int while at it.
      
      Fixes: 9cde9450 ("bgmac: implement scatter/gather support")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      435290f7
    • David S. Miller's avatar
      sparc64: Make atomic_xchg() an inline function rather than a macro. · 4a6cd791
      David S. Miller authored
      [ Upstream commit d13864b6 ]
      
      This avoids a lot of -Wunused warnings such as:
      
      ====================
      kernel/debug/debug_core.c: In function ‘kgdb_cpu_enter’:
      ./arch/sparc/include/asm/cmpxchg_64.h:55:22: warning: value computed is not used [-Wunused-value]
       #define xchg(ptr,x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
      
      ./arch/sparc/include/asm/atomic_64.h:86:30: note: in expansion of macro ‘xchg’
       #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
                                    ^~~~
      kernel/debug/debug_core.c:508:4: note: in expansion of macro ‘atomic_xchg’
          atomic_xchg(&kgdb_active, cpu);
          ^~~~~~~~~~~
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4a6cd791
    • David Howells's avatar
      fscache: Fix hanging wait on page discarded by writeback · 22f1bde5
      David Howells authored
      [ Upstream commit 2c984257 ]
      
      If the fscache asynchronous write operation elects to discard a page that's
      pending storage to the cache because the page would be over the store limit
      then it needs to wake the page as someone may be waiting on completion of
      the write.
      
      The problem is that the store limit may be updated by a different
      asynchronous operation - and so may miss the write - and that the store
      limit may not even get updated until later by the netfs.
      
      Fix the kernel hang by making fscache_write_op() mark as written any pages
      that are over the limit.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      22f1bde5
    • Alexander Graf's avatar
      lan78xx: Connect phy early · 6d03ff16
      Alexander Graf authored
      [ Upstream commit 92571a1a ]
      
      When using wicked with a lan78xx device attached to the system, we
      end up with ethtool commands issued on the device before an ifup
      got issued. That lead to the following crash:
      
          Unable to handle kernel NULL pointer dereference at virtual address 0000039c
          pgd = ffff800035b30000
          [0000039c] *pgd=0000000000000000
          Internal error: Oops: 96000004 [#1] SMP
          Modules linked in: [...]
          Supported: Yes
          CPU: 3 PID: 638 Comm: wickedd Tainted: G            E      4.12.14-0-default #1
          Hardware name: raspberrypi rpi/rpi, BIOS 2018.03-rc2 02/21/2018
          task: ffff800035e74180 task.stack: ffff800036718000
          PC is at phy_ethtool_ksettings_get+0x20/0x98
          LR is at lan78xx_get_link_ksettings+0x44/0x60 [lan78xx]
          pc : [<ffff0000086f7f30>] lr : [<ffff000000dcca84>] pstate: 20000005
          sp : ffff80003671bb20
          x29: ffff80003671bb20 x28: ffff800035e74180
          x27: ffff000008912000 x26: 000000000000001d
          x25: 0000000000000124 x24: ffff000008f74d00
          x23: 0000004000114809 x22: 0000000000000000
          x21: ffff80003671bbd0 x20: 0000000000000000
          x19: ffff80003671bbd0 x18: 000000000000040d
          x17: 0000000000000001 x16: 0000000000000000
          x15: 0000000000000000 x14: ffffffffffffffff
          x13: 0000000000000000 x12: 0000000000000020
          x11: 0101010101010101 x10: fefefefefefefeff
          x9 : 7f7f7f7f7f7f7f7f x8 : fefefeff31677364
          x7 : 0000000080808080 x6 : ffff80003671bc9c
          x5 : ffff80003671b9f8 x4 : ffff80002c296190
          x3 : 0000000000000000 x2 : 0000000000000000
          x1 : ffff80003671bbd0 x0 : ffff80003671bc00
          Process wickedd (pid: 638, stack limit = 0xffff800036718000)
          Call trace:
          Exception stack(0xffff80003671b9e0 to 0xffff80003671bb20)
          b9e0: ffff80003671bc00 ffff80003671bbd0 0000000000000000 0000000000000000
          ba00: ffff80002c296190 ffff80003671b9f8 ffff80003671bc9c 0000000080808080
          ba20: fefefeff31677364 7f7f7f7f7f7f7f7f fefefefefefefeff 0101010101010101
          ba40: 0000000000000020 0000000000000000 ffffffffffffffff 0000000000000000
          ba60: 0000000000000000 0000000000000001 000000000000040d ffff80003671bbd0
          ba80: 0000000000000000 ffff80003671bbd0 0000000000000000 0000004000114809
          baa0: ffff000008f74d00 0000000000000124 000000000000001d ffff000008912000
          bac0: ffff800035e74180 ffff80003671bb20 ffff000000dcca84 ffff80003671bb20
          bae0: ffff0000086f7f30 0000000020000005 ffff80002c296000 ffff800035223900
          bb00: 0000ffffffffffff 0000000000000000 ffff80003671bb20 ffff0000086f7f30
          [<ffff0000086f7f30>] phy_ethtool_ksettings_get+0x20/0x98
          [<ffff000000dcca84>] lan78xx_get_link_ksettings+0x44/0x60 [lan78xx]
          [<ffff0000087cbc40>] ethtool_get_settings+0x68/0x210
          [<ffff0000087cc0d4>] dev_ethtool+0x214/0x2180
          [<ffff0000087e5008>] dev_ioctl+0x400/0x630
          [<ffff00000879dd00>] sock_do_ioctl+0x70/0x88
          [<ffff00000879f5f8>] sock_ioctl+0x208/0x368
          [<ffff0000082cde10>] do_vfs_ioctl+0xb0/0x848
          [<ffff0000082ce634>] SyS_ioctl+0x8c/0xa8
          Exception stack(0xffff80003671bec0 to 0xffff80003671c000)
          bec0: 0000000000000009 0000000000008946 0000fffff4e841d0 0000aa0032687465
          bee0: 0000aaaafa2319d4 0000fffff4e841d4 0000000032687465 0000000032687465
          bf00: 000000000000001d 7f7fff7f7f7f7f7f 72606b622e71ff4c 7f7f7f7f7f7f7f7f
          bf20: 0101010101010101 0000000000000020 ffffffffffffffff 0000ffff7f510c68
          bf40: 0000ffff7f6a9d18 0000ffff7f44ce30 000000000000040d 0000ffff7f6f98f0
          bf60: 0000fffff4e842c0 0000000000000001 0000aaaafa2c2e00 0000ffff7f6ab000
          bf80: 0000fffff4e842c0 0000ffff7f62a000 0000aaaafa2b9f20 0000aaaafa2c2e00
          bfa0: 0000fffff4e84818 0000fffff4e841a0 0000ffff7f5ad0cc 0000fffff4e841a0
          bfc0: 0000ffff7f44ce3c 0000000080000000 0000000000000009 000000000000001d
          bfe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      
      The culprit is quite simple: The driver tries to access the phy left and right,
      but only actually has a working reference to it when the device is up.
      
      The fix thus is quite simple too: Get a reference to the phy on probe already
      and keep it even when the device is going down.
      
      With this patch applied, I can successfully run wicked on my system and bring
      the interface up and down as many times as I want, without getting NULL pointer
      dereferences in between.
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6d03ff16
    • Sean Christopherson's avatar
      KVM: VMX: raise internal error for exception during invalid protected mode state · 80b8f3da
      Sean Christopherson authored
      [ Upstream commit add5ff7a ]
      
      Exit to userspace with KVM_INTERNAL_ERROR_EMULATION if we encounter
      an exception in Protected Mode while emulating guest due to invalid
      guest state.  Unlike Big RM, KVM doesn't support emulating exceptions
      in PM, i.e. PM exceptions are always injected via the VMCS.  Because
      we will never do VMRESUME due to emulation_required, the exception is
      never realized and we'll keep emulating the faulting instruction over
      and over until we receive a signal.
      
      Exit to userspace iff there is a pending exception, i.e. don't exit
      simply on a requested event. The purpose of this check and exit is to
      aid in debugging a guest that is in all likelihood already doomed.
      Invalid guest state in PM is extremely limited in normal operation,
      e.g. it generally only occurs for a few instructions early in BIOS,
      and any exception at this time is all but guaranteed to be fatal.
      Non-vectored interrupts, e.g. INIT, SIPI and SMI, can be cleanly
      handled/emulated, while checking for vectored interrupts, e.g. INTR
      and NMI, without hitting false positives would add a fair amount of
      complexity for almost no benefit (getting hit by lightning seems
      more likely than encountering this specific scenario).
      
      Add a WARN_ON_ONCE to vmx_queue_exception() if we try to inject an
      exception via the VMCS and emulation_required is true.
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      80b8f3da
    • Sai Praneeth's avatar
      x86/mm: Fix bogus warning during EFI bootup, use boot_cpu_has() instead of... · fd97bbca
      Sai Praneeth authored
      x86/mm: Fix bogus warning during EFI bootup, use boot_cpu_has() instead of this_cpu_has() in build_cr3_noflush()
      
      [ Upstream commit 162ee5a8 ]
      
      Linus reported the following boot warning:
      
        WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/tlbflush.h:134 load_new_mm_cr3+0x114/0x170
        [...]
        Call Trace:
        switch_mm_irqs_off+0x267/0x590
        switch_mm+0xe/0x20
        efi_switch_mm+0x3e/0x50
        efi_enter_virtual_mode+0x43f/0x4da
        start_kernel+0x3bf/0x458
        secondary_startup_64+0xa5/0xb0
      
      ... after merging:
      
        03781e40: x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3
      
      When the platform supports PCID and if CONFIG_DEBUG_VM=y is enabled,
      build_cr3_noflush() (called via switch_mm()) does a sanity check to see
      if X86_FEATURE_PCID is set.
      
      Presently, build_cr3_noflush() uses "this_cpu_has(X86_FEATURE_PCID)" to
      perform the check but this_cpu_has() works only after SMP is initialized
      (i.e. per cpu cpu_info's should be populated) and this happens to be very
      late in the boot process (during rest_init()).
      
      As efi_runtime_services() are called during (early) kernel boot time
      and run time, modify build_cr3_noflush() to use boot_cpu_has() all the
      time. As suggested by Dave Hansen, this should be OK because all CPU's have
      same capabilities on x86.
      
      With this change the warning is fixed.
      
      ( Dave also suggested that we put a warning in this_cpu_has() if it's used
        early in the boot process. This is still work in progress as it affects
        MCE. )
      Reported-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Lee Chun-Yi <jlee@suse.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Ricardo Neri <ricardo.neri@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1522870459-7432-1-git-send-email-sai.praneeth.prakhya@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fd97bbca
    • Davidlohr Bueso's avatar
      sched/rt: Fix rq->clock_update_flags < RQCF_ACT_SKIP warning · 3aeaeecd
      Davidlohr Bueso authored
      [ Upstream commit d29a2064 ]
      
      While running rt-tests' pi_stress program I got the following splat:
      
        rq->clock_update_flags < RQCF_ACT_SKIP
        WARNING: CPU: 27 PID: 0 at kernel/sched/sched.h:960 assert_clock_updated.isra.38.part.39+0x13/0x20
      
        [...]
      
        <IRQ>
        enqueue_top_rt_rq+0xf4/0x150
        ? cpufreq_dbs_governor_start+0x170/0x170
        sched_rt_rq_enqueue+0x65/0x80
        sched_rt_period_timer+0x156/0x360
        ? sched_rt_rq_enqueue+0x80/0x80
        __hrtimer_run_queues+0xfa/0x260
        hrtimer_interrupt+0xcb/0x220
        smp_apic_timer_interrupt+0x62/0x120
        apic_timer_interrupt+0xf/0x20
        </IRQ>
      
        [...]
      
        do_idle+0x183/0x1e0
        cpu_startup_entry+0x5f/0x70
        start_secondary+0x192/0x1d0
        secondary_startup_64+0xa5/0xb0
      
      We can get rid of it be the "traditional" means of adding an
      update_rq_clock() call after acquiring the rq->lock in
      do_sched_rt_period_timer().
      
      The case for the RT task throttling (which this workload also hits)
      can be ignored in that the skip_update call is actually bogus and
      quite the contrary (the request bits are removed/reverted).
      
      By setting RQCF_UPDATED we really don't care if the skip is happening
      or not and will therefore make the assert_clock_updated() check happy.
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Reviewed-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Cc: linux-kernel@vger.kernel.org
      Cc: rostedt@goodmis.org
      Link: http://lkml.kernel.org/r/20180402164954.16255-1-dave@stgolabs.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3aeaeecd
    • Nicholas Piggin's avatar
      powerpc/64s/idle: Fix restore of AMOR on POWER9 after deep sleep · be6a5ad5
      Nicholas Piggin authored
      [ Upstream commit c1b25a17 ]
      
      POWER8 restores AMOR when waking from deep sleep, but POWER9 does not,
      because it does not go through the subcore restore.
      
      Have POWER9 restore it in core restore.
      
      Fixes: ee97b6b9 ("powerpc/mm/radix: Setup AMOR in HV mode to allow key 0")
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      be6a5ad5
    • Jun Piao's avatar
      ocfs2/dlm: don't handle migrate lockres if already in shutdown · 839c27f7
      Jun Piao authored
      [ Upstream commit bb34f24c ]
      
      We should not handle migrate lockres if we are already in
      'DLM_CTXT_IN_SHUTDOWN', as that will cause lockres remains after leaving
      dlm domain.  At last other nodes will get stuck into infinite loop when
      requsting lock from us.
      
      The problem is caused by concurrency umount between nodes.  Before
      receiveing N1's DLM_BEGIN_EXIT_DOMAIN_MSG, N2 has picked up N1 as the
      migrate target.  So N2 will continue sending lockres to N1 even though
      N1 has left domain.
      
              N1                             N2 (owner)
                                             touch file
      
          access the file,
          and get pr lock
      
                                             begin leave domain and
                                             pick up N1 as new owner
      
          begin leave domain and
          migrate all lockres done
      
                                             begin migrate lockres to N1
      
          end leave domain, but
          the lockres left
          unexpectedly, because
          migrate task has passed
      
      [piaojun@huawei.com: v3]
        Link: http://lkml.kernel.org/r/5A9CBD19.5020107@huawei.com
      Link: http://lkml.kernel.org/r/5A99F028.2090902@huawei.comSigned-off-by: default avatarJun Piao <piaojun@huawei.com>
      Reviewed-by: default avatarYiwen Jiang <jiangyiwen@huawei.com>
      Reviewed-by: default avatarJoseph Qi <jiangqi903@gmail.com>
      Reviewed-by: default avatarChangwei Ge <ge.changwei@h3c.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      839c27f7
    • Mikhail Malygin's avatar
      IB/rxe: Fix for oops in rxe_register_device on ppc64le arch · 9ebe2977
      Mikhail Malygin authored
      [ Upstream commit efc365e7 ]
      
      On ppc64le arch rxe_add command causes oops in kernel log:
      
      [   92.495140] Oops: Kernel access of bad area, sig: 11 [#1]
      [   92.499710] SMP NR_CPUS=2048 NUMA pSeries
      [   92.499792] Modules linked in: ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) nf_conntrack_netlink(E) nfnetlink(E) xfrm_user(E) iptable
      _nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) xt_addrtype(E) iptable_filter(E) ip_tables(E) xt_conntrack(E) x_tables(E)
       nf_nat(E) nf_conntrack(E) br_netfilter(E) bridge(E) stp(E) llc(E) overlay(E) af_packet(E) rpcrdma(E) ib_isert(E) iscsi_target_mod(E) i
      b_iser(E) libiscsi(E) ib_srpt(E) target_core_mod(E) ib_srp(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) bochs_drm(E) tt
      m(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) drm(E) agpgart(E) virtio_rng(E) virtio_console(E) rtc_
      generic(E) dm_ec(OEN) ttln_rdma(OEN) rdma_cm(E) configfs(E) iw_cm(E) ib_cm(E) rdma_rxe(E) ip6_udp_tunnel(E) udp_tunnel(E) ib_core(E) ql
      a2xxx(E)
      [   92.499832]  scsi_transport_fc(E) nvme_fc(E) nvme_fabrics(E) nvme_core(E) ipmi_watchdog(E) ipmi_ssif(E) ipmi_poweroff(E) ipmi_powernv(EX) ipmi_devintf(E) ipmi_msghandler(E) dummy(E) ext4(E) crc16(E) jbd2(E) mbcache(E) dm_service_time(E) scsi_transport_iscsi(E) sd_mod(E) sr_mod(E) cdrom(E) hid_generic(E) usbhid(E) virtio_blk(E) virtio_scsi(E) virtio_net(E) ibmvscsi(EX) scsi_transport_srp(E) xhci_pci(E) xhci_hcd(E) usbcore(E) usb_common(E) virtio_pci(E) virtio_ring(E) virtio(E) sunrpc(E) dm_mirror(E) dm_region_hash(E) dm_log(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) autofs4(E)
      [   92.499834] Supported: No, Unsupported modules are loaded
      [   92.499839] CPU: 3 PID: 5576 Comm: sh Tainted: G           OE   NX 4.4.120-ttln.17-default #1
      [   92.499841] task: c0000000afe8a490 ti: c0000000beba8000 task.ti: c0000000beba8000
      [   92.499842] NIP: c00000000008ba3c LR: c000000000027644 CTR: c00000000008ba10
      [   92.499844] REGS: c0000000bebab750 TRAP: 0300   Tainted: G           OE   NX  (4.4.120-ttln.17-default)
      [   92.499850] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28424428  XER: 20000000
      [   92.499871] CFAR: 0000000000002424 DAR: 0000000000000208 DSISR: 40000000 SOFTE: 1
                     GPR00: c000000000027644 c0000000bebab9d0 c000000000f09700 0000000000000000
                     GPR04: d0000000043d7192 0000000000000002 000000000000001a fffffffffffffffe
                     GPR08: 000000000000009c c00000000008ba10 d0000000043e5848 d0000000043d3828
                     GPR12: c00000000008ba10 c000000007a02400 0000000010062e38 0000010020388860
                     GPR16: 0000000000000000 0000000000000000 00000100203885f0 00000000100f6c98
                     GPR20: c0000000b3f1fcc0 c0000000b3f1fc48 c0000000b3f1fbd0 c0000000b3f1fb58
                     GPR24: c0000000b3f1fae0 c0000000b3f1fa68 00000000000005dc c0000000b3f1f9f0
                     GPR28: d0000000043e5848 c0000000b3f1f900 c0000000b3f1f320 c0000000b3f1f000
      [   92.499881] NIP [c00000000008ba3c] dma_get_required_mask_pSeriesLP+0x2c/0x1a0
      [   92.499885] LR [c000000000027644] dma_get_required_mask+0x44/0xac
      [   92.499886] Call Trace:
      [   92.499891] [c0000000bebab9d0] [c0000000bebaba30] 0xc0000000bebaba30 (unreliable)
      [   92.499894] [c0000000bebaba10] [c000000000027644] dma_get_required_mask+0x44/0xac
      [   92.499904] [c0000000bebaba30] [d0000000043cb4b4] rxe_register_device+0xc4/0x430 [rdma_rxe]
      [   92.499910] [c0000000bebabab0] [d0000000043c06c8] rxe_add+0x448/0x4e0 [rdma_rxe]
      [   92.499915] [c0000000bebabb30] [d0000000043d28dc] rxe_net_add+0x4c/0xf0 [rdma_rxe]
      [   92.499921] [c0000000bebabb60] [d0000000043d305c] rxe_param_set_add+0x6c/0x1ac [rdma_rxe]
      [   92.499924] [c0000000bebabbf0] [c0000000000e78c0] param_attr_store+0xa0/0x180
      [   92.499927] [c0000000bebabc70] [c0000000000e6448] module_attr_store+0x48/0x70
      [   92.499932] [c0000000bebabc90] [c000000000391f60] sysfs_kf_write+0x70/0xb0
      [   92.499935] [c0000000bebabcb0] [c000000000390f1c] kernfs_fop_write+0x18c/0x1e0
      [   92.499939] [c0000000bebabd00] [c0000000002e22ac] __vfs_write+0x4c/0x1d0
      [   92.499942] [c0000000bebabd90] [c0000000002e2f94] vfs_write+0xc4/0x200
      [   92.499945] [c0000000bebabde0] [c0000000002e488c] SyS_write+0x6c/0x110
      [   92.499948] [c0000000bebabe30] [c000000000009384] system_call+0x38/0xe4
      [   92.499949] Instruction dump:
      [   92.499954] 4e800020 3c4c00e8 3842dcf0 7c0802a6 f8010010 60000000 7c0802a6 fba1ffe8
      [   92.499958] fbc1fff0 fbe1fff8 f8010010 f821ffc1 <e9230208> 7c7e1b78 2fa90000 419e0078
      [   92.499962] ---[ end trace bed077e15eb420cf ]---
      
      It fails in dma_get_required_mask, that has ppc-specific implementation,
      and fail if provided device argument is NULL
      Signed-off-by: default avatarMikhail Malygin <mikhail@malygin.me>
      Reviewed-by: default avatarYonatan Cohen <yonatanc@mellanox.com>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9ebe2977
    • Nikolay Borisov's avatar
      btrfs: Fix possible softlock on single core machines · 370b3353
      Nikolay Borisov authored
      [ Upstream commit 1e1c50a9 ]
      
      do_chunk_alloc implements a loop checking whether there is a pending
      chunk allocation and if so causes the caller do loop. Generally this
      loop is executed only once, however testing with btrfs/072 on a single
      core vm machines uncovered an extreme case where the system could loop
      indefinitely. This is due to a missing cond_resched when loop which
      doesn't give a chance to the previous chunk allocator finish its job.
      
      The fix is to simply add the missing cond_resched.
      
      Fixes: 6d74119f ("Btrfs: avoid taking the chunk_mutex in do_chunk_alloc")
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      370b3353
    • Liu Bo's avatar
      Btrfs: fix NULL pointer dereference in log_dir_items · acfd8e88
      Liu Bo authored
      [ Upstream commit 80c0b421 ]
      
      0, 1 and <0 can be returned by btrfs_next_leaf(), and when <0 is
      returned, path->nodes[0] could be NULL, log_dir_items lacks such a
      check for <0 and we may run into a null pointer dereference panic.
      
      Fixes: e02119d5 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarLiu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      acfd8e88
    • Liu Bo's avatar
      Btrfs: bail out on error during replay_dir_deletes · afef64b1
      Liu Bo authored
      [ Upstream commit b98def7c ]
      
      If errors were returned by btrfs_next_leaf(), replay_dir_deletes needs
      to bail out, otherwise @ret would be forced to be 0 after 'break;' and
      the caller won't be aware of it.
      
      Fixes: e02119d5 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarLiu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      afef64b1
    • Yang Shi's avatar
      mm: thp: fix potential clearing to referenced flag in page_idle_clear_pte_refs_one() · 5ade3c96
      Yang Shi authored
      [ Upstream commit f0849ac0 ]
      
      For PTE-mapped THP, the compound THP has not been split to normal 4K
      pages yet, the whole THP is considered referenced if any one of sub page
      is referenced.
      
      When walking PTE-mapped THP by pvmw, all relevant PTEs will be checked
      to retrieve referenced bit.  But, the current code just returns the
      result of the last PTE.  If the last PTE has not referenced, the
      referenced flag will be cleared.
      
      Just set referenced when ptep{pmdp}_clear_young_notify() returns true.
      
      Link: http://lkml.kernel.org/r/1518212451-87134-1-git-send-email-yang.shi@linux.alibaba.comSigned-off-by: default avatarYang Shi <yang.shi@linux.alibaba.com>
      Reported-by: default avatarGang Deng <gavin.dg@linux.alibaba.com>
      Suggested-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5ade3c96
    • Huang Ying's avatar
      mm: fix races between address_space dereference and free in page_evicatable · 8d700626
      Huang Ying authored
      [ Upstream commit e92bb4dd ]
      
      When page_mapping() is called and the mapping is dereferenced in
      page_evicatable() through shrink_active_list(), it is possible for the
      inode to be truncated and the embedded address space to be freed at the
      same time.  This may lead to the following race.
      
      CPU1                                                CPU2
      
      truncate(inode)                                     shrink_active_list()
        ...                                                 page_evictable(page)
        truncate_inode_page(mapping, page);
          delete_from_page_cache(page)
            spin_lock_irqsave(&mapping->tree_lock, flags);
              __delete_from_page_cache(page, NULL)
                page_cache_tree_delete(..)
                  ...                                         mapping = page_mapping(page);
                  page->mapping = NULL;
                  ...
            spin_unlock_irqrestore(&mapping->tree_lock, flags);
            page_cache_free_page(mapping, page)
              put_page(page)
                if (put_page_testzero(page)) -> false
      - inode now has no pages and can be freed including embedded address_space
      
                                                              mapping_unevictable(mapping)
      							  test_bit(AS_UNEVICTABLE, &mapping->flags);
      - we've dereferenced mapping which is potentially already free.
      
      Similar race exists between swap cache freeing and page_evicatable()
      too.
      
      The address_space in inode and swap cache will be freed after a RCU
      grace period.  So the races are fixed via enclosing the page_mapping()
      and address_space usage in rcu_read_lock/unlock().  Some comments are
      added in code to make it clear what is protected by the RCU read lock.
      
      Link: http://lkml.kernel.org/r/20180212081227.1940-1-ying.huang@intel.comSigned-off-by: default avatar"Huang, Ying" <ying.huang@intel.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d700626
    • Claudio Imbrenda's avatar
      mm/ksm: fix interaction with THP · 763111d9
      Claudio Imbrenda authored
      [ Upstream commit 77da2ba0 ]
      
      This patch fixes a corner case for KSM.  When two pages belong or
      belonged to the same transparent hugepage, and they should be merged,
      KSM fails to split the page, and therefore no merging happens.
      
      This bug can be reproduced by:
      * making sure ksm is running (in case disabling ksmtuned)
      * enabling transparent hugepages
      * allocating a THP-aligned 1-THP-sized buffer
        e.g. on amd64: posix_memalign(&p, 1<<21, 1<<21)
      * filling it with the same values
        e.g. memset(p, 42, 1<<21)
      * performing madvise to make it mergeable
        e.g. madvise(p, 1<<21, MADV_MERGEABLE)
      * waiting for KSM to perform a few scans
      
      The expected outcome is that the all the pages get merged (1 shared and
      the rest sharing); the actual outcome is that no pages get merged (1
      unshared and the rest volatile)
      
      The reason of this behaviour is that we increase the reference count
      once for both pages we want to merge, but if they belong to the same
      hugepage (or compound page), the reference counter used in both cases is
      the one of the head of the compound page.  This means that
      split_huge_page will find a value of the reference counter too high and
      will fail.
      
      This patch solves this problem by testing if the two pages to merge
      belong to the same hugepage when attempting to merge them.  If so, the
      hugepage is split safely.  This means that the hugepage is not split if
      not necessary.
      
      Link: http://lkml.kernel.org/r/1521548069-24758-1-git-send-email-imbrenda@linux.vnet.ibm.comSigned-off-by: default avatarClaudio Imbrenda <imbrenda@linux.vnet.ibm.com>
      Co-authored-by: default avatarGerald Schaefer <gerald.schaefer@de.ibm.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      763111d9
    • Thomas Falcon's avatar
      ibmvnic: Zero used TX descriptor counter on reset · 378a1e49
      Thomas Falcon authored
      [ Upstream commit 41f71467 ]
      
      The counter that tracks used TX descriptors pending completion
      needs to be zeroed as part of a device reset. This change fixes
      a bug causing transmit queues to be stopped unnecessarily and in
      some cases a transmit queue stall and timeout reset. If the counter
      is not reset, the remaining descriptors will not be "removed",
      effectively reducing queue capacity. If the queue is over half full,
      it will cause the queue to stall if stopped.
      Signed-off-by: default avatarThomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      378a1e49
    • Esben Haabendal's avatar
      dp83640: Ensure against premature access to PHY registers after reset · d04e5e72
      Esben Haabendal authored
      [ Upstream commit 76327a35 ]
      
      The datasheet specifies a 3uS pause after performing a software
      reset. The default implementation of genphy_soft_reset() does not
      provide this, so implement soft_reset with the needed pause.
      Signed-off-by: default avatarEsben Haabendal <eha@deif.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d04e5e72
    • Sandipan Das's avatar
      perf clang: Add support for recent clang versions · 4be06bc0
      Sandipan Das authored
      [ Upstream commit 7854e499 ]
      
      The clang API calls used by perf have changed in recent releases and
      builds succeed with libclang-3.9 only. This introduces compatibility
      with libclang-4.0 and above.
      
      Without this patch, we will see the following compilation errors with
      libclang-4.0+:
      
       util/c++/clang.cpp: In function ‘clang::CompilerInvocation* perf::createCompilerInvocation(llvm::opt::ArgStringList, llvm::StringRef&, clang::DiagnosticsEngine&)’:
       util/c++/clang.cpp:62:33: error: ‘IK_C’ was not declared in this scope
         Opts.Inputs.emplace_back(Path, IK_C);
                                        ^~~~
       util/c++/clang.cpp: In function ‘std::unique_ptr<llvm::Module> perf::getModuleFromSource(llvm::opt::ArgStringList, llvm::StringRef, llvm::IntrusiveRefCntPtr<clang::vfs::FileSystem>)’:
       util/c++/clang.cpp:75:26: error: no matching function for call to ‘clang::CompilerInstance::setInvocation(clang::CompilerInvocation*)’
         Clang.setInvocation(&*CI);
                                 ^
       In file included from util/c++/clang.cpp:14:0:
       /usr/include/clang/Frontend/CompilerInstance.h:231:8: note: candidate: void clang::CompilerInstance::setInvocation(std::shared_ptr<clang::CompilerInvocation>)
          void setInvocation(std::shared_ptr<CompilerInvocation> Value);
               ^~~~~~~~~~~~~
      
      Committer testing:
      
      Tested on Fedora 27 after installing the clang-devel and llvm-devel
      packages, versions:
      
        # rpm -qa | egrep llvm\|clang
        llvm-5.0.1-6.fc27.x86_64
        clang-libs-5.0.1-5.fc27.x86_64
        clang-5.0.1-5.fc27.x86_64
        clang-tools-extra-5.0.1-5.fc27.x86_64
        llvm-libs-5.0.1-6.fc27.x86_64
        llvm-devel-5.0.1-6.fc27.x86_64
        clang-devel-5.0.1-5.fc27.x86_64
        #
      
      Make sure you don't have some older version lying around in /usr/local,
      etc, then:
      
        $ make LIBCLANGLLVM=1 -C tools/perf install-bin
      
      And in the end perf will be linked agains these libraries:
      
        # ldd ~/bin/perf | egrep -i llvm\|clang
      	libclangAST.so.5 => /lib64/libclangAST.so.5 (0x00007f8bb2eb4000)
      	libclangBasic.so.5 => /lib64/libclangBasic.so.5 (0x00007f8bb29e3000)
      	libclangCodeGen.so.5 => /lib64/libclangCodeGen.so.5 (0x00007f8bb23f7000)
      	libclangDriver.so.5 => /lib64/libclangDriver.so.5 (0x00007f8bb2060000)
      	libclangFrontend.so.5 => /lib64/libclangFrontend.so.5 (0x00007f8bb1d06000)
      	libclangLex.so.5 => /lib64/libclangLex.so.5 (0x00007f8bb1a3e000)
      	libclangTooling.so.5 => /lib64/libclangTooling.so.5 (0x00007f8bb17d4000)
      	libclangEdit.so.5 => /lib64/libclangEdit.so.5 (0x00007f8bb15c5000)
      	libclangSema.so.5 => /lib64/libclangSema.so.5 (0x00007f8bb0cc9000)
      	libclangAnalysis.so.5 => /lib64/libclangAnalysis.so.5 (0x00007f8bb0a23000)
      	libclangParse.so.5 => /lib64/libclangParse.so.5 (0x00007f8bb0725000)
      	libclangSerialization.so.5 => /lib64/libclangSerialization.so.5 (0x00007f8bb039a000)
      	libLLVM-5.0.so => /lib64/libLLVM-5.0.so (0x00007f8bace98000)
      	libclangASTMatchers.so.5 => /lib64/../lib64/libclangASTMatchers.so.5 (0x00007f8bab735000)
      	libclangFormat.so.5 => /lib64/../lib64/libclangFormat.so.5 (0x00007f8bab4b2000)
      	libclangRewrite.so.5 => /lib64/../lib64/libclangRewrite.so.5 (0x00007f8bab2a1000)
      	libclangToolingCore.so.5 => /lib64/../lib64/libclangToolingCore.so.5 (0x00007f8bab08e000)
        #
      Signed-off-by: default avatarSandipan Das <sandipan@linux.vnet.ibm.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Fixes: 00b86691 ("perf clang: Add builtin clang support ant test case")
      Link: http://lkml.kernel.org/r/20180404180419.19056-2-sandipan@linux.vnet.ibm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4be06bc0
    • Sandipan Das's avatar
      perf tools: Fix perf builds with clang support · ee7c28b2
      Sandipan Das authored
      [ Upstream commit c2fb54a1 ]
      
      For libclang, some distro packages provide static libraries (.a) while
      some provide shared libraries (.so). Currently, perf code can only be
      linked with static libraries. This makes perf build possible for both
      cases.
      Signed-off-by: default avatarSandipan Das <sandipan@linux.vnet.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Fixes: d58ac0bf ("perf build: Add clang and llvm compile and linking support")
      Link: http://lkml.kernel.org/r/20180404180419.19056-1-sandipan@linux.vnet.ibm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ee7c28b2
    • Anshuman Khandual's avatar
      powerpc/fscr: Enable interrupts earlier before calling get_user() · 6689a4c7
      Anshuman Khandual authored
      [ Upstream commit 709b973c ]
      
      The function get_user() can sleep while trying to fetch instruction
      from user address space and causes the following warning from the
      scheduler.
      
      BUG: sleeping function called from invalid context
      
      Though interrupts get enabled back but it happens bit later after
      get_user() is called. This change moves enabling these interrupts
      earlier covering the function get_user(). While at this, lets check
      for kernel mode and crash as this interrupt should not have been
      triggered from the kernel context.
      Signed-off-by: default avatarAnshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6689a4c7
    • Shunyong Yang's avatar
      cpufreq: CPPC: Initialize shared perf capabilities of CPUs · 96fdc64d
      Shunyong Yang authored
      [ Upstream commit 8913315e ]
      
      When multiple CPUs are related in one cpufreq policy, the first online
      CPU will be chosen by default to handle cpufreq operations. Let's take
      cpu0 and cpu1 as an example.
      
      When cpu0 is offline, policy->cpu will be shifted to cpu1. cpu1's perf
      capabilities should be initialized. Otherwise, perf capabilities are 0s
      and speed change can not take effect.
      
      This patch copies perf capabilities of the first online CPU to other
      shared CPUs when policy shared type is CPUFREQ_SHARED_TYPE_ANY.
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarShunyong Yang <shunyong.yang@hxt-semitech.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      96fdc64d
    • Carlos Maiolino's avatar
      Force log to disk before reading the AGF during a fstrim · 8bff7ca9
      Carlos Maiolino authored
      [ Upstream commit 8c81dd46 ]
      
      Forcing the log to disk after reading the agf is wrong, we might be
      calling xfs_log_force with XFS_LOG_SYNC with a metadata lock held.
      
      This can cause a deadlock when racing a fstrim with a filesystem
      shutdown.
      
      The deadlock has been identified due a miscalculation bug in device-mapper
      dm-thin, which returns lack of space to its users earlier than the device itself
      really runs out of space, changing the device-mapper volume into an error state.
      
      The problem happened while filling the filesystem with a single file,
      triggering the bug in device-mapper, consequently causing an IO error
      and shutting down the filesystem.
      
      If such file is removed, and fstrim executed before the XFS finishes the
      shut down process, the fstrim process will end up holding the buffer
      lock, and going to sleep on the cil wait queue.
      
      At this point, the shut down process will try to wake up all the threads
      waiting on the cil wait queue, but for this, it will try to hold the
      same buffer log already held my the fstrim, locking up the filesystem.
      Signed-off-by: default avatarCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8bff7ca9
    • Jens Axboe's avatar
      sr: get/drop reference to device in revalidate and check_events · 28143fe3
      Jens Axboe authored
      [ Upstream commit 2d097c50 ]
      
      We can't just use scsi_cd() to get the scsi_cd structure, we have
      to grab a live reference to the device. For both callbacks, we're
      not inside an open where we already hold a reference to the device.
      
      This fixes device removal/addition under concurrent device access,
      which otherwise could result in the below oops.
      
      NULL pointer dereference at 0000000000000010
      PGD 0 P4D 0
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in:
      sr 12:0:0:0: [sr2] scsi-1 drive
       scsi_debug crc_t10dif crct10dif_generic crct10dif_common nvme nvme_core sb_edac xl
      sr 12:0:0:0: Attached scsi CD-ROM sr2
       sr_mod cdrom btrfs xor zstd_decompress zstd_compress xxhash lzo_compress zlib_defc
      sr 12:0:0:0: Attached scsi generic sg7 type 5
       igb ahci libahci i2c_algo_bit libata dca [last unloaded: crc_t10dif]
      CPU: 43 PID: 4629 Comm: systemd-udevd Not tainted 4.16.0+ #650
      Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 11/09/2016
      RIP: 0010:sr_block_revalidate_disk+0x23/0x190 [sr_mod]
      RSP: 0018:ffff883ff357bb58 EFLAGS: 00010292
      RAX: ffffffffa00b07d0 RBX: ffff883ff3058000 RCX: ffff883ff357bb66
      RDX: 0000000000000003 RSI: 0000000000007530 RDI: ffff881fea631000
      RBP: 0000000000000000 R08: ffff881fe4d38400 R09: 0000000000000000
      R10: 0000000000000000 R11: 00000000000001b6 R12: 000000000800005d
      R13: 000000000800005d R14: ffff883ffd9b3790 R15: 0000000000000000
      FS:  00007f7dc8e6d8c0(0000) GS:ffff883fff340000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000010 CR3: 0000003ffda98005 CR4: 00000000003606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       ? __invalidate_device+0x48/0x60
       check_disk_change+0x4c/0x60
       sr_block_open+0x16/0xd0 [sr_mod]
       __blkdev_get+0xb9/0x450
       ? iget5_locked+0x1c0/0x1e0
       blkdev_get+0x11e/0x320
       ? bdget+0x11d/0x150
       ? _raw_spin_unlock+0xa/0x20
       ? bd_acquire+0xc0/0xc0
       do_dentry_open+0x1b0/0x320
       ? inode_permission+0x24/0xc0
       path_openat+0x4e6/0x1420
       ? cpumask_any_but+0x1f/0x40
       ? flush_tlb_mm_range+0xa0/0x120
       do_filp_open+0x8c/0xf0
       ? __seccomp_filter+0x28/0x230
       ? _raw_spin_unlock+0xa/0x20
       ? __handle_mm_fault+0x7d6/0x9b0
       ? list_lru_add+0xa8/0xc0
       ? _raw_spin_unlock+0xa/0x20
       ? __alloc_fd+0xaf/0x160
       ? do_sys_open+0x1a6/0x230
       do_sys_open+0x1a6/0x230
       do_syscall_64+0x5a/0x100
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      Reviewed-by: default avatarLee Duncan <lduncan@suse.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      28143fe3