1. 18 Sep, 2014 40 commits
    • Filipe Manana's avatar
      Btrfs: fix csum tree corruption, duplicate and outdated checksums · 157de422
      Filipe Manana authored
      commit 27b9a812 upstream.
      
      Under rare circumstances we can end up leaving 2 versions of a checksum
      for the same file extent range.
      
      The reason for this is that after calling btrfs_next_leaf we process
      slot 0 of the leaf it returns, instead of processing the slot set in
      path->slots[0]. Most of the time (by far) path->slots[0] is 0, but after
      btrfs_next_leaf() releases the path and before it searches for the next
      leaf, another task might cause a split of the next leaf, which migrates
      some of its keys to the leaf we were processing before calling
      btrfs_next_leaf(). In this case btrfs_next_leaf() returns again the
      same leaf but with path->slots[0] having a slot number corresponding
      to the first new key it got, that is, a slot number that didn't exist
      before calling btrfs_next_leaf(), as the leaf now has more keys than
      it had before. So we must really process the returned leaf starting at
      path->slots[0] always, as it isn't always 0, and the key at slot 0 can
      have an offset much lower than our search offset/bytenr.
      
      For example, consider the following scenario, where we have:
      
      sums->bytenr: 40157184, sums->len: 16384, sums end: 40173568
      four 4kb file data blocks with offsets 40157184, 40161280, 40165376, 40169472
      
        Leaf N:
      
          slot = 0                           slot = btrfs_header_nritems() - 1
        |-------------------------------------------------------------------|
        | [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4] |
        |-------------------------------------------------------------------|
      
        Leaf N + 1:
      
            slot = 0                          slot = btrfs_header_nritems() - 1
        |--------------------------------------------------------------------|
        | [(CSUM CSUM 40161280), size 32] ... [((CSUM CSUM 40615936), size 8 |
        |--------------------------------------------------------------------|
      
      Because we are at the last slot of leaf N, we call btrfs_next_leaf() to
      find the next highest key, which releases the current path and then searches
      for that next key. However after releasing the path and before finding that
      next key, the item at slot 0 of leaf N + 1 gets moved to leaf N, due to a call
      to ctree.c:push_leaf_left() (via ctree.c:split_leaf()), and therefore
      btrfs_next_leaf() will returns us a path again with leaf N but with the slot
      pointing to its new last key (CSUM CSUM 40161280). This new version of leaf N
      is then:
      
          slot = 0                        slot = btrfs_header_nritems() - 2  slot = btrfs_header_nritems() - 1
        |----------------------------------------------------------------------------------------------------|
        | [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4]  [(CSUM CSUM 40161280), size 32] |
        |----------------------------------------------------------------------------------------------------|
      
      And incorrecly using slot 0, makes us set next_offset to 39239680 and we jump
      into the "insert:" label, which will set tmp to:
      
          tmp = min((sums->len - total_bytes) >> blocksize_bits,
              (next_offset - file_key.offset) >> blocksize_bits) =
          min((16384 - 0) >> 12, (39239680 - 40157184) >> 12) =
          min(4, (u64)-917504 = 18446744073708634112 >> 12) = 4
      
      and
      
         ins_size = csum_size * tmp = 4 * 4 = 16 bytes.
      
      In other words, we insert a new csum item in the tree with key
      (CSUM_OBJECTID CSUM_KEY 40157184 = sums->bytenr) that contains the checksums
      for all the data (4 blocks of 4096 bytes each = sums->len). Which is wrong,
      because the item with key (CSUM CSUM 40161280) (the one that was moved from
      leaf N + 1 to the end of leaf N) contains the old checksums of the last 12288
      bytes of our data and won't get those old checksums removed.
      
      So this leaves us 2 different checksums for 3 4kb blocks of data in the tree,
      and breaks the logical rule:
      
         Key_N+1.offset >= Key_N.offset + length_of_data_its_checksums_cover
      
      An obvious bad effect of this is that a subsequent csum tree lookup to get
      the checksum of any of the blocks with logical offset of 40161280, 40165376
      or 40169472 (the last 3 4kb blocks of file data), will get the old checksums.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      157de422
    • Takashi Iwai's avatar
      Btrfs: Fix memory corruption by ulist_add_merge() on 32bit arch · a4312389
      Takashi Iwai authored
      commit 4eb1f66d upstream.
      
      We've got bug reports that btrfs crashes when quota is enabled on
      32bit kernel, typically with the Oops like below:
       BUG: unable to handle kernel NULL pointer dereference at 00000004
       IP: [<f9234590>] find_parent_nodes+0x360/0x1380 [btrfs]
       *pde = 00000000
       Oops: 0000 [#1] SMP
       CPU: 0 PID: 151 Comm: kworker/u8:2 Tainted: G S      W 3.15.2-1.gd43d97e-default #1
       Workqueue: btrfs-qgroup-rescan normal_work_helper [btrfs]
       task: f1478130 ti: f147c000 task.ti: f147c000
       EIP: 0060:[<f9234590>] EFLAGS: 00010213 CPU: 0
       EIP is at find_parent_nodes+0x360/0x1380 [btrfs]
       EAX: f147dda8 EBX: f147ddb0 ECX: 00000011 EDX: 00000000
       ESI: 00000000 EDI: f147dda4 EBP: f147ddf8 ESP: f147dd38
        DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
       CR0: 8005003b CR2: 00000004 CR3: 00bf3000 CR4: 00000690
       Stack:
        00000000 00000000 f147dda4 00000050 00000001 00000000 00000001 00000050
        00000001 00000000 d3059000 00000001 00000022 000000a8 00000000 00000000
        00000000 000000a1 00000000 00000000 00000001 00000000 00000000 11800000
       Call Trace:
        [<f923564d>] __btrfs_find_all_roots+0x9d/0xf0 [btrfs]
        [<f9237bb1>] btrfs_qgroup_rescan_worker+0x401/0x760 [btrfs]
        [<f9206148>] normal_work_helper+0xc8/0x270 [btrfs]
        [<c025e38b>] process_one_work+0x11b/0x390
        [<c025eea1>] worker_thread+0x101/0x340
        [<c026432b>] kthread+0x9b/0xb0
        [<c0712a71>] ret_from_kernel_thread+0x21/0x30
        [<c0264290>] kthread_create_on_node+0x110/0x110
      
      This indicates a NULL corruption in prefs_delayed list.  The further
      investigation and bisection pointed that the call of ulist_add_merge()
      results in the corruption.
      
      ulist_add_merge() takes u64 as aux and writes a 64bit value into
      old_aux.  The callers of this function in backref.c, however, pass a
      pointer of a pointer to old_aux.  That is, the function overwrites
      64bit value on 32bit pointer.  This caused a NULL in the adjacent
      variable, in this case, prefs_delayed.
      
      Here is a quick attempt to band-aid over this: a new function,
      ulist_add_merge_ptr() is introduced to pass/store properly a pointer
      value instead of u64.  There are still ugly void ** cast remaining
      in the callers because void ** cannot be taken implicitly.  But, it's
      safer than explicit cast to u64, anyway.
      
      Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=887046Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      a4312389
    • Aneesh Kumar K.V's avatar
      powerpc/mm: Use read barrier when creating real_pte · ff72db9c
      Aneesh Kumar K.V authored
      commit 85c1fafd upstream.
      
      On ppc64 we support 4K hash pte with 64K page size. That requires
      us to track the hash pte slot information on a per 4k basis. We do that
      by storing the slot details in the second half of pte page. The pte bit
      _PAGE_COMBO is used to indicate whether the second half need to be
      looked while building real_pte. We need to use read memory barrier while
      doing that so that load of hidx is not reordered w.r.t _PAGE_COMBO
      check. On the store side we already do a lwsync in __hash_page_4K
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ff72db9c
    • Aneesh Kumar K.V's avatar
      powerpc/thp: Use ACCESS_ONCE when loading pmdp · 216e985c
      Aneesh Kumar K.V authored
      commit 7e467245 upstream.
      
      We would get wrong results in compiler recomputed old_pmd. Avoid
      that by using ACCESS_ONCE
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      216e985c
    • Aneesh Kumar K.V's avatar
      powerpc/thp: Invalidate with vpn in loop · ee1fd754
      Aneesh Kumar K.V authored
      commit 969b7b20 upstream.
      
      As per ISA, for 4k base page size we compare 14..65 bits of VA specified
      with the entry_VA in tlb. That implies we need to make sure we do a
      tlbie with all the possible 4k va we used to access the 16MB hugepage.
      With 64k base page size we compare 14..57 bits of VA. Hence we cannot
      ignore the lower 24 bits of va while tlbie .We also cannot tlb
      invalidate a 16MB entry with just one tlbie instruction because
      we don't track which va was used to instantiate the tlb entry.
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ee1fd754
    • Aneesh Kumar K.V's avatar
      powerpc/thp: Handle combo pages in invalidate · f837b291
      Aneesh Kumar K.V authored
      commit fc047955 upstream.
      
      If we changed base page size of the segment, either via sub_page_protect
      or via remap_4k_pfn, we do a demote_segment which doesn't flush the hash
      table entries. We do a lazy hash page table flush for all mapped pages
      in the demoted segment. This happens when we handle hash page fault for
      these pages.
      
      We use _PAGE_COMBO bit along with _PAGE_HASHPTE to indicate whether a
      pte is backed by 4K hash pte. If we find _PAGE_COMBO not set on the pte,
      that implies that we could possibly have older 64K hash pte entries in
      the hash page table and we need to invalidate those entries.
      
      Use _PAGE_COMBO to determine the page size with which we should
      invalidate the hash table entries on unmap.
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      f837b291
    • Aneesh Kumar K.V's avatar
      powerpc/thp: Invalidate old 64K based hash page mapping before insert of 4k pte · 0d03f310
      Aneesh Kumar K.V authored
      commit 629149fa upstream.
      
      If we changed base page size of the segment, either via sub_page_protect
      or via remap_4k_pfn, we do a demote_segment which doesn't flush the hash
      table entries. We do a lazy hash page table flush for all mapped pages
      in the demoted segment. This happens when we handle hash page fault
      for these pages.
      
      We use _PAGE_COMBO bit along with _PAGE_HASHPTE to indicate whether a
      pte is backed by 4K hash pte. If we find _PAGE_COMBO not set on the pte,
      that implies that we could possibly have older 64K hash pte entries in
      the hash page table and we need to invalidate those entries.
      
      Handle this correctly for 16M pages
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      [ kamal: backport to 3.13-stable: context ]
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      0d03f310
    • Aneesh Kumar K.V's avatar
      powerpc/thp: Don't recompute vsid and ssize in loop on invalidate · 31ea77a4
      Aneesh Kumar K.V authored
      commit fa1f8ae8 upstream.
      
      The segment identifier and segment size will remain the same in
      the loop, So we can compute it outside. We also change the
      hugepage_invalidate interface so that we can use it the later patch
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      31ea77a4
    • Aneesh Kumar K.V's avatar
      powerpc/thp: Add write barrier after updating the valid bit · 5eb4b79d
      Aneesh Kumar K.V authored
      commit b0aa44a3 upstream.
      
      With hugepages, we store the hpte valid information in the pte page
      whose address is stored in the second half of the PMD. Use a
      write barrier to make sure clearing pmd busy bit and updating
      hpte valid info are ordered properly.
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      5eb4b79d
    • Gavin Shan's avatar
      powerpc/pseries: Failure on removing device node · f756490a
      Gavin Shan authored
      commit f1b3929c upstream.
      
      While running command "drmgr -c phb -r -s 'PHB 528'", following
      backtrace jumped out because the target device node isn't marked
      with OF_DETACHED by of_detach_node(), which caused by error
      returned from memory hotplug related reconfig notifier when
      disabling CONFIG_MEMORY_HOTREMOVE. The patch fixes it.
      
      ERROR: Bad of_node_put() on /pci@800000020000210/ethernet@0
      CPU: 14 PID: 2252 Comm: drmgr Tainted: G        W     3.16.0+ #427
      Call Trace:
      [c000000012a776a0] [c000000000013d9c] .show_stack+0x88/0x148 (unreliable)
      [c000000012a77750] [c00000000083cd34] .dump_stack+0x7c/0x9c
      [c000000012a777d0] [c0000000006807c4] .of_node_release+0x58/0xe0
      [c000000012a77860] [c00000000038a7d0] .kobject_release+0x174/0x1b8
      [c000000012a77900] [c00000000038a884] .kobject_put+0x70/0x78
      [c000000012a77980] [c000000000681680] .of_node_put+0x28/0x34
      [c000000012a77a00] [c000000000681ea8] .__of_get_next_child+0x64/0x70
      [c000000012a77a90] [c000000000682138] .of_find_node_by_path+0x1b8/0x20c
      [c000000012a77b40] [c000000000051840] .ofdt_write+0x308/0x688
      [c000000012a77c20] [c000000000238430] .proc_reg_write+0xb8/0xd4
      [c000000012a77cd0] [c0000000001cbeac] .vfs_write+0xec/0x1f8
      [c000000012a77d70] [c0000000001cc3b0] .SyS_write+0x58/0xa0
      [c000000012a77e30] [c00000000000a064] syscall_exit+0x0/0x98
      Signed-off-by: default avatarGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      f756490a
    • Ronald Wahl's avatar
      carl9170: fix sending URBs with wrong type when using full-speed · 0aef7cc1
      Ronald Wahl authored
      commit 671796dd upstream.
      
      The driver assumes that endpoint 4 is always an interrupt endpoint.
      Unfortunately the type differs between high-speed and full-speed
      configurations while in the former case it is indeed an interrupt
      endpoint this is not true for the latter case - here it is a bulk
      endpoint. When sending URBs with the wrong type the kernel will
      generate a warning message including backtrace. In this specific
      case there will be a huge amount of warnings which can bring the system
      to freeze.
      
      To fix this we are now sending URBs to endpoint 4 using the type
      found in the endpoint descriptor.
      
      A side note: The carl9170 firmware currently specifies endpoint 4 as
      interrupt endpoint even in the full-speed configuration but this has
      no relevance because before this firmware is loaded the endpoint type
      is as described above and after the firmware is running the stick is not
      reenumerated and so the old descriptor is used.
      Signed-off-by: default avatarRonald Wahl <ronald.wahl@raritan.com>
      Signed-off-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      0aef7cc1
    • David Vrabel's avatar
      x86/xen: resume timer irqs early · bf8e7c35
      David Vrabel authored
      commit 8d5999df upstream.
      
      If the timer irqs are resumed during device resume it is possible in
      certain circumstances for the resume to hang early on, before device
      interrupts are resumed.  For an Ubuntu 14.04 PVHVM guest this would
      occur in ~0.5% of resume attempts.
      
      It is not entirely clear what is occuring the point of the hang but I
      think a task necessary for the resume calls schedule_timeout(),
      waiting for a timer interrupt (which never arrives).  This failure may
      require specific tasks to be running on the other VCPUs to trigger
      (processes are not frozen during a suspend/resume if PREEMPT is
      disabled).
      
      Add IRQF_EARLY_RESUME to the timer interrupts so they are resumed in
      syscore_resume().
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      bf8e7c35
    • Takashi Iwai's avatar
      ALSA: hda/ca0132 - Don't try loading firmware at resume when already failed · b74103cb
      Takashi Iwai authored
      commit e24aa0a4 upstream.
      
      CA0132 driver tries to reload the firmware at resume.  Usually this
      works since the firmware loader core caches the firmware contents by
      itself.  However, if the driver failed to load the firmwares
      (e.g. missing files), reloading the firmware at resume goes through
      the actual file loading code path, and triggers a kernel WARNING like:
      
       WARNING: CPU: 10 PID:11371 at drivers/base/firmware_class.c:1105 _request_firmware+0x9ab/0x9d0()
      
      For avoiding this situation, this patch makes CA0132 skipping the f/w
      loading at resume when it failed at probe time.
      Reported-and-tested-by: default avatarJanek Kozicki <cosurgi@gmail.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      b74103cb
    • Clemens Ladisch's avatar
      ALSA: usb-audio: fix BOSS ME-25 MIDI regression · 08007455
      Clemens Ladisch authored
      commit 53da5ebf upstream.
      
      The BOSS ME-25 turns out not to have any useful descriptors in its MIDI
      interface, so its needs a quirk entry after all.
      Reported-and-tested-by: default avatarKees van Veen <kees.vanveen@gmail.com>
      Fixes: 8e5ced83 ("ALSA: usb-audio: remove superfluous Roland quirks")
      Signed-off-by: default avatarClemens Ladisch <clemens@ladisch.de>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      08007455
    • Mario Kleiner's avatar
      drm/nouveau: Bump version from 1.1.1 to 1.1.2 · a4e6faf3
      Mario Kleiner authored
      commit 7820e5ee upstream.
      
      Linux 3.16 fixed multiple bugs in kms pageflip completion events
      and timestamping, which were originally introduced in Linux 3.13.
      
      These fixes have been backported to all stable kernels since 3.13.
      
      However, the userspace nouveau-ddx needs to be aware if it is
      running on a kernel on which these bugs are fixed, or not.
      
      Bump the patchlevel of the drm driver version to signal this,
      so backporting this patch to stable 3.13+ kernels will give the
      ddx the required info.
      Signed-off-by: default avatarMario Kleiner <mario.kleiner.de@gmail.com>
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      a4e6faf3
    • Ilya Dryomov's avatar
      libceph: set last_piece in ceph_msg_data_pages_cursor_init() correctly · 572d9b26
      Ilya Dryomov authored
      commit 5f740d7e upstream.
      
      Determining ->last_piece based on the value of ->page_offset + length
      is incorrect because length here is the length of the entire message.
      ->last_piece set to false even if page array data item length is <=
      PAGE_SIZE, which results in invalid length passed to
      ceph_tcp_{send,recv}page() and causes various asserts to fire.
      
          # cat pages-cursor-init.sh
          #!/bin/bash
          rbd create --size 10 --image-format 2 foo
          FOO_DEV=$(rbd map foo)
          dd if=/dev/urandom of=$FOO_DEV bs=1M &>/dev/null
          rbd snap create foo@snap
          rbd snap protect foo@snap
          rbd clone foo@snap bar
          # rbd_resize calls librbd rbd_resize(), size is in bytes
          ./rbd_resize bar $(((4 << 20) + 512))
          rbd resize --size 10 bar
          BAR_DEV=$(rbd map bar)
          # trigger a 512-byte copyup -- 512-byte page array data item
          dd if=/dev/urandom of=$BAR_DEV bs=1M count=1 seek=5
      
      The problem exists only in ceph_msg_data_pages_cursor_init(),
      ceph_msg_data_pages_advance() does the right thing.  The size_t cast is
      unnecessary.
      Signed-off-by: default avatarIlya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: default avatarSage Weil <sage@redhat.com>
      Reviewed-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      572d9b26
    • Oleg Nesterov's avatar
      vm_is_stack: use for_each_thread() rather then buggy while_each_thread() · 19518ed0
      Oleg Nesterov authored
      commit 4449a51a upstream.
      
      Aleksei hit the soft lockup during reading /proc/PID/smaps.  David
      investigated the problem and suggested the right fix.
      
      while_each_thread() is racy and should die, this patch updates
      vm_is_stack().
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reported-by: default avatarAleksei Besogonov <alex.besogonov@gmail.com>
      Tested-by: default avatarAleksei Besogonov <alex.besogonov@gmail.com>
      Suggested-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      19518ed0
    • Jiri Kosina's avatar
      drm/i915: read HEAD register back in init_ring_common() to enforce ordering · ca1d9016
      Jiri Kosina authored
      commit ece4a17d upstream.
      
      Withtout this, ring initialization fails reliabily during resume with
      
      	[drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head ffffff8804 tail 00000000 start 000e4000
      
      This is not a complete fix, but it is verified to make the ring
      initialization failures during resume much less likely.
      
      We were not able to root-cause this bug (likely HW-specific to Gen4 chips)
      yet. This is therefore used as a ducttape before problem is fully
      understood and proper fix created, so that people don't suffer from
      completely unusable systems in the meantime.
      
      The discussion and debugging is happening at
      
      	https://bugs.freedesktop.org/show_bug.cgi?id=76554Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ca1d9016
    • Sasha Levin's avatar
      kernel/smp.c:on_each_cpu_cond(): fix warning in fallback path · 9aa0c0dd
      Sasha Levin authored
      commit 618fde87 upstream.
      
      The rarely-executed memry-allocation-failed callback path generates a
      WARN_ON_ONCE() when smp_call_function_single() succeeds.  Presumably
      it's supposed to warn on failures.
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Cc: Christoph Lameter <cl@gentwo.org>
      Cc: Gilad Ben-Yossef <gilad@benyossef.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Tejun Heo <htejun@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      9aa0c0dd
    • Steven Rostedt (Red Hat)'s avatar
      ring-buffer: Always reset iterator to reader page · 428e6bb1
      Steven Rostedt (Red Hat) authored
      commit 651e22f2 upstream.
      
      When performing a consuming read, the ring buffer swaps out a
      page from the ring buffer with a empty page and this page that
      was swapped out becomes the new reader page. The reader page
      is owned by the reader and since it was swapped out of the ring
      buffer, writers do not have access to it (there's an exception
      to that rule, but it's out of scope for this commit).
      
      When reading the "trace" file, it is a non consuming read, which
      means that the data in the ring buffer will not be modified.
      When the trace file is opened, a ring buffer iterator is allocated
      and writes to the ring buffer are disabled, such that the iterator
      will not have issues iterating over the data.
      
      Although the ring buffer disabled writes, it does not disable other
      reads, or even consuming reads. If a consuming read happens, then
      the iterator is reset and starts reading from the beginning again.
      
      My tests would sometimes trigger this bug on my i386 box:
      
      WARNING: CPU: 0 PID: 5175 at kernel/trace/trace.c:1527 __trace_find_cmdline+0x66/0xaa()
      Modules linked in:
      CPU: 0 PID: 5175 Comm: grep Not tainted 3.16.0-rc3-test+ #8
      Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
       00000000 00000000 f09c9e1c c18796b3 c1b5d74c f09c9e4c c103a0e3 c1b5154b
       f09c9e78 00001437 c1b5d74c 000005f7 c10bd85a c10bd85a c1cac57c f09c9eb0
       ed0e0000 f09c9e64 c103a185 00000009 f09c9e5c c1b5154b f09c9e78 f09c9e80^M
      Call Trace:
       [<c18796b3>] dump_stack+0x4b/0x75
       [<c103a0e3>] warn_slowpath_common+0x7e/0x95
       [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa
       [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa
       [<c103a185>] warn_slowpath_fmt+0x33/0x35
       [<c10bd85a>] __trace_find_cmdline+0x66/0xaa^M
       [<c10bed04>] trace_find_cmdline+0x40/0x64
       [<c10c3c16>] trace_print_context+0x27/0xec
       [<c10c4360>] ? trace_seq_printf+0x37/0x5b
       [<c10c0b15>] print_trace_line+0x319/0x39b
       [<c10ba3fb>] ? ring_buffer_read+0x47/0x50
       [<c10c13b1>] s_show+0x192/0x1ab
       [<c10bfd9a>] ? s_next+0x5a/0x7c
       [<c112e76e>] seq_read+0x267/0x34c
       [<c1115a25>] vfs_read+0x8c/0xef
       [<c112e507>] ? seq_lseek+0x154/0x154
       [<c1115ba2>] SyS_read+0x54/0x7f
       [<c188488e>] syscall_call+0x7/0xb
      ---[ end trace 3f507febd6b4cc83 ]---
      >>>> ##### CPU 1 buffer started ####
      
      Which was the __trace_find_cmdline() function complaining about the pid
      in the event record being negative.
      
      After adding more test cases, this would trigger more often. Strangely
      enough, it would never trigger on a single test, but instead would trigger
      only when running all the tests. I believe that was the case because it
      required one of the tests to be shutting down via delayed instances while
      a new test started up.
      
      After spending several days debugging this, I found that it was caused by
      the iterator becoming corrupted. Debugging further, I found out why
      the iterator became corrupted. It happened with the rb_iter_reset().
      
      As consuming reads may not read the full reader page, and only part
      of it, there's a "read" field to know where the last read took place.
      The iterator, must also start at the read position. In the rb_iter_reset()
      code, if the reader page was disconnected from the ring buffer, the iterator
      would start at the head page within the ring buffer (where writes still
      happen). But the mistake there was that it still used the "read" field
      to start the iterator on the head page, where it should always start
      at zero because readers never read from within the ring buffer where
      writes occur.
      
      I originally wrote a patch to have it set the iter->head to 0 instead
      of iter->head_page->read, but then I questioned why it wasn't always
      setting the iter to point to the reader page, as the reader page is
      still valid.  The list_empty(reader_page->list) just means that it was
      successful in swapping out. But the reader_page may still have data.
      
      There was a bug report a long time ago that was not reproducible that
      had something about trace_pipe (consuming read) not matching trace
      (iterator read). This may explain why that happened.
      
      Anyway, the correct answer to this bug is to always use the reader page
      an not reset the iterator to inside the writable ring buffer.
      
      Fixes: d769041f "ring_buffer: implement new locking"
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      428e6bb1
    • Steven Rostedt (Red Hat)'s avatar
      ring-buffer: Up rb_iter_peek() loop count to 3 · 6570cd4e
      Steven Rostedt (Red Hat) authored
      commit 021de3d9 upstream.
      
      After writting a test to try to trigger the bug that caused the
      ring buffer iterator to become corrupted, I hit another bug:
      
       WARNING: CPU: 1 PID: 5281 at kernel/trace/ring_buffer.c:3766 rb_iter_peek+0x113/0x238()
       Modules linked in: ipt_MASQUERADE sunrpc [...]
       CPU: 1 PID: 5281 Comm: grep Tainted: G        W     3.16.0-rc3-test+ #143
       Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
        0000000000000000 ffffffff81809a80 ffffffff81503fb0 0000000000000000
        ffffffff81040ca1 ffff8800796d6010 ffffffff810c138d ffff8800796d6010
        ffff880077438c80 ffff8800796d6010 ffff88007abbe600 0000000000000003
       Call Trace:
        [<ffffffff81503fb0>] ? dump_stack+0x4a/0x75
        [<ffffffff81040ca1>] ? warn_slowpath_common+0x7e/0x97
        [<ffffffff810c138d>] ? rb_iter_peek+0x113/0x238
        [<ffffffff810c138d>] ? rb_iter_peek+0x113/0x238
        [<ffffffff810c14df>] ? ring_buffer_iter_peek+0x2d/0x5c
        [<ffffffff810c6f73>] ? tracing_iter_reset+0x6e/0x96
        [<ffffffff810c74a3>] ? s_start+0xd7/0x17b
        [<ffffffff8112b13e>] ? kmem_cache_alloc_trace+0xda/0xea
        [<ffffffff8114cf94>] ? seq_read+0x148/0x361
        [<ffffffff81132d98>] ? vfs_read+0x93/0xf1
        [<ffffffff81132f1b>] ? SyS_read+0x60/0x8e
        [<ffffffff8150bf9f>] ? tracesys+0xdd/0xe2
      
      Debugging this bug, which triggers when the rb_iter_peek() loops too
      many times (more than 2 times), I discovered there's a case that can
      cause that function to legitimately loop 3 times!
      
      rb_iter_peek() is different than rb_buffer_peek() as the rb_buffer_peek()
      only deals with the reader page (it's for consuming reads). The
      rb_iter_peek() is for traversing the buffer without consuming it, and as
      such, it can loop for one more reason. That is, if we hit the end of
      the reader page or any page, it will go to the next page and try again.
      
      That is, we have this:
      
       1. iter->head > iter->head_page->page->commit
          (rb_inc_iter() which moves the iter to the next page)
          try again
      
       2. event = rb_iter_head_event()
          event->type_len == RINGBUF_TYPE_TIME_EXTEND
          rb_advance_iter()
          try again
      
       3. read the event.
      
      But we never get to 3, because the count is greater than 2 and we
      cause the WARNING and return NULL.
      
      Up the counter to 3.
      
      Fixes: 69d1b839 "ring-buffer: Bind time extend and data events together"
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      6570cd4e
    • Christian Borntraeger's avatar
      s390/locking: Reenable optimistic spinning · f7772b7b
      Christian Borntraeger authored
      commit 36e7fdaa upstream.
      
      commit 4badad35 (locking/mutex: Disable
      optimistic spinning on some architectures) fenced spinning for
      architectures without proper cmpxchg.
      There is no need to disable mutex spinning on s390, though:
      The instructions CS,CSG and friends provide the proper guarantees.
      (We dont implement cmpxchg with locks).
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      f7772b7b
    • Axel Lin's avatar
      hwmon: (dme1737) Prevent overflow problem when writing large limits · d1771883
      Axel Lin authored
      commit d58e47d7 upstream.
      
      On platforms with sizeof(int) < sizeof(long), writing a temperature
      limit larger than MAXINT will result in unpredictable limit values
      written to the chip. Avoid auto-conversion from long to int to fix
      the problem.
      
      Voltage limits, fan minimum speed, pwm frequency, pwm ramp rate, and
      other attributes have the same problem, fix them as well.
      
      Zone temperature limits are signed, but were cached as u8, causing
      unepected values to be reported for negative temperatures. Cache as
      s8 to fix the problem.
      
      vrm is an u8, so the written value needs to be limited to [0, 255].
      Signed-off-by: default avatarAxel Lin <axel.lin@ingics.com>
      [Guenter Roeck: Fix zone temperature cache]
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      d1771883
    • Axel Lin's avatar
      hwmon: (ads1015) Fix out-of-bounds array access · 97603c44
      Axel Lin authored
      commit e9814295 upstream.
      
      Current code uses data_rate as array index in ads1015_read_adc() and uses pga
      as array index in ads1015_reg_to_mv, so we must make sure both data_rate and
      pga settings are in valid value range.
      Return -EINVAL if the setting is out-of-range.
      Signed-off-by: default avatarAxel Lin <axel.lin@ingics.com>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      97603c44
    • Marc Zyngier's avatar
      net: sun4i-emac: fix memory leak on bad packet · d32e8241
      Marc Zyngier authored
      commit 2670cc69 upstream.
      
      Upon reception of a new frame, the emac driver checks for a number
      of error conditions, and flag the packet as "bad" if any of these
      are present. It then allocates a skb unconditionally, but only uses
      it if the packet is "good". On the error path, the skb is just forgotten,
      and the system leaks memory.
      
      The piece of junk I have on my desk seems to encounter such error
      frequently enough so that the box goes OOM after a couple of days,
      which makes me grumpy.
      
      Fix this by moving the allocation on the "good_packet" path (and
      convert it to netdev_alloc_skb while we're at it).
      
      Tested on a random Allwinner A20 board.
      
      Cc: Stefan Roese <sr@denx.de>
      Cc: Maxime Ripard <maxime.ripard@free-electrons.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Acked-by: default avatarMaxime Ripard <maxime.ripard@free-electrons.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      d32e8241
    • Matt Fleming's avatar
      x86/efi: Enforce CONFIG_RELOCATABLE for EFI boot stub · 37c8ee0c
      Matt Fleming authored
      commit 7b2a583a upstream.
      
      Without CONFIG_RELOCATABLE the early boot code will decompress the
      kernel to LOAD_PHYSICAL_ADDR. While this may have been fine in the BIOS
      days, that isn't going to fly with UEFI since parts of the firmware
      code/data may be located at LOAD_PHYSICAL_ADDR.
      
      Straying outside of the bounds of the regions we've explicitly requested
      from the firmware will cause all sorts of trouble. Bruno reports that
      his machine resets while trying to decompress the kernel image.
      
      We already go to great pains to ensure the kernel is loaded into a
      suitably aligned buffer, it's just that the address isn't necessarily
      LOAD_PHYSICAL_ADDR, because we can't guarantee that address isn't in-use
      by the firmware.
      
      Explicitly enforce CONFIG_RELOCATABLE for the EFI boot stub, so that we
      can load the kernel at any address with the correct alignment.
      Reported-by: default avatarBruno Prémont <bonbons@linux-vserver.org>
      Tested-by: default avatarBruno Prémont <bonbons@linux-vserver.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      37c8ee0c
    • Steve Wise's avatar
      RDMA/iwcm: Use a default listen backlog if needed · 77eefef0
      Steve Wise authored
      commit 2f0304d2 upstream.
      
      If the user creates a listening cm_id with backlog of 0 the IWCM ends
      up not allowing any connection requests at all.  The correct behavior
      is for the IWCM to pick a default value if the user backlog parameter
      is zero.
      
      Lustre from version 1.8.8 onward uses a backlog of 0, which breaks
      iwarp support without this fix.
      Signed-off-by: default avatarSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      77eefef0
    • Wanpeng Li's avatar
      KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use · 45a1ef32
      Wanpeng Li authored
      commit 56cc2406 upstream.
      
      After commit 77b0f5d6 (KVM: nVMX: Ack and write vector info to intr_info
      if L1 asks us to), "Acknowledge interrupt on exit" behavior can be
      emulated. To do so, KVM will ask the APIC for the interrupt vector if
      during a nested vmexit if VM_EXIT_ACK_INTR_ON_EXIT is set.  With APICv,
      kvm_get_apic_interrupt would return -1 and give the following WARNING:
      
      Call Trace:
       [<ffffffff81493563>] dump_stack+0x49/0x5e
       [<ffffffff8103f0eb>] warn_slowpath_common+0x7c/0x96
       [<ffffffffa059709a>] ? nested_vmx_vmexit+0xa4/0x233 [kvm_intel]
       [<ffffffff8103f11a>] warn_slowpath_null+0x15/0x17
       [<ffffffffa059709a>] nested_vmx_vmexit+0xa4/0x233 [kvm_intel]
       [<ffffffffa0594295>] ? nested_vmx_exit_handled+0x6a/0x39e [kvm_intel]
       [<ffffffffa0537931>] ? kvm_apic_has_interrupt+0x80/0xd5 [kvm]
       [<ffffffffa05972ec>] vmx_check_nested_events+0xc3/0xd3 [kvm_intel]
       [<ffffffffa051ebe9>] inject_pending_event+0xd0/0x16e [kvm]
       [<ffffffffa051efa0>] vcpu_enter_guest+0x319/0x704 [kvm]
      
      To fix this, we cannot rely on the processor's virtual interrupt delivery,
      because "acknowledge interrupt on exit" must only update the virtual
      ISR/PPR/IRR registers (and SVI, which is just a cache of the virtual ISR)
      but it should not deliver the interrupt through the IDT.  Thus, KVM has
      to deliver the interrupt "by hand", similar to the treatment of EOI in
      commit fc57ac2c (KVM: lapic: sync highest ISR to hardware apic on
      EOI, 2014-05-14).
      
      The patch modifies kvm_cpu_get_interrupt to always acknowledge an
      interrupt; there are only two callers, and the other is not affected
      because it is never reached with kvm_apic_vid_enabled() == true.  Then it
      modifies apic_set_isr and apic_clear_irr to update SVI and RVI in addition
      to the registers.
      Suggested-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Suggested-by: default avatar"Zhang, Yang Z" <yang.z.zhang@intel.com>
      Tested-by: default avatarLiu, RongrongX <rongrongx.liu@intel.com>
      Tested-by: default avatarFelipe Reyes <freyes@suse.com>
      Fixes: 77b0f5d6Signed-off-by: default avatarWanpeng Li <wanpeng.li@linux.intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      45a1ef32
    • Alex Deucher's avatar
      drm/radeon: tweak ACCEL_WORKING2 query for hawaii · aed63860
      Alex Deucher authored
      commit 3c64bd26 upstream.
      
      Return 2 so we can be sure the kernel has the necessary
      changes for acceleration to work.
      
      Note: This patch depends on these two commits:
       - drm/radeon: fix cut and paste issue for hawaii.
       - drm/radeon: use packet2 for nop on hawaii with old firmware
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarAndreas Boll <andreas.boll.dev@gmail.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      aed63860
    • Alex Deucher's avatar
      drm/radeon: use packet2 for nop on hawaii with old firmware · 519eac7c
      Alex Deucher authored
      commit 0e16e4cf upstream.
      
      Older firmware didn't support the new nop packet.
      
      v2 (Andreas Boll):
       - Drop usage of packet3 for new firmware
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com> (v1)
      Signed-off-by: default avatarAndreas Boll <andreas.boll.dev@gmail.com>
      [ kamal: backport to 3.13: context ]
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      519eac7c
    • Alex Deucher's avatar
      drm/radeon: re-enable dpm by default on BTC · c031389a
      Alex Deucher authored
      commit c08abf11 upstream.
      
      This patch depends on:
      e0792981
      (drm/radeon/dpm: fix typo in vddci setup for eg/btc)
      
      bugs:
      https://bugs.freedesktop.org/show_bug.cgi?id=73053
      https://bugzilla.kernel.org/show_bug.cgi?id=68571Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      c031389a
    • Alex Deucher's avatar
      drm/radeon: re-enable dpm by default on cayman · 43ee7956
      Alex Deucher authored
      commit 8f500af4 upstream.
      
      This patch depends on:
      b0880e87
      (drm/radeon/dpm: fix vddci setup typo on cayman)
      
      bug:
      https://bugs.freedesktop.org/show_bug.cgi?id=69723Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      43ee7956
    • Alex Deucher's avatar
      drm/radeon/dpm: handle voltage info fetching on hawaii · 1e22169f
      Alex Deucher authored
      commit 6b57f20c upstream.
      
      Some hawaii cards use a different method to fetch the
      voltage info from the vbios.
      
      bug:
      https://bugs.freedesktop.org/show_bug.cgi?id=74250Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      1e22169f
    • Alex Deucher's avatar
      drm/radeon/atom: add new voltage fetch function for hawaii · 6f846ebf
      Alex Deucher authored
      commit e9f274b2 upstream.
      
      Some hawaii boards use a different method for fetching the
      voltage information from the vbios.
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      6f846ebf
    • Christian König's avatar
      drm/radeon: set VM base addr using the PFP v2 · 382d6be0
      Christian König authored
      commit f1d2a26b upstream.
      
      Seems to make VM flushes more stable on SI and CIK.
      
      v2: only use the PFP on the GFX ring on CIK
      Signed-off-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      382d6be0
    • Alex Deucher's avatar
      drm/radeon: load the lm63 driver for an lm64 thermal chip. · 7b421b86
      Alex Deucher authored
      commit 5dc35532 upstream.
      
      Looks like the lm63 driver supports the lm64 as well.
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      7b421b86
    • Tyrel Datwyler's avatar
      powerpc/pci: Reorder pci bus/bridge unregistration during PHB removal · d3a130af
      Tyrel Datwyler authored
      commit 73400565 upstream.
      
      Commit bcdde7e2 made __sysfs_remove_dir() recursive and introduced a BUG_ON
      during PHB removal while attempting to delete the power managment attribute
      group of the bus. This is a result of tearing the bridge and bus devices down
      out of order in remove_phb_dynamic. Since, the the bus resides below the bridge
      in the sysfs device tree it should be torn down first.
      
      This patch simply moves the device_unregister call for the PHB bridge device
      after the device_unregister call for the PHB bus.
      
      Fixes: bcdde7e2 ("sysfs: make __sysfs_remove_dir() recursive")
      Signed-off-by: default avatarTyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      d3a130af
    • Andrey Utkin's avatar
    • Tetsuo Handa's avatar
      drm/ttm: Pass GFP flags in order to avoid deadlock. · 25c4b2ce
      Tetsuo Handa authored
      commit a91576d7 upstream.
      
      Commit 7dc19d5a "drivers: convert shrinkers to new count/scan API" added
      deadlock warnings that ttm_page_pool_free() and ttm_dma_page_pool_free()
      are currently doing GFP_KERNEL allocation.
      
      But these functions did not get updated to receive gfp_t argument.
      This patch explicitly passes sc->gfp_mask or GFP_KERNEL to these functions,
      and removes the deadlock warning.
      Signed-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      25c4b2ce
    • Tetsuo Handa's avatar
      drm/ttm: Fix possible stack overflow by recursive shrinker calls. · 5ac857f7
      Tetsuo Handa authored
      commit 71336e01 upstream.
      
      While ttm_dma_pool_shrink_scan() tries to take mutex before doing GFP_KERNEL
      allocation, ttm_pool_shrink_scan() does not do it. This can result in stack
      overflow if kmalloc() in ttm_page_pool_free() triggered recursion due to
      memory pressure.
      
        shrink_slab()
        => ttm_pool_shrink_scan()
           => ttm_page_pool_free()
              => kmalloc(GFP_KERNEL)
                 => shrink_slab()
                    => ttm_pool_shrink_scan()
                       => ttm_page_pool_free()
                          => kmalloc(GFP_KERNEL)
      
      Change ttm_pool_shrink_scan() to do like ttm_dma_pool_shrink_scan() does.
      Signed-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      5ac857f7