1. 26 Aug, 2012 23 commits
    • Sarah Sharp's avatar
      xhci: Fix bug after deq ptr set to link TRB. · 1aac2e73
      Sarah Sharp authored
      commit 50d0206f upstream.
      
      This patch fixes a particularly nasty bug that was revealed by the ring
      expansion patches.  The bug has been present since the very beginning of
      the xHCI driver history, and could have caused general protection faults
      from bad memory accesses.
      
      The first thing to note is that a Set TR Dequeue Pointer command can
      move the dequeue pointer to a link TRB, if the canceled or stalled
      transfer TD ended just before a link TRB.  The function to increment the
      dequeue pointer, inc_deq, was written before cancellation and stall
      support was added.  It assumed that the dequeue pointer could never
      point to a link TRB.  It would unconditionally increment the dequeue
      pointer at the start of the function, check if the pointer was now on a
      link TRB, and move it to the top of the next segment if so.
      
      This means that if a Set TR Dequeue Point command moved the dequeue
      pointer to a link TRB, a subsequent call to inc_deq() would move the
      pointer off the segment and into la-la-land.  It would then read from
      that memory to determine if it was a link TRB.  Other functions would
      often call inc_deq() until the dequeue pointer matched some other
      pointer, which means this function would quite happily read all of
      system memory before wrapping around to the right pointer value.
      
      Often, there would be another endpoint segment from a different ring
      allocated from the same DMA pool, which would be contiguous to the
      segment inc_deq just stepped off of.  inc_deq would eventually find the
      link TRB in that segment, and blindly move the dequeue pointer back to
      the top of the correct ring segment.
      
      The only reason the original code worked at all is because there was
      only one ring segment.  With the ring expansion patches, the dequeue
      pointer would eventually wrap into place, but the dequeue segment would
      be out-of-sync.  On the second TD after the dequeue pointer was moved to
      a link TRB, trb_in_td() would fail (because the dequeue pointer and
      dequeue segment were out-of-sync), and this message would appear:
      
      ERROR Transfer event TRB DMA ptr not part of current TD
      
      This fixes bugzilla entry 4333 (option-based modem unhappy on USB 3.0
      port: "Transfer event TRB DMA ptr not part of current TD", "rejecting
      I/O to offline device"),
      
      	https://bugzilla.kernel.org/show_bug.cgi?id=43333
      
      and possibly other general protection fault bugs as well.
      
      This patch should be backported to kernels as old as 2.6.31.  A separate
      patch will be created for kernels older than 3.4, since inc_deq was
      modified in 3.4 and this patch will not apply.
      Signed-off-by: default avatarSarah Sharp <sarah.a.sharp@linux.intel.com>
      Tested-by: default avatarJames Ettle <theholyettlz@googlemail.com>
      Tested-by: default avatarMatthew Hall <mhall@mhcomputing.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1aac2e73
    • Sarah Sharp's avatar
      xhci: Switch PPT ports to EHCI on shutdown. · 0adf7a08
      Sarah Sharp authored
      commit e95829f4 upstream.
      
      The Intel desktop boards DH77EB and DH77DF have a hardware issue that
      can be worked around by BIOS.  If the USB ports are switched to xHCI on
      shutdown, the xHCI host will send a spurious interrupt, which will wake
      the system.  Some BIOS will work around this, but not all.
      
      The bug can be avoided if the USB ports are switched back to EHCI on
      shutdown.  The Intel Windows driver switches the ports back to EHCI, so
      change the Linux xHCI driver to do the same.
      
      Unfortunately, we can't tell the two effected boards apart from other
      working motherboards, because the vendors will change the DMI strings
      for the DH77EB and DH77DF boards to their own custom names.  One example
      is Compulab's mini-desktop, the Intense-PC.  Instead, key off the
      Panther Point xHCI host PCI vendor and device ID, and switch the ports
      over for all PPT xHCI hosts.
      
      The only impact this will have on non-effected boards is to add a couple
      hundred milliseconds delay on boot when the BIOS has to switch the ports
      over from EHCI to xHCI.
      
      This patch should be backported to kernels as old as 3.0, that contain
      the commit 69e848c2 "Intel xhci: Support
      EHCI/xHCI port switching."
      Signed-off-by: default avatarSarah Sharp <sarah.a.sharp@linux.intel.com>
      Reported-by: default avatarDenis Turischev <denis@compulab.co.il>
      Tested-by: default avatarDenis Turischev <denis@compulab.co.il>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0adf7a08
    • Sarah Sharp's avatar
      xhci: Increase reset timeout for Renesas 720201 host. · ebd311e0
      Sarah Sharp authored
      commit 22ceac19 upstream.
      
      The NEC/Renesas 720201 xHCI host controller does not complete its reset
      within 250 milliseconds.  In fact, it takes about 9 seconds to reset the
      host controller, and 1 second for the host to be ready for doorbell
      rings.  Extend the reset and CNR polling timeout to 10 seconds each.
      
      This patch should be backported to kernels as old as 2.6.31, that
      contain the commit 66d4eadd "USB: xhci:
      BIOS handoff and HW initialization."
      Signed-off-by: default avatarSarah Sharp <sarah.a.sharp@linux.intel.com>
      Reported-by: default avatarEdwin Klein Mentink <e.kleinmentink@zonnet.nl>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ebd311e0
    • Sarah Sharp's avatar
      xhci: Add Etron XHCI_TRUST_TX_LENGTH quirk. · 40c293d8
      Sarah Sharp authored
      commit 5cb7df2b upstream.
      
      Gary reports that with recent kernels, he notices more xHCI driver
      warnings:
      
      xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
      
      We think his Etron xHCI host controller may have the same buggy behavior
      as the Fresco Logic xHCI host.  When a short transfer is received, the
      host will mark the transfer as successfully completed when it should be
      marking it with a short completion.
      
      Fix this by turning on the XHCI_TRUST_TX_LENGTH quirk when the Etron
      host is discovered.  Note that Gary has revision 1, but if Etron fixes
      this bug in future revisions, the quirk will have no effect.
      
      This patch should be backported to kernels as old as 2.6.36, that
      contain a backported version of commit
      1530bbc6 "xhci: Add new short TX quirk
      for Fresco Logic host."
      Signed-off-by: default avatarSarah Sharp <sarah.a.sharp@linux.intel.com>
      Reported-by: default avatarGary E. Miller <gem@rellim.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      40c293d8
    • Theodore Ts'o's avatar
      ext4: fix kernel BUG on large-scale rm -rf commands · d8b868fd
      Theodore Ts'o authored
      commit 89a4e48f upstream.
      
      Commit 968dee77: "ext4: fix hole punch failure when depth is greater
      than 0" introduced a regression in v3.5.1/v3.6-rc1 which caused kernel
      crashes when users ran run "rm -rf" on large directory hierarchy on
      ext4 filesystems on RAID devices:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
      
          Process rm (pid: 18229, threadinfo ffff8801276bc000, task ffff880123631710)
          Call Trace:
           [<ffffffff81236483>] ? __ext4_handle_dirty_metadata+0x83/0x110
           [<ffffffff812353d3>] ext4_ext_truncate+0x193/0x1d0
           [<ffffffff8120a8cf>] ? ext4_mark_inode_dirty+0x7f/0x1f0
           [<ffffffff81207e05>] ext4_truncate+0xf5/0x100
           [<ffffffff8120cd51>] ext4_evict_inode+0x461/0x490
           [<ffffffff811a1312>] evict+0xa2/0x1a0
           [<ffffffff811a1513>] iput+0x103/0x1f0
           [<ffffffff81196d84>] do_unlinkat+0x154/0x1c0
           [<ffffffff8118cc3a>] ? sys_newfstatat+0x2a/0x40
           [<ffffffff81197b0b>] sys_unlinkat+0x1b/0x50
           [<ffffffff816135e9>] system_call_fastpath+0x16/0x1b
          Code: 8b 4d 20 0f b7 41 02 48 8d 04 40 48 8d 04 81 49 89 45 18 0f b7 49 02 48 83 c1 01 49 89 4d 00 e9 ae f8 ff ff 0f 1f 00 49 8b 45 28 <48> 8b 40 28 49 89 45 20 e9 85 f8 ff ff 0f 1f 80 00 00 00
      
          RIP  [<ffffffff81233164>] ext4_ext_remove_space+0xa34/0xdf0
      
      This could be reproduced as follows:
      
      The problem in commit 968dee77 was that caused the variable 'i' to
      be left uninitialized if the truncate required more space than was
      available in the journal.  This resulted in the function
      ext4_ext_truncate_extend_restart() returning -EAGAIN, which caused
      ext4_ext_remove_space() to restart the truncate operation after
      starting a new jbd2 handle.
      Reported-by: default avatarMaciej Żenczykowski <maze@google.com>
      Reported-by: default avatarMarti Raudsepp <marti@juffo.org>
      Tested-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d8b868fd
    • Theodore Ts'o's avatar
      ext4: fix long mount times on very big file systems · e1b5abaa
      Theodore Ts'o authored
      commit 0548bbb8 upstream.
      
      Commit 8aeb00ff: "ext4: fix overhead calculation used by
      ext4_statfs()" introduced a O(n**2) calculation which makes very large
      file systems take forever to mount.  Fix this with an optimization for
      non-bigalloc file systems.  (For bigalloc file systems the overhead
      needs to be set in the the superblock.)
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e1b5abaa
    • Theodore Ts'o's avatar
      ext4: avoid kmemcheck complaint from reading uninitialized memory · 9a321bec
      Theodore Ts'o authored
      commit 7e731bc9 upstream.
      
      Commit 03179fe9 introduced a kmemcheck complaint in
      ext4_da_get_block_prep() because we save and restore
      ei->i_da_metadata_calc_last_lblock even though it is left
      uninitialized in the case where i_da_metadata_calc_len is zero.
      
      This doesn't hurt anything, but silencing the kmemcheck complaint
      makes it easier for people to find real bugs.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=45631
      (which is marked as a regression).
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9a321bec
    • Theodore Ts'o's avatar
      ext4: make sure the journal sb is written in ext4_clear_journal_err() · 8b9f3861
      Theodore Ts'o authored
      commit d796c52e upstream.
      
      After we transfer set the EXT4_ERROR_FS bit in the file system
      superblock, it's not enough to call jbd2_journal_clear_err() to clear
      the error indication from journal superblock --- we need to call
      jbd2_journal_update_sb_errno() as well.  Otherwise, when the root file
      system is mounted read-only, the journal is replayed, and the error
      indicator is transferred to the superblock --- but the s_errno field
      in the jbd2 superblock is left set (since although we cleared it in
      memory, we never flushed it out to disk).
      
      This can end up confusing e2fsck.  We should make e2fsck more robust
      in this case, but the kernel shouldn't be leaving things in this
      confused state, either.
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b9f3861
    • Alex Deucher's avatar
      drm/radeon: fix bank tiling parameters on evergreen · 75a75718
      Alex Deucher authored
      commit c8d15edc upstream.
      
      Handle the 16 bank case.
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      75a75718
    • Alex Deucher's avatar
      drm/radeon: fix bank tiling parameters on cayman · e3d7a0f4
      Alex Deucher authored
      commit 5b23c904 upstream.
      
      Handle the 16 bank case.
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e3d7a0f4
    • Alex Deucher's avatar
      drm/radeon: add some new SI pci ids · 844bebfd
      Alex Deucher authored
      commit 2f292004 upstream.
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      844bebfd
    • Jerome Glisse's avatar
      drm/radeon: do not reenable crtc after moving vram start address · 3407b518
      Jerome Glisse authored
      commit 81ee8fb6 upstream.
      
      It seems we can not update the crtc scanout address. After disabling
      crtc, update to base address do not take effect after crtc being
      reenable leading to at least frame being scanout from the old crtc
      base address. Disabling crtc display request lead to same behavior.
      
      So after changing the vram address if we don't keep crtc disabled
      we will have the GPU trying to read some random system memory address
      with some iommu this will broke the crtc engine and will lead to
      broken display and iommu error message.
      
      So to avoid this, disable crtc. For flicker less boot we will need
      to avoid moving the vram start address.
      
      This patch should also fix :
      
      https://bugs.freedesktop.org/show_bug.cgi?id=42373Signed-off-by: default avatarJerome Glisse <jglisse@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3407b518
    • Alex Deucher's avatar
      drm/radeon: properly handle crtc powergating · b248464f
      Alex Deucher authored
      commit 6c0ae2ab upstream.
      
      Need to make sure the crtc is gated on before modesetting.
      Explicitly gate the crtc on in prepare() and set a flag
      so that the dpms functions don't gate it off during
      mode set.
      
      Noticed by sylware on IRC.
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b248464f
    • Daniel Vetter's avatar
      drm/i915: reorder edp disabling to fix ivb MacBook Air · acafb023
      Daniel Vetter authored
      commit 35a38556 upstream.
      
      eDP is tons of fun. It turns out that at least the new MacBook Air 5,1
      model absolutely doesn't like the new force vdd dance we've introduced
      in
      
      commit 6cb49835
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Sun May 20 17:14:50 2012 +0200
      
          drm/i915: enable vdd when switching off the eDP panel
      
      But that patch also tried to fix some neat edp sequence issue with the
      force_vdd timings. Closer inspection reveals that we've raised
      force_vdd only to do the aux channel communication dp_sink_dpms. If we
      move the edp_panel_off below that, we don't need any force_vdd for the
      disable sequence, which makes the Air happy.
      
      Unfortunately the reporter of the original bug that the above commit
      fixed is travelling, so we can't test whether this regresses things.
      But my theory is that since we don't check for any power-off ->
      force_vdd-on delays in edp_panel_vdd_on, this was the actual
      root-cause of this failure. With that force_vdd dance completely
      eliminated, I'm hopeful the original bug stays fixed, too.
      
      For reference the old bug, which hopefully doesn't get broken by this:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=43163
      
      In any case, regression fixers win over plain bugfixes, so this needs
      to go in asap.
      
      v2: The crucial pieces seems to be to clear the force_vdd flag
      uncoditionally, too, in edp_panel_off. Looks like this is left behind
      by the firmware somehow.
      
      v3: The Apple firmware seems to switch off the panel on it's own, hence
      we still need to keep force_vdd on, but properly clear it when switching
      the panel off.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=45671Tested-by: default avatarRoberto Romer <sildurin@gmail.com>
      Tested-by: default avatarDaniel Wagner <wagi@monom.org>
      Tested-by: default avatarKeith Packard <keithp@keithp.com>
      Cc: Keith Packard <keithp@keithp.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      acafb023
    • Daniel Vetter's avatar
      drm/i915: ignore eDP bpc settings from vbt · c02a2965
      Daniel Vetter authored
      commit 4344b813 upstream.
      
      This has originally been introduced to not oversubscribe the dp links
      in
      
      commit 885a5fb5
      Author: Zhenyu Wang <zhenyuw@linux.intel.com>
      Date:   Tue Jan 12 05:38:31 2010 +0800
      
          drm/i915: fix pixel color depth setting on eDP
      
      Since then we've fixed up the dp link bandwidth calculation code and
      should now automatically fall back to 6bpc dithering. So this is
      unnecessary.
      
      Furthermore it seems to break the new MacbookPro with retina display,
      hence let's just rip this out.
      Reported-by: default avatarBenoit Gschwind <gschwind@gnu-log.net>
      Cc: Benoit Gschwind <gschwind@gnu-log.net>
      Cc: Francois Rigaut <frigaut@gmail.com>
      Tested-by: default avatarBenoit Gschwind <gschwind@gnu-log.net>
      Tested-by: Bernhard Froemel <froemel at vmars tuwien.ac.at>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      --
      
      Testing feedback highgly welcome, and thanks for Benoit for finding
      out that the bpc computations are busted.
      -Daniel
      c02a2965
    • Daniel Vetter's avatar
      drm/i915: correctly order the ring init sequence · 57ecc93c
      Daniel Vetter authored
      commit 0d8957c8 upstream.
      
      We may only start to set up the new register values after having
      confirmed that the ring is truely off. Otherwise the hw might lose the
      newly written register values. This is caught later on in the init
      sequence, when we check whether the register writes have stuck.
      Reviewed-by: default avatarJani Nikula <jani.nikula@intel.com>
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50522Tested-by: default avatarYang Guang <guang.a.yang@intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      57ecc93c
    • Christoph Bumiller's avatar
    • Jesse Barnes's avatar
      drm/i915: prefer wide & slow to fast & narrow in DP configs · 6700eb4d
      Jesse Barnes authored
      commit 2514bc51 upstream.
      
      High frequency link configurations have the potential to cause trouble
      with long and/or cheap cables, so prefer slow and wide configurations
      instead.  This patch has the potential to cause trouble for eDP
      configurations that lie about available lanes, so if we run into that we
      can make it conditional on eDP.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45801
      Tested-by: peter@colberg.org
      Signed-off-by: default avatarJesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Jonathan Nieder <jrnieder@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6700eb4d
    • Stefano Stabellini's avatar
      xen: mark local pages as FOREIGN in the m2p_override · b9247be5
      Stefano Stabellini authored
      commit b9e0d95c upstream.
      
      When the frontend and the backend reside on the same domain, even if we
      add pages to the m2p_override, these pages will never be returned by
      mfn_to_pfn because the check "get_phys_to_machine(pfn) != mfn" will
      always fail, so the pfn of the frontend will be returned instead
      (resulting in a deadlock because the frontend pages are already locked).
      
      INFO: task qemu-system-i38:1085 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      qemu-system-i38 D ffff8800cfc137c0     0  1085      1 0x00000000
       ffff8800c47ed898 0000000000000282 ffff8800be4596b0 00000000000137c0
       ffff8800c47edfd8 ffff8800c47ec010 00000000000137c0 00000000000137c0
       ffff8800c47edfd8 00000000000137c0 ffffffff82213020 ffff8800be4596b0
      Call Trace:
       [<ffffffff81101ee0>] ? __lock_page+0x70/0x70
       [<ffffffff81a0fdd9>] schedule+0x29/0x70
       [<ffffffff81a0fe80>] io_schedule+0x60/0x80
       [<ffffffff81101eee>] sleep_on_page+0xe/0x20
       [<ffffffff81a0e1ca>] __wait_on_bit_lock+0x5a/0xc0
       [<ffffffff81101ed7>] __lock_page+0x67/0x70
       [<ffffffff8106f750>] ? autoremove_wake_function+0x40/0x40
       [<ffffffff811867e6>] ? bio_add_page+0x36/0x40
       [<ffffffff8110b692>] set_page_dirty_lock+0x52/0x60
       [<ffffffff81186021>] bio_set_pages_dirty+0x51/0x70
       [<ffffffff8118c6b4>] do_blockdev_direct_IO+0xb24/0xeb0
       [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
       [<ffffffff8118ca95>] __blockdev_direct_IO+0x55/0x60
       [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
       [<ffffffff811e91c8>] ext3_direct_IO+0xf8/0x390
       [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
       [<ffffffff81004b60>] ? xen_mc_flush+0xb0/0x1b0
       [<ffffffff81104027>] generic_file_aio_read+0x737/0x780
       [<ffffffff813bedeb>] ? gnttab_map_refs+0x15b/0x1e0
       [<ffffffff811038f0>] ? find_get_pages+0x150/0x150
       [<ffffffff8119736c>] aio_rw_vect_retry+0x7c/0x1d0
       [<ffffffff811972f0>] ? lookup_ioctx+0x90/0x90
       [<ffffffff81198856>] aio_run_iocb+0x66/0x1a0
       [<ffffffff811998b8>] do_io_submit+0x708/0xb90
       [<ffffffff81199d50>] sys_io_submit+0x10/0x20
       [<ffffffff81a18d69>] system_call_fastpath+0x16/0x1b
      
      The explanation is in the comment within the code:
      
      We need to do this because the pages shared by the frontend
      (xen-blkfront) can be already locked (lock_page, called by
      do_read_cache_page); when the userspace backend tries to use them
      with direct_IO, mfn_to_pfn returns the pfn of the frontend, so
      do_blockdev_direct_IO is going to try to lock the same pages
      again resulting in a deadlock.
      
      A simplified call graph looks like this:
      
      pygrub                          QEMU
      -----------------------------------------------
      do_read_cache_page              io_submit
        |                              |
      lock_page                       ext3_direct_IO
                                       |
                                      bio_add_page
                                       |
                                      lock_page
      
      Internally the xen-blkback uses m2p_add_override to swizzle (temporarily)
      a 'struct page' to have a different MFN (so that it can point to another
      guest). It also can easily find out whether another pfn corresponding
      to the mfn exists in the m2p, and can set the FOREIGN bit
      in the p2m, making sure that mfn_to_pfn returns the pfn of the backend.
      
      This allows the backend to perform direct_IO on these pages, but as a
      side effect prevents the frontend from using get_user_pages_fast on
      them while they are being shared with the backend.
      Signed-off-by: default avatarStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b9247be5
    • Zach Brown's avatar
      fuse: verify all ioctl retry iov elements · 90f9cb72
      Zach Brown authored
      commit fb6ccff6 upstream.
      
      Commit 7572777e attempted to verify that
      the total iovec from the client doesn't overflow iov_length() but it
      only checked the first element.  The iovec could still overflow by
      starting with a small element.  The obvious fix is to check all the
      elements.
      
      The overflow case doesn't look dangerous to the kernel as the copy is
      limited by the length after the overflow.  This fix restores the
      intention of returning an error instead of successfully copying less
      than the iovec represented.
      
      I found this by code inspection.  I built it but don't have a test case.
      I'm cc:ing stable because the initial commit did as well.
      Signed-off-by: default avatarZach Brown <zab@redhat.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      90f9cb72
    • Fabio Estevam's avatar
      dma: imx-dma: Fix kernel crash due to missing clock conversion · 9ea2c02b
      Fabio Estevam authored
      commit a2367db2 upstream.
      
      With the new i.MX clock infrastructure we need to request the dma clocks
      seperately: ahb and ipg clocks.
      
      This fixes the following kernel crash and make audio to be functional again:
      
      root@freescale /home$ aplay audio48k16S.wav
      Playing WAVE 'audio48k16S.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo
      Unable to handle kernel NULL pointer dereference at virtual address 00000000
      pgd = c7b74000
      [00000000] *pgd=a7bb5831, *pte=00000000, *ppte=00000000
      Internal error: Oops: 17 [#1] PREEMPT ARM
      Modules linked in:
      CPU: 0    Not tainted  (3.5.0-rc5-next-20120702-00007-g3028b64 #1128)
      PC is at snd_dmaengine_pcm_get_chan+0x8/0x10
      LR is at snd_imx_pcm_hw_params+0x18/0xdc
      pc : [<c02d3cf8>]    lr : [<c02e95ec>]    psr: a0000013
      sp : c7b45e30  ip : ffffffff  fp : c7ae58e0
      r10: 00000000  r9 : c7ae981c  r8 : c7b88800
      r7 : c7ae5a60  r6 : c7ae5b20  r5 : c7ae9810  r4 : c7afa060
      r3 : 00000000  r2 : 00000001  r1 : c7b88800  r0 : c7afa060
      Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
      Control: 0005317f  Table: a7b74000  DAC: 00000015
      Process aplay (pid: 701, stack limit = 0xc7b44270)
      Stack: (0xc7b45e30 to 0xc7b46000)
      5e20:                                     00100000 00000029 c7b88800 c02db870
      5e40: c7ae5a60 c02d4594 00000010 01ae5a60 c7ae5a60 c7ae9810 c7ae9810 c7afa060
      5e60: c7ae5b20 c7ae5a60 c7b88800 c02e3ef0 c02e3e08 c7b1e400 c7afa060 c7b88800
      5e80: 00000000 c0014da8 c7b44000 00000000 bec566ac c02cd400 c7afa060 c7afa060
      5ea0: bec56800 c7b88800 c0014da8 c02cdd7c c04ee710 c04ee7b8 00000003 c005fc74
      5ec0: 00000000 7fffffff c7b45f00 c7afa060 c7b67420 c7ba3070 00000004 c0014da8
      5ee0: c7b44000 00000000 bec566ac c02ced88 c04e95f8 b6f5ab04 c7b45fb0 0145a468
      5f00: 0145a600 bec566bc bec56800 c7b67420 c7ba3070 c00d499c c7b45f18 c7b45f18
      5f20: 0000001a 00000004 00000001 c7b44000 c0527f40 00000009 00000008 00000000
      5f40: c7b44000 c002c9ec 00000001 c04f0ab0 c04ebec0 00000101 00000000 0000000a
      5f60: 60000093 c7b67420 bec56800 c25c4111 00000004 c0014da8 c7b44000 00000000
      5f80: bec566ac c00d4f38 b6ffb658 00000000 c0522d80 0145a468 b6fd5000 0145a418
      5fa0: 00000036 c0014c00 0145a468 b6fd5000 00000004 c25c4111 bec56800 00020001
      5fc0: 0145a468 b6fd5000 0145a418 00000036 0145a468 0145a600 bec566bc bec566ac
      5fe0: 0145a468 bec56388 b6f65ce4 b6dcebec 20000010 00000004 00000000 00000000
      [<c02d3cf8>] (snd_dmaengine_pcm_get_chan+0x8/0x10) from [<c02e95ec>] (snd_imx_pcm_hw_params+0x18/0xdc)
      [<c02e95ec>] (snd_imx_pcm_hw_params+0x18/0xdc) from [<c02e3ef0>] (soc_pcm_hw_params+0xe8/0x1f0)
      [<c02e3ef0>] (soc_pcm_hw_params+0xe8/0x1f0) from [<c02cd400>] (snd_pcm_hw_params+0x124/0x474)
      [<c02cd400>] (snd_pcm_hw_params+0x124/0x474) from [<c02cdd7c>] (snd_pcm_common_ioctl1+0x4b4/0xf74)
      [<c02cdd7c>] (snd_pcm_common_ioctl1+0x4b4/0xf74) from [<c02ced88>] (snd_pcm_playback_ioctl1+0x30/0x510)
      [<c02ced88>] (snd_pcm_playback_ioctl1+0x30/0x510) from [<c00d499c>] (do_vfs_ioctl+0x80/0x5e4)
      [<c00d499c>] (do_vfs_ioctl+0x80/0x5e4) from [<c00d4f38>] (sys_ioctl+0x38/0x60)
      [<c00d4f38>] (sys_ioctl+0x38/0x60) from [<c0014c00>] (ret_fast_syscall+0x0/0x2c)
      Code: e593000c e12fff1e e59030a0 e59330bc (e5930000)
      ---[ end trace fa518c8ba3a74e97 ]--
      Reported-by: default avatarJavier Martin <javier.martin@vista-silicon.com>
      Signed-off-by: default avatarFabio Estevam <fabio.estevam@freescale.com>
      Acked-by: default avatarSascha Hauer <s.hauer@pengutronix.de>
      Signed-off-by: default avatarVinod Koul <vinod.koul@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9ea2c02b
    • Heiko Carstens's avatar
      s390/compat: fix mmap compat system calls · a8753193
      Heiko Carstens authored
      commit e8587121 upstream.
      
      The native 31 bit and the compat behaviour for the mmap system calls differ:
      
      In native 31 bit mode the passed in address for the mmap system call will be
      unmodified passed to sys_mmap_pgoff().
      In compat mode however the passed in address will be modified with
      compat_ptr() which masks out the most significant bit.
      
      The result is that in native 31 bit mode each mmap request (with MAP_FIXED)
      will fail where the most significat bit is set, while in compat mode it
      may succeed.
      
      This odd behaviour was introduced with d3815898 "[S390] mmap: add missing
      compat_ptr conversion to both mmap compat syscalls".
      
      To restore a consistent behaviour accross native and compat mode this
      patch functionally reverts the above mentioned commit.
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a8753193
    • Heiko Carstens's avatar
      s390/compat: fix compat wrappers for process_vm system calls · bdac2ea2
      Heiko Carstens authored
      commit 82aabdb6 upstream.
      
      The compat wrappers incorrectly called the non compat versions of
      the system process_vm system calls.
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bdac2ea2
  2. 15 Aug, 2012 17 commits
    • Greg Kroah-Hartman's avatar
      Linux 3.4.9 · 196ad09b
      Greg Kroah-Hartman authored
      196ad09b
    • Stanislaw Gruszka's avatar
      rt61pci: fix NULL pointer dereference in config_lna_gain · 8f3ce224
      Stanislaw Gruszka authored
      commit deee0214 upstream.
      
      We can not pass NULL libconf->conf->channel to rt61pci_config() as it
      is dereferenced unconditionally in rt61pci_config_lna_gain() subroutine.
      
      Resolves:
      https://bugzilla.kernel.org/show_bug.cgi?id=44361
      
      Reported-and-tested-by: <dolohow@gmail.com>
      Signed-off-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8f3ce224
    • Chris Bagwell's avatar
      Input: wacom - Bamboo One 1024 pressure fix · 117b7f7c
      Chris Bagwell authored
      commit 6dc46351 upstream.
      
      Bamboo One's with ID of 0x6a and 0x6b were added with correct
      indication of 1024 pressure levels but the Graphire packet routine
      was only looking at 9 bits.  Increased to 10 bits.
      
      This bug caused these devices to roll over to zero pressure at half
      way mark.
      
      The other devices using this routine only support 256 or 512 range
      and look to fix unused bits at zero.
      Signed-off-by: default avatarChris Bagwell <chris@cnpbagwell.com>
      Reported-by: default avatarTushant Mirchandani <tushantin@gmail.com>
      Reviewed-by: default avatarPing Cheng <pingc@wacom.com>
      Signed-off-by: default avatarDmitry Torokhov <dtor@mail.ru>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      117b7f7c
    • Arnd Bergmann's avatar
      Input: eeti_ts: pass gpio value instead of IRQ · 95c92481
      Arnd Bergmann authored
      commit 4eef6cbf upstream.
      
      The EETI touchscreen asserts its IRQ line as soon as it has data in its
      internal buffers. The line is automatically deasserted once all data has
      been read via I2C. Hence, the driver has to monitor the GPIO line and
      cannot simply rely on the interrupt handler reception.
      
      In the current implementation of the driver, irq_to_gpio() is used to
      determine the GPIO number from the i2c_client's IRQ value.
      
      As irq_to_gpio() is not available on all platforms, this patch changes
      this and makes the driver ignore the passed in IRQ. Instead, a GPIO is
      added to the platform_data struct and gpio_to_irq is used to derive the
      IRQ from that GPIO. If this fails, bail out. The driver is only able to
      work in environments where the touchscreen GPIO can be mapped to an
      IRQ.
      
      Without this patch, building raumfeld_defconfig results in:
      
      drivers/input/touchscreen/eeti_ts.c: In function 'eeti_ts_irq_active':
      drivers/input/touchscreen/eeti_ts.c:65:2: error: implicit declaration of function 'irq_to_gpio' [-Werror=implicit-function-declaration]
      Signed-off-by: default avatarDaniel Mack <zonque@gmail.com>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Sven Neumann <s.neumann@raumfeld.com>
      Cc: linux-input@vger.kernel.org
      Cc: Haojian Zhuang <haojian.zhuang@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      95c92481
    • Tushar Dave's avatar
      e1000e: NIC goes up and immediately goes down · 733e9ad9
      Tushar Dave authored
      commit b7ec70be upstream.
      
      Found that commit d478eb44 was a bad commit.
      If the link partner is transmitting codeword (even if NULL codeword),
      then the RXCW.C bit will be set so check for RXCW.CW is unnecessary.
      Ref: RH BZ 840642
      Reported-by: default avatarFabio Futigami <ffutigam@redhat.com>
      Signed-off-by: default avatarTushar Dave <tushar.n.dave@intel.com>
      CC: Marcelo Ricardo Leitner <mleitner@redhat.com>
      Tested-by: default avatarAaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: default avatarPeter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      733e9ad9
    • Johannes Berg's avatar
      iwlwifi: disable greenfield transmissions as a workaround · 66f3d999
      Johannes Berg authored
      commit 50e2a30c upstream.
      
      There's a bug that causes the rate scaling to get stuck
      when it has to use single-stream rates with a peer that
      can do GF and SGI; the two are incompatible so we can't
      use them together, but that causes the algorithm to not
      work at all, it always rejects updates.
      
      Disable greenfield for now to prevent that problem.
      Reviewed-by: default avatarEmmanuel Grumbach <emmanuel.grumbach@intel.com>
      Tested-by: default avatarCesar Eduardo Barros <cesarb@cesarb.net>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      66f3d999
    • Stanislav Kinsbursky's avatar
      tun: don't zeroize sock->file on detach · c6c0038a
      Stanislav Kinsbursky authored
      commit 66d1b926 upstream.
      
      This is a fix for bug, introduced in 3.4 kernel by commit
      1ab5ecb9 ("tun: don't hold network
      namespace by tun sockets"), which, among other things, replaced simple
      sock_put() by sk_release_kernel(). Below is sequence, which leads to
      oops for non-persistent devices:
      
      tun_chr_close()
      tun_detach()				<== tun->socket.file = NULL
      tun_free_netdev()
      sk_release_sock()
      sock_release(sock->file == NULL)
      iput(SOCK_INODE(sock))			<== dereference on NULL pointer
      
      This patch just removes zeroing of socket's file from __tun_detach().
      sock_release() will do this.
      Reported-by: default avatarRuan Zhijie <ruanzhijie@hotmail.com>
      Tested-by: default avatarRuan Zhijie <ruanzhijie@hotmail.com>
      Acked-by: default avatarAl Viro <viro@ZenIV.linux.org.uk>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarStanislav Kinsbursky <skinsbursky@parallels.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c6c0038a
    • Liang Li's avatar
      cfg80211: fix interface combinations check for ADHOC(IBSS) · 510f1d14
      Liang Li authored
      partial of commit 8e8b41f9 upstream.
      
      As part of commit 463454b5 ("cfg80211: fix interface
      combinations check"), this extra check was introduced:
      
             if ((all_iftypes & used_iftypes) != used_iftypes)
                     goto cont;
      
      However, most wireless NIC drivers did not advertise ADHOC in
      wiphy.iface_combinations[i].limits[] and hence we'll get -EBUSY
      when we bring up a ADHOC wlan with commands similar to:
      
       # iwconfig wlan0 mode ad-hoc && ifconfig wlan0 up
      
      In commit 8e8b41f9 ("cfg80211: enforce lack of interface
      combinations"), the change below fixes the issue:
      
             if (total == 1)
                     return 0;
      
      But it also introduces other dependencies for stable. For example,
      a full cherry pick of 8e8b41f9 would introduce additional
      regressions unless we also start cherry picking driver specific
      fixes like the following:
      
        9b4760e3  ath5k: add possible wiphy interface combinations
        1ae2fc25  mac80211_hwsim: advertise interface combinations
        20c8e8dc  ath9k: add possible wiphy interface combinations
      
      And the purpose of the 'if (total == 1)' is to cover the specific
      use case (IBSS, adhoc) that was mentioned above. So we just pick
      the specific part out from 8e8b41f9 here.
      
      Doing so gives stable kernels a way to fix the change introduced
      by 463454b5, without having to make cherry picks specific to
      various NIC drivers.
      Signed-off-by: default avatarLiang Li <liang.li@windriver.com>
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      510f1d14
    • Daniel Drake's avatar
      cfg80211: process pending events when unregistering net device · 25320e75
      Daniel Drake authored
      commit 1f6fc43e upstream.
      
      libertas currently calls cfg80211_disconnected() when it is being
      brought down. This causes an event to be allocated, but since the
      wdev is already removed from the rdev by the time that the event
      processing work executes, the event is never processed or freed.
      http://article.gmane.org/gmane.linux.kernel.wireless.general/95666
      
      Fix this leak, and other possible situations, by processing the event
      queue when a device is being unregistered. Thanks to Johannes Berg for
      the suggestion.
      Signed-off-by: default avatarDaniel Drake <dsd@laptop.org>
      Reviewed-by: default avatarJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      25320e75
    • Arnd Bergmann's avatar
      ARM: pxa: remove irq_to_gpio from ezx-pcap driver · 5784dff6
      Arnd Bergmann authored
      commit 59ee93a5 upstream.
      
      The irq_to_gpio function was removed from the pxa platform
      in linux-3.2, and this driver has been broken since.
      
      There is actually no in-tree user of this driver that adds
      this platform device, but the driver can and does get enabled
      on some platforms.
      
      Without this patch, building ezx_defconfig results in:
      
      drivers/mfd/ezx-pcap.c: In function 'pcap_isr_work':
      drivers/mfd/ezx-pcap.c:205:2: error: implicit declaration of function 'irq_to_gpio' [-Werror=implicit-function-declaration]
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarHaojian Zhuang <haojian.zhuang@gmail.com>
      Cc: Samuel Ortiz <sameo@linux.intel.com>
      Cc: Daniel Ribeiro <drwyrm@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5784dff6
    • Shawn Guo's avatar
      ARM: dts: imx53-ard: add regulators for lan9220 · 0f27ec33
      Shawn Guo authored
      commit 1eec0c56 upstream.
      
      Since commit c7e963f6 (net/smsc911x: Add regulator support), the lan9220
      device tree probe fails on imx53-ard board, because the commit makes
      VDD33A and VDDVARIO supplies mandatory for the driver.
      
      Add a fixed dummy 3V3 supplying lan9220 to fix the regression.
      Signed-off-by: default avatarShawn Guo <shawn.guo@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0f27ec33
    • Marek Vasut's avatar
      ARM: mxs: Remove MMAP_MIN_ADDR setting from mxs_defconfig · 87a266d5
      Marek Vasut authored
      commit 3bed491c upstream.
      
      The CONFIG_DEFAULT_MMAP_MIN_ADDR was set to 65536 in mxs_defconfig,
      this caused severe breakage of userland applications since the upper
      limit for ARM is 32768. By default CONFIG_DEFAULT_MMAP_MIN_ADDR is
      set to 4096 and can also be changed via /proc/sys/vm/mmap_min_addr
      if needed.
      
      Quoting Russell King [1]:
      
      "4096 is also fine for ARM too. There's not much point in having
      defconfigs change it - that would just be pure noise in the config
      files."
      
      the CONFIG_DEFAULT_MMAP_MIN_ADDR can be removed from the defconfig
      altogether.
      
      This problem was introduced by commit cde7c41f (ARM: configs: add
      defconfig for mach-mxs).
      
      [1] http://marc.info/?l=linux-arm-kernel&m=134401593807820&w=2Signed-off-by: default avatarMarek Vasut <marex@denx.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Wolfgang Denk <wd@denx.de>
      Signed-off-by: default avatarShawn Guo <shawn.guo@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      87a266d5
    • Roland Dreier's avatar
      target: Check number of unmap descriptors against our limit · 62135663
      Roland Dreier authored
      commit 7409a665 upstream.
      
      Fail UNMAP commands that have more than our reported limit on unmap
      descriptors.
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      [bwh: Backported to 3.2: adjust filename]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      62135663
    • Roland Dreier's avatar
      target: Fix possible integer underflow in UNMAP emulation · c329e96a
      Roland Dreier authored
      commit b7fc7f37 upstream.
      
      It's possible for an initiator to send us an UNMAP command with a
      descriptor that is less than 8 bytes; in that case it's really bad for
      us to set an unsigned int to that value, subtract 8 from it, and then
      use that as a limit for our loop (since the value will wrap around to
      a huge positive value).
      
      Fix this by making size be signed and only looping if size >= 16 (ie
      if we have at least a full descriptor available).
      
      Also remove offset as an obfuscated name for the constant 8.
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      [bwh: Backported to 3.2: adjust filename, context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c329e96a
    • Roland Dreier's avatar
      target: Fix reading of data length fields for UNMAP commands · 6fb6469c
      Roland Dreier authored
      commit 1a5fa457 upstream.
      
      The UNMAP DATA LENGTH and UNMAP BLOCK DESCRIPTOR DATA LENGTH fields
      are in the unmap descriptor (the payload transferred to our data out
      buffer), not in the CDB itself.  Read them from the correct place in
      target_emulated_unmap.
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      [bwh: Backported to 3.2: adjust filename, context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6fb6469c
    • Roland Dreier's avatar
      target: Add range checking to UNMAP emulation · 74c90d4c
      Roland Dreier authored
      commit 2594e298 upstream.
      
      When processing an UNMAP command, we need to make sure that the number
      of blocks we're asked to UNMAP does not exceed our reported maximum
      number of blocks per UNMAP, and that the range of blocks we're
      unmapping doesn't go past the end of the device.
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      [bwh: Backported to 3.2: adjust filename, context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      74c90d4c
    • Mel Gorman's avatar
      mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables · 49ca2404
      Mel Gorman authored
      commit d833352a upstream.
      
      If a process creates a large hugetlbfs mapping that is eligible for page
      table sharing and forks heavily with children some of whom fault and
      others which destroy the mapping then it is possible for page tables to
      get corrupted.  Some teardowns of the mapping encounter a "bad pmd" and
      output a message to the kernel log.  The final teardown will trigger a
      BUG_ON in mm/filemap.c.
      
      This was reproduced in 3.4 but is known to have existed for a long time
      and goes back at least as far as 2.6.37.  It was probably was introduced
      in 2.6.20 by [39dde65c: shared page table for hugetlb page].  The messages
      look like this;
      
      [  ..........] Lots of bad pmd messages followed by this
      [  127.164256] mm/memory.c:391: bad pmd ffff880412e04fe8(80000003de4000e7).
      [  127.164257] mm/memory.c:391: bad pmd ffff880412e04ff0(80000003de6000e7).
      [  127.164258] mm/memory.c:391: bad pmd ffff880412e04ff8(80000003de0000e7).
      [  127.186778] ------------[ cut here ]------------
      [  127.186781] kernel BUG at mm/filemap.c:134!
      [  127.186782] invalid opcode: 0000 [#1] SMP
      [  127.186783] CPU 7
      [  127.186784] Modules linked in: af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf ext3 jbd dm_mod coretemp crc32c_intel usb_storage ghash_clmulni_intel aesni_intel i2c_i801 r8169 mii uas sr_mod cdrom sg iTCO_wdt iTCO_vendor_support shpchp serio_raw cryptd aes_x86_64 e1000e pci_hotplug dcdbas aes_generic container microcode ext4 mbcache jbd2 crc16 sd_mod crc_t10dif i915 drm_kms_helper drm i2c_algo_bit ehci_hcd ahci libahci usbcore rtc_cmos usb_common button i2c_core intel_agp video intel_gtt fan processor thermal thermal_sys hwmon ata_generic pata_atiixp libata scsi_mod
      [  127.186801]
      [  127.186802] Pid: 9017, comm: hugetlbfs-test Not tainted 3.4.0-autobuild #53 Dell Inc. OptiPlex 990/06D7TR
      [  127.186804] RIP: 0010:[<ffffffff810ed6ce>]  [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160
      [  127.186809] RSP: 0000:ffff8804144b5c08  EFLAGS: 00010002
      [  127.186810] RAX: 0000000000000001 RBX: ffffea000a5c9000 RCX: 00000000ffffffc0
      [  127.186811] RDX: 0000000000000000 RSI: 0000000000000009 RDI: ffff88042dfdad00
      [  127.186812] RBP: ffff8804144b5c18 R08: 0000000000000009 R09: 0000000000000003
      [  127.186813] R10: 0000000000000000 R11: 000000000000002d R12: ffff880412ff83d8
      [  127.186814] R13: ffff880412ff83d8 R14: 0000000000000000 R15: ffff880412ff83d8
      [  127.186815] FS:  00007fe18ed2c700(0000) GS:ffff88042dce0000(0000) knlGS:0000000000000000
      [  127.186816] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [  127.186817] CR2: 00007fe340000503 CR3: 0000000417a14000 CR4: 00000000000407e0
      [  127.186818] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  127.186819] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  127.186820] Process hugetlbfs-test (pid: 9017, threadinfo ffff8804144b4000, task ffff880417f803c0)
      [  127.186821] Stack:
      [  127.186822]  ffffea000a5c9000 0000000000000000 ffff8804144b5c48 ffffffff810ed83b
      [  127.186824]  ffff8804144b5c48 000000000000138a 0000000000001387 ffff8804144b5c98
      [  127.186825]  ffff8804144b5d48 ffffffff811bc925 ffff8804144b5cb8 0000000000000000
      [  127.186827] Call Trace:
      [  127.186829]  [<ffffffff810ed83b>] delete_from_page_cache+0x3b/0x80
      [  127.186832]  [<ffffffff811bc925>] truncate_hugepages+0x115/0x220
      [  127.186834]  [<ffffffff811bca43>] hugetlbfs_evict_inode+0x13/0x30
      [  127.186837]  [<ffffffff811655c7>] evict+0xa7/0x1b0
      [  127.186839]  [<ffffffff811657a3>] iput_final+0xd3/0x1f0
      [  127.186840]  [<ffffffff811658f9>] iput+0x39/0x50
      [  127.186842]  [<ffffffff81162708>] d_kill+0xf8/0x130
      [  127.186843]  [<ffffffff81162812>] dput+0xd2/0x1a0
      [  127.186845]  [<ffffffff8114e2d0>] __fput+0x170/0x230
      [  127.186848]  [<ffffffff81236e0e>] ? rb_erase+0xce/0x150
      [  127.186849]  [<ffffffff8114e3ad>] fput+0x1d/0x30
      [  127.186851]  [<ffffffff81117db7>] remove_vma+0x37/0x80
      [  127.186853]  [<ffffffff81119182>] do_munmap+0x2d2/0x360
      [  127.186855]  [<ffffffff811cc639>] sys_shmdt+0xc9/0x170
      [  127.186857]  [<ffffffff81410a39>] system_call_fastpath+0x16/0x1b
      [  127.186858] Code: 0f 1f 44 00 00 48 8b 43 08 48 8b 00 48 8b 40 28 8b b0 40 03 00 00 85 f6 0f 88 df fe ff ff 48 89 df e8 e7 cb 05 00 e9 d2 fe ff ff <0f> 0b 55 83 e2 fd 48 89 e5 48 83 ec 30 48 89 5d d8 4c 89 65 e0
      [  127.186868] RIP  [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160
      [  127.186870]  RSP <ffff8804144b5c08>
      [  127.186871] ---[ end trace 7cbac5d1db69f426 ]---
      
      The bug is a race and not always easy to reproduce.  To reproduce it I was
      doing the following on a single socket I7-based machine with 16G of RAM.
      
      $ hugeadm --pool-pages-max DEFAULT:13G
      $ echo $((18*1048576*1024)) > /proc/sys/kernel/shmmax
      $ echo $((18*1048576*1024)) > /proc/sys/kernel/shmall
      $ for i in `seq 1 9000`; do ./hugetlbfs-test; done
      
      On my particular machine, it usually triggers within 10 minutes but
      enabling debug options can change the timing such that it never hits.
      Once the bug is triggered, the machine is in trouble and needs to be
      rebooted.  The machine will respond but processes accessing proc like "ps
      aux" will hang due to the BUG_ON.  shutdown will also hang and needs a
      hard reset or a sysrq-b.
      
      The basic problem is a race between page table sharing and teardown.  For
      the most part page table sharing depends on i_mmap_mutex.  In some cases,
      it is also taking the mm->page_table_lock for the PTE updates but with
      shared page tables, it is the i_mmap_mutex that is more important.
      
      Unfortunately it appears to be also insufficient. Consider the following
      situation
      
      Process A					Process B
      ---------					---------
      hugetlb_fault					shmdt
        						LockWrite(mmap_sem)
          						  do_munmap
      						    unmap_region
      						      unmap_vmas
      						        unmap_single_vma
      						          unmap_hugepage_range
            						            Lock(i_mmap_mutex)
      							    Lock(mm->page_table_lock)
      							    huge_pmd_unshare/unmap tables <--- (1)
      							    Unlock(mm->page_table_lock)
            						            Unlock(i_mmap_mutex)
        huge_pte_alloc				      ...
          Lock(i_mmap_mutex)				      ...
          vma_prio_walk, find svma, spte		      ...
          Lock(mm->page_table_lock)			      ...
          share spte					      ...
          Unlock(mm->page_table_lock)			      ...
          Unlock(i_mmap_mutex)			      ...
        hugetlb_no_page									  <--- (2)
      						      free_pgtables
      						        unlink_file_vma
      							hugetlb_free_pgd_range
      						    remove_vma_list
      
      In this scenario, it is possible for Process A to share page tables with
      Process B that is trying to tear them down.  The i_mmap_mutex on its own
      does not prevent Process A walking Process B's page tables.  At (1) above,
      the page tables are not shared yet so it unmaps the PMDs.  Process A sets
      up page table sharing and at (2) faults a new entry.  Process B then trips
      up on it in free_pgtables.
      
      This patch fixes the problem by adding a new function
      __unmap_hugepage_range_final that is only called when the VMA is about to
      be destroyed.  This function clears VM_MAYSHARE during
      unmap_hugepage_range() under the i_mmap_mutex.  This makes the VMA
      ineligible for sharing and avoids the race.  Superficially this looks like
      it would then be vunerable to truncate and madvise issues but hugetlbfs
      has its own truncate handlers so does not use unmap_mapping_range() and
      does not support madvise(DONTNEED).
      
      This should be treated as a -stable candidate if it is merged.
      
      Test program is as follows. The test case was mostly written by Michal
      Hocko with a few minor changes to reproduce this bug.
      
      ==== CUT HERE ====
      
      static size_t huge_page_size = (2UL << 20);
      static size_t nr_huge_page_A = 512;
      static size_t nr_huge_page_B = 5632;
      
      unsigned int get_random(unsigned int max)
      {
      	struct timeval tv;
      
      	gettimeofday(&tv, NULL);
      	srandom(tv.tv_usec);
      	return random() % max;
      }
      
      static void play(void *addr, size_t size)
      {
      	unsigned char *start = addr,
      		      *end = start + size,
      		      *a;
      	start += get_random(size/2);
      
      	/* we could itterate on huge pages but let's give it more time. */
      	for (a = start; a < end; a += 4096)
      		*a = 0;
      }
      
      int main(int argc, char **argv)
      {
      	key_t key = IPC_PRIVATE;
      	size_t sizeA = nr_huge_page_A * huge_page_size;
      	size_t sizeB = nr_huge_page_B * huge_page_size;
      	int shmidA, shmidB;
      	void *addrA = NULL, *addrB = NULL;
      	int nr_children = 300, n = 0;
      
      	if ((shmidA = shmget(key, sizeA, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
      		perror("shmget:");
      		return 1;
      	}
      
      	if ((addrA = shmat(shmidA, addrA, SHM_R|SHM_W)) == (void *)-1UL) {
      		perror("shmat");
      		return 1;
      	}
      	if ((shmidB = shmget(key, sizeB, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
      		perror("shmget:");
      		return 1;
      	}
      
      	if ((addrB = shmat(shmidB, addrB, SHM_R|SHM_W)) == (void *)-1UL) {
      		perror("shmat");
      		return 1;
      	}
      
      fork_child:
      	switch(fork()) {
      		case 0:
      			switch (n%3) {
      			case 0:
      				play(addrA, sizeA);
      				break;
      			case 1:
      				play(addrB, sizeB);
      				break;
      			case 2:
      				break;
      			}
      			break;
      		case -1:
      			perror("fork:");
      			break;
      		default:
      			if (++n < nr_children)
      				goto fork_child;
      			play(addrA, sizeA);
      			break;
      	}
      	shmdt(addrA);
      	shmdt(addrB);
      	do {
      		wait(NULL);
      	} while (--n > 0);
      	shmctl(shmidA, IPC_RMID, NULL);
      	shmctl(shmidB, IPC_RMID, NULL);
      	return 0;
      }
      
      [akpm@linux-foundation.org: name the declaration's args, fix CONFIG_HUGETLBFS=n build]
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      49ca2404