1. 24 Mar, 2014 10 commits
    • dm cache: fix access beyond end of origin device · b5ef2d01
      Heinz Mauelshagen authored
      commit e893fba9 upstream.
      
      In order to avoid wasting cache space a partial block at the end of the
      origin device is not cached.  Unfortunately, the check for such a
      partial block at the end of the origin device was flawed.
      
      Fix accesses beyond the end of the origin device that occurred due to
      attempted promotion of an undetected partial block by:
      
      - initializing the per bio data struct to allow cache_end_io to work properly
      - recognizing access to the partial block at the end of the origin device
      - avoiding out of bounds access to the discard bitset
      
      Otherwise, users can experience errors like the following:
      
       attempt to access beyond end of device
       dm-5: rw=0, want=20971520, limit=20971456
       ...
       device-mapper: cache: promotion failed; couldn't copy block
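
The end-of-device check can be sketched in user-space C. This is a minimal sketch, not the kernel code: the helper names are hypothetical, and the 128-sector block size is an assumption, picked only because it reproduces want=20971520 against limit=20971456 from the log above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: first sector past the end of block b. */
uint64_t block_end_sector(uint64_t b, uint32_t sectors_per_block)
{
	return (b + 1) * (uint64_t)sectors_per_block;
}

/* Number of complete blocks on the origin; a trailing partial block is not counted. */
uint64_t whole_origin_blocks(uint64_t origin_sectors, uint32_t sectors_per_block)
{
	return origin_sectors / sectors_per_block;
}

/* True if block b would run past the end of the origin, so it must not be promoted. */
bool block_is_partial(uint64_t b, uint64_t origin_sectors, uint32_t sectors_per_block)
{
	return block_end_sector(b, sectors_per_block) > origin_sectors;
}
```

Under these assumed numbers the origin holds 163839 whole blocks plus 64 leftover sectors, so copying block 163839 in full would reach sector 20971520, past the device limit.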
      
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • dm cache: fix truncation bug when copying a block to/from >2TB fast device · 67f1830f
      Heinz Mauelshagen authored
      commit 8b9d9666 upstream.
      
      During demotion or promotion to a cache's >2TB fast device we must not
      truncate the cache block's associated sector to 32bits.  The 32bit
      temporary result of from_cblock() caused a 32bit multiplication when
      calculating the sector of the fast device in issue_copy_real().
      
      Use an intermediate 64bit type to store the 32bit from_cblock() to allow
      for proper 64bit multiplication.
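
The truncation can be reproduced outside the kernel. The sketch below assumes that both the cache block number and the block size are held in 32-bit variables; the function names are hypothetical and only illustrate the widening step, not the actual issue_copy_real() code.

```c
#include <stdint.h>

/* Buggy pattern: the product is computed in 32 bits and wraps before widening. */
uint64_t sector_truncated(uint32_t cblock, uint32_t sectors_per_block)
{
	return cblock * sectors_per_block;
}

/* Fixed pattern: widen one operand first so the multiplication is done in 64 bits. */
uint64_t sector_widened(uint32_t cblock, uint32_t sectors_per_block)
{
	uint64_t block = cblock; /* intermediate 64-bit copy of the 32-bit value */

	return block * sectors_per_block;
}
```

For example, block 40000000 with 128-sector blocks lies at sector 5120000000, which does not fit in 32 bits; the truncated version returns 825032704 instead, pointing the copy at the wrong part of the fast device.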
      
      Here is an example of how this bug manifests on an ext4 filesystem:
      
       EXT4-fs error (device dm-0): ext4_mb_generate_buddy:756: group 17136, 32768 clusters in bitmap, 30688 in gd; block bitmap corrupt.
       JBD2: Spotted dirty metadata buffer (dev = dm-0, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
      
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • dm space map metadata: fix refcount decrement below 0 which caused corruption · b07b194d
      Joe Thornber authored
      commit cebc2de4 upstream.
      
      This has been a relatively long-standing issue that wasn't nailed down
      until Teng-Feng Yang's meticulous bug report to dm-devel on 3/7/2014,
      see: http://www.redhat.com/archives/dm-devel/2014-March/msg00021.html
      
      From that report:
        "When decreasing the reference count of a metadata block with its
        reference count equals 3, we will call dm_btree_remove() to remove
        this entry from the B+tree which keeps the reference count info in
        metadata device.
      
        The B+tree will try to rebalance the entry of the child nodes in each
        node it traversed, and the rebalance process contains the following
        steps.
      
        (1) Finding the corresponding children in current node (shadow_current(s))
        (2) Shadow the children block (issue BOP_INC)
        (3) redistribute keys among children, and free children if necessary (issue BOP_DEC)
      
        Since the update of a metadata block's reference count could be
        recursive, we will stash these reference count update operations in
        smm->uncommitted and then process them in a FILO fashion.
      
        The problem is that step(3) could free the children which is created
        in step(2), so the BOP_DEC issued in step(3) will be carried out
        before the BOP_INC issued in step(2) since these BOPs will be
        processed in FILO fashion. Once the BOP_DEC from step(3) tries to
        decrease the reference count of newly shadow block, it will report
        failure for its reference equals 0 before decreasing. It looks like we
        can solve this issue by processing these BOPs in a FIFO fashion
        instead of FILO."
      
      Commit 5b564d80 ("dm space map: disallow decrementing a reference count
      below zero") changed the code to report an error for this temporary
      refcount decrement below zero.  So what was previously a harmless
      invalid refcount became a hard failure due to the new error path:
      
       device-mapper: space map common: unable to decrement a reference count below 0
       device-mapper: thin: 253:6: dm_thin_insert_block() failed: error = -22
       device-mapper: thin: 253:6: switching pool to read-only mode
      
      This bug is in dm persistent-data code that is common to the DM thin and
      cache targets.  So any users of those targets should apply this fix.
      
      Fix this by applying recursive space map operations in FIFO order rather
      than FILO.
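
The ordering problem can be shown with a toy model. This is a sketch under simplifying assumptions: a single refcount, an array standing in for the queued block operations, and a hypothetical apply_bops() helper; it is not the space map code itself.

```c
#include <stddef.h>

enum bop_type { BOP_INC, BOP_DEC };

/*
 * Apply queued operations to one refcount, oldest first (fifo != 0) or
 * newest first (fifo == 0).  Returns 0 on success, or -1 if an operation
 * would decrement the count below zero.
 */
int apply_bops(const enum bop_type *ops, size_t n, int fifo, unsigned int *refcount)
{
	for (size_t i = 0; i < n; i++) {
		enum bop_type op = fifo ? ops[i] : ops[n - 1 - i];

		if (op == BOP_INC) {
			(*refcount)++;
		} else {
			if (*refcount == 0)
				return -1; /* decrement below zero is rejected */
			(*refcount)--;
		}
	}
	return 0;
}
```

Queuing BOP_INC followed by BOP_DEC for a freshly shadowed block (count 0) succeeds in FIFO order, but fails in FILO order because the decrement is applied before the increment.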
      
      Resolves: https://bugzilla.kernel.org/show_bug.cgi?id=68801
      Reported-by: Apollon Oikonomopoulos <apoikos@debian.org>
      Reported-by: edwillam1007@gmail.com
      Reported-by: Teng-Feng Yang <shinrairis@gmail.com>
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • mm/compaction: break out of loop on !PageBuddy in isolate_freepages_block · fd175558
      Laura Abbott authored
      commit 2af120bc upstream.
      
      We received several reports of bad page state when freeing CMA pages
      previously allocated with alloc_contig_range:
      
          BUG: Bad page state in process Binder_A  pfn:63202
          page:d21130b0 count:0 mapcount:1 mapping:  (null) index:0x7dfbf
          page flags: 0x40080068(uptodate|lru|active|swapbacked)
      
      Based on the page state, it looks like the page was still in use.  The
      page flags do not make sense for the use case though.  Further debugging
      showed that despite alloc_contig_range returning success, at least one
      page in the range still remained in the buddy allocator.
      
      There is an issue with isolate_freepages_block.  In strict mode (which
      CMA uses), if any pages in the range cannot be isolated,
      isolate_freepages_block should return failure 0.  The current check
      keeps track of the total number of isolated pages and compares against
      the size of the range:
      
              if (strict && nr_strict_required > total_isolated)
                      total_isolated = 0;
      
      After taking the zone lock, if one of the pages in the range is not in
      the buddy allocator, we continue through the loop and do not increment
      total_isolated.  If in the last iteration of the loop we isolate more
      than one page (e.g.  last page needed is a higher order page), the check
      for total_isolated may pass and we fail to detect that a page was
      skipped.  The fix is to bail out of the loop immediately if we are in
      strict mode.  There's no benefit to continuing anyway since we need all
      pages to be isolated.  Additionally, drop the error checking based on
      nr_strict_required and just check the pfn ranges.  This matches with
      what isolate_freepages_range does.
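
The missed failure can be modelled with a small simulation of strict mode. This is a sketch under assumptions: struct fake_page and both function names are hypothetical, each array slot stands for one pfn, and isolating a free page of order N is assumed to take 2^N pages starting at that pfn.

```c
#include <stdbool.h>
#include <stddef.h>

struct fake_page {
	bool buddy;         /* page is free in the buddy allocator */
	unsigned int order; /* order of the free page starting here */
};

/* Old strict check: compare the isolated total against the range size. */
unsigned long isolate_strict_old(const struct fake_page *pages, size_t nr)
{
	unsigned long total = 0;

	for (size_t pfn = 0; pfn < nr; pfn++) {
		if (!pages[pfn].buddy)
			continue; /* skipped page goes unnoticed */
		unsigned long isolated = 1UL << pages[pfn].order;
		total += isolated;
		pfn += isolated - 1;
	}
	if (nr > total)
		total = 0;
	return total;
}

/* Fixed strict check: stop at the first non-buddy page, then check the pfn reached. */
unsigned long isolate_strict_fixed(const struct fake_page *pages, size_t nr)
{
	unsigned long total = 0;
	size_t pfn;

	for (pfn = 0; pfn < nr; pfn++) {
		if (!pages[pfn].buddy)
			break;
		unsigned long isolated = 1UL << pages[pfn].order;
		total += isolated;
		pfn += isolated - 1;
	}
	if (pfn < nr)
		total = 0; /* did not reach the end of the range */
	return total;
}
```

With a four-pfn range where pfn 1 is in use and pfn 3 starts an order-1 free page, the old check counts 1 + 1 + 2 = 4 pages and reports success, while the fixed check reports failure.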
      
      Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Acked-by: Michal Nazarewicz <mina86@mina86.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • vmxnet3: fix building without CONFIG_PCI_MSI · e468205e
      Arnd Bergmann authored
      commit 0a8d8c44 upstream.
      
      Since commit d25f06ea "vmxnet3: fix netpoll race condition",
      the vmxnet3 driver fails to build when CONFIG_PCI_MSI is disabled,
      because it unconditionally references the vmxnet3_msix_rx()
      function.
      
      To fix this, use the same #ifdef in the caller that exists around
      the function definition.
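
The guard pattern looks roughly like the following. This is a stand-alone sketch with hypothetical names (msix_rx_poll, generic_intr_poll, netpoll_dispatch); it only illustrates wrapping the caller in the same #ifdef as the definition and is not the driver code.

```c
/* Uncomment to model a build with MSI support (an assumption of this sketch). */
/* #define CONFIG_PCI_MSI 1 */

enum intr_type { INTR_MSIX, INTR_MSI, INTR_INTX };

#ifdef CONFIG_PCI_MSI
/* Only compiled when MSI support is configured, like the guarded definition. */
int msix_rx_poll(void)
{
	return 1;
}
#endif

/* Generic interrupt path that is always available. */
int generic_intr_poll(void)
{
	return 0;
}

int netpoll_dispatch(enum intr_type type)
{
	switch (type) {
#ifdef CONFIG_PCI_MSI
	case INTR_MSIX:
		/* The reference is guarded by the same #ifdef as the definition. */
		return msix_rx_poll();
#endif
	case INTR_MSI:
	default:
		return generic_intr_poll();
	}
}
```

Without the guard around the case label, the file would reference msix_rx_poll() in a configuration where it is never defined, and the build would fail.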
      
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Shreyas Bhatewara <sbhatewara@vmware.com>
      Cc: "VMware, Inc." <pv-drivers@vmware.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • vmxnet3: fix netpoll race condition · 56ee1332
      Neil Horman authored
      commit d25f06ea upstream.
      
      vmxnet3's netpoll driver is incorrectly coded.  It directly calls
      vmxnet3_do_poll, which is the driver internal napi poll routine.  As the netpoll
      controller method doesn't block real napi polls in any way, there is a potential
      for race conditions in which the netpoll controller method and the napi poll
      method run concurrently.  The result is data corruption causing panics such as this
      one recently observed:
      PID: 1371   TASK: ffff88023762caa0  CPU: 1   COMMAND: "rs:main Q:Reg"
       #0 [ffff88023abd5780] machine_kexec at ffffffff81038f3b
       #1 [ffff88023abd57e0] crash_kexec at ffffffff810c5d92
       #2 [ffff88023abd58b0] oops_end at ffffffff8152b570
       #3 [ffff88023abd58e0] die at ffffffff81010e0b
       #4 [ffff88023abd5910] do_trap at ffffffff8152add4
       #5 [ffff88023abd5970] do_invalid_op at ffffffff8100cf95
       #6 [ffff88023abd5a10] invalid_op at ffffffff8100bf9b
          [exception RIP: vmxnet3_rq_rx_complete+1968]
          RIP: ffffffffa00f1e80  RSP: ffff88023abd5ac8  RFLAGS: 00010086
          RAX: 0000000000000000  RBX: ffff88023b5dcee0  RCX: 00000000000000c0
          RDX: 0000000000000000  RSI: 00000000000005f2  RDI: ffff88023b5dcee0
          RBP: ffff88023abd5b48   R8: 0000000000000000   R9: ffff88023a3b6048
          R10: 0000000000000000  R11: 0000000000000002  R12: ffff8802398d4cd8
          R13: ffff88023af35140  R14: ffff88023b60c890  R15: 0000000000000000
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #7 [ffff88023abd5b50] vmxnet3_do_poll at ffffffffa00f204a [vmxnet3]
       #8 [ffff88023abd5b80] vmxnet3_netpoll at ffffffffa00f209c [vmxnet3]
       #9 [ffff88023abd5ba0] netpoll_poll_dev at ffffffff81472bb7
      
      The fix is to do as other drivers do, and have the poll controller call the top
      half interrupt handler, which schedules a napi poll properly to receive frames.
      
      Tested by myself, successfully.
      
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: Shreyas Bhatewara <sbhatewara@vmware.com>
      CC: "VMware, Inc." <pv-drivers@vmware.com>
      CC: "David S. Miller" <davem@davemloft.net>
      Reviewed-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • PCI: Enable INTx in pci_reenable_device() only when MSI/MSI-X not enabled · 16cd1798
      Bjorn Helgaas authored
      commit 3cdeb713 upstream.
      
      Andreas reported that after 1f42db78 ("PCI: Enable INTx if BIOS left
      them disabled"), pciehp surprise removal stopped working.
      
      This happens because pci_reenable_device() on the hotplug bridge (used in
      the pciehp_configure_device() path) clears the Interrupt Disable bit, which
      apparently breaks the bridge's MSI hotplug event reporting.
      
      Previously we cleared the Interrupt Disable bit in do_pci_enable_device(),
      which is used by both pci_enable_device() and pci_reenable_device().  But
      we use pci_reenable_device() after the driver may have enabled MSI or
      MSI-X, and we *set* Interrupt Disable as part of enabling MSI/MSI-X.
      
      This patch clears Interrupt Disable only when MSI/MSI-X has not been
      enabled.
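
The conditional can be sketched as follows. The struct and function name are hypothetical stand-ins for illustration; the only value taken from the PCI command register layout is the Interrupt Disable bit (bit 10, 0x400).

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_COMMAND_INTX_DISABLE 0x400 /* Interrupt Disable bit in the command register */

/* Hypothetical, simplified view of a device for this sketch. */
struct fake_pci_dev {
	bool msi_enabled;
	bool msix_enabled;
	uint16_t command; /* cached copy of the command register */
};

/* Clear Interrupt Disable only when neither MSI nor MSI-X is in use. */
void enable_intx_if_unused(struct fake_pci_dev *dev)
{
	if (dev->msi_enabled || dev->msix_enabled)
		return; /* Interrupt Disable was set on purpose; leave it alone */

	if (dev->command & PCI_COMMAND_INTX_DISABLE)
		dev->command &= (uint16_t)~PCI_COMMAND_INTX_DISABLE;
}
```

A hotplug bridge that already enabled MSI keeps its Interrupt Disable bit across a re-enable, while a device that the BIOS left with INTx disabled and no MSI gets the bit cleared.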
      
      Fixes: 1f42db78 PCI: Enable INTx if BIOS left them disabled
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=71691
      Reported-and-tested-by: Andreas Noever <andreas.noever@gmail.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      CC: Sarah Sharp <sarah.a.sharp@linux.intel.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • KVM: SVM: fix cr8 intercept window · 8df3fd9a
      Radim Krčmář authored
      commit 596f3142 upstream.
      
      We always disable cr8 intercept in its handler, but only re-enable it
      if handling KVM_REQ_EVENT, so there can be a window where we do not
      intercept cr8 writes, which allows an interrupt to disrupt a higher
      priority task.
      
      Fix this by disabling intercepts in the same function that re-enables
      them when needed. This fixes BSOD in Windows 2008.
      
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • ipc: Fix 2 bugs in msgrcv() MSG_COPY implementation · 4abc1815
      Michael Kerrisk authored
      commit 4f87dac3 upstream.
      
      While testing and documenting the msgrcv() MSG_COPY flag that Stanislav
      Kinsbursky added in commit 4a674f34 ("ipc: introduce message queue
      copy feature" => kernel 3.8), I discovered a couple of bugs in the
      implementation.  The two bugs concern MSG_COPY interactions with other
      msgrcv() flags, namely:
      
       (A) MSG_COPY + MSG_EXCEPT
       (B) MSG_COPY + !IPC_NOWAIT
      
      The bugs are distinct (and the fix for the first one is obvious),
      however my fix for both is a single-line patch, which is why I'm
      combining them in a single mail, rather than writing two mails+patches.
      
       ===== (A) MSG_COPY + MSG_EXCEPT =====
      
      With the addition of the MSG_COPY flag, there are now two msgrcv()
      flags--MSG_COPY and MSG_EXCEPT--that modify the meaning of the 'msgtyp'
      argument in unrelated ways.  Specifying both in the same call is a
      logical error that is currently permitted, with the effect that MSG_COPY
      has priority and MSG_EXCEPT is ignored.  The call should give an error
      if both flags are specified.  The patch below implements that behavior.
      
       ===== (B) MSG_COPY + !IPC_NOWAIT =====
      
      The test code that was submitted in commit 3a665531 ("selftests: IPC
      message queue copy feature test") shows MSG_COPY being used in
      conjunction with IPC_NOWAIT.  In other words, if there is no message at
      the position 'msgtyp', return immediately with the error ENOMSG.
      
      What was not (fully) tested is the behavior if MSG_COPY is specified
      *without* IPC_NOWAIT, and there is an odd behavior.  If the queue
      contains fewer than 'msgtyp' messages, then the call blocks until the
      next message is written to the queue.  At that point, the msgrcv() call
      returns a copy of the newly added message, regardless of whether that
      message is at the ordinal position 'msgtyp'.  This is clearly bogus, and
      problematic for applications that might want to make use of the MSG_COPY
      flag.
      
      I considered the following possible solutions to this problem:
      
       (1) Force the call to block until a message *does* appear at the
           position 'msgtyp'.
      
       (2) If the MSG_COPY flag is specified, the kernel should implicitly add
           IPC_NOWAIT, so that the call fails with ENOMSG for this case.
      
       (3) If the MSG_COPY flag is specified, but IPC_NOWAIT is not, generate
           an error (probably, EINVAL is the right one).
      
      I do not know if any application would really want to have the
      functionality of solution (1), especially since an application can
      determine in advance the number of messages in the queue using msgctl()
      IPC_STAT.  Obviously, this solution would be the most work to implement.
      
      Solution (2) would have the effect of silently fixing any applications
      that tried to employ broken behavior.  However, it would mean that if we
      later decided to implement solution (1), then user-space could not
      easily detect what the kernel supports (but, since I'm somewhat doubtful
      that solution (1) is needed, I'm not sure that this is much of a
      problem).
      
      Solution (3) would have the effect of informing broken applications that
      they are doing something broken.  The downside is that this would cause
      an ABI breakage for any applications that are currently employing the
      broken behavior.  However:
      
      a) Those applications are almost certainly not getting the results they
         expect.
      b) Possibly, those applications don't even exist, because MSG_COPY is
         currently hidden behind CONFIG_CHECKPOINT_RESTORE.
      
      The upside of solution (3) is that if we later decided to implement
      solution (1), user-space could determine what the kernel supports, via
      the error return.
      
      In my view, solution (3) is mildly preferable to solution (2), and
      solution (1) could still be done later if anyone really cares.  The
      patch below implements solution (3).
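
The flag validation described for (A) and for solution (3) can be sketched in user-space C. This is an illustration, not the kernel patch: check_msgrcv_flags() is a hypothetical helper, and the SK_* constants are local stand-ins whose values are assumed to match the Linux definitions of IPC_NOWAIT, MSG_EXCEPT and MSG_COPY.

```c
#include <errno.h>

/* Local stand-ins for the msgrcv() flags; values assumed to match Linux headers. */
#define SK_IPC_NOWAIT 04000
#define SK_MSG_EXCEPT 020000
#define SK_MSG_COPY   040000

/* Returns 0 if the flag combination is acceptable, -EINVAL otherwise. */
int check_msgrcv_flags(int msgflg)
{
	if (msgflg & SK_MSG_COPY) {
		/* (A) MSG_COPY and MSG_EXCEPT give 'msgtyp' conflicting meanings. */
		if (msgflg & SK_MSG_EXCEPT)
			return -EINVAL;
		/* (B), solution (3): MSG_COPY is only accepted with IPC_NOWAIT. */
		if (!(msgflg & SK_IPC_NOWAIT))
			return -EINVAL;
	}
	return 0;
}
```

Calls that do not use MSG_COPY are unaffected, so MSG_EXCEPT on its own and blocking receives keep working as before.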
      
      PS.  For anyone out there still listening, it's the usual story:
      documenting an API (and the thinking about, and the testing of the API,
      that documentation entails) is one of the single best ways of
      finding bugs in the API, as I've learned from a lot of experience.  Best
      to do that documentation before releasing the API.
      
      Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
      Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • i2c: Remove usage of orphaned symbol OF_I2C · 726d5fbd
      Richard Weinberger authored
      commit 62c19c9d upstream.
      
      The symbol is an orphan, don't depend on it anymore.
      
      Signed-off-by: Richard Weinberger <richard@nod.at>
      [wsa: enhanced commit message]
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
      Fixes: 687b81d0 (i2c: move OF helpers into the core)
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
  2. 22 Mar, 2014 30 commits