1. 17 Nov, 2015 40 commits
    • NeilBrown's avatar
      md/raid10: don't clear bitmap bit when bad-block-list write fails. · 965d8d1d
      NeilBrown authored
      commit c340702c upstream.
      
      When a write fails and a bad-block-list is present, we can
      update the bad-block-list instead of writing the data.  If
      this succeeds then it is OK clear the relevant bitmap-bit as
      no further 'sync' of the block is needed.
      
      However if writing the bad-block-list fails then we need to
      treat the write as failed and particularly must not clear
      the bitmap bit.  Otherwise the device can be re-added (after
      any hardware connection issues are resolved) and because the
      relevant bit in the bitmap is clear, that block will not be
      resynced.  This leads to data corruption.
      
      We already delay the final bio_endio() on the write until
      the bad-block-list is written so that when the write
      returns: either that data is safe, the bad-block record is
      safe, or the fact that the device is faulty is safe.
      However we *don't* delay the clearing of the bitmap, so the
      bitmap bit can be recorded as cleared before we know if the
      bad-block-list was written safely.
      
      So: delay that until the write really is safe.
      i.e. move the call to close_write() until just before
      calling bio_endio(), and recheck the 'is array degraded'
      status before making that call.
      
      This bug goes back to v3.1 when bad-block-lists were
      introduced, though it only affects arrays created with
      mdadm-3.3 or later as only those have bad-block lists.
      
      Backports will require at least
      Commit: 95af587e ("md/raid10: ensure device failure recorded before write request returns.")
      as well.  I'll send that to 'stable' separately.
      
      Note that of the two tests of R10BIO_WriteError that this
      patch adds, the first is certain to fail and the second is
      certain to succeed.  However doing it this way makes the
      patch more obviously correct.  I will tidy the code up in a
      future merge window.
      Reported-by: default avatarNate Dailey <nate.dailey@stratus.com>
      Fixes: bd870a16 ("md/raid10:  Handle write errors by updating badblock log.")
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      965d8d1d
    • NeilBrown's avatar
      md/raid10: ensure device failure recorded before write request returns. · c1fba1c8
      NeilBrown authored
      commit 95af587e upstream.
      
      When a write to one of the legs of a RAID10 fails, the failure is
      recorded in the metadata of the other legs so that after a restart
      the data on the failed drive wont be trusted even if that drive seems
      to be working again (maybe a cable was unplugged).
      
      Currently there is no interlock between the write request completing
      and the metadata update.  So it is possible that the write will
      complete, the app will confirm success in some way, and then the
      machine will crash before the metadata update completes.
      
      This is an extremely small hole for a racy to fit in, but it is
      theoretically possible and so should be closed.
      
      So:
       - set MD_CHANGE_PENDING when requesting a metadata update for a
         failed device, so we can know with certainty when it completes
       - queue requests that experienced an error on a new queue which
         is only processed after the metadata update completes
       - call raid_end_bio_io() on bios in that queue when the time comes.
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      c1fba1c8
    • NeilBrown's avatar
      md/raid1: don't clear bitmap bit when bad-block-list write fails. · 1f6c748a
      NeilBrown authored
      commit bd8688a1 upstream.
      
      When a write fails and a bad-block-list is present, we can
      update the bad-block-list instead of writing the data.  If
      this succeeds then it is OK clear the relevant bitmap-bit as
      no further 'sync' of the block is needed.
      
      However if writing the bad-block-list fails then we need to
      treat the write as failed and particularly must not clear
      the bitmap bit.  Otherwise the device can be re-added (after
      any hardware connection issues are resolved) and because the
      relevant bit in the bitmap is clear, that block will not be
      resynced.  This leads to data corruption.
      
      We already delay the final bio_endio() on the write until
      the bad-block-list is written so that when the write
      returns: either that data is safe, the bad-block record is
      safe, or the fact that the device is faulty is safe.
      However we *don't* delay the clearing of the bitmap, so the
      bitmap bit can be recorded as cleared before we know if the
      bad-block-list was written safely.
      
      So: delay that until the write really is safe.
      i.e. move the call to close_write() until just before
      calling bio_endio(), and recheck the 'is array degraded'
      status before making that call.
      
      This bug goes back to v3.1 when bad-block-lists were
      introduced, though it only affects arrays created with
      mdadm-3.3 or later as only those have bad-block lists.
      
      Backports will require at least
      Commit: 55ce74d4 ("md/raid1: ensure device failure recorded before write request returns.")
      as well.  I'll send that to 'stable' separately.
      
      Note that of the two tests of R1BIO_WriteError that this
      patch adds, the first is certain to fail and the second is
      certain to succeed.  However doing it this way makes the
      patch more obviously correct.  I will tidy the code up in a
      future merge window.
      Reported-and-tested-by: default avatarNate Dailey <nate.dailey@stratus.com>
      Cc: Jes Sorensen <Jes.Sorensen@redhat.com>
      Fixes: cd5ff9a1 ("md/raid1:  Handle write errors by updating badblock log.")
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      1f6c748a
    • NeilBrown's avatar
      md/raid1: ensure device failure recorded before write request returns. · 6a1281c3
      NeilBrown authored
      commit 55ce74d4 upstream.
      
      When a write to one of the legs of a RAID1 fails, the failure is
      recorded in the metadata of the other leg(s) so that after a restart
      the data on the failed drive wont be trusted even if that drive seems
      to be working again  (maybe a cable was unplugged).
      
      Similarly when we record a bad-block in response to a write failure,
      we must not let the write complete until the bad-block update is safe.
      
      Currently there is no interlock between the write request completing
      and the metadata update.  So it is possible that the write will
      complete, the app will confirm success in some way, and then the
      machine will crash before the metadata update completes.
      
      This is an extremely small hole for a racy to fit in, but it is
      theoretically possible and so should be closed.
      
      So:
       - set MD_CHANGE_PENDING when requesting a metadata update for a
         failed device, so we can know with certainty when it completes
       - queue requests that experienced an error on a new queue which
         is only processed after the metadata update completes
       - call raid_end_bio_io() on bios in that queue when the time comes.
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      6a1281c3
    • Mike Snitzer's avatar
      dm btree: fix leak of bufio-backed block in btree_split_beneath error path · 12d1c67b
      Mike Snitzer authored
      commit 4dcb8b57 upstream.
      
      btree_split_beneath()'s error path had an outstanding FIXME that speaks
      directly to the potential for _not_ cleaning up a previously allocated
      bufio-backed block.
      
      Fix this by releasing the previously allocated bufio block using
      unlock_block().
      Reported-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarJoe Thornber <thornber@redhat.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      12d1c67b
    • Joe Thornber's avatar
      dm btree remove: fix a bug when rebalancing nodes after removal · 11020754
      Joe Thornber authored
      commit 2871c69e upstream.
      
      Commit 4c7e3093 ("dm btree remove: fix bug in redistribute3") wasn't
      a complete fix for redistribute3().
      
      The redistribute3 function takes 3 btree nodes and shares out the entries
      evenly between them.  If the three nodes in total contained
      (MAX_ENTRIES * 3) - 1 entries between them then this was erroneously getting
      rebalanced as (MAX_ENTRIES - 1) on the left and right, and (MAX_ENTRIES + 1) in
      the center.
      
      Fix this issue by being more careful about calculating the target number
      of entries for the left and right nodes.
      
      Unit tested in userspace using this program:
      https://github.com/jthornber/redistribute3-test/blob/master/redistribute3_t.cSigned-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      11020754
    • Guillaume Nault's avatar
      ppp: fix pppoe_dev deletion condition in pppoe_release() · e3e62cc7
      Guillaume Nault authored
      commit 1acea4f6 upstream.
      
      We can't rely on PPPOX_ZOMBIE to decide whether to clear po->pppoe_dev.
      PPPOX_ZOMBIE can be set by pppoe_disc_rcv() even when po->pppoe_dev is
      NULL. So we have no guarantee that (sk->sk_state & PPPOX_ZOMBIE) implies
      (po->pppoe_dev != NULL).
      Since we're releasing a PPPoE socket, we want to release the pppoe_dev
      if it exists and reset sk_state to PPPOX_DEAD, no matter the previous
      value of sk_state. So we can just check for po->pppoe_dev and avoid any
      assumption on sk->sk_state.
      
      Fixes: 2b018d57 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release")
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      e3e62cc7
    • Jan Kara's avatar
      mm: make sendfile(2) killable · 279fc860
      Jan Kara authored
      commit 296291cd upstream.
      
      Currently a simple program below issues a sendfile(2) system call which
      takes about 62 days to complete in my test KVM instance.
      
              int fd;
              off_t off = 0;
      
              fd = open("file", O_RDWR | O_TRUNC | O_SYNC | O_CREAT, 0644);
              ftruncate(fd, 2);
              lseek(fd, 0, SEEK_END);
              sendfile(fd, fd, &off, 0xfffffff);
      
      Now you should not ask kernel to do a stupid stuff like copying 256MB in
      2-byte chunks and call fsync(2) after each chunk but if you do, sysadmin
      should have a way to stop you.
      
      We actually do have a check for fatal_signal_pending() in
      generic_perform_write() which triggers in this path however because we
      always succeed in writing something before the check is done, we return
      value > 0 from generic_perform_write() and thus the information about
      signal gets lost.
      
      Fix the problem by doing the signal check before writing anything.  That
      way generic_perform_write() returns -EINTR, the error gets propagated up
      and the sendfile loop terminates early.
      Signed-off-by: default avatarJan Kara <jack@suse.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      279fc860
    • Vasant Hegde's avatar
      powerpc/rtas: Validate rtas.entry before calling enter_rtas() · 08fd1afd
      Vasant Hegde authored
      commit 8832317f upstream.
      
      Currently we do not validate rtas.entry before calling enter_rtas(). This
      leads to a kernel oops when user space calls rtas system call on a powernv
      platform (see below). This patch adds code to validate rtas.entry before
      making enter_rtas() call.
      
        Oops: Exception in kernel mode, sig: 4 [#1]
        SMP NR_CPUS=1024 NUMA PowerNV
        task: c000000004294b80 ti: c0000007e1a78000 task.ti: c0000007e1a78000
        NIP: 0000000000000000 LR: 0000000000009c14 CTR: c000000000423140
        REGS: c0000007e1a7b920 TRAP: 0e40   Not tainted  (3.18.17-340.el7_1.pkvm3_1_0.2400.1.ppc64le)
        MSR: 1000000000081000 <HV,ME>  CR: 00000000  XER: 00000000
        CFAR: c000000000009c0c SOFTE: 0
        NIP [0000000000000000]           (null)
        LR [0000000000009c14] 0x9c14
        Call Trace:
        [c0000007e1a7bba0] [c00000000041a7f4] avc_has_perm_noaudit+0x54/0x110 (unreliable)
        [c0000007e1a7bd80] [c00000000002ddc0] ppc_rtas+0x150/0x2d0
        [c0000007e1a7be30] [c000000000009358] syscall_exit+0x0/0x98
      
      Fixes: 55190f88 ("powerpc: Add skeleton PowerNV platform")
      Reported-by: default avatarNAGESWARA R. SASTRY <nasastry@in.ibm.com>
      Signed-off-by: default avatarVasant Hegde <hegdevasant@linux.vnet.ibm.com>
      [mpe: Reword change log, trim oops, and add stable + fixes]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      08fd1afd
    • Ilia Mirkin's avatar
      drm/nouveau/gem: return only valid domain when there's only one · f996f92d
      Ilia Mirkin authored
      commit 2a6c521b upstream.
      
      On nv50+, we restrict the valid domains to just the one where the buffer
      was originally created. However after the buffer is evicted to system
      memory, we might move it back to a different domain that was not
      originally valid. When sharing the buffer and retrieving its GEM_INFO
      data, we still want the domain that will be valid for this buffer in a
      pushbuf, not the one where it currently happens to be.
      
      This resolves fdo#92504 and several others. These are due to suspend
      evicting all buffers, making it more likely that they temporarily end up
      in the wrong place.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92504Signed-off-by: default avatarIlia Mirkin <imirkin@alum.mit.edu>
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      f996f92d
    • Doron Tsur's avatar
      IB/cm: Fix rb-tree duplicate free and use-after-free · 1ae4e01f
      Doron Tsur authored
      commit 0ca81a28 upstream.
      
      ib_send_cm_sidr_rep could sometimes erase the node from the sidr
      (depending on errors in the process). Since ib_send_cm_sidr_rep is
      called both from cm_sidr_req_handler and cm_destroy_id, cm_id_priv
      could be either erased from the rb_tree twice or not erased at all.
      Fixing that by making sure it's erased only once before freeing
      cm_id_priv.
      
      Fixes: a977049d ('[PATCH] IB: Add the kernel CM implementation')
      Signed-off-by: default avatarDoron Tsur <doront@mellanox.com>
      Signed-off-by: default avatarMatan Barak <matanb@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      1ae4e01f
    • Charles Keepax's avatar
      ASoC: wm8904: Correct number of EQ registers · d2420ed8
      Charles Keepax authored
      commit 97aff2c0 upstream.
      
      There are 24 EQ registers not 25, I suspect this bug came about because
      the registers start at EQ1 not zero. The bug is relatively harmless as
      the extra register written is an unused one.
      Signed-off-by: default avatarCharles Keepax <ckeepax@opensource.wolfsonmicro.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      d2420ed8
    • Herbert Xu's avatar
      crypto: api - Only abort operations on fatal signal · ff4b4b7e
      Herbert Xu authored
      commit 3fc89adb upstream.
      
      Currently a number of Crypto API operations may fail when a signal
      occurs.  This causes nasty problems as the caller of those operations
      are often not in a good position to restart the operation.
      
      In fact there is currently no need for those operations to be
      interrupted by user signals at all.  All we need is for them to
      be killable.
      
      This patch replaces the relevant calls of signal_pending with
      fatal_signal_pending, and wait_for_completion_interruptible with
      wait_for_completion_killable, respectively.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      [bwh: Backported to 3.2: drop change to crypto_user_skcipher_alg(), which
       we don't have]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      ff4b4b7e
    • Laura Abbott's avatar
      xhci: Add spurious wakeup quirk for LynxPoint-LP controllers · c79e716d
      Laura Abbott authored
      commit fd7cd061 upstream.
      
      We received several reports of systems rebooting and powering on
      after an attempted shutdown. Testing showed that setting
      XHCI_SPURIOUS_WAKEUP quirk in addition to the XHCI_SPURIOUS_REBOOT
      quirk allowed the system to shutdown as expected for LynxPoint-LP
      xHCI controllers. Set the quirk back.
      
      Note that the quirk was originally introduced for LynxPoint and
      LynxPoint-LP just for this same reason. See:
      
      commit 638298dc ("xhci: Fix spurious wakeups after S5 on Haswell")
      
      It was later limited to only concern HP machines as it caused
      regression on some machines, see both bug and commit:
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=66171
      commit 6962d914 ("xhci: Limit the spurious wakeup fix only to HP machines")
      
      Later it was discovered that the powering on after shutdown
      was limited to LynxPoint-LP (Haswell-ULT) and that some non-LP HP
      machine suffered from spontaneous resume from S3 (which should
      not be related to the SPURIOUS_WAKEUP quirk at all). An attempt
      to fix this then removed the SPURIOUS_WAKEUP flag usage completely.
      
      commit b45abacd ("xhci: no switching back on non-ULT Haswell")
      
      Current understanding is that LynxPoint-LP (Haswell ULT) machines
      need the SPURIOUS_WAKEUP quirk, otherwise they will restart, and
      plain Lynxpoint (Haswell) machines may _not_ have the quirk
      set otherwise they again will restart.
      Signed-off-by: default avatarLaura Abbott <labbott@fedoraproject.org>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Oliver Neukum <oneukum@suse.com>
      [Added more history to commit message -Mathias]
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      c79e716d
    • Denis Turischev's avatar
      xhci: Switch Intel Lynx Point LP ports to EHCI on shutdown. · 77156ac0
      Denis Turischev authored
      commits c09ec25d and
      0a939993 upstream.
      
      The same issue like with Panther Point chipsets. If the USB ports are
      switched to xHCI on shutdown, the xHCI host will send a spurious interrupt,
      which will wake the system. Some BIOS have work around for this, but not all.
      One example is Compulab's mini-desktop, the Intense-PC2.
      
      The bug can be avoided if the USB ports are switched back to EHCI on
      shutdown.
      
      This patch should be backported to stable kernels as old as 3.12,
      that contain the commit 638298dc
      "xhci: Fix spurious wakeups after S5 on Haswell"
      Signed-off-by: default avatarDenis Turischev <denis@compulab.co.il>
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      Patch "xhci: Switch Intel Lynx Point ports to EHCI on shutdown."
      commit c09ec25d is not fully correct
      
      It switches both Lynx Point and Lynx Point-LP ports to EHCI on shutdown.
      On some Lynx Point machines it causes spurious interrupt,
      which wake the system: bugzilla.kernel.org/show_bug.cgi?id=76291
      
      On Lynx Point-LP on the contrary switching ports to EHCI seems to be
      necessary to fix these spurious interrupts.
      Signed-off-by: default avatarDenis Turischev <denis@compulab.co.il>
      Reported-by: default avatarWulf Richartz <wulf.richartz@gmail.com>
      Cc: Mathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      [bwh: Combined the above commits and backported to 3.2: adjust context to
       apply after "xhci: Limit the spurious wakeup fix only to HP machines" and
       "xhci: no switching back on non-ULT Haswell"]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      77156ac0
    • Mathias Nyman's avatar
      xhci: handle no ping response error properly · c35a75f2
      Mathias Nyman authored
      commit 3b4739b8 upstream.
      
      If a host fails to wake up a isochronous SuperSpeed device from U1/U2
      in time for a isoch transfer it will generate a "No ping response error"
      Host will then move to the next transfer descriptor.
      
      Handle this case in the same way as missed service errors, tag the
      current TD as skipped and handle it on the next transfer event.
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      c35a75f2
    • Mathias Nyman's avatar
      xhci: don't finish a TD if we get a short transfer event mid TD · 7c6aca19
      Mathias Nyman authored
      commit e210c422 upstream.
      
      If the difference is big enough between the bytes asked and received
      in a bulk transfer we can get a short transfer event pointing to a TRB in
      the middle of the TD. We don't want to handle the TD yet as we will anyway
      receive a new event for the last TRB in the TD.
      
      Hold off from finishing the TD and removing it from the list until we
      receive an event for the last TRB in the TD
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      7c6aca19
    • Christian Zander's avatar
      iommu/vt-d: fix range computation when making room for large pages · cca53289
      Christian Zander authored
      commit ba2374fd upstream.
      
      In preparation for the installation of a large page, any small page
      tables that may still exist in the target IOV address range are
      removed.  However, if a scatter/gather list entry is large enough to
      fit more than one large page, the address space for any subsequent
      large pages is not cleared of conflicting small page tables.
      
      This can cause legitimate mapping requests to fail with errors of the
      form below, potentially followed by a series of IOMMU faults:
      
      ERROR: DMA PTE for vPFN 0xfde00 already set (to 7f83a4003 not 7e9e00083)
      
      In this example, a 4MiB scatter/gather list entry resulted in the
      successful installation of a large page @ vPFN 0xfdc00, followed by
      a failed attempt to install another large page @ vPFN 0xfde00, due to
      the presence of a pointer to a small page table @ 0x7f83a4000.
      
      To address this problem, compute the number of large pages that fit
      into a given scatter/gather list entry, and use it to derive the
      last vPFN covered by the large page(s).
      Signed-off-by: default avatarChristian Zander <christian@nervanasys.com>
      Signed-off-by: default avatarDavid Woodhouse <David.Woodhouse@intel.com>
      [bwh: Backported to 3.2:
       - Add the lvl_pages variable, added by an earlier commit upstream
       - Also change arguments to dma_pte_clear_range(), which is called by
         dma_pte_free_pagetable() upstream]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      cca53289
    • Russell King's avatar
      crypto: ahash - ensure statesize is non-zero · 44a4ec5c
      Russell King authored
      commit 8996eafd upstream.
      
      Unlike shash algorithms, ahash drivers must implement export
      and import as their descriptors may contain hardware state and
      cannot be exported as is.  Unfortunately some ahash drivers did
      not provide them and end up causing crashes with algif_hash.
      
      This patch adds a check to prevent these drivers from registering
      ahash algorithms until they are fixed.
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      44a4ec5c
    • David Henningsson's avatar
      ALSA: hda - Fix inverted internal mic on Lenovo G50-80 · 680137ae
      David Henningsson authored
      commit e8d65a8d upstream.
      
      Add the appropriate quirk to indicate the Lenovo G50-80 has a stereo
      mic input where one channel has reverse polarity.
      
      Alsa-info available at:
      https://launchpadlibrarian.net/220846272/AlsaInfo.txt
      
      BugLink: https://bugs.launchpad.net/bugs/1504778Signed-off-by: default avatarDavid Henningsson <david.henningsson@canonical.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      680137ae
    • Cathy Avery's avatar
      xen-blkfront: check for null drvdata in blkback_changed (XenbusStateClosing) · 26e6ab3e
      Cathy Avery authored
      commit a54c8f0f upstream.
      
      xen-blkfront will crash if the check to talk_to_blkback()
      in blkback_changed()(XenbusStateInitWait) returns an error.
      The driver data is freed and info is set to NULL. Later during
      the close process via talk_to_blkback's call to xenbus_dev_fatal()
      the null pointer is passed to and dereference in blkfront_closing.
      Signed-off-by: default avatarCathy Avery <cathy.avery@oracle.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      26e6ab3e
    • Christoph Hellwig's avatar
      3w-9xxx: don't unmap bounce buffered commands · 677ee633
      Christoph Hellwig authored
      commit 15e3d5a2 upstream.
      
      3w controller don't dma map small single SGL entry commands but instead
      bounce buffer them.  Add a helper to identify these commands and don't
      call scsi_dma_unmap for them.
      
      Based on an earlier patch from James Bottomley.
      
      Fixes: 118c85 ("3w-9xxx: fix command completion race")
      Reported-by: default avatarTóth Attila <atoth@atoth.sote.hu>
      Tested-by: default avatarTóth Attila <atoth@atoth.sote.hu>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Acked-by: default avatarAdam Radford <aradford@gmail.com>
      Signed-off-by: default avatarJames Bottomley <JBottomley@Odin.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      677ee633
    • Peter Zijlstra's avatar
      sched/core: Fix TASK_DEAD race in finish_task_switch() · d4d8ceb5
      Peter Zijlstra authored
      commit 95913d97 upstream.
      
      So the problem this patch is trying to address is as follows:
      
              CPU0                            CPU1
      
              context_switch(A, B)
                                              ttwu(A)
                                                LOCK A->pi_lock
                                                A->on_cpu == 0
              finish_task_switch(A)
                prev_state = A->state  <-.
                WMB                      |
                A->on_cpu = 0;           |
                UNLOCK rq0->lock         |
                                         |    context_switch(C, A)
                                         `--  A->state = TASK_DEAD
                prev_state == TASK_DEAD
                  put_task_struct(A)
                                              context_switch(A, C)
                                              finish_task_switch(A)
                                                A->state == TASK_DEAD
                                                  put_task_struct(A)
      
      The argument being that the WMB will allow the load of A->state on CPU0
      to cross over and observe CPU1's store of A->state, which will then
      result in a double-drop and use-after-free.
      
      Now the comment states (and this was true once upon a long time ago)
      that we need to observe A->state while holding rq->lock because that
      will order us against the wakeup; however the wakeup will not in fact
      acquire (that) rq->lock; it takes A->pi_lock these days.
      
      We can obviously fix this by upgrading the WMB to an MB, but that is
      expensive, so we'd rather avoid that.
      
      The alternative this patch takes is: smp_store_release(&A->on_cpu, 0),
      which avoids the MB on some archs, but not important ones like ARM.
      Reported-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Cc: manfred@colorfullife.com
      Cc: will.deacon@arm.com
      Fixes: e4a52bcb ("sched: Remove rq->lock from the first half of ttwu()")
      Link: http://lkml.kernel.org/r/20150929124509.GG3816@twins.programming.kicks-ass.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      [bwh: Backported to 3.2:
       - Adjust filename
       - As smp_store_release() is not defined, use smp_mb()]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      d4d8ceb5
    • Takashi Iwai's avatar
      ALSA: synth: Fix conflicting OSS device registration on AWE32 · 6f4f891b
      Takashi Iwai authored
      commit 225db576 upstream.
      
      When OSS emulation is loaded on ISA SB AWE32 chip, we get now kernel
      warnings like:
        WARNING: CPU: 0 PID: 2791 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x51/0x80()
        sysfs: cannot create duplicate filename '/devices/isa/sbawe.0/sound/card0/seq-oss-0-0'
      
      It's because both emux synth and opl3 drivers try to register their
      OSS device object with the same static index number 0.  This hasn't
      been a big problem until the recent rewrite of device management code
      (that exposes sysfs at the same time), but it's been an obvious bug.
      
      This patch works around it just by using a different index number of
      emux synth object.  There can be a more elegant way to fix, but it's
      enough for now, as this code won't be touched so often, in anyway.
      Reported-and-tested-by: default avatarMichael Shell <list1@michaelshell.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      6f4f891b
    • Johannes Berg's avatar
      iwlwifi: dvm: fix D3 firmware PN programming · 45a7943d
      Johannes Berg authored
      commit 5bd16687 upstream.
      
      The code to send the RX PN data (for each TID) to the firmware
      has a devastating bug: it overwrites the data for TID 0 with
      all the TID data, leaving the remaining TIDs zeroed. This will
      allow replays to actually be accepted by the firmware, which
      could allow waking up the system.
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarLuca Coelho <luciano.coelho@intel.com>
      [bwh: Backported to 3.2: adjust filename]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      45a7943d
    • Guillaume Nault's avatar
      ppp: don't override sk->sk_state in pppoe_flush_dev() · a2a46f58
      Guillaume Nault authored
      commit e6740165 upstream.
      
      Since commit 2b018d57 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release"),
      pppoe_release() calls dev_put(po->pppoe_dev) if sk is in the
      PPPOX_ZOMBIE state. But pppoe_flush_dev() can set sk->sk_state to
      PPPOX_ZOMBIE _and_ reset po->pppoe_dev to NULL. This leads to the
      following oops:
      
      [  570.140800] BUG: unable to handle kernel NULL pointer dereference at 00000000000004e0
      [  570.142931] IP: [<ffffffffa018c701>] pppoe_release+0x50/0x101 [pppoe]
      [  570.144601] PGD 3d119067 PUD 3dbc1067 PMD 0
      [  570.144601] Oops: 0000 [#1] SMP
      [  570.144601] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppoe pppox ppp_generic slhc loop crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac drbg ansi_cprng aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper acpi_cpufreq evdev serio_raw processor button ext4 crc16 mbcache jbd2 virtio_net virtio_blk virtio_pci virtio_ring virtio
      [  570.144601] CPU: 1 PID: 15738 Comm: ppp-apitest Not tainted 4.2.0 #1
      [  570.144601] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
      [  570.144601] task: ffff88003d30d600 ti: ffff880036b60000 task.ti: ffff880036b60000
      [  570.144601] RIP: 0010:[<ffffffffa018c701>]  [<ffffffffa018c701>] pppoe_release+0x50/0x101 [pppoe]
      [  570.144601] RSP: 0018:ffff880036b63e08  EFLAGS: 00010202
      [  570.144601] RAX: 0000000000000000 RBX: ffff880034340000 RCX: 0000000000000206
      [  570.144601] RDX: 0000000000000006 RSI: ffff88003d30dd20 RDI: ffff88003d30dd20
      [  570.144601] RBP: ffff880036b63e28 R08: 0000000000000001 R09: 0000000000000000
      [  570.144601] R10: 00007ffee9b50420 R11: ffff880034340078 R12: ffff8800387ec780
      [  570.144601] R13: ffff8800387ec7b0 R14: ffff88003e222aa0 R15: ffff8800387ec7b0
      [  570.144601] FS:  00007f5672f48700(0000) GS:ffff88003fc80000(0000) knlGS:0000000000000000
      [  570.144601] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  570.144601] CR2: 00000000000004e0 CR3: 0000000037f7e000 CR4: 00000000000406a0
      [  570.144601] Stack:
      [  570.144601]  ffffffffa018f240 ffff8800387ec780 ffffffffa018f240 ffff8800387ec7b0
      [  570.144601]  ffff880036b63e48 ffffffff812caabe ffff880039e4e000 0000000000000008
      [  570.144601]  ffff880036b63e58 ffffffff812cabad ffff880036b63ea8 ffffffff811347f5
      [  570.144601] Call Trace:
      [  570.144601]  [<ffffffff812caabe>] sock_release+0x1a/0x75
      [  570.144601]  [<ffffffff812cabad>] sock_close+0xd/0x11
      [  570.144601]  [<ffffffff811347f5>] __fput+0xff/0x1a5
      [  570.144601]  [<ffffffff811348cb>] ____fput+0x9/0xb
      [  570.144601]  [<ffffffff81056682>] task_work_run+0x66/0x90
      [  570.144601]  [<ffffffff8100189e>] prepare_exit_to_usermode+0x8c/0xa7
      [  570.144601]  [<ffffffff81001a26>] syscall_return_slowpath+0x16d/0x19b
      [  570.144601]  [<ffffffff813babb1>] int_ret_from_sys_call+0x25/0x9f
      [  570.144601] Code: 48 8b 83 c8 01 00 00 a8 01 74 12 48 89 df e8 8b 27 14 e1 b8 f7 ff ff ff e9 b7 00 00 00 8a 43 12 a8 0b 74 1c 48 8b 83 a8 04 00 00 <48> 8b 80 e0 04 00 00 65 ff 08 48 c7 83 a8 04 00 00 00 00 00 00
      [  570.144601] RIP  [<ffffffffa018c701>] pppoe_release+0x50/0x101 [pppoe]
      [  570.144601]  RSP <ffff880036b63e08>
      [  570.144601] CR2: 00000000000004e0
      [  570.200518] ---[ end trace 46956baf17349563 ]---
      
      pppoe_flush_dev() has no reason to override sk->sk_state with
      PPPOX_ZOMBIE. pppox_unbind_sock() already sets sk->sk_state to
      PPPOX_DEAD, which is the correct state given that sk is unbound and
      po->pppoe_dev is NULL.
      
      Fixes: 2b018d57 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release")
      Tested-by: default avatarOleksii Berezhniak <core@irc.lg.ua>
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      a2a46f58
    • Jann Horn's avatar
      drivers/tty: require read access for controlling terminal · db0054fe
      Jann Horn authored
      commit 0c556271 upstream.
      
      This is mostly a hardening fix, given that write-only access to other
      users' ttys is usually only given through setgid tty executables.
      Signed-off-by: default avatarJann Horn <jann@thejh.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [bwh: Backported to 3.2:
       - __proc_set_tty() also takes a task_struct pointer]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      db0054fe
    • Kosuke Tatsukawa's avatar
      tty: fix stall caused by missing memory barrier in drivers/tty/n_tty.c · 80910ccd
      Kosuke Tatsukawa authored
      commit e81107d4 upstream.
      
      My colleague ran into a program stall on a x86_64 server, where
      n_tty_read() was waiting for data even if there was data in the buffer
      in the pty.  kernel stack for the stuck process looks like below.
       #0 [ffff88303d107b58] __schedule at ffffffff815c4b20
       #1 [ffff88303d107bd0] schedule at ffffffff815c513e
       #2 [ffff88303d107bf0] schedule_timeout at ffffffff815c7818
       #3 [ffff88303d107ca0] wait_woken at ffffffff81096bd2
       #4 [ffff88303d107ce0] n_tty_read at ffffffff8136fa23
       #5 [ffff88303d107dd0] tty_read at ffffffff81368013
       #6 [ffff88303d107e20] __vfs_read at ffffffff811a3704
       #7 [ffff88303d107ec0] vfs_read at ffffffff811a3a57
       #8 [ffff88303d107f00] sys_read at ffffffff811a4306
       #9 [ffff88303d107f50] entry_SYSCALL_64_fastpath at ffffffff815c86d7
      
      There seems to be two problems causing this issue.
      
      First, in drivers/tty/n_tty.c, __receive_buf() stores the data and
      updates ldata->commit_head using smp_store_release() and then checks
      the wait queue using waitqueue_active().  However, since there is no
      memory barrier, __receive_buf() could return without calling
      wake_up_interactive_poll(), and at the same time, n_tty_read() could
      start to wait in wait_woken() as in the following chart.
      
              __receive_buf()                         n_tty_read()
      ------------------------------------------------------------------------
      if (waitqueue_active(&tty->read_wait))
      /* Memory operations issued after the
         RELEASE may be completed before the
         RELEASE operation has completed */
                                              add_wait_queue(&tty->read_wait, &wait);
                                              ...
                                              if (!input_available_p(tty, 0)) {
      smp_store_release(&ldata->commit_head,
                        ldata->read_head);
                                              ...
                                              timeout = wait_woken(&wait,
                                                TASK_INTERRUPTIBLE, timeout);
      ------------------------------------------------------------------------
      
      The second problem is that n_tty_read() also lacks a memory barrier
      call and could also cause __receive_buf() to return without calling
      wake_up_interactive_poll(), and n_tty_read() to wait in wait_woken()
      as in the chart below.
      
              __receive_buf()                         n_tty_read()
      ------------------------------------------------------------------------
                                              spin_lock_irqsave(&q->lock, flags);
                                              /* from add_wait_queue() */
                                              ...
                                              if (!input_available_p(tty, 0)) {
                                              /* Memory operations issued after the
                                                 RELEASE may be completed before the
                                                 RELEASE operation has completed */
      smp_store_release(&ldata->commit_head,
                        ldata->read_head);
      if (waitqueue_active(&tty->read_wait))
                                              __add_wait_queue(q, wait);
                                              spin_unlock_irqrestore(&q->lock,flags);
                                              /* from add_wait_queue() */
                                              ...
                                              timeout = wait_woken(&wait,
                                                TASK_INTERRUPTIBLE, timeout);
      ------------------------------------------------------------------------
      
      There are also other places in drivers/tty/n_tty.c which have similar
      calls to waitqueue_active(), so instead of adding many memory barrier
      calls, this patch simply removes the call to waitqueue_active(),
      leaving just wake_up*() behind.
      
      This fixes both problems because, even though the memory access before
      or after the spinlocks in both wake_up*() and add_wait_queue() can
      sneak into the critical section, it cannot go past it and the critical
      section assures that they will be serialized (please see "INTER-CPU
      ACQUIRING BARRIER EFFECTS" in Documentation/memory-barriers.txt for a
      better explanation).  Moreover, the resulting code is much simpler.
      
      Latency measurement using a ping-pong test over a pty doesn't show any
      visible performance drop.
      Signed-off-by: default avatarKosuke Tatsukawa <tatsu@ab.jp.nec.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [bwh: Backported to 3.2:
       - Use wake_up_interruptible(), not wake_up_interruptible_poll()
       - There are only two spurious uses of waitqueue_active() to remove]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      80910ccd
    • Vincent Palatin's avatar
      usb: Add device quirk for Logitech PTZ cameras · 32b95ad7
      Vincent Palatin authored
      commit 72194739 upstream.
      
      Add a device quirk for the Logitech PTZ Pro Camera and its sibling the
      ConferenceCam CC3000e Camera.
      This fixes the failed camera enumeration on some boot, particularly on
      machines with fast CPU.
      
      Tested by connecting a Logitech PTZ Pro Camera to a machine with a
      Haswell Core i7-4600U CPU @ 2.10GHz, and doing thousands of reboot cycles
      while recording the kernel logs and taking camera picture after each boot.
      Before the patch, more than 7% of the boots show some enumeration transfer
      failures and in a few of them, the kernel is giving up before actually
      enumerating the webcam. After the patch, the enumeration has been correct
      on every reboot.
      Signed-off-by: default avatarVincent Palatin <vpalatin@chromium.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      32b95ad7
    • Yao-Wen Mao's avatar
      USB: Add reset-resume quirk for two Plantronics usb headphones. · e365a50e
      Yao-Wen Mao authored
      commit 8484bf29 upstream.
      
      These two headphones need a reset-resume quirk to properly resume to
      original volume level.
      Signed-off-by: default avatarYao-Wen Mao <yaowen@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      e365a50e
    • Dan Carpenter's avatar
      iio: accel: sca3000: memory corruption in sca3000_read_first_n_hw_rb() · 9d2d631c
      Dan Carpenter authored
      commit eda7d0f3 upstream.
      
      "num_read" is in byte units but we are write u16s so we end up write
      twice as much as intended.
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      9d2d631c
    • John Stultz's avatar
      clocksource: Fix abs() usage w/ 64bit values · 3f980809
      John Stultz authored
      commit 67dfae0c upstream.
      
      This patch fixes one cases where abs() was being used with 64-bit
      nanosecond values, where the result may be capped at 32-bits.
      
      This potentially could cause watchdog false negatives on 32-bit
      systems, so this patch addresses the issue by using abs64().
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Link: http://lkml.kernel.org/r/1442279124-7309-2-git-send-email-john.stultz@linaro.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      3f980809
    • NeilBrown's avatar
      md/raid0: apply base queue limits *before* disk_stack_limits · e1263d46
      NeilBrown authored
      commit 66eefe5d upstream.
      
      Calling e.g. blk_queue_max_hw_sectors() after calls to
      disk_stack_limits() discards the settings determined by
      disk_stack_limits().
      So we need to make those calls first.
      
      Fixes: 199dc6ed ("md/raid0: update queue parameter in a safer location.")
      Reported-by: default avatarJes Sorensen <Jes.Sorensen@redhat.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      [bwh: Backported to 3.2: the code being moved looks a little different]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      e1263d46
    • NeilBrown's avatar
      md/raid0: update queue parameter in a safer location. · 1afa0468
      NeilBrown authored
      commit 199dc6ed upstream.
      
      When a (e.g.) RAID5 array is reshaped to RAID0, the updating
      of queue parameters (e.g. max number of sectors per bio) is
      done in the wrong place.
      It should be part of ->run, but it is actually part of ->takeover.
      This means it happens before level_store() calls:
      
      	blk_set_stacking_limits(&mddev->queue->limits);
      
      and so it ineffective.  This can lead to errors from underlying
      devices.
      
      So move all the relevant settings out of create_stripe_zones()
      and into raid0_run().
      
      As this can lead to a bug-on it is suitable for any -stable
      kernel which supports reshape to RAID0.  So 2.6.35 or later.
      As the bug has been present for five years there is no urgency,
      so no need to rush into -stable.
      
      Fixes: 9af204cf ("md: Add support for Raid5->Raid0 and Raid10->Raid0 takeover")
      Reported-by: default avatarYi Zhang <yizhan@redhat.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      [bwh: Backported to 3.2:
       - md has no discard or write-same support
       - md is not used by dm-raid so mddev->queue is never null
       - Open-code rdev_for_each()
       - Adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      1afa0468
    • Steve French's avatar
      Do not fall back to SMBWriteX in set_file_size error cases · c62cf38b
      Steve French authored
      commit 646200a0 upstream.
      
      The error paths in set_file_size for cifs and smb3 are incorrect.
      
      In the unlikely event that a server did not support set file info
      of the file size, the code incorrectly falls back to trying SMBWriteX
      (note that only the original core SMB Write, used for example by DOS,
      can set the file size this way - this actually  does not work for the more
      recent SMBWriteX).  The idea was since the old DOS SMB Write could set
      the file size if you write zero bytes at that offset then use that if
      server rejects the normal set file info call.
      
      Fortunately the SMBWriteX will never be sent on the wire (except when
      file size is zero) since the length and offset fields were reversed
      in the two places in this function that call SMBWriteX causing
      the fall back path to return an error. It is also important to never call
      an SMB request from an SMB2/sMB3 session (which theoretically would
      be possible, and can cause a brief session drop, although the client
      recovers) so this should be fixed.  In practice this path does not happen
      with modern servers but the error fall back to SMBWriteX is clearly wrong.
      
      Removing the calls to SMBWriteX in the error paths in cifs_set_file_size
      
      Pointed out by PaX/grsecurity team
      Signed-off-by: default avatarSteve French <steve.french@primarydata.com>
      Reported-by: default avatarPaX Team <pageexec@freemail.hu>
      CC: Emese Revfy <re.emese@gmail.com>
      CC: Brad Spengler <spender@grsecurity.net>
      [bwh: Backported to 3.2: deleted code looks slightly different]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      c62cf38b
    • Mel Gorman's avatar
      mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault · 846bc2d8
      Mel Gorman authored
      commit 2f84a899 upstream.
      
      SunDong reported the following on
      
        https://bugzilla.kernel.org/show_bug.cgi?id=103841
      
      	I think I find a linux bug, I have the test cases is constructed. I
      	can stable recurring problems in fedora22(4.0.4) kernel version,
      	arch for x86_64.  I construct transparent huge page, when the parent
      	and child process with MAP_SHARE, MAP_PRIVATE way to access the same
      	huge page area, it has the opportunity to lead to huge page copy on
      	write failure, and then it will munmap the child corresponding mmap
      	area, but then the child mmap area with VM_MAYSHARE attributes, child
      	process munmap this area can trigger VM_BUG_ON in set_vma_resv_flags
      	functions (vma - > vm_flags & VM_MAYSHARE).
      
      There were a number of problems with the report (e.g.  it's hugetlbfs that
      triggers this, not transparent huge pages) but it was fundamentally
      correct in that a VM_BUG_ON in set_vma_resv_flags() can be triggered that
      looks like this
      
      	 vma ffff8804651fd0d0 start 00007fc474e00000 end 00007fc475e00000
      	 next ffff8804651fd018 prev ffff8804651fd188 mm ffff88046b1b1800
      	 prot 8000000000000027 anon_vma           (null) vm_ops ffffffff8182a7a0
      	 pgoff 0 file ffff88106bdb9800 private_data           (null)
      	 flags: 0x84400fb(read|write|shared|mayread|maywrite|mayexec|mayshare|dontexpand|hugetlb)
      	 ------------
      	 kernel BUG at mm/hugetlb.c:462!
      	 SMP
      	 Modules linked in: xt_pkttype xt_LOG xt_limit [..]
      	 CPU: 38 PID: 26839 Comm: map Not tainted 4.0.4-default #1
      	 Hardware name: Dell Inc. PowerEdge R810/0TT6JF, BIOS 2.7.4 04/26/2012
      	 set_vma_resv_flags+0x2d/0x30
      
      The VM_BUG_ON is correct because private and shared mappings have
      different reservation accounting but the warning clearly shows that the
      VMA is shared.
      
      When a private COW fails to allocate a new page then only the process
      that created the VMA gets the page -- all the children unmap the page.
      If the children access that data in the future then they get killed.
      
      The problem is that the same file is mapped shared and private.  During
      the COW, the allocation fails, the VMAs are traversed to unmap the other
      private pages but a shared VMA is found and the bug is triggered.  This
      patch identifies such VMAs and skips them.
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Reported-by: default avatarSunDong <sund_sky@126.com>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: David Rientjes <rientjes@google.com>
      Reviewed-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      846bc2d8
    • Ben Hutchings's avatar
      genirq: Fix race in register_irq_proc() · bde3a53c
      Ben Hutchings authored
      commit 95c2b175 upstream.
      
      Per-IRQ directories in procfs are created only when a handler is first
      added to the irqdesc, not when the irqdesc is created.  In the case of
      a shared IRQ, multiple tasks can race to create a directory.  This
      race condition seems to have been present forever, but is easier to
      hit with async probing.
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Link: http://lkml.kernel.org/r/1443266636.2004.2.camel@decadent.org.ukSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      bde3a53c
    • Thomas Gleixner's avatar
      x86/process: Add proper bound checks in 64bit get_wchan() · 5311d93d
      Thomas Gleixner authored
      commit eddd3826 upstream.
      
      Dmitry Vyukov reported the following using trinity and the memory
      error detector AddressSanitizer
      (https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel).
      
      [ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on
      address ffff88002e280000
      [ 124.576801] ffff88002e280000 is located 131938492886538 bytes to
      the left of 28857600-byte region [ffffffff81282e0a, ffffffff82e0830a)
      [ 124.578633] Accessed by thread T10915:
      [ 124.579295] inlined in describe_heap_address
      ./arch/x86/mm/asan/report.c:164
      [ 124.579295] #0 ffffffff810dd277 in asan_report_error
      ./arch/x86/mm/asan/report.c:278
      [ 124.580137] #1 ffffffff810dc6a0 in asan_check_region
      ./arch/x86/mm/asan/asan.c:37
      [ 124.581050] #2 ffffffff810dd423 in __tsan_read8 ??:0
      [ 124.581893] #3 ffffffff8107c093 in get_wchan
      ./arch/x86/kernel/process_64.c:444
      
      The address checks in the 64bit implementation of get_wchan() are
      wrong in several ways:
      
       - The lower bound of the stack is not the start of the stack
         page. It's the start of the stack page plus sizeof (struct
         thread_info)
      
       - The upper bound must be:
      
             top_of_stack - TOP_OF_KERNEL_STACK_PADDING - 2 * sizeof(unsigned long).
      
         The 2 * sizeof(unsigned long) is required because the stack pointer
         points at the frame pointer. The layout on the stack is: ... IP FP
         ... IP FP. So we need to make sure that both IP and FP are in the
         bounds.
      
      Fix the bound checks and get rid of the mix of numeric constants, u64
      and unsigned long. Making all unsigned long allows us to use the same
      function for 32bit as well.
      
      Use READ_ONCE() when accessing the stack. This does not prevent a
      concurrent wakeup of the task and the stack changing, but at least it
      avoids TOCTOU.
      
      Also check task state at the end of the loop. Again that does not
      prevent concurrent changes, but it avoids walking for nothing.
      
      Add proper comments while at it.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Based-on-patch-from: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Reviewed-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: kasan-dev <kasan-dev@googlegroups.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
      Link: http://lkml.kernel.org/r/20150930083302.694788319@linutronix.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      [bwh: Backported to 3.2:
       - s/READ_ONCE/ACCESS_ONCE/
       - Remove use of TOP_OF_KERNEL_STACK_PADDING, not defined here and would
         be defined as 0]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      5311d93d
    • James Hogan's avatar
      MIPS: dma-default: Fix 32-bit fall back to GFP_DMA · b9f15ae6
      James Hogan authored
      commit 53960059 upstream.
      
      If there is a DMA zone (usually 24bit = 16MB I believe), but no DMA32
      zone, as is the case for some 32-bit kernels, then massage_gfp_flags()
      will cause DMA memory allocated for devices with a 32..63-bit
      coherent_dma_mask to fall back to using __GFP_DMA, even though there may
      only be 32-bits of physical address available anyway.
      
      Correct that case to compare against a mask the size of phys_addr_t
      instead of always using a 64-bit mask.
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Fixes: a2e715a8 ("MIPS: DMA: Fix computation of DMA flags from device's coherent_dma_mask.")
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/9610/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      b9f15ae6
    • shengyong's avatar
      UBI: return ENOSPC if no enough space available · 02eb3b90
      shengyong authored
      commit 7c7feb2e upstream.
      
      UBI: attaching mtd1 to ubi0
      UBI: scanning is finished
      UBI error: init_volumes: not enough PEBs, required 706, available 686
      UBI error: ubi_wl_init: no enough physical eraseblocks (-20, need 1)
      UBI error: ubi_attach_mtd_dev: failed to attach mtd1, error -12 <= NOT ENOMEM
      UBI error: ubi_init: cannot attach mtd1
      
      If available PEBs are not enough when initializing volumes, return -ENOSPC
      directly. If available PEBs are not enough when initializing WL, return
      -ENOSPC instead of -ENOMEM.
      Signed-off-by: default avatarSheng Yong <shengyong1@huawei.com>
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      Reviewed-by: default avatarDavid Gstir <david@sigma-star.at>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      02eb3b90