1. 02 Jul, 2014 9 commits
    • aio: fix kernel memory disclosure in io_getevents() introduced in v3.10 · bee3f7b8
      Benjamin LaHaise authored
      commit edfbbf38 upstream.
      
      A kernel memory disclosure was introduced in aio_read_events_ring() in v3.10
      by commit a31ad380.  The changes made to aio_read_events_ring() failed to
      correctly limit the index into ctx->ring_pages[], allowing an attacker to
      cause the subsequent kmap() of an arbitrary page, whose contents
      copy_to_user() then copies into userspace.
      This vulnerability has been assigned CVE-2014-0206.  Thanks to Mateusz and
      Petr for disclosing this issue.
      
      This patch applies to v3.12+.  A separate backport is needed for 3.10/3.11.
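      A minimal userspace sketch of the kind of bound that was missing, under
      assumed names: EVENTS_PER_PAGE and ring_page_for_head() are hypothetical,
      and the layout is simplified to consecutive pages of fixed-size event
      slots.  It illustrates that a head index read from ring memory that
      userspace can write must be reduced into [0, nr_events) before it is
      used to select a page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: each ring page holds this many event slots. */
#define EVENTS_PER_PAGE 128u

/*
 * Map a ring head index to the page that holds it. 'head' is read from
 * memory that userspace can write, so it is reduced modulo nr_events
 * (which must be nonzero) before it is used; without that step a large
 * head would index past the end of the page array.
 */
static size_t ring_page_for_head(unsigned int head, unsigned int nr_events)
{
	head %= nr_events;		/* now 0 <= head < nr_events */
	return head / EVENTS_PER_PAGE;
}
```

      With nr_events = 256 the ring spans two pages, so any head value, even
      0xFFFFFFFF, maps to page 0 or 1 rather than to an arbitrary page.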
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Mateusz Guzik <mguzik@redhat.com>
      Cc: Petr Matousek <pmatouse@redhat.com>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • serial: 8250_dw: Fix LCR workaround regression · be31bc4b
      James Hogan authored
      commit 6979f8d2 upstream.
      
      Commit c49436b6 (serial: 8250_dw: Improve unwritable LCR workaround)
      caused a regression. It added a check that the LCR was written properly
      to detect and work around the busy quirk, but the behaviour of bit 5
      (UART_LCR_SPAR) differs between IP versions 3.00a and 3.14c per the
      docs. On older versions this caused the check to fail and it would
      repeatedly force idle and rewrite the LCR register, causing delays and
      preventing any input from serial being received.
      
      This is fixed by masking out UART_LCR_SPAR before making the comparison.
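      The masked comparison can be sketched as a small helper.
      lcr_write_accepted() is a hypothetical name; only UART_LCR_SPAR (0x20,
      bit 5) is taken from linux/serial_reg.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit 5 of the LCR (stick parity), as named in linux/serial_reg.h. */
#define UART_LCR_SPAR 0x20u

/*
 * Decide whether an LCR write "took" by comparing the value written with
 * the value read back, ignoring bit 5, whose read-back behaviour differs
 * between DesignWare IP versions.
 */
static bool lcr_write_accepted(unsigned int written, unsigned int readback)
{
	return (written & ~UART_LCR_SPAR) == (readback & ~UART_LCR_SPAR);
}
```

      For example, a write of 0x3b read back as 0x1b (only bit 5 differs)
      counts as accepted, while 0x83 read back as 0x03 does not.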
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Tim Kryger <tim.kryger@linaro.org>
      Cc: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
      Cc: Matt Porter <matt.porter@linaro.org>
      Cc: Markus Mayer <markus.mayer@linaro.org>
      Tested-by: Tim Kryger <tim.kryger@linaro.org>
      Tested-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
      Tested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • serial: 8250_dw: Improve unwritable LCR workaround · 24015775
      Tim Kryger authored
      commit c49436b6 upstream.
      
      When configured with UART_16550_COMPATIBLE=NO or in versions prior to
      the introduction of this option, the DesignWare UART will ignore writes
      to the LCR if the UART is busy.  The current workaround saves a copy of
      the last written LCR and re-writes it in the ISR for a special interrupt
      that is raised when a write was ignored.
      
      Unfortunately, interrupts are typically disabled prior to performing a
      sequence of register writes that include the LCR so the point at which
      the retry occurs is too late.  An example is serial8250_do_set_termios()
      where an ignored LCR write results in the baud divisor not being set and
      instead a garbage character is sent out the transmitter.
      
      Furthermore, since serial_port_out() offers no way to indicate failure,
      a serious effort must be made to ensure that the LCR is actually updated
      before returning back to the caller.  This is difficult, however, as a
      UART that was busy during the first attempt is likely to still be busy
      when a subsequent attempt is made unless some extra action is taken.
      
      This updated workaround reads back the LCR after each write to confirm
      that the new value was accepted by the hardware.  Should the hardware
      ignore a write, the TX/RX FIFOs are cleared and the receive buffer read
      before attempting to rewrite the LCR in the hope that doing so will
      force the UART into an idle state.  While this may seem unnecessarily
      aggressive, writes to the LCR are used to change the baud rate, parity,
      stop bits, or data length, so the data that may be lost is likely not
      important.  Admittedly, this is far from ideal but it seems to be the
      best that can be done given the hardware limitations.
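      The read-back-and-retry idea above, sketched against a simulated UART.
      struct sim_uart, sim_force_idle() and the retry limit are assumptions
      for illustration.  The simulation treats "clear the FIFOs and read the
      receive buffer" as always making the UART idle, which real hardware
      need not do; that is why the loop is bounded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simulated UART that ignores LCR writes while busy. */
struct sim_uart {
	unsigned char lcr;
	bool busy;
	int fifo_clears;	/* how often the recovery step ran */
};

static void sim_write_lcr(struct sim_uart *u, unsigned char value)
{
	if (!u->busy)
		u->lcr = value;	/* accepted only when idle */
}

static unsigned char sim_read_lcr(const struct sim_uart *u)
{
	return u->lcr;
}

/* Stand-in for clearing the TX/RX FIFOs and reading the receive buffer. */
static void sim_force_idle(struct sim_uart *u)
{
	u->fifo_clears++;
	u->busy = false;
}

/* Write the LCR, read it back, and retry up to 'tries' times. */
static bool write_lcr_verified(struct sim_uart *u, unsigned char value, int tries)
{
	while (tries-- > 0) {
		sim_write_lcr(u, value);
		if (sim_read_lcr(u) == value)
			return true;	/* hardware accepted the write */
		sim_force_idle(u);	/* try to get out of the busy state */
	}
	return false;
}
```

      Because the check happens inside the write path, the caller never
      proceeds with a stale LCR, unlike an ISR-based retry that runs later.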
      
      Lastly, the revised workaround doesn't touch the LCR in the ISR, so it
      avoids the possibility of a "serial8250: too much work for irq" lock up.
      This problem is rare in real situations but can be reproduced easily by
      wiring up two UARTs and running the following commands.
      
        # stty -F /dev/ttyS1 echo
        # stty -F /dev/ttyS2 echo
        # cat /dev/ttyS1 &
        [1] 375
        # echo asdf > /dev/ttyS1
        asdf
      
        [   27.700000] serial8250: too much work for irq96
        [   27.700000] serial8250: too much work for irq96
        [   27.710000] serial8250: too much work for irq96
        [   27.710000] serial8250: too much work for irq96
        [   27.720000] serial8250: too much work for irq96
        [   27.720000] serial8250: too much work for irq96
        [   27.730000] serial8250: too much work for irq96
        [   27.730000] serial8250: too much work for irq96
        [   27.740000] serial8250: too much work for irq96
      Signed-off-by: Tim Kryger <tim.kryger@linaro.org>
      Reviewed-by: Matt Porter <matt.porter@linaro.org>
      Reviewed-by: Markus Mayer <markus.mayer@linaro.org>
      Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      
      Conflicts:
      	drivers/tty/serial/8250/8250_dw.c
    • mm: add !pte_present() check on existing hugetlb_entry callbacks · 7032d5fb
      Naoya Horiguchi authored
      commit d4c54919 upstream.
      
      The page table walker doesn't check for non-present hugetlb entries in the
      common path, so hugetlb_entry() callbacks must check for them.  The reason
      for this behavior is that some callers want to handle them in their own way.
      
      [ I think that reason is bogus, btw - it should just do what the regular
        code does, which is to call the "pte_hole()" function for such hugetlb
        entries  - Linus]
      
      However, some callers don't check for them now, which causes unpredictable
      results, for example when there is a race between migrating a hugepage and
      reading /proc/pid/numa_maps.  This patch fixes it by adding !pte_present()
      checks to the buggy callbacks.
      
      This bug has existed for years and became visible with the introduction
      of hugepage migration.
      
      ChangeLog v2:
      - fix if condition (check !pte_present() instead of pte_present())
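      A minimal model of a callback that performs the !pte_present() check
      itself because the walker does not.  struct fake_huge_pte and the
      function names are hypothetical; a non-present entry (such as one
      under migration) is skipped instead of being interpreted as a page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a huge page table entry. */
struct fake_huge_pte {
	bool present;		/* false for e.g. a migration entry */
	unsigned long pfn;	/* only meaningful when present */
};

struct numa_stats {
	size_t pages;
};

/* Callback in the style of a hugetlb_entry() handler. */
static int count_hugetlb_entry(const struct fake_huge_pte *pte, struct numa_stats *st)
{
	if (!pte->present)	/* the walker does not filter these out */
		return 0;	/* skip: pfn does not name a valid page */
	st->pages++;
	return 0;
}

/* Walker that, like the one described, hands every entry to the callback. */
static int walk_hugetlb(const struct fake_huge_pte *ptes, size_t n, struct numa_stats *st)
{
	for (size_t i = 0; i < n; i++) {
		int err = count_hugetlb_entry(&ptes[i], st);

		if (err)
			return err;
	}
	return 0;
}
```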
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: <stable@vger.kernel.org> [3.12+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      [ Backported to 3.15.  Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org> ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • nfsd: don't halt scanning the DRC LRU list when there's an RC_INPROG entry · 5b96c379
      Jeff Layton authored
      commit 1b19453d upstream.
      
      Currently, the DRC cache pruner will stop scanning the list when it
      hits an entry that is RC_INPROG. It's possible however for a call to
      take a *very* long time. In that case, we don't want it to block other
      entries from being pruned if they are expired or we need to trim the
      cache to get back under the limit.
      
      Fix the DRC cache pruner to just ignore RC_INPROG entries.
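      A sketch of the revised pruning loop over an oldest-first LRU, modelled
      as an array.  The RC_* state names follow nfsd's reply cache; struct
      drc_entry, prune_cache() and the time units are assumptions.  The key
      change is "continue" instead of "break" on an RC_INPROG entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State names mirror nfsd's reply-cache states; the struct is a model. */
enum rc_state { RC_UNUSED, RC_INPROG, RC_DONE };

struct drc_entry {
	enum rc_state state;
	long timestamp;		/* when the entry was last used */
	bool freed;
};

/*
 * Walk an oldest-first LRU. In-progress entries are skipped rather than
 * ending the scan; the scan stops at the first entry that is neither
 * expired nor needed to get back under the size limit.
 */
static size_t prune_cache(struct drc_entry *lru, size_t n, size_t *count,
			  size_t max_entries, long now, long expire)
{
	size_t freed = 0;

	for (size_t i = 0; i < n; i++) {
		struct drc_entry *e = &lru[i];

		if (e->freed)
			continue;
		if (e->state == RC_INPROG)
			continue;	/* the old behaviour stopped here */
		if (*count <= max_entries && now < e->timestamp + expire)
			break;		/* later entries are newer still */
		e->freed = true;
		(*count)--;
		freed++;
	}
	return freed;
}
```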
      Signed-off-by: Jeff Layton <jlayton@primarydata.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • aio: block io_destroy() until all context requests are completed · 04242f7d
      Anatol Pomozov authored
      commit e02ba72a upstream.
      
      io_destroy() deletes an aio context and all resources related to it. No
      I/O operations connected to the context should still be running after the
      context is destroyed: once the context is removed, there is no way to get
      the requests' status or call io_getevents().
      
      The io_destroy man page says that this function may block until all of
      the context's requests are completed. Before kernel 3.11 io_destroy()
      did block, but since the aio refactoring in 3.11 this is no longer true.
      
      Here is pseudo-code for a test case that exposes a race condition
      discovered in 3.11:
      
        initialize io_context
        io_submit(read to buffer)
        io_destroy()
      
        // context is destroyed so we can free the resources
        free(buffers);
      
        // if the buffer is then allocated by some other user, they will be
        // surprised to find it still being filled by an outstanding operation
        // from the destroyed io_context
      
      The fix is straightforward: add a completion struct and wait on it in
      io_destroy(); complete() is called when the number of in-flight requests
      reaches zero.
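      A userspace analogue of the fix, assuming a pthread mutex and condition
      variable in place of the kernel's struct completion; struct io_ctx and
      the ctx_* functions are hypothetical.  Destroy blocks until the
      in-flight count drops to zero, and the last completing request wakes it.

```c
#include <assert.h>
#include <pthread.h>

/*
 * Hypothetical userspace model: a mutex and condition variable play the
 * role of the kernel's struct completion.
 */
struct io_ctx {
	pthread_mutex_t lock;
	pthread_cond_t all_done;
	int inflight;
};

static void ctx_init(struct io_ctx *ctx)
{
	pthread_mutex_init(&ctx->lock, NULL);
	pthread_cond_init(&ctx->all_done, NULL);
	ctx->inflight = 0;
}

static void ctx_submit(struct io_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	ctx->inflight++;
	pthread_mutex_unlock(&ctx->lock);
}

/* Called as each request finishes; the last one wakes the destroyer. */
static void ctx_complete_one(struct io_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	if (--ctx->inflight == 0)
		pthread_cond_broadcast(&ctx->all_done);
	pthread_mutex_unlock(&ctx->lock);
}

/* Blocks until no request is in flight; only then may buffers be freed. */
static void ctx_destroy(struct io_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	while (ctx->inflight > 0)	/* loop guards against spurious wakeups */
		pthread_cond_wait(&ctx->all_done, &ctx->lock);
	pthread_mutex_unlock(&ctx->lock);
}

/* Test helper: simulates the device finishing three requests. */
static void *finish_three(void *arg)
{
	struct io_ctx *ctx = arg;

	for (int i = 0; i < 3; i++)
		ctx_complete_one(ctx);
	return NULL;
}
```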
      
      If two or more io_destroy() calls are made for the same context
      simultaneously, only the first one waits for I/O completion; the behaviour
      of the other calls is undefined.
      
      Tested: ran http://pastebin.com/LrPsQ4RL testcase for several hours and
        do not see the race condition anymore.
      Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • nfsd: don't try to reuse an expired DRC entry off the list · 0c9a0cfb
      Jeff Layton authored
      commit a0ef5e19 upstream.
      
      Currently when we are processing a request, we try to scrape an expired
      or over-limit entry off the list in preference to allocating a new one
      from the slab.
      
      This is unnecessarily complicated. Just use the slab layer.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • drivers/video/fbdev/fb-puv3.c: Add header files for function unifb_mmap · 9e6f2084
      Zhichuang SUN authored
      commit fbc6c4a1 upstream.
      
      unifb_mmap() calls functions that are defined in linux/mm.h and
      asm/pgtable.h, so fb-puv3.c must include those headers.
      
      The related error (for unicore32 with unicore32_defconfig):
      	CC      drivers/video/fbdev/fb-puv3.o
      	drivers/video/fbdev/fb-puv3.c: In function 'unifb_mmap':
      	drivers/video/fbdev/fb-puv3.c:646: error: implicit declaration of
      				      function 'vm_iomap_memory'
      	drivers/video/fbdev/fb-puv3.c:646: error: implicit declaration of
      				      function 'pgprot_noncached'
      Signed-off-by: Zhichuang Sun <sunzc522@gmail.com>
      Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
      Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
      Cc: Jingoo Han <jg1.han@samsung.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Joe Perches <joe@perches.com>
      Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
      Cc: linux-fbdev@vger.kernel.org
      Acked-by: Xuetao Guan <gxt@mprc.pku.edu.cn>
      Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • arch/unicore32/mm/alignment.c: include "asm/pgtable.h" to avoid compiling error · 976c2ec4
      Chen Gang authored
      commit 1ff38c56 upstream.
      
      asm/pgtable.h must be included so that it pulls in
      asm-generic/pgtable-nopmd.h, which defines 'pmd_t'. The resulting error
      with allmodconfig:
      
          CC      arch/unicore32/mm/alignment.o
        In file included from arch/unicore32/mm/alignment.c:24:
        arch/unicore32/include/asm/tlbflush.h:135: error: expected ')' before '*' token
        arch/unicore32/include/asm/tlbflush.h:154: error: expected ')' before '*' token
        In file included from arch/unicore32/mm/alignment.c:27:
        arch/unicore32/mm/mm.h:15: error: expected '=', ',', ';', 'asm' or '__attribute__' before '*' token
        arch/unicore32/mm/mm.h:20: error: expected '=', ',', ';', 'asm' or '__attribute__' before '*' token
        arch/unicore32/mm/mm.h:25: error: expected '=', ',', ';', 'asm' or '__attribute__' before '*' token
        make[1]: *** [arch/unicore32/mm/alignment.o] Error 1
        make: *** [arch/unicore32/mm] Error 2
      Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
      Acked-by: Xuetao Guan <gxt@mprc.pku.edu.cn>
      Signed-off-by: Xuetao Guan <gxt@mprc.pku.edu.cn>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
  2. 27 Jun, 2014 31 commits
    • ocfs2: revert iput deferring code in ocfs2_drop_dentry_lock · fc597a30
      Goldwyn Rodrigues authored
      commit 8ed6b237 upstream.
      
      The following patches are reverted in this patch because these patches
      caused performance regression in the remote unlink() calls.
      
        ea455f8a - ocfs2: Push out dropping of dentry lock to ocfs2_wq
        f7b1aa69 - ocfs2: Fix deadlock on umount
        5fd13189 - ocfs2: Don't oops in ocfs2_kill_sb on a failed mount
      
      Previous patches in this series removed the possible deadlocks from the
      downconvert thread, so the above patches shouldn't be needed anymore.
      
      The regression is caused because these patches delay the iput() in case
      of dentry unlocks.  This also delays the unlocking of the open lockres.
      The open lock resource is required to test whether the inode can be wiped
      from disk.  When the deleting node does not get the open lock, it marks
      the inode as an orphan (even though it is not in use by another
      node/process) and causes a journal checkpoint.  This delays operations
      following the inode eviction.  It also moves the inode to the orphan
      directory, which causes further I/O and a lot of unnecessary orphans.
      
      The following script can be used to generate the load causing issues:
      
        declare -a create
        declare -a copy
        declare -a remove
        declare -a iterations=(1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384)
        unique="`mktemp -u XXXXX`"
        script="/tmp/idontknow-${unique}.sh"
        # Helper script: create $1 empty files in each of 8 directories.
        cat <<EOF > "${script}"
        for n in {1..8}; do mkdir -p test/dir\${n}
          eval touch test/dir\${n}/foo{1.."\$1"}
        done
        EOF
        chmod 700 "${script}"
      
        # Each helper reports the elapsed wall-clock time (time --format=%E).
        function fcreate ()
        {
          exec 2>&1 /usr/bin/time --format=%E "${script}" "$1"
        }
      
        # Remove the tree from a second node.
        function fremove ()
        {
          exec 2>&1 /usr/bin/time --format=%E ssh node2 "cd `pwd`; rm -Rf test*"
        }
      
        # Copy the tree from a third node.
        function fcp ()
        {
          exec 2>&1 /usr/bin/time --format=%E ssh node3 "cd `pwd`; cp -R test test.new"
        }
      
        echo -------------------------------------------------
        echo "| # files | create #s | copy #s | remove #s |"
        echo -------------------------------------------------
        for ((x=0; x < ${#iterations[*]} ; x++)) do
          create[$x]="`fcreate ${iterations[$x]}`"
          copy[$x]="`fcp ${iterations[$x]}`"
          remove[$x]="`fremove`"
          printf "| %8d | %9s | %9s | %9s |\n" ${iterations[$x]} ${create[$x]} ${copy[$x]} ${remove[$x]}
        done
        rm "${script}"
        echo -------------------------------------------------
      Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • ocfs2: avoid blocking in ocfs2_mark_lockres_freeing() in downconvert thread · 8f265718
      Jan Kara authored
      commit 84d86f83 upstream.
      
      If we are dropping the last inode reference from the downconvert thread,
      we will end up calling ocfs2_mark_lockres_freeing(), which can block if
      the lock we are freeing is queued, thus creating an A-A deadlock.
      Luckily, since we are the downconvert thread, we can immediately dequeue
      the lock and thus avoid waiting in this case.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • ocfs2: implement delayed dropping of last dquot reference · cac99bee
      Jan Kara authored
      commit e3a767b6 upstream.
      
      We cannot drop the last dquot reference from the downconvert thread, as
      that creates the following deadlock:
      
      NODE 1                                  NODE2
      holds dentry lock for 'foo'
      holds inode lock for GLOBAL_BITMAP_SYSTEM_INODE
                                              dquot_initialize(bar)
                                                ocfs2_dquot_acquire()
                                                  ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
                                                  ...
      downconvert thread (triggered from another
      node or a different process from NODE2)
        ocfs2_dentry_post_unlock()
          ...
          iput(foo)
            ocfs2_evict_inode(foo)
              ocfs2_clear_inode(foo)
                dquot_drop(inode)
                  ...
                  ocfs2_dquot_release()
                    ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
                     - blocks
                                                  finds we need more space in
                                                  quota file
                                                  ...
                                                  ocfs2_extend_no_holes()
                                                    ocfs2_inode_lock(GLOBAL_BITMAP_SYSTEM_INODE)
                                                      - deadlocks waiting for
                                                        downconvert thread
      
      We solve the problem by postponing dropping of the last dquot reference to
      a workqueue if it happens from the downconvert thread.
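      A single-threaded sketch of deferring the final put, with hypothetical
      names throughout (struct fake_dquot, put_dquot(), drain_deferred()).
      When the last reference would be dropped on the downconvert thread, the
      object is queued with that reference still held, and a later worker
      performs the release in a context where taking the locks is safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical quota structure with a reference count. */
struct fake_dquot {
	int count;
	bool released;			/* set when the final put releases it */
	struct fake_dquot *next_deferred;
};

/* Hypothetical per-filesystem list drained later by a worker. */
struct deferred_list {
	struct fake_dquot *head;
};

static void release_dquot(struct fake_dquot *dq)
{
	/* In the filesystem, this is the step that takes cluster locks. */
	dq->released = true;
}

/*
 * Drop a reference. If it is the last one and we are on the downconvert
 * thread, do not release here (that could deadlock); queue it instead.
 */
static void put_dquot(struct fake_dquot *dq, bool in_downconvert_thread,
		      struct deferred_list *deferred)
{
	if (dq->count > 1) {
		dq->count--;
		return;
	}
	if (in_downconvert_thread) {
		dq->next_deferred = deferred->head;	/* last ref kept for the worker */
		deferred->head = dq;
		return;
	}
	dq->count = 0;
	release_dquot(dq);
}

/* Work item: runs outside the downconvert thread. */
static void drain_deferred(struct deferred_list *deferred)
{
	while (deferred->head) {
		struct fake_dquot *dq = deferred->head;

		deferred->head = dq->next_deferred;
		put_dquot(dq, false, deferred);
	}
}
```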
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • quota: provide function to grab quota structure reference · b5258061
      Jan Kara authored
      commit 9f985cb6 upstream.
      
      Provide dqgrab() function to get quota structure reference when we are
      sure it already has at least one active reference.  Make use of this
      function inside quota code.
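      A sketch of the dqgrab() contract using C11 atomics; struct model_dquot
      and model_dqgrab() are hypothetical stand-ins, not the quota code
      itself.  Because the caller guarantees an existing active reference, a
      plain atomic increment is enough.  This model asserts the precondition,
      whereas kernel code would more typically warn.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical quota structure; only the reference count is modelled. */
struct model_dquot {
	atomic_int dq_count;
};

/*
 * Take an extra reference on a structure the caller knows is already
 * referenced. Since the count cannot be zero, no lookup or lock is
 * needed to keep the structure alive while the count is bumped.
 */
static void model_dqgrab(struct model_dquot *dq)
{
	/* Caller contract: someone else already holds a reference. */
	assert(atomic_load(&dq->dq_count) > 0);
	atomic_fetch_add(&dq->dq_count, 1);
}
```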
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • ocfs2: move dquot_initialize() in ocfs2_delete_inode() somewhat later · 2eb0658f
      Jan Kara authored
      commit bd62ad7a upstream.
      
      Move the dquot_initialize() call in ocfs2_delete_inode() after the point
      where we verify that the inode is actually sane to delete.  We certainly
      don't want to initialize quota for system inodes etc.  This also avoids
      calling into quota code from the downconvert thread.
      
      Also expand the comment explaining why bailing out of
      ocfs2_delete_inode() is OK when we are in the downconvert thread.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • dlm: keep listening connection alive with sctp mode · 34b6c049
      Lidong Zhong authored
      commit 883854c5 upstream.
      
      The connection struct with nodeid 0 is the listening socket,
      not a connection to another node.  The sctp resend function
      was not checking that the nodeid was valid (non-zero), so it
      would mistakenly get and resend on the listening connection
      when nodeid was zero.
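      The missing validation, sketched with a hypothetical connection table in
      which nodeid 0 is the listening socket; struct fake_connection,
      resend_to_node() and its error codes are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical connection table entry; nodeid 0 is the listening socket. */
struct fake_connection {
	int nodeid;
	int resends;		/* how many times data was resent on it */
};

/* Resend pending data to one peer, refusing the listening connection. */
static int resend_to_node(struct fake_connection *cons, size_t n, int nodeid)
{
	if (nodeid == 0)
		return -EINVAL;		/* not a peer: never resend here */
	for (size_t i = 0; i < n; i++) {
		if (cons[i].nodeid == nodeid) {
			cons[i].resends++;
			return 0;
		}
	}
	return -ENOENT;
}
```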
      Signed-off-by: Lidong Zhong <lzhong@suse.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Btrfs: fix BUG_ON() caused by the reserved space migration · 9bf37c05
      Miao Xie authored
      commit 20dd2cbf upstream.
      
      When running a space balance and snapshot creation at the same time, we
      might hit the following oops:
       kernel BUG at fs/btrfs/inode.c:3038!
       [SNIP]
       Call Trace:
       [<ffffffffa0411ec7>] btrfs_orphan_cleanup+0x293/0x407 [btrfs]
       [<ffffffffa042dc45>] btrfs_mksubvol.isra.28+0x259/0x373 [btrfs]
       [<ffffffffa042de85>] btrfs_ioctl_snap_create_transid+0x126/0x156 [btrfs]
       [<ffffffffa042dff1>] btrfs_ioctl_snap_create_v2+0xd0/0x121 [btrfs]
       [<ffffffffa0430b2c>] btrfs_ioctl+0x414/0x1854 [btrfs]
       [<ffffffff813b60b7>] ? __do_page_fault+0x305/0x379
       [<ffffffff811215a9>] vfs_ioctl+0x1d/0x39
       [<ffffffff81121d7c>] do_vfs_ioctl+0x32d/0x3e2
       [<ffffffff81057fe7>] ? finish_task_switch+0x80/0xb8
       [<ffffffff81121e88>] SyS_ioctl+0x57/0x83
       [<ffffffff813b39ff>] ? do_device_not_available+0x12/0x14
       [<ffffffff813b99c2>] system_call_fastpath+0x16/0x1b
       [SNIP]
       RIP  [<ffffffffa040da40>] btrfs_orphan_add+0xc3/0x126 [btrfs]
      
      The cause of the problem is that relocation root creation stole the
      reserved space that had been reserved for orphan item deletion.
      
      There are several ways to fix this problem. One is to increase the
      reserved space size of the space balance, and then use that space to
      create the relocation tree for each fs/file tree. But it is hard to
      calculate a suitable size because we don't know how many fs/file trees
      we need to relocate.
      
      We fixed this problem by actively reserving space for relocation root
      creation, since the space it needs is very small (one tree block, used
      for the root node copy), and then using that reserved space to create
      the relocation tree. If we don't reserve space for relocation tree
      creation, we will use the reserved space of the balance.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Btrfs: fix two use-after-free bugs with transaction cleanup · 8b16b61c
      Josef Bacik authored
      commit 724e2315 upstream.
      
      I was noticing the slab redzone stuff going off every once in a while
      during transaction aborts.  This was caused by two things:
      
      1) We would walk the pending snapshots and set their error to -ECANCELED.  We
      don't need to do this: the snapshot code waits for a transaction commit, and if
      there is a problem we just free our pending snapshot object and exit.  Doing
      this was causing us to touch the pending snapshot object after it had
      already been freed.
      
      2) We were freeing the transaction manually with wanton disregard for its
      use_count reference counter.  To fix this I cleaned up the transaction freeing
      loop to either wait for the transaction commit to finish if it was in the middle
      of that (since it will be cleaned up and freed there) or to do the cleanup
      ourselves.
      
      I also moved the global "kill all things dirty everywhere" stuff outside of the
      transaction cleanup loop since that only needs to be done once.  With this patch
      I'm no longer seeing slab corruption because of use after frees.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Btrfs: don't delete ordered roots from list during cleanup · 445d1c3a
      Josef Bacik authored
      commit 1de2cfde upstream.
      
      During transaction cleanup after an abort we are just removing roots from the
      ordered roots list which is incorrect.  We have a BUG_ON() to make sure that the
      root is still part of the ordered roots list when we put our ordered extent
      which we were tripping in this case.  So do like we do everywhere else and just
      move it to the tail of the ordered roots list and allow the normal cleanup to
      take care of stuff.  Thanks,
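      Why moving rather than deleting matters can be shown with a minimal
      circular doubly linked list modelled on <linux/list.h>, which provides
      list_move_tail() for this.  The helpers below are hypothetical
      re-implementations: after list_move_to_tail() the node is still linked,
      so a later "is it still on the list?" check passes, whereas
      list_unlink() alone would make it fail.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, modelled on <linux/list.h>. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *n)
{
	n->prev = n->next = n;		/* self-linked means "not on a list" */
}

static bool node_on_list(const struct list_node *n)
{
	return n->next != n;
}

static void list_unlink(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

static void list_append(struct list_node *head, struct list_node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Move n to the tail of head's list; n ends up linked again. */
static void list_move_to_tail(struct list_node *head, struct list_node *n)
{
	list_unlink(n);
	list_append(head, n);
}
```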
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Btrfs: cleanup transaction on abort · bd32872f
      Josef Bacik authored
      commit 4e121c06 upstream.
      
      If we abort outside of a transaction commit, we won't clean up anything until
      we unmount.  Unfortunately, if we abort in the middle of writing out an ordered
      extent we won't clean it up, and if somebody is waiting on that ordered extent
      they will wait forever.  To fix this, make the transaction kthread call the
      transaction cleanup code if it notices there's an error, and make
      btrfs_end_transaction wake up the transaction kthread if there is an error.
      Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Btrfs: do not release metadata for space cache inodes · 4b6d66d1
      Josef Bacik authored
      commit b6d08f06 upstream.
      
      I've been testing our error paths and I was tripping the BUG_ON() in
      drop_outstanding_extent because our outstanding_extents is 0 for space cache
      inodes.  This is because we don't reserve metadata space for these inodes since
      we depend on the global block reserve for our space.  To fix this we need to
      make sure the DO_ACCOUNTING stuff doesn't actually call release_metadata for
      space cache inodes.  With this patch I'm no longer panicking.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Btrfs: don't leak block group on error · 596075a2
      Filipe David Borba Manana authored
      commit e84cc142 upstream.
      
      In extent-tree.c:btrfs_write_dirty_block_groups(), if the call to
      write_one_cache_group() failed, we would return without putting
      the block group first.
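      The shape of the fix is that every exit after the lookup drops the
      reference it took.  This sketch uses a hypothetical counter-based
      struct fake_block_group and a stand-in for write_one_cache_group();
      -EIO is only an example error.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical reference-counted block group. */
struct fake_block_group {
	int refs;
};

static void put_block_group(struct fake_block_group *bg)
{
	bg->refs--;
}

/* Stand-in for write_one_cache_group(); 'fail' forces an error. */
static int write_one_group(struct fake_block_group *bg, int fail)
{
	(void)bg;
	return fail ? -EIO : 0;
}

/* The reference taken by the lookup is dropped on success and on failure. */
static int write_dirty_group(struct fake_block_group *bg, int fail)
{
	int ret;

	bg->refs++;			/* reference taken by the lookup */
	ret = write_one_group(bg, fail);
	put_block_group(bg);		/* no early return skips this */
	return ret;
}
```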
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Btrfs: fix sync fs to actually wait for all data to be persisted · a2ea3d78
      Filipe David Borba Manana authored
      commit 9b199859 upstream.
      
      Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
      wait for delayed work to finish before returning success to the
      caller. This change fixes this, ensuring that there's no data loss
      if a power failure happens right after fs sync returns success to
      the caller and before the next commit happens.
      
      Steps to reproduce the data loss issue:
      
      $ mkfs.btrfs -f /dev/sdb3
      $ mount /dev/sdb3 /mnt/btrfs
      $ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs
      
      Right after the btrfs fi sync command (a second or two later, for example),
      power off the machine and reboot it. The file will be empty, as can be
      verified after mounting the filesystem and through btrfs-debug-tree:
      
      $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
              item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
                      location key (257 INODE_ITEM 0) type FILE
                      namelen 6 datalen 0 name: foobar
              item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
                      inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
              item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
                      inode ref index 2 namelen 6 name: foobar
      checksum tree key (CSUM_TREE ROOT_ITEM 0)
      leaf 29429760 items 0 free space 3995 generation 7 owner 7
      fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
      chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
      uuid tree key (UUID_TREE ROOT_ITEM 0)
      
      After this patch, the data loss no longer happens after a power failure and
      btrfs-debug-tree shows:
      
      $ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
      	item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
      		location key (257 INODE_ITEM 0) type FILE
      		namelen 6 datalen 0 name: foobar
      	item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
      		inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
      	item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
      		inode ref index 2 namelen 6 name: foobar
      	item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
      		extent data disk byte 12845056 nr 8192
      		extent data offset 0 nr 8192 ram 8192
      		extent compression 0
      checksum tree key (CSUM_TREE ROOT_ITEM 0)
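      For readers who want to reproduce this without perl, the write-then-sync step
      can be approximated in C. This is a sketch: it writes 6001 bytes of 'A' and then
      calls syncfs(2) on the file's filesystem, on the assumption (mine, not the
      commit's) that syncfs(2) reaches btrfs_sync_fs() much as "btrfs fi sync" does.
      The path is a parameter; on the test machine it would be /mnt/btrfs/foobar.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write 6001 bytes of 'A' (0x41) to path, then syncfs() the filesystem
 * holding it. Returns 0 on success, -1 on any failure. */
static int write_and_syncfs(const char *path)
{
    char buf[6001];
    memset(buf, 0x41, sizeof(buf));

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, buf, sizeof(buf));
    /* Like the perl one-liner: close the file first, sync afterwards. */
    if (close(fd) < 0 || n != (ssize_t)sizeof(buf))
        return -1;

    /* syncfs() needs a descriptor that lives on the target filesystem. */
    int dfd = open(path, O_RDONLY);
    if (dfd < 0)
        return -1;
    int ret = syncfs(dfd);
    close(dfd);
    return ret;
}
```

      The power-off step still has to be done by hand once write_and_syncfs()
      returns 0.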
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Filipe David Borba Manana's avatar
      Btrfs: fix tracking of orphan inode count · f19eb84e
      Filipe David Borba Manana authored
      commit 703c88e0 upstream.
      
      In inode.c:btrfs_orphan_add(), if we failed to insert the orphan
      item, we returned without decrementing the orphan count that we
      had incremented just before attempting the insertion, leaving the
      orphan inode count wrong.
      
      In inode.c:btrfs_orphan_del(), we were decrementing the inode
      orphan count if the bit BTRFS_INODE_ORPHAN_META_RESERVED was set,
      which is logically wrong because it should be decremented if the
      bit BTRFS_INODE_HAS_ORPHAN_ITEM was set - after all we increment
      the count when we set the bit BTRFS_INODE_HAS_ORPHAN_ITEM elsewhere.
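      The corrected bookkeeping can be modeled with a small userspace sketch. The
      structs and field names below are hypothetical simplifications, not the real
      btrfs types: has_orphan_item stands in for BTRFS_INODE_HAS_ORPHAN_ITEM and
      meta_reserved for BTRFS_INODE_ORPHAN_META_RESERVED.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the btrfs root/inode state. */
struct root_like  { int orphan_inodes; };
struct inode_like { bool has_orphan_item; bool meta_reserved; };

/* Add: bump the count first, and undo it if inserting the orphan item fails. */
static int orphan_add(struct root_like *root, struct inode_like *inode,
                      bool insert_fails)
{
    root->orphan_inodes++;
    inode->has_orphan_item = true;
    if (insert_fails) {
        inode->has_orphan_item = false;
        root->orphan_inodes--;      /* the decrement that was missing */
        return -1;
    }
    return 0;
}

/* Delete: the count follows the orphan-item bit, not the metadata reservation. */
static void orphan_del(struct root_like *root, struct inode_like *inode)
{
    if (inode->has_orphan_item) {
        inode->has_orphan_item = false;
        root->orphan_inodes--;
    }
    inode->meta_reserved = false;   /* reservation is released independently */
}
```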
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Steve French's avatar
      Do not send ClientGUID on SMB2.02 dialect · c82b3dd9
      Steve French authored
      commit 3c5f9be1 upstream.
      
      ClientGUID must be zero for SMB2.02 dialect.  See section 2.2.3
      of MS-SMB2. For SMB2.1 and later it must be non-zero.
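      A minimal sketch of the rule, assuming the dialect revision codes 0x0202
      (SMB 2.0.2) and 0x0210 (SMB 2.1) from MS-SMB2; the function and macro names are
      hypothetical and not taken from the cifs code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SMB2_GUID_SIZE  16
#define DIALECT_SMB2_02 0x0202   /* SMB 2.0.2 dialect revision */
#define DIALECT_SMB2_10 0x0210   /* SMB 2.1 dialect revision */

/* Fill the NEGOTIATE ClientGUID: all zeros for SMB 2.0.2, the client's GUID otherwise. */
static void set_client_guid(uint8_t out[SMB2_GUID_SIZE], uint16_t dialect,
                            const uint8_t client_guid[SMB2_GUID_SIZE])
{
    if (dialect == DIALECT_SMB2_02)
        memset(out, 0, SMB2_GUID_SIZE);
    else
        memcpy(out, client_guid, SMB2_GUID_SIZE);
}
```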
      Signed-off-by: Steve French <smfrench@gmail.com>
      CC: Sachin Prabhu <sprabhu@redhat.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Sachin Prabhu's avatar
      cifs: Set client guid on per connection basis · 8f7e86ca
      Sachin Prabhu authored
      commit 39552ea8 upstream.
      
      When mounting from a Windows 2012R2 server, we hit the following
      problem:
      1) Mount with any of the following versions - 2.0, 2.1 or 3.0
      2) unmount
      3) Attempt a mount again using a different SMB version >= 2.0.
      
      You end up with the following failure:
      Status code returned 0xc0000203 STATUS_USER_SESSION_DELETED
      CIFS VFS: Send error in SessSetup = -5
      CIFS VFS: cifs_mount failed w/return code = -5
      
      I cannot reproduce this issue using a Windows 2008 R2 server.
      
      This appears to be caused by reusing the client GUID from the first mount's
      connection when we disconnect and then mount again with a different protocol
      version. Generating a new GUID each time a new connection is negotiated avoids
      this problem.
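      The fix moves the GUID from one shared value into per-connection state. The
      sketch below illustrates that idea in userspace; struct conn_like and
      conn_init() are hypothetical, and reading /dev/urandom stands in for the
      kernel's get_random_bytes().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define GUID_SIZE 16

/* Hypothetical per-connection state: the GUID lives here, not in a global. */
struct conn_like {
    uint16_t dialect;
    uint8_t  client_guid[GUID_SIZE];
};

/* Fill buf with random bytes from /dev/urandom; returns 0 on success. */
static int random_bytes(uint8_t *buf, size_t len)
{
    FILE *f = fopen("/dev/urandom", "rb");
    if (!f)
        return -1;
    size_t got = fread(buf, 1, len, f);
    fclose(f);
    return got == len ? 0 : -1;
}

/* Every new connection gets a fresh GUID before it negotiates. */
static int conn_init(struct conn_like *c, uint16_t dialect)
{
    c->dialect = dialect;
    return random_bytes(c->client_guid, GUID_SIZE);
}
```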
      Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Steve French's avatar
      Check SMB3 dialects against downgrade attacks · 16e57e55
      Steve French authored
      commit ff1c038a upstream.
      
      When we are running SMB3 or SMB3.02 connections which are signed
      we need to validate the protocol negotiation information,
      to ensure that the negotiate protocol response was not tampered with.
      
      Add the missing FSCTL which is sent at mount time (immediately after
      the SMB3 Tree Connect) to validate that the capabilities match
      what we think the server sent.
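      The check amounts to comparing what the signed validate response reports
      against what was recorded from the original negotiate response. The sketch is
      illustrative only: the struct is a hypothetical flattening of the fields
      involved (capabilities, server GUID, security mode, dialect), and any mismatch
      is treated as a possible downgrade.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flattening of the values the client compares. */
struct negotiate_info_like {
    uint32_t capabilities;
    uint8_t  server_guid[16];
    uint16_t security_mode;
    uint16_t dialect;
};

/* Return 0 if the validated values match what was negotiated, -1 otherwise
 * (a caller would then treat the session as tampered with and drop it). */
static int validate_negotiate(const struct negotiate_info_like *negotiated,
                              const struct negotiate_info_like *validated)
{
    if (negotiated->dialect != validated->dialect ||
        negotiated->security_mode != validated->security_mode ||
        negotiated->capabilities != validated->capabilities ||
        memcmp(negotiated->server_guid, validated->server_guid,
               sizeof(negotiated->server_guid)) != 0)
        return -1;
    return 0;
}
```

      Fields are compared one by one rather than with memcmp() over the whole
      struct, so padding bytes cannot cause false mismatches.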
      
      "Secure dialect negotiation is introduced in SMB3 to protect against
      man-in-the-middle attempt to downgrade dialect negotiation.
      The idea is to prevent an eavesdropper from downgrading the initially
      negotiated dialect and capabilities between the client and the server."
      
      For more explanation see 2.2.31.4 of MS-SMB2 or
      http://blogs.msdn.com/b/openspecification/archive/2012/06/28/smb3-secure-dialect-negotiation.aspx
      Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
      Signed-off-by: Steve French <smfrench@gmail.com>
      [ddiss@suse.de: backported atop kernel without clone_range support]
      Signed-off-by: David Disseldorp <ddiss@suse.de>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Michal Kubecek's avatar
      xfrm: fix race between netns cleanup and state expire notification · 3f8fd8ad
      Michal Kubecek authored
      commit 21ee543e upstream.
      
      The xfrm_user module registers its pernet init/exit after xfrm
      itself, so its net exit function xfrm_user_net_exit() is
      executed before xfrm_net_exit(), which calls xfrm_state_fini() to
      clean up the SAs (xfrm states). This opens a window between
      zeroing the net->xfrm.nlsk pointer and deleting all xfrm_state
      instances which may access it (via the timer). If an xfrm state
      expires in this window, xfrm_exp_state_notify() will pass a NULL
      pointer as the socket to nlmsg_multicast().

      As the notifications are called inside an rcu_read_lock() block, it
      is sufficient to retrieve the nlsk socket with rcu_dereference()
      and check it for NULL.
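      The pattern "load the published pointer once, then check it for NULL" can be
      sketched in userspace with C11 atomics. Here an acquire load stands in for
      rcu_dereference(), and struct sock_like, nlsk and notify_expire() are
      hypothetical names. The sketch does not model RCU's guarantee that the socket
      stays allocated until the read-side section ends.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct sock_like { int id; };   /* hypothetical stand-in for struct sock */

/* Published socket pointer; netns cleanup may set it to NULL concurrently. */
static _Atomic(struct sock_like *) nlsk;

/* Returns 0 if a notification would be sent, -1 if the socket is gone. */
static int notify_expire(void)
{
    /* Load once into a local, then only ever use the local copy. */
    struct sock_like *sk = atomic_load_explicit(&nlsk, memory_order_acquire);
    if (sk == NULL)
        return -1;   /* netns is being torn down: drop the notification */
    /* ... build the message and multicast it via sk here ... */
    return 0;
}
```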
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Michal Kubeček's avatar
      vlan: more careful checksum features handling · 306ba5b2
      Michal Kubeček authored
      commit da08143b upstream.
      
      When combining real_dev's features and vlan_features, a simple
      bitwise AND is used. This doesn't work well for checksum
      offloading features: if one set has NETIF_F_HW_CSUM and the
      other has NETIF_F_IP_CSUM and/or NETIF_F_IPV6_CSUM, we end up with
      no checksum offloading. However, from a logical point of view
      (see how can_checksum_protocol() works), NETIF_F_HW_CSUM includes
      the functionality of NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM, so
      the result should be IP/IPV6.
      
      Add helper function netdev_intersect_features() implementing
      this logic and use it in vlan_dev_fix_features().
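      The intersection rule can be sketched as follows. The feature type and bit
      values are hypothetical stand-ins for netdev_features_t and the NETIF_F_*
      flags, and intersect_features() illustrates the described rule rather than
      reproducing the kernel's netdev_intersect_features().

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t features_t;   /* stand-in for netdev_features_t */

/* Hypothetical bit positions; the kernel defines its own. */
#define F_IP_CSUM   ((features_t)1 << 0)
#define F_HW_CSUM   ((features_t)1 << 1)
#define F_IPV6_CSUM ((features_t)1 << 2)

/* AND two feature sets, treating HW_CSUM ("checksum anything") as also
 * implying IPv4 and IPv6 checksum offload. */
static features_t intersect_features(features_t f1, features_t f2)
{
    if (f1 & F_HW_CSUM)
        f1 |= F_IP_CSUM | F_IPV6_CSUM;
    if (f2 & F_HW_CSUM)
        f2 |= F_IP_CSUM | F_IPV6_CSUM;
    return f1 & f2;
}
```

      A plain F_HW_CSUM & F_IP_CSUM is 0 (no offload at all), whereas
      intersect_features(F_HW_CSUM, F_IP_CSUM) keeps F_IP_CSUM.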
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Ben Hutchings's avatar
      net/compat: Fix minor information leak in siocdevprivate_ioctl() · e45145b6
      Ben Hutchings authored
      commit 417c3522 upstream.
      
      We don't need to check that ifr_data itself is a valid user pointer,
      but we should check that &ifr_data is.  Thankfully the copy of ifr_name
      is checked, so this can only leak a few bytes from immediately above the
      user address limit.
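      The distinction can be illustrated with a hypothetical access_ok()-style range
      check. The struct layout, range_ok() and the address limit are assumptions made
      for the sketch; the point is that the range to validate is the location of the
      ifr_data field, which sits right after the 16-byte name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit ifreq layout: 16-byte name, then a 32-bit data pointer. */
struct ifreq32_like {
    char     ifr_name[16];
    uint32_t ifr_data;
};

/* Hypothetical access_ok() analogue: does [addr, addr + size) fit below limit? */
static int range_ok(uintptr_t addr, size_t size, uintptr_t limit)
{
    return addr <= limit && size <= limit - addr;
}

/* Validate the place ifr_data is read *from*, not the pointer value stored there. */
static int ifr_data_readable(uintptr_t user_ifreq, uintptr_t limit)
{
    uintptr_t field = user_ifreq + offsetof(struct ifreq32_like, ifr_data);
    return range_ok(field, sizeof(uint32_t), limit);
}
```

      With a limit of 0x1000, a structure at 0xff0 passes the name check, but its
      ifr_data field starts exactly at the limit: the few-bytes-above-the-limit case
      the commit describes.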
      Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Benjamin Poirier's avatar
      net: Do not enable tx-nocache-copy by default · 9f6e089c
      Benjamin Poirier authored
      commit cdb3f4a3 upstream.
      
      There are many cases where this feature does not improve performance or even
      reduces it.
      
      For example, here are the results from tests that I've run using 3.12.6 on one
      Intel Xeon W3565 and one i7 920 connected by ixgbe adapters. The results are
      from the Xeon, but they're similar on the i7. All numbers report the
      mean±stddev over 10 runs of 10s.
      
      1) latency tests similar to what is described in "c6e1a0d1 net: Allow no-cache
      copy from user on transmit"
      There is no statistically significant difference between tx-nocache-copy
      on/off.
      nic irqs spread out (one queue per cpu)
      
      200x netperf -r 1400,1
      tx-nocache-copy off
              692000±1000 tps
              50/90/95/99% latency (us): 275±2/643.8±0.4/799±1/2474.4±0.3
      tx-nocache-copy on
              693000±1000 tps
              50/90/95/99% latency (us): 274±1/644.1±0.7/800±2/2474.5±0.7
      
      200x netperf -r 14000,14000
      tx-nocache-copy off
              86450±80 tps
              50/90/95/99% latency (us): 334.37±0.02/838±1/2100±20/3990±40
      tx-nocache-copy on
              86110±60 tps
              50/90/95/99% latency (us): 334.28±0.01/837±2/2110±20/3990±20
      
      2) single stream throughput tests
      tx-nocache-copy leads to higher service demand
      
                              throughput  cpu0        cpu1        demand
                              (Mb/s)      (Gcycle)    (Gcycle)    (cycle/B)
      
      nic irqs and netperf on cpu0 (1x netperf -T0,0 -t omni -- -d send)
      
      tx-nocache-copy off     9402±5      9.4±0.2                 0.80±0.01
      tx-nocache-copy on      9403±3      9.85±0.04               0.838±0.004
      
      nic irqs on cpu0, netperf on cpu1 (1x netperf -T1,1 -t omni -- -d send)
      
      tx-nocache-copy off     9401±5      5.83±0.03   5.0±0.1     0.923±0.007
      tx-nocache-copy on      9404±2      5.74±0.03   5.523±0.009 0.958±0.002
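      As a cross-check of the table, the demand column is consistent with total
      cycles (summed over the listed CPUs) divided by the bytes sent during a 10 s
      run, with throughput read as Mb/s. That interpretation is an inference from the
      numbers, not something stated above.

```c
#include <assert.h>

/* Service demand in CPU cycles per byte, assuming it is the total cycles spent
 * (in Gcycles, summed over CPUs) divided by the bytes sent during the run. */
static double service_demand(double gcycles, double mbit_per_s, double secs)
{
    double bytes = mbit_per_s * 1e6 * secs / 8.0;   /* Mb/s -> bytes sent */
    return gcycles * 1e9 / bytes;
}
```

      service_demand(9.4, 9402, 10) comes to about 0.80 and
      service_demand(9.85, 9403, 10) to about 0.838, matching the first pair of rows.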
      
      As a second example, here are some results from Eric Dumazet with the latest
      net-next.
      tx-nocache-copy also leads to higher service demand
      
      (cpu is Intel(R) Xeon(R) CPU X5660  @ 2.80GHz)
      
      lpq83:~# ./ethtool -K eth0 tx-nocache-copy on
      lpq83:~# perf stat ./netperf -H lpq84 -c
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB
      
       87380  16384  16384    10.00      9407.44   2.50     -1.00    0.522   -1.000
      
       Performance counter stats for './netperf -H lpq84 -c':
      
             4282.648396 task-clock                #    0.423 CPUs utilized
                   9,348 context-switches          #    0.002 M/sec
                      88 CPU-migrations            #    0.021 K/sec
                     355 page-faults               #    0.083 K/sec
          11,812,797,651 cycles                    #    2.758 GHz                     [82.79%]
           9,020,522,817 stalled-cycles-frontend   #   76.36% frontend cycles idle    [82.54%]
           4,579,889,681 stalled-cycles-backend    #   38.77% backend  cycles idle    [67.33%]
           6,053,172,792 instructions              #    0.51  insns per cycle
                                                   #    1.49  stalled cycles per insn [83.64%]
             597,275,583 branches                  #  139.464 M/sec                   [83.70%]
               8,960,541 branch-misses             #    1.50% of all branches         [83.65%]
      
            10.128990264 seconds time elapsed
      
      lpq83:~# ./ethtool -K eth0 tx-nocache-copy off
      lpq83:~# perf stat ./netperf -H lpq84 -c
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB
      
       87380  16384  16384    10.00      9412.45   2.15     -1.00    0.449   -1.000
      
       Performance counter stats for './netperf -H lpq84 -c':
      
             2847.375441 task-clock                #    0.281 CPUs utilized
                  11,632 context-switches          #    0.004 M/sec
                      49 CPU-migrations            #    0.017 K/sec
                     354 page-faults               #    0.124 K/sec
           7,646,889,749 cycles                    #    2.686 GHz                     [83.34%]
           6,115,050,032 stalled-cycles-frontend   #   79.97% frontend cycles idle    [83.31%]
           1,726,460,071 stalled-cycles-backend    #   22.58% backend  cycles idle    [66.55%]
           2,079,702,453 instructions              #    0.27  insns per cycle
                                                   #    2.94  stalled cycles per insn [83.22%]
             363,773,213 branches                  #  127.757 M/sec                   [83.29%]
               4,242,732 branch-misses             #    1.17% of all branches         [83.51%]
      
            10.128449949 seconds time elapsed
      
      CC: Tom Herbert <therbert@google.com>
      Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Prarit Bhargava's avatar
      ACPI / memhotplug: add parameter to disable memory hotplug · e801ecec
      Prarit Bhargava authored
      commit 00159a20 upstream.
      
      When booting a kexec/kdump kernel on a system that has specific memory
      hotplug regions, the boot will fail with warnings like:
      
       swapper/0: page allocation failure: order:9, mode:0x84d0
       CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-65.el7.x86_64 #1
       Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.S013.032920111005 03/29/2011
        0000000000000000 ffff8800341bd8c8 ffffffff815bcc67 ffff8800341bd950
        ffffffff8113b1a0 ffff880036339b00 0000000000000009 00000000000084d0
        ffff8800341bd950 ffffffff815b87ee 0000000000000000 0000000000000200
       Call Trace:
        [<ffffffff815bcc67>] dump_stack+0x19/0x1b
        [<ffffffff8113b1a0>] warn_alloc_failed+0xf0/0x160
        [<ffffffff815b87ee>] ?  __alloc_pages_direct_compact+0xac/0x196
        [<ffffffff8113f14f>] __alloc_pages_nodemask+0x7ff/0xa00
        [<ffffffff815b417c>] vmemmap_alloc_block+0x62/0xba
        [<ffffffff815b41e9>] vmemmap_alloc_block_buf+0x15/0x3b
        [<ffffffff815b1ff6>] vmemmap_populate+0xb4/0x21b
        [<ffffffff815b461d>] sparse_mem_map_populate+0x27/0x35
        [<ffffffff815b400f>] sparse_add_one_section+0x7a/0x185
        [<ffffffff815a1e9f>] __add_pages+0xaf/0x240
        [<ffffffff81047359>] arch_add_memory+0x59/0xd0
        [<ffffffff815a21d9>] add_memory+0xb9/0x1b0
        [<ffffffff81333b9c>] acpi_memory_device_add+0x18d/0x26d
        [<ffffffff81309a01>] acpi_bus_device_attach+0x7d/0xcd
        [<ffffffff8132379d>] acpi_ns_walk_namespace+0xc8/0x17f
        [<ffffffff81309984>] ? acpi_bus_type_and_status+0x90/0x90
        [<ffffffff81309984>] ? acpi_bus_type_and_status+0x90/0x90
        [<ffffffff81323c8c>] acpi_walk_namespace+0x95/0xc5
        [<ffffffff8130a6d6>] acpi_bus_scan+0x8b/0x9d
        [<ffffffff81a2019a>] acpi_scan_init+0x63/0x160
        [<ffffffff81a1ffb5>] acpi_init+0x25d/0x2a6
        [<ffffffff81a1fd58>] ? acpi_sleep_proc_init+0x2a/0x2a
        [<ffffffff810020e2>] do_one_initcall+0xe2/0x190
        [<ffffffff819e20c4>] kernel_init_freeable+0x17c/0x207
        [<ffffffff819e18d0>] ? do_early_param+0x88/0x88
        [<ffffffff8159fea0>] ? rest_init+0x80/0x80
        [<ffffffff8159feae>] kernel_init+0xe/0x180
        [<ffffffff815cca2c>] ret_from_fork+0x7c/0xb0
        [<ffffffff8159fea0>] ? rest_init+0x80/0x80
       Mem-Info:
       Node 0 DMA per-cpu:
       CPU    0: hi:    0, btch:   1 usd:   0
       Node 0 DMA32 per-cpu:
       CPU    0: hi:   42, btch:   7 usd:   0
       active_anon:0 inactive_anon:0 isolated_anon:0
        active_file:0 inactive_file:0 isolated_file:0
        unevictable:0 dirty:0 writeback:0 unstable:0
        free:872 slab_reclaimable:13 slab_unreclaimable:1880
        mapped:0 shmem:0 pagetables:0 bounce:0
        free_cma:0
      
      because the system has run out of memory at boot time.  This occurs
      because of the following sequence in the boot:
      
      The main kernel boots and sets the E820 map.  The second kernel is booted
      with a map generated by the kdump service using memmap= and memmap=exactmap.
      These parameters are added to the kernel parameters of the kexec/kdump
      kernel.  The kexec/kdump kernel has limited memory resources so as not
      to severely impact the main kernel.
      
      The system then panics and the kdump/kexec kernel boots (which is a
      completely new kernel boot).  During this boot ACPI is initialized and the
      kernel (as can be seen above) traverses the ACPI namespace and finds an
      entry for a memory device to be hot-added.

      That is:
      
        [<ffffffff815a1e9f>] __add_pages+0xaf/0x240
        [<ffffffff81047359>] arch_add_memory+0x59/0xd0
        [<ffffffff815a21d9>] add_memory+0xb9/0x1b0
        [<ffffffff81333b9c>] acpi_memory_device_add+0x18d/0x26d
        [<ffffffff81309a01>] acpi_bus_device_attach+0x7d/0xcd
        [<ffffffff8132379d>] acpi_ns_walk_namespace+0xc8/0x17f
        [<ffffffff81309984>] ? acpi_bus_type_and_status+0x90/0x90
        [<ffffffff81309984>] ? acpi_bus_type_and_status+0x90/0x90
        [<ffffffff81323c8c>] acpi_walk_namespace+0x95/0xc5
        [<ffffffff8130a6d6>] acpi_bus_scan+0x8b/0x9d
        [<ffffffff81a2019a>] acpi_scan_init+0x63/0x160
        [<ffffffff81a1ffb5>] acpi_init+0x25d/0x2a6
      
      At this point the kernel adds page table information and the kexec/kdump
      kernel runs out of memory.
      
      This can also be reproduced by using the memmap=exactmap and mem=X
      parameters on the main kernel and booting.
      
      This patchset resolves the problem by adding a kernel parameter,
      acpi_no_memhotplug, to disable ACPI memory hotplug.
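      A userspace sketch of how such a switch gates the hot-add path. In the kernel a
      boot flag like this would typically be registered with __setup() or
      early_param(); the hand-written parser, no_memhotplug and
      should_hotadd_memory() below are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag mirroring the acpi_no_memhotplug boot parameter. */
static bool no_memhotplug;

/* Set the flag if the exact token appears on a space-separated command line. */
static void parse_cmdline(const char *cmdline)
{
    const char *tok = "acpi_no_memhotplug";
    size_t n = strlen(tok);
    const char *p = cmdline;

    no_memhotplug = false;
    while (*p) {
        while (*p == ' ')
            p++;
        size_t len = strcspn(p, " ");   /* length of the current word */
        if (len == n && strncmp(p, tok, n) == 0)
            no_memhotplug = true;
        p += len;
    }
}

/* Stand-in for the ACPI memory device add path: skip hot-add when disabled. */
static bool should_hotadd_memory(void)
{
    return !no_memhotplug;
}
```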
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Toshi Kani <toshi.kani@hp.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Peter Zijlstra's avatar
      sched: Make scale_rt_power() deal with backward clocks · 5ff029e2
      Peter Zijlstra authored
      commit cadefd3d upstream.
      
      Mike reported that, while unlikely, it's entirely possible for
      scale_rt_power() to see the time go backwards. This yields rather
      'interesting' results.

      So, like all other sites that deal with clocks, make this one ignore
      backward clock movement too.
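      The clamp can be sketched like this. rt_scale_sketch() and POWER_SCALE are
      hypothetical and only loosely modeled on scale_rt_power(); the key step is
      treating a negative clock delta as zero before it is added to an unsigned
      total, where it would otherwise wrap around to a huge value.

```c
#include <assert.h>
#include <stdint.h>

#define POWER_SCALE 1024   /* hypothetical fixed-point scale */

/* Time elapsed since 'stamp', treating backward clock movement as zero. */
static int64_t clamped_delta(int64_t now, int64_t stamp)
{
    int64_t delta = now - stamp;
    if (delta < 0)   /* clock went backwards: ignore it */
        delta = 0;
    return delta;
}

/* Share of (period + delta) not consumed by rt_time, scaled to POWER_SCALE. */
static uint64_t rt_scale_sketch(int64_t now, int64_t stamp,
                                uint64_t period, uint64_t rt_time)
{
    uint64_t total = period + (uint64_t)clamped_delta(now, stamp);
    uint64_t available = total > rt_time ? total - rt_time : 0;
    if (total == 0)
        return POWER_SCALE;
    return available * POWER_SCALE / total;
}
```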
      Reported-by: Mike Galbraith <bitbucket@online.de>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20140227094035.GZ9987@twins.programming.kicks-ass.net
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Wendy Xiong's avatar
      [SCSI] ipr: Add new CCIN definition for Grand Canyon support · dcc23f13
      Wendy Xiong authored
      commit 5eeac3e9 upstream.
      
      Add the appropriate definition and table entry for new hardware support.
      Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
      Acked-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Mike Qiu's avatar
      powerpc/mm: fix ".__node_distance" undefined · 311222a9
      Mike Qiu authored
      commit 12c743eb upstream.
      
        CHK     include/config/kernel.release
        CHK     include/generated/uapi/linux/version.h
        CHK     include/generated/utsrelease.h
        ...
        Building modules, stage 2.
      WARNING: 1 bad relocations
      c0000000013d6a30 R_PPC64_ADDR64    uprobes_fetch_type_table
        WRAP    arch/powerpc/boot/zImage.pseries
        WRAP    arch/powerpc/boot/zImage.epapr
        MODPOST 1849 modules
      ERROR: ".__node_distance" [drivers/block/nvme.ko] undefined!
      make[1]: *** [__modpost] Error 1
      make: *** [modules] Error 2
      make: *** Waiting for unfinished jobs....
      
      The reason is that the symbol "__node_distance" is not exported on powerpc.
      Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
      Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
      Cc: Alistair Popple <alistair@popple.id.au>
      Cc: Mike Qiu <qiudayu@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Petr Mladek's avatar
      ftrace/x86: Call text_ip_addr() instead of the duplicated code · 38a572d0
      Petr Mladek authored
      commit 964f7b6b upstream.
      
      I came across this while looking at some Xen-related ftrace initialization
      problems. They were related to Xen code that is not upstream, but this cleanup
      makes sense here as well.

      I think that this was already the intention when text_ip_addr() was introduced
      in commit 87fbb2ac (ftrace/x86: Use breakpoints for converting
      function graph caller). Anyway, better to do it now before it shoots someone
      in the foot ;-)
      
      Link: http://lkml.kernel.org/p/1401812601-2359-1-git-send-email-pmladek@suse.cz
      Signed-off-by: Petr Mladek <pmladek@suse.cz>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • J. Bruce Fields's avatar
      nfsd4: fix FREE_STATEID lockowner leak · 2eaaa8d2
      J. Bruce Fields authored
      commit 48385408 upstream.
      
      Commit 27b11428 ("nfsd4: remove lockowner when removing lock stateid")
      introduced a memory leak.
      
      Cc: stable@vger.kernel.org
      Reported-by: Jeff Layton <jeff.layton@primarydata.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Ying Xue's avatar
      tipc: fix memory leak of publications · 48195684
      Ying Xue authored
      commit 1621b94d upstream.
      
      Commit 1bb8dce5 ("tipc: fix memory leak during module removal")
      introduced a memory leak: when the name table is stopped, the
      publication instances are not freed. Additionally, the useless
      "continue" statement in tipc_nametbl_stop() is removed.
      Reported-by: Jason <huzhijiang@gmail.com>
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Jiang Liu's avatar
      intel_idle: close avn_cstates array with correct marker · 4c65d4f6
      Jiang Liu authored
      commit 88390996 upstream.
      
      Close the avn_cstates array with the correct marker to avoid an overflow
      in intel_idle_cpu_init().
      
      [rjw: The problem was introduced when commit 22e580d0 was merged
       on top of eba682a5 (intel_idle: shrink states tables).]
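      A sketch of why the terminator matters, assuming (as a simplification) that
      the init loop walks the table until it reaches an entry whose enter callback
      is NULL, bounded by a maximum state count. The struct, table and names are
      hypothetical, not the intel_idle definitions.

```c
#include <assert.h>
#include <stddef.h>

#define STATE_MAX 10   /* hypothetical stand-in for CPUIDLE_STATE_MAX */

struct cstate_like {
    const char *name;
    int (*enter)(void);   /* NULL marks the end of the table */
};

static int enter_stub(void) { return 0; }

/* Avoton-like table: two real states, then the terminating marker. */
static const struct cstate_like avn_like[] = {
    { "C1", enter_stub },
    { "C6", enter_stub },
    { NULL, NULL },       /* without this, the loop below runs off the array */
};

/* Count usable states, stopping at the marker and never past STATE_MAX. */
static int count_states(const struct cstate_like *t)
{
    int n = 0;
    while (n < STATE_MAX && t[n].enter != NULL)
        n++;
    return n;
}
```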
      
      Fixes: 22e580d0 (intel_idle: Fixed C6 state on Avoton/Rangeley processors)
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Viresh Kumar's avatar
      tick-sched: Check tick_nohz_enabled in tick_nohz_switch_to_nohz() · e54e6e8e
      Viresh Kumar authored
      commit 27630532 upstream.
      
      Since commit d689fe22 (NOHZ: Check for nohz active instead of nohz
      enabled), the tick_nohz_switch_to_nohz() function returns early because
      it checks the tick_nohz_active flag. That flag can't be set at this point,
      because the function itself is what sets it.

      Undo the change in tick_nohz_switch_to_nohz().
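      The chicken-and-egg problem can be shown with two hypothetical versions of the
      switch function; nohz_enabled and nohz_active stand in for tick_nohz_enabled
      and tick_nohz_active, and the bodies are reduced to the flag handling.

```c
#include <assert.h>
#include <stdbool.h>

static bool nohz_enabled = true;   /* boot-time setting */
static bool nohz_active;           /* set only once the switch has happened */

/* Buggy shape: gated on the flag this very function is responsible for setting. */
static bool switch_to_nohz_buggy(void)
{
    if (!nohz_active)
        return false;   /* taken while nohz_active is still false */
    nohz_active = true;
    return true;
}

/* Fixed shape: gated on whether NOHZ is enabled, then marked active. */
static bool switch_to_nohz_fixed(void)
{
    if (!nohz_enabled)
        return false;
    nohz_active = true;
    return true;
}
```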
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: linaro-kernel@lists.linaro.org
      Cc: fweisbec@gmail.com
      Cc: Arvind.Chauhan@arm.com
      Cc: linaro-networking@linaro.org
      Cc: <stable@vger.kernel.org> # 3.13+
      Link: http://lkml.kernel.org/r/40939c05f2d65d781b92b20302b02243d0654224.1397537987.git.viresh.kumar@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Konstantin Khlebnikov's avatar
      epoll: fix use-after-free in eventpoll_release_file · c79460f6
      Konstantin Khlebnikov authored
      commit ebe06187 upstream.
      
      This fixes a use-after-free of epi->fllink.next inside the list loop macro.
      The loop actually releases elements in its body.  The list is
      RCU-protected, but here we cannot hold rcu_read_lock because we need to
      lock a mutex inside.
      
      The obvious solution is to use list_for_each_entry_safe().  RCU-ness
      isn't essential because nobody can change this list under us, it's final
      fput for this file.
      
      The bug was introduced by ae10b2b4 ("epoll: optimize EPOLL_CTL_DEL
      using rcu")
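      The list_for_each_entry_safe() idea, caching the successor before the loop body
      frees the current element, looks like this on a plain singly linked list. The
      node type and helpers are hypothetical; the kernel macro works on struct
      list_head lists.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    struct node *next;
    int val;
};

/* Free every node. 'tmp' holds the successor before 'pos' is freed, so the
 * loop never reads pos->next after free(pos). */
static int release_all(struct node **head)
{
    int freed = 0;
    struct node *pos, *tmp;

    for (pos = *head; pos != NULL; pos = tmp) {
        tmp = pos->next;   /* read before free: no use-after-free */
        free(pos);
        freed++;
    }
    *head = NULL;
    return freed;
}

/* Build an n-element list (values n-1 .. 0) for demonstration. */
static struct node *build_list(int n)
{
    struct node *head = NULL;
    for (int i = 0; i < n; i++) {
        struct node *nd = malloc(sizeof(*nd));
        if (!nd)
            abort();
        nd->val = i;
        nd->next = head;
        head = nd;
    }
    return head;
}
```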
      Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
      Reported-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Stable <stable@vger.kernel.org> # 3.13+
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Jason Baron <jbaron@akamai.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>