1. 11 Jan, 2016 3 commits
    • Dave Chinner's avatar
      xfs: handle dquot buffer readahead in log recovery correctly · 7d6a13f0
      Dave Chinner authored
      When we do dquot readahead in log recovery, we do not use a verifier
      as the underlying buffer may not have dquots in it. e.g. the
      allocation operation hasn't yet been replayed. Hence we do not want
      to fail recovery because we detect an operation to be replayed has
      not been run yet. This problem was addressed for inodes in commit
      d8914002 ("xfs: inode buffers may not be valid during recovery
      readahead") but the problem was not recognised to exist for dquots
      and their buffers as the dquot readahead did not have a verifier.
      
      The result of not using a verifier is that when the buffer is then
      next read to replay a dquot modification, the dquot buffer verifier
      will only be attached to the buffer if *readahead is not complete*.
      Hence we can read the buffer, replay the dquot changes and then add
      it to the delwri submission list without it having a verifier
      attached to it. This then generates warnings in xfs_buf_ioapply(),
      which catches and warns about this case.
      
      Fix this and make it handle the same readahead verifier error cases
      as for inode buffers by adding a new readahead verifier that has a
      write operation as well as a read operation that marks the buffer as
      not done if any corruption is detected.  Also make sure we don't run
      readahead if the dquot buffer has been marked as cancelled by
      recovery.
      
      This will result in readahead either succeeding and the buffer
      having a valid write verifier, or readahead failing and the buffer
      state requiring the subsequent read to resubmit the IO with the new
      verifier.  In either case, this will result in the buffer always
      ending up with a valid write verifier on it.
      
      Note: we also need to fix the inode buffer readahead error handling
      to mark the buffer with EIO. Brian noticed the code I copied from
      there wrong during review, so fix it at the same time. Add comments
      linking the two functions that handle readahead verifier errors
      together so we don't forget this behavioural link in future.
      
      cc: <stable@vger.kernel.org> # 3.12 - current
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      7d6a13f0
    • Dave Chinner's avatar
      xfs: inode recovery readahead can race with inode buffer creation · b79f4a1c
      Dave Chinner authored
      When we do inode readahead in log recovery, we do can do the
      readahead before we've replayed the icreate transaction that stamps
      the buffer with inode cores. The inode readahead verifier catches
      this and marks the buffer as !done to indicate that it doesn't yet
      contain valid inodes.
      
      In adding buffer error notification  (i.e. setting b_error = -EIO at
      the same time as as we clear the done flag) to such a readahead
      verifier failure, we can then get subsequent inode recovery failing
      with this error:
      
      XFS (dm-0): metadata I/O error: block 0xa00060 ("xlog_recover_do..(read#2)") error 5 numblks 32
      
      This occurs when readahead completion races with icreate item replay
      such as:
      
      	inode readahead
      		find buffer
      		lock buffer
      		submit RA io
      	....
      	icreate recovery
      	    xfs_trans_get_buffer
      		find buffer
      		lock buffer
      		<blocks on RA completion>
      	.....
      	<ra completion>
      		fails verifier
      		clear XBF_DONE
      		set bp->b_error = -EIO
      		release and unlock buffer
      	<icreate gains lock>
      	icreate initialises buffer
      	marks buffer as done
      	adds buffer to delayed write queue
      	releases buffer
      
      At this point, we have an initialised inode buffer that is up to
      date but has an -EIO state registered against it. When we finally
      get to recovering an inode in that buffer:
      
      	inode item recovery
      	    xfs_trans_read_buffer
      		find buffer
      		lock buffer
      		sees XBF_DONE is set, returns buffer
      	    sees bp->b_error is set
      		fail log recovery!
      
      Essentially, we need xfs_trans_get_buf_map() to clear the error status of
      the buffer when doing a lookup. This function returns uninitialised
      buffers, so the buffer returned can not be in an error state and
      none of the code that uses this function expects b_error to be set
      on return. Indeed, there is an ASSERT(!bp->b_error); in the
      transaction case in xfs_trans_get_buf_map() that would have caught
      this if log recovery used transactions....
      
      This patch firstly changes the inode readahead failure to set -EIO
      on the buffer, and secondly changes xfs_buf_get_map() to never
      return a buffer with an error state set so this first change doesn't
      cause unexpected log recovery failures.
      
      cc: <stable@vger.kernel.org> # 3.12 - current
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      b79f4a1c
    • Eric Sandeen's avatar
      xfs: eliminate committed arg from xfs_bmap_finish · f6106efa
      Eric Sandeen authored
      Calls to xfs_bmap_finish() and xfs_trans_ijoin(), and the
      associated comments were replicated several times across
      the attribute code, all dealing with what to do if the
      transaction was or wasn't committed.
      
      And in that replicated code, an ASSERT() test of an
      uninitialized variable occurs in several locations:
      
      	error = xfs_attr_thing(&args);
      	if (!error) {
      		error = xfs_bmap_finish(&args.trans, args.flist,
      					&committed);
      	}
      	if (error) {
      		ASSERT(committed);
      
      If the first xfs_attr_thing() failed, we'd skip the xfs_bmap_finish,
      never set "committed", and then test it in the ASSERT.
      
      Fix this up by moving the committed state internal to xfs_bmap_finish,
      and add a new inode argument.  If an inode is passed in, it is passed
      through to __xfs_trans_roll() and joined to the transaction there if
      the transaction was committed.
      
      xfs_qm_dqalloc() was a little unique in that it called bjoin rather
      than ijoin, but as Dave points out we can detect the committed state
      but checking whether (*tpp != tp).
      
      Addresses-Coverity-Id: 102360
      Addresses-Coverity-Id: 102361
      Addresses-Coverity-Id: 102363
      Addresses-Coverity-Id: 102364
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      f6106efa
  2. 08 Jan, 2016 2 commits
    • Dave Chinner's avatar
      xfs: bmapbt checking on debug kernels too expensive · e3543819
      Dave Chinner authored
      For large sparse or fragmented files, checking every single entry in
      the bmapbt on every operation is prohibitively expensive. Especially
      as such checks rarely discover problems during normal operations on
      high extent coutn files. Our regression tests don't tend to exercise
      files with hundreds of thousands to millions of extents, so mostly
      this isn't noticed.
      
      However, trying to run things like xfs_mdrestore of large filesystem
      dumps on a debug kernel quickly becomes impossible as the CPU is
      completely burnt up repeatedly walking the sparse file bmapbt that
      is generated for every allocation that is made.
      
      Hence, if the file has more than 10,000 extents, just don't bother
      with walking the tree to check it exhaustively. The btree code has
      checks that ensure that the newly inserted/removed/modified record
      is correctly ordered, so the entrie tree walk in thses cases has
      limited additional value.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      e3543819
    • Dave Chinner's avatar
      xfs: add tracepoints to readpage calls · 121e213e
      Dave Chinner authored
      This allows us to see page cache driven readahead in action as it
      passes through XFS. This helps to understand buffered read
      throughput problems such as readahead IO IO sizes being too small
      for the underlying device to reach max throughput.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      121e213e
  3. 04 Jan, 2016 11 commits
  4. 03 Jan, 2016 3 commits
  5. 31 Dec, 2015 5 commits
    • Linus Torvalds's avatar
      Merge tag 'pci-v4.4-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 9c982e86
      Linus Torvalds authored
      Pull PCI bugfix from Bjorn Helgaas:
       "Here's another fix for v4.4.
      
        This fixes 32-bit config reads for the HiSilicon driver.  Obviously
        the driver is completely broken without this fix (apparently it
        actually was tested internally, but got broken somehow in the process
        of upstreaming it).
      
        Summary:
      
        HiSilicon host bridge driver
          Fix 32-bit config reads (Dongdong Liu)"
      
      * tag 'pci-v4.4-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI: hisi: Fix hisi_pcie_cfg_read() 32-bit reads
      9c982e86
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 7c672dd6
      Linus Torvalds authored
      Pull sparc fixes from David Miller:
       "Just some missing syscall wire ups"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc: Wire up mlock2 system call.
        sparc: Add all necessary direct socket system calls.
      7c672dd6
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 8f5daf2a
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Prevent XFRM per-cpu counter updates for one namespace from being
          applied to another namespace.  Fix from DanS treetman.
      
       2) Fix RCU de-reference in iwl_mvm_get_key_sta_id(), from Johannes
          Berg.
      
       3) Remove ethernet header assumption in nft_do_chain_netdev(), from
          Pablo Neira Ayuso.
      
       4) Fix cpsw PHY ident with multiple slaves and fixed-phy, from Pascal
          Speck.
      
       5) Fix use after free in sixpack_close and mkiss_close.
      
       6) Fix VXLAN fw assertion on bnx2x, from Yuval Mintz.
      
       7) natsemi doesn't check for DMA mapping errors, from Alexey
          Khoroshilov.
      
       8) Fix inverted test in ip6addrlbl_get(), from ANdrey Ryabinin.
      
       9) Missing initialization of needed_headroom in geneve tunnel driver,
          from Paolo Abeni.
      
      10) Fix conntrack template leak in openvswitch, from Joe Stringer.
      
      11) Mission initialization of wq->flags in sock_alloc_inode(), from
          Nicolai Stange.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
        sctp: sctp should release assoc when sctp_make_abort_user return NULL in sctp_close
        net, socket, socket_wq: fix missing initialization of flags
        drivers: net: cpsw: fix error return code
        openvswitch: Fix template leak in error cases.
        sctp: label accepted/peeled off sockets
        sctp: use GFP_USER for user-controlled kmalloc
        qlcnic: fix a loop exit condition better
        net: cdc_ncm: avoid changing RX/TX buffers on MTU changes
        geneve: initialize needed_headroom
        ipv6: honor ifindex in case we receive ll addresses in router advertisements
        addrconf: always initialize sysctl table data
        ipv6/addrlabel: fix ip6addrlbl_get()
        switchdev: bridge: Pass ageing time as clock_t instead of jiffies
        sh_eth: fix 16-bit descriptor field access endianness too
        veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
        net: usb: cdc_ncm: Adding Dell DW5813 LTE AT&T Mobile Broadband Card
        net: usb: cdc_ncm: Adding Dell DW5812 LTE Verizon Mobile Broadband Card
        natsemi: add checks for dma mapping errors
        rhashtable: Kill harmless RCU warning in rhashtable_walk_init
        openvswitch: correct encoding of set tunnel action attributes
        ...
      8f5daf2a
    • David S. Miller's avatar
      sparc: Wire up mlock2 system call. · 42d85c52
      David S. Miller authored
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      42d85c52
    • David S. Miller's avatar
      sparc: Add all necessary direct socket system calls. · 8b30ca73
      David S. Miller authored
      The GLIBC folks would like to eliminate socketcall support
      eventually, and this makes sense regardless so wire them
      all up.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8b30ca73
  6. 30 Dec, 2015 16 commits
    • Xin Long's avatar
      sctp: sctp should release assoc when sctp_make_abort_user return NULL in sctp_close · 068d8bd3
      Xin Long authored
      In sctp_close, sctp_make_abort_user may return NULL because of memory
      allocation failure. If this happens, it will bypass any state change
      and never free the assoc. The assoc has no chance to be freed and it
      will be kept in memory with the state it had even after the socket is
      closed by sctp_close().
      
      So if sctp_make_abort_user fails to allocate memory, we should abort
      the asoc via sctp_primitive_ABORT as well. Just like the annotation in
      sctp_sf_cookie_wait_prm_abort and sctp_sf_do_9_1_prm_abort said,
      "Even if we can't send the ABORT due to low memory delete the TCB.
      This is a departure from our typical NOMEM handling".
      
      But then the chunk is NULL (low memory) and the SCTP_CMD_REPLY cmd would
      dereference the chunk pointer, and system crash. So we should add
      SCTP_CMD_REPLY cmd only when the chunk is not NULL, just like other
      places where it adds SCTP_CMD_REPLY cmd.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      068d8bd3
    • David S. Miller's avatar
      Merge tag 'wireless-drivers-for-davem-2015-12-28' of... · a0ccc3f2
      David S. Miller authored
      Merge tag 'wireless-drivers-for-davem-2015-12-28' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      iwlwifi
      
      * don't load firmware that won't exist for 7260
      * fix RCU splat
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a0ccc3f2
    • Nicolai Stange's avatar
      net, socket, socket_wq: fix missing initialization of flags · 574aab1e
      Nicolai Stange authored
      Commit ceb5d58b ("net: fix sock_wake_async() rcu protection") from
      the current 4.4 release cycle introduced a new flags member in
      struct socket_wq and moved SOCKWQ_ASYNC_NOSPACE and SOCKWQ_ASYNC_WAITDATA
      from struct socket's flags member into that new place.
      
      Unfortunately, the new flags field is never initialized properly, at least
      not for the struct socket_wq instance created in sock_alloc_inode().
      
      One particular issue I encountered because of this is that my GNU Emacs
      failed to draw anything on my desktop -- i.e. what I got is a transparent
      window, including the title bar. Bisection lead to the commit mentioned
      above and further investigation by means of strace told me that Emacs
      is indeed speaking to my Xorg through an O_ASYNC AF_UNIX socket. This is
      reproducible 100% of times and the fact that properly initializing the
      struct socket_wq ->flags fixes the issue leads me to the conclusion that
      somehow SOCKWQ_ASYNC_WAITDATA got set in the uninitialized ->flags,
      preventing my Emacs from receiving any SIGIO's due to data becoming
      available and it got stuck.
      
      Make sock_alloc_inode() set the newly created struct socket_wq's ->flags
      member to zero.
      
      Fixes: ceb5d58b ("net: fix sock_wake_async() rcu protection")
      Signed-off-by: default avatarNicolai Stange <nicstange@gmail.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      574aab1e
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · c6169202
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "Make the block layer great again.
      
        Basically three amazing fixes in this pull request, split into 4
        patches.  Believe me, they should go into 4.4.  Two of them fix a
        regression, the third and last fixes an easy-to-trigger bug.
      
         - Fix a bad irq enable through null_blk, for queue_mode=1 and using
           timer completions.  Add a block helper to restart a queue
           asynchronously, and use that from null_blk.  From me.
      
         - Fix a performance issue in NVMe.  Some devices (Intel Pxxxx) expose
           a stripe boundary, and performance suffers if we cross it.  We took
           that into account for merging, but not for the newer splitting
           code.  Fix from Keith.
      
         - Fix a kernel oops in lightnvm with multiple channels.  From Matias"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        lightnvm: wrong offset in bad blk lun calculation
        null_blk: use async queue restart helper
        block: add blk_start_queue_async()
        block: Split bios on chunk boundaries
      c6169202
    • Gary Wang's avatar
      drm/i915: increase the tries for HDMI hotplug live status checking · 3d8acd1f
      Gary Wang authored
      The total delay of HDMI hotplug detecting with 30ms is sometimes not
      enoughtfor HDMI live status up with specific HDMI monitors in BSW platform.
      
      After doing experiments for following monitors, it needs 80ms at least
      for those worst cases.
      
      Lenovo L246 1xwA (4 failed, necessary hot-plug delay: 58/40/60/40ms)
      Philips HH2AP (9 failed, necessary hot-plug delay: 80/50/50/60/46/40/58/58/39ms)
      BENQ ET-0035-N (6 failed, necessary hot-plug delay: 60/50/50/80/80/40ms)
      DELL U2713HM (2 failed, necessary hot-plug delay: 58/59ms)
      HP HP-LP2475w (5 failed, necessary hot-plug delay: 70/50/40/60/40ms)
      
      It looks like 70-80 ms is BSW platform needs in some bad cases of the
      monitors at this end (8 times delay at most). Keep less than 100ms for
      HDCP pulse HPD low (with at least 100ms) to respond a plug out.
      Reviewed-by: default avatarCooper Chiou <cooper.chiou@intel.com>
      Tested-by: default avatarGary Wang <gary.c.wang@intel.com>
      Cc: Gavin Hindman <gavin.hindman@intel.com>
      Cc: Sonika Jindal <sonika.jindal@intel.com>
      Cc: Shashank Sharma <shashank.sharma@intel.com>
      Cc: Shobhit Kumar <shobhit.kumar@intel.com>
      Signed-off-by: default avatarGary Wang <gary.c.wang@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1450858295-12804-1-git-send-email-gary.c.wang@intel.comTested-by: default avatarShobhit Kumar <shobhit.kumar@intel.com>
      Cc: drm-intel-fixes@lists.freedesktop.org
      Fixes: 237ed86c ("drm/i915: Check live status before reading edid")
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      (cherry picked from commit f8d03ea0)
      [Jani: undo the file mode change of the original commit]
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      3d8acd1f
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 866be88a
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "9 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm/vmstat: fix overflow in mod_zone_page_state()
        ocfs2/dlm: clear migration_pending when migration target goes down
        mm/memory_hotplug.c: check for missing sections in test_pages_in_a_zone()
        ocfs2: fix flock panic issue
        m32r: add io*_rep helpers
        m32r: fix build failure
        arch/x86/xen/suspend.c: include xen/xen.h
        mm: memcontrol: fix possible memcg leak due to interrupted reclaim
        ocfs2: fix BUG when calculate new backup super
      866be88a
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · e25bd6ca
      Linus Torvalds authored
      Pull vfs fix from Al Viro:
       "Fix for 3.15 breakage of fcntl64() in arm OABI compat.  -stable
        fodder"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        [PATCH] arm: fix handling of F_OFD_... in oabi_fcntl64()
      e25bd6ca
    • Heiko Carstens's avatar
      mm/vmstat: fix overflow in mod_zone_page_state() · 6cdb18ad
      Heiko Carstens authored
      mod_zone_page_state() takes a "delta" integer argument.  delta contains
      the number of pages that should be added or subtracted from a struct
      zone's vm_stat field.
      
      If a zone is larger than 8TB this will cause overflows.  E.g.  for a
      zone with a size slightly larger than 8TB the line
      
          mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages);
      
      in mm/page_alloc.c:free_area_init_core() will result in a negative
      result for the NR_ALLOC_BATCH entry within the zone's vm_stat, since 8TB
      contain 0x8xxxxxxx pages which will be sign extended to a negative
      value.
      
      Fix this by changing the delta argument to long type.
      
      This could fix an early boot problem seen on s390, where we have a 9TB
      system with only one node.  ZONE_DMA contains 2GB and ZONE_NORMAL the
      rest.  The system is trying to allocate a GFP_DMA page but ZONE_DMA is
      completely empty, so it tries to reclaim pages in an endless loop.
      
      This was seen on a heavily patched 3.10 kernel.  One possible
      explaination seem to be the overflows caused by mod_zone_page_state().
      Unfortunately I did not have the chance to verify that this patch
      actually fixes the problem, since I don't have access to the system
      right now.  However the overflow problem does exist anyway.
      
      Given the description that a system with slightly less than 8TB does
      work, this seems to be a candidate for the observed problem.
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6cdb18ad
    • xuejiufei's avatar
      ocfs2/dlm: clear migration_pending when migration target goes down · cc28d6d8
      xuejiufei authored
      We have found a BUG on res->migration_pending when migrating lock
      resources.  The situation is as follows.
      
      dlm_mark_lockres_migration
        res->migration_pending = 1;
        __dlm_lockres_reserve_ast
        dlm_lockres_release_ast returns with res->migration_pending remains
            because other threads reserve asts
        wait dlm_migration_can_proceed returns 1
        >>>>>>> o2hb found that target goes down and remove target
                from domain_map
        dlm_migration_can_proceed returns 1
        dlm_mark_lockres_migrating returns -ESHOTDOWN with
            res->migration_pending still remains.
      
      When reentering dlm_mark_lockres_migrating(), it will trigger the BUG_ON
      with res->migration_pending.  So clear migration_pending when target is
      down.
      Signed-off-by: default avatarJiufei Xue <xuejiufei@huawei.com>
      Reviewed-by: default avatarJoseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      cc28d6d8
    • Andrew Banman's avatar
      mm/memory_hotplug.c: check for missing sections in test_pages_in_a_zone() · 5f0f2887
      Andrew Banman authored
      test_pages_in_a_zone() does not account for the possibility of missing
      sections in the given pfn range.  pfn_valid_within always returns 1 when
      CONFIG_HOLES_IN_ZONE is not set, allowing invalid pfns from missing
      sections to pass the test, leading to a kernel oops.
      
      Wrap an additional pfn loop with PAGES_PER_SECTION granularity to check
      for missing sections before proceeding into the zone-check code.
      
      This also prevents a crash from offlining memory devices with missing
      sections.  Despite this, it may be a good idea to keep the related patch
      '[PATCH 3/3] drivers: memory: prohibit offlining of memory blocks with
      missing sections' because missing sections in a memory block may lead to
      other problems not covered by the scope of this fix.
      Signed-off-by: default avatarAndrew Banman <abanman@sgi.com>
      Acked-by: default avatarAlex Thorlton <athorlton@sgi.com>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Greg KH <greg@kroah.com>
      Cc: Seth Jennings <sjennings@variantweb.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5f0f2887
    • Junxiao Bi's avatar
      ocfs2: fix flock panic issue · b5a8bc33
      Junxiao Bi authored
      Commit 4f656367 ("Move locks API users to locks_lock_inode_wait()")
      move flock/posix lock indentify code to locks_lock_inode_wait(), but
      missed to set fl_flags to FL_FLOCK which caused the following kernel
      panic on 4.4.0_rc5.
      
        kernel BUG at fs/locks.c:1895!
        invalid opcode: 0000 [#1] SMP
        Modules linked in: ocfs2(O) ocfs2_dlmfs(O) ocfs2_stack_o2cb(O) ocfs2_dlm(O) ocfs2_nodemanager(O) ocfs2_stackglue(O) iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi xen_kbdfront xen_netfront xen_fbfront xen_blkfront
        CPU: 0 PID: 20268 Comm: flock_unit_test Tainted: G           O    4.4.0-rc5-next-20151217 #1
        Hardware name: Xen HVM domU, BIOS 4.3.1OVM 05/14/2014
        task: ffff88007b3672c0 ti: ffff880028b58000 task.ti: ffff880028b58000
        RIP: locks_lock_inode_wait+0x2e/0x160
        Call Trace:
          ocfs2_do_flock+0x91/0x160 [ocfs2]
          ocfs2_flock+0x76/0xd0 [ocfs2]
          SyS_flock+0x10f/0x1a0
          entry_SYSCALL_64_fastpath+0x12/0x71
        Code: e5 41 57 41 56 49 89 fe 41 55 41 54 53 48 89 f3 48 81 ec 88 00 00 00 8b 46 40 83 e0 03 83 f8 01 0f 84 ad 00 00 00 83 f8 02 74 04 <0f> 0b eb fe 4c 8d ad 60 ff ff ff 4c 8d 7b 58 e8 0e 8e 73 00 4d
        RIP  locks_lock_inode_wait+0x2e/0x160
         RSP <ffff880028b5bce8>
        ---[ end trace dfca74ec9b5b274c ]---
      
      Fixes: 4f656367 ("Move locks API users to locks_lock_inode_wait()")
      Signed-off-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b5a8bc33
    • Sudip Mukherjee's avatar
      m32r: add io*_rep helpers · 92a8ed4c
      Sudip Mukherjee authored
      m32r allmodconfig was failing with the error:
      
        error: implicit declaration of function 'read'
      
      On checking io.h it turned out that 'read' is not defined but 'readb' is
      defined and 'ioread8' will then obviously mean 'readb'.
      
      At the same time some of the helper functions ioreadN_rep() and
      iowriteN_rep() were missing which also led to the build failure.
      Signed-off-by: default avatarSudip Mukherjee <sudip@vectorindia.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      92a8ed4c
    • Sudip Mukherjee's avatar
      m32r: fix build failure · 6122192e
      Sudip Mukherjee authored
      m32r allmodconfig is failing with:
      
        In file included from ../include/linux/kvm_para.h:4:0,
                         from ../kernel/watchdog.c:26:
        ../include/uapi/linux/kvm_para.h:30:26: fatal error: asm/kvm_para.h: No such file or directory
      
      kvm_para.h was not included in the build.
      Signed-off-by: default avatarSudip Mukherjee <sudip@vectorindia.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6122192e
    • Andrew Morton's avatar
      arch/x86/xen/suspend.c: include xen/xen.h · facca616
      Andrew Morton authored
      Fix the build warning:
      
        arch/x86/xen/suspend.c: In function 'xen_arch_pre_suspend':
        arch/x86/xen/suspend.c:70:9: error: implicit declaration of function 'xen_pv_domain' [-Werror=implicit-function-declaration]
                if (xen_pv_domain())
                    ^
      Reported-by: default avatarkbuild test robot <fengguang.wu@intel.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      facca616
    • Vladimir Davydov's avatar
      mm: memcontrol: fix possible memcg leak due to interrupted reclaim · 6df38689
      Vladimir Davydov authored
      Memory cgroup reclaim can be interrupted with mem_cgroup_iter_break()
      once enough pages have been reclaimed, in which case, in contrast to a
      full round-trip over a cgroup sub-tree, the current position stored in
      mem_cgroup_reclaim_iter of the target cgroup does not get invalidated
      and so is left holding the reference to the last scanned cgroup.  If the
      target cgroup does not get scanned again (we might have just reclaimed
      the last page or all processes might exit and free their memory
      voluntary), we will leak it, because there is nobody to put the
      reference held by the iterator.
      
      The problem is easy to reproduce by running the following command
      sequence in a loop:
      
          mkdir /sys/fs/cgroup/memory/test
          echo 100M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
          echo $$ > /sys/fs/cgroup/memory/test/cgroup.procs
          memhog 150M
          echo $$ > /sys/fs/cgroup/memory/cgroup.procs
          rmdir test
      
      The cgroups generated by it will never get freed.
      
      This patch fixes this issue by making mem_cgroup_iter avoid taking
      reference to the current position.  In order not to hit use-after-free
      bug while running reclaim in parallel with cgroup deletion, we make use
      of ->css_released cgroup callback to clear references to the dying
      cgroup in all reclaim iterators that might refer to it.  This callback
      is called right before scheduling rcu work which will free css, so if we
      access iter->position from rcu read section, we might be sure it won't
      go away under us.
      
      [hannes@cmpxchg.org: clean up css ref handling]
      Fixes: 5ac8fb31 ("mm: memcontrol: convert reclaim iterator to simple css refcounting")
      Signed-off-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@kernel.org>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>	[3.19+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6df38689
    • Joseph Qi's avatar
      ocfs2: fix BUG when calculate new backup super · 5c9ee4cb
      Joseph Qi authored
      When resizing, it firstly extends the last gd.  Once it should backup
      super in the gd, it calculates new backup super and update the
      corresponding value.
      
      But it currently doesn't consider the situation that the backup super is
      already done.  And in this case, it still sets the bit in gd bitmap and
      then decrease from bg_free_bits_count, which leads to a corrupted gd and
      trigger the BUG in ocfs2_block_group_set_bits:
      
          BUG_ON(le16_to_cpu(bg->bg_free_bits_count) < num_bits);
      
      So check whether the backup super is done and then do the updates.
      Signed-off-by: default avatarJoseph Qi <joseph.qi@huawei.com>
      Reviewed-by: default avatarJiufei Xue <xuejiufei@huawei.com>
      Reviewed-by: default avatarYiwen Jiang <jiangyiwen@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5c9ee4cb