1. 18 Jan, 2016 1 commit
  2. 11 Jan, 2016 4 commits
    • Dave Chinner's avatar
      dde7f55b
    • Dave Chinner's avatar
      xfs: handle dquot buffer readahead in log recovery correctly · 7d6a13f0
      Dave Chinner authored
      When we do dquot readahead in log recovery, we do not use a verifier
      as the underlying buffer may not have dquots in it. e.g. the
      allocation operation hasn't yet been replayed. Hence we do not want
      to fail recovery because we detect an operation to be replayed has
      not been run yet. This problem was addressed for inodes in commit
      d8914002 ("xfs: inode buffers may not be valid during recovery
      readahead") but the problem was not recognised to exist for dquots
      and their buffers as the dquot readahead did not have a verifier.
      
      The result of not using a verifier is that when the buffer is then
      next read to replay a dquot modification, the dquot buffer verifier
      will only be attached to the buffer if *readahead is not complete*.
      Hence we can read the buffer, replay the dquot changes and then add
      it to the delwri submission list without it having a verifier
      attached to it. This then generates warnings in xfs_buf_ioapply(),
      which catches and warns about this case.
      
      Fix this and make it handle the same readahead verifier error cases
      as for inode buffers by adding a new readahead verifier that has a
      write operation as well as a read operation that marks the buffer as
      not done if any corruption is detected.  Also make sure we don't run
      readahead if the dquot buffer has been marked as cancelled by
      recovery.
      
      This will result in readahead either succeeding and the buffer
      having a valid write verifier, or readahead failing and the buffer
      state requiring the subsequent read to resubmit the IO with the new
      verifier.  In either case, this will result in the buffer always
      ending up with a valid write verifier on it.
      
      Note: we also need to fix the inode buffer readahead error handling
      to mark the buffer with EIO. Brian noticed the code I copied from
      there wrong during review, so fix it at the same time. Add comments
      linking the two functions that handle readahead verifier errors
      together so we don't forget this behavioural link in future.
      
      cc: <stable@vger.kernel.org> # 3.12 - current
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      7d6a13f0
    • Dave Chinner's avatar
      xfs: inode recovery readahead can race with inode buffer creation · b79f4a1c
      Dave Chinner authored
      When we do inode readahead in log recovery, we do can do the
      readahead before we've replayed the icreate transaction that stamps
      the buffer with inode cores. The inode readahead verifier catches
      this and marks the buffer as !done to indicate that it doesn't yet
      contain valid inodes.
      
      In adding buffer error notification  (i.e. setting b_error = -EIO at
      the same time as as we clear the done flag) to such a readahead
      verifier failure, we can then get subsequent inode recovery failing
      with this error:
      
      XFS (dm-0): metadata I/O error: block 0xa00060 ("xlog_recover_do..(read#2)") error 5 numblks 32
      
      This occurs when readahead completion races with icreate item replay
      such as:
      
      	inode readahead
      		find buffer
      		lock buffer
      		submit RA io
      	....
      	icreate recovery
      	    xfs_trans_get_buffer
      		find buffer
      		lock buffer
      		<blocks on RA completion>
      	.....
      	<ra completion>
      		fails verifier
      		clear XBF_DONE
      		set bp->b_error = -EIO
      		release and unlock buffer
      	<icreate gains lock>
      	icreate initialises buffer
      	marks buffer as done
      	adds buffer to delayed write queue
      	releases buffer
      
      At this point, we have an initialised inode buffer that is up to
      date but has an -EIO state registered against it. When we finally
      get to recovering an inode in that buffer:
      
      	inode item recovery
      	    xfs_trans_read_buffer
      		find buffer
      		lock buffer
      		sees XBF_DONE is set, returns buffer
      	    sees bp->b_error is set
      		fail log recovery!
      
      Essentially, we need xfs_trans_get_buf_map() to clear the error status of
      the buffer when doing a lookup. This function returns uninitialised
      buffers, so the buffer returned can not be in an error state and
      none of the code that uses this function expects b_error to be set
      on return. Indeed, there is an ASSERT(!bp->b_error); in the
      transaction case in xfs_trans_get_buf_map() that would have caught
      this if log recovery used transactions....
      
      This patch firstly changes the inode readahead failure to set -EIO
      on the buffer, and secondly changes xfs_buf_get_map() to never
      return a buffer with an error state set so this first change doesn't
      cause unexpected log recovery failures.
      
      cc: <stable@vger.kernel.org> # 3.12 - current
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      b79f4a1c
    • Eric Sandeen's avatar
      xfs: eliminate committed arg from xfs_bmap_finish · f6106efa
      Eric Sandeen authored
      Calls to xfs_bmap_finish() and xfs_trans_ijoin(), and the
      associated comments were replicated several times across
      the attribute code, all dealing with what to do if the
      transaction was or wasn't committed.
      
      And in that replicated code, an ASSERT() test of an
      uninitialized variable occurs in several locations:
      
      	error = xfs_attr_thing(&args);
      	if (!error) {
      		error = xfs_bmap_finish(&args.trans, args.flist,
      					&committed);
      	}
      	if (error) {
      		ASSERT(committed);
      
      If the first xfs_attr_thing() failed, we'd skip the xfs_bmap_finish,
      never set "committed", and then test it in the ASSERT.
      
      Fix this up by moving the committed state internal to xfs_bmap_finish,
      and add a new inode argument.  If an inode is passed in, it is passed
      through to __xfs_trans_roll() and joined to the transaction there if
      the transaction was committed.
      
      xfs_qm_dqalloc() was a little unique in that it called bjoin rather
      than ijoin, but as Dave points out we can detect the committed state
      but checking whether (*tpp != tp).
      
      Addresses-Coverity-Id: 102360
      Addresses-Coverity-Id: 102361
      Addresses-Coverity-Id: 102363
      Addresses-Coverity-Id: 102364
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      f6106efa
  3. 08 Jan, 2016 2 commits
    • Dave Chinner's avatar
      xfs: bmapbt checking on debug kernels too expensive · e3543819
      Dave Chinner authored
      For large sparse or fragmented files, checking every single entry in
      the bmapbt on every operation is prohibitively expensive. Especially
      as such checks rarely discover problems during normal operations on
      high extent coutn files. Our regression tests don't tend to exercise
      files with hundreds of thousands to millions of extents, so mostly
      this isn't noticed.
      
      However, trying to run things like xfs_mdrestore of large filesystem
      dumps on a debug kernel quickly becomes impossible as the CPU is
      completely burnt up repeatedly walking the sparse file bmapbt that
      is generated for every allocation that is made.
      
      Hence, if the file has more than 10,000 extents, just don't bother
      with walking the tree to check it exhaustively. The btree code has
      checks that ensure that the newly inserted/removed/modified record
      is correctly ordered, so the entrie tree walk in thses cases has
      limited additional value.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      e3543819
    • Dave Chinner's avatar
      xfs: add tracepoints to readpage calls · 121e213e
      Dave Chinner authored
      This allows us to see page cache driven readahead in action as it
      passes through XFS. This helps to understand buffered read
      throughput problems such as readahead IO IO sizes being too small
      for the underlying device to reach max throughput.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      121e213e
  4. 04 Jan, 2016 26 commits
  5. 03 Jan, 2016 3 commits
  6. 31 Dec, 2015 4 commits
    • Linus Torvalds's avatar
      Merge tag 'pci-v4.4-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 9c982e86
      Linus Torvalds authored
      Pull PCI bugfix from Bjorn Helgaas:
       "Here's another fix for v4.4.
      
        This fixes 32-bit config reads for the HiSilicon driver.  Obviously
        the driver is completely broken without this fix (apparently it
        actually was tested internally, but got broken somehow in the process
        of upstreaming it).
      
        Summary:
      
        HiSilicon host bridge driver
          Fix 32-bit config reads (Dongdong Liu)"
      
      * tag 'pci-v4.4-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI: hisi: Fix hisi_pcie_cfg_read() 32-bit reads
      9c982e86
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 7c672dd6
      Linus Torvalds authored
      Pull sparc fixes from David Miller:
       "Just some missing syscall wire ups"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc: Wire up mlock2 system call.
        sparc: Add all necessary direct socket system calls.
      7c672dd6
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 8f5daf2a
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Prevent XFRM per-cpu counter updates for one namespace from being
          applied to another namespace.  Fix from DanS treetman.
      
       2) Fix RCU de-reference in iwl_mvm_get_key_sta_id(), from Johannes
          Berg.
      
       3) Remove ethernet header assumption in nft_do_chain_netdev(), from
          Pablo Neira Ayuso.
      
       4) Fix cpsw PHY ident with multiple slaves and fixed-phy, from Pascal
          Speck.
      
       5) Fix use after free in sixpack_close and mkiss_close.
      
       6) Fix VXLAN fw assertion on bnx2x, from Yuval Mintz.
      
       7) natsemi doesn't check for DMA mapping errors, from Alexey
          Khoroshilov.
      
       8) Fix inverted test in ip6addrlbl_get(), from ANdrey Ryabinin.
      
       9) Missing initialization of needed_headroom in geneve tunnel driver,
          from Paolo Abeni.
      
      10) Fix conntrack template leak in openvswitch, from Joe Stringer.
      
      11) Mission initialization of wq->flags in sock_alloc_inode(), from
          Nicolai Stange.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
        sctp: sctp should release assoc when sctp_make_abort_user return NULL in sctp_close
        net, socket, socket_wq: fix missing initialization of flags
        drivers: net: cpsw: fix error return code
        openvswitch: Fix template leak in error cases.
        sctp: label accepted/peeled off sockets
        sctp: use GFP_USER for user-controlled kmalloc
        qlcnic: fix a loop exit condition better
        net: cdc_ncm: avoid changing RX/TX buffers on MTU changes
        geneve: initialize needed_headroom
        ipv6: honor ifindex in case we receive ll addresses in router advertisements
        addrconf: always initialize sysctl table data
        ipv6/addrlabel: fix ip6addrlbl_get()
        switchdev: bridge: Pass ageing time as clock_t instead of jiffies
        sh_eth: fix 16-bit descriptor field access endianness too
        veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
        net: usb: cdc_ncm: Adding Dell DW5813 LTE AT&T Mobile Broadband Card
        net: usb: cdc_ncm: Adding Dell DW5812 LTE Verizon Mobile Broadband Card
        natsemi: add checks for dma mapping errors
        rhashtable: Kill harmless RCU warning in rhashtable_walk_init
        openvswitch: correct encoding of set tunnel action attributes
        ...
      8f5daf2a
    • David S. Miller's avatar
      sparc: Wire up mlock2 system call. · 42d85c52
      David S. Miller authored
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      42d85c52