1. 03 Apr, 2014 2 commits
    • Mark Tinguely's avatar
      xfs: fix directory hash ordering bug · c88547a8
      Mark Tinguely authored
      Commit f5ea1100 ("xfs: add CRCs to dir2/da node blocks") introduced
      in 3.10 incorrectly converted the btree hash index array pointer in
      xfs_da3_fixhashpath(). It resulted in the the current hash always
      being compared against the first entry in the btree rather than the
      current block index into the btree block's hash entry array. As a
      result, it was comparing the wrong hashes, and so could misorder the
      entries in the btree.
      
      For most cases, this doesn't cause any problems as it requires hash
      collisions to expose the ordering problem. However, when there are
      hash collisions within a directory there is a very good probability
      that the entries will be ordered incorrectly and that actually
      matters when duplicate hashes are placed into or removed from the
      btree block hash entry array.
      
      This bug results in an on-disk directory corruption and that results
      in directory verifier functions throwing corruption warnings into
      the logs. While no data or directory entries are lost, access to
      them may be compromised, and attempts to remove entries from a
      directory that has suffered from this corruption may result in a
      filesystem shutdown.  xfs_repair will fix the directory hash
      ordering without data loss occuring.
      
      [dchinner: wrote useful a commit message]
      
      cc: <stable@vger.kernel.org>
      Reported-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarMark Tinguely <tinguely@sgi.com>
      Reviewed-by: default avatarBen Myers <bpm@sgi.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      c88547a8
    • Dan Carpenter's avatar
      xfs: extra semi-colon breaks a condition · 805eeb8e
      Dan Carpenter authored
      There were some extra semi-colons here which mean that we return true
      unintentionally.
      
      Fixes: a49935f2 ('xfs: xfs_check_page_type buffer checks need help')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      805eeb8e
  2. 07 Mar, 2014 5 commits
    • Dave Chinner's avatar
      xfs: inode log reservations are still too small · fe4c224a
      Dave Chinner authored
      Back in commit 23956703 ("xfs: inode log reservations are too
      small"), the reservation size was increased to take into account the
      difference in size between the in-memory BMBT block headers and the
      on-disk BMDR headers. This solved a transaction overrun when logging
      the inode size.
      
      Recently, however, we've seen a number of these same overruns on
      kernels with the above fix in it. All of them have been by 4 bytes,
      so we must still not be accounting for something correctly.
      
      Through inspection it turns out the above commit didn't take into
      account everything it should have. That is, it only accounts for a
      single log op_hdr structure, when it can actually require up to four
      op_hdrs - one for each region (log iovec) that is formatted. These
      regions are the inode log format header, the inode core, and the two
      forks that can be held in the literal area of the inode.
      
      This means we are not accounting for 36 bytes of log space that the
      transaction can use, and hence when we get inodes in certain formats
      with particular fragmentation patterns we can overrun the
      transaction. Fix this by adding the correct accounting for log
      op_headers in the transaction.
      Tested-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      fe4c224a
    • Dave Chinner's avatar
      xfs: xfs_check_page_type buffer checks need help · a49935f2
      Dave Chinner authored
      xfs_aops_discard_page() was introduced in the following commit:
      
        xfs: truncate delalloc extents when IO fails in writeback
      
      ... to clean up left over delalloc ranges after I/O failure in
      ->writepage(). generic/224 tests for this scenario and occasionally
      reproduces panics on sub-4k blocksize filesystems.
      
      The cause of this is failure to clean up the delalloc range on a
      page where the first buffer does not match one of the expected
      states of xfs_check_page_type(). If a buffer is not unwritten,
      delayed or dirty&mapped, xfs_check_page_type() stops and
      immediately returns 0.
      
      The stress test of generic/224 creates a scenario where the first
      several buffers of a page with delayed buffers are mapped & uptodate
      and some subsequent buffer is delayed. If the ->writepage() happens
      to fail for this page, xfs_aops_discard_page() incorrectly skips
      the entire page.
      
      This then causes later failures either when direct IO maps the range
      and finds the stale delayed buffer, or we evict the inode and find
      that the inode still has a delayed block reservation accounted to
      it.
      
      We can easily fix this xfs_aops_discard_page() failure by making
      xfs_check_page_type() check all buffers, but this breaks
      xfs_convert_page() more than it is already broken. Indeed,
      xfs_convert_page() wants xfs_check_page_type() to tell it if the
      first buffers on the pages are of a type that can be aggregated into
      the contiguous IO that is already being built.
      
      xfs_convert_page() should not be writing random buffers out of a
      page, but the current behaviour will cause it to do so if there are
      buffers that don't match the current specification on the page.
      Hence for xfs_convert_page() we need to:
      
      	a) return "not ok" if the first buffer on the page does not
      	match the specification provided to we don't write anything;
      	and
      	b) abort it's buffer-add-to-io loop the moment we come
      	across a buffer that does not match the specification.
      
      Hence we need to fix both xfs_check_page_type() and
      xfs_convert_page() to work correctly with pages that have mixed
      buffer types, whilst allowing xfs_aops_discard_page() to scan all
      buffers on the page for a type match.
      Reported-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      a49935f2
    • Brian Foster's avatar
      xfs: avoid AGI/AGF deadlock scenario for inode chunk allocation · e480a723
      Brian Foster authored
      The inode chunk allocation path can lead to deadlock conditions if
      a transaction is dirtied with an AGF (to fix up the freelist) for
      an AG that cannot satisfy the actual allocation request. This code
      path is written to try and avoid this scenario, but it can be
      reproduced by running xfstests generic/270 in a loop on a 512b fs.
      
      An example situation is:
      - process A attempts an inode allocation on AG 3, modifies
        the freelist, fails the allocation and ultimately moves on to
        AG 0 with the AG 3 AGF held
      - process B is doing a free space operation (i.e., truncate) and
        acquires the AG 0 AGF, waits on the AG 3 AGF
      - process A acquires the AG 0 AGI, waits on the AG 0 AGF (deadlock)
      
      The problem here is that process A acquired the AG 3 AGF while
      moving on to AG 0 (and releasing the AG 3 AGI with the AG 3 AGF
      held). xfs_dialloc() makes one pass through each of the AGs when
      attempting to allocate an inode chunk. The expectation is a clean
      transaction if a particular AG cannot satisfy the allocation
      request. xfs_ialloc_ag_alloc() is written to support this through
      use of the minalignslop allocation args field.
      
      When using the agi->agi_newino optimization, we attempt an exact
      bno allocation request based on the location of the previously
      allocated chunk. minalignslop is set to inform the allocator that
      we will require alignment on this chunk, and thus to not allow the
      request for this AG if the extra space is not available. Suppose
      that the AG in question has just enough space for this request, but
      not at the requested bno. xfs_alloc_fix_freelist() will proceed as
      normal as it determines the request should succeed, and thus it is
      allowed to modify the agf. xfs_alloc_ag_vextent() ultimately fails
      because the requested bno is not available. In response, the caller
      moves on to a NEAR_BNO allocation request for the same AG. The
      alignment is set, but the minalignslop field is never reset. This
      increases the overall requirement of the request from the first
      attempt. If this delta is the difference between allocation success
      and failure for the AG, xfs_alloc_fix_freelist() rejects this
      request outright the second time around and causes the allocation
      request to unnecessarily fail for this AG.
      
      To address this situation, reset the minalignslop field immediately
      after use and prevent it from leaking into subsequent requests.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarMark Tinguely <tinguely@sgi.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      e480a723
    • Dave Chinner's avatar
      xfs: use NOIO contexts for vm_map_ram · ae687e58
      Dave Chinner authored
      When we map pages in the buffer cache, we can do so in GFP_NOFS
      contexts. However, the vmap interfaces do not provide any method of
      communicating this information to memory reclaim, and hence we get
      lockdep complaining about it regularly and occassionally see hangs
      that may be vmap related reclaim deadlocks. We can also see these
      same problems from anywhere where we use vmalloc for a large buffer
      (e.g. attribute code) inside a transaction context.
      
      A typical lockdep report shows up as a reclaim state warning like so:
      
      [14046.101458] =================================
      [14046.102850] [ INFO: inconsistent lock state ]
      [14046.102850] 3.14.0-rc4+ #2 Not tainted
      [14046.102850] ---------------------------------
      [14046.102850] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
      [14046.102850] kswapd0/14 [HC0[0]:SC0[0]:HE1:SE1] takes:
      [14046.102850]  (&xfs_dir_ilock_class){++++?+}, at: [<791a04bb>] xfs_ilock+0xff/0x16a
      [14046.102850] {RECLAIM_FS-ON-W} state was registered at:
      [14046.102850]   [<7904cdb1>] mark_held_locks+0x81/0xe7
      [14046.102850]   [<7904d390>] lockdep_trace_alloc+0x5c/0xb4
      [14046.102850]   [<790c2c28>] kmem_cache_alloc_trace+0x2b/0x11e
      [14046.102850]   [<790ba7f4>] vm_map_ram+0x119/0x3e6
      [14046.102850]   [<7914e124>] _xfs_buf_map_pages+0x5b/0xcf
      [14046.102850]   [<7914ed74>] xfs_buf_get_map+0x67/0x13f
      [14046.102850]   [<7917506f>] xfs_attr_rmtval_set+0x396/0x4d5
      [14046.102850]   [<7916e8bb>] xfs_attr_leaf_addname+0x18f/0x37d
      [14046.102850]   [<7916ed9e>] xfs_attr_set_int+0x2f5/0x3e8
      [14046.102850]   [<7916eefc>] xfs_attr_set+0x6b/0x74
      [14046.102850]   [<79168355>] xfs_xattr_set+0x61/0x81
      [14046.102850]   [<790e5b10>] generic_setxattr+0x59/0x68
      [14046.102850]   [<790e4c06>] __vfs_setxattr_noperm+0x58/0xce
      [14046.102850]   [<790e4d0a>] vfs_setxattr+0x8e/0x92
      [14046.102850]   [<790e4ddd>] setxattr+0xcf/0x159
      [14046.102850]   [<790e5423>] SyS_lsetxattr+0x88/0xbb
      [14046.102850]   [<79268438>] sysenter_do_call+0x12/0x36
      
      Now, we can't completely remove these traces - mainly because
      vm_map_ram() will do GFP_KERNEL allocation and that generates the
      above warning before we get into the reclaim code, but we can turn
      them all into false positive warnings.
      
      To do that, use the method that DM and other IO context code uses to
      avoid this problem: there is a process flag to tell memory reclaim
      not to do IO that we can set appropriately. That prevents GFP_KERNEL
      context reclaim being done from deep inside the vmalloc code in
      places we can't directly pass a GFP_NOFS context to. That interface
      has a pair of wrapper functions: memalloc_noio_save() and
      memalloc_noio_restore().
      
      Adding them around vm_map_ram and the vzalloc call in
      kmem_alloc_large() will prevent deadlocks and most lockdep reports
      for this issue. Also, convert the vzalloc() call in
      kmem_alloc_large() to use __vmalloc() so that we can pass the
      correct gfp context to the data page allocation routine inside
      __vmalloc() so that it is clear that GFP_NOFS context is important
      to this vmalloc call.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      ae687e58
    • Dave Chinner's avatar
      xfs: don't leak EFSBADCRC to userspace · ac75a1f7
      Dave Chinner authored
      While the verifier routines may return EFSBADCRC when a buffer has
      a bad CRC, we need to translate that to EFSCORRUPTED so that the
      higher layers treat the error appropriately and we return a
      consistent error to userspace. This fixes a xfs/005 regression.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      ac75a1f7
  3. 03 Feb, 2014 4 commits
    • Linus Torvalds's avatar
      Linus 3.14-rc1 · 38dbfb59
      Linus Torvalds authored
      38dbfb59
    • Linus Torvalds's avatar
      Merge branch 'parisc-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 69048e01
      Linus Torvalds authored
      Pull parisc updates from Helge Deller:
       "The three major changes in this patchset is a implementation for
        flexible userspace memory maps, cache-flushing fixes (again), and a
        long-discussed ABI change to make EWOULDBLOCK the same value as
        EAGAIN.
      
        parisc has been the only platform where we had EWOULDBLOCK != EAGAIN
        to keep HP-UX compatibility.  Since we will probably never implement
        full HP-UX support, we prefer to drop this compatibility to make it
        easier for us with Linux userspace programs which mostly never checked
        for both values.  We don't expect major fall-outs because of this
        change, and if we face some, we will simply rebuild the necessary
        applications in the debian archives"
      
      * 'parisc-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: add flexible mmap memory layout support
        parisc: Make EWOULDBLOCK be equal to EAGAIN on parisc
        parisc: convert uapi/asm/stat.h to use native types only
        parisc: wire up sched_setattr and sched_getattr
        parisc: fix cache-flushing
        parisc/sti_console: prefer Linux fonts over built-in ROM fonts
      69048e01
    • Mikulas Patocka's avatar
      hpfs: optimize quad buffer loading · 1c0b8a7a
      Mikulas Patocka authored
      HPFS needs to load 4 consecutive 512-byte sectors when accessing the
      directory nodes or bitmaps.  We can't switch to 2048-byte block size
      because files are allocated in the units of 512-byte sectors.
      
      Previously, the driver would allocate a 2048-byte area using kmalloc,
      copy the data from four buffers to this area and eventually copy them
      back if they were modified.
      
      In the current implementation of the buffer cache, buffers are allocated
      in the pagecache.  That means that 4 consecutive 512-byte buffers are
      stored in consecutive areas in the kernel address space.  So, we don't
      need to allocate extra memory and copy the content of the buffers there.
      
      This patch optimizes the code to avoid copying the buffers.  It checks
      if the four buffers are stored in contiguous memory - if they are not,
      it falls back to allocating a 2048-byte area and copying data there.
      Signed-off-by: default avatarMikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1c0b8a7a
    • Mikulas Patocka's avatar
      hpfs: remember free space · 2cbe5c76
      Mikulas Patocka authored
      Previously, hpfs scanned all bitmaps each time the user asked for free
      space using statfs.  This patch changes it so that hpfs scans the
      bitmaps only once, remembes the free space and on next invocation of
      statfs it returns the value instantly.
      
      New versions of wine are hammering on the statfs syscall very heavily,
      making some games unplayable when they're stored on hpfs, with load
      times in minutes.
      
      This should be backported to the stable kernels because it fixes
      user-visible problem (excessive level load times in wine).
      Signed-off-by: default avatarMikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2cbe5c76
  4. 02 Feb, 2014 12 commits
  5. 01 Feb, 2014 12 commits
  6. 31 Jan, 2014 5 commits
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-3.14-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 8a1f006a
      Linus Torvalds authored
      Pull NFS client bugfixes from Trond Myklebust:
       "Highlights:
      
         - Fix several races in nfs_revalidate_mapping
         - NFSv4.1 slot leakage in the pNFS files driver
         - Stable fix for a slot leak in nfs40_sequence_done
         - Don't reject NFSv4 servers that support ACLs with only ALLOW aces"
      
      * tag 'nfs-for-3.14-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        nfs: initialize the ACL support bits to zero.
        NFSv4.1: Cleanup
        NFSv4.1: Clean up nfs41_sequence_done
        NFSv4: Fix a slot leak in nfs40_sequence_done
        NFSv4.1 free slot before resending I/O to MDS
        nfs: add memory barriers around NFS_INO_INVALID_DATA and NFS_INO_INVALIDATING
        NFS: Fix races in nfs_revalidate_mapping
        sunrpc: turn warn_gssd() log message into a dprintk()
        NFS: fix the handling of NFS_INO_INVALID_DATA flag in nfs_revalidate_mapping
        nfs: handle servers that support only ALLOW ACE type.
      8a1f006a
    • Linus Torvalds's avatar
      Merge tag 'sound-fix-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 14864a52
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "The big chunks here are the updates for oxygen driver for Xonar DG
        devices, which were slipped from the previous pull request.  They are
        device-specific and thus not too dangerous.
      
        Other than that, all patches are small bug fixes, mainly for Samsung
        build fixes, a few HD-audio enhancements, and other misc ASoC fixes.
        (And this time ASoC merge is less than Octopus, lucky seven :)"
      
      * tag 'sound-fix-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (42 commits)
        ALSA: hda/hdmi - allow PIN_OUT to be dynamically enabled
        ALSA: hda - add headset mic detect quirks for another Dell laptop
        ALSA: oxygen: Xonar DG(X): cleanup and minor changes
        ALSA: oxygen: Xonar DG(X): modify high-pass filter control
        ALSA: oxygen: Xonar DG(X): modify input select functions
        ALSA: oxygen: Xonar DG(X): modify capture volume functions
        ALSA: oxygen: Xonar DG(X): use headphone volume control
        ALSA: oxygen: Xonar DG(X): modify playback output select
        ALSA: oxygen: Xonar DG(X): capture from I2S channel 1, not 2
        ALSA: oxygen: Xonar DG(X): move the mixer code into another file
        ALSA: oxygen: modify CS4245 register dumping function
        ALSA: oxygen: modify adjust_dg_dac_routing function
        ALSA: oxygen: Xonar DG(X): modify DAC/ADC parameters function
        ALSA: oxygen: Xonar DG(X): modify initialization functions
        ALSA: oxygen: Xonar DG(X): add new CS4245 SPI functions
        ALSA: oxygen: additional definitions for the Xonar DG/DGX card
        ALSA: oxygen: change description of the xonar_dg.c file
        ALSA: oxygen: export oxygen_update_dac_routing symbol
        ALSA: oxygen: add mute mask for the OXYGEN_PLAY_ROUTING register
        ALSA: oxygen: modify the SPI writing function
        ...
      14864a52
    • Linus Torvalds's avatar
      Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending · 4e13c5d0
      Linus Torvalds authored
      Pull SCSI target updates from Nicholas Bellinger:
       "The highlights this round include:
      
        - add support for SCSI Referrals (Hannes)
        - add support for T10 DIF into target core (nab + mkp)
        - add support for T10 DIF emulation in FILEIO + RAMDISK backends (Sagi + nab)
        - add support for T10 DIF -> bio_integrity passthrough in IBLOCK backend (nab)
        - prep changes to iser-target for >= v3.15 T10 DIF support (Sagi)
        - add support for qla2xxx N_Port ID Virtualization - NPIV (Saurav + Quinn)
        - allow percpu_ida_alloc() to receive task state bitmask (Kent)
        - fix >= v3.12 iscsi-target session reset hung task regression (nab)
        - fix >= v3.13 percpu_ref se_lun->lun_ref_active race (nab)
        - fix a long-standing network portal creation race (Andy)"
      
      * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (51 commits)
        target: Fix percpu_ref_put race in transport_lun_remove_cmd
        target/iscsi: Fix network portal creation race
        target: Report bad sector in sense data for DIF errors
        iscsi-target: Convert gfp_t parameter to task state bitmask
        iscsi-target: Fix connection reset hang with percpu_ida_alloc
        percpu_ida: Make percpu_ida_alloc + callers accept task state bitmask
        iscsi-target: Pre-allocate more tags to avoid ack starvation
        qla2xxx: Configure NPIV fc_vport via tcm_qla2xxx_npiv_make_lport
        qla2xxx: Enhancements to enable NPIV support for QLOGIC ISPs with TCM/LIO.
        qla2xxx: Fix scsi_host leak on qlt_lport_register callback failure
        IB/isert: pass scatterlist instead of cmd to fast_reg_mr routine
        IB/isert: Move fastreg descriptor creation to a function
        IB/isert: Avoid frwr notation, user fastreg
        IB/isert: seperate connection protection domains and dma MRs
        tcm_loop: Enable DIF/DIX modes in SCSI host LLD
        target/rd: Add DIF protection into rd_execute_rw
        target/rd: Add support for protection SGL setup + release
        target/rd: Refactor rd_build_device_space + rd_release_device_space
        target/file: Add DIF protection support to fd_execute_rw
        target/file: Add DIF protection init/format support
        ...
      4e13c5d0
    • Lorenzo Pieralisi's avatar
      drivers: bus: fix CCI driver kcalloc call parameters swap · 7c762036
      Lorenzo Pieralisi authored
      This patch fixes a bug/typo in the CCI driver kcalloc usage
      that inadvertently swapped the parameters order in the
      kcalloc call and went unnoticed.
      Reported-by: default avatarXia Feng <xiafeng@allwinnertech.com>
      Signed-off-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: default avatarOlof Johansson <olof@lixom.net>
      7c762036
    • Tim Kryger's avatar
      ARM: dts: bcm28155-ap: Fix Card Detection GPIO · 94db37ad
      Tim Kryger authored
      The board schematic states that the "SD_CARD_DET_N gets pulled to GND
      when card is inserted" so the polarity has been updated to active low.
      
      Polarity is now specified with a GPIO define instead of a magic number.
      Signed-off-by: default avatarTim Kryger <tim.kryger@linaro.org>
      Reviewed-by: default avatarMatt Porter <matt.porter@linaro.org>
      Signed-off-by: default avatarOlof Johansson <olof@lixom.net>
      94db37ad