1. 08 May, 2013 40 commits
    • hp-wmi: add more definitions for new event_id's · d9e290a0
      Alex Hung authored
      New HP laptops start generating new events, and hp-wmi prints unknown
      event_ids for them. This patch also removes these messages.
      Signed-off-by: Alex Hung <alex.hung@canonical.com>
      Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
      d9e290a0
    • dell-laptop: Fix krealloc() misuse in parse_da_table() · a30450c7
      David Woodhouse authored
      If krealloc() returns NULL, it *doesn't* free the original. So any code
      of the form 'foo = krealloc(foo, …);' is almost certainly a bug.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
      a30450c7
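The pitfall described in this commit can be illustrated with a minimal user-space sketch. It uses the C library's realloc(), which shares the relevant behaviour with krealloc() (on failure the original buffer is not freed). The struct and function names are hypothetical and are not taken from dell-laptop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable table; names are illustrative only. */
struct token_table {
    int *items;
    size_t count;
};

/*
 * Append one value. On allocation failure the old buffer is left
 * untouched and is still owned by the table, so nothing leaks.
 * Returns 0 on success, -1 on failure.
 */
int table_append(struct token_table *t, int value)
{
    int *grown;

    assert(t != NULL);

    /* Keep the result in a temporary: realloc() does not free on failure. */
    grown = realloc(t->items, (t->count + 1) * sizeof(*grown));
    if (!grown)
        return -1;

    t->items = grown;
    t->items[t->count++] = value;
    return 0;
}
```

Writing `t->items = realloc(t->items, ...)` instead would overwrite the only pointer to the old buffer with NULL on failure, which is the leak the commit message warns about.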
    • hp_accel: Ignore the error from lis3lv02d_poweron() at resume · 77838199
      Shuah Khan authored
      The error in lis3lv02d_poweron() is harmless in the resume path, so
      we should ignore it. This is in line with the other usages of
      lis3lv02d_poweron() and matches the 3.0 code for this routine. This patch
      is in the SUSE git tree and might have missed making it into the mainline.
      opensuse - commit id: 66ccdac87c322cf7af12bddba8c805af640b1cff
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Shuah Khan <shuah.khan@hp.com>
      CC: stable@vger.kernel.org 3.8, 3.4, 3.5, 3.2
      Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
      77838199
    • dell: add new dell WMI format for the AIO machines · 5dd760b8
      AceLan Kao authored
      There is a new Dell WMI spec with a new WMI event format.
      I'm working on the AIO machines, but I think the new format will apply to
      all Dell machines that will be released later this year, not only to AIO.
      
      The new format of the WMI buffer is shown below:
      word 0 - the number of words following in the WMI buffer (not including
              this word)
      word 1 - the event type
      	0x0000 - A hot key is pressed or an event occurred
      	0x000F - A sequence of hot keys is pressed
      word 2 and on - the event data
      Signed-off-by: AceLan Kao <acelan.kao@canonical.com>
      Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
      5dd760b8
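The buffer layout described in this commit could be decoded roughly as follows. This is a sketch under stated assumptions, not the dell-wmi driver code: it assumes each word is 16 bits wide, and the struct, macro, and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Event types from the commit message; the macro names are made up here. */
#define EVT_HOTKEY      0x0000  /* a hot key is pressed or an event occurred */
#define EVT_HOTKEY_SEQ  0x000F  /* a sequence of hot keys is pressed */

/* Hypothetical decoded view of one WMI event buffer. */
struct wmi_event {
    uint16_t type;         /* word 1 */
    const uint16_t *data;  /* word 2 and on */
    size_t data_len;       /* number of data words */
};

/* Returns 0 on success, -1 if the buffer is too short or inconsistent. */
int parse_wmi_buffer(const uint16_t *buf, size_t buf_words,
                     struct wmi_event *out)
{
    size_t following;

    assert(out != NULL);
    if (!buf || buf_words < 2)
        return -1;

    /* word 0 counts the words after it, so it excludes itself */
    following = buf[0];
    if (following < 1 || following > buf_words - 1)
        return -1;

    out->type = buf[1];
    out->data = buf + 2;
    out->data_len = following - 1;  /* drop the event-type word */
    return 0;
}
```

Checking word 0 against the real buffer length before reading the event data keeps a malformed length field from causing an out-of-bounds read.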
    • Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband · e0fd9aff
      Linus Torvalds authored
      Pull InfiniBand/RDMA changes from Roland Dreier:
       - XRC transport fixes
       - Fix DHCP on IPoIB
       - mlx4 preparations for flow steering
       - iSER fixes
       - miscellaneous other fixes
      
      * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (23 commits)
        IB/iser: Add support for iser CM REQ additional info
        IB/iser: Return error to upper layers on EAGAIN registration failures
        IB/iser: Move informational messages from error to info level
        IB/iser: Add module version
        mlx4_core: Expose a few helpers to fill DMFS HW strucutures
        mlx4_core: Directly expose fields of DMFS HW rule control segment
        mlx4_core: Change a few DMFS fields names to match firmare spec
        mlx4: Match DMFS promiscuous field names to firmware spec
        mlx4_core: Move DMFS HW structs to common header file
        IB/mlx4: Set link type for RAW PACKET QPs in the QP context
        IB/mlx4: Disable VLAN stripping for RAW PACKET QPs
        mlx4_core: Reduce warning message for SRQ_LIMIT event to debug level
        RDMA/iwcm: Don't touch cmid after dropping reference
        IB/qib: Correct qib_verbs_register_sysfs() error handling
        IB/ipath: Correct ipath_verbs_register_sysfs() error handling
        RDMA/cxgb4: Fix SQ allocation when on-chip SQ is disabled
        SRPT: Fix odd use of WARN_ON()
        IPoIB: Fix ipoib_hard_header() return value
        RDMA: Rename random32() to prandom_u32()
        RDMA/cxgb3: Fix uninitialized variable
        ...
      e0fd9aff
    • Merge tag 'arm64-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64 · 3d15b798
      Linus Torvalds authored
      Pull arm64 update from Catalin Marinas:
      
       - Since drivers/irqchip/irq-gic.c no longer has dependencies on arm32
         specifics (the 'gic' branch merged), it can be enabled on arm64.
      
       - Enable arm64 support for poweroff/restart (for code under
         drivers/power/reset/).
      
       - Fixes (dts file, exception handling, bitops)
      
      * tag 'arm64-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
        arm64: Treat the bitops index argument as an 'int'
        arm64: Ignore the 'write' ESR flag on cache maintenance faults
        arm64: dts: fix #address-cells for foundation-v8
        arm64: vexpress: Add support for poweroff/restart
        arm64: Enable support for the ARM GIC interrupt controller
      3d15b798
    • Merge tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs · 942d33da
      Linus Torvalds authored
      Pull f2fs updates from Jaegeuk Kim:
       "This patch-set includes the following major enhancement patches.
         - introduce a new global lock scheme
         - add tracepoints on several major functions
         - fix the overall cleaning process focused on victim selection
         - apply the block plugging to merge IOs as much as possible
         - enhance management of free nids and its list
         - enhance the readahead mode for node pages
         - address several critical deadlock conditions
         - reduce lock_page calls
      
        The other minor bug fixes and enhancements are as follows.
         - calculation mistakes: overflow
         - bio types: READ, READA, and READ_SYNC
         - fix the recovery flow, data races, and null pointer errors"
      
      * tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
        f2fs: cover free_nid management with spin_lock
        f2fs: optimize scan_nat_page()
        f2fs: code cleanup for scan_nat_page() and build_free_nids()
        f2fs: bugfix for alloc_nid_failed()
        f2fs: recover when journal contains deleted files
        f2fs: continue to mount after failing recovery
        f2fs: avoid deadlock during evict after f2fs_gc
        f2fs: modify the number of issued pages to merge IOs
        f2fs: remove useless #include <linux/proc_fs.h> as we're now using sysfs as debug entry.
        f2fs: fix inconsistent using of NM_WOUT_THRESHOLD
        f2fs: check truncation of mapping after lock_page
        f2fs: enhance alloc_nid and build_free_nids flows
        f2fs: add a tracepoint on f2fs_new_inode
        f2fs: check nid == 0 in add_free_nid
        f2fs: add REQ_META about metadata requests for submit
        f2fs: give a chance to merge IOs by IO scheduler
        f2fs: avoid frequent background GC
        f2fs: add tracepoints to debug checkpoint request
        f2fs: add tracepoints for write page operations
        f2fs: add tracepoints to debug the block allocation
        ...
      942d33da
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rkuo/linux-hexagon-kernel · 246e6a0d
      Linus Torvalds authored
      Pull Hexagon fixes from Richard Kuo:
       "A bug fix and a Kconfig cleanup"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rkuo/linux-hexagon-kernel:
        HEXAGON: Remove non existent reference to GENERIC_KERNEL_EXECVE & GENERIC_KERNEL_THREAD
        Hexagon: fix register used to call do_work_pending
      246e6a0d
    • mm/slab: Fix crash during slab init · 956e46ef
      Chris Mason authored
      Commit 8a965b3b ("mm, slab_common: Fix bootstrap creation of kmalloc
      caches") introduced a regression that caused us to crash early during
      boot.  The commit was introducing ordering of slab creation, making sure
      two odd-sized slabs were created after specific powers of two sizes.
      
      But, if any of the power of two slabs were created earlier during boot,
      slabs at index 1 or 2 might not get created at all.  This patch makes
      sure none of the slabs get skipped.
      
      Tony Lindgren bisected this down to the offending commit, which really
      helped because bisect kept bringing me to almost but not quite this one.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Acked-by: Tony Lindgren <tony@atomide.com>
      Acked-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
      Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      956e46ef
    • Merge branch 'for-3.10/drivers' of git://git.kernel.dk/linux-block · ebb37277
      Linus Torvalds authored
      Pull block driver updates from Jens Axboe:
       "It might look big in volume, but when categorized, not a lot of
        drivers are touched.  The pull request contains:
      
         - mtip32xx fixes from Micron.
      
         - A slew of drbd updates, this time in a nicer series.
      
         - bcache, a flash/ssd caching framework from Kent.
      
         - Fixes for cciss"
      
      * 'for-3.10/drivers' of git://git.kernel.dk/linux-block: (66 commits)
        bcache: Use bd_link_disk_holder()
        bcache: Allocator cleanup/fixes
        cciss: bug fix to prevent cciss from loading in kdump crash kernel
        cciss: add cciss_allow_hpsa module parameter
        drivers/block/mg_disk.c: add CONFIG_PM_SLEEP to suspend/resume functions
        mtip32xx: Workaround for unaligned writes
        bcache: Make sure blocksize isn't smaller than device blocksize
        bcache: Fix merge_bvec_fn usage for when it modifies the bvm
        bcache: Correctly check against BIO_MAX_PAGES
        bcache: Hack around stuff that clones up to bi_max_vecs
        bcache: Set ra_pages based on backing device's ra_pages
        bcache: Take data offset from the bdev superblock.
        mtip32xx: mtip32xx: Disable TRIM support
        mtip32xx: fix a smatch warning
        bcache: Disable broken btree fuzz tester
        bcache: Fix a format string overflow
        bcache: Fix a minor memory leak on device teardown
        bcache: Documentation updates
        bcache: Use WARN_ONCE() instead of __WARN()
        bcache: Add missing #include <linux/prefetch.h>
        ...
      ebb37277
    • Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block · 4de13d7a
      Linus Torvalds authored
      Pull block core updates from Jens Axboe:
      
       - Major bit is Kent's prep work for immutable bio vecs.
      
       - Stable candidate fix for a scheduling-while-atomic in the queue
         bypass operation.
      
       - Fix for the hang on exceeded rq->datalen 32-bit unsigned when merging
         discard bios.
      
       - Tejun's changes to convert the writeback thread pool to the generic
         workqueue mechanism.
      
       - Runtime PM framework. SCSI patches exist on top of these in James'
         tree.
      
       - A few random fixes.
      
      * 'for-3.10/core' of git://git.kernel.dk/linux-block: (40 commits)
        relay: move remove_buf_file inside relay_close_buf
        partitions/efi.c: replace useless kzalloc's by kmalloc's
        fs/block_dev.c: fix iov_shorten() criteria in blkdev_aio_read()
        block: fix max discard sectors limit
        blkcg: fix "scheduling while atomic" in blk_queue_bypass_start
        Documentation: cfq-iosched: update documentation help for cfq tunables
        writeback: expose the bdi_wq workqueue
        writeback: replace custom worker pool implementation with unbound workqueue
        writeback: remove unused bdi_pending_list
        aoe: Fix unitialized var usage
        bio-integrity: Add explicit field for owner of bip_buf
        block: Add an explicit bio flag for bios that own their bvec
        block: Add bio_alloc_pages()
        block: Convert some code to bio_for_each_segment_all()
        block: Add bio_for_each_segment_all()
        bounce: Refactor __blk_queue_bounce to not use bi_io_vec
        raid1: use bio_copy_data()
        pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage
        pktcdvd: use bio_copy_data()
        block: Add bio_copy_data()
        ...
      4de13d7a
    • f2fs: cover free_nid management with spin_lock · 59bbd474
      Jaegeuk Kim authored
      After build_free_nids() searches free nid candidates from nat pages and
      current journal blocks, it checks all the candidates if they are allocated
      so that the nat cache has its nid with an allocated block address.
      
      In this procedure, previously we used
          list_for_each_entry_safe(fnid, next_fnid, &nm_i->free_nid_list, list).
      But this is not covered by free_nid_list_lock, resulting in a null pointer bug.
      
      This patch moves this checking routine inside add_free_nid() in order not to use
      the spin_lock.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      59bbd474
    • f2fs: optimize scan_nat_page() · 23d38844
      Haicheng Li authored
      When nm_i->fcnt > 2 * MAX_FREE_NIDS, stop scanning other NAT entries.
      Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
      [Jaegeuk Kim: fix handling the return value of add_free_nid()]
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      23d38844
    • f2fs: code cleanup for scan_nat_page() and build_free_nids() · 8760952d
      Haicheng Li authored
      This patch does two cleanups:
      1. remove unused variable "fcnt" in build_free_nids().
      2. make scan_nat_page() return void and remove useless variable "fcnt".
      Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      8760952d
    • f2fs: bugfix for alloc_nid_failed() · 95630cba
      Haicheng Li authored
      Directly drop the free_nid cache when nm_i->fcnt > 2 * MAX_FREE_NIDS
      
      Since there is no nm_i->free_nid_list_lock spinlock protection between
      sequential calls to alloc_nid() and alloc_nid_failed(), some other
      threads may already have added new free_nids to the free_nid_list during
      this period.
      
      We need to make sure nm_i->fcnt is never > 2 * MAX_FREE_NIDS.
      Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
      [Jaegeuk Kim: fit the coding style]
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      95630cba
    • f2fs: recover when journal contains deleted files · 047184b4
      Chris Fries authored
      When recovering a journal file with fsync data for files that have
      been deleted, don't bail out on recovery.
      Signed-off-by: Chris Fries <C.Fries@motorola.com>
      Reviewed-by: Russell Knize <rknize2@motorola.com>
      Reviewed-by: Jason Hrycay <jason.hrycay@motorola.com>
      [Jaegeuk Kim: fit the coding style]
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      047184b4
    • f2fs: continue to mount after failing recovery · bde582b2
      Chris Fries authored
      When unable to roll forward the journal, we shouldn't bail out and
      not mount, we should continue to attempt the mount.  Bad recovery data
      is likely unrecoverable at this point, and requiring the user to try
      to mount again doesn't solve any issues.
      Signed-off-by: Chris Fries <C.Fries@motorola.com>
      Reviewed-by: Russell Knize <rknize2@motorola.com>
      Reviewed-by: Jason Hrycay <jason.hrycay@motorola.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      bde582b2
    • f2fs: avoid deadlock during evict after f2fs_gc · 531ad7d5
      Jaegeuk Kim authored
      o Deadlock case #1
      
      Thread 1:
      - writeback_sb_inodes
       - do_writepages
        - f2fs_write_data_pages
         - write_cache_pages
          - f2fs_write_data_page
           - f2fs_balance_fs
            - wait mutex_lock(gc_mutex)
      
      Thread 2:
      - f2fs_balance_fs
       - mutex_lock(gc_mutex)
       - f2fs_gc
        - f2fs_iget
         - wait iget_locked(inode->i_lock)
      
      Thread 3:
      - do_unlinkat
       - iput
        - lock(inode->i_lock)
         - evict
          - inode_wait_for_writeback
      
      o Deadlock case #2
      
      Thread 1:
      - __writeback_single_inode
       : set I_SYNC
        - do_writepages
         - f2fs_write_data_page
          - f2fs_balance_fs
           - f2fs_gc
            - iput
             - evict
              - inode_wait_for_writeback(I_SYNC)
      
      In order to avoid this, even though iput is called with the zero-reference
      count, we need to stop the eviction procedure if the inode is on writeback.
      So this patch links f2fs_drop_inode which checks the I_SYNC flag.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      531ad7d5
    • arm64: Treat the bitops index argument as an 'int' · 420c158d
      Catalin Marinas authored
      The bitops prototypes use an 'int' as the bit index type but the asm
      implementations assume it to be a 'long'. Since the compiler does not
      guarantee zeroing the upper 32 bits of a register when it is used as an
      'int', change the bitops implementation accordingly.
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      420c158d
    • arm64: Ignore the 'write' ESR flag on cache maintenance faults · 0e7f7bcc
      Catalin Marinas authored
      The ESR.WnR bit is always set on data cache maintenance faults even though
      the page is not required to have write permission. If a translation
      fault (page not yet mapped) happens for a read-only user address range,
      Linux incorrectly assumes a permission fault. This patch adds a check
      of the ESR.CM bit during page fault handling to ignore the 'write'
      flag.
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Reported-by: Tim Northover <Tim.Northover@arm.com>
      Cc: stable@vger.kernel.org
      0e7f7bcc
    • arm64: dts: fix #address-cells for foundation-v8 · ed1f2363
      Mark Rutland authored
      Commit 90556ca1 ("arm64: vexpress: Add dts files for the ARMv8 RTSM
      models") added foundation-v8.dts, but erroneously set
      /cpus/#address-cells = <1> while providing two cells in each cpus/cpu@N
      node's reg property.
      
      As of commit ea393a2e ("arm64: smp: honour #address-size when parsing
      CPU reg property") we read in as many address cells as specified rather
      than always reading two. This means that for foundation-v8.dts, we only
      read the first reg cell (zero) for each cpu node, and receive a lot of
      warnings at boot of the form "/cpus/cpu@1: duplicate cpu reg properties
      in the DT".
      
      This patch corrects foundation-v8.dts to have the correct value for
      /cpus/#address-cells.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Pawel Moll <pawel.moll@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Tested-by: Marc Zyngier <marc.zyngier@arm.com>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      ed1f2363
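The effect described in this commit can be reproduced with a small model of reading a `reg` address that spans `#address-cells` 32-bit cells. This is an illustrative sketch, not the kernel's device tree parser: the function name is hypothetical and the cells are assumed to be already converted to host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Combine the first 'address_cells' 32-bit cells of a reg property
 * into a single address, most significant cell first.
 */
uint64_t read_reg_address(const uint32_t *cells, unsigned int address_cells)
{
    uint64_t addr = 0;
    unsigned int i;

    assert(cells != NULL);
    for (i = 0; i < address_cells; i++)
        addr = (addr << 32) | cells[i];
    return addr;
}
```

For a node whose reg property is `<0x0 0x1>`, reading one cell yields 0 while reading two cells yields 1. With `#address-cells = <1>` every CPU node therefore appears to have the same address 0, which matches the "duplicate cpu reg properties" warnings quoted in the commit message.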
    • arm64: vexpress: Add support for poweroff/restart · aa1e8ec1
      Catalin Marinas authored
      This patch adds the arm_pm_poweroff definition expected by the
      vexpress-poweroff.c driver and enables the latter for arm64.
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Pawel Moll <pawel.moll@arm.com>
      aa1e8ec1
    • arm64: Enable support for the ARM GIC interrupt controller · c4188edc
      Catalin Marinas authored
      This patch enables ARM_GIC on the arm64 kernel.
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      c4188edc
    • Merge branch 'gic' into HEAD · e6b6dc7f
      Catalin Marinas authored
      * arm64-prep-gic:
        irqchip: gic: Perform the gic_secondary_init() call via CPU notifier
        irqchip: gic: Call handle_bad_irq() directly
        arm: Move chained_irq_(enter|exit) to a generic file
        arm: Move the set_handle_irq and handle_arch_irq declarations to asm/irq.h
      e6b6dc7f
    • Merge branch 'akpm' (incoming from Andrew) · 5af43c24
      Linus Torvalds authored
      Merge more incoming from Andrew Morton:
      
       - Various fixes which were stalled or which I picked up recently
      
       - A large rotorooting of the AIO code.  Allegedly to improve
         performance but I don't really have good performance numbers (I might
         have lost the email) and I can't raise Kent today.  I held this out
         of 3.9 and we could give it another cycle if it's all too late/scary.
      
      I ended up taking only the first two thirds of the AIO rotorooting.  I
      left the percpu parts and the batch completion for later.  - Linus
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (33 commits)
        aio: don't include aio.h in sched.h
        aio: kill ki_retry
        aio: kill ki_key
        aio: give shared kioctx fields their own cachelines
        aio: kill struct aio_ring_info
        aio: kill batch allocation
        aio: change reqs_active to include unreaped completions
        aio: use cancellation list lazily
        aio: use flush_dcache_page()
        aio: make aio_read_evt() more efficient, convert to hrtimers
        wait: add wait_event_hrtimeout()
        aio: refcounting cleanup
        aio: make aio_put_req() lockless
        aio: do fget() after aio_get_req()
        aio: dprintk() -> pr_debug()
        aio: move private stuff out of aio.h
        aio: add kiocb_cancel()
        aio: kill return value of aio_complete()
        char: add aio_{read,write} to /dev/{null,zero}
        aio: remove retry-based AIO
        ...
      5af43c24
    • aio: don't include aio.h in sched.h · a27bb332
      Kent Overstreet authored
      Faster kernel compiles by way of fewer unnecessary includes.
      
      [akpm@linux-foundation.org: fix fallout]
      [akpm@linux-foundation.org: fix build]
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a27bb332
    • aio: kill ki_retry · 41ef4eb8
      Kent Overstreet authored
      Thanks to Zach Brown's work to rip out the retry infrastructure, we don't
      need this anymore - ki_retry was only called right after the kiocb was
      initialized.
      
      This also refactors and trims some duplicated code, as well as cleaning up
      the refcounting/error handling a bit.
      
      [akpm@linux-foundation.org: use fmode_t in aio_run_iocb()]
      [akpm@linux-foundation.org: fix file_start_write/file_end_write tests]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      41ef4eb8
    • aio: kill ki_key · 8a660890
      Kent Overstreet authored
      ki_key wasn't actually used for anything previously - it was always 0.
      Drop it to trim struct kiocb a bit.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8a660890
    • aio: give shared kioctx fields their own cachelines · 4e23bcae
      Kent Overstreet authored
      [akpm@linux-foundation.org: make reqs_active __cacheline_aligned_in_smp]
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4e23bcae
    • aio: kill struct aio_ring_info · 58c85dc2
      Kent Overstreet authored
      struct aio_ring_info was kind of odd, the only place it's used is where
      it's embedded in struct kioctx - there's no real need for it.
      
      The next patch rearranges struct kioctx and puts various things on their
      own cachelines - getting rid of struct aio_ring_info now makes that
      reordering a bit clearer.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      58c85dc2
    • aio: kill batch allocation · a1c8eae7
      Kent Overstreet authored
      Previously, allocating a kiocb required touching quite a few global
      (well, per kioctx) cachelines...  so batching up allocation to amortize
      those was worthwhile.  But we've gotten rid of some of those, and in
      another couple of patches kiocb allocation won't require writing to any
      shared cachelines, so that means we can just rip this code out.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a1c8eae7
    • aio: change reqs_active to include unreaped completions · 3e845ce0
      Kent Overstreet authored
      The aio code tries really hard to avoid having to deal with the
      completion ringbuffer overflowing.  To do that, it has to keep track of
      the number of outstanding kiocbs, and the number of completions
      currently in the ringbuffer - and it's got to check that every time we
      allocate a kiocb.  Ouch.
      
      But - we can improve this quite a bit if we just change reqs_active to
      mean "number of outstanding requests and unreaped completions" - that
      means kiocb allocation doesn't have to look at the ringbuffer, which is
      a fairly significant win.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3e845ce0
    • Kent Overstreet's avatar
      aio: use cancellation list lazily · 0460fef2
      Kent Overstreet authored
      Cancelling kiocbs requires adding them to a per kioctx linked list,
      which is one of the few things we need to take the kioctx lock for in
      the fast path.  But most kiocbs can't be cancelled - so if we just do
      this lazily, we can avoid quite a bit of locking overhead.
      
      While we're at it, instead of using a flag bit, switch to using ki_cancel
      itself to indicate that a kiocb has been cancelled/completed.  This lets
      us get rid of ki_flags entirely.
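
The idea can be sketched roughly as follows, assuming a single-threaded setting. All names here (`req`, `ctx`, `req_set_cancel`, `req_cancel`, `REQ_CANCELLED`) are hypothetical; the `list_ops` counter merely stands in for taking the kioctx lock, and the kernel would need an atomic exchange where this sketch uses plain assignment.

```c
#include <stddef.h>

struct req;
typedef int cancel_fn(struct req *);

struct req {
    cancel_fn *cancel; /* NULL: not cancellable; REQ_CANCELLED: already done */
    struct req *next;  /* linkage on the context's cancellation list */
};

struct ctx {
    struct req *active; /* only requests that can be cancelled live here */
    unsigned list_ops;  /* counts list updates, i.e. would-be lock acquisitions */
};

/* Sentinel callback: its address marks a request as cancelled/completed. */
int already_cancelled(struct req *r) { (void)r; return -1; }
#define REQ_CANCELLED already_cancelled

/* Example cancel callback a driver might register. */
int demo_cancel(struct req *r) { (void)r; return 0; }

/* Lazy step: a request joins the list only when a callback is registered. */
void req_set_cancel(struct ctx *c, struct req *r, cancel_fn *fn)
{
    r->cancel = fn;
    r->next = c->active;
    c->active = r;
    c->list_ops++;
}

/* The callback pointer doubles as the state, so no separate flag is needed. */
int req_cancel(struct req *r)
{
    cancel_fn *fn = r->cancel;

    if (fn == NULL || fn == REQ_CANCELLED)
        return -1;
    r->cancel = REQ_CANCELLED;
    return fn(r);
}
```

A request that never registers a callback never touches the list, which is where the saving in the fast path would come from.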
      
      [akpm@linux-foundation.org: remove buggy BUG()]
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0460fef2
    • Kent Overstreet's avatar
      aio: use flush_dcache_page() · 21b40200
      Kent Overstreet authored
      This wasn't causing problems before because it's not needed on x86, but
      it is needed on other architectures.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      21b40200
    • Kent Overstreet's avatar
      aio: make aio_read_evt() more efficient, convert to hrtimers · a31ad380
      Kent Overstreet authored
      Previously, aio_read_evt() pulled a single completion off the
      ringbuffer at a time, locking and unlocking each time.  Change it to
      pull off as many events as it can at a time, and copy them directly to
      userspace.
      
      This also fixes a bug where if copying the event to userspace failed,
      we'd lose the event.
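
A rough userspace sketch of batched reading that does not lose events when the copy fails. The ring layout and the names `ring`, `event`, `copy_fn` and `ring_read_batch` are assumptions made for illustration; `copy_fn` stands in for copy_to_user(), and the locking the kernel needs around the ring is omitted.

```c
#include <stddef.h>
#include <string.h>

struct event { long data; };

struct ring {
    struct event *slots;
    unsigned nr;   /* number of slots */
    unsigned head; /* next event to read */
    unsigned tail; /* next slot to write; head == tail means empty */
};

/* Returns 0 on success, nonzero on failure (like copy_to_user()). */
typedef int copy_fn(struct event *dst, const struct event *src, unsigned n);

int copy_ok(struct event *dst, const struct event *src, unsigned n)
{
    memcpy(dst, src, n * sizeof(*src));
    return 0;
}

int copy_fail(struct event *dst, const struct event *src, unsigned n)
{
    (void)dst; (void)src; (void)n;
    return 1;
}

/* Copy up to max events, one contiguous run at a time.
 * Returns the number copied, or -1 if nothing could be copied. */
long ring_read_batch(struct ring *r, struct event *dst, unsigned max,
                     copy_fn *copy)
{
    long done = 0;

    while (max > 0 && r->head != r->tail) {
        /* The run ends at tail, or at the end of the array if wrapped. */
        unsigned end = r->tail > r->head ? r->tail : r->nr;
        unsigned avail = end - r->head;

        if (avail > max)
            avail = max;
        if (copy(dst + done, r->slots + r->head, avail) != 0)
            return done ? done : -1; /* head untouched: events are kept */
        r->head = (r->head + avail) % r->nr;
        done += avail;
        max -= avail;
    }
    return done;
}
```

The head index only advances after a successful copy, so a failed copy leaves the events in the ring for a later attempt.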
      
      Also convert it to wait_event_interruptible_hrtimeout(), which
      simplifies it quite a bit.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a31ad380
    • Kent Overstreet's avatar
      wait: add wait_event_hrtimeout() · 774a08b3
      Kent Overstreet authored
      Analogous to wait_event_timeout() and friends, this adds
      wait_event_hrtimeout() and wait_event_interruptible_hrtimeout().
      
      Note that unlike the versions that use regular timers, these don't
      return the amount of time remaining when they return - instead, they
      return 0, or -ETIME if they timed out, because I was uncomfortable with
      the semantics of doing it the other way (or at least, not sure that I
      could get it right).
      
      If the timer expires, there's no real guarantee that expire_time -
      current_time would be <= 0 - due to timer slack certainly, and I'm not
      sure I want to know the implications of the different clock bases in
      hrtimers.
      
      If the timer does expire and the code calculates that the time remaining
      is nonnegative, that could be even worse if the calling code then reuses
      that timeout.  Probably safer to just return 0 then, but I could imagine
      weird bugs or at least unintended behaviour arising from that too.
      
      I came to the conclusion that if other users end up actually needing the
      amount of time remaining, the sanest thing to do would be to create a
      version that uses absolute timeouts instead of relative.
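
As a userspace illustration only, the return convention (0 on success, -ETIME on timeout) combined with an absolute deadline might look like the sketch below. It is not the kernel macro: `wait_cond_deadline`, `cond_fn` and `cond_flag` are hypothetical names, and the sketch polls with nanosleep() rather than sleeping on a wait queue with an hrtimer.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <time.h>

typedef bool cond_fn(void *arg);

/* Example condition: reads a bool flag through the opaque argument. */
bool cond_flag(void *arg)
{
    return *(bool *)arg;
}

/* Returns 0 once cond(arg) holds, or -ETIME once the absolute
 * CLOCK_MONOTONIC deadline has passed. The time remaining is never
 * reported, so callers cannot reuse a stale relative timeout. */
int wait_cond_deadline(cond_fn *cond, void *arg,
                       const struct timespec *deadline)
{
    const struct timespec nap = { 0, 1000000 }; /* poll every 1 ms */

    for (;;) {
        struct timespec now;

        if (cond(arg))
            return 0;
        clock_gettime(CLOCK_MONOTONIC, &now);
        if (now.tv_sec > deadline->tv_sec ||
            (now.tv_sec == deadline->tv_sec &&
             now.tv_nsec >= deadline->tv_nsec))
            return -ETIME;
        nanosleep(&nap, NULL);
    }
}
```

With an absolute deadline, a caller that needs to retry simply passes the same deadline again instead of computing how much time is left.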
      
      [akpm@linux-foundation.org: fix description of `timeout' arg]
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      774a08b3
    • Kent Overstreet's avatar
      aio: refcounting cleanup · 36f55889
      Kent Overstreet authored
      The usage of ctx->dead was fubar - it makes no sense to explicitly check
      it all over the place, especially when we're already using RCU.
      
      Now, ctx->dead only indicates whether we've dropped the initial
      refcount. The new teardown sequence is:
      
        set ctx->dead
        hlist_del_rcu();
        synchronize_rcu();
      
      Now we know no system calls can take a new ref, and it's safe to drop
      the initial ref:
      
        put_ioctx();
      
      We also need to ensure there are no more outstanding kiocbs.  This was
      done incorrectly - it was being done in kill_ctx(), and before dropping
      the initial refcount.  At this point, other syscalls may still be
      submitting kiocbs!
      
      Now, we cancel and wait for outstanding kiocbs in free_ioctx(), after
      kioctx->users has dropped to 0 and we know no more iocbs could be
      submitted.
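
The ordering described above can be sketched with C11 atomics. This is a single-threaded model under stated assumptions: the `visible` flag stands in for membership in the RCU-protected list (hlist_del_rcu() plus synchronize_rcu()), and `ctx`, `ctx_get`, `ctx_put`, `ctx_kill` and `ctx_free` are hypothetical names rather than the kernel's functions.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct ctx {
    atomic_int users; /* starts at 1: the initial reference */
    bool dead;        /* only records that the initial ref was dropped */
    bool visible;     /* stand-in for being on the RCU lookup list */
    bool freed;
};

/* Runs when the last reference goes away. At this point no new requests
 * can be submitted, so this is where outstanding ones would be cancelled
 * and waited for. */
void ctx_free(struct ctx *c)
{
    c->freed = true;
}

void ctx_put(struct ctx *c)
{
    if (atomic_fetch_sub(&c->users, 1) == 1)
        ctx_free(c);
}

/* Lookup as a system call would do it: only visible contexts give refs. */
bool ctx_get(struct ctx *c)
{
    if (!c->visible)
        return false;
    atomic_fetch_add(&c->users, 1);
    return true;
}

/* Teardown: mark dead, unpublish, then drop the initial reference. */
void ctx_kill(struct ctx *c)
{
    if (c->dead)
        return;
    c->dead = true;
    c->visible = false; /* hlist_del_rcu(); synchronize_rcu(); */
    ctx_put(c);
}
```

A caller that took a reference before the kill keeps the context alive until its own put, and only then does the free step run.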
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      36f55889
    • Kent Overstreet's avatar
      aio: make aio_put_req() lockless · 11599eba
      Kent Overstreet authored
      Freeing a kiocb needed to touch the kioctx for three things:
      
       * Pulling it off the reqs_active list
       * Decrementing reqs_active
       * Issuing a wakeup, if the kioctx was in the process of being freed.
      
      This patch moves these to aio_complete(), for a few reasons:
      
       * aio_complete() already has to issue the wakeup, so if we drop the
         kioctx refcount before aio_complete does its wakeup we don't have to
         do it twice.
       * aio_complete currently has to take the kioctx lock, so it makes sense
         for it to pull the kiocb off the reqs_active list too.
       * A later patch is going to change reqs_active to include unreaped
         completions - this will mean allocating a kiocb doesn't have to look
         at the ringbuffer. So taking the decrement of reqs_active out of
         kiocb_free() is useful prep work for that patch.
      
      This doesn't really affect cancellation, since existing (usb) code that
      implements a cancel function still calls aio_complete() - we just have
      to make sure that aio_complete does the necessary teardown for cancelled
      kiocbs.
      
      It does affect code paths where we free kiocbs that were never
      submitted; they need to decrement reqs_active and pull the kiocb off the
      reqs_active list.  This occurs in two places: kiocb_batch_free(), which
      is going away in a later patch, and the error path in io_submit_one.
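
A small model of where the bookkeeping lives after this change, assuming a single thread. The `lock_taken` counter is a hypothetical stand-in for acquiring the kioctx lock, and `req_alloc`, `req_complete`, `req_free` and `req_abort` are illustrative names only.

```c
#include <stdlib.h>

struct ctx {
    unsigned reqs_active; /* requests currently accounted to this context */
    unsigned lock_taken;  /* how often the context lock would be taken */
    unsigned wakeups;     /* wakeups issued to waiters */
};

struct req {
    struct ctx *ctx;
};

struct req *req_alloc(struct ctx *c)
{
    struct req *r = malloc(sizeof(*r));

    if (r == NULL)
        return NULL;
    r->ctx = c;
    c->lock_taken++;
    c->reqs_active++;
    return r;
}

/* Completion already needs the lock and the wakeup, so the accounting
 * teardown is done here as well. */
void req_complete(struct req *r)
{
    struct ctx *c = r->ctx;

    c->lock_taken++;
    c->reqs_active--;
    c->wakeups++;
}

/* Freeing no longer touches the context at all. */
void req_free(struct req *r)
{
    free(r);
}

/* Error path for a request that was never submitted: nothing will call
 * req_complete(), so the accounting has to be undone explicitly. */
void req_abort(struct req *r)
{
    r->ctx->lock_taken++;
    r->ctx->reqs_active--;
    req_free(r);
}
```

The normal path takes the lock twice (allocate, complete) rather than three times, and the free itself is lock-free.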
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      11599eba
    • Kent Overstreet's avatar
      aio: do fget() after aio_get_req() · 1d98ebfc
      Kent Overstreet authored
      aio_get_req() will fail if we have the maximum number of requests
      outstanding, which depending on the application may not be uncommon.  So
      avoid doing an unnecessary fget().
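
The reordering amounts to trying the step that is likely to fail first. A hedged sketch with hypothetical names (`slot_get`, `file_get`, `submit_one`, and a global `fget_calls` counter used only to make the effect observable):

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct file { int refs; };
struct ctx { unsigned reqs_active, max_reqs; };

unsigned fget_calls; /* counts file lookups, for illustration only */

/* Stand-in for aio_get_req(): fails when the context is full. */
bool slot_get(struct ctx *c)
{
    if (c->reqs_active >= c->max_reqs)
        return false;
    c->reqs_active++;
    return true;
}

void slot_put(struct ctx *c)
{
    c->reqs_active--;
}

/* Stand-in for fget(): takes a reference on the file if it exists. */
struct file *file_get(struct file *f)
{
    fget_calls++;
    if (f == NULL)
        return NULL;
    f->refs++;
    return f;
}

/* Reserve the request slot first; look up the file only if that worked. */
int submit_one(struct ctx *c, struct file *f)
{
    if (!slot_get(c))
        return -EAGAIN;
    if (file_get(f) == NULL) {
        slot_put(c);
        return -EBADF;
    }
    return 0;
}
```

When the context is already at its limit, the submission fails without ever touching the file's reference count.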
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1d98ebfc