1. 08 Nov, 2013 40 commits
    • skd: cleanup the skd_*() function block wrapping · 6a5ec65b
      Jens Axboe authored
      Just call the block functions directly, don't wrap them
      in skd helpers. With only one queueing model enabled, there's
      no point in doing that.
      
      Also kill the ->start_time and ->bio from the skd_request_context,
      we don't use those anymore.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6a5ec65b
    • skd: rip out bio path · fcd37eb3
      Jens Axboe authored
      The skd driver has a selectable rq or bio based queueing model.
      For 3.14, we want to turn this into a single blk-mq interface
      instead. With the immutable biovecs being merged in 3.13, the
      bio model would need patches to even work. So rip it out, with
      a conversion pending for blk-mq in the next release.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      fcd37eb3
    • skd: fix error return code in skd_pci_probe() · 1762b57f
      Wei Yongjun authored
      Fix to return -ENOMEM in the skd construct error handling
      case instead of 0, as done elsewhere in this function.
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      1762b57f
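The error-path pattern described in this commit can be sketched in user-space C. This is a minimal illustration, not the skd driver code: `struct device_ctx`, `ctx_construct()` and `probe()` are hypothetical names, and `calloc()` stands in for the driver's constructor.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical device context; not a structure from the skd driver. */
struct device_ctx {
    int id;
};

/* Hypothetical constructor: returns NULL on failure. */
struct device_ctx *ctx_construct(int simulate_failure)
{
    if (simulate_failure)
        return NULL;
    return calloc(1, sizeof(struct device_ctx));
}

/* Probe-style function: 0 on success, negative errno on failure. */
int probe(int simulate_failure, struct device_ctx **out)
{
    int rc = 0;
    struct device_ctx *ctx = ctx_construct(simulate_failure);

    if (!ctx) {
        /* Without this assignment rc would still be 0 here and the
         * caller would treat the failed probe as a success. */
        rc = -ENOMEM;
        goto err_out;
    }
    *out = ctx;
    return 0;

err_out:
    *out = NULL;
    return rc;
}
```

A caller checks the return value for a negative errno and only uses `*out` when the result is 0.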
    • s390/dasd: hold request queue sysfs lock when calling elevator_init() · ef089941
      Heiko Carstens authored
      "elevator: Fix a race in elevator switching and md device initialization"
      changed the semantics of elevator_init(): callers must now hold the
      corresponding request queue's sysfs_lock when calling elevator_init(),
      which fixes a race.
      The patch did not convert the s390 dasd device driver which is the only
      device driver which also calls elevator_init(). So add the missing locking.
      
      Cc: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ef089941
    • cciss: return 0 from driver probe function on success, not 1 · b88fac63
      Stephen M. Cameron authored
      A return value of 1 is interpreted as an error
      Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b88fac63
    • skd: Replaced custom debug PRINTKs with pr_debug · 2e44b427
      rchinthekindi authored
      Replaced DPRINTK() and VPRINTK() with pr_debug().
      Signed-off-by: Ramprasad C <ramprasad.chinthekindi@hgst.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      2e44b427
    • skd: Fix checkpatch ERRORS and removed unused functions · f721bb0d
      Akhil Bhansali authored
      This patch fixes checkpatch.pl errors for assignment in if condition.
      It also removes unused readq / readl function calls.
      
      As Andrew had disabled the compilation of drivers for 32 bit,
      I have modified format specifiers in a few VPRINTKs to avoid warnings
      during 64 bit compilation.
      Signed-off-by: Akhil Bhansali <abhansali@stec-inc.com>
      Reviewed-by: Ramprasad Chinthekindi <rchinthekindi@stec-inc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f721bb0d
    • rsxx: Fix possible kernel panic with invalid config. · 8c49a77c
      Philip J Kelleher authored
      This patch fixes a possible kernel panic on driver load if
      the configuration on the card is corrupt or not yet set.
      The driver could possibly give a 32-bit unsigned value of all Fs
      (0xFFFFFFFF) to the kernel as the device's block size.
      
      Now we only write the block size to the kernel if the
      configuration from the card is valid.
      
      Also, driver version is being updated.
      Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      8c49a77c
    • rsxx: Disallow discards from being unmapped. · e35f38bf
      Philip J Kelleher authored
      This patch fixes a bug in which discards were always
      calling pci_unmap_page. Discards should never call the
      pci_unmap_page function call because they are never mapped.
      
      This caused a race condition on PowerPC systems when issuing
      discards, writes, and reads all at the same time. The
      pci_map_page function would eventually map logical address
      0 for a read or write. Discards are always assigned a DMA
      address of 0 because they are never mapped. So if
      pci_map_page mapped address 0 for a DMA and a discard was
      "unmapped" then the address would be freed and would cause
      an EEH event to occur when Hardware accesses the address.
      
      This was injected/uncovered in commit:
      b347f9cf0bc8d42ee95ba1d3837fd93045ab336b
      
      The pci_dma_mapping_error function declares -1 a DMA_ERROR,
      not 0 as initially thought. So before, we would never unmap
      discards because they were considered NULL.
      
      This patch should fall on top of commit id:
      fc1967bb08a6184ed44ef990e1dd4389901b809c
      
      Also, the driver version is being updated.
      Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      e35f38bf
    • drbd: avoid to shrink max_bio_size due to peer re-configuration · 35f47ef1
      Lars Ellenberg authored
      For a long time, the receiving side has spread "too large" incoming
      requests over multiple bios.  No need to shrink our max_bio_size
      (max_hw_sectors) if the peer is reconfigured to use a different storage.
      
      The problem manifests itself if we are not the top of the device stack
      (DRBD is used as an LVM PV).
      
      A hardware reconfiguration on the peer may cause the supported
      max_bio_size to shrink, and the connection handshake would now
      unnecessarily shrink the max_bio_size on the active node.
      
      There is no way to notify upper layers that they have to "re-stack"
      their limits. So they won't notice at all, and may keep submitting bios
      that are suddenly considered "too large for device".
      
      We already check for compatibility and ignore changes on the peer;
      that code was only masked out unless we have a fully established connection.
      We just need to allow it a bit earlier during the handshake.
      
      Also consider max_hw_sectors in our merge bvec function, just in case.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      35f47ef1
    • drbd: fix decoding of bitmap vli rle for device sizes > 64 TB · d2da5b0c
      Lars Ellenberg authored
      Symptoms: disconnect after bitmap exchange due to
      bitmap overflow (e:49731075554) while decoding bm RLE packet
      
      In the decoding step of the variable length integer run length encoding
      there was potentially an uncaught bit shift by the word size (variable >> 64).
      
      The result of which is "undefined" :(
      (only "sometimes" the result is the desired 0)
      
      Fix: don't do any bit shift magic for shift == 64, just assign.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d2da5b0c
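The hazard named in this commit, a shift by the full word size, can be sketched in plain C. This is a minimal illustration and not the DRBD decoder: `drop_low_bits()` is a hypothetical helper that only shows the "don't shift for shift == 64, just assign" idea.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not DRBD code: discard the low `bits` bits of a
 * 64-bit word. In C, shifting a 64-bit value by 64 or more is undefined
 * behaviour, so that case is handled without a shift. */
uint64_t drop_low_bits(uint64_t word, unsigned int bits)
{
    assert(bits <= 64);
    if (bits == 64)
        return 0;           /* every bit is consumed: just assign */
    return word >> bits;    /* well defined for shift counts 0..63 */
}
```

With the special case in place, the result for a 64-bit shift no longer depends on how a particular compiler or CPU happens to treat an out-of-range shift count.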
    • drbd: Fix adding of new minors with freshly created meta data · 57737adc
      Philipp Reisner authored
      Online adding of new minors with freshly created meta data
      to a resource with an established connection failed, with a
      wrong state transition on one side of the new minor.
      
      Freshly created meta-data has a la_size (last agreed size) of 0.
      When we online add such devices, the code wrongly got into
      the code path for resyncing new storage that was added while
      the disk was detached.
      
      Fixed that by making the GREW from ZERO a special case.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      57737adc
    • drbd: Fix a connection drop issue after enabling allow-two-primaries · b874d231
      Philipp Reisner authored
      Since drbd-8.4.0 it is possible to change the allow-two-primaries
      network option while the connection is established.
      
      The sequence code, used to partially order packets from the
      data socket with packets from the meta-data socket, still assumed
      that the allow-two-primaries option is constant while the
      connection is established.
      
      I.e.
      On a node that has the RESOLVE_CONFLICTS bits set, after enabling
      allow-two-primaries, when receiving the next data packet it timed out
      while waiting for the necessary packets on the data socket to arrive
      (wait_for_and_update_peer_seq() function).
      
      Fixed that by always tracking the sequence number, but only waiting
      for it if allow-two-primaries is set.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b874d231
    • drbd: fix NULL pointer deref in module init error path · 69babf05
      Lars Ellenberg authored
      If we want to iterate over the (as of yet still empty) list in the
      cleanup path, we need to initialize the list before the first goto fail.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      69babf05
    • block: disable cpqarray in Kconfig · 7badfb1c
      Jens Axboe authored
      Mike writes:
      
      "cpqarray hasn't been used in over 12 years. It's doubtful that anyone
       still uses the board. It's time the driver was removed from the mainline
       kernel.  The only updates these days are minor and mostly done by people
       outside of HP."
      
      If nobody yells, we'll remove it from the kernel tree completely
      for 3.15.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      7badfb1c
    • Add support for sTec's pci-e flash card Kronos · e67f86b3
      Akhil Bhansali authored
      Signed-off-by: Akhil Bhansali <abhansali@stec-inc.com>
      Signed-off-by: Ramprasad Chinthekindi <rchinthekindi@stec-inc.com>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      
      Folded patch, contributions to clean up this driver from:
      
      Jens Axboe
      Dan Carpenter
      Andrew Morton
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      e67f86b3
    • rsxx: Kernel Panic caused by mapping Discards · 0317cd6d
      Philip J Kelleher authored
      This fixes a kernel panic injected by commit id
      8d26750143341831bc312f61c5ed141eeb75b8d0 where discards
      are getting mapped through the pci_map_page function call.
      
      The driver will now verify that a dma is not a
      discard before issuing the pci_map_page function call.
      
      Also, we are updating the driver version.
      Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      0317cd6d
    • mtip32xx: dynamically allocate buffer in debugfs functions · c8afd0dc
      David Milburn authored
      Dynamically allocate buf to prevent warnings:
      
      drivers/block/mtip32xx/mtip32xx.c: In function ‘mtip_hw_read_device_status’:
      drivers/block/mtip32xx/mtip32xx.c:2823: warning: the frame size of 1056 bytes is larger than 1024 bytes
      drivers/block/mtip32xx/mtip32xx.c: In function ‘mtip_hw_read_registers’:
      drivers/block/mtip32xx/mtip32xx.c:2894: warning: the frame size of 1056 bytes is larger than 1024 bytes
      drivers/block/mtip32xx/mtip32xx.c: In function ‘mtip_hw_read_flags’:
      drivers/block/mtip32xx/mtip32xx.c:2917: warning: the frame size of 1056 bytes is larger than 1024 bytes
      Signed-off-by: David Milburn <dmilburn@redhat.com>
      Acked-by: Asai Thambi S P <asamymuthupa@micron.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      c8afd0dc
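The general technique behind this commit, moving a large scratch buffer from the stack to the heap, can be sketched in user-space C. This is an illustration under stated assumptions, not the mtip32xx code: `format_status()`, `STATUS_BUF_SIZE` and the output format are hypothetical, and `calloc()`/`free()` stand in for the kernel allocator.

```c
#include <stdio.h>
#include <stdlib.h>

#define STATUS_BUF_SIZE 1024  /* hypothetical scratch-buffer size */

/* Format a status line into `out`. The scratch buffer lives on the heap
 * rather than as `char buf[STATUS_BUF_SIZE]` on the stack, so the
 * function's stack frame stays small.
 * Returns the formatted length, or -1 if the allocation fails. */
int format_status(unsigned int flags, char *out, size_t out_len)
{
    char *buf = calloc(1, STATUS_BUF_SIZE);
    int len;

    if (!buf)
        return -1;

    len = snprintf(buf, STATUS_BUF_SIZE, "flags: 0x%08x\n", flags);
    if (len >= 0 && out_len > 0)
        snprintf(out, out_len, "%s", buf);  /* copy out, always terminated */

    free(buf);
    return len;
}
```

The trade-off is an extra allocation and a failure path to handle in exchange for a frame that stays under the compiler's frame-size warning threshold.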
    • mtip32xx: Add SRSI support · 8f8b8995
      Asai Thambi S P authored
      This patch adds support for SRSI (Surprise Removal Surprise Insertion).
      
      Approach:
      ---------
      Surprise Removal:
      -----------------
      On surprise removal of the device, gendisk, request queue, device index, sysfs
      entries, etc. are retained as long as the device is in use - mounted filesystem,
      device opened by an application, etc. The service thread breaks out of the main
      while loop, waits for pci remove to exit, and then waits for the device to become
      free. When there are no holders of the device, the service thread cleans up the
      block and device related stuff and returns.
      
      Surprise Insertion:
      -------------------
      No change, this scenario follows the normal pci probe() function flow.
      Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      8f8b8995
    • rsxx: Moving pci_map_page to prevent overflow. · 1b21f5b2
      Philip J Kelleher authored
      The pci_map_page function has been moved into our
      issued workqueue to prevent us from running out of
      mappable addresses on non-HWWD PCIe x8 slots. The
      maximum amount that can possibly be mapped at one
      time is now: 255 dmas x 4 dma channels x 4096 bytes.
      Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      1b21f5b2
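The bound quoted in this commit is a plain product, and it may help to see the number it works out to. The helper below is a hypothetical illustration (the name `max_mapped_bytes()` and its parameters are not from the rsxx driver); with the values from the commit message it evaluates to 4,177,920 bytes, just under 4 MiB.

```c
#include <stddef.h>

/* Upper bound on bytes mapped at once: DMAs in flight per channel,
 * times the number of channels, times the size of each DMA. */
size_t max_mapped_bytes(size_t dmas_per_channel, size_t channels,
                        size_t bytes_per_dma)
{
    return dmas_per_channel * channels * bytes_per_dma;
}
```

Calling `max_mapped_bytes(255, 4, 4096)` reproduces the figure from the commit message.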
    • rsxx: Handling failed pci_map_page on PowerPC and double free. · e5feab22
      Philip J Kelleher authored
      The rsxx driver was not checking the correct value during a
      pci_map_page failure. Fixing this also uncovered a
      double free if the bio was returned before it was
      broken up into individual 4k dmas; that is also
      fixed here.
      Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      e5feab22
    • loop: fix crash when using unassigned loop device · ef7e7c82
      Mikulas Patocka authored
      When the loop module is loaded, it creates 8 loop devices /dev/loop[0-7].
      The devices have no request routine and thus, when they are used without
      being assigned, a crash happens.
      
      For example, these commands cause crash (assuming there are no used loop
      devices):
      
      Kernel Fault: Code=26 regs=000000007f420980 (Addr=0000000000000010)
      CPU: 1 PID: 50 Comm: kworker/1:1 Not tainted 3.11.0 #1
      Workqueue: ksnaphd do_metadata [dm_snapshot]
      task: 000000007fcf4078 ti: 000000007f420000 task.ti: 000000007f420000
      [  116.319988]
           YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
      PSW: 00001000000001001111111100001111 Not tainted
      r00-03  000000ff0804ff0f 00000000408bf5d0 00000000402d8204 000000007b7ff6c0
      r04-07  00000000408a95d0 000000007f420950 000000007b7ff6c0 000000007d06c930
      r08-11  000000007f4205c0 0000000000000001 000000007f4205c0 000000007f4204b8
      r12-15  0000000000000010 0000000000000000 0000000000000000 0000000000000000
      r16-19  000000001108dd48 000000004061cd7c 000000007d859800 000000000800000f
      r20-23  0000000000000000 0000000000000008 0000000000000000 0000000000000000
      r24-27  00000000ffffffff 000000007b7ff6c0 000000007d859800 00000000408a95d0
      r28-31  0000000000000000 000000007f420950 000000007f420980 000000007f4208e8
      sr00-03  0000000000000000 0000000000000000 0000000000000000 0000000000303000
      sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
      [  117.549988]
      IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000402d82fc 00000000402d8300
       IIR: 53820020    ISR: 0000000000000000  IOR: 0000000000000010
       CPU:        1   CR30: 000000007f420000 CR31: ffffffffffffffff
       ORIG_R28: 0000000000000001
       IAOQ[0]: generic_make_request+0x11c/0x1a0
       IAOQ[1]: generic_make_request+0x120/0x1a0
       RP(r2): generic_make_request+0x24/0x1a0
      Backtrace:
       [<00000000402d83f0>] submit_bio+0x70/0x140
       [<0000000011087c4c>] dispatch_io+0x234/0x478 [dm_mod]
       [<0000000011087f44>] sync_io+0xb4/0x190 [dm_mod]
       [<00000000110883bc>] dm_io+0x2c4/0x310 [dm_mod]
       [<00000000110bfcd0>] do_metadata+0x28/0xb0 [dm_snapshot]
       [<00000000401591d8>] process_one_work+0x160/0x460
       [<0000000040159bc0>] worker_thread+0x300/0x478
       [<0000000040161a70>] kthread+0x118/0x128
       [<0000000040104020>] end_fault_vector+0x20/0x28
       [<0000000040177220>] task_tick_fair+0x420/0x4d0
       [<00000000401aa048>] invoke_rcu_core+0x50/0x60
       [<00000000401ad5b8>] rcu_check_callbacks+0x210/0x8d8
       [<000000004014aaa0>] update_process_times+0xa8/0xc0
       [<00000000401ab86c>] rcu_process_callbacks+0x4b4/0x598
       [<0000000040142408>] __do_softirq+0x250/0x2c0
       [<00000000401789d0>] find_busiest_group+0x3c0/0xc70
      [  119.379988]
      Kernel panic - not syncing: Kernel Fault
      Rebooting in 1 seconds..
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ef7e7c82
    • xen/blkback: fix reference counting · ea5ec76d
      Vegard Nossum authored
      If the permission check fails, we drop a reference to the blkif without
      having taken it in the first place. The bug was introduced in commit
      604c499c (xen/blkback: Check device
      permissions before allowing OP_DISCARD).
      
      Cc: stable@vger.kernel.org
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ea5ec76d
    • xen-blkfront: improve approximation of required grants per request · c47206e2
      Roger Pau Monne authored
      Improve the calculation of required grants to process a request by
      using nr_phys_segments instead of always assuming a request is going
      to use all possible segments.
      
      nr_phys_segments contains the number of scatter-gather DMA addr+len
      pairs, which is basically what we put at every granted page.
      for_each_sg iterates over the DMA addr+len pairs and uses a grant
      page for each of them.
      Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      c47206e2
    • xen-blkfront: revoke foreign access for grants not mapped by the backend · fbe363c4
      Roger Pau Monne authored
      There's no need to keep the foreign access in a grant if it is not
      persistently mapped by the backend. This allows us to free grants that
      are not mapped by the backend, thus preventing blkfront from hoarding
      all grants.
      
      The main effect of this is that blkfront will only persistently map
      the same grants as the backend, and it will always try to use grants
      that are already mapped by the backend. Also the number of persistent
      grants in blkfront is the same as in blkback (and is controlled by the
      value in blkback).
      Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: Matt Wilson <msw@amazon.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      fbe363c4
    • mg_disk: remove deprecated IRQF_DISABLED · 370d6686
      Michael Opdenacker authored
      This patch proposes to remove the use of the IRQF_DISABLED flag
      
      It's a NOOP since 2.6.35 and it will be removed one day.
      Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      370d6686
    • block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO · c7d1ba41
      Duan Jiong authored
      This patch fixes coccinelle error regarding usage of IS_ERR and
      PTR_ERR instead of PTR_ERR_OR_ZERO.
      Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      c7d1ba41
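The replacement this commit describes can be illustrated outside the kernel. The helpers below are simplified user-space stand-ins for the ones in the kernel's `<linux/err.h>`, written only for this sketch; the real definitions differ in detail. `status_open_coded()` and `status_helper()` are hypothetical names.

```c
#include <errno.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's error-pointer helpers:
 * small negative errno values are encoded in the top of the
 * pointer range. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline int PTR_ERR_OR_ZERO(const void *ptr)
{
    return IS_ERR(ptr) ? (int)PTR_ERR(ptr) : 0;
}

/* Open-coded form that this kind of cleanup replaces. */
int status_open_coded(const void *ptr)
{
    if (IS_ERR(ptr))
        return (int)PTR_ERR(ptr);
    return 0;
}

/* Equivalent single-call form. */
int status_helper(const void *ptr)
{
    return PTR_ERR_OR_ZERO(ptr);
}
```

Both functions return 0 for a valid pointer and the encoded negative errno for an error pointer; the helper form only removes the boilerplate.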
    • block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO · 8616ebb1
      Duan Jiong authored
      This patch fixes coccinelle error regarding usage of IS_ERR and
      PTR_ERR instead of PTR_ERR_OR_ZERO.
      Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      8616ebb1
    • block: Do not call sector_div() with a 64-bit divisor · 97597dc0
      Geert Uytterhoeven authored
      do_div() (called by sector_div() if CONFIG_LBDAF=y) is meant for divisions
      of 64-bit numbers by 32-bit numbers.  Passing 64-bit divisor types caused
      issues in the past on 32-bit platforms, cf. commit
      ea077b1b ("m68k: Truncate base in
      do_div()").
      
      As queue_limits.max_discard_sectors and .discard_granularity are unsigned
      int, max_discard_sectors and granularity should be unsigned int.
      As bdev_discard_alignment() returns int, alignment should be int.
      Now 2 calls to sector_div() can be replaced by 32-bit arithmetic:
        - The 64-bit modulo operation can become a 32-bit modulo operation,
        - The 64-bit division and multiplication can be replaced by a 32-bit
          modulo operation and a subtraction.
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      97597dc0
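The second replacement in the list above (division and multiplication becoming a modulo and a subtraction) rests on a simple identity for rounding down to a multiple. The sketch below shows that identity with 32-bit operands; the function names are hypothetical and this is not the block-layer code itself.

```c
#include <assert.h>

/* Round `sectors` down to a multiple of `granularity` using a
 * division followed by a multiplication. */
unsigned int round_down_div_mul(unsigned int sectors, unsigned int granularity)
{
    assert(granularity != 0);
    return (sectors / granularity) * granularity;
}

/* Same result with one modulo and one subtraction. */
unsigned int round_down_mod_sub(unsigned int sectors, unsigned int granularity)
{
    assert(granularity != 0);
    return sectors - (sectors % granularity);
}
```

Because both operands are `unsigned int`, neither form needs a 64-bit division helper on 32-bit platforms.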
    • kernel: trace: blktrace: remove redundant memcpy() in compat_blk_trace_setup() · f8c5e944
      Chen Gang authored
      do_blk_trace_setup() will fully initialize 'buts.name', so the related
      memcpy() can be removed. Also use BLKTRACE_BDEV_SIZE and ARRAY_SIZE
      instead of the hard-coded number '32'.
      Signed-off-by: Chen Gang <gang.chen@asianux.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f8c5e944
    • block: Consolidate duplicated bio_trim() implementations · 6678d83f
      Kent Overstreet authored
      Someone cut and pasted md's md_trim_bio() into xen-blkfront.c. Come on,
      we should know better than this.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6678d83f
    • block: Use rw_copy_check_uvector() · e0ce0eac
      Kent Overstreet authored
      No need for silly open coding - and struct sg_iovec has exactly the same
      layout as struct iovec...
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      e0ce0eac
    • block: Enable sysfs nomerge control for I/O requests in the plug list · 23779fbc
      Alireza Haghdoost authored
      This patch enables sysfs to control I/O request merge
      functionality in the plug list. While this control has been
      implemented for the request queue, it was omitted in the plug list.
      Therefore, the block layer merges requests together (or attempts to
      merge) even if the merge capability was disabled using the sysfs
      nomerge parameter value 2.
      
      This limitation directly affects the functionality of the io_submit()
      system call. The system call enables a user to submit a bunch of IO
      requests from user space using the struct iocb **ios input argument.
      However, the unconditional merging functionality in the plug list
      potentially merges these requests together down the road. Therefore,
      there is no way to distinguish between an application sending a bunch of
      sequential IOs and an application sending one big IO. Ultimately, all
      requests generated by the former app merge within the plug list
      and look similar to those of the second app.
      
      While the merging functionality is a desirable feature to improve the
      performance of the IO subsystem for some applications, it is not useful
      for other applications like ours at all.
      Signed-off-by: Alireza Haghdoost <alireza@cs.umn.edu>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      
      Coding style modified.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      23779fbc
    • block: properly stack underlying max_segment_size to DM device · d82ae52e
      Mike Snitzer authored
      Without this patch all DM devices will default to BLK_MAX_SEGMENT_SIZE
      (65536) even if the underlying device(s) have a larger value -- this is
      due to blk_stack_limits() using min_not_zero() when stacking the
      max_segment_size limit.
      
      1073741824
      
      before patch:
      65536
      
      after patch:
      1073741824
      Reported-by: Lukasz Flis <l.flis@cyfronet.pl>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # v3.3+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d82ae52e
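Why the stacked limit stays at 65536 can be seen from how a "minimum, ignoring zero" combination behaves. The sketch below is a user-space illustration: `min_not_zero()` is written here as a plain function rather than the kernel macro, and `stack_max_segment_size()` is a hypothetical name. Starting the top-level limit at `UINT_MAX` is shown only as one way such stacking can avoid the problem; the commit message does not show the actual change.

```c
#include <limits.h>

#define BLK_MAX_SEGMENT_SIZE 65536u  /* default quoted in the commit message */

/* Smaller of the two values, treating 0 as "not set". */
unsigned int min_not_zero(unsigned int a, unsigned int b)
{
    if (a == 0)
        return b;
    if (b == 0)
        return a;
    return a < b ? a : b;
}

/* Combine the stacked device's current limit with one underlying
 * device's limit. */
unsigned int stack_max_segment_size(unsigned int top, unsigned int bottom)
{
    return min_not_zero(top, bottom);
}
```

If the stacked device starts from the 65536 default, an underlying value of 1073741824 can never raise it; if it starts from `UINT_MAX`, the underlying value wins.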
    • elevator: acquire q->sysfs_lock in elevator_change() · 7c8a3679
      Tomoki Sekiyama authored
      Add locking of q->sysfs_lock into elevator_change() (an exported function)
      to ensure it is held to protect q->elevator from elevator_init(), even if
      elevator_change() is called from non-sysfs paths.
      The sysfs path (elv_iosched_store) uses __elevator_change(), the non-locking
      version, as the lock is already taken by elv_iosched_store().
      Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      7c8a3679
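The split between a locking entry point and a non-locking inner function, as described in this commit, can be sketched with a pthread mutex. This is a user-space illustration under stated assumptions: `struct queue`, `queue_set_elevator()` and `queue_set_elevator_locked()` are hypothetical names, not the kernel's structures or functions.

```c
#include <pthread.h>

/* Hypothetical stand-in for a request queue; not the kernel's struct. */
struct queue {
    pthread_mutex_t sysfs_lock;
    const char *elevator;
};

/* Non-locking variant: the caller must already hold q->sysfs_lock,
 * as the sysfs store path does in the commit description. */
int queue_set_elevator_locked(struct queue *q, const char *name)
{
    if (!name)
        return -1;
    q->elevator = name;
    return 0;
}

/* Locking variant for every other caller: take the lock, delegate to
 * the non-locking variant, release the lock. */
int queue_set_elevator(struct queue *q, const char *name)
{
    int ret;

    pthread_mutex_lock(&q->sysfs_lock);
    ret = queue_set_elevator_locked(q, name);
    pthread_mutex_unlock(&q->sysfs_lock);
    return ret;
}
```

Keeping the real work in the non-locking function lets a caller that already holds the lock avoid taking it twice, while exported callers get the protection by default.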
    • elevator: Fix a race in elevator switching and md device initialization · eb1c160b
      Tomoki Sekiyama authored
      The soft lockup below happens at the boot time of the system using dm
      multipath and the udev rules to switch scheduler.
      
      [  356.127001] BUG: soft lockup - CPU#3 stuck for 22s! [sh:483]
      [  356.127001] RIP: 0010:[<ffffffff81072a7d>]  [<ffffffff81072a7d>] lock_timer_base.isra.35+0x1d/0x50
      ...
      [  356.127001] Call Trace:
      [  356.127001]  [<ffffffff81073810>] try_to_del_timer_sync+0x20/0x70
      [  356.127001]  [<ffffffff8118b08a>] ? kmem_cache_alloc_node_trace+0x20a/0x230
      [  356.127001]  [<ffffffff810738b2>] del_timer_sync+0x52/0x60
      [  356.127001]  [<ffffffff812ece22>] cfq_exit_queue+0x32/0xf0
      [  356.127001]  [<ffffffff812c98df>] elevator_exit+0x2f/0x50
      [  356.127001]  [<ffffffff812c9f21>] elevator_change+0xf1/0x1c0
      [  356.127001]  [<ffffffff812caa50>] elv_iosched_store+0x20/0x50
      [  356.127001]  [<ffffffff812d1d09>] queue_attr_store+0x59/0xb0
      [  356.127001]  [<ffffffff812143f6>] sysfs_write_file+0xc6/0x140
      [  356.127001]  [<ffffffff811a326d>] vfs_write+0xbd/0x1e0
      [  356.127001]  [<ffffffff811a3ca9>] SyS_write+0x49/0xa0
      [  356.127001]  [<ffffffff8164e899>] system_call_fastpath+0x16/0x1b
      
      This is caused by a race between md device initialization by multipathd and
      shell script to switch the scheduler using sysfs.
      
       - multipathd:
         SyS_ioctl -> do_vfs_ioctl -> dm_ctl_ioctl -> ctl_ioctl -> table_load
         -> dm_setup_md_queue -> blk_init_allocated_queue -> elevator_init
          q->elevator = elevator_alloc(q, e); // not yet initialized
      
       - sh -c 'echo deadline > /sys/$DEVPATH/queue/scheduler':
         elevator_switch (in the call trace above)
          struct elevator_queue *old = q->elevator;
          q->elevator = elevator_alloc(q, new_e);
          elevator_exit(old);                 // lockup! (*)
      
       - multipathd: (cont.)
          err = e->ops.elevator_init_fn(q);   // init fails; q->elevator is modified
      
      (*) When del_timer_sync() is called, lock_timer_base() will loop infinitely
      while timer->base == NULL. In this case, as the timer will never be
      initialized, it results in a lockup.
      
      This patch introduces acquisition of q->sysfs_lock around elevator_init()
      into blk_init_allocated_queue(), to provide mutual exclusion between
      initialization of the q->scheduler and switching of the scheduler.
      
      This should fix this bugzilla:
      https://bugzilla.redhat.com/show_bug.cgi?id=902012
      Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      eb1c160b
    • block: Replace __get_cpu_var uses · 170d800a
      __get_cpu_var() is used for multiple purposes in the kernel source. One of
      them is address calculation via the form &__get_cpu_var(x).  This calculates
      the address for the instance of the percpu variable of the current processor
      based on an offset.
      
      Other use cases are for storing and retrieving data from the current
      processor's percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as :
      
      #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
      
      __get_cpu_var() always performs only an address calculation. However, store
      and retrieve operations could use a segment prefix (or global register on
      other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a
      percpu area and use optimized assembly code to read and write per cpu
      variables.
      
      This patch converts __get_cpu_var() into either an explicit address
      calculation using this_cpu_ptr() or a use of this_cpu operations that
      take the offset.  Address calculations are thereby avoided and fewer
      registers are used in the generated code.
      
      At the end of the patch set all uses of __get_cpu_var have been removed so
      the macro is removed too.
      
      The patch set includes passes over all arches as well. Once these operations
      are used throughout, specialized macros can be defined in non-x86 arches
      as well in order to optimize per cpu access, e.g. by using a global
      register that may be set to the per cpu base.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processor's instance of a per cpu
      variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y);
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++;
      
         Converts to
      
      	this_cpu_inc(y);
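
The relationship between the old and new forms can be tried out in userspace with a small mock. This is a sketch under loud assumptions: every name ending in `_mock` is hypothetical, a per cpu variable is modeled as a plain array with one slot per CPU, and the "current CPU" is just a global index. It says nothing about the code the kernel actually generates; it only shows that transformations 1, 3, 5 and 6 leave the observable values unchanged.

```c
#include <assert.h>

#define NR_CPUS_MOCK 4

/* Hypothetical: the CPU we pretend to be running on. */
static int current_cpu_mock;

/* Mock: a per cpu variable is an array with one slot per CPU. */
#define DEFINE_PER_CPU_MOCK(type, name) type name[NR_CPUS_MOCK]

/* Address of the current CPU's slot, given a pointer to the whole variable. */
#define this_cpu_ptr_mock(ptr) (&(*(ptr))[current_cpu_mock])

/* Old style: always goes through the address calculation, as quoted above. */
#define get_cpu_var_mock(var) (*this_cpu_ptr_mock(&(var)))

/* New style: operate on the slot directly. */
#define this_cpu_read_mock(var)       ((var)[current_cpu_mock])
#define this_cpu_write_mock(var, val) ((void)((var)[current_cpu_mock] = (val)))
#define this_cpu_inc_mock(var)        ((void)((var)[current_cpu_mock]++))

static DEFINE_PER_CPU_MOCK(int, y);

static void set_cpu_mock(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MOCK);
    current_cpu_mock = cpu;
}

/* Write, increment and read back using the old accessor. */
static int old_style(int cpu)
{
    set_cpu_mock(cpu);
    get_cpu_var_mock(y) = 10;
    get_cpu_var_mock(y)++;
    return get_cpu_var_mock(y);
}

/* The same sequence using the replacement operations. */
static int new_style(int cpu)
{
    set_cpu_mock(cpu);
    this_cpu_write_mock(y, 10);
    this_cpu_inc_mock(y);
    return this_cpu_read_mock(y);
}
```

Both helpers return the same value for any valid CPU index, and each touches only the slot of the CPU it was called for.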
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      170d800a
    • Mikulas Patocka's avatar
      bdi: test bdi_init failure · 8077c0d9
      Mikulas Patocka authored
      There were two places where the return value of bdi_init() was not tested.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      8077c0d9
    • Mikulas Patocka's avatar
      block: fix a probe argument to blk_register_region · a207f593
      Mikulas Patocka authored
      The probe function is supposed to return NULL on failure, as we can see in
      kobj_lookup: kobj = probe(dev, index, data); ... if (kobj) return kobj;
      
      However, in loop and brd, it returns a negative error encoded with ERR_PTR.
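
Why an ERR_PTR value slips past the `if (kobj)` test can be shown with a userspace imitation. The helpers below are hypothetical mocks of the kernel's ERR_PTR()/IS_ERR() idea, and `lookup_accepts()` models only the quoted `if (kobj) return kobj;` check, not the real kobj_lookup(). Mapping the error to NULL in `probe_fixed()` is one way a probe could meet the contract; it is a sketch, not necessarily the exact change made in this commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO_MOCK 4095

/* Mock of ERR_PTR(): encode a negative errno in a pointer value. */
static void *err_ptr_mock(long error)
{
    assert(error < 0 && error >= -MAX_ERRNO_MOCK);
    return (void *)(intptr_t)error;
}

/* Mock of IS_ERR(): true for pointers in the top MAX_ERRNO_MOCK addresses. */
static int is_err_mock(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_MOCK;
}

/* Buggy probe: reports an allocation failure as an encoded error pointer. */
static void *probe_buggy(void)
{
    return err_ptr_mock(-ENOMEM);
}

/* Probe that honours the contract: failure is reported as NULL. */
static void *probe_fixed(void)
{
    void *kobj = probe_buggy();      /* pretend the same failure occurred */
    return is_err_mock(kobj) ? NULL : kobj;
}

/* Models the caller's check: any non-NULL result is treated as an object. */
static int lookup_accepts(void *(*probe)(void))
{
    void *kobj = probe();
    return kobj != NULL;
}
```

With the buggy probe the caller accepts the result and would go on to dereference an address derived from the error code; with the NULL-returning probe the caller correctly sees a failure.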
      
      This causes a crash if we simulate disk allocation failure and run
      less -f /dev/loop0 because the negative number is interpreted as a pointer:
      
      BUG: unable to handle kernel NULL pointer dereference at 00000000000002b4
      IP: [<ffffffff8118b188>] __blkdev_get+0x28/0x450
      PGD 23c677067 PUD 23d6d1067 PMD 0
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in: loop hpfs nvidia(PO) ip6table_filter ip6_tables uvesafb cfbcopyarea cfbimgblt cfbfillrect fbcon font bitblit fbcon_rotate fbcon_cw fbcon_ud fbcon_ccw softcursor fb fbdev msr ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc tun ipv6 cpufreq_stats cpufreq_ondemand cpufreq_userspace cpufreq_powersave cpufreq_conservative hid_generic spadfs usbhid hid fuse raid0 snd_usb_audio snd_pcm_oss snd_mixer_oss md_mod snd_pcm snd_timer snd_page_alloc snd_hwdep snd_usbmidi_lib dmi_sysfs snd_rawmidi nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack snd soundcore lm85 hwmon_vid ohci_hcd ehci_pci ehci_hcd serverworks sata_svw libata acpi_cpufreq freq_table mperf ide_core usbcore kvm_amd kvm tg3 i2c_piix4 libphy microcode e100 usb_common ptp skge i2c_core pcspkr k10temp evdev floppy hwmon pps_core mii rtc_cmos button processor unix [last unloaded: nvidia]
      CPU: 1 PID: 6831 Comm: less Tainted: P        W  O 3.10.15-devel #18
      Hardware name: empty empty/S3992-E, BIOS 'V1.06   ' 06/09/2009
      task: ffff880203cc6bc0 ti: ffff88023e47c000 task.ti: ffff88023e47c000
      RIP: 0010:[<ffffffff8118b188>]  [<ffffffff8118b188>] __blkdev_get+0x28/0x450
      RSP: 0018:ffff88023e47dbd8  EFLAGS: 00010286
      RAX: ffffffffffffff74 RBX: ffffffffffffff74 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
      RBP: ffff88023e47dc18 R08: 0000000000000002 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff88023f519658
      R13: ffffffff8118c300 R14: 0000000000000000 R15: ffff88023f519640
      FS:  00007f2070bf7700(0000) GS:ffff880247400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000000002b4 CR3: 000000023da1d000 CR4: 00000000000007e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Stack:
       0000000000000002 0000001d00000000 000000003e47dc50 ffff88023f519640
       ffff88043d5bb668 ffffffff8118c300 ffff88023d683550 ffff88023e47de60
       ffff88023e47dc98 ffffffff8118c10d 0000001d81605698 0000000000000292
      Call Trace:
       [<ffffffff8118c300>] ? blkdev_get_by_dev+0x60/0x60
       [<ffffffff8118c10d>] blkdev_get+0x1dd/0x370
       [<ffffffff8118c300>] ? blkdev_get_by_dev+0x60/0x60
       [<ffffffff813cea6c>] ? _raw_spin_unlock+0x2c/0x50
       [<ffffffff8118c300>] ? blkdev_get_by_dev+0x60/0x60
       [<ffffffff8118c365>] blkdev_open+0x65/0x80
       [<ffffffff8114d12e>] do_dentry_open.isra.18+0x23e/0x2f0
       [<ffffffff8114d214>] finish_open+0x34/0x50
       [<ffffffff8115e122>] do_last.isra.62+0x2d2/0xc50
       [<ffffffff8115eb58>] path_openat.isra.63+0xb8/0x4d0
       [<ffffffff81115a8e>] ? might_fault+0x4e/0xa0
       [<ffffffff8115f4f0>] do_filp_open+0x40/0x90
       [<ffffffff813cea6c>] ? _raw_spin_unlock+0x2c/0x50
       [<ffffffff8116db85>] ? __alloc_fd+0xa5/0x1f0
       [<ffffffff8114e45f>] do_sys_open+0xef/0x1d0
       [<ffffffff8114e559>] SyS_open+0x19/0x20
       [<ffffffff813cff16>] system_call_fastpath+0x1a/0x1f
      Code: 44 00 00 55 48 89 e5 41 57 49 89 ff 41 56 41 89 d6 41 55 41 54 4c 8d 67 18 53 48 83 ec 18 89 75 cc e9 f2 00 00 00 0f 1f 44 00 00 <48> 8b 80 40 03 00 00 48 89 df 4c 8b 68 58 e8 d5 a4 07 00 44 89
      RIP  [<ffffffff8118b188>] __blkdev_get+0x28/0x450
       RSP <ffff88023e47dbd8>
      CR2: 00000000000002b4
      ---[ end trace bb7f32dbf02398dc ]---
      
      The brd change should be backported to stable kernels starting with 2.6.25.
      The loop change should be backported to stable kernels starting with 2.6.22.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: stable@kernel.org	# 2.6.22+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a207f593
    • Mikulas Patocka's avatar
      loop: fix crash if blk_alloc_queue fails · 3ec981e3
      Mikulas Patocka authored
      If blk_alloc_queue fails, loop_add cleans up, but it doesn't clean up the
      identifier allocated with idr_alloc. That causes a crash on module unload in
      idr_for_each(&loop_index_idr, &loop_exit_cb, NULL); where we attempt to
      remove a non-existent device with that id.
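
The error-path pattern at issue can be sketched in userspace. Everything here is a hypothetical mock: a fixed-size table stands in for the idr, `dev_add_mock()` for loop_add, and `remove_all_mock()` for the idr_for_each() walk at module unload. The point is only that an id registered before a later step fails has to be unregistered again on the way out, otherwise the unload walk finds a stale entry.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_IDS_MOCK 8

/* Hypothetical device record; has_disk is set only after full setup. */
struct dev_mock {
    int has_disk;
};

/* A fixed table standing in for the id registry. */
static struct dev_mock *id_table_mock[MAX_IDS_MOCK];

static int id_alloc_mock(struct dev_mock *d)
{
    for (int i = 0; i < MAX_IDS_MOCK; i++) {
        if (!id_table_mock[i]) {
            id_table_mock[i] = d;
            return i;
        }
    }
    return -1;
}

static void id_remove_mock(int id)
{
    id_table_mock[id] = NULL;
}

/* Add a device; fail_queue_alloc simulates the queue allocation failing. */
static int dev_add_mock(int fail_queue_alloc)
{
    struct dev_mock *d = calloc(1, sizeof(*d));
    if (!d)
        return -1;

    int id = id_alloc_mock(d);
    if (id < 0)
        goto out_free_dev;

    if (fail_queue_alloc)
        goto out_free_id;

    d->has_disk = 1;
    return id;

out_free_id:
    id_remove_mock(id);   /* the cleanup step described as missing */
out_free_dev:
    free(d);
    return -1;
}

/* Unload walk: every registered id must refer to a fully built device. */
static int remove_all_mock(void)
{
    int removed = 0;

    for (int i = 0; i < MAX_IDS_MOCK; i++) {
        if (id_table_mock[i]) {
            assert(id_table_mock[i]->has_disk);
            free(id_table_mock[i]);
            id_table_mock[i] = NULL;
            removed++;
        }
    }
    return removed;
}
```

Without the `id_remove_mock(id)` call, the failed add would leave a dangling table entry and the unload walk would touch freed memory.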
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000380
      IP: [<ffffffff812057c9>] del_gendisk+0x19/0x2d0
      PGD 43d399067 PUD 43d0ad067 PMD 0
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in: loop(-) dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_loop dm_mod ip6table_filter ip6_tables uvesafb cfbcopyarea cfbimgblt cfbfillrect fbcon font bitblit fbcon_rotate fbcon_cw fbcon_ud fbcon_ccw softcursor fb fbdev msr ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc tun ipv6 cpufreq_userspace cpufreq_stats cpufreq_ondemand cpufreq_conservative cpufreq_powersave spadfs fuse hid_generic usbhid hid raid0 md_mod dmi_sysfs nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack snd_usb_audio snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc lm85 hwmon_vid snd_hwdep snd_usbmidi_lib snd_rawmidi snd soundcore acpi_cpufreq ohci_hcd freq_table tg3 ehci_pci mperf ehci_hcd kvm_amd kvm sata_svw serverworks libphy libata ide_core k10temp usbcore hwmon microcode ptp pcspkr pps_core e100 skge mii usb_common i2c_piix4 floppy evdev rtc_cmos i2c_core processor button unix
      CPU: 7 PID: 2735 Comm: rmmod Tainted: G        W    3.10.15-devel #15
      Hardware name: empty empty/S3992-E, BIOS 'V1.06   ' 06/09/2009
      task: ffff88043d38e780 ti: ffff88043d21e000 task.ti: ffff88043d21e000
      RIP: 0010:[<ffffffff812057c9>]  [<ffffffff812057c9>] del_gendisk+0x19/0x2d0
      RSP: 0018:ffff88043d21fe10  EFLAGS: 00010282
      RAX: ffffffffa05102e0 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffff88043ea82800 RDI: 0000000000000000
      RBP: ffff88043d21fe48 R08: 0000000000000000 R09: 0000000000000001
      R10: 0000000000000001 R11: 0000000000000000 R12: 00000000000000ff
      R13: 0000000000000080 R14: 0000000000000000 R15: ffff88043ea82800
      FS:  00007ff646534700(0000) GS:ffff880447000000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000380 CR3: 000000043e9bf000 CR4: 00000000000007e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Stack:
       ffffffff8100aba4 0000000000000092 ffff88043d21fe48 ffff88043ea82800
       00000000000000ff ffff88043d21fe98 0000000000000000 ffff88043d21fe60
       ffffffffa05102b4 0000000000000000 ffff88043d21fe70 ffffffffa05102ec
      Call Trace:
       [<ffffffff8100aba4>] ? native_sched_clock+0x24/0x80
       [<ffffffffa05102b4>] loop_remove+0x14/0x40 [loop]
       [<ffffffffa05102ec>] loop_exit_cb+0xc/0x10 [loop]
       [<ffffffff81217b74>] idr_for_each+0x104/0x190
       [<ffffffffa05102e0>] ? loop_remove+0x40/0x40 [loop]
       [<ffffffff8109adc5>] ? trace_hardirqs_on_caller+0x105/0x1d0
       [<ffffffffa05135dc>] loop_exit+0x34/0xa58 [loop]
       [<ffffffff810a98ea>] SyS_delete_module+0x13a/0x260
       [<ffffffff81221d5e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
       [<ffffffff813cff16>] system_call_fastpath+0x1a/0x1f
      Code: f0 4c 8b 6d f8 c9 c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 56 41 55 4c 8d af 80 00 00 00 41 54 53 48 89 fb 48 83 ec 18 <48> 83 bf 80 03 00 00 00 74 4d e8 98 fe ff ff 31 f6 48 c7 c7 20
      RIP  [<ffffffff812057c9>] del_gendisk+0x19/0x2d0
       RSP <ffff88043d21fe10>
      CR2: 0000000000000380
      ---[ end trace 64ec069ec70f1309 ]---
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: stable@kernel.org	# 3.1+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3ec981e3