1. 11 Oct, 2012 18 commits
  2. 28 Sep, 2012 2 commits
  3. 27 Sep, 2012 2 commits
  4. 26 Sep, 2012 4 commits
    • percpu-rw-semaphore: fix documentation typos · e6b5c082
      Mikulas Patocka authored
      One more patch for this thing, fixing some typos in the documentation.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • fs/block_dev.c:1644:5: sparse: symbol 'blkdev_mmap' was not declared · 3eab7315
      Fengguang Wu authored
      blkdev_mmap() isn't used outside of fs/block_dev.c, mark it as
      static.
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blockdev: turn a rw semaphore into a percpu rw semaphore · 62ac665f
      Mikulas Patocka authored
      This avoids cache line bouncing when many processes lock the semaphore
      for read.
      
      New percpu lock implementation
      
      The lock consists of an array of percpu unsigned integers, a boolean
      variable and a mutex.
      
      When we take the lock for read, we enter an RCU read-side section and
      check the "locked" variable. If it is false, we increase a percpu
      counter on the current CPU and exit the RCU section. If "locked" is
      true, we exit the RCU section, take the mutex and drop it (this waits
      until the writer has finished) and retry.
      
      Unlocking for read just decreases the percpu counter. Note that we can
      unlock on a different CPU than the one where we locked; in this case
      the counter underflows. The sum of all percpu counters represents the
      number of processes that hold the lock for read.
      
      When we need to lock for write, we take the mutex, set the "locked"
      variable to true and synchronize RCU. Since RCU has been synchronized,
      no processes can create new read locks. We wait until the sum of percpu
      counters is zero - when it is, there are no readers in the critical
      section.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
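The counter arithmetic in the percpu lock description above can be illustrated with a small single-threaded model. This is only a sketch: the names (`toy_percpu_rwsem`, `TOY_NR_CPUS`, and the `toy_*` helpers) are hypothetical, the RCU section, the mutex and the writer's wait loop are left out, and the caller passes the "CPU" explicitly. It shows why a read unlock on a different CPU is harmless: an individual unsigned counter wraps around, but the modular sum over all counters still equals the number of readers.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4 /* hypothetical CPU count for this model */

/* Toy model of the lock state; not the kernel's data structure. */
struct toy_percpu_rwsem {
    unsigned int counters[TOY_NR_CPUS]; /* one reader counter per CPU */
    bool locked;                        /* set by a writer */
};

/* Read lock: bump the counter of the CPU we are "running" on. */
static void toy_read_lock(struct toy_percpu_rwsem *s, unsigned int cpu)
{
    assert(cpu < TOY_NR_CPUS);
    assert(!s->locked); /* the real lock would wait for the writer and retry */
    s->counters[cpu]++;
}

/* Read unlock: may run on a different CPU than the matching lock. */
static void toy_read_unlock(struct toy_percpu_rwsem *s, unsigned int cpu)
{
    assert(cpu < TOY_NR_CPUS);
    s->counters[cpu]--; /* unsigned arithmetic: 0 - 1 wraps, which is fine */
}

/* Number of readers: the sum of all counters, modulo 2^N. */
static unsigned int toy_reader_count(const struct toy_percpu_rwsem *s)
{
    unsigned int sum = 0;

    for (unsigned int i = 0; i < TOY_NR_CPUS; i++)
        sum += s->counters[i];
    return sum;
}
```

A writer in this model would set `locked` and then wait until `toy_reader_count()` returns zero; the individual counters need not be zero at that point.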
    • Fix a crash when block device is read and block size is changed at the same time · b87570f5
      Mikulas Patocka authored
      The kernel may crash when block size is changed and I/O is issued
      simultaneously.
      
      Because some subsystems (udev or lvm) may read any block device anytime,
      the bug actually puts any code that changes a block device size in
      jeopardy.
      
      The crash can be reproduced if you place "msleep(1000)" in
      blkdev_get_blocks just before "bh->b_size = max_blocks <<
      inode->i_blkbits;".
      Then, run "dd if=/dev/ram0 of=/dev/null bs=4k count=1 iflag=direct".
      While it is waiting in msleep, run "blockdev --setbsz 2048 /dev/ram0".
      You get a BUG.
      
      The direct and non-direct I/O code is written with the assumption that
      the block size does not change. It doesn't seem practical to fix these
      crashes one by one: there may be many crash possibilities when the block
      size changes at a certain place, and it is impossible to find them all
      and verify the code.
      
      This patch introduces a new rw-lock bd_block_size_semaphore. The lock is
      taken for read during I/O. It is taken for write when changing block
      size. Consequently, block size can't be changed while I/O is being
      submitted.
      
      For asynchronous I/O, the patch only prevents block size change while
      the I/O is being submitted. The block size can change when the I/O is in
      progress or when the I/O is being finished. This is acceptable because
      there are no accesses to block size when asynchronous I/O is being
      finished.
      
      The patch also prevents the block size from changing while the device
      is mapped with mmap.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. 21 Sep, 2012 2 commits
    • block: fix request_queue->flags initialization · 60ea8226
      Tejun Heo authored
      A queue newly allocated with blk_alloc_queue_node() has only
      QUEUE_FLAG_BYPASS set.  For request-based drivers,
      blk_init_allocated_queue() is called and q->queue_flags is overwritten
      with QUEUE_FLAG_DEFAULT which doesn't include BYPASS even though the
      initial bypass is still in effect.
      
      In blk_init_allocated_queue(), OR QUEUE_FLAG_DEFAULT into q->queue_flags
      instead of overwriting it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
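The difference between overwriting the flags word and OR-ing the defaults into it can be shown with a small stand-alone sketch. The flag names and bit positions below (`TOY_FLAG_*`) are hypothetical and do not match the kernel's values; only the bit arithmetic is the point.

```c
#include <assert.h>

/* Hypothetical flag bits for illustration; the kernel's values differ. */
#define TOY_FLAG_BYPASS    (1UL << 0)
#define TOY_FLAG_IO_STAT   (1UL << 1)
#define TOY_FLAG_SAME_COMP (1UL << 2)
#define TOY_FLAG_DEFAULT   (TOY_FLAG_IO_STAT | TOY_FLAG_SAME_COMP)

/* Old behaviour: the defaults replace whatever was set before. */
static unsigned long toy_init_overwrite(unsigned long flags)
{
    (void)flags; /* previous state is discarded, including BYPASS */
    return TOY_FLAG_DEFAULT;
}

/* Fixed behaviour: the defaults are added to the existing bits. */
static unsigned long toy_init_or(unsigned long flags)
{
    unsigned long result = flags | TOY_FLAG_DEFAULT;

    assert((result & flags) == flags); /* no previously set bit is lost */
    return result;
}
```

Starting from a word with only `TOY_FLAG_BYPASS` set, `toy_init_overwrite()` drops the bypass bit while `toy_init_or()` keeps it alongside the defaults.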
    • block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue() · 749fefe6
      Tejun Heo authored
      b82d4b19 ("blkcg: make request_queue bypassing on allocation") made
      request_queues bypassed on allocation to avoid switching on and off
      bypass mode on a queue being initialized.  Some drivers allocate and
      then destroy a lot of queues without fully initializing them and
      incurring bypass latency overhead on each of them could add up to
      significant overhead.
      
      Unfortunately, blk_init_allocated_queue() is never used by queues of
      bio-based drivers, which means that all bio-based driver queues are in
      bypass mode even after initialization and registration complete
      successfully.
      
      Due to the limited way request_queues are used by bio drivers, this
      problem is hidden pretty well but it shows up when blk-throttle is
      used in combination with a bio-based driver.  Trying to configure
      (echoing to cgroupfs file) blk-throttle for a bio-based driver hangs
      indefinitely in blkg_conf_prep() waiting for bypass mode to end.
      
      This patch moves the initial blk_queue_bypass_end() call from
      blk_init_allocated_queue() to blk_register_queue() which is called for
      any userland-visible queue regardless of its type.
      
      I believe this is correct because I don't think there is any block
      driver which needs or wants working elevator and blk-cgroup on a queue
      which isn't visible to userland.  If there are such users, we need a
      different solution.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Joseph Glanville <joseph.glanville@orionvm.com.au>
      Cc: stable@vger.kernel.org
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  6. 20 Sep, 2012 5 commits
  7. 12 Sep, 2012 1 commit
    • block/blk-tag.c: Remove useless kfree · d41570b7
      Peter Senna Tschudin authored
      Remove useless kfree() and clean up code related to the removal.
      
      The semantic patch that finds this problem is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @r exists@
      position p1,p2;
      expression x;
      @@
      
      if (x@p1 == NULL) { ... kfree@p2(x); ... return ...; }
      
      @unchanged exists@
      position r.p1,r.p2;
      expression e <= r.x,x,e1;
      iterator I;
      statement S;
      @@
      
      if (x@p1 == NULL) { ... when != I(x,...) S
                              when != e = e1
                              when != e += e1
                              when != e -= e1
                              when != ++e
                              when != --e
                              when != e++
                              when != e--
                              when != &e
         kfree@p2(x); ... return ...; }
      
      @ok depends on unchanged exists@
      position any r.p1;
      position r.p2;
      expression x;
      @@
      
      ... when != true x@p1 == NULL
      kfree@p2(x);
      
      @depends on !ok && unchanged@
      position r.p2;
      expression x;
      @@
      
      *kfree@p2(x);
      // </smpl>
      Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
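The pattern the semantic patch above looks for is a free of a pointer inside the branch that has just established the pointer is NULL. A user-space sketch of the cleaned-up shape follows; `toy_tags` and its helpers are hypothetical and use `malloc()`/`free()` in place of the kernel allocators. Freeing NULL is a no-op, so such a call in the NULL branch can simply be dropped, while the free in the second error path is still required.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical structure for illustration only. */
struct toy_tags {
    int depth;
    int *index;
};

static struct toy_tags *toy_tags_create(int depth)
{
    struct toy_tags *t;

    assert(depth > 0);
    t = malloc(sizeof(*t));
    if (t == NULL)
        return NULL; /* t is NULL here, so a free(t) would do nothing */

    t->index = calloc((size_t)depth, sizeof(*t->index));
    if (t->index == NULL) {
        free(t); /* t is valid here, so this free is needed */
        return NULL;
    }
    t->depth = depth;
    return t;
}

static void toy_tags_destroy(struct toy_tags *t)
{
    if (t != NULL) {
        free(t->index);
        free(t);
    }
}
```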
  8. 09 Sep, 2012 6 commits
    • block: remove the duplicated setting for congestion_threshold · e32463b2
      Jaehoon Chung authored
      By the time this call to blk_queue_congestion_threshold() is reached,
      blk_queue_congestion_threshold() has already been called from
      blk_queue_make_request(). Because the second call is a duplicate,
      remove it.
      Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
      Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: reject invalid queue attribute values · b1f3b64d
      Dave Reisner authored
      Instead of using simple_strtoul which "converts" invalid numbers to 0,
      use strict_strtoul and perform error checking to ensure that userspace
      passes us a valid unsigned long. This addresses problems with functions
      such as writev, which might want to write a trailing newline -- the
      newline should rightfully be rejected, but the value preceding it
      should be preserved.
      
      Fixes BZ#46981.
      Signed-off-by: Dave Reisner <dreisner@archlinux.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
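The difference between lenient and strict parsing described above can be sketched in user space with the standard `strtoul()`. Both helpers below are hypothetical stand-ins, not the kernel functions. The strict variant assumes base 10, tolerates a single trailing newline, and leaves the caller's value untouched on failure; those choices are assumptions made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Lenient: stops at the first non-digit, so garbage silently becomes 0. */
static unsigned long toy_lenient_parse(const char *s)
{
    return strtoul(s, NULL, 10);
}

/*
 * Strict: returns 0 on success, a negative errno value on failure.
 * *out is only written on success, so a bad write keeps the old value.
 */
static int toy_strict_parse(const char *s, unsigned long *out)
{
    char *end;
    unsigned long val;

    assert(s != NULL && out != NULL);
    if (*s < '0' || *s > '9') /* rejects empty input, signs, whitespace */
        return -EINVAL;

    errno = 0;
    val = strtoul(s, &end, 10);
    if (errno == ERANGE)
        return -ERANGE;
    if (*end == '\n') /* one trailing newline is tolerated in this sketch */
        end++;
    if (*end != '\0') /* anything else after the digits is an error */
        return -EINVAL;

    *out = val;
    return 0;
}
```

With the lenient helper, writing "abc" to an attribute would store 0; with the strict helper the write fails and the previous setting survives.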
    • block: Add bio_clone_bioset(), bio_clone_kmalloc() · bf800ef1
      Kent Overstreet authored
      Previously, there was bio_clone() but it only allocated from the fs bio
      set; as a result various users were open coding it and using
      __bio_clone().
      
      This changes bio_clone() to become bio_clone_bioset(), and then we add
      bio_clone() and bio_clone_kmalloc() as wrappers around it, making use of
      the functionality the last patch added.
      
      This will also help in a later patch changing how bio cloning works.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: NeilBrown <neilb@suse.de>
      CC: Alasdair Kergon <agk@redhat.com>
      CC: Boaz Harrosh <bharrosh@panasas.com>
      CC: Jeff Garzik <jeff@garzik.org>
      Acked-by: Jeff Garzik <jgarzik@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Consolidate bio_alloc_bioset(), bio_kmalloc() · 3f86a82a
      Kent Overstreet authored
      Previously, bio_kmalloc() and bio_alloc_bioset() behaved slightly
      differently because there was some almost-duplicated code - this fixes
      some of that.
      
      The important change is that previously bio_kmalloc() always set
      bi_io_vec = bi_inline_vecs, even if nr_iovecs == 0 - unlike
      bio_alloc_bioset(). This would cause bio_has_data() to return true; I
      don't know if this resulted in any actual bugs but it was certainly
      wrong.
      
      bio_kmalloc() and bio_alloc_bioset() also have different arbitrary
      limits on nr_iovecs - 1024 (UIO_MAXIOV) for bio_kmalloc(), 256
      (BIO_MAX_PAGES) for bio_alloc_bioset(). This patch doesn't fix that, but
      at least they're enforced closer together and hopefully they will be
      fixed in a later patch.
      
      This'll also help with some future cleanups - there are a fair number of
      functions that allocate bios (e.g. bio_clone()), and now they don't have
      to be duplicated for bio_alloc(), bio_alloc_bioset(), and bio_kmalloc().
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      v7: Re-add dropped comments, improve patch description
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
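The zero-vector case mentioned above can be modelled with a toy structure. Everything here is hypothetical (`toy_bio`, `toy_vec`, `TOY_MAX_VECS`, and the `always_point_inline` switch that selects the old behaviour), and the sketch assumes the "has data" check is a plain NULL test on the vector pointer. It shows how pointing the vector pointer at the inline array unconditionally makes an empty bio look as if it carried data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_MAX_VECS 256u /* arbitrary cap for this model */

struct toy_vec {
    void *page;
    unsigned int len;
};

/* Toy stand-in for a bio with inline vectors; not the kernel layout. */
struct toy_bio {
    struct toy_vec *io_vec;       /* NULL means "no data attached" */
    unsigned int max_vecs;
    struct toy_vec inline_vecs[]; /* storage allocated together with the bio */
};

static struct toy_bio *toy_bio_alloc(unsigned int nr_vecs,
                                     bool always_point_inline)
{
    struct toy_bio *bio;

    assert(nr_vecs <= TOY_MAX_VECS);
    bio = calloc(1, sizeof(*bio) + nr_vecs * sizeof(struct toy_vec));
    if (bio == NULL)
        return NULL;

    bio->max_vecs = nr_vecs;
    /* Old behaviour: set the pointer even when there are no vectors. */
    if (nr_vecs > 0 || always_point_inline)
        bio->io_vec = bio->inline_vecs;
    return bio;
}

/* Assumed shape of the check: data is present iff the pointer is set. */
static bool toy_bio_has_data(const struct toy_bio *bio)
{
    return bio != NULL && bio->io_vec != NULL;
}
```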
    • block: Kill bi_destructor · 4254bba1
      Kent Overstreet authored
      Now that we've got generic code for freeing bios allocated from bio
      pools, this isn't needed anymore.
      
      This patch also makes bio_free() static, since without bi_destructor
      there should be no need for it to be called anywhere else.
      
      bio_free() is now only called from bio_put(), so we can refactor those a
      bit - move some code from bio_put() to bio_free() and kill the redundant
      bio->bi_next = NULL.
      
      v5: Switch to BIO_KMALLOC_POOL ((void *)~0), per Boaz
      v6: BIO_KMALLOC_POOL now NULL, drop bio_free's EXPORT_SYMBOL
      v7: No #define BIO_KMALLOC_POOL anymore
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • pktcdvd: Switch to bio_kmalloc() · ccc5c9ca
      Kent Overstreet authored
      This is prep work for killing bi_destructor - previously, pktcdvd had
      its own pkt_bio_alloc which was basically duplicating bio_kmalloc(),
      necessitating its own bi_destructor implementation.
      
      v5: Un-reorder some functions, to make the patch easier to review
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Acked-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>