1. 07 Jun, 2013 1 commit
    • Stefan Bader's avatar
      xen/blkback: Use physical sector size for setup · 7c4d7d71
      Stefan Bader authored
      Currently xen-blkback passes the logical sector size over xenbus and
      xen-blkfront sets up the paravirt disk with that logical block size.
      But newer drives usually have the logical sector size set to 512 for
      compatibility reasons and would show the actual sector size only in
      physical sector size.
      This results in the device being partitioned and accessed in dom0 with
      the correct sector size, but the guest thinks 512 bytes is the correct
      block size. And that results in poor performance.
      
      To fix this, blkback gets modified to pass also physical-sector-size
      over xenbus and blkfront to use both values to set up the paravirt
      disk. I did not just change the passed in sector-size because I am
      not sure having a bigger logical sector size than the physical one
      is valid (and that would happen if a newer dom0 kernel hits an older
      domU kernel). Also this way a domU set up before should still be
      accessible (just some tools might detect the unaligned setup).
      
      [v2: Make xenbus write failure non-fatal]
      [v3: Use xenbus_scanf instead of xenbus_gather]
      [v4: Rebased against segment changes]
      Signed-off-by: default avatarStefan Bader <stefan.bader@canonical.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      7c4d7d71
  2. 04 Jun, 2013 2 commits
  3. 08 May, 2013 1 commit
  4. 07 May, 2013 1 commit
  5. 18 Apr, 2013 7 commits
    • Roger Pau Monne's avatar
      xen-block: implement indirect descriptors · 402b27f9
      Roger Pau Monne authored
      Indirect descriptors introduce a new block operation
      (BLKIF_OP_INDIRECT) that passes grant references instead of segments
      in the request. This grant references are filled with arrays of
      blkif_request_segment_aligned, this way we can send more segments in a
      request.
      
      The proposed implementation sets the maximum number of indirect grefs
      (frames filled with blkif_request_segment_aligned) to 256 in the
      backend and 32 in the frontend. The value in the frontend has been
      chosen experimentally, and the backend value has been set to a sane
      value that allows expanding the maximum number of indirect descriptors
      in the frontend if needed.
      
      The migration code has changed from the previous implementation, in
      which we simply remapped the segments on the shared ring. Now the
      maximum number of segments allowed in a request can change depending
      on the backend, so we have to requeue all the requests in the ring and
      in the queue and split the bios in them if they are bigger than the
      new maximum number of segments.
      
      [v2: Fixed minor comments by Konrad.
      [v1: Added padding to make the indirect request 64bit aligned.
       Added some BUGs, comments; fixed number of indirect pages in
       blkif_get_x86_{32/64}_req. Added description about the indirect operation
       in blkif.h]
      Signed-off-by: default avatarRoger Pau Monné <roger.pau@citrix.com>
      [v3: Fixed spaces and tabs mix ups]
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      402b27f9
    • Roger Pau Monne's avatar
      xen-blkback: expand map/unmap functions · 31552ee3
      Roger Pau Monne authored
      Preparatory change for implementing indirect descriptors. Change
      xen_blkbk_{map/unmap} in order to be able to map/unmap a random amount
      of grants (previously it was limited to
      BLKIF_MAX_SEGMENTS_PER_REQUEST). Also, remove the usage of pending_req
      in the map/unmap functions, so we can map/unmap grants without needing
      to pass a pending_req.
      Signed-off-by: default avatarRoger Pau Monné <roger.pau@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: xen-devel@lists.xen.org
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      31552ee3
    • Roger Pau Monne's avatar
      xen-blkback: make the queue of free requests per backend · bf0720c4
      Roger Pau Monne authored
      Remove the last dependency from blkbk by moving the list of free
      requests to blkif. This change reduces the contention on the list of
      available requests.
      Signed-off-by: default avatarRoger Pau Monné <roger.pau@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: xen-devel@lists.xen.org
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      bf0720c4
    • Roger Pau Monne's avatar
      xen-blkback: move pending handles list from blkbk to pending_req · bb6acb28
      Roger Pau Monne authored
      Moving grant ref handles from blkbk to pending_req will allow us to
      get rid of the shared blkbk structure.
      Signed-off-by: default avatarRoger Pau Monné <roger.pau@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: xen-devel@lists.xen.org
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      bb6acb28
    • Roger Pau Monne's avatar
      xen-blkback: implement LRU mechanism for persistent grants · 3f3aad5e
      Roger Pau Monne authored
      This mechanism allows blkback to change the number of grants
      persistently mapped at run time.
      
      The algorithm uses a simple LRU mechanism that removes (if needed) the
      persistent grants that have not been used since the last LRU run, or
      if all grants have been used it removes the first grants in the list
      (that are not in use).
      
      The algorithm allows the user to change the maximum number of
      persistent grants, by changing max_persistent_grants in sysfs.
      
      Since we are storing the persistent grants used inside the request
      struct (to be able to mark them as "unused" when unmapping), we no
      longer need the bitmap (unmap_seg).
      Signed-off-by: default avatarRoger Pau Monné <roger.pau@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: xen-devel@lists.xen.org
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      3f3aad5e
    • Roger Pau Monne's avatar
      xen-blkback: use balloon pages for all mappings · c6cc142d
      Roger Pau Monne authored
      Using balloon pages for all granted pages allows us to simplify the
      logic in blkback, especially in the xen_blkbk_map function, since now
      we can decide if we want to map a grant persistently or not after we
      have actually mapped it. This could not be done before because
      persistent grants used ballooned pages, whereas non-persistent grants
      used pages from the kernel.
      
      This patch also introduces several changes, the first one is that the
      list of free pages is no longer global, now each blkback instance has
      it's own list of free pages that can be used to map grants. Also, a
      run time parameter (max_buffer_pages) has been added in order to tune
      the maximum number of free pages each blkback instance will keep in
      it's buffer.
      Signed-off-by: default avatarRoger Pau Monné <roger.pau@citrix.com>
      Cc: xen-devel@lists.xen.org
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      c6cc142d
    • Roger Pau Monne's avatar
      xen-blkback: print stats about persistent grants · c1a15d08
      Roger Pau Monne authored
      Signed-off-by: default avatarRoger Pau Monné <roger.pau@citrix.com>
      Cc: xen-devel@lists.xen.org
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      c1a15d08
  6. 15 Apr, 2013 1 commit
  7. 14 Apr, 2013 10 commits
  8. 13 Apr, 2013 3 commits
    • Suleiman Souhlal's avatar
      vfs: Revert spurious fix to spinning prevention in prune_icache_sb · 5b55d708
      Suleiman Souhlal authored
      Revert commit 62a3ddef ("vfs: fix spinning prevention in prune_icache_sb").
      
      This commit doesn't look right: since we are looking at the tail of the
      list (sb->s_inode_lru.prev) if we want to skip an inode, we should put
      it back at the head of the list instead of the tail, otherwise we will
      keep spinning on it.
      
      Discovered when investigating why prune_icache_sb came top in perf
      reports of a swapping load.
      Signed-off-by: default avatarSuleiman Souhlal <suleiman@google.com>
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org # v3.2+
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5b55d708
    • Linus Torvalds's avatar
      kobject: fix kset_find_obj() race with concurrent last kobject_put() · a49b7e82
      Linus Torvalds authored
      Anatol Pomozov identified a race condition that hits module unloading
      and re-loading.  To quote Anatol:
      
       "This is a race codition that exists between kset_find_obj() and
        kobject_put().  kset_find_obj() might return kobject that has refcount
        equal to 0 if this kobject is freeing by kobject_put() in other
        thread.
      
        Here is timeline for the crash in case if kset_find_obj() searches for
        an object tht nobody holds and other thread is doing kobject_put() on
        the same kobject:
      
          THREAD A (calls kset_find_obj())     THREAD B (calls kobject_put())
          splin_lock()
                                               atomic_dec_return(kobj->kref), counter gets zero here
                                               ... starts kobject cleanup ....
                                               spin_lock() // WAIT thread A in kobj_kset_leave()
          iterate over kset->list
          atomic_inc(kobj->kref) (counter becomes 1)
          spin_unlock()
                                               spin_lock() // taken
                                               // it does not know that thread A increased counter so it
                                               remove obj from list
                                               spin_unlock()
                                               vfree(module) // frees module object with containing kobj
      
          // kobj points to freed memory area!!
          kobject_put(kobj) // OOPS!!!!
      
        The race above happens because module.c tries to use kset_find_obj()
        when somebody unloads module.  The module.c code was introduced in
        commit 6494a93d"
      
      Anatol supplied a patch specific for module.c that worked around the
      problem by simply not using kset_find_obj() at all, but rather than make
      a local band-aid, this just fixes kset_find_obj() to be thread-safe
      using the proper model of refusing the get a new reference if the
      refcount has already dropped to zero.
      
      See examples of this proper refcount handling not only in the kref
      documentation, but in various other equivalent uses of this pattern by
      grepping for atomic_inc_not_zero().
      
      [ Side note: the module race does indicate that module loading and
        unloading is not properly serialized wrt sysfs information using the
        module mutex.  That may require further thought, but this is the
        correct fix at the kobject layer regardless. ]
      Reported-analyzed-and-tested-by: default avatarAnatol Pomozov <anatol.pomozov@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a49b7e82
    • Josef Bacik's avatar
      Btrfs: make sure nbytes are right after log replay · 4bc4bee4
      Josef Bacik authored
      While trying to track down a tree log replay bug I noticed that fsck was always
      complaining about nbytes not being right for our fsynced file.  That is because
      the new fsync stuff doesn't wait for ordered extents to complete, so the inodes
      nbytes are not necessarily updated properly when we log it.  So to fix this we
      need to set nbytes to whatever it is on the inode that is on disk, so when we
      replay the extents we can just add the bytes that are being added as we replay
      the extent.  This makes it work for the case that we have the wrong nbytes or
      the case that we logged everything and nbytes is actually correct.  With this
      I'm no longer getting nbytes errors out of btrfsck.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: default avatarChris Mason <chris.mason@fusionio.com>
      4bc4bee4
  9. 12 Apr, 2013 14 commits