1. 17 Apr, 2017 9 commits
    • nbd: handle dead connections · 560bc4b3
      Josef Bacik authored
      Sometimes we like to upgrade our server without making all of our
      clients freak out and reconnect.  This patch provides a way to specify a
      dead connection timeout to allow us to pause all requests and wait for
      new connections to be opened.  With this in place I can take down the
      nbd server for less than the dead connection timeout and bring it back
      up, and everything resumes gracefully.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
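      A minimal sketch of the per-request decision this commit describes. It
      assumes a single timestamp recording when the last live connection died,
      and assumes a timeout of 0 means "don't wait". The names (decide_action,
      struct dead_conn_policy) are hypothetical, not taken from the nbd driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical types and names; not drivers/block/nbd.c. */
enum req_action { ACT_DISPATCH, ACT_WAIT, ACT_FAIL };

struct dead_conn_policy {
	unsigned int dead_conn_timeout; /* seconds to wait; 0 = never wait (assumed) */
	time_t all_dead_since;          /* when the last live connection died */
};

/* Decide what to do with a request given the current connection state. */
static enum req_action decide_action(const struct dead_conn_policy *p,
				     bool have_live_conn, time_t now)
{
	assert(p != NULL);
	if (have_live_conn)
		return ACT_DISPATCH;   /* normal path */
	if (p->dead_conn_timeout == 0)
		return ACT_FAIL;       /* waiting disabled: error out */
	if (now - p->all_dead_since < (time_t)p->dead_conn_timeout)
		return ACT_WAIT;       /* pause; a reconnect may still arrive */
	return ACT_FAIL;               /* server stayed down too long */
}
```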
    • nbd: only clear the queue on device teardown · 2516ab15
      Josef Bacik authored
      When running a disconnect torture test I noticed that sometimes we would
      crash with a negative ref count on our queue.  This was because we were
      ending the same request twice.  It turns out we were racing:
      NBD_CLEAR_SOCK was clearing the requests while the teardown of the
      device was also clearing them.  So instead, make the ioctl only shut
      down the sockets, and only ever run nbd_clear_que from the device
      teardown.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • nbd: multicast dead link notifications · 799f9a38
      Josef Bacik authored
      Provide a mechanism to notify userspace that there's been a link problem
      on a NBD device.  This will allow userspace to re-establish a connection
      and provide the new socket to the device without disrupting the device.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • nbd: add a reconfigure netlink command · b7aa3d39
      Josef Bacik authored
      We want to be able to reconnect dead connections to existing block
      devices, so add a reconfigure netlink command.  We will also allow users
      to change their timeout on the fly, but everything else will require a
      disconnect and reconnect.  You won't be able to add more connections
      either; you can only replace dead connections with new, more lively
      connections.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
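      A sketch of the reconfigure rules under stated assumptions: a fixed array
      of sockets, a new socket may only fill a slot whose connection is marked
      dead, and the timeout is the only other field that changes. All names are
      hypothetical, and the -ENOSPC return is an assumed errno, not necessarily
      what the driver returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_MAX_CONNS 4   /* hypothetical upper bound for this sketch */

struct toy_sock {
	int fd;
	bool dead;
};

struct toy_config {
	int num_connections;                  /* fixed once the device is set up */
	struct toy_sock socks[TOY_MAX_CONNS];
	unsigned int timeout;                 /* the one knob allowed to change live */
};

/* Hand a new socket to an existing device. It may only take over a slot
 * whose connection is dead; the connection count never grows. Returns the
 * slot used, or -ENOSPC (assumed errno) if no connection is dead. */
static int toy_reconfig_sock(struct toy_config *cfg, int new_fd)
{
	assert(cfg->num_connections <= TOY_MAX_CONNS);
	for (int i = 0; i < cfg->num_connections; i++) {
		if (cfg->socks[i].dead) {
			cfg->socks[i].fd = new_fd;
			cfg->socks[i].dead = false;
			return i;
		}
	}
	return -ENOSPC;
}

/* Changing the timeout on the fly is always allowed. */
static void toy_reconfig_timeout(struct toy_config *cfg, unsigned int timeout)
{
	cfg->timeout = timeout;
}
```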
    • nbd: add a basic netlink interface · e46c7287
      Josef Bacik authored
      The existing ioctl interface for configuring NBD devices is a bit
      cumbersome and hard to extend.  The other problem is that we leave a
      userspace app sitting in its syscall until the device disconnects,
      which is less than ideal.
      
      This patch introduces a netlink interface for adding and disconnecting
      nbd devices.  This has the benefits of being easily extendable without
      breaking older userspace applications, and allows us to configure a nbd
      device without leaving a userspace app sitting waiting for the device to
      disconnect.
      
      With this interface we also gain the ability to configure more devices
      than are preallocated at insmod time.  We also gain the ability to
      omit a particular device and have one provided for us, so that
      userspace doesn't need to find a free device to configure.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
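      The "give me any free device" behaviour, plus growing past the
      preallocated count, can be sketched as below. This assumes a requested
      index of -1 stands in for omitting the device index, and uses a flat,
      growable array as a stand-in for whatever index allocator the driver
      actually uses; toy_connect() and struct dev_table are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct dev_table {
	bool *in_use;   /* in_use[i] is true once device i is configured */
	int count;      /* devices that currently exist */
};

/* Configure device 'requested', or any free one if requested < 0.
 * Creates new devices beyond the current count when needed.
 * Returns the index used, or -1 if busy or out of memory. */
static int toy_connect(struct dev_table *t, int requested)
{
	assert(t != NULL);
	if (requested < 0) {
		for (int i = 0; i < t->count; i++) {
			if (!t->in_use[i]) {
				requested = i;
				break;
			}
		}
		if (requested < 0)
			requested = t->count;   /* none free: create a new device */
	}
	if (requested >= t->count) {
		bool *grown = realloc(t->in_use,
				      (size_t)(requested + 1) * sizeof(*grown));
		if (!grown)
			return -1;
		for (int i = t->count; i <= requested; i++)
			grown[i] = false;
		t->in_use = grown;
		t->count = requested + 1;
	}
	if (t->in_use[requested])
		return -1;                      /* already configured */
	t->in_use[requested] = true;
	return requested;
}
```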
    • nbd: stop using the bdev everywhere · 29eaadc0
      Josef Bacik authored
      In preparation for the upcoming netlink interface we need to not rely on
      already having the bdev for the NBD device we are doing operations on.
      Instead of passing the bdev around, just use it in places where we know
      we already have the bdev.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • nbd: separate out the config information · 5ea8d108
      Josef Bacik authored
      In order to properly refcount the various aspects of a NBD device we
      need to separate out the configuration elements of the nbd device.  The
      configuration of a NBD device has a different lifetime from the actual
      device, so it doesn't make sense to bundle these two concepts.  Add a
      config_refs to keep track of the configuration structure, so that we
      can be sure that we never access it after we've torn down the device.
      Add a new nbd_config structure to hold all of the transient
      configuration information.  Finally, create this when we open the device
      so that it is in place when we start to configure the device.  This has
      a nice side effect of fixing a long-standing problem where you could end
      up with a half-configured nbd device that needed to be "disconnected" in
      order to be usable again.  Now once we close our device the
      configuration will be discarded.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
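      The split lifetime can be sketched with a separate reference count on the
      configuration: opening the device creates the config (or takes another
      reference), and dropping the last reference frees it while the device
      structure lives on. All names are hypothetical; this sketch is
      single-threaded, so a plain int replaces an atomic refcount, and
      toy_config_get() only mirrors the idea behind refcount_inc_not_zero().

```c
#include <assert.h>
#include <stdlib.h>

struct toy_dev_config {
	unsigned int timeout;          /* transient, per-configuration state */
};

struct toy_device {
	int config_refs;               /* 0 means no configuration exists */
	struct toy_dev_config *config;
};

/* Take a config reference only if a config already exists. */
static struct toy_dev_config *toy_config_get(struct toy_device *d)
{
	if (d->config_refs == 0)
		return NULL;
	d->config_refs++;
	return d->config;
}

/* Drop a reference; the last one discards the configuration. */
static void toy_config_put(struct toy_device *d)
{
	assert(d->config_refs > 0);
	if (--d->config_refs == 0) {
		free(d->config);
		d->config = NULL;
	}
}

/* Opening the device creates the config on first open. */
static int toy_device_open(struct toy_device *d)
{
	if (toy_config_get(d))
		return 0;
	d->config = calloc(1, sizeof(*d->config));
	if (!d->config)
		return -1;
	d->config_refs = 1;
	return 0;
}

/* Closing drops the opener's reference. */
static void toy_device_release(struct toy_device *d)
{
	toy_config_put(d);
}
```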
    • nbd: handle single path failures gracefully · f3733247
      Josef Bacik authored
      Currently if we have multiple connections and one of them goes down, we
      tear down the whole device.  However, there's no reason we need to do
      this, as we could have other connections that are working fine.  Deal
      with this by keeping track of the state of the different connections,
      and if we lose one, mark it as dead and send all IO destined for that
      socket to one of the other healthy sockets.  Any outstanding requests
      that were on the dead socket will time out and be re-submitted properly.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
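      A sketch of routing around a dead connection, assuming each request has a
      preferred socket index and that the fallback scans the remaining sockets
      in order. The commit does not say which healthy socket is chosen, and
      pick_socket() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Return the preferred socket if it is alive, otherwise the next live one
 * (wrapping around), or -1 if every connection is dead. */
static int pick_socket(const bool *dead, int nsocks, int preferred)
{
	assert(nsocks > 0 && preferred >= 0 && preferred < nsocks);
	if (!dead[preferred])
		return preferred;
	for (int i = 1; i < nsocks; i++) {
		int idx = (preferred + i) % nsocks;
		if (!dead[idx])
			return idx;
	}
	return -1;
}
```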
    • nbd: put socket in error cases · 9b1355d5
      Josef Bacik authored
      When adding a new socket we look it up and then try to add it to our
      configuration.  If any of those steps fails, we need to make sure we put
      the socket so we don't leak it.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
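      The fix follows the usual kernel goto-unwind pattern: once the lookup has
      taken a reference, every failure path after it must drop that reference.
      A self-contained sketch, where toy_sockfd_lookup()/toy_sockfd_put() are
      hypothetical stand-ins for the real lookup and put helpers and the two
      failure checks are illustrative only.

```c
#include <assert.h>
#include <errno.h>

enum { TOY_SOCK_STREAM = 1, TOY_SOCK_DGRAM = 2 };

struct toy_socket { int refs; int type; };

#define TOY_MAX_SOCKS 2
struct toy_cfg { struct toy_socket *socks[TOY_MAX_SOCKS]; int n; };

static struct toy_socket *toy_sockfd_lookup(struct toy_socket *s)
{
	s->refs++;                      /* lookup takes a reference */
	return s;
}

static void toy_sockfd_put(struct toy_socket *s)
{
	assert(s->refs > 0);
	s->refs--;
}

static int toy_add_socket(struct toy_cfg *cfg, struct toy_socket *raw)
{
	struct toy_socket *sock = toy_sockfd_lookup(raw);
	int err;

	if (sock->type != TOY_SOCK_STREAM) {
		err = -EINVAL;          /* unsupported socket */
		goto put_socket;
	}
	if (cfg->n == TOY_MAX_SOCKS) {
		err = -EBUSY;           /* no room in this toy config */
		goto put_socket;
	}
	cfg->socks[cfg->n++] = sock;    /* success: config keeps the reference */
	return 0;

put_socket:
	toy_sockfd_put(sock);           /* every failure path drops the lookup ref */
	return err;
}
```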
  2. 16 Apr, 2017 19 commits
  3. 14 Apr, 2017 7 commits
    • net: off by one in inet6_pton() · a88086e0
      Dan Carpenter authored
      If "scope_len" is sizeof(scope_id), then we would put the NUL terminator
      one byte beyond the end of the buffer.
      
      Fixes: b1a951fe ("net/utils: generic inet_pton_with_scope helper")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
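      The off-by-one is easiest to see in isolation. This hypothetical helper
      copies a length-delimited scope string into a fixed buffer and
      NUL-terminates it; because the terminator needs its own byte, a length
      equal to the buffer size must be rejected, so the bound check uses >=
      rather than >. It is not the actual inet6_pton() code.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Copy scope_len bytes of src into dst and NUL-terminate.
 * With '>' instead of '>=', scope_len == dstsz would write dst[dstsz]. */
static int copy_scope(char *dst, size_t dstsz, const char *src, size_t scope_len)
{
	assert(dst != NULL && src != NULL);
	if (scope_len >= dstsz)
		return -EINVAL;
	memcpy(dst, src, scope_len);
	dst[scope_len] = '\0';
	return 0;
}
```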
    • blk-mq: introduce Kyber multiqueue I/O scheduler · 00e04393
      Omar Sandoval authored
      The Kyber I/O scheduler is an I/O scheduler for fast devices designed to
      scale to multiple queues. Users configure only two knobs, the target
      read and synchronous write latencies, and the scheduler tunes itself to
      achieve those latency goals.
      
      The implementation is based on "tokens", built on top of the scalable
      bitmap library. Tokens serve as a mechanism for limiting requests. There
      are two tiers of tokens: queueing tokens and dispatch tokens.
      
      A queueing token is required to allocate a request. In fact, these
      tokens are actually the blk-mq internal scheduler tags, but the
      scheduler manages the allocation directly in order to implement its
      policy.
      
      Dispatch tokens are device-wide and split up into two scheduling
      domains: reads vs. writes. Each hardware queue dispatches batches
      round-robin between the scheduling domains as long as tokens are
      available for that domain.
      
      These tokens can be used as the mechanism to enable various policies.
      The policy Kyber uses is inspired by active queue management techniques
      for network routing, similar to blk-wbt. The scheduler monitors
      latencies and scales the number of dispatch tokens accordingly. Queueing
      tokens are used to prevent starvation of synchronous requests by
      asynchronous requests.
      
      Various extensions are possible, including better heuristics and ionice
      support. The new scheduler isn't set as the default yet.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
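      A toy model of the dispatch-token scheme, assuming one token per
      dispatched request, one request per round-robin turn rather than
      batches, and a placeholder latency rule (halve the depth on a miss,
      otherwise grow by one up to a cap). Kyber's real heuristics differ,
      queueing tokens are not modelled, and every name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_domain { DOM_READ, DOM_WRITE, NR_DOMAINS };

struct toy_kyber {
	unsigned int depth[NR_DOMAINS];     /* dispatch tokens per domain */
	unsigned int max_depth[NR_DOMAINS]; /* cap on depth (assumption) */
	unsigned int in_use[NR_DOMAINS];    /* tokens currently held */
	unsigned int queued[NR_DOMAINS];    /* requests waiting per domain */
	int cur;                            /* next domain to try */
};

static bool toy_get_token(struct toy_kyber *k, int d)
{
	if (k->in_use[d] >= k->depth[d])
		return false;
	k->in_use[d]++;
	return true;
}

/* Dispatch one request, rotating between domains that have both queued
 * work and a free token. Returns the domain used, or -1. */
static int toy_dispatch_one(struct toy_kyber *k)
{
	for (int i = 0; i < NR_DOMAINS; i++) {
		int d = (k->cur + i) % NR_DOMAINS;
		if (k->queued[d] > 0 && toy_get_token(k, d)) {
			k->queued[d]--;
			k->cur = (d + 1) % NR_DOMAINS;
			return d;
		}
	}
	return -1;
}

/* On completion, return the token and resize the domain's depth using the
 * placeholder policy described above. */
static void toy_complete_one(struct toy_kyber *k, int d,
			     unsigned int latency_us, unsigned int target_us)
{
	assert(k->in_use[d] > 0);
	k->in_use[d]--;
	if (latency_us > target_us) {
		if (k->depth[d] > 1)
			k->depth[d] /= 2;
	} else if (k->depth[d] < k->max_depth[d]) {
		k->depth[d]++;
	}
}
```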
    • blk-mq-sched: make completed_request() callback more useful · c05f8525
      Omar Sandoval authored
      Currently, this callback is called right after put_request() and has no
      distinguishable purpose. Instead, let's call it before put_request() as
      soon as I/O has completed on the request, before we account it in
      blk-stat. With this, Kyber can enable stats when it sees a latency
      outlier and make sure the outlier gets accounted.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: export helpers · 5b727272
      Omar Sandoval authored
      blk_mq_finish_request() is required for schedulers that define their own
      put_request(). blk_mq_run_hw_queue() is required for schedulers that
      hold back requests to be run later.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: add shallow depth option for blk_mq_get_tag() · 229a9287
      Omar Sandoval authored
      Wire up the sbitmap_get_shallow() operation to the tag code so that a
      caller can limit the number of tags available to it.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • sbitmap: add sbitmap_get_shallow() operation · c05e6673
      Omar Sandoval authored
      This operation supports the use case of limiting the number of bits that
      can be allocated for a given operation. Rather than setting aside some
      bits at the end of the bitmap, we can set aside bits in each word of the
      bitmap. This means we can keep the allocation hints spread out and
      support sbitmap_resize() nicely at the cost of lower granularity for the
      allowed depth.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
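      A sketch of the per-word shallow limit, assuming 8-bit words and a scan
      that always starts at word 0 (the real sbitmap starts from an allocation
      hint). toy_get_shallow() is a hypothetical name. With two words, a
      shallow depth of k exposes 2k bits, which is the coarser granularity the
      commit mentions: the allowed total moves in steps of the word count.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 8   /* small word size so the effect is easy to see */

struct toy_sbitmap {
	uint8_t words[4];
	unsigned int depth;   /* total bits; a multiple of WORD_BITS here */
};

/* Find and set a free bit, looking only at the first 'shallow' bits of
 * each word. Returns the bit number, or -1 if none is available. */
static int toy_get_shallow(struct toy_sbitmap *sb, unsigned int shallow)
{
	unsigned int nwords = sb->depth / WORD_BITS;

	assert(shallow >= 1 && shallow <= WORD_BITS && nwords <= 4);
	for (unsigned int w = 0; w < nwords; w++) {
		for (unsigned int b = 0; b < shallow; b++) {
			if (!(sb->words[w] & (1u << b))) {
				sb->words[w] |= (uint8_t)(1u << b);
				return (int)(w * WORD_BITS + b);
			}
		}
	}
	return -1;
}
```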
    • remove the mg_disk driver · 84253394
      Christoph Hellwig authored
      This driver was added in 2008, but as far as I can tell we never had a
      single platform that actually registered resources for the platform driver.
      
      It's also been unmaintained for a long time and apparently has an ATA mode
      that can be driven using the IDE/libata subsystem.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  4. 11 Apr, 2017 1 commit
    • block: Fix list corruption of blk stats callback list · 3f19cd23
      Jan Kara authored
      When CFQ calls wbt_disable_default(), it will call
      blk_stat_remove_callback() to stop gathering IO statistics for the
      purposes of writeback throttling. Later, when the request_queue is
      unregistered, wbt_exit() will call blk_stat_remove_callback() again,
      which will try to delete the callback from the list again and possibly cause
      list corruption.
      
      Fix the problem by making wbt_disable_default() call wbt_exit(), which
      is properly guarded against being called multiple times.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@fb.com>
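      The shape of the fix can be sketched with a toy intrusive list: removing
      the same node twice would dereference poisoned pointers, so every
      teardown path goes through one exit function that clears the owner's
      pointer and returns early on a second call. The names are hypothetical;
      the kernel's list_del() poisons with LIST_POISON values, not NULL.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_cb { struct toy_cb *prev, *next; };

static void toy_list_init(struct toy_cb *h) { h->prev = h->next = h; }

static void toy_list_add(struct toy_cb *n, struct toy_cb *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void toy_list_del(struct toy_cb *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;   /* a second delete would now crash */
}

struct toy_queue { struct toy_cb head; struct toy_cb *wb_cb; };

static int toy_wbt_init(struct toy_queue *q)
{
	struct toy_cb *cb = malloc(sizeof(*cb));
	if (!cb)
		return -1;
	toy_list_add(cb, &q->head);
	q->wb_cb = cb;
	return 0;
}

/* Guarded exit: safe to call any number of times. */
static void toy_wbt_exit(struct toy_queue *q)
{
	struct toy_cb *cb = q->wb_cb;
	if (!cb)
		return;             /* already torn down */
	toy_list_del(cb);
	free(cb);
	q->wb_cb = NULL;
}

/* The fix: disabling reuses the guarded exit instead of its own removal. */
static void toy_wbt_disable_default(struct toy_queue *q)
{
	toy_wbt_exit(q);
}
```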
  5. 10 Apr, 2017 2 commits
  6. 08 Apr, 2017 2 commits
    • scsi: sd: Remove LBPRZ dependency for discards · bcd069bb
      Martin K. Petersen authored
      Separating discards and zeroout operations allows us to remove the LBPRZ
      block zeroing constraints from discards and honor the device preferences
      for UNMAP commands.
      
      If supported by the device, we'll also choose UNMAP over one of the
      WRITE SAME variants for discards.
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • scsi: sd: Separate zeroout and discard command choices · e6bd9312
      Martin K. Petersen authored
      Now that zeroout and discards are distinct operations we need to
      separate the policy of choosing the appropriate command. Create a
      zeroing_mode which can be one of:
      
      write:			Zeroout assist not present, use regular WRITE
      writesame:		Allow WRITE SAME(10/16) with a zeroed payload
      writesame_16_unmap:	Allow WRITE SAME(16) with UNMAP
      writesame_10_unmap:	Allow WRITE SAME(10) with UNMAP
      
      The last two are conditional on the device being thin provisioned with
      LBPRZ=1 and LBPWS=1 or LBPWS10=1 respectively.
      
      Whether to set the UNMAP bit or not depends on the REQ_NOUNMAP flag. And
      if none of the _unmap variants are supported, regular WRITE SAME will be
      used if the device supports it.
      
      The zeroing_mode is exported in sysfs and the detected mode for a given
      device can be overridden using the string constants above.
      
      With this change in place we can now issue WRITE SAME(16) with UNMAP set
      for block zeroing applications that require hard guarantees and
      logical_block_size granularity. And at the same time use the UNMAP
      command with the device's preferred granularity and alignment for discard
      operations.
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>