1. 22 May, 2012 21 commits
  2. 20 May, 2012 10 commits
    • md/raid10: split out interpretation of layout to separate function. · deb200d0
      NeilBrown authored
      We will soon be interpreting the layout (and chunksize etc) from
      multiple places to support reshape.  So split it out into separate
      function.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: Introduce 'prev' geometry to support reshape. · f8c9e74f
      NeilBrown authored
      When RAID10 supports reshape it will need a 'previous' and a 'current'
      geometry, so introduce that here.
      Use the 'prev' geometry when before the reshape_position, and the
      current 'geo' when beyond it.  At other times, use both as
      appropriate.
      
      For now, both are identical (and reshape_position is never set).
      
      When we use the 'prev' geometry, we must use the old data_offset.
      When we use the current geometry (and a reshape is happening) we must
      use the new_data_offset.
      Signed-off-by: NeilBrown <neilb@suse.de>
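The selection rule described in this commit message can be sketched in user-space C. This is only an illustration of the rule as worded above, not the kernel code: the struct layouts, the `reshape_active` flag, and the helper names `pick_geometry` and `pick_data_offset` are all hypothetical, and "before the reshape_position" is assumed to mean a sector number below it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical, simplified stand-in for the geometry fields. */
struct geom_sketch {
    int raid_disks;
    int near_copies;
    int far_copies;
    sector_t chunk_sectors;
};

/* Hypothetical per-array state: two geometries and the boundary between them. */
struct conf_sketch {
    struct geom_sketch prev;    /* layout before the reshape */
    struct geom_sketch geo;     /* layout after the reshape */
    sector_t reshape_position;  /* boundary between the two layouts */
    bool reshape_active;        /* false: reshape_position is not set */
};

/* Hypothetical per-device offsets. */
struct rdev_sketch {
    sector_t data_offset;
    sector_t new_data_offset;
};

/* Use 'prev' before reshape_position, the current 'geo' otherwise. */
static const struct geom_sketch *
pick_geometry(const struct conf_sketch *conf, sector_t sector, bool *use_prev)
{
    assert(conf != NULL && use_prev != NULL);
    *use_prev = conf->reshape_active && sector < conf->reshape_position;
    return *use_prev ? &conf->prev : &conf->geo;
}

/* 'prev' pairs with the old offset; 'geo' during a reshape pairs with the new one. */
static sector_t
pick_data_offset(const struct conf_sketch *conf,
                 const struct rdev_sketch *rdev, bool use_prev)
{
    assert(conf != NULL && rdev != NULL);
    if (use_prev || !conf->reshape_active)
        return rdev->data_offset;
    return rdev->new_data_offset;
}
```

With no reshape in progress the sketch always returns `geo` and the old `data_offset`, which matches the statement that both geometries are identical for now.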
    • md: use resync_max_sectors for reshape as well as resync. · c804cdec
      NeilBrown authored
      Some resync type operations need to act on the address space of the
      device, others on the address space of the array.
      
      This only affects RAID10, so it sets resync_max_sectors to the array
      size (it defaults to the device size), and that is currently used for
      resync only.  However reshape of a RAID10 must be done against the
      array size, not device size, so change code to use resync_max_sectors
      for both the resync and the reshape cases.
      This does not affect RAID5 or RAID1, just RAID10.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: teach sync_page_io about new_data_offset. · 1fdd6fc9
      NeilBrown authored
      Some code in raid1 and raid10 uses sync_page_io to
      read/write pages when responding to read errors.
      As we will shortly support changing data_offset for
      raid10, this function must understand new_data_offset.
      
      So add that understanding.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: collect some geometry fields into a dedicated structure. · 5cf00fcd
      NeilBrown authored
      We will shortly be adding reshape support for RAID10 which will
      require it to have two concurrent geometries (before and after).
      To make that easier, collect most geometry fields into 'struct geom'
      and access them from there.  Then we will more easily be able to add
      a second set of fields.
      
      Note that 'copies' is not in this struct and so cannot be changed.
      There is little need to change this number and doing so is a lot
      more difficult as it requires reallocating more things.
      So leave it out for now.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: allow for change in data_offset while managing a reshape. · b5254dd5
      NeilBrown authored
      The important issue here is incorporating the difference in data_offset
      into calculations concerning when we might need to over-write data
      that is still thought to be valid.
      
      To this end we find the minimum offset difference across all devices
      and add that where appropriate.
      Signed-off-by: NeilBrown <neilb@suse.de>
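As a rough illustration of "the minimum offset difference across all devices", the sketch below scans a set of devices and returns the smallest signed change in data_offset. It assumes "minimum" means the smallest signed value of new_data_offset minus data_offset; the kernel code may well treat the reshape direction differently. The struct and the function name `min_offset_diff_sketch` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device offsets, in sectors. */
struct rdev_offsets {
    uint64_t data_offset;
    uint64_t new_data_offset;
};

/* Smallest signed (new_data_offset - data_offset) over all devices. */
static int64_t min_offset_diff_sketch(const struct rdev_offsets *rdevs, size_t n)
{
    assert(rdevs != NULL && n > 0);

    int64_t min_diff = (int64_t)rdevs[0].new_data_offset -
                       (int64_t)rdevs[0].data_offset;

    for (size_t i = 1; i < n; i++) {
        int64_t diff = (int64_t)rdevs[i].new_data_offset -
                       (int64_t)rdevs[i].data_offset;
        if (diff < min_diff)
            min_diff = diff;
    }
    return min_diff;
}
```

A single array-wide value is convenient because the overwrite check then only needs one adjustment term rather than one per device.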
    • md/raid5: Use correct data_offset for all IO. · 05616be5
      NeilBrown authored
      As there can now be two different data_offsets - an 'old' and
      a 'new' - we need to carefully choose between them.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: add possibility to change data-offset for devices. · c6563a8c
      NeilBrown authored
      When reshaping we can avoid costly intermediate backup by
      changing the 'start' address of the array on the device
      (if there is enough room).
      
      So as a first step, allow such a change to be requested
      through sysfs, and recorded in v1.x metadata.
      
      (As we didn't previously check that all 'pad' fields were zero,
       we need a new FEATURE flag for this.
       And (belatedly) check that all remaining 'pad' fields are
       zero to avoid a repeat of this.)
      
      The new data offset must be requested separately for each device.
      This allows each to have a different change in the data offset.
      This is not likely to be used often but as data_offset can be
      set per-device, new_data_offset should be too.
      
      This patch also removes the 'acknowledged' arg to rdev_set_badblocks as
      it is never used and never will be.  At the same time we add a new
      arg ('in_new') which is currently always zero but will be used more
      soon.
      
      When a reshape finishes we will need to update the data_offset
      and rdev->sectors.  So provide an exported function to do that.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: allow a reshape operation to be reversed. · 2c810cdd
      NeilBrown authored
      Currently a reshape operation always progresses from the start
      of the array to the end unless the number of devices is being
      reduced, in which case it progresses in the opposite direction.
      
      To reverse a partial reshape which changes the number of devices
      you can stop the array and re-assemble with the raid-disks numbers
      reversed and it will undo.
      
      However for a reshape that does not change the number of devices
      it is not possible to reverse the reshape in the middle - you have to
      wait until it completes.
      
      So add a 'reshape_direction' attribute which is either 'forwards' or
      'backwards' and can be explicitly set when delta_disks is zero.
      
      This will become more important when we allow the data_offset to
      change in a reshape.  Then the explicit statement of what direction is
      being used will be more useful.
      
      This can be enabled in raid5 trivially as it already supports
      reverse reshape and just needs to use a different trigger to request it.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: using GFP_NOIO to allocate bio for flush request · b5e1b8ce
      Shaohua Li authored
      A flush request is usually issued in transaction commit code path, so
      using GFP_KERNEL to allocate memory for flush request bio falls into
      the classic deadlock issue.
      
      This is suitable for any -stable kernel to which it applies as it
      avoids a possible deadlock.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  3. 18 May, 2012 1 commit
    • md/raid10: fix transcription error in calc_sectors conversion. · b0d634d5
      NeilBrown authored
      The old code was
      		sector_div(stride, fc);
      the new code was
      		sector_div(size, conf->near_copies);
      
      'size' is right (the stride variable wasn't really needed), but
      'fc' means 'far_copies', and that is an important difference.
      
      Signed-off-by: NeilBrown <neilb@suse.de>
  4. 17 May, 2012 2 commits
    • MD: Add del_timer_sync to mddev_suspend (fix nasty panic) · 0d9f4f13
      Jonathan Brassow authored
      Use del_timer_sync to remove timer before mddev_suspend finishes.
      
      We don't want a timer going off after an mddev_suspend is called.  This is
      especially true with device-mapper, since it can call the destructor function
      immediately following a suspend.  This results in the removal (kfree) of the
      structures upon which the timer depends - resulting in a very ugly panic.
      Therefore, we add a del_timer_sync to mddev_suspend to prevent this.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: set dev_sectors properly when resizing devices in array. · 6508fdbf
      NeilBrown authored
      raid10 stores dev_sectors in 'conf' separately from the one in
      'mddev' because it can have a very significant effect on block
      addressing and so needs to be updated carefully.
      
      However raid10_resize isn't updating it at all!
      
      To update it correctly, we need to make sure it is a proper
      multiple of the chunksize, taking various details of the layout
      into account.
      This calculation is currently done in setup_conf.   So split it
      out from there and call it from raid10_resize as well.
      Then set conf->dev_sectors properly.
      Signed-off-by: NeilBrown <neilb@suse.de>
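The "proper multiple of the chunksize" requirement can be illustrated with a small rounding helper. This is a deliberately reduced sketch: it only rounds a per-device size down to a whole number of chunks and ignores the layout details (copies, number of devices) that the commit message says the real calculation takes into account. The function name `round_down_to_chunk` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round a size in sectors down to a whole number of chunks. */
static uint64_t round_down_to_chunk(uint64_t sectors, uint64_t chunk_sectors)
{
    assert(chunk_sectors > 0);
    return sectors - (sectors % chunk_sectors);
}
```

For example, with 128-sector chunks a device of 1000 sectors would contribute 896 usable sectors (7 whole chunks) under this simplified rule.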
  5. 04 May, 2012 1 commit
  6. 29 Apr, 2012 5 commits
    • Linux 3.4-rc5 · 69964ea4
      Linus Torvalds authored
    • Merge tag 'pm-for-3.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 6cfdd02b
      Linus Torvalds authored
      Pull power management fixes from Rafael J. Wysocki:
       "Fix for an issue causing hibernation to hang on systems with highmem
        (that practically means i386) due to broken memory management (bug
        introduced in 3.2, so -stable material) and PM documentation update
        making the freezer documentation follow the code again after some
        recent updates."
      
      * tag 'pm-for-3.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PM / Freezer / Docs: Update documentation about freezing of tasks
        PM / Hibernate: fix the number of pages used for hibernate/thaw buffering
    • autofs: make the autofsv5 packet file descriptor use a packetized pipe · 64f371bc
      Linus Torvalds authored
      The autofs packet size has had a very unfortunate size problem on x86:
      because the alignment of 'u64' differs in 32-bit and 64-bit modes, and
      because the packet data was not 8-byte aligned, the size of the autofsv5
      packet structure differed between 32-bit and 64-bit modes despite
      looking otherwise identical (300 vs 304 bytes respectively).
      
      We first fixed that up by making the 64-bit compat mode know about this
      problem in commit a32744d4 ("autofs: work around unhappy compat
      problem on x86-64"), and that made a 32-bit 'systemd' work happily on a
      64-bit kernel because everything then worked the same way as on a 32-bit
      kernel.
      
      But it turned out that 'automount' had actually known and worked around
      this problem in user space, so fixing the kernel to do the proper 32-bit
      compatibility handling actually *broke* 32-bit automount on a 64-bit
      kernel, because it knew that the packet sizes were wrong and expected
      those incorrect sizes.
      
      As a result, we ended up reverting that compatibility mode fix, and
      thus breaking systemd again, in commit fcbf94b9.
      
      With both automount and systemd doing a single read() system call, and
      verifying that they get *exactly* the size they expect but using
      different sizes, it seemed that fixing one of them would inevitably
      break the other.  At one point, a patch I seriously considered applying
      from Michael Tokarev did a "strcmp()" to see if it was automount that
      was doing the operation.  Ugly, ugly.
      
      However, a prettier solution exists now thanks to the packetized pipe
      mode.  By marking the communication pipe as being packetized (by simply
      setting the O_DIRECT flag), we can always just write the bigger packet
      size, and if user-space does a smaller read, it will just get that
      partial end result and the extra alignment padding will simply be thrown
      away.
      
      This makes both automount and systemd happy, since they now get the size
      they asked for, and the kernel side of autofs simply no longer needs to
      care - it could pad out the packet arbitrarily.
      
      Of course, if there is some *other* user of autofs (please, please,
      please tell me it ain't so - and we haven't heard of any) that tries to
      read the packets with multiple reads, that other user will now be
      broken - the whole point of the packetized mode is that one system call
      gets exactly one packet, and you cannot read a packet in pieces.
      Tested-by: Michael Tokarev <mjt@tls.msk.ru>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: David Miller <davem@davemloft.net>
      Cc: Ian Kent <raven@themaw.net>
      Cc: Thomas Meyer <thomas@m3y3r.de>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
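The 300 versus 304 byte difference comes purely from the alignment of the one 64-bit field. The sketch below reproduces the effect on any host by forcing the alignment with the GCC/Clang `aligned` attribute on a typedef. The struct is a simplified stand-in, not the real autofs definition: the field names, the 256-byte name buffer, and the type names `u64_align4` and `u64_align8` are assumptions chosen so that the fields add up to 300 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit integer with 4-byte alignment, as on 32-bit x86 (GCC/Clang extension). */
typedef uint64_t __attribute__((aligned(4))) u64_align4;
/* 64-bit integer with 8-byte alignment, as on x86-64. */
typedef uint64_t __attribute__((aligned(8))) u64_align8;

#define NAME_LEN_SKETCH 256 /* assumed name buffer length */

/* Simplified packet as a 32-bit build would lay it out. */
struct packet_32 {
    uint32_t header[2];
    uint32_t wait_queue_token;
    uint32_t dev;
    u64_align4 ino;
    uint32_t uid, gid, pid, tgid;
    uint32_t len;
    char name[NAME_LEN_SKETCH];
};

/* Same fields, but the 64-bit member forces 8-byte struct alignment. */
struct packet_64 {
    uint32_t header[2];
    uint32_t wait_queue_token;
    uint32_t dev;
    u64_align8 ino;
    uint32_t uid, gid, pid, tgid;
    uint32_t len;
    char name[NAME_LEN_SKETCH];
};

/* The fields sum to 300 bytes; 8-byte alignment pads the total to 304. */
static_assert(sizeof(struct packet_32) == 300, "4-byte aligned layout");
static_assert(sizeof(struct packet_64) == 304, "8-byte aligned layout");

static size_t packet_size_32(void) { return sizeof(struct packet_32); }
static size_t packet_size_64(void) { return sizeof(struct packet_64); }
```

Every field sits at the same offset in both layouts; only the trailing padding differs, which is why the two structures look identical yet a reader expecting an exact size sees a mismatch.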
    • PM / Freezer / Docs: Update documentation about freezing of tasks · 26e0f90f
      Marcos Paulo de Souza authored
      The file Documentation/power/freezing-of-tasks.txt was still referencing
      the TIF_FREEZE flag, which was removed by commit d88e4cb6
      ("freezer: remove now unused TIF_FREEZE").
      
      This patch removes all the references of TIF_FREEZE that were left
      behind.
      Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    • pipes: add a "packetized pipe" mode for writing · 9883035a
      Linus Torvalds authored
      The actual internal pipe implementation is already really about
      individual packets (called "pipe buffers"), and this simply exposes that
      as a special packetized mode.
      
      When we are in the packetized mode (marked by O_DIRECT as suggested by
      Alan Cox), a write() on a pipe will not merge the new data with previous
      writes, so each write will get a pipe buffer of its own.  The pipe
      buffer is then marked with the PIPE_BUF_FLAG_PACKET flag, which in turn
      will tell the reader side to break the read at that boundary (and throw
      away any partial packet contents that do not fit in the read buffer).
      
      End result: as long as you do writes less than PIPE_BUF in size (so that
      the pipe doesn't have to split them up), you can now treat the pipe as a
      packet interface, where each read() system call will read one packet at
      a time.  You can just use a sufficiently big read buffer (PIPE_BUF is
      sufficient, since bigger than that doesn't guarantee atomicity anyway),
      and the return value of the read() will naturally give you the size of
      the packet.
      
      NOTE! We do not support zero-sized packets, and zero-sized reads and
      writes to a pipe continue to be no-ops.  Also note that big packets will
      currently be split at write time, but that the size at which that
      happens is not really specified (except that it's bigger than PIPE_BUF).
      Currently that limit is the system page size, but we might want to
      explicitly support bigger packets some day.
      
      The main user for this is going to be the autofs packet interface,
      allowing us to stop having to care so deeply about exact packet sizes
      (which have had bugs with 32/64-bit compatibility modes).  But user
      space can create packetized pipes with "pipe2(fd, O_DIRECT)", which will
      fail with an EINVAL on kernels that do not support this interface.
      Tested-by: Michael Tokarev <mjt@tls.msk.ru>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: David Miller <davem@davemloft.net>
      Cc: Ian Kent <raven@themaw.net>
      Cc: Thomas Meyer <thomas@m3y3r.de>
      Cc: stable@kernel.org  # needed for systemd/autofs interaction fix
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
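The packetized mode described in this commit message can be tried from user space with a few system calls. The following is a minimal sketch, assuming Linux with glibc; the helper name `packet_pipe_demo` is hypothetical. It adds O_NONBLOCK (not mentioned in the commit) purely so that the demo cannot hang if the pipe turns out not to be packetized.

```c
#define _GNU_SOURCE /* for pipe2() and O_DIRECT */
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write packets into an O_DIRECT pipe and record what each read() returns.
 * Returns 0 on success, -1 on any failure (for example EINVAL from pipe2()
 * on kernels without packetized pipes).
 */
static int packet_pipe_demo(ssize_t sizes[3])
{
    int fds[2];
    char big[PIPE_BUF]; /* large enough for any atomic packet */
    char small[4];      /* deliberately smaller than the second packet */
    int rc = -1;

    assert(sizes != NULL);

    /* O_NONBLOCK is an addition for safety: reads fail instead of blocking. */
    if (pipe2(fds, O_DIRECT | O_NONBLOCK) < 0)
        return -1;

    /* Two separate writes become two separate packets. */
    if (write(fds[1], "hello", 5) != 5 ||
        write(fds[1], "packetized", 10) != 10)
        goto out;

    /* A big buffer receives exactly one packet per read(). */
    sizes[0] = read(fds[0], big, sizeof(big));
    /* A short read takes part of the packet; the rest is thrown away. */
    sizes[1] = read(fds[0], small, sizeof(small));

    /* The next read sees the next packet, not the discarded remainder. */
    if (write(fds[1], "x", 1) != 1)
        goto out;
    sizes[2] = read(fds[0], big, sizeof(big));

    if (sizes[0] >= 0 && sizes[1] >= 0 && sizes[2] >= 0)
        rc = 0;
out:
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

On a kernel that supports this interface, the three reads should return 5, 4 and 1 bytes: one packet, a truncated packet, and then the following packet. That is the behaviour autofs relies on when user space asks for a smaller packet than the kernel wrote.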