- 21 May, 2010 3 commits
-
-
Jens Axboe authored
Commit 69b62d01 fixed up most of the places where we would enter busy schedule() spins when disabling the periodic background writeback. This fixes up the sb timer so that it doesn't get hammered on with the delay disabled, and ensures that it gets rearmed if needed when /proc/sys/vm/dirty_writeback_centisecs gets modified. bdi_forker_task() also needs to check for !dirty_writeback_centisecs and use schedule() appropriately, fix that up too. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
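A minimal sketch of the timer guard this describes, assuming the writeback code's dirty_writeback_interval variable (the centisecs sysctl) and the sync_supers timer of that era; the helper name is illustrative:

    /* Illustrative helper: rearm the superblock sync timer only while
     * periodic writeback is enabled; a zero interval means the user
     * disabled it via /proc/sys/vm/dirty_writeback_centisecs. */
    static void arm_supers_timer(void)
    {
            if (!dirty_writeback_interval)
                    return; /* leave the timer idle instead of re-firing */
            mod_timer(&sync_supers_timer, jiffies +
                      msecs_to_jiffies(dirty_writeback_interval * 10));
    }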
-
Jens Axboe authored
Calling schedule without setting the task state to non-running will return immediately, so ensure that we set it properly and check our sleep conditions after doing so. This is a fixup for commit 69b62d01. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
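The underlying idiom is the standard kernel sleep pattern; a minimal sketch (the wake condition is illustrative):

    /* Broken: with the task still TASK_RUNNING, schedule() returns
     * immediately and this degenerates into a busy spin. */
    while (!wake_condition())
            schedule();

    /* Fixed: mark the task non-running first, then re-check the
     * condition so a wakeup racing with the check is not lost. */
    while (!wake_condition()) {
            set_current_state(TASK_INTERRUPTIBLE);
            if (wake_condition())
                    break;
            schedule();
    }
    __set_current_state(TASK_RUNNING);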
-
Jens Axboe authored
Even if the writeout itself isn't a data integrity operation, we need to ensure that the caller doesn't drop the sb umount sem before we have actually done the writeback. This is a fixup for commit e913fc82. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
- 18 May, 2010 6 commits
-
-
Julia Lawall authored
Use kzalloc rather than the combination of kmalloc and memset. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/)

    // <smpl>
    @@
    expression x,size,flags;
    statement S;
    @@
    -x = kmalloc(size,flags);
    +x = kzalloc(size,flags);
    if (x == NULL) S
    -memset(x, 0, size);
    // </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
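In C terms the transformation is (struct and label names illustrative):

    /* before: allocate, check, then zero by hand */
    x = kmalloc(sizeof(*x), GFP_KERNEL);
    if (x == NULL)
            goto fail;
    memset(x, 0, sizeof(*x));

    /* after: kzalloc() allocates and zeroes in one step */
    x = kzalloc(sizeof(*x), GFP_KERNEL);
    if (x == NULL)
            goto fail;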
-
Philipp Reisner authored
The choice was to either delay creation of the new UUID until IO got thawed or to delay it until the first IO request. Both are correct; the latter is friendlier to users of dual-primary setups who actually only write on one side. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
If we detect late (= after grabbing mdev->req_lock) that IO got frozen, we return 1 to generic_make_request(), which will simply retry the request for that bio. In the subsequent call of generic_make_request() into drbd_make_request_26() we sleep in inc_ap_bio(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
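A rough sketch of where the retried bio then parks (names and the exact wait condition are illustrative, not the real drbd code):

    /* The bio retried by generic_make_request() blocks here until IO
     * gets thawed, instead of looping on a frozen device. */
    static inline void inc_ap_bio(struct drbd_conf *mdev)
    {
            wait_event(mdev->misc_wait, !io_suspended(mdev)); /* illustrative */
            atomic_inc(&mdev->ap_bio_cnt);
    }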
-
Lars Ellenberg authored
Now that the peer may handle multi-bio EEs, we can ignore the peer's limit, and concentrate on the limits of the local IO stack. This is safe across drbd protocol versions, as our queue_max_sectors() will be adjusted accordingly. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
This should allow for better background resync performance. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
This should allow for better performance if the lower level IO stack of the peers differs in limits exposed either via the queue, or via some merge_bvec_fn. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
- 17 May, 2010 31 commits
-
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
* Only send delay_probes with protocol 93 or newer
* drbd_send_delay_probes() is called only from worker context, no atomic_t needed for delay_seq

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
* Mention P_DELAY_PROBE in the packet naming array
* Do not corrupt the mdev->data.work list in case the timer goes off before delay_probe_work got handled by the worker
* Do not mod_timer() twice for a single delay_probe pair

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
In a setup with a high-bandwidth, high-latency network, possibly involving deep queues in routers, it is beneficial to fill those queues only to a limited extent with resync data. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
To reasonably control resync speed over drbd-proxy connections, drbd has to measure the current delay of packets transmitted over the (possibly congested) data socket vs the meta-data socket. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Delay probes are new packets in the DRBD protocol that allow DRBD to measure the current delay packets experience on the data socket, relative to the meta-data socket. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
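Conceptually, a probe pair carries a shared sequence number and a send timestamp, one copy per socket; the receiver compares arrival times. The struct below is illustrative, not the actual wire format:

    /* Illustrative probe payload: one copy is sent on the data socket,
     * one on the meta-data socket. The receiver pairs them by seq_num;
     * the difference in arrival times approximates the extra delay the
     * (possibly congested) data socket adds. */
    struct delay_probe {
            u32 seq_num;          /* pairs the two probes */
            struct timeval time;  /* sender timestamp */
    };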
-
Philipp Reisner authored
The "surplus" bits of the old (smaller) bitmap must be clean in case of online-grow without resync. Note: Reverted 67ae8b80d4a116ab3b7094eb3723506b20c06dff as well, since the lines added by this patch are redundant. The bits get set by the bm_set_surplus(b) call before that. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Adam Gandelman authored
Some wish to be notified of all instances of split brain, not just those that go unresolved. The initial-split-brain handler is called to notify someone upon detection of all split brain conditions even if auto-recovery policies are configured. Signed-off-by: Adam Gandelman <adam.gandelman@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
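drbd reports such events through its userspace helper mechanism; a hedged sketch of the notification (exact placement in the split-brain detection path is illustrative):

    /* Notify userspace that split brain was detected, before any
     * configured auto-recovery policy gets a chance to resolve it. */
    drbd_khelper(mdev, "initial-split-brain");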
-
Lars Ellenberg authored
The condition does not fit the comment (I may well be Primary, even if I lost the disk earlier and now the connection). And this is caught below anyway, where it also gets logged. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Even though it should never happen if the peer behaves, we need to double check and not even attempt access beyond the end of the device. It would usually be caught by lower layers, resulting in an "IO error", but it may also end up in the internal meta-data area. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
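The double check amounts to a capacity test before servicing the peer's request; a minimal sketch (error handling and the exact call site elided):

    /* Reject a peer request reaching past the end of the device;
     * size is in bytes, a sector is 512 bytes. */
    sector_t capacity = drbd_get_capacity(mdev->this_bdev);
    if (sector + (size >> 9) > capacity) {
            dev_err(DEV, "peer request beyond end of device\n"); /* illustrative message */
            return 0; /* fail the request instead of attempting access */
    }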
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
In case both nodes are "inconsistent", invalidate would have started a resync anyway, without a chance to ever succeed, just filling the logs with warning messages. Simply disallow that state change, re-using the SS_NO_UP_TO_DATE_DISK return value. This also changes the corresponding error string to "Need access to UpToDate Data" -- I found the "Refusing to be Primary without at least one UpToDate disk" answer misleading in some situations anyway. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
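A hedged sketch of the kind of guard this adds to the state machine (the exact condition drbd checks may differ):

    /* A resync between two Inconsistent disks can never reach UpToDate;
     * refuse the state change up front instead of spamming the logs. */
    if ((ns.conn == C_STARTING_SYNC_S || ns.conn == C_STARTING_SYNC_T) &&
        ns.disk == D_INCONSISTENT && ns.pdsk == D_INCONSISTENT)
            rv = SS_NO_UP_TO_DATE_DISK; /* "Need access to UpToDate Data" */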
-
Lars Ellenberg authored
Don't forget to drain the digest in case we cannot satisfy a checksum based resync or online-verify request. It would additionally cause a protocol error, dropping the connection. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
block_id may be ID_SYNCER, as well as checksum based resync request magic, or online verify magic. Let's just drop that ASSERT. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
commit e4f925e1
Author: Philipp Reisner <philipp.reisner@linbit.com>
Date:   Wed Mar 17 14:18:41 2010 +0100

    drbd: Do not upgrade state to Outdated if already Inconsistent

prevented the necessary state transition for attaching while connected (Diskless -> Consistent respectively Outdated). This is the fix for the fix.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
There was a race condition in a situation with a SyncSource+Primary and a SyncTarget+Secondary node, with a resync dependency on some other device: after both nodes decided to do the resync, the other device finishes its resync process. At that time the SyncSource has already sent the P_SYNC_UUID packet and already updated its peer disk state to Inconsistent. The SyncTarget node waits for the P_SYNC_UUID and sends a state packet to report the resync dependency change. That packet still carries a disk state of Outdated. Impact: if application writes come in on the Primary node during that time, they do not get replicated, and the out-of-sync counter gets increased. => The completion of the resync is not detected on the primary node. => stalled. Those blocks get resynced with the next resync, since they get marked as out-of-sync in the bitmap. In order to fix this, we filter out that wrong state change in the sanitize_state() function. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
To document that we know about the deprecation of proc_create(), even though we are not affected since we don't use the ->data member, open-code proc_create_data(..., NULL); Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Dmitry Monakhov authored
Filesystems with delalloc support may dirty the inode during writepages. As a result, the inode will have dirty metadata flags even after write_inode. In fact we have two dedicated functions for proper data and metadata writeback. It is reasonable to separate the flag updates into two stages. https://bugzilla.kernel.org/show_bug.cgi?id=15906 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
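A simplified sketch of the two-stage update in writeback_single_inode() (locking and sync-mode details elided):

    /* Stage 1: snapshot the dirty state and clear only the data flag;
     * writepages may re-dirty the inode's metadata. */
    dirty = inode->i_state & I_DIRTY;
    inode->i_state &= ~I_DIRTY_PAGES;
    spin_unlock(&inode_lock);

    ret = do_writepages(mapping, wbc);

    /* Stage 2: write the inode only if metadata was dirty to begin
     * with, now that the data has been pushed out. */
    if (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) {
            int err = write_inode(inode, wbc);
            if (ret == 0)
                    ret = err;
    }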
-
Jens Axboe authored
When umount calls sync_filesystem(), we first do a WB_SYNC_NONE writeback to kick off writeback of pending dirty inodes, then follow that up with a WB_SYNC_ALL to wait for it. Since umount already holds the sb s_umount mutex, WB_SYNC_NONE ends up doing nothing and all writeback happens as WB_SYNC_ALL. This can greatly slow down umount, since WB_SYNC_ALL writeback is a data integrity operation and thus a bigger hammer than simple WB_SYNC_NONE. For barrier aware file systems it's a lot slower. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
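The two passes in question look roughly like this (a simplified sketch of sync_filesystem() at the time):

    int sync_filesystem(struct super_block *sb)
    {
            int ret;

            /* Pass 1: WB_SYNC_NONE, meant to kick off writeback without
             * waiting -- effectively a no-op under umount, since umount
             * already holds sb->s_umount. */
            ret = __sync_filesystem(sb, 0);
            if (ret < 0)
                    return ret;
            /* Pass 2: WB_SYNC_ALL, the data-integrity pass that waits. */
            return __sync_filesystem(sb, 1);
    }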
-
Jens Axboe authored
Prior to 2.6.32, setting /proc/sys/vm/dirty_writeback_centisecs disabled periodic dirty writeback from kupdate. This got broken and now causes excessive sys CPU usage if set to zero, as we'll keep beating on schedule(). Cc: stable@kernel.org Reported-by: Justin Maggard <jmaggard10@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
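A minimal sketch of the restored behavior in the kupdate-style writeback loop (the enclosing loop is elided):

    /* dirty_writeback_centisecs == 0 disables periodic writeback:
     * sleep until explicitly woken instead of beating on schedule(). */
    if (!dirty_writeback_interval) {
            set_current_state(TASK_INTERRUPTIBLE);
            schedule();
            continue;
    }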
-