- 21 May, 2010 3 commits
-
-
Jens Axboe authored
Commit 69b62d01 fixed up most of the places where we would enter busy schedule() spins when disabling the periodic background writeback. This fixes up the sb timer so that it doesn't get hammered on with the delay disabled, and ensures that it gets rearmed if needed when /proc/sys/vm/dirty_writeback_centisecs gets modified. bdi_forker_task() also needs to check for !dirty_writeback_centisecs and use schedule() appropriately, fix that up too. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
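A minimal sketch of the timer guard this describes, assuming the writeback code's dirty_writeback_interval variable (the centisecs sysctl) and the sync_supers timer of that era; the helper name is illustrative:

    /* Illustrative helper: rearm the superblock sync timer only while
     * periodic writeback is enabled; a zero interval means the user
     * disabled it via /proc/sys/vm/dirty_writeback_centisecs. */
    static void arm_supers_timer(void)
    {
            if (!dirty_writeback_interval)
                    return; /* leave the timer idle instead of re-firing */
            mod_timer(&sync_supers_timer, jiffies +
                      msecs_to_jiffies(dirty_writeback_interval * 10));
    }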
-
Jens Axboe authored
Calling schedule without setting the task state to non-running will return immediately, so ensure that we set it properly and check our sleep conditions after doing so. This is a fixup for commit 69b62d01. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
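The underlying idiom is the standard kernel sleep pattern; a minimal sketch (the wake condition is illustrative):

    /* Broken: with the task still TASK_RUNNING, schedule() returns
     * immediately and this degenerates into a busy spin. */
    while (!wake_condition())
            schedule();

    /* Fixed: mark the task non-running first, then re-check the
     * condition so a wakeup racing with the check is not lost. */
    while (!wake_condition()) {
            set_current_state(TASK_INTERRUPTIBLE);
            if (wake_condition())
                    break;
            schedule();
    }
    __set_current_state(TASK_RUNNING);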
-
Jens Axboe authored
Even if the writeout itself isn't a data integrity operation, we need to ensure that the caller doesn't drop the sb umount sem before we have actually done the writeback. This is a fixup for commit e913fc82. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
- 18 May, 2010 6 commits
-
-
Julia Lawall authored
Use kzalloc rather than the combination of kmalloc and memset. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/)

    // <smpl>
    @@
    expression x,size,flags;
    statement S;
    @@
    -x = kmalloc(size,flags);
    +x = kzalloc(size,flags);
    if (x == NULL) S
    -memset(x, 0, size);
    // </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
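In C terms the transformation is (struct and label names illustrative):

    /* before: allocate, check, then zero by hand */
    x = kmalloc(sizeof(*x), GFP_KERNEL);
    if (x == NULL)
            goto fail;
    memset(x, 0, sizeof(*x));

    /* after: kzalloc() allocates and zeroes in one step */
    x = kzalloc(sizeof(*x), GFP_KERNEL);
    if (x == NULL)
            goto fail;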
-
Philipp Reisner authored
The choice was to either delay creation of the new UUID until IO got thawed or to delay it until the first IO request. Both are correct; the latter is friendlier to users of dual-primary setups who actually only write on one side. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
If we detect late (= after grabbing mdev->req_lock) that IO got frozen, we return 1 to generic_make_request(), which will simply retry the request for that bio. In the subsequent call of generic_make_request() into drbd_make_request_26() we sleep in inc_ap_bio(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
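A rough sketch of where the retried bio then parks (names and the exact wait condition are illustrative, not the real drbd code):

    /* The bio retried by generic_make_request() blocks here until IO
     * gets thawed, instead of looping on a frozen device. */
    static inline void inc_ap_bio(struct drbd_conf *mdev)
    {
            wait_event(mdev->misc_wait, !io_suspended(mdev)); /* illustrative */
            atomic_inc(&mdev->ap_bio_cnt);
    }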
-
Lars Ellenberg authored
Now that the peer may handle multi-bio EEs, we can ignore the peer's limit, and concentrate on the limits of the local IO stack. This is safe across drbd protocol versions, as our queue_max_sectors() will be adjusted accordingly. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
This should allow for better background resync performance. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
This should allow for better performance if the lower level IO stack of the peers differs in limits exposed either via the queue, or via some merge_bvec_fn. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
- 17 May, 2010 31 commits
-
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
* Only send delay_probes with protocol 93 or newer
* drbd_send_delay_probes() is called only from worker context, no atomic_t needed for delay_seq

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
* Mention P_DELAY_PROBE in the packet naming array
* Do not corrupt the mdev->data.work list in case the timer goes off before delay_probe_work got handled by the worker
* Do not mod_timer() twice for a single delay_probe pair

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
In a setup with a high-bandwidth, high-latency network, possibly involving deep queues in routers, it is beneficial to fill those queues only to a limited extent with resync data. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
To reasonably control resync speed over drbd-proxy connections, drbd has to measure the current delay of packets transmitted over the (possibly congested) data socket vs the meta-data socket. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Delay probes are new packets in the DRBD protocol that allow DRBD to measure the current delay packets experience on the data socket, relative to the meta-data socket. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
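Conceptually, a probe pair carries a shared sequence number and a send timestamp, one copy per socket; the receiver compares arrival times. The struct below is illustrative, not the actual wire format:

    /* Illustrative probe payload: one copy is sent on the data socket,
     * one on the meta-data socket. The receiver pairs them by seq_num;
     * the difference in arrival times approximates the extra delay the
     * (possibly congested) data socket adds. */
    struct delay_probe {
            u32 seq_num;          /* pairs the two probes */
            struct timeval time;  /* sender timestamp */
    };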
-
Philipp Reisner authored
The "surplus" bits of the old (smaller) bitmap must be clean in case of online-grow without resync. Note: Reverted 67ae8b80d4a116ab3b7094eb3723506b20c06dff as well, since the lines added by this patch are redundant. The bits get set by the bm_set_surplus(b) call before that. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Adam Gandelman authored
Some wish to be notified of all instances of split brain, not just those that go unresolved. The initial-split-brain handler is called to notify someone upon detection of all split brain conditions even if auto-recovery policies are configured. Signed-off-by: Adam Gandelman <adam.gandelman@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
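drbd reports such events through its userspace helper mechanism; a hedged sketch of the notification (exact placement in the split-brain detection path is illustrative):

    /* Notify userspace that split brain was detected, before any
     * configured auto-recovery policy gets a chance to resolve it. */
    drbd_khelper(mdev, "initial-split-brain");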
-
Lars Ellenberg authored
The condition does not fit the comment (I may well be Primary, even if I lost the disk earlier and now the connection). And this is caught below anyway, where it also gets logged. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Even though it should never happen if the peer behaves, we need to double check and not even attempt access beyond the end of the device. It would usually be caught by lower layers, resulting in an "IO error", but it may also end up in the internal meta-data area. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
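The double check amounts to a capacity test before servicing the peer's request; a minimal sketch (error handling and the exact call site elided):

    /* Reject a peer request reaching past the end of the device;
     * size is in bytes, a sector is 512 bytes. */
    sector_t capacity = drbd_get_capacity(mdev->this_bdev);
    if (sector + (size >> 9) > capacity) {
            dev_err(DEV, "peer request beyond end of device\n"); /* illustrative message */
            return 0; /* fail the request instead of attempting access */
    }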
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
In case both nodes are "inconsistent", invalidate would have started a resync anyway, without a chance to ever succeed, just filling the logs with warning messages. Simply disallow that state change, re-using the SS_NO_UP_TO_DATE_DISK return value. This also changes the corresponding error string to "Need access to UpToDate Data" -- I found the "Refusing to be Primary without at least one UpToDate disk" answer misleading in some situations anyway. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
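A hedged sketch of the kind of guard this adds to the state machine (the exact condition drbd checks may differ):

    /* A resync between two Inconsistent disks can never reach UpToDate;
     * refuse the state change up front instead of spamming the logs. */
    if ((ns.conn == C_STARTING_SYNC_S || ns.conn == C_STARTING_SYNC_T) &&
        ns.disk == D_INCONSISTENT && ns.pdsk == D_INCONSISTENT)
            rv = SS_NO_UP_TO_DATE_DISK; /* "Need access to UpToDate Data" */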
-
Lars Ellenberg authored
Don't forget to drain the digest in case we cannot satisfy a checksum based resync or online-verify request. It would additionally cause a protocol error, dropping the connection. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
block_id may be ID_SYNCER, as well as checksum based resync request magic, or online verify magic. Let's just drop that ASSERT. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
commit e4f925e1
Author: Philipp Reisner <philipp.reisner@linbit.com>
Date:   Wed Mar 17 14:18:41 2010 +0100

    drbd: Do not upgrade state to Outdated if already Inconsistent

prevented the necessary state transition for attaching while connected (Diskless -> Consistent respectively Outdated). This is the fix for the fix.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
There was a race condition in a situation with a SyncSource+Primary and a SyncTarget+Secondary node, with a resync dependency on some other device: after both nodes decided to do the resync, the other device finishes its resync process. At that time the SyncSource has already sent the P_SYNC_UUID packet and already updated its peer disk state to Inconsistent. The SyncTarget node waits for the P_SYNC_UUID and sends a state packet to report the resync dependency change. That packet still carries a disk state of Outdated. Impact: if application writes come in on the Primary node during that time, they do not get replicated, and the out-of-sync counter gets increased. => The completion of the resync is not detected on the primary node. => stalled. Those blocks get resynced with the next resync, since they get marked as out-of-sync in the bitmap. In order to fix this, we filter out that wrong state change in the sanitize_state() function. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
To document that we know about the deprecation of proc_create(), even though we are not affected since we don't use the ->data member, open-code proc_create_data(..., NULL); Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Dmitry Monakhov authored
Filesystems with delalloc support may dirty the inode during writepages. As a result, the inode will have dirty metadata flags even after write_inode. In fact we have two dedicated functions for proper data and metadata writeback. It is reasonable to separate the flag updates into two stages. https://bugzilla.kernel.org/show_bug.cgi?id=15906 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
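A simplified sketch of the two-stage update in writeback_single_inode() (locking and sync-mode details elided):

    /* Stage 1: snapshot the dirty state and clear only the data flag;
     * writepages may re-dirty the inode's metadata. */
    dirty = inode->i_state & I_DIRTY;
    inode->i_state &= ~I_DIRTY_PAGES;
    spin_unlock(&inode_lock);

    ret = do_writepages(mapping, wbc);

    /* Stage 2: write the inode only if metadata was dirty to begin
     * with, now that the data has been pushed out. */
    if (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) {
            int err = write_inode(inode, wbc);
            if (ret == 0)
                    ret = err;
    }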
-
Jens Axboe authored
When umount calls sync_filesystem(), we first do a WB_SYNC_NONE writeback to kick off writeback of pending dirty inodes, then follow that up with a WB_SYNC_ALL to wait for it. Since umount already holds the sb s_umount mutex, WB_SYNC_NONE ends up doing nothing and all writeback happens as WB_SYNC_ALL. This can greatly slow down umount, since WB_SYNC_ALL writeback is a data integrity operation and thus a bigger hammer than simple WB_SYNC_NONE. For barrier aware file systems it's a lot slower. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
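The two passes in question look roughly like this (a simplified sketch of sync_filesystem() at the time):

    int sync_filesystem(struct super_block *sb)
    {
            int ret;

            /* Pass 1: WB_SYNC_NONE, meant to kick off writeback without
             * waiting -- effectively a no-op under umount, since umount
             * already holds sb->s_umount. */
            ret = __sync_filesystem(sb, 0);
            if (ret < 0)
                    return ret;
            /* Pass 2: WB_SYNC_ALL, the data-integrity pass that waits. */
            return __sync_filesystem(sb, 1);
    }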
-
Jens Axboe authored
Prior to 2.6.32, setting /proc/sys/vm/dirty_writeback_centisecs disabled periodic dirty writeback from kupdate. This got broken and now causes excessive sys CPU usage if set to zero, as we'll keep beating on schedule(). Cc: stable@kernel.org Reported-by: Justin Maggard <jmaggard10@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
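A minimal sketch of the restored behavior in the kupdate-style writeback loop (the enclosing loop is elided):

    /* dirty_writeback_centisecs == 0 disables periodic writeback:
     * sleep until explicitly woken instead of beating on schedule(). */
    if (!dirty_writeback_interval) {
            set_current_state(TASK_INTERRUPTIBLE);
            schedule();
            continue;
    }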
-