- 09 Nov, 2012 16 commits
-
-
Andreas Gruenbacher authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
If for some reason (typically "split-brained" cluster manager) drbd replica data has diverged, we can chose a victim, and reconnect using "--discard-my-data", causing the victim to become sync-target, fetching all changed blocks from the peer. If we are Primary, we are potentially in use, and we refuse to "roll back" changes to the data below the page cache and other users. Rename the error symbol for this to ERR_DISCARD_IMPOSSIBLE. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
We don't discard anything here, really. We resolve conflicting, concurrent writes to overlapping data blocks. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
To avoid confusion with REQ_DISCARD aka TRIM, rename our "discard concurrent write acks" from P_DISCARD_WRITE to P_SUPERSEDED. At the same time, rename the drbd request event DISCARD_WRITE to CONFLICT_RESOLVED. It already triggers both successful completion or restart of the request, depending on our RQ_POSTPONED flag. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Don't drop a request from the transfer log just because it was NEG_ACKED. We need it around to be able to verify P_BARRIER_ACKs against the transver log. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Almost all code paths calling start_new_tl_epoch() guarded it with if (... current_tle_writes > 0 ... ). Just move that inside start_new_tl_epoch(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Requests of an acked epoch are stored on the barrier_acked_requests list. In case the private bio of such a request completes while IO on the drbd device is suspended [req_mod(completed_ok)] then the request stays there. When thawing IO because the fence_peer handler returned, then we use tl_clear() to apply the connection_lost_while_pending event to all requests on the transfer-log and the barrier_acked_requests list. Up to now the connection_lost_while_pending event was not applied on requests on the barrier_acked_requests list. Fixed that. I.e. now the connection_lost_while_pending and resend events are applied to requests on the barrier_acked_requests list. For that it is necessary that the resend event finishes (local only) READS correctly. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
The DISCARD_CONCURRENT flag should be set on one node and cleared on the other node. As the code was before it was theoretical possible that a node accepts the meta socket, but has to close it later on, and keeps the DISCARD_CONCURRENT flag. Correct this by moving the clear_bit(DISCARD_CONCURRENT) where the packet gets sent. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
DRBD has a concept of request epochs or reorder-domains, which are separated on the wire by P_BARRIER packets. Older DRBD is not able to handle zero-sized requests at all, so we need to map empty flushes to these drbd barriers. These are the equivalent of empty flushes, and by default trigger flushes on the receiving side anyways (unless not supported or explicitly disabled), so there is no need to handle this differently in newer drbd either. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Since the listening socket is open all the time, it was possible to get into stable "initial packet S crossed" loops. * when both sides realize in the drbd_socket_okay() call at the end of the loop that the other side closed the main socket you had the chance to get into a stable loop with repeated "packet S crossed" messages. * when both sides do not realize with the drbd_socket_okay() call at the end of the loop that the other side closed the main socket you had the chance to get into a stable loop with alternating "packet S crossed" "packet M crossed" messages. In order to break out these stable loops randomize the behaviour if such a crossing of P_INITIAL_DATA or P_INITIAL_META packets is detected. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Since now our listening socket is open all the time we will get connection tries of the peer always in. No need to try it three times. This is valid when connecting to older peers as well, it simply increases the probability that the new version DRBD will accept a connection instead that it will establish one. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Since the drbd_socket_okay() function itself tests if the the socket is NULL, the explicit test "if (sock.socket && &msock.socket)" was redundent. Apart from that the address opperator ('&') before msock.socket rendered the test pointless. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Marek authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
In 8.4, we may have bios spanning two activity log extents. Fixup drbd_al_begin_io() and drbd_al_complete_io() to deal with zero sized bios. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
We now can schedule only a specific range of sectors for online verify, or interrupt a running verify without interrupting the connection. Had to bump the protocol version differently, we are now 101. Added verify_can_do_stop_sector() { protocol >= 97 && protocol != 100; } Also, the return value convention for worker callbacks has changed, we returned "true/false" for "keep the connection up" in 8.3, we return 0 for success and <= for failure in 8.4. Affected: receive_state() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
- 08 Nov, 2012 24 commits
-
-
Lars Ellenberg authored
If you do back to back wait-sync/invalidate on a Primary in a tight loop, during application IO load, you could trigger a race: kernel: block drbd6: FIXME going to queue 'set_n_write from StartingSync' but 'write from resync_finished' still pending? Fix this by changing the order of the drbd_queue_work() and the wake_up() in dec_ap_pending(), and adding the additional drbd_flush_workqueue() before requesting the full sync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
In case we want to hard-reset from the local-io-error handler, we need to call it before notifying the peer or aborting local IO. Otherwise the peer will advance its data generation UUIDs even if secondary. This way, local io error looks like a "regular" node crash, which reduces the number of different failure cases. This may be useful in a bigger picture where crashed or otherwise "misbehaving" nodes are automatically re-deployed. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Fix asserts like block drbd0: in got_BlockAck:4634: rs_pending_cnt = -35 < 0 ! We reset the resync lru cache and related information (rs_pending_cnt), once we successfully finished a resync or online verify, or if the replication connection is lost. We also need to reset it if a resync or online verify is aborted because a lower level disk failed. In that case the replication link is still established, and we may still have packets queued in the network buffers which want to touch rs_pending_cnt. We do not have any synchronization mechanism to know for sure when all such pending resync related packets have been drained. To avoid this counter to go negative (and violate the ASSERT that it will always be >= 0), just do not reset it when we lose a disk. It is good enough to make sure it is re-initialized before the next resync can start: reset it when we re-attach a disk. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
We cache the congestion status in mdev->congestion_reason whenever drbd_congested() was called. Reset this cached info before reporting it when reading /proc/drbd. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
If the drbd worker thread is synchronously waiting for some userland callback, we don't want some casual pageout to block on us. Have drbd_congested() report congestion in that case. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Aborting local requests (not waiting for completion from the lower level disk) is dangerous: if the master bio has been completed to upper layers, data pages may be re-used for other things already. If local IO is still pending and later completes, this may cause crashes or corrupt unrelated data. Only abort local IO if explicitly requested. Intended use case is a lower level device that turned into a tarpit, not completing io requests, not even doing error completion. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
The two unused "global flags" in 8.3 are "per volume" flags in 8.4. Still, they are unused, so lose them. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
We must not look at mdev->actlog, unless we have a get_ldev() reference. It also does not make much sense to try to disconnect or pull-ahead of the peer, if we don't have good local data. Only even consider congestion policies, if our local disk is D_UP_TO_DATE. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
drbd_adm_down() does adm_detach(), which can fail with various error codes, or be interrupted by a signal. The interrupted by signal case was not properly handled, leading to block drbd0: ASSERT( mdev->state.disk == D_DISKLESS && mdev->state.conn == C_STANDALONE ) in drbd/drbd_worker.c and further to destroying objects while still in use, and resulting crashes. Detect the interruption, and take the error path out. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Sometimes, a lower level block device turns into a tar-pit, not completing requests at all, not even doing error completion. We can force-detach from such a tar-pit block device, either by disk-timeout, or by drbdadm detach --force. Queueing for retry only from the request destruction path (kref hit 0) makes it impossible to retry affected read requests from the peer, until the local IO completion happened, as the locally submitted bio holds a reference on the drbd request object. If we can only complete READs when the local completion finally happens, we would not need to force-detach in the first place. Instead, queue for retry where we otherwise had done the error completion. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
cherry-picked and adapted from drbd 9 devel branch This looks cleaner to me, and also gets rid of the other ugly if-inside-case-fall-through. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
cherry-picked and adapted from drbd 9 devel branch The logic for when to get or put a reference is in mod_rq_state(). To not get confused in the freeze/thaw respectively resend/restart paths, or when cleaning up requests waiting for P_BARRIER_ACK, this also introduces additional state flags: RQ_COMPLETION_SUSP, and RQ_EXP_BARR_ACK. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
cherry-picked and adapted from drbd 9 devel branch completion_ref will count pending events necessary for completion. kref is for destruction. This only introduces these new members of struct drbd_request, a followup patch will make actual use of them. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
The previous commit causes __drbd_make_request() to always return 0. Change it to void. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
cherry-picked and adapted from drbd 9 devel branch READs will be interesting to at most one connection, WRITEs should be interesting for all established connections. Introduce some helper functions to hopefully make this easier to follow. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
cherry-picked and adapted from drbd 9 devel branch DRBD requests (struct drbd_request) are already on the per resource transfer log list, and carry their epoch number. We do not need to additionally link them on other ring lists in other structs. The drbd sender thread can recognize itself when to send a P_BARRIER, by tracking the currently processed epoch, and how many writes have been processed for that epoch. If the epoch of the request to be processed does not match the currently processed epoch, any writes have been processed in it, a P_BARRIER for this last processed epoch is send out first. The new epoch then becomes the currently processed epoch. To not get stuck in drbd_al_begin_io() waiting for P_BARRIER_ACK, the sender thread also needs to handle the case when the current epoch was closed already, but no new requests are queued yet, and send out P_BARRIER as soon as possible. This is done by comparing the per resource "current transfer log epoch" (tconn->current_tle_nr) with the per connection "currently processed epoch number" (tconn->send.current_epoch_nr), while waiting for new requests to be processed in wait_for_work(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
cherry-picked and adapted from drbd 9 devel branch In 8.4, we don't distinguish between "resource work" and "connection work" yet, we have one worker for both, as we still have only one connection. We only ever used the "data.work", no need to keep the "meta.work" around. Move tconn->data.work to tconn->sender_work. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
cherry-picked and adapted from drbd 9 devel branch In 8.4, we still use drbd_queue_work_front(), so in normal operation, we can not dequeue batches, but only single items. Still, followup commits will wake the worker without explicitly queueing a work item, so up() is replaced by a simple wake_up(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
cherry-picked from drbd 9 devel branch. In preparation of multiple connections, the "barrier number" or "epoch number" needs to be tracked per-resource, not per connection. The sequence number space will not be reset anymore. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Meanwhile, this is used to restart failed READ requests as well. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
The commit drbd: simplify retry path of failed READ requests simplified it too much: it just did not do anything for local read errors. Add the missing req_may_be_completed_not_susp() to the READ_COMPLETED_WITH_ERROR case. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
BUG: unable to handle kernel NULL pointer dereference at (null) ... [<d1e17561>] ? _drbd_bm_set_bits+0x151/0x240 [drbd] [<d1e236f8>] ? receive_bitmap+0x4f8/0xbc0 [drbd] This fixes an off-by-one error in the receive_bitmap() path, if run-length encoded bitmap transfer is enabled. If the bitmap is an exact multiple of PAGE_SIZE, which means the visible capacity of the drbd device is an exact multiple of 128 MiB (for 4k page size), and bitmap compression (use-rle) is enabled (which became default with 8.4), and the very last bit is dirty and reported in an rle comressed bitmap packet, we ended up trying to kmap_atomic a page pointer that does not exist (bitmap->bm_pages[last index + 1]). bug introduced by: Date: Fri Jul 24 15:33:24 2009 +0200 set bits: optimize for complete last word, fix off-by-one-word corner case made effective by: Date: Thu Dec 16 00:32:38 2010 +0100 drbd: get rid of unused debug code Long time ago, we had paranoia code in the bitmap that allocated one extra word, assigned a magic value, and checked on every occasion that the magic value was still unchanged. That debug code is unused, the extra long word complicates code a bit. Get rid of it. No-one triggered this bug in the last few years, because a large subset of our userbase is unaffected: * typically the last few blocks of a device are not modified frequently, and remain unset * use-rle was disabled by default in drbd < 8.4 * those with slightly "odd" device sizes, or * drbd internal meta data (which will skew the device size slightly, thus makes it harder to have a bug relevant device size) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-