Commit 9fa48269, authored by Lars Ellenberg, committed by Jens Axboe

drbd: prevent NULL pointer deref when resuming diskless primary

In a multiple error scenario, we may end up with a "frozen" Primary,
that has no access to any data (no local disk, no replication link).

If we then resume-io, we try to generate a new data generation id,
which will fail if there is no longer a local disk.

Double check for available local data,
which prevents the NULL pointer deref.

If we are diskless, turn the resume-io in this situation
into the first stage of a "force down", by bumping the "effective" data
gen id, which will prevent later attach or connect to the former data
set without first being demoted (deconfigured).

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
parent 668700b4
@@ -2920,7 +2920,30 @@ int drbd_adm_resume_io(struct sk_buff *skb, struct genl_info *info)
 	mutex_lock(&adm_ctx.resource->adm_mutex);
 	device = adm_ctx.device;
 	if (test_bit(NEW_CUR_UUID, &device->flags)) {
-		drbd_uuid_new_current(device);
+		if (get_ldev_if_state(device, D_ATTACHING)) {
+			drbd_uuid_new_current(device);
+			put_ldev(device);
+		} else {
+			/* This is effectively a multi-stage "forced down".
+			 * The NEW_CUR_UUID bit is supposedly only set, if we
+			 * lost the replication connection, and are configured
+			 * to freeze IO and wait for some fence-peer handler.
+			 * So we still don't have a replication connection.
+			 * And now we don't have a local disk either.  After
+			 * resume, we will fail all pending and new IO, because
+			 * we don't have any data anymore.  Which means we will
+			 * eventually be able to terminate all users of this
+			 * device, and then take it down.  By bumping the
+			 * "effective" data uuid, we make sure that you really
+			 * need to tear down before you reconfigure, we will
+			 * then refuse to re-connect or re-attach (because no
+			 * matching real data uuid exists).
+			 */
+			u64 val;
+			get_random_bytes(&val, sizeof(u64));
+			drbd_set_ed_uuid(device, val);
+			drbd_warn(device, "Resumed without access to data; please tear down before attempting to re-configure.\n");
+		}
 		clear_bit(NEW_CUR_UUID, &device->flags);
 	}
 	drbd_suspend_io(device);
...
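
The guard introduced by this change can be illustrated outside the kernel. The following is a minimal user-space sketch, not the DRBD implementation: `struct device`, `struct backing_dev`, `random_u64()` and `resume_io()` are hypothetical stand-ins, the helpers `get_ldev_if_state()` and `put_ldev()` only mimic the names used in the diff, and `rand()` takes the place of `get_random_bytes()`. It shows the two paths: with a local disk, a reference is taken before the new UUID is written; without one, only the "effective" UUID is bumped and the backing-device pointer is never dereferenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel types; not the real DRBD structures. */
enum disk_state { D_DISKLESS, D_ATTACHING, D_UP_TO_DATE };

struct backing_dev {
	uint64_t current_uuid;      /* "real" data generation id on disk */
};

struct device {
	enum disk_state disk_state;
	struct backing_dev *ldev;   /* NULL once the local disk is gone */
	int local_cnt;              /* references held on ldev */
	uint64_t ed_uuid;           /* "effective" data uuid */
	bool new_cur_uuid;          /* models the NEW_CUR_UUID flag */
};

/* Take a reference on the local disk only if it exists and is usable. */
static bool get_ldev_if_state(struct device *dev, enum disk_state min)
{
	if (dev->ldev == NULL || dev->disk_state < min)
		return false;
	dev->local_cnt++;
	return true;
}

/* Drop a reference previously taken with get_ldev_if_state(). */
static void put_ldev(struct device *dev)
{
	assert(dev->local_cnt > 0);
	dev->local_cnt--;
}

/* Assumption: rand() stands in for the kernel's get_random_bytes(). */
static uint64_t random_u64(void)
{
	uint64_t v = 0;
	for (int i = 0; i < 8; i++)
		v = (v << 8) | (uint64_t)(rand() & 0xff);
	return v;
}

/* Returns true if a new current uuid was written to the local disk. */
static bool resume_io(struct device *dev)
{
	bool wrote_to_disk = false;

	if (!dev->new_cur_uuid)
		return false;

	if (get_ldev_if_state(dev, D_ATTACHING)) {
		/* Safe: we hold a reference, so ldev is non-NULL here. */
		dev->ldev->current_uuid = random_u64();
		put_ldev(dev);
		wrote_to_disk = true;
	} else {
		/* Diskless: never touch ldev, only bump the effective uuid. */
		dev->ed_uuid = random_u64();
		fprintf(stderr, "Resumed without access to data; "
			"please tear down before attempting to re-configure.\n");
	}
	dev->new_cur_uuid = false;
	return wrote_to_disk;
}
```

Calling `resume_io()` on a device whose `ldev` is NULL takes the `else` branch and returns `false`; the pre-patch behavior corresponds to running the first branch unconditionally, which is where the NULL pointer dereference came from.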