Commits · 91af8300d9c1d7c6b6a2fd754109e08d4798b8d8 · nexedi / linux

16 Oct, 2017 2 commits

bcache: check ca->alloc_thread initialized before wake up it · 91af8300

Coly Li authored Oct 13, 2017

In bcache code, sysfs entries are created before all resources get
allocated, e.g. allocation thread of a cache set.

There is posibility for NULL pointer deference if a resource is accessed
but which is not initialized yet. Indeed Jorg Bornschein catches one on
cache set allocation thread and gets a kernel oops.

The reason for this bug is, when bch_bucket_alloc() is called during
cache set registration and attaching, ca->alloc_thread is not properly
allocated and initialized yet, call wake_up_process() on ca->alloc_thread
triggers NULL pointer deference failure. A simple and fast fix is, before
waking up ca->alloc_thread, checking whether it is allocated, and only
wake up ca->alloc_thread when it is not NULL.
Signed-off-by: Coly Li <colyli@suse.de>
Reported-by: Jorg Bornschein <jb@capsec.org>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

91af8300

bcache: Avoid nested function definition · 58f913dc

Peter Foley authored Oct 13, 2017

Fixes below error with clang:
../drivers/md/bcache/sysfs.c:759:3: error: function definition is not allowed here
                {       return *((uint16_t *) r) - *((uint16_t *) l); }
                ^
../drivers/md/bcache/sysfs.c:789:32: error: use of undeclared identifier 'cmp'
                sort(p, n, sizeof(uint16_t), cmp, NULL);
                                             ^
2 errors generated.

v2:
rename function to __bch_cache_cmp
Signed-off-by: Peter Foley <pefoley2@pefoley.com>
Reviewed-by: Coly Li <colyli@suse.de>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

58f913dc

14 Oct, 2017 1 commit

mm/page-writeback.c: make changes of dirty_writeback_centisecs take effect immediately · 515c24c1

Yafang Shao authored Oct 14, 2017

This patch is the followup of the prvious patch:
[writeback: schedule periodic writeback with sysctl].

There's another issue to fix.
For example,
- When the tunable was set to one hour and is reset to one second, the
  new setting will not take effect for up to one hour.

Kicking the flusher threads immediately fixes it.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

515c24c1

13 Oct, 2017 37 commits

null_blk: add usage hints for no_sched · fc186311

weiping zhang authored Oct 14, 2017

This parameter provide an option to disable io scheduler when nullb*
in multi-queue mode.
Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

fc186311

null_blk: update usage hints for submit_queues · 23c4490d

weiping zhang authored Oct 14, 2017

update the range of submits_queues, and correct usage hints.
Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

23c4490d

Revert "lightnvm: prevent bd removal if busy" · cdd094fd

Jens Axboe authored Oct 13, 2017

Christoph correctly points out that this issue is no different
for other block devices, and poking at cross layer internals
is not the right way to solve it.

This reverts commit bb6aa6f0.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

cdd094fd

lightnvm: implement generic path for sync I/O · 1a94b2d4

Javier González authored Oct 13, 2017

Implement a generic path for sending sync I/O on LightNVM. This allows
to reuse the standard synchronous path trough blk_execute_rq(), instead
of implementing a wait_for_completion on the target side (e.g., pblk).
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1a94b2d4

lightnvm: fail fast on passthrough commands · 1b839187

Javier González authored Oct 13, 2017

Make LightNVM passhtrough commands fail fast. User space will then take
care of re-submitting.

Fixes: 84d4add7 ('lightnvm: add ioctls for vector I/Os')
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1b839187

lightnvm: pblk: avoid being reported as hung on rated GC · 8da10cce

Javier González authored Oct 13, 2017

The amount of GC I/O on the write buffer is managed by the rate-limiter,
which is calculated as a function of the number of available free
blocks. When reaching the stable point, we risk having scheduled more
I/Os for GC than are allowed on the write buffer. This would result on
the GC semaphore balancing the outstanding read GC I/Os to be reported
as "hung", though the behavior is normal.

Solve this by allowing to schedule when we detect that the read GC path
is not moving forward.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8da10cce

lightnvm: pblk: cleanup unused and static functions · 8bd40020

Javier González authored Oct 13, 2017

Cleanup up unused and static functions across the whole codebase.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8bd40020

lightnvm: pblk: remove spinlock when freeing line metadata · 28bd1094

Hans Holmberg authored Oct 13, 2017

Lockdep complains about being in atomic context while freeing line
metadata - and rightly so as we take a spinlock and end up calling
vfree that might sleep(in pblk_mfree).

There is no need for holding the line manager free_lock while
freeing line metadata as the pipeline as stopped, so remove the lock.

Fixes: 588726d3 ("lightnvm: pblk: fail gracefully on irrec. error")
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

28bd1094

lightnvm: pblk: correct valid lba count calculation · 03e868eb

Hans Holmberg authored Oct 13, 2017

During garbage collect, lbas being written can end up
being invalidated. Make sure that this is reflected in
the valid lba count.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

03e868eb

lightnvm: pblk: gc all lines in the pipeline before exit · d6b992f7

Hans Holmberg authored Oct 13, 2017

Finish garbage collect of the lines that are in the gc pipeline
before exiting. Ensure that all lines already in in the pipeline
goes through, from read to write.

Do this by keeping track of how many lines are in the pipeline
and waiting for that number to reach zero before exiting the gc
reader task.

Since we're adding a new gc line counter, change the name of
inflight_gc to read_inflight_gc to make the distinction clear.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d6b992f7

lightnvm: pblk: add l2p crc debug printouts · c5586192

Hans Holmberg authored Oct 13, 2017

Print the CRC of the logical-to-physical mapping during exit and
after recovering the L2P table to facilitate detection of meta
data corruption/recovery issues.

The CRC printed after recovery should match the CRC printed during
the previous exit - if it doesn't this indicates that either the meta
data written to the disk is corrupt or recovery failed.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c5586192

lightnvm: pblk: shut down gc gracefully during exit · 1edebacf

Hans Holmberg authored Oct 13, 2017

Shut down the GC workqueues and tasks in the right order.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1edebacf

lightnvm: pblk: consider bad sectors in emeta during recovery · 75610cd9

Hans Holmberg authored Oct 13, 2017

When recovering lines we need to consider that bad blocks in a line
affect the emeta area size.

Previously it was assumed that the emeta area would grow by the
number of sectors per page * number of bad blocks in the line.

This assumption is not correct - the number of "extra" pages that are
consumed could be both smaller (depending on emeta size) and bigger
(depending on the placement of the bad blocks).

Fix this by calculating the emeta start by iterating backwards
through the line, skipping ppas that map to bad blocks.

Also fix the data types used for ppa indices/counts in
pblk_recov_l2p_from_emeta - we should use u64.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

75610cd9

lightnvm: pblk: start gc if needed during init · 03661b5f

Hans Holmberg authored Oct 13, 2017

Start GC if needed, directly after init, as we might
need to garbage collect in order to make room for user writes.

Create a helper function that allows to kick GC without exposing the
internals of the GC/rate-limiter interaction.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

03661b5f

lightnvm: pblk: free full lines during recovery · 37ce33d5

Hans Holmberg authored Oct 13, 2017

When rebuilding the L2P table, any full lines (lines without any
valid sectors) will be identified. If these lines are not freed,
we risk not being able to allocate the first data line.

This patch refactors the part of GC that frees empty lines
into a separate function and adds a call to this after the
L2P table has been rebuilt.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

37ce33d5

lightnvm: pblk: recover partially written lines correctly · 92957091

Hans Holmberg authored Oct 13, 2017

When recovering partially written lines, the valid sector
count must be decreased by the number of padded sectors
in the line.

Update line recovery to take all ADDR_EMPTY(padded) sectors
into account.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

92957091

lightnvm: pblk: prevent gc kicks when gc is not operational · 3e3a5b8e

Hans Holmberg authored Oct 13, 2017

GC can be kicked after it has been shut down when closing the last
line during exit, resulting in accesses to freed structures.

Make sure that GC is not triggered while it is not operational.
Also make sure that GC won't be re-activated during exit when
running on another processor by using timer_del_sync.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3e3a5b8e

lightnvm: pblk: fix releases of kmem cache in error path · 22a4e061

Rakesh Pandit authored Oct 13, 2017

If pblk_core_init fails lets destroy all global caches.
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

22a4e061

lightnvm: pblk: reduce arguments in __pblk_rb_update_l2p · 05ed3447

Rakesh Pandit authored Oct 13, 2017

We already pass the structure pointer so no need to pass the member.
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

05ed3447

lightnvm: remove stale extern and unused exported symbols · eb6f168f

Rakesh Pandit authored Oct 13, 2017

Not all exported symbols are being used outside core and there were
some stale entries in lightnvm.h
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

eb6f168f

lightnvm: remove unused argument from nvm_set_tgt_bb_tbl · ef56b9ce

Rakesh Pandit authored Oct 13, 2017

vblk isn't being used anyway and if we ever have a usecase we can
introduce this again.  This makes the logic easier and removes
unnecessary checks.
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ef56b9ce

lightnvm: pblk: remove useless line · e480689b

Rakesh Pandit authored Oct 13, 2017

Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e480689b

lightnvm: pblk: fix changing GC group list for a line · 27b97872

Rakesh Pandit authored Oct 13, 2017

pblk_line_gc_list seems to had a bug since the introduction of pblk in
getting GC list for a line. In b20ba1bc while redesigning the GC
algorithm, the naming for the GC thresholds was altered, but the
values for high_thrs and mid_thrs were not. The result is that when
moving to the GC lists, the mid threshold is never evaluated.

Fixes: a4bd217b("lightnvm: physical block device (pblk) target")
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

27b97872

lightnvm: pblk: ensure right bad block calculation · e6b754c2

Javier González authored Oct 13, 2017

Make sure that the variable controlling block threshold for allocating
extra metadata sectors in case of a line with bad blocks does not get a
negative value. Otherwise, the line will be marked as corrupted and
wasted.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e6b754c2

lightnvm: pblk: enable 1 LUN configuration · 21d22871

Javier González authored Oct 13, 2017

Metadata I/Os are scheduled to minimize their impact on user data I/Os.
When there are enough LUNs instantiated (i.e., enough bandwidth), it is
easy to interleave metadata and data one after the other so that
metadata I/Os are the ones being blocked and not vice-versa.

We do this by calculating the distance between the I/Os in terms of the
LUNs that are not in used, and selecting a free LUN that satisfies a
the simple heuristic that metadata is scheduled behind. The per-LUN
semaphores guarantee consistency. This works fine on >1 LUN
configuration. However, when a single LUN is instantiated, this design
leads to a deadlock, where metadata waits to be scheduled on a free LUN.

This patch implements the 1 LUN case by simply scheduling the metadada
I/O after the data I/O. In the process, we refactor the way a line is
replaced to ensure that metadata writes are submitted after data writes
in order to guarantee block sequentiality. Note that, since there is
only one LUN, both I/Os will block each other by design. However, such
configuration only pursues tight read latencies, not write bandwidth.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

21d22871

lightnvm: pblk: remove I/O dependency on write path · 1e82123d

Javier González authored Oct 13, 2017

pblk schedules user I/O, metadata I/O and erases on the write path in
order to minimize collisions at the media level. Until now, there has
been a dependency between user and metadata I/Os that could lead to a
deadlock as both take the per-LUN semaphore to schedule submission.

This path removes this dependency and guarantees forward progress at a
per I/O granurality.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1e82123d

lightnvm: pblk: remove redundant check on read path · 0f9248cf

Javier González authored Oct 13, 2017

A partial read I/O in pblk is an I/O where some sectors reside in the
write buffer in main memory and some are persisted on the device. Such
an I/O must at least contain 2 lbas, therefore checking for the case
where a single lba is mapped is not necessary.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0f9248cf

lightnvm: pblk: guarantee line integrity on reads · 7bd4d370

Javier González authored Oct 13, 2017

When a line is recycled during garbage collection, reads can still be
issued to the line. If the line is freed in the middle of this process,
data corruption might occur.

This patch guarantees that lines are not freed in the middle of reads
that target them (lines). Specifically, we use the existing line
reference to decide when a line is eligible for being freed after the
recycle process.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7bd4d370

lightnvm: pblk: check lba sanity on read path · a4809fee

Javier González authored Oct 13, 2017

As part of pblk's recovery scheme, we store the lba mapped to each
physical sector on the device's out-of-bound (OOB) area.

On the read path, we can use this information to validate that the data
being delivered to the upper layers corresponds to the lba being
requested. The cost of this check is an extra copy on the DMA region on
the device and an extra comparison in the host, given that (i) the OOB
area is being read together with the data in the media, and (ii) the DMA
region allocated for the ppa list can be reused for the metadata stored
on the OOB area.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a4809fee

lightnvm: pblk: use rqd->end_io for completion · 26532ee5

Javier González authored Oct 13, 2017

For consistency with the rest of pblk, use rqd->end_io to point to the
function taking care of ending the request on the completion path.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

26532ee5

lightnvm: pblk: refactor rqd alloc/free · 67bf26a3

Javier González authored Oct 13, 2017

Refactor the rqd allocation and free functions so that all I/O types can
use these helper functions.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

67bf26a3

lightnvm: pblk: improve naming for internal req. · e2cddf20

Javier González authored Oct 13, 2017

Each request type sent to the LightNVM subsystem requires different
metadata. Until now, we have tailored this metadata based on write, read
and erase commands. However, pblk uses different metadata for internal
writes that do not hit the write buffer. Instead of abusing the metadata
for reads, create a new request type - internal write to improve
code readability.

In the process, create internal values for each I/O type instead of
abusing the READ/WRITE macros, as suggested by Christoph.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e2cddf20

lightnvm: pblk: allocate bio size more accurately · 875d94f3

Javier González authored Oct 13, 2017

Wait until we know the exact number of ppas to be sent to the device,
before allocating the bio.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

875d94f3

lightnvm: pblk: simplify path on REQ_PREFLUSH · 6ca2f71f

Javier González authored Oct 13, 2017

On REQ_PREFLUSH, directly tag the I/O context flags to signal a flush in
the write to cache path, instead of finding the correct entry context
and imposing a memory barrier. This simplifies the code and might
potentially prevent race conditions when adding functionality to the
write path.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6ca2f71f

lightnvm: pblk: put bio on bio completion · 55e836d4

Javier González authored Oct 13, 2017

Simplify put bio by doing it on bio end_io instead of manually putting
it on the completion path.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

55e836d4

lightnvm: pblk: refactor read path on GC · 2a19b10d

Javier González authored Oct 13, 2017

Simplify the part of the garbage collector where data is read from the
line being recycled and moved into an internal queue before being copied
to the memory buffer. This allows to get rid of a dedicated function,
which introduces an unnecessary dependency on the code.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2a19b10d

lightnvm: pblk: simplify data validity check on GC · d340121e

Javier González authored Oct 13, 2017

When a line is selected for recycling by the garbage collector (GC), the
line state changes and the invalid bitmap is frozen, preventing
invalidations from happening. Throughout the GC, the L2P map is checked
to verify that not data being recycled has been updated. The last check
is done before the new map is being stored on the L2P table. Though
this algorithm works, it requires a number of corner cases to be checked
each time the L2P table is being updated. This complicates readability
and is error prone in case that the recycling algorithm is modified.

Instead, this patch makes the invalid bitmap accessible even when the
line is being recycled. When recycled data is being remapped, it is
enough to check the invalid bitmap for the line before updating the L2P
table.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d340121e