- 18 Dec, 2018 11 commits
-
-
Jens Axboe authored
In preparation of handing in iocbs in a different fashion as well. Also make it clear that the iocb being passed in isn't modified, by marking it const throughout. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Replace the percpu_ref_put() + kmem_cache_free() with a call to iocb_put() instead. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Plugging is meant to optimize submission of a string of IOs, if we don't have more than 2 being submitted, don't bother setting up a plug. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
It's 192 bytes, fairly substantial. Most items don't need to be cleared, especially not upfront. Clear the ones we do need to clear, and leave the other ones for setup when the iocb is prepared and submitted. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
This is in preparation for certain types of IO not needing a ring reserveration. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We know this is a read/write request, but in preparation for having different kinds of those, ensure that we call the assigned handler instead of assuming it's aio_complete_rq(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
* for-4.21/block: (351 commits) blk-mq: enable IO poll if .nr_queues of type poll > 0 blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight() blk-mq: skip zero-queue maps in blk_mq_map_swqueue block: fix blk-iolatency accounting underflow blk-mq: fix dispatch from sw queue block: mq-deadline: Fix write completion handling nvme-pci: don't share queue maps blk-mq: only dispatch to non-defauly queue maps if they have queues blk-mq: export hctx->type in debugfs instead of sysfs blk-mq: fix allocation for queue mapping table blk-wbt: export internal state via debugfs blk-mq-debugfs: support rq_qos block: update sysfs documentation block: loop: check error using IS_ERR instead of IS_ERR_OR_NULL in loop_add() aoe: add __exit annotation block: clear REQ_HIPRI if polling is not supported blk-mq: replace and kill blk_mq_request_issue_directly blk-mq: issue directly with bypass 'false' in blk_mq_sched_insert_requests blk-mq: refactor the code of issue request directly block: remove the bio_integrity_advance export ...
-
Ming Lei authored
The queue mapping of type poll only exists when set->map[HCTX_TYPE_POLL].nr_queues is bigger than zero, so enhance the constraint by checking .nr_queues of type poll before enabling IO poll. Otherwise IO race & timeout can be observed when running block/007. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
There's a single user of this function, dm, and dm just wants to check if IO is inflight, not that it's just allocated. This fixes a hang with srp/002 in blktests with dm, where it tries to suspend but waits for inflight IO to finish first. As it checks for just allocated requests, this fails. Tested-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Mimi Zohar authored
Start the policy_tokens and the associated enumeration from zero, simplifying the pt macro. Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
The code uses a bitmap to check for duplicate tokens during parsing, and that doesn't work at all for the negative Opt_err token case. There is absolutely no reason to make Opt_err be negative, and in fact it only confuses things, since some of the affected functions actually return a positive Opt_xyz enum _or_ a regular negative error code (eg -EINVAL), and using -1 for Opt_err makes no sense. There are similar problems in ima_policy.c and key encryption, but they don't have the immediate bug wrt bitmap handing, and ima_policy.c in particular needs a different patch to make the enum values match the token array index. Mimi is sending that separately. Reported-by: syzbot+a22e0dc07567662c50bc@syzkaller.appspotmail.com Reported-by: Eric Biggers <ebiggers@kernel.org> Fixes: 5208cc83 ("keys, trusted: fix: *do not* allow duplicate key options") Fixes: 00d60fd3 ("KEYS: Provide keyctls to drive the new key type ops for asymmetric keys [ver #2]") Cc: James Morris James Morris <jmorris@namei.org> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Cc: Peter Huewe <peterhuewe@gmx.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 17 Dec, 2018 11 commits
-
-
Ming Lei authored
From 7e849dd9 ("nvme-pci: don't share queue maps"), the mapping table won't be initialized actually if map->nr_queues is zero, so we can't use blk_mq_map_queue_type() to retrieve hctx any more. This way still may cause broken mapping, fix it by skipping zero-queues maps in blk_mq_map_swqueue(). Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dennis Zhou authored
The blk-iolatency controller measures the time from rq_qos_throttle() to rq_qos_done_bio() and attributes this time to the first bio that needs to create the request. This means if a bio is plug-mergeable or bio-mergeable, it gets to bypass the blk-iolatency controller. The recent series [1], to tag all bios w/ blkgs undermined how iolatency was determining which bios it was charging and should process in rq_qos_done_bio(). Because all bios are being tagged, this caused the atomic_t for the struct rq_wait inflight count to underflow and result in a stall. This patch adds a new flag BIO_TRACKED to let controllers know that a bio is going through the rq_qos path. blk-iolatency now checks if this flag is set to see if it should process the bio in rq_qos_done_bio(). Overloading BLK_QUEUE_ENTERED works, but makes the flag rules confusing. BIO_THROTTLED was another candidate, but the flag is set for all bios that have gone through blk-throttle code. Overloading a flag comes with the burden of making sure that when either implementation changes, a change in setting rules for one doesn't cause a bug in the other. So here, we unfortunately opt for adding a new flag. [1] https://lore.kernel.org/lkml/20181205171039.73066-1-dennis@kernel.org/ Fixes: 5cdf2e3f ("blkcg: associate blkg when associating a device") Signed-off-by: Dennis Zhou <dennis@kernel.org> Cc: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
When a request is added to rq list of sw queue(ctx), the rq may be from a different type of hctx, especially after multi queue mapping is introduced. So when dispach request from sw queue via blk_mq_flush_busy_ctxs() or blk_mq_dequeue_from_ctx(), one request belonging to other queue type of hctx can be dispatched to current hctx in case that read queue or poll queue is enabled. This patch fixes this issue by introducing per-queue-type list. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Changed by me to not use separately cacheline aligned lists, just place them all in the same cacheline where we had just the one list and lock before. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Damien Le Moal authored
For a zoned block device using mq-deadline, if a write request for a zone is received while another write was already dispatched for the same zone, dd_dispatch_request() will return NULL and the newly inserted write request is kept in the scheduler queue waiting for the ongoing zone write to complete. With this behavior, when no other request has been dispatched, rq_list in blk_mq_sched_dispatch_requests() is empty and blk_mq_sched_mark_restart_hctx() not called. This in turn leads to __blk_mq_free_request() call of blk_mq_sched_restart() to not run the queue when the already dispatched write request completes. The newly dispatched request stays stuck in the scheduler queue until eventually another request is submitted. This problem does not affect SCSI disk as the SCSI stack handles queue restart on request completion. However, this problem is can be triggered the nullblk driver with zoned mode enabled. Fix this by always requesting a queue restart in dd_dispatch_request() if no request was dispatched while WRITE requests are queued. Fixes: 5700f691 ("mq-deadline: Introduce zone locking support") Cc: <stable@vger.kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Add missing export of blk_mq_sched_restart() Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Now that the block layer checks if a queue map has any queues inside it there is no more reason to duplicate the maps for the non-default types. Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
We should check if a given queue map actually has queues enabled before dispatching to it. This allows drivers to not initialize optional but not used map types, which subsequently will allow fixing problems with queue map rebuilds for that case. Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
Now we only export hctx->type via sysfs, and there isn't such info in hctx entry under debugfs. We often use debugfs only to diagnose queue mapping issue, so add the support in debugfs. Queue mapping becomes a bit more complicated after multiple queue mapping is supported, we may write blktest to verify if queue mapping is valid based on blk-mq-debugfs. Given not necessary to export hctx->type twice, so remove the export from sysfs. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
Type of each element in queue mapping table is 'unsigned int, intead of 'struct blk_mq_queue_map)', so fix it. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
This information is helpful to either investigate issues, or understand wbt's internal behaviour. Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
blk-mq-debugfs has been proved as very helpful for debug some tough issues, such as IO hang. We have seen blk-wbt related IO hang several times, even inside Red Hat BZ, there is such report not sovled yet, so this patch adds support debugfs on rq_qos. Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Damien Le Moal authored
Add the description of the zoned, nr_zones and chunk_sectors sysfs queue attributes to Documentation/block/queue-sysfs.txt. The description of the zoned and chunk_sector attributes are mostly copied from ABI/testing/sysfs-block (added a typo fix). While at it, also fix a typo in the description of the io_poll_delay attribute. nr_zones description is also added to ABI/testing/sysfs-block and contact email address updated for the zoned attribute. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 16 Dec, 2018 9 commits
-
-
Linus Torvalds authored
-
Chengguang Xu authored
blk_mq_init_queue() will not return NULL pointer to its caller, so it's better to replace IS_ERR_OR_NULL using IS_ERR in loop_add(). If in the future things change to check NULL pointer inside loop_add(), we should return -ENOMEM as return code instead of PTR_ERR(NULL). Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Chengguang Xu authored
Add __exit annotation to cleanup helper which is only called once in the module. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
This prevents a HIPRI bio from being submitted through a stacking driver that does not support polling and thus won't poll for I/O completion. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jianchao Wang authored
Replace blk_mq_request_issue_directly with blk_mq_try_issue_directly in blk_insert_cloned_request and kill it as nobody uses it any more. Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jianchao Wang authored
It is not necessary to issue request directly with bypass 'true' in blk_mq_sched_insert_requests and handle the non-issued requests itself. Just set bypass to 'false' and let blk_mq_try_issue_directly handle them totally. Remove the blk_rq_can_direct_dispatch check, because blk_mq_try_issue_directly can handle it well.If request is direct-issued unsuccessfully, insert the reset. Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jianchao Wang authored
Merge blk_mq_try_issue_directly and __blk_mq_try_issue_directly into one interface to unify the interfaces to issue requests directly. The merged interface takes over the requests totally, it could insert, end or do nothing based on the return value of .queue_rq and 'bypass' parameter. Then caller needn't any other handling any more and then code could be cleaned up. And also the commit c616cbee ( blk-mq: punt failed direct issue to dispatch list ) always inserts requests to hctx dispatch list whenever get a BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE, this is overkill and will harm the merging. We just need to do that for the requests that has been through .queue_rq. This patch also could fix this. Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 14 Dec, 2018 9 commits
-
-
Linus Torvalds authored
Merge misc fixes from Andrew Morton: "11 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: scripts/spdxcheck.py: always open files in binary mode checkstack.pl: fix for aarch64 userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered fs/iomap.c: get/put the page in iomap_page_create/release() hugetlbfs: call VM_BUG_ON_PAGE earlier in free_huge_page() memblock: annotate memblock_is_reserved() with __init_memblock psi: fix reference to kernel commandline enable arch/sh/include/asm/io.h: provide prototypes for PCI I/O mapping in asm/io.h mm/sparse: add common helper to mark all memblocks present mm: introduce common STRUCT_PAGE_MAX_SHIFT define alpha: fix hang caused by the bootmem removal
-
Thierry Reding authored
The spdxcheck script currently falls over when confronted with a binary file (such as Documentation/logo.gif). To avoid that, always open files in binary mode and decode line-by-line, ignoring encoding errors. One tricky case is when piping data into the script and reading it from standard input. By default, standard input will be opened in text mode, so we need to reopen it in binary mode. The breakage only happens with python3 and results in a UnicodeDecodeError (according to Uwe). Link: http://lkml.kernel.org/r/20181212131210.28024-1-thierry.reding@gmail.com Fixes: 6f4d29df ("scripts/spdxcheck.py: make python3 compliant") Signed-off-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Jeremy Cline <jcline@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joe Perches <joe@perches.com> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Qian Cai authored
There is actually a space after "sp," like this, ffff2000080813c8: a9bb7bfd stp x29, x30, [sp, #-80]! Right now, checkstack.pl isn't able to print anything on aarch64, because it won't be able to match the stating objdump line of a function due to this missing space. Hence, it displays every stack as zero-size. After this patch, checkpatch.pl is able to match the start of a function's objdump, and is then able to calculate each function's stack correctly. Link: http://lkml.kernel.org/r/20181207195843.38528-1-cai@lca.pwSigned-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andrea Arcangeli authored
Calling UFFDIO_UNREGISTER on virtual ranges not yet registered in uffd could trigger an harmless false positive WARN_ON. Check the vma is already registered before checking VM_MAYWRITE to shut off the false positive warning. Link: http://lkml.kernel.org/r/20181206212028.18726-2-aarcange@redhat.com Cc: <stable@vger.kernel.org> Fixes: 29ec9066 ("userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: syzbot+06c7092e7d71218a2c16@syzkaller.appspotmail.com Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Piotr Jaroszynski authored
migrate_page_move_mapping() expects pages with private data set to have a page_count elevated by 1. This is what used to happen for xfs through the buffer_heads code before the switch to iomap in commit 82cb1417 ("xfs: add support for sub-pagesize writeback without buffer_heads"). Not having the count elevated causes move_pages() to fail on memory mapped files coming from xfs. Make iomap compatible with the migrate_page_move_mapping() assumption by elevating the page count as part of iomap_page_create() and lowering it in iomap_page_release(). It causes the move_pages() syscall to misbehave on memory mapped files from xfs. It does not not move any pages, which I suppose is "just" a perf issue, but it also ends up returning a positive number which is out of spec for the syscall. Talking to Michal Hocko, it sounds like returning positive numbers might be a necessary update to move_pages() anyway though (https://lkml.kernel.org/r/20181116114955.GJ14706@dhcp22.suse.cz). I only hit this in tests that verify that move_pages() actually moved the pages. The test also got confused by the positive return from move_pages() (it got treated as a success as positive numbers were not expected and not handled) making it a bit harder to track down what's going on. Link: http://lkml.kernel.org/r/20181115184140.1388751-1-pjaroszynski@nvidia.com Fixes: 82cb1417 ("xfs: add support for sub-pagesize writeback without buffer_heads") Signed-off-by: Piotr Jaroszynski <pjaroszynski@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Brian Foster <bfoster@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Yongkai Wu authored
A stack trace was triggered by VM_BUG_ON_PAGE(page_mapcount(page), page) in free_huge_page(). Unfortunately, the page->mapping field was set to NULL before this test. This made it more difficult to determine the root cause of the problem. Move the VM_BUG_ON_PAGE tests earlier in the function so that if they do trigger more information is present in the page struct. Link: http://lkml.kernel.org/r/1543491843-23438-1-git-send-email-nic_w@163.comSigned-off-by: Yongkai Wu <nic_w@163.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Yueyi Li authored
Found warning: WARNING: EXPORT symbol "gsi_write_channel_scratch" [vmlinux] version generation failed, symbol will not be versioned. WARNING: vmlinux.o(.text+0x1e0a0): Section mismatch in reference from the function valid_phys_addr_range() to the function .init.text:memblock_is_reserved() The function valid_phys_addr_range() references the function __init memblock_is_reserved(). This is often because valid_phys_addr_range lacks a __init annotation or the annotation of memblock_is_reserved is wrong. Use __init_memblock instead of __init. Link: http://lkml.kernel.org/r/BLUPR13MB02893411BF12EACB61888E80DFAE0@BLUPR13MB0289.namprd13.prod.outlook.comSigned-off-by: Yueyi Li <liyueyi@live.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Baruch Siach authored
The kernel commandline parameter named in CONFIG_PSI_DEFAULT_DISABLED help text contradicts the documentation in kernel-parameters.txt, and the code. Fix that. Link: http://lkml.kernel.org/r/20181203213416.GA12627@cmpxchg.org Fixes: e0c27447 ("psi: make disabling/enabling easier for vendor kernels") Signed-off-by: Baruch Siach <baruch@tkos.co.il> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mark Brown authored
Most architectures provide prototypes for the PCI I/O mapping operations when asm/io.h is included but SH doesn't currently do that, leading to for example warnings in sound/pci/hda/patch_ca0132.c when pci_iomap() is used on current -next. Make SH more consistent with other architectures by including asm-generic/pci_iomap.h in asm/io.h. Link: http://lkml.kernel.org/r/20181106175142.27988-1-broonie@kernel.orgSigned-off-by: Mark Brown <broonie@kernel.org> Reported-by: kbuild test robot <lkp@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-