- 12 Apr, 2016 32 commits
-
-
Jens Axboe authored
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Jens Axboe authored
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Jens Axboe authored
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Jens Axboe authored
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Jens Axboe authored
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Jens Axboe authored
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Jens Axboe authored
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Jens Axboe authored
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Jens Axboe authored
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Jens Axboe authored
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Jens Axboe authored
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Jens Axboe authored
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Jens Axboe authored
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Jens Axboe authored
The driver calls it with 0 for flags, since it doesn't have a writeback cache. Just remove the call, as it's a no-op right now. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Jens Axboe authored
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Jens Axboe authored
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Jens Axboe authored
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jens Axboe authored
Switch to the newer interface, instead of using blk_queue_flush() directly. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Jens Axboe authored
-
Jens Axboe authored
Add an internal helper and flag for setting whether a queue has write back caching, or write through (or none). Add a sysfs file to show this as well, and make it changeable from user space. This will replace the (awkward) blk_queue_flush() interface that drivers currently use to inform the block layer of write cache state and capabilities. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
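The helper, flag, and user-writable sysfs file described above could be modelled in user space roughly as follows. This is only a sketch: `struct queue_model`, `Q_FLAG_WC`, and the three functions are hypothetical names invented for illustration, not the kernel's own definitions, and the accepted strings are an assumption based on the "write back" / "write through" wording of the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag bit: set when the queue has a write back cache. */
#define Q_FLAG_WC (1u << 0)

/* Hypothetical stand-in for a request queue; only the flags matter here. */
struct queue_model {
    unsigned int flags;
};

/* Internal helper: record whether the device has a write back cache. */
static void queue_set_write_cache(struct queue_model *q, bool wc)
{
    assert(q != NULL);
    if (wc)
        q->flags |= Q_FLAG_WC;
    else
        q->flags &= ~Q_FLAG_WC;
}

/* "show" side of the sysfs-style attribute: report the current mode. */
static const char *queue_write_cache_show(const struct queue_model *q)
{
    assert(q != NULL);
    return (q->flags & Q_FLAG_WC) ? "write back" : "write through";
}

/* "store" side: parse a user-supplied string; returns 0 on success, -1 on bad input. */
static int queue_write_cache_store(struct queue_model *q, const char *page)
{
    assert(q != NULL && page != NULL);
    if (strncmp(page, "write back", 10) == 0)
        queue_set_write_cache(q, true);
    else if (strncmp(page, "write through", 13) == 0)
        queue_set_write_cache(q, false);
    else
        return -1;
    return 0;
}
```

Prefix matching with `strncmp()` lets input such as `"write back\n"` (as produced by `echo`) be accepted; anything else is rejected without changing the flag.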
-
Sagi Grimberg authored
There is no caller outside the blk-mq code, so we can make it static. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Keith Busch authored
Only a single tags array anyway. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Sagi Grimberg authored
blk-mq offers a tagset iterator so let's use that instead of using nvme_clear_queues. Note, we changed nvme_queue_cancel_ios name to nvme_cancel_io as there is no concept of a queue now in this function (we also lost the print). Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Keith Busch authored
If the controller is degraded, the driver should stay out of the way so the user can recover the drive. This patch skips driver initiated async event requests when the drive is in this state. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Ming Lin authored
This moves nvme_setup_{flush,discard,rw} calls into a common nvme_setup_cmd() helper. So we can eventually hide all the command setup in the core module and don't even need to update the fabrics drivers for any specific command type. Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Ming Lin authored
This rewrites nvme_setup_discard() with blk_add_request_payload(). It allocates only the necessary amount (16 bytes) for the payload. Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
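To see where the 16 bytes come from, here is a user-space sketch of a single discard range descriptor with two 32-bit fields and one 64-bit field. The struct and function names are hypothetical, the field layout is an assumption modelled on the NVMe Dataset Management range format, and the sketch ignores the little-endian conversion a real driver would perform.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mirror of one discard range descriptor. */
struct dsm_range {
    uint32_t cattr; /* context attributes (unused in this sketch) */
    uint32_t nlb;   /* length of the range in logical blocks */
    uint64_t slba;  /* starting logical block address */
};

/* 4 + 4 + 8 bytes with no padding: exactly the 16-byte payload. */
static_assert(sizeof(struct dsm_range) == 16,
              "one range descriptor must be 16 bytes");

/* Allocate only what one descriptor needs, rather than a larger buffer. */
static struct dsm_range *alloc_discard_range(uint64_t start_lba,
                                             uint32_t nr_blocks)
{
    struct dsm_range *range = malloc(sizeof(*range));

    if (!range)
        return NULL;
    range->cattr = 0;
    range->nlb = nr_blocks;
    range->slba = start_lba;
    return range;
}
```

The caller owns the returned buffer and releases it with `free()` once the request completes.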
-
Ming Lin authored
The helper returns the number of bytes that need to be mapped using PRPs/SGL entries. Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Ming Lin authored
When unloading the driver, nvme_disable_io_queues() calls nvme_delete_queue(), which sends the nvme_admin_delete_cq command to the admin sq. So when the command completes, the lock acquired by nvme_irq() actually belongs to the admin queue, while the lock that nvme_del_cq_end() is trying to acquire belongs to the io queue. So it will not deadlock.

This patch adds lock nesting notation to fix the following report.

[ 109.840952] =============================================
[ 109.846379] [ INFO: possible recursive locking detected ]
[ 109.851806] 4.5.0+ #180 Tainted: G E
[ 109.856533] ---------------------------------------------
[ 109.861958] swapper/0/0 is trying to acquire lock:
[ 109.866771] (&(&nvmeq->q_lock)->rlock){-.....}, at: [<ffffffffc0820bc6>] nvme_del_cq_end+0x26/0x70 [nvme]
[ 109.876535]
[ 109.876535] but task is already holding lock:
[ 109.882398] (&(&nvmeq->q_lock)->rlock){-.....}, at: [<ffffffffc0820c2b>] nvme_irq+0x1b/0x50 [nvme]
[ 109.891547]
[ 109.891547] other info that might help us debug this:
[ 109.898107] Possible unsafe locking scenario:
[ 109.898107]
[ 109.904056] CPU0
[ 109.906515] ----
[ 109.908974] lock(&(&nvmeq->q_lock)->rlock);
[ 109.913381] lock(&(&nvmeq->q_lock)->rlock);
[ 109.917787]
[ 109.917787] *** DEADLOCK ***
[ 109.917787]
[ 109.923738] May be due to missing lock nesting notation
[ 109.923738]
[ 109.930558] 1 lock held by swapper/0/0:
[ 109.934413] #0: (&(&nvmeq->q_lock)->rlock){-.....}, at: [<ffffffffc0820c2b>] nvme_irq+0x1b/0x50 [nvme]
[ 109.944010]
[ 109.944010] stack backtrace:
[ 109.948389] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G E 4.5.0+ #180
[ 109.955734] Hardware name: Dell Inc. OptiPlex 7010/0YXT71, BIOS A15 08/12/2013
[ 109.962989] 0000000000000000 ffff88011e203c38 ffffffff81383d9c ffffffff81c13540
[ 109.970478] ffffffff826711d0 ffff88011e203ce8 ffffffff810bb429 0000000000000046
[ 109.977964] 0000000000000046 0000000000000000 0000000000b2e597 ffffffff81f4cb00
[ 109.985453] Call Trace:
[ 109.987911] <IRQ> [<ffffffff81383d9c>] dump_stack+0x85/0xc9
[ 109.993711] [<ffffffff810bb429>] __lock_acquire+0x19b9/0x1c60
[ 109.999575] [<ffffffff810b6d1d>] ? trace_hardirqs_off+0xd/0x10
[ 110.005524] [<ffffffff810b386d>] ? complete+0x3d/0x50
[ 110.010688] [<ffffffff810bb760>] lock_acquire+0x90/0xf0
[ 110.016029] [<ffffffffc0820bc6>] ? nvme_del_cq_end+0x26/0x70 [nvme]
[ 110.022418] [<ffffffff81772afb>] _raw_spin_lock_irqsave+0x4b/0x60
[ 110.028632] [<ffffffffc0820bc6>] ? nvme_del_cq_end+0x26/0x70 [nvme]
[ 110.035019] [<ffffffffc0820bc6>] nvme_del_cq_end+0x26/0x70 [nvme]
[ 110.041232] [<ffffffff8135b485>] blk_mq_end_request+0x35/0x60
[ 110.047095] [<ffffffffc0821ad8>] nvme_complete_rq+0x68/0x190 [nvme]
[ 110.053481] [<ffffffff8135b53f>] __blk_mq_complete_request+0x8f/0x130
[ 110.060043] [<ffffffff8135b611>] blk_mq_complete_request+0x31/0x40
[ 110.066343] [<ffffffffc08209e3>] __nvme_process_cq+0x83/0x240 [nvme]
[ 110.072818] [<ffffffffc0820c35>] nvme_irq+0x25/0x50 [nvme]
[ 110.078419] [<ffffffff810cdb66>] handle_irq_event_percpu+0x36/0x110
[ 110.084804] [<ffffffff810cdc77>] handle_irq_event+0x37/0x60
[ 110.090491] [<ffffffff810d0ea3>] handle_edge_irq+0x93/0x150
[ 110.096180] [<ffffffff81012306>] handle_irq+0xa6/0x130
[ 110.101431] [<ffffffff81011abe>] do_IRQ+0x5e/0x120
[ 110.106333] [<ffffffff8177384c>] common_interrupt+0x8c/0x8c

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Keith Busch authored
Multiple users have reported device initialization failure due to the driver not receiving legacy PCI interrupts. This is not unique to any particular controller, but has been observed on multiple platforms.

There have been no issues reported or observed with message signaled interrupts, so this patch attempts to use MSI-x during initialization, falling back to MSI. If that fails, legacy would become the default.

The setup_io_queues error handling had to change as a result: the admin queue's msix_entry used to be initialized to the legacy IRQ. The case where nr_io_queues is 0 would fail request_irq when setting up the admin queue's interrupt since re-enabling MSI-x fails with 0 vectors, leaving the admin queue's msix_entry invalid. Instead, return success immediately.

Reported-by: Tim Muhlemmer <muhlemmer@gmail.com>
Reported-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
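The fallback order described in this message (MSI-x first, then MSI, then legacy interrupts) can be sketched as a simple preference chain. Everything below is a hypothetical user-space model: `struct irq_caps`, `enum irq_mode`, and `pick_irq_mode()` are illustrative names, and the boolean fields stand in for whatever the real driver learns when it tries to enable each interrupt type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Interrupt delivery modes, in the order the message says they are tried. */
enum irq_mode {
    IRQ_MODE_MSIX,
    IRQ_MODE_MSI,
    IRQ_MODE_LEGACY,
};

/* Hypothetical result of attempting to enable each mode on a device. */
struct irq_caps {
    bool msix_works;
    bool msi_works;
};

/* Prefer MSI-x, fall back to MSI, and only then default to legacy. */
static enum irq_mode pick_irq_mode(const struct irq_caps *caps)
{
    assert(caps != NULL);
    if (caps->msix_works)
        return IRQ_MODE_MSIX;
    if (caps->msi_works)
        return IRQ_MODE_MSI;
    return IRQ_MODE_LEGACY;
}
```

Legacy interrupts remain the last resort rather than the starting point, which is the behavioural change the message describes.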
-
Sagi Grimberg authored
It's useful to iterate on all the active tags in cases where we will need to fail all the queues' IO. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> [hch: carefully check for valid tagsets] Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Ming Lin authored
We could kmalloc() the payload, so we need the offset within the page. Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Howard Cochran authored
Commit 947e9762 ("writeback: update wb_over_bg_thresh() to use wb_domain aware operations") unintentionally changed this function's meaning from "are there more dirty pages than the background writeback threshold" to "are there more dirty pages than the writeback threshold". The background writeback threshold is typically half of the writeback threshold, so this had the effect of raising the number of dirty pages required to cause a writeback worker to perform background writeout.

This can cause a very severe performance regression when a BDI uses BDI_CAP_STRICTLIMIT because balance_dirty_pages() and the writeback worker can now disagree on whether writeback should be initiated.

For example, in a system having 1GB of RAM, a single spinning disk, and a "pass-through" FUSE filesystem mounted over the disk, application code mmapped a 128MB file on the disk and was randomly dirtying pages in that mapping.

Because FUSE uses strictlimit and has a default max_ratio of only 1%, in balance_dirty_pages, thresh is ~200, bg_thresh is ~100, and the dirty_freerun_ceiling is the average of those, ~150. So, it pauses the dirtying processes when we have 151 dirty pages and wakes up a background writeback worker. But the worker tests the wrong threshold (200 instead of 100), so it does not initiate writeback and just returns.

Thus, balance_dirty_pages keeps looping, sleeping and then waking up the worker who will do nothing. It remains stuck in this state until the few dirty pages that we have finally expire and we write them back for that reason. Then the whole process repeats, resulting in near-zero throughput through the FUSE BDI.

The fix is to call the parameterized variant of wb_calc_thresh, so that the worker will do writeback if the bg_thresh is exceeded, which was the behavior before the referenced commit.

Fixes: 947e9762 ("writeback: update wb_over_bg_thresh() to use wb_domain aware operations")
Signed-off-by: Howard Cochran <hcochran@kernelspring.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
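The arithmetic in the FUSE example can be checked with a small user-space model. This is a deliberately simplified sketch, not the kernel's code: the function names are hypothetical, and the only inputs are the approximate numbers quoted in the message (thresh ~200, bg_thresh ~100, 151 dirty pages).

```c
#include <assert.h>
#include <stdbool.h>

/* The freerun ceiling is the average of the two thresholds. */
static unsigned long freerun_ceiling(unsigned long thresh,
                                     unsigned long bg_thresh)
{
    assert(bg_thresh <= thresh);
    return (thresh + bg_thresh) / 2;
}

/* Model of the dirtier's side: pause once the ceiling is exceeded. */
static bool dirtier_must_pause(unsigned long dirty, unsigned long thresh,
                               unsigned long bg_thresh)
{
    return dirty > freerun_ceiling(thresh, bg_thresh);
}

/* Model of the worker's side: write back only above the limit it tests. */
static bool worker_starts_writeback(unsigned long dirty, unsigned long limit)
{
    return dirty > limit;
}
```

With 151 dirty pages, `dirtier_must_pause(151, 200, 100)` is true because the ceiling is 150. A worker that tests the wrong limit, `worker_starts_writeback(151, 200)`, does nothing, while one that tests the background threshold, `worker_starts_writeback(151, 100)`, starts writeback. That disagreement is the stall described above.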
-
- 11 Apr, 2016 4 commits
-
-
Linus Torvalds authored
-
git://ftp.arm.linux.org.uk/~rmk/linux-arm
Linus Torvalds authored
Pull ARM fixes from Russell King:
 "A couple of small fixes, and wiring up the new syscalls which appeared during the merge window"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: 8550/1: protect idiv patching against undefined gcc behavior
  ARM: wire up preadv2 and pwritev2 syscalls
  ARM: SMP enable of cache maintanence broadcast
-
git://git.linaro.org/people/ulf.hansson/mmc
Linus Torvalds authored
Pull MMC fixes from Ulf Hansson:
 "Here are a couple of mmc fixes intended for v4.6 rc3:

  MMC host:
   - sdhci: Fix regression setting power on Trats2 board
   - sdhci-pci: Add support and PCI IDs for more Broxton host controllers"

* tag 'mmc-v4.6-rc1' of git://git.linaro.org/people/ulf.hansson/mmc:
  mmc: sdhci-pci: Add support and PCI IDs for more Broxton host controllers
  mmc: sdhci: Fix regression setting power on Trats2 board
-
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Linus Torvalds authored
Pull i2c fixes from Wolfram Sang:
 "Some bugfixes from I2C:

   - fix a uevent triggered boot problem by removing a useless debug print
   - fix sysfs-attributes of the new i2c-demux-pinctrl driver to follow standard kernel behaviour
   - fix a potential division-by-zero error (needed two takes)"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: jz4780: really prevent potential division by zero
  Revert "i2c: jz4780: prevent potential division by zero"
  i2c: jz4780: prevent potential division by zero
  i2c: mux: demux-pinctrl: Update docs to new sysfs-attributes
  i2c: mux: demux-pinctrl: Clean up sysfs attributes
  i2c: prevent endless uevent loop with CONFIG_I2C_DEBUG_CORE
-
- 10 Apr, 2016 1 commit
-
-
Linus Torvalds authored
This reverts commit 1028b55b.

It's broken: it makes ext4 return an error at an invalid point, causing the readdir wrappers to write the position of the last successful directory entry into the position field, which means that the next readdir will now return that last successful entry _again_.

You can only return fatal errors (that terminate the readdir directory walk) from within the filesystem readdir functions, the "normal" errors (that happen when the readdir buffer fills up, for example) happen in the iterator where we know the position of the actual failing entry.

I do have a very different patch that does the "signal_pending()" handling inside the iterator function where it is allowable, but while that one passes all the sanity checks, I screwed up something like four times while emailing it out, so I'm not going to commit it today.

So my track record is not good enough, and the stars will have to align better before that one gets committed. And it would be good to get some review too, of course, since celestial alignments are always an iffy debugging model.

IOW, let's just revert the commit that caused the problem for now.

Reported-by: Greg Thelen <gthelen@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 09 Apr, 2016 3 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Linus Torvalds authored
Pull parisc fixes from Helge Deller:
 "Since commit 0de79858 ("parisc: Use generic extable search and sort routines") module loading is broken on parisc, because the parisc module loader wasn't prepared for the new R_PARISC_PCREL32 relocations.

  In addition, due to that breakage, Mikulas Patocka noticed that handling exceptions from modules probably never worked on parisc. It was just masked by the fact that exceptions from modules don't happen during normal use.

  This patch series fixes those issues and survives the tests of the lib/test_user_copy kernel module test. Some patches are tagged for stable"

* 'parisc-4.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Update comment regarding relative extable support
  parisc: Unbreak handling exceptions from kernel modules
  parisc: Fix kernel crash with reversed copy_from_user()
  parisc: Avoid function pointers for kernel exception routines
  parisc: Handle R_PARISC_PCREL32 relocations in kernel modules
-
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Linus Torvalds authored
Pull libnvdimm fixes from Dan Williams:
 "Three fixes, the first two are tagged for -stable:

   - The ndctl utility/library gained expanded unit tests illuminating a long standing bug in the libnvdimm SMART data retrieval implementation. It has been broken since its initial implementation, now fixed.

   - Another one line fix for the detection of stale info blocks. Without this change userspace can get into a situation where it is unable to reconfigure a namespace.

   - Fix the badblock initialization path in the presence of the new (in v4.6-rc1) section alignment workarounds. Without this change badblocks will be reported at the wrong offset.

  These have received a build success report from the kbuild robot and have appeared in -next with no reported issues"

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm, pfn: fix nvdimm_namespace_add_poison() vs section alignment
  libnvdimm, pfn: fix uuid validation
  libnvdimm: fix smart data retrieval
-
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Linus Torvalds authored
Pull GPIO fixes from Linus Walleij:
 "Here is a set of four GPIO fixes. The two fixes to the core are serious as they are regressing minor architectures.

  Core fixes:

   - Defer GPIO device setup until after gpiolib is initialized. It turns out that a few very tightly integrated GPIO platform drivers initialize so early (before core_initcall()) that the gpiolib isn't even initialized itself. That limits what the library can do, and we cannot reference uninitialized fields until later. Defer some of the initialization until right after the gpiolib is initialized in these (rare) cases.

   - As a consequence: do not use devm_* resources when allocating the states in the initial set-up of the gpiochip.

  Driver fixes:

   - In ACPI retrieval: ignore GpioInt when looking for output GPIOs.

   - Fix legacy builds on the PXA without a backing pin controller.

   - Use correct datatype on pca953x register writes"

* tag 'gpio-v4.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: pca953x: Use correct u16 value for register word write
  gpiolib: Defer gpio device setup until after gpiolib initialization
  gpiolib: Do not use devm functions when registering gpio chip
  gpio: pxa: fix legacy non pinctrl aware builds
  gpio / ACPI: ignore GpioInt() GPIOs when requesting GPIO_OUT_*
-