- 23 May, 2019 11 commits
-
-
git://git.infradead.org/nvmeJens Axboe authored
Pull NVMe changes from Keith. * 'nvme-5.2-rc2' of git://git.infradead.org/nvme: nvme-pci: use blk-mq mapping for unmanaged irqs nvme: update MAINTAINERS nvme: copy MTFA field from identify controller nvme: fix memory leak for power latency tolerance nvme: release namespace SRCU protection before performing controller ioctls nvme: merge nvme_ns_ioctl into nvme_ioctl nvme: remove the ifdef around nvme_nvm_ioctl nvme: fix srcu locking on error return in nvme_get_ns_from_disk nvme: Fix known effects nvme-pci: Sync queues on reset nvme-pci: Unblock reset_work on IO failure nvme-pci: Don't disable on timeout in reset state nvme-pci: Fix controller freeze wait disabling
-
Jens Axboe authored
Various fixes and changes have been applied to liburing since we copied some select bits to the kernel testing/examples part, sync up with liburing to get those changes. Most notable is the change that split the CQE reading into the peek and seen event, instead of being just a single function. Also fixes an unsigned wrap issue in io_uring_submit(), leak of 'fd' in setup if we fail, and various other little issues. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Currently fails with: io_uring-bench.o: In function `main': /home/axboe/git/linux-block/tools/io_uring/io_uring-bench.c:560: undefined reference to `pthread_create' /home/axboe/git/linux-block/tools/io_uring/io_uring-bench.c:588: undefined reference to `pthread_join' collect2: error: ld returned 1 exit status Makefile:11: recipe for target 'io_uring-bench' failed make: *** [io_uring-bench] Error 1 Move -lpthread to the end. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bob Liu authored
The following is a description of a hang in blk_mq_freeze_queue_wait(). The hang happens on attempt to freeze a queue while another task does queue unfreeze. The root cause is an incorrect sequence of percpu_ref_resurrect() and percpu_ref_kill() and as a result those two can be swapped: CPU#0 CPU#1 ---------------- ----------------- q1 = blk_mq_init_queue(shared_tags) q2 = blk_mq_init_queue(shared_tags): blk_mq_add_queue_tag_set(shared_tags): blk_mq_update_tag_set_depth(shared_tags): list_for_each_entry() blk_mq_freeze_queue(q1) > percpu_ref_kill() > blk_mq_freeze_queue_wait() blk_cleanup_queue(q1) blk_mq_freeze_queue(q1) > percpu_ref_kill() ^^^^^^ freeze_depth can't guarantee the order blk_mq_unfreeze_queue() > percpu_ref_resurrect() > blk_mq_freeze_queue_wait() ^^^^^^ Hang here!!!! This wrong sequence raises kernel warning: percpu_ref_kill_and_confirm called more than once on blk_queue_usage_counter_release! WARNING: CPU: 0 PID: 11854 at lib/percpu-refcount.c:336 percpu_ref_kill_and_confirm+0x99/0xb0 But the most unpleasant effect is a hang of a blk_mq_freeze_queue_wait(), which waits for a zero of a q_usage_counter, which never happens because percpu-ref was reinited (instead of being killed) and stays in PERCPU state forever. How to reproduce: - "insmod null_blk.ko shared_tags=1 nr_devices=0 queue_mode=2" - cpu0: python Script.py 0; taskset the corresponding process running on cpu0 - cpu1: python Script.py 1; taskset the corresponding process running on cpu1 Script.py: ------ #!/usr/bin/python3 import os import sys while True: on = "echo 1 > /sys/kernel/config/nullb/%s/power" % sys.argv[1] off = "echo 0 > /sys/kernel/config/nullb/%s/power" % sys.argv[1] os.system(on) os.system(off) ------ This bug was first reported and fixed by Roman, previous discussion: [1] Message id: 1443287365-4244-7-git-send-email-akinobu.mita@gmail.com [2] Message id: 1443563240-29306-6-git-send-email-tj@kernel.org [3] https://patchwork.kernel.org/patch/9268199/Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
At this point these fields aren't used for anything, so we can remove them. Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
We fundamentally do not have a maximum segement size for devices with a virt boundary. So don't bother checking it, especially given that the existing checks didn't properly work to start with as we never fully update the front/back segment size and miss the bi_seg_front_size that wuld have been required for some cases. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
We currently fail to update the front/back segment size in the bio when deciding to allow an otherwise gappy segement to a device with a virt boundary. The reason why this did not cause problems is that devices with a virt boundary fundamentally don't use segments as we know it and thus don't care. Make that assumption formal by forcing an unlimited segement size in this case. Fixes: f6970f83 ("block: don't check if adjacent bvecs in one bio can be mergeable") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Currently ll_merge_requests_fn, unlike all other merge functions, reduces nr_phys_segments by one if the last segment of the previous, and the first segment of the next segement are contigous. While this seems like a nice solution to avoid building smaller than possible requests it causes a mismatch between the segments actually present in the request and those iterated over by the bvec iterators, including __rq_for_each_bio. This can for example mistrigger the single segment optimization in the nvme-pci driver, and might lead to mismatching nr_phys_segments number when recalculating the number of request when inserting a cloned request. We could possibly work around this by making the bvec iterators take the front and back segment size into account, but that would require moving them from the bio to the bio_iter and spreading this mess over all users of bvecs. Or we could simply remove this optimization under the assumption that most users already build good enough bvecs, and that the bio merge patch never cared about this optimization either. The latter is what this patch does. dff824b2 ("nvme-pci: optimize mapping of small single segment requests"). Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Andrea Parri authored
This barrier only applies to the read-modify-write operations; in particular, it does not apply to the atomic_set() primitive. Replace the barrier with an smp_mb(). Fixes: 6c0ca7ae ("sbitmap: fix wakeup hang after sbq resize") Cc: stable@vger.kernel.org Reported-by: "Paul E. McKenney" <paulmck@linux.ibm.com> Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Omar Sandoval <osandov@fb.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: linux-block@vger.kernel.org Cc: "Paul E. McKenney" <paulmck@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Andrea Parri authored
This barrier only applies to the read-modify-write operations; in particular, it does not apply to the atomic_set() primitive. Replace the barrier with an smp_mb(). Fixes: dac56212 ("bio: skip atomic inc/dec of ->bi_cnt for most use cases") Cc: stable@vger.kernel.org Reported-by: "Paul E. McKenney" <paulmck@linux.ibm.com> Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ming Lei <ming.lei@redhat.com> Cc: linux-block@vger.kernel.org Cc: "Paul E. McKenney" <paulmck@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ed Cashin authored
Justin Sanders, who has extensive experience with ATA over Ethernet in general and AoE SCSI and block-device drivers in particular, is ready to take on the role of aoe maintainer. The driver needs a more active maintainer. Signed-off-by: Ed Cashin <ed.cashin@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 22 May, 2019 1 commit
-
-
Keith Busch authored
If a device is providing a single IRQ vector, the IO queue will share that vector with the admin queue. This is an unmanaged vector, so does not have a valid PCI IRQ affinity. Avoid trying to extract a managed affinity in this case and let blk-mq set up the cpu:queue mapping instead. Otherwise we'd hit the following warning when the device is using MSI: WARNING: CPU: 4 PID: 7 at drivers/pci/msi.c:1272 pci_irq_get_affinity+0x66/0x80 Modules linked in: nvme nvme_core serio_raw CPU: 4 PID: 7 Comm: kworker/u16:0 Tainted: G W 5.2.0-rc1+ #494 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Workqueue: nvme-reset-wq nvme_reset_work [nvme] RIP: 0010:pci_irq_get_affinity+0x66/0x80 Code: 0b 31 c0 c3 83 e2 10 48 c7 c0 b0 83 35 91 74 2a 48 8b 87 d8 03 00 00 48 85 c0 74 0e 48 8b 50 30 48 85 d2 74 05 39 70 14 77 05 <0f> 0b 31 c0 c3 48 63 f6 48 8d 04 76 48 8d 04 c2 f3 c3 48 8b 40 30 RSP: 0000:ffffb5abc01d3cc8 EFLAGS: 00010246 RAX: ffff9536786a39c0 RBX: 0000000000000000 RCX: 0000000000000080 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9536781ed000 RBP: ffff95367346a008 R08: ffff95367d43f080 R09: ffff953678c07800 R10: ffff953678164800 R11: 0000000000000000 R12: 0000000000000000 R13: ffff9536781ed000 R14: 00000000ffffffff R15: ffff95367346a008 FS: 0000000000000000(0000) GS:ffff95367d400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fdf814a3ff0 CR3: 000000001a20f000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: blk_mq_pci_map_queues+0x37/0xd0 nvme_pci_map_queues+0x80/0xb0 [nvme] blk_mq_alloc_tag_set+0x133/0x2f0 nvme_reset_work+0x105d/0x1590 [nvme] process_one_work+0x291/0x530 worker_thread+0x218/0x3d0 ? process_one_work+0x530/0x530 kthread+0x111/0x130 ? kthread_park+0x90/0x90 ret_from_fork+0x1f/0x30 ---[ end trace 74587339d93c83c0 ]--- Fixes: 22b55601 ("nvme-pci: Separate IO and admin queue IRQ vectors") Reported-by: Iván Chavero <ichavero@chavero.com.mx> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Keith Busch <keith.busch@intel.com>
-
- 21 May, 2019 2 commits
-
-
Keith Busch authored
Use my kernel.org email for nvme. This forwards to all my accounts. Signed-off-by: Keith Busch <keith.busch@intel.com>
-
Laine Walker-Avina authored
We use the controller's reported maximum firmware activation time as our timeout before resetting a controller for a failed activation notice, but this value was never being read so we could only use the default timeout. Copy the Identify Controller MTFA field to the corresponding nvme_ctrl's mtfa field. Fixes: b6dccf7f (“nvme: add support for FW activation without reset”). Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Minwoo Im <minwoo.im@samsung.com> Signed-off-by: Laine Walker-Avina <laine.walker-avina@intel.com> [changelog, fix endian] Signed-off-by: Keith Busch <keith.busch@intel.com>
-
- 17 May, 2019 23 commits
-
-
git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds authored
Pull KVM updates from Paolo Bonzini: "ARM: - support for SVE and Pointer Authentication in guests - PMU improvements POWER: - support for direct access to the POWER9 XIVE interrupt controller - memory and performance optimizations x86: - support for accessing memory not backed by struct page - fixes and refactoring Generic: - dirty page tracking improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (155 commits) kvm: fix compilation on aarch64 Revert "KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU" kvm: x86: Fix L1TF mitigation for shadow MMU KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible KVM: PPC: Book3S: Remove useless checks in 'release' method of KVM device KVM: PPC: Book3S HV: XIVE: Fix spelling mistake "acessing" -> "accessing" KVM: PPC: Book3S HV: Make sure to load LPID for radix VCPUs kvm: nVMX: Set nested_run_pending in vmx_set_nested_state after checks complete tests: kvm: Add tests for KVM_SET_NESTED_STATE KVM: nVMX: KVM_SET_NESTED_STATE - Tear down old EVMCS state before setting new state tests: kvm: Add tests for KVM_CAP_MAX_VCPUS and KVM_CAP_MAX_CPU_ID tests: kvm: Add tests to .gitignore KVM: Introduce KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 KVM: Fix kvm_clear_dirty_log_protect off-by-(minus-)one KVM: Fix the bitmap range to copy during clear dirty KVM: arm64: Fix ptrauth ID register masking logic KVM: x86: use direct accessors for RIP and RSP KVM: VMX: Use accessors for GPRs outside of dedicated caching logic KVM: x86: Omit caching logic for always-available GPRs kvm, x86: Properly check whether a pfn is an MMIO or not ...
-
Linus Torvalds authored
Merge tag 'nds32-for-linus-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux Pull nds32 updates from Greentime Hu: - Clean up codes and Makefile - Fix a vDSO bug - Remove useless functions/header files - Update git repo path in MAINTAINERS * tag 'nds32-for-linus-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux: nds32: Fix vDSO clock_getres() MAINTAINERS: update nds32 git repo path nds32: don't export low-level cache flushing routines arch: nds32: Kconfig: pedantic formatting nds32: fix semicolon code style issue nds32: vdso: drop unnecessary cc-ldoption nds32: remove unused generic-y += cmpxchg-local.h nds32: Use the correct style for SPDX License Identifier nds32: remove __virt_to_bus and __bus_to_virt nds32: vdso: fix and clean-up Makefile nds32: add vmlinux.lds and vdso.so to .gitignore nds32: ex-exit: Remove unneeded need_resched() loop nds32/io: Remove useless definition of mmiowb() nds32: Removed unused thread flag TIF_USEDFPU
-
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linuxLinus Torvalds authored
Pull more s390 updates from Martin Schwidefsky: - Enhancements for the QDIO layer - Remove the RCP trace event - Avoid three build issues - Move the defconfig to the configs directory * tag 's390-5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390: move arch/s390/defconfig to arch/s390/configs/defconfig s390/qdio: optimize state inspection of HW-owned SBALs s390/qdio: use get_buf_state() in debug_get_buf_state() s390/qdio: allow to scan all Output SBALs in one go s390/cio: Remove tracing for rchp instruction s390/kasan: adapt disabled_wait usage to avoid build error latent_entropy: avoid build error when plugin cflags are not set s390/boot: fix compiler error due to missing awk strtonum
-
Yufen Yu authored
Unconditionally hide device pm latency tolerance when uninitializing the controller to ensure all qos resources are released so that we're not leaking this memory. This is safe to call if none were allocated in the first place, or were previously freed. Fixes: c5552fde("nvme: Enable autonomous power state transitions") Suggested-by: Keith Busch <keith.busch@intel.com> Tested-by: David Milburn <dmilburn@redhat.com> Signed-off-by: Yufen Yu <yuyufen@huawei.com> [changelog] Signed-off-by: Keith Busch <keith.busch@intel.com>
-
Christoph Hellwig authored
Holding the SRCU critical section protecting the namespace list can cause deadlocks when using the per-namespace admin passthrough ioctl to delete as namespace. Release it earlier when performing per-controller ioctls to avoid that. Reported-by: Kenneth Heitke <kenneth.heitke@intel.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Christoph Hellwig authored
Merge the two functions to make future changes a little easier. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
-
Christoph Hellwig authored
We already have a proper stub if lightnvm is not enabled, so don't bother with the ifdef. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
-
Christoph Hellwig authored
If we can't get a namespace don't leak the SRCU lock. nvme_ioctl was working around this, but nvme_pr_command wasn't handling this properly. Just do what callers would usually expect. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
-
Keith Busch authored
We're trying to append known effects to the ones reported in the controller's log. The original patch accomplished this, but something went wrong when patch was merged causing the effects log to override the known effects. Link: http://lists.infradead.org/pipermail/linux-nvme/2019-May/023710.html Fixes: f4524cc4 ("nvme-pci: add known admin effects to augument admin effects log page") Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Keith Busch <keith.busch@intel.com>
-
Keith Busch authored
A controller with multiple namespaces may have multiple request_queues with their own timeout work. If a controller fails with IO outstanding to diffent namespaces, each request queue may attempt to handle it, so ensure there is no previously scheduled timeout work executing prior to starting controller initialization by synchronizing with each queue. Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com>
-
Keith Busch authored
The reset_work waits for queued IO to complete before setting the controller to live. If any of these times out and requeues, we won't be able to restart the controller because the reset_work is already running. Flush all entered requests to a failed completion if a timeout occurs in the connecting state, and ensure the controller can't transition to the live state after we've unblocked it from waiting for completions. Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com>
-
Keith Busch authored
The reset state doesn't dispatch commands that it needs to wait for anymore. If a timeout occurs in this state, the reset work is already disabling the controller, so just reset the request's timer. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com>
-
Keith Busch authored
If a controller disabling didn't start a freeze, don't wait for the operation to complete. Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds authored
Pull more vfs mount updates from Al Viro: "Propagation of new syscalls to other architectures + cosmetic change from Christian (fscontext didn't follow the convention for anon inode names)" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: uapi: Wire up the mount API syscalls on non-x86 arches [ver #2] uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2] uapi, fsopen: use square brackets around "fscontext" [ver #2]
-
Paolo Bonzini authored
Commit e45adf66 ("KVM: Introduce a new guest mapping API", 2019-01-31) introduced a build failure on aarch64 defconfig: $ make -j$(nproc) ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- O=out defconfig \ Image.gz ... ../arch/arm64/kvm/../../../virt/kvm/kvm_main.c: In function '__kvm_map_gfn': ../arch/arm64/kvm/../../../virt/kvm/kvm_main.c:1763:9: error: implicit declaration of function 'memremap'; did you mean 'memset_p'? ../arch/arm64/kvm/../../../virt/kvm/kvm_main.c:1763:46: error: 'MEMREMAP_WB' undeclared (first use in this function) ../arch/arm64/kvm/../../../virt/kvm/kvm_main.c: In function 'kvm_vcpu_unmap': ../arch/arm64/kvm/../../../virt/kvm/kvm_main.c:1795:3: error: implicit declaration of function 'memunmap'; did you mean 'vm_munmap'? because these functions are declared in <linux/io.h> rather than <asm/io.h>, and the former was being pulled in already on x86 but not on aarch64. Reported-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull io_uring fixes from Jens Axboe: "A small set of fixes for io_uring. This contains: - smp_rmb() cleanup for io_cqring_events() (Jackie) - io_cqring_wait() simplification (Jackie) - removal of dead 'ev_flags' passing (me) - SQ poll CPU affinity verification fix (me) - SQ poll wait fix (Roman) - SQE command prep cleanup and fix (Stefan)" * tag 'for-linus-20190516' of git://git.kernel.dk/linux-block: io_uring: use wait_event_interruptible for cq_wait conditional wait io_uring: adjust smp_rmb inside io_cqring_events io_uring: fix infinite wait in khread_park() on io_finish_async() io_uring: remove 'ev_flags' argument io_uring: fix failure to verify SQ_AFF cpu io_uring: fix race condition reading SQE data
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull more block updates from Jens Axboe: "This is mainly some late lightnvm changes that came in just before the merge window, as well as fixes that have been queued up since the initial pull request was frozen. This contains: - lightnvm changes, fixing race conditions, improving memory utilization, and improving pblk compatability (Chansol, Igor, Marcin) - NVMe pull request with minor fixes all over the map (via Christoph) - remove redundant error print in sata_rcar (Geert) - struct_size() cleanup (Jackie) - dasd CONFIG_LBADF warning fix (Ming) - brd cond_resched() improvement (Mikulas)" * tag 'for-5.2/block-post-20190516' of git://git.kernel.dk/linux-block: (41 commits) block/bio-integrity: use struct_size() in kmalloc() nvme: validate cntlid during controller initialisation nvme: change locking for the per-subsystem controller list nvme: trace all async notice events nvme: fix typos in nvme status code values nvme-fabrics: remove unused argument nvme-multipath: avoid crash on invalid subsystem cntlid enumeration nvme-fc: use separate work queue to avoid warning nvme-rdma: remove redundant reference between ib_device and tagset nvme-pci: mark expected switch fall-through nvme-pci: add known admin effects to augument admin effects log page nvme-pci: init shadow doorbell after each reset brd: add cond_resched to brd_free_pages sata_rcar: Remove ata_host_alloc() error printing s390/dasd: fix build warning in dasd_eckd_build_cp_raw lightnvm: pblk: use nvm_rq_to_ppa_list() lightnvm: pblk: simplify partial read path lightnvm: do not remove instance under global lock lightnvm: track inflight target creations lightnvm: pblk: recover only written metadata ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linuxLinus Torvalds authored
Pull more clk framework updates from Stephen Boyd: "One more patch to remove io.h from clk-provider.h. We used to need this include when we had clk_readl() and clk_writel(), but those are gone now so this patch pushes the dependency out to the users of clk-provider.h" * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: Remove io.h from clk-provider.h
-
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroupLinus Torvalds authored
Pull cgroup fix from Tejun Heo: "The cgroup2 freezer pulled in this cycle broke strace. This pull request includes a workaround for the problem. It's not a complete fix in that it may cause spurious frozen state flip-flops which is fairly minor. Will push a full fix once it's ready" * 'for-5.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: signal: unconditionally leave the frozen state in ptrace_stop()
-
Linus Torvalds authored
Merge tag 'linux-kselftest-5.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull more kselftest updates from Shuah Khan: - kselftest framework bpf build/test workflow regression fix - Fix to kselftest install to use default install path - Fix to kselftest KBUILD_OUTPUT builds to not clutter main KBUILD_OUTPUT directory with selftest objects - .gitignore fixes (Kelsey Skunberg) - rseq selftests updates (Mathieu Desnoyers and Martin Schwidefsky) They change the per-architecture pre-abort signatures to ensure those are valid trap instructions. The way exit points are presented to debuggers is enhanced, ensuring all exit points are present, so debuggers don't have to disassemble rseq critical section to properly skip over them. Discussions with the glibc community is reaching a consensus of exposing a __rseq_handled symbol from glibc to coexist with rseq early adopters. Update the rseq selftest code to expose and use this symbol. Support for compiling asm goto with clang is added with the "-no-integrated-as" compiler switch, similarly to the top level kernel Makefile. - kselftest Makefile test run output refactoring and making test output TAP13 compliant from Kees Cook: This re-factors the selftest Makefiles to extract the test running logic to be reused between "run_tests" and "emit_tests", while also fixing up the test output to be TAP version 13 compliant: - added "plan" line - fixed result line syntax - moved all test output to be "# "-prefixed as TAP "diagnostic" lines The prefixing code includes a fallback mode for limited execution environments. Additionally, the plan lines are fixed for all callers of kselftest.h. * tag 'linux-kselftest-5.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (25 commits) selftests: avoid KBUILD_OUTPUT dir cluttering with selftest objects selftests: drivers: Create .gitignore to include /dma-buf/udmabuf selftests: pidfd: Create .gitignore to include pidfd_test selftests: fix bpf build/test workflow regression when KBUILD_OUTPUT is set selftests: fix install target to use default install path rseq/selftests: add -no-integrated-as for clang rseq/selftests: mips: use break instruction for RSEQ_SIG rseq/selftests: powerpc code signature: generate valid instructions rseq/selftests: aarch64 code signature: handle big-endian environment rseq/selftests: arm: use udf instruction for RSEQ_SIG rseq/selftests: s390: use trap4 for RSEQ_SIG rseq/selftests: x86: use ud1 instruction as RSEQ_SIG opcode rseq/selftests: s390: use jg instruction for jumps outside of the asm rseq/selftests: Use __rseq_handled symbol to coexist with glibc rseq/selftests: Introduce __rseq_cs_ptr_array, rename __rseq_table to __rseq_cs rseq/selftests: Add __rseq_exit_point_array section for debuggers rseq/selftests: x86: Work-around bogus gcc-8 optimisation selftests: Add test plan API to kselftest.h and adjust callers selftests: Remove KSFT_TAP_LEVEL selftests: Move test output to diagnostic lines ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linuxLinus Torvalds authored
Pull Devicetree vendor prefix conversion from Rob Herring: "Conversion of vendor-prefixes.txt to json-schema" * tag 'devicetree-for-5.2-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: dt-bindings: Convert vendor prefixes to json-schema
-
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fsLinus Torvalds authored
Pull AFS callback promise fixes from David Howells: "This series fixes a bunch of problems in callback promise handling, where a callback promise indicates a promise on the part of the server to notify the client in the event of some sort of change to a file or volume. In the event of a break, the client has to go and refetch the client status from the server and discard any cached permission information as the ACL might have changed. The problem in the current code is that changes made by other clients aren't always noticed, primarily because the file status information and the callback information aren't updated in the same critical section, even if these are carried in the same reply from an RPC operation, and so the AFS_VNODE_CB_PROMISED flag is unreliable. Arranging for them to be done in the same critical section during reply decoding is tricky because of the FS.InlineBulkStatus op - which has all the statuses in the reply arriving and then all the callbacks, so they have to be buffered. It simplifies things a lot to move the critical section out of the decode phase and do it after the RPC function returns. Also new inodes (either newly fetched or newly created) aren't properly managed against a callback break happening before we get the local inode up and running. Fix this by: - There's now a combined file status and callback record (struct afs_status_cb) to carry both plus some flags. - Each operation wrapper function allocates sufficient afs_status_cb records for all the vnodes it is interested in and passes them into RPC operations to be filled in from the reply. - The FileStatus and CallBack record decoders no longer apply the new/revised status and callback information to the inode/vnode at the point of decoding and instead store the information into the record from (2). - afs_vnode_commit_status() then revises the file status, detects deletion and notes callback information inside of a single critical section. It also checks the callback break counters and cancels the callback promise if they changed during the operation. [*] Note that "callback break counters" are counters of server events that cancel one or more callback promises that the client thinks it has. The client counts the events and compares the counters before and after an operation to see if the callback promise it thinks it just got evaporated before it got recorded under lock. - Volume and server callback break counters are passed into afs_iget() allowing callback breaks concurrent with inode set up to be detected and the callback promise thence to be cancelled. - AFS validation checks are now done under RCU conditions using a read lock on cb_lock. This requires vnode->cb_interest to be made RCU safe. - If the checks in (6) fail, the callback breaker is then called under write lock on the cb_lock - but only if the callback break counter didn't change from the value read before the checks were made. - Results from FS.InlineBulkStatus that correspond to inodes we currently have in memory are now used to update those inodes' status and callback information rather than being discarded. This requires those inodes to be looked up before the RPC op is made and all their callback break values saved. To aid in this, the following changes have also been made: - Don't pass the vnode into the reply delivery functions or the decoders. The vnode shouldn't be altered anywhere in those paths. The only exception, for the moment, is for the call done hook for file lock ops that wants access to both the vnode and the call - this can be fixed at a later time. - Get rid of the call->reply[] void* array and replace it with named and typed members. This avoids confusion since different ops were mapping different reply[] members to different things. - Fix an order-1 kmalloc allocation in afs_do_lookup() and replace it with kvcalloc(). - Always get the reply time. Since callback, lock and fileserver record expiry times are calculated for several RPCs, make this mandatory. - Call afs_pages_written_back() from the operation wrapper rather than from the delivery function. - Don't store the version and type from a callback promise in a reply as the information in them is of very limited use" * tag 'afs-fixes-b-20190516' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Fix application of the results of a inline bulk status fetch afs: Pass pre-fetch server and volume break counts into afs_iget5_set() afs: Fix unlink to handle YFS.RemoveFile2 better afs: Clear AFS_VNODE_CB_PROMISED if we detect callback expiry afs: Make vnode->cb_interest RCU safe afs: Split afs_validate() so first part can be used under LOOKUP_RCU afs: Don't save callback version and type fields afs: Fix application of status and callback to be under same lock afs: Always get the reply time afs: Fix order-1 allocation in afs_do_lookup() afs: Get rid of afs_call::reply[] afs: Don't pass the vnode pointer through into the inline bulk status op
-
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fsLinus Torvalds authored
Pull misc AFS fixes from David Howells: "This fixes a set of miscellaneous issues in the afs filesystem, including: - leak of keys on file close. - broken error handling in xattr functions. - missing locking when updating VL server list. - volume location server DNS lookup whereby preloaded cells may not ever get a lookup and regular DNS lookups to maintain server lists consume power unnecessarily. - incorrect error propagation and handling in the fileserver iteration code causes operations to sometimes apparently succeed. - interruption of server record check/update side op during fileserver iteration causes uninterruptible main operations to fail unexpectedly. - callback promise expiry time miscalculation. - over invalidation of the callback promise on directories. - double locking on callback break waking up file locking waiters. - double increment of the vnode callback break counter. Note that it makes some changes outside of the afs code, including: - an extra parameter to dns_query() to allow the dns_resolver key just accessed to be immediately invalidated. AFS is caching the results itself, so the key can be discarded. - an interruptible version of wait_var_event(). - an rxrpc function to allow the maximum lifespan to be set on a call. - a way for an rxrpc call to be marked as non-interruptible" * tag 'afs-fixes-20190516' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Fix double inc of vnode->cb_break afs: Fix lock-wait/callback-break double locking afs: Don't invalidate callback if AFS_VNODE_DIR_VALID not set afs: Fix calculation of callback expiry time afs: Make dynamic root population wait uninterruptibly for proc_cells_lock afs: Make some RPC operations non-interruptible rxrpc: Allow the kernel to mark a call as being non-interruptible afs: Fix error propagation from server record check/update afs: Fix the maximum lifespan of VL and probe calls rxrpc: Provide kernel interface to set max lifespan on a call afs: Fix "kAFS: AFS vnode with undefined type 0" afs: Fix cell DNS lookup Add wait_var_event_interruptible() dns_resolver: Allow used keys to be invalidated afs: Fix afs_cell records to always have a VL server list record afs: Fix missing lock when replacing VL server list afs: Fix afs_xattr_get_yfs() to not try freeing an error value afs: Fix incorrect error handling in afs_xattr_get_acl() afs: Fix key leak in afs_release() and afs_evict_inode()
-
- 16 May, 2019 3 commits
-
-
git://github.com/ceph/ceph-clientLinus Torvalds authored
Pull ceph updates from Ilya Dryomov: "On the filesystem side we have: - a fix to enforce quotas set above the mount point (Luis Henriques) - support for exporting snapshots through NFS (Zheng Yan) - proper statx implementation (Jeff Layton). statx flags are mapped to MDS caps, with AT_STATX_{DONT,FORCE}_SYNC taken into account. - some follow-up dentry name handling fixes, in particular elimination of our hand-rolled helper and the switch to __getname() as suggested by Al (Jeff Layton) - a set of MDS client cleanups in preparation for async MDS requests in the future (Jeff Layton) - a fix to sync the filesystem before remounting (Jeff Layton) On the rbd side, work is on-going on object-map and fast-diff image features" * tag 'ceph-for-5.2-rc1' of git://github.com/ceph/ceph-client: (29 commits) ceph: flush dirty inodes before proceeding with remount ceph: fix unaligned access in ceph_send_cap_releases libceph: make ceph_pr_addr take an struct ceph_entity_addr pointer libceph: fix unaligned accesses in ceph_entity_addr handling rbd: don't assert on writes to snapshots rbd: client_mutex is never nested ceph: print inode number in __caps_issued_mask debugging messages ceph: just call get_session in __ceph_lookup_mds_session ceph: simplify arguments and return semantics of try_get_cap_refs ceph: fix comment over ceph_drop_caps_for_unlink ceph: move wait for mds request into helper function ceph: have ceph_mdsc_do_request call ceph_mdsc_submit_request ceph: after an MDS request, do callback and completions ceph: use pathlen values returned by set_request_path_attr ceph: use __getname/__putname in ceph_mdsc_build_path ceph: use ceph_mdsc_build_path instead of clone_dentry_name ceph: fix potential use-after-free in ceph_mdsc_build_path ceph: dump granular cap info in "caps" debugfs file ceph: make iterate_session_caps a public symbol ceph: fix NULL pointer deref when debugging is enabled ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linuxLinus Torvalds authored
Pull thermal management updates from Zhang Rui: - Remove the 'module' Kconfig option for thermal subsystem framework because the thermal framework are required to be ready as early as possible to avoid overheat at boot time (Daniel Lezcano) - Fix a bug that thermal framework pokes disabled thermal zones upon resume (Wei Wang) - A couple of cleanups and trivial fixes on int340x thermal drivers (Srinivas Pandruvada, Zhang Rui, Sumeet Pawnikar) * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: drivers: thermal: processor_thermal: Downgrade error message mlxsw: Remove obsolete dependency on THERMAL=m hwmon/drivers/core: Simplify complex dependency thermal/drivers/core: Fix typo in the option name thermal/drivers/core: Remove depends on THERMAL in Kconfig thermal/drivers/core: Remove module unload code thermal/drivers/core: Remove the module Kconfig's option thermal: core: skip update disabled thermal zones after suspend thermal: make device_register's type argument const thermal: intel: int340x: processor_thermal_device: simplify to get driver data thermal/int3403_thermal: favor _TMP instead of PTYP
-
Linus Torvalds authored
Merge tag 'for-5.2/dm-changes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Improve DM snapshot target's scalability by using finer grained locking. Requires some list_bl interface improvements. - Add ability for DM integrity to use a bitmap mode, that tracks regions where data and metadata are out of sync, instead of using a journal. - Improve DM thin provisioning target to not write metadata changes to disk if the thin-pool and associated thin devices are merely activated but not used. This avoids metadata corruption due to concurrent activation of thin devices across different OS instances (e.g. split brain scenarios, which ultimately would be avoided if proper device filters were used -- but not having proper filtering has proven a very common configuration mistake) - Fix missing call to path selector type->end_io in DM multipath. This fixes reported performance problems due to inaccurate path selector IO accounting causing an imbalance of IO (e.g. avoiding issuing IO to particular path due to it seemingly being heavily used). - Fix bug in DM cache metadata's loading of its discard bitset that could lead to all cache blocks being discarded if the very first cache block was discarded (thankfully in practice the first cache block is generally in use; be it FS superblock, partition table, disk label, etc). - Add testing-only DM dust target which simulates a device that has failing sectors and/or read failures. - Fix a DM init error path reference count hang that caused boot hangs if user supplied malformed input on kernel commandline. - Fix a couple issues with DM crypt target's logging being overly verbose or lacking context. - Various other small fixes to DM init, DM multipath, DM zoned, and DM crypt. * tag 'for-5.2/dm-changes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (42 commits) dm: fix a couple brace coding style issues dm crypt: print device name in integrity error message dm crypt: move detailed message into debug level dm ioctl: fix hang in early create error condition dm integrity: whitespace, coding style and dead code cleanup dm integrity: implement synchronous mode for reboot handling dm integrity: handle machine reboot in bitmap mode dm integrity: add a bitmap mode dm integrity: introduce a function add_new_range_and_wait() dm integrity: allow large ranges to be described dm ingerity: pass size to dm_integrity_alloc_page_list() dm integrity: introduce rw_journal_sectors() dm integrity: update documentation dm integrity: don't report unused options dm integrity: don't check null pointer before kvfree and vfree dm integrity: correctly calculate the size of metadata area dm dust: Make dm_dust_init and dm_dust_exit static dm dust: remove redundant unsigned comparison to less than zero dm mpath: always free attached_handler_name in parse_path() dm init: fix max devices/targets checks ...
-