- 31 Jul, 2017 40 commits
-
-
Eric Farman authored
[ Upstream commit 773c7220 ] In the case of a graceful set of detaches, where the virtio-scsi-ccw disk is removed from the guest prior to the controller, the guest behaves quite normally. Specifically, the detach gets us into sd_sync_cache to issue a Synchronize Cache(10) command, which immediately fails (and is retried a couple of times) because the device has been removed. Later, the removal of the controller sees two CRWs presented, but there's no further indication of the removal from the guest viewpoint. [ 17.217458] sd 0:0:0:0: [sda] Synchronizing SCSI cache [ 17.219257] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK [ 21.449400] crw_info : CRW reports slct=0, oflw=0, chn=1, rsc=3, anc=0, erc=4, rsid=2 [ 21.449406] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=3, anc=0, erc=4, rsid=0 However, on s390, the SCSI disks can be removed "by surprise" when an entire controller (host) is removed and all associated disks are removed via the loop in scsi_forget_host. The same call to sd_sync_cache is made, but because the controller has already been removed, the Synchronize Cache(10) command is neither issued (and then failed) nor rejected. That the I/O isn't returned means the guest cannot have other devices added nor removed, and other tasks (such as shutdown or reboot) issued by the guest will not complete either. The virtio ring has already been marked as broken (via virtio_break_device in virtio_ccw_remove), but we still attempt to queue the command only to have it remain there. The calling sequence provides a bit of distinction for us: virtscsi_queuecommand() -> virtscsi_kick_cmd() -> virtscsi_add_cmd() -> virtqueue_add_sgs() -> virtqueue_add() if success return 0 elseif vq->broken or vring_mapping_error() return -EIO else return -ENOSPC A return of ENOSPC is generally a temporary condition, so returning "host busy" from virtscsi_queuecommand makes sense here, to have it redriven in a moment or two. But the EIO return code is more of a permanent error and so it would be wise to return the I/O itself and allow the calling thread to finish gracefully. The result is these four kernel messages in the guest (the fourth one does not occur prior to this patch): [ 22.921562] crw_info : CRW reports slct=0, oflw=0, chn=1, rsc=3, anc=0, erc=4, rsid=2 [ 22.921580] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=3, anc=0, erc=4, rsid=0 [ 22.921978] sd 0:0:0:0: [sda] Synchronizing SCSI cache [ 22.921993] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK I opted to fill in the same response data that is returned from the more graceful device detach, where the disk device is removed prior to the controller device. Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Vineeth Remanan Pillai authored
[ Upstream commit 90c311b0 ] During an OOM scenario, request slots could not be created as skb allocation fails. So the netback cannot pass in packets and netfront wrongly assumes that there is no more work to be done and it disables polling. This causes Rx to stall. The issue is with the retry logic which schedules the timer if the created slots are less than NET_RX_SLOTS_MIN. The count of new request slots to be pushed are calculated as a difference between new req_prod and rsp_cons which could be more than the actual slots, if there are unconsumed responses. The fix is to calculate the count of newly created slots as the difference between new req_prod and old req_prod. Signed-off-by: Vineeth Remanan Pillai <vineethp@amazon.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Stefano Stabellini authored
[ Upstream commit f1225ee4 ] In xen_swiotlb_map_page and xen_swiotlb_map_sg_attrs, if the original page is not suitable, we swap it for another page from the swiotlb pool. In these cases, we don't update the previously calculated dma address for the page before calling xen_dma_map_page. Thus, we end up calling xen_dma_map_page passing the wrong dev_addr, resulting in xen_dma_map_page mistakenly assuming that the page is foreign when it is local. Fix the bug by updating dev_addr appropriately. This change has no effect on x86, because xen_dma_map_page is a stub there. Signed-off-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Pooya Keshavarzi <Pooya.Keshavarzi@de.bosch.com> Tested-by: Pooya Keshavarzi <Pooya.Keshavarzi@de.bosch.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
G. Campana authored
[ Upstream commit 8379cadf ] Using control_work instead of config_work as the 3rd argument to container_of results in an invalid portdev pointer. Indeed, the work structure is initialized as below: INIT_WORK(&portdev->config_work, &config_work_handler); It leads to a crash when portdev->vdev is dereferenced later. This bug is triggered when the guest uses a virtio-console without multiport feature and receives a config_changed virtio interrupt. Signed-off-by: G. Campana <gcampana@quarkslab.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Liu Bo authored
[ Upstream commit 91298eec ] For such a file mapping, [0-4k][hole][8k-12k] In NO_HOLES mode, we don't have the [hole] extent any more. Commit c1aa4575 ("Btrfs: fix shrinking truncate when the no_holes feature is enabled") fixed disk isize not being updated in NO_HOLES mode when data is not flushed. However, even if data has been flushed, we can still have trouble in updating disk isize since we updated disk isize to 'start' of the last evicted extent. Reviewed-by: Chris Mason <clm@fb.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Gavin Shan authored
[ Upstream commit 387bbc97 ] We give up recovery on permanent error, simply shutdown the affected devices and remove them. If the devices can't be put into quiet state, they spew more traffic that is likely to cause another unexpected EEH error. This was observed on "p8dtu2u" machine: 0002:00:00.0 PCI bridge: IBM Device 03dc 0002:01:00.0 Ethernet controller: Intel Corporation \ Ethernet Controller X710/X557-AT 10GBASE-T (rev 02) 0002:01:00.1 Ethernet controller: Intel Corporation \ Ethernet Controller X710/X557-AT 10GBASE-T (rev 02) 0002:01:00.2 Ethernet controller: Intel Corporation \ Ethernet Controller X710/X557-AT 10GBASE-T (rev 02) 0002:01:00.3 Ethernet controller: Intel Corporation \ Ethernet Controller X710/X557-AT 10GBASE-T (rev 02) On P8 PowerNV platform, the IO path is frozen when shutdowning the devices, meaning the memory registers are inaccessible. It is why the devices can't be put into quiet state before removing them. This fixes the issue by enabling IO path prior to putting the devices into quiet state. Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Florian Fainelli authored
[ Upstream commit 3894396e ] bgmac_open() calls phy_start() to initialize the PHY state machine, which will set the interface's carrier state accordingly, no need to force that as this could be conflicting with the PHY state determined by PHYLIB. Fixes: dd4544f0 ("bgmac: driver for GBit MAC core on BCMA bus") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Florian Fainelli authored
[ Upstream commit c3897f2a ] The driver does not start the transmit queue in bgmac_open(). If the queue was stopped prior to closing then re-opening the interface, we would never be able to wake-up again. Fixes: dd4544f0 ("bgmac: driver for GBit MAC core on BCMA bus") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Florian Fainelli authored
[ Upstream commit d2b13233 ] We are checking for the Start of Frame bit in the ctl1 word, while this bit is set in the ctl0 word instead. Read the ctl0 word and update the check to verify that. Fixes: 9cde9450 ("bgmac: implement scatter/gather support") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
David S. Miller authored
[ Upstream commit 750afbf8 ] Fixes: f1640c3d ("bgmac: fix a missing check for build_skb") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Rafał Miłecki authored
[ Upstream commit 36bcc0c9 ] Bit-flip errors may occur on NAND flashes and are harmless. Handle them gracefully as read content is still reliable and can be parsed. Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
wangweidong authored
[ Upstream commit f1640c3d ] when build_skb failed, it may occure a NULL pointer. So add a 'NULL check' for it. Signed-off-by: Weidong Wang <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Rafał Miłecki authored
[ Upstream commit 2a36a5c3 ] We allowed using bcm47xxpart on BCM5301X arch with commit: 9e3afa5f ("mtd: bcm47xxpart: allow enabling on ARCH_BCM_5301X") BCM5301X devices may contain some partitions in higher memory, e.g. Netgear R8000 has board_data at 0x2600000. To detect them we should use size limit on MIPS only. Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Álvaro Fernández Rojas authored
[ Upstream commit 07b50db6 ] Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com> Cc: john@phrozen.org Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/13307/Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Álvaro Fernández Rojas authored
[ Upstream commit d7146829 ] Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com> Cc: john@phrozen.org Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/13306/Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
John Crispin authored
[ Upstream commit e906a5f6 ] A few fixes to the pinmux data, 2 new muxes and a minor whitespace cleanup. Signed-off-by: John Crispin <blogic@openwrt.org> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/11991/Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Florian Fainelli authored
[ Upstream commit e6afb1ad ] Commit beb0babf ("korina: disable napi on close and restart") introduced calls to napi_disable() that were missing before, unfortunately this leaves a small window during which NAPI has a chance to run, yet we just freed resources since korina_free_ring() has been called: Fix this by disabling NAPI first then freeing resource, and make sure that we also cancel the restart task before doing the resource freeing. Fixes: beb0babf ("korina: disable napi on close and restart") Reported-by: Alexandros C. Couloumbis <alex@ozo.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Jon Mason authored
[ Upstream commit 0c2bf9f9 ] GIC_PPI flags were misconfigured for the timers, resulting in errors like: [ 0.000000] GIC: PPI11 is secure or misconfigured Changing them to being edge triggered corrects the issue Suggested-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: Jon Mason <jon.mason@broadcom.com> Fixes: d27509f1 ("ARM: BCM5301X: add dts files for BCM4708 SoC") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Quinn Tran authored
[ Upstream commit 4f060736 ] Termination of Immediate Notify IOCB was using wrong IOCB handle. IOCB completion code was unable to find appropriate code path due to wrong handle. Following message is seen in the logs. "Error entry - invalid handle/queue (ffff)." Signed-off-by: Quinn Tran <quinn.tran@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Reviewed-by: Christoph Hellwig <hch@lst.de> [ bvanassche: Fixed word order in patch title ] Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Johannes Thumshirn authored
[ Upstream commit 8667f515 ] Set the elsiocb contexts to NULL after freeing as others depend on it. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Damien Le Moal authored
[ Upstream commit 26f28197 ] Zoned block devices force the use of READ/WRITE(16) commands by setting sdkp->use_16_for_rw and clearing sdkp->use_10_for_rw. This result in DPOFUA always being disabled for these drives as the assumed use of the deprecated READ/WRITE(6) commands only looks at sdkp->use_10_for_rw. Strenghten the test by also checking that sdkp->use_16_for_rw is false. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Dmitry Vyukov authored
[ Upstream commit ce2e852e ] emulator_fix_hypercall() replaces hypercall with vmcall instruction, but it does not handle GP exception properly when writes the new instruction. It can return X86EMUL_PROPAGATE_FAULT without setting exception information. This leads to incorrect emulation and triggers WARN_ON(ctxt->exception.vector > 0x1f) in x86_emulate_insn() as discovered by syzkaller fuzzer: WARNING: CPU: 2 PID: 18646 at arch/x86/kvm/emulate.c:5558 Call Trace: warn_slowpath_null+0x2c/0x40 kernel/panic.c:582 x86_emulate_insn+0x16a5/0x4090 arch/x86/kvm/emulate.c:5572 x86_emulate_instruction+0x403/0x1cc0 arch/x86/kvm/x86.c:5618 emulate_instruction arch/x86/include/asm/kvm_host.h:1127 [inline] handle_exception+0x594/0xfd0 arch/x86/kvm/vmx.c:5762 vmx_handle_exit+0x2b7/0x38b0 arch/x86/kvm/vmx.c:8625 vcpu_enter_guest arch/x86/kvm/x86.c:6888 [inline] vcpu_run arch/x86/kvm/x86.c:6947 [inline] Set exception information when write in emulator_fix_hypercall() fails. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: kvm@vger.kernel.org Cc: syzkaller@googlegroups.com Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Mark Rutland authored
[ Upstream commit 3c226c63 ] In do_huge_pmd_numa_page(), we attempt to handle a migrating thp pmd by waiting until the pmd is unlocked before we return and retry. However, we can race with migrate_misplaced_transhuge_page(): // do_huge_pmd_numa_page // migrate_misplaced_transhuge_page() // Holds 0 refs on page // Holds 2 refs on page vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd); /* ... */ if (pmd_trans_migrating(*vmf->pmd)) { page = pmd_page(*vmf->pmd); spin_unlock(vmf->ptl); ptl = pmd_lock(mm, pmd); if (page_count(page) != 2)) { /* roll back */ } /* ... */ mlock_migrate_page(new_page, page); /* ... */ spin_unlock(ptl); put_page(page); put_page(page); // page freed here wait_on_page_locked(page); goto out; } This can result in the freed page having its waiters flag set unexpectedly, which trips the PAGE_FLAGS_CHECK_AT_PREP checks in the page alloc/free functions. This has been observed on arm64 KVM guests. We can avoid this by having do_huge_pmd_numa_page() take a reference on the page before dropping the pmd lock, mirroring what we do in __migration_entry_wait(). When we hit the race, migrate_misplaced_transhuge_page() will see the reference and abort the migration, as it may do today in other cases. Fixes: b8916634 ("mm: Prevent parallel splits during THP migration") Link: http://lkml.kernel.org/r/1497349722-6731-2-git-send-email-will.deacon@arm.comSigned-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Steve Capper <steve.capper@arm.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Kees Cook authored
[ Upstream commit 41662f5c ] SYSCTL_WRITES_WARN was added in commit f4aacea2 ("sysctl: allow for strict write position handling"), and released in v3.16 in August of 2014. Since then I can find only 1 instance of non-zero offset writing[1], and it was fixed immediately in CRIU[2]. As such, it appears safe to flip this to the strict state now. [1] https://www.google.com/search?q="when%20file%20position%20was%20not%200" [2] http://lists.openvz.org/pipermail/criu/2015-April/019819.htmlSigned-off-by: Kees Cook <keescook@chromium.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Baolin Wang authored
[ Upstream commit b3ce3ce0 ] When system try to close /dev/usb-ffs/adb/ep0 on one core, at the same time another core try to attach new UDC, which will cause deadlock as below scenario. Thus we should release ffs lock before issuing unregister_gadget_item(). [ 52.642225] c1 ====================================================== [ 52.642228] c1 [ INFO: possible circular locking dependency detected ] [ 52.642236] c1 4.4.6+ #1 Tainted: G W O [ 52.642241] c1 ------------------------------------------------------- [ 52.642245] c1 usb ffs open/2808 is trying to acquire lock: [ 52.642270] c0 (udc_lock){+.+.+.}, at: [<ffffffc00065aeec>] usb_gadget_unregister_driver+0x3c/0xc8 [ 52.642272] c1 but task is already holding lock: [ 52.642283] c0 (ffs_lock){+.+.+.}, at: [<ffffffc00066b244>] ffs_data_clear+0x30/0x140 [ 52.642285] c1 which lock already depends on the new lock. [ 52.642287] c1 the existing dependency chain (in reverse order) is: [ 52.642295] c0 -> #1 (ffs_lock){+.+.+.}: [ 52.642307] c0 [<ffffffc00012340c>] __lock_acquire+0x20f0/0x2238 [ 52.642314] c0 [<ffffffc000123b54>] lock_acquire+0xe4/0x298 [ 52.642322] c0 [<ffffffc000aaf6e8>] mutex_lock_nested+0x7c/0x3cc [ 52.642328] c0 [<ffffffc00066f7bc>] ffs_func_bind+0x504/0x6e8 [ 52.642334] c0 [<ffffffc000654004>] usb_add_function+0x84/0x184 [ 52.642340] c0 [<ffffffc000658ca4>] configfs_composite_bind+0x264/0x39c [ 52.642346] c0 [<ffffffc00065b348>] udc_bind_to_driver+0x58/0x11c [ 52.642352] c0 [<ffffffc00065b49c>] usb_udc_attach_driver+0x90/0xc8 [ 52.642358] c0 [<ffffffc0006598e0>] gadget_dev_desc_UDC_store+0xd4/0x128 [ 52.642369] c0 [<ffffffc0002c14e8>] configfs_write_file+0xd0/0x13c [ 52.642376] c0 [<ffffffc00023c054>] vfs_write+0xb8/0x214 [ 52.642381] c0 [<ffffffc00023cad4>] SyS_write+0x54/0xb0 [ 52.642388] c0 [<ffffffc000085ff0>] el0_svc_naked+0x24/0x28 [ 52.642395] c0 -> #0 (udc_lock){+.+.+.}: [ 52.642401] c0 [<ffffffc00011e3d0>] print_circular_bug+0x84/0x2e4 [ 52.642407] c0 [<ffffffc000123454>] __lock_acquire+0x2138/0x2238 [ 52.642412] c0 [<ffffffc000123b54>] lock_acquire+0xe4/0x298 [ 52.642420] c0 [<ffffffc000aaf6e8>] mutex_lock_nested+0x7c/0x3cc [ 52.642427] c0 [<ffffffc00065aeec>] usb_gadget_unregister_driver+0x3c/0xc8 [ 52.642432] c0 [<ffffffc00065995c>] unregister_gadget_item+0x28/0x44 [ 52.642439] c0 [<ffffffc00066b34c>] ffs_data_clear+0x138/0x140 [ 52.642444] c0 [<ffffffc00066b374>] ffs_data_reset+0x20/0x6c [ 52.642450] c0 [<ffffffc00066efd0>] ffs_data_closed+0xac/0x12c [ 52.642454] c0 [<ffffffc00066f070>] ffs_ep0_release+0x20/0x2c [ 52.642460] c0 [<ffffffc00023dbe4>] __fput+0xb0/0x1f4 [ 52.642466] c0 [<ffffffc00023dd9c>] ____fput+0x20/0x2c [ 52.642473] c0 [<ffffffc0000ee944>] task_work_run+0xb4/0xe8 [ 52.642482] c0 [<ffffffc0000cd45c>] do_exit+0x360/0xb9c [ 52.642487] c0 [<ffffffc0000cf228>] do_group_exit+0x4c/0xb0 [ 52.642494] c0 [<ffffffc0000dd3c8>] get_signal+0x380/0x89c [ 52.642501] c0 [<ffffffc00008a8f0>] do_signal+0x154/0x518 [ 52.642507] c0 [<ffffffc00008af00>] do_notify_resume+0x70/0x78 [ 52.642512] c0 [<ffffffc000085ee8>] work_pending+0x1c/0x20 [ 52.642514] c1 other info that might help us debug this: [ 52.642517] c1 Possible unsafe locking scenario: [ 52.642518] c1 CPU0 CPU1 [ 52.642520] c1 ---- ---- [ 52.642525] c0 lock(ffs_lock); [ 52.642529] c0 lock(udc_lock); [ 52.642533] c0 lock(ffs_lock); [ 52.642537] c0 lock(udc_lock); [ 52.642539] c1 *** DEADLOCK *** [ 52.642543] c1 1 lock held by usb ffs open/2808: [ 52.642555] c0 #0: (ffs_lock){+.+.+.}, at: [<ffffffc00066b244>] ffs_data_clear+0x30/0x140 [ 52.642557] c1 stack backtrace: [ 52.642563] c1 CPU: 1 PID: 2808 Comm: usb ffs open Tainted: G [ 52.642565] c1 Hardware name: Spreadtrum SP9860g Board (DT) [ 52.642568] c1 Call trace: [ 52.642573] c1 [<ffffffc00008b430>] dump_backtrace+0x0/0x170 [ 52.642577] c1 [<ffffffc00008b5c0>] show_stack+0x20/0x28 [ 52.642583] c1 [<ffffffc000422694>] dump_stack+0xa8/0xe0 [ 52.642587] c1 [<ffffffc00011e548>] print_circular_bug+0x1fc/0x2e4 [ 52.642591] c1 [<ffffffc000123454>] __lock_acquire+0x2138/0x2238 [ 52.642595] c1 [<ffffffc000123b54>] lock_acquire+0xe4/0x298 [ 52.642599] c1 [<ffffffc000aaf6e8>] mutex_lock_nested+0x7c/0x3cc [ 52.642604] c1 [<ffffffc00065aeec>] usb_gadget_unregister_driver+0x3c/0xc8 [ 52.642608] c1 [<ffffffc00065995c>] unregister_gadget_item+0x28/0x44 [ 52.642613] c1 [<ffffffc00066b34c>] ffs_data_clear+0x138/0x140 [ 52.642618] c1 [<ffffffc00066b374>] ffs_data_reset+0x20/0x6c [ 52.642621] c1 [<ffffffc00066efd0>] ffs_data_closed+0xac/0x12c [ 52.642625] c1 [<ffffffc00066f070>] ffs_ep0_release+0x20/0x2c [ 52.642629] c1 [<ffffffc00023dbe4>] __fput+0xb0/0x1f4 [ 52.642633] c1 [<ffffffc00023dd9c>] ____fput+0x20/0x2c [ 52.642636] c1 [<ffffffc0000ee944>] task_work_run+0xb4/0xe8 [ 52.642640] c1 [<ffffffc0000cd45c>] do_exit+0x360/0xb9c [ 52.642644] c1 [<ffffffc0000cf228>] do_group_exit+0x4c/0xb0 [ 52.642647] c1 [<ffffffc0000dd3c8>] get_signal+0x380/0x89c [ 52.642651] c1 [<ffffffc00008a8f0>] do_signal+0x154/0x518 [ 52.642656] c1 [<ffffffc00008af00>] do_notify_resume+0x70/0x78 [ 52.642659] c1 [<ffffffc000085ee8>] work_pending+0x1c/0x20 Acked-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Baolin Wang <baolin.wang@linaro.org> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Deepak Rawat authored
[ Upstream commit 82fcee52 ] The hash table created during vmw_cmdbuf_res_man_create was never freed. This causes memory leak in context creation. Added the corresponding drm_ht_remove in vmw_cmdbuf_res_man_destroy. Tested for memory leak by running piglit overnight and kernel memory is not inflated which earlier was. Cc: <stable@vger.kernel.org> Signed-off-by: Deepak Rawat <drawat@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Hui Wang authored
[ Upstream commit a8f20fd2 ] Recently we met a problem, the codec has valid adcs and input pins, and they can form valid input paths, but the driver does not build valid controls for them like "Mic boost", "Capture Volume" and "Capture Switch". Through debugging, I found the driver needs to shrink the invalid adcs and input paths for this machine, so it will move the whole column bitmap value to the previous column, after moving it, the driver forgets to set the original column bitmap value to zero, as a result, the driver will invalidate the path whose index value is the original colume bitmap value. After executing this function, all valid input paths are invalidated by a mistake, there are no any valid input paths, so the driver won't build controls for them. Fixes: 3a65bcdc ("ALSA: hda - Fix inconsistent input_paths after ADC reduction") Cc: <stable@vger.kernel.org> Signed-off-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Paul Burton authored
[ Upstream commit d8550860 ] When the scheduler sets TIF_NEED_RESCHED & we call into the scheduler from arch/mips/kernel/entry.S we disable interrupts. This is true regardless of whether we reach work_resched from syscall_exit_work, resume_userspace or by looping after calling schedule(). Although we disable interrupts in these paths we don't call trace_hardirqs_off() before calling into C code which may acquire locks, and we therefore leave lockdep with an inconsistent view of whether interrupts are disabled or not when CONFIG_PROVE_LOCKING & CONFIG_DEBUG_LOCKDEP are both enabled. Without tracing this interrupt state lockdep will print warnings such as the following once a task returns from a syscall via syscall_exit_partial with TIF_NEED_RESCHED set: [ 49.927678] ------------[ cut here ]------------ [ 49.934445] WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:3687 check_flags.part.41+0x1dc/0x1e8 [ 49.946031] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled) [ 49.946355] CPU: 0 PID: 1 Comm: init Not tainted 4.10.0-00439-gc9fd5d362289-dirty #197 [ 49.963505] Stack : 0000000000000000 ffffffff81bb5d6a 0000000000000006 ffffffff801ce9c4 [ 49.974431] 0000000000000000 0000000000000000 0000000000000000 000000000000004a [ 49.985300] ffffffff80b7e487 ffffffff80a24498 a8000000ff160000 ffffffff80ede8b8 [ 49.996194] 0000000000000001 0000000000000000 0000000000000000 0000000077c8030c [ 50.007063] 000000007fd8a510 ffffffff801cd45c 0000000000000000 a8000000ff127c88 [ 50.017945] 0000000000000000 ffffffff801cf928 0000000000000001 ffffffff80a24498 [ 50.028827] 0000000000000000 0000000000000001 0000000000000000 0000000000000000 [ 50.039688] 0000000000000000 a8000000ff127bd0 0000000000000000 ffffffff805509bc [ 50.050575] 00000000140084e0 0000000000000000 0000000000000000 0000000000040a00 [ 50.061448] 0000000000000000 ffffffff8010e1b0 0000000000000000 ffffffff805509bc [ 50.072327] ... [ 50.076087] Call Trace: [ 50.079869] [<ffffffff8010e1b0>] show_stack+0x80/0xa8 [ 50.086577] [<ffffffff805509bc>] dump_stack+0x10c/0x190 [ 50.093498] [<ffffffff8015dde0>] __warn+0xf0/0x108 [ 50.099889] [<ffffffff8015de34>] warn_slowpath_fmt+0x3c/0x48 [ 50.107241] [<ffffffff801c15b4>] check_flags.part.41+0x1dc/0x1e8 [ 50.114961] [<ffffffff801c239c>] lock_is_held_type+0x8c/0xb0 [ 50.122291] [<ffffffff809461b8>] __schedule+0x8c0/0x10f8 [ 50.129221] [<ffffffff80946a60>] schedule+0x30/0x98 [ 50.135659] [<ffffffff80106278>] work_resched+0x8/0x34 [ 50.142397] ---[ end trace 0cb4f6ef5b99fe21 ]--- [ 50.148405] possible reason: unannotated irqs-off. [ 50.154600] irq event stamp: 400463 [ 50.159566] hardirqs last enabled at (400463): [<ffffffff8094edc8>] _raw_spin_unlock_irqrestore+0x40/0xa8 [ 50.171981] hardirqs last disabled at (400462): [<ffffffff8094eb98>] _raw_spin_lock_irqsave+0x30/0xb0 [ 50.183897] softirqs last enabled at (400450): [<ffffffff8016580c>] __do_softirq+0x4ac/0x6a8 [ 50.195015] softirqs last disabled at (400425): [<ffffffff80165e78>] irq_exit+0x110/0x128 Fix this by using the TRACE_IRQS_OFF macro to call trace_hardirqs_off() when CONFIG_TRACE_IRQFLAGS is enabled. This is done before invoking schedule() following the work_resched label because: 1) Interrupts are disabled regardless of the path we take to reach work_resched() & schedule(). 2) Performing the tracing here avoids the need to do it in paths which disable interrupts but don't call out to C code before hitting a path which uses the RESTORE_SOME macro that will call trace_hardirqs_on() or trace_hardirqs_off() as appropriate. We call trace_hardirqs_on() using the TRACE_IRQS_ON macro before calling syscall_trace_leave() for similar reasons, ensuring that lockdep has a consistent view of state after we re-enable interrupts. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Fixes: 1da177e4 ("Linux-2.6.12-rc2") Cc: linux-mips@linux-mips.org Cc: stable <stable@vger.kernel.org> Patchwork: https://patchwork.linux-mips.org/patch/15385/Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Paul Burton authored
[ Upstream commit 161c51cc ] We allocate memory for a ready_count variable per-CPU, which is accessed via a cached non-coherent TLB mapping to perform synchronisation between threads within the core using LL/SC instructions. In order to ensure that the variable is contained within its own data cache line we allocate 2 lines worth of memory & align the resulting pointer to a line boundary. This is however unnecessary, since kmalloc is guaranteed to return memory which is at least cache-line aligned (see ARCH_DMA_MINALIGN). Stop the redundant manual alignment. Besides cleaning up the code & avoiding needless work, this has the side effect of avoiding an arithmetic error found by Bryan on 64 bit systems due to the 32 bit size of the former dlinesz. This led the ready_count variable to have its upper 32b cleared erroneously for MIPS64 kernels, causing problems when ready_count was later used on MIPS64 via cpuidle. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Fixes: 3179d37e ("MIPS: pm-cps: add PM state entry code for CPS systems") Reported-by: Bryan O'Donoghue <bryan.odonoghue@imgtec.com> Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@imgtec.com> Tested-by: Bryan O'Donoghue <bryan.odonoghue@imgtec.com> Cc: linux-mips@linux-mips.org Cc: stable <stable@vger.kernel.org> # v3.16+ Patchwork: https://patchwork.linux-mips.org/patch/15383/Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
James Hogan authored
[ Upstream commit 85423636 ] Since commit 81a76d71 ("MIPS: Avoid using unwind_stack() with usermode") show_backtrace() invokes the raw backtracer when cp0_status & ST0_KSU indicates user mode to fix issues on EVA kernels where user and kernel address spaces overlap. However this is used by show_stack() which creates its own pt_regs on the stack and leaves cp0_status uninitialised in most of the code paths. This results in the non deterministic use of the raw back tracer depending on the previous stack content. show_stack() deals exclusively with kernel mode stacks anyway, so explicitly initialise regs.cp0_status to KSU_KERNEL (i.e. 0) to ensure we get a useful backtrace. Fixes: 81a76d71 ("MIPS: Avoid using unwind_stack() with usermode") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: linux-mips@linux-mips.org Cc: <stable@vger.kernel.org> # 3.15+ Patchwork: https://patchwork.linux-mips.org/patch/16656/Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
David Rientjes authored
[ Upstream commit 460bcec8 ] We got need_resched() warnings in swap_cgroup_swapoff() because swap_cgroup_ctrl[type].length is particularly large. Reschedule when needed. Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1704061315270.80559@chino.kir.corp.google.comSigned-off-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Russell Currey authored
[ Upstream commit 71f677a9 ] The ast driver configures a window to enable access into BMC memory space in order to read some configuration registers. If this window is disabled, which it can be from the BMC side, the ast driver can't function. Closing this window is a necessity for security if a machine's host side and BMC side are controlled by different parties; i.e. a cloud provider offering machines "bare metal". A recent patch went in to try to check if that window is open but it does so by trying to access the registers in question and testing if the result is 0xffffffff. This method will trigger a PCIe error when the window is closed which on some systems will be fatal (it will trigger an EEH for example on POWER which will take out the device). This patch improves this in two ways: - First, if the firmware has put properties in the device-tree containing the relevant configuration information, we use these. - Otherwise, a bit in one of the SCU scratch registers (which are readable via the VGA register space and writeable by the BMC) will indicate if the BMC has closed the window. This bit has been defined by Y.C Chen from Aspeed. If the window is closed and the configuration isn't available from the device-tree, some sane defaults are used. Those defaults are hopefully sufficient for standard video modes used on a server. Signed-off-by: Russell Currey <ruscur@russell.cc> Acked-by: Joel Stanley <joel@jms.id.au> Cc: <stable@vger.kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Kinglong Mee authored
[ Upstream commit 366a1569 ] Because nfs4_opendata_access() has close the state when access is denied, so the state isn't leak. Rather than revert the commit a974deee, I'd like clean the strange state close. [ 1615.094218] ------------[ cut here ]------------ [ 1615.094607] WARNING: CPU: 0 PID: 23702 at lib/list_debug.c:31 __list_add_valid+0x8e/0xa0 [ 1615.094913] list_add double add: new=ffff9d7901d9f608, prev=ffff9d7901d9f608, next=ffff9d7901ee8dd0. [ 1615.095458] Modules linked in: nfsv4(E) nfs(E) nfsd(E) tun bridge stp llc fuse ip_set nfnetlink vmw_vsock_vmci_transport vsock f2fs snd_seq_midi snd_seq_midi_event fscrypto coretemp ppdev crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_rapl_perf vmw_balloon snd_ens1371 joydev gameport snd_ac97_codec ac97_bus snd_seq snd_pcm snd_rawmidi snd_timer snd_seq_device snd soundcore nfit parport_pc parport acpi_cpufreq tpm_tis tpm_tis_core tpm i2c_piix4 vmw_vmci shpchp auth_rpcgss nfs_acl lockd(E) grace sunrpc(E) xfs libcrc32c vmwgfx drm_kms_helper ttm drm crc32c_intel mptspi e1000 serio_raw scsi_transport_spi mptscsih mptbase ata_generic pata_acpi fjes [last unloaded: nfs] [ 1615.097663] CPU: 0 PID: 23702 Comm: fstest Tainted: G W E 4.11.0-rc1+ #517 [ 1615.098015] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015 [ 1615.098807] Call Trace: [ 1615.099183] dump_stack+0x63/0x86 [ 1615.099578] __warn+0xcb/0xf0 [ 1615.099967] warn_slowpath_fmt+0x5f/0x80 [ 1615.100370] __list_add_valid+0x8e/0xa0 [ 1615.100760] nfs4_put_state_owner+0x75/0xc0 [nfsv4] [ 1615.101136] __nfs4_close+0x109/0x140 [nfsv4] [ 1615.101524] nfs4_close_state+0x15/0x20 [nfsv4] [ 1615.101949] nfs4_close_context+0x21/0x30 [nfsv4] [ 1615.102691] __put_nfs_open_context+0xb8/0x110 [nfs] [ 1615.103155] put_nfs_open_context+0x10/0x20 [nfs] [ 1615.103586] nfs4_file_open+0x13b/0x260 [nfsv4] [ 1615.103978] do_dentry_open+0x20a/0x2f0 [ 1615.104369] ? nfs4_copy_file_range+0x30/0x30 [nfsv4] [ 1615.104739] vfs_open+0x4c/0x70 [ 1615.105106] ? may_open+0x5a/0x100 [ 1615.105469] path_openat+0x623/0x1420 [ 1615.105823] do_filp_open+0x91/0x100 [ 1615.106174] ? __alloc_fd+0x3f/0x170 [ 1615.106568] do_sys_open+0x130/0x220 [ 1615.106920] ? __put_cred+0x3d/0x50 [ 1615.107256] SyS_open+0x1e/0x20 [ 1615.107588] entry_SYSCALL_64_fastpath+0x1a/0xa9 [ 1615.107922] RIP: 0033:0x7fab599069b0 [ 1615.108247] RSP: 002b:00007ffcf0600d78 EFLAGS: 00000246 ORIG_RAX: 0000000000000002 [ 1615.108575] RAX: ffffffffffffffda RBX: 00007fab59bcfae0 RCX: 00007fab599069b0 [ 1615.108896] RDX: 0000000000000200 RSI: 0000000000000200 RDI: 00007ffcf060255e [ 1615.109211] RBP: 0000000000040010 R08: 0000000000000000 R09: 0000000000000016 [ 1615.109515] R10: 00000000000006a1 R11: 0000000000000246 R12: 0000000000041000 [ 1615.109806] R13: 0000000000040010 R14: 0000000000001000 R15: 0000000000002710 [ 1615.110152] ---[ end trace 96ed63b1306bf2f3 ]--- Fixes: a974deee ("NFSv4: Fix memory and state leak in...") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Eric Leblond authored
[ Upstream commit 87e94dbc ] This patch fixes the creation of connection tracking entry from netlink when synproxy is used. It was missing the addition of the synproxy extension. This was causing kernel crashes when a conntrack entry created by conntrackd was used after the switch of traffic from active node to the passive node. Signed-off-by: Eric Leblond <eric@regit.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Eric Dumazet authored
[ Upstream commit 2638fd0f ] Denys provided an awesome KASAN report pointing to an use after free in xt_TCPMSS I have provided three patches to fix this issue, either in xt_TCPMSS or in xt_tcpudp.c. It seems xt_TCPMSS patch has the smallest possible impact. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Gao Feng authored
[ Upstream commit 9745e362 ] The register_vlan_device would invoke free_netdev directly, when register_vlan_dev failed. It would trigger the BUG_ON in free_netdev if the dev was already registered. In this case, the netdev would be freed in netdev_run_todo later. So add one condition check now. Only when dev is not registered, then free it directly. The following is the part coredump when netdev_upper_dev_link failed in register_vlan_dev. I removed the lines which are too long. [ 411.237457] ------------[ cut here ]------------ [ 411.237458] kernel BUG at net/core/dev.c:7998! [ 411.237484] invalid opcode: 0000 [#1] SMP [ 411.237705] [last unloaded: 8021q] [ 411.237718] CPU: 1 PID: 12845 Comm: vconfig Tainted: G E 4.12.0-rc5+ #6 [ 411.237737] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015 [ 411.237764] task: ffff9cbeb6685580 task.stack: ffffa7d2807d8000 [ 411.237782] RIP: 0010:free_netdev+0x116/0x120 [ 411.237794] RSP: 0018:ffffa7d2807dbdb0 EFLAGS: 00010297 [ 411.237808] RAX: 0000000000000002 RBX: ffff9cbeb6ba8fd8 RCX: 0000000000001878 [ 411.237826] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 0000000000000000 [ 411.237844] RBP: ffffa7d2807dbdc8 R08: 0002986100029841 R09: 0002982100029801 [ 411.237861] R10: 0004000100029980 R11: 0004000100029980 R12: ffff9cbeb6ba9000 [ 411.238761] R13: ffff9cbeb6ba9060 R14: ffff9cbe60f1a000 R15: ffff9cbeb6ba9000 [ 411.239518] FS: 00007fb690d81700(0000) GS:ffff9cbebb640000(0000) knlGS:0000000000000000 [ 411.239949] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 411.240454] CR2: 00007f7115624000 CR3: 0000000077cdf000 CR4: 00000000003406e0 [ 411.240936] Call Trace: [ 411.241462] vlan_ioctl_handler+0x3f1/0x400 [8021q] [ 411.241910] sock_ioctl+0x18b/0x2c0 [ 411.242394] do_vfs_ioctl+0xa1/0x5d0 [ 411.242853] ? sock_alloc_file+0xa6/0x130 [ 411.243465] SyS_ioctl+0x79/0x90 [ 411.243900] entry_SYSCALL_64_fastpath+0x1e/0xa9 [ 411.244425] RIP: 0033:0x7fb69089a357 [ 411.244863] RSP: 002b:00007ffcd04e0fc8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [ 411.245445] RAX: ffffffffffffffda RBX: 00007ffcd04e2884 RCX: 00007fb69089a357 [ 411.245903] RDX: 00007ffcd04e0fd0 RSI: 0000000000008983 RDI: 0000000000000003 [ 411.246527] RBP: 00007ffcd04e0fd0 R08: 0000000000000000 R09: 1999999999999999 [ 411.246976] R10: 000000000000053f R11: 0000000000000202 R12: 0000000000000004 [ 411.247414] R13: 00007ffcd04e1128 R14: 00007ffcd04e2888 R15: 0000000000000001 [ 411.249129] RIP: free_netdev+0x116/0x120 RSP: ffffa7d2807dbdb0 Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Wei Wang authored
[ Upstream commit 76371d2e ] In the existing dn_route.c code, dn_route_output_slow() takes dst->__refcnt before calling dn_insert_route() while dn_route_input_slow() does not take dst->__refcnt before calling dn_insert_route(). This makes the whole routing code very buggy. In dn_dst_check_expire(), dnrt_free() is called when rt expires. This makes the routes inserted by dn_route_output_slow() not able to be freed as the refcnt is not released. In dn_dst_gc(), dnrt_drop() is called to release rt which could potentially cause the dst->__refcnt to be dropped to -1. In dn_run_flush(), dst_free() is called to release all the dst. Again, it makes the dst inserted by dn_route_output_slow() not able to be released and also, it does not wait on the rcu and could potentially cause crash in the path where other users still refer to this dst. This patch makes sure both input and output path do not take dst->__refcnt before calling dn_insert_route() and also makes sure dnrt_free()/dst_free() is called when removing dst from the hash table. The only difference between those 2 calls is that dnrt_free() waits on the rcu while dst_free() does not. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
Xin Long authored
[ Upstream commit f8a894b2 ] Now when starting the dad work in addrconf_mod_dad_work, if the dad work is idle and queued, it needs to hold ifa. The problem is there's one gap in [1], during which if the pending dad work is removed elsewhere. It will miss to hold ifa, but the dad word is still idea and queue. if (!delayed_work_pending(&ifp->dad_work)) in6_ifa_hold(ifp); <--------------[1] mod_delayed_work(addrconf_wq, &ifp->dad_work, delay); An use-after-free issue can be caused by this. Chen Wei found this issue when WARN_ON(!hlist_unhashed(&ifp->addr_lst)) in net6_ifa_finish_destroy was hit because of it. As Hannes' suggestion, this patch is to fix it by holding ifa first in addrconf_mod_dad_work, then calling mod_delayed_work and putting ifa if the dad_work is already in queue. Note that this patch did not choose to fix it with: if (!mod_delayed_work(delay)) in6_ifa_hold(ifp); As with it, when delay == 0, dad_work would be scheduled immediately, all addrconf_mod_dad_work(0) callings had to be moved under ifp->lock. Reported-by: Wei Chen <weichen@redhat.com> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
WANG Cong authored
[ Upstream commit b4846fc3 ] Andrey reported a lockdep warning on non-initialized spinlock: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 1 PID: 4099 Comm: a.out Not tainted 4.12.0-rc6+ #9 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 dump_stack+0x292/0x395 lib/dump_stack.c:52 register_lock_class+0x717/0x1aa0 kernel/locking/lockdep.c:755 ? 0xffffffffa0000000 __lock_acquire+0x269/0x3690 kernel/locking/lockdep.c:3255 lock_acquire+0x22d/0x560 kernel/locking/lockdep.c:3855 __raw_spin_lock_bh ./include/linux/spinlock_api_smp.h:135 _raw_spin_lock_bh+0x36/0x50 kernel/locking/spinlock.c:175 spin_lock_bh ./include/linux/spinlock.h:304 ip_mc_clear_src+0x27/0x1e0 net/ipv4/igmp.c:2076 igmpv3_clear_delrec+0xee/0x4f0 net/ipv4/igmp.c:1194 ip_mc_destroy_dev+0x4e/0x190 net/ipv4/igmp.c:1736 We miss a spin_lock_init() in igmpv3_add_delrec(), probably because previously we never use it on this code path. Since we already unlink it from the global mc_tomb list, it is probably safe not to acquire this spinlock here. It does not harm to have it although, to avoid conditional locking. Fixes: c38b7d32 ("igmp: acquire pmc lock for ip_mc_clear_src()") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-
WANG Cong authored
[ Upstream commit c38b7d32 ] Andrey reported a use-after-free in add_grec(): for (psf = *psf_list; psf; psf = psf_next) { ... psf_next = psf->sf_next; where the struct ip_sf_list's were already freed by: kfree+0xe8/0x2b0 mm/slub.c:3882 ip_mc_clear_src+0x69/0x1c0 net/ipv4/igmp.c:2078 ip_mc_dec_group+0x19a/0x470 net/ipv4/igmp.c:1618 ip_mc_drop_socket+0x145/0x230 net/ipv4/igmp.c:2609 inet_release+0x4e/0x1c0 net/ipv4/af_inet.c:411 sock_release+0x8d/0x1e0 net/socket.c:597 sock_close+0x16/0x20 net/socket.c:1072 This happens because we don't hold pmc->lock in ip_mc_clear_src() and a parallel mr_ifc_timer timer could jump in and access them. The RCU lock is there but it is merely for pmc itself, this spinlock could actually ensure we don't access them in parallel. Thanks to Eric and Long for discussion on this bug. Reported-by: Andrey Konovalov <andreyknvl@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Xin Long <lucien.xin@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
-