- 17 Jul, 2014 22 commits
-
-
Dennis Dalessandro authored
commit 7e6d3e5c upstream. This patch addresses an issue where the legacy diagpacket is sent in from the user, but the driver operates on only the extended diagpkt. This patch specifically initializes the extended diagpkt based on the legacy packet. Reported-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Mike Marciniszyn authored
commit 911eccd2 upstream. The code used a literal 1 in dispatching an IB_EVENT_PKEY_CHANGE. As of the dual port qib QDR card, this is not necessarily correct. Change to use the port as specified in the call. Reported-by: Alex Estrin <alex.estrin@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Yann Droneaud authored
commit 43bc8893 upstream. The i386 ABI disagrees with most other ABIs regarding alignment of data type larger than 4 bytes: on most ABIs a padding must be added at end of the structures, while it is not required on i386. So for most ABIs struct mlx5_ib_create_srq gets implicitly padded to be aligned on a 8 bytes multiple, while for i386, such padding is not added. Tool pahole could be used to find such implicit padding: $ pahole --anon_include \ --nested_anon_include \ --recursive \ --class_name mlx5_ib_create_srq \ drivers/infiniband/hw/mlx5/mlx5_ib.o Then, structure layout can be compared between i386 and x86_64: # +++ obj-i386/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-28 11:43:07.386413682 +0100 # --- obj-x86_64/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-27 13:06:17.788472721 +0100 # @@ -69,7 +68,6 @@ struct mlx5_ib_create_srq { # __u64 db_addr; /* 8 8 */ # __u32 flags; /* 16 4 */ # # - /* size: 20, cachelines: 1, members: 3 */ # - /* last cacheline: 20 bytes */ # + /* size: 24, cachelines: 1, members: 3 */ # + /* padding: 4 */ # + /* last cacheline: 24 bytes */ # }; ABI disagreement will make an x86_64 kernel try to read past the buffer provided by an i386 binary. When boundary check will be implemented, the x86_64 kernel will refuse to read past the i386 userspace provided buffer and the uverb will fail. Anyway, if the structure lay in memory on a page boundary and next page is not mapped, ib_copy_from_udata() will fail and the uverb will fail. This patch makes create_srq_user() takes care of the input data size to handle the case where no padding was provided. This way, x86_64 kernel will be able to handle struct mlx5_ib_create_srq as sent by unpatched and patched i386 libmlx5. Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapter") Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Yann Droneaud authored
commit a8237b32 upstream. The i386 ABI disagrees with most other ABIs regarding alignment of data type larger than 4 bytes: on most ABIs a padding must be added at end of the structures, while it is not required on i386. So for most ABI struct mlx5_ib_create_cq get padded to be aligned on a 8 bytes multiple, while for i386, such padding is not added. The tool pahole can be used to find such implicit padding: $ pahole --anon_include \ --nested_anon_include \ --recursive \ --class_name mlx5_ib_create_cq \ drivers/infiniband/hw/mlx5/mlx5_ib.o Then, structure layout can be compared between i386 and x86_64: # +++ obj-i386/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-28 11:43:07.386413682 +0100 # --- obj-x86_64/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-27 13:06:17.788472721 +0100 # @@ -34,9 +34,8 @@ struct mlx5_ib_create_cq { # __u64 db_addr; /* 8 8 */ # __u32 cqe_size; /* 16 4 */ # # - /* size: 20, cachelines: 1, members: 3 */ # - /* last cacheline: 20 bytes */ # + /* size: 24, cachelines: 1, members: 3 */ # + /* padding: 4 */ # + /* last cacheline: 24 bytes */ # }; This ABI disagreement will make an x86_64 kernel try to read past the buffer provided by an i386 binary. When boundary check will be implemented, a x86_64 kernel will refuse to read past the i386 userspace provided buffer and the uverb will fail. Anyway, if the structure lies in memory on a page boundary and next page is not mapped, ib_copy_from_udata() will fail when trying to read the 4 bytes of padding and the uverb will fail. This patch makes create_cq_user() takes care of the input data size to handle the case where no padding is provided. This way, x86_64 kernel will be able to handle struct mlx5_ib_create_cq as sent by unpatched and patched i386 libmlx5. Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapter") Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
gundberg authored
commit a9e0436b upstream. Use the prescaler index, rather than its value, to configure the watchdog. This will prevent a mismatch with the prescaler used to calculate the cycles. Signed-off-by: Per Gundberg <per.gundberg@icomera.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Michael Brunner <michael.brunner@kontron.com> Tested-by: Michael Brunner <michael.brunner@kontron.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Gabor Juhos authored
commit 23afeb61 upstream. On some AR934x based systems, where the frequency of the AHB bus is relatively high, the built-in watchdog causes a spurious restart when it gets enabled. The possible cause of these restarts is that the timeout value written into the TIMER register does not reaches the hardware in time. Add an explicit delay into the ath79_wdt_enable function to avoid the spurious restarts. Signed-off-by: Gabor Juhos <juhosg@openwrt.org> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Viresh Kumar authored
commit 938626d9 upstream. Implementation of ->set_timeout() is supposed to set 'timeout' field of 'struct watchdog_device' passed to it. sp805 was rather setting this in a local variable. Fix it. Reported-by: Arun Ramamurthy <arun.ramamurthy@broadcom.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
hujianyang authored
commit 72abc8f4 upstream. I hit the same assert failed as Dolev Raviv reported in Kernel v3.10 shows like this: [ 9641.164028] UBIFS assert failed in shrink_tnc at 131 (pid 13297) [ 9641.234078] CPU: 1 PID: 13297 Comm: mmap.test Tainted: G O 3.10.40 #1 [ 9641.234116] [<c0011a6c>] (unwind_backtrace+0x0/0x12c) from [<c000d0b0>] (show_stack+0x20/0x24) [ 9641.234137] [<c000d0b0>] (show_stack+0x20/0x24) from [<c0311134>] (dump_stack+0x20/0x28) [ 9641.234188] [<c0311134>] (dump_stack+0x20/0x28) from [<bf22425c>] (shrink_tnc_trees+0x25c/0x350 [ubifs]) [ 9641.234265] [<bf22425c>] (shrink_tnc_trees+0x25c/0x350 [ubifs]) from [<bf2245ac>] (ubifs_shrinker+0x25c/0x310 [ubifs]) [ 9641.234307] [<bf2245ac>] (ubifs_shrinker+0x25c/0x310 [ubifs]) from [<c00cdad8>] (shrink_slab+0x1d4/0x2f8) [ 9641.234327] [<c00cdad8>] (shrink_slab+0x1d4/0x2f8) from [<c00d03d0>] (do_try_to_free_pages+0x300/0x544) [ 9641.234344] [<c00d03d0>] (do_try_to_free_pages+0x300/0x544) from [<c00d0a44>] (try_to_free_pages+0x2d0/0x398) [ 9641.234363] [<c00d0a44>] (try_to_free_pages+0x2d0/0x398) from [<c00c6a60>] (__alloc_pages_nodemask+0x494/0x7e8) [ 9641.234382] [<c00c6a60>] (__alloc_pages_nodemask+0x494/0x7e8) from [<c00f62d8>] (new_slab+0x78/0x238) [ 9641.234400] [<c00f62d8>] (new_slab+0x78/0x238) from [<c031081c>] (__slab_alloc.constprop.42+0x1a4/0x50c) [ 9641.234419] [<c031081c>] (__slab_alloc.constprop.42+0x1a4/0x50c) from [<c00f80e8>] (kmem_cache_alloc_trace+0x54/0x188) [ 9641.234459] [<c00f80e8>] (kmem_cache_alloc_trace+0x54/0x188) from [<bf227908>] (do_readpage+0x168/0x468 [ubifs]) [ 9641.234553] [<bf227908>] (do_readpage+0x168/0x468 [ubifs]) from [<bf2296a0>] (ubifs_readpage+0x424/0x464 [ubifs]) [ 9641.234606] [<bf2296a0>] (ubifs_readpage+0x424/0x464 [ubifs]) from [<c00c17c0>] (filemap_fault+0x304/0x418) [ 9641.234638] [<c00c17c0>] (filemap_fault+0x304/0x418) from [<c00de694>] (__do_fault+0xd4/0x530) [ 9641.234665] [<c00de694>] (__do_fault+0xd4/0x530) from [<c00e10c0>] (handle_pte_fault+0x480/0xf54) [ 9641.234690] [<c00e10c0>] (handle_pte_fault+0x480/0xf54) from [<c00e2bf8>] (handle_mm_fault+0x140/0x184) [ 9641.234716] [<c00e2bf8>] (handle_mm_fault+0x140/0x184) from [<c0316688>] (do_page_fault+0x150/0x3ac) [ 9641.234737] [<c0316688>] (do_page_fault+0x150/0x3ac) from [<c000842c>] (do_DataAbort+0x3c/0xa0) [ 9641.234759] [<c000842c>] (do_DataAbort+0x3c/0xa0) from [<c0314e38>] (__dabt_usr+0x38/0x40) After analyzing the code, I found a condition that may cause this failed in correct operations. Thus, I think this assertion is wrong and should be removed. Suppose there are two clean znodes and one dirty znode in TNC. So the per-filesystem atomic_t @clean_zn_cnt is (2). If commit start, dirty_znode is set to COW_ZNODE in get_znodes_to_commit() in case of potentially ops on this znode. We clear COW bit and DIRTY bit in write_index() without @tnc_mutex locked. We don't increase @clean_zn_cnt in this place. As the comments in write_index() shows, if another process hold @tnc_mutex and dirty this znode after we clean it, @clean_zn_cnt would be decreased to (1). We will increase @clean_zn_cnt to (2) with @tnc_mutex locked in free_obsolete_znodes() to keep it right. If shrink_tnc() performs between decrease and increase, it will release other 2 clean znodes it holds and found @clean_zn_cnt is less than zero (1 - 2 = -1), then hit the assertion. Because free_obsolete_znodes() will soon correct @clean_zn_cnt and no harm to fs in this case, I think this assertion could be removed. 2 clean zondes and 1 dirty znode, @clean_zn_cnt == 2 Thread A (commit) Thread B (write or others) Thread C (shrinker) ->write_index ->clear_bit(DIRTY_NODE) ->clear_bit(COW_ZNODE) @clean_zn_cnt == 2 ->mutex_locked(&tnc_mutex) ->dirty_cow_znode ->!ubifs_zn_cow(znode) ->!test_and_set_bit(DIRTY_NODE) ->atomic_dec(&clean_zn_cnt) ->mutex_unlocked(&tnc_mutex) @clean_zn_cnt == 1 ->mutex_locked(&tnc_mutex) ->shrink_tnc ->destroy_tnc_subtree ->atomic_sub(&clean_zn_cnt, 2) ->ubifs_assert <- hit ->mutex_unlocked(&tnc_mutex) @clean_zn_cnt == -1 ->mutex_lock(&tnc_mutex) ->free_obsolete_znodes ->atomic_inc(&clean_zn_cnt) ->mutux_unlock(&tnc_mutex) @clean_zn_cnt == 0 (correct after shrink) Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
hujianyang authored
commit 691a7c6f upstream. There is a race condition in UBIFS: Thread A (mmap) Thread B (fsync) ->__do_fault ->write_cache_pages -> ubifs_vm_page_mkwrite -> budget_space -> lock_page -> release/convert_page_budget -> SetPagePrivate -> TestSetPageDirty -> unlock_page -> lock_page -> TestClearPageDirty -> ubifs_writepage -> do_writepage -> release_budget -> ClearPagePrivate -> unlock_page -> !(ret & VM_FAULT_LOCKED) -> lock_page -> set_page_dirty -> ubifs_set_page_dirty -> TestSetPageDirty (set page dirty without budgeting) -> unlock_page This leads to situation where we have a diry page but no budget allocated for this page, so further write-back may fail with -ENOSPC. In this fix we return from page_mkwrite without performing unlock_page. We return VM_FAULT_LOCKED instead. After doing this, the race above will not happen. Signed-off-by: hujianyang <hujianyang@huawei.com> Tested-by: Laurence Withers <lwithers@guralp.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Markos Chandras authored
commit ab6c15bc upstream. Previously, the lower limit for the MIPS SC initialization loop was set incorrectly allowing one extra loop leading to writes beyond the MSC ioremap'd space. More precisely, the value of the 'imp' in the last loop increased beyond the msc_irqmap_t boundaries and as a result of which, the 'n' variable was loaded with an incorrect value. This value was used later on to calculate the offset in the MSC01_IC_SUP which led to random crashes like the following one: CPU 0 Unable to handle kernel paging request at virtual address e75c0200, epc == 8058dba4, ra == 8058db90 [...] Call Trace: [<8058dba4>] init_msc_irqs+0x104/0x154 [<8058b5bc>] arch_init_irq+0xd8/0x154 [<805897b0>] start_kernel+0x220/0x36c Kernel panic - not syncing: Attempted to kill the idle task! This patch fixes the problem Signed-off-by: Markos Chandras <markos.chandras@imgtec.com> Reviewed-by: James Hogan <james.hogan@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/7118/Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Alex Smith authored
commit 91ad11d7 upstream. On MIPS calls to _mcount in modules generate 2 instructions to load the _mcount address (and therefore 2 relocations). The mcount_loc table should only reference the first of these, so the second is filtered out by checking the relocation offset and ignoring ones that immediately follow the previous one seen. However if a module has an _mcount call at offset 0, the second relocation would not be filtered out due to old_r_offset == 0 being taken to mean that the current relocation is the first one seen, and both would end up in the mcount_loc table. This results in ftrace_make_nop() patching both (adjacent) instructions to branches over the _mcount call sequence like so: 0xffffffffc08a8000: 04 00 00 10 b 0xffffffffc08a8014 0xffffffffc08a8004: 04 00 00 10 b 0xffffffffc08a8018 0xffffffffc08a8008: 2d 08 e0 03 move at,ra ... The second branch is in the delay slot of the first, which is defined to be unpredictable - on the platform on which this bug was encountered, it triggers a reserved instruction exception. Fix by initializing old_r_offset to ~0 and using that instead of 0 to determine whether the current relocation is the first seen. Signed-off-by: Alex Smith <alex.smith@imgtec.com> Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/7098/Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Asai Thambi S P authored
commit af5ded8c upstream. In module exit, dfs_parent and it's subtree were removed before unregistering with pci. When debugfs entry for each device is attempted to remove in pci_remove() context, they don't exist, as dfs_parent and its children were already ripped apart. Modified to first unregister with pci and then remove dfs_parent. Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Asai Thambi S P authored
commit 670a6414 upstream. Increased timeout for STANDBY IMMEDIATE command to 2 minutes. Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Asai Thambi S P authored
commit d1e714db upstream. A hardware quirk in P320h/P420m interfere with PCIe transactions on some AMD chipsets, making P320h/P420m unusable. This workaround is to disable ERO and NoSnoop bits in the parent and root complex for normal functioning of these devices NOTE: This workaround is specific to AMD chipset with a PCIe upstream device with device id 0x5aXX Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Sam Bradshaw <sbradshaw@micron.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Bjorn Helgaas authored
commit 67ebd814 upstream. 3448a19d "vgaarb: use bridges to control VGA routing where possible" added the "flags & PCI_VGA_STATE_CHANGE_DECODES" condition to an existing WARN_ON(), but used bitwise AND (&) instead of logical AND (&&), so the condition is never true. Replace with logical AND. Found by Coverity (CID 142811). Fixes: 3448a19d "vgaarb: use bridges to control VGA routing where possible" Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: David Airlie <airlied@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Thomas Jarosch authored
commit 7c82126a upstream. After a CPU upgrade while keeping the same mainboard, we faced "spurious interrupt" problems again. It turned out that the new CPU also featured a new GPU with a different PCI ID. Add this PCI ID to the quirk table. Probably all other Intel GPU PCI IDs are affected, too, but I don't want to add them without a test system. See f67fd55f ("PCI: Add quirk for still enabled interrupts on Intel Sandy Bridge GPUs") for some history. [bhelgaas: add f67fd55f reference, stable tag] Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Hans de Goede authored
commit fb4f8f56 upstream. The touchpad on the GIGABYTE U2442 not only stops communicating when we try to set bit 3 (enable real hardware resolution) of reg_10, but on some BIOS versions also when we set bit 1 (enable two finger mode auto correct). I've asked the original reporter of: https://bugzilla.kernel.org/show_bug.cgi?id=61151 To check that not setting bit 1 does not lead to any adverse effects on his model / BIOS revision, and it does not, so this commit fixes the touchpad not working on these versions by simply never setting bit 1 for laptop models with the no_hw_res quirk. Reported-and-tested-by: James Lademann <jwlademann@gmail.com> Tested-by: Philipp Wolfer <ph.wolfer@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Hans de Goede authored
commit cd9e83e2 upstream. At least the Dell Vostro 5470 elantech *clickpad* reports right button clicks when clicked in the right bottom area: https://bugzilla.redhat.com/show_bug.cgi?id=1103528 This is different from how (elantech) clickpads normally operate, normally no matter where the user clicks on the pad the pad always reports a left button event, since there is only 1 hardware button beneath the path. It looks like Dell has put 2 buttons under the pad, one under each bottom corner, causing this. Since this however still clearly is a real clickpad hardware-wise, we still want to report it as such to userspace, so that things like finger movement in the bottom area can be properly ignored as it should be on clickpads. So deal with this weirdness by simply mapping a right click to a left click on elantech clickpads. As an added advantage this is something which we can simply do on all elantech clickpads, so no need to add special quirks for this weird model. Reported-and-tested-by: Elder Marco <eldermarco@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Mikulas Patocka authored
commit 81a9c5e7 upstream. On uniprocessor preemptible kernel, target core deadlocks on unload. The following events happen: * iscsit_del_np is called * it calls send_sig(SIGINT, np->np_thread, 1); * the scheduler switches to the np_thread * the np_thread is woken up, it sees that kthread_should_stop() returns false, so it doesn't terminate * the np_thread clears signals with flush_signals(current); and goes back to sleep in iscsit_accept_np * the scheduler switches back to iscsit_del_np * iscsit_del_np calls kthread_stop(np->np_thread); * the np_thread is waiting in iscsit_accept_np and it doesn't respond to kthread_stop The deadlock could be resolved if the administrator sends SIGINT signal to the np_thread with killall -INT iscsi_np The reproducible deadlock was introduced in commit db6077fd, but the thread-stopping code was racy even before. This patch fixes the problem. Using kthread_should_stop to stop the np_thread is unreliable, so we test np_thread_state instead. If np_thread_state equals ISCSI_NP_THREAD_SHUTDOWN, the thread exits. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Nicholas Bellinger authored
commit 68349756 upstream. This patch adds a explicit memset to the login response PDU exception path in iscsit_tx_login_rsp(). This addresses a regression bug introduced in commit baa4d64b where the initiator would end up not receiving the login response and associated status class + detail, before closing the login connection. Reported-by: Christophe Vu-Brugier <cvubrugier@yahoo.fr> Tested-by: Christophe Vu-Brugier <cvubrugier@yahoo.fr> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Nicholas Bellinger authored
commit 97c99b47 upstream. This patch changes iscsit_check_dataout_hdr() to dump the incoming Data-Out payload when the received ITT is not associated with a WRITE, instead of calling iscsit_reject_cmd() for the non WRITE ITT descriptor. This addresses a bug where an initiator sending an Data-Out for an ITT associated with a READ would end up generating a reject for the READ, eventually resulting in list corruption. Reported-by: Santosh Kulkarni <santosh.kulkarni@calsoftinc.com> Reported-by: Arshad Hussain <arshad.hussain@calsoftinc.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Nicholas Bellinger authored
commit 83ff42fc upstream. This patch fixes a left-over se_lun->lun_sep pointer OOPs when one of the /sys/kernel/config/target/$FABRIC/$WWPN/$TPGT/lun/$LUN/alua* attributes is accessed after the $DEVICE symlink has been removed. To address this bug, go ahead and clear se_lun->lun_sep memory in core_dev_unexport(), so that the existing checks for show/store ALUA attributes in target_core_fabric_configfs.c work as expected. Reported-by: Sebastian Herbszt <herbszt@gmx.de> Tested-by: Sebastian Herbszt <herbszt@gmx.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
- 16 Jul, 2014 6 commits
-
-
Thomas Gleixner authored
commit 27e35715 upstream. When the rtmutex fast path is enabled the slow unlock function can create the following situation: spin_lock(foo->m->wait_lock); foo->m->owner = NULL; rt_mutex_lock(foo->m); <-- fast path free = atomic_dec_and_test(foo->refcnt); rt_mutex_unlock(foo->m); <-- fast path if (free) kfree(foo); spin_unlock(foo->m->wait_lock); <--- Use after free. Plug the race by changing the slow unlock to the following scheme: while (!rt_mutex_has_waiters(m)) { /* Clear the waiters bit in m->owner */ clear_rt_mutex_waiters(m); owner = rt_mutex_owner(m); spin_unlock(m->wait_lock); if (cmpxchg(m->owner, owner, 0) == owner) return; spin_lock(m->wait_lock); } So in case of a new waiter incoming while the owner tries the slow path unlock we have two situations: unlock(wait_lock); lock(wait_lock); cmpxchg(p, owner, 0) == owner mark_rt_mutex_waiters(lock); acquire(lock); Or: unlock(wait_lock); lock(wait_lock); mark_rt_mutex_waiters(lock); cmpxchg(p, owner, 0) != owner enqueue_waiter(); unlock(wait_lock); lock(wait_lock); wakeup_next waiter(); unlock(wait_lock); lock(wait_lock); acquire(lock); If the fast path is disabled, then the simple m->owner = NULL; unlock(m->wait_lock); is sufficient as all access to m->owner is serialized via m->wait_lock; Also document and clarify the wakeup_next_waiter function as suggested by Oleg Nesterov. Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140611183852.937945560@linutronix.deSigned-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Thomas Gleixner authored
commit 3d5c9340 upstream. Even in the case when deadlock detection is not requested by the caller, we can detect deadlocks. Right now the code stops the lock chain walk and keeps the waiter enqueued, even on itself. Silly not to yell when such a scenario is detected and to keep the waiter enqueued. Return -EDEADLK unconditionally and handle it at the call sites. The futex calls return -EDEADLK. The non futex ones dequeue the waiter, throw a warning and put the task into a schedule loop. Tagged for stable as it makes the code more robust. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brad Mouring <bmouring@ni.com> Link: http://lkml.kernel.org/r/20140605152801.836501969@linutronix.deSigned-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Thomas Gleixner authored
commit 82084984 upstream. When we walk the lock chain, we drop all locks after each step. So the lock chain can change under us before we reacquire the locks. That's harmless in principle as we just follow the wrong lock path. But it can lead to a false positive in the dead lock detection logic: T0 holds L0 T0 blocks on L1 held by T1 T1 blocks on L2 held by T2 T2 blocks on L3 held by T3 T4 blocks on L4 held by T4 Now we walk the chain lock T1 -> lock L2 -> adjust L2 -> unlock T1 -> lock T2 -> adjust T2 -> drop locks T2 times out and blocks on L0 Now we continue: lock T2 -> lock L0 -> deadlock detected, but it's not a deadlock at all. Brad tried to work around that in the deadlock detection logic itself, but the more I looked at it the less I liked it, because it's crystal ball magic after the fact. We actually can detect a chain change very simple: lock T1 -> lock L2 -> adjust L2 -> unlock T1 -> lock T2 -> adjust T2 -> next_lock = T2->pi_blocked_on->lock; drop locks T2 times out and blocks on L0 Now we continue: lock T2 -> if (next_lock != T2->pi_blocked_on->lock) return; So if we detect that T2 is now blocked on a different lock we stop the chain walk. That's also correct in the following scenario: lock T1 -> lock L2 -> adjust L2 -> unlock T1 -> lock T2 -> adjust T2 -> next_lock = T2->pi_blocked_on->lock; drop locks T3 times out and drops L3 T2 acquires L3 and blocks on L4 now Now we continue: lock T2 -> if (next_lock != T2->pi_blocked_on->lock) return; We don't have to follow up the chain at that point, because T2 propagated our priority up to T4 already. [ Folded a cleanup patch from peterz ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Brad Mouring <bmouring@ni.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140605152801.930031935@linutronix.deSigned-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Thomas Gleixner authored
commit 397335f0 upstream. The current deadlock detection logic does not work reliably due to the following early exit path: /* * Drop out, when the task has no waiters. Note, * top_waiter can be NULL, when we are in the deboosting * mode! */ if (top_waiter && (!task_has_pi_waiters(task) || top_waiter != task_top_pi_waiter(task))) goto out_unlock_pi; So this not only exits when the task has no waiters, it also exits unconditionally when the current waiter is not the top priority waiter of the task. So in a nested locking scenario, it might abort the lock chain walk and therefor miss a potential deadlock. Simple fix: Continue the chain walk, when deadlock detection is enabled. We also avoid the whole enqueue, if we detect the deadlock right away (A-A). It's an optimization, but also prevents that another waiter who comes in after the detection and before the task has undone the damage observes the situation and detects the deadlock and returns -EDEADLOCK, which is wrong as the other task is not in a deadlock situation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Link: http://lkml.kernel.org/r/20140522031949.725272460@linutronix.deSigned-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Florian Westphal authored
[ Upstream commit 945b2b2d ] Quoting Samu Kallio: Basically what's happening is, during netns cleanup, nf_nat_net_exit gets called before ipv4_net_exit. As I understand it, nf_nat_net_exit is supposed to kill any conntrack entries which have NAT context (through nf_ct_iterate_cleanup), but for some reason this doesn't happen (perhaps something else is still holding refs to those entries?). When ipv4_net_exit is called, conntrack entries (including those with NAT context) are cleaned up, but the nat_bysource hashtable is long gone - freed in nf_nat_net_exit. The bug happens when attempting to free a conntrack entry whose NAT hash 'prev' field points to a slot in the freed hash table (head for that bin). We ignore conntracks with null nat bindings. But this is wrong, as these are in bysource hash table as well. Restore nat-cleaning for the netns-is-being-removed case. bug: https://bugzilla.kernel.org/show_bug.cgi?id=65191 Cc: <stable@vger.kernel.org> # 3.15.x Cc: <stable@vger.kernel.org> # 3.14.x Cc: <stable@vger.kernel.org> # 3.12.x Cc: <stable@vger.kernel.org> # 3.10.x Fixes: c2d421e1 ('netfilter: nf_nat: fix race when unloading protocol modules') Reported-by: Samu Kallio <samu.kallio@aberdeencloud.com> Debugged-by: Samu Kallio <samu.kallio@aberdeencloud.com> Signed-off-by: Florian Westphal <fw@strlen.de> Tested-by: Samu Kallio <samu.kallio@aberdeencloud.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Julian Anastasov authored
[ Upstream commit 9802d21e ] The tot_stats estimator is started only when CONFIG_SYSCTL is defined. But it is stopped without checking CONFIG_SYSCTL. Fix the crash by moving ip_vs_stop_estimator into ip_vs_control_net_cleanup_sysctl. The change is needed after commit 14e40546 ("IPVS: Add __ip_vs_control_{init,cleanup}_sysctl()") from 2.6.39. Cc: <stable@vger.kernel.org> # 3.15.x Cc: <stable@vger.kernel.org> # 3.14.x Cc: <stable@vger.kernel.org> # 3.12.x Cc: <stable@vger.kernel.org> # 3.10.x Cc: <stable@vger.kernel.org> # 3.2.x Reported-by: Jet Chen <jet.chen@intel.com> Tested-by: Jet Chen <jet.chen@intel.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Sgned-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
- 14 Jul, 2014 1 commit
-
-
Jiri Slaby authored
This reverts commit 0e2e24e5, which was applied twice mistakenly. The first one is bee3f7b8. Reported-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Jeff Moyer <jmoyer@redhat.com>
-
- 04 Jul, 2014 10 commits
-
-
Jiri Slaby authored
-
Jie Liu authored
commit f9fd0135 upstream. For discard operation, we should return EINVAL if the given range length is less than a block size, otherwise it will go through the file system to discard data blocks as the end range might be evaluated to -1, e.g, /xfs7: 9811378176 bytes were trimmed This issue can be triggered via xfstests/generic/288. Also, it seems to get the request queue pointer via bdev_get_queue() instead of the hard code pointer dereference is not a bad thing. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Dave Chinner authored
commit 27320369 upstream. Removing an inode from the namespace involves removing the directory entry and dropping the link count on the inode. Removing the directory entry can result in locking an AGF (directory blocks were freed) and removing a link count can result in placing the inode on an unlinked list which results in locking an AGI. The big problem here is that we have an ordering constraint on AGF and AGI locking - inode allocation locks the AGI, then can allocate a new extent for new inodes, locking the AGF after the AGI. Similarly, freeing the inode removes the inode from the unlinked list, requiring that we lock the AGI first, and then freeing the inode can result in an inode chunk being freed and hence freeing disk space requiring that we lock an AGF. Hence the ordering that is imposed by other parts of the code is AGI before AGF. This means we cannot remove the directory entry before we drop the inode reference count and put it on the unlinked list as this results in a lock order of AGF then AGI, and this can deadlock against inode allocation and freeing. Therefore we must drop the link counts before we remove the directory entry. This is still safe from a transactional point of view - it is not until we get to xfs_bmap_finish() that we have the possibility of multiple transactions in this operation. Hence as long as we remove the directory entry and drop the link count in the first transaction of the remove operation, there are no transactional constraints on the ordering here. Change the ordering of the operations in the xfs_remove() function to align the ordering of AGI and AGF locking to match that of the rest of the code. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Jie Liu authored
commit bb86d21c upstream. At xfs_iext_add(), if extent(s) are being appended to the last page in the indirection array and the new extent(s) don't fit in the page, the number of extents(erp->er_extcount) in a new allocated entry should be the minimum value between count and XFS_LINEAR_EXTS, instead of count. For now, there is no existing test case can demonstrates a problem with the er_extcount being set incorrectly here, but it obviously like a bug. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Geyslan G. Bem authored
commit 643f7c4e upstream. In xlog_verify_iclog a debug check of the incore log buffers prints an error if icptr is null and then goes on to dereference the pointer regardless. Convert this to an assert so that the intention is clear. This was reported by Coverty. Signed-off-by: Ben Myers <bpm@sgi.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Dave Chinner authored
commit ad22c7a0 upstream. Page cache allocation doesn't always go through ->begin_write and hence we don't always get the opportunity to set the allocation context to GFP_NOFS. Failing to do this means we open up the direct relcaim stack to recurse into the filesystem and consume a significant amount of stack. On RHEL6.4 kernels we are seeing ra_submit() and generic_file_splice_read() from an nfsd context recursing into the filesystem via the inode cache shrinker and evicting inodes. This is causing truncation to be run (e.g EOF block freeing) and causing bmap btree block merges and free space btree block splits to occur. These btree manipulations are occurring with the call chain already 30 functions deep and hence there is not enough stack space to complete such operations. To avoid these specific overruns, we need to prevent the page cache allocation from recursing via direct reclaim. We can do that because the allocation functions take the allocation context from that which is stored in the mapping for the inode. We don't set that right now, so the default is GFP_HIGHUSER_MOVABLE, which is effectively a GFP_KERNEL context. We need it to be the equivalent of GFP_NOFS, so when we initialise an inode, set the mapping gfp mask appropriately. This makes the use of AOP_FLAG_NOFS redundant from other parts of the XFS IO path, so get rid of it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Eric Sandeen authored
commit 59e5a0e8 upstream. When xfs_growfs_data_private() is updating backup superblocks, it bails out on the first error encountered, whether reading or writing: * If we get an error writing out the alternate superblocks, * just issue a warning and continue. The real work is * already done and committed. This can cause a problem later during repair, because repair looks at all superblocks, and picks the most prevalent one as correct. If we bail out early in the backup superblock loop, we can end up with more "bad" matching superblocks than good, and a post-growfs repair may revert the filesystem to the old geometry. With the combination of superblock verifiers and old bugs, we're more likely to encounter read errors due to verification. And perhaps even worse, we don't even properly write any of the newly-added superblocks in the new AGs. Even with this change, growfs will still say: xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: Structure needs cleaning data blocks changed from 319815680 to 335216640 which might be confusing to the user, but it at least communicates that something has gone wrong, and dmesg will probably highlight the need for an xfs_repair. And this is still best-effort; if verifiers fail on more than half the backup supers, they may still "win" - but that's probably best left to repair to more gracefully handle by doing its own strict verification as part of the backup super "voting." Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Eric Sandeen authored
commit 31625f28 upstream. If we get EWRONGFS due to probing of non-xfs filesystems, there's no need to issue the scary corruption error and backtrace. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Dave Chinner authored
commit 2c6e24ce upstream. Recent analysis of a deadlocked XFS filesystem from a kernel crash dump indicated that the filesystem was stuck waiting for log space. The short story of the hang on the RHEL6 kernel is this: - the tail of the log is pinned by an inode - the inode has been pushed by the xfsaild - the inode has been flushed to it's backing buffer and is currently flush locked and hence waiting for backing buffer IO to complete and remove it from the AIL - the backing buffer is marked for write - it is on the delayed write queue - the inode buffer has been modified directly and logged recently due to unlinked inode list modification - the backing buffer is pinned in memory as it is in the active CIL context. - the xfsbufd won't start buffer writeback because it is pinned - xfssyncd won't force the log because it sees the log as needing to be covered and hence wants to issue a dummy transaction to move the log covering state machine along. Hence there is no trigger to force the CIL to the log and hence unpin the inode buffer and therefore complete the inode IO, remove it from the AIL and hence move the tail of the log along, allowing transactions to start again. Mainline kernels also have the same deadlock, though the signature is slightly different - the inode buffer never reaches the delayed write lists because xfs_buf_item_push() sees that it is pinned and hence never adds it to the delayed write list that the xfsaild flushes. There are two possible solutions here. The first is to simply force the log before trying to cover the log and so ensure that the CIL is emptied before we try to reserve space for the dummy transaction in the xfs_log_worker(). While this might work most of the time, it is still racy and is no guarantee that we don't get stuck in xfs_trans_reserve waiting for log space to come free. Hence it's not the best way to solve the problem. The second solution is to modify xfs_log_need_covered() to be aware of the CIL. We only should be attempting to cover the log if there is no current activity in the log - covering the log is the process of ensuring that the head and tail in the log on disk are identical (i.e. the log is clean and at idle). Hence, by definition, if there are items in the CIL then the log is not at idle and so we don't need to attempt to cover it. When we don't need to cover the log because it is active or idle, we issue a log force from xfs_log_worker() - if the log is idle, then this does nothing. However, if the log is active due to there being items in the CIL, it will force the items in the CIL to the log and unpin them. In the case of the above deadlock scenario, instead of xfs_log_worker() getting stuck in xfs_trans_reserve() attempting to cover the log, it will instead force the log, thereby unpinning the inode buffer, allowing IO to be issued and complete and hence removing the inode that was pinning the tail of the log from the AIL. At that point, everything will start moving along again. i.e. the xfs_log_worker turns back into a watchdog that can alleviate deadlocks based around pinned items that prevent the tail of the log from being moved... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
Jie Liu authored
commit 17ec81c1 upstream. At xfs_iext_realloc_direct(), the new_size is changed by adding if_bytes if originally the extent records are stored at the inline extent buffer, and we have to switch from it to a direct extent list for those new allocated extents, this is wrong. e.g, Create a file with three extents which was showing as following, xfs_io -f -c "truncate 100m" /xfs/testme for i in $(seq 0 5 10); do offset=$(($i * $((1 << 20)))) xfs_io -c "pwrite $offset 1m" /xfs/testme done Inline ------ irec: if_bytes bytes_diff new_size 1st 0 16 16 2nd 16 16 32 Switching --------- rnew_size 3rd 32 16 48 + 32 = 80 roundup=128 In this case, the desired value of new_size should be 48, and then it will be roundup to 64 and be assigned to rnew_size. However, this issue has been covered by resetting the if_bytes to the new_size which is calculated at the begnning of xfs_iext_add() before leaving out this function, and in turn make the rnew_size correctly again. Hence, this can not be detected via xfstestes. This patch fix above problem and revise the new_size comments at xfs_iext_realloc_direct() to make it more readable. Also, fix the comments while switching from the inline extent buffer to a direct extent list to reflect this change. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-
- 02 Jul, 2014 1 commit
-
-
Johan Hedberg authored
commit e694788d upstream. The conn->link_key variable tracks the type of link key in use. It is set whenever we respond to a link key request as well as when we get a link key notification event. These two events do not however always guarantee that encryption is enabled: getting a link key request and responding to it may only mean that the remote side has requested authentication but not encryption. On the other hand, the encrypt change event is a certain guarantee that encryption is enabled. The real encryption state is already tracked in the conn->link_mode variable through the HCI_LM_ENCRYPT bit. This patch fixes a check for encryption in the hci_conn_auth function to use the proper conn->link_mode value and thereby eliminates the chance of a false positive result. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org Signed-off-by: Jiri Slaby <jslaby@suse.cz>
-