- 15 Jan, 2015 40 commits
-
John David Anglin authored
commit 45db0738 upstream. The __ldcw macro has a problem when its argument needs to be reloaded from memory. The output memory operand and the input register operand both need to be reloaded using a register in class R1_REGS when generating 64-bit code. This fails because there's only a single register in the class. Instead, use a memory clobber. This also makes the __ldcw macro a compiler memory barrier. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
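A minimal sketch of the change described above (mnemonic and constraints abbreviated, not the verbatim parisc header): the output memory operand is dropped in favour of a "memory" clobber, which also makes the macro act as a compiler barrier.

    #define __ldcw(a)                                              \
    ({                                                             \
            unsigned int __ret;                                    \
            __asm__ __volatile__("ldcw 0(%1),%0"                   \
                                 : "=r" (__ret)                    \
                                 : "r" (a)                         \
                                 : "memory");                      \
            __ret;                                                 \
    })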
-
Richard Guy Briggs authored
commit 041d7b98 upstream. A regression was caused by commit 780a7654: audit: Make testing for a valid loginuid explicit. (which in turn attempted to fix a regression caused by e1760bd5) When audit_krule_to_data() fills in the rules to get a listing, there was a missing clause to convert back from AUDIT_LOGINUID_SET to AUDIT_LOGINUID. This broke userspace by not returning the same information that was sent and expected. The rule: auditctl -a exit,never -F auid=-1 gives: auditctl -l LIST_RULES: exit,never f24=0 syscall=all when it should give: LIST_RULES: exit,never auid=-1 (0xffffffff) syscall=all Tag it so that it is reported the same way it was set. Create a new private flags audit_krule field (pflags) to store it that won't interact with the public one from the API. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
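A hedged sketch of the idea: the pflags field and the legacy flag follow the commit text, while the structure layout and the listing snippet are illustrative only.

    /* private flag: rule was originally given as auid=..., not loginuid_set */
    #define AUDIT_LOGINUID_LEGACY 0x1

    struct audit_krule {
            u32 flags;     /* public flags, part of the userspace API */
            u32 pflags;    /* private flags, never exposed via the API */
            /* ... */
    };

    /* in audit_krule_to_data(): report the field the way it was set */
    if (f->type == AUDIT_LOGINUID_SET &&
        (krule->pflags & AUDIT_LOGINUID_LEGACY))
            data->fields[i] = AUDIT_LOGINUID;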
-
Lorenzo Pieralisi authored
commit f43c2718 upstream. On arm64 the TTBR0_EL1 register is set to either the reserved TTBR0 page tables on boot or to the active_mm mappings belonging to user space processes, it must never be set to swapper_pg_dir page tables mappings. When a CPU is booted its active_mm is set to init_mm even though its TTBR0_EL1 points at the reserved TTBR0 page mappings. This implies that when __cpu_suspend is triggered the active_mm can point at init_mm even if the current TTBR0_EL1 register contains the reserved TTBR0_EL1 mappings. Therefore, the mm save and restore executed in __cpu_suspend might turn out to be erroneous in that, if the current->active_mm corresponds to init_mm, on resume from low power it ends up restoring in the TTBR0_EL1 the init_mm mappings that are global and can cause speculation of TLB entries which end up being propagated to user space. This patch fixes the issue by checking the active_mm pointer before restoring the TTBR0 mappings. If the current active_mm == &init_mm, the code sets the TTBR0_EL1 to the reserved TTBR0 mapping instead of switching back to the active_mm, which is the expected behaviour corresponding to the TTBR0_EL1 settings when __cpu_suspend was entered. Fixes: 95322526 ("arm64: kernel: cpu_{suspend/resume} implementation") Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
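A simplified sketch of the check added on the resume path, not the exact kernel code (helper names as used elsewhere in arm64 mm code; surrounding context omitted):

    struct mm_struct *mm = current->active_mm;

    if (mm == &init_mm)
            cpu_set_reserved_ttbr0();    /* never install init_mm's global mappings */
    else
            cpu_switch_mm(mm->pgd, mm);  /* restore the user mappings as before */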
-
Laura Abbott authored
commit c3684fbb upstream. The function cpu_resume currently lives in the .data section. There's no reason for it to be there since we can use relative instructions without a problem. Move a few cpu_resume data structures out of the assembly file so the .data annotation can be dropped completely and cpu_resume ends up in the read only text section. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Tested-by: Kees Cook <keescook@chromium.org> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Laura Abbott <lauraa@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com> [ luis: 3.16-stable prereq for f43c2718 "arm64: kernel: fix __cpu_suspend mm switch on warm-boot" ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Lorenzo Pieralisi authored
commit 714f5992 upstream. CPU suspend is the standard kernel interface to be used to enter low-power states on ARM64 systems. Current cpu_suspend implementation by default assumes that all low power states are losing the CPU context, so the CPU registers must be saved and cleaned to DRAM upon state entry. Furthermore, the current cpu_suspend() implementation assumes that if the CPU suspend back-end method returns when called, this has to be considered an error regardless of the return code (which can be successful) since the CPU was not expected to return from a code path that is different from cpu_resume code path - eg returning from the reset vector. All in all this means that the current API does not cope well with low-power states that preserve the CPU context when entered (ie retention states), since first of all the context is saved for nothing on state entry for those states and a successful state entry can return as a normal function return, which is considered an error by the current CPU suspend implementation.

This patch refactors the cpu_suspend() API so that it can be split in two separate functionalities. The arm64 cpu_suspend API just provides a wrapper around CPU suspend operation hook. A new function is introduced (for architecture code use only) for states that require context saving upon entry:

    __cpu_suspend(unsigned long arg, int (*fn)(unsigned long))

__cpu_suspend() saves the context on function entry and calls the so called suspend finisher (ie fn) to complete the suspend operation. The finisher is not expected to return, unless it fails in which case the error is propagated back to the __cpu_suspend caller. The API refactoring results in the following pseudo code call sequence for a suspending CPU, when triggered from a kernel subsystem:

    /*
     * int cpu_suspend(unsigned long idx)
     * @idx: idle state index
     */
    {
    -> cpu_suspend(idx)
        |---> CPU operations suspend hook called, if present
            |--> if (retention_state)
                    |--> direct suspend back-end call (eg PSCI suspend)
                 else
                    |--> __cpu_suspend(idx, &back_end_finisher);
    }

By refactoring the cpu_suspend API this way, the CPU operations back-end has a chance to detect whether idle states require state saving or not and can call the required suspend operations accordingly either through simple function call or indirectly through __cpu_suspend() which carries out state saving and suspend finisher dispatching to complete idle state entry. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> [ luis: 3.16-stable prereq for f43c2718 "arm64: kernel: fix __cpu_suspend mm switch on warm-boot" ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Lorenzo Pieralisi authored
commit 18ab7db6 upstream. Suspend init function must be marked as __init, since it is not needed after the kernel has booted. This patch moves the cpu_suspend_init() function to the __init section. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> [ luis: 3.16-stable prereq for f43c2718 "arm64: kernel: fix __cpu_suspend mm switch on warm-boot" ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Richard Guy Briggs authored
commit 54dc77d9 upstream. Eric Paris explains: Since kauditd_send_multicast_skb() gets called in audit_log_end(), which can come from any context (aka even a sleeping context) GFP_KERNEL can't be used. Since the audit_buffer knows what context it should use, pass that down and use that. See: https://lkml.org/lkml/2014/12/16/542 BUG: sleeping function called from invalid context at mm/slab.c:2849 in_atomic(): 1, irqs_disabled(): 0, pid: 885, name: sulogin 2 locks held by sulogin/885: #0: (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff91152e30>] prepare_bprm_creds+0x28/0x8b #1: (tty_files_lock){+.+.+.}, at: [<ffffffff9123e787>] selinux_bprm_committing_creds+0x55/0x22b CPU: 1 PID: 885 Comm: sulogin Not tainted 3.18.0-next-20141216 #30 Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A15 06/20/2014 ffff880223744f10 ffff88022410f9b8 ffffffff916ba529 0000000000000375 ffff880223744f10 ffff88022410f9e8 ffffffff91063185 0000000000000006 0000000000000000 0000000000000000 0000000000000000 ffff88022410fa38 Call Trace: [<ffffffff916ba529>] dump_stack+0x50/0xa8 [<ffffffff91063185>] ___might_sleep+0x1b6/0x1be [<ffffffff910632a6>] __might_sleep+0x119/0x128 [<ffffffff91140720>] cache_alloc_debugcheck_before.isra.45+0x1d/0x1f [<ffffffff91141d81>] kmem_cache_alloc+0x43/0x1c9 [<ffffffff914e148d>] __alloc_skb+0x42/0x1a3 [<ffffffff914e2b62>] skb_copy+0x3e/0xa3 [<ffffffff910c263e>] audit_log_end+0x83/0x100 [<ffffffff9123b8d3>] ? avc_audit_pre_callback+0x103/0x103 [<ffffffff91252a73>] common_lsm_audit+0x441/0x450 [<ffffffff9123c163>] slow_avc_audit+0x63/0x67 [<ffffffff9123c42c>] avc_has_perm+0xca/0xe3 [<ffffffff9123dc2d>] inode_has_perm+0x5a/0x65 [<ffffffff9123e7ca>] selinux_bprm_committing_creds+0x98/0x22b [<ffffffff91239e64>] security_bprm_committing_creds+0xe/0x10 [<ffffffff911515e6>] install_exec_creds+0xe/0x79 [<ffffffff911974cf>] load_elf_binary+0xe36/0x10d7 [<ffffffff9115198e>] search_binary_handler+0x81/0x18c [<ffffffff91153376>] do_execveat_common.isra.31+0x4e3/0x7b7 [<ffffffff91153669>] do_execve+0x1f/0x21 [<ffffffff91153967>] SyS_execve+0x25/0x29 [<ffffffff916c61a9>] stub_execve+0x69/0xa0 Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Tested-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Paul Moore <pmoore@redhat.com> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
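A hedged sketch of the fix: the audit_buffer already records the allocation context it was created with, so that is passed down instead of a hard-coded GFP_KERNEL. Signatures are simplified and the socket variable name is illustrative.

    static void kauditd_send_multicast_skb(struct sk_buff *skb, gfp_t gfp_mask)
    {
            struct sk_buff *copy = skb_copy(skb, gfp_mask);  /* was GFP_KERNEL */

            if (!copy)
                    return;
            nlmsg_multicast(audit_sock, copy, 0, AUDIT_NLGRP_READLOG, gfp_mask);
    }

    void audit_log_end(struct audit_buffer *ab)
    {
            /* ... */
            kauditd_send_multicast_skb(ab->skb, ab->gfp_mask);
            /* ... */
    }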
-
Paul Moore authored
commit 3640dcfa upstream. Commit f1dc4867 ("audit: anchor all pid references in the initial pid namespace") introduced a find_vpid() call when adding/removing audit rules with PID/PPID filters; unfortunately this is problematic as find_vpid() only works if there is a task with the associated PID alive on the system. The following commands demonstrate a simple reproducer. # auditctl -D # auditctl -l # autrace /bin/true # auditctl -l This patch resolves the problem by simply using the PID provided by the user without any additional validation, e.g. no calls to check to see if the task/PID exists. Cc: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Thomas Gleixner authored
commit a5fd9733 upstream. commit 4dbd2771 "tick: export nohz tick idle symbols for module use" was merged via the thermal tree without an explicit ack from the relevant maintainers. The exports are abused by the intel powerclamp driver which implements a fake idle state from a sched FIFO task. This causes all kinds of wreckage in the NOHZ core code which rightfully assumes that tick_nohz_idle_enter/exit() are only called from the idle task itself. Recent changes in the NOHZ core lead to a failure of the powerclamp driver and now people try to hack completely broken and backwards workarounds into the NOHZ core code. This is completely unacceptable and just papers over the real problem. There are way more subtle issues lurking around the corner. The real solution is to fix the powerclamp driver by rewriting it with a sane concept, but that's beyond the scope of this. So the only solution for now is to remove the calls into the core NOHZ code from the powerclamp trainwreck along with the exports. Fixes: d6d71ee4 "PM: Introduce Intel PowerClamp Driver" Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Pan Jacob jun <jacob.jun.pan@intel.com> Cc: LKP <lkp@01.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Zhang Rui <rui.zhang@intel.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412181110110.17382@nanosSigned-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Jiri Jaburek authored
commit d70a1b98 upstream. The Arcam rPAC seems to have the same problem - whenever anything (alsamixer, udevd, 3.9+ kernel from 60af3d03, ..) attempts to access mixer / control interface of the card, the firmware "locks up" the entire device, resulting in SNDRV_PCM_IOCTL_HW_PARAMS failed (-5): Input/output error from alsa-lib. Other operating systems can somehow read the mixer (there seems to be playback volume/mute), but any manipulation is ignored by the device (which has hardware volume controls). Signed-off-by: Jiri Jaburek <jjaburek@redhat.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Andy Lutomirski authored
commit 3fb2f423 upstream. It turns out that there's a lurking ABI issue. GCC, when compiling this in a 32-bit program:

    struct user_desc desc = {
            .entry_number    = idx,
            .base_addr       = base,
            .limit           = 0xfffff,
            .seg_32bit       = 1,
            .contents        = 0,  /* Data, grow-up */
            .read_exec_only  = 0,
            .limit_in_pages  = 1,
            .seg_not_present = 0,
            .useable         = 0,
    };

will leave .lm uninitialized. This means that anything in the kernel that reads user_desc.lm for 32-bit tasks is unreliable. Revert the .lm check in set_thread_area(). The value never did anything in the first place. Fixes: 0e58af4e ("x86/tls: Disallow unusual TLS segments") Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/d7875b60e28c512f6a6fc0baf5714d58e7eaadbb.1418856405.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Thomas Petazzoni authored
commit 00d8689b upstream. Originally, the I2C controller supported by the i2c-mv64xxx driver requires a lot of software support: an interrupt is generated at each step of an I2C transaction (after the start bit, after sending the address, etc.) and the driver is in charge of re-programming the I2C controller to do the next step of the I2C transaction. This explains the fairly complex state machine that the driver has. On Marvell Armada XP and later processors (Armada 375, 38x, etc.), the I2C controller was extended with a part called the "I2C Bridge", which allows to offload the I2C transaction completely to the hardware. Initial support for this mechanism was added in commit 930ab3d4 ("i2c: mv64xxx: Add I2C Transaction Generator support"). However, the implementation done in this commit has two related issues, which this commit fixes by completely changing how the offload implementation is done: * SMBus read transfers, where there is one write to select the register immediately followed in the same transaction by one read, were making the processor hang. This was easier visible on the Marvell Armada XP WRT1900AC platform using a driver for an I2C LED controller, or on other Armada XP platforms by using a simple 'i2cget' command to read an I2C EEPROM. * The implementation was based on the fact that the offload engine was re-programmed to transfer each message of an I2C xfer: this meant that each message sent with the offload engine was starting with a normal I2C start sequence. However, the I2C subsystem assumes that all messages belonging to the same xfer will use the so-called "repeated start" so that the entire I2C xfer is seen as one transfer by the I2C devices and cannot be interrupt by other I2C masters on the same bus. In fact, the "I2C Bridge" allows to offload three types of xfer: - xfer of one write message - xfer of one read message - xfer of one write message followed by one read message For all other situations, we have to fallback to not using the "I2C Bridge" in order to get proper I2C semantics. Therefore, this commit reworks the offload implementation to put it not at the message level, but at the xfer level: in the mv64xxx_i2c_xfer() function, we decide if the transaction can be offloaded (in which case it is handled by the mv64xxx_i2c_offload_xfer() function), or otherwise it is handled by the slow path (implemented in the existing mv64xxx_i2c_execute_msg()). This allows to simplify the state machine, which no longer needs to have any state related to the offload implementation: the offload implementation is now completely separated from the slow path (with the exception of the interrupt handler, of course). In summary: - mv64xxx_i2c_can_offload() will analyze an I2C xfer and decided of the "I2C Bridge" can be used to offload it or not. - mv64xxx_i2c_offload_xfer() will actually program the "I2C Bridge" to offload one xfer (of either one or two messages), and block using mv64xxx_i2c_wait_for_completion() until the xfer completes. - The interrupt handler mv64xxx_i2c_intr() is modified to push the offload related code to a separate function, mv64xxx_i2c_intr_offload(). It will take care of reading the received data if needed. 
This commit was tested on: - Armada XP OpenBlocks AX3-4 (EEPROM on I2C and RTC on I2C) - Armada XP WRT1900AC (LED controller on I2C) - Armada XP GP (EEPROM on I2C) Fixes: 930ab3d4 ("i2c: mv64xxx: Add I2C Transaction Generator support") Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> [wsa: fixed checkpatch warnings] Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
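A hedged sketch of the offload decision described above (field names follow the driver but are illustrative): only the three transaction shapes the "I2C Bridge" can handle are accepted, everything else falls back to the slow path.

    static bool mv64xxx_i2c_can_offload(struct mv64xxx_i2c_data *drv_data)
    {
            struct i2c_msg *msgs = drv_data->msgs;
            int num = drv_data->num_msgs;

            if (!drv_data->offload_enabled)
                    return false;

            /* a single read or a single write message */
            if (num == 1)
                    return true;

            /* a write immediately followed by a read (typical SMBus register read) */
            if (num == 2 &&
                !(msgs[0].flags & I2C_M_RD) &&
                 (msgs[1].flags & I2C_M_RD))
                    return true;

            /* everything else goes through the interrupt-driven slow path */
            return false;
    }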
-
Thomas Petazzoni authored
commit 12598695 upstream. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
zhendong chen authored
commit 5164bece upstream. In bio-based DM's clone_endio(), when the target_type doesn't implement .end_io (e.g. linear), r will always be initialized to 0. So if a WRITE SAME bio fails, WRITE SAME will not be disabled as intended. Fix this by initializing r to error, rather than 0, in clone_endio(). Signed-off-by: Alex Chen <alex.chen@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Fixes: 7eee4ae2 ("dm: disable WRITE SAME if it fails") Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
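A trimmed, hedged sketch of the 3.16-era clone_endio() with the fix applied; requeue handling and other details are omitted.

    static void clone_endio(struct bio *bio, int error)
    {
            int r = error;          /* was: int r = 0; */
            struct dm_target_io *tio = container_of(bio, struct dm_target_io, clone);
            dm_endio_fn endio = tio->ti->type->end_io;

            if (endio)
                    r = endio(tio->ti, bio, error);

            /* a failing WRITE SAME is now seen even when there is no .end_io */
            if (unlikely(r == -EREMOTEIO && (bio->bi_rw & REQ_WRITE_SAME) &&
                         !bdev_get_queue(bio->bi_bdev)->limits.max_write_same_sectors))
                    disable_write_same(tio->io->md);

            free_tio(tio->io->md, tio);
            dec_pending(tio->io, error);
    }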
-
Joe Thornber authored
commit 2c43fd26 upstream. Discard bios and thin device deletion have the potential to release data blocks. If the thin-pool is in out-of-data-space mode, and blocks were released, transition the thin-pool back to full write mode. The correct time to do this is just after the thin-pool metadata commit. It cannot be done before the commit because the space maps will not allow immediate reuse of the data blocks in case there's a rollback following power failure. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Joe Thornber authored
commit 45ec9bd0 upstream. When the pool was in PM_OUT_OF_SPACE mode its process_prepared_discard function pointer was incorrectly being set to process_prepared_discard_passdown rather than process_prepared_discard. This incorrect function pointer meant the discard was being passed down, but not affecting the mapping. As such any discard that was issued, in an attempt to reclaim blocks, would not successfully free data space. Reported-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
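A hedged sketch of the fix as it would look in set_pool_mode(); the surrounding assignments are elided and the mode name follows the pool code rather than the commit text.

    case PM_OUT_OF_DATA_SPACE:
            /* ... other handlers ... */
            /* was: pool->process_prepared_discard = process_prepared_discard_passdown; */
            pool->process_prepared_discard = process_prepared_discard;
            break;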
-
Kailang Yang authored
commit 8b72415d upstream. New Dell desktop needs to support headset mode for ALC3234. Signed-off-by: Kailang Yang <kailang@realtek.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Chris Wilson authored
commit add284a3 upstream. In order to act as a full command barrier by itself, we need to tell the pipecontrol to actually stall the command streamer while the flush runs. We require the full command barrier before operations like MI_SET_CONTEXT, which currently rely on a prior invalidate flush. References: https://bugs.freedesktop.org/show_bug.cgi?id=83677 Cc: Simon Farnsworth <simon@farnz.org.uk> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Chris Wilson authored
commit 148b83d0 upstream. In the gen7 pipe control there is an extra bit to flush the media caches, so let's set it during cache invalidation flushes. v2: Rename to MEDIA_STATE_CLEAR to be more inline with spec. Cc: Simon Farnsworth <simon@farnz.org.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Nicholas Bellinger authored
commit 6bf6ca75 upstream. This patch changes iscsit_do_tx_data() to fail on short writes when kernel_sendmsg() returns a value different than requested transfer length, returning -EPIPE and thus causing a connection reset to occur. This avoids a potential bug in the original code where a short write would result in kernel_sendmsg() being called again with the original iovec base + length. In practice this has not been an issue because iscsit_do_tx_data() is only used for transferring 48 byte headers + 4 byte digests, along with seldom used control payloads from NOPIN + TEXT_RSP + REJECT with less than 32k of data. So following Al's audit of iovec consumers, go ahead and fail the connection on short writes for now, and remove the bogus logic ahead of his proper upstream fix. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
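A hedged sketch of the resulting behaviour in iscsit_do_tx_data(); setup and logging are trimmed and variable names are illustrative.

    tx_sent = kernel_sendmsg(conn->sock, &msg, iov, iov_count, tx_size);
    if (tx_sent != tx_size) {
            if (tx_sent == -EAGAIN) {
                    pr_err("kernel_sendmsg() returned EAGAIN\n");
                    return -EAGAIN;
            }
            /* short write or hard error: fail and reset the connection */
            return -EPIPE;
    }
    return tx_sent;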
-
Long Li authored
commit e86fb5e8 upstream. When the ring buffer returns an error indicating retry, storvsc may not return a proper error code to SCSI when the bounce buffer is not used. This has introduced an I/O freeze on RAID running atop storvsc devices. This patch fixes it by always returning a proper error code. Signed-off-by: Long Li <longli@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Martin K. Petersen authored
commit 198a956a upstream. The Microsoft iSCSI target does not support REPORT SUPPORTED OPERATION CODES. Blacklist these devices so we don't attempt to send the command. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Tested-by: Mike Christie <michaelc@cs.wisc.edu> Reported-by: jazz@deti74.ru Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Paul Mackerras authored
commit 8117ac6a upstream. Currently, when going idle, we set the flag indicating that we are in nap mode (paca->kvm_hstate.hwthread_state) and then execute the nap (or sleep or rvwinkle) instruction, all with the MMU on. This is bad for two reasons: (a) the architecture specifies that those instructions must be executed with the MMU off, and in fact with only the SF, HV, ME and possibly RI bits set, and (b) this introduces a race, because as soon as we set the flag, another thread can switch the MMU to a guest context. If the race is lost, this thread will typically start looping on relocation-on ISIs at 0xc...4400. This fixes it by setting the MSR as required by the architecture before setting the flag or executing the nap/sleep/rvwinkle instruction. [ shreyas@linux.vnet.ibm.com: Edited to handle LE ] Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Thomas Gleixner authored
commit c291ee62 upstream. Since the rework of the sparse interrupt code to actually free the unused interrupt descriptors there exists a race between the /proc interfaces to the irq subsystem and the code which frees the interrupt descriptor.

    CPU0                            CPU1
                                    show_interrupts()
                                      desc = irq_to_desc(X);
    free_desc(desc)
      remove_from_radix_tree();
      kfree(desc);
                                      raw_spinlock_irq(&desc->lock);

/proc/interrupts is the only interface which can actively corrupt kernel memory via the lock access. /proc/stat can only read from freed memory. Extremely hard to trigger, but possible. The interfaces in /proc/irq/N/ are not affected by this because the removal of the proc file is serialized in procfs against concurrent readers/writers. The removal happens before the descriptor is freed. For architectures which have CONFIG_SPARSE_IRQ=n this is a non issue as the descriptor is never freed. It's merely cleared out with the irq descriptor lock held. So any concurrent proc access will either see the old correct value or the cleared out ones. Protect the lookup and access to the irq descriptor in show_interrupts() with the sparse_irq_lock. Provide kstat_irqs_usr() which is protecting the lookup and access with sparse_irq_lock and switch /proc/stat to use it. Document the existing kstat_irqs interfaces so it's clear that the caller needs to take care about protection. The users of these interfaces are either not affected due to SPARSE_IRQ=n or already protected against removal. Fixes: 1f5a5b87 "genirq: Implement a sane sparse_irq allocator" Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
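A sketch of the protected helper this change provides for users outside the irq core, such as /proc/stat (essentially as described above):

    unsigned int kstat_irqs_usr(unsigned int irq)
    {
            unsigned int sum;

            irq_lock_sparse();      /* keeps the descriptor from being freed */
            sum = kstat_irqs(irq);
            irq_unlock_sparse();

            return sum;
    }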
-
Sagi Grimberg authored
commit 23a548ee upstream. iSER will report supported protection operations based on the tpg attribute t10_pi settings and HCA PI offload capabilities. If the HCA does not support PI offload or tpg attribute t10_pi is not set, we fall to SW PI mode. In order to do that, we move iscsit_get_sup_prot_ops after connection tpg assignment. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Sagi Grimberg authored
commit 302cc7c3 upstream. Fallback to software mode DIF if HCA does not support PI (without crashing obviously). It is still possible to run with backend protection and an unprotected frontend, so looking at the command prot_op is not enough. Check device PI capability on a per-IO basis (isert_prot_cmd inline static) to determine if we need to handle protection information. Trace: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: [<ffffffffa037f8b1>] isert_reg_sig_mr+0x351/0x3b0 [ib_isert] Call Trace: [<ffffffff812b003a>] ? swiotlb_map_sg_attrs+0x7a/0x130 [<ffffffffa038184d>] isert_reg_rdma+0x2fd/0x370 [ib_isert] [<ffffffff8108f2ec>] ? idle_balance+0x6c/0x2c0 [<ffffffffa0382b68>] isert_put_datain+0x68/0x210 [ib_isert] [<ffffffffa02acf5b>] lio_queue_data_in+0x2b/0x30 [iscsi_target_mod] [<ffffffffa02306eb>] target_complete_ok_work+0x21b/0x310 [target_core_mod] [<ffffffff8106ece2>] process_one_work+0x182/0x3b0 [<ffffffff8106fda0>] worker_thread+0x120/0x3c0 [<ffffffff8106fc80>] ? maybe_create_worker+0x190/0x190 [<ffffffff8107594e>] kthread+0xce/0xf0 [<ffffffff81075880>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff8159a22c>] ret_from_fork+0x7c/0xb0 [<ffffffff81075880>] ? kthread_freezable_should_stop+0x70/0x70 Reported-by: Slava Shwartsman <valyushash@gmail.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
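A hedged sketch of the per-IO check mentioned above; member names approximate the iser-target code of that era.

    static inline bool isert_prot_cmd(struct isert_conn *conn, struct se_cmd *cmd)
    {
            return conn->pi_support &&
                   cmd->prot_op != TARGET_PROT_NORMAL;
    }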
-
Sagi Grimberg authored
commit 570db170 upstream. This patch converts to allocate PI contexts dynamically in order to avoid a potentially bogus np->tpg_np and associated NULL pointer dereference in isert_connect_request() during iser-target endpoint shutdown with multiple network portals. Also, there is really no need to allocate these at connection establishment since it is not guaranteed that all the IOs on that connection will be to a PI formatted device. We can do it in a lazy fashion, so the initial burst will have a transient slow down, but very quickly all IOs will have a PI context allocated. Squashed: iser-target: Centralize PI context handling code Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Sagi Grimberg authored
commit b02efbfc upstream. In situations such as bond failover, the new session establishment implicitly invokes the termination of the old connection. So, we don't want to wait for the old connection wait_conn to completely terminate before we accept the new connection and post a login response. The solution is to defer the comp_wait completion and the conn_put to a work so wait_conn will effectively be non-blocking (flush errors are assumed to come very fast). We allocate isert_release_wq with WQ_UNBOUND and WQ_UNBOUND_MAX_ACTIVE to spread the concurrency of release works. Reported-by: Slava Shwartsman <valyushash@gmail.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
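A hedged sketch of the unbound release workqueue and of deferring the blocking part of wait_conn to it; work and member names are illustrative.

    isert_release_wq = alloc_workqueue("isert_release_wq", WQ_UNBOUND,
                                       WQ_UNBOUND_MAX_ACTIVE);
    if (!isert_release_wq)
            return -ENOMEM;

    /* ... later, instead of blocking in wait_conn: ... */
    INIT_WORK(&isert_conn->release_work, isert_release_work);
    queue_work(isert_release_wq, &isert_conn->release_work);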
-
Sagi Grimberg authored
commit ca6c1d82 upstream. The np listener cm_id will also get ADDR_CHANGE event upcall (in case it is bound to a specific IP). Handle it correctly by creating a new cm_id and implicitly destroy the old one. Since this is the second event a listener np cm_id may encounter, we move the np cm_id event handling to a routine. Squashed: iser-target: Move cma_id setup to a function Reported-by: Slava Shwartsman <valyushash@gmail.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Sagi Grimberg authored
commit 19e2090f upstream. Take the isert_conn pointer from cm_id->qp->qp_context. This will allow us to know that the cm_id context is always the network portal. This will make the cm_id event check (connection or network portal) more reliable. In order to avoid a NULL dereference in cma_id->qp->qp_context we destroy the qp after we destroy the cm_id (and make the dereference safe). Session establishment/teardown sequences can happen in parallel; we should take into account that connected_handler might race with the connection teardown flow. Also, protect the isert_conn->conn_device->active_qps decrement within the error path during QP creation failure and the normal teardown path in isert_connect_release(). Squashed: iser-target: Decrement completion context active_qps in error flow Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Sagi Grimberg authored
commit 2371e5da upstream. There is no point in accepting a new CM request only when we are completely done with the last iscsi login. Instead we accept immediately; this will also cause the CM connection to reach connected state and the initiator is allowed to send the first login. We mark that we got the initial login and let the iscsi layer pick it up when it gets there. This reduces the parallel login sequence by a factor of more than 4 (and more for multi-login) and also prevents the initiator (who does all logins in parallel) from giving up on login timeout expiration. In order to support multiple login requests sequence (CHAP) we call isert_rx_login_req from isert_rx_completion instead of letting isert_get_login_rx call it. Squashed: iser-target: Use kref_get_unless_zero in connected_handler iser-target: Acquire conn_mutex when changing connection state iser-target: Reject connect request in failure path Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Sagi Grimberg authored
commit 128e9cc8 upstream. The ISER_CONN_UP state is not sufficient to know if we should wait for completion of flush errors and the disconnected_handler event. Instead, split it into 2 states:

- ISER_CONN_UP: Got to the CM connected phase. This state indicates that we need to wait for a CM disconnect event before going to teardown.

- ISER_CONN_FULL_FEATURE: Got to the full feature phase after we posted the login response. This state indicates that we posted recv buffers and we need to wait for flush completions before going to teardown.

Also avoid deferring the disconnected handler to a work, and handle it within the disconnected handler. More work here is needed to handle the DEVICE_REMOVAL event correctly (cleanup all resources). Squashed: iser-target: Don't deffer disconnected handler to a work Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
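A hedged sketch of the state split, trimmed to the states mentioned above; any other enum values are illustrative.

    enum iser_conn_state {
            ISER_CONN_INIT,
            ISER_CONN_UP,            /* CM reached the connected phase */
            ISER_CONN_FULL_FEATURE,  /* login response posted, recv buffers posted */
            ISER_CONN_TERMINATING,
            ISER_CONN_DOWN,
    };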
-
Sagi Grimberg authored
commit 954f2372 upstream. Since commit 0fc4ea70 ("Target/iser: Don't put isert_conn inside disconnected handler") we put the conn kref in isert_wait_conn, so we need .wait_conn to be invoked also in the error path. Introduce a call to isert_conn_terminate (called under lock) which transitions the connection state to TERMINATING and calls rdma_disconnect. If the state is already terminating, just bail out (termination has already started). Also, make sure to destroy the connection when getting a connect error event if we didn't get to the connected state (UP). The same applies to the handling of the REJECTED and UNREACHABLE cma events. Squashed: iscsi-target: Add call to wait_conn in establishment error flow Reported-by: Slava Shwartsman <valyushash@gmail.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Imre Deak authored
commit c352d1ba upstream. irq_mask should include all IRQ bits that we want to mask, but atm we set it incorrectly to the inverse of this. If the mask is used subsequently to enable/disable some IRQ bits, we may unintentionally unmask unrelated IRQs. I can't see any way that this can lead to a real problem in the current -nightly code, since the first place the mask will be used next (after a suspend/resume cycle) is in valleyview_irq_postinstall(), but the mask is reset there to its proper value. This causes a problem in the upstream kernel though, where - due to another issue - the mask is used in the above way to disable only the display IRQs. This other issue is fixed by: commit 950eabaf Author: Imre Deak <imre.deak@intel.com> Date: Mon Sep 8 15:21:09 2014 +0300 drm/i915: vlv: fix display IRQ enable/disable Interestingly, even with the above two bugs, we shouldn't in theory have any real problems (arguably a famous last sentence:). That's because even if we unmask something unintentionally via the VLV_IMR/VLV_IER register the master IRQ masking bit in VLV_MASTER_IER is still set and should prevent all i915 interrupts. According to my testing on an ASUS T100 with DSI output this isn't the case at least with the MIPIA_INTERRUPT. Leaving this one unmasked in IMR/IER, while having VLV_MASTER_IER set to 0 may lead to a lockup during system suspend as shown in the bugzilla ticket below. This fix should get rid of the problem reported there in upstream and older kernels. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85920Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Jiri Olsa authored
commit 9fc81d87 upstream. We allow PMU driver to change the cpu on which the event should be installed to. This happened in patch: e2d37cd2 ("perf: Allow the PMU driver to choose the CPU on which to install events") This patch also forces all the group members to follow the currently opened events cpu if the group happened to be moved. This and the change of event->cpu in perf_install_in_context() function introduced in: 0cda4c02 ("perf: Introduce perf_pmu_migrate_context()") forces group members to change their event->cpu, if the currently-opened-event's PMU changed the cpu and there is a group move. Above behaviour causes problem for breakpoint events, which uses event->cpu to touch cpu specific data for breakpoints accounting. By changing event->cpu, some breakpoints slots were wrongly accounted for given cpu. Vinces's perf fuzzer hit this issue and caused following WARN on my setup: WARNING: CPU: 0 PID: 20214 at arch/x86/kernel/hw_breakpoint.c:119 arch_install_hw_breakpoint+0x142/0x150() Can't find any breakpoint slot [...] This patch changes the group moving code to keep the event's original cpu. Reported-by: Vince Weaver <vince@deater.net> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vince@deater.net> Cc: Yan, Zheng <zheng.z.yan@intel.com> Link: http://lkml.kernel.org/r/1418243031-20367-3-git-send-email-jolsa@kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
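A hedged sketch of the group-move path with the fix applied: each member keeps its own original cpu instead of inheriting the newly opened event's cpu (surrounding context trimmed, exact locking omitted).

    if (move_group) {
            perf_install_in_context(ctx, group_leader, group_leader->cpu);
            get_ctx(ctx);
            list_for_each_entry(sibling, &group_leader->sibling_list, group_entry) {
                    perf_install_in_context(ctx, sibling, sibling->cpu); /* was: event->cpu */
                    get_ctx(ctx);
            }
    }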
-
Jiri Olsa authored
commit af91568e upstream. The uncore_collect_events functions assumes that event group might contain only uncore events which is wrong, because it might contain any type of events. This bug leads to uncore framework touching 'not' uncore events, which could end up all sorts of bugs. One was triggered by Vince's perf fuzzer, when the uncore code touched breakpoint event private event space as if it was uncore event and caused BUG: BUG: unable to handle kernel paging request at ffffffff82822068 IP: [<ffffffff81020338>] uncore_assign_events+0x188/0x250 ... The code in uncore_assign_events() function was looking for event->hw.idx data while the event was initialized as a breakpoint with different members in event->hw union. This patch forces uncore_collect_events() to collect only uncore events. Reported-by: Vince Weaver <vince@deater.net> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Yan, Zheng <zheng.z.yan@intel.com> Link: http://lkml.kernel.org/r/1418243031-20367-2-git-send-email-jolsa@kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
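A hedged sketch of the filter added by this change; it relies on uncore PMUs having no per-task context, and the collection loop is trimmed.

    static bool is_uncore_event(struct perf_event *event)
    {
            return event->pmu->task_ctx_nr == perf_invalid_context;
    }

    /* in uncore_collect_events(): skip anything that is not an uncore event */
    list_for_each_entry(event, &leader->sibling_list, group_entry) {
            if (!is_uncore_event(event) ||
                event->state <= PERF_EVENT_STATE_OFF)
                    continue;
            box->event_list[n++] = event;
    }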
-
Filipe Manana authored
commit 678886bd upstream. When we abort a transaction we iterate over all the ranges marked as dirty in fs_info->freed_extents[0] and fs_info->freed_extents[1], clear them from those trees, add them back (unpin) to the free space caches and, if the fs was mounted with "-o discard", perform a discard on those regions. Also, after adding the regions to the free space caches, a fitrim ioctl call can see those ranges in a block group's free space cache and perform a discard on the ranges, so the same issue can happen without "-o discard" as well. This causes corruption, affecting one or multiple btree nodes (in the worst case leaving the fs unmountable) because some of those ranges (the ones in the fs_info->pinned_extents tree) correspond to btree nodes/leafs that are referred by the last committed super block - breaking the rule that anything that was committed by a transaction is untouched until the next transaction commits successfully. I ran into this while running in a loop (for several hours) the fstest that I recently submitted: [PATCH] fstests: add btrfs test to stress chunk allocation/removal and fstrim The corruption always happened when a transaction aborted and then fsck complained like this: _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent *** fsck.btrfs output *** Check tree block failed, want=94945280, have=0 Check tree block failed, want=94945280, have=0 Check tree block failed, want=94945280, have=0 Check tree block failed, want=94945280, have=0 Check tree block failed, want=94945280, have=0 read block failed check_tree_block Couldn't open file system In this case 94945280 corresponded to the root of a tree. Using frace what I observed was the following sequence of steps happened: 1) transaction N started, fs_info->pinned_extents pointed to fs_info->freed_extents[0]; 2) node/eb 94945280 is created; 3) eb is persisted to disk; 4) transaction N commit starts, fs_info->pinned_extents now points to fs_info->freed_extents[1], and transaction N completes; 5) transaction N + 1 starts; 6) eb is COWed, and btrfs_free_tree_block() called for this eb; 7) eb range (94945280 to 94945280 + 16Kb) is added to fs_info->pinned_extents (fs_info->freed_extents[1]); 8) Something goes wrong in transaction N + 1, like hitting ENOSPC for example, and the transaction is aborted, turning the fs into readonly mode. The stack trace I got for example: [112065.253935] [<ffffffff8140c7b6>] dump_stack+0x4d/0x66 [112065.254271] [<ffffffff81042984>] warn_slowpath_common+0x7f/0x98 [112065.254567] [<ffffffffa0325990>] ? __btrfs_abort_transaction+0x50/0x10b [btrfs] [112065.261674] [<ffffffff810429e5>] warn_slowpath_fmt+0x48/0x50 [112065.261922] [<ffffffffa032949e>] ? btrfs_free_path+0x26/0x29 [btrfs] [112065.262211] [<ffffffffa0325990>] __btrfs_abort_transaction+0x50/0x10b [btrfs] [112065.262545] [<ffffffffa036b1d6>] btrfs_remove_chunk+0x537/0x58b [btrfs] [112065.262771] [<ffffffffa033840f>] btrfs_delete_unused_bgs+0x1de/0x21b [btrfs] [112065.263105] [<ffffffffa0343106>] cleaner_kthread+0x100/0x12f [btrfs] (...) 
[112065.264493] ---[ end trace dd7903a975a31a08 ]--- [112065.264673] BTRFS: error (device sdc) in btrfs_remove_chunk:2625: errno=-28 No space left [112065.264997] BTRFS info (device sdc): forced readonly 9) The clear kthread sees that the BTRFS_FS_STATE_ERROR bit is set in fs_info->fs_state and calls btrfs_cleanup_transaction(), which in turn calls btrfs_destroy_pinned_extent(); 10) Then btrfs_destroy_pinned_extent() iterates over all the ranges marked as dirty in fs_info->freed_extents[], and for each one it calls discard, if the fs was mounted with "-o discard", and adds the range to the free space cache of the respective block group; 11) btrfs_trim_block_group(), invoked from the fitrim ioctl code path, sees the free space entries and performs a discard; 12) After an umount and mount (or fsck), our eb's location on disk was full of zeroes, and it should have been untouched, because it was marked as dirty in the fs_info->pinned_extents tree, and therefore used by the trees that the last committed superblock points to. Fix this by not performing a discard and not adding the ranges to the free space caches - it's useless from this point since the fs is now in readonly mode and we won't write free space caches to disk anymore (otherwise we would leak space) nor any new superblock. By not adding the ranges to the free space caches, it prevents other code paths from allocating that space and write to it as well, therefore being safer and simpler. This isn't a new problem, as it's been present since 2011 (git commit acce952b). Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
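A hedged sketch of the resulting cleanup loop in btrfs_destroy_pinned_extent() (simplified): only the dirty bits are cleared, and the ranges are neither discarded nor returned to the free space caches.

    while (1) {
            ret = find_first_extent_bit(unpin, 0, &start, &end,
                                        EXTENT_DIRTY, NULL);
            if (ret)
                    break;

            clear_extent_dirty(unpin, start, end, GFP_NOFS);
            /*
             * No btrfs_discard_extent() and no unpin_extent_range() here:
             * these blocks may still be referenced by the last committed
             * super block and must stay untouched on disk.
             */
            cond_resched();
    }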
-
Peter Rosin authored
commit 681a1956 upstream. When the codec is connected using i2c, it will only auto-increment register addresses if msb (0x80) of the register address byte is set. [Fixes cache sync if multiple adjacent registers are updated -- broonie] Signed-off-by: Peter Rosin <peda@axentia.se> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Sreekanth Reddy authored
commit 2311ce4d upstream. This reverts commit 963ba22b ("mpt3sas: Remove phys on topology change") Reverting the previous mpt3sas drives patch changes, since we will observe below issue Issue: Drives connected Enclosure/Expander will unregister with SCSI Transport Layer, if any one remove and add expander cable with in DMD (Device Missing Delay) time period or even any one power-off and power-on the Enclosure with in the DMD period. Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Sreekanth Reddy authored
commit 81a89c2d upstream. This reverts commit 3520f9c7 ("mpt2sas: Remove phys on topology change") Reverting the previous mpt2sas drives patch changes, since we will observe below issue Issue: Drives connected Enclosure/Expander will unregister with SCSI Transport Layer, if any one remove and add expander cable with in DMD (Device Missing Delay) time period or even any one power-off and power-on the Enclosure with in the DMD period. Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-