- 26 Oct, 2015 10 commits
-
-
John Stultz authored
commit 2619d7e9 upstream. The internal clocksteering done for fine-grained error correction uses a logarithmic approximation, so any time adjtimex() adjusts the clock steering, timekeeping_freqadjust() quickly approximates the correct clock frequency over a series of ticks. Unfortunately, the logic in timekeeping_freqadjust(), introduced in commit: dc491596 ("timekeeping: Rework frequency adjustments to work better w/ nohz") used the abs() function with a s64 error value to calculate the size of the approximated adjustment to be made. Per include/linux/kernel.h: "abs() should not be used for 64-bit types (s64, u64, long long) - use abs64()". Thus on 32-bit platforms, this resulted in the clocksteering to take a quite dampended random walk trying to converge on the proper frequency, which caused the adjustments to be made much slower then intended (most easily observed when large adjustments are made). This patch fixes the issue by using abs64() instead. Reported-by: Nuno Gonçalves <nunojpg@gmail.com> Tested-by: Nuno Goncalves <nunojpg@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1441840051-20244-1-git-send-email-john.stultz@linaro.orgSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Guenter Roeck authored
commit 728d2940 upstream. The STEP_UP_TIME and STEP_DOWN_TIME registers are swapped for all chips but NCT6775. Reported-by: Grazvydas Ignotas <notasas@gmail.com> Reviewed-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Jann Horn authored
commit 4c17a6d5 upstream. This might lead to local privilege escalation (code execution as kernel) for systems where the following conditions are met: - CONFIG_CIFS_SMB2 and CONFIG_CIFS_POSIX are enabled - a cifs filesystem is mounted where: - the mount option "vers" was used and set to a value >=2.0 - the attacker has write access to at least one file on the filesystem To attack this, an attacker would have to guess the target_tcon pointer (but guessing wrong doesn't cause a crash, it just returns an error code) and win a narrow race. Signed-off-by: Jann Horn <jann@thejh.net> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Paul Mackerras authored
commit e297c939 upstream. This fixes a race which can result in the same virtual IRQ number being assigned to two different MSI interrupts. The most visible consequence of that is usually a warning and stack trace from the sysfs code about an attempt to create a duplicate entry in sysfs. The race happens when one CPU (say CPU 0) is disposing of an MSI while another CPU (say CPU 1) is setting up an MSI. CPU 0 calls (for example) pnv_teardown_msi_irqs(), which calls msi_bitmap_free_hwirqs() to indicate that the MSI (i.e. its hardware IRQ number) is no longer in use. Then, before CPU 0 gets to calling irq_dispose_mapping() to free up the virtal IRQ number, CPU 1 comes in and calls msi_bitmap_alloc_hwirqs() to allocate an MSI, and gets the same hardware IRQ number that CPU 0 just freed. CPU 1 then calls irq_create_mapping() to get a virtual IRQ number, which sees that there is currently a mapping for that hardware IRQ number and returns the corresponding virtual IRQ number (which is the same virtual IRQ number that CPU 0 was using). CPU 0 then calls irq_dispose_mapping() and frees that virtual IRQ number. Now, if another CPU comes along and calls irq_create_mapping(), it is likely to get the virtual IRQ number that was just freed, resulting in the same virtual IRQ number apparently being used for two different hardware interrupts. To fix this race, we just move the call to msi_bitmap_free_hwirqs() to after the call to irq_dispose_mapping(). Since virq_to_hw() doesn't work for the virtual IRQ number after irq_dispose_mapping() has been called, we need to call it before irq_dispose_mapping() and remember the result for the msi_bitmap_free_hwirqs() call. The pattern of calling msi_bitmap_free_hwirqs() before irq_dispose_mapping() appears in 5 places under arch/powerpc, and appears to have originated in commit 05af7bd2 ("[POWERPC] MPIC U3/U4 MSI backend") from 2007. Fixes: 05af7bd2 ("[POWERPC] MPIC U3/U4 MSI backend") Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> [ kamal: backport to 3.19-stable: pasemi/msi.c --> arch/powerpc/sysdev/mpic_pasemi_msi.c ] Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Ard Biesheuvel authored
commit a077224f upstream. While working on the 32-bit ARM port of UEFI, I noticed a strange corruption in the kernel log. The following snprintf() statement (in drivers/firmware/efi/efi.c:efi_md_typeattr_format()) snprintf(pos, size, "|%3s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]", was producing the following output in the log: | | | | | |WB|WT|WC|UC] | | | | | |WB|WT|WC|UC] | | | | | |WB|WT|WC|UC] |RUN| | | | |WB|WT|WC|UC]* |RUN| | | | |WB|WT|WC|UC]* | | | | | |WB|WT|WC|UC] |RUN| | | | |WB|WT|WC|UC]* | | | | | |WB|WT|WC|UC] |RUN| | | | | | | |UC] |RUN| | | | | | | |UC] As it turns out, this is caused by incorrect code being emitted for the string() function in lib/vsprintf.c. The following code if (!(spec.flags & LEFT)) { while (len < spec.field_width--) { if (buf < end) *buf = ' '; ++buf; } } for (i = 0; i < len; ++i) { if (buf < end) *buf = *s; ++buf; ++s; } while (len < spec.field_width--) { if (buf < end) *buf = ' '; ++buf; } when called with len == 0, triggers an issue in the GCC SRA optimization pass (Scalar Replacement of Aggregates), which handles promotion of signed struct members incorrectly. This is a known but as yet unresolved issue. (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65932). In this particular case, it is causing the second while loop to be executed erroneously a single time, causing the additional space characters to be printed. So disable the optimization by passing -fno-ipa-sra. Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Guenter Roeck authored
commit a394d635 upstream. Actually, spi_master_put() after spi_alloc_master() must _not_ be followed by kfree(). The memory is already freed with the call to spi_master_put() through spi_master_class, which registers a release function. Calling both spi_master_put() and kfree() results in often nasty (and delayed) crashes elsewhere in the kernel, often in the networking stack. This reverts commit eb4af0f5. Link to patch and concerns: https://lkml.org/lkml/2012/9/3/269 or http://lkml.iu.edu/hypermail/linux/kernel/1209.0/00790.html Alexey Klimov: This revert becomes valid after 94c69f76 when spi-imx.c has been fixed and there is no need to call kfree() so comment for spi_alloc_master() should be fixed. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Alexey Klimov <alexey.klimov@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Tan, Jui Nee authored
commit 02bc933e upstream. On Intel Baytrail, there is case when interrupt handler get called, no SPI message is captured. The RX FIFO is indeed empty when RX timeout pending interrupt (SSSR_TINT) happens. Use the BIOS version where both HSUART and SPI are on the same IRQ. Both drivers are using IRQF_SHARED when calling the request_irq function. When running two separate and independent SPI and HSUART application that generate data traffic on both components, user will see messages like below on the console: pxa2xx-spi pxa2xx-spi.0: bad message state in interrupt handler This commit will fix this by first checking Receiver Time-out Interrupt, if it is disabled, ignore the request and return without servicing. Signed-off-by: Tan, Jui Nee <jui.nee.tan@intel.com> Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
David Howells authored
commit 911b79cd upstream. If request_key() is used to find a keyring, only do the search part - don't do the construction part if the keyring was not found by the search. We don't really want keyrings in the negative instantiated state since the rejected/negative instantiation error value in the payload is unioned with keyring metadata. Now the kernel gives an error: request_key("keyring", "#selinux,bdekeyring", "keyring", KEY_SPEC_USER_SESSION_KEYRING) = -1 EPERM (Operation not permitted) Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
David Howells authored
commit f05819df upstream. The following sequence of commands: i=`keyctl add user a a @s` keyctl request2 keyring foo bar @t keyctl unlink $i @s tries to invoke an upcall to instantiate a keyring if one doesn't already exist by that name within the user's keyring set. However, if the upcall fails, the code sets keyring->type_data.reject_error to -ENOKEY or some other error code. When the key is garbage collected, the key destroy function is called unconditionally and keyring_destroy() uses list_empty() on keyring->type_data.link - which is in a union with reject_error. Subsequently, the kernel tries to unlink the keyring from the keyring names list - which oopses like this: BUG: unable to handle kernel paging request at 00000000ffffff8a IP: [<ffffffff8126e051>] keyring_destroy+0x3d/0x88 ... Workqueue: events key_garbage_collector ... RIP: 0010:[<ffffffff8126e051>] keyring_destroy+0x3d/0x88 RSP: 0018:ffff88003e2f3d30 EFLAGS: 00010203 RAX: 00000000ffffff82 RBX: ffff88003bf1a900 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 000000003bfc6901 RDI: ffffffff81a73a40 RBP: ffff88003e2f3d38 R08: 0000000000000152 R09: 0000000000000000 R10: ffff88003e2f3c18 R11: 000000000000865b R12: ffff88003bf1a900 R13: 0000000000000000 R14: ffff88003bf1a908 R15: ffff88003e2f4000 ... CR2: 00000000ffffff8a CR3: 000000003e3ec000 CR4: 00000000000006f0 ... Call Trace: [<ffffffff8126c756>] key_gc_unused_keys.constprop.1+0x5d/0x10f [<ffffffff8126ca71>] key_garbage_collector+0x1fa/0x351 [<ffffffff8105ec9b>] process_one_work+0x28e/0x547 [<ffffffff8105fd17>] worker_thread+0x26e/0x361 [<ffffffff8105faa9>] ? rescuer_thread+0x2a8/0x2a8 [<ffffffff810648ad>] kthread+0xf3/0xfb [<ffffffff810647ba>] ? kthread_create_on_node+0x1c2/0x1c2 [<ffffffff815f2ccf>] ret_from_fork+0x3f/0x70 [<ffffffff810647ba>] ? kthread_create_on_node+0x1c2/0x1c2 Note the value in RAX. This is a 32-bit representation of -ENOKEY. The solution is to only call ->destroy() if the key was successfully instantiated. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
David Howells authored
commit 94c4554b upstream. There appears to be a race between: (1) key_gc_unused_keys() which frees key->security and then calls keyring_destroy() to unlink the name from the name list (2) find_keyring_by_name() which calls key_permission(), thus accessing key->security, on a key before checking to see whether the key usage is 0 (ie. the key is dead and might be cleaned up). Fix this by calling ->destroy() before cleaning up the core key data - including key->security. Reported-by: Petr Matousek <pmatouse@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
- 23 Oct, 2015 2 commits
-
-
lucien authored
commit f648f807 upstream. Commit f8d96052 ("sctp: Enforce retransmission limit during shutdown") fixed a problem with excessive retransmissions in the SHUTDOWN_PENDING by not resetting the association overall_error_count. This allowed the association to better enforce assoc.max_retrans limit. However, the same issue still exists when the association is in SHUTDOWN_RECEIVED state. In this state, HB-ACKs will continue to reset the overall_error_count for the association would extend the lifetime of association unnecessarily. This patch solves this by resetting the overall_error_count whenever the current state is small then SCTP_STATE_SHUTDOWN_PENDING. As a small side-effect, we end up also handling SCTP_STATE_SHUTDOWN_ACK_SENT and SCTP_STATE_SHUTDOWN_SENT states, but they are not really impacted because we disable Heartbeats in those states. Fixes: Commit f8d96052 ("sctp: Enforce retransmission limit during shutdown") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Kamal Mostafa authored
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
- 21 Oct, 2015 1 commit
-
-
Jann Horn authored
commit fbb18169 upstream. It was possible for an attacking user to trick root (or another user) into writing his coredumps into an attacker-readable, pre-existing file using rename() or link(), causing the disclosure of secret data from the victim process' virtual memory. Depending on the configuration, it was also possible to trick root into overwriting system files with coredumps. Fix that issue by never writing coredumps into existing files. Requirements for the attack: - The attack only applies if the victim's process has a nonzero RLIMIT_CORE and is dumpable. - The attacker can trick the victim into coredumping into an attacker-writable directory D, either because the core_pattern is relative and the victim's cwd is attacker-writable or because an absolute core_pattern pointing to a world-writable directory is used. - The attacker has one of these: A: on a system with protected_hardlinks=0: execute access to a folder containing a victim-owned, attacker-readable file on the same partition as D, and the victim-owned file will be deleted before the main part of the attack takes place. (In practice, there are lots of files that fulfill this condition, e.g. entries in Debian's /var/lib/dpkg/info/.) This does not apply to most Linux systems because most distros set protected_hardlinks=1. B: on a system with protected_hardlinks=1: execute access to a folder containing a victim-owned, attacker-readable and attacker-writable file on the same partition as D, and the victim-owned file will be deleted before the main part of the attack takes place. (This seems to be uncommon.) C: on any system, independent of protected_hardlinks: write access to a non-sticky folder containing a victim-owned, attacker-readable file on the same partition as D (This seems to be uncommon.) The basic idea is that the attacker moves the victim-owned file to where he expects the victim process to dump its core. The victim process dumps its core into the existing file, and the attacker reads the coredump from it. If the attacker can't move the file because he does not have write access to the containing directory, he can instead link the file to a directory he controls, then wait for the original link to the file to be deleted (because the kernel checks that the link count of the corefile is 1). A less reliable variant that requires D to be non-sticky works with link() and does not require deletion of the original link: link() the file into D, but then unlink() it directly before the kernel performs the link count check. On systems with protected_hardlinks=0, this variant allows an attacker to not only gain information from coredumps, but also clobber existing, victim-writable files with coredumps. (This could theoretically lead to a privilege escalation.) Signed-off-by: Jann Horn <jann@thejh.net> Cc: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
- 20 Oct, 2015 27 commits
-
-
Christoph Hellwig authored
commit 294ab783 upstream. It looks like the Kconfig check that was meant to fix this (commit fe9233fb [SCSI] scsi_dh: fix kconfig related build errors) was actually reversed, but no-one noticed until the new set of patches which separated DM and SCSI_DH). Fixes: fe9233fbSigned-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Daniel Borkmann authored
commit 6bb0fef4 upstream. When netlink mmap on receive side is the consumer of nf queue data, it can happen that in some edge cases, we write skb shared info into the user space mmap buffer: Assume a possible rx ring frame size of only 4096, and the network skb, which is being zero-copied into the netlink skb, contains page frags with an overall skb->len larger than the linear part of the netlink skb. skb_zerocopy(), which is generic and thus not aware of the fact that shared info cannot be accessed for such skbs then tries to write and fill frags, thus leaking kernel data/pointers and in some corner cases possibly writing out of bounds of the mmap area (when filling the last slot in the ring buffer this way). I.e. the ring buffer slot is then of status NL_MMAP_STATUS_VALID, has an advertised length larger than 4096, where the linear part is visible at the slot beginning, and the leaked sizeof(struct skb_shared_info) has been written to the beginning of the next slot (also corrupting the struct nl_mmap_hdr slot header incl. status etc), since skb->end points to skb->data + ring->frame_size - NL_MMAP_HDRLEN. The fix adds and lets __netlink_alloc_skb() take the actual needed linear room for the network skb + meta data into account. It's completely irrelevant for non-mmaped netlink sockets, but in case mmap sockets are used, it can be decided whether the available skb_tailroom() is really large enough for the buffer, or whether it needs to internally fallback to a normal alloc_skb(). >From nf queue side, the information whether the destination port is an mmap RX ring is not really available without extra port-to-socket lookup, thus it can only be determined in lower layers i.e. when __netlink_alloc_skb() is called that checks internally for this. I chose to add the extra ldiff parameter as mmap will then still work: We have data_len and hlen in nfqnl_build_packet_message(), data_len is the full length (capped at queue->copy_range) for skb_zerocopy() and hlen some possible part of data_len that needs to be copied; the rem_len variable indicates the needed remaining linear mmap space. The only other workaround in nf queue internally would be after allocation time by f.e. cap'ing the data_len to the skb_tailroom() iff we deal with an mmap skb, but that would 1) expose the fact that we use a mmap skb to upper layers, and 2) trim the skb where we otherwise could just have moved the full skb into the normal receive queue. After the patch, in my test case the ring slot doesn't fit and therefore shows NL_MMAP_STATUS_COPY, where a full skb carries all the data and thus needs to be picked up via recv(). Fixes: 3ab1f683 ("nfnetlink: add support for memory mapped netlink") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Linus Lüssing authored
commit 4635469f upstream. Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One OGM handler might undo the set/clear of a specific bit from another handler run in between. Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions. Fixes: 3f4841ff ("batman-adv: tvlv - add network coding container") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Linus Lüssing authored
commit 65d7d460 upstream. Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One OGM handler might undo the set/clear of a specific bit from another handler run in between. Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions. Fixes: 17cf0ea4 ("batman-adv: tvlv - add distributed arp table container") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Chuck Lever authored
commit 9d11b51c upstream. The Linux NFS server returns garbage in the data payload of inline NFS/RDMA READ replies. These are READs of under 1000 bytes or so where the client has not provided either a reply chunk or a write list. The NFS server delivers the data payload for an NFS READ reply to the transport in an xdr_buf page list. If the NFS client did not provide a reply chunk or a write list, send_reply() is supposed to set up a separate sge for the page containing the READ data, and another sge for XDR padding if needed, then post all of the sges via a single SEND Work Request. The problem is send_reply() does not advance through the xdr_buf when setting up scatter/gather entries for SEND WR. It always calls dma_map_xdr with xdr_off set to zero. When there's more than one sge, dma_map_xdr() sets up the SEND sge's so they all point to the xdr_buf's head. The current Linux NFS/RDMA client always provides a reply chunk or a write list when performing an NFS READ over RDMA. Therefore, it does not exercise this particular case. The Linux server has never had to use more than one extra sge for building RPC/RDMA replies with a Linux client. However, an NFS/RDMA client _is_ allowed to send small NFS READs without setting up a write list or reply chunk. The NFS READ reply fits entirely within the inline reply buffer in this case. This is perhaps a more efficient way of performing NFS READs that the Linux NFS/RDMA client may some day adopt. Fixes: b432e6b3 ('svcrdma: Change DMA mapping logic to . . .') BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=285Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Krzysztof Kozlowski authored
commit 5f09a5cb upstream. During probe the regulator (if present) was enabled but not disabled in case of failure. So an unsuccessful probe lead to enabling the regulator which was actually not needed because the device was not enabled. Additionally each deferred probe lead to increase of regulator enable count so it would not be effectively disabled during removal of the device. Test HW: Exynos4412 - Trats2 board Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Fixes: 498d22f6 ("thermal: exynos: Support for TMU regulator defined at device tree") Reviewed-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com> Tested-by: Lukasz Majewski <l.majewski@samsung.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Florian Fainelli authored
commit 03679a14 upstream. The macro to write 64-bits quantities to the 32-bits register swapped the value and offsets arguments, we want to preserve the ordering of the arguments with respect to how writel() is implemented for instance: value first, offset/base second. Fixes: 246d7f77 ("net: dsa: add Broadcom SF2 switch driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Daniel Borkmann authored
commit 592867bf upstream. We may already have gotten a proper fd struct through fdget(), so whenever we return at the end of an map operation, we need to call fdput(). However, each map operation from syscall side first probes CHECK_ATTR() to verify that unused fields in the bpf_attr union are zero. In case of malformed input, we return with error, but the lookup to the map_fd was already performed at that time, so that we return without an corresponding fdput(). Fix it by performing an fdget() only right before bpf_map_get(). The fdget() invocation on maps in the verifier is not affected. Fixes: db20fd2b ("bpf: add lookup/update/delete/iterate methods to BPF maps") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Eric Dumazet authored
commit c8219906 upstream. In commit f341861f ("task_work: add a scheduling point in task_work_run()") I fixed a latency problem adding a cond_resched() call. Later, commit ac3d0da8 added yet another loop to reverse a list, bringing back the latency spike : I've seen in some cases this loop taking 275 ms, if for example a process with 2,000,000 files is killed. We could add yet another cond_resched() in the reverse loop, or we can simply remove the reversal, as I do not think anything would depend on order of task_work_add() submitted works. Fixes: ac3d0da8 ("task_work: Make task_work_add() lockless") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Maciej Żenczykowski <maze@google.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Haggai Eran authored
commit 11d74804 upstream. The mlx5_ib_reg_user_mr() function will attempt to call clean_mr() in its error flow even though there is never a case where the error flow occurs with a valid MR pointer to destroy. Remove the clean_mr() call and the incorrect comment above it. Fixes: b4cfe447 ("IB/mlx5: Implement on demand paging by adding support for MMU notifiers") Cc: Eli Cohen <eli@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Viresh Kumar authored
commit a2022001 upstream. Tolerance applies on both sides of the target voltage, i.e. both min and max sides. But while checking if a voltage is supported by the regulator or not, we haven't taken care of tolerance on the lower side. Fix that. Cc: Lucas Stach <l.stach@pengutronix.de> Fixes: 045ee45c ("cpufreq: cpufreq-dt: disable unsupported OPPs") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Daniel Borkmann authored
commit e41b0bed upstream. We previously register IPPROTO_ROUTING offload under inet6_add_offload(), but in error path, we try to unregister it with inet_del_offload(). This doesn't seem correct, it should actually be inet6_del_offload(), also ipv6_exthdrs_offload_exit() from that commit seems rather incorrect (it also uses rthdr_offload twice), but it got removed entirely later on. Fixes: 3336288a ("ipv6: Switch to using new offload infrastructure.") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Daniel Borkmann authored
commit b382c086 upstream. diag socket's sock_diag_put_filterinfo() dumps classic BPF programs upon request to user space (ss -0 -b). However, native eBPF programs attached to sockets (SO_ATTACH_BPF) cannot be dumped with this method: Their orig_prog is always NULL. However, sock_diag_put_filterinfo() unconditionally tries to access its filter length resp. wants to copy the filter insns from there. Internal cBPF to eBPF transformations attached to sockets don't have this issue, as orig_prog state is kept. It's currently only used by packet sockets. If we would want to add native eBPF support in the future, this needs to be done through a different attribute than PACKET_DIAG_FILTER to not confuse possible user space disassemblers that work on diag data. Fixes: 89aa0758 ("net: sock: allow eBPF programs to be attached to sockets") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Lukas Wunner authored
commit 69f92f67 upstream. Commit 92122789 ("drm/i915: preserve SSC if previously set v3") added code to intel_modeset_gem_init to override the SSC status read from VBT with the SSC status set by BIOS. However, intel_modeset_gem_init is invoked *after* intel_modeset_init, which calls intel_setup_outputs, which *modifies* SSC status by way of intel_init_pch_refclk. So unlike advertised, intel_modeset_gem_init doesn't preserve the SSC status set by BIOS but whatever intel_init_pch_refclk decided on. This is a problem on dual gpu laptops such as the MacBook Pro which require either a handler to switch DDC lines, or the discrete gpu to proxy DDC/AUX communication: Both the handler and the discrete gpu may initialize after the i915 driver, and consequently, an LVDS connector may initially seem disconnected and the SSC therefore is disabled by intel_init_pch_refclk, but on reprobe the connector may turn out to be connected and the SSC must then be enabled. Due to 92122789 however, the SSC is not enabled on reprobe since it is assumed BIOS disabled it while in fact it was disabled by intel_init_pch_refclk. Also, because the SSC status is preserved so late, the preserved value only ever gets used on resume but not on panel initialization: intel_modeset_init calls intel_init_display which indirectly calls intel_panel_use_ssc via multiple subroutines, *before* the BIOS value overrides the VBT value in intel_modeset_gem_init (intel_panel_use_ssc is the sole user of dev_priv->vbt.lvds_use_ssc). Fix this by moving the code introduced by 92122789 from intel_modeset_gem_init to intel_modeset_init before the invocation of intel_setup_outputs and intel_init_display. Add a DRM_DEBUG_KMS as suggested way back by Jani: http://lists.freedesktop.org/archives/intel-gfx/2014-June/046666.html Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=88861 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61115Tested-by: Paul Hordiienko <pvt.gord@gmail.com> [MBP 6,2 2010 intel ILK + nvidia GT216 pre-retina] Tested-by: William Brown <william@blackhats.net.au> [MBP 8,2 2011 intel SNB + amd turks pre-retina] Tested-by: Lukas Wunner <lukas@wunner.de> [MBP 9,1 2012 intel IVB + nvidia GK107 pre-retina] Tested-by: Bruno Bierbaumer <bruno@bierbaumer.net> [MBP 11,3 2013 intel HSW + nvidia GK107 retina -- work in progress] Fixes: 92122789 ("drm/i915: preserve SSC if previously set v3") Signed-off-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Jialing Fu authored
commit 71f8a4b8 upstream. The following panic is captured in ker3.14, but the issue still exists in latest kernel. --------------------------------------------------------------------- [ 20.738217] c0 3136 (Compiler) Unable to handle kernel NULL pointer dereference at virtual address 00000578 ...... [ 20.738499] c0 3136 (Compiler) PC is at _raw_spin_lock_irqsave+0x24/0x60 [ 20.738527] c0 3136 (Compiler) LR is at _raw_spin_lock_irqsave+0x20/0x60 [ 20.740134] c0 3136 (Compiler) Call trace: [ 20.740165] c0 3136 (Compiler) [<ffffffc0008ee900>] _raw_spin_lock_irqsave+0x24/0x60 [ 20.740200] c0 3136 (Compiler) [<ffffffc0000dd024>] __wake_up+0x1c/0x54 [ 20.740230] c0 3136 (Compiler) [<ffffffc000639414>] mmc_wait_data_done+0x28/0x34 [ 20.740262] c0 3136 (Compiler) [<ffffffc0006391a0>] mmc_request_done+0xa4/0x220 [ 20.740314] c0 3136 (Compiler) [<ffffffc000656894>] sdhci_tasklet_finish+0xac/0x264 [ 20.740352] c0 3136 (Compiler) [<ffffffc0000a2b58>] tasklet_action+0xa0/0x158 [ 20.740382] c0 3136 (Compiler) [<ffffffc0000a2078>] __do_softirq+0x10c/0x2e4 [ 20.740411] c0 3136 (Compiler) [<ffffffc0000a24bc>] irq_exit+0x8c/0xc0 [ 20.740439] c0 3136 (Compiler) [<ffffffc00008489c>] handle_IRQ+0x48/0xac [ 20.740469] c0 3136 (Compiler) [<ffffffc000081428>] gic_handle_irq+0x38/0x7c ---------------------------------------------------------------------- Because in SMP, "mrq" has race condition between below two paths: path1: CPU0: <tasklet context> static void mmc_wait_data_done(struct mmc_request *mrq) { mrq->host->context_info.is_done_rcv = true; // // If CPU0 has just finished "is_done_rcv = true" in path1, and at // this moment, IRQ or ICache line missing happens in CPU0. // What happens in CPU1 (path2)? // // If the mmcqd thread in CPU1(path2) hasn't entered to sleep mode: // path2 would have chance to break from wait_event_interruptible // in mmc_wait_for_data_req_done and continue to run for next // mmc_request (mmc_blk_rw_rq_prep). // // Within mmc_blk_rq_prep, mrq is cleared to 0. // If below line still gets host from "mrq" as the result of // compiler, the panic happens as we traced. wake_up_interruptible(&mrq->host->context_info.wait); } path2: CPU1: <The mmcqd thread runs mmc_queue_thread> static int mmc_wait_for_data_req_done(... { ... while (1) { wait_event_interruptible(context_info->wait, (context_info->is_done_rcv || context_info->is_new_req)); static void mmc_blk_rw_rq_prep(... { ... memset(brq, 0, sizeof(struct mmc_blk_request)); This issue happens very coincidentally; however adding mdelay(1) in mmc_wait_data_done as below could duplicate it easily. static void mmc_wait_data_done(struct mmc_request *mrq) { mrq->host->context_info.is_done_rcv = true; + mdelay(1); wake_up_interruptible(&mrq->host->context_info.wait); } At runtime, IRQ or ICache line missing may just happen at the same place of the mdelay(1). This patch gets the mmc_context_info at the beginning of function, it can avoid this race condition. Signed-off-by: Jialing Fu <jlfu@marvell.com> Tested-by: Shawn Lin <shawn.lin@rock-chips.com> Fixes: 2220eedf ("mmc: fix async request mechanism ....") Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Yishai Hadas authored
commit 35d4a0b6 upstream. Fixes: 2a72f212 ("IB/uverbs: Remove dev_table") Before this commit there was a device look-up table that was protected by a spin_lock used by ib_uverbs_open and by ib_uverbs_remove_one. When it was dropped and container_of was used instead, it enabled the race with remove_one as dev might be freed just after: dev = container_of(inode->i_cdev, struct ib_uverbs_device, cdev) but before the kref_get. In addition, this buggy patch added some dead code as container_of(x,y,z) can never be NULL and so dev can never be NULL. As a result the comment above ib_uverbs_open saying "the open method will either immediately run -ENXIO" is wrong as it can never happen. The solution follows Jason Gunthorpe suggestion from below URL: https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg25692.html cdev will hold a kref on the parent (the containing structure, ib_uverbs_device) and only when that kref is released it is guaranteed that open will never be called again. In addition, fixes the active count scheme to use an atomic not a kref to prevent WARN_ON as pointed by above comment from Jason. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Sagi Grimberg authored
commit 8d5944d8 upstream. If iser_initialize_task_headers() routine failed before dma mapping, we should not attempt to unmap in cleanup_task(). Fixes: 7414dde0 (IB/iser: Fix race between iser connection ...) Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Sagi Grimberg authored
commit d1673905 upstream. Since commit "IB/iser: Fix race between iser connection teardown..." iser_initialize_task_headers() might fail, so we need to check that. Fixes: 7414dde0 (IB/iser: Fix race between iser connection ...) Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Noa Osherovich authored
commit 5e99b139 upstream. The mlx4 IB driver implementation for ib_query_ah used a wrong offset (28 instead of 29) when link type is Ethernet. Fixed to use the correct one. Fixes: fa417f7b ('IB/mlx4: Add support for IBoE') Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Jack Morgenstein authored
commit 2b135db3 upstream. The pkey mapping for RoCE must remain the default mapping: VFs: virtual index 0 = mapped to real index 0 (0xFFFF) All others indices: mapped to a real pkey index containing an invalid pkey. PF: virtual index i = real index i. Don't allow users to change these mappings using files found in sysfs. Fixes: c1e7e466 ('IB/mlx4: Add iov directory in sysfs under the ib device') Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Jack Morgenstein authored
commit 90c1d8b6 upstream. send_mad_to_wire takes the same spinlock that is taken in the interrupt context. Therefore, it needs irqsave/restore. Fixes: b9c5d6a6 ('IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV') Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Axel Lin authored
commit 2c3f4b97 upstream. All the callers assume devm_spear_pcm_platform_register is a devm_ API, so use devm_snd_dmaengine_pcm_register in devm_spear_pcm_platform_register. Fixes: e1771bcf ("ASoC: SPEAr: remove custom DMA alloc compat function") Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Kan Liang authored
commit 601083cf upstream. print_aggr() fails to print per-core/per-socket statistics after commit 582ec082 ("perf stat: Fix per-socket output bug for uncore events") if events have differnt cpus. Because in print_aggr(), aggr_get_id needs index (not cpu id) to find core/pkg id. Also, evsel cpu maps should be used to get aggregated id. Here is an example: Counting events cycles,uncore_imc_0/cas_count_read/. (Uncore event has cpumask 0,18) $ perf stat -e cycles,uncore_imc_0/cas_count_read/ -C0,18 --per-core sleep 2 Without this patch, it failes to get CPU 18 result. Performance counter stats for 'CPU(s) 0,18': S0-C0 1 7526851 cycles S0-C0 1 1.05 MiB uncore_imc_0/cas_count_read/ S1-C0 0 <not counted> cycles S1-C0 0 <not counted> MiB uncore_imc_0/cas_count_read/ With this patch, it can get both CPU0 and CPU18 result. Performance counter stats for 'CPU(s) 0,18': S0-C0 1 6327768 cycles S0-C0 1 0.47 MiB uncore_imc_0/cas_count_read/ S1-C0 1 330228 cycles S1-C0 1 0.29 MiB uncore_imc_0/cas_count_read/ Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Stephane Eranian <eranian@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Fixes: 582ec082 ("perf stat: Fix per-socket output bug for uncore events") Link: http://lkml.kernel.org/r/1435820925-51091-1-git-send-email-kan.liang@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Jisheng Zhang authored
commit 0dafa60e upstream. commit bb8175a8 ("mmc: sdhci: clarify DDR timing mode between SD-UHS and eMMC") added MMC_DDR52 as eMMC's DDR mode to be distinguished from SD-UHS, but it missed setting driver type for MMC_DDR52 timing mode. So sometimes we get the following error on Marvell BG2Q DMP board: [ 1.559598] mmcblk0: error -84 transferring data, sector 0, nr 8, cmd response 0x900, card status 0xb00 [ 1.569314] mmcblk0: retrying using single block read [ 1.575676] mmcblk0: error -84 transferring data, sector 2, nr 6, cmd response 0x900, card status 0x0 [ 1.585202] blk_update_request: I/O error, dev mmcblk0, sector 2 [ 1.591818] mmcblk0: error -84 transferring data, sector 3, nr 5, cmd response 0x900, card status 0x0 [ 1.601341] blk_update_request: I/O error, dev mmcblk0, sector 3 This patches fixes this by adding the missing driver type setting. Fixes: bb8175a8 ("mmc: sdhci: clarify DDR timing mode ...") Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Michal Kazior authored
commit 5e55e3cb upstream. The function returns 1 when DMA mapping fails. The driver would return bogus values and could possibly confuse itself if DMA failed. Fixes: 767d34fc ("ath10k: remove DMA mapping wrappers") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Paul Mackerras authored
commit 1e5bf454 upstream. The reference (R) and change (C) bits in a HPT entry can be set by hardware at any time up until the HPTE is invalidated and the TLB invalidation sequence has completed. This means that when removing a HPTE, we need to read the HPTE after the invalidation sequence has completed in order to obtain reliable values of R and C. The code in kvmppc_do_h_remove() used to do this. However, commit 6f22bd32 ("KVM: PPC: Book3S HV: Make HTAB code LE host aware") removed the read after invalidation as a side effect of other changes. This restores the read of the HPTE after invalidation. The user-visible effect of this bug would be that when migrating a guest, there is a small probability that a page modified by the guest and then unmapped by the guest might not get re-transmitted and thus the destination might end up with a stale copy of the page. Fixes: 6f22bd32Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-
Scott Feldman authored
commit eb4cb851 upstream. .maxtype should match .policy. Probably just been getting lucky here because IFLA_BRPORT_MAX > IFLA_BR_MAX. Fixes: 13323516 ("bridge: implement rtnl_link_ops->changelink") Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
-