- 06 Aug, 2014 23 commits
-
-
Jan Kara authored
commit a93cd4cf upstream. Hole punching code for files with indirect blocks wrongly computed the number of blocks which need to be cleared when traversing the indirect block tree. That could result in punching more blocks than actually requested and thus effectively cause data loss. For example: fallocate -n -p 10240000 4096 will punch the range 10240000 - 12632064 instead of the range 10240000 - 10244096. Fix the calculation. Fixes: 8bad6fc8 Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
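To make the size of the over-punch in the example above concrete, here is a minimal arithmetic sketch. It is not the ext4 code: the helper names are hypothetical, and the 4096-byte block size is an assumption (the commit message does not state the filesystem block size).

```c
#include <stdint.h>

/* Hypothetical helper: exclusive end offset of a punch request. */
uint64_t punch_end(uint64_t offset, uint64_t len)
{
    return offset + len;
}

/*
 * Hypothetical helper: number of blocks overlapped by the byte range
 * [start, end). block_size must be non-zero.
 */
uint64_t blocks_in_range(uint64_t start, uint64_t end, uint64_t block_size)
{
    uint64_t first = start / block_size;                 /* round down */
    uint64_t last = (end + block_size - 1) / block_size; /* round up */

    return last - first;
}
```

With an assumed block size of 4096, the requested range 10240000 - 10244096 covers one block, while the range 10240000 - 12632064 reported for the buggy code covers 584 blocks.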
-
Deng-Cheng Zhu authored
commit 8c9eb041 upstream. kvm_arch_vcpu_free() is called in 2 code paths: 1) kvm_vm_ioctl() -> kvm_vm_ioctl_create_vcpu() -> kvm_arch_vcpu_destroy() -> kvm_arch_vcpu_free() 2) kvm_put_kvm() -> kvm_destroy_vm() -> kvm_arch_destroy_vm() -> kvm_mips_free_vcpus() -> kvm_arch_vcpu_free() Neither of the paths handles VCPU free. We need to do it in kvm_arch_vcpu_free() corresponding to the memory allocation in kvm_arch_vcpu_create(). Signed-off-by:
Deng-Cheng Zhu <dengcheng.zhu@imgtec.com> Reviewed-by:
James Hogan <james.hogan@imgtec.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Paolo Bonzini authored
commit 8faeb529 upstream. Even though the virtio-scsi spec guarantees that all requests related to the TMF will have been completed by the time the TMF itself completes, the request queue's callback might not have run yet. This causes requests to be completed more than once, and as a result triggers a variety of BUGs or oopses. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Reviewed-by:
Venkatesh Srinivas <venkateshs@google.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Paolo Bonzini authored
commit cdda0e5a upstream. Calling the workqueue interface on uninitialized work items isn't a good idea even if they're zeroed. It's not failing catastrophically only through happy accidents. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Reviewed-by:
Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Brian King authored
commit 7114aae0 upstream. Add a memory barrier prior to sending a new command to the VIOS to ensure the VIOS does not receive stale data in the command buffer. Also add a memory barrier when processing the CRQ for completed commands. Signed-off-by:
Brian King <brking@linux.vnet.ibm.com> Acked-by:
Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Brian King authored
commit 9ee75597 upstream. If a CRQ reset is triggered for some reason while in the middle of performing VSCSI adapter initialization, we don't want to call the done function for the initialization MAD commands as this will only result in two threads attempting initialization at the same time, resulting in failures. Signed-off-by:
Brian King <brking@linux.vnet.ibm.com> Acked-by:
Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Wang, Yu authored
commit d6236f6d upstream. The system suspend flow is as follows:
1. Freeze all user processes and kernel threads.
2. Try to suspend all devices.
2.1. If a pci device is in RPM suspended state, then the pci driver will try to resume it to RPM active state in the prepare stage.
2.2. The xhci_resume function calls usb_hcd_resume_root_hub to queue two workqueue items to resume the usb2 and usb3 roothub devices.
2.3. Call suspend callbacks of devices.
2.3.1. All suspend callbacks of all hcd's children, including roothub devices, are called.
2.3.2. Finally, the hcd_pci_suspend callback is called.
Because workqueue threads were already frozen in step 1, the workqueue items can't be scheduled, and the roothub devices can't be resumed in this flow. The HCD_FLAG_WAKEUP_PENDING flag which is set in usb_hcd_resume_root_hub won't be cleared. Finally, hcd_pci_suspend will return -EBUSY, and system suspend fails. The reason this issue doesn't show up very often is that choose_wakeup will be called in step 2.3.1. In step 2.3.1, if udev->do_remote_wakeup is not equal to device_may_wakeup(&udev->dev), then udev will resume to RPM active for changing the wakeup settings. This has been a lucky hit which hides this issue. For some special xHCI controllers which have no USB2 port, the roothub will not match the hub driver because probing failed. Then its do_remote_wakeup will be set to zero, and we won't be as lucky. The xhci driver doesn't need to resume roothub devices every time like in the above case. It's only needed when there are pending event TRBs. This patch should be back-ported to kernels as old as 3.2 that contain commit f69e3120 "USB: XHCI: resume root hubs when the controller resumes" Signed-off-by:
Wang, Yu <yu.y.wang@intel.com> Acked-by:
Alan Stern <stern@rowland.harvard.edu> [use readl() instead of removed xhci_readl(), reword commit message -Mathias] Signed-off-by:
Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Lu Baolu authored
commit ff8cbf25 upstream. When xHCI PCI host is suspended, if do_wakeup is false in xhci_pci_suspend, xhci_bus_suspend needs to clear all root port wake-on bits. Otherwise some Intel platforms may get a spurious wakeup, even if PCI PME# is disabled. This patch should be back-ported to kernels as old as 2.6.37 that contain commit 9777e3ce "USB: xHCI: bus power management implementation". Signed-off-by:
Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by:
Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Mathias Nyman authored
commit 3213b151 upstream. The transfer burst count (TBC) field in xhci 1.0 hosts should be set to the number of bursts needed to transfer all packets in an isoc TD. Supported values are 0-2 (1 to 3 bursts per service interval). Formula for TBC calculation is given in xhci spec section 4.11.2.3: TBC = roundup( Transfer Descriptor Packet Count / (Max Burst Size + 1) ) - 1 This patch should be applied to stable kernels since 3.0 that contain the commit 5cd43e33 "xhci 1.0: Set transfer burst count field." Suggested-by:
ShiChun Ma <masc2008@qq.com> Signed-off-by:
Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
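The TBC rule quoted in this commit, read as TBC = roundup(TDPC / (Max Burst Size + 1)) - 1, can be sketched as a standalone calculation. This is an illustrative sketch rather than the driver code: the function names are hypothetical, and it assumes the packet count is at least 1 and that the max burst value is the zero-based field (0 means one packet per burst).

```c
/* Integer division rounding up; d must be non-zero. */
unsigned int div_round_up(unsigned int n, unsigned int d)
{
    return (n + d - 1) / d;
}

/*
 * Hypothetical helper mirroring the formula from the commit message.
 *
 * td_packet_count: packets in the isoc TD, assumed >= 1 (0 would underflow).
 * max_burst:       zero-based max burst size, so a burst carries
 *                  max_burst + 1 packets.
 *
 * Returns the zero-based burst count (0 means one burst).
 */
unsigned int isoc_transfer_burst_count(unsigned int td_packet_count,
                                       unsigned int max_burst)
{
    return div_round_up(td_packet_count, max_burst + 1) - 1;
}
```

For example, with a max burst value of 15 (16 packets per burst), a TD of 16 packets needs one burst (TBC 0), 17 packets need two (TBC 1), and 48 packets need three (TBC 2), which is the top of the supported 0-2 range.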
-
Mathias Nyman authored
commit 6fcfb0d6 upstream. Command completion events normally include command completion status, SLOT_ID, and a pointer to the original command. Reset device command completion SLOT_ID may be zero according to xhci specs 4.6.11. VIA controllers set the SLOT_ID to zero, triggering a WARN_ON in the command completion handler. Use the SLOT ID found from the original command instead. This patch should be applied to stable kernels since 3.13 that contain the commit 20e7acb1 "xhci: use completion event's slot id rather than dig it out of command" Reported-by:
Saran Neti <sarannmr@gmail.com> Tested-by:
Saran Neti <sarannmr@gmail.com> Signed-off-by:
Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Ulrich Obergfell authored
commit 8922a908 upstream. After scsi_try_to_abort_cmd returns, the eh_abort_handler may have already found that the command has completed in the device, causing the host_byte to be nonzero (e.g. it could be DID_ABORT). When this happens, ORing DID_TIME_OUT into the host byte will corrupt the result field and initiate an unwanted command retry. Fix this by using set_host_byte instead, following the model of commit 2082ebc4. Signed-off-by:
Ulrich Obergfell <uobergfe@redhat.com> [Fix all instances according to review comments. - Paolo] Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Ewan D. Milne <emilne@redhat.com> Reviewed-by:
Hannes Reinecke <hare@suse.de> [ kamal: backport to 3.13-stable (only one instance of this) ] Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
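Why ORing into an already non-zero host byte corrupts it can be shown with a small sketch. The helpers below are hypothetical stand-ins that operate on a plain 32-bit result word rather than on a struct scsi_cmnd, and the DID_* values are assumed to match the kernel's definitions; check them against the tree in use.

```c
#include <stdint.h>

/* Assumed to match the kernel's host byte codes. */
#define DID_TIME_OUT 0x03
#define DID_ABORT    0x05
#define DID_ERROR    0x07

/* The host byte occupies bits 16-23 of the result word. */
uint32_t host_byte_get(uint32_t result)
{
    return (result >> 16) & 0xff;
}

/* Old behaviour: OR the new status into whatever is already there. */
uint32_t host_byte_or(uint32_t result, uint32_t status)
{
    return result | (status << 16);
}

/* Behaviour of the fix: clear the host byte first, then store the status. */
uint32_t host_byte_set(uint32_t result, uint32_t status)
{
    return (result & 0xff00ffffu) | (status << 16);
}
```

Under these assumptions, ORing DID_TIME_OUT (0x03) into a host byte that already holds DID_ABORT (0x05) yields 0x07, which reads back as DID_ERROR, whereas clearing and setting leaves exactly DID_TIME_OUT.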
-
Bjørn Mork authored
commit b0ebef36 upstream. Adding a couple of Olivetti modems and blacklisting the net function on a couple which are already supported. Reported-by:
Lars Melin <larsm17@gmail.com> Signed-off-by:
Bjørn Mork <bjorn@mork.no> Signed-off-by:
Johan Hovold <johan@kernel.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Johan Hovold authored
commit aea1ae87 upstream. Fix NULL-pointer dereference when probing an interface with no endpoints. These devices have two bulk endpoints per interface, but this avoids crashing the kernel if a user forces a non-FTDI device to be probed. Note that the iterator variable was made unsigned in order to avoid a maybe-uninitialized compiler warning for ep_desc after the loop. Fixes: 895f28ba ("USB: ftdi_sio: fix hi-speed device packet size calculation") Reported-by:
Mike Remski <mremski@mutualink.net> Tested-by:
Mike Remski <mremski@mutualink.net> Signed-off-by:
Johan Hovold <johan@kernel.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Oliver Neukum authored
commit 1cab4c68 upstream. Reported by Alif Mubarak Ahmad: This device's vendor and product id is 1c9e:9800. It is working as a serial interface with the generic usbserial driver. I thought it more suitable to use the usbserial option driver, which is better at distinguishing between the modem serial interface and the micro sd storage interface. [ johan: style changes ] Signed-off-by:
Oliver Neukum <oneukum@suse.de> Tested-by:
Alif Mubarak Ahmad <alive4ever@live.com> Signed-off-by:
Johan Hovold <johan@kernel.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Thomas Gleixner authored
commit 4f436603 upstream. The ras3 block on spear320 claims to have 3 interrupts. In fact it has one and 6 reserved interrupts. Account the 6 reserved to this block so it has 7 interrupts total. That matches the datasheet and the device tree entries. Broken since commit 80515a5a (ARM: SPEAr3xx: shirq: simplify and move the shared irq multiplexor to DT). Testing is overrated.... Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20140619212712.872379208@linutronix.de Fixes: 80515a5a ('ARM: SPEAr3xx: shirq: simplify and move the shared irq multiplexor to DT') Acked-by:
Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by:
Jason Cooper <jason@lakedaemon.net> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Adam Thomson authored
commit a2c12493 upstream. Currently in the inkern.c code for IIO framework, the function of_iio_channel_get_by_name() will return a non-NULL pointer when it cannot find a channel using of_iio_channel_get() and when it tries to search for 'io-channel-ranges' property and fails. This is incorrect behaviour as the function which calls this expects a NULL pointer for failure. This patch rectifies the issue. Signed-off-by:
Adam Thomson <Adam.Thomson.Opensource@diasemi.com> Signed-off-by:
Jonathan Cameron <jic23@kernel.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Dan Carpenter authored
commit b70e19c2 upstream. We should be returning a negative error code instead of success here. This would have been detected by GCC, except that the "ret" variable was initialized with a bogus value to disable GCC's uninitialized variable warnings. I've cleaned that up, as well. Signed-off-by:
Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by:
Jonathan Cameron <jic23@kernel.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Michal Nazarewicz authored
commit f0688c8b upstream. If the descriptors do not need any strings and user space sends an empty set of strings, the ffs->stringtabs field remains NULL. Thus *ffs->stringtabs in functionfs_bind leads to a NULL pointer dereference. The bug was introduced by commit [fd7c9a00: “use usb_string_ids_n()”]. While at it, remove double initialisation of the lang local variable in that function. ffs->strings_count does not need to be checked in any way since in the above scenario it will remain zero and usb_string_ids_n() is a no-operation when called with a 0 argument. Signed-off-by:
Michal Nazarewicz <mina86@mina86.com> Signed-off-by:
Felipe Balbi <balbi@ti.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Linus Walleij authored
commit 82363cf2 upstream. There is a regression in the upcoming v3.16-rc1, that is caused by a problem that has been around for a while but now finally hangs the system. The bootcrawl looks like this: pinctrl-nomadik soc:pinctrl: pin GPIO256_AF28 already requested by a03e0000.usb_per5; cannot claim for musb-hdrc.0.auto pinctrl-nomadik soc:pinctrl: pin-256 (musb-hdrc.0.auto) status -22 pinctrl-nomadik soc:pinctrl: could not request pin 256 (GPIO256_AF28) from group usb_a_1 on device pinctrl-nomadik musb-hdrc musb-hdrc.0.auto: Error applying setting, reverse things back HS USB OTG: no transceiver configured musb-hdrc musb-hdrc.0.auto: musb_init_controller failed with status -517 platform musb-hdrc.0.auto: Driver musb-hdrc requests probe deferral (...) The ux500 MUSB driver propagates the OF node to the dynamically created musb-hdrc device, which is incorrect as it makes the OF core believe there are two devices spun from the very same DT node, which confuses other parts of the device core, notably the pin control subsystem, which will try to apply all the pin control settings also to the HDRC device as it gets instantiated. (The OMAP2430 for example, does not set the of_node member.) Cc: Arnd Bergmann <arnd@arndb.de> Acked-by:
Lee Jones <lee.jones@linaro.org> Signed-off-by:
Linus Walleij <linus.walleij@linaro.org> Signed-off-by:
Felipe Balbi <balbi@ti.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Paolo Bonzini authored
commit 7cb060a9 upstream. KVM does not really do much with the PAT, so this went unnoticed for a long time. It is exposed however if you try to do rdmsr on the PAT register. Reported-by:
Valentine Sinitsyn <valentine.sinitsyn@gmail.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Nadav Amit authored
commit 682367c4 upstream. Recent Intel CPUs have 10 variable range MTRRs. Since operating systems sometimes make assumptions on CPUs while they ignore capability MSRs, it is better for KVM to be consistent with recent CPUs. Reporting more MTRRs than actually supported has no functional implications. Signed-off-by:
Nadav Amit <namit@cs.technion.ac.il> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
David R. Piegdon authored
commit c021f241 upstream. Fix a parser-bug in the omap2 muxing code where muxtable-entries will be wrongly selected if the requested muxname is a *prefix* of their m0-entry and they have a matching mN-entry. Fix by additionally checking that the length of the m0_entry is equal. For example muxing of "dss_data2.dss_data2" on omap32xx will fail because the prefix "dss_data2" will match the mux-entries "dss_data2" as well as "dss_data20", with the suffix "dss_data2" matching m0 (for dss_data2) and m4 (for dss_data20). Thus both are recognized as signal path candidates: Relevant muxentries from mux34xx.c: _OMAP3_MUXENTRY(DSS_DATA20, 90, "dss_data20", NULL, "mcspi3_somi", "dss_data2", "gpio_90", NULL, NULL, "safe_mode"), _OMAP3_MUXENTRY(DSS_DATA2, 72, "dss_data2", NULL, NULL, NULL, "gpio_72", NULL, NULL, "safe_mode"), This will result in a failure to mux the pin at all: _omap_mux_get_by_name: Multiple signal paths (2) for dss_data2.dss_data2 Patch should apply to linus' latest master down to rather old linux-2.6 trees. Signed-off-by:
David R. Piegdon <lkml@p23q.org> [tony@atomide.com: updated description to include full description] Signed-off-by:
Tony Lindgren <tony@atomide.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
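The prefix problem described in this commit can be reproduced outside the kernel with a short sketch. The two functions below are hypothetical simplifications of the m0-entry comparison, not the code from the omap2 mux parser; they only illustrate why an extra length check is needed.

```c
#include <stddef.h>
#include <string.h>

/*
 * Comparison without a length check: compares only the first name_len
 * characters, so any m0 entry that merely starts with the requested
 * name is accepted.
 */
int m0_matches_prefix(const char *name, size_t name_len, const char *m0_entry)
{
    return strncmp(name, m0_entry, name_len) == 0;
}

/*
 * Comparison with the additional check described in the commit: the
 * m0 entry must also have exactly the requested length.
 */
int m0_matches_exact(const char *name, size_t name_len, const char *m0_entry)
{
    return strlen(m0_entry) == name_len &&
           strncmp(name, m0_entry, name_len) == 0;
}
```

For the request "dss_data2.dss_data2", the part before the dot is 9 characters long. The prefix comparison accepts both "dss_data2" and "dss_data20" as m0 entries, while the length-checked comparison accepts only "dss_data2".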
-
Jan Kara authored
commit c5c7b8dd upstream. Error recovery in ext4_alloc_branch() calls ext4_forget() even for buffer corresponding to indirect block it did not allocate. This leads to brelse() being called twice for that buffer (once from ext4_forget() and once from cleanup in ext4_ind_map_blocks()) leading to buffer use count misaccounting. Eventually (but often much later because there are other users of the buffer) we will see messages like: VFS: brelse: Trying to free free buffer Another manifestation of this problem is an error: JBD2 unexpected failure: jbd2_journal_revoke: !buffer_revoked(bh); inconsistent data on disk The fix is easy - don't forget buffer we did not allocate. Also add an explanatory comment because the indexing at ext4_alloc_branch() is somewhat subtle. Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
- 04 Aug, 2014 6 commits
-
-
David Vrabel authored
commit b7dd0e35 upstream. arch_gnttab_map_frames() and arch_gnttab_unmap_frames() are called in atomic context but were calling alloc_vm_area() which might sleep. Also, if a driver attempts to allocate a grant ref from an interrupt and the table needs expanding, then the CPU may already be in lazy MMU mode and apply_to_page_range() will BUG when it tries to re-enable lazy MMU mode. These two functions are only used in PV guests. Introduce arch_gnttab_init() to allocate the virtual address space in advance. Avoid the use of apply_to_page_range() by saving and using the array of PTE addresses from the alloc_vm_area() call (which ensures that the required page tables are pre-allocated). Signed-off-by:
David Vrabel <david.vrabel@citrix.com> Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [ David Vrabel: Backported to 3.13.10. ] Signed-off-by:
David Vrabel <david.vrabel@citrix.com> Cc: Stefan Bader <stefan.bader@canonical.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Mengdong Lin authored
commit 2df6742f upstream. This is a temporary fix for some Intel HDMI codecs to avoid no sound output for a resuming playback after S3. After S3, the audio driver restores pin:cvt connection selections by snd_hda_codec_resume_cache(). However this can happen before the gfx side is ready and such connect selection is overlooked by HW. After gfx is ready, the pins make the default selection again. And this will cause multiple pins to share the same converter and mute control will affect each other. Thus a resumed audio playback becomes silent after S3. This patch verifies pin:cvt connection on preparing a stream, to assure the pin selects the right converter and an assigned converter is not shared by other unused pins. Apply this fix-up on Haswell, Broadwell and Valleyview (Baytrail). We need this temporary fix before a reliable software communication channel is established between audio and gfx, to sync audio/gfx operations. Signed-off-by:
Mengdong Lin <mengdong.lin@intel.com> Signed-off-by:
Takashi Iwai <tiwai@suse.de> Cc: David Henningsson <david.henningsson@canonical.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Mengdong Lin authored
commit b4f75aea upstream. This patch will verify the pin's converter selection for an active stream when an unsol event reports this pin becomes available again after a display mode change or hot-plug event. For Haswell+ and Valleyview: display mode change or hot-plug can change the transcoder:port connection and make all the involved audio pins share the 1st converter. So the stream using the 1st converter will flow to multiple pins but active streams using other converters will fail. This workaround is to assure the pin selects the right converter and an assigned converter is not shared by other unused pins. Signed-off-by:
Mengdong Lin <mengdong.lin@intel.com> Signed-off-by:
Takashi Iwai <tiwai@suse.de> Cc: David Henningsson <david.henningsson@canonical.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Mengdong Lin authored
commit 75dcbe4d upstream. Broadwell and Haswell have the same behavior on display audio. So this patch defines is_haswell_plus() to include codecs for both Haswell and its successor Broadwell, and apply all Haswell fix-ups to Broadwell. Signed-off-by:
Mengdong Lin <mengdong.lin@intel.com> Signed-off-by:
Takashi Iwai <tiwai@suse.de> Cc: David Henningsson <david.henningsson@canonical.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Peter Christensen authored
commit f44a5f45 upstream. Receiving an ICMP response to an IPIP packet in a non-linear skb could cause a kernel panic in __skb_pull. The problem was introduced in commit f2edb9f7 ("ipvs: implement passive PMTUD for IPIP packets"). Signed-off-by:
Peter Christensen <pch@ordbogen.com> Acked-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Simon Horman <horms@verge.net.au> Cc: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Wei-Chun Chao authored
commit 5882a07c upstream. This patch fixes a kernel BUG_ON in skb_segment. It is hit when testing two VMs on openvswitch with one VM acting as VXLAN gateway. During VXLAN packet GSO, skb_segment is called with skb->data pointing to inner TCP payload. skb_segment calls skb_network_protocol to retrieve the inner protocol. skb_network_protocol actually expects skb->data to point to MAC and it calls pskb_may_pull with ETH_HLEN. This ends up pulling in ETH_HLEN data from header tail. As a result, pskb_trim logic is skipped and BUG_ON is hit later. Move skb_push in front of skb_network_protocol so that skb->data lines up properly. kernel BUG at net/core/skbuff.c:2999! Call Trace: [<ffffffff816ac412>] tcp_gso_segment+0x122/0x410 [<ffffffff816bc74c>] inet_gso_segment+0x13c/0x390 [<ffffffff8164b39b>] skb_mac_gso_segment+0x9b/0x170 [<ffffffff816b3658>] skb_udp_tunnel_segment+0xd8/0x390 [<ffffffff816b3c00>] udp4_ufo_fragment+0x120/0x140 [<ffffffff816bc74c>] inet_gso_segment+0x13c/0x390 [<ffffffff8109d742>] ? default_wake_function+0x12/0x20 [<ffffffff8164b39b>] skb_mac_gso_segment+0x9b/0x170 [<ffffffff8164b4d0>] __skb_gso_segment+0x60/0xc0 [<ffffffff8164b6b3>] dev_hard_start_xmit+0x183/0x550 [<ffffffff8166c91e>] sch_direct_xmit+0xfe/0x1d0 [<ffffffff8164bc94>] __dev_queue_xmit+0x214/0x4f0 [<ffffffff8164bf90>] dev_queue_xmit+0x10/0x20 [<ffffffff81687edb>] ip_finish_output+0x66b/0x890 [<ffffffff81688a58>] ip_output+0x58/0x90 [<ffffffff816c628f>] ? fib_table_lookup+0x29f/0x350 [<ffffffff816881c9>] ip_local_out_sk+0x39/0x50 [<ffffffff816cbfad>] iptunnel_xmit+0x10d/0x130 [<ffffffffa0212200>] vxlan_xmit_skb+0x1d0/0x330 [vxlan] [<ffffffffa02a3919>] vxlan_tnl_send+0x129/0x1a0 [openvswitch] [<ffffffffa02a2cd6>] ovs_vport_send+0x26/0xa0 [openvswitch] [<ffffffffa029931e>] do_output+0x2e/0x50 [openvswitch] Signed-off-by:
Wei-Chun Chao <weichunc@plumgrid.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (backported from commit 5882a07c) Signed-off-by:
Dave Chiluk <chiluk@canonical.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
- 29 Jul, 2014 4 commits
-
-
Hugh Dickins authored
commit b1a36650 upstream. shmem_fault() is the actual culprit in trinity's hole-punch starvation, and the most significant cause of such problems: since a page faulted is one that then appears page_mapped(), needing unmap_mapping_range() and i_mmap_mutex to be unmapped again. But it is not the only way in which a page can be brought into a hole in the radix_tree while that hole is being punched; and Vlastimil's testing implies that if enough other processors are busy filling in the hole, then shmem_undo_range() can be kept from completing indefinitely. shmem_file_splice_read() is the main other user of SGP_CACHE, which can instantiate shmem pagecache pages in the read-only case (without holding i_mutex, so perhaps concurrently with a hole-punch). Probably it's silly not to use SGP_READ already (using the ZERO_PAGE for holes): which ought to be safe, but might bring surprises - not a change to be rushed. shmem_read_mapping_page_gfp() is an internal interface used by drivers/gpu/drm GEM (and next by uprobes): it should be okay. And shmem_file_read_iter() uses the SGP_DIRTY variant of SGP_CACHE, when called internally by the kernel (perhaps for a stacking filesystem, which might rely on holes to be reserved): it's unclear whether it could be provoked to keep hole-punch busy or not. We could apply the same umbrella as now used in shmem_fault() to shmem_file_splice_read() and the others; but it looks ugly, and use over a range raises questions - should it actually be per page? can these get starved themselves? The origin of this part of the problem is my v3.1 commit d0823576 ("mm: pincer in truncate_inode_pages_range"), once it was duplicated into shmem.c. It seemed like a nice idea at the time, to ensure (barring RCU lookup fuzziness) that there's an instant when the entire hole is empty; but the indefinitely repeated scans to ensure that make it vulnerable. 
Revert that "enhancement" to hole-punch from shmem_undo_range(), but retain the unproblematic rescanning when it's truncating; add a couple of comments there. Remove the "indices[0] >= end" test: that is now handled satisfactorily by the inner loop, and mem_cgroup_uncharge_start()/end() are too light to be worth avoiding here. But if we do not always loop indefinitely, we do need to handle the case of swap swizzled back to page before shmem_free_swap() gets it: add a retry for that case, as suggested by Konstantin Khlebnikov; and for the case of page swizzled back to swap, as suggested by Johannes Weiner. Signed-off-by:
Hugh Dickins <hughd@google.com> Reported-by:
Sasha Levin <sasha.levin@oracle.com> Suggested-by:
Vlastimil Babka <vbabka@suse.cz> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lukas Czerner <lczerner@redhat.com> Cc: Dave Jones <davej@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> [ luis: backported to 3.11: used hughd's backport to 3.10.50 ] Signed-off-by:
Luis Henriques <luis.henriques@canonical.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Hugh Dickins authored
commit 8e205f77 upstream. Commit f00cdc6d ("shmem: fix faulting into a hole while it's punched") was buggy: Sasha sent a lockdep report to remind us that grabbing i_mutex in the fault path is a no-no (write syscall may already hold i_mutex while faulting user buffer). We tried a completely different approach (see following patch) but that proved inadequate: good enough for a rational workload, but not good enough against trinity - which forks off so many mappings of the object that contention on i_mmap_mutex while hole-puncher holds i_mutex builds into serious starvation when concurrent faults force the puncher to fall back to single-page unmap_mapping_range() searches of the i_mmap tree. So return to the original umbrella approach, but keep away from i_mutex this time. We really don't want to bloat every shmem inode with a new mutex or completion, just to protect this unlikely case from trinity. So extend the original with wait_queue_head on stack at the hole-punch end, and wait_queue item on the stack at the fault end. This involves further use of i_lock to guard against the races: lockdep has been happy so far, and I see fs/inode.c:unlock_new_inode() holds i_lock around wake_up_bit(), which is comparable to what we do here. i_lock is more convenient, but we could switch to shmem's info->lock. This issue has been tagged with CVE-2014-4171, which will require commit f00cdc6d and this and the following patch to be backported: we suggest to 3.1+, though in fact the trinity forkbomb effect might go back as far as 2.6.16, when madvise(,,MADV_REMOVE) came in - or might not, since much has changed, with i_mmap_mutex a spinlock before 3.0. Anyone running trinity on 3.0 and earlier? I don't think we need care. Signed-off-by:
Hugh Dickins <hughd@google.com> Reported-by:
Sasha Levin <sasha.levin@oracle.com> Tested-by:
Sasha Levin <sasha.levin@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lukas Czerner <lczerner@redhat.com> Cc: Dave Jones <davej@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Hugh Dickins authored
commit f00cdc6d upstream. Trinity finds that mmap access to a hole while it's punched from shmem can prevent the madvise(MADV_REMOVE) or fallocate(FALLOC_FL_PUNCH_HOLE) from completing, until the reader chooses to stop; with the puncher's hold on i_mutex locking out all other writers until it can complete. It appears that the tmpfs fault path is too light in comparison with its hole-punching path, lacking an i_data_sem to obstruct it; but we don't want to slow down the common case. Extend shmem_fallocate()'s existing range notification mechanism, so shmem_fault() can refrain from faulting pages into the hole while it's punched, waiting instead on i_mutex (when safe to sleep; or repeatedly faulting when not). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by:
Hugh Dickins <hughd@google.com> Reported-by:
Sasha Levin <sasha.levin@oracle.com> Tested-by:
Sasha Levin <sasha.levin@oracle.com> Cc: Dave Jones <davej@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Sven Wegener authored
commit 8142b215 upstream. Commit 554086d8 ("x86_32, entry: Do syscall exit work on badsys (CVE-2014-4508)") introduced a regression in the x86_32 syscall entry code, resulting in syscall() not returning proper errors for undefined syscalls on CPUs supporting the sysenter feature. The following code: > int result = syscall(666); > printf("result=%d errno=%d error=%s\n", result, errno, strerror(errno)); results in: > result=666 errno=0 error=Success Obviously, the syscall return value is the called syscall number, but it should have been an ENOSYS error. When run under ptrace it behaves correctly, which makes it hard to debug in the wild: > result=-1 errno=38 error=Function not implemented The %eax register is the return value register. For debugging via ptrace the syscall entry code stores the complete register context on the stack. The badsys handlers only store the ENOSYS error code in the ptrace register set and do not set %eax like a regular syscall handler would. The old resume_userspace call chain contains code that clobbers %eax and it restores %eax from the ptrace registers afterwards. The same goes for the ptrace-enabled call chain. When ptrace is not used, the syscall return value is the passed-in syscall number from the untouched %eax register. Use %eax as the return value register in syscall_badsys and sysenter_badsys, like a real syscall handler does, and have the caller push the value onto the stack for ptrace access. Signed-off-by:
Sven Wegener <sven.wegener@stealer.net> Link: http://lkml.kernel.org/r/alpine.LNX.2.11.1407221022380.31021@titan.int.lan.stealer.net Reviewed-and-tested-by:
Andy Lutomirski <luto@amacapital.net> Signed-off-by:
H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
- 24 Jul, 2014 1 commit
-
-
Naoya Horiguchi authored
commit 0253d634 upstream. Commit 4a705fef ("hugetlb: fix copy_hugetlb_page_range() to handle migration/hwpoisoned entry") changed the order of huge_ptep_set_wrprotect() and huge_ptep_get(), which leads to breakage in some workloads like hugepage-backed heap allocation via libhugetlbfs. This patch fixes it. The test program for the problem is shown below: $ cat heap.c #include <unistd.h> #include <stdlib.h> #include <string.h> #define HPS 0x200000 int main() { int i; char *p = malloc(HPS); memset(p, '1', HPS); for (i = 0; i < 5; i++) { if (!fork()) { memset(p, '2', HPS); p = malloc(HPS); memset(p, '3', HPS); free(p); return 0; } } sleep(1); free(p); return 0; } $ export HUGETLB_MORECORE=yes ; export HUGETLB_NO_PREFAULT= ; hugectl --heap ./heap Fixes 4a705fef ("hugetlb: fix copy_hugetlb_page_range() to handle migration/hwpoisoned entry"), so is applicable to -stable kernels which include it. Signed-off-by:
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by:
Guillaume Morin <guillaume@morinfr.org> Suggested-by:
Guillaume Morin <guillaume@morinfr.org> Acked-by:
Hugh Dickins <hughd@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
- 21 Jul, 2014 4 commits
-
-
Xufeng Zhang authored
commit d3217b15 upstream. Consider the following scenario: for a TCP-style socket, while processing the COOKIE_ECHO chunk in sctp_sf_do_5_1D_ce(), after it has passed a series of sanity checks, a new association is created in sctp_unpack_cookie(). If some later processing fails, sctp_association_free() is called to free the newly allocated association, and in sctp_association_free() the sk_ack_backlog value is decremented for this socket. Since the initial value of sk_ack_backlog is 0, the decrement wraps it around to 65535. Any later attempt to establish a new association on the same socket then triggers an ABORT, because sctp deems the accept queue full. Fix this issue by only decrementing sk_ack_backlog for associations in the endpoint's list. Fix-suggested-by:
Neil Horman <nhorman@tuxdriver.com> Signed-off-by:
Xufeng Zhang <xufeng.zhang@windriver.com> Acked-by:
Daniel Borkmann <dborkman@redhat.com> Acked-by:
Vlad Yasevich <vyasevich@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Reference: CVE-2014-4667 Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
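
The wrap-around described in this commit message can be reproduced with a 16-bit unsigned counter in user space. The sketch below is a simplified model under assumptions, not the SCTP code: `toy_sock`, `toy_assoc`, `on_endpoint_list` and both functions are hypothetical stand-ins, and the counter is assumed to be 16 bits wide because the message reports 65535 after the decrement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the socket and the association; only the
 * fields needed for the illustration are modeled. */
struct toy_sock {
    uint16_t ack_backlog;
};

struct toy_assoc {
    bool on_endpoint_list;
};

/* Pre-fix behavior: every freed association decrements the counter. */
void assoc_free_before_fix(struct toy_sock *sk, const struct toy_assoc *asoc)
{
    (void)asoc;
    sk->ack_backlog--; /* 0 wraps to 65535 in a 16-bit unsigned counter */
}

/* Post-fix behavior: only associations that were placed on the endpoint's
 * list, and therefore counted earlier, are subtracted again. */
void assoc_free_after_fix(struct toy_sock *sk, const struct toy_assoc *asoc)
{
    if (asoc->on_endpoint_list)
        sk->ack_backlog--;
}
```

Freeing an association that was never listed leaves the counter at 0 in the second variant, so a later comparison against the backlog limit is not skewed.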
-
Nicholas Bellinger authored
[Note that a different patch to address the same issue went in during v3.15-rc1 (commit 4442dc8a), but includes a bunch of other changes that don't strictly apply to fixing the bug.] This patch changes rd_allocate_sgl_table() to explicitly clear ramdisk_mcp backend memory pages by passing __GFP_ZERO into alloc_pages(). This addresses a potential security issue where reading from a ramdisk_mcp could return sensitive information, and follows what >= v3.15 does to explicitly clear ramdisk_mcp memory at backend device initialization time. Reported-by:
Jorge Daniel Sequeira Matias <jdsm@tecnico.ulisboa.pt> Cc: Jorge Daniel Sequeira Matias <jdsm@tecnico.ulisboa.pt> Signed-off-by:
Nicholas Bellinger <nab@linux-iscsi.org> Reference: CVE-2014-4027 Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Paolo Bonzini authored
commit 5678de3f upstream. QE reported that they got the BUG_ON in ioapic_service to trigger. I cannot reproduce it, but there are two reasons why this could happen. The less likely but also easiest one, is when kvm_irq_delivery_to_apic does not deliver to any APIC and returns -1. Because irqe.shorthand == 0, the kvm_for_each_vcpu loop in that function is never reached. However, you can target the similar loop in kvm_irq_delivery_to_apic_fast; just program a zero logical destination address into the IOAPIC, or an out-of-range physical destination address. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
Tony Camuso authored
commit 5b59c69e upstream.

The purpose of the acpi_pad driver is to implement the "processor power aggregator" device as described in the ACPI 4.0 spec section 8.5. It takes requests from the BIOS (via ACPI) to put a specified number of CPUs into idle, in order to save power, until further notice.

It does this by creating high-priority threads that try to keep the CPUs in a high C-state (using the monitor/mwait CPU instructions). The mwait() call is in a loop that checks periodically if the thread should end and a few other things.

It was discovered through testing that the power_saving threads were causing the system to consume more power than the system was consuming before the threads were created. A counter in the main loop of power_saving_thread() revealed that it was spinning. The mwait() instruction was not keeping the CPU in a high C-state very much, if at all.

Here is a simplification of the loop in function power_saving_thread() in drivers/acpi/acpi_pad.c:

while (!kthread_should_stop()) {
     :
     try_to_freeze()
     :
     while (!need_resched()) {
          :
          if (!need_resched())
               __mwait(power_saving_mwait_eax, 1);
          :
          if (jiffies > expire_time) {
               do_sleep = 1;
               break;
          }
     }
}

If need_resched() returns true, then mwait() is not called. It was returning true because of things like timer interrupts, as in the following sequence:

hrtimer_interrupt->__run_hrtimer->tick_sched_timer->
 update_process_times->
  rcu_check_callbacks->rcu_pending->__rcu_pending->set_need_resched

Kernels 3.5.0-rc2+ do not exhibit this problem, because a patch to try_to_freeze() in include/linux/freezer.h introduces a call to might_sleep(), which ultimately calls schedule() to clear the reschedule flag and allows the loop to execute the call to mwait().

However, the changes to try_to_freeze() are unrelated to acpi_pad, and it does not seem like a good idea to rely on an unrelated patch in a function that could later be changed and reintroduce this bug.

Therefore, it seems better to make an explicit call to schedule() in the outer loop when the need_resched flag is set.

Reported-and-tested-by:
Stuart Hayes <stuart_hayes@dell.com> Signed-off-by:
Tony Camuso <tcamuso@redhat.com> Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Leann Ogasawara <leann.ogasawara@canonical.com> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
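
The spinning behavior and the effect of the explicit schedule() call can be sketched with a deterministic user-space model. This is an illustration under assumptions, not the driver: `toy_cpu_state`, `toy_schedule` and `count_mwait_calls` are hypothetical names, and the model assumes a timer tick sets the reschedule flag again before every pass of the outer loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state: the reschedule flag and a counter of modeled mwait calls. */
struct toy_cpu_state {
    bool need_resched;
    unsigned mwait_calls;
};

/* Stand-in for schedule(): running the scheduler clears the flag. */
void toy_schedule(struct toy_cpu_state *s)
{
    s->need_resched = false;
}

/* Runs the modeled outer loop and returns how often mwait was reached. */
unsigned count_mwait_calls(bool call_schedule_when_flagged,
                           unsigned outer_iterations)
{
    struct toy_cpu_state s = { .need_resched = true, .mwait_calls = 0 };

    for (unsigned i = 0; i < outer_iterations; i++) {
        /* The fix: clear the flag explicitly in the outer loop. */
        if (call_schedule_when_flagged && s.need_resched)
            toy_schedule(&s);

        /* Inner loop body: mwait is reached only with the flag clear. */
        if (!s.need_resched)
            s.mwait_calls++;

        /* Assumption: a timer tick sets the flag before the next pass. */
        s.need_resched = true;
    }
    return s.mwait_calls;
}
```

Without the explicit call the model never reaches mwait and only spins through the outer loop; with it, every pass reaches mwait once.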
-
- 18 Jul, 2014 1 commit
-
-
Kamal Mostafa authored
Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
-
- 16 Jul, 2014 1 commit
-
-
Mikulas Patocka authored
commit 81a9c5e7 upstream.

On uniprocessor preemptible kernel, target core deadlocks on unload. The following events happen:

* iscsit_del_np is called
* it calls send_sig(SIGINT, np->np_thread, 1);
* the scheduler switches to the np_thread
* the np_thread is woken up, it sees that kthread_should_stop() returns false, so it doesn't terminate
* the np_thread clears signals with flush_signals(current); and goes back to sleep in iscsit_accept_np
* the scheduler switches back to iscsit_del_np
* iscsit_del_np calls kthread_stop(np->np_thread);
* the np_thread is waiting in iscsit_accept_np and it doesn't respond to kthread_stop

The deadlock could be resolved if the administrator sends SIGINT signal to the np_thread with

killall -INT iscsi_np

The reproducible deadlock was introduced in commit db6077fd, but the thread-stopping code was racy even before.

This patch fixes the problem. Using kthread_should_stop to stop the np_thread is unreliable, so we test np_thread_state instead. If np_thread_state equals ISCSI_NP_THREAD_SHUTDOWN, the thread exits.

Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Signed-off-by:
Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by:
Kamal Mostafa <kamal@canonical.com>
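
The event sequence listed in this commit message can be replayed in a single-threaded model. It is a sketch under assumptions, not the target code: all `toy_` names are hypothetical, the model assumes the shutdown state is set before the signal is sent, and it assumes the sleeping thread only wakes when a signal is pending.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_np_state { TOY_NP_ACTIVE, TOY_NP_SHUTDOWN };

struct toy_np {
    enum toy_np_state state;
    bool stop_requested; /* what kthread_should_stop() would report */
    bool signal_pending;
    bool exited;
};

/* One scheduling opportunity for the modeled np_thread. */
void toy_np_thread_wake(struct toy_np *np, bool test_state)
{
    if (np->exited || !np->signal_pending)
        return; /* still asleep in the accept call */

    np->signal_pending = false; /* models flush_signals(current) */

    if (test_state ? np->state == TOY_NP_SHUTDOWN : np->stop_requested)
        np->exited = true;
}

/* Replays the unload sequence; returns true if the thread exits. */
bool toy_del_np(bool test_state)
{
    struct toy_np np = { TOY_NP_ACTIVE, false, false, false };

    np.state = TOY_NP_SHUTDOWN;          /* assumed to precede the signal */
    np.signal_pending = true;            /* models send_sig(SIGINT, ...) */
    toy_np_thread_wake(&np, test_state); /* scheduler runs the np_thread */
    np.stop_requested = true;            /* models kthread_stop() */
    toy_np_thread_wake(&np, test_state); /* no signal pending: no wakeup */
    return np.exited;
}
```

With `test_state` false the thread consumes the signal before the stop request is visible and never exits, which corresponds to the deadlock; testing the state instead lets it exit on the first wakeup.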
-