- 27 Mar, 2013 40 commits
-
-
Matt Fleming authored
commit ec50bd32 upstream. It's not wise to assume VariableNameSize represents the length of VariableName, as not all firmware updates VariableNameSize in the same way (some don't update it at all if EFI_SUCCESS is returned). There are even implementations out there that update VariableNameSize with values that are both larger than the string returned in VariableName and smaller than the buffer passed to GetNextVariableName(), which resulted in the following bug report from Michael Schroeder, > On HP z220 system (firmware version 1.54), some EFI variables are > incorrectly named : > > ls -d /sys/firmware/efi/vars/*8be4d* | grep -v -- -8be returns > /sys/firmware/efi/vars/dbxDefault-pport8be4df61-93ca-11d2-aa0d-00e098032b8c > /sys/firmware/efi/vars/KEKDefault-pport8be4df61-93ca-11d2-aa0d-00e098032b8c > /sys/firmware/efi/vars/SecureBoot-pport8be4df61-93ca-11d2-aa0d-00e098032b8c > /sys/firmware/efi/vars/SetupMode-Information8be4df61-93ca-11d2-aa0d-00e098032b8c The issue here is that because we blindly use VariableNameSize without verifying its value, we can potentially read garbage values from the buffer containing VariableName if VariableNameSize is larger than the length of VariableName. Since VariableName is a string, we can calculate its size by searching for the terminating NULL character. Reported-by: Frederic Crozat <fcrozat@suse.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Josh Boyer <jwboyer@redhat.com> Cc: Michael Schroeder <mls@suse.com> Cc: Lee, Chun-Yi <jlee@suse.com> Cc: Lingzhu Xiang <lxiang@redhat.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Seiji Aguchi authored
commit a93bc0c6 upstream. [Problem] efi_pstore creates sysfs entries, which enable users to access to NVRAM, in a write callback. If a kernel panic happens in an interrupt context, it may fail because it could sleep due to dynamic memory allocations during creating sysfs entries. [Patch Description] This patch removes sysfs operations from a write callback by introducing a workqueue updating sysfs entries which is scheduled after the write callback is called. Also, the workqueue is kicked in a just oops case. A system will go down in other cases such as panic, clean shutdown and emergency restart. And we don't need to create sysfs entries because there is no chance for users to access to them. efi_pstore will be robust against a kernel panic in an interrupt context with this patch. Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Acked-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> [bwh: Backported to 3.2: - Adjust contest - Don't check reason in efi_pstore_write(), as it is not given as a parameter - Move up declaration of __efivars] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Ben Hutchings authored
commit ca0ba26f upstream. The 'CONFIG_' prefix is not implicit in IS_ENABLED(). Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
-
Seth Forshee authored
commit ec0971ba upstream. We know that with some firmware implementations writing too much data to UEFI variables can lead to bricking machines. Recent changes attempt to address this issue, but for some it may still be prudent to avoid writing large amounts of data until the solution has been proven on a wide variety of hardware. Crash dumps or other data from pstore can potentially be a large data source. Add a pstore_module parameter to efivars to allow disabling its use as a backend for pstore. Also add a config option, CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE, to allow setting the default value of this paramter to true (i.e. disabled by default). Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Cc: Josh Boyer <jwboyer@redhat.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Seth Forshee authored
commit ed9dc8ce upstream. Add a new option, CONFIG_EFI_VARS_PSTORE, which can be set to N to avoid using efivars as a backend to pstore, as some users may want to compile out the code completely. Set the default to Y to maintain backwards compatability, since this feature has always been enabled until now. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Cc: Josh Boyer <jwboyer@redhat.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Joe Thornber authored
commit f046f89a upstream. Fix a bug in dm_btree_remove that could leave leaf values with incorrect reference counts. The effect of this was that removal of a shared block could result in the space maps thinking the block was no longer used. More concretely, if you have a thin device and a snapshot of it, sending a discard to a shared region of the thin could corrupt the snapshot. Thinp uses a 2-level nested btree to store it's mappings. This first level is indexed by thin device, and the second level by logical block. Often when we're removing an entry in this mapping tree we need to rebalance nodes, which can involve shadowing them, possibly creating a copy if the block is shared. If we do create a copy then children of that node need to have their reference counts incremented. In this way reference counts percolate down the tree as shared trees diverge. The rebalance functions were incrementing the children at the appropriate time, but they were always assuming the children were internal nodes. This meant the leaf values (in our case packed block/flags entries) were not being incremented. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> [bwh: Backported to 3.2: bump target version numbers from 1.0.1 to 1.0.2] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Alan Stern authored
commit 511f3c53 upstream. This patch (as1666) fixes a regression in the UDC core. The core takes care of unbinding gadget drivers, and it does the unbinding before telling the UDC driver to turn off the controller hardware. When the call to the udc_stop callback is made, the gadget no longer has a driver. The callback routine should not be invoked with a pointer to the old driver; doing so can cause problems (such as use-after-free accesses in net2280). This patch should be applied, with appropriate context changes, to all the stable kernels going back to 3.1. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Felipe Balbi <balbi@ti.com> [bwh: Backported to 3.2: adjust context, indentation] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Takashi Iwai authored
commit a686fd14 upstream. There is a typo in convert_to_spdif_status() about checking the emphasis IEC958 status bit. It should check the given value instead of the resultant value. Reported-by: Martin Weishart <martin.weishart@telosalliance.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Theodore Ts'o authored
commit 2b405bfa upstream. In data=journal mode, if we unmount the file system before a transaction has a chance to complete, when the journal inode is being evicted, we can end up calling into jbd2_log_wait_commit() for the last transaction, after the journalling machinery has been shut down. Arguably we should adjust ext4_should_journal_data() to return FALSE for the journal inode, but the only place it matters is ext4_evict_inode(), and so to save a bit of CPU time, and to make the patch much more obviously correct by inspection(tm), we'll fix it by explicitly not trying to waiting for a journal commit when we are evicting the journal inode, since it's guaranteed to never succeed in this case. This can be easily replicated via: mount -t ext4 -o data=journal /dev/vdb /vdb ; umount /vdb ------------[ cut here ]------------ WARNING: at /usr/projects/linux/ext4/fs/jbd2/journal.c:542 __jbd2_log_start_commit+0xba/0xcd() Hardware name: Bochs JBD2: bad log_start_commit: 3005630206 3005630206 0 0 Modules linked in: Pid: 2909, comm: umount Not tainted 3.8.0-rc3 #1020 Call Trace: [<c015c0ef>] warn_slowpath_common+0x68/0x7d [<c02b7e7d>] ? __jbd2_log_start_commit+0xba/0xcd [<c015c177>] warn_slowpath_fmt+0x2b/0x2f [<c02b7e7d>] __jbd2_log_start_commit+0xba/0xcd [<c02b8075>] jbd2_log_start_commit+0x24/0x34 [<c0279ed5>] ext4_evict_inode+0x71/0x2e3 [<c021f0ec>] evict+0x94/0x135 [<c021f9aa>] iput+0x10a/0x110 [<c02b7836>] jbd2_journal_destroy+0x190/0x1ce [<c0175284>] ? bit_waitqueue+0x50/0x50 [<c028d23f>] ext4_put_super+0x52/0x294 [<c020efe3>] generic_shutdown_super+0x48/0xb4 [<c020f071>] kill_block_super+0x22/0x60 [<c020f3e0>] deactivate_locked_super+0x22/0x49 [<c020f5d6>] deactivate_super+0x30/0x33 [<c0222795>] mntput_no_expire+0x107/0x10c [<c02233a7>] sys_umount+0x2cf/0x2e0 [<c02233ca>] sys_oldumount+0x12/0x14 [<c08096b8>] syscall_call+0x7/0xb ---[ end trace 6a954cc790501c1f ]--- jbd2_log_wait_commit: error: j_commit_request=-1289337090, tid=0 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Daniel Mack authored
commit 83ea5d18 upstream. Creation of individual mixer controls may fail, but that shouldn't cause the entire mixer creation to fail. Even worse, if the mixer creation fails, that will error out the entire device probing. All the functions called by parse_audio_unit() should return -EINVAL if they find descriptors that are unsupported or believed to be malformed, so we can safely handle this error code as a non-fatal condition in snd_usb_mixer_controls(). That fixes a long standing bug which is commonly worked around by adding quirks which make the driver ignore entire interfaces. Some of them might now be unnecessary. Signed-off-by: Daniel Mack <zonque@gmail.com> Reported-and-tested-by: Rodolfo Thomazelli <pe.soberbo@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Daniel Mack authored
commit 4d7b86c9 upstream. In check_input_term() and parse_audio_feature_unit(), propagate the error value that has been returned by a failing function instead of -EINVAL. That helps cleaning up the error pathes in the mixer. Signed-off-by: Daniel Mack <zonque@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
CQ Tang authored
commit 66db3feb upstream. The increment of "to" in copy_user_handle_tail() will have incremented before a failure has been noted. This causes us to skip a byte in the failure case. Only do the increment when assured there is no failure. Signed-off-by: CQ Tang <cq.tang@intel.com> Link: http://lkml.kernel.org/r/20130318150221.8439.993.stgit@phlsvslse11.ph.intel.comSigned-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Dmitry Torokhov authored
commit f8264340 upstream. According to XHCI specification (5.5.2.1) the IP is bit 0 and IE is bit 1 of IMAN register. Previously their definitions were reversed. Even though there are no ill effects being observed from the swapped definitions (because IMAN_IP is RW1C and in legacy PCI case we come in with it already set to 1 so it was clearing itself even though we were setting IMAN_IE instead of IMAN_IP), we should still correct the values. This patch should be backported to kernels as old as 2.6.36, that contain the commit 4e833c0b "xhci: don't re-enable IE constantly". Signed-off-by: Dmitry Torokhov <dtor@vmware.com> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Takashi Iwai authored
commit a86b1a2c upstream. The argument passed to snd_hda_attach_beep_device() is a widget NID while spec->beep_amp holds the composed value for amp controls. Signed-off-by: Takashi Iwai <tiwai@suse.de> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Alex Deucher authored
commit fa8d387d upstream. Fixes a segfault on asics without a blit callback. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=62239Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> [bwh: Backported to 3.2: s/copy\.blit/copy_blit/] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Dmitry Artamonow authored
commit 29f86e66 upstream. Device stucks on filesystem writes, unless following quirk is passed: echo 04e8:5136:m > /sys/module/usb_storage/parameters/quirks Add corresponding entry to unusual_devs.h Signed-off-by: Dmitry Artamonow <mad_soft@inbox.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Hannes Reinecke authored
commit 00eed9c8 upstream. xhci has its own interrupt enabling routine, which will try to use MSI-X/MSI if present. So the usb core shouldn't try to enable legacy interrupts; on some machines the xhci legacy IRQ setting is invalid. v3: Be careful to not break XHCI_BROKEN_MSI workaround (by trenn) Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Oliver Neukum <oneukum@suse.de> Cc: Thomas Renninger <trenn@suse.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Frederik Himpe <fhimpe@vub.ac.be> Cc: David Haerdeman <david@hardeman.nu> Cc: Alan Stern <stern@rowland.harvard.edu> Acked-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Reviewed-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Steven Rostedt (Red Hat) authored
commit 613f04a0 upstream. The latency tracers require the buffers to be in overwrite mode, otherwise they get screwed up. Force the buffers to stay in overwrite mode when latency tracers are enabled. Added a flag_changed() method to the tracer structure to allow the tracers to see what flags are being changed, and also be able to prevent the change from happing. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> [bwh: Backported to 3.2: - Adjust context - Drop some changes that are not needed because trace_set_options() is not separate from tracing_trace_options_write()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Steven Rostedt (Red Hat) authored
commit 80902822 upstream. Changing the overwrite mode for the ring buffer via the trace option only sets the normal buffer. But the snapshot buffer could swap with it, and then the snapshot would be in non overwrite mode and the normal buffer would be in overwrite mode, even though the option flag states otherwise. Keep the two buffers overwrite modes in sync. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Steven Rostedt (Red Hat) authored
commit 69d34da2 upstream. Seems that the tracer flags have never been protected from synchronous writes. Luckily, admins don't usually modify the tracing flags via two different tasks. But if scripts were to be used to modify them, then they could get corrupted. Move the trace_types_lock that protects against tracers changing to also protect the flags being set. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> [bwh: Backported to 3.2: also move failure return in tracing_trace_options_write() after unlocking] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Steven Rostedt (Red Hat) authored
commit 740466bc upstream. Because function tracing is very invasive, and can even trace calls to rcu_read_lock(), RCU access in function tracing is done with preempt_disable_notrace(). This requires a synchronize_sched() for updates and not a synchronize_rcu(). Function probes (traceon, traceoff, etc) must be freed after a synchronize_sched() after its entry has been removed from the hash. But call_rcu() is used. Fix this by using call_rcu_sched(). Also fix the usage to use hlist_del_rcu() instead of hlist_del(). Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Kees Cook authored
commit 3118a4f6 upstream. It is possible to wrap the counter used to allocate the buffer for relocation copies. This could lead to heap writing overflows. CVE-2013-0913 v3: collapse test, improve comment v2: move check into validate_exec_list Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Pinkie Pie Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Kees Cook authored
commit 2563a452 upstream. Masks kernel address info-leak in object dumps with the %pK suffix, so they cannot be used to target kernel memory corruption attacks if the kptr_restrict sysctl is set. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> [bwh: Backported to 3.2: the rest of the format string is different] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Mateusz Guzik authored
commit 24261fc2 upstream. cifsFileInfo objects hold references to dentries and it is possible that these will still be around in workqueues when VFS decides to kill super block during unmount. This results in panics like this one: BUG: Dentry ffff88001f5e76c0{i=66b4a,n=1M-2} still in use (1) [unmount of cifs cifs] ------------[ cut here ]------------ kernel BUG at fs/dcache.c:943! [..] Process umount (pid: 1781, threadinfo ffff88003d6e8000, task ffff880035eeaec0) [..] Call Trace: [<ffffffff811b44f3>] shrink_dcache_for_umount+0x33/0x60 [<ffffffff8119f7fc>] generic_shutdown_super+0x2c/0xe0 [<ffffffff8119f946>] kill_anon_super+0x16/0x30 [<ffffffffa036623a>] cifs_kill_sb+0x1a/0x30 [cifs] [<ffffffff8119fcc7>] deactivate_locked_super+0x57/0x80 [<ffffffff811a085e>] deactivate_super+0x4e/0x70 [<ffffffff811bb417>] mntput_no_expire+0xd7/0x130 [<ffffffff811bc30c>] sys_umount+0x9c/0x3c0 [<ffffffff81657c19>] system_call_fastpath+0x16/0x1b Fix this by making each cifsFileInfo object hold a reference to cifs super block, which implicitly keeps VFS super block around as well. Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reported-and-Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Steve French <sfrench@us.ibm.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Steven Rostedt (Red Hat) authored
commit 2721e72d upstream. Although the swap is wrapped with a spin_lock, the assignment of the temp buffer used to swap is not within that lock. It needs to be moved into that lock, otherwise two swaps happening on two different CPUs, can end up using the wrong temp buffer to assign in the swap. Luckily, all current callers of the swap function appear to have their own locks. But in case something is added that allows two different callers to call the swap, then there's a chance that this race can trigger and corrupt the buffers. New code is coming soon that will allow for this race to trigger. I've Cc'd stable, so this bug will not show up if someone backports one of the changes that can trigger this bug. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Theodore Ts'o authored
commit 90ba983f upstream. A user who was using a 8TB+ file system and with a very large flexbg size (> 65536) could cause the atomic_t used in the struct flex_groups to overflow. This was detected by PaX security patchset: http://forums.grsecurity.net/viewtopic.php?f=3&t=3289&p=12551#p12551 This bug was introduced in commit 9f24e420, so it's been around since 2.6.30. :-( Fix this by using an atomic64_t for struct orlav_stats's free_clusters. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Lukas Czerner authored
commit 810da240 upstream. We're using macro EXT4_B2C() to convert number of blocks to number of clusters for bigalloc file systems. However, we should be using EXT4_NUM_B2C(). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> [bwh: Backported to 3.2: - Adjust context - Drop changes in ext4_setup_new_descs() and ext4_calculate_overhead()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Jan Kara authored
commit ad56edad upstream. jbd2_journal_dirty_metadata() didn't get a reference to journal_head it was working with. This is OK in most of the cases since the journal head should be attached to a transaction but in rare occasions when we are journalling data, __ext4_journalled_writepage() can race with jbd2_journal_invalidatepage() stripping buffers from a page and thus journal head can be freed under hands of jbd2_journal_dirty_metadata(). Fix the problem by getting own journal head reference in jbd2_journal_dirty_metadata() (and also in jbd2_journal_set_triggers() which can possibly have the same issue). Reported-by: Zheng Liu <gnehzuil.liu@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Zheng Liu authored
commit 3a225670 upstream. This commit fixes a wrong return value of the number of the allocated blocks in ext4_split_extent. When the length of blocks we want to allocate is greater than the length of the current extent, we return a wrong number. Let's see what happens in the following case when we call ext4_split_extent(). map: [48, 72] ex: [32, 64, u] 'ex' will be split into two parts: ex1: [32, 47, u] ex2: [48, 64, w] 'map->m_len' is returned from this function, and the value is 24. But the real length is 16. So it should be fixed. Meanwhile in this commit we use right length of the allocated blocks when get_reserved_cluster_alloc in ext4_ext_handle_uninitialized_extents is called. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Ben Hutchings authored
[ Upstream commit fae8563b ] Using TX push when notifying the NIC of multiple new descriptors in the ring will very occasionally cause the TX DMA engine to re-use an old descriptor. This can result in a duplicated or partly duplicated packet (new headers with old data), or an IOMMU page fault. This does not happen when the pushed descriptor is the only one written. TX push also provides little latency benefit when a packet requires more than one descriptor. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Ben Hutchings authored
[ Upstream commit 35205b21 ] efx_device_detach_sync() locks all TX queues before marking the device detached and thus disabling further TX scheduling. But it can still be interrupted by TX completions which then result in TX scheduling in soft interrupt context. This will deadlock when it tries to acquire a TX queue lock that efx_device_detach_sync() already acquired. To avoid deadlock, we must use netif_tx_{,un}lock_bh(). Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Ben Hutchings authored
[ Upstream commit 29c69a48 ] We must only ever stop TX queues when they are full or the net device is not 'ready' so far as the net core, and specifically the watchdog, is concerned. Otherwise, the watchdog may fire *immediately* if no packets have been added to the queue in the last 5 seconds. The device is ready if all the following are true: (a) It has a qdisc (b) It is marked present (c) It is running (d) The link is reported up (a) and (c) are normally true, and must not be changed by a driver. (d) is under our control, but fake link changes may disturb userland. This leaves (b). We already mark the device absent during reset and self-test, but we need to do the same during MTU changes and ring reallocation. We don't need to do this when the device is brought down because then (c) is already false. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Ben Hutchings authored
[ Upstream commits 06e63c57, b590ace0 and c73e787a ] We assume that the mapping between DMA and virtual addresses is done on whole pages, so we can find the page offset of an RX buffer using the lower bits of the DMA address. However, swiotlb maps in units of 2K, breaking this assumption. Add an explicit page_offset field to struct efx_rx_buffer. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Ben Hutchings authored
[ Upstream commit 3a68f19d ] We may currently allocate two RX DMA buffers to a page, and only unmap the page when the second is completed. We do not sync the first RX buffer to be completed; this can result in packet loss or corruption if the last RX buffer completed in a NAPI poll is the first in a page and is not DMA-coherent. (In the middle of a NAPI poll, we will handle the following RX completion and unmap the page *before* looking at the content of the first buffer.) Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Ben Hutchings authored
[ Upstream commit ebf98e79 ] efx_mcdi_poll() uses get_seconds() to read the current time and to implement a polling timeout. The use of this function was chosen partly because it could easily be replaced in a co-sim environment with a macro that read the simulated time. Unfortunately the real get_seconds() returns the system time (real time) which is subject to adjustment by e.g. ntpd. If the system time is adjusted forward during a polled MCDI operation, the effective timeout can be shorter than the intended 10 seconds, resulting in a spurious failure. It is also possible for a backward adjustment to delay detection of a areal failure. Use jiffies instead, and change MCDI_RPC_TIMEOUT to be denominated in jiffies. Also correct rounding of the timeout: check time > finish (or rather time_after(time, finish)) and not time >= finish. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Daniel Pieczko authored
[ Upstream commit c2f3b8e3 ] The assertion of netif_device_present() at the top of efx_hard_start_xmit() may fail if we don't do this. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Ben Hutchings authored
[ Upstream commits a606f432, d5e8cc6c, 525d9e82 ] The TX DMA engine issues upstream read requests when there is room in the TX FIFO for the completion. However, the fetches for the rest of the packet might be delayed by any back pressure. Since a flush must wait for an EOP, the entire flush may be delayed by back pressure. Mitigate this by disabling flow control before the flushes are started. Since PF and VF flushes run in parallel introduce fc_disable, a reference count of the number of flushes outstanding. The same principle could be applied to Falcon, but that would bring with it its own testing. We sometimes hit a "failed to flush" timeout on some TX queues, but the flushes have completed and the flush completion events seem to go missing. In this case, we can check the TX_DESC_PTR_TBL register and drain the queues if the flushes had finished. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> [bwh: Backported to 3.2: - Call efx_nic_type::finish_flush() on both success and failure paths - Check the TX_DESC_PTR_TBL registers in the polling loop - Declare efx_mcdi_set_mac() extern] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Ben Hutchings authored
[ Upstream commit bfeed902 ] On big-endian systems the MTD partition names currently have mangled subtype numbers and are not recognised by the firmware update tool (sfupdate). Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> [bwh: Backported to 3.2: use old macros for length of firmware subtype array] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Stuart Hodgson authored
[ Upstream commit 3dca9d2d ] efx_nic_fatal_interrupt() disables DMA before scheduling a reset. After this, we need not and *cannot* flush queues. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-
Hannes Frederic Sowa authored
[ Upstream commit 5a3da1fe ] This patch introduces a constant limit of the fragment queue hash table bucket list lengths. Currently the limit 128 is choosen somewhat arbitrary and just ensures that we can fill up the fragment cache with empty packets up to the default ip_frag_high_thresh limits. It should just protect from list iteration eating considerable amounts of cpu. If we reach the maximum length in one hash bucket a warning is printed. This is implemented on the caller side of inet_frag_find to distinguish between the different users of inet_fragment.c. I dropped the out of memory warning in the ipv4 fragment lookup path, because we already get a warning by the slab allocator. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jesper Dangaard Brouer <jbrouer@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-