- 27 Feb, 2015 13 commits
-
-
Eric Dumazet authored
If a hash table has 128 slots and 16384 elems, expanding to 256 slots takes more than one second. For larger sets, a soft lockup is detected. Holding the cpu for that long, even in a work queue, is a show stopper for non-preemptible kernels. Add cond_resched() at strategic points to allow the process scheduler to reschedule us. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
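For illustration, a minimal sketch of the idea, assuming a simplified rehash loop (the table type and relink helper below are hypothetical, not the actual rhashtable code):

    #include <linux/sched.h>

    /* Hypothetical table type and bucket-relink helper, for illustration only. */
    struct example_table {
            unsigned int size;
    };
    void example_relink_bucket(struct example_table *tbl, unsigned int i);

    static void example_rehash(struct example_table *tbl)
    {
            unsigned int i;

            for (i = 0; i < tbl->size; i++) {
                    example_relink_bucket(tbl, i);  /* move one bucket's entries */
                    if ((i & 1023) == 0)
                            cond_resched();         /* yield periodically so a
                                                     * non-preemptible kernel does not
                                                     * trip the soft-lockup detector */
            }
    }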
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net
David S. Miller authored
Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2015-02-26 This series contains fixes for i40e and i40evf only. Alexey Khoroshilov found a possible leak of 'cmd_buf' when copy_from_user() failed in i40e_dbg_command_write(), resolved by calling kfree(). Shannon provides a fix to ensure the shift and bitwise precedences do not work backwards for us by adding parens. He also fixed the driver to prevent stray interrupts and system log messages from unhandled interrupts, by combining the ICR0 shutdown with the standard interrupt shutdown and adding interrupt clearing to the PCI shutdown path. Fixed an issue where an NVM write times out before a transaction can complete: Shannon added logic to make another attempt by reacquiring the semaphore and retrying the write; if that one retry fails, we give up. Adds checks of pointers before their use to ensure we do not try to dereference NULL pointers when returning values from the AdminQ calls. Akeem adds a check to bail out if the device is already down when checking for the Tx hang subtask. Anjali fixes a TSO issue with more than 8 frags per segment. The hardware has some limitations which the driver needs to adhere to: 1) no more than 8 descriptors per packet on the wire, 2) no header can span more than 3 descriptors. If one of these events happens, the hardware will generate an internal error and freeze the Tx queue, so Anjali fixes this by linearizing the skb to avoid these situations. Fixed an issue where the per Traffic Class queue count was higher than the number of queues enabled, which fixes a warning in multiple function mode where systems regularly have more cores than vectors. Fixed TCP/IPv6 over VXLAN Tx checksum offload, where we were checking the outer protocol flags and deciding the flow for the inner header. Jesse fixes a race condition in the transmit hang detection. Previously there were issues with false Tx hang detection; now the driver is more direct in its checks for forward progress, directly checking the head write-back address and tail register when determining progress. This avoids Tx hangs where the software gets behind, because we are directly checking hardware state when determining a hang state. Neerav fixes the transmit ring Qset handle when DCB reconfigures. The issue was that when DCB is reconfigured to a single traffic class (TC), the driver did not reset the Tx ring Qset handle to the correct mapping, which caused Tx queue disable timeouts. Also, as part of the DCB reconfiguration flow, if the Tx queue disable times out, a PF reset is issued to do some level of recovery. Mitch stops flow director on shutdown because, in some cases, the hardware would continue to try to access the FDIR ring after entering D3Hot state, which would cause either PCIe errors or NMIs, depending upon the system configuration. * NOTE * I have verified that this series of patches for net will not cause any merge issues when you sync up your net tree with your net-next tree. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Lendacky, Thomas authored
It is possible that the hardware may not have been properly shutdown before this driver gets control, through use by firmware, for example. Until the driver is loaded, interrupts associated with the hardware could go pending. When the IRQs are requested napi support has not been initialized yet, but the ISR will get control and schedule napi processing resulting in a kernel panic because the poll routine has not been set. Adjust the code so that the driver is fully ready to handle and process interrupts as soon as the IRQs are requested. This involves requesting and freeing IRQs during start and stop processing and ordering the napi add and delete calls appropriately. Also adjust the powerup and powerdown routines to match the start and stop routines in regards to the ordering of tasks, including napi related calls. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Luca Ceresoli authored
Just another AX88178-based 10/100/1000 USB-to-Ethernet dongle. This one shows up in lsusb as: "Sitecom Europe B.V. LN-028 Network USB 2.0 Adapter". Signed-off-by: Luca Ceresoli <luca@lucaceresoli.net> Cc: Francois Romieu <romieu@fr.zoreil.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-usb@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Daniel Borkmann says: ==================== rhashtable updates As discussed, I'm sending out rhashtable fixups for -net. I have a couple more patches pending that I was working on last week, i.e. to get rid of the ht->nelems and ht->shift atomic operations, which speeds up pure insertions/deletions; e.g. on my laptop, with 2 threads inserting 7M entries each, that reduces insertion time from ~1,450 ms to 865 ms (performance should be even better after removing the grow/shrink indirections). I guess that, however, is rather something for net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Currently, all real users of rhashtable default their grow and shrink decision functions to rht_grow_above_75() and rht_shrink_below_30(), so there's currently no need to have this explicitly selectable. It can/should be generic and private inside rhashtable until a real use case pops up. Since we can make this private, we save ourselves this additional indirection layer and can improve insertion/deletion time as well. Reference: http://patchwork.ozlabs.org/patch/443040/ Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
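As a rough sketch of what those thresholds mean (illustrative only; the real rht_grow_above_75()/rht_shrink_below_30() helpers operate on the rhashtable structures themselves):

    #include <linux/types.h>

    /* Expand once the table is more than 75% full. */
    static inline bool example_grow_above_75(unsigned int nelems, unsigned int size)
    {
            return nelems * 4 > size * 3;
    }

    /* Shrink once the table drops below 30% occupancy. */
    static inline bool example_shrink_below_30(unsigned int nelems, unsigned int size)
    {
            return nelems * 10 < size * 3;
    }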
-
Daniel Borkmann authored
While commit c0c09bfd ("rhashtable: avoid unnecessary wakeup for worker queue") rightfully moved part of the decision making of whether we should expand or shrink from the expand/shrink functions themselves into the insert/delete functions in order to avoid unnecessary worker wake-ups, it introduced a regression by doing so. Before that change, if no max_shift was specified (= 0) on rhashtable initialization, rhashtable_expand() would just grow unconditionally and let the available memory be the limiting factor. After that change, if no max_shift was specified, there would be _no_ expansion step at all. Given that netlink and tipc have a max_shift specified, it was not visible there, but Josh Hunt reported that nft, which starts out with a default element hint of 3 if not otherwise provided, would slow down inserts tremendously as it cannot grow larger to relax table occupancy. Given that the test case verifies shrinks/expands manually, we also must remove the pointers to the helper functions to explicitly avoid parallel resizing on insertions/deletions. test_bucket_stats() and test_rht_lookup() could also be wrapped around the rhashtable mutex to explicitly synchronize a walk from resizing, but I think that defeats the actual test case, which intended to have explicit test steps, i.e. 1) inserts, 2) expands, 3) shrinks, 4) deletions, with object verification after each stage. Reported-by: Josh Hunt <johunt@akamai.com> Fixes: c0c09bfd ("rhashtable: avoid unnecessary wakeup for worker queue") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Ying Xue <ying.xue@windriver.com> Cc: Josh Hunt <johunt@akamai.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
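A sketch of the intended behaviour (hypothetical helper, not the rhashtable code itself): with no max_shift given, growth should be bounded only by available memory.

    #include <linux/types.h>

    /* If max_shift is 0 (unspecified), always allow expansion;
     * otherwise expand only while below the configured ceiling. */
    static inline bool example_may_expand(unsigned int shift, unsigned int max_shift)
    {
            return !max_shift || shift < max_shift;
    }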
-
Michael S. Tsirkin authored
The 2 that we use for copy_to_iter comes from sizeof(u16); it used to be that way before the iov iter update. Fix it up, making it obvious that the size of the stack access is right. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
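A hedged sketch of the pattern (hypothetical wrapper, not the vhost code): using sizeof(num_buffers) instead of a bare 2 ties the copy size to the on-stack variable.

    #include <linux/uio.h>
    #include <linux/errno.h>
    #include <linux/types.h>

    static int example_put_num_buffers(struct iov_iter *iter, u16 num_buffers)
    {
            /* sizeof(num_buffers) == sizeof(u16) == 2; spelling it this way
             * documents the size of the stack access. */
            if (copy_to_iter(&num_buffers, sizeof(num_buffers), iter) !=
                sizeof(num_buffers))
                    return -EFAULT;
            return 0;
    }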
-
Michael S. Tsirkin authored
Recent iterator-related changes in vhost made it harder to follow the logic fixing up the header. In fact, the fixup always happens at the same offset, sizeof(virtio_net_hdr): sometimes the fixup iterator is updated by copy_to_iter, sometimes by iov_iter_advance. Rearrange code to make this obvious. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dan Carpenter authored
"val" is declared as a u64 so static checkers complain that this shift can wrap. I don't have the hardware but probably it's doesn't have over 31 ports. Still we may as well silence the warning even if it's not a real bug. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dan Carpenter authored
Make sure kmalloc() succeeds. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
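A minimal sketch of the pattern (hypothetical context struct, not the rocker code):

    #include <linux/slab.h>
    #include <linux/errno.h>

    struct example_ctx {
            void *buf;
    };

    static int example_alloc(struct example_ctx *ctx, size_t len)
    {
            ctx->buf = kmalloc(len, GFP_KERNEL);
            if (!ctx->buf)
                    return -ENOMEM;   /* bail out instead of dereferencing NULL */
            return 0;
    }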
-
Hariprasad Shenai authored
When doing reads and writes to adapter memory via the PCI-E Memory Window interface, data gets swizzled on 4-byte boundaries on Big-Endian systems because we need to account for the register read/write interface which incorporates a swizzle onto the Little-Endian PCI-E Bus. Based on original work by Casey Leedom <leedom@chelsio.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sujith Sankar authored
We should complete notify_check before returning the credits. Once we return the credits, the adapter may access the notify data. Signed-off-by: Sujith Sankar <ssujith@cisco.com> Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 26 Feb, 2015 13 commits
-
-
Shannon Nelson authored
Make sure we don't try to dereference NULL pointers when returning values from the AdminQ calls. Change-ID: Ia6694f2f415d50acf0aba063c863568742799aff Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Shannon Nelson authored
In some circumstances, a multi-write transaction takes longer than the default 3 minute timeout on the write semaphore. If the write failed with an EBUSY status, this is likely the problem, so here we try to reacquire the semaphore then retry the write. We only do one retry, then give up. Change-ID: I1c8be60688acc2f39573839579baf601207c4a36 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Mitch A Williams authored
In some cases, the hardware would continue to try to access the FDIR ring after entering D3Hot state, which would cause either PCIe errors or NMIs, depending upon system configuration. Explicitly stop FDIR in our shutdown routine to eliminate this possibility. Change-ID: I1bd9fc7fd8f151fe24cad132ac9adddab923e3af Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Shannon Nelson authored
Combine the ICR0 shutdown with the standard interrupt shutdown, and add the interrupt clearing to the PCI shutdown path. This prevents the driver from allowing stray interrupts or causing system logs from un-handled interrupts. Change-ID: I48f6ab95cad7f8ca77c1f26c92a51cc1034ced43 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Anjali Singhai authored
We were checking the outer protocol flags and deciding the flow for the inner header. This patch fixes that, which fixes Tx checksum offload for TCP/IPv6 over VXLAN. Change-ID: I837aaea921d34f71b24c2bc32aaadea5001ddf78 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Parikh, Neerav authored
As part of DCB reconfiguration flow if the Tx queue disable times out then issue a PF reset to do some level of recovery. Change-ID: I7550021c55bff355351c0365e61e1f05fcaff46d Signed-off-by: Neerav Parikh <neerav.parikh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Parikh, Neerav authored
When DCB is reconfigured to a single TC, the driver did not reset the Tx ring Qset handle to the correct mapping, which caused Tx queue disable timeouts. Change-ID: I4da5915ec92a83c281b478d653fae6ef1b72edfe Signed-off-by: Neerav Parikh <neerav.parikh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Anjali Singhai authored
When the driver or hardware gets fewer interrupt vectors than the actual number of CPU cores, limit the queue count for the priority queue traffic class (TC) queues. This will fix a warning with multiple function mode where systems regularly have more cores than vectors. Also add an extra comment for readability. Change-ID: I4f02226263aa3995e1f5ee5503eac0cd6ee12fbd Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jesse Brandeburg authored
The driver was having some issues with false Tx hang detection. This makes the driver a little more direct with the checks for progress forward by directly checking the head write back address and tail register when determining progress. This avoids Tx hangs where the software gets behind, because we are directly checking hardware state when determining hang state. Change-ID: I774f0e861c9e8ab5ccb213634100fe15440ae24a Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Anjali Singhai authored
The hardware has some limitations, found in extended testing, that the driver needs to adhere to: 1) no more than 8 descriptors per packet on the wire; 2) no header can span more than 3 descriptors. If one of these events occurs, the hardware will generate an internal error and freeze the Tx queue. This patch linearizes the skb to avoid these situations. Change-ID: I37dab7d3966e14895a9663ec4d0aaa8eb0d9e115 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
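A rough sketch of the fallback, assuming a simple frag-count check (the real driver counts descriptors per TSO segment, which is more involved):

    #include <linux/skbuff.h>

    #define EXAMPLE_MAX_DESC_PER_PKT 8  /* hardware limit described above */

    static int example_maybe_linearize(struct sk_buff *skb)
    {
            /* one descriptor for the headers plus one per fragment */
            if (skb_shinfo(skb)->nr_frags + 1 > EXAMPLE_MAX_DESC_PER_PKT)
                    return skb_linearize(skb);  /* 0 on success, -ENOMEM on failure */
            return 0;
    }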
-
Akeem G Abodunrin authored
This patch adds a check to bail out if the device is already down when checking for the Tx hang subtask. Change-ID: I3853fb7a6d11cb9a4c349b687cb25c15b19977a0 Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Shannon Nelson authored
Add parens to make sure the shift and bitwise precedences don't work backwards for us. Change-ID: I60c10ef4fad6bc654522b9d8a53da2e270a0f268 Reported-by: Joe Perches <joe@perches.com> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
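An illustration of the precedence pitfall (generic example, not the i40e registers): in C, '<<' binds tighter than '&', so explicit parens remove any doubt about the intended grouping.

    static unsigned int example_grouping(unsigned int reg, unsigned int val)
    {
            unsigned int a = reg & val << 2;    /* parsed as reg & (val << 2) */
            unsigned int b = (reg & val) << 2;  /* explicit, generally different */

            return a ^ b;   /* usually non-zero: the two groupings disagree */
    }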
-
Alexey Khoroshilov authored
The patch fixes a leak of 'cmd_buf' when copy_from_user() failed in i40e_dbg_command_write(). Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
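The shape of the fix, sketched with a hypothetical debugfs-style write handler (not the i40e code):

    #include <linux/slab.h>
    #include <linux/uaccess.h>
    #include <linux/errno.h>

    static ssize_t example_command_write(const char __user *ubuf, size_t count)
    {
            char *cmd_buf;

            cmd_buf = kzalloc(count + 1, GFP_KERNEL);
            if (!cmd_buf)
                    return -ENOMEM;
            if (copy_from_user(cmd_buf, ubuf, count)) {
                    kfree(cmd_buf);         /* the fix: free on the error path too */
                    return -EFAULT;
            }
            /* ... parse and handle the command ... */
            kfree(cmd_buf);
            return count;
    }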
-
- 25 Feb, 2015 2 commits
-
-
Andy Gospodarek authored
I have been signing off on patches with this address so I'll change it. Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Lendacky authored
The PHY requires different settings for the Decision Feedback Analyzer (DFE) when running in KX mode vs. KR mode. Update the code to change these settings when changing modes in order to provide a more stable link. Additionally, adjust the 10GbE PQ skew default setting to a more sane value. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 24 Feb, 2015 4 commits
-
-
Yannick Guerrini authored
Change 'firwmare' to 'firmware' Signed-off-by: Yannick Guerrini <yguerrini@tomshardware.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Vrabel authored
If the pending indexes are released /after/ pushing the Tx response then a stale pending index may be used if a new Tx request is immediately pushed by the frontend. This may cause various WARNINGs or BUGs if the stale pending index is actually still in use. Fix this by releasing the pending index before pushing the Tx response. The full barrier for the pending ring update is not required since the Tx response push already has a suitable write barrier. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
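An ordering sketch of the change (all types and helpers hypothetical): release the pending index before the response push, so the frontend can never see the response while the index still looks in-flight to the backend.

    #include <linux/types.h>

    /* hypothetical queue type and helpers, for illustration only */
    struct example_queue;
    void example_release_pending_index(struct example_queue *q, u16 pending_idx);
    void example_push_tx_response(struct example_queue *q, s8 status);

    static void example_complete_tx(struct example_queue *q, u16 pending_idx, s8 status)
    {
            example_release_pending_index(q, pending_idx);  /* previously done after the push */
            example_push_tx_response(q, status);            /* push already carries a write barrier */
    }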
-
Alexander Drozdov authored
Before da413eec ("packet: Fixed TPACKET V3 to signal poll when block is closed rather than every packet"), poll listening for an af_packet socket was not signaled if there were no packets to process. After the patch, poll is signaled every time the block retire timer expires. That happens because af_packet closes the current block on timeout even if the block is empty. Passing empty blocks to the user not only wastes CPU but also wastes ring buffer space, increasing the probability of packet drops on small timeouts. Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com> Cc: Dan Collins <dan@dcollins.co.nz> Cc: Willem de Bruijn <willemb@google.com> Cc: Guy Harris <guy@alum.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sasha Levin authored
Arrays (when not in a struct) "shall have a value greater than zero". GCC complains when it's not the case here. Fixes: ba7d49b1 ("rtnetlink: provide api for getting and setting slave info") Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 23 Feb, 2015 8 commits
-
-
Marcelo Leitner authored
Currently we don't check if the new MTU is valid or not, and this allows one to configure an MTU smaller than the minimum allowed by the RFCs, or even bigger than the interface's own MTU, which is a problem as it may lead to packet drops. If you have a daemon like NetworkManager running, this may be exploited by remote attackers by forging RA packets with an invalid MTU, possibly leading to a DoS. (NetworkManager currently only validates for values too small, but not for too big ones.) The fix is just to make sure the new value is valid, that is, between IPV6_MIN_MTU and the interface's MTU. Note that a similar check is already performed at ndisc_router_discovery(), for when the kernel itself parses the RA. Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
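A sketch of the validation described (not the exact kernel code): accept the RA MTU only if it lies between IPV6_MIN_MTU and the interface MTU.

    #include <linux/ipv6.h>        /* IPV6_MIN_MTU (1280) */
    #include <linux/netdevice.h>

    static bool example_ra_mtu_valid(const struct net_device *dev, u32 mtu)
    {
            return mtu >= IPV6_MIN_MTU && mtu <= dev->mtu;
    }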
-
Vlastimil Setka authored
Incorrect NAPI polling caused a WARNING at net/core/dev.c net_rx_action. Some stability issues were also seen at high throughput and system load before this patch. This patch contains several changes in altera_tse_main.c:
- tse_rx() is fixed to not process more than `limit` frames
- tse_poll() is refactored to match NAPI logic:
  - only received frames are counted for the return value
  - removed the bogus condition `(rxcomplete >= budget || txcomplete > 0)`, replaced by: if (rxcomplete < budget) -> call __napi_complete and enable irq
- altera_isr():
  - replace spin_lock_irqsave() by spin_lock() - we are in the isr
  - use spinlocks just over irq manipulation, not over __napi_schedule
  - reset the IRQ first, then disable and schedule napi
This is a cleaned up resubmission of Vlastimil's recent submission. Signed-off-by: Vlastimil Setka <setka@vsis.cz> Signed-off-by: Roman Pisl <rpisl@kky.zcu.cz> Signed-off-by: Vince Bridgers <vbridger@opensource.altera.com> Signed-off-by: David S. Miller <davem@davemloft.net>
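A simplified sketch of the NAPI poll shape described above (hypothetical helpers; the real patch uses __napi_complete under the driver's locking):

    #include <linux/netdevice.h>

    /* hypothetical per-device helpers */
    int  example_rx(struct napi_struct *napi, int limit);
    void example_tx_reclaim(struct napi_struct *napi);
    void example_enable_rx_tx_irqs(struct napi_struct *napi);

    static int example_poll(struct napi_struct *napi, int budget)
    {
            int rxcomplete = example_rx(napi, budget);   /* never more than budget */

            example_tx_reclaim(napi);

            if (rxcomplete < budget) {
                    napi_complete(napi);                 /* done for now */
                    example_enable_rx_tx_irqs(napi);     /* re-arm interrupts */
            }
            return rxcomplete;                           /* count rx work only */
    }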
-
Vlastimil Setka authored
This patch corrects a typo in the way tx_fifo_depth is read from the devicetree. This patch was submitted by Vlastimil about a week ago, and is now cleaned up and resubmitted. Signed-off-by: Vlastimil Setka <setka@vsis.cz> Signed-off-by: Vince Bridgers <vbridger@opensource.altera.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Catalin Marinas authored
With commit a7526eb5 (net: Unbreak compat_sys_{send,recv}msg), the MSG_CMSG_COMPAT flag is blocked at the compat syscall entry points, changing the kernel compat behaviour from the one before the commit it was trying to fix (1be374a0, net: Block MSG_CMSG_COMPAT in send(m)msg and recv(m)msg). On 32-bit kernels (!CONFIG_COMPAT), MSG_CMSG_COMPAT is 0 and the native 32-bit sys_sendmsg() allows flag 0x80000000 to be set (it is ignored by the kernel). However, on a 64-bit kernel, the compat ABI is different with commit a7526eb5. This patch changes the compat_sys_{send,recv}msg behaviour to the one prior to commit 1be374a0. The problem was found running 32-bit LTP (sendmsg01) binary on an arm64 kernel. Arguably, LTP should not pass 0xffffffff as flags to sendmsg() but the general rule is not to break user ABI (even when the user behaviour is not entirely sane). Fixes: a7526eb5 (net: Unbreak compat_sys_{send,recv}msg) Cc: Andy Lutomirski <luto@amacapital.net> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Fabian Frederick authored
Use helper functions to access current->state. Direct assignments are prone to races and therefore buggy. current->state = TASK_RUNNING can be replaced by __set_current_state() Thanks to Peter Zijlstra for the exact definition of the problem. Suggested-By: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
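For illustration, the substitution described above:

    #include <linux/sched.h>

    static void example_mark_running(void)
    {
            /* Instead of the direct assignment flagged above:
             *      current->state = TASK_RUNNING;
             * use the helper, which keeps the state change in one audited place:
             */
            __set_current_state(TASK_RUNNING);
    }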
-
Jamal Hadi Salim authored
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Currently the following race is possible in team:

    CPU0                                     CPU1
                                             team_port_del
                                               team_upper_dev_unlink
                                                 priv_flags &= ~IFF_TEAM_PORT
    team_handle_frame
      team_port_get_rcu
        team_port_exists
          priv_flags & IFF_TEAM_PORT == 0
        return NULL (instead of port got
                     from rx_handler_data)
                                                 netdev_rx_handler_unregister

The thing is that the flag is removed before rx_handler is unregistered. If team_handle_frame is called in between, team_port_exists returns 0 and team_port_get_rcu will return NULL. So do not check the flag here. It is guaranteed by netdev_rx_handler_unregister that team_handle_frame will always see a valid rx_handler_data pointer. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Fixes: 3d249d4c ("net: introduce ethernet teaming device") Signed-off-by: David S. Miller <davem@davemloft.net>
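A sketch of the shape of the fix (illustrative; assumes the caller holds rcu_read_lock(), as rx handlers do): return rx_handler_data directly instead of first testing IFF_TEAM_PORT.

    #include <linux/netdevice.h>

    static inline void *example_port_get_rcu(const struct net_device *dev)
    {
            /* netdev_rx_handler_unregister() guarantees this pointer stays
             * usable for concurrent handlers, so no flag check is needed. */
            return rcu_dereference(dev->rx_handler_data);
    }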
-
Rasmus Villemoes authored
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
-