- 18 Aug, 2015 40 commits
-
-
Todd Fujinaka authored
Recent changes to igb_probe_vfs() could lead to the PF holding onto all of the queues. Reorder igb_probe_vfs() to be before gb_init_queue_configuration() and add some more error checking. Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Stefan Assmann authored
The driver doesn't clear buffer_info->dma after calling dma_unmap_single() in all cases. This has been discovered by changing the mtu twice, which caused the following backtrace. [ 68.569280] WARNING: CPU: 2 PID: 1860 at drivers/iommu/intel-iommu.c:3517 intel_unmap+0x20c/0x220() [ 68.579392] Driver unmaps unmatched page at PFN fffc2a40 [ 68.585322] Modules linked in: igbvf ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat kvm_intel kvm igb megs [ 68.599163] CPU: 2 PID: 1860 Comm: ifconfig Not tainted 4.2.0-rc4+ #147 [ 68.606543] Hardware name: IBM -[546025Z]-/00Y7630, BIOS -[VVE134TUS-1.51]- 10/17/2013 [ 68.615473] 0000000000000dbd ffff88046441bb08 ffffffff81a5ad0b ffffffff81e2f9ea [ 68.623775] ffff88046441bb58 ffff88046441bb48 ffffffff81056b55 ffff88047fc583c0 [ 68.632075] 0000000000000000 ffff880469a8e600 00000000fffc2a40 ffff880465b32098 [ 68.640375] Call Trace: [ 68.643109] [<ffffffff81a5ad0b>] dump_stack+0x48/0x5d [ 68.648844] [<ffffffff81056b55>] warn_slowpath_common+0x95/0xe0 [ 68.655549] [<ffffffff81056c56>] warn_slowpath_fmt+0x46/0x70 [ 68.661960] [<ffffffff8158a614>] ? find_iova+0x54/0x90 [ 68.667791] [<ffffffff815988dc>] intel_unmap+0x20c/0x220 [ 68.673815] [<ffffffff8159891e>] intel_unmap_page+0xe/0x10 [ 68.680038] [<ffffffffa0067536>] igbvf_clean_rx_ring+0x96/0x370 [igbvf] [ 68.687516] [<ffffffffa0067915>] igbvf_down+0x105/0x110 [igbvf] [ 68.694219] [<ffffffffa0067beb>] igbvf_change_mtu+0x16b/0x180 [igbvf] [...] Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jia-Ju Bai authored
In error handling code of igb_probe, the memory adapter->shadow_vfta allocated by kcalloc in igb_sw_init is not freed. So when register_netdev or igb_init_i2c is failed, a memory leak will occur. This patch adds kfree to fix it. Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jia-Ju Bai authored
When e1000e_setup_rx_resources is failed in e1000_open, e1000e_free_tx_resources in "err_setup_rx" segment is executed. "writel(0, tx_ring->head)" statement in e1000_clean_tx_ring in e1000e_free_tx_resources will cause a null poonter dereference(crash), because "tx_ring->head" is only assigned in e1000_configure_tx in e1000_configure, but it is after e1000e_setup_rx_resources. This patch moves head/tail register writing to e1000_configure_tx/rx, which can fix this problem. It is inspired by igb_configure_tx_ring in the igb driver. Specially, thank Alexander Duyck for his valuable suggestion. Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jia-Ju Bai authored
When igb_init_interrupt_scheme in igb_sriov_reinit is failed, the lock acquired by rtnl_lock() is not released, which causes a deadlock. This patch adds rtnl_unlock() in error handling to fix it. Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jia-Ju Bai authored
When pci_dma_mapping_error in e100_xmit_prepare is failed, the skb buffer allocated by netdev_alloc_skb_ip_align in e100_rx_alloc_skb is not released, which causes a possible resource leak. This patch adds error handling code to fix it. Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jia-Ju Bai authored
The driver lacks the check of nic->cbs_pool after pci_pool_create in e100_probe. When this function is failed, a null pointer dereference occurs when pci_pool_alloc uses nic->cbs_pool in e100_alloc_cbs. This patch adds a check and related error handling code to fix it. Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Alex Williamson authored
When the .remove() callback for a PF is called, SR-IOV support for the device is disabled, which requires unbinding and removing the VFs. The VFs may be in-use either by the host kernel or userspace, such as assigned to a VM through vfio-pci. In this latter case, the VFs may be removed either by shutting down the VM or hot-unplugging the devices from the VM. Unfortunately in the case of a Windows 2012 R2 guest, hot-unplug is broken due to the ordering of the PF driver teardown. Disabling SR-IOV prior to unregister_netdev() avoids this issue. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Todd Fujinaka authored
This patch adds support for Marvell PHY 1512 (required for I354). Submitted by: Maciej Szwed <maciej.szwed@intel.com> Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Richard Cochran authored
In addition to interrupt driven target time output events, the i210 also has two programmable clock outputs. These clocks support periods between 16 nanoseconds and 140 milliseconds. This patch implements the periodic output function using the clock outputs when possible, falling back to the target time for longer periods. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Stefan Assmann authored
During driver probing the following code path is triggered. igb_probe ->igb_sw_init ->igb_probe_vfs ->igb_pci_enable_sriov ->igb_sriov_reinit Doing the SR-IOV re-init is not necessary during probing since we're starting from scratch. Here we can call igb_enable_sriov() right away. Running igb_sriov_reinit() during igb_probe() also seems to cause occasional packet loss on some onboard 82576 NICs. Reproduced on Dell and HP servers with onboard 82576 NICs. Example: Intel Corporation 82576 Gigabit Network Connection [8086:10c9] (rev 01) Subsystem: Dell Device [1028:0481] Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Vasily Averin authored
Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Shota Suzuki authored
When initializing igb driver (e.g. 82576, I350), IGB_FLAG_QUEUE_PAIRS is set if adapter->rss_queues exceeds half of max_rss_queues in igb_init_queue_configuration(). On the other hand, IGB_FLAG_QUEUE_PAIRS is not set even if the number of queues exceeds half of max_combined in igb_set_channels() when changing the number of queues by "ethtool -L". In this case, if numvecs is larger than MAX_MSIX_ENTRIES (10), the size of adapter->msix_entries[], an overflow can occur in igb_set_interrupt_capability(), which in turn leads to an oops. Fix this problem as follows: - When changing the number of queues by "ethtool -L", set IGB_FLAG_QUEUE_PAIRS in the same way as initializing igb driver. - When increasing the size of q_vector, reallocate it appropriately. (With IGB_FLAG_QUEUE_PAIRS set, the size of q_vector gets larger.) Another possible way to fix this problem is to cap the queues at its initial number, which is the number of the initial online cpus. But this is not the optimal way because we cannot increase queues when another cpu becomes online. Note that before commit cd14ef54 ("igb: Change to use statically allocated array for MSIx entries"), this problem did not cause oops but just made the number of queues become 1 because of entering msi_only mode in igb_set_interrupt_capability(). Fixes: 907b7835 ("igb: Add ethtool support to configure number of channels") CC: stable <stable@vger.kernel.org> Signed-off-by: Shota Suzuki <suzuki_shota_t3@lab.ntt.co.jp> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
David S. Miller authored
Phil Sutter says: ==================== net: Convert drivers to IFF_NO_QUEUE and cleanup afterwards This series converts in-tree users away from the old and deprecated 'tx_queue_len = 0' idiom, adds a warning to notify out-of-tree driver maintainers that there is need for action on their behalf and finally drops any workarounds in scheduling algorithm implementations. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Phil Sutter authored
Those were all workarounds for the formerly double meaning of tx_queue_len, which broke scheduling algorithms if untreated. Now that all in-tree drivers have been converted away from setting tx_queue_len = 0, it should be safe to drop these workarounds for categorically broken setups. Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Phil Sutter authored
Due to the introduction of IFF_NO_QUEUE, there is a better way for drivers to indicate that no qdisc should be attached by default. Though, the old convention can't be dropped since ignoring that setting would break drivers still using it. Instead, add a warning so out-of-tree driver maintainers get a chance to adjust their code before we finally get rid of any special handling of tx_queue_len == 0. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Phil Sutter authored
Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: Johnny Kim <johnny.kim@atmel.com> Cc: Rachel Kim <rachel.kim@atmel.com> Cc: Dean Lee <dean.lee@atmel.com> Cc: Chris Park <chris.park@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Phil Sutter authored
Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Phil Sutter authored
Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: Arvid Brodin <arvid.brodin@alten.se> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Phil Sutter authored
Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: Marek Lindner <mareklindner@neomailbox.ch> Cc: Simon Wunderlich <sw@simonwunderlich.de> Cc: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Phil Sutter authored
Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Phil Sutter authored
Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: Jouni Malinen <j@w1.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Phil Sutter authored
Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Phil Sutter authored
Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Phil Sutter authored
Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Phil Sutter authored
Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Phil Sutter authored
Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Phil Sutter authored
Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Phil Sutter authored
Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Phil Sutter authored
Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Phil Sutter authored
Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Phil Sutter authored
Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Phil Sutter authored
Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Phil Sutter authored
Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Phil Sutter authored
Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Tom Herbert says: ==================== net: Identifier Locator Addressing - Part I This patch set provides rudimentary support for Identifier Locator Addressing or ILA. The basic concept of ILA is that we split an IPv6 address into a 64 bit locator and 64 bit identifier. The identifier is the identity of an entity in communication ("who"), and the locator expresses the location of the entity ("where"). Applications use externally visible address that contains the identifier. When a packet is actually sent, a translation is done that overwrites the first 64 bits of the address with a locator. The packet can then be forwarded over the network to the host where the addressed entity is located. At the receiver, the reverse translation is done so the that the application sees the original, untranslated address. Presumably an external control plane will provide identifier->locator mappings. v2: - Fix compilation erros when LWT not configured - Consolidate ILA into a single ila.c v3: - Change pseudohdr argument od inet_proto_csum_replace functions to be a bool v4: - In ila_build_state check locator being in netlink params before allocating tunnel state The data path for ILA is a simple NAT translation that only operates on the upper 64 bits of a destination address in IPv6 packets. The basic process is: 1) Lookup 64 bit identifier (lower 64 bits of destination) 2) If a match is found a) Overwrite locator (upper 64 bits of destination) with the new locator b) Adjust any checksum that has destination address included in pseudo header 3) Send or receive packet ILA is a means to implement tunnels or network virtualization without encapsulation. Since there is no encapsulation involved, we assume that stateless support in the network for IPv6 (e.g. RSS, ECMP, TSO, etc.) just works. Also, since we're minimally changing the packet many of the worries about encapsulation (MTU, checksum, fragmentation) are not relevant. The downside is that, ILA is not extensible like other encapsulations (GUE for instance) so it might not be appropriate for all use cases. Also, this only makes sense to do in IPv6! A key aspect of ILA is performance. The intent is that ILA would be used in data centers in virtualizing tasks or jobs. In the fullest incarnation all intra data center communications might be targeted to virtual ILA addresses. This is basically adding a new virtualization capability to the existing services in a datacenter, so there is a strong expectation is that this does not degrade performance for existing applications. Performance seems to be dependent on how ILA is hooked into kernel. ILA can be implemented under some different models: - Mechanically it is a form a stateless DNAT - It can be thought of as a type of (source) routing - As a functional replacement of encapsulation In this patch set we hook into the data path using Light Weight Tunnels (LWT) infrastructure. As part of that, we add support in LWT to redirect dst input. iproute will be modified to take a new ila encap type. ILA can be configured like: ip route add 3333:0:0:1:5555:0:2:0/128 \ encap ila 2001:0:0:2 via 2401:db00:20:911a:face:0:27:0 ip -6 addr add 3333:0:0:1:5555:0:1:0/128 dev eth0 ip route add table local local 2001:0:0:1:5555:0:1:0/128 encap ila 3333:0:0:1 dev lo So sending to destination 3333:0:0:1:5555:0:2:0 will have destination of 2001:0:0:2:5555:0:2:0 on the wire. Performance results are below. With ILA we see about a 10% drop in pps compared to non-ILA. Much of this drop can be attributed to the loss of early demux on input (translation occurs after it is attempted). We will address this in the next patch set. Also, IPvlan input path does not work with ILA since the routing is bypassed-- this will be addressed in a future patch. Performance testing: Performing netperf TCP_RR with 200 clients: Non-ILA baseline 84.92% CPU utilization 1861922.9 tps 93/163/330 50/90/99% latencies ILA single destination 83.16% CPU utilization 1679683.4 tps 105/180/332 50/90/99% latencies References: Slides from netconf: http://vger.kernel.org/netconf2015Herbert-ILA.pdf Slides from presentation at IETF: https://www.ietf.org/proceedings/92/slides/slides-92-nvo3-1.pdf I-D: https://tools.ietf.org/html/draft-herbert-nvo3-ila-00 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
Adding new module name ila. This implements ILA translation. Light weight tunnel redirection is used to perform the translation in the data path. This is configured by the "ip -6 route" command using the "encap ila <locator>" option, where <locator> is the value to set in destination locator of the packet. e.g. ip -6 route add 3333:0:0:1:5555:0:1:0/128 \ encap ila 2001:0:0:1 via 2401:db00:20:911a:face:0:25:0 Sets a route where 3333:0:0:1 will be overwritten by 2001:0:0:1 on output. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
This function updates a checksum field value and skb->csum based on a value which is the difference between the old and new checksum. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
inet_proto_csum_replace4,2,16 take a pseudohdr argument which indicates the checksum field carries a pseudo header. This argument should be a boolean instead of an int. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
This patch adds the capability to redirect dst input in the same way that dst output is redirected by LWT. Also, save the original dst.input and and dst.out when setting up lwtunnel redirection. These can be called by the client as a pass- through. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-