- 27 Jul, 2015 40 commits
-
-
David S. Miller authored
Amir Vadai says: ==================== net/mlx4_en: Hardware accelerated 802.1ad This patchset by Hadar introduces support in Hardware accelerated 802.1ad, for ConnectX-3pro NIC's. In order to support existing deployment, and due to some hardware limitations, the feature is disabled by default, and needed to be enabled using a private flag in ethtool. Ofcourse user can enable the private flag only if hardware has support. After being enabled, the standard ethtool -k/-K can be used. Patchset was applied and tested over commit 71790a27 ("hv_netvsc: Add structs and handlers for VF messages") ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hadar Hen Zion authored
To enable device support in accelerated 802.1ad vlan, the port capability "packet has vlan enable" (phv_en) should be set. Firmware won't work properly, in case phv_en is not set. The user can enable "phv_en" port capability with the new ethtool private flag phv-bit. The phv-bit private flag default value is OFF, users who are interested in 802.1ad hardware acceleration should turn ON the phv-bit private flag: $ ethtool --set-priv-flags eth1 phv-bit on Once the private flag is set, the device is ready for 802.1ad vlan acceleration. The user should also change the interface device features and turn on "tx-vlan-stag-hw-insert" which is off by default: $ ethtool -K eth1 tx-vlan-stag-hw-insert on "phv-bit" private flag setting is available only for Physical Functions(PF), the Virtual Function (VF) will be able to use the feature by setting "tx-vlan-stag-hw-insert" ethtool device feature only if the feature was enabled by the Hypervisor. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hadar Hen Zion authored
To add Hardware accelerated support in 802.1ad vlan, replace Current VLAN macros to CVLAN. Replace: MLX4_WQE_CTRL_INS_VLAN MLX4_CQE_VLAN_PRESENT_MASK With: MLX4_WQE_CTRL_INS_CVLAN MLX4_CQE_CVLAN_PRESENT_MASK Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hadar Hen Zion authored
Currently we support only one ethtool private flag. Prepare mlx4_en_set_priv_flags function to support more than one private flag. Will be used in the next patch to support hardware accelerated 802.1ad vlan. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hadar Hen Zion authored
mlx4_core preparation to support hardware accelerated 802.1ad VLAN device. To allow 802.1ad accelerated device, "packet has vlan" (phv) Firmware capability should be available. Firmware without the phv capability won't behave properly and can't support 802.1ad device acceleration. The driver checks the Firmware capability and sets the phv bit accordingly in SET_PORT command. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Nicolas Schichan says: ==================== ARM BPF JIT features This series adds support for more instructions to the ARM BPF JIT namely skb netdevice type retrieval, skb payload offset retrieval, and skb packet type retrieval. This allows 35 tests to use the JIT instead of 29 before. This series depends on the "BPF JIT fixes for ARM" serie sent earlier. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nicolas Schichan authored
Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nicolas Schichan authored
Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nicolas Schichan authored
Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dan Carpenter authored
kfree_skb() is correct here. Fixes: ffce4196 ('lwtunnel: support dst output redirect function') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
While doing experiments with reordering resilience, we found linux senders were not able to send at full speed under reordering, because every incoming SACK was releasing one MSS. This patch removes the limitation, as we did for CWR state in commit a0ea700e ("tcp: tso: allow CA_CWR state in tcp_tso_should_defer()") Neal Cardwell had a concern about limited transmit so Yuchung conducted experiments on GFE and found nothing worth adding an extra check on fast path : if (icsk->icsk_ca_state == TCP_CA_Disorder && tcp_sk(sk)->reordering == sysctl_tcp_reordering) goto send_now; Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sergei Shtylyov authored
Renesas Ethernet AVB controller requires that all data are aligned on 4-byte boundary. While it's easily achievable for the RX data with the help of skb_reserve() (we even align on 128-byte boundary as recommended by the manual), we can't do the same with the TX data, and it always comes unaligned from the networking core. Originally we solved it an easy way, copying all packet to a preallocated aligned buffer; however, it's enough to copy only up to 3 first bytes from each packet, doing the transfer using 2 TX descriptors instead of just 1. Here's an implementation of the new TX algorithm that significantly reduces the driver's memory requirements. Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Guenter Roeck authored
Move the temperature sensing code for mv88e6352 and mv88e6320 families into mv88e6xxx.c to simplify adding support for additional chips. With this change, mv88e6xxx_6320_family() no longer needs to be a global function and is made static. Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Haiyang Zhang authored
This patch adds data structures and handlers for messages related to SRIOV Virtual Function. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Martin KaFai Lau says: ==================== ipv6: Avoid rt6_probe() taking writer lock in the fast path v1 -> v2: 1. Separate the code re-arrangement into another patch 2. Fix style ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Martin KaFai Lau authored
The patch checks neigh->nud_state before acquiring the writer lock. Note that rt6_probe() is only used in CONFIG_IPV6_ROUTER_PREF. 40 udpflood processes and a /64 gateway route are used. The gateway has NUD_PERMANENT. Each of them is run for 30s. At the end, the total number of finished sendto(): Before: 55M After: 95M Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> CC: Julian Anastasov <ja@ssi.bg> CC: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Martin KaFai Lau authored
It is a prep work for the next patch to remove write_lock from rt6_probe(). 1. Reduce the number of if(neigh) check. From 4 to 1. 2. Bring the write_(un)lock() closer to the operations that the lock is protecting. Hopefully, the above make rt6_probe() more readable. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Julian Anastasov <ja@ssi.bg> Cc: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
num_grat_arp wasn't converted to the new bonding option API, so do this now and remove the specific sysfs store option in order to use the standard one. num_grat_arp is the same as num_unsol_na so add it as an alias with the same option settings. An important difference is the option name which is matched in bond_sysfs_store_option(). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shaohui Xie authored
When using fiber port, the phy cannot report it's auto negotiation state, driver should always report auto negotiation is done when using fiber port. Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nicolas Dichtel authored
It saves some lines and simplify a bit the code when the state is returning by this function. It's also useful to handle a NULL entry. To avoid too long lines, I've also renamed lwtunnel_state_get() and lwtunnel_state_put() to lwtstate_get() and lwtstate_put(). CC: Thomas Graf <tgraf@suug.ch> CC: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nicolas Dichtel authored
We need to copy this field (ip6_rt_cache_alloc() and ip6_rt_pcpu_alloc() use ip6_rt_copy_init() to build a dst). CC: Thomas Graf <tgraf@suug.ch> CC: Roopa Prabhu <roopa@cumulusnetworks.com> Fixes: 19e42e45 ("ipv6: support for fib route lwtunnel encap attributes") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nicolas Dichtel authored
This function make sense only when LWTUNNEL_STATE_OUTPUT_REDIRECT is set. The check is already done in IPv4. CC: Thomas Graf <tgraf@suug.ch> CC: Roopa Prabhu <roopa@cumulusnetworks.com> Fixes: 74a0f2fe ("ipv6: rt6_info output redirect to tunnel output") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wu Fengguang authored
drivers/net/phy/dp83867.c:126:1-4: WARNING: end returns can be simpified drivers/net/phy/dp83867.c:74:5-8: WARNING: end returns can be simpified if tested value is negative or 0 Simplify a trivial if-return sequence. Possibly combine with a preceding function call. Generated by: scripts/coccinelle/misc/simple_return.cocci CC: Dan Murphy <dmurphy@ti.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
subashab@codeaurora.org authored
Fix the following typo - unchainged -> unchanged Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Gartrell authored
mov %rsp, %r1 ; r1 = rsp add $-8, %r1 ; r1 = rsp - 8 store_q $123, -8(%rsp) ; *(u64*)r1 = 123 <- valid store_q $123, (%r1) ; *(u64*)r1 = 123 <- previously invalid mov $0, %r0 exit ; Always need to exit And we'd get the following error: 0: (bf) r1 = r10 1: (07) r1 += -8 2: (7a) *(u64 *)(r10 -8) = 999 3: (7a) *(u64 *)(r1 +0) = 999 R1 invalid mem access 'fp' Unable to load program We already know that a register is a stack address and the appropriate offset, so we should be able to validate those references as well. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Amir Vadai says: ==================== ConnectX-4 driver update 2015-07-23 This patchset introduce some performance enhancements to the ConnectX-4 driver. 1. Improving RSS distribution, and make RSS function controlable using ethtool. 2. Make memory that is written by NIC and read by host CPU allocate in the local NUMA to the processing CPU 3. Support tx copybreak 4. Using hardware feature called blueflame to save DMA reads when possible Another patch by Achiad fix some cosmetic issues in the driver. Patchset was applied and tested on top of commit 045a0fa0 ("ip_tunnel: Call ip_tunnel_core_init() from inet_init()") ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Achiad Shochat authored
In addition to the source/destination IP which are already hashed. Only for unicast traffic for now. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Achiad Shochat authored
No logical change in this commit. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Achiad Shochat authored
A regular TX WQE execution involves two or more DMA reads - one to fetch the WQE, and another one per WQE gather entry. These DMA reads obviously increase the TX latency. There are two mlx5 mechanisms to bypass these DMA reads: 1) Inline WQE 2) Blue Flame (BF) An inline WQE contains a whole packet, thus saves the DMA read/s of the regular WQE gather entry/s. Inline WQE support was already added in the previous commit. A BF WQE is written directly to the device I/O mapped memory, thus enables saving the DMA read that fetches the WQE. The BF WQE I/O write must be in cache line granularity, thus uses the CPU write combining mechanism. A BF WQE I/O write acts also as a TX doorbell for notifying the device of new TX WQEs. A BF WQE is written to the same I/O mapped address as the regular TX doorbell, thus this address is being mapped twice - once by ioremap() and once by io_mapping_map_wc(). While both mechanisms reduce the TX latency, they both consume more CPU cycles than a regular WQE: - A BF WQE must still be written to host memory, in addition to being written directly to the device I/O mapped memory. - An inline WQE involves copying the SKB data into it. To handle this tradeoff, we introduce here a heuristic algorithm that strives to avoid using these two mechanisms in case the TX queue is being back-pressured by the device, and limit their usage rate otherwise. An inline WQE will always be "Blue Flamed" (written directly to the device I/O mapped memory) while a BF WQE may not be inlined (may contain gather entries). Preliminary testing using netperf UDP_RR shows that the latency goes down from 17.5us to 16.9us, while the message rate (tested with pktgen) stays the same. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Achiad Shochat authored
AKA inline WQE. A TX latency optimization to save data gather DMA reads. Controlled by ETHTOOL_TX_COPYBREAK. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Saeed Mahameed authored
By affinity hints and XPS, each mlx5e channel is assigned a CPU core. Channel DMA coherent memory that is written by the NIC and read by SW (e.g CQ buffer) is allocated on the NUMA node of the CPU core assigned for the channel. Channel DMA coherent memory that is written by SW and read by the NIC (e.g SQ/RQ buffer) is allocated on the NUMA node of the NIC. Doorbell record (written by SW and read by the NIC) is an exception since it is accessed by SW more frequently. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Saeed Mahameed authored
The ConnectX-4 HW implements inverted XOR8. To make it act as XOR we re-order the HW RSS indirection table. Set XOR to be the default RSS hash function and add ethtool API to control it. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
WingMan Kwok says: ==================== net: netcp: Bug fixes of CPSW statistics collection This patch set contains bug fixes and enhencements of hw ethernet statistics processing on TI's Keystone2 CPSW ethernet switches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
WingMan Kwok authored
This patch adds the missing statistics for the host and slave ports of the CPSW on K2L and K2E platforms. Signed-off-by: WingMan Kwok <w-kwok2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
WingMan Kwok authored
In certain applications it's beneficial to allow the CPSW h/w stats counters to continue to increment even while the kernel polls them. This patch implements this behavior for both 1G and 10G ethernet subsystem modules. Signed-off-by: WingMan Kwok <w-kwok2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
WingMan Kwok authored
Different Keystone2 platforms have different number and layouts of hw statistics modules. This patch consolidates the statistics processing of different Keystone2 platforms for easy maintenance. Signed-off-by: WingMan Kwok <w-kwok2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
WingMan Kwok authored
The CPSW driver keeps internally some, but not all, of the statistics available in the hw statistics modules. Furthermore, some of the locations in the hw statistics modules are reserved and contain no useful information. Prior to this patch, the driver allocates memory of the size of the the whole hw statistics modules, instead of the size of statistics-entries-interested-in (i.e. et_stats), for internal storage. This patch fixes that. Signed-off-by: WingMan Kwok <w-kwok2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
WingMan Kwok authored
This patch fixes error in the setting of the hw statistics module base for K2HK platform. In K2HK although there are 4 hw statistics modules, but only 2 are visible at a time. Thus when setting up the pointers to the base of the corresponding hw statistics modules, modules 0 and 2 should point to one base, while modules 1 and 3 should point to the other. Signed-off-by: WingMan Kwok <w-kwok2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
WingMan Kwok authored
This patch fixes a bug in which the timer routine synchronized against the ethtool-triggered statistics updates with spin_lock_bh(). A timer function is itself a bottom-half, so this should be spin_lock(). Signed-off-by: WingMan Kwok <w-kwok2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hariprasad Shenai authored
VF driver was reading incorrect freelist congestion notification threshold for FLM queues when packing is enabled for T5 and T6 adapter. Fixing it now. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-