- 19 Sep, 2016 34 commits
-
-
David S. Miller authored
Florian Westphal says: ==================== sched: convert queues to single-linked list During Netfilter Workshop 2016 Eric Dumazet pointed out that qdisc schedulers use doubly-linked lists, even though single-linked list would be enough. The double-linked skb lists incur one extra write on enqueue/dequeue operations (to change ->prev pointer of next list elem). This series converts qdiscs to single-linked version, listhead maintains pointers to first (for dequeue) and last skb (for enqueue). Most qdiscs don't queue at all and instead use a leaf qdisc (typically pfifo_fast) so only a few schedulers needed changes. I briefly tested netem and htb and they seemed fine. UDP_STREAM netperf with 64 byte packets via veth+pfifo_fast shows a small (~2%) improvement. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Westphal authored
This change replaces sk_buff_head struct in Qdiscs with new qdisc_skb_head. Its similar to the skb_buff_head api, but does not use skb->prev pointers. Qdiscs will commonly enqueue at the tail of a list and dequeue at head. While skb_buff_head works fine for this, enqueue/dequeue needs to also adjust the prev pointer of next element. The ->prev pointer is not required for qdiscs so we can just leave it undefined and avoid one cacheline write access for en/dequeue. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Westphal authored
After previous patch these functions are identical. Replace __skb_dequeue in qdiscs with __qdisc_dequeue_head. Next patch will then make __qdisc_dequeue_head handle single-linked list instead of strcut sk_buff_head argument. Doesn't change generated code. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Westphal authored
Moves qdisc stat accouting to qdisc_dequeue_head. The only direct caller of the __qdisc_dequeue_head version open-codes this now. This allows us to later use __qdisc_dequeue_head as a replacement of __skb_dequeue() (which operates on sk_buff_head list). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Westphal authored
A followup change will replace the sk_buff_head in the qdisc struct with a slightly different list. Use of the sk_buff_head helpers will thus cause compiler warnings. Open-code these accesses in an extra change to ease review. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Westphal authored
Doesn't change generated code. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Yongjun authored
Fix the retrn value check which testing the wrong variable in cfg_queues_uld(). Fixes: 94cdb8bb ("cxgb4: Add support for dynamic allocation of resources for ULD") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Nelson Chang says: ==================== net: ethernet: mediatek: add HW LRO functions The series add the large receive offload (LRO) functions by hardware and the ethtool functions to configure RX flows of HW LRO. changes since v3: - Respin the patch by the newer driver - Move the dts description of hwlro to optional properties changes since v2: - Add ndo_fix_features to prevent NETIF_F_LRO off while RX flow is programmed - Rephrase the dts property is a capability if the hardware supports LRO changes since v1: - Add HW LRO support - Add ethtool hooks to set LRO RX flows ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nelson Chang authored
Add the dts property for the capability if the hardware supports LRO. Signed-off-by: Nelson Chang <nelson.chang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nelson Chang authored
The codes add ethtool functions to set RX flows for HW LRO. Because the HW LRO hardware can only recognize the destination IP of TCP/IP RX flows, the ethtool command to add HW LRO flow is as below: ethtool -N [devname] flow-type tcp4 dst-ip [ip_addr] loc [0~1] Otherwise, cause the hardware can set total four destination IPs, each GMAC (GMAC1/GMAC2) can set two IPs separately at most. Signed-off-by: Nelson Chang <nelson.chang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nelson Chang authored
The codes add the large receive offload (LRO) functions by hardware as below: 1) PDMA has total four RX rings that one is the normal ring, and others can be configured as LRO rings. 2) Only TCP/IP RX flows can be offloaded. The hardware can set four IP addresses at most, if the destination IP of the RX flow matches one of them, it has the chance to be offloaded. 3) There three RX flows can be offloaded at most, and one flow is mapped to one RX ring. 4) If there are more than three candidate RX flows, the hardware can choose three of them by throughput comparison results. Signed-off-by: Nelson Chang <nelson.chang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hariprasad Shenai authored
Allocate resources dynamically to cxgb4's Upper layer driver's(ULD) like cxgbit, iw_cxgb4 and cxgb4i. Allocate resources when they register with cxgb4 driver and free them while unregistering. All the queues and the interrupts for them will be allocated during ULD probe only and freed during remove. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christophe Jaillet authored
In commit 311b2177 ("sctp: simplify sk_receive_queue locking"), a call to 'skb_queue_splice_tail_init()' has been made explicit. Previously it was hidden in 'sctp_skb_list_tail()' Now, the code around it looks redundant. The '_init()' part of 'skb_queue_splice_tail_init()' should already do the same. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jesper Dangaard Brouer authored
The XDP_TX action can fail transmitting the frame in case the TX ring is full or port is down. In case of TX failure it should drop the frame, and not as now call 'break' which is the same as XDP_PASS. Fixes: 9ecc2d86 ("net/mlx4_en: add xdp forwarding and data write support") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Mahesh Bandewar says: ==================== IPvlan introduce l3s mode Same old problem with new approach especially from suggestions from earlier patch-series. First thing is that this is introduced as a new mode rather than modifying the old (L3) mode. So the behavior of the existing modes is preserved as it is and the new L3s mode obeys iptables so that intended conn-tracking can work. To do this, the code uses newly added l3mdev_rcv() handler and an Iptables hook. l3mdev_rcv() to perform an inbound route lookup with the correct (IPvlan slave) interface and then IPtable-hook at LOCAL_INPUT to change the input device from master to the slave to complete the formality. Supporting stack changes are trivial changes to export symbol to get IPv4 equivalent code exported for IPv6 and to allow netfilter hook registration code to allow caller to hold RTNL. Please look into individual patches for details. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Mahesh Bandewar authored
In a typical IPvlan L3 setup where master is in default-ns and each slave is into different (slave) ns. In this setup egress packet processing for traffic originating from slave-ns will hit all NF_HOOKs in slave-ns as well as default-ns. However same is not true for ingress processing. All these NF_HOOKs are hit only in the slave-ns skipping them in the default-ns. IPvlan in L3 mode is restrictive and if admins want to deploy iptables rules in default-ns, this asymmetric data path makes it impossible to do so. This patch makes use of the l3_rcv() (added as part of l3mdev enhancements) to perform input route lookup on RX packets without changing the skb->dev and then uses nf_hook at NF_INET_LOCAL_IN to change the skb->dev just before handing over skb to L4. Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: David Ahern <dsa@cumulusnetworks.com> Reviewed-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Mahesh Bandewar authored
Add _nf_register_hooks() and _nf_unregister_hooks() calls which allow caller to hold RTNL mutex. Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Mahesh Bandewar authored
Make ip6_route_input_lookup available outside of ipv6 the module similar to ip_route_input_noref in the IPv4 world. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jiri Pirko says: ==================== net: return offloaded stats as default and expose original sw stats The problem we try to handle is about offloaded forwarded packets which are not seen by kernel. Let me try to draw it: port1 port2 (HW stats are counted here) \ / \ / \ / --(A)---- ASIC --(B)-- | (C) | CPU (SW stats are counted here) Now we have couple of flows for TX and RX (direction does not matter here): 1) port1->A->ASIC->C->CPU For this flow, HW and SW stats are equal. 2) port1->A->ASIC->C->CPU->C->ASIC->B->port2 For this flow, HW and SW stats are equal. 3) port1->A->ASIC->B->port2 For this flow, SW stats are 0. The purpose of this patchset is to provide facility for user to find out the difference between flows 1+2 and 3. In other words, user will be able to see the statistics for the slow-path (through kernel). Also note that HW stats are what someone calls "accumulated" stats. Every packet counted by SW is also counted by HW. Not the other way around. As a default the accumulated stats (HW) will be exposed to user so the userspace apps can react properly. This patchset add the SW stats (flows 1+2) under offload related stats, so in the future we can expose other offload related stat in a similar way. --- v9->v10: - patch 2/3 - removed unnecessary ()s as pointed out by Nik v8->v9: - patch 2/3 - add using of idxattr and prividx v7->v8: - patch 2/3 - move helping const from uapi to rtnetlink - cancel driver xstat nesting if it is empty v6->v7: - patch 1/3: - ndo interface changed to get the wanted stats type as an input. - change commit message. - patch 2/3: - create a nesting for offloaded stat and put SW stats under it. - change the ndo call to indicate which offload stats we wants. - change commit message. - patch 3/3: - change ndo implementation to match the changes in the previous patches. - change commit message. v5->v6: - patch 2/4 was dropped as requested by Roopa - patch 1/3: - comment changed to indicate that default stats are combined stats - commit massage changed - patch 2/3: (previously 3/4) - SW stats return nothing if there is no SW stats ndo v4->v5: - updated cover letter - patch3/4: - using memcpy directly to copy stats as requested by DaveM v3->v4: - patch1/4: - fixed "return ()" pointed out by EricD - patch2/4: - fixed if_nlmsg_size as pointed out by EricD v2->v3: - patch1/4: - added dev_have_sw_stats helper - patch2/4: - avoided memcpy as requested by DaveM - patch3/4: - use new dev_have_sw_stats helper v1->v2: - patch3/4: - fixed NULL initialization ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nogah Frankel authored
Change the default statistics ndo to return HW statistics (like the one returned by ethtool_ops). The HW stats are collected to a cache by delayed work every 1 sec. Implement the offload stat ndo. Add a function to get SW statistics, to be called from this function. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nogah Frankel authored
Add a nested attribute of offload stats to if_stats_msg named IFLA_STATS_LINK_OFFLOAD_XSTATS. Under it, add SW stats, meaning stats only per packets that went via slowpath to the cpu, named IFLA_OFFLOAD_XSTATS_CPU_HIT. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nogah Frankel authored
Add a new ndo to return statistics for offloaded operation. Since there can be many different offloaded operation with many stats types, the ndo gets an attribute id by which it knows which stats are wanted. The ndo also gets a void pointer to be cast according to the attribute id. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Merge tag 'mac80211-next-for-davem-2016-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== This time we have various things - all across the board: * MU-MIMO sniffer support in mac80211 * a create_singlethread_workqueue() cleanup * interface dump filtering that was documented but not implemented * support for the new radiotap timestamp field * send delBA in two unexpected conditions (as required by the spec) * connect keys cleanups - allow only WEP with index 0-3 * per-station aggregation limit to work around broken APs * debugfs improvement for the integrated codel algorithm and various other small improvements and cleanups. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
A couple of dev_err messages span two lines and the literal string is missing a white space between words. Add the white space and join the two lines into one. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: FLorian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
When fq is used on 32bit kernels, we need to lock the qdisc before copying 64bit fields. Otherwise "tc -s qdisc ..." might report bogus values. Fixes: afe4fd06 ("pkt_sched: fq: Fair Queue packet scheduler") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thadeu Lima de Souza Cascardo authored
Instead of using flow stats per NUMA node, use it per CPU. When using megaflows, the stats lock can be a bottleneck in scalability. On a E5-2690 12-core system, usual throughput went from ~4Mpps to ~15Mpps when forwarding between two 40GbE ports with a single flow configured on the datapath. This has been tested on a system with possible CPUs 0-7,16-23. After module removal, there were no corruption on the slab cache. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Cc: pravin shelar <pshelar@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Thadeu Lima de Souza Cascardo authored
On a system with only node 1 as possible, all statistics is going to be accounted on node 0 as it will have a single writer. However, when getting and clearing the statistics, node 0 is not going to be considered, as it's not a possible node. Tested that statistics are not zero on a system with only node 1 possible. Also compile-tested with CONFIG_NUMA off. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Xin Long says: ==================== sctp: fix the transmit err process This patchset is to improve the transmit err process and also fix some issues. After this patchset, once the chunks are enqueued successfully, even if the chunks fail to send out, no matter because of nodst or nomem, no err retruns back to users any more. Instead, they are taken care of by retransmit. v1->v2: - add more details to the changelog in patch 1/6 - add Fixes: tag in patch 2/6, 3/6 - also revert 69b5777f in patch 3/6 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
As David and Marcelo's suggestion, ENOMEM err shouldn't return back to user in transmit path. Instead, sctp's retransmit would take care of the chunks that fail to send because of ENOMEM. This patch is only to do some release job when alloc_skb fails, not to return ENOMEM back any more. Besides, it also cleans up sctp_packet_transmit's err path, and fixes some issues in err path: - It didn't free the head skb in nomem: path. - No need to check nskb in no_route: path. - It should goto err: path if alloc_skb fails for head. - Not all the NOMEMs should free nskb. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
sctp_outq_flush return value is meaningless now, this patch is to make sctp_outq_flush return void, as well as sctp_outq_fail and sctp_outq_uncork. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
Every time when sctp calls sctp_outq_flush, it sends out the chunks of control queue, retransmit queue and data queue. Even if some trunks are failed to transmit, it still has to flush all the transports, as it's the only chance to clean that transmit_list. So the latest transmit error here should be returned back. This transmit error is an internal error of sctp stack. I checked all the places where it uses the transmit error (the return value of sctp_outq_flush), most of them are actually just save it to sk_err. Except for sctp_assoc/endpoint_bh_rcv, they will drop the chunk if it's failed to send a REPLY, which is actually incorrect, as we can't be sure the error that sctp_outq_flush returns is from sending that REPLY. So it's meaningless for sctp_outq_flush to return error back. This patch is to save transmit error to sk_err in sctp_outq_flush, the new error can update the old value. Eventually, sctp_wait_for_* would check for it. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
Last patch "sctp: do not return the transmit err back to sctp_sendmsg" made sctp_primitive_SEND return err only when asoc state is unavailable. In this case, chunks are not enqueued, they have no chance to be freed if we don't take care of them later. This Patch is actually to revert commit 1cd4d5c4 ("sctp: remove the unused sctp_datamsg_free()"), commit 69b5777f ("sctp: hold the chunks only after the chunk is enqueued in outq") and commit 8b570dc9 ("sctp: only drop the reference on the datamsg after sending a msg"), to use sctp_datamsg_free to free the chunks of current msg. Fixes: 8b570dc9 ("sctp: only drop the reference on the datamsg after sending a msg") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
Once a chunk is enqueued successfully, sctp queues can take care of it. Even if it is failed to transmit (like because of nomem), it should be put into retransmit queue. If sctp report this error to users, it confuses them, they may resend that msg, but actually in kernel sctp stack is in charge of retransmit it already. Besides, this error probably is not from the failure of transmitting current msg, but transmitting or retransmitting another msg's chunks, as sctp_outq_flush just tries to send out all transports' chunks. This patch is to make sctp_cmd_send_msg return avoid, and not return the transmit err back to sctp_sendmsg Fixes: 8b570dc9 ("sctp: only drop the reference on the datamsg after sending a msg") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
Data Chunks are only sent by sctp_primitive_SEND, in which sctp checks the asoc's state through statetable before calling sctp_outq_tail. So there's no need to check the asoc's state again in sctp_outq_tail. Besides, sctp_do_sm is protected by lock_sock, even if sending msg is interrupted by timer events, the event's processes still need to acquire lock_sock first. It means no others CMDs can be enqueue into side effect list before CMD_SEND_MSG to change asoc->state, so it's safe to remove it. This patch is to remove redundant asoc->state check from sctp_outq_tail. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 17 Sep, 2016 6 commits
-
-
David S. Miller authored
Alexei Starovoitov says: ==================== ip_tunnel: add collect_md mode to IPv4/IPv6 tunnels Similar to geneve, vxlan, gre tunnels implement 'collect metadata' mode in ipip, ipip6, ip6ip6 tunnels. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexei Starovoitov authored
the test creates 3 namespaces with veth connected via bridge. First two namespaces simulate two different hosts with the same IPv4 and IPv6 addresses configured on the tunnel interface and they communicate with outside world via standard tunnels. Third namespace creates collect_md tunnel that is driven by BPF program which selects different remote host (either first or second namespace) based on tcp dest port number while tcp dst ip is the same. This scenario is rough approximation of load balancer use case. The tests check both traditional tunnel configuration and collect_md mode. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexei Starovoitov authored
extend existing tests for vxlan, geneve, gre to include IPIP tunnel. It tests both traditional tunnel configuration and dynamic via bpf helpers. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexei Starovoitov authored
Similar to gre, vxlan, geneve tunnels allow IPIP6 and IP6IP6 tunnels to operate in 'collect metadata' mode. Unlike ipv4 code here it's possible to reuse ip6_tnl_xmit() function for both collect_md and traditional tunnels. bpf_skb_[gs]et_tunnel_key() helpers and ovs (in the future) are the users. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexei Starovoitov authored
Similar to gre, vxlan, geneve tunnels allow IPIP tunnels to operate in 'collect metadata' mode. bpf_skb_[gs]et_tunnel_key() helpers can make use of it right away. ovs can use it as well in the future (once appropriate ovs-vport abstractions and user apis are added). Note that just like in other tunnels we cannot cache the dst, since tunnel_info metadata can be different for every packet. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julia Lawall authored
Check for net_device_ops structures that are only stored in the netdev_ops field of a net_device structure. This field is declared const, so net_device_ops structures that have this property can be declared as const also. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r disable optional_qualifier@ identifier i; position p; @@ static struct net_device_ops i@p = { ... }; @ok@ identifier r.i; struct net_device e; position p; @@ e.netdev_ops = &i@p; @bad@ position p != {r.p,ok.p}; identifier r.i; struct net_device_ops e; @@ e@i@p @depends on !bad disable optional_qualifier@ identifier r.i; @@ static +const struct net_device_ops i = { ... }; // </smpl> The result of size on this file before the change is: text data bss dec hex filename 3401 931 44 4376 1118 net/l2tp/l2tp_eth.o and after the change it is: text data bss dec hex filename 3993 347 44 4384 1120 net/l2tp/l2tp_eth.o Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
-