- 13 Sep, 2014 25 commits
-
-
David S. Miller authored
John Fastabend says:
====================
net/sched rcu classifiers and tcf
This series converts the tcf_proto usage to RCU. This requires updating each classifier individually to handle the new copy/update requirement and also updating the core list traversals. This assumes that updates to the tables are infrequent in comparison to the packets per second being classified. On a 10Gbps link running near line rate we can easily produce 12+ million packets per second, so IMO this is a reasonable assumption. The updates are serialized by RTNL.
I have done some basic testing on this series and do not see any immediate splats or issues. The patch series has been running on my dev systems for a month or so now and I've not seen any issues, although my configurations are not overly complicated. My test cases at this point cover all the filters with a tight loop to add/remove filters, some basic estimator tests where I add an estimator to the qdisc and verify the statistics are accurate using pktgen, and finally a small script to exercise the 'tc actions' interface. Feel free to send me more tests off list and I can run them.
This is prep work to drop the qdisc lock, with the first target being the ingress qdisc. Still to be done: making the tc actions RCU safe and the statistics per cpu. These patches are in the works.
Comments:
- Checkpatch is still giving errors on some >80 char lines; I know about this. IMO the way to fix this is to restructure the sched code to avoid being so heavily indented. But doing this here bloats the patchset, and anyway there are already lots of >80 char lines in these files. I would prefer to keep the patches as is, but let me know if others think I should fix these and I will. A follow up patch set could restructure the code and fix this throughout the code blocks.
====================
Signed-off-by:
David S. Miller <davem@davemloft.net>
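As a rough check on the "12+ million packets per second" figure in the cover letter above, the line-rate packet count can be worked out from the Ethernet framing overhead. The helper below is a minimal sketch; `line_rate_pps` and `ETH_WIRE_OVERHEAD_BYTES` are hypothetical names used only for illustration and do not come from the series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes each frame occupies on the wire in addition to the frame itself:
 * 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
 */
#define ETH_WIRE_OVERHEAD_BYTES 20u

/*
 * Hypothetical helper: upper bound on frames per second for a link of
 * link_bps bits per second carrying frames of frame_bytes (FCS included).
 * Integer division rounds down.
 */
static uint64_t line_rate_pps(uint64_t link_bps, uint32_t frame_bytes)
{
	uint64_t bits_per_frame =
		(uint64_t)(frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8u;

	return link_bps / bits_per_frame;
}
```

With minimum-size 64-byte frames, `line_rate_pps(10000000000ULL, 64)` gives 14880952, so a classifier on a saturated 10Gbps link can see roughly 14.9 million lookups per second, which fits the assumption that table updates are rare by comparison.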
-
John Fastabend authored
This patch makes the cls_bpf classifier RCU safe. The tcf_lock was being used to protect a list of cls_bpf_prog; now this list is RCU safe and updates occur with rcu_replace. Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
Make cls_u32 classifier safe to run without holding lock. This patch converts statistics that are kept in read section u32_classify into per cpu counters.
This patch was tested with a tight u32 filter add/delete loop while generating traffic with pktgen. By running pktgen on vlan devices created on top of a physical device we can hit the qdisc layer correctly. For ingress qdiscs a loopback cable was used.
for i in {1..100}; do
        q=`echo $i%8|bc`;
        echo -n "u32 tos: iteration $i on queue $q";
        tc filter add dev p3p2 parent $p prio $i u32 match ip tos 0x10 0xff \
                  action skbedit queue_mapping $q;
        sleep 1;
        tc filter del dev p3p2 prio $i;
        echo -n "u32 tos hash table: iteration $i on queue $q";
        tc filter add dev p3p2 parent $p protocol ip prio $i handle 628: u32 divisor 1
        tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
                match ip protocol 17 0xff link 628: offset at 0 mask 0xf00 shift 6 plus 0
        tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
                ht 628:0 match ip tos 0x10 0xff action skbedit queue_mapping $q
        sleep 2;
        tc filter del dev p3p2 prio $i
        sleep 1;
done
Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
This uses per cpu counters in cls_u32 in preparation to convert over to rcu. Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
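The per cpu counter idea in the commit above can be illustrated outside the kernel. This is a userspace sketch only: the kernel uses its own per-cpu allocation and accessors, and `NR_SLOTS`, `hit_counter`, `count_hit` and `total_hits` are hypothetical names. The "CPU" is simulated by an index passed in by the caller.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SLOTS 4	/* stand-in for the number of CPUs (assumption) */

/* One counter per simulated CPU, padded out to 64 bytes per slot. */
struct hit_counter {
	uint64_t hits;
	char pad[64 - sizeof(uint64_t)];
};

static struct hit_counter filter_hits[NR_SLOTS];

/* Fast path: each CPU bumps only its own slot, so no lock is taken. */
static void count_hit(unsigned int cpu)
{
	filter_hits[cpu % NR_SLOTS].hits++;
}

/* Slow path (for example a statistics dump): sum every slot. */
static uint64_t total_hits(void)
{
	uint64_t sum = 0;

	for (unsigned int i = 0; i < NR_SLOTS; i++)
		sum += filter_hits[i].hits;
	return sum;
}
```

The design trade-off is that the classify path never contends on a shared counter, while a reader of the statistics pays the cost of walking all slots.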
-
John Fastabend authored
Make cls_tcindex RCU safe. This patch adds a new RCU routine rcu_dereference_bh_rtnl() to check that the caller holds either the rcu read lock or RTNL. This is needed to handle the case where tcindex_lookup() is being called in both cases. Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
RCUify the route classifier. For now, however, spinlocks are used to protect the fastmap cache. The issue here is that the fastmap may be read by one CPU while the cache is being updated by another. An array of pointers could be one possible solution. Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
RCU'ify fw classifier. Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
Make cgroup classifier safe for RCU. Also drops the calls in the classify routine that were doing a rcu_read_lock()/rcu_read_unlock(). If the rcu_read_lock() isn't held entering this routine we have issues with deleting the classifier chain so remove the unnecessary rcu_read_lock()/rcu_read_unlock() pair noting all paths AFAIK hold rcu_read_lock. If there is a case where classify is called without the rcu read lock then an rcu splat will occur and we can correct it. Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
Enable basic classifier for RCU. Dereferencing tp->root may look a bit strange here but it is needed by my accounting because it is allocated at init time and needs to be kfree'd at destroy time. However because it may be referenced in the classify() path we must wait an RCU grace period before free'ing it. We use kfree_rcu() and rcu_ APIs to enforce this. This pattern is used in all the classifiers. Also the hgenerator can be incremented without concern because it is always incremented under RTNL. Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
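The copy/update pattern described in the commit above (readers dereference a published pointer, updaters build a new copy and swap it in, and the old copy is freed only after readers are done) can be sketched in userspace with C11 atomics. This is not the kernel code: `filter_head`, `classify_count` and `add_filter` are hypothetical, the acquire/release pair stands in for `rcu_dereference()` and `rcu_assign_pointer()`, and the plain `free()` is only safe here because the sketch assumes no concurrent readers, where the kernel would defer the free with `kfree_rcu()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a classifier's per-tcf_proto head. */
struct filter_head {
	unsigned int hgenerator;
	int nfilters;
};

static _Atomic(struct filter_head *) root;

/* Reader side: one acquire load, no lock taken. */
static int classify_count(void)
{
	struct filter_head *h =
		atomic_load_explicit(&root, memory_order_acquire);

	return h ? h->nfilters : 0;
}

/*
 * Updater side: callers must serialize updates themselves (RTNL plays
 * that role in the kernel). Returns 0 on success, -1 on allocation failure.
 */
static int add_filter(void)
{
	struct filter_head *old =
		atomic_load_explicit(&root, memory_order_relaxed);
	struct filter_head *fresh = malloc(sizeof(*fresh));

	if (!fresh)
		return -1;
	if (old) {
		*fresh = *old;			/* copy */
	} else {
		fresh->hgenerator = 0;
		fresh->nfilters = 0;
	}
	fresh->hgenerator++;			/* update the private copy */
	fresh->nfilters++;
	atomic_store_explicit(&root, fresh, memory_order_release); /* publish */

	/* Assumption: no reader still holds 'old'. The kernel would wait an
	 * RCU grace period (kfree_rcu()) instead of freeing immediately. */
	free(old);
	return 0;
}
```

Readers always see either the old head or the fully initialized new one, never a half-updated structure, which is why the counter can be bumped on the private copy without extra care.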
-
John Fastabend authored
rcu'ify tcf_proto; this allows calling tc_classify() without holding any locks. Updaters are protected by RTNL. This patch prepares the core net_sched infrastructure for running the classifier/action chains without holding the qdisc lock; however, it does nothing to ensure cls_xxx and act_xxx types also work without locking. Additional patches are required to address the fallout. Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
Add __rcu notation to qdisc handling; by doing this we can make smatch output more legible. And anyway some of the cases should be using rcu_dereference(); see qdisc_all_tx_empty(), qdisc_tx_changing(), and so on. Also the *wake_queue() API is commonly called from driver timer routines without the rcu lock or rtnl lock. So I added rcu_read_lock() blocks around netif_wake_subqueue and netif_tx_wake_queue. Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
This patch makes the cls_bpf classifier RCU safe. The tcf_lock was being used to protect a list of cls_bpf_prog; now this list is RCU safe and updates occur with rcu_replace. Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
Make cls_u32 classifier safe to run without holding lock. This patch converts statistics that are kept in read section u32_classify into per cpu counters.
This patch was tested with a tight u32 filter add/delete loop while generating traffic with pktgen. By running pktgen on vlan devices created on top of a physical device we can hit the qdisc layer correctly. For ingress qdiscs a loopback cable was used.
for i in {1..100}; do
        q=`echo $i%8|bc`;
        echo -n "u32 tos: iteration $i on queue $q";
        tc filter add dev p3p2 parent $p prio $i u32 match ip tos 0x10 0xff \
                  action skbedit queue_mapping $q;
        sleep 1;
        tc filter del dev p3p2 prio $i;
        echo -n "u32 tos hash table: iteration $i on queue $q";
        tc filter add dev p3p2 parent $p protocol ip prio $i handle 628: u32 divisor 1
        tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
                match ip protocol 17 0xff link 628: offset at 0 mask 0xf00 shift 6 plus 0
        tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
                ht 628:0 match ip tos 0x10 0xff action skbedit queue_mapping $q
        sleep 2;
        tc filter del dev p3p2 prio $i
        sleep 1;
done
Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
This uses per cpu counters in cls_u32 in preparation to convert over to rcu. Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
Make cls_tcindex RCU safe. This patch adds a new RCU routine rcu_dereference_bh_rtnl() to check that the caller holds either the rcu read lock or RTNL. This is needed to handle the case where tcindex_lookup() is being called in both cases. Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
RCUify the route classifier. For now, however, spinlocks are used to protect the fastmap cache. The issue here is that the fastmap may be read by one CPU while the cache is being updated by another. An array of pointers could be one possible solution. Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
RCU'ify fw classifier. Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
Make cgroup classifier safe for RCU. Also drops the calls in the classify routine that were doing a rcu_read_lock()/rcu_read_unlock(). If the rcu_read_lock() isn't held entering this routine we have issues with deleting the classifier chain so remove the unnecessary rcu_read_lock()/rcu_read_unlock() pair noting all paths AFAIK hold rcu_read_lock. If there is a case where classify is called without the rcu read lock then an rcu splat will occur and we can correct it. Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
Enable basic classifier for RCU. Dereferencing tp->root may look a bit strange here but it is needed by my accounting because it is allocated at init time and needs to be kfree'd at destroy time. However because it may be referenced in the classify() path we must wait an RCU grace period before free'ing it. We use kfree_rcu() and rcu_ APIs to enforce this. This pattern is used in all the classifiers. Also the hgenerator can be incremented without concern because it is always incremented under RTNL. Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
rcu'ify tcf_proto; this allows calling tc_classify() without holding any locks. Updaters are protected by RTNL. This patch prepares the core net_sched infrastructure for running the classifier/action chains without holding the qdisc lock; however, it does nothing to ensure cls_xxx and act_xxx types also work without locking. Additional patches are required to address the fallout. Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
John Fastabend authored
Add __rcu notation to qdisc handling; by doing this we can make smatch output more legible. And anyway some of the cases should be using rcu_dereference(); see qdisc_all_tx_empty(), qdisc_tx_changing(), and so on. Also the *wake_queue() API is commonly called from driver timer routines without the rcu lock or rtnl lock. So I added rcu_read_lock() blocks around netif_wake_subqueue and netif_tx_wake_queue. Signed-off-by:
John Fastabend <john.r.fastabend@intel.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 12 Sep, 2014 8 commits
-
-
Sowmini Varadhan authored
When sending out a burst of packets across multiple descriptors, it is sufficient to send one LDC "start" trigger for the first descriptor, so do not send an LDC "start" for every pass through vnet_start_xmit. Similarly, it is sufficient to send one "DRING_STOPPED" trigger for the last dring (and if that fails, hold off and send the trigger later). Reducing the number of LDC messages helps avoid filling up the LDC channel with superfluous messages that risk triggering flow-control on the channel, and also boosts performance. Signed-off-by:
Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by:
Raghuram Kothakota <raghuram.kothakota@oracle.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Subbaraya Sundeep Bhatta authored
Calling ether_setup is redundant since alloc_etherdev calls it. Signed-off-by:
Subbaraya Sundeep Bhatta <sbhatta@xilinx.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Varka Bhadram authored
Use pr_info_once() to print the driver version info only once in the probe function. There is no need to use the static variable here. Signed-off-by:
Varka Bhadram <varkab@cdac.in> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Scott Wood authored
Commit 2abb7cdc ("udp: Add support for doing checksum unnecessary conversion") caused napi_gro_cb structs with the "flush" field zero to take the "udp_gro_receive" path rather than the "set flush to 1" path that they would previously take. As a result I saw booting from an NFS root hang shortly after starting userspace, with "server not responding" messages. This change to the handling of "flush == 0" packets appears to be incidental to the goal of adding new code in the case where skb_gro_checksum_validate_zero_check() returns zero. Based on that and the fact that it breaks things, I'm assuming that it is unintentional. Fixes: 2abb7cdc ("udp: Add support for doing checksum unnecessary conversion") Cc: Tom Herbert <therbert@google.com> Signed-off-by:
Scott Wood <scottwood@freescale.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Alexander Duyck says: ==================== Address reference counting issues with sock_queue_err_skb After looking over the code for skb_clone_sk after some comments made by Eric Dumazet I have come to the conclusion that skb_clone_sk is taking the correct approach in how to handle the sk_refcnt when creating a buffer that is eventually meant to be returned to the socket via the sock_queue_err_skb function. However upon review of other callers I found what I believe to be a possible reference count issue in the path for handling "wifi ack" packets. To address this I have applied the same logic that is currently in place so that the sk_refcnt will be forced to stay at least 1, or we will not provide an skb to return in the sk_error_queue. ==================== Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Alexander Duyck authored
There is a possible issue with the use, or lack thereof of sk_refcnt and sk_wmem_alloc in the wifi ack status functionality. Specifically if a socket were to request acknowledgements, and the socket were to have sk_refcnt drop to 0 resulting in it waiting on sk_wmem_alloc to reach 0 it would be possible to have sock_queue_err_skb orphan the last buffer, resulting in __sk_free being called on the socket. After this the buffer is enqueued on sk_error_queue, however the queue has already been flushed resulting in at least a memory leak, if not a data corruption. Signed-off-by:
Alexander Duyck <alexander.h.duyck@intel.com> Acked-by:
Johannes Berg <johannes@sipsolutions.net> Signed-off-by:
David S. Miller <davem@davemloft.net>
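The cover letter for this fix describes forcing sk_refcnt "to stay at least 1" before an skb is handed back to the socket. A common way to express that rule is an increment-unless-zero operation: take a reference only if the object has not already dropped to zero. The sketch below shows the idea with C11 atomics; `ref_get_unless_zero` is a hypothetical userspace helper, not the kernel's implementation, and whether the actual patch uses exactly this primitive is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only while the count is still non-zero.
 * Returns true if the count was incremented, false if the object
 * was already on its way to being freed (count == 0).
 */
static bool ref_get_unless_zero(atomic_int *refcnt)
{
	int cur = atomic_load(refcnt);

	while (cur != 0) {
		/* On failure, cur is reloaded with the current value. */
		if (atomic_compare_exchange_weak(refcnt, &cur, cur + 1))
			return true;
	}
	return false;
}
```

A caller that gets `false` back must not queue anything on the object, which matches the stated behaviour of not providing an skb for sk_error_queue when the socket is already going away.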
-
Alexander Duyck authored
This change adds some documentation to the call skb_clone_sk. This is meant to help clarify the purpose of the function for other developers. Signed-off-by:
Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Sébastien Barré authored
This reverts commit c801e3cc ("ipv4: Clarify in docs that accept_local requires rp_filter."). It is not needed anymore since commit 1dced6a8 ("ipv4: Restore accept_local behaviour in fib_validate_source()"). Suggested-by:
Julian Anastasov <ja@ssi.bg> Cc: Gregory Detal <gregory.detal@uclouvain.be> Cc: Christoph Paasch <christoph.paasch@uclouvain.be> Cc: Hannes Frederic Sowa <hannes@redhat.com> Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by:
Sébastien Barré <sebastien.barre@uclouvain.be> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 10 Sep, 2014 7 commits
-
-
Daniel Borkmann authored
Since BPF JIT depends on the availability of module_alloc() and module_free() helpers (HAVE_BPF_JIT and MODULES), we better build that code only in case we have BPF_JIT in our config enabled, just like with other JIT code. Fixes builds for arm/marzen_defconfig and sh/rsk7269_defconfig.
====================
kernel/built-in.o: In function `bpf_jit_binary_alloc':
/home/cwang/linux/kernel/bpf/core.c:144: undefined reference to `module_alloc'
kernel/built-in.o: In function `bpf_jit_binary_free':
/home/cwang/linux/kernel/bpf/core.c:164: undefined reference to `module_free'
make: *** [vmlinux] Error 1
====================
Reported-by:
Fengguang Wu <fengguang.wu@intel.com> Fixes: 738cbe72 ("net: bpf: consolidate JIT binary allocator") Signed-off-by:
Daniel Borkmann <dborkman@redhat.com> Acked-by:
Alexei Starovoitov <ast@plumgrid.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Hariprasad Shenai says:
====================
cxgb4: Allow FW size up to 1MB, support for S25FL032P flash and misc. fixes
This patch series allows a FW size of up to 1MB and adds support for the S25FL032P flash. It fixes t4_flash_erase_sectors to throw an error when the sectors to erase aren't in the flash, and adds a warning message when adapters have flashes smaller than 2Mb. It also adds the device id of a new adapter and removes the device id of a debug adapter.
The patch series is created against the 'net-next' tree and includes patches for the cxgb4 and cxgb4vf drivers.
We have included all the maintainers of the respective drivers. Kindly review the changes and let us know in case of any review comments.
====================
Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Hariprasad Shenai authored
Signed-off-by:
Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Hariprasad Shenai authored
Based on original work by Casey Leedom <leedom@chelsio.com> Signed-off-by:
Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Hariprasad Shenai authored
cxgb4: Fix t4_flash_erase_sectors() to throw an error when requested to erase sectors which aren't in the FLASH Based on original work by Casey Leedom <leedom@chelsio.com> Signed-off-by:
Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Hariprasad Shenai authored
Add support for Spansion S25FL032P flash Based on original work by Dimitris Michailidis Signed-off-by:
Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Hariprasad Shenai authored
Based on original work by Casey Leedom <leedom@chelsio.com> Signed-off-by:
Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-