- 01 Aug, 2013 39 commits
-
-
David S. Miller authored
Nikolay Aleksandrov says: ==================== This patchset aims to lay the groundwork, and do the initial conversion to RCUism. I decided that it'll be much better to make the bonding RCU conversion gradual, so patches can be reviewed and tested better rather than having one huge patch (which I did in the beginning, before this). The first patch is straightforward and it converts the bonding to the standard list API, simplifying a lot of code, removing unnecessary local variables and allowing to use the nice rculist API later. It also takes care of some minor styling issues (re-arranging local variables longest -> shortest, removing brackets for single statement if/else, leaving new line before return statement etc.). The second patch simplifies the conversion by removing unnecessary read_lock(&bond->curr_slave_lock) in xmit paths that are to be converted later, because we only care if the pointer is NULL or a slave there, since we already have bond->lock the slave can't go away. The third patch simplifies the broadcast xmit function by removing the use of curr_active_slave and converting to standard list API. Also this design of the broadcast xmit function avoids a subtle double packet tx race when converted to RCU. The fourth patch factors out the code that transmits skb through a slave with given id (i.e. rr_tx_counter in rr mode, hashed value in xor mode) and simplifies the active-backup xmit path because bond_dev_queue_xmit always consumes the skb. The new bond_xmit_slave_id function is used in rr and xor modes currently, but the plans are to use it in 3ad mode as well thus it's made global. I've left the function prototype to be 81 chars so I wouldn't break it, if this is an issue I can always break it in more lines. The fifth patch introduces RCU by converting attach/detach and release to RCU. It also converts dereferencing of curr_active_slave to rcu_dereference although it's not fully converted to RCU, that is needed for the converted xmit paths. And it converts roundrobin, broadcast, xor and active-backup xmit paths to RCU. The 3ad and ALB/TLB modes acquire read_lock(&bond->lock) to make sure that no slave will be removed and to sync properly with enslave and release as before. This way for the price of a little complexity, we'll be able to convert individual parts of the bonding to RCU, and test them easier in the process. If this patchset is accepted in some form, I'll post followups in the next weeks that gradually convert the bonding to RCU and remove the need for the rwlocks. For performance notes please refer to patch 5 (RCU conversion one). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
nikolay@redhat.com authored
This patch does the initial bonding conversion to RCU. After it the following modes are protected by RCU alone: roundrobin, active-backup, broadcast and xor. Modes ALB/TLB and 3ad still acquire bond->lock for reading, and will be dealt with later. curr_active_slave needs to be dereferenced via rcu in the converted modes because the only thing protecting the slave after this patch is rcu_read_lock, so we need the proper barrier for weakly ordered archs and to make sure we don't have stale pointer. It's not tagged with __rcu yet because there's still work to be done to remove the curr_slave_lock, so sparse will complain when rcu_assign_pointer and rcu_dereference are used, but the alternative to use rcu_dereference_protected would've created much bigger code churn which is more difficult to test and review. That will be converted in time. 1. Active-backup mode 1.1 Perf recording while doing iperf -P 4 - old bonding: iperf spent 0.55% in bonding, system spent 0.29% CPU in bonding - new bonding: iperf spent 0.29% in bonding, system spent 0.15% CPU in bonding 1.2. Bandwidth measurements - old bonding: 16.1 gbps consistently - new bonding: 17.5 gbps consistently 2. Round-robin mode 2.1 Perf recording while doing iperf -P 4 - old bonding: iperf spent 0.51% in bonding, system spent 0.24% CPU in bonding - new bonding: iperf spent 0.16% in bonding, system spent 0.11% CPU in bonding 2.2 Bandwidth measurements - old bonding: 8 gbps (variable due to packet reorderings) - new bonding: 10 gbps (variable due to packet reorderings) Of course the latency has improved in all converted modes, and moreover while doing enslave/release (since it doesn't affect tx anymore). Also I've stress tested all modes doing enslave/release in a loop while transmitting traffic. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
I factored out the tx xmit code which relies on slave id in bond_xmit_slave_id. It is global because later it can be used also in 3ad mode xmit. Unnecessary obvious comments are removed. Active-backup mode is simplified because bond_dev_queue_xmit always consumes the skb. bond_xmit_xor becomes one line because of bond_xmit_slave_id. bond_for_each_slave_from is not used in bond_xmit_slave_id because later when RCU is used we can avoid important race condition by using standard rculist routines. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
We don't need to start from the curr_active_slave as the frame will be sent to all eligible slaves anyway, so we remove the unnecessary local variables, checks and comments, and make it use the standard list API. This has the nice side-effect that later when it's converted to RCU a race condition will be avoided which could lead to double packet tx. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
nikolay@redhat.com authored
In all the cases we already hold bond->lock for reading, so the slave can't get away and the check != NULL is sufficient. curr_active_slave can still change after the read_lock is unlocked prior to use of the dereferenced value, so there's no need for it. It either contains a valid slave which we use (and can't get away), or it is NULL which is checked. In some places the read_lock of curr_slave_lock was left because we need it not to change while performing some action (e.g. syncing current active slave's addresses, sending ARP requests through the active slave) such cases will be dealt with individually while converting to RCU. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
nikolay@redhat.com authored
This patch aims to remove struct bonding's first_slave and struct slave's next and prev pointers, and replace them with the standard Linux list API. The old macros are converted to list API as well and some new primitives are available now. The checks if there're slaves that used slave_cnt have been replaced by the list_empty macro. Also a few small style fixes, changing longest -> shortest line in local variable declarations, leaving an empty line before return and removing unnecessary brackets. This is the first step to gradual RCU conversion. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
fan.du authored
Server Client 2001:1::803/64 <-> 2001:1::805/64 2001:2::804/64 <-> 2001:2::806/64 Server side fib binary tree looks like this: (2001:/64) / / ffff88002103c380 / \ (2) / \ (2001::803/128) ffff880037ac07c0 / \ / \ (3) ffff880037ac0640 (2001::806/128) / \ (1) / \ (2001::804/128) (2001::805/128) Delete 2001::804/64 won't cause prefix route deleted as well as rt in (3) destinate to 2001::806 with source address as 2001::804/64. That's because 2001::803/64 is still alive, which make onlink=1 in ipv6_del_addr, this is where the substantial difference between same prefix configuration and different prefix configuration :) So packet are still transmitted out to 2001::806 with source address as 2001::804/64. So bump genid will clear rt in (3), and up layer protocol will eventually find the right one for themselves. This problem arised from the discussion in here: http://marc.info/?l=linux-netdev&m=137404469219410&w=4Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://gitorious.org/linux-can/linux-can-nextDavid S. Miller authored
Marc Kleine-Budde says: ==================== this is a pull-request for net-next/master. It consists of two patches by Fabio Estevam. Them first convert the flexcan driver to use devm_ioremap_resource(), the second adds return value checking for clk_prepare_enable(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuval Mintz authored
On very rare occasions, repeated load/unload stress test in the presence of our storage driver (bnx2i/bnx2fc) causes a kernel panic in bnx2x code (NULL pointer dereference). Stack traces indicate the issue happens during MAC configuration; thorough code review showed that indeed several races exist in which one thread can iterate over the list of configured MACs while another deletes entries from the same list. This patch adds a varient on the single-writer/Multiple-reader lock mechanism - It utilizes an already exsiting bottom-half lock, using it so that Whenever a writer is unable to continue due to the existence of another writer/reader, it pends its request for future deliverance. The writer / last readers will check for the existence of such requests and perform them instead of the original initiator. This prevents the writer from having to sleep while waiting for the lock to be accessible, which might cause deadlocks given the locks already held by the writer. Another result of this patch is that setting of Rx Mode is now made in sleepable context - Setting of Rx Mode is made under a bottom-half lock, which was always nontrivial for the bnx2x driver, as the HW/FW configuration requires wait for completions. Since sleep was impossible (due to the sleepless-context), various mechanisms were utilized to prevent the calling thread from sleep, but the truth was that when the caller thread (i.e, the one calling ndo_set_rx_mode()) returned, the Rx mode was still not set in HW/FW. bnx2x_set_rx_mode() will now overtly schedule for the Rx changes to be configured by the sp_rtnl_task which hold the RTNL lock and is sleepable context. Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
After commit 4aa5dee4 ("net: convert resend IGMP to notifier event") we try to acquire rtnl in bond_resend_igmp_join_requests but it can be scheduled with rtnl already held (e.g. when bond_change_active_slave is called with rtnl) causing a loop of immediate reschedules + calls because rtnl_trylock fails each time since it's being already held. For me this issue leads to system hangs very easy: modprobe bonding; ifconfig bond0 up; ifenslave bond0 eth0; rmmod bonding; The fix is to introduce a small (1 jiffy) delay which is enough for the sections holding rtnl to finish without putting any strain on the system. Also adjust the timer in bond_change_active_slave to be 1 jiffy, since most of the time it's called with rtnl already held. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Chris Metcalf authored
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Chris Metcalf authored
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Chris Metcalf authored
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Chris Metcalf authored
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Chris Metcalf authored
The initial driver support was for a single mPIPE shim on the chip (as is the case for the Gx36 hardware). The Gx72 chip has two mPIPE shims, so we extend the driver to handle that case. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Chris Metcalf authored
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Chris Metcalf authored
The code used to call napi_disable() in an interrupt handler (from smp_call_function), which in turn could call msleep(). Unfortunately you can't sleep in an interrupt context. Luckily it turns out all the NAPI support functions are just operating on data structures and not on any deeply per-cpu data, so we can arrange to set up and tear down all the NAPI state on the core driving the process, and just do the IRQ enable/disable as a smp_call_function thing. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Chris Metcalf authored
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Chris Metcalf authored
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Chris Metcalf authored
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Chris Metcalf authored
Building against headers from an older Tilera hypervisor can cause the frags[] array to be overrun. Don't enable TSO in that case. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Chris Metcalf authored
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Chris Metcalf authored
This change allows the user to configure various features of the tile networking drivers on and off. There is no change to the default initialization state of either the tilegx or tilepro drivers. Neither driver needs the ndo_fix_features or ndo_set_features callbacks, since the generic code already handles the dependencies for fix_features, and there is no hardware state to tweak in set_features. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Claudiu Manoil authored
grp->grp_id is obsolete. It has no use in the current driver. Remove it from gfar_priv_grp and put the 'rstat' member in its place, in the 2nd cache line, as rstat needs fast access. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
David suggested to add a BUG_ON() to catch if some layer sets skb->sk pointer without a corresponding destructor. As skb can sit in a queue, it's mandatory to make sure the socket cannot disappear, and it's usually done by taking a reference on the socket, then releasing it from the skb destructor. This patch is a follow-up to commit c34a7612 ("net: skb_orphan() changes") and will be reverted after catching all possible offenders if any. Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Fabio Estevam authored
clk_prepare_enable() may fail, so let's check its return value and propagate it in the case of error. Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Fabio Estevam authored
Using devm_ioremap_resource() can make the code simpler and smaller. Also, place alloc_candev() after of_match_device() to make error handling easier. Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Hannes Frederic Sowa authored
With the addition of the suppress operation (7764a45a ("fib_rules: add .suppress operation") we rely on accurate error reporting of the fib_rules.actions. fib6_rule_action always returned -EAGAIN in case we could not find a matching route and 0 if a rule was matched. This also included a match for blackhole or prohibited rule actions which could get suppressed by the new logic. So adapt fib6_rule_action to always return the correct error code as its counterpart fib4_rule_action does. This also fixes a possiblity of nullptr-deref where we don't find a table, thus rt == NULL. Because the condition rt != ip6_null_entry still holdes it seems we could later get a nullptr bug on dereference rt->dst. v2: a) Fixed a brain fart in the commit msg (the rule => a table, etc). No changes to the patch. Cc: Stefan Tomanek <stefan.tomanek@wertarbyte.de> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
There are a mix of function prototypes with and without extern in the kernel sources. Standardize on not using extern for function prototypes. Function prototypes don't need to be written with extern. extern is assumed by the compiler. Its use is as unnecessary as using auto to declare automatic/local variables in a block. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
There are a mix of function prototypes with and without extern in the kernel sources. Standardize on not using extern for function prototypes. Function prototypes don't need to be written with extern. extern is assumed by the compiler. Its use is as unnecessary as using auto to declare automatic/local variables in a block. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
There are a mix of function prototypes with and without extern in the kernel sources. Standardize on not using extern for function prototypes. Function prototypes don't need to be written with extern. extern is assumed by the compiler. Its use is as unnecessary as using auto to declare automatic/local variables in a block. Reflow modified prototypes to 80 columns. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
There are a mix of function prototypes with and without extern in the kernel sources. Standardize on not using extern for function prototypes. Function prototypes don't need to be written with extern. extern is assumed by the compiler. Its use is as unnecessary as using auto to declare automatic/local variables in a block. Reflow modified prototypes to 80 columns. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
There are a mix of function prototypes with and without extern in the kernel sources. Standardize on not using extern for function prototypes. Function prototypes don't need to be written with extern. extern is assumed by the compiler. Its use is as unnecessary as using auto to declare automatic/local variables in a block. Reflow modified prototypes to 80 columns. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
There are a mix of function prototypes with and without extern in the kernel sources. Standardize on not using extern for function prototypes. Function prototypes don't need to be written with extern. extern is assumed by the compiler. Its use is as unnecessary as using auto to declare automatic/local variables in a block. Reflow modified prototypes to 80 columns. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
There are a mix of function prototypes with and without extern in the kernel sources. Standardize on not using extern for function prototypes. Function prototypes don't need to be written with extern. extern is assumed by the compiler. Its use is as unnecessary as using auto to declare automatic/local variables in a block. Reflow modified prototypes to 80 columns. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
There are a mix of function prototypes with and without extern in the kernel sources. Standardize on not using extern for function prototypes. Function prototypes don't need to be written with extern. extern is assumed by the compiler. Its use is as unnecessary as using auto to declare automatic/local variables in a block. Reflow modified prototypes to 80 columns. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paul Gortmaker authored
A collection of expectations and operational details about how networking development takes place in the context of the netdev mailing list. The content is meant to capture specific items that are unique to netdev workflow, and not re-document generic linux expectations that are already captured elsewhere. This was originally proposed[1] as a regular posting mailing list FAQ, but it probably is more universally accessible here in tree. [1] https://lwn.net/Articles/559211/Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Stefan Tomanek authored
This change adds a new operation to the fib_rules_ops struct; it allows the suppression of routing decisions if certain criteria are not met by its results. The first implemented constraint is a minimum prefix length added to the structures of routing rules. If a rule is added with a minimum prefix length >0, only routes meeting this threshold will be considered. Any other (more general) routing table entries will be ignored. When configuring a system with multiple network uplinks and default routes, it is often convinient to reference the main routing table multiple times - but omitting the default route. Using this patch and a modified "ip" utility, this can be achieved by using the following command sequence: $ ip route add table secuplink default via 10.42.23.1 $ ip rule add pref 100 table main prefixlength 1 $ ip rule add pref 150 fwmark 0xA table secuplink With this setup, packets marked 0xA will be processed by the additional routing table "secuplink", but only if no suitable route in the main routing table can be found. By using a minimal prefixlength of 1, the default route (/0) of the table "main" is hidden to packets processed by rule 100; packets traveling to destinations with more specific routing entries are processed as usual. Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
There are a mix of function prototypes with and without extern in the kernel sources. Standardize on not using extern for function prototypes. Function prototypes don't need to be written with extern. extern is assumed by the compiler. Its use is as unnecessary as using auto to declare automatic/local variables in a block. Reflow modified prototypes to 80 columns. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 31 Jul, 2013 1 commit
-
-
Eric Dumazet authored
It is illegal to set skb->sk without corresponding destructor. Its therefore safe for skb_orphan() to not clear skb->sk if skb->destructor is not set. Also avoid clearing skb->destructor if already NULL. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-