Commits · 15077228cab68e5e8c3cbf26a7f6ebacfac4c829 · Kirill Smelkov / linux

01 Aug, 2013 37 commits

bonding: factor out slave id tx code and simplify xmit paths · 15077228

Nikolay Aleksandrov authored Aug 01, 2013

I factored out the tx xmit code which relies on slave id in
bond_xmit_slave_id. It is global because later it can be used also in
3ad mode xmit. Unnecessary obvious comments are removed. Active-backup
mode is simplified because bond_dev_queue_xmit always consumes the skb.
bond_xmit_xor becomes one line because of bond_xmit_slave_id.
bond_for_each_slave_from is not used in bond_xmit_slave_id because later
when RCU is used we can avoid important race condition by using standard
rculist routines.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

15077228

bonding: simplify broadcast_xmit function · 78a646ce

Nikolay Aleksandrov authored Aug 01, 2013

We don't need to start from the curr_active_slave as the frame will be
sent to all eligible slaves anyway, so we remove the unnecessary local
variables, checks and comments, and make it use the standard list API.
This has the nice side-effect that later when it's converted to RCU
a race condition will be avoided which could lead to double packet tx.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

78a646ce

bonding: remove unnecessary read_locks of curr_slave_lock · 71bc3b2d

nikolay@redhat.com authored Aug 01, 2013

In all the cases we already hold bond->lock for reading, so the slave
can't get away and the check != NULL is sufficient. curr_active_slave
can still change after the read_lock is unlocked prior to use of the
dereferenced value, so there's no need for it. It either contains a
valid slave which we use (and can't get away), or it is NULL which is
checked.
In some places the read_lock of curr_slave_lock was left because we need
it not to change while performing some action (e.g. syncing current
active slave's addresses, sending ARP requests through the active slave)
such cases will be dealt with individually while converting to RCU.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

71bc3b2d

bonding: convert to list API and replace bond's custom list · dec1e90e

nikolay@redhat.com authored Aug 01, 2013

This patch aims to remove struct bonding's first_slave and struct
slave's next and prev pointers, and replace them with the standard Linux
list API. The old macros are converted to list API as well and some new
primitives are available now. The checks if there're slaves that used
slave_cnt have been replaced by the list_empty macro.
Also a few small style fixes, changing longest -> shortest line in local
variable declarations, leaving an empty line before return and removing
unnecessary brackets.
This is the first step to gradual RCU conversion.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dec1e90e

ipv6: bump genid when delete/add address · 439677d7

fan.du authored Aug 01, 2013

Server           Client
2001:1::803/64  <-> 2001:1::805/64
2001:2::804/64  <-> 2001:2::806/64

Server side fib binary tree looks like this:

                                   (2001:/64)
                                   /
                                  /
                   ffff88002103c380
                 /                 \
     (2)        /                   \
 (2001::803/128)                     ffff880037ac07c0
                                    /               \
                                   /                 \  (3)
                      ffff880037ac0640               (2001::806/128)
                       /             \
             (1)      /               \
        (2001::804/128)               (2001::805/128)

Delete 2001::804/64 won't cause prefix route deleted as well as rt in (3)
destinate to 2001::806 with source address as 2001::804/64. That's because
2001::803/64 is still alive, which make onlink=1 in ipv6_del_addr, this is
where the substantial difference between same prefix configuration and
different prefix configuration :) So packet are still transmitted out to
2001::806 with source address as 2001::804/64.

So bump genid will clear rt in (3), and up layer protocol will eventually
find the right one for themselves.

This problem arised from the discussion in here:
http://marc.info/?l=linux-netdev&m=137404469219410&w=4Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

439677d7

Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-next · c1fc20aa

David S. Miller authored Aug 01, 2013

Marc Kleine-Budde says:

====================
this is a pull-request for net-next/master. It consists of two patches
by Fabio Estevam. Them first convert the flexcan driver to use
devm_ioremap_resource(), the second adds return value checking for
clk_prepare_enable().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

c1fc20aa

bnx2x: Revising locking scheme for MAC configuration · 8b09be5f

Yuval Mintz authored Aug 01, 2013

On very rare occasions, repeated load/unload stress test in the presence of
our storage driver (bnx2i/bnx2fc) causes a kernel panic in bnx2x code
(NULL pointer dereference). Stack traces indicate the issue happens during MAC
configuration; thorough code review showed that indeed several races exist
in which one thread can iterate over the list of configured MACs while another
deletes entries from the same list.

This patch adds a varient on the single-writer/Multiple-reader lock mechanism -
It utilizes an already exsiting bottom-half lock, using it so that Whenever
a writer is unable to continue due to the existence of another writer/reader,
it pends its request for future deliverance.
The writer / last readers will check for the existence of such requests and
perform them instead of the original initiator.
This prevents the writer from having to sleep while waiting for the lock
to be accessible, which might cause deadlocks given the locks already
held by the writer.

Another result of this patch is that setting of Rx Mode is now made in
sleepable context - Setting of Rx Mode is made under a bottom-half lock, which
was always nontrivial for the bnx2x driver, as the HW/FW configuration requires
wait for completions.
Since sleep was impossible (due to the sleepless-context), various mechanisms
were utilized to prevent the calling thread from sleep, but the truth was that
when the caller thread (i.e, the one calling ndo_set_rx_mode()) returned, the
Rx mode was still not set in HW/FW.

bnx2x_set_rx_mode() will now overtly schedule for the Rx changes to be
configured by the sp_rtnl_task which hold the RTNL lock and is sleepable
context.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8b09be5f

bonding: fix system hang due to fast igmp timer rescheduling · 4beac029

Nikolay Aleksandrov authored Aug 01, 2013

After commit 4aa5dee4 ("net: convert resend IGMP to notifier event")
we try to acquire rtnl in bond_resend_igmp_join_requests but it can be
scheduled with rtnl already held (e.g. when bond_change_active_slave is
called with rtnl) causing a loop of immediate reschedules + calls because
rtnl_trylock fails each time since it's being already held.
For me this issue leads to system hangs very easy:
modprobe bonding; ifconfig bond0 up; ifenslave bond0 eth0; rmmod
bonding;

The fix is to introduce a small (1 jiffy) delay which is enough for the
sections holding rtnl to finish without putting any strain on the system.
Also adjust the timer in bond_change_active_slave to be 1 jiffy, since
most of the time it's called with rtnl already held.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4beac029

tile: support PTP using the tilegx mPIPE (IEEE 1588) · 9ab5ec59

Chris Metcalf authored Aug 01, 2013

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9ab5ec59

tile: remove deprecated NETIF_F_LLTX flag from tile drivers · 84e181ba
Chris Metcalf authored Aug 01, 2013
```
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
84e181ba
tile: make "tile_net.custom" a proper bool module parameter · 4aa02644
Chris Metcalf authored Aug 01, 2013
```
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
4aa02644

tile: support TSO for IPv6 in tilegx network driver · 2c7d04a9

Chris Metcalf authored Aug 01, 2013

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2c7d04a9

tile: support multiple mPIPE shims in tilegx network driver · f3286a3a

Chris Metcalf authored Aug 01, 2013

The initial driver support was for a single mPIPE shim on the chip
(as is the case for the Gx36 hardware).  The Gx72 chip has two mPIPE
shims, so we extend the driver to handle that case.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f3286a3a

tile: enable GRO in the tilegx network driver · 6ab4ae9a

Chris Metcalf authored Aug 01, 2013

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6ab4ae9a

tile: fix panic bug in napi support for tilegx network driver · 5e7a54a2

Chris Metcalf authored Aug 01, 2013

The code used to call napi_disable() in an interrupt handler
(from smp_call_function), which in turn could call msleep().
Unfortunately you can't sleep in an interrupt context.

Luckily it turns out all the NAPI support functions are
just operating on data structures and not on any deeply
per-cpu data, so we can arrange to set up and tear down all
the NAPI state on the core driving the process, and just
do the IRQ enable/disable as a smp_call_function thing.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e7a54a2

tile: update dev->stats directly in tilegx network driver · ad018185
Chris Metcalf authored Aug 01, 2013
```
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
ad018185

tile: support jumbo frames in the tilegx network driver · 2628e8af

Chris Metcalf authored Aug 01, 2013

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2628e8af

tile: remove dead is_dup_ack() function from tilepro net driver · 48f2a4e1
Chris Metcalf authored Aug 01, 2013
```
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
48f2a4e1

tile: avoid bug in tilepro net driver built with old hypervisor · 815d3bae

Chris Metcalf authored Aug 01, 2013

Building against headers from an older Tilera hypervisor can cause
the frags[] array to be overrun.  Don't enable TSO in that case.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

815d3bae

tile: support rx_dropped/rx_errors in tilepro net driver · 439a93a0

Chris Metcalf authored Aug 01, 2013

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

439a93a0

tile: set hw_features and vlan_features in setup · a8eaed55

Chris Metcalf authored Aug 01, 2013

This change allows the user to configure various features of the tile
networking drivers on and off. There is no change to the default
initialization state of either the tilegx or tilepro drivers.

Neither driver needs the ndo_fix_features or ndo_set_features callbacks,
since the generic code already handles the dependencies for
fix_features, and there is no hardware state to tweak in set_features.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a8eaed55

gianfar: Remove unused field grp_id from gfar_priv_grp · 84915c64

Claudiu Manoil authored Aug 01, 2013

grp->grp_id is obsolete. It has no use in the current driver.
Remove it from gfar_priv_grp and put the 'rstat' member
in its place, in the 2nd cache line, as rstat needs fast access.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

84915c64

net: add a temporary sanity check in skb_orphan() · 376c7311

Eric Dumazet authored Aug 01, 2013

David suggested to add a BUG_ON() to catch if some layer
sets skb->sk pointer without a corresponding destructor.

As skb can sit in a queue, it's mandatory to make sure the
socket cannot disappear, and it's usually done by taking a
reference on the socket, then releasing it from the skb
destructor.

This patch is a follow-up to commit c34a7612
("net: skb_orphan() changes") and will be reverted after
catching all possible offenders if any.
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

376c7311

can: flexcan: Check the return value from clk_prepare_enable() · aa10181b

Fabio Estevam authored Jul 22, 2013

clk_prepare_enable() may fail, so let's check its return value and propagate it
in the case of error.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

aa10181b

can: flexcan: Use devm_ioremap_resource() · 933e4af4

Fabio Estevam authored Jul 22, 2013

Using devm_ioremap_resource() can make the code simpler and smaller.

Also, place alloc_candev() after of_match_device() to make error handling
easier.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

933e4af4

ipv6: fib6_rules should return exact return value · 46b3a421

Hannes Frederic Sowa authored Aug 01, 2013

With the addition of the suppress operation
(7764a45a ("fib_rules: add .suppress
operation") we rely on accurate error reporting of the fib_rules.actions.

fib6_rule_action always returned -EAGAIN in case we could not find a
matching route and 0 if a rule was matched. This also included a match
for blackhole or prohibited rule actions which could get suppressed by
the new logic.

So adapt fib6_rule_action to always return the correct error code as
its counterpart fib4_rule_action does. This also fixes a possiblity of
nullptr-deref where we don't find a table, thus rt == NULL. Because
the condition rt != ip6_null_entry still holdes it seems we could later
get a nullptr bug on dereference rt->dst.

v2:
a) Fixed a brain fart in the commit msg (the rule => a table, etc). No
   changes to the patch.

Cc: Stefan Tomanek <stefan.tomanek@wertarbyte.de>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

46b3a421

cls_cgroup.h netprio_cgroup.h: Remove extern from function prototypes · 37830721

Joe Perches authored Jul 31, 2013

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

37830721

checksum: Remove extern from function prototypes · 4fc70747

Joe Perches authored Jul 31, 2013

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4fc70747

cfg80211.h/mac80211.h: Remove extern from function prototypes · 10dd9b7c

Joe Perches authored Jul 31, 2013

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

10dd9b7c

ax25.h: Remove extern from function prototypes · c1d8f804

Joe Perches authored Jul 31, 2013

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c1d8f804

arp/neighbour.h: Remove extern from function prototypes · 90972b22

Joe Perches authored Jul 31, 2013

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

90972b22

af_rxrpc.h: Remove extern from function prototypes · cd2cf63a

Joe Perches authored Jul 31, 2013

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cd2cf63a

af_unix.h: Remove extern from function prototypes · b60a8280

Joe Perches authored Jul 31, 2013

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b60a8280

addrconf.h: Remove extern function prototypes · e8e54d3c

Joe Perches authored Jul 31, 2013

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e8e54d3c

Documentation: add networking/netdev-FAQ.txt · 49dfe762

Paul Gortmaker authored Jul 31, 2013

A collection of expectations and operational details about how
networking development takes place in the context of the netdev
mailing list.

The content is meant to capture specific items that are unique
to netdev workflow, and not re-document generic linux expectations
that are already captured elsewhere.

This was originally proposed[1] as a regular posting mailing list
FAQ, but it probably is more universally accessible here in tree.

[1] https://lwn.net/Articles/559211/Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

49dfe762

fib_rules: add .suppress operation · 7764a45a

Stefan Tomanek authored Aug 01, 2013

This change adds a new operation to the fib_rules_ops struct; it allows the
suppression of routing decisions if certain criteria are not met by its
results.

The first implemented constraint is a minimum prefix length added to the
structures of routing rules. If a rule is added with a minimum prefix length
>0, only routes meeting this threshold will be considered. Any other (more
general) routing table entries will be ignored.

When configuring a system with multiple network uplinks and default routes, it
is often convinient to reference the main routing table multiple times - but
omitting the default route. Using this patch and a modified "ip" utility, this
can be achieved by using the following command sequence:

  $ ip route add table secuplink default via 10.42.23.1

  $ ip rule add pref 100            table main prefixlength 1
  $ ip rule add pref 150 fwmark 0xA table secuplink

With this setup, packets marked 0xA will be processed by the additional routing
table "secuplink", but only if no suitable route in the main routing table can
be found. By using a minimal prefixlength of 1, the default route (/0) of the
table "main" is hidden to packets processed by rule 100; packets traveling to
destinations with more specific routing entries are processed as usual.
Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

7764a45a

net: Remove extern from include/net/ scheduling prototypes · 5c15257f

Joe Perches authored Jul 30, 2013

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5c15257f

31 Jul, 2013 3 commits

net: skb_orphan() changes · c34a7612

Eric Dumazet authored Jul 30, 2013

It is illegal to set skb->sk without corresponding destructor.

Its therefore safe for skb_orphan() to not clear skb->sk if
skb->destructor is not set.

Also avoid clearing skb->destructor if already NULL.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c34a7612

netem: Introduce skb_orphan_partial() helper · f2f872f9

Eric Dumazet authored Jul 30, 2013

Commit 547669d4 ("tcp: xps: fix reordering issues") added
unexpected reorders in case netem is used in a MQ setup for high
performance test bed.

ETH=eth0
tc qd del dev $ETH root 2>/dev/null
tc qd add dev $ETH root handle 1: mq
for i in `seq 1 32`
do
 tc qd add dev $ETH parent 1:$i netem delay 100ms
done

As all tcp packets are orphaned by netem, TCP stack believes it can
set skb->ooo_okay on all packets.

In order to allow producers to send more packets, we want to
keep sk_wmem_alloc from reaching sk_sndbuf limit.

We can do that by accounting one byte per skb in netem queues,
so that TCP stack is not fooled too much.

Tested:

With above MQ/netem setup, scaling number of concurrent flows gives
linear results and no reorders/retransmits

lpq83:~# for n in 1 10 20 30 40 50 60 70 80 90 100
 do echo -n "n:$n " ; ./super_netperf $n -H 10.7.7.84; done
n:1 198.46
n:10 2002.69
n:20 4000.98
n:30 6006.35
n:40 8020.93
n:50 10032.3
n:60 12081.9
n:70 13971.3
n:80 16009.7
n:90 17117.3
n:100 17425.5
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f2f872f9

net: split rt_genid for ipv4 and ipv6 · ca4c3fc2

fan.du authored Jul 30, 2013

Current net name space has only one genid for both IPv4 and IPv6, it has below
drawbacks:

- Add/delete an IPv4 address will invalidate all IPv6 routing table entries.
- Insert/remove XFRM policy will also invalidate both IPv4/IPv6 routing table
  entries even when the policy is only applied for one address family.

Thus, this patch attempt to split one genid for two to cater for IPv4 and IPv6
separately in a fine granularity.
Signed-off-by: Fan Du <fan.du@windriver.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ca4c3fc2