Commits · 657d92fe6d693b9674264bc7546e664714955425 · nexedi / linux

28 Sep, 2010 11 commits

bnx2: Use netif_set_real_num_{rx,tx}_queues() · 657d92fe

Ben Hutchings authored Sep 27, 2010

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

657d92fe

net: Add netif_copy_real_num_queues() for use by virtual net drivers · 3171d026

Ben Hutchings authored Sep 27, 2010

This sets the active numbers of queues on a net device to match another.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3171d026

net: Allow changing number of RX queues after device allocation · 62fe0b40

Ben Hutchings authored Sep 27, 2010

For RPS, we create a kobject for each RX queue based on the number of
queues passed to alloc_netdev_mq().  However, drivers generally do not
determine the numbers of hardware queues to use until much later, so
this usually represents the maximum number the driver may use and not
the actual number in use.

For TX queues, drivers can update the actual number using
netif_set_real_num_tx_queues().  Add a corresponding function for RX
queues, netif_set_real_num_rx_queues().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

62fe0b40

net: sk_{detach|attach}_filter() rcu fixes · f91ff5b9

Eric Dumazet authored Sep 27, 2010

sk_attach_filter() and sk_detach_filter() are run with socket locked.

Use the appropriate rcu_dereference_protected() instead of blocking BH,
and rcu_dereference_bh().
There is no point adding BH prevention and memory barrier.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f91ff5b9

fib: use atomic_inc_not_zero() in fib_rules_lookup · 7fa7cb71

Eric Dumazet authored Sep 27, 2010

It seems we dont use appropriate refcount increment in an
rcu_read_lock() protected section.

fib_rule_get() might increment a null refcount and bad things could
happen.

While fib_nl_delrule() respects an rcu grace period before calling
fib_rule_put(), fib_rules_cleanup_ops() calls fib_rule_put() without a
grace period.

Note : after this patch, we might avoid the synchronize_rcu() call done
in fib_nl_delrule()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7fa7cb71

sit: percpu stats accounting · 15fc1f70

Eric Dumazet authored Sep 27, 2010

Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets.

Other seldom used fields are kept in netdev->stats structure, possibly
unsafe.

This is a preliminary work to support lockless transmit path, and
correct RX stats, that are already unsafe.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

15fc1f70

ipip: percpu stats accounting · 3c97af99

Eric Dumazet authored Sep 27, 2010

Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets.

Other seldom used fields are kept in netdev->stats structure, possibly
unsafe.

This is a preliminary work to support lockless transmit path, and
correct RX stats, that are already unsafe.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c97af99

ip_gre: percpu stats accounting · e985aad7

Eric Dumazet authored Sep 27, 2010

Le lundi 27 septembre 2010 à 14:29 +0100, Ben Hutchings a écrit :

> > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> > index 5d6ddcb..de39b22 100644
> > --- a/net/ipv4/ip_gre.c
> > +++ b/net/ipv4/ip_gre.c
> [...]
> > @@ -377,7 +405,7 @@ static struct ip_tunnel *ipgre_tunnel_locate(struct net *net,
> >  	if (parms->name[0])
> >  		strlcpy(name, parms->name, IFNAMSIZ);
> >  	else
> > -		sprintf(name, "gre%%d");
> > +		strcpy(name, "gre%d");
> >
> >  	dev = alloc_netdev(sizeof(*t), name, ipgre_tunnel_setup);
> >  	if (!dev)
> [...]
>
> This is a valid fix, but doesn't belong in this patch!
>

Sorry ? It was not a fix, but at most a cleanup ;)

Anyway I forgot the gretap case...

[PATCH 2/4 v2] ip_gre: percpu stats accounting

Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets.

Other seldom used fields are kept in netdev->stats structure, possibly
unsafe.

This is a preliminary work to support lockless transmit path, and
correct RX stats, that are already unsafe.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e985aad7

tunnels: prepare percpu accounting · 290b895e

Eric Dumazet authored Sep 27, 2010

Tunnels are going to use percpu for their accounting.

They are going to use a new tstats field in net_device.

skb_tunnel_rx() is changed to be a wrapper around __skb_tunnel_rx()

IPTUNNEL_XMIT() is changed to be a wrapper around __IPTUNNEL_XMIT()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

290b895e

vlan: use this_cpu_ptr() in vlan_skb_recv() · af5ef241

Eric Dumazet authored Sep 26, 2010

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af5ef241

Phonet: Implement Pipe Controller to support Nokia Slim Modems · 8d98efa8

Kumar Sanghvi authored Sep 26, 2010

Phonet stack assumes the presence of Pipe Controller, either in Modem or
on Application Processing Engine user-space for the Pipe data.
Nokia Slim Modems like WG2.5 used in ST-Ericsson U8500 platform do not
implement Pipe controller in them.
This patch adds Pipe Controller implemenation to Phonet stack to support
Pipe data over Phonet stack for Nokia Slim Modems.
Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8d98efa8

27 Sep, 2010 17 commits

Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 · e40051d1
David S. Miller authored Sep 27, 2010
```
Conflicts:
	drivers/net/qlcnic/qlcnic_init.c
	net/ipv4/ip_output.c
```
e40051d1

net: r6040: store BIOS default MAC in perm_add · 42099d7a

Otavio Salvador authored Sep 26, 2010

Signed-off-by: Otavio Salvador <otavio@ossystems.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>

42099d7a

ipv6: add a missing unregister_pernet_subsys call · 2cc6d2bf

Neil Horman authored Sep 24, 2010

Clean up a missing exit path in the ipv6 module init routines.  In
addrconf_init we call ipv6_addr_label_init which calls register_pernet_subsys
for the ipv6_addr_label_ops structure.  But if module loading fails, or if the
ipv6 module is removed, there is no corresponding unregister_pernet_subsys call,
which leaves a now-bogus address on the pernet_list, leading to oopses in
subsequent registrations.  This patch cleans up both the failed load path and
the unload path.  Tested by myself with good results.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

 include/net/addrconf.h |    1 +
 net/ipv6/addrconf.c    |   11 ++++++++---
 net/ipv6/addrlabel.c   |    5 +++++
 3 files changed, 14 insertions(+), 3 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>

2cc6d2bf

net: loopback driver cleanup · a7855c78

Eric Dumazet authored Sep 23, 2010

loopback driver uses dev->ml_priv to store its percpu stats pointer.
It uses ugly casts "(void __percpu __force *)" to shut up sparse
complains.

Define an union to better document we use ml_priv in loopback driver and
define a lstats field with appropriate types.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7855c78

net: fix rcu use in ip_route_output_slow · 83180af0

Eric Dumazet authored Sep 23, 2010

__in_dev_get_rtnl(dev_out) is called while RTNL is not held, thus
triggers a lockdep fault.

At this point, we only perform a raw test of dev_out->ip_ptr being NULL,
we dont need to make sure ip_ptr cant changed right after.

We can use rcu_dereference_raw() for this.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

83180af0

rps: allocate rx queues in register_netdevice only · 1b4bf461

Eric Dumazet authored Sep 23, 2010

Instead of having two places were we allocate dev->_rx, introduce
netif_alloc_rx_queues() helper and call it only from
register_netdevice(), not from alloc_netdev_mq()

Goal is to let drivers change dev->num_rx_queues after allocating netdev
and before registering it.

This also removes a lot of ifdefs in net/core/dev.c
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1b4bf461

s390: use free_netdev(netdev) instead of kfree() · bc68580d

Vasiliy Kulikov authored Sep 26, 2010

Freeing netdev without free_netdev() leads to net, tx leaks.
I might lead to dereferencing freed pointer.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

@@
struct net_device* dev;
@@

-kfree(dev)
+free_netdev(dev)
Signed-off-by: David S. Miller <davem@davemloft.net>

bc68580d

sgiseeq: use free_netdev(netdev) instead of kfree() · 8d879de8

Kulikov Vasiliy authored Sep 25, 2010

Freeing netdev without free_netdev() leads to net, tx leaks.
I might lead to dereferencing freed pointer.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

@@
struct net_device* dev;
@@

-kfree(dev)
+free_netdev(dev)
Signed-off-by: David S. Miller <davem@davemloft.net>

8d879de8

rionet: use free_netdev(netdev) instead of kfree() · 22138d30

Kulikov Vasiliy authored Sep 25, 2010

Freeing netdev without free_netdev() leads to net, tx leaks.
I might lead to dereferencing freed pointer.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

@@
struct net_device* dev;
@@

-kfree(dev)
+free_netdev(dev)
Signed-off-by: David S. Miller <davem@davemloft.net>

22138d30

ibm_newemac: use free_netdev(netdev) instead of kfree() · 52933f05

Kulikov Vasiliy authored Sep 25, 2010

Freeing netdev without free_netdev() leads to net, tx leaks.
I might lead to dereferencing freed pointer.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

@@
struct net_device* dev;
@@

-kfree(dev)
+free_netdev(dev)
Signed-off-by: David S. Miller <davem@davemloft.net>

52933f05

net: update SOCK_MIN_RCVBUF · 7a91b434

Eric Dumazet authored Sep 26, 2010

SOCK_MIN_RCVBUF current value is 256 bytes

It doesnt permit to receive the smallest possible frame, considering
socket sk_rmem_alloc/sk_rcvbuf account skb truesizes. On 64bit arches,
sizeof(struct sk_buff) is 240 bytes. Add the typical 64 bytes of
headroom, and we go over the limit.

With old kernels and 32bit arches, we were under the limit, if netdriver
was doing copybreak.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7a91b434

smsc911x: Add MODULE_ALIAS() · 62038e4a

Vincent Stehlé authored Sep 26, 2010

This enables auto loading for the smsc911x ethernet driver.
Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

62038e4a

net: reset skb queue mapping when rx'ing over tunnel · 693019e9

Tom Herbert authored Sep 23, 2010

Reset queue mapping when an skb is reentering the stack via a tunnel.
On second pass, the queue mapping from the original device is no
longer valid.
Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

693019e9

drivers/net: return operator cleanup · 807540ba

Eric Dumazet authored Sep 23, 2010

Change "return (EXPR);" to "return EXPR;"

return is not a function, parentheses are not required.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

807540ba

net: skb_frag_t can be smaller on small arches · cb4dfe56

Eric Dumazet authored Sep 23, 2010

On 32bit arches, if PAGE_SIZE is smaller than 65536, we can use 16bit
offset and size fields. This patch saves 72 bytes per skb on i386, or
128 bytes after rounding.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cb4dfe56

br2684: fix scheduling while atomic · a3d6713f

Karl Hiramoto authored Sep 23, 2010

You can't call atomic_notifier_chain_unregister() while in atomic context.

Fix, call un/register_atmdevice_notifier in module __init and __exit.

Bug report:
http://comments.gmane.org/gmane.linux.network/172603Reported-by: Mikko Vinni <mmvinni@yahoo.com>
Tested-by: Mikko Vinni <mmvinni@yahoo.com>
Signed-off-by: Karl Hiramoto <karl@hiramoto.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

a3d6713f

net: propagate NETIF_F_HIGHDMA to vlans · c5256c51

Eric Dumazet authored Sep 23, 2010

Automatically allows vlans to get NETIF_F_HIGHDMA if underlying device
supports it.

On 32bit arches (and more precisely if CONFIG_HIGHMEM is enabled), it
can help to reduce cost of illegal_highdma() and __skb_linearize()
calls.

Tested on tg3 , bnx2, bonding, this worked very well.

This is a generalization of a patch provided by Yi Zou & Jeff Kirsher.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c5256c51

26 Sep, 2010 2 commits

de2104x: fix TP link detection · ca9a7835

Ondrej Zary authored Sep 25, 2010

Compex FreedomLine 32 PnP-PCI2 cards have only TP and BNC connectors but the
SROM contains AUI port too. When TP loses link, the driver switches to
non-existing AUI port (which reports that carrier is always present).

Connecting TP back generates LinkPass interrupt but de_media_interrupt() is
broken - it only updates the link state of currently connected media, ignoring
the fact that LinkPass and LinkFail bits of MacStatus register belong to the
TP port only (the chip documentation says that).

This patch changes de_media_interrupt() to switch media to TP when link goes
up (and media type is not locked) and also to update the link state only when
the TP port is used.

Also the NonselPortActive (and also SelPortActive) bits of SIAStatus register
need to be cleared (by writing 1) after reading or they're useless.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ca9a7835

de2104x: fix power management · b0255a02

Ondrej Zary authored Sep 24, 2010

At least my 21041 cards come out of suspend with bus mastering disabled so
they did not work after resume(no data transferred).
After adding pci_set_master(), the driver oopsed immediately on resume -
because de_clean_rings() is called on suspend but de_init_rings() call
was missing in resume.

Also disable link (reset SIA) before sleep (de4x5 does this too).
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b0255a02

25 Sep, 2010 5 commits

ixgbevf: Refactor ring parameter re-size · bba50b99

Greg Rose authored Sep 23, 2010

The function to resize the Tx/Rx rings had the potential to
dereference a NULL pointer and the code would attempt to resize
the Tx ring even if the Rx ring allocation had failed.  This
would cause some confusion in the return code semantics.  Fixed
up to just unwind the allocations if any of them fail and return
an error.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bba50b99

de2104x: disable autonegotiation on broken hardware · e0f9c4f3

Ondrej Zary authored Sep 23, 2010

At least on older 21041-AA chips (mine is rev. 11), TP duplex autonegotiation
causes the card not to work at all (link is up but no packets are transmitted).

de4x5 disables autonegotiation completely. But it seems to work on newer
(21041-PA rev. 21) so disable it only on rev<20 chips.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e0f9c4f3

net: fix a lockdep splat · f064af1e

Eric Dumazet authored Sep 22, 2010

We have for each socket :

One spinlock (sk_slock.slock)
One rwlock (sk_callback_lock)

Possible scenarios are :

(A) (this is used in net/sunrpc/xprtsock.c)
read_lock(&sk->sk_callback_lock) (without blocking BH)
<BH>
spin_lock(&sk->sk_slock.slock);
...
read_lock(&sk->sk_callback_lock);
...

(B)
write_lock_bh(&sk->sk_callback_lock)
stuff
write_unlock_bh(&sk->sk_callback_lock)

(C)
spin_lock_bh(&sk->sk_slock)
...
write_lock_bh(&sk->sk_callback_lock)
stuff
write_unlock_bh(&sk->sk_callback_lock)
spin_unlock_bh(&sk->sk_slock)

This (C) case conflicts with (A) :

CPU1 [A]                         CPU2 [C]
read_lock(callback_lock)
<BH>                             spin_lock_bh(slock)
<wait to spin_lock(slock)>
                                 <wait to write_lock_bh(callback_lock)>

We have one problematic (C) use case in inet_csk_listen_stop() :

local_bh_disable();
bh_lock_sock(child); // spin_lock_bh(&sk->sk_slock)
WARN_ON(sock_owned_by_user(child));
...
sock_orphan(child); // write_lock_bh(&sk->sk_callback_lock)

lockdep is not happy with this, as reported by Tetsuo Handa

It seems only way to deal with this is to use read_lock_bh(callbacklock)
everywhere.

Thanks to Jarek for pointing a bug in my first attempt and suggesting
this solution.
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f064af1e

stmmac: review the wake-up support · 543876c9

Giuseppe Cavallaro authored Sep 24, 2010

If the PM support is available this is passed
through the platform instead to be hard-coded
in the core files.
WoL on Magic Frame can be enabled by using
the ethtool support.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

543876c9

net: Add Gigabit Ethernet driver of Topcliff PCH · 77555ee7

Masayuki Ohtake authored Sep 21, 2010

Signed-off-by: Masayuki Ohtake <masa-korg@dsn.okisemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

77555ee7

24 Sep, 2010 1 commit

ip: take care of last fragment in ip_append_data · 59104f06

Eric Dumazet authored Sep 20, 2010

While investigating a bit, I found ip_fragment() slow path was taken
because ip_append_data() provides following layout for a send(MTU +
N*(MTU - 20)) syscall :

- one skb with 1500 (mtu) bytes
- N fragments of 1480 (mtu-20) bytes (before adding IP header)
last fragment gets 17 bytes of trail data because of following bit:

	if (datalen == length + fraggap)
		alloclen += rt->dst.trailer_len;

Then esp4 adds 16 bytes of data (while trailer_len is 17... hmm...
another bug ?)

In ip_fragment(), we notice last fragment is too big (1496 + 20) > mtu,
so we take slow path, building another skb chain.

In order to avoid taking slow path, we should correct ip_append_data()
to make sure last fragment has real trail space, under mtu...
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

59104f06

23 Sep, 2010 4 commits

net: return operator cleanup · a02cec21

Eric Dumazet authored Sep 22, 2010

Change "return (EXPR);" to "return EXPR;"

return is not a function, parentheses are not required.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a02cec21

e1000: use GRO for receive · 6a08d194

Jesse Brandeburg authored Sep 22, 2010

E1000 can benefit from calling the GRO receive functions.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6a08d194

e1000: fix occasional panic on unload · 338c15e4

Jesse Brandeburg authored Sep 22, 2010

Net drivers in general have an issue where timers fired
by mod_timer or work threads with schedule_work are running
outside of the rtnl_lock.

With no other lock protection these routines are vulnerable
to races with driver unload or reset paths.

The longer term solution to this might be a redesign with
safer locks being taken in the driver to guarantee no
reentrance, but for now a safe and effective fix is
to take the rtnl_lock in these routines.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

338c15e4

e1000: use work queues · 5cf42fcd

Jesse Brandeburg authored Sep 22, 2010

E1000 is using several timers that in a follow on patch
will need to acquire the rtnl_lock in order to be safe.

This patch moves the timer bodies into work queues which
will allow the next patch to add rtnl_lock.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5cf42fcd