Commits · c8db3fec5b02f4cefe441903fe1c142ff14e1771 · nexedi / linux

30 Oct, 2008 2 commits

udp: Should use spin_lock_bh()/spin_unlock_bh() in udp_lib_unhash() · c8db3fec

Eric Dumazet authored Oct 30, 2008

Spotted by Alexander Beregalov
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c8db3fec

net: easy removals of HIPQUAD using %pI4 format · 8cf14e38

Harvey Harrison authored Oct 29, 2008

As a bonus, removes some unnecessary byteswapping.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8cf14e38

29 Oct, 2008 20 commits

macvlan: add support for ethtool get settings · 9edb8bb6

Stephen Hemminger authored Oct 29, 2008

If macvlans are used, it is useful to propagate speed and other settings
from the underlying device up for application usage.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

9edb8bb6

printk: remove %p6 format specifier, fix up comments · 6b9a1066

Harvey Harrison authored Oct 29, 2008

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6b9a1066

net: replace %p6 with %pI6 · 5b095d98

Harvey Harrison authored Oct 29, 2008

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5b095d98

net: replace %#p6 format specifier with %pi6 · 4b7a4274

Harvey Harrison authored Oct 29, 2008

gcc warns when using the # modifier with the %p format specifier,
so we can't use it to omit the colons when needed. Introduce
%pi6 instead.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4b7a4274

printk: add %I4, %I6, %i4, %i6 format specifiers · 4aa99606

Harvey Harrison authored Oct 29, 2008

For use in printing IPv4, or IPv6 addresses in the usual way:

%i4 and %I4 are currently equivalent and print the address in
dot-separated decimal x.x.x.x

%I6 prints 16-bit network order hex with colon separators:
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx

%i6 omits the colons.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4aa99606
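
The output layouts described above can be reproduced in userspace. The following is a minimal sketch, not the kernel's vsprintf code: fmt_ip4() and fmt_ip6() are hypothetical helper names, and the `colons` flag is an assumption used here to switch between the %I6 and %i6 layouts.

```c
#include <stdio.h>
#include <stddef.h>

/* Dot-separated decimal (x.x.x.x), the layout described for %I4 / %i4.
 * addr holds the four address bytes in network order. */
int fmt_ip4(char *buf, size_t len, const unsigned char addr[4])
{
    return snprintf(buf, len, "%d.%d.%d.%d",
                    addr[0], addr[1], addr[2], addr[3]);
}

/* Eight 16-bit groups printed in network order as four hex digits each.
 * colons != 0 gives the %I6 layout; colons == 0 omits the separators,
 * as described for %i6. Returns the length written, or -1 if buf is
 * too small. */
int fmt_ip6(char *buf, size_t len, const unsigned char addr[16], int colons)
{
    size_t off = 0;

    for (int i = 0; i < 8; i++) {
        int n = snprintf(buf + off, len - off, "%02x%02x%s",
                         (unsigned)addr[2 * i], (unsigned)addr[2 * i + 1],
                         (colons && i < 7) ? ":" : "");
        if (n < 0 || (size_t)n >= len - off)
            return -1;
        off += (size_t)n;
    }
    return (int)off;
}
```

Because the bytes are consumed in memory order, no byteswapping is needed regardless of host endianness.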

udp: introduce sk_for_each_rcu_safenext() · 96631ed1

Eric Dumazet authored Oct 29, 2008

Corey Minyard found a race added in commit 271b72c7
(udp: RCU handling for Unicast packets.)

 "If the socket is moved from one list to another list in-between the
 time the hash is calculated and the next field is accessed, and the
 socket has moved to the end of the new list, the traversal will not
 complete properly on the list it should have, since the socket will
 be on the end of the new list and there's not a way to tell it's on a
 new list and restart the list traversal.  I think that this can be
 solved by pre-fetching the "next" field (with proper barriers) before
 checking the hash."

This patch corrects this problem, introducing a new
sk_for_each_rcu_safenext() macro.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

96631ed1
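
The idea of pre-fetching "next" before checking the hash can be illustrated with a single-threaded userspace sketch. This is not the kernel macro: for_each_safenext(), struct node and sum_slot() are hypothetical names, the `victim` argument only simulates a writer moving a socket to another chain, and the memory barriers the real code needs are left out.

```c
#include <stddef.h>

struct node {
    struct node *next;
    unsigned int hash;   /* slot this node currently belongs to */
    int value;
};

/* Load pos->next into nxt BEFORE the loop body runs, so the body may
 * observe (or cause) a change of pos->next without derailing the walk. */
#define for_each_safenext(pos, nxt, head) \
    for ((pos) = (head); (pos) && ((nxt) = (pos)->next, 1); (pos) = (nxt))

/* Sum the values of all nodes that still belong to `slot`.
 * When the walk reaches `victim`, the node is "moved" to the end of
 * another chain (hash changed, next cleared) to mimic the race. */
int sum_slot(struct node *head, unsigned int slot, struct node *victim)
{
    struct node *pos, *nxt;
    int sum = 0;

    for_each_safenext(pos, nxt, head) {
        if (pos == victim) {          /* simulated concurrent writer */
            pos->hash = slot + 1;
            pos->next = NULL;
        }
        if (pos->hash != slot)        /* no longer on our chain: skip it */
            continue;
        sum += pos->value;
    }
    return sum;
}
```

With a plain `pos = pos->next` step, the walk would stop at the moved node and miss everything behind it; with the saved pointer it continues on the original chain.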

udp: udp_get_next() should use spin_unlock_bh() · f52b5054

Eric Dumazet authored Oct 29, 2008

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f52b5054

udp: calculate udp_mem based on low memory instead of all memory · 8203efb3

Eric Dumazet authored Oct 29, 2008

This patch mimics commit 57413ebc
(tcp: calculate tcp_mem based on low memory instead of all memory)

The udp_mem array, which contains limits on the total amount of memory
used by UDP sockets, is calculated based on nr_all_pages.  On a 32-bit
x86 system, we should base this on the number of lowmem pages.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8203efb3

udp: RCU handling for Unicast packets. · 271b72c7

Eric Dumazet authored Oct 29, 2008

Goals are:

1) Optimizing handling of incoming Unicast UDP frames, so that no memory
 writes should happen in the fast path.

 Note: Multicasts and broadcasts still will need to take a lock,
 because doing a full lockless lookup in this case is difficult.

2) No expensive operations in the socket bind/unhash phases:
  - No expensive synchronize_rcu() calls.

  - No added rcu_head in the socket structure: it would increase memory
  needs and, more importantly, force us to use call_rcu() calls,
  which have the bad property of making socket structures cold.
  (The RCU grace period between freeing a socket and its potential
   reuse makes the socket cold in the CPU cache.)
  David did a previous patch using call_rcu() and noticed a 20%
  impact on TCP connection rates.
  Quoting Christoph Lameter:
   "Right. That results in cacheline cooldown. You'd want to recycle
    the object as they are cache hot on a per cpu basis. That is screwed
    up by the delayed regular rcu processing. We have seen multiple
    regressions due to cacheline cooldown.
    The only choice in cacheline hot sensitive areas is to deal with the
    complexity that comes with SLAB_DESTROY_BY_RCU or give up on RCU."

  - Because UDP sockets are allocated from a dedicated kmem_cache,
  use of SLAB_DESTROY_BY_RCU can help here.

Theory of operation:
---------------------

As the lookup is lockfree (using rcu_read_lock()/rcu_read_unlock()),
special care must be taken by readers and writers.

Use of SLAB_DESTROY_BY_RCU is tricky too, because a socket can be freed,
reused, and inserted in a different chain, or in the worst case in the same
chain, while readers are doing lookups at the same time.

In order to avoid loops, a reader must check that each socket found in a
chain really belongs to the chain the reader was traversing. If it finds a
mismatch, the lookup must start again at the beginning. This *restart* loop
is the reason we had to use rdlock for the multicast case, because
we don't want to send the same message several times to the same socket.

We use RCU only for fast path.
Thus, /proc/net/udp still takes spinlocks.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

271b72c7
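
The chain-membership check and restart described under "Theory of operation" can be sketched in plain userspace C. This is an illustration only: struct sock_sketch, struct table_sketch, the `port & (NSLOTS - 1)` slot function and both helpers are assumptions made for the example, and the sketch omits rcu_read_lock(), memory barriers and reference counting that the real lookup needs.

```c
#include <stddef.h>

#define NSLOTS 128   /* assumed table size for the sketch */

struct sock_sketch {
    struct sock_sketch *next;
    unsigned int hash;      /* slot the socket is currently hashed in */
    unsigned short port;
};

struct table_sketch {
    struct sock_sketch *slot[NSLOTS];
};

static unsigned int slot_of(unsigned short port)
{
    return port & (NSLOTS - 1);
}

/* Writer side: record the slot in the socket before linking it in. */
void sketch_insert(struct table_sketch *t, struct sock_sketch *sk,
                   unsigned short port)
{
    unsigned int h = slot_of(port);

    sk->port = port;
    sk->hash = h;
    sk->next = t->slot[h];
    t->slot[h] = sk;
}

/* Reader side: every socket met on the chain must still belong to the
 * chain being walked; on a mismatch the lookup restarts from the head. */
struct sock_sketch *sketch_lookup(struct table_sketch *t, unsigned short port)
{
    unsigned int h = slot_of(port);
    struct sock_sketch *sk, *next;

begin:
    for (sk = t->slot[h]; sk != NULL; sk = next) {
        next = sk->next;        /* read "next" before trusting sk */
        if (sk->hash != h)      /* socket was reused on another chain */
            goto begin;
        if (sk->port == port)
            return sk;
    }
    return NULL;
}
```

In a single-threaded run the restart branch is never taken; it only matters when a writer recycles a socket while a reader is in the middle of the chain.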

udp: introduce struct udp_table and multiple spinlocks · 645ca708

Eric Dumazet authored Oct 29, 2008

UDP sockets are hashed in a 128-slot hash table.

This hash table is protected by *one* rwlock.

This rwlock is readlocked each time an incoming UDP message is handled.

This rwlock is writelocked each time a socket must be inserted in the
hash table (bind time), or deleted from this table (close time).

This is not scalable on SMP machines:

1) Even in read mode, lock() and unlock() are atomic operations and
 must dirty a contended cache line, shared by all cpus.

2) A writer might be starved if many readers are 'in flight'. This can
 happen on a machine with a NIC receiving many UDP messages. A user
 process can be delayed a long time at socket creation/dismantle time.

This patch prepares the RCU migration by introducing 'struct udp_table'
and 'struct udp_hslot', and by using one spinlock per chain to reduce
contention on the central rwlock.

Introducing one spinlock per chain reduces latencies for port
randomization on heavily loaded UDP servers. This also speeds up
bindings to specific ports.

udp_lib_unhash() was uninlined, as it was becoming too big.

Some cleanups were done to ease review of following patch
(RCUification of UDP Unicast lookups)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

645ca708
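
The move from one table-wide lock to one lock per chain can be sketched in userspace with C11 atomics. This is a simplified illustration, not the kernel structures: the `_sketch` type names, the `port % SKETCH_SLOTS` slot choice and the spin-on-atomic_flag lock are all assumptions made for the example.

```c
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_SLOTS 128

struct sk_node {
    struct sk_node *next;
    unsigned short port;
};

/* One lock per chain instead of one lock for the whole table. */
struct hslot_sketch {
    atomic_flag lock;
    struct sk_node *head;
};

struct udp_table_sketch {
    struct hslot_sketch hash[SKETCH_SLOTS];
};

static void hslot_lock(struct hslot_sketch *s)
{
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ;   /* spin until the previous holder clears the flag */
}

static void hslot_unlock(struct hslot_sketch *s)
{
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

void sketch_table_init(struct udp_table_sketch *t)
{
    for (int i = 0; i < SKETCH_SLOTS; i++) {
        atomic_flag_clear(&t->hash[i].lock);
        t->hash[i].head = NULL;
    }
}

/* Bind: only the chain for this port is locked. Returns 0 on success,
 * -1 if the port is already present on the chain. */
int sketch_bind(struct udp_table_sketch *t, struct sk_node *sk,
                unsigned short port)
{
    struct hslot_sketch *s = &t->hash[port % SKETCH_SLOTS];
    int ret = 0;

    hslot_lock(s);
    for (struct sk_node *p = s->head; p != NULL; p = p->next) {
        if (p->port == port) {
            ret = -1;
            break;
        }
    }
    if (ret == 0) {
        sk->port = port;
        sk->next = s->head;
        s->head = sk;
    }
    hslot_unlock(s);
    return ret;
}

/* Unhash: again, only the socket's own chain is locked. */
void sketch_unhash(struct udp_table_sketch *t, struct sk_node *sk)
{
    struct hslot_sketch *s = &t->hash[sk->port % SKETCH_SLOTS];

    hslot_lock(s);
    for (struct sk_node **pp = &s->head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == sk) {
            *pp = sk->next;
            break;
        }
    }
    hslot_unlock(s);
}
```

Two threads binding ports that fall in different slots never touch the same lock, which is the contention reduction the commit message describes.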

net: remove NIP6(), NIP6_FMT, NIP6_SEQFMT and final users · b189db5d

Harvey Harrison authored Oct 28, 2008

Open code NIP6_FMT in the one call inside sscanf and one user
of NIP6() that could use %p6 in the netfilter code.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b189db5d

uwb: use the %pM formatting specifier in eda.c · a20fd0a7

Harvey Harrison authored Oct 28, 2008

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a20fd0a7

infiniband: remove IPOIB_GID_RAW_ARG, IPOIB_GID_ARG, IPOIB_GID_FMT · 8c165a83

Harvey Harrison authored Oct 28, 2008

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8c165a83

infiniband: ipoib replace IPOIB_GID_FMT with %p6 · fcace2fe

Harvey Harrison authored Oct 28, 2008

Replace all uses of IPOIB_GID_FMT, IPOIB_GID_RAW_ARG() and IPOIB_GID_ARG()
with %p6.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fcace2fe

infiniband: use %p6 for printing message ids · 8867cd7c

Harvey Harrison authored Oct 28, 2008

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8867cd7c

vlan: propogate ethtool speed values · b3020061

Stephen Hemminger authored Oct 28, 2008

This enables more ethtool information. The speed and settings of the
underlying device are propagated up. This makes services like SNMP, which
use ethtool to get the speed setting, work when managing a vlan without adding
silly heuristics to the SNMP daemon.

For the driver info, just use existing driver strings.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b3020061

veth: remove unused list · 3717746e

Daniel Lezcano authored Oct 28, 2008

The veth network device is stored in a list in the netdev private.
AFAICS, this list is never used so I removed this list from the code.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3717746e

veth: Remove useless veth field · bb7bba3d

Daniel Lezcano authored Oct 28, 2008

The veth private structure contains a netdev pointer referring to its peer.
This field is never used, and it is pointless because if we can access
the veth_priv, that means we already have the netdev which is stored
in veth_priv->dev.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bb7bba3d

net, misc: replace uses of NIP6_FMT with %p6 · fdb46ee7

Harvey Harrison authored Oct 28, 2008

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fdb46ee7

net: replace uses of NIP6_FMT with %p6 · 0c6ce78a

Harvey Harrison authored Oct 28, 2008

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0c6ce78a

28 Oct, 2008 14 commits

netfilter: replace uses of NIP6_FMT with %p6 · 38ff4fa4

Harvey Harrison authored Oct 28, 2008

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

38ff4fa4

misc: replace NIP6_FMT with %p6 format specifier · 1afa67f5

Harvey Harrison authored Oct 28, 2008

The iscsi_ibft.c changes are almost certainly a bugfix as the
pointer 'ip' is a u8 *, so they never print the last 8 bytes
of the IPv6 address, and the eight bytes they do print have
a zero byte with them in each 16-bit word.

Other than that, this should cause no difference in functionality.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1afa67f5

net: replace all current users of NIP6_SEQFMT with %#p6 · b071195d

Harvey Harrison authored Oct 28, 2008

The define in kernel.h can be done away with at a later time.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b071195d

printk: add %p6 format specifier for IPv6 addresses · 689afa7d

Harvey Harrison authored Oct 28, 2008

Takes a pointer to an IPv6 address and formats it in the usual
colon-separated hex format:
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx

Each 16 bit word is printed in network-endian byteorder.

%#p6 is also supported and will omit the colons.

%p6 is a replacement for NIP6_FMT and NIP6()
%#p6 is a replacement for NIP6_SEQFMT and NIP6()

Note that NIP6() took a struct in6_addr whereas this takes a pointer
to a struct in6_addr.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

689afa7d

xfrm: Notify changes in UDP encapsulation via netlink · 3a2dfbe8

Martin Willi authored Oct 28, 2008

Add new_mapping() implementation to the netlink xfrm_mgr to notify
address/port changes detected in UDP encapsulated ESP packets.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

3a2dfbe8

net: don't use INIT_RCU_HEAD · 93adcc80

Alexey Dobriyan authored Oct 28, 2008

call_rcu() will unconditionally rewrite RCU head anyway.
Applies to:
	struct neigh_parms
	struct neigh_table
	struct net
	struct cipso_v4_doi
	struct in_ifaddr
	struct in_device
	rt->u.dst
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

93adcc80

net: reduce structures when XFRM=n · def8b4fa

Alexey Dobriyan authored Oct 28, 2008

ifdef out
* struct sk_buff::sp		(pointer)
* struct dst_entry::xfrm	(pointer)
* struct sock::sk_policy	(2 pointers)
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

def8b4fa

netlink: constify struct nlattr * arg to parsing functions · b057efd4

Patrick McHardy authored Oct 28, 2008

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

b057efd4

netns: Coexist with the sysfs limitations v2 · 3891845e

Eric W. Biederman authored Oct 27, 2008

To make testing of the network namespace simpler, allow
the network namespace code and the sysfs code to be
compiled and run at the same time.  To do this, only
virtual devices are allowed in the additional network
namespaces and those virtual devices are not placed
in the kobject tree.

Since virtual devices don't actually do anything interesting
hardware-wise that needs device management, there should
be no loss in keeping them out of the kobject tree and,
by implication, sysfs.  The gain in ease of testing
and code coverage should be significant.

Changelog:

v2: As pointed out by Benjamin Thery it only makes sense to call
    device_rename in the initial network namespace for now.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Tested-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3891845e

net: convert more to %pM · 7c510e4b

Johannes Berg authored Oct 27, 2008

A number of places still use %02x:...:%02x because it's
in debug statements or for no real reason. Make a few
of them use %pM.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

7c510e4b

net: convert print_mac to %pM · e174961c

Johannes Berg authored Oct 27, 2008

This converts pretty much everything from print_mac to %pM. There were
a few things that had conflicts which I have just dropped for
now, no harm done.

I've built an allyesconfig with this and looked very carefully at
the files that weren't built, but it's a huge patch.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

e174961c

mac80211: convert to %pM away from print_mac · 0c68ae26

Johannes Berg authored Oct 27, 2008

Also remove a few stray DECLARE_MAC_BUF that were no longer
used at all.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

0c68ae26

printk: add %pM format specifier for MAC addresses · dd45c9cf

Harvey Harrison authored Oct 27, 2008

Add format specifiers for printing out six colon-separated bytes:

MAC addresses (%pM):
xx:xx:xx:xx:xx:xx

%#pM is also supported and omits the colon separators.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dd45c9cf

net: implement emergency route cache rebulds when gc_elasticity is exceeded · 1080d709

Neil Horman authored Oct 27, 2008

This is a patch to provide on-demand route cache rebuilding. Currently, our
route cache is rebuilt periodically regardless of need. This introduces
unneeded periodic latency. This patch offers a better approach. Using code
provided by Eric Dumazet, we compute the average and standard deviation of the
hash bucket chain length while running rt_check_expire. Should any given chain
length grow larger than the average plus 4 standard deviations, we trigger an
emergency hash table rebuild for that net namespace. This allows the common
case, in which chains are well behaved and do not grow unevenly, to not incur
any latency at all, while systems whose chains do grow unevenly (which may be
under malicious attack) only rebuild when the attack is detected. This patch
takes 2 other factors into account:
1) chains with multiple entries that differ by attributes that do not affect the
hash value are only counted once, so as not to unduly bias the system toward
rebuilding if features like QOS are heavily used
2) if rebuilding crosses a certain threshold (which is adjustable via the added
sysctl in this patch), route caching is disabled entirely for that net
namespace, since constant rebuilding is less efficient than no caching at all

Tested successfully by me.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1080d709
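
The trigger condition (a chain longer than the average plus 4 standard deviations) can be sketched as a small userspace function. This is an assumption-laden illustration, not the kernel code: chain_needs_rebuild() is a hypothetical name, it works on a plain array of chain lengths rather than on chains sampled during rt_check_expire, and it uses floating point where the kernel would use integer arithmetic.

```c
#include <stddef.h>

/* Return 1 if any chain length exceeds mean + 4 * standard deviation,
 * otherwise 0. The comparison is done on squares, (len - mean)^2 against
 * 16 * variance, so no square root is needed. */
int chain_needs_rebuild(const unsigned int *len, size_t n)
{
    double sum = 0.0, sumsq = 0.0;

    if (n == 0)
        return 0;

    for (size_t i = 0; i < n; i++) {
        sum += (double)len[i];
        sumsq += (double)len[i] * (double)len[i];
    }

    double mean = sum / (double)n;
    double var = sumsq / (double)n - mean * mean;
    if (var < 0.0)          /* guard against rounding below zero */
        var = 0.0;

    for (size_t i = 0; i < n; i++) {
        double d = (double)len[i] - mean;
        if (d > 0.0 && d * d > 16.0 * var)
            return 1;
    }
    return 0;
}
```

One consequence of the 4-sigma rule is that a single long chain in a very small table cannot trigger a rebuild: with one outlier among otherwise equal chains, the outlier sits at most sqrt(n - 1) standard deviations above the mean, so at least 18 buckets are needed before the threshold can be crossed.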

27 Oct, 2008 4 commits

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 · 1d63e726

Linus Torvalds authored Oct 27, 2008

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
  firewire: fw-sbp2: fix races
  firewire: fw-sbp2: delay first login to avoid retries
  firewire: fw-ohci: initialization failure path fixes
  firewire: fw-ohci: don't leak dma memory on module removal
  firewire: fix struct fw_node memory leak
  firewire: Survive more than 256 bus resets

1d63e726

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 · 31390d0f

Linus Torvalds authored Oct 27, 2008

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: ASoC: Blackfin: update SPORT0 port selector (v2)
  ALSA: hda - Restore default pin configs for realtek codecs
  sound: use a common working email address
  pci: use pci_ioremap_bar() in sound/

31390d0f

Merge branches 'topic/fix/asoc', 'topic/fix/hda', 'topic/fix/misc' and 'topic/pci-ioremap-bar' into for-linus · 0a9b8638

Takashi Iwai authored Oct 27, 2008

0a9b8638

ALSA: ASoC: Blackfin: update SPORT0 port selector (v2) · c3e5203b

Cliff Cai authored Oct 27, 2008

- Set the TFS pin selector for SPORT 0 based on whether the selected
  port is F or G. If the port is F, no conflict should exist for the
  TFS. When Port G is selected together with the EMAC, there is a conflict
  between the PHY interrupt line and TFS. The current settings prevent the
  conflict by ignoring the TFS pin when Port G is selected. This allows the
  ssm2602 on Port G and the EMAC to be used concurrently.

- Some code cleanup
Signed-off-by: Cliff Cai <cliff.cai@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

c3e5203b