Commits · 9cde070874b822d4677f4f01fe146991785813b1 · Kirill Smelkov / linux

26 Apr, 2007 40 commits

bridge: add support for user mode STP · 9cde0708

Stephen Hemminger authored Mar 21, 2007

This patchset, based on work by Aji_Srinivas@emc.com, allows
spanning tree to be controlled from userspace.  Like hotplug, it
uses call_usermodehelper when spanning tree is enabled so there
is no visible API change. If the call to start usermode STP fails,
it falls back to the existing kernel STP.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>

9cde0708

bridge: add sysfs hook to flush forwarding table · 9cf63747

Stephen Hemminger authored Apr 09, 2007

The RSTP daemon needs to be able to flush all dynamic forwarding
entries in the case of topology change.

This is a temporary interface. It will change to a netlink interface
before RSTP daemon is officially released.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>

9cf63747

bridge: simpler hash with salt · 3f890923

Stephen Hemminger authored Mar 21, 2007

Instead of hashing the whole Ethernet address, it should be faster
to just use the last 4 bytes. Add a random salt value to the hash
to make it more difficult to construct worst case DoS hash chains.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
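
A minimal userspace sketch of the idea described above: hash only the last 4 bytes of the Ethernet address and mix in a salt. The bucket count, the `mix32` stand-in mixer, and the name `mac_hash` are assumptions for illustration and are not taken from the patch; in the kernel the salt would be chosen randomly at startup.

```c
#include <stdint.h>
#include <string.h>

#define BR_HASH_BITS 8                      /* assumed table size: 256 buckets */
#define BR_HASH_SIZE (1u << BR_HASH_BITS)

/* Stand-in 32-bit integer mixer (an assumption, not the kernel's hash). */
static uint32_t mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x7feb352dU;
    x ^= x >> 15;
    x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

/* Hash only bytes 2..5 of a 6-byte Ethernet address, mixed with a salt. */
static unsigned int mac_hash(const unsigned char mac[6], uint32_t salt)
{
    uint32_t key;

    memcpy(&key, mac + 2, sizeof(key));     /* memcpy avoids an unaligned load */
    return mix32(key ^ salt) & (BR_HASH_SIZE - 1);
}
```

Because the salt is unknown to an attacker, addresses cannot easily be precomputed to land in one bucket; the trade-off is that two addresses differing only in their first two bytes always collide.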

3f890923

bridge: don't route packets while learning · 467aea0d

Stephen Hemminger authored Mar 21, 2007

While in the STP learning state, don't route packets; wait until
forwarding delay has expired. The purpose of the forwarding delay
is to detect loops in the network, and if a brouter started up
and started forwarding, it could cause a flood.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>

467aea0d

bridge: eliminate call by reference · 6229e362

Stephen Hemminger authored Mar 21, 2007

Change the bridging hook to be a simple function with a return value
rather than modifying the skb argument. This could generate better
code and is cleaner.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>

6229e362

[NET]: Treat CHECKSUM_PARTIAL as CHECKSUM_UNNECESSARY · 60476372

Herbert Xu authored Apr 09, 2007

When a transmitted packet is looped back directly, CHECKSUM_PARTIAL
maps to the semantics of CHECKSUM_UNNECESSARY.  Therefore we should
treat it as such in the stack.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
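
One way to express "treat CHECKSUM_PARTIAL as CHECKSUM_UNNECESSARY" is to number the states so that both share a bit, then test that bit. The sketch below assumes such a numbering; the `toy_` names are hypothetical and the constants are an illustration rather than a quote of the kernel headers.

```c
/* Assumed numbering: PARTIAL (3) shares the low bit with UNNECESSARY (1). */
enum {
    CHECKSUM_NONE        = 0,
    CHECKSUM_UNNECESSARY = 1,
    CHECKSUM_COMPLETE    = 2,
    CHECKSUM_PARTIAL     = 3
};

struct toy_skb {
    unsigned char ip_summed;   /* one of the CHECKSUM_* values above */
};

/* Non-zero when the receive path may skip checksum verification. */
static int toy_csum_unnecessary(const struct toy_skb *skb)
{
    return skb->ip_summed & CHECKSUM_UNNECESSARY;
}
```

With this layout a single mask test covers both the "already verified" case and the looped-back "partial" case.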

60476372

[NETDRV]: Perform missing csum_offset conversions · 628592cc

Herbert Xu authored Apr 23, 2007

When csum_offset was introduced we did a conversion from csum to
csum_offset where applicable.  A couple of drivers were missed in
this process.

It was harmless to begin with since the two fields coincided.  Now
that we've made them different with the addition of csum_start, the
missed drivers must be converted or they can't send packets out at
all that require checksum offload.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

628592cc

[NET]: Use csum_start offset instead of skb_transport_header · 663ead3b

Herbert Xu authored Apr 09, 2007

The skb transport pointer is currently used to specify the start
of the checksum region for transmit checksum offload.  Unfortunately,
the same pointer is also used during receive side processing.

This creates a problem when we want to retransmit a received
packet with partial checksums since the skb transport pointer
would be overwritten.

This patch solves this problem by creating a new 16-bit csum_start
offset value to replace the skb transport header for the purpose
of checksums.  This offset is calculated from skb->head so that
it does not have to change when skb->data changes.

No extra space is required since csum_offset itself fits within
a 16-bit word so we can use the other 16 bits for csum_start.

For backwards compatibility, just before we push a packet with
partial checksums off into the device driver, we set the skb
transport header to what it would have been under the old scheme.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
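
The layout described above can be sketched as follows: a 32-bit checksum word shares storage with two 16-bit fields, and the checksum region is located relative to the buffer head rather than the data pointer. `struct toy_skb` and the helper names are hypothetical; only `csum_start`, `csum_offset`, `head` and `data` come from the commit message.

```c
#include <stdint.h>

struct toy_skb {
    unsigned char *head;   /* start of the allocated buffer */
    unsigned char *data;   /* start of current payload; moves on push/pull */
    union {
        uint32_t csum;                 /* full checksum value */
        struct {
            uint16_t csum_start;       /* offset from head where checksumming starts */
            uint16_t csum_offset;      /* offset from csum_start where the result goes */
        };
    };
};

/* First byte covered by the checksum; independent of skb->data. */
static unsigned char *toy_csum_begin(const struct toy_skb *skb)
{
    return skb->head + skb->csum_start;
}

/* Location where the computed checksum should be stored. */
static unsigned char *toy_csum_field(const struct toy_skb *skb)
{
    return toy_csum_begin(skb) + skb->csum_offset;
}
```

Since both helpers start from `head`, pushing or pulling headers (which only moves `data`) leaves the checksum region unchanged.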

663ead3b

[XFRM]: beet: fix worst case header_len calculation · ac758e3c

Patrick McHardy authored Apr 09, 2007

esp_init_state doesn't account for the beet pseudo header in the header_len
calculation, which may result in undersized skbs hitting xfrm4_beet_output,
causing unnecessary reallocations in ip_finish_output2.

The skbs should still always have enough room to avoid causing
skb_under_panic in skb_push since we have at least 16 bytes available
from LL_RESERVED_SPACE in xfrm_state_check_space.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ac758e3c

[XFRM]: Optimize MTU calculation · c5c25238

Patrick McHardy authored Apr 09, 2007

Replace the probing based MTU estimation, which usually takes 2-3 iterations
to find a fitting value and may underestimate the MTU, by an exact calculation.

Also fix underestimation of the XFRM trailer_len, which causes unnecessary
reallocations.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

c5c25238

[XFRM]: esp: fix skb_tail_pointer conversion bug · 55792258

Patrick McHardy authored Apr 09, 2007

Fix incorrect switch of "trailer" skb by "skb" during skb_tail_pointer
conversion:

-       *(u8*)(trailer->tail - 1) = top_iph->protocol;
+       *(skb_tail_pointer(skb) - 1) = top_iph->protocol;

-       *(u8 *)(trailer->tail - 1) = *skb_network_header(skb);
+       *(skb_tail_pointer(skb) - 1) = *skb_network_header(skb);
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

55792258

[SK_BUFF]: Fix missing offset adjustment in pskb_expand_head · 56eb8882

Patrick McHardy authored Apr 09, 2007

Since we're increasing the headroom, the header offsets need to be
increased by the same amount as well.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
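
A toy illustration of why the offsets must move: when header positions are stored as offsets from the buffer head and extra headroom is inserted in front, every offset has to grow by the same amount to keep pointing at the same bytes. `struct toy_buf` and `toy_expand_head` are hypothetical names, not the kernel's data structures.

```c
#include <stdlib.h>
#include <string.h>

struct toy_buf {
    unsigned char *head;     /* start of allocation */
    size_t size;             /* total allocation size */
    size_t data_off;         /* offsets below are relative to head */
    size_t network_off;
    size_t transport_off;
};

/* Grow the headroom by nhead bytes; returns 0 on success, -1 on failure. */
static int toy_expand_head(struct toy_buf *b, size_t nhead)
{
    unsigned char *nbuf = malloc(b->size + nhead);

    if (!nbuf)
        return -1;
    memset(nbuf, 0, nhead);                   /* new, empty headroom */
    memcpy(nbuf + nhead, b->head, b->size);   /* old contents shifted by nhead */
    free(b->head);

    b->head = nbuf;
    b->size += nhead;
    /* The fix: shift every head-relative offset by the added headroom. */
    b->data_off += nhead;
    b->network_off += nhead;
    b->transport_off += nhead;
    return 0;
}
```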

56eb8882

[IPV6] FIB6RULE: Find source address during looking up route. · 29f6af77

YOSHIFUJI Hideaki authored Apr 06, 2007

When looking up route for destination with rules with
source address restrictions, we may need to find a source
address for the traffic if not given.

Based on patch from Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

29f6af77

[XFRM]: beet: minor cleanups · ea2f10a3

Patrick McHardy authored Apr 05, 2007

Remove unnecessary initialization/variable.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ea2f10a3

[RTNL]: Improve error codes for unsupported operations · 038890fe

Thomas Graf authored Apr 05, 2007

The most common trigger of these errors is that the config
option which would make the functionality available hasn't
been enabled. Therefore returning EOPNOTSUPP gives a better
idea of what is going wrong.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

038890fe

[NET]: Move generic skbuff stuff from XFRM code to generic code · 716ea3a7

David Howells authored Apr 02, 2007

Move generic skbuff stuff from XFRM code to generic code so that
AF_RXRPC can use it too.

The kdoc comments I've attached to the functions need to be checked
by whoever wrote the functions, as I had to make some guesses about
their workings.
Signed-off-By: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

716ea3a7

[CREDITS]: Update Arnaldo entry · 926554c4

Arnaldo Carvalho de Melo authored Mar 31, 2007

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

926554c4

[SK_BUFF]: Some more conversions to skb_copy_from_linear_data · 1a4e2d09

Arnaldo Carvalho de Melo authored Mar 31, 2007

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

1a4e2d09

[SK_BUFF]: Introduce skb_copy_to_linear_data{_offset} · 27d7ff46

Arnaldo Carvalho de Melo authored Mar 31, 2007

To clearly state the intent of copying to linear sk_buffs, _offset being an
overly long variant but interesting for the sake of saving some bytes.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

27d7ff46

[NET]: Fix warnings in 3c523.c and ni52.c · 3dbad80a

David S. Miller authored Mar 29, 2007

We have to put back the cast to "char *" because these
pointers are volatile.

Reported by Andrew Morton.
Signed-off-by: David S. Miller <davem@davemloft.net>

3dbad80a

[NET]: Inline net_device_stats · c45d286e

Rusty Russell authored Mar 28, 2007

Network drivers which keep stats allocate their own stats structure
then write a get_stats() function to return them.  It would be nice if
this were done by default.

1) Add a new "stats" field to "struct net_device".
2) Add a new feature field to say "this driver uses the internal one"
3) Have a default "get_stats" which returns NULL if that feature not set.
4) Change callers to check result of get_stats call for NULL, not if
   ->get_stats is set.

This should not break backwards compatibility with older drivers, yet
allow modern drivers to shed some boilerplate code.

Lightly tested: works for a modified lguest network driver.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
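
The four steps above can be sketched in plain C. Everything prefixed `toy_`/`TOY_` is hypothetical, including the feature bit; the sketch only shows the shape of the default hook and the NULL check on the caller side.

```c
#include <stddef.h>

struct toy_stats {
    unsigned long rx_packets;
    unsigned long tx_packets;
};

#define TOY_F_INTERNAL_STATS 0x1u   /* hypothetical feature bit (step 2) */

struct toy_netdev {
    unsigned int features;
    struct toy_stats stats;                                   /* step 1 */
    struct toy_stats *(*get_stats)(struct toy_netdev *dev);
};

/* Step 3: default hook; NULL when the driver has not opted in. */
static struct toy_stats *toy_default_get_stats(struct toy_netdev *dev)
{
    if (dev->features & TOY_F_INTERNAL_STATS)
        return &dev->stats;
    return NULL;
}

/* Step 4: callers test the returned pointer, not the hook itself. */
static unsigned long toy_rx_packets(struct toy_netdev *dev)
{
    struct toy_stats *s = dev->get_stats(dev);

    return s ? s->rx_packets : 0;
}
```

An older driver that supplies its own `get_stats` keeps working unchanged, while a newer one can set the feature bit and drop its private stats structure.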

c45d286e

[NET]: random functions can use nsec resolution instead of usec · f8595815

Eric Dumazet authored Mar 28, 2007

In order to get more randomness for secure_tcpv6_sequence_number(),
secure_tcp_sequence_number(), secure_dccp_sequence_number() functions,
we can use the high resolution time services, providing nanosec
resolution.

I've also done two kmalloc()/kzalloc() conversions.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f8595815

[NET] fib_rules: delay route cache flush by ip_rt_min_delay · 4b19ca44

Thomas Graf authored Mar 28, 2007

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

4b19ca44

[SK_BUFF]: Introduce skb_copy_from_linear_data{_offset} · d626f62b

Arnaldo Carvalho de Melo authored Mar 27, 2007

To clearly state the intent of copying from linear sk_buffs, _offset being an
overly long variant but interesting for the sake of saving some bytes.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

d626f62b

[BLUETOOTH]: Introduce skb->data accessor methods for hci_{acl,event,sco}_hdr · 2a123b86

Arnaldo Carvalho de Melo authored Mar 27, 2007

For consistency with other skb data accessors, reducing the number of direct
accesses to skb->data.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

2a123b86

[IPV4]: align inet_protos[] on SMP · 03d4f879

Eric Dumazet authored Mar 27, 2007

As IPPROTO_TCP is 6, it makes sense to make sure inet_protos[] array
is properly cache line aligned to avoid false sharing on SMP.

c0680540 b peer_total
c0680544 b inet_peer_unused_head
c0680560 B inet_protos

In this i386 example, we can see that inet_protos[IPPROTO_TCP] shares
a potentially hot (and modified) cache line.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
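
The arithmetic behind that observation can be checked with a few lines of C. The sketch assumes a 64-byte cache line (the commit message does not state the size) and 4-byte pointers on i386; the `toy_` helpers are hypothetical.

```c
#include <stdint.h>

#define TOY_CACHE_LINE  64u   /* assumed cache line size in bytes */
#define TOY_PTR_SIZE    4u    /* pointer size on i386 */
#define TOY_IPPROTO_TCP 6u

/* Index of the cache line that contains addr. */
static uint32_t toy_line_of(uint32_t addr)
{
    return addr / TOY_CACHE_LINE;
}

/* Address of slot idx in an array of pointers starting at base. */
static uint32_t toy_slot_addr(uint32_t base, unsigned int idx)
{
    return base + idx * TOY_PTR_SIZE;
}
```

With the symbol addresses listed above, `toy_slot_addr(0xc0680560, TOY_IPPROTO_TCP)` is `0xc0680578`, which lies in the 64-byte line starting at `0xc0680540`, the same line that holds `peer_total` and `inet_peer_unused_head`.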

03d4f879

[TCP]: tcp_memory_pressure and tcp_socket are __read_mostly candidates · 4103f8cd

Eric Dumazet authored Mar 27, 2007

tcp_memory_pressure and tcp_socket currently share a cache line with tcp_memory_allocated, tcp_sockets_allocated.
(Very hot cache line)
It makes sense to declare these variables as __read_mostly, to avoid false sharing on SMP.

ffffffff8081d9c0 B tcp_orphan_count
ffffffff8081d9c4 B tcp_memory_allocated
ffffffff8081d9c8 B tcp_sockets_allocated
ffffffff8081d9cc B tcp_memory_pressure
ffffffff8081d9d0 b tcp_md5sig_users
ffffffff8081d9d8 b tcp_md5sig_pool
ffffffff8081d9e0 b warntime.31570
ffffffff8081d9e8 b tcp_socket
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4103f8cd

[NET] fib_rules: Flush route cache after rule modifications · 73417f61

Thomas Graf authored Mar 27, 2007

The results of FIB rules lookups are cached in the routing cache
except for IPv6 as no such cache exists. So far, it was the
responsibility of the user to flush the cache after modifying any
rules. This led to many false bug reports due to misunderstanding
of this concept.

This patch automatically flushes the route cache after inserting
or deleting a rule.

Thanks to Muli Ben-Yehuda <muli@il.ibm.com> for catching a bug
in the previous patch.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

73417f61

[NET]: inet_ehash_secret should be __read_mostly and set only once · be776281

Eric Dumazet authored Mar 27, 2007

There is a very tiny probability that build_ehash_secret() is called
at the same time by different CPUs.

Also, using __read_mostly is a must for inet_ehash_secret.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
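
A common way to make "set only once" robust against concurrent callers is a compare-and-exchange against the uninitialised value. The sketch below uses C11 atomics in userspace as a stand-in; the names are hypothetical and it is not claimed to be what the patch itself does.

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t toy_secret;   /* 0 means "not yet initialised" */

/*
 * Publish candidate only if no secret has been set yet and return the
 * secret that is in effect. candidate must be non-zero, because zero
 * is reserved as the "unset" marker.
 */
static uint32_t toy_set_secret_once(uint32_t candidate)
{
    uint32_t expected = 0;

    /* Succeeds for exactly one caller; later callers leave the value alone. */
    atomic_compare_exchange_strong(&toy_secret, &expected, candidate);
    return atomic_load(&toy_secret);
}
```

If two CPUs race, both end up using the value written by whichever exchange succeeded first, so the hash input never changes after first use.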

be776281

[NET]: Allow forwarding of ip_summed except CHECKSUM_COMPLETE · 35fc92a9

Herbert Xu authored Mar 26, 2007

Right now Xen has a horrible hack that lets it forward packets with
partial checksums.  One of the reasons that CHECKSUM_PARTIAL and
CHECKSUM_COMPLETE were added is so that we can get rid of this hack
(where it creates two extra bits in the skbuff to essentially mirror
ip_summed without being destroyed by the forwarding code).

I had forgotten that I've already gone through all the device drivers
last time around to make sure that they're looking at ip_summed ==
CHECKSUM_PARTIAL rather than ip_summed != 0 on transmit.  In any case,
I've now done that again so it should definitely be safe.

Unfortunately nobody has yet added any code to update CHECKSUM_COMPLETE
values on forward so I'm setting that to CHECKSUM_NONE.  This should
be safe to remove for bridging but I'd like to check that code path
first.

So here is the patch that lets us get rid of the hack by preserving
ip_summed (mostly) on forwarded packets.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

35fc92a9

[IPV4] LVS: Allow to send ICMP unreachable responses when real-servers are removed · 2d771cd8

Janusz Krzysztofik authored Mar 26, 2007

This is a small patch by Janusz Krzysztofik to ip_route_output_slow()
that allows a VIP-less LVS linux director to generate packets
originating from the VIP if sysctl_ip_nonlocal_bind is set.

In a nutshell, the intention is for an LVS linux director to be able
to send ICMP unreachable responses to end-users when real-servers are
removed.

http://archive.linuxvirtualserver.org/html/lvs-users/2007-01/msg00106.html
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

2d771cd8

[NET] fib_rules: Add no-operation action · fa0b2d1d

Thomas Graf authored Mar 26, 2007

The use of nop rules simplifies the usage of goto rules
and adds more flexibility as they allow targets to remain
while the actual content of the branches can change easily.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

fa0b2d1d

[NET] fib_rules: Mark rules detached from the device · 2b443683

Thomas Graf authored Mar 26, 2007

Rules which match against device names in their selector can
remain while the device itself disappears; in fact the device
doesn't have to be present when the rule is added in the first
place. The device name is resolved when the rule is added and
later by listening to NETDEV_REGISTER/UNREGISTER
notifications.

This patch adds the flag FIB_RULE_DEV_DETACHED which is set
towards userspace when a rule contains a device match which
is unresolved at the moment. This eases spotting the reason
why certain rules seem not to function properly.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

2b443683

[NET] fib_rules: goto rule action · 0947c9fe

Thomas Graf authored Mar 26, 2007

This patch adds a new rule action FR_ACT_GOTO which allows
to skip a set of rules by jumping to another rule. The rule
to jump to is specified via the FRA_GOTO attribute which
carries a rule preference.

Referring to a rule which doesn't exist is explicitly allowed.
Such goto rules are marked with the flag FIB_RULE_UNRESOLVED
and will act like a rule with a non-matching selector. The rule
will become functional as soon as its target is present.

The goto action enables performance optimizations by reducing
the average number of rules that have to be passed per lookup.

Example:
0:      from all lookup local
40:     not from all to 192.168.23.128 goto 32766
41:     from all fwmark 0xa blackhole
42:     from all fwmark 0xff blackhole
32766:  from all lookup main
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

0947c9fe

[WAN] cosa.c: Build fix. · 2f7826c0

David S. Miller authored Mar 26, 2007

Caused by skb_reset_mac_header() changes, missing semicolon.
Signed-off-by: David S. Miller <davem@davemloft.net>

2f7826c0

[TCP] tcp_probe: improvements for net-2.6.22 · 85795d64

Stephen Hemminger authored Mar 24, 2007

Change tcp_probe to use ktime (needed to add one export).
Add option to only get events when cwnd changes - from Doug Leith.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

85795d64

[TCP]: cubic update for net-2.6.22 · e1c3e7ab

Stephen Hemminger authored Mar 24, 2007

The following update received from Injong updates TCP cubic to the latest
version. I am running more complete tests and will have results after 4/1.

According to Injong: the new version improves on its scalability,
fairness and stability. So in all properties, we confirmed it shows better
performance.

NCSU results (for 2.6.18 and 2.6.20) available:
http://netsrv.csc.ncsu.edu/wiki/index.php/TCP_Testing

This version is described in a new Internet draft for CUBIC.
http://www.ietf.org/internet-drafts/draft-rhee-tcp-cubic-00.txt
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1c3e7ab

[NET] Move DF check to ip_forward · 9af3912e

John Heffner authored Mar 25, 2007

Do fragmentation check in ip_forward, similar to ipv6 forwarding.
Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
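
The fragmentation check amounts to a simple predicate: a packet that is larger than the outgoing MTU and carries the Don't Fragment flag cannot be forwarded and should trigger an ICMP "fragmentation needed" reply instead. The sketch below is a userspace model with hypothetical names; `frag_off_host` is assumed to be already converted to host byte order.

```c
#include <stdint.h>

#define TOY_IP_DF 0x4000u   /* Don't Fragment flag, host byte order */

enum toy_verdict { TOY_FORWARD, TOY_FRAG_NEEDED };

/* Decide whether a packet may be forwarded over a link with the given MTU. */
static enum toy_verdict toy_forward_check(uint32_t pkt_len, uint32_t mtu,
                                          uint16_t frag_off_host)
{
    if (pkt_len > mtu && (frag_off_host & TOY_IP_DF))
        return TOY_FRAG_NEEDED;
    return TOY_FORWARD;
}
```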

9af3912e

[INET]: Use jhash + random secret for ehash. · b3da2cf3

David S. Miller authored Mar 23, 2007

The days are gone when this was not an issue; there are folks out
there with huge bot networks that can be used to attack the
established hash tables on remote systems.

So just like the routing cache and connection tracking
hash, use Jenkins hash with random secret input.
Signed-off-by: David S. Miller <davem@davemloft.net>

b3da2cf3

[NETLINK]: introduce NLA_BINARY type · d30045a0

Johannes Berg authored Mar 23, 2007

This patch introduces a new NLA_BINARY attribute policy type with the
verification of simply checking the maximum length of the payload.

It also fixes a small typo in the example.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
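
The verification described for the binary attribute type is a maximum-length check on the payload. A minimal sketch, with hypothetical names and the assumption that a maximum of 0 means "no limit":

```c
#include <errno.h>
#include <stddef.h>

struct toy_policy {
    size_t max_len;   /* maximum payload length; 0 = unlimited (assumption) */
};

/* Returns 0 if the payload length is acceptable, -ERANGE if it is too long. */
static int toy_validate_binary(const struct toy_policy *pt, size_t payload_len)
{
    if (pt->max_len && payload_len > pt->max_len)
        return -ERANGE;
    return 0;
}
```

Only an upper bound is enforced, so shorter payloads (including empty ones) pass the policy check and any minimum-length requirement has to be handled by the caller.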

d30045a0