Commits · e5a2c899957659cd1a9f789bc462f9c0b35f5150 · nexedi / linux

06 Nov, 2014 14 commits

fast_hash: avoid indirect function calls · e5a2c899

Hannes Frederic Sowa authored Nov 05, 2014

By default the arch_fast_hash hashing function pointers are initialized
to jhash(2). If during boot-up a CPU with SSE4.2 is detected they get
updated to the CRC32 ones. This dispatching scheme incurs a function
pointer lookup and indirect call for every hashing operation.

rhashtable as a user of arch_fast_hash e.g. stores pointers to hashing
functions in its structure, too, causing two indirect branches per
hashing operation.

Using alternative_call we can get away with one of those indirect branches.
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e5a2c899

Merge branch 'amd-xgbe-next' · 2c99cd91

David S. Miller authored Nov 05, 2014

Tom Lendacky says:

====================
amd-xgbe: AMD XGBE driver updates 2014-11-04

The following series of patches includes functional updates to the
driver as well as some trivial changes for function renaming and
spelling fixes.

- Move channel and ring structure allocation into the device open path
- Rename the pre_xmit function to dev_xmit
- Explicitly use the u32 data type for the device descriptors
- Use page allocation for the receive buffers
- Add support for split header/payload receive
- Add support for per DMA channel interrupts
- Add support for receive side scaling (RSS)
- Add support for ethtool receive side scaling commands
- Fix the spelling of descriptors
- After a PCS reset, sync the PCS and PHY modes
- Add dependency on HAS_IOMEM to both the amd-xgbe and amd-xgbe-phy
  drivers

This patch series is based on net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

2c99cd91

amd-xgbe-phy: Let AMD_XGBE_PHY depend on HAS_IOMEM · 5cdec679

Lendacky, Thomas authored Nov 04, 2014

The amd-xgbe-phy driver needs to perform ioremap calls, so add HAS_IOMEM
to its build dependency.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5cdec679

amd-xgbe: Let AMD_XGBE depend on HAS_IOMEM · 474809b9

Lendacky, Thomas authored Nov 04, 2014

The amd-xgbe driver needs to perform ioremap calls, so add HAS_IOMEM
to its build dependency.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

474809b9

amd-xgbe-phy: Sync PCS and PHY modes after reset · 0c95a1fa

Lendacky, Thomas authored Nov 04, 2014

This patch adds support to sync the states of the PCS and the PHY
after a reset is performed.  If the PCS and the PHY are not in the
same state after reset an extra mode change would be performed. This
extra mode change might not be needed if the PCS and the PHY are
synced up after reset.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0c95a1fa

amd-xgbe: Fix a spelling error · a7beaf23

Lendacky, Thomas authored Nov 04, 2014

This patch fixes the spelling of the word "descriptor" in a couple
of locations.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7beaf23

amd-xgbe: Add receive side scaling ethtool support · f6ac8628

Lendacky, Thomas authored Nov 04, 2014

This patch adds support for ethtool receive side scaling (RSS) commands.
Support is added to get/set the RSS hash key and the RSS lookup table.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f6ac8628

amd-xgbe: Provide support for receive side scaling · 5b9dfe29

Lendacky, Thomas authored Nov 04, 2014

This patch provides support for receive side scaling (RSS). RSS allows
for spreading incoming network packets across the Rx queues. When used
in conjunction with the per DMA channel interrupt support, this allows
the receive processing to be spread across multiple processors.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5b9dfe29

amd-xgbe: Add support for per DMA channel interrupts · 9227dc5e

Lendacky, Thomas authored Nov 04, 2014

This patch provides support for interrupts that are generated by the
Tx/Rx DMA channel pairs of the device.  This allows for Tx and Rx
processing to run across multiple processsors.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9227dc5e

amd-xgbe: Implement split header receive support · 174fd259

Lendacky, Thomas authored Nov 04, 2014

Provide support for splitting IP packets so that the header and
payload can be sent to different DMA addresses.  This will allow
the IP header to be put into the linear part of the skb while the
payload can be added as frags.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

174fd259

amd-xgbe: Use page allocations for Rx buffers · 08dcc47c

Lendacky, Thomas authored Nov 04, 2014

Use page allocations for Rx buffers instead of pre-allocating skbs
of a set size.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

08dcc47c

amd-xgbe: Use the u32 data type for descriptors · aa96bd3c

Lendacky, Thomas authored Nov 04, 2014

The Tx and Rx descriptors are unsigned 32 bit values.  Use the u32
type, rather than unsigned int, to map these descriptors.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aa96bd3c

amd-xgbe: Rename pre_xmit function to dev_xmit · a9d41981

Lendacky, Thomas authored Nov 04, 2014

The pre_xmit function name implies that it performs operations prior
to transmitting the packet when in fact it is responsible for setting
up the descriptors and initiating the transmit.  Rename this to
function from pre_xmit to dev_xmit, which is consistent with the name
used during receive processing - dev_read.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a9d41981

amd-xgbe: Move ring allocation to device open · 4780b7ca

Lendacky, Thomas authored Nov 04, 2014

Move the channel and ring tracking structures allocation to device
open.  This will allow for future support to vary the number of Tx/Rx
queues without unloading the module.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4780b7ca

05 Nov, 2014 14 commits

ipv6: move INET6_MATCH() to include/net/inet6_hashtables.h · 25de4668

WANG Cong authored Nov 04, 2014

It is only used in net/ipv6/inet6_hashtables.c.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25de4668

net: Add and use skb_copy_datagram_msg() helper. · 51f3d02b

David S. Miller authored Nov 05, 2014

This encapsulates all of the skb_copy_datagram_iovec() callers
with call argument signature "skb, offset, msghdr->msg_iov, length".

When we move to iov_iters in the networking, the iov_iter object will
sit in the msghdr.

Having a helper like this means there will be less places to touch
during that transformation.

Based upon descriptions and patch from Al Viro.
Signed-off-by: David S. Miller <davem@davemloft.net>

51f3d02b

Merge branch 'gue-next' · 1d76c1d0

David S. Miller authored Nov 05, 2014

Tom Herbert says:

====================
gue: Remote checksum offload

This patch set implements remote checksum offload for
GUE, which is a mechanism that provides checksum offload of
encapsulated packets using rudimentary offload capabilities found in
most Network Interface Card (NIC) devices. The outer header checksum
for UDP is enabled in packets and, with some additional meta
information in the GUE header, a receiver is able to deduce the
checksum to be set for an inner encapsulated packet. Effectively this
offloads the computation of the inner checksum. Enabling the outer
checksum in encapsulation has the additional advantage that it covers
more of the packet than the inner checksum including the encapsulation
headers.

Remote checksum offload is described in:
http://tools.ietf.org/html/draft-herbert-remotecsumoffload-01

The GUE transmit and receive paths are modified to support the
remote checksum offload option. The option contains a checksum
offset and checksum start which are directly derived from values
set in stack when doing CHECKSUM_PARTIAL. On receipt of the option, the
operation is to calculate the packet checksum from "start" to end of
the packet (normally derived for checksum complete), and then set
the resultant value at checksum "offset" (the checksum field has
already been primed with the pseudo header). This emulates a NIC
that implements NETIF_F_HW_CSUM.

The primary purpose of this feature is to eliminate cost of performing
checksum calculation over a packet when encpasulating.

In this patch set:
  - Move fou_build_header into fou.c and split it into a couple of
    functions
  - Enable offloading of outer UDP checksum in encapsulation
  - Change udp_offload to support remote checksum offload, includes
    new GSO type and ensuring encapsulated layers (TCP) doesn't try to
    set a checksum covered by RCO
  - TX support for RCO with GUE. This is configured through ip_tunnel
    and set the option on transmit when packet being encapsulated is
    CHECKSUM_PARTIAL
  - RX support for RCO with GUE for normal and GRO paths. Includes
    resolving the offloaded checksum

v2:
  Address comments from davem: Move accounting for private option
  field in gue_encap_hlen to patch in which we add the remote checksum
  offload option.

Testing:

I ran performance numbers using netperf TCP_STREAM and TCP_RR with 200
streams, comparing GUE with and without remote checksum offload (doing
checksum-unnecessary to complete conversion in both cases). These
were run on mlnx4 and bnx2x. Some mlnx4 results are below.

GRE/GUE
    TCP_STREAM
      IPv4, with remote checksum offload
        9.71% TX CPU utilization
        7.42% RX CPU utilization
        36380 Mbps
      IPv4, without remote checksum offload
        12.40% TX CPU utilization
        7.36% RX CPU utilization
        36591 Mbps
    TCP_RR
      IPv4, with remote checksum offload
        77.79% CPU utilization
	91/144/216 90/95/99% latencies
        1.95127e+06 tps
      IPv4, without remote checksum offload
        78.70% CPU utilization
        89/152/297 90/95/99% latencies
        1.95458e+06 tps

IPIP/GUE
    TCP_STREAM
      With remote checksum offload
        10.30% TX CPU utilization
        7.43% RX CPU utilization
        36486 Mbps
      Without remote checksum offload
        12.47% TX CPU utilization
        7.49% RX CPU utilization
        36694 Mbps
    TCP_RR
      With remote checksum offload
        77.80% CPU utilization
        87/153/270 90/95/99% latencies
        1.98735e+06 tps
      Without remote checksum offload
        77.98% CPU utilization
        87/150/287 90/95/99% latencies
        1.98737e+06 tps

SIT/GUE
    TCP_STREAM
      With remote checksum offload
        9.68% TX CPU utilization
        7.36% RX CPU utilization
        35971 Mbps
      Without remote checksum offload
        12.95% TX CPU utilization
        8.04% RX CPU utilization
        36177 Mbps
    TCP_RR
      With remote checksum offload
        79.32% CPU utilization
        94/158/295 90/95/99% latencies
        1.88842e+06 tps
      Without remote checksum offload
        80.23% CPU utilization
        94/149/226 90/95/99% latencies
        1.90338e+06 tps

VXLAN
    TCP_STREAM
        35.03% TX CPU utilization
        20.85% RX CPU utilization
        36230 Mbps
    TCP_RR
        77.36% CPU utilization
        84/146/270 90/95/99% latencies
        2.08063e+06 tps

We can also look at CPU time in csum_partial using perf (with bnx2x
setup). For GRE with TCP_STREAM I see:

    With remote checksum offload
        0.33% TX
        1.81% RX
    Without remote checksum offload
        6.00% TX
        0.51% RX

I suspect the fact that time in csum_partial noticably increases
with remote checksum offload for RX is due to taking the cache miss on
the encapsulated header in that function. By similar reasoning, if on
the TX side the packet were not in cache (say we did a splice from a
file whose data was never touched by the CPU) the CPU savings for TX
would probably be more pronounced.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

1d76c1d0

gue: Receive side of remote checksum offload · a8d31c12

Tom Herbert authored Nov 04, 2014

Add processing of the remote checksum offload option in both the normal
path as well as the GRO path. The implements patching the affected
checksum to derive the offloaded checksum.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a8d31c12

gue: TX support for using remote checksum offload option · b17f709a

Tom Herbert authored Nov 04, 2014

Add if_tunnel flag TUNNEL_ENCAP_FLAG_REMCSUM to configure
remote checksum offload on an IP tunnel. Add logic in gue_build_header
to insert remote checksum offload option.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b17f709a

gue: Protocol constants for remote checksum offload · c1aa8347

Tom Herbert authored Nov 04, 2014

Define a private flag for remote checksun offload as well as a length
for the option.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c1aa8347

udp: Changes to udp_offload to support remote checksum offload · e585f236

Tom Herbert authored Nov 04, 2014

Add a new GSO type, SKB_GSO_TUNNEL_REMCSUM, which indicates remote
checksum offload being done (in this case inner checksum must not
be offloaded to the NIC).

Added logic in __skb_udp_tunnel_segment to handle remote checksum
offload case.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e585f236

gue: Add infrastructure for flags and options · 5024c33a

Tom Herbert authored Nov 04, 2014

Add functions and basic definitions for processing standard flags,
private flags, and control messages. This includes definitions
to compute length of optional fields corresponding to a set of flags.
Flag validation is in validate_gue_flags function. This checks for
unknown flags, and that length of optional fields is <= length
in guehdr hlen.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5024c33a

udp: Offload outer UDP tunnel csum if available · 4bcb877d

Tom Herbert authored Nov 04, 2014

In __skb_udp_tunnel_segment if outer UDP checksums are enabled and
ip_summed is not already CHECKSUM_PARTIAL, set up checksum offload
if device features allow it.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4bcb877d

net: Move fou_build_header into fou.c and refactor · 63487bab

Tom Herbert authored Nov 04, 2014

Move fou_build_header out of ip_tunnel.c and into fou.c splitting
it up into fou_build_header, gue_build_header, and fou_build_udp.
This allows for other users for TX of FOU or GUE. Change ip_tunnel_encap
to call fou_build_header or gue_build_header based on the tunnel
encapsulation type. Similarly, added fou_encap_hlen and gue_encap_hlen
functions which are called by ip_encap_hlen. New net/fou.h has
prototypes and defines for this.

Added NET_FOU_IP_TUNNELS configuration. When this is set, IP tunnels
can use FOU/GUE and fou module is also selected.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

63487bab

Merge branch 'stmmac-next' · 890b7916

David S. Miller authored Nov 05, 2014

Giuseppe Cavallaro says:

====================
stmmac: review driver Koptions

Recently many Koption options have been added to have new glue logic on several
platforms.

The main goal behind this work is to guarantee that the driver built
fine on all the branches where it is present independently of which
glue logic is selected.

IMHO, it is better to remove all the not necessary Koption(s) that can hide
build problems when something changes in the driver and especially when
the DT compatibility allows us to manage all the platform data.

I compiled the driver w/o any issue on net-next Git for:

  x86, arm and sh4.

In case of there are build problems on some repos now it will be
easy to catch them and cherry-pick patches from mainstream.

For sure, do not hesitate to contact me in case of issue.

Also this set removes STMMAC_DEBUG_FS and BUS_MODE_DA. The latter is useless
and the former can be replaced by DEBUG_FS (always to make safe the build).

V2: patch-set re-based on top of the latest updates for net-next
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

890b7916

stmmac: remove BUS_MODE_DA · 98fbebcb

Giuseppe CAVALLARO authored Nov 04, 2014

This is a very old and often unused option to configure
a bit in a register inside the DMA. This support should
not stay under Koption and should be extended for new chips too.
This will be do later maybe via device-tree parameters.
Also no performance impact when remove this setting on STi platforms.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

98fbebcb

stmmac: remove STMMAC_DEBUG_FS · 50fb4f74

Giuseppe CAVALLARO authored Nov 04, 2014

the STMMAC_DEBUG_FS Koption is now removed from the
driver configuration and this support will be built
by default when DEBUG_FS is present. This can also be
useful on building driver verification.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

50fb4f74

stmmac: remove specific SoC Koption from platform. · c0d54066

Giuseppe CAVALLARO authored Nov 04, 2014

This patch removes all the Koptions added to build the glue-logic files
for all different architectures: DWMAC_MESON, DWMAC_SUNXI, DWMAC_STI ...
Nowadays the stmmac needs to be compiled on several platforms; in some
case it very convenient to guarantee that its build is always completed
with success on all the branches where the driver is present.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c0d54066

04 Nov, 2014 12 commits

net: phy: spi_ks8995: remove sysfs bin file by registered attribute · 30349bdb

Vladimir Zapolskiy authored Nov 04, 2014

When a sysfs binary file is asked to be removed, it is found by
attribute name, so strictly speaking this change is not a fix, but
just in case when attribute name is changed in the driver or sysfs
internals are changed, it might be better to remove the previously
created file using right the same binary attribute.
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

30349bdb

udp: remove blank line between set and test · 6cf1093e

Fabian Frederick authored Nov 04, 2014

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

6cf1093e

ipv6: trivial, add bracket for the if block · 869ba988

Florent Fourcot authored Nov 04, 2014

The "else" block is on several lines and use bracket.
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

869ba988

esp4: remove assignment in if condition · 05006e8c

Fabian Frederick authored Nov 04, 2014

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

05006e8c

Merge branch 'ecn_via_routing_table' · 90284c2b

David S. Miller authored Nov 04, 2014

Florian Westphal says:

====================
net: allow setting ecn via routing table

Here is v4 of the patchset, its exactly the same as v3 except in patch3/3
where I added the missing 'const' qualifier to a function argument that
Eric spotted during review.

I preserved Erics Acks so that he doesn't have to resend them.

v3 cover letter:

When using syn cookies, then do not simply trust that the echoed timestamp
was not modified to make sure that ecn is not turned on magically when it
is disabled on the host.

The first two patches, which were not part of earlier series, prepare
the cookie code for the ecn route metrics change by allowing is to
more easily use the existing dst object for ecn validation.

The 3rd patch adds the ecn route metric feature support.
It is almost the same as in v2, except that we'll now also test the
dst_features when decoding a syn cookie timestamp that indicates ecn support.

These three patches then allow turning on explicit congestion notification
based on the destination network.

For example, assuming the default tcp_ecn sysctl '2', the following will
enable ecn (tcp_ecn=1 behaviour, i.e. request ecn to be enabled for a
tcp connection) for all connections to hosts inside the 192.168.2/24 network:

ip route change 192.168.2.0/24 dev eth0 features ecn

Having a more fine-grained per-route setting can be beneficial for
various reasons, for example 1) within data centers, or 2) local ISPs
may deploy ECN support for their own video/streaming services [1], etc.

Joint work with Daniel Borkmann, feature suggested by Hannes Frederic Sowa.

The patch to enable this in iproute2 will be posted shortly, it is currently
also available here:
http://git.breakpoint.cc/cgit/fw/iproute2.git/commit/?h=iproute_features&id=8843d2d8973fb81c78a7efe6d42e3a17d739003e

[1] http://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf, p.15
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

90284c2b

net: allow setting ecn via routing table · f7b3bec6

Florian Westphal authored Nov 03, 2014

This patch allows to set ECN on a per-route basis in case the sysctl
tcp_ecn is not set to 1. In other words, when ECN is set for specific
routes, it provides a tcp_ecn=1 behaviour for that route while the rest
of the stack acts according to the global settings.

One can use 'ip route change dev $dev $net features ecn' to toggle this.

Having a more fine-grained per-route setting can be beneficial for various
reasons, for example, 1) within data centers, or 2) local ISPs may deploy
ECN support for their own video/streaming services [1], etc.

There was a recent measurement study/paper [2] which scanned the Alexa's
publicly available top million websites list from a vantage point in US,
Europe and Asia:

Half of the Alexa list will now happily use ECN (tcp_ecn=2, most likely
blamed to commit 255cac91 ("tcp: extend ECN sysctl to allow server-side
only ECN") ;)); the break in connectivity on-path was found is about
1 in 10,000 cases. Timeouts rather than receiving back RSTs were much
more common in the negotiation phase (and mostly seen in the Alexa
middle band, ranks around 50k-150k): from 12-thousand hosts on which
there _may_ be ECN-linked connection failures, only 79 failed with RST
when _not_ failing with RST when ECN is not requested.

It's unclear though, how much equipment in the wild actually marks CE
when buffers start to fill up.

We thought about a fallback to non-ECN for retransmitted SYNs as another
global option (which could perhaps one day be made default), but as Eric
points out, there's much more work needed to detect broken middleboxes.

Two examples Eric mentioned are buggy firewalls that accept only a single
SYN per flow, and middleboxes that successfully let an ECN flow establish,
but later mark CE for all packets (so cwnd converges to 1).

 [1] http://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf, p.15
 [2] http://ecn.ethz.ch/

Joint work with Daniel Borkmann.

Reference: http://thread.gmane.org/gmane.linux.network/335797Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

f7b3bec6

syncookies: split cookie_check_timestamp() into two functions · f1673381

Florian Westphal authored Nov 03, 2014

The function cookie_check_timestamp(), both called from IPv4/6 context,
is being used to decode the echoed timestamp from the SYN/ACK into TCP
options used for follow-up communication with the peer.

We can remove ECN handling from that function, split it into a separate
one, and simply rename the original function into cookie_decode_options().
cookie_decode_options() just fills in tcp_option struct based on the
echoed timestamp received from the peer. Anything that fails in this
function will actually discard the request socket.

While this is the natural place for decoding options such as ECN which
commit 172d69e6 ("syncookies: add support for ECN") added, we argue
that in particular for ECN handling, it can be checked at a later point
in time as the request sock would actually not need to be dropped from
this, but just ECN support turned off.

Therefore, we split this functionality into cookie_ecn_ok(), which tells
us if the timestamp indicates ECN support AND the tcp_ecn sysctl is enabled.

This prepares for per-route ECN support: just looking at the tcp_ecn sysctl
won't be enough anymore at that point; if the timestamp indicates ECN
and sysctl tcp_ecn == 0, we will also need to check the ECN dst metric.

This would mean adding a route lookup to cookie_check_timestamp(), which
we definitely want to avoid. As we already do a route lookup at a later
point in cookie_{v4,v6}_check(), we can simply make use of that as well
for the new cookie_ecn_ok() function w/o any additional cost.

Joint work with Daniel Borkmann.
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

f1673381

syncookies: avoid magic values and document which-bit-is-what-option · 274e2da0

Florian Westphal authored Nov 03, 2014

Was a bit more difficult to read than needed due to magic shifts;
add defines and document the used encoding scheme.

Joint work with Daniel Borkmann.
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

274e2da0

igmp: remove camel case definitions · 436f7c20

Fabian Frederick authored Nov 04, 2014

use standard uppercase for definitions
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

436f7c20

udp: remove else after return · c18450a5

Fabian Frederick authored Nov 04, 2014

else is unnecessary after return 0 in __udp4_lib_rcv()
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

c18450a5

inet: frags: remove inline on static in c file · aa1f731e

Fabian Frederick authored Nov 04, 2014

remove __inline__ / inline and let compiler decide what to do
with static functions
Inspired-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

aa1f731e

ipv4: remove 0/NULL assignment on static · 0d3979b9

Fabian Frederick authored Nov 04, 2014

static values are automatically initialized to 0
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

0d3979b9