Commits · fece33c19563aeb6b9a00ca7a466093ae58e6158 · Kirill Smelkov / iproute2

29 Nov, 2015 16 commits

Merge branch 'master' into net-next · fece33c1
Stephen Hemminger authored Nov 29, 2015

fece33c1

vxlan: Add support for remote checksum offload · 35f59d86

Tom Herbert authored Nov 27, 2015

This patch adds support for remote checksum offload
to VXLAN. It adds remcsumtx and remcsumrx to the ip vxlan
configuration to enable remote checksum offload for transmit
and receive on the VXLAN tunnel.

https://tools.ietf.org/html/draft-herbert-vxlan-rco-00

Example:

ip link add name vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0 \
    udpcsum remcsumtx remcsumrx

Testing:

Ran a single netperf over mlnx4 to illustrate the effect:

- Without RCO (UDP csum set to zero)
  4335.99 Mbps
- With RCO enabled
  7661.81 Mbps
Signed-off-by: Tom Herbert <tom@herbertland.com>

35f59d86

get rid of unnecessary fgets() buffer size limitation · 61170fd8

Phil Sutter authored Nov 28, 2015

fgets() will read at most size-1 bytes into the buffer and add a
terminating null-char at the end. Therefore it is not necessary to pass
a reduced buffer size when calling it.
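
The fgets() behaviour described above can be checked with a minimal sketch. The helper name read_chunk() is hypothetical and not part of iproute2; it only shows that passing the full buffer size still leaves room for the terminating null character:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read from fp into buf, passing the FULL buffer
 * size to fgets(). Returns the number of characters stored (without
 * the terminating null character), or 0 on EOF/error. */
static size_t read_chunk(FILE *fp, char *buf, size_t size)
{
	assert(fp && buf && size > 0);

	if (!fgets(buf, (int)size, fp))
		return 0;

	/* fgets() stored at most size - 1 characters plus '\0' */
	return strlen(buf);
}
```

With a 4-byte buffer, each call stores at most three characters, so the buffer is always null-terminated without reducing the size argument.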

This change was generated using the following semantic patch:

@@
identifier buf, fp;
@@
- fgets(buf, sizeof(buf) - 1, fp)
+ fgets(buf, sizeof(buf), fp)
Signed-off-by: Phil Sutter <phil@nwl.cc>

61170fd8

get rid of remaining -Wunused-result warnings · d572ed4d

Phil Sutter authored Nov 28, 2015

Although it is not fundamentally necessary to check return codes in these
spots, silencing these warnings will bring new ones into focus.
Signed-off-by: Phil Sutter <phil@nwl.cc>

d572ed4d

ss: review is_ephemeral() · c29d3792

Phil Sutter authored Nov 28, 2015

There is no need to keep the static port boundaries global, as they are not
used directly. Keeping them local also allows their names to be safely
reduced to the minimum. Assign hardcoded fallback values also if fscanf()
fails. Get rid of unnecessary braces around return parameter.

Rather than more or less duplicating is_ephemeral() in run_ssfilter(),
simply call the function.
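
A rough sketch of the fallback idea, not the actual ss code: the function name is_ephemeral_from(), its path parameter, and the fallback values 1024 and 4999 are assumptions chosen for illustration. Unlike a cached implementation, this sketch re-reads the file on every call:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical variant of an ephemeral-port check. The boundaries are
 * local to the function; the fallback values (assumed here) stay in
 * effect if the file cannot be opened OR if fscanf() fails. */
static int is_ephemeral_from(const char *path, int port)
{
	int min = 1024, max = 4999;	/* assumed fallback values */
	FILE *f;

	assert(path);

	f = fopen(path, "r");
	if (f) {
		int lo, hi;

		/* only override the fallbacks on a complete parse */
		if (fscanf(f, "%d %d", &lo, &hi) == 2) {
			min = lo;
			max = hi;
		}
		fclose(f);
	}
	return port >= min && port <= max;
}
```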
Signed-off-by: Phil Sutter <phil@nwl.cc>

c29d3792

ss: reduce max indentation level in init_service_resolver() · 596307ea

Phil Sutter authored Nov 28, 2015

Exit early or continue on error instead of nesting conditionals, to make
reading the code a bit easier.

Also, the call to memcpy() can be skipped by initialising prog with the
desired prefix.
Signed-off-by: Phil Sutter <phil@nwl.cc>

596307ea

lnstat: review lnstat_update() · db3ef44c

Phil Sutter authored Nov 28, 2015

Instead of calling rewind() and fgets() before every call to
scan_lines(), move them into scan_lines() itself.

This should also fix compat mode, as before the second call to
scan_lines() the first line was skipped unconditionally.
Signed-off-by: Phil Sutter <phil@nwl.cc>

db3ef44c

bridge.8: minor formatting cleanup · fc31817d

Phil Sutter authored Nov 24, 2015

- Replace commas at end of subsection with dots.
- Replace double whitespace by single one.
Signed-off-by: Phil Sutter <phil@nwl.cc>

fc31817d

iproute: restrict hoplimit values to be in range [0; 255] · ea6cbab7

Phil Sutter authored Nov 24, 2015

Technically, the range of possible hoplimit values is defined by the IPv4
and IPv6 header formats. Both define the field to be eight bits in size,
which leads to a value range of [0;255]. Setting a packet's hoplimit
field to 0 makes little sense though, as the next hop would
immediately drop the packet. Therefore Linux uses 0 as a special value
indicating to use the system's default hoplimit (configurable via
sysctl). In iproute, setting the hoplimit of a route to 0 is equivalent
to omitting the hoplimit parameter altogether, so it is actually not
necessary to allow that value to be specified, but keep it anyway for
backwards compatibility.
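
The range restriction can be sketched as follows. parse_hoplimit() is a hypothetical helper written for this illustration, not the function iproute2 uses:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a hoplimit argument and accept only
 * values in [0;255]. Returns 0 on success, -1 on error. A result of
 * 0 stands for "use the system's default hoplimit". */
static int parse_hoplimit(const char *arg, unsigned int *hoplimit)
{
	char *end;
	unsigned long val;

	assert(arg && hoplimit);

	/* reject empty input and explicit negative numbers */
	if (*arg == '\0' || *arg == '-')
		return -1;

	errno = 0;
	val = strtoul(arg, &end, 0);
	if (errno || *end != '\0' || val > 255)
		return -1;

	*hoplimit = (unsigned int)val;
	return 0;
}
```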
Signed-off-by: Phil Sutter <phil@nwl.cc>

ea6cbab7

iptoken: simplify iptoken_list a bit · d81f54d5

Phil Sutter authored Nov 24, 2015

Since it uses only a single filter, rtnl_dump_filter() can be used.
Signed-off-by: Phil Sutter <phil@nwl.cc>

d81f54d5

ipaddress: drop unnecessary check in ipaddr_list_flush_or_save() · 906dfe48
Phil Sutter authored Nov 24, 2015
```
Right after ipaddr_reset_filter(), filter.family is always AF_UNSPEC.
Signed-off-by: Phil Sutter <phil@nwl.cc>
```
906dfe48

ipaddress: fix ipaddr_flush for Linux >= 3.1 · d25ec03e

Phil Sutter authored Nov 24, 2015

Linux version 3.1 introduced a consistency check for netlink dumps in
commit 670dc28 ("netlink: advertise incomplete dumps"). This bites
iproute2 when flushing more addresses than can fit into a single
RTM_GETADDR response. To silence the spurious error message "Dump was
interrupted and may be inconsistent.", advise rtnl_dump_filter_l() to
not care about NLM_F_DUMP_INTR.
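
The idea of a per-filter ignore mask might look roughly like the sketch below. The helper dump_was_interrupted() is hypothetical; only nc_flags, nlmsg_flags and NLM_F_DUMP_INTR are names taken from the commits and the kernel headers:

```c
#include <assert.h>
#include <linux/netlink.h>	/* struct nlmsghdr, NLM_F_DUMP_INTR */

/* Hypothetical helper: report whether a dump message signals an
 * interrupted dump, after masking out the flags the caller asked to
 * ignore via nc_flags. */
static int dump_was_interrupted(const struct nlmsghdr *h,
				unsigned short nc_flags)
{
	unsigned short flags;

	assert(h);

	/* drop every flag the filter does not care about */
	flags = h->nlmsg_flags & (unsigned short)~nc_flags;

	return (flags & NLM_F_DUMP_INTR) != 0;
}
```

A flush would pass NLM_F_DUMP_INTR as nc_flags, while a plain listing would pass 0 and still see the flag.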
Signed-off-by: Phil Sutter <phil@nwl.cc>

d25ec03e

libnetlink: introduce nc_flags · 8e72880f

Phil Sutter authored Nov 24, 2015

Allow for a filter to ignore certain nlmsg_flags.
Signed-off-by: Phil Sutter <phil@nwl.cc>

8e72880f

ipaddress: simplify ipaddr_flush() · c6995c48

Phil Sutter authored Nov 24, 2015

Since it's no longer relevant whether an IP address is primary or
secondary when flushing, ipaddr_flush() can be simplified a bit.
Signed-off-by: Phil Sutter <phil@nwl.cc>

c6995c48

rt_names: style cleanup · 68ef5072
Stephen Hemminger authored Nov 29, 2015
```
Cleanup all checkpatch complaints about whitespace in rt_names.
```
68ef5072

Add support for rt_tables.d · 13ada95d

David Ahern authored Nov 24, 2015

Add support for reading table id/name mappings from rt_tables.d
directory.
Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>

13ada95d

24 Nov, 2015 3 commits

geneve: add support for IPv6 link partners · 906ac543
John W. Linville authored Sep 24, 2015
```
Signed-off-by: John W. Linville <linville@tuxdriver.com>
```
906ac543
geneve: add support for IPv6 link partners · 6581df5e
John W. Linville authored Sep 24, 2015
```
Signed-off-by: John W. Linville <linville@tuxdriver.com>
```
6581df5e

{f,m}_bpf: allow for sharing maps · 32e93fb7

Daniel Borkmann authored Nov 13, 2015

This larger work addresses one of the bigger remaining issues on
tc's eBPF frontend, that is, to allow for persistent file descriptors.
Whenever tc parses the ELF object, extracts and loads maps into the
kernel, these file descriptors will be out of reach after the tc
instance exits.

Meaning, for simple (unnested) programs which contain one or
multiple maps, the kernel holds a reference, and they will live
on inside the kernel until the program holding them is unloaded,
but they will be out of reach for user space, even worse with
(also multiple nested) tail calls.

For this issue, we introduced the concept of an agent that can
receive the set of file descriptors from the tc instance creating
them, in order to be able to further inspect/update map data for
a specific use case. However, while that is more tied towards
specific applications, it still doesn't easily allow for sharing
maps across multiple tc instances and would require a daemon to
be running in the background. For example, when a map should be
shared by two eBPF programs, one attached to ingress, one to egress,
this currently doesn't work with the tc frontend.

This work solves exactly that, i.e. if requested, maps can now be
_arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within
a single object (but various program sections, PIN_OBJECT_NS) without
"losing" the file descriptor set. To make that happen, we use eBPF
object pinning introduced in kernel commit b2197755b263 ("bpf: add
support for persistent maps/progs") for exactly this purpose.

The shipped examples/bpf/bpf_shared.c code from this patch can be
easily applied, for instance, as:

 - classifier-classifier shared:

  tc filter add dev foo parent 1: bpf obj shared.o sec egress
  tc filter add dev foo parent ffff: bpf obj shared.o sec ingress

 - classifier-action shared (here: late binding to a dummy classifier):

  tc actions add action bpf obj shared.o sec egress pass index 42
  tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
  tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
     action bpf index 42

The toy example increments a shared counter on egress and dumps its
value on ingress (had no sharing (PIN_NONE) been chosen, the map value
would be 0, of course, since two separate map instances would be created):

  [...]
          <idle>-0     [002] ..s. 38264.788234: : map val: 4
          <idle>-0     [002] ..s. 38264.788919: : map val: 4
          <idle>-0     [002] ..s. 38264.789599: : map val: 5
  [...]

... thus if both sections reference the pinned map(s) in question,
tc will take care of fetching the appropriate file descriptor.

The patch has been tested extensively on both the classifier and
action sides.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

32e93fb7

23 Nov, 2015 21 commits

iproute2: Ignore EADDRNOTAVAIL errors during address flush operation · e149d4e8

Neil Horman authored Nov 05, 2015

I found recently that, if I disabled address promotion in the kernel,
ip addr flush dev <dev>

would fail with an EADDRNOTAVAIL errno (though the flush operation would in fact
flush all addresses from an interface properly).

What's happening is that, if I add a primary and multiple secondary addresses to
an interface, the flush operation first enumerates them all with a GETADDR |
DUMP operation, then sends a delete request for each address. But the kernel,
having promotion disabled, deletes all secondary addresses when the primary is
removed. That means that several delete requests may still be pending in the
netlink request for addresses that have been removed on our behalf, resulting in
EADDRNOTAVAIL return codes.

It seems the simplest thing to do is to understand that EADDRNOTAVAIL isn't a
fatal outcome on a flush operation, as it just indicates that an address which
you want to remove is already removed, so it can safely be ignored.
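
As a sketch of that reasoning (the helper flush_error_is_fatal() is hypothetical and not the code from this patch), the decision boils down to treating one specific errno as harmless:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: decide whether an error reported for a delete
 * request during a flush should abort the operation. err is a
 * positive errno value, or 0 for success. */
static int flush_error_is_fatal(int err)
{
	assert(err >= 0);

	/* the address is already gone, e.g. removed together with
	 * its primary address: nothing left to do */
	if (err == EADDRNOTAVAIL)
		return 0;

	return err != 0;
}
```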
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>

e149d4e8

bridge.8: document fdb replace command · 6e2e2cf0

Phil Sutter authored Nov 18, 2015

Despite commit 45a82e5 ("iproute vxlan add support for fdb replace
command"), the 'fdb replace' command was not mentioned in bridge.8.
Signed-off-by: Phil Sutter <phil@nwl.cc>

6e2e2cf0

lnstat: fix header displaying mechanism · fdb347f7

Phil Sutter authored Nov 18, 2015

The algorithm depends on the loop counter ('i') to increment by one in
each iteration. However, when running endlessly (count==0), the counter
was not incremented at all.

Also change formatting of the header printing conditional a bit so it's
hopefully easier to read.

Fixes: e7e2913f ("lnstat: run indefinitely by default")
Signed-off-by: Phil Sutter <phil@nwl.cc>

fdb347f7

lnstat: describe -s option in help output · 869fcabe
Phil Sutter authored Nov 18, 2015
```
Signed-off-by: Phil Sutter <phil@nwl.cc>
```
869fcabe
update kernel headers to 4.4-rc1 · 0198930b
Stephen Hemminger authored Nov 23, 2015
```
Post merge window changes
```
0198930b

ip_common.h header cleanup · f7b49a3f

Phil Sutter authored Nov 06, 2015

- Drop 'extern' keyword from all function prototypes.
- Make line breaking of print_* functions consistent.
- Make print_ntable() and ipntable_reset_filter() static and remove
  their declaration.
- Drop declaration of non-existent ipaddr_list() and iproute_monitor().
Signed-off-by: Phil Sutter <phil@nwl.cc>

f7b49a3f

misc: remove extra blank line · 23d6c997
Stephen Hemminger authored Nov 23, 2015

23d6c997
man8: scrub trailing whitespace · 5699275b
Stephen Hemminger authored Nov 23, 2015
```
Remove extraneous whitespace
```
5699275b
man: Spelling fixes · ac0817ef
Ville Skyttä authored Nov 07, 2015
```
Signed-off-by: Ville Skyttä <ville.skytta@iki.fi>
```
ac0817ef

man: Syntax and warning fixes · 85e3c87c

Ville Skyttä authored Nov 07, 2015

Fix syntax issues and warnings highlighted by `man --warnings=w' from
man-db 2.7.1.
Signed-off-by: Ville Skyttä <ville.skytta@iki.fi>

85e3c87c

ip{,6}tunnel: put spaces around non-unary operators · 04ce8d3e
Phil Sutter authored Nov 13, 2015
```
Signed-off-by: Phil Sutter <phil@nwl.cc>
```
04ce8d3e

iptunnel: sanitize copying tunnel name · f53ecee8

Phil Sutter authored Nov 13, 2015

Since p->name is only IFNAMSIZ bytes, do not copy more than IFNAMSIZ - 1
bytes into it so there remains at least a single null byte at the end.
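
A minimal sketch of such a bounded copy, assuming a hypothetical helper copy_ifname() and a zeroed destination (the fallback definition of IFNAMSIZ is only there in case the header does not expose it):

```c
#include <assert.h>
#include <net/if.h>	/* IFNAMSIZ */
#include <string.h>

#ifndef IFNAMSIZ
#define IFNAMSIZ 16	/* fallback for this sketch only */
#endif

/* Hypothetical helper: copy an interface name into an IFNAMSIZ-byte
 * buffer. Copying at most IFNAMSIZ - 1 bytes into a zeroed buffer
 * guarantees a terminating null byte even for overlong input. */
static void copy_ifname(char dst[IFNAMSIZ], const char *src)
{
	assert(dst && src);

	memset(dst, 0, IFNAMSIZ);
	strncpy(dst, src, IFNAMSIZ - 1);
}
```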
Signed-off-by: Phil Sutter <phil@nwl.cc>

f53ecee8

iptunnel: share common code when determining the default interface name · c957821b
Phil Sutter authored Nov 13, 2015
```
Signed-off-by: Phil Sutter <phil@nwl.cc>
```
c957821b

iptunnel: simplify parsing TTL, allow 'hlim' as identifier · 0dd4d2b3

Phil Sutter authored Nov 13, 2015

Instead of parsing an unsigned integer and checking boundaries, simply
parse u8. This and the added ttl alias 'hlim' provide consistency with
ip6tunnel.
Signed-off-by: Phil Sutter <phil@nwl.cc>

0dd4d2b3

iptunnel: share common code when setting tunnel mode · 2520598a
Phil Sutter authored Nov 13, 2015
```
Signed-off-by: Phil Sutter <phil@nwl.cc>
```
2520598a
ip6tunnel: fix coding style: no newline between brace and else · 7894ce77
Phil Sutter authored Nov 13, 2015
```
Signed-off-by: Phil Sutter <phil@nwl.cc>
```
7894ce77

ip6tunnel: print local/remote addresses like iptunnel does · 9af72f81

Phil Sutter authored Nov 13, 2015

This makes output consistent with iptunnel, also supporting reverse DNS
lookup for remote address if requested.
Signed-off-by: Phil Sutter <phil@nwl.cc>

9af72f81

ip{,6}tunnel: align do_tunnels_list() a bit · c4527d7b

Phil Sutter authored Nov 13, 2015

In iptunnel, declare loop variables inside the loop as done in
ip6tunnel.

Fix and simplify goto logic in ip6tunnel:
- Failure to read over header lines would have left fp open.
- By returning directly upon fopen() failure, fp can be closed
  unconditionally in the end.

Use the same goto logic in iptunnel, as well.
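
The goto pattern described above can be sketched like this. count_entries() is a hypothetical function that skips two header lines and counts the rest; it is not the do_tunnels_list() code itself:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical example: count the lines following two header lines.
 * Returns the count, or -1 on error. */
static int count_entries(const char *path)
{
	char buf[512];
	int n = -1;
	FILE *fp;

	assert(path);

	fp = fopen(path, "r");
	if (!fp)
		return -1;	/* nothing to close yet: return directly */

	/* failing to read over the header lines must not leak fp */
	if (!fgets(buf, sizeof(buf), fp) ||
	    !fgets(buf, sizeof(buf), fp))
		goto end;

	n = 0;
	while (fgets(buf, sizeof(buf), fp))
		n++;
end:
	fclose(fp);	/* reached only with a valid fp */
	return n;
}
```

Because the only path that skips the label is the early return on fopen() failure, fclose() needs no null check.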
Signed-off-by: Phil Sutter <phil@nwl.cc>

c4527d7b

iptunnel: use ll_name_to_index() for physical interface lookup · 4b3cb962

Phil Sutter authored Nov 13, 2015

Although the cache is only initialized in do_show(), this way it is at
least consistent with ip6tunnel.
Signed-off-by: Phil Sutter <phil@nwl.cc>

4b3cb962

ip{,6}tunnel: unify behaviour if physical device is not found · 6ddb1e8c

Phil Sutter authored Nov 13, 2015

Make ip6tunnel print an error message as well. While there, get rid of
unnecessary line breaking.
Signed-off-by: Phil Sutter <phil@nwl.cc>

6ddb1e8c

ip/tunnel: introduce tnl_parse_key() · a7ed1520

Phil Sutter authored Nov 13, 2015

Instead of duplicating the same code six times (key, ikey and okey in
iptunnel and ip6tunnel), have a common parsing routine. This has the
added benefit of having the same verbose error message in ip6tunnel as
well as iptunnel.

I'm not sure if parsing an IPv4 address as key makes sense for
ip6tunnel, but the code was there before so this patch at least doesn't
make it worse.
Signed-off-by: Phil Sutter <phil@nwl.cc>

a7ed1520