Commits · c833484e5f3872a38fe232c663586069d5ad9645 · nexedi / linux

29 Feb, 2016 10 commits

batman-adv: ELP - compute the metric based on the estimated throughput · c833484e

Antonio Quartulli authored Nov 10, 2015

In case of wireless interface retrieve the throughput by
querying cfg80211. To perform this call a separate work
must be scheduled because the function may sleep and this
is not allowed within an RCU protected context (RCU in this
case is used to iterate over all the neighbours).

Use ethtool to retrieve information about an Ethernet link
like HALF/FULL_DUPLEX and advertised bandwidth (e.g.
100/10Mbps).

The metric is updated each time a new ELP packet is sent,
this way it is possible to timely react to a metric
variation which can imply (for example) a neighbour
disconnection.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>

c833484e

batman-adv: keep track of when unicast packets are sent · 95d39278

Antonio Quartulli authored Jan 16, 2016

To enable ELP to send probing packets over wireless links
only if needed, batman-adv must keep track of the last time
it sent a unicast packet towards every neighbour.

For this purpose a 2 main changes are introduced:
1) a new member of the elp_neigh_node structure stores the
   last time a unicast packet was sent towards this neighbour;
2) a wrapper function for sending unicast packets is
   implemented. This function will simply update the member
   describe din point 1) and then forward the packet to the
   real sending routine.

Point 2) implies that any code-path leading to a unicast
sending now has to use the new wrapper.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>

95d39278

batman-adv: add throughput override attribute to hard_ifaces · 0b5ecc68

Antonio Quartulli authored Jan 16, 2016

This attribute is exported to user space to disable the link
throughput auto-detection by setting a fixed value.
The throughput override value is used when batman-adv is
computing the link throughput towards a neighbour.

If the value is set to 0 then batman-adv will try to detect
the throughput by itself.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>

0b5ecc68

batman-adv: OGMv2 - implement originators logic · 9323158e

Antonio Quartulli authored Jan 16, 2016

Add the support for recognising new originators in the
network and rebroadcast their OGMs.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>

9323158e

batman-adv: OGMv2 - add basic infrastructure · 0da00359

Antonio Quartulli authored Jan 16, 2016

This is the initial implementation of the new OGM protocol
(version 2). It has been designed to work on top of the
newly added ELP.

In the previous version the OGM protocol was used to both
measure link qualities and flood the network with the metric
information. In this version the protocol is in charge of
the latter task only, leaving the former to ELP.

This means being able to decouple the interval used by the
neighbor discovery from the OGM broadcasting, which revealed
to be costly in dense networks and needed to be relaxed so
leading to a less responsive routing protocol.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>

0da00359

batman-adv: ELP - adding sysfs parameter for elp interval · 7f136cd4

Linus Luessing authored Jan 16, 2016

This parameter can be set individually on each interface and
allows the configuration of the elp interval for the link
quality measurements during runtime. Usually it is desirable
to set it to a higher (= slower) value on interfaces which
have a more static characteristic (e.g. wired interfaces)
or very dense neighbourhoods to reduce overhead.

Developed by Linus during a 6 months trainee study period in
Ascom (Switzerland) AG.
Signed-off-by: Linus Luessing <linus.luessing@web.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
[antonio@open-mesh.com: respin on top of the latest master]
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>

7f136cd4

batman-adv: ELP - creating neighbor structures · 162bd64c

Linus Luessing authored Jan 16, 2016

Initially developed by Linus during a 6 months trainee study
period in Ascom (Switzerland) AG.
Signed-off-by: Linus Luessing <linus.luessing@web.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>

162bd64c

batman-adv: ELP - adding basic infrastructure · d6f94d91

Linus Luessing authored Jan 16, 2016

The B.A.T.M.A.N. protocol originally only used a single
message type (called OGM) to determine the link qualities to
the direct neighbors and spreading these link quality
information through the whole mesh. This procedure is
summarized on the BATMAN concept page and explained in
details in the RFC draft published in 2008.

This approach was chosen for its simplicity during the
protocol design phase and the implementation. However, it
also bears some drawbacks:

 *  Wireless interfaces usually come with some packet loss,
    therefore a higher broadcast rate is desirable to allow
    a fast reaction on flaky connections.
    Other interfaces of the same host might be connected to
    Ethernet LANs / VPNs / etc which rarely exhibit packet
    loss would benefit from a lower broadcast rate to reduce
    overhead.
 *  It generally is more desirable to detect local link
    quality changes at a faster rate than propagating all
    these changes through the entire mesh (the far end of
    the mesh does not need to care about local link quality
    changes that much). Other optimizations strategies, like
    reducing overhead, might be possible if OGMs weren't
    used for all tasks in the mesh at the same time.

As a result detecting local link qualities shall be handled
by an independent message type, ELP, whereas the OGM message
type remains responsible for flooding the mesh with these
link quality information and determining the overall path
transmit qualities.

Developed by Linus during a 6 months trainee study period in
Ascom (Switzerland) AG.
Signed-off-by: Linus Luessing <linus.luessing@web.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>

d6f94d91

batman-adv: Add hard_iface specific sysfs wrapper macros for UINT · 0744ff8f

Linus Luessing authored Jan 16, 2016

This allows us to easily add a sysfs parameter for an
unsigned int later, which is not for a batman mesh interface
(e.g. bat0), but for a common interface instead. It allows
reading and writing an atomic_t in hard_iface (instead of
bat_priv compared to the mesh variant).

Developed by Linus during a 6 months trainee study period in
Ascom (Switzerland) AG.
Signed-off-by: Linus Luessing <linus.luessing@web.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
[antonio@open-mesh.com: rename functions and move macros]
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>

0744ff8f

3c59x: Ensure to apply the expires time · f12d33f4

Stafford Horne authored Feb 28, 2016

In commit 5b6490de ("3c59x: Use setup_timer()") Amitoj
removed add_timer which sets up the epires timer.  In this patch
the behavior is restore but it uses mod_timer which is a bit more
compact.
Signed-off-by: Stafford Horne <shorne@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f12d33f4

27 Feb, 2016 1 commit

rocker: fix an error code · f73e0c24

Dan Carpenter authored Feb 27, 2016

We intended to return PTR_ERR() here instead of 1.

Fixes: 1f9993f6 ('rocker: fix a neigh entry leak issue')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f73e0c24

26 Feb, 2016 24 commits

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · e1bae75d

David S. Miller authored Feb 26, 2016

Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2016-02-24

This series contains updates to e1000e, igb and igbvf.

Raanan provides updates for e1000e, first increases the ULP timer since it
now takes longer for the ULP exit to complete on Skylake.  Fixes the
configuration of the internal hardware PHY clock gating mechanism, which was
causing packet loss due to mis configuring.  Fixed additional ULP
configuration settings which were not being properly cleared after cable
connect in V-Pro capable systems.  Added support for more i219 devices.

Takuma Ueba provides a fix for I210 where IPv6 autoconf test sometimes
fails due to DAD NS for link-local is not transmitted.  To avoid this
issue, we need to wait until 1000BASE-T status register "Remote receiver
status OK".

Todd provides a patch to override EEPROM WoL settings for specific OEM
devices. Then renamed igb defines to be more generic, since the define
E1000_MRQC_ENABLE_RSS_4Q enables 4 and 8 queues depending on the part.

Roland Hii fixes an issue where only the half cycle time of less than or
equal to 70 millisecond uses the I210 clock output function.  His patch
adds additional conditions when half cycle time is equal to 125 or 250 or
500 millisecond to use the clock output function.

Alex Duyck adds support for generic transmit checksums for igb and igbvf.

Jon Maxwell fixes an issues where customer applications are registering
and un-registering multicast addresses every few seconds which is leading
to many "Link is up" messages in the logs as a result of the
netif_carrier_off(netdev) in igbvf_msix_other().  So remove the
link is up message when registering multicast addresses.

Corinna Vinschen provides a fix for when switching off VLAN offloading on
i350, the VLAN interface becomes unusable.

Stefan Assmann updates the driver to use ndo_stop() instead of
dev_close() when running ethtool offline self test.  Since dev_close()
causes IFF_UP to be cleared which will remove the interfaces routes
and some addresses.

v2: Dropped patches 6-10 in the original series.  Patch 6-7 added support
    for character device for AVB and based on community feedback, we do not
    want to do this.  Patches 8-10 provided fixes to the problematic code
    added in patches 6 & 7.  So all of them must go!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e1bae75d

GSO: Provide software checksum of tunneled UDP fragmentation offload · 22463876

Alexander Duyck authored Feb 24, 2016

On reviewing the code I realized that GRE and UDP tunnels could cause a
kernel panic if we used GSO to segment a large UDP frame that was sent
through the tunnel with an outer checksum and hardware offloads were not
available.

In order to correct this we need to update the feature flags that are
passed to the skb_segment function so that in the event of UDP
fragmentation being requested for the inner header the segmentation
function will correctly generate the checksum for the payload if we cannot
segment the outer header.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22463876

Merge branch 'vrf-saddr-selection' · c3f463ba

David S. Miller authored Feb 26, 2016

David Ahern says:

====================
net: l3mdev: Fix source address for unnumbered deployments

David Lamparter noted a use case where the source address selection fails
to pick an address from a VRF interface - unnumbered interfaces. The use
case has the VRF device as the VRF local loopback with addresses and
interfaces enslaved without an address themselves. e.g,

    ip addr add 9.9.9.9/32 dev lo
    ip link set lo up

    ip link add name vrf0 type vrf table 101
    ip rule add oif vrf0 table 101
    ip rule add iif vrf0 table 101
    ip link set vrf0 up
    ip addr add 10.0.0.3/32 dev vrf0

    ip link add name dummy2 type dummy
    ip link set dummy2 master vrf0 up

    --> note dummy2 has no address - unnumbered device

    ip route add 10.2.2.2/32 dev dummy2 table 101
    ip neigh add 10.2.2.2 dev dummy2 lladdr 02:00:00:00:00:02

ping to the 10.2.2.2 through the L3 domain:

    $ ping -I vrf0 -c1 10.2.2.2
    ping: Warning: source address might be selected on device other than vrf0.
    PING 10.2.2.2 (10.2.2.2) from 9.9.9.9 vrf0: 56(84) bytes of data.

picks up the wrong address -- the one from 'lo' not vrf0. And from tcpdump:
    12:57:29.449128 IP 9.9.9.9 > 10.2.2.2: ICMP echo request, id 2491, seq 1, length 64

This patch series changes address selection to only consider devices in
the same L3 domain and to use the VRF device as the L3 domains loopback.

    $ ping -I vrf0 -c1 10.2.2.2
    PING 10.2.2.2 (10.2.2.2) from 10.0.0.3 vrf0: 56(84) bytes of data.

From tcpdump:
    12:59:25.096426 IP 10.0.0.3 > 10.2.2.2: ICMP echo request, id 2113, seq 1, length 64

Now the source address comes from vrf0.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

c3f463ba

net: l3mdev: prefer VRF master for source address selection · 17b693cd

David Lamparter authored Feb 24, 2016

When selecting an address in context of a VRF, the vrf master should be
preferred for address selection.  If it isn't, the user has a hard time
getting the system to select to their preference - the code will pick
the address off the first in-VRF interface it can find, which on a
router could well be a non-routable address.
Signed-off-by: David Lamparter <equinox@diac24.net>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
[dsa: Fixed comment style and removed extra blank link ]
Signed-off-by: David S. Miller <davem@davemloft.net>

17b693cd

net: l3mdev: address selection should only consider devices in L3 domain · 3f2fb9a8

David Ahern authored Feb 24, 2016

David Lamparter noted a use case where the source address selection fails
to pick an address from a VRF interface - unnumbered interfaces.

Relevant commands from his script:
    ip addr add 9.9.9.9/32 dev lo
    ip link set lo up

    ip link add name vrf0 type vrf table 101
    ip rule add oif vrf0 table 101
    ip rule add iif vrf0 table 101
    ip link set vrf0 up
    ip addr add 10.0.0.3/32 dev vrf0

    ip link add name dummy2 type dummy
    ip link set dummy2 master vrf0 up

    --> note dummy2 has no address - unnumbered device

    ip route add 10.2.2.2/32 dev dummy2 table 101
    ip neigh add 10.2.2.2 dev dummy2 lladdr 02:00:00:00:00:02

    tcpdump -ni dummy2 &

And using ping instead of his socat example:
    $ ping -I vrf0 -c1 10.2.2.2
    ping: Warning: source address might be selected on device other than vrf0.
    PING 10.2.2.2 (10.2.2.2) from 9.9.9.9 vrf0: 56(84) bytes of data.

>From tcpdump:
    12:57:29.449128 IP 9.9.9.9 > 10.2.2.2: ICMP echo request, id 2491, seq 1, length 64

Note the source address is from lo and is not a VRF local address. With
this patch:

    $ ping -I vrf0 -c1 10.2.2.2
    PING 10.2.2.2 (10.2.2.2) from 10.0.0.3 vrf0: 56(84) bytes of data.

>From tcpdump:
    12:59:25.096426 IP 10.0.0.3 > 10.2.2.2: ICMP echo request, id 2113, seq 1, length 64

Now the source address comes from vrf0.

The ipv4 function for selecting source address takes a const argument.
Removing the const requires touching a lot of places, so instead
l3mdev_master_ifindex_rcu is changed to take a const argument and then
do the typecast to non-const as required by netdev_master_upper_dev_get_rcu.
This is similar to what l3mdev_fib_table_rcu does.

IPv6 for unnumbered interfaces appears to be selecting the addresses
properly.

Cc: David Lamparter <david@opensourcerouting.org>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3f2fb9a8

Merge branch 'ethtool-ksettings' · 8d3f2806

David S. Miller authored Feb 25, 2016

David Decotigny says:

====================
new ETHTOOL_GLINKSETTINGS/SLINKSETTINGS API

History:
 v9
 - add 'link' in macro, struct and function names
 - rename ethtool_link_ksettings::parent -> ::base
 - remove un-needed mlx4 en_dbg_enabled() companion patch
 - note: bitmap u32[] API patches were merged separately by Kan Liang
 v8
 - bitmap u32 API returns number of bits copied, unit tests updated
 v7
 - module_exit in test_bitmap
 v6
 - fix copy_from_user in user/kernel handshake
 v5
 note: please see v4 bullets for a question regarding bitmap.c
 - minor fix to make allyesconfig/allmodconfig
 v4
 - removed typedef for link mode bitmaps
 - moved bitmap<->u32[] conversion routines to bitmap.c . This is the
   naive implementation. I have an endian-aware version that uses
   memcpy/memset as much as possible, but I find it harder to follow
   (see http://paste.ubuntu.com/13863722/). Please let me know if I
   should use it instead.
 - fixes suggested by Ben Hutchings
 v3
 - rebased v2 on top of latest net-next, minor checkpatch/printf %*pb
   updates
 v2
 - keep return 0 in get_settings when successful, instead of
   propagating positive result from driver's get_settings callback.
 v1
 - original submission

The main goal of this series is to support ethtool link mode masks
larger than 32 bits. It implements a new ioctl pair
(ETHTOOL_GLINKSETTINGS/SLINKSETTINGS), its associated callbacks
(get/set_link_ksettings) and a new struct ethtool_link_settings, which
should eventually replace legacy ethtool_cmd. Internally, the kernel
uses fixed length link mode masks defined at compilation time in
ethtool.h (for now: 31 bits), that can be increased by changing
__ETHTOOL_LINK_MODE_LAST in ethtool.h (absolute max is 4064 bits,
checked at compile time), and the user/kernel interface allows this
length to be arbitrary within 1..4064. This should allow some
flexibility without using too much heap/stack space, at the cost of a
small kernel/user handshake for the user to determine the sizes of
those bitmaps.

Along the way, I chose to drop in the new structure the 3 ethtool_cmd
fields marked "deprecated" (transceiver/maxrxpkt/maxtxpkt). They are
still available for old drivers via the (old) ETHTOOL_GSET/SSET API,
but are not available to drivers that switch to new API. Of those 3
fields, ethtool_cmd::transceiver seems to be still actively used by
several drivers, maybe we should not consider this field deprecated?
The 2 other fields are basically not used. This transition requires
some care in the way old and new ethtool talk to the kernel.

More technical details provided in the description for main patch. In
particular details about backward compatibility properties.

Some open questions:
 - the kernel/interface multiplexes the "tell me the bitmap length"
   handshake and the "give me the settings" inside the new
   ETHTOOL_GLINKSETTINGS cmd. I was thinking of making this into 2
   separate cmds: 1 cmd ETHTOOL_GKERNELPROPERTIES which would be
   kernel-wide rather than device-specific, would return properties
   like "length of the link mode bitmaps", and possibly others. And
   ETHTOOL_GLINKSETTINGS would expect the proper bitmaps
 - the link mode bitmaps are piggybacked at tail of the new struct
   ethtool_link_settings. Since its user-visible definition does not
   assume specific bitmap width, I am using a 0-length array as the
   publicly visible placeholder. But then, the kernel needs to
   specialize it (struct ethtool_link_ksettings) to specify its
   current link mode masks. This means that kernel code is "littered"
   with "ksettings->base.field" to access "field" inside
   ethtool_settings:
   + I could use ethtool_link_settings everywhere (instead of a new
     ethtool_ksettings) and an container_of accessor (or a plain cast)
     to retrieve the link mode masks?
   + or: we could decide to make the link mode masks statically
     bounded again, ie. make their width public, but larger than
     current 32, and unchangeable forever. This would make everything
     straightforward, but we might hit limits later, or have an
     unneeded memory/stack usage for unused bits.
   any preference?
 - I foresee bugs where people use the legacy/deprecated SUPPORTED_x
   macros instead of the new ETHTOOL_LINK_MODE_x_BIT enums in the new
   get/set_link_ksettings callbacks. Not sure how to prevent problems
   with this.

The only driver which was converted for now is mlx4. I am not
considering fcoe as fully converted, but I updated it a minima to be
able to remove __ethtool_get_settings, now known as
__ethtool_get_link_ksettings.

Tested with legacy and "future" ethtool on 64b x86 kernel and 32+64b
ethtool, and on a 32b x86 kernel + 32b ethtool.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

8d3f2806

net: mlx4: use new ETHTOOL_G/SSETTINGS API · 3d8f7cc7

David Decotigny authored Feb 24, 2016

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3d8f7cc7

net: ethtool: remove unused __ethtool_get_settings · 3237fc63

David Decotigny authored Feb 24, 2016

replaced by __ethtool_get_link_ksettings.
Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3237fc63

net: core: use __ethtool_get_ksettings · 7cad1bac

David Decotigny authored Feb 24, 2016

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7cad1bac

net: bridge: use __ethtool_get_ksettings · 702b26a2

David Decotigny authored Feb 24, 2016

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

702b26a2

net: 8021q: use __ethtool_get_ksettings · 57709798

David Decotigny authored Feb 24, 2016

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

57709798

net: rdma: use __ethtool_get_ksettings · 17605b96

David Decotigny authored Feb 24, 2016

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

17605b96

net: fcoe: use __ethtool_get_ksettings · 008eb736

David Decotigny authored Feb 24, 2016

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

008eb736

net: team: use __ethtool_get_ksettings · 0ab6b544

David Decotigny authored Feb 24, 2016

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0ab6b544

net: macvlan: use __ethtool_get_ksettings · 85f95819

David Decotigny authored Feb 24, 2016

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

85f95819

net: ipvlan: use __ethtool_get_ksettings · 314d10d7

David Decotigny authored Feb 24, 2016

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

314d10d7

net: bonding: use __ethtool_get_ksettings · 9856909c

David Decotigny authored Feb 24, 2016

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9856909c

net: usnic: use __ethtool_get_ksettings · 96a0c396

David Decotigny authored Feb 24, 2016

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

96a0c396

tx4939: use __ethtool_get_ksettings · 091a9277

David Decotigny authored Feb 24, 2016

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

091a9277

net: ethtool: add new ETHTOOL_xLINKSETTINGS API · 3f1ac7a7

David Decotigny authored Feb 24, 2016

This patch defines a new ETHTOOL_GLINKSETTINGS/SLINKSETTINGS API,
handled by the new get_link_ksettings/set_link_ksettings callbacks.
This API provides support for most legacy ethtool_cmd fields, adds
support for larger link mode masks (up to 4064 bits, variable length),
and removes ethtool_cmd deprecated
fields (transceiver/maxrxpkt/maxtxpkt).

This API is deprecating the legacy ETHTOOL_GSET/SSET API and provides
the following backward compatibility properties:
 - legacy ethtool with legacy drivers: no change, still using the
   get_settings/set_settings callbacks.
 - legacy ethtool with new get/set_link_ksettings drivers: the new
   driver callbacks are used, data internally converted to legacy
   ethtool_cmd. ETHTOOL_GSET will return only the 1st 32b of each link
   mode mask. ETHTOOL_SSET will fail if user tries to set the
   ethtool_cmd deprecated fields to
   non-0 (transceiver/maxrxpkt/maxtxpkt). A kernel warning is logged if
   driver sets higher bits.
 - future ethtool with legacy drivers: no change, still using the
   get_settings/set_settings callbacks, internally converted to new data
   structure. Deprecated fields (transceiver/maxrxpkt/maxtxpkt) will be
   ignored and seen as 0 from user space. Note that that "future"
   ethtool tool will not allow changes to these deprecated fields.
 - future ethtool with new drivers: direct call to the new callbacks.

By "future" ethtool, what is meant is:
 - query: first try ETHTOOL_GLINKSETTINGS, and revert to ETHTOOL_GSET if
   fails
 - set: query first and remember which of ETHTOOL_GLINKSETTINGS or
   ETHTOOL_GSET was successful
   + if ETHTOOL_GLINKSETTINGS was successful, then change config with
     ETHTOOL_SLINKSETTINGS. A failure there is final (do not try
     ETHTOOL_SSET).
   + otherwise ETHTOOL_GSET was successful, change config with
     ETHTOOL_SSET. A failure there is final (do not try
     ETHTOOL_SLINKSETTINGS).

The interaction user/kernel via the new API requires a small
ETHTOOL_GLINKSETTINGS handshake first to agree on the length of the link
mode bitmaps. If kernel doesn't agree with user, it returns the bitmap
length it is expecting from user as a negative length (and cmd field is
0). When kernel and user agree, kernel returns valid info in all
fields (ie. link mode length > 0 and cmd is ETHTOOL_GLINKSETTINGS).

Data structure crossing user/kernel boundary is 32/64-bit
agnostic. Converted internally to a legal kernel bitmap.

The internal __ethtool_get_settings kernel helper will gradually be
replaced by __ethtool_get_link_ksettings by the time the first
"link_settings" drivers start to appear. So this patch doesn't change
it, it will be removed before it needs to be changed.
Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3f1ac7a7

net: usnic: use __ethtool_get_settings · 48133335

David Decotigny authored Feb 24, 2016

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

48133335

net: usnic: remove unused call to ethtool_ops::get_settings · 4f03980c
David Decotigny authored Feb 24, 2016
```
Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
4f03980c

net: Facility to report route quality of connected sockets · a87cb3e4

Tom Herbert authored Feb 24, 2016

This patch add the SO_CNX_ADVICE socket option (setsockopt only). The
purpose is to allow an application to give feedback to the kernel about
the quality of the network path for a connected socket. The value
argument indicates the type of quality report. For this initial patch
the only supported advice is a value of 1 which indicates "bad path,
please reroute"-- the action taken by the kernel is to call
dst_negative_advice which will attempt to choose a different ECMP route,
reset the TX hash for flow label and UDP source port in encapsulation,
etc.

This facility should be useful for connected UDP sockets where only the
application can provide any feedback about path quality. It could also
be useful for TCP applications that have additional knowledge about the
path outside of the normal TCP control loop.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a87cb3e4

net: ipv6: Make address flushing on ifdown optional · f1705ec1

David Ahern authored Feb 24, 2016

Currently, all ipv6 addresses are flushed when the interface is configured
down, including global, static addresses:

    $ ip -6 addr show dev eth1
    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
        inet6 2100:1::2/120 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
           valid_lft forever preferred_lft forever
    $ ip link set dev eth1 down
    $ ip -6 addr show dev eth1
    << nothing; all addresses have been flushed>>

Add a new sysctl to make this behavior optional. The new setting defaults to
flush all addresses to maintain backwards compatibility. When the set global
addresses with no expire times are not flushed on an admin down. The sysctl
is per-interface or system-wide for all interfaces

    $ sysctl -w net.ipv6.conf.eth1.keep_addr_on_down=1
or
    $ sysctl -w net.ipv6.conf.all.keep_addr_on_down=1

Will keep addresses on eth1 on an admin down.

    $ ip -6 addr show dev eth1
    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
        inet6 2100:1::2/120 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
           valid_lft forever preferred_lft forever
    $ ip link set dev eth1 down
    $ ip -6 addr show dev eth1
    3: eth1: <BROADCAST,MULTICAST> mtu 1500 state DOWN qlen 1000
        inet6 2100:1::2/120 scope global tentative
           valid_lft forever preferred_lft forever
        inet6 fe80::e0:f9ff:fe79:34bd/64 scope link tentative
           valid_lft forever preferred_lft forever
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f1705ec1

25 Feb, 2016 5 commits

tipc: fix null deref crash in compat config path · 619b1745

Florian Westphal authored Feb 24, 2016

msg.dst_sk needs to be set up with a valid socket because some callbacks
later derive the netns from it.

Fixes: 263ea09084d172d ("Revert "genl: Add genlmsg_new_unicast() for unicast message allocation")
Reported-by: Jon Maloy <maloy@donjonn.com>
Bisected-by: Jon Maloy <maloy@donjonn.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

619b1745

tipc: fix crash during node removal · d25a0125

Jon Paul Maloy authored Feb 24, 2016

When the TIPC module is unloaded, we have identified a race condition
that allows a node reference counter to go to zero and the node instance
being freed before the node timer is finished with accessing it. This
leads to occasional crashes, especially in multi-namespace environments.

The scenario goes as follows:

CPU0:(node_stop)                       CPU1:(node_timeout)  // ref == 2

1:                                          if(!mod_timer())
2: if (del_timer())
3:   tipc_node_put()                                        // ref -> 1
4: tipc_node_put()                                          // ref -> 0
5:   kfree_rcu(node);
6:                                               tipc_node_get(node)
7:                                               // BOOM!

We now clean up this functionality as follows:

1) We remove the node pointer from the node lookup table before we
   attempt deactivating the timer. This way, we reduce the risk that
   tipc_node_find() may obtain a valid pointer to an instance marked
   for deletion; a harmless but undesirable situation.

2) We use del_timer_sync() instead of del_timer() to safely deactivate
   the node timer without any risk that it might be reactivated by the
   timeout handler. There is no risk of deadlock here, since the two
   functions never touch the same spinlocks.

3: We remove a pointless tipc_node_get() + tipc_node_put() from the
   timeout handler.
Reported-by: Zhijiang Hu <huzhijiang@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d25a0125

tipc: eliminate risk of finding to-be-deleted node instance · b170997a

Jon Paul Maloy authored Feb 24, 2016

Although we have never seen it happen, we have identified the
following problematic scenario when nodes are stopped and deleted:

CPU0:                            CPU1:

tipc_node_xxx()                                   //ref == 1
   tipc_node_put()                                //ref -> 0
                                 tipc_node_find() // node still in table
       tipc_node_delete()
         list_del_rcu(n. list)
                                 tipc_node_get()  //ref -> 1, bad
         kfree_rcu()

                                 tipc_node_put() //ref to 0 again.
                                 kfree_rcu()     // BOOM!

We fix this by introducing use of the conditional kref_get_if_not_zero()
instead of kref_get() in the function tipc_node_find(). This eliminates
any risk of post-mortem access.
Reported-by: Zhijiang Hu <huzhijiang@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b170997a

Merge branch 'qed-misc' · 3da7611f

David S. Miller authored Feb 25, 2016

Yuval Mintz says:

====================
qed*: Driver updates

Usually I try to provide a sensible description of the patch set even if
it lacks a general 'motif', but this simply contains several small,
unrelated and self-explenatory tweaks and additions.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

3da7611f

qed, qede: rebrand module description · 5abd7e92

Yuval Mintz authored Feb 24, 2016

Drop the `QL4xxx 40G/100G' and use `FastLinQ 4xxxx' instead.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5abd7e92