Commits · 1c78efa8319cad2f10f421afa627745fb4d9b29f · nexedi / linux

23 Oct, 2015 11 commits

mpls: flow-based multipath selection · 1c78efa8

Robert Shearman authored Oct 23, 2015

Change the selection of a multipath route to use a flow-based
hash. This more suitable for traffic sensitive to reordering within a
flow (e.g. TCP, L2VPN) and whilst still allowing a good distribution
of traffic given enough flows.

Selection of the path for a multipath route is done using a hash of:
1. Label stack up to MAX_MP_SELECT_LABELS labels or up to and
   including entropy label, whichever is first.
2. 3-tuple of (L3 src, L3 dst, proto) from IPv4/IPv6 header in MPLS
   payload, if present.

Naturally, a 5-tuple hash using L4 information in addition would be
possible and be better in some scenarios, but there is a tradeoff
between looking deeper into the packet to achieve good distribution,
and packet forwarding performance, and I have erred on the side of the
latter as the default.
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1c78efa8

mpls: multipath route support · f8efb73c

Roopa Prabhu authored Oct 23, 2015

This patch adds support for MPLS multipath routes.

Includes following changes to support multipath:
- splits struct mpls_route into 'struct mpls_route + struct mpls_nh'

- 'struct mpls_nh' represents a mpls nexthop label forwarding entry

- moves mpls route and nexthop structures into internal.h

- A mpls_route can point to multiple mpls_nh structs

- the nexthops are maintained as a array (similar to ipv4 fib)

- In the process of restructuring, this patch also consistently changes
  all labels to u8

- Adds support to parse/fill RTA_MULTIPATH netlink attribute for
multipath routes similar to ipv4/v6 fib

- In this patch, the multipath route nexthop selection algorithm
simply returns the first nexthop. It is replaced by a
hash based algorithm from Robert Shearman in the next patch

- mpls_route_update cleanup: remove 'dev' handling in mpls_route_update.
mpls_route_update though implemented to update based on dev, it was
never used that way. And the dev handling gets tricky with multiple
nexthops. Cannot match against any single nexthops dev. So, this patch
removes the unused 'dev' handling in mpls_route_update.

- dead route/path handling will be implemented in a subsequent patch

Example:

$ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
                nexthop as 700 via inet 10.1.1.6 dev swp2 \
                nexthop as 800 via inet 40.1.1.2 dev swp3

$ip  -f mpls route show
100
        nexthop as to 200 via inet 10.1.1.2  dev swp1
        nexthop as to 700 via inet 10.1.1.6  dev swp2
        nexthop as to 800 via inet 40.1.1.2  dev swp3
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f8efb73c

Merge branch 'mdiobus_nested_read_write' · 654c9c54

David S. Miller authored Oct 23, 2015

Neil Armstrong says:

====================
Refactor nested mdiobus read/write functions

In order to avoid locked signal false positive for nested mdiobus
read/write calls, nested code was introduced in mv88e6xxx and
mdio-mux.
But mv88e6060 also needs such nested mdiobus read/write calls.
For sake of refactoring, introduce nested variants of mdiobus read/write
and make them used by mv88e6xxx and mv88e6060.
In a next patch, mdio-mux should also use these variant calls.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

654c9c54

net: dsa: Make mv88e6060 use nested mdiobus read/write · f0505610

Neil Armstrong authored Oct 22, 2015

Like mv88e6xxx and mdio-mux, to avoid lockdep give false positives
because of nested MDIO busses, switch to previously introduced
nested mdiobus_read/write variants.
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f0505610

net: dsa: Make mv88e6xxx use nested mdiobus read/write · 6e899e6c

Neil Armstrong authored Oct 22, 2015

Make the mv88e6xxx driver use the previously introduced nested
variants of mdiobus_read/write functions.
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6e899e6c

net: phy: Add nested variants of mdiobus read/write · 21dd19fe

Neil Armstrong authored Oct 22, 2015

Since nested variants of mdiobus_read/write are used in multiple
drivers, add nested variants in the mdiobus core.
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

21dd19fe

drivers: net: cpsw: use module_platform_driver · 6fb3b6b5

Grygorii Strashko authored Oct 23, 2015

There is no reasons to probe cpsw from late_initcall level
and it's not recommended. Hence, use module_platform_driver()
to register and probe cpsw driver from module_init() level.

Cc: Tony Lindgren <tony@atomide.com>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6fb3b6b5

tcp/dccp: fix hashdance race for passive sessions · 5e0724d0

Eric Dumazet authored Oct 22, 2015

Multiple cpus can process duplicates of incoming ACK messages
matching a SYN_RECV request socket. This is a rare event under
normal operations, but definitely can happen.

Only one must win the race, otherwise corruption would occur.

To fix this without adding new atomic ops, we use logic in
inet_ehash_nolisten() to detect the request was present in the same
ehash bucket where we try to insert the new child.

If request socket was not found, we have to undo the child creation.

This actually removes a spin_lock()/spin_unlock() pair in
reqsk_queue_unlink() for the fast path.

Fixes: e994b2f0 ("tcp: do not lock listener to process SYN packets")
Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e0724d0

ipv4: implement support for NOPREFIXROUTE ifa flag for ipv4 address · 7b131180

Paolo Abeni authored Oct 20, 2015

Currently adding a new ipv4 address always cause the creation of the
related network route, with default metric. When a host has multiple
interfaces on the same network, multiple routes with the same metric
are created.

If the userspace wants to set specific metric on each routes, i.e.
giving better metric to ethernet links in respect to Wi-Fi ones,
the network routes must be deleted and recreated, which is error-prone.

This patch implements the support for IFA_F_NOPREFIXROUTE for ipv4
address. When an address is added with such flag set, no associated
network route is created, no network route is deleted when
said IP is gone and it's up to the user space manage such route.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7b131180

bnxt_en: New Broadcom ethernet driver. · c0c050c5

Michael Chan authored Oct 22, 2015

Broadcom ethernet driver for the new family of NetXtreme-C/E
ethernet devices.

v5:
  - Removed empty blank lines at end of files (noted by David Miller).
  - Moved busy poll helper functions to bnxt.h to at least make the
    .c file look less cluttered with #ifdef (noted by Stephen Hemminger).

v4:
  - Broke up 2 long message strings with "\n" (suggested by John Linville)
  - Constify an array of strings (suggested by Stephen Hemminger)
  - Improve bnxt_vf_pciid() (suggested by Stephen Hemminger)
  - Use PCI_VDEVICE() to populate pci_device_id table for more compact
    source.

v3:
  - Fixed 2 more sparse warnings.
  - Removed some unused structures in .h files.

v2:
  - Fixed all kbuild test robot reported warnings.
  - Fixed many of the checkpatch.pl errors and warnings.
  - Fixed the Kconfig description (noted by Dmitry Kravkov).
Acked-by: Eddie Wai <eddie.wai@broadcom.com>
Acked-by: Jeffrey Huang <huangjw@broadcom.com>
Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c0c050c5

net: dsa: mv88e6xxx: remove debugfs interface · 0a31adae

Vivien Didelot authored Oct 22, 2015

It is preferable to have a common debugfs interface for DSA or switchdev
instead of a driver specific one. Thus remove the mv88e6xxx debug code.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0a31adae

22 Oct, 2015 29 commits

Merge branch 'dsa-port_fdb_dump' · 998eb807

David S. Miller authored Oct 22, 2015

Vivien Didelot says:

====================
net: dsa: implement port_fdb_dump in drivers

Not all switch chips provide a Get Next kind of operation to dump FDB entries.
It is preferred to let the driver handle the dump operation the way it works
best for the chip. Thus, drop port_fdb_getnext and implement the port_fdb_dump
operation in DSA, which pushes the switchdev FDB dump callback down to the
drivers. mv88e6xxx is the only driver affected and is updated accordingly.

v3 -> v4: fix rejects on latest net-next

v2 -> v3: opencode switchdev_obj_dump_cb_t to avoid multiple typedef;
          use ether_addr_copy in fdb_dump

v1 -> v2: fix a few "return err" instead of "goto unlock" in mv88e6xxx.c
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

998eb807

net: dsa: remove port_fdb_getnext · 1a49a2fb

Vivien Didelot authored Oct 22, 2015

No driver implements port_fdb_getnext anymore, and port_fdb_dump is
preferred anyway, so remove this function from DSA.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1a49a2fb

net: dsa: mv88e6xxx: remove port_fdb_getnext · 2c49471b

Vivien Didelot authored Oct 22, 2015

Now that port_fdb_dump is implemented and even simpler, get rid of
port_fdb_getnext.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2c49471b

net: dsa: mv88e6xxx: implement port_fdb_dump · f33475bd

Vivien Didelot authored Oct 22, 2015

Implement the port_fdb_dump DSA operation.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f33475bd

net: dsa: mv88e6xxx: write MAC outside of ATU Get Next code · b0e1a692

Vivien Didelot authored Oct 22, 2015

There is no need to write the MAC address before every Get Next
operation, since ATU MAC registers are not cleared between calls.

Move the _mv88e6xxx_atu_mac_write call outside of _mv88e6xxx_atu_getnext
so future code could call ATU Get Next multiple times and save a few
register access.
Signed-off-by: David S. Miller <davem@davemloft.net>

b0e1a692

net: dsa: mv88e6xxx: write VID outside of VTU Get Next code · 36d04ba1

Vivien Didelot authored Oct 22, 2015

There is no need to write the VLAN ID before every Get Next operation,
since the VTU VID register is not cleared between calls.

Move the VID write call in a _mv88e6xxx_vtu_vid_write function outside
of _mv88e6xxx_vtu_getnext so future code could call VTU Get Next
multiple times and save a few register accesses.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

36d04ba1

net: dsa: add port_fdb_dump function · ea70ba98

Vivien Didelot authored Oct 22, 2015

Not all switch chips support a Get Next operation to iterate on its FDB.
So add a more simple port_fdb_dump function for them.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ea70ba98

Merge tag 'mac80211-next-for-davem-2015-10-21' of... · e9829b97

David S. Miller authored Oct 22, 2015

Merge tag 'mac80211-next-for-davem-2015-10-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Here's another set of patches for the current cycle:
 * I merged net-next back to avoid a conflict with the
 * cfg80211 scheduled scan API extensions
 * preparations for better scan result timestamping
 * regulatory cleanups
 * mac80211 statistics cleanups
 * a few other small cleanups and fixes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e9829b97

net: hisilicon: deals with the sub ctrl by syscon · c7fc9eb7

yankejian authored Oct 21, 2015

the global Soc configuration is treated by syscon, and sub ctrl bus is
Soc bus. it has to be treated by syscon.
Signed-off-by: yankejian <yankejian@huawei.com>
Signed-off-by: lisheng <lisheng011@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c7fc9eb7

Merge branch 'cxgb4-trivial-fixes' · e9e6d79c

David S. Miller authored Oct 22, 2015

Hariprasad Shenai says:

====================
Trivial fixes for cxgb4 driver

This patch series updates driver description for next gen. adapters, updates
firmware info., returns error for setup_rss error case, restores L1
configuration in case of FW rejects new config, updates and aligns ethtool
get stats settings, etc

 This patch series has been created against net-next tree and includes
patches on cxgb4 and cxgb4vf driver.

We have included all the maintainers of respective drivers. Kindly review
the change and let us know in case of any review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e9e6d79c

cxgb4: Update ethtool get_drvinfo to get regdump len · b08f2b35

Hariprasad Shenai authored Oct 21, 2015

Update ethtool get_drvinfo to display regdump len and also update
firmware string version print to display N/A in case FW isn't present
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b08f2b35

cxgb4: Use vmalloc, if kmalloc fails · 9c673d15

Hariprasad Shenai authored Oct 21, 2015

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9c673d15

cxgb4: Return error if setup_rss is called before probe · 6ac5fe75

Hariprasad Shenai authored Oct 21, 2015

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6ac5fe75

cxgb4/cxgb4vf: Update driver desc. to include Chelsio T6 adapter · 52a5f846
Hariprasad Shenai authored Oct 21, 2015
```
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
52a5f846
cxgb4: Add info print to display number of MSI-X vectors allocated · 43eb4e82
Hariprasad Shenai authored Oct 21, 2015
```
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
43eb4e82

cxgb4: Restore L1 cfg, if FW rejects new L1 cfg settings · 41165428

Hariprasad Shenai authored Oct 21, 2015

In the ethtool set_settings() routine we need to remember our old L1
Configuration in case the firmware rejects the request and then restore
that.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

41165428

cxgb4: Don't disallow turning off auto-negotiation · 9bfdad5e

Hariprasad Shenai authored Oct 21, 2015

For {1, 10, 40} Gb/s. Prohibiting turning off autonegotiation isn't anywhere
in the standard.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9bfdad5e

cxgb4: Align ethtool get stat settings · eed7342d

Hariprasad Shenai authored Oct 21, 2015

Align the ethtool get stats settings with the rest so it looks uniform
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

eed7342d

openvswitch: Use dev_queue_xmit for vport send. · aec15924

Pravin B Shelar authored Oct 20, 2015

With use of lwtunnel, we can directly call dev_queue_xmit()
rather than calling netdev vport send operation.
Following change make tunnel vport code bit cleaner.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aec15924

openvswitch: Fix incorrect type use. · 99e28f18

Pravin B Shelar authored Oct 20, 2015

Patch fixes following sparse warning.
net/openvswitch/flow_netlink.c:583:30: warning: incorrect type in assignment (different base types)
net/openvswitch/flow_netlink.c:583:30:    expected restricted __be16 [usertype] ipv4
net/openvswitch/flow_netlink.c:583:30:    got int

Fixes: 6b26ba3a ("openvswitch: netlink attributes for IPv6 tunneling")
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

99e28f18

Merge branch 'bpf-perf' · 721daebb

David S. Miller authored Oct 22, 2015

Alexei Starovoitov says:

====================
bpf_perf_event_output helper

Over the last year there were multiple attempts to let eBPF programs
output data into perf events by He Kuang and Wangnan.
The last one was:
https://lkml.org/lkml/2015/7/20/736
It was almost perfect with exception that all bpf programs would sent
data into one global perf_event.
This patch set takes different approach by letting user space
open independent PERF_COUNT_SW_BPF_OUTPUT events, so that program
output won't collide.

Wangnan is working on corresponding perf patches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

721daebb

samples: bpf: add bpf_perf_event_output example · 39111695

Alexei Starovoitov authored Oct 20, 2015

Performance test and example of bpf_perf_event_output().
kprobe is attached to sys_write() and trivial bpf program streams
pid+cookie into userspace via PERF_COUNT_SW_BPF_OUTPUT event.

Usage:
$ sudo ./bld_x64/samples/bpf/trace_output
recv 2968913 events per sec
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

39111695

bpf: introduce bpf_perf_event_output() helper · a43eec30

Alexei Starovoitov authored Oct 20, 2015

This helper is used to send raw data from eBPF program into
special PERF_TYPE_SOFTWARE/PERF_COUNT_SW_BPF_OUTPUT perf_event.
User space needs to perf_event_open() it (either for one or all cpus) and
store FD into perf_event_array (similar to bpf_perf_event_read() helper)
before eBPF program can send data into it.

Today the programs triggered by kprobe collect the data and either store
it into the maps or print it via bpf_trace_printk() where latter is the debug
facility and not suitable to stream the data. This new helper replaces
such bpf_trace_printk() usage and allows programs to have dedicated
channel into user space for post-processing of the raw data collected.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

a43eec30

perf: pad raw data samples automatically · fa128e6a

Alexei Starovoitov authored Oct 20, 2015

Instead of WARN_ON in perf_event_output() on unpaded raw samples,
pad them automatically.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

fa128e6a

ipvlan: read direct ifindex instead of iflink · 63b11e75

Brenden Blanco authored Oct 20, 2015

In the ipv4 outbound path of an ipvlan device in l3 mode, the ifindex is
being grabbed from dev_get_iflink. This works for the physical device
case, since as the documentation of that function notes: "Physical
interfaces have the same 'ifindex' and 'iflink' values.".  However, if
the master device is a veth, and the pairs are in separate net
namespaces, the route lookup will fail with -ENODEV due to outer veth
pair being in a separate namespace from the ipvlan master/routing
namespace.

  ns0    |   ns1    |   ns2
 veth0a--|--veth0b--|--ipvl0

In ipvlan_process_v4_outbound(), a packet sent from ipvl0 in the above
configuration will pass fl.flowi4_oif == veth0a to
ip_route_output_flow(), but *net == ns1.

Notice also that ipv6 processing is not using iflink. Since there is a
discrepancy in usage, fixup both v4 and v6 case to use local dev
variable.

Tested this with l3 ipvlan on top of veth, as well as with single
physical interface in the top namespace.
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

63b11e75

tcp: fastopen: limit max_qlen · dbf650b6

Eric Dumazet authored Oct 20, 2015

Allowing an application to set whatever limit for
the list of recently RST fastopen sessions [1] is not wise,
as it open ways to deplete kernel memory.

Cap the user provided limit by somaxconn sysctl,
like listen() backlog.

[1] https://tools.ietf.org/html/rfc7413#section-5.1Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dbf650b6

net: mdio-gpio: move platform data header · e2aacd96

Vivien Didelot authored Oct 20, 2015

This header file only contains the platform data structure definition,
so move it to the include/linux/platform_data/ directory.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e2aacd96

ARM: gemini: remove unnecessary mdio-gpio includes · 844338e5

Vivien Didelot authored Oct 20, 2015

Remove the inclusion of linux/mdio-gpio.h in nas4220b, wbd111 and wbd222
boards since mdio-gpio is not used.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

844338e5

net: hisilicon: fix ptr_ret.cocci warnings · c6aa74d5

Wu Fengguang authored Oct 20, 2015

drivers/net/ethernet/hisilicon/hns/hnae.c:442:1-3: WARNING: PTR_ERR_OR_ZERO can be used

 Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

Generated by: scripts/coccinelle/api/ptr_ret.cocci

CC: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c6aa74d5