Commits · 7582f5b70f9a2335f3713edb9a2614a50f1f1a90 · nexedi / linux

05 Jul, 2019 5 commits

bridge: add br_vlan_get_pvid_rcu() · 7582f5b7

Pablo Neira Ayuso authored Jul 05, 2019

This new function allows you to fetch bridge pvid from packet path.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

7582f5b7

netfilter: nft_meta_bridge: Remove the br_private.h header · 9d6a1ecd

wenxu authored Jul 05, 2019

nft_bridge_meta should not access the bridge internal API.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

9d6a1ecd

netfilter: nft_meta: move bridge meta keys into nft_meta_bridge · 30e103fe

wenxu authored Jul 05, 2019

Separate bridge meta key from nft_meta to meta_bridge to avoid a
dependency between the bridge module and nft_meta when using the bridge
API available through include/linux/if_bridge.h
Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

30e103fe

ipvs: strip gre tunnel headers from icmp errors · 6aedd14b

Julian Anastasov authored Jul 03, 2019

Recognize GRE tunnels in received ICMP errors and
properly strip the tunnel headers.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

6aedd14b

netfilter: nf_tables: Add synproxy support · ad49d86e

Fernando Fernandez Mancera authored Jun 26, 2019

Add synproxy support for nf_tables. This behaves like the iptables
synproxy target but it is structured in a way that allows us to propose
improvements in the future.
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ad49d86e

04 Jul, 2019 4 commits

ipvs: allow tunneling with gre encapsulation · 6f7b841b

Vadim Fedorenko authored Jul 01, 2019

windows real servers can handle gre tunnels, this patch allows
gre encapsulation with the tunneling method, thereby letting ipvs
be load balancer for windows-based services
Signed-off-by: Vadim Fedorenko <vfedorenko@yandex-team.ru>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

6f7b841b

netfilter: nf_queue: remove unused hook entries pointer · 0d9cb300

Florian Westphal authored Jul 02, 2019

Its not used anywhere, so remove this.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

0d9cb300

netfilter: nf_log: Replace a seq_printf() call by seq_puts() in seq_show() · eca27f14

Markus Elfring authored Jul 02, 2019

A string which did not contain a data format specification should be put
into a sequence. Thus use the corresponding function “seq_puts”.

This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

eca27f14

netfilter: rename nf_SYNPROXY.h to nf_synproxy.h · f0c1aab2

Pablo Neira Ayuso authored Jun 21, 2019

Uppercase is a reminiscence from the iptables infrastructure, rename
this header before this is included in stable kernels.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

f0c1aab2

25 Jun, 2019 3 commits

tipc: simplify stale link failure criteria · 77cf8edb

Jon Maloy authored Jun 25, 2019

In commit a4dc70d4 ("tipc: extend link reset criteria for stale
packet retransmission") we made link retransmission failure events
dependent on the link tolerance, and not only of the number of failed
retransmission attempts, as we did earlier. This works well. However,
keeping the original, additional criteria of 99 failed retransmissions
is now redundant, and may in some cases lead to failure detection
times in the order of minutes instead of the expected 1.5 sec link
tolerance value.

We now remove this criteria altogether.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

77cf8edb

tc-testing: Restore original behaviour for namespaces in tdc · 489ce2f4

Lucas Bates authored Jun 24, 2019

This patch restores the original behaviour for tdc prior to the
introduction of the plugin system, where the network namespace
functionality was split from the main script.

It introduces the concept of required plugins for testcases,
and will automatically load any plugin that isn't already
enabled when said plugin is required by even one testcase.

Additionally, the -n option for the nsPlugin is deprecated
so the default action is to make use of the namespaces.
Instead, we introduce -N to not use them, but still create
the veth pair.

buildebpfPlugin's -B option is also deprecated.

If a test cases requires the features of a specific plugin
in order to pass, it should instead include a new key/value
pair describing plugin interactions:

        "plugins": {
                "requires": "buildebpfPlugin"
        },

A test case can have more than one required plugin: a list
can be inserted as the value for 'requires'.
Signed-off-by: Lucas Bates <lucasb@mojatatu.com>
Acked-by: Davide Caratti <dcaratti@redhat.com>
Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

489ce2f4

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 27d92807

David S. Miller authored Jun 25, 2019

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patches contains Netfilter updates for net-next:

1) .br_defrag indirection depends on CONFIG_NF_DEFRAG_IPV6, from wenxu.

2) Remove unnecessary memset() in ipset, from Florent Fourcot.

3) Merge control plane addition and deletion in ipset, also from Florent.

4) A few missing check for nla_parse() in ipset, from Aditya Pakki
   and Jozsef Kadlecsik.

5) Incorrect cleanup in error path of xt_set version 3, from Jozsef.

6) Memory accounting problems when resizing in ipset, from Stefano Brivio.

7) Jozsef updates his email to @netfilter.org, this batch comes with a
   conflict resolution with recent SPDX header updates.

8) Add to create custom conntrack expectations via nftables, from
   Stephane Veyret.

9) A lookup optimization for conntrack, from Florian Westphal.

10) Check for supported flags in xt_owner.

11) Support for pernet sysctl in br_netfilter, patches
    from Christian Brauner.

12) Patches to move common synproxy infrastructure to nf_synproxy.c,
    to prepare the synproxy support for nf_tables, patches from
    Fernando Fernandez Mancera.

13) Support to restore expiration time in set element, from Laura Garcia.

14) Fix recent rewrite of netfilter IPv6 to avoid indirections
    when CONFIG_IPV6 is unset, from Arnd Bergmann.

15) Always reset vlan tag on skbuff fraglist when refragmenting in
    bridge conntrack, from wenxu.

16) Support to match IPv4 options in nf_tables, from Stephen Suryaputra.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

27d92807

24 Jun, 2019 28 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next · 1c5ba67d

Pablo Neira Ayuso authored Jun 25, 2019

Resolve conflict between d2912cb1 ("treewide: Replace GPLv2
boilerplate/reference with SPDX - rule 500") removing the GPL disclaimer
and fe03d474 ("Update my email address") which updates Jozsef
Kadlecsik's email.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

1c5ba67d

Merge branch 'cxgb4-Reference-count-MPS-TCAM-entries-within-a-PF' · 045df37e

David S. Miller authored Jun 24, 2019

Raju Rangoju says:

====================
cxgb4: Reference count MPS TCAM entries within a PF

Firmware reference counts the MPS TCAM entries by PF and VF,
but it does not do it for usage within a PF or VF. This patch
adds the support to track MPS TCAM entries within a PF.

v2->v3:
 Fixed the compiler errors due to incorrect patch
 Also, removed the new blank line at EOF
v1->v2:
 Use refcount_t type instead of atomic_t for mps reference count
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

045df37e

cxgb4: Add MPS refcounting for alloc/free mac filters · f9f329ad

Raju Rangoju authored Jun 24, 2019

This patch adds reference counting support for
alloc/free mac filters
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f9f329ad

cxgb4: Add MPS TCAM refcounting for cxgb4 change mac · 2f0b9406

Raju Rangoju authored Jun 24, 2019

This patch adds TCAM reference counting
support for cxgb4 change mac path
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f0b9406

cxgb4: Add MPS TCAM refcounting for raw mac filters · 5fab5158

Raju Rangoju authored Jun 24, 2019

This patch adds TCAM reference counting
support for raw mac filters.
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5fab5158

cxgb4: Re-work the logic for mps refcounting · 28b38705

Raju Rangoju authored Jun 24, 2019

Remove existing mps refcounting code which was
added only for encap filters and add necessary
data structures/functions to support mps reference
counting for all the mac filters. Also add wrapper
functions for allocating and freeing encap mac
filters.
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

28b38705

net: stmmac: sun8i: force select external PHY when no internal one · 0fec7e72

Icenowy Zheng authored Jun 20, 2019

The PHY selection bit also exists on SoCs without an internal PHY; if it's
set to 1 (internal PHY, default value) then the MAC will not make use of
any PHY on such SoCs.

This problem appears when adapting for H6, which has no real internal PHY
(the "internal PHY" on H6 is not on-die, but on a co-packaged AC200 chip,
connected via RMII interface at GPIO bank A).

Force the PHY selection bit to 0 when the SOC doesn't have an internal PHY,
to address the problem of a wrong default value.
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: Ondrej Jirman <megous@megous.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0fec7e72

net: stmmac: sun8i: add support for Allwinner H6 EMAC · adadd38c

Icenowy Zheng authored Jun 20, 2019

The EMAC on Allwinner H6 is just like the one on A64. The "internal PHY" on
H6 is on a co-packaged AC200 chip, and it's not really internal (it's
connected via RMII at PA GPIO bank).

Add support for the Allwinner H6 EMAC in the dwmac-sun8i driver.
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: Ondrej Jirman <megous@megous.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

adadd38c

Merge branch 'cached-route-listings' · dcdfa50e

David S. Miller authored Jun 24, 2019

Stefano Brivio says:

====================
Fix listing (IPv4, IPv6) and flushing (IPv6) of cached route exceptions

For IPv6 cached routes, the commands 'ip -6 route list cache' and
'ip -6 route flush cache' don't work at all after route exceptions have
been moved to a separate hash table in commit 2b760fcf ("ipv6: hook
up exception table to store dst cache").

For IPv4 cached routes, the command 'ip route list cache' has also
stopped working in kernel 3.5 after commit 4895c771 ("ipv4: Add FIB
nexthop exceptions.") introduced storage for route exceptions as a
separate entity.

Fix this by allowing userspace to clearly request cached routes with
the RTM_F_CLONED flag used as a filter (in conjuction with strict
checking) and by retrieving and dumping cached routes if requested.

If strict checking is not requested (iproute2 < 5.0.0), we don't have a
way to consistently filter results on other selectors (e.g. on tables),
so skip filtering entirely and dump both regular routes and exceptions.

For IPv4, cache flushing uses a completely different mechanism, so it
wasn't affected. Listing of exception routes (modified routes pre-3.5) was
tested against these versions of kernel and iproute2:

                    iproute2
kernel         4.14.0   4.15.0   4.19.0   5.0.0   5.1.0
 3.5-rc4         +        +        +        +       +
 4.4
 4.9
 4.14
 4.15
 4.19
 5.0
 5.1
 fixed           +        +        +        +       +

For IPv6, a separate iproute2 patch is required. Versions of iproute2
and kernel tested:

                    iproute2
kernel             4.14.0   4.15.0   4.19.0   5.0.0   5.1.0    5.1.0, patched
 3.18    list        +        +        +        +       +            +
         flush       +        +        +        +       +            +
 4.4     list        +        +        +        +       +            +
         flush       +        +        +        +       +            +
 4.9     list        +        +        +        +       +            +
         flush       +        +        +        +       +            +
 4.14    list        +        +        +        +       +            +
         flush       +        +        +        +       +            +
 4.15    list
         flush
 4.19    list
         flush
 5.0     list
         flush
 5.1     list
         flush
 with    list        +        +        +        +       +            +
 fix     flush       +        +        +                             +

v7: Make sure r->rtm_tos is initialised in 3/11, move loop over nexthop
    objects in 4/11, add comments about usage of "skip" counters in commit
    messages of 4/11 and 8/11

v6: Target for net-next, rebase and adapt to nexthop objects for IPv6 paths.
    Merge selftests into this series (as they were addressed for net-next).
    A number of minor changes detailed in logs of single patches.

v5: Skip filtering altogether if no strict checking is requested: selecting
    routes or exceptions only would be inconsistent with the fact we can't
    filter on tables. Drop 1/8 (non-strict dump filter function no longer
    needed), replace 2/8 (don't use NLM_F_MATCH, decide to skip routes or
    exceptions in filter function), drop 6/8 (2/8 is enough for IPv6 too).
    Introduce dump_routes and dump_exceptions flags in filter, adapt other
    patches to that.

v4: Fix the listing issue also for IPv4, making the behaviour consistent
    with IPv6. Honour NLM_F_MATCH as per RFC 3549 and allow usage of
    RTM_F_CLONED filter. Split patches into smaller logical changes.

v3: Drop check on RTM_F_CLONED and rework logic of return values of
    rt6_dump_route()

v2: Add count of routes handled in partial dumps, and skip them, in patch 1/2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

dcdfa50e

selftests: pmtu: Make list_flush_ipv6_exception test more demanding · b964641e

Stefano Brivio authored Jun 21, 2019

Instead of just listing and flushing two cached exceptions, create
a relatively big number of them, and count how many are listed. Single
netlink dump messages contain approximately 25 entries each, and this
way we can make sure the partial dump tracking mechanism is working
properly.

While at it, also ensure that no cached routes can be listed after
flush, and remove 'sleep 1' calls, they are not actually needed.

v7: No changes

v6:
  - Merge this patch into series including fix, as it's also targeted
    for net-next. No actual changes
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b964641e

selftests: pmtu: Introduce list_flush_ipv4_exception test case · de755a85

Stefano Brivio authored Jun 21, 2019

This test checks that route exceptions can be successfully listed and
flushed using ip -6 route {list,flush} cache.

v7: No changes

v6:
  - Merge this patch into series including fix, as it's also targeted
    for net-next
  - Drop left-over print of 'ip route list cache | wc -l'
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

de755a85

ip6_fib: Don't discard nodes with valid routing information in fib6_locate_1() · 40cb35d5

Stefano Brivio authored Jun 21, 2019

When we perform an inexact match on FIB nodes via fib6_locate_1(), longer
prefixes will be preferred to shorter ones. However, it might happen that
a node, with higher fn_bit value than some other, has no valid routing
information.

In this case, we'll pick that node, but it will be discarded by the check
on RTN_RTINFO in fib6_locate(), and we might miss nodes with valid routing
information but with lower fn_bit value.

This is apparent when a routing exception is created for a default route:
 # ip -6 route list
 fc00:1::/64 dev veth_A-R1 proto kernel metric 256 pref medium
 fc00:2::/64 dev veth_A-R2 proto kernel metric 256 pref medium
 fc00:4::1 via fc00:2::2 dev veth_A-R2 metric 1024 pref medium
 fe80::/64 dev veth_A-R1 proto kernel metric 256 pref medium
 fe80::/64 dev veth_A-R2 proto kernel metric 256 pref medium
 default via fc00:1::2 dev veth_A-R1 metric 1024 pref medium
 # ip -6 route list cache
 fc00:4::1 via fc00:2::2 dev veth_A-R2 metric 1024 expires 593sec mtu 1500 pref medium
 fc00:3::1 via fc00:1::2 dev veth_A-R1 metric 1024 expires 593sec mtu 1500 pref medium
 # ip -6 route flush cache    # node for default route is discarded
 Failed to send flush request: No such process
 # ip -6 route list cache
 fc00:3::1 via fc00:1::2 dev veth_A-R1 metric 1024 expires 586sec mtu 1500 pref medium

Check right away if the node has a RTN_RTINFO flag, before replacing the
'prev' pointer, that indicates the longest matching prefix found so far.

Fixes: 38fbeeee ("ipv6: prepare fib6_locate() for exception table")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

40cb35d5

ipv6: Dump route exceptions if requested · 1e47b483

Stefano Brivio authored Jun 21, 2019

Since commit 2b760fcf ("ipv6: hook up exception table to store dst
cache"), route exceptions reside in a separate hash table, and won't be
found by walking the FIB, so they won't be dumped to userspace on a
RTM_GETROUTE message.

This causes 'ip -6 route list cache' and 'ip -6 route flush cache' to
have no function anymore:

 # ip -6 route get fc00:3::1
 fc00:3::1 via fc00:1::2 dev veth_A-R1 src fc00:1::1 metric 1024 expires 539sec mtu 1400 pref medium
 # ip -6 route get fc00:4::1
 fc00:4::1 via fc00:2::2 dev veth_A-R2 src fc00:2::1 metric 1024 expires 536sec mtu 1500 pref medium
 # ip -6 route list cache
 # ip -6 route flush cache
 # ip -6 route get fc00:3::1
 fc00:3::1 via fc00:1::2 dev veth_A-R1 src fc00:1::1 metric 1024 expires 520sec mtu 1400 pref medium
 # ip -6 route get fc00:4::1
 fc00:4::1 via fc00:2::2 dev veth_A-R2 src fc00:2::1 metric 1024 expires 519sec mtu 1500 pref medium

because iproute2 lists cached routes using RTM_GETROUTE, and flushes them
by listing all the routes, and deleting them with RTM_DELROUTE one by one.

If cached routes are requested using the RTM_F_CLONED flag together with
strict checking, or if no strict checking is requested (and hence we can't
consistently apply filters), look up exceptions in the hash table
associated with the current fib6_info in rt6_dump_route(), and, if present
and not expired, add them to the dump.

We might be unable to dump all the entries for a given node in a single
message, so keep track of how many entries were handled for the current
node in fib6_walker, and skip that amount in case we start from the same
partially dumped node.

When a partial dump restarts, as the starting node might change when
'sernum' changes, we have no guarantee that we need to skip the same
amount of in-node entries. Therefore, we need two counters, and we need to
zero the in-node counter if the node from which the dump is resumed
differs.

Note that, with the current version of iproute2, this only fixes the
'ip -6 route list cache': on a flush command, iproute2 doesn't pass
RTM_F_CLONED and, due to this inconsistency, 'ip -6 route flush cache' is
still unable to fetch the routes to be flushed. This will be addressed in
a patch for iproute2.

To flush cached routes, a procfs entry could be introduced instead: that's
how it works for IPv4. We already have a rt6_flush_exception() function
ready to be wired to it. However, this would not solve the issue for
listing.

Versions of iproute2 and kernel tested:

                    iproute2
kernel             4.14.0   4.15.0   4.19.0   5.0.0   5.1.0    5.1.0, patched
 3.18    list        +        +        +        +       +            +
         flush       +        +        +        +       +            +
 4.4     list        +        +        +        +       +            +
         flush       +        +        +        +       +            +
 4.9     list        +        +        +        +       +            +
         flush       +        +        +        +       +            +
 4.14    list        +        +        +        +       +            +
         flush       +        +        +        +       +            +
 4.15    list
         flush
 4.19    list
         flush
 5.0     list
         flush
 5.1     list
         flush
 with    list        +        +        +        +       +            +
 fix     flush       +        +        +                             +

v7:
  - Explain usage of "skip" counters in commit message (suggested by
    David Ahern)

v6:
  - Rebase onto net-next, use recently introduced nexthop walker
  - Make rt6_nh_dump_exceptions() a separate function (suggested by David
    Ahern)

v5:
  - Use dump_routes and dump_exceptions from filter, ignore NLM_F_MATCH,
    update test results (flushing works with iproute2 < 5.0.0 now)

v4:
  - Split NLM_F_MATCH and strict check handling in separate patches
  - Filter routes using RTM_F_CLONED: if it's not set, only return
    non-cached routes, and if it's set, only return cached routes:
    change requested by David Ahern and Martin Lau. This implies that
    iproute2 needs a separate patch to be able to flush IPv6 cached
    routes. This is not ideal because we can't fix the breakage caused
    by 2b760fcf entirely in kernel. However, two years have passed
    since then, and this makes it more tolerable

v3:
  - More descriptive comment about expired exceptions in rt6_dump_route()
  - Swap return values of rt6_dump_route() (suggested by Martin Lau)
  - Don't zero skip_in_node in case we don't dump anything in a given pass
    (also suggested by Martin Lau)
  - Remove check on RTM_F_CLONED altogether: in the current UAPI semantic,
    it's just a flag to indicate the route was cloned, not to filter on
    routes

v2: Add tracking of number of entries to be skipped in current node after
    a partial dump. As we restart from the same node, if not all the
    exceptions for a given node fit in a single message, the dump will
    not terminate, as suggested by Martin Lau. This is a concrete
    possibility, setting up a big number of exceptions for the same route
    actually causes the issue, suggested by David Ahern.
Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes: 2b760fcf ("ipv6: hook up exception table to store dst cache")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1e47b483

ipv6/route: Change return code of rt6_dump_route() for partial node dumps · bf9a8a06

Stefano Brivio authored Jun 21, 2019

In the next patch, we are going to add optional dump of exceptions to
rt6_dump_route().

Change the return code of rt6_dump_route() to accomodate partial node
dumps: we might dump multiple routes per node, and might be able to dump
only a given number of them, so fib6_dump_node() will need to know how
many routes have been dumped on partial dump, to restart the dump from the
point where it was interrupted.

Note that fib6_dump_node() is the only caller and already handles all
non-negative return codes as success: those become -1 to signal that we're
done with the node. If we fail, return 0, as we were unable to dump the
single route in the node, but we're not done with it.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bf9a8a06

ipv6/route: Don't match on fc_nh_id if not set in ip6_route_del() · 3401bfb1

Stefano Brivio authored Jun 21, 2019

If fc_nh_id isn't set, we shouldn't try to match against it. This
actually matters just for the RTF_CACHE below (where this case is
already handled): if iproute2 gets a route exception and tries to
delete it, it won't reference it by fc_nh_id, even if a nexthop
object might be associated to the originating route.

Fixes: 5b98324e ("ipv6: Allow routes to use nexthop objects")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3401bfb1

Revert "net/ipv6: Bail early if user only wants cloned entries" · ef11209d

Stefano Brivio authored Jun 21, 2019

This reverts commit 08e814c9: as we
are preparing to fix listing and dumping of IPv6 cached routes, we
need to allow RTM_F_CLONED as a flag to match routes against while
dumping them.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ef11209d

ipv4: Dump route exceptions if requested · ee28906f

Stefano Brivio authored Jun 21, 2019

Since commit 4895c771 ("ipv4: Add FIB nexthop exceptions."), cached
exception routes are stored as a separate entity, so they are not dumped
on a FIB dump, even if the RTM_F_CLONED flag is passed.

This implies that the command 'ip route list cache' doesn't return any
result anymore.

If the RTM_F_CLONED is passed, and strict checking requested, retrieve
nexthop exception routes and dump them. If no strict checking is
requested, filtering can't be performed consistently: dump everything in
that case.

With this, we need to add an argument to the netlink callback in order to
track how many entries were already dumped for the last leaf included in
a partial netlink dump.

A single additional argument is sufficient, even if we traverse logically
nested structures (nexthop objects, hash table buckets, bucket chains): it
doesn't matter if we stop in the middle of any of those, because they are
always traversed the same way. As an example, s_i values in [], s_fa
values in ():

  node (fa) #1 [1]
    nexthop #1
    bucket #1 -> #0 in chain (1)
    bucket #2 -> #0 in chain (2) -> #1 in chain (3) -> #2 in chain (4)
    bucket #3 -> #0 in chain (5) -> #1 in chain (6)

    nexthop #2
    bucket #1 -> #0 in chain (7) -> #1 in chain (8)
    bucket #2 -> #0 in chain (9)
  --
  node (fa) #2 [2]
    nexthop #1
    bucket #1 -> #0 in chain (1) -> #1 in chain (2)
    bucket #2 -> #0 in chain (3)

it doesn't matter if we stop at (3), (4), (7) for "node #1", or at (2)
for "node #2": walking flattens all that.

It would even be possible to drop the distinction between the in-tree
(s_i) and in-node (s_fa) counter, but a further improvement might
advise against this. This is only as accurate as the existing tracking
mechanism for leaves: if a partial dump is restarted after exceptions
are removed or expired, we might skip some non-dumped entries.

To improve this, we could attach a 'sernum' attribute (similar to the
one used for IPv6) to nexthop entities, and bump this counter whenever
exceptions change: having a distinction between the two counters would
make this more convenient.

Listing of exception routes (modified routes pre-3.5) was tested against
these versions of kernel and iproute2:

                    iproute2
kernel         4.14.0   4.15.0   4.19.0   5.0.0   5.1.0
 3.5-rc4         +        +        +        +       +
 4.4
 4.9
 4.14
 4.15
 4.19
 5.0
 5.1
 fixed           +        +        +        +       +

v7:
   - Move loop over nexthop objects to route.c, and pass struct fib_info
     and table ID to it, not a struct fib_alias (suggested by David Ahern)
   - While at it, note that the NULL check on fa->fa_info is redundant,
     and the check on RTNH_F_DEAD is also not consistent with what's done
     with regular route listing: just keep it for nhc_flags
   - Rename entry point function for dumping exceptions to
     fib_dump_info_fnhe(), and rearrange arguments for consistency with
     fib_dump_info()
   - Rename fnhe_dump_buckets() to fnhe_dump_bucket() and make it handle
     one bucket at a time
   - Expand commit message to describe why we can have a single "skip"
     counter for all exceptions stored in bucket chains in nexthop objects
     (suggested by David Ahern)

v6:
   - Rebased onto net-next
   - Loop over nexthop paths too. Move loop over fnhe buckets to route.c,
     avoids need to export rt_fill_info() and to touch exceptions from
     fib_trie.c. Pass NULL as flow to rt_fill_info(), it now allows that
     (suggested by David Ahern)

Fixes: 4895c771 ("ipv4: Add FIB nexthop exceptions.")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ee28906f

ipv4/route: Allow NULL flowinfo in rt_fill_info() · d948974c

Stefano Brivio authored Jun 21, 2019

In the next patch, we're going to use rt_fill_info() to dump exception
routes upon RTM_GETROUTE with NLM_F_ROOT, meaning userspace is requesting
a dump and not a specific route selection, which in turn implies the input
interface is not relevant. Update rt_fill_info() to handle a NULL
flowinfo.

v7: If fl4 is NULL, explicitly set r->rtm_tos to 0: it's not initialised
    otherwise (spotted by David Ahern)

v6: New patch
Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d948974c

ipv4/fib_frontend: Allow RTM_F_CLONED flag to be used for filtering · b597ca6e

Stefano Brivio authored Jun 21, 2019

This functionally reverts the check introduced by commit
e8ba330a ("rtnetlink: Update fib dumps for strict data checking")
as modified by commit e4e92fb1 ("net/ipv4: Bail early if user only
wants prefix entries").

As we are preparing to fix listing of IPv4 cached routes, we need to
give userspace a way to request them.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b597ca6e

fib_frontend, ip6_fib: Select routes or exceptions dump from RTM_F_CLONED · 564c91f7

Stefano Brivio authored Jun 21, 2019

The following patches add back the ability to dump IPv4 and IPv6 exception
routes, and we need to allow selection of regular routes or exceptions.

Use RTM_F_CLONED as filter to decide whether to dump routes or exceptions:
iproute2 passes it in dump requests (except for IPv6 cache flush requests,
this will be fixed in iproute2) and this used to work as long as
exceptions were stored directly in the FIB, for both IPv4 and IPv6.

Caveat: if strict checking is not requested (that is, if the dump request
doesn't go through ip_valid_fib_dump_req()), we can't filter on protocol,
tables or route types.

In this case, filtering on RTM_F_CLONED would be inconsistent: we would
fix 'ip route list cache' by returning exception routes and at the same
time introduce another bug in case another selector is present, e.g. on
'ip route list cache table main' we would return all exception routes,
without filtering on tables.

Keep this consistent by applying no filters at all, and dumping both
routes and exceptions, if strict checking is not requested. iproute2
currently filters results anyway, and no unwanted results will be
presented to the user. The kernel will just dump more data than needed.

v7: No changes

v6: Rebase onto net-next, no changes

v5: New patch: add dump_routes and dump_exceptions flags in filter and
    simply clear the unwanted one if strict checking is enabled, don't
    ignore NLM_F_MATCH and don't set filter_set if NLM_F_MATCH is set.
    Skip filtering altogether if no strict checking is requested:
    selecting routes or exceptions only would be inconsistent with the
    fact we can't filter on tables.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

564c91f7

net: macb: use GRO · 97236cda

Antoine Tenart authored Jun 21, 2019

This patch updates the macb driver to use NAPI GRO helpers when
receiving SKBs. This improves performances.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

97236cda

net: macb: use NAPI_POLL_WEIGHT · 760a3c1a

Antoine Tenart authored Jun 21, 2019

Use NAPI_POLL_WEIGHT, the default NAPI poll() weight instead of
redefining our own value (which turns out to be 64 as well).
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

760a3c1a

Merge branch 'ipv4-fix-bugs-when-enable-route_localnet' · 38a3889f

David S. Miller authored Jun 24, 2019

Shijie Luo says:

====================
ipv4: fix bugs when enable route_localnet

When enable route_localnet, route of the 127/8 address is enabled.
But in some situations like arp_announce=2, ARP requests or reply
work abnormally.

This patchset fix some bugs when enable route_localnet.

Change History:
V2:
- Change a single patch to a patchset.
- Add bug fix for arp_ignore = 3.
- Add a couple of test for enabling route_localnet in selftests.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

38a3889f

selftests: add route_localnet test script · 58ade67b

Shijie Luo authored Jun 18, 2019

Add a simple scripts to exercise several situations when enable
route_localnet.
Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Zhiqiang liu <liuzhiqiang26@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

58ade67b

ipv4: fix confirm_addr_indev() when enable route_localnet · 650638a7

Shijie Luo authored Jun 18, 2019

When arp_ignore=3, the NIC won't reply for scope host addresses, but
if enable route_locanet, we need to reply ip address with head 127 and
scope RT_SCOPE_HOST.

Fixes: d0daebc3 ("ipv4: Add interface option to enable routing of 127.0.0.0/8")
Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

650638a7

ipv4: fix inet_select_addr() when enable route_localnet · d8c444d5

Shijie Luo authored Jun 18, 2019

Suppose we have two interfaces eth0 and eth1 in two hosts, follow
the same steps in the two hosts:
 # sysctl -w net.ipv4.conf.eth1.route_localnet=1
 # sysctl -w net.ipv4.conf.eth1.arp_announce=2
 # ip route del 127.0.0.0/8 dev lo table local
and then set ip to eth1 in host1 like:
 # ifconfig eth1 127.25.3.4/24
set ip to eth2 in host2 and ping host1:
 # ifconfig eth1 127.25.3.14/24
 # ping -I eth1 127.25.3.4
Well, host2 cannot connect to host1.

When set a ip address with head 127, the scope of the address defaults
to RT_SCOPE_HOST. In this situation, host2 will use arp_solicit() to
send a arp request for the mac address of host1 with ip
address 127.25.3.14. When arp_announce=2, inet_select_addr() cannot
select a correct saddr with condition ifa->ifa_scope > scope, because
ifa_scope is RT_SCOPE_HOST and scope is RT_SCOPE_LINK. Then,
inet_select_addr() will go to no_in_dev to lookup all interfaces to find
a primary ip and finally get the primary ip of eth0.

Here I add a localnet_scope defaults to RT_SCOPE_HOST, and when
route_localnet is enabled, this value changes to RT_SCOPE_LINK to make
inet_select_addr() find a correct primary ip as saddr of arp request.

Fixes: d0daebc3 ("ipv4: Add interface option to enable routing of 127.0.0.0/8")
Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d8c444d5

tipc: remove the unnecessary msg->req check from tipc_nl_compat_bearer_set · 8bc81c57

Xin Long authored Jun 24, 2019

tipc_nl_compat_bearer_set() is only called by tipc_nl_compat_link_set()
which already does the check for msg->req check, so remove it from
tipc_nl_compat_bearer_set(), and do the same in tipc_nl_compat_media_set().
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8bc81c57

Merge branch 'mlxsw-Thermal-and-hwmon-extensions' · 18f3896d

David S. Miller authored Jun 24, 2019

Ido Schimmel says:

====================
mlxsw: Thermal and hwmon extensions

This patchset from Vadim includes various enhancements to thermal and
hwmon code in mlxsw.

Patch #1 adds a thermal zone for each inter-connect device (gearbox).
These devices are present in SN3800 systems and code to expose their
temperature via hwmon was added in commit 2e265a8b ("mlxsw: core:
Extend hwmon interface with inter-connect temperature attributes").

Currently, there are multiple thermal zones in mlxsw and only a few
cooling devices. Patch #2 detects the hottest thermal zone and the
cooling devices are switched to follow its trends. RFC was sent last
month [1].

Patch #3 allows to read and report negative temperature of the sensors
mlxsw exposes via hwmon and thermal subsystems.

v2 (Andrew Lunn):
* In patch #3, replace '%u' with '%d' in mlxsw_hwmon_module_temp_show()

[1] https://patchwork.ozlabs.org/patch/1107161/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

18f3896d