Commits · 2eb0f624b709e78ec8e2f4c3412947703db99301 · nexedi / linux

24 Apr, 2018 18 commits

netfilter: add NAT support for shifted portmap ranges · 2eb0f624

Thierry Du Tre authored Apr 04, 2018

This is a patch proposal to support shifted ranges in portmaps. (i.e. tcp/udp
incoming port 5000-5100 on WAN redirected to LAN 192.168.1.5:2000-2100)

Currently DNAT only works for single port or identical port ranges. (i.e.
ports 5000-5100 on WAN interface redirected to a LAN host while original
destination port is not altered) When different port ranges are configured,
either 'random' mode should be used, or else all incoming connections are
mapped onto the first port in the redirect range. (in described example
WAN:5000-5100 will all be mapped to 192.168.1.5:2000)

This patch introduces a new mode indicated by flag NF_NAT_RANGE_PROTO_OFFSET
which uses a base port value to calculate an offset with the destination port
present in the incoming stream. That offset is then applied as index within the
redirect port range (index modulo rangewidth to handle range overflow).

In described example the base port would be 5000. An incoming stream with
destination port 5004 would result in an offset value 4 which means that the
NAT'ed stream will be using destination port 2004.

Other possibilities include deterministic mapping of larger or multiple ranges
to a smaller range : WAN:5000-5999 -> LAN:5000-5099 (maps WAN port 5*xx to port
51xx)

This patch does not change any current behavior. It just adds new NAT proto
range functionality which must be selected via the specific flag when intended
to use.

A patch for iptables (libipt_DNAT.c + libip6t_DNAT.c) will also be proposed
which makes this functionality immediately available.
Signed-off-by: Thierry Du Tre <thierry@dtsystems.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

2eb0f624

netfilter: nf_tables: Simplify set backend selection · 71cc0873

Phil Sutter authored Apr 03, 2018

Drop nft_set_type's ability to act as a container of multiple backend
implementations it chooses from. Instead consolidate the whole selection
logic in nft_select_set_ops() and the actual backend provided estimate()
callback.

This turns nf_tables_set_types into a list containing all available
backends which is traversed when selecting one matching userspace
requested criteria.

Also, this change allows to embed nft_set_ops structure into
nft_set_type and pull flags field into the latter as it's only used
during selection phase.

A crucial part of this change is to make sure the new layout respects
hash backend constraints formerly enforced by nft_hash_select_ops()
function: This is achieved by introduction of a specific estimate()
callback for nft_hash_fast_ops which returns false for key lengths != 4.
In turn, nft_hash_estimate() is changed to return false for key lengths
== 4 so it won't be chosen by accident. Also, both callbacks must return
false for unbounded sets as their size estimate depends on a known
maximum element count.

Note that this patch partially reverts commit 4f2921ca ("netfilter:
nf_tables: meter: pick a set backend that supports updates") by making
nft_set_ops_candidate() not explicitly look for an update callback but
make NFT_SET_EVAL a regular backend feature flag which is checked along
with the others. This way all feature requirements are checked in one
go.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

71cc0873

netfilter: nf_tables: initial support for extended ACK reporting · 36dd1bcc

Pablo Neira Ayuso authored Mar 28, 2018

Keep it simple to start with, just report attribute offsets that can be
useful to userspace when representating errors to users.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

36dd1bcc

netfilter: nf_tables: simplify lookup functions · cac20fcd

Pablo Neira Ayuso authored Mar 28, 2018

Replace the nf_tables_ prefix by nft_ and merge code into single lookup
function whenever possible. In many cases we go over the 80-chars
boundary function names, this save us ~50 LoC.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cac20fcd

netfilter: nf_flow_table: fix offloading connections with SNAT+DNAT · df1e2025

Felix Fietkau authored Mar 23, 2018

Pass all NAT types to the flow offload struct, otherwise parts of the
address/port pair do not get translated properly, causing connection
stalls
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

df1e2025

netfilter: nf_flow_table: add missing condition for TCP state check · 33894c36

Felix Fietkau authored Mar 23, 2018

Avoid looking at unrelated fields in UDP packets
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

33894c36

netfilter: nf_flow_table: tear down TCP flows if RST or FIN was seen · b6f27d32

Felix Fietkau authored Feb 26, 2018

Allow the slow path to handle the shutdown of the connection with proper
timeouts. The packet containing RST/FIN is also sent to the slow path
and the TCP conntrack module will update its state.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

b6f27d32

netfilter: nf_flow_table: add support for sending flows back to the slow path · da5984e5

Felix Fietkau authored Feb 26, 2018

Since conntrack hasn't seen any packets from the offloaded flow in a
while, and the timeout for offloaded flows is set to an extremely long
value, we need to fix up the state before we can send a flow back to the
slow path.

For TCP, reset td_maxwin in both directions, which makes it resync its
state on the next packets.

Use the regular timeout for TCP and UDP established connections.

This allows the slow path to take over again once the offload state has
been torn down
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

da5984e5

netfilter: nf_flow_table: in flow_offload_lookup, skip entries being deleted · ba03137f

Felix Fietkau authored Feb 26, 2018

Preparation for sending flows back to the slow path
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ba03137f

netfilter: nf_flow_table: add a new flow state for tearing down offloading · 59c466dd

Felix Fietkau authored Feb 26, 2018

On cleanup, this will be treated differently from FLOW_OFFLOAD_DYING:

If FLOW_OFFLOAD_DYING is set, the connection is going away, so both the
offload state and the connection tracking entry will be deleted.

If FLOW_OFFLOAD_TEARDOWN is set, the connection remains alive, but
the offload state is torn down. This is useful for cases that require
more complex state tracking / timeout handling on TCP, or if the
connection has been idle for too long.

Support for sending flows back to the slow path will be implemented in
a following patch
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

59c466dd

netfilter: nf_flow_table: make flow_offload_dead inline · 6bdc3c68

Felix Fietkau authored Feb 26, 2018

It is too trivial to keep as a separate exported function
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

6bdc3c68

netfilter: nf_flow_table: track flow tables in nf_flow_table directly · 84453a90

Felix Fietkau authored Feb 26, 2018

Avoids having nf_flow_table depend on nftables (useful for future
iptables backport work)
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

84453a90

netfilter: nf_flow_table: fix priv pointer for netdev hook · 17857d92

Felix Fietkau authored Feb 26, 2018

The offload ip hook expects a pointer to the flowtable, not to the
rhashtable. Since the rhashtable is the first member, this is safe for
the moment, but breaks as soon as the structure layout changes
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

17857d92

netfilter: nf_flow_table: move init code to nf_flow_table_core.c · a268de77

Felix Fietkau authored Feb 26, 2018

Reduces duplication of .gc and .params in flowtable type definitions and
makes the API clearer
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

a268de77

netfilter: nf_flow_table: relax mixed ipv4/ipv6 flowtable dependencies · 1e80380b

Felix Fietkau authored Feb 26, 2018

Since the offload hook code was moved, this table no longer depends on
the IPv4 and IPv6 flowtable modules
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

1e80380b

netfilter: nf_flow_table: move ipv6 offload hook code to nf_flow_table · a908fdec

Felix Fietkau authored Feb 26, 2018

Useful as preparation for adding iptables support for offload.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

a908fdec

netfilter: nf_flow_table: move ip header check out of nf_flow_exceeds_mtu · 3aeb51d7

Felix Fietkau authored Feb 26, 2018

Allows the function to be shared with the IPv6 hook code
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

3aeb51d7

netfilter: nf_flow_table: move ipv4 offload hook code to nf_flow_table · 7d208687

Felix Fietkau authored Feb 26, 2018

Allows some minor code sharing with the ipv6 hook code and is also
useful as preparation for adding iptables support for offload
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

7d208687

21 Apr, 2018 3 commits

netfilter: nf_flow_table: rename nf_flow_table.c to nf_flow_table_core.c · 1a999d89

Felix Fietkau authored Feb 26, 2018

Preparation for adding more code to the same module
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

1a999d89

netfilter: nf_flow_table: cache mtu in struct flow_offload_tuple · 4f3780c0

Felix Fietkau authored Feb 26, 2018

Reduces the number of cache lines touched in the offload forwarding
path. This is safe because PMTU limits are bypassed for the forwarding
path (see commit f87c10a8 for more details).
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

4f3780c0

ipv6: make ip6_dst_mtu_forward inline · 07cb9623

Felix Fietkau authored Feb 26, 2018

Just like ip_dst_mtu_maybe_forward(), to avoid a dependency with ipv6.ko.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

07cb9623

19 Apr, 2018 19 commits

netfilter: nf_flow_table: clean up flow_offload_alloc · 047b300e

Felix Fietkau authored Feb 26, 2018

Reduce code duplication and make it much easier to read
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

047b300e

netfilter: nf_flow_table: use IP_CT_DIR_* values for FLOW_OFFLOAD_DIR_* · af81f9e7

Felix Fietkau authored Feb 26, 2018

Simplifies further code cleanups
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

af81f9e7

netfilter: xt_NFLOG: use nf_log_packet instead of nfulnl_log_packet. · ce20cdf4

Taehee Yoo authored Apr 09, 2018

The nfulnl_log_packet() is added to make sure that the NFLOG target
works as only user-space logger. but now, nf_log_packet() can find proper
log function using NF_LOG_TYPE_ULOG and NF_LOG_TYPE_LOG.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ce20cdf4

ipv6: frags: fix a lockdep false positive · 415787d7

Eric Dumazet authored Apr 17, 2018

lockdep does not know that the locks used by IPv4 defrag
and IPv6 reassembly units are of different classes.

It complains because of following chains :

1) sch_direct_xmit()        (lock txq->_xmit_lock)
    dev_hard_start_xmit()
     xmit_one()
      dev_queue_xmit_nit()
       packet_rcv_fanout()
        ip_check_defrag()
         ip_defrag()
          spin_lock()     (lock frag queue spinlock)

2) ip6_input_finish()
    ipv6_frag_rcv()       (lock frag queue spinlock)
     ip6_frag_queue()
      icmpv6_param_prob() (lock txq->_xmit_lock at some point)

We could add lockdep annotations, but we also can make sure IPv6
calls icmpv6_param_prob() only after the release of the frag queue spinlock,
since this naturally makes frag queue spinlock a leaf in lock hierarchy.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

415787d7

hv_netvsc: Add NetVSP v6 and v6.1 into version negotiation · 0dcec221

Haiyang Zhang authored Apr 17, 2018

This patch adds the NetVSP v6 and 6.1 message structures, and includes
these versions into NetVSC/NetVSP version negotiation process.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0dcec221

hv_netvsc: propogate Hyper-V friendly name into interface alias · 0fe554a4

Stephen Hemminger authored Apr 17, 2018

This patch implement the 'Device Naming' feature of the Hyper-V
network device API. In Hyper-V on the host through the GUI or PowerShell
it is possible to enable the device naming feature which causes
the host to make available to the guest the name of the device.
This shows up in the RNDIS protocol as the friendly name.

The name has no particular meaning and is limited to 256 characters.
The value can only be set via PowerShell on the host, but could
be scripted for mass deployments. The default value is the
string 'Network Adapter' and since that is the same for all devices
and useless, the driver ignores it.

In Windows, the value goes into a registry key for use in SNMP
ifAlias. For Linux, this patch puts the value in the network
device alias property; where it is visible in ip tools and SNMP.

The host provided ifAlias is just a suggestion, and can be
overridden by later ip commands.

Also requires exporting dev_set_alias in netdev core.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0fe554a4

Merge branch 'r8169-series-with-further-smaller-improvements' · c4ff0b0f

David S. Miller authored Apr 18, 2018

Heiner Kallweit says:

====================
r8169: series with further smaller improvements

This series includes further smaller improvements.

Then I think the basic cleanup has been done and next step would be
preparing the switch to phylib.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

c4ff0b0f

r8169: remove jumbo_tx_csum from chip config struct · 6ed0e08f

Heiner Kallweit authored Apr 17, 2018

According to the chip configuration entries only RTL8169 (ver <= 06)
supports tx checksumming for jumbo packets.
By the way: constant JUMBO_1K is a little misleading because it refers
to the standard packet size and not to a jumbo packet size.

By implementing this rule we can get rid of configuring tx checksumming
support per chip type.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6ed0e08f

r8169: improve pci region handling · c8d48d9c

Heiner Kallweit authored Apr 17, 2018

The region to be used is always the first of type IORESOURCE_MEM.
We can implement this rule directly w/o having to specify which
region is the first one per configuration entry.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c8d48d9c

r8169: drop member txd_version from struct rtl8169_private · a4328ddb

Heiner Kallweit authored Apr 17, 2018

txd_version is used in rtl_init_one() only, so we can drop member
txd_version from struct rtl8169_private.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a4328ddb

r8169: improve rtl8169_get_mac_version · 90b989c5

Heiner Kallweit authored Apr 17, 2018

Certain entries in array mac_info[] are redundant, so remove them:
0x7cf, 0x2c200000 (VER 33): matched by entry 0x7c8, 0x2c000000
0x7cf, 0x28300000 (VER 26): matched by entry 0x7c8, 0x28000000
0x7cf, 0x3cb00000 (VER 24): matched by entry 0x7c8, 0x3c800000
0x7cf, 0x3c400000 (VER 22): matched by entry 0x7c8, 0x3c000000
0x7cf, 0x38500000 (VER 17): matched by entry 0x7c8, 0x38000000
0x7cf, 0x44900000 (VER 39): matched by entry 0x7c8, 0x44800000
0x7cf, 0x40b00000 (VER 30): matched by entry 0x7c8, 0x40800000
0x7cf, 0x40a00000 (VER 30): matched by entry 0x7c8, 0x40800000
0x7cf, 0x34a00000 (VER 09): matched by entry 0x7c8, 0x34800000
0x7cf, 0x24a00000 (VER 09): matched by entry 0x7c8, 0x24800000

In addition don't mask out bits 30 and 29 when printing the XID.
Most likely this is a relict from the times when the driver covered
RTL8169 chip version only.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

90b989c5

r8169: don't display tp->mmio_addr address · 2d6c5a61

Heiner Kallweit authored Apr 17, 2018

For security reasons since commit ad67b74d "printk: hash addresses
printed with %p" %p doesn't display the full address any longer.
We could switch to %px, but I think the pointer address doesn't
provide a real benefit, so remove printing the hashed address.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2d6c5a61

r8169: drop member opts1_mask from struct rtl8169_private · 6202806e

Heiner Kallweit authored Apr 17, 2018

We can get rid of member opts1_mask and in addition save a few cpu
cycles in the hot path of rtl_rx().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6202806e

r8169: change interrupt handler argument type · ebcd5daa

Heiner Kallweit authored Apr 17, 2018

Code can be a little simplified by switching the interrupt handler
argument type to struct rtl8169_private *.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ebcd5daa

r8169: change argument type of counters handling functions · e71c9ce2

Heiner Kallweit authored Apr 17, 2018

The counter handling functions don't deal with the net_device, so code
can be simplified by changing the argument type to
struct rtl8169_private *.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e71c9ce2

r8169: change hw_start argument type · 61cb532d

Heiner Kallweit authored Apr 17, 2018

Code can be simplified by changing the argument type of hw_start
callbacks from struct net_device * to struct rtl8169_private *.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

61cb532d

r8169: remove rtl8169_map_to_asic · d731af78

Heiner Kallweit authored Apr 17, 2018

This function is very simple and used only once, so we can inline
the two statements.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d731af78

r8169: replace rx_buf_sz with a constant · 1d0254dd

Heiner Kallweit authored Apr 17, 2018

rx_buf_sz is constant, so we don't have to pass it as parameter and
in general can replace it with a constant.

When working on this I noticed that also before in
rtl_set_rx_max_size() a value of 0x4000 is set, what is not in line
with the chip spec. According to the spec only bits 0..13 are used
and we set an effective value of zero therefore.
However, the driver still seems to work and due to potential side
effects I'm reluctant to make a change.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1d0254dd

r8169: remove unneeded check in rtl8169_rx_fill · 1528192f

Heiner Kallweit authored Apr 17, 2018

rtl8169_rx_fill() is called only once and directly before the call
array tp->Rx_databuff[] is filled with zero's. Therefore we don't
need this check.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1528192f