Commits · 0623951eb87ce1e3aba41d27cadbb9f49398de66 · Kirill Smelkov / linux

09 Mar, 2018 21 commits

Merge branch 'sched-action-events' · 0623951e

David S. Miller authored Mar 09, 2018

Roman Mashak says:

====================
Fix event generation for actions batch Add/Delete mode

When adding or deleting a batch of entries, the kernel sends up to
TCA_ACT_MAX_PRIO entries in a single event to user space. However, it does
not consider that action sizes vary and may require different skb sizes.

For example:

% cat tc-batch.sh
TC="sudo /mnt/iproute2.git/tc/tc"

$TC actions flush action gact
for i in `seq 1 $1`;
do
   cmd="action pass index $i "
   args=$args$cmd
done
$TC actions add $args
%
% ./tc-batch.sh 32
Error: Failed to fill netlink attributes while adding TC action.
We have an error talking to the kernel
%

This patchset introduces a new callback in tc_action_ops, which calculates
the action size, and passes the size to tcf_add_notify()/tcf_del_notify().
It fixes act_gact; the rest of the actions will be updated in
follow-up patches.

v3:
   Fixed tcf_action_fill_size() to return shared attrs length when
   action ->get_fill_size() isn't implemented.
v2:
   Restructured patches to make them bisectable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
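
The design in the cover letter (an optional per-action size callback, with the shared attribute length used as a fallback when ->get_fill_size() is not implemented, per the v3 note) can be sketched as a user-space model. This is not the kernel code: struct action, struct action_ops, shared_attrs_size(), example_fill_size() and the attribute sizes are hypothetical, and nla_total_size() here only mirrors the 4-byte header and 4-byte alignment of netlink attributes.

```c
#include <stddef.h>

/* Mirrors netlink attribute sizing: 4-byte header, padded to 4 bytes. */
static size_t nla_total_size(size_t payload)
{
	return (4 + payload + 3) & ~(size_t)3;
}

struct action;

/* Hypothetical, trimmed-down stand-in for struct tc_action_ops. */
struct action_ops {
	const char *kind;
	/* Optional: size of the action-specific TLVs; may be NULL. */
	size_t (*get_fill_size)(const struct action *act);
};

struct action {
	const struct action_ops *ops;
};

/* Hypothetical subset of the attributes every action carries. */
static size_t shared_attrs_size(void)
{
	return nla_total_size(0)    /* nested container for the action */
	     + nla_total_size(16)   /* action kind string (16 bytes assumed) */
	     + nla_total_size(0);   /* nested options container */
}

/* Fall back to the shared size when ->get_fill_size() is missing. */
static size_t action_fill_size(const struct action *act)
{
	size_t sz = shared_attrs_size();

	if (act->ops->get_fill_size)
		sz += act->ops->get_fill_size(act);
	return sz;
}

/* Hypothetical action emitting one 20-byte parameters TLV. */
static size_t example_fill_size(const struct action *act)
{
	(void)act;
	return nla_total_size(20);
}

static const struct action_ops with_cb = { "example", example_fill_size };
static const struct action_ops without_cb = { "legacy", NULL };
```

With these definitions, an action without the callback reports 28 bytes and the example action reports 52.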

net sched actions: implement get_fill_size routine in act_gact · 9c5c9c57

Roman Mashak authored Mar 08, 2018

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net sched actions: calculate add/delete event message size · 4e76e75d

Roman Mashak authored Mar 08, 2018

Introduce routines to calculate the size of the shared tc netlink attributes
and the full message size, including the netlink header and the tc service
header.

Update the add/delete action logic to compute the size of event messages;
the size is passed to tcf_add_notify() and tcf_del_notify(), where the
notification message is allocated and constructed.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
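
The full event size adds fixed headers on top of the per-action sizes. A minimal sketch, assuming a 16-byte struct nlmsghdr, a 4-byte tc service header (struct tcamsg) and a single nested attribute wrapping the action list; align4(), full_msg_size() and the constants are hypothetical, and the kernel routine may count more terms.

```c
#include <stddef.h>

/* Netlink messages and attributes are both aligned to 4 bytes. */
static size_t align4(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

static size_t nla_total_size(size_t payload)
{
	return align4(4 + payload);   /* 4-byte attribute header + payload */
}

/* Assumed header sizes: struct nlmsghdr and struct tcamsg. */
enum { NLMSG_HDR_SIZE = 16, TCAMSG_SIZE = 4 };

/* Netlink header + tc service header + nested action table + actions. */
static size_t full_msg_size(const size_t *action_sizes, size_t n)
{
	size_t sz = align4(NLMSG_HDR_SIZE) + align4(TCAMSG_SIZE)
		  + nla_total_size(0);

	for (size_t i = 0; i < n; i++)
		sz += action_sizes[i];
	return sz;
}
```

For two actions of 28 and 52 bytes this gives 24 + 80 = 104 bytes.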

net sched actions: add new tc_action_ops callback · a03b91b1

Roman Mashak authored Mar 08, 2018

Add a new callback to tc_action_ops. The tc actions need it to compute
their size when an ADD/DELETE notification message is constructed.
The routine has to take into account the optional and variable-size TLVs
specific to each action.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
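
A per-action callback only has to count the TLVs that this action actually emits. The sketch below models an action with one mandatory parameters TLV and one optional TLV, in the spirit of act_gact's optional probability settings; struct example_action, its fields and example_get_fill_size() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* 4-byte attribute header plus payload, padded to 4 bytes. */
static size_t nla_total_size(size_t payload)
{
	return (4 + payload + 3) & ~(size_t)3;
}

/* Hypothetical action state. */
struct example_action {
	size_t params_len;   /* payload of the mandatory parameters TLV */
	bool has_prob;       /* is the optional TLV configured? */
	size_t prob_len;     /* payload of the optional TLV */
};

/* Count the mandatory TLV always, the optional one only when present. */
static size_t example_get_fill_size(const struct example_action *a)
{
	size_t sz = nla_total_size(a->params_len);

	if (a->has_prob)
		sz += nla_total_size(a->prob_len);
	return sz;
}
```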

net sched actions: update Add/Delete action API with new argument · d04e6990

Roman Mashak authored Mar 08, 2018

Introduce a new function argument that carries the total attribute size,
so that the skb for event messages is allocated correctly.
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
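
The extra argument lets the notify path size its buffer from the computed attribute size instead of assuming a fixed one. A sketch, assuming the buffer is never smaller than some default; DEFAULT_MSG_SIZE, notify_alloc_size() and alloc_notify_buf() are hypothetical, with malloc() standing in for skb allocation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical default size used when the attributes fit easily. */
#define DEFAULT_MSG_SIZE 4096

/* Grow past the default only when the computed size requires it. */
static size_t notify_alloc_size(size_t attr_size)
{
	return attr_size <= DEFAULT_MSG_SIZE ? DEFAULT_MSG_SIZE : attr_size;
}

/* Allocate a buffer that can hold attr_size bytes; report its length. */
static void *alloc_notify_buf(size_t attr_size, size_t *out_len)
{
	size_t len = notify_alloc_size(attr_size);
	void *buf = malloc(len);

	if (buf && out_len)
		*out_len = len;
	return buf;
}
```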

net: do not create fallback tunnels for non-default namespaces · 79134e6c

Eric Dumazet authored Mar 08, 2018

Fallback tunnels (like tunl0, gre0, gretap0, erspan0, sit0,
ip6tnl0, ip6gre0) are automatically created when the corresponding
module is loaded.

These tunnels are also automatically created when a new network
namespace is created, at a great cost.

In many cases, netns are used for isolation purposes, and these
extra network devices are a waste of resources. We are using
thousands of netns per host, and hit the netns creation/delete
bottleneck a lot. (Many thanks to Kirill for recent work on this)

Add a new sysctl so that we can opt out of this automatic creation.

Note that these tunnels are still created for the initial namespace,
to be the least intrusive for typical setups.

Tested:
lpk43:~# cat add_del_unshare.sh
for i in `seq 1 40`
do
 (for j in `seq 1 100` ; do  unshare -n /bin/true >/dev/null ; done) &
done
wait

lpk43:~# echo 0 >/proc/sys/net/core/fb_tunnels_only_for_init_net
lpk43:~# time ./add_del_unshare.sh

real	0m37.521s
user	0m0.886s
sys	7m7.084s
lpk43:~# echo 1 >/proc/sys/net/core/fb_tunnels_only_for_init_net
lpk43:~# time ./add_del_unshare.sh

real	0m4.761s
user	0m0.851s
sys	1m8.343s
lpk43:~#
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
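
The opt-out can be modelled as a single predicate that a tunnel module consults when a namespace is set up. This is a user-space sketch: struct netns, should_create_fallback_tunnel() and the global variable are hypothetical, with the global standing in for the fb_tunnels_only_for_init_net sysctl.

```c
#include <stdbool.h>

/* Hypothetical model of a network namespace. */
struct netns {
	bool is_init_net;
};

/* Stands in for /proc/sys/net/core/fb_tunnels_only_for_init_net. */
static int fb_tunnels_only_for_init_net;

/* Create tunl0, gre0, ... in the initial namespace always, and in other
 * namespaces only when the sysctl has not opted them out. */
static bool should_create_fallback_tunnel(const struct netns *ns)
{
	return ns->is_init_net || !fb_tunnels_only_for_init_net;
}
```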

tools: tc-testing: Can pause just before post-suite · 2b3905de

Brenda J. Butler authored Mar 08, 2018

With option -P, the test script will pause just before
the post_suite functions are called.  This allows the tester to
inspect the system before it is torn down.
Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools: tc-testing: Can refer to $TESTID in test spec · 75291f3a

Brenda J. Butler authored Mar 08, 2018

When processing the commands in the test cases, substitute
the test ID for $TESTID. This makes tests more flexible.
For example, the test ID can be given as a command-line
argument.

As an example, if we wish to save the test output to a file
named for the test case, we can write in the test case:

"cmdUnderTest": "some test command | tee -a $TESTID.out"
Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Fix irq free'ing · b19e5c15

Andrew Lunn authored Mar 08, 2018

Call the common irq free function, rather than going recursive and
blowing away the stack, followed by the machine.

Fixes: 294d711e ("net: dsa: mv88e6xxx: Poll when no interrupt defined")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

tc-testing: add csum tests · 8edfaf7d

Roman Mashak authored Mar 08, 2018

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Tested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: asix88179_178a: de-duplicate code · cf29bded

Alexander Kurz authored Mar 08, 2018

Remove the duplicated code for asix88179_178a bind and reset methods.
Signed-off-by: Alexander Kurz <akurz@blala.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: asix88179_178a: set permanent address once only · 84c4df40

Alexander Kurz authored Mar 08, 2018

The permanent address of asix88179_178a devices is read at probe time
and should not be overwritten later. Otherwise it may be overwritten
unintentionally with a configured address.
Signed-off-by: Alexander Kurz <akurz@blala.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
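
The rule "read the permanent address once, at probe time" can be illustrated with a toy device model. All names below (struct fake_netdev, dev_probe(), dev_reset(), MAC_LEN) are hypothetical and do not come from the driver.

```c
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical device model. */
struct fake_netdev {
	uint8_t dev_addr[MAC_LEN];   /* current, possibly user-configured */
	uint8_t perm_addr[MAC_LEN];  /* factory address read at probe */
};

/* Probe: read the hardware address once and record it as permanent. */
static void dev_probe(struct fake_netdev *nd, const uint8_t hw_mac[MAC_LEN])
{
	memcpy(nd->dev_addr, hw_mac, MAC_LEN);
	memcpy(nd->perm_addr, hw_mac, MAC_LEN);
}

/* Reset: program the current address into the NIC register but leave
 * perm_addr alone; copying dev_addr into perm_addr here would replace the
 * factory address with whatever the user configured. */
static void dev_reset(const struct fake_netdev *nd, uint8_t nic_mac_reg[MAC_LEN])
{
	memcpy(nic_mac_reg, nd->dev_addr, MAC_LEN);
}
```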

Merge branch 'ntuple-filters-with-RSS' · 9b5d5f4f

David S. Miller authored Mar 08, 2018

Edward Cree says:

====================
ntuple filters with RSS

This series introduces the ability to mark an ethtool steering filter to use
 RSS spreading, and the ability to create and configure multiple RSS contexts
 with different indirection tables, hash keys, and hash fields.
An implementation for the sfc driver (for 7000-series and later SFC NICs) is
 included in patch 2/2.

The anticipated use case of this feature is for steering traffic destined for
 a container (or virtual machine) to the subset of CPUs on which processes in
 the container (or the VM's vCPUs) are bound, while retaining the scalability
 of RSS spreading from the viewpoint inside the container.
The use of both a base queue number (ring_cookie) and indirection table is
 intended to allow re-use of a single RSS context to target multiple sets of
 CPUs.  For instance, if an 8-core system is hosting three containers on CPUs
 [1,2], [3,4] and [6,7], then a single RSS context with an equal-weight [0,1]
 indirection table could be used to target all three containers by setting
 ring_cookie to 1, 3 and 6 on the respective filters.

v2: Initialised ctx in efx_ef10_filter_insert() to avoid (false positive) gcc
 warning.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
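
The queue selection described in the cover letter is a base queue (ring_cookie) plus an offset read from the context's indirection table. A sketch with hypothetical names; indexing the table by the hash modulo its size is a simplification of what a NIC does. It reproduces the three-container example with an equal-weight [0,1] table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RSS context: an indirection table of queue offsets. */
struct rss_ctx {
	const uint32_t *indir;   /* offsets added to the base queue */
	size_t indir_size;       /* number of entries, assumed > 0 */
};

/* Target queue = base queue + indirection entry selected by the hash. */
static uint32_t rss_target_queue(const struct rss_ctx *ctx,
				 uint64_t ring_cookie, uint32_t hash)
{
	return (uint32_t)ring_cookie + ctx->indir[hash % ctx->indir_size];
}
```

One context thus serves all three containers: ring_cookie 1 spreads over queues 1 and 2, 3 over 3 and 4, and 6 over 6 and 7.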

sfc: support RSS spreading of ethtool ntuple filters · 42356d9a

Edward Cree authored Mar 08, 2018

Use a linked list to associate user-facing context IDs with FW-facing
 context IDs (since the latter can change after an MC reset).
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
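
A minimal sketch of such a mapping, assuming a singly linked list and that context 0 is reserved for the default context; struct rss_ctx_map, ctx_find(), ctx_add() and ctx_remap() are hypothetical, not the sfc driver's types.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical entry: the user-facing ID never changes, while the
 * firmware (FW) context ID may be reassigned, e.g. after an MC reset. */
struct rss_ctx_map {
	uint32_t user_id;
	uint32_t fw_id;
	struct rss_ctx_map *next;
};

/* Linear search by user-facing ID; returns NULL if not found. */
static struct rss_ctx_map *ctx_find(struct rss_ctx_map *head, uint32_t user_id)
{
	for (; head; head = head->next)
		if (head->user_id == user_id)
			return head;
	return NULL;
}

/* Pick the lowest free user-facing ID (starting at 1) and push a new
 * entry onto the list. */
static struct rss_ctx_map *ctx_add(struct rss_ctx_map **head, uint32_t fw_id)
{
	uint32_t id = 1;
	struct rss_ctx_map *e;

	while (ctx_find(*head, id))
		id++;
	e = malloc(sizeof(*e));
	if (!e)
		return NULL;
	e->user_id = id;
	e->fw_id = fw_id;
	e->next = *head;
	*head = e;
	return e;
}

/* After the firmware re-creates a context, only the FW ID changes, so
 * filters that refer to the user-facing ID keep working. */
static void ctx_remap(struct rss_ctx_map *e, uint32_t new_fw_id)
{
	e->fw_id = new_fw_id;
}
```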

net: ethtool: extend RXNFC API to support RSS spreading of filter matches · 84a1d9c4

Edward Cree authored Mar 08, 2018

We use a two-step process to configure a filter with RSS spreading. First,
the RSS context is allocated and configured using ETHTOOL_SRSSH; this
returns an identifier (rss_context) which can then be passed to subsequent
invocations of ETHTOOL_SRXCLSRLINS to specify that the offset from the RSS
indirection table lookup should be added to the queue number (ring_cookie)
when delivering the packet. Drivers for devices which can only use the
indirection table entry directly (not add it to a base queue number)
should reject rule insertions combining RSS with a nonzero ring_cookie.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
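
For devices that can only use the indirection table entry directly, the rejection rule reduces to one check at rule-insertion time. The request and capability structures below are hypothetical simplifications, not struct ethtool_rxnfc.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down filter insertion request. */
struct filter_req {
	bool use_rss;          /* spread matches through an RSS context */
	uint32_t rss_context;  /* ID returned when the context was set up */
	uint64_t ring_cookie;  /* base queue number */
};

/* Hypothetical capability flag. */
struct nic_caps {
	bool can_add_base_queue;  /* adds the table entry to a base queue */
};

/* 0 if the rule can be inserted, -EINVAL if RSS is combined with a
 * nonzero base queue on a device that cannot add the two. */
static int validate_filter(const struct nic_caps *caps,
			   const struct filter_req *req)
{
	if (req->use_rss && req->ring_cookie != 0 && !caps->can_add_base_queue)
		return -EINVAL;
	return 0;
}
```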

rds: rds_info_from_znotifier() can be static · 571e6776

kbuild test robot authored Mar 08, 2018

Fixes: 9426bbc6 ("rds: use list structure to track information for zerocopy completion notification")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rds: rds_message_zcopy_from_user() can be static · 496c7f3c

kbuild test robot authored Mar 08, 2018

Fixes: d40a126b ("rds: refactor zcopy code into rds_message_zcopy_from_user")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ncsi: unlock on error in ncsi_set_interface_nl() · 054f34da

Dan Carpenter authored Mar 08, 2018

There are two error paths which are missing unlocks in this function.

Fixes: 955dc68c ("net/ncsi: Add generic netlink family")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
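
A common kernel idiom that prevents this class of bug is a single exit label that releases the lock. A user-space sketch, with a pthread mutex standing in for the driver's lock; struct dev_state, set_interface() and its checks are hypothetical.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical device state protected by a lock. */
struct dev_state {
	pthread_mutex_t lock;
	int package_id;
	int channel_id;
};

/* Every path after the lock is taken leaves through "out", so no error
 * return can skip the unlock. */
static int set_interface(struct dev_state *st, int package_id, int channel_id)
{
	int ret = 0;

	pthread_mutex_lock(&st->lock);
	if (package_id < 0) {
		ret = -ENODEV;   /* package not found */
		goto out;
	}
	if (channel_id < 0) {
		ret = -ENODEV;   /* channel not found */
		goto out;
	}
	st->package_id = package_id;
	st->channel_id = channel_id;
out:
	pthread_mutex_unlock(&st->lock);
	return ret;
}
```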

net/ncsi: use kfree_skb() instead of kfree() · 50db64b0

Dan Carpenter authored Mar 08, 2018

We're supposed to use kfree_skb() to free these sk_buffs.

Fixes: 955dc68c ("net/ncsi: Add generic netlink family")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: avoid doing useless work · cecd8d81

Prasad Kanneganti authored Mar 07, 2018

Avoid doing useless work by making sure that the response_list is not empty
before scheduling work to process it.
Signed-off-by: Prasad Kanneganti <prasad.kanneganti@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
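
The change amounts to an emptiness check before queuing work. A sketch with hypothetical types, where a counter stands in for actually scheduling the work item.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pending-response list. */
struct resp_node {
	struct resp_node *next;
};

struct resp_list {
	struct resp_node *head;
	unsigned int work_scheduled;  /* how often work was queued */
};

static bool resp_list_empty(const struct resp_list *l)
{
	return l->head == NULL;
}

/* Queue the processing work only when there is something to process. */
static void maybe_schedule_resp_work(struct resp_list *l)
{
	if (resp_list_empty(l))
		return;
	l->work_scheduled++;   /* stands in for scheduling the work item */
}
```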

liquidio: Resolved mbox read issue while reading more than one 64bit data · fcbedd0f

Intiyaz Basha authored Mar 07, 2018

Correct the length check for the case where the data received in the
mbox is more than one 64-bit value.
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

08 Mar, 2018 19 commits

Merge tag 'mlx5-updates-2018-02-28-2' of... · fd372a7a

David S. Miller authored Mar 08, 2018

Merge tag 'mlx5-updates-2018-02-28-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Saeed Mahameed says:

====================
mlx5-updates-2018-02-28-2 (IPSec-2)

This series follows our previous one to lay out the foundations for IPSec
in user-space and extend current kernel netdev IPSec support. As noted in
our previous pull request cover letter "mlx5-updates-2018-02-28-1 (IPSec-1)",
the IPSec mechanism will be supported through our flow steering mechanism.
Therefore, we need to change the initialization order. Furthermore, IPsec
is supported in both egress and ingress. Since our current flow
steering is ingress only, we add an empty (only implemented through FPGA
steering ops) egress namespace to handle that case. We also implement
the required flow steering callbacks and logic in our FPGA driver.

We also extend the FPGA support to ESN and to modifying an xfrm. Therefore, we
add support for a new FPGA command interface that supports them. The
other required bits are added too. The new features and requirements are
advertised via cap bits.

Last but not least, we revise our driver's accel_esp API. This API will be
shared between our netdev and IB driver, so we need to have all the required
functionality from both worlds.

Regards,
Aviad and Matan
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ibmvnic-Clean-up-net-close-and-fix-reset-bug' · 10c56b8d

David S. Miller authored Mar 08, 2018

Thomas Falcon says:

====================
ibmvnic: Clean up net close and fix reset bug

This patch set cleans up and reorganizes the driver's net_device
close function and leverages that to fix up a bug that can occur
during some device resets. Some reset cases require the backing
adapter to be disabled before continuing, but other cases, such as
during a device failover or partition migration, do not require this
step. Since the device will not be initialized at this stage and
its command-processing queue is closed, do not send the request to
disable the device as it could result in an error or timeout
disrupting the reset.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Do not disable device during failover or partition migration · 18b8d6bb

Thomas Falcon authored Mar 07, 2018

During a device failover or partition migration reset, it is not
necessary to disable the backing adapter since it should not be
running yet and its Command-Response Queue is closed. Sending
device commands during this time could result in an error or
timeout disrupting the reset process. In these cases, just halt
transmissions, clean up resources, and continue with reset.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
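
The resulting close sequence can be modelled as follows: disable the backing adapter only when the reset reason allows it, then halt the queues, then clean up. The enum values, struct close_trace and the one-letter step codes are hypothetical; the driver's real reset reasons and helpers are named differently.

```c
#include <stdbool.h>

/* Hypothetical reset reasons; RESET_NONE means a plain device close. */
enum reset_reason { RESET_NONE, RESET_FATAL, RESET_FAILOVER, RESET_MOBILITY };

/* Records the steps taken: 'D' disable, 'H' halt, 'C' cleanup. */
struct close_trace {
	char steps[4];
	int n;
};

static void record(struct close_trace *t, char step)
{
	t->steps[t->n++] = step;
	t->steps[t->n] = '\0';
}

/* Failover and partition migration (mobility) skip the disable request:
 * the backing adapter is not running and its CRQ is closed. */
static bool need_disable_adapter(enum reset_reason r)
{
	return r != RESET_FAILOVER && r != RESET_MOBILITY;
}

static void adapter_close(enum reset_reason r, struct close_trace *t)
{
	t->n = 0;
	t->steps[0] = '\0';
	if (need_disable_adapter(r))
		record(t, 'D');   /* ask the backing adapter to stop */
	record(t, 'H');           /* halt transmit and receive queues */
	record(t, 'C');           /* free outstanding buffers and resources */
}
```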

ibmvnic: Reorganize device close · 01d9bd79

Thomas Falcon authored Mar 07, 2018

Introduce a function to halt network operations and clean up any
unused or outstanding socket buffers. Then, during device close,
disable the backing adapter before halting all queues and performing
cleanup. This ensures that all backing device operations are
stopped before the driver cleans up shared resources.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Clean up device close · f873866a

Thomas Falcon authored Mar 07, 2018

Remove some dead code now that RX pools are being cleaned. This
was included to wait until any pending RX queue interrupts are
processed, but NAPI polling should be disabled by this point.

Another minor change is to use the net device parameter for any
print functions instead of accessing it from the adapter structure.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: fix vport packet length check. · 46e371f0

William Tu authored Mar 07, 2018

When sending a packet to a tunnel device, the dev's hard_header_len
could be larger than skb->len in packet_length().
In the case of ip6gretap/erspan, hard_header_len = LL_MAX_HEADER + t_hlen,
which is around 180, while an ARP packet sent to this tunnel has
skb->len = 42. The difference is negative, so the 'unsigned int length'
wraps around to a huge value, and the later ovs_vport_send() drops the
packet as exceeding the MTU. The patch fixes this by clamping the length to 0.
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
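
The underflow and its fix are easy to reproduce in isolation. A sketch with hypothetical function names that takes plain lengths instead of an skb and a net_device; the fixed version computes the difference as a signed int and clamps it at zero.

```c
/* Fixed: signed difference, clamped so a short packet yields 0. */
static unsigned int packet_length(unsigned int skb_len,
				  unsigned int hard_header_len)
{
	int length = (int)skb_len - (int)hard_header_len;

	return length > 0 ? (unsigned int)length : 0;
}

/* Buggy variant for comparison: the unsigned subtraction wraps around. */
static unsigned int packet_length_buggy(unsigned int skb_len,
					unsigned int hard_header_len)
{
	return skb_len - hard_header_len;
}
```

With hard_header_len = 180 and a 42-byte ARP packet, packet_length() returns 0, while packet_length_buggy() returns a value far above any MTU (4294967158 with a 32-bit unsigned int).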

Merge branch 'pernet-convert-part5' · 55a165a7

David S. Miller authored Mar 08, 2018

Kirill Tkhai says:

====================
Converting pernet_operations (part #5)

This series continues to review and convert pernet_operations
so that they can be executed in parallel for several net namespaces
at the same time. These are mostly netfilter operations (and they
should be the last netfilter ones); there are also two patches
touching pktgen and xfrm.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert ipv6_net_ops · 1fd2c557

Kirill Tkhai authored Mar 07, 2018

These pernet_operations are similar to ipv4_net_ops.
They are safe to be async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert ipv4_net_ops · e8a95ad4

Kirill Tkhai authored Mar 07, 2018

These pernet_operations register and unregister a bunch
of nf_conntrack_l4proto. The exit method unregisters the related
sysctl; the init method calls init_net and get_net_proto.
Every entry in the builtin_l4proto4 array has pretty simple
init_net and get_net_proto methods: the first registers a
sysctl table, the second is just a read-only memory dereference.
So these pernet_operations are safe to be marked as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert iptable_security_net_ops · 8dbc6e2e

Kirill Tkhai authored Mar 07, 2018

These pernet_operations unregister net::ipv4::iptable_security table.
Other pernet_operations do not send ipv4 packets to foreign
net namespaces, so we mark them async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert iptable_raw_net_ops · 65f828c3

Kirill Tkhai authored Mar 07, 2018

These pernet_operations unregister net::ipv4::iptable_raw table.
Other pernet_operations do not send ipv4 packets to foreign
net namespaces, so we mark them async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert iptable_nat_net_ops · 06a8a67b

Kirill Tkhai authored Mar 07, 2018

These pernet_operations unregister net::ipv4::nat_table table.
Other pernet_operations do not send ipv4 packets to foreign
net namespaces, so we mark them async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert iptable_mangle_net_ops · 7ba81869

Kirill Tkhai authored Mar 07, 2018

These pernet_operations unregister net::ipv4::iptable_mangle table.
Other pernet_operations do not send ipv4 packets to foreign
net namespaces, so we mark them async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert arptable_filter_net_ops · 93623f2b

Kirill Tkhai authored Mar 07, 2018

These pernet_operations unregister net::ipv4::arptable_filter.
Other pernet_operations do not send arp packets to foreign
net namespaces, so we mark them async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert pg_net_ops · 59d26973

Kirill Tkhai authored Mar 07, 2018

These pernet_operations create per-net pktgen threads
and /proc entries. This pernet subsys looks self-contained,
and no pernet_operations outside this file are interested
in the threads. The init and exit methods look safe to be
executed in parallel.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert nfnl_queue_net_ops · bd54dce0

Kirill Tkhai authored Mar 07, 2018

These pernet_operations register and unregister net::nf::queue_handler
and a /proc entry. The handler is accessed only under RCU, so converting
them looks safe.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert nfnl_log_net_ops · 74f26bbf

Kirill Tkhai authored Mar 07, 2018

These pernet_operations create and destroy /proc entries.
Also, the exit method unsets nfulnl_logger. The logger is not
set by default; it becomes bound only on a userspace request.
So they look safe to be made async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert cttimeout_ops · ffdf72bc

Kirill Tkhai authored Mar 07, 2018

These pernet_operations also look self-contained.
The exit method touches only per-net structures, so it is
safe to execute them for several net namespaces in parallel.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Convert nfnl_acct_ops · cf51503a

Kirill Tkhai authored Mar 07, 2018

These pernet_operations look self-contained, and there are
no users of net::nfnl_acct_list outside them. They are safe
to be executed for several net namespaces in parallel.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>