Commits · fd372a7a9e5e9d8011a0222d10edd3523abcd3b1 · nexedi / linux

08 Mar, 2018 34 commits

Merge tag 'mlx5-updates-2018-02-28-2' of... · fd372a7a

David S. Miller authored Mar 08, 2018

Merge tag 'mlx5-updates-2018-02-28-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Saeed Mahameed says:

====================
mlx5-updates-2018-02-28-2 (IPSec-2)

This series follows our previous one to lay out the foundations for IPSec
in user-space and extend current kernel netdev IPSec support. As noted in
our previous pull request cover letter "mlx5-updates-2018-02-28-1 (IPSec-1)",
the IPSec mechanism will be supported through our flow steering mechanism.
Therefore, we need to change the initialization order. Furthermore, IPsec
is also supported in both egress and ingress. Since our current flow
steering is egress only, we add an empty (only implemented through FPGA
steering ops) egress namespace to handle that case. We also implement
the required flow steering callbacks and logic in our FPGA driver.

We extend the FPGA support for ESN and modifying a xfrm too. Therefore, we
add support for some new FPGA command interface that supports them. The
other required bits are added too. The new features and requirements are
advertised via cap bits.

Last but not least, we revise our driver's accel_esp API. This API will be
shared between our netdev and IB driver, so we need to have all the required
functionality from both worlds.

Regards,
Aviad and Matan
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

fd372a7a

Merge branch 'ibmvnic-Clean-up-net-close-and-fix-reset-bug' · 10c56b8d

David S. Miller authored Mar 08, 2018

Thomas Falcon says:

====================
ibmvnic: Clean up net close and fix reset bug

This patch set cleans up and reorganizes the driver's net_device
close function and leverages that to fix up a bug that can occur
during some device resets. Some reset cases require the backing
adapter to be disabled before continuing, but other cases, such as
during a device failover or partition migration, do not require this
step. Since the device will not be initialized at this stage and
its command-processing queue is closed, do not send the request to
disable the device as it could result in an error or timeout
disrupting the reset.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

10c56b8d

ibmvnic: Do not disable device during failover or partition migration · 18b8d6bb

Thomas Falcon authored Mar 07, 2018

During a device failover or partition migration reset, it is not
necessary to disable the backing adapter since it should not be
running yet and its Command-Response Queue is closed. Sending
device commands during this time could result in an error or
timeout disrupting the reset process. In these cases, just halt
transmissions, clean up resources, and continue with reset.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18b8d6bb

ibmvnic: Reorganize device close · 01d9bd79

Thomas Falcon authored Mar 07, 2018

Introduce a function to halt network operations and clean up any
unused or outstanding socket buffers. Then, during device close,
disable backing adapter before halting all queues and performing
cleanup. This ensures all backing device operations will be
stopped before the driver cleans up shared resources.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01d9bd79

ibmvnic: Clean up device close · f873866a

Thomas Falcon authored Mar 07, 2018

Remove some dead code now that RX pools are being cleaned. This
was included to wait until any pending RX queue interrupts are
processed, but NAPI polling should be disabled by this point.

Another minor change is to use the net device parameter for any
print functions instead of accessing it from the adapter structure.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f873866a

openvswitch: fix vport packet length check. · 46e371f0

William Tu authored Mar 07, 2018

When sending a packet to a tunnel device, the dev's hard_header_len
could be larger than the skb->len in function packet_length().
In the case of ip6gretap/erspan, hard_header_len = LL_MAX_HEADER + t_hlen,
which is around 180, and an ARP packet sent to this tunnel has
skb->len = 42. This causes the 'unsign int length' to become super
large because it is negative value, causing the later ovs_vport_send
to drop it due to over-mtu size. The patch fixes it by setting it to 0.
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

46e371f0

Merge branch 'pernet-convert-part5' · 55a165a7

David S. Miller authored Mar 08, 2018

Kirill Tkhai says:

====================
Converting pernet_operations (part #5)

this series continues to review and to convert pernet_operations
to make them possible to be executed in parallel for several
net namespaces in the same time. There are mostly netfilter
operations (and they should be the last netfilter's), also
there are two patches touching pktgen and xfrm.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

55a165a7

net: Convet ipv6_net_ops · 1fd2c557

Kirill Tkhai authored Mar 07, 2018

These pernet_operations are similar to ipv4_net_ops.
They are safe to be async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1fd2c557

net: Convert ipv4_net_ops · e8a95ad4

Kirill Tkhai authored Mar 07, 2018

These pernet_operations register and unregister bunch
of nf_conntrack_l4proto. Exit method unregisters related
sysctl, init method calls init_net and get_net_proto.
The whole builtin_l4proto4 array has pretty simple
init_net and get_net_proto methods. The first one register
sysctl table, the second one is just RO memory dereference.
So, these pernet_operations are safe to be marked as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e8a95ad4

net: Convert iptable_security_net_ops · 8dbc6e2e

Kirill Tkhai authored Mar 07, 2018

These pernet_operations unregister net::ipv4::iptable_security table.
Another net/pernet_operations do not send ipv4 packets to foreign
net namespaces. So, we mark them async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8dbc6e2e

net: Convert iptable_raw_net_ops · 65f828c3

Kirill Tkhai authored Mar 07, 2018

These pernet_operations unregister net::ipv4::iptable_raw table.
Another net/pernet_operations do not send ipv4 packets to foreign
net namespaces. So, we mark them async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

65f828c3

net: Convert iptable_nat_net_ops · 06a8a67b

Kirill Tkhai authored Mar 07, 2018

These pernet_operations unregister net::ipv4::nat_table table.
Another net/pernet_operations do not send ipv4 packets to foreign
net namespaces. So, we mark them async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

06a8a67b

net: Convert iptable_mangle_net_ops · 7ba81869

Kirill Tkhai authored Mar 07, 2018

These pernet_operations unregister net::ipv4::iptable_mangle table.
Another net/pernet_operations do not send ipv4 packets to foreign
net namespaces. So, we mark them async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7ba81869

net: Convert arptable_filter_net_ops · 93623f2b

Kirill Tkhai authored Mar 07, 2018

These pernet_operations unregister net::ipv4::arptable_filter.
Another net/pernet_operations do not send arp packets to foreign
net namespaces. So, we mark them async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

93623f2b

net: Convert pg_net_ops · 59d26973

Kirill Tkhai authored Mar 07, 2018

These pernet_operations create per-net pktgen threads
and /proc entries. These pernet subsys looks closed
in itself, and there are no pernet_operations outside
this file, which are interested in the threads.
Init and/or exit methods look safe to be executed
in parallel.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

59d26973

net: Convert nfnl_queue_net_ops · bd54dce0

Kirill Tkhai authored Mar 07, 2018

These pernet_operations register and unregister net::nf::queue_handler
and /proc entry. The handler is accessed only under RCU, so this looks
safe to convert them.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bd54dce0

net: Convert nfnl_log_net_ops · 74f26bbf

Kirill Tkhai authored Mar 07, 2018

These pernet_operations create and destroy /proc entries.
Also, exit method unsets nfulnl_logger. The logger is not
set by default, and it becomes bound via userspace request.
So, they look safe to be made async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

74f26bbf

net: Convert cttimeout_ops · ffdf72bc

Kirill Tkhai authored Mar 07, 2018

These pernet_operations also look closed in themself.
Exit method touch only per-net structures, so it's
safe to execute them for several net namespaces in parallel.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ffdf72bc

net: Convert nfnl_acct_ops · cf51503a

Kirill Tkhai authored Mar 07, 2018

These pernet_operations look closed in themself,
and there are no other users of net::nfnl_acct_list
outside. They are safe to be executed for several
net namespaces in parallel.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cf51503a

net: Convert nfnetlink_net_ops · 5a8e9be6

Kirill Tkhai authored Mar 07, 2018

These pernet_operations create and destroy net::nfnl
socket of NETLINK_NETFILTER code. There are no other
places, where such type the socket is created, except
these pernet_operations. It seem other pernet_operations
depending on CONFIG_NETFILTER_NETLINK send messages
to this socket. So, we mark it async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5a8e9be6

net: Convert nf_tables_net_ops · c7c5e435

Kirill Tkhai authored Mar 07, 2018

These pernet_operations looks nicely separated per-net.
Exit method unregisters net's nf tables objects.
We allow them be executed in parallel.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c7c5e435

net: Convert xfrm_user_net_ops · 649b9826

Kirill Tkhai authored Mar 07, 2018

These pernet_operations create and destroy net::xfrm::nlsk
socket of NETLINK_XFRM. There is only entry point, where
it's dereferenced, it's xfrm_user_rcv_msg(). There is no
in-kernel senders to this socket.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

649b9826

net: Convert ip6 tables pernet_operations · 997266a4

Kirill Tkhai authored Mar 07, 2018

The pernet_operations:

    ip6table_filter_net_ops
    ip6table_mangle_net_ops
    ip6table_nat_net_ops
    ip6table_raw_net_ops
    ip6table_security_net_ops

have exit methods, which call ip6t_unregister_table().
ip6table_filter_net_ops has init method registering
filter table.

Since there must not be in-flight ipv6 packets at the time
of pernet_operations execution and since pernet_operations
don't send ipv6 packets each other, these pernet_operations
are safe to be async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

997266a4

net/sched: cls_flower: Add support to handle first frag as match field · 459d153d

Pieter Jansen van Vuuren authored Mar 06, 2018

Allow setting firstfrag as matching option in tc flower classifier.

 # tc filter add dev eth0 protocol ip parent ffff: \
     flower indev eth0 \
        ip_flags firstfrag
     action mirred egress redirect dev eth1
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

459d153d

Merge branch 'hns3-next' · 08a24239

David S. Miller authored Mar 08, 2018

Peng Li says:

====================
fix some bugs for hns3 driver

This patchset fix some bugs for hns3 driver.
[Patch 1/6 - Patch 3/6] fix bugs related about VF driver.
[Patch 3/6 - Patch 6/6] fix the bugs about ethtool_ops.set_channels.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

08a24239

net: hns3: add support for VF driver inner interface hclgevf_ops.get_tqps_and_rss_info · cc719218

Peng Li authored Mar 08, 2018

This patch adds support for VF driver inner interface
hclgevf_ops.get_tqps_and_rss_info. This interface will be
used in the initialization process.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cc719218

net: hns3: set the max ring num when alloc netdev · 678335a1

Peng Li authored Mar 08, 2018

HNS3 driver should alloc netdev with max support ring num, as
driver support change netdev count by ethtool -L.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

678335a1

net: hns3: fix the queue id for tqp enable&&reset · 814e0274

Peng Li authored Mar 08, 2018

Command HCLGE_OPC_CFG_COM_TQP_QUEUE should use queue id in the
function, but command HCLGE_OPC_RESET_TQP_QUEUE should use global
queue id.
This patch fixes the queue id about queue enable/disable/reset.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

814e0274

net: hns3: fix endian issue when PF get mbx message flag · f18f0d4d

Peng Li authored Mar 08, 2018

This patch fixes the endian issue when PF get mbx message flag.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f18f0d4d

net: hns3: set the cmdq out_vld bit to 0 after used · 090e3b53

Peng Li authored Mar 08, 2018

Driver check the out_vld bit when get a new cmdq BD, if the bit is 1,
the BD is valid. driver Should set the bit 0 after used and hw will
set the bit 1 if get a valid BD.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

090e3b53

net: hns3: VF should get the real rss_size instead of rss_size_max · f5e084b8

Peng Li authored Mar 08, 2018

VF driver should get the real rss_size which is assigned
by host PF, not rss_size_max.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f5e084b8

devlink: Change dpipe/resource get privileges · 67ae686b

Arkadi Sharshevsky authored Mar 08, 2018

Let dpipe/resource be retrieved by unprivileged users.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

67ae686b

selftests/net: enable fragments for fib-onlink-tests · dcf1bcb6

Anders Roxell authored Mar 08, 2018

We miss CONFIG_* fragments so test fib-onlink-tests.sh can do:
ip li add lisa type vrf table 1101
ip li add veth1 type veth peer name veth2

And the follow message occurs if it isn't enabled:
Configuring interfaces
RTNETLINK answers: Operation not supported

This enables for NET_NRF (and friends) and VETH so we can create a vrf
table and veth.

Fixes: 153e1b84 ("selftests: Add FIB onlink tests")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

dcf1bcb6

ipvlan: properly annotate rx_handler access · ae5799dc

Paolo Abeni authored Mar 08, 2018

The rx_handler field is rcu-protected, but I forgot to use the
proper accessor while refactoring netif_is_ipvlan_port(). Such
function only check the rx_handler value, so it is safe, but we need
to properly read rx_handler via rcu_access_pointer() to avoid sparse
warnings.

Fixes: 1ec54cb4 ("net: unpollute priv_flags space")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ae5799dc

07 Mar, 2018 6 commits

net/mlx5: Fix wrongly assigned CQ reference counter · 31135eb3

Leon Romanovsky authored Feb 28, 2018

The kernel compiled with CONFIG_REFCOUNT_FULL produces the following
error. The reason to it that initial value of refcount_t is supposed
to be more than 0, change it.

[    3.106634] ------------[ cut here ]------------
[    3.107756] refcount_t: increment on 0; use-after-free.
[    3.109130] WARNING: CPU: 0 PID: 1 at lib/refcount.c:153 refcount_inc+0x27/0x30
[    3.110085] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1-00028-gf683e04bdccc #137
[    3.110085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[    3.110085] RIP: 0010:refcount_inc+0x27/0x30
[    3.110085] RSP: 0000:ffffaa620000fba0 EFLAGS: 00010286
[    3.110085] RAX: 0000000000000000 RBX: ffff9a6d1a1821c8 RCX: ffffffff98a50f48
[    3.110085] RDX: 0000000000000001 RSI: 0000000000000086 RDI: 0000000000000246
[    3.110085] RBP: ffff9a6d1ac800a0 R08: 0000000000000289 R09: 000000000000000a
[    3.110085] R10: fffff03bc0682840 R11: ffffffff9949856d R12: ffff9a6d1b4a4000
[    3.110085] R13: 0000000000000000 R14: ffff9a6d1a0a6c00 R15: ffffaa620000fc5c
[    3.110085] FS:  0000000000000000(0000) GS:ffff9a6d1fc00000(0000) knlGS:0000000000000000
[    3.110085] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.110085] CR2: 0000000000000000 CR3: 000000000ba0a000 CR4: 00000000000006b0
[    3.110085] Call Trace:
[    3.110085]  mlx5_core_create_cq+0xde/0x250
[    3.110085]  ? __kmalloc+0x1ce/0x1e0
[    3.110085]  mlx5e_create_cq+0x15c/0x1e0
[    3.110085]  mlx5e_open_drop_rq+0xea/0x190
[    3.110085]  mlx5e_attach_netdev+0x53/0x140
[    3.110085]  mlx5e_attach+0x3d/0x60
[    3.110085]  mlx5e_add+0x11d/0x2f0
[    3.110085]  mlx5_add_device+0x77/0x170
[    3.110085]  mlx5_register_interface+0x74/0xc0
[    3.110085]  ? set_debug_rodata+0x11/0x11
[    3.110085]  init+0x67/0x72
[    3.110085]  ? mlx4_en_init_ptys2ethtool_map+0x346/0x346
[    3.110085]  do_one_initcall+0x98/0x147
[    3.110085]  ? set_debug_rodata+0x11/0x11
[    3.110085]  kernel_init_freeable+0x164/0x1e0
[    3.110085]  ? rest_init+0xb0/0xb0
[    3.110085]  kernel_init+0xa/0x100
[    3.110085]  ret_from_fork+0x35/0x40
[    3.110085] Code: 00 00 00 00 e8 ab ff ff ff 84 c0 74 02 f3 c3 80 3d 3b c3 64 01 00 75 f5 48 c7 c7 68 0b 81 98 c6 05 2b c3 64 01 01 e8 79 d7 a3 ff <0f> ff c3 66 0f 1f 44 00 00 8b 06 83 f8 ff 74 39 31 c9 39 f8 89
[    3.110085] ---[ end trace a0068e1c68438a74 ]---

Fixes: f105b45b ("net/mlx5: CQ hold/put API")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

31135eb3

net/mlx5: IPSec, Add support for ESN · cb010083

Aviad Yehezkel authored Jan 18, 2018

Currently ESN is not supported with IPSec device offload.

This patch adds ESN support to IPsec device offload.
Implementing new xfrm device operation to synchronize offloading device
ESN with xfrm received SN. New QP command to update SA state at the
following:

           ESN 1                    ESN 2                  ESN 3
|-----------*-----------|-----------*-----------|-----------*
^           ^           ^           ^           ^           ^

^ - marks where QP command invoked to update the SA ESN state
    machine.
| - marks the start of the ESN scope (0-2^32-1). At this point move SA
    ESN overlap bit to zero and increment ESN.
* - marks the middle of the ESN scope (2^31). At this point move SA
    ESN overlap bit to one.
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Yossef Efraim <yossefe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

cb010083

net/mlx5e: Added common function for to_ipsec_sa_entry · 75ef3f55

Aviad Yehezkel authored Jan 18, 2018

New function for getting driver internal sa entry from xfrm state.
All checks are done in one function.
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

75ef3f55

net/mlx5: Add flow-steering commands for FPGA IPSec implementation · 05564d0a

Aviad Yehezkel authored Feb 18, 2018

In order to add a context to the FPGA, we need to get both the software
transform context (which includes the keys, etc) and the
source/destination IPs (which are included in the steering
rule). Therefore, we register new set of firmware like commands for
the FPGA. Each time a rule is added, the steering core infrastructure
calls the FPGA command layer. If the rule is intended for the FPGA,
it combines the IPs information with the software transformation
context and creates the respective hardware transform.
Afterwards, it calls the standard steering command layer.
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

05564d0a

net/mlx5: Refactor accel IPSec code · d6c4f029

Aviad Yehezkel authored Jan 18, 2018

The current code has one layer that executed FPGA commands and
the Ethernet part directly used this code. Since downstream patches
introduces support for IPSec in mlx5_ib, we need to provide some
abstractions. This patch refactors the accel code into one layer
that creates a software IPSec transformation and another one which
creates the actual hardware context.
The internal command implementation is now hidden in the FPGA
core layer. The code also adds the ability to share FPGA hardware
contexts. If two contexts are the same, only a reference count
is taken.
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

d6c4f029

net/mlx5: Added required metadata capability for ipsec · af9fe19d

Aviad Yehezkel authored Jan 17, 2018

Currently our device requires additional metadata in packet
to perform ipsec crypto offload.
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

af9fe19d