Commits · 46e371f0e78a82186a83cbcb4f4b8850417c7dd5 · Kirill Smelkov / linux

08 Mar, 2018 29 commits

openvswitch: fix vport packet length check. · 46e371f0

William Tu authored Mar 07, 2018

When sending a packet to a tunnel device, the dev's hard_header_len
could be larger than the skb->len in function packet_length().
In the case of ip6gretap/erspan, hard_header_len = LL_MAX_HEADER + t_hlen,
which is around 180, and an ARP packet sent to this tunnel has
skb->len = 42. This causes the 'unsign int length' to become super
large because it is negative value, causing the later ovs_vport_send
to drop it due to over-mtu size. The patch fixes it by setting it to 0.
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

46e371f0

Merge branch 'pernet-convert-part5' · 55a165a7

David S. Miller authored Mar 08, 2018

Kirill Tkhai says:

====================
Converting pernet_operations (part #5)

this series continues to review and to convert pernet_operations
to make them possible to be executed in parallel for several
net namespaces in the same time. There are mostly netfilter
operations (and they should be the last netfilter's), also
there are two patches touching pktgen and xfrm.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

55a165a7

net: Convet ipv6_net_ops · 1fd2c557

Kirill Tkhai authored Mar 07, 2018

These pernet_operations are similar to ipv4_net_ops.
They are safe to be async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1fd2c557

net: Convert ipv4_net_ops · e8a95ad4

Kirill Tkhai authored Mar 07, 2018

These pernet_operations register and unregister bunch
of nf_conntrack_l4proto. Exit method unregisters related
sysctl, init method calls init_net and get_net_proto.
The whole builtin_l4proto4 array has pretty simple
init_net and get_net_proto methods. The first one register
sysctl table, the second one is just RO memory dereference.
So, these pernet_operations are safe to be marked as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e8a95ad4

net: Convert iptable_security_net_ops · 8dbc6e2e

Kirill Tkhai authored Mar 07, 2018

These pernet_operations unregister net::ipv4::iptable_security table.
Another net/pernet_operations do not send ipv4 packets to foreign
net namespaces. So, we mark them async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8dbc6e2e

net: Convert iptable_raw_net_ops · 65f828c3

Kirill Tkhai authored Mar 07, 2018

These pernet_operations unregister net::ipv4::iptable_raw table.
Another net/pernet_operations do not send ipv4 packets to foreign
net namespaces. So, we mark them async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

65f828c3

net: Convert iptable_nat_net_ops · 06a8a67b

Kirill Tkhai authored Mar 07, 2018

These pernet_operations unregister net::ipv4::nat_table table.
Another net/pernet_operations do not send ipv4 packets to foreign
net namespaces. So, we mark them async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

06a8a67b

net: Convert iptable_mangle_net_ops · 7ba81869

Kirill Tkhai authored Mar 07, 2018

These pernet_operations unregister net::ipv4::iptable_mangle table.
Another net/pernet_operations do not send ipv4 packets to foreign
net namespaces. So, we mark them async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7ba81869

net: Convert arptable_filter_net_ops · 93623f2b

Kirill Tkhai authored Mar 07, 2018

These pernet_operations unregister net::ipv4::arptable_filter.
Another net/pernet_operations do not send arp packets to foreign
net namespaces. So, we mark them async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

93623f2b

net: Convert pg_net_ops · 59d26973

Kirill Tkhai authored Mar 07, 2018

These pernet_operations create per-net pktgen threads
and /proc entries. These pernet subsys looks closed
in itself, and there are no pernet_operations outside
this file, which are interested in the threads.
Init and/or exit methods look safe to be executed
in parallel.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

59d26973

net: Convert nfnl_queue_net_ops · bd54dce0

Kirill Tkhai authored Mar 07, 2018

These pernet_operations register and unregister net::nf::queue_handler
and /proc entry. The handler is accessed only under RCU, so this looks
safe to convert them.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bd54dce0

net: Convert nfnl_log_net_ops · 74f26bbf

Kirill Tkhai authored Mar 07, 2018

These pernet_operations create and destroy /proc entries.
Also, exit method unsets nfulnl_logger. The logger is not
set by default, and it becomes bound via userspace request.
So, they look safe to be made async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

74f26bbf

net: Convert cttimeout_ops · ffdf72bc

Kirill Tkhai authored Mar 07, 2018

These pernet_operations also look closed in themself.
Exit method touch only per-net structures, so it's
safe to execute them for several net namespaces in parallel.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ffdf72bc

net: Convert nfnl_acct_ops · cf51503a

Kirill Tkhai authored Mar 07, 2018

These pernet_operations look closed in themself,
and there are no other users of net::nfnl_acct_list
outside. They are safe to be executed for several
net namespaces in parallel.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cf51503a

net: Convert nfnetlink_net_ops · 5a8e9be6

Kirill Tkhai authored Mar 07, 2018

These pernet_operations create and destroy net::nfnl
socket of NETLINK_NETFILTER code. There are no other
places, where such type the socket is created, except
these pernet_operations. It seem other pernet_operations
depending on CONFIG_NETFILTER_NETLINK send messages
to this socket. So, we mark it async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5a8e9be6

net: Convert nf_tables_net_ops · c7c5e435

Kirill Tkhai authored Mar 07, 2018

These pernet_operations looks nicely separated per-net.
Exit method unregisters net's nf tables objects.
We allow them be executed in parallel.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c7c5e435

net: Convert xfrm_user_net_ops · 649b9826

Kirill Tkhai authored Mar 07, 2018

These pernet_operations create and destroy net::xfrm::nlsk
socket of NETLINK_XFRM. There is only entry point, where
it's dereferenced, it's xfrm_user_rcv_msg(). There is no
in-kernel senders to this socket.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

649b9826

net: Convert ip6 tables pernet_operations · 997266a4

Kirill Tkhai authored Mar 07, 2018

The pernet_operations:

    ip6table_filter_net_ops
    ip6table_mangle_net_ops
    ip6table_nat_net_ops
    ip6table_raw_net_ops
    ip6table_security_net_ops

have exit methods, which call ip6t_unregister_table().
ip6table_filter_net_ops has init method registering
filter table.

Since there must not be in-flight ipv6 packets at the time
of pernet_operations execution and since pernet_operations
don't send ipv6 packets each other, these pernet_operations
are safe to be async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

997266a4

net/sched: cls_flower: Add support to handle first frag as match field · 459d153d

Pieter Jansen van Vuuren authored Mar 06, 2018

Allow setting firstfrag as matching option in tc flower classifier.

 # tc filter add dev eth0 protocol ip parent ffff: \
     flower indev eth0 \
        ip_flags firstfrag
     action mirred egress redirect dev eth1
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

459d153d

Merge branch 'hns3-next' · 08a24239

David S. Miller authored Mar 08, 2018

Peng Li says:

====================
fix some bugs for hns3 driver

This patchset fix some bugs for hns3 driver.
[Patch 1/6 - Patch 3/6] fix bugs related about VF driver.
[Patch 3/6 - Patch 6/6] fix the bugs about ethtool_ops.set_channels.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

08a24239

net: hns3: add support for VF driver inner interface hclgevf_ops.get_tqps_and_rss_info · cc719218

Peng Li authored Mar 08, 2018

This patch adds support for VF driver inner interface
hclgevf_ops.get_tqps_and_rss_info. This interface will be
used in the initialization process.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cc719218

net: hns3: set the max ring num when alloc netdev · 678335a1

Peng Li authored Mar 08, 2018

HNS3 driver should alloc netdev with max support ring num, as
driver support change netdev count by ethtool -L.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

678335a1

net: hns3: fix the queue id for tqp enable&&reset · 814e0274

Peng Li authored Mar 08, 2018

Command HCLGE_OPC_CFG_COM_TQP_QUEUE should use queue id in the
function, but command HCLGE_OPC_RESET_TQP_QUEUE should use global
queue id.
This patch fixes the queue id about queue enable/disable/reset.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

814e0274

net: hns3: fix endian issue when PF get mbx message flag · f18f0d4d

Peng Li authored Mar 08, 2018

This patch fixes the endian issue when PF get mbx message flag.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f18f0d4d

net: hns3: set the cmdq out_vld bit to 0 after used · 090e3b53

Peng Li authored Mar 08, 2018

Driver check the out_vld bit when get a new cmdq BD, if the bit is 1,
the BD is valid. driver Should set the bit 0 after used and hw will
set the bit 1 if get a valid BD.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

090e3b53

net: hns3: VF should get the real rss_size instead of rss_size_max · f5e084b8

Peng Li authored Mar 08, 2018

VF driver should get the real rss_size which is assigned
by host PF, not rss_size_max.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f5e084b8

devlink: Change dpipe/resource get privileges · 67ae686b

Arkadi Sharshevsky authored Mar 08, 2018

Let dpipe/resource be retrieved by unprivileged users.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

67ae686b

selftests/net: enable fragments for fib-onlink-tests · dcf1bcb6

Anders Roxell authored Mar 08, 2018

We miss CONFIG_* fragments so test fib-onlink-tests.sh can do:
ip li add lisa type vrf table 1101
ip li add veth1 type veth peer name veth2

And the follow message occurs if it isn't enabled:
Configuring interfaces
RTNETLINK answers: Operation not supported

This enables for NET_NRF (and friends) and VETH so we can create a vrf
table and veth.

Fixes: 153e1b84 ("selftests: Add FIB onlink tests")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

dcf1bcb6

ipvlan: properly annotate rx_handler access · ae5799dc

Paolo Abeni authored Mar 08, 2018

The rx_handler field is rcu-protected, but I forgot to use the
proper accessor while refactoring netif_is_ipvlan_port(). Such
function only check the rx_handler value, so it is safe, but we need
to properly read rx_handler via rcu_access_pointer() to avoid sparse
warnings.

Fixes: 1ec54cb4 ("net: unpollute priv_flags space")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ae5799dc

07 Mar, 2018 11 commits

ip6mr: remove synchronize_rcu() in favor of SOCK_RCU_FREE · a366e300

Eric Dumazet authored Mar 07, 2018

Kirill found that recently added synchronize_rcu() call in
ip6mr_sk_done()
was slowing down netns dismantle and posted a patch to use it only if
the socket
was found.

I instead suggested to get rid of this call, and use instead
SOCK_RCU_FREE

We might later change IPv4 side to use the same technique and unify
both stacks. IPv4 does not use synchronize_rcu() but has a call_rcu()
that could be replaced by SOCK_RCU_FREE.

Tested:
 time for i in {1..1000}; do unshare -n /bin/false;done

 Before : real 7m18.911s
 After : real 10.187s

Fixes: 8571ab47 ("ip6mr: Make mroute_sk rcu-based")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a366e300

Merge branch 'RDS-zerocopy-code-enhancements' · 3f6be0eb

David S. Miller authored Mar 07, 2018

Sowmini Varadhan says:

====================
RDS: zerocopy code enhancements

A couple of enhancements to the rds zerocop code
- patch 1 refactors rds_message_copy_from_user to pull the zcopy logic
  into its own function
- patch 2 drops the usage sk_buff to track MSG_ZEROCOPY cookies and
  uses a simple linked list (enhancement suggested by willemb during
  code review)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

3f6be0eb

rds: use list structure to track information for zerocopy completion notification · 9426bbc6

Sowmini Varadhan authored Mar 06, 2018

Commit 401910db ("rds: deliver zerocopy completion notification
with data") removes support fo r zerocopy completion notification
on the sk_error_queue, thus we no longer need to track the cookie
information in sk_buff structures.

This commit removes the struct sk_buff_head rs_zcookie_queue by
a simpler list that results in a smaller memory footprint as well
as more efficient memory_allocation time.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9426bbc6

rds: refactor zcopy code into rds_message_zcopy_from_user · d40a126b

Sowmini Varadhan authored Mar 06, 2018

Move the large block of code predicated on zcopy from
rds_message_copy_from_user into a new function,
rds_message_zcopy_from_user()
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d40a126b

cxgb3: remove VLA usage · c33b3b9f

Gustavo A. R. Silva authored Mar 07, 2018

Remove VLA usage and change the 'len' argument to a u8 and use a 256
byte buffer on the stack. Notice that these lengths are limited by the
encoding field in the VPD structure, which is a u8 [1].

[1] https://marc.info/?l=linux-netdev&m=152044354814024&w=2Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c33b3b9f

sock: Fix SO_ZEROCOPY switch case · 334e6413

Jesus Sanchez-Palencia authored Mar 07, 2018

Fix the SO_ZEROCOPY switch case on sock_setsockopt() avoiding the
ret values to be overwritten by the one set on the default case.

Fixes: 28190752 ("sock: permit SO_ZEROCOPY on PF_RDS socket")
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

334e6413

Merge branch 'mvpp2-ucast-filter' · 92dbb332

David S. Miller authored Mar 07, 2018

Maxime Chevallier says:

====================
net: mvpp2: Add Unicast filtering capabilities

This series adds unicast filtering support to the Marvell PPv2 controller.

This is implemented using the header parser cababilities of the PPv2,
which allows for generic packet filtering based on matching patterns in
the packet headers.

PPv2 controller only has 256 of these entries, and we need to share them
with other features, such as VLAN filtering.

For each interface, we have 5 entries dedicated to unicast filtering (the
controller's own address, and 4 other), and 21 to multicast filtering.

When this number is reached, the controller switches to unicast or
multicast promiscuous mode.

The first patch reworks the function that adds and removes addresses to the
filter. This is preparatory work to ease UC filter implementation.

The second patch adds the UC filtering feature.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

92dbb332

net: mvpp2: Add support for unicast filtering · 10fea26c

Maxime Chevallier authored Mar 07, 2018

Marvell PPv2 controller can be used to implement packet filtering based
on the destination MAC address. This is already used to implement
multicast filtering. This patch adds support for Unicast filtering.

Filtering is based on so-called "TCAM entries" to implement filtering.
Due to their limited number and the fact that these are also used for
other purposes, we reserve 80 entries for both unicast and multicast
filters. On top of the broadcast address, and each interface's own MAC
address, we reserve 25 entries per port, 4 for unicast filters, 21 for
multicast.

Whenever unicast or multicast range for one port is full, the filtering
is disabled and port goes into promiscuous mode for the given type of
addresses.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

10fea26c

net: mvpp2: Simplify MAC filtering function parameters · ce2a27c7

Maxime Chevallier authored Mar 07, 2018

The mvpp2_prs_mac_da_accept function takes into parameter both the
struct representing the controller and the port id. This is meaningful
when we want to create TCAM entries for non-initialized ports, but in
this case we expect the port to be initialized before starting adding or
removing MAC addresses to the per-port filter.

This commit changes the function so that it takes struct mvpp2_port as
a parameter instead.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ce2a27c7

selftests: forwarding: fix flags passed to first drop rule in gact_drop_and_ok_test · dff58a09

Jiri Pirko authored Mar 07, 2018

Fix copy&paste error and pass proper flags.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dff58a09

selftests: forwarding: fix "ok" action test · 0c17db05

Jiri Pirko authored Mar 07, 2018

Fix the "ok" action test so it checks that packet that is okayed does not
continue to be processed by other rules. Fix error message as well.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0c17db05