Commits · 5a712e1363c8ecb4b504a888833ef91416314c36 · nexedi / linux

16 Sep, 2019 4 commits

samples/bpf: fix xdpsock l2fwd tx for unaligned mode · 5a712e13

Ciara Loftus authored Sep 13, 2019

Preserve the offset of the address of the received descriptor, and include
it in the address set for the tx descriptor, so the kernel can correctly
locate the start of the packet data.

Fixes: 03895e63 ("samples/bpf: add buffer recycling for unaligned chunks to xdpsock")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

5a712e13

ixgbe: fix xdp handle calculations · 2e78fc62

Ciara Loftus authored Sep 13, 2019

Commit 7cbbf9f1 ("ixgbe: fix xdp handle calculations") reintroduced
the addition of the umem headroom to the xdp handle in the ixgbe_zca_free,
ixgbe_alloc_buffer_slow_zc and ixgbe_alloc_buffer_zc functions. However,
the headroom is already added to the handle in the function
ixgbe_run_xdp_zc. This commit removes the latter addition and fixes the
case where the headroom is non-zero.

Fixes: 7cbbf9f1 ("ixgbe: fix xdp handle calculations")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

2e78fc62

i40e: fix xdp handle calculations · 168dfc3a

Ciara Loftus authored Sep 13, 2019

Commit 4c5d9a7f ("i40e: fix xdp handle calculations") reintroduced
the addition of the umem headroom to the xdp handle in the i40e_zca_free,
i40e_alloc_buffer_slow_zc and i40e_alloc_buffer_zc functions. However,
the headroom is already added to the handle in the function i40_run_xdp_zc.
This commit removes the latter addition and fixes the case where the
headroom is non-zero.

Fixes: 4c5d9a7f ("i40e: fix xdp handle calculations")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

168dfc3a

selftests/bpf: add bpf-gcc support · 4ce150b6

Ilya Leoshkevich authored Sep 12, 2019

Now that binutils and gcc support for BPF is upstream, make use of it in
BPF selftests using alu32-like approach. Share as much as possible of
CFLAGS calculation with clang.

Fixes only obvious issues, leaving more complex ones for later:
- Use gcc-provided bpf-helpers.h instead of manually defining the
  helpers, change bpf_helpers.h include guard to avoid conflict.
- Include <linux/stddef.h> for __always_inline.
- Add $(OUTPUT)/../usr/include to include path in order to use local
  kernel headers instead of system kernel headers when building with O=.

In order to activate the bpf-gcc support, one needs to configure
binutils and gcc with --target=bpf and make them available in $PATH. In
particular, gcc must be installed as `bpf-gcc`, which is the default.

Right now with binutils 25a2915e8dba and gcc r275589 only a handful of
tests work:

	# ./test_progs_bpf_gcc
	# Summary: 7/39 PASSED, 1 SKIPPED, 98 FAILED

The reason for those failures are as follows:

- Build errors:
  - `error: too many function arguments for eBPF` for __always_inline
    functions read_str_var and read_map_var - must be inlining issue,
    and for process_l3_headers_v6, which relies on optimizing away
    function arguments.
  - `error: indirect call in function, which are not supported by eBPF`
    where there are no obvious indirect calls in the source calls, e.g.
    in __encap_ipip_none.
  - `error: field 'lock' has incomplete type` for fields of `struct
    bpf_spin_lock` type - bpf_spin_lock is re#defined by bpf-helpers.h,
    so its usage is sensitive to order of #includes.
  - `error: eBPF stack limit exceeded` in sysctl_tcp_mem.
- Load errors:
  - Missing object files due to above build errors.
  - `libbpf: failed to create map (name: 'test_ver.bss')`.
  - `libbpf: object file doesn't contain bpf program`.
  - `libbpf: Program '.text' contains unrecognized relo data pointing to
    section 0`.
  - `libbpf: BTF is required, but is missing or corrupted` - no BTF
    support in gcc yet.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

4ce150b6

06 Sep, 2019 29 commits

kcm: use BPF_PROG_RUN · a2c11b03

Sami Tolvanen authored Sep 05, 2019

Instead of invoking struct bpf_prog::bpf_func directly, use the
BPF_PROG_RUN macro.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

a2c11b03

Merge branch 'move-sockopt-tests' · 8f6e19ab

Alexei Starovoitov authored Sep 06, 2019

Stanislav Fomichev says:

====================
Now that test_progs is shaping into more generic test framework,
let's convert sockopt tests to it. This requires adding
a helper to create and join a cgroup first (test__join_cgroup).
Since we already hijack stdout/stderr that shouldn't be
a problem (cgroup helpers log to stderr).

The rest of the patches just move sockopt tests files under prog_tests/
and do the required small adjustments.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

8f6e19ab

selftests/bpf: test_progs: convert test_tcp_rtt · 1f4f80fe

Stanislav Fomichev authored Sep 04, 2019

Move the files, adjust includes, remove entry from Makefile & .gitignore
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

1f4f80fe

selftests/bpf: test_progs: convert test_sockopt_inherit · e3e02e1d

Stanislav Fomichev authored Sep 04, 2019

Move the files, adjust includes, remove entry from Makefile & .gitignore

I also added pthread_cond_wait for the server thread startup. We don't
want to connect to the server that's not yet up (for some reason
this existing race is now more prominent with test_progs).
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

e3e02e1d

selftests/bpf: test_progs: convert test_sockopt_multi · 3886bd7c

Stanislav Fomichev authored Sep 04, 2019

Move the files, adjust includes, remove entry from Makefile & .gitignore
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

3886bd7c

selftests/bpf: test_progs: convert test_sockopt_sk · 9a365e67

Stanislav Fomichev authored Sep 04, 2019

Move the files, adjust includes, remove entry from Makefile & .gitignore
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

9a365e67

selftests/bpf: test_progs: convert test_sockopt · 4a647421

Stanislav Fomichev authored Sep 04, 2019

Move the files, adjust includes, remove entry from Makefile & .gitignore
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

4a647421

selftests/bpf: test_progs: add test__join_cgroup helper · 88dadc63

Stanislav Fomichev authored Sep 04, 2019

test__join_cgroup() combines the following operations that usually
go hand in hand and returns cgroup fd:

  * setup cgroup environment (make sure cgroupfs is mounted)
  * mkdir cgroup
  * join cgroup

It also marks a test as a "cgroup cleanup needed" and removes cgroup
state after the test is done.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

88dadc63

kbuild: replace BASH-specific ${@:2} with shift and ${@} · 618916a4

Andrii Nakryiko authored Sep 05, 2019

${@:2} is BASH-specific extension, which makes link-vmlinux.sh rely on
BASH. Use shift and ${@} instead to fix this issue.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 341dfcf8 ("btf: expose BTF info through sysfs")
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

618916a4

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 1e46c09e

David S. Miller authored Sep 06, 2019

Daniel Borkmann says:

====================
The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add the ability to use unaligned chunks in the AF_XDP umem. By
   relaxing where the chunks can be placed, it allows to use an
   arbitrary buffer size and place whenever there is a free
   address in the umem. Helps more seamless DPDK AF_XDP driver
   integration. Support for i40e, ixgbe and mlx5e, from Kevin and
   Maxim.

2) Addition of a wakeup flag for AF_XDP tx and fill rings so the
   application can wake up the kernel for rx/tx processing which
   avoids busy-spinning of the latter, useful when app and driver
   is located on the same core. Support for i40e, ixgbe and mlx5e,
   from Magnus and Maxim.

3) bpftool fixes for printf()-like functions so compiler can actually
   enforce checks, bpftool build system improvements for custom output
   directories, and addition of 'bpftool map freeze' command, from Quentin.

4) Support attaching/detaching XDP programs from 'bpftool net' command,
   from Daniel.

5) Automatic xskmap cleanup when AF_XDP socket is released, and several
   barrier/{read,write}_once fixes in AF_XDP code, from Björn.

6) Relicense of bpf_helpers.h/bpf_endian.h for future libbpf
   inclusion as well as libbpf versioning improvements, from Andrii.

7) Several new BPF kselftests for verifier precision tracking, from Alexei.

8) Several BPF kselftest fixes wrt endianess to run on s390x, from Ilya.

9) And more BPF kselftest improvements all over the place, from Stanislav.

10) Add simple BPF map op cache for nfp driver to batch dumps, from Jakub.

11) AF_XDP socket umem mapping improvements for 32bit archs, from Ivan.

12) Add BPF-to-BPF call and BTF line info support for s390x JIT, from Yauheni.

13) Small optimization in arm64 JIT to spare 1 insns for BPF_MOD, from Jerin.

14) Fix an error check in bpf_tcp_gen_syncookie() helper, from Petar.

15) Various minor fixes and cleanups, from Nathan, Masahiro, Masanari,
    Peter, Wei, Yue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

1e46c09e

lan743x: remove redundant assignment to variable rx_process_result · f9bcfe21

Colin Ian King authored Sep 05, 2019

The variable rx_process_result is being initialized with a value that
is never read and is being re-assigned immediately afterwards. The
assignment is redundant, so replace it with the return from function
lan743x_rx_process_packet.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f9bcfe21

Merge branch 'ravb-remove-use-of-undocumented-registers' · 5b1ab1ae

David S. Miller authored Sep 06, 2019

Simon Horman says:

====================
ravb: remove use of undocumented registers

this short series cleans up the RAVB driver a little.

The first patch corrects the spelling of the FBP field of SFO register.
This register field is unused and should have no run-time effect.

The remaining patches remove the use of undocumented registers
after some consultation with the internal Renesas BSP team.

Changes in v2:
* Corrected mangled state of first patch
* Patches 2/4 and 3/4 split out of a large patch
* Accumulated acks
* Tweaked changelog
* Claimed authorship of all patches

v1 of this series was tested on the following platforms.
No behaviour change is expected in v2.
* E3 Ebisu
* H3 Salvator-XS (ES2.0)
* M3-W Salvator-XS
* M3-N Salvator-XS
* RZ/G1C iW-RainboW-G23S
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

5b1ab1ae

ravb: TROCR register is only present on R-Car Gen3 · fd8ab76a

Simon Horman authored Sep 05, 2019

Only use the TROCR register on R-Car Gen3 as it is not present on other
SoCs.

Offsets used for the undocumented registers are considered reserved and
should not be written to. After some internal investigation with Renesas it
remains unclear why this driver accesses these fields on R-Car Gen2 but
regardless of what the historical reasons are the current code is
considered incorrect.
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fd8ab76a

ravb: remove undocumented endianness selection · 2d957a7e

Simon Horman authored Sep 05, 2019

This patch removes the use of the undocumented BOC bit of the CCC register.

Current documentation for EtherAVB (ravb) describes the offset of what the
driver uses as the BOC bit as reserved and that only a value of 0 should be
written. After some internal investigation with Renesas it remains unclear
why this driver accesses these fields but regardless of what the historical
reasons are the current code is considered incorrect.

Based on work by Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

2d957a7e

ravb: remove undocumented counter processing · 009a4703

Simon Horman authored Sep 05, 2019

This patch removes the use of the undocumented counter registers
CDCR, LCCR, CERCR, CEECR.

Offsets used for undocumented registers are considered reserved and
should not be written to. After some internal investigation with Renesas
it remains unclear why this driver accesses these fields but regardless of
what the historical reasons are the current code is considered incorrect.

Based on work by Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

009a4703

ravb: correct typo in FBP field of SFO register · 845e4b80

Simon Horman authored Sep 05, 2019

The field name is FBP rather than FPB.

This field is unused and could equally be removed from the driver entirely.
But there seems no harm in leaving as documentation of the presence of the
field.

Based on work by Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

845e4b80

Merge branch 'net-hns3-add-some-bugfixes-and-cleanups' · 7250a9d2

David S. Miller authored Sep 06, 2019

Huazhong Tan says:

====================
net: hns3: add some bugfixes and cleanups

This patch-set includes bugfixes and cleanups for the HNS3
ethernet controller driver.

[patch 01/07] fixes an error when setting VLAN offload.

[patch 02/07] fixes an double free issue when setting ringparam.

[patch 03/07] fixes a mis-assignment of hdev->reset_level.

[patch 04/07] adds a checking for client's validity.

[patch 05/07] simplifies bool variable's assignment.

[patch 06/07] disables loopback when initializing.

[patch 07/07] makes internal function to static.

Change log:
V1->V2: fixes comment from Sergei Shtylyov.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7250a9d2

net: hns3: make hclge_dbg_get_m7_stats_info static · 91f8ff09

Guojia Liao authored Sep 05, 2019

hclge_dbg_get_m7_info is used only in the hclge_debugfs.c,
so it should be declared with static.
Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

91f8ff09

net: hns3: disable loopback setting in hclge_mac_init · 1cbc662d

Yufeng Mo authored Sep 05, 2019

If the selftest and reset are performed at the same time, the loopback
setting may be still in the enable state after the reset. As a result,
packets cannot be sent out.

This patch fixes this issue by disabling loopback in hclge_mac_init.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1cbc662d

net: hns3: remove explicit conversion to bool · 1483fa49

Guojia Liao authored Sep 05, 2019

Relational and logical operators evaluate to bool,
explicit conversion is overly verbose and unnecessary.
Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1483fa49

net: hns3: add client node validity judgment · b7cf22b7

Peng Li authored Sep 05, 2019

HNS3 driver can only unregister client which included in hnae3_client_list.
This patch adds the client node validity judgment.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b7cf22b7

net: hns3: fix mis-assignment to hdev->reset_level in hclge_reset · 525a294e

Huazhong Tan authored Sep 05, 2019

Since hclge_get_reset_level may return HNAE3_NONE_RESET,
so hdev->reset_level can not be assigned with the return
value in the hclge_reset(), otherwise, it will cause
the use of hdev->reset_level in hclge_reset_event get
into error.

Fixes: 012fcb52 ("net: hns3: activate reset timer when calling reset_event")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

525a294e

net: hns3: fix double free bug when setting ringparam · 323a2ac5

Huazhong Tan authored Sep 05, 2019

The system will panic when change the ringparam in HNS3 drivers:

[ 1459.627727] hns3 0000:bd:00.0 eth6: Changing Tx/Rx ring ds from 1024/1024 to 24/24
[ 1459.635766] hns3 0000:bd:00.0 eth6: link down
[ 1459.640788] BUG: Bad page state in process ethtool  pfn:203f75c18
[ 1459.646940] page:ffff7ee4ffd70600 refcount:0 mapcount:0 mapping:ffff993fff40f400 index:0x0 compound_mapcount: 0
[ 1459.656987] flags: 0x9fffe00000010200(slab|head)
[ 1459.661591] raw: 9fffe00000010200 dead000000000100 dead000000000122 ffff993fff40f400
[ 1459.669302] raw: 0000000000000000 0000000080100010 00000000ffffffff 0000000000000000
[ 1459.677016] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[ 1459.683432] bad because of flags: 0x200(slab)
[ 1459.687775] Modules linked in: ib_ipoib ib_umad rpcrdma ib_iser libiscsi scsi_transport_iscsi hns_roce_hw_v2 crct10dif_ce hns3 ses hclge hnae3 hisi_hpre hisi_zip qm uacce ip_tables x_tables hisi_sas_v3_hw hisi_sas_main libsas scsi_transport_sas
[ 1459.709329] CPU: 14 PID: 17244 Comm: ethtool Tainted: G           O      5.3.0-rc4-00415-gc86f057 #1
[ 1459.718419] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B040.01 07/26/2019
[ 1459.727248] Call trace:
[ 1459.729688]  dump_backtrace+0x0/0x150
[ 1459.733335]  show_stack+0x24/0x30
[ 1459.736639]  dump_stack+0xa0/0xc4
[ 1459.739943]  bad_page+0xf0/0x158
[ 1459.743157]  free_pages_check_bad+0x84/0xa0
[ 1459.747322]  __free_pages_ok+0x348/0x378
[ 1459.751228]  page_frag_free+0x80/0x88
[ 1459.754877]  skb_free_head+0x38/0x48
[ 1459.758436]  skb_release_data+0x134/0x160
[ 1459.762427]  skb_release_all+0x30/0x40
[ 1459.766158]  consume_skb+0x38/0x108
[ 1459.769633]  __dev_kfree_skb_any+0x58/0x68
[ 1459.773718]  hns3_fini_ring+0x48/0x58 [hns3]
[ 1459.777970]  hns3_set_ringparam+0x2a8/0x418 [hns3]
[ 1459.782741]  dev_ethtool+0x5f4/0x2080
[ 1459.786390]  dev_ioctl+0x190/0x3d8
[ 1459.789777]  sock_do_ioctl+0xf8/0x220
[ 1459.793423]  sock_ioctl+0x3bc/0x490
[ 1459.796896]  do_vfs_ioctl+0xc4/0x868
[ 1459.800454]  ksys_ioctl+0x8c/0xa0
[ 1459.803752]  __arm64_sys_ioctl+0x28/0x38
[ 1459.807658]  el0_svc_common.constprop.0+0xe0/0x1e0
[ 1459.812426]  el0_svc_handler+0x34/0x90
[ 1459.816158]  el0_svc+0x10/0x14
[ 1459.819220] Disabling lock debugging due to kernel taint
[ 1459.825182] ------------[ cut here ]------------

Since ndo_stop will reclaim the RX's skb allocated by the driver,
so the backed up ring parameter should not keep this info.

Fixes: a723fb8e ("net: hns3: refine for set ring parameters")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

323a2ac5

net: hns3: fix error VF index when setting VLAN offload · d9c0f275

Jian Shen authored Sep 05, 2019

In original codes, the VF index used incorrectly in function
hclge_set_vlan_rx_offload_cfg() and hclge_set_vlan_rx_offload_cfg().
When VF id is greater than 8, for example 9, it will set the
same bit with VF id 1.

This patch fixes it by using  vport->vport_id % HCLGE_VF_NUM_PER_CMD /
HCLGE_VF_NUM_PER_BYTE as the array index, instead of vport->vport_id /
HCLGE_VF_NUM_PER_CMD.

Fixes: 052ece6d ("net: hns3: add ethtool related offload command")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d9c0f275

stmmac: platform: adjust messages and move to dev level · c3a502de

Andy Shevchenko authored Sep 05, 2019

This patch amends the error and warning messages across the platform driver.
It includes the following changes:
 - append \n to the end of messages
 - change pr_* macros to dev_*
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c3a502de

net: phy: Do not check Link status when loopback is enabled · fe4a7a41

Jose Abreu authored Sep 05, 2019

While running stmmac selftests I found that in my 1G setup some tests
were failling when running with PHY loopback enabled.

It looks like when loopback is enabled the PHY will report that Link is
down even though there is a valid connection.

As in loopback mode the data will not be sent anywhere we can bypass the
logic of checking if Link is valid thus saving unecessary reads.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fe4a7a41

net_sched: act_police: add 2 new attributes to support police 64bit rate and peakrate · d1967e49

David Dai authored Sep 04, 2019

For high speed adapter like Mellanox CX-5 card, it can reach upto
100 Gbits per second bandwidth. Currently htb already supports 64bit rate
in tc utility. However police action rate and peakrate are still limited
to 32bit value (upto 32 Gbits per second). Add 2 new attributes
TCA_POLICE_RATE64 and TCA_POLICE_RATE64 in kernel for 64bit support
so that tc utility can use them for 64bit rate and peakrate value to
break the 32bit limit, and still keep the backward binary compatibility.
Tested-by: David Dai <zdai@linux.vnet.ibm.com>
Signed-off-by: David Dai <zdai@linux.vnet.ibm.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d1967e49

net: openvswitch: Set OvS recirc_id from tc chain index · 95a7233c

Paul Blakey authored Sep 04, 2019

Offloaded OvS datapath rules are translated one to one to tc rules,
for example the following simplified OvS rule:

recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2)

Will be translated to the following tc rule:

$ tc filter add dev dev1 ingress \
	    prio 1 chain 0 proto ip \
		flower tcp ct_state -trk \
		action ct pipe \
		action goto chain 2

Received packets will first travel though tc, and if they aren't stolen
by it, like in the above rule, they will continue to OvS datapath.
Since we already did some actions (action ct in this case) which might
modify the packets, and updated action stats, we would like to continue
the proccessing with the correct recirc_id in OvS (here recirc_id(2))
where we left off.

To support this, introduce a new skb extension for tc, which
will be used for translating tc chain to ovs recirc_id to
handle these miss cases. Last tc chain index will be set
by tc goto chain action and read by OvS datapath.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

95a7233c

nfp: Drop unnecessary continue in nfp_net_pf_alloc_vnics · 47e25277

zhong jiang authored Sep 04, 2019

Continue is not needed at the bottom of a loop.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

47e25277

05 Sep, 2019 7 commits

Merge branch 'bpf-af-xdp-barrier-fixes' · 593f191a

Daniel Borkmann authored Sep 05, 2019

Björn Töpel says:

====================
This is a four patch series of various barrier, {READ, WRITE}_ONCE
cleanups in the AF_XDP socket code. More details can be found in the
corresponding commit message. Previous revisions: v1 [4] and v2 [5].

For an AF_XDP socket, most control plane operations are done under the
control mutex (struct xdp_sock, mutex), but there are some places
where members of the struct is read outside the control mutex. The
dev, queue_id members are set in bind() and cleared at cleanup. The
umem, fq, cq, tx, rx, and state member are all assigned in various
places, e.g. bind() and setsockopt(). When the members are assigned,
they are protected by the control mutex, but since they are read
outside the mutex, a WRITE_ONCE is required to avoid store-tearing on
the read-side.

Prior the state variable was introduced by Ilya, the dev member was
used to determine whether the socket was bound or not. However, when
dev was read, proper SMP barriers and READ_ONCE were missing. In order
to address the missing barriers and READ_ONCE, we start using the
state variable as a point of synchronization. The state member
read/write is paired with proper SMP barriers, and from this follows
that the members described above does not need READ_ONCE statements if
used in conjunction with state check.

To summarize: The members struct xdp_sock members dev, queue_id, umem,
fq, cq, tx, rx, and state were read lock-less, with incorrect barriers
and missing {READ, WRITE}_ONCE. After this series umem, fq, cq, tx,
rx, and state are read lock-less. When these members are updated,
WRITE_ONCE is used. When read, READ_ONCE are only used when read
outside the control mutex (e.g. mmap) or, not synchronized with the
state member (XSK_BOUND plus smp_rmb())

[1] https://lore.kernel.org/bpf/beef16bb-a09b-40f1-7dd0-c323b4b89b17@iogearbox.net/
[2] https://lwn.net/Articles/793253/
[3] https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE
[4] https://lore.kernel.org/bpf/20190822091306.20581-1-bjorn.topel@gmail.com/
[5] https://lore.kernel.org/bpf/20190826061053.15996-1-bjorn.topel@gmail.com/

v2->v3:
  Minor restructure of commits.
  Improve cover and commit messages. (Daniel)
v1->v2:
  Removed redundant dev check. (Jonathan)
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

593f191a

xsk: lock the control mutex in sock_diag interface · 25dc18ff

Björn Töpel authored Sep 04, 2019

When accessing the members of an XDP socket, the control mutex should
be held. This commit fixes that.
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Fixes: a36b38aa ("xsk: add sock_diag interface for AF_XDP")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

25dc18ff

xsk: use state member for socket synchronization · 42fddcc7

Björn Töpel authored Sep 04, 2019

Prior the state variable was introduced by Ilya, the dev member was
used to determine whether the socket was bound or not. However, when
dev was read, proper SMP barriers and READ_ONCE were missing. In order
to address the missing barriers and READ_ONCE, we start using the
state variable as a point of synchronization. The state member
read/write is paired with proper SMP barriers, and from this follows
that the members described above does not need READ_ONCE if used in
conjunction with state check.

In all syscalls and the xsk_rcv path we check if state is
XSK_BOUND. If that is the case we do a SMP read barrier, and this
implies that the dev, umem and all rings are correctly setup. Note
that no READ_ONCE are needed for these variable if used when state is
XSK_BOUND (plus the read barrier).

To summarize: The members struct xdp_sock members dev, queue_id, umem,
fq, cq, tx, rx, and state were read lock-less, with incorrect barriers
and missing {READ, WRITE}_ONCE. Now, umem, fq, cq, tx, rx, and state
are read lock-less. When these members are updated, WRITE_ONCE is
used. When read, READ_ONCE are only used when read outside the control
mutex (e.g. mmap) or, not synchronized with the state member
(XSK_BOUND plus smp_rmb())

Note that dev and queue_id do not need a WRITE_ONCE or READ_ONCE, due
to the introduce state synchronization (XSK_BOUND plus smp_rmb()).

Introducing the state check also fixes a race, found by syzcaller, in
xsk_poll() where umem could be accessed when stale.
Suggested-by: Hillf Danton <hdanton@sina.com>
Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
Fixes: 77cd0d7b ("xsk: add support for need_wakeup flag in AF_XDP rings")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

42fddcc7

xsk: avoid store-tearing when assigning umem · 9764f4b3

Björn Töpel authored Sep 04, 2019

The umem member of struct xdp_sock is read outside of the control
mutex, in the mmap implementation, and needs a WRITE_ONCE to avoid
potential store-tearing.
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Fixes: 423f3832 ("xsk: add umem fill queue support and mmap")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

9764f4b3

xsk: avoid store-tearing when assigning queues · 94a99763

Björn Töpel authored Sep 04, 2019

Use WRITE_ONCE when doing the store of tx, rx, fq, and cq, to avoid
potential store-tearing. These members are read outside of the control
mutex in the mmap implementation.
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Fixes: 37b07693 ("xsk: add missing write- and data-dependency barrier")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

94a99763

selftests/bpf: precision tracking tests · 310f4204

Alexei Starovoitov authored Sep 03, 2019

Add two tests to check that stack slot marking during backtracking
doesn't trigger 'spi > allocated_stack' warning.
One test is using BPF_ST insn. Another is using BPF_STX.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

310f4204

ixgbe: fix xdp handle calculations · 7cbbf9f1

Kevin Laatz authored Sep 05, 2019

Currently, we don't add headroom to the handle in ixgbe_zca_free,
ixgbe_alloc_buffer_slow_zc and ixgbe_alloc_buffer_zc. The addition of the
headroom to the handle was removed in
commit d8c3061e ("ixgbe: modify driver for handling offsets"), which
will break things when headroom isvnon-zero. This patch fixes this and uses
xsk_umem_adjust_offset to add it appropritely based on the mode being run.

Fixes: d8c3061e ("ixgbe: modify driver for handling offsets")
Reported-by: Bjorn Topel <bjorn.topel@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

7cbbf9f1