Commits · 6a5457618f62147aeac706d26ff28e0e57a324e3 · Kirill Smelkov / linux

01 Feb, 2019 15 commits

samples/bpf: Extend RLIMIT_MEMLOCK for xdp_{sample_pkts, router_ipv4} · 6a545761

Maciej Fijalkowski authored Feb 01, 2019

There is a common problem with xdp samples that happens when user wants
to run a particular sample and some bpf program is already loaded. The
default 64kb RLIMIT_MEMLOCK resource limit will cause a following error
(assuming that xdp sample that is failing was converted to libbpf
usage):

libbpf: Error in bpf_object__probe_name():Operation not permitted(1).
Couldn't load basic 'r0 = 0' BPF program.
libbpf: failed to load object './xdp_sample_pkts_kern.o'

Fix it in xdp_sample_pkts and xdp_router_ipv4 by setting RLIMIT_MEMLOCK
to RLIM_INFINITY.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

6a545761

samples/bpf: Convert XDP samples to libbpf usage · bbaf6029

Maciej Fijalkowski authored Feb 01, 2019

Some of XDP samples that are attaching the bpf program to the interface
via libbpf's bpf_set_link_xdp_fd are still using the bpf_load.c for
loading and manipulating the ebpf program and maps. Convert them to do
this through libbpf usage and remove bpf_load from the picture.

While at it remove what looks like debug leftover in
xdp_redirect_map_user.c

In xdp_redirect_cpu, change the way that the program to be loaded onto
interface is chosen - user now needs to pass the program's section name
instead of the relative number. In case of typo print out the section
names to choose from.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bbaf6029

samples/bpf: xdp_redirect_cpu have not need for read_trace_pipe · 7313798b

Jesper Dangaard Brouer authored Feb 01, 2019

The sample xdp_redirect_cpu is not using helper bpf_trace_printk.
Thus it makes no sense that the --debug option us reading
from /sys/kernel/debug/tracing/trace_pipe via read_trace_pipe.
Simply remove it.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

7313798b

libbpf: Add a helper for retrieving a map fd for a given name · f3cea32d

Maciej Fijalkowski authored Feb 01, 2019

XDP samples are mostly cooperating with eBPF maps through their file
descriptors. In case of a eBPF program that contains multiple maps it
might be tiresome to iterate through them and call bpf_map__fd for each
one. Add a helper mostly based on bpf_object__find_map_by_name, but
instead of returning the struct bpf_map pointer, return map fd.
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

f3cea32d

bpf: powerpc64: add JIT support for bpf line info · 6f20c71d

Sandipan Das authored Feb 01, 2019

This adds support for generating bpf line info for
JITed programs.
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

6f20c71d

Merge branch 'bpf-spinlocks' · 2863debf

Daniel Borkmann authored Feb 01, 2019

Alexei Starovoitov says:

====================
Many algorithms need to read and modify several variables atomically.
Until now it was hard to impossible to implement such algorithms in BPF.
Hence introduce support for bpf_spin_lock.

The api consists of 'struct bpf_spin_lock' that should be placed
inside hash/array/cgroup_local_storage element
and bpf_spin_lock/unlock() helper function.

Example:
struct hash_elem {
    int cnt;
    struct bpf_spin_lock lock;
};
struct hash_elem * val = bpf_map_lookup_elem(&hash_map, &key);
if (val) {
    bpf_spin_lock(&val->lock);
    val->cnt++;
    bpf_spin_unlock(&val->lock);
}

and BPF_F_LOCK flag for lookup/update bpf syscall commands that
allows user space to read/write map elements under lock.

Together these primitives allow race free access to map elements
from bpf programs and from user space.

Key restriction: root only.
Key requirement: maps must be annotated with BTF.

This concept was discussed at Linux Plumbers Conference 2018.
Thank you everyone who participated and helped to iron out details
of api and implementation.

Patch 1: bpf_spin_lock support in the verifier, BTF, hash, array.
Patch 2: bpf_spin_lock in cgroup local storage.
Patches 3,4,5: tests
Patch 6: BPF_F_LOCK flag to lookup/update
Patches 7,8,9: tests

v6->v7:
- fixed this_cpu->__this_cpu per Peter's suggestion and added Ack.
- simplified bpf_spin_lock and load/store overlap check in the verifier
  as suggested by Andrii
- rebase

v5->v6:
- adopted arch_spinlock approach suggested by Peter
- switched to spin_lock_irqsave equivalent as the simplest way
  to avoid deadlocks in rare case of nested networking progs
  (cgroup-bpf prog in preempt_disable vs clsbpf in softirq sharing
  the same map with bpf_spin_lock)
  bpf_spin_lock is only allowed in networking progs that don't
  have arbitrary entry points unlike tracing progs.
- rebase and split test_verifier tests

v4->v5:
- disallow bpf_spin_lock for tracing progs due to insufficient preemption checks
- socket filter progs cannot use bpf_spin_lock due to missing preempt_disable
- fix atomic_set_release. Spotted by Peter.
- fixed hash_of_maps

v3->v4:
- fix BPF_EXIST | BPF_NOEXIST check patch 6. Spotted by Jakub. Thanks!
- rebase

v2->v3:
- fixed build on ia64 and archs where qspinlock is not supported
- fixed missing lock init during lookup w/o BPF_F_LOCK. Spotted by Martin

v1->v2:
- addressed several issues spotted by Daniel and Martin in patch 1
- added test11 to patch 4 as suggested by Daniel
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

2863debf

selftests/bpf: test for BPF_F_LOCK · ba72a7b4

Alexei Starovoitov authored Jan 31, 2019

Add C based test that runs 4 bpf programs in parallel
that update the same hash and array maps.
And another 2 threads that read from these two maps
via lookup(key, value, BPF_F_LOCK) api
to make sure the user space sees consistent value in both
hash and array elements while user space races with kernel bpf progs.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

ba72a7b4

libbpf: introduce bpf_map_lookup_elem_flags() · df5d22fa

Alexei Starovoitov authored Jan 31, 2019

Introduce
int bpf_map_lookup_elem_flags(int fd, const void *key, void *value, __u64 flags)
helper to lookup array/hash/cgroup_local_storage elements with BPF_F_LOCK flag.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

df5d22fa

tools/bpf: sync uapi/bpf.h · e44ac9a2

Alexei Starovoitov authored Jan 31, 2019

add BPF_F_LOCK definition to tools/include/uapi/linux/bpf.h
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

e44ac9a2

bpf: introduce BPF_F_LOCK flag · 96049f3a

Alexei Starovoitov authored Jan 31, 2019

Introduce BPF_F_LOCK flag for map_lookup and map_update syscall commands
and for map_update() helper function.
In all these cases take a lock of existing element (which was provided
in BTF description) before copying (in or out) the rest of map value.

Implementation details that are part of uapi:

Array:
The array map takes the element lock for lookup/update.

Hash:
hash map also takes the lock for lookup/update and tries to avoid the bucket lock.
If old element exists it takes the element lock and updates the element in place.
If element doesn't exist it allocates new one and inserts into hash table
while holding the bucket lock.
In rare case the hashmap has to take both the bucket lock and the element lock
to update old value in place.

Cgroup local storage:
It is similar to array. update in place and lookup are done with lock taken.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

96049f3a

selftests/bpf: add bpf_spin_lock C test · ab963beb

Alexei Starovoitov authored Jan 31, 2019

add bpf_spin_lock C based test that requires latest llvm with BTF support
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

ab963beb

selftests/bpf: add bpf_spin_lock verifier tests · b4d4556c

Alexei Starovoitov authored Jan 31, 2019

add bpf_spin_lock tests to test_verifier.c that don't require
latest llvm with BTF support
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

b4d4556c

tools/bpf: sync include/uapi/linux/bpf.h · 7dac3ae4

Alexei Starovoitov authored Jan 31, 2019

sync bpf.h
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

7dac3ae4

bpf: add support for bpf_spin_lock to cgroup local storage · e16d2f1a

Alexei Starovoitov authored Jan 31, 2019

Allow 'struct bpf_spin_lock' to reside inside cgroup local storage.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

e16d2f1a

bpf: introduce bpf_spin_lock · d83525ca

Alexei Starovoitov authored Jan 31, 2019

Introduce 'struct bpf_spin_lock' and bpf_spin_lock/unlock() helpers to let
bpf program serialize access to other variables.

Example:
struct hash_elem {
    int cnt;
    struct bpf_spin_lock lock;
};
struct hash_elem * val = bpf_map_lookup_elem(&hash_map, &key);
if (val) {
    bpf_spin_lock(&val->lock);
    val->cnt++;
    bpf_spin_unlock(&val->lock);
}

Restrictions and safety checks:
- bpf_spin_lock is only allowed inside HASH and ARRAY maps.
- BTF description of the map is mandatory for safety analysis.
- bpf program can take one bpf_spin_lock at a time, since two or more can
  cause dead locks.
- only one 'struct bpf_spin_lock' is allowed per map element.
  It drastically simplifies implementation yet allows bpf program to use
  any number of bpf_spin_locks.
- when bpf_spin_lock is taken the calls (either bpf2bpf or helpers) are not allowed.
- bpf program must bpf_spin_unlock() before return.
- bpf program can access 'struct bpf_spin_lock' only via
  bpf_spin_lock()/bpf_spin_unlock() helpers.
- load/store into 'struct bpf_spin_lock lock;' field is not allowed.
- to use bpf_spin_lock() helper the BTF description of map value must be
  a struct and have 'struct bpf_spin_lock anyname;' field at the top level.
  Nested lock inside another struct is not allowed.
- syscall map_lookup doesn't copy bpf_spin_lock field to user space.
- syscall map_update and program map_update do not update bpf_spin_lock field.
- bpf_spin_lock cannot be on the stack or inside networking packet.
  bpf_spin_lock can only be inside HASH or ARRAY map value.
- bpf_spin_lock is available to root only and to all program types.
- bpf_spin_lock is not allowed in inner maps of map-in-map.
- ld_abs is not allowed inside spin_lock-ed region.
- tracing progs and socket filter progs cannot use bpf_spin_lock due to
  insufficient preemption checks

Implementation details:
- cgroup-bpf class of programs can nest with xdp/tc programs.
  Hence bpf_spin_lock is equivalent to spin_lock_irqsave.
  Other solutions to avoid nested bpf_spin_lock are possible.
  Like making sure that all networking progs run with softirq disabled.
  spin_lock_irqsave is the simplest and doesn't add overhead to the
  programs that don't use it.
- arch_spinlock_t is used when its implemented as queued_spin_lock
- archs can force their own arch_spinlock_t
- on architectures where queued_spin_lock is not available and
  sizeof(arch_spinlock_t) != sizeof(__u32) trivial lock is used.
- presence of bpf_spin_lock inside map value could have been indicated via
  extra flag during map_create, but specifying it via BTF is cleaner.
  It provides introspection for map key/value and reduces user mistakes.

Next steps:
- allow bpf_spin_lock in other map types (like cgroup local storage)
- introduce BPF_F_LOCK flag for bpf_map_update() syscall and helper
  to request kernel to grab bpf_spin_lock before rewriting the value.
  That will serialize access to map elements.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

d83525ca

31 Jan, 2019 9 commits

bpf, cgroups: clean up kerneldoc warnings · 1832f4ef

Valdis Kletnieks authored Jan 29, 2019

Building with W=1 reveals some bitrot:

  CC      kernel/bpf/cgroup.o
kernel/bpf/cgroup.c:238: warning: Function parameter or member 'flags' not described in '__cgroup_bpf_attach'
kernel/bpf/cgroup.c:367: warning: Function parameter or member 'unused_flags' not described in '__cgroup_bpf_detach'

Add a kerneldoc line for 'flags'.

Fixing the warning for 'unused_flags' is best approached by
removing the unused parameter on the function call.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

1832f4ef

bpf: fix missing prototype warnings · 116bfa96

Valdis Kletnieks authored Jan 29, 2019

Compiling with W=1 generates warnings:

  CC      kernel/bpf/core.o
kernel/bpf/core.c:721:12: warning: no previous prototype for ?bpf_jit_alloc_exec_limit? [-Wmissing-prototypes]
  721 | u64 __weak bpf_jit_alloc_exec_limit(void)
      |            ^~~~~~~~~~~~~~~~~~~~~~~~
kernel/bpf/core.c:757:14: warning: no previous prototype for ?bpf_jit_alloc_exec? [-Wmissing-prototypes]
  757 | void *__weak bpf_jit_alloc_exec(unsigned long size)
      |              ^~~~~~~~~~~~~~~~~~
kernel/bpf/core.c:762:13: warning: no previous prototype for ?bpf_jit_free_exec? [-Wmissing-prototypes]
  762 | void __weak bpf_jit_free_exec(void *addr)
      |             ^~~~~~~~~~~~~~~~~

All three are weak functions that archs can override, provide
proper prototypes for when a new arch provides their own.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

116bfa96

bpf: fix bitrotted kerneldoc · de1da68d

Valdis Kletnieks authored Jan 28, 2019

Over the years, the function signature has changed, but the
kerneldoc block hasn't.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

de1da68d

Merge branch 'bpf-tests-probe-kernel-support' · 9f239f68

Daniel Borkmann authored Jan 31, 2019

Stanislav Fomichev says:

====================
If test_maps/test_verifier is running against the kernel which doesn't
have _all_ BPF features enabled, it fails with an error. This patch
series tries to probe kernel support for each failed test and skip
it instead. This lets users run BPF selftests in the not-all-bpf-yes
environments and received correct PASS/NON-PASS result.

See https://www.spinics.net/lists/netdev/msg539331.html for more
context.

The series goes like this:

* patch #1 skips sockmap tests in test_maps.c if BPF_MAP_TYPE_SOCKMAP
  map is not supported (if bpf_create_map fails, we probe the kernel
  for support)
* patch #2 skips verifier tests if test->prog_type is not supported (if
  bpf_verify_program fails, we probe the kernel for support)
* patch #3 skips verifier tests if test fixup map is not supported (if
  create_map fails, we probe the kernel for support)
* next patches fix various small issues that arise from the first four:
  * patch #4 sets "unknown func bpf_trace_printk#6" prog_type to
    BPF_PROG_TYPE_TRACEPOINT so it is correctly skipped in
    CONFIG_BPF_EVENTS=n case
  * patch #5 exposes BPF_PROG_TYPE_CGROUP_{SKB,SOCK,SOCK_ADDR} only when
    CONFIG_CGROUP_BPF=y, this makes verifier correctly skip appropriate
    tests

v3 changes:
* rebased on top of Quentin's series which adds probes to libbpf

v2 changes:
* don't sprinkle "ifdef CONFIG_CGROUP_BPF" all around net/core/filter.c,
  doing it only in the bpf_types.h is enough to disable
  BPF_PROG_TYPE_CGROUP_{SKB,SOCK,SOCK_ADDR} prog types for non-cgroup
  enabled kernels
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

9f239f68

bpf: BPF_PROG_TYPE_CGROUP_{SKB, SOCK, SOCK_ADDR} require cgroups enabled · befa6181

Stanislav Fomichev authored Jan 28, 2019

There is no way to exercise appropriate attach points without cgroups
enabled. This lets test_verifier correctly skip tests for these
prog_types if kernel was compiled without BPF cgroup support.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

befa6181

selftests/bpf: mark verifier test that uses bpf_trace_printk as BPF_PROG_TYPE_TRACEPOINT · cfff578e

Stanislav Fomichev authored Jan 28, 2019

We don't have this helper if the kernel was compiled without
CONFIG_BPF_EVENTS. Setting prog_type to BPF_PROG_TYPE_TRACEPOINT
let's verifier correctly skip this test based on the missing
prog_type support in the kernel.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

cfff578e

selftests/bpf: skip verifier tests for unsupported map types · 9acea337

Stanislav Fomichev authored Jan 28, 2019

Use recently introduced bpf_probe_map_type() to skip tests in the
test_verifier if map creation (create_map) fails. It's handled
explicitly for each fixup, i.e. if bpf_create_map returns negative fd,
we probe the kernel for the appropriate map support and skip the
test is map type is not supported.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

9acea337

selftests/bpf: skip verifier tests for unsupported program types · 8184d44c

Stanislav Fomichev authored Jan 28, 2019

Use recently introduced bpf_probe_prog_type() to skip tests in the
test_verifier() if bpf_verify_program() fails. The skipped test is
indicated in the output.

Example:

...
679/p bpf_get_stack return R0 within range SKIP (unsupported program
type 5)
680/p ld_abs: invalid op 1 OK
...
Summary: 863 PASSED, 165 SKIPPED, 3 FAILED
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

8184d44c

selftests/bpf: skip sockmap in test_maps if kernel doesn't have support · e8ddbfb4

Stanislav Fomichev authored Jan 28, 2019

Use recently introduced bpf_probe_map_type() to skip test_sockmap()
if map creation fails. The skipped test is indicated in the output.

Example:

test_sockmap SKIP (unsupported map type BPF_MAP_TYPE_SOCKMAP)
Fork 1024 tasks to 'test_update_delete'
...
test_sockmap SKIP (unsupported map type BPF_MAP_TYPE_SOCKMAP)
Fork 1024 tasks to 'test_update_delete'
...
test_maps: OK, 2 SKIPPED
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

e8ddbfb4

30 Jan, 2019 16 commits

Merge branch 'hns3-next' · 630afc77

David S. Miller authored Jan 30, 2019

Huazhong Tan says:

====================
code optimizations & bugfixes for HNS3 driver

This patchset includes bugfixes and code optimizations for the HNS3
ethernet controller driver
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

630afc77

net: hns3: keep flow director state unchanged when reset · 9abeb7d8

Jian Shen authored Jan 31, 2019

In orginal codes, driver always enables flow director when
intializing. When user disable flow director with command
ethtool -K, the flow director will be enabled again after
resetting.

This patch fixes it by only enabling it when first initialzing.

Fixes: 6871af29 ("net: hns3: Add reset handle for flow director")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9abeb7d8

net: hns3: stop sending keep alive msg to PF when VF is resetting · c59a85c0

Jian Shen authored Jan 31, 2019

When VF is resetting, it can't communicate to PF with mailbox msg.
This patch adds reset state checking before sending keep alive msg
to PF.

Fixes: a6d818e3 ("net: hns3: Add vport alive state checking support")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c59a85c0

net: hns3: fix an issue for hclgevf_ae_get_hdev · eed9535f

Peng Li authored Jan 31, 2019

HNS3 VF driver support NIC and Roce, hdev stores NIC
handle and Roce handle, should use correct parameter for
container_of.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

eed9535f

net: hns3: fix improper error handling in the hclge_init_ae_dev() · 9fc55413

Huazhong Tan authored Jan 31, 2019

While hclge_init_umv_space() failed in the hclge_init_ae_dev(),
we should undo all the operation which has been done successfully,
the last success operation maybe hclge_mac_mdio_config(), so if
hclge_init_umv_space() failed, we also need to undo it.

Fixes: 288475b2ad01 ("{topost} net: hns3: refine umv space allocation")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9fc55413

net: hns3: fix for rss result nonuniform · 472d7ece

Jian Shen authored Jan 31, 2019

The rss result is more uniform when use recommended hash key from
microsoft, instead of the one generated by netdev_rss_key_fill().
Also using hash algorithm "xor" is better than "toeplitz".

This patch modifies the default hash key and hash algorithm.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

472d7ece

net: hns3: fix netif_napi_del() not do problem when unloading · e2152785

Huazhong Tan authored Jan 31, 2019

When the driver is unloading, if a global reset occurs,
unmap_ring_from_vector() in the hns3_nic_uninit_vector_data() will
fail, and hns3_nic_uninit_vector_data() just return. There may be
some netif_napi_del() not be done.

Since hardware will unmap all ring while resetting, so
hns3_nic_uninit_vector_data() should ignore this error, and do the
rest uninitialization.

Fixes: 76ad4f0e ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e2152785

net: hns3: Fix NULL deref when unloading driver · c8a8045b

Huazhong Tan authored Jan 31, 2019

When the driver is unloading, if there is a calling of ndo_open occurs
between phy_disconnect() and unregister_netdev(), it will end up
causing the kernel to eventually hit a NULL deref:

[14942.417828] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000048
[14942.529878] Mem abort info:
[14942.551166]   ESR = 0x96000006
[14942.567070]   Exception class = DABT (current EL), IL = 32 bits
[14942.623081]   SET = 0, FnV = 0
[14942.639112]   EA = 0, S1PTW = 0
[14942.643628] Data abort info:
[14942.659227]   ISV = 0, ISS = 0x00000006
[14942.674870]   CM = 0, WnR = 0
[14942.679449] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000224ad6ad
[14942.695595] [0000000000000048] pgd=00000021e6673003, pud=00000021dbf01003, pmd=0000000000000000
[14942.723163] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[14942.729358] Modules linked in: hns3(O) hclge(O) pv680_mii(O) hnae3(O) [last unloaded: hclge]
[14942.738907] CPU: 1 PID: 26629 Comm: kworker/u4:13 Tainted: G           O      4.18.0-rc1-12928-ga960791-dirty #145
[14942.749491] Hardware name: Huawei Technologies Co., Ltd. D05/D05, BIOS Hi1620 FPGA TB BOOT BIOS B763 08/17/2018
[14942.760392] Workqueue: events_power_efficient phy_state_machine
[14942.766644] pstate: 80c00009 (Nzcv daif +PAN +UAO)
[14942.771918] pc : test_and_set_bit+0x18/0x38
[14942.776589] lr : netif_carrier_off+0x24/0x70
[14942.781033] sp : ffff0000121abd20
[14942.784518] x29: ffff0000121abd20 x28: 0000000000000000
[14942.790208] x27: ffff0000164d3cd8 x26: ffff8021da68b7b8
[14942.795832] x25: 0000000000000000 x24: ffff8021eb407800
[14942.801445] x23: 0000000000000000 x22: 0000000000000000
[14942.807046] x21: 0000000000000001 x20: 0000000000000000
[14942.812672] x19: 0000000000000000 x18: ffff000009781708
[14942.818284] x17: 00000000004970e8 x16: ffff00000816ad48
[14942.823900] x15: 0000000000000000 x14: 0000000000000008
[14942.829528] x13: 0000000000000000 x12: 0000000000000f65
[14942.835149] x11: 0000000000000001 x10: 00000000000009d0
[14942.840753] x9 : ffff0000121abaa0 x8 : 0000000000000000
[14942.846360] x7 : ffff000009781708 x6 : 0000000000000003
[14942.851970] x5 : 0000000000000020 x4 : 0000000000000004
[14942.857575] x3 : 0000000000000002 x2 : 0000000000000001
[14942.863180] x1 : 0000000000000048 x0 : 0000000000000000
[14942.868875] Process kworker/u4:13 (pid: 26629, stack limit = 0x00000000c909dbf3)
[14942.876464] Call trace:
[14942.879200]  test_and_set_bit+0x18/0x38
[14942.883376]  phy_link_change+0x38/0x78
[14942.887378]  phy_state_machine+0x3dc/0x4f8
[14942.891968]  process_one_work+0x158/0x470
[14942.896223]  worker_thread+0x50/0x470
[14942.900219]  kthread+0x104/0x130
[14942.903905]  ret_from_fork+0x10/0x1c
[14942.907755] Code: d2800022 8b400c21 f9800031 9ac32044 (c85f7c22)
[14942.914185] ---[ end trace 968c9e12eb740b23 ]---

So this patch fixes it by modifying the timing to do phy_connect_direct()
and phy_disconnect().

Fixes: 256727da ("net: hns3: Add MDIO support to HNS3 Ethernet driver for hip08 SoC")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c8a8045b

net: hns3: only support tc 0 for VF · de67a690

Yunsheng Lin authored Jan 31, 2019

When the VF shares the same TC config as PF, the business
running on PF and VF must have samiliar module.

For simplicity, we are not considering VF sharing the same tc
configuration as PF use case, so this patch removes the support
of TC configuration from VF and forcing VF to just use single
TC.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

de67a690

net: hns3: change hnae3_register_ae_dev() to int · 74354140

Huazhong Tan authored Jan 31, 2019

hnae3_register_ae_dev() may fail, and it should return a error code
to its caller, so change hnae3_register_ae_dev() return type to int.

Also, when hnae3_register_ae_dev() return error, hns3_probe() should
do some error handling and return the error code.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

74354140

net: hns3: use the correct interface to stop|open port · fc0c174f

Peng Li authored Jan 31, 2019

dev_close() stop the netdev and the service base on the netdev
will stop. But ndev->netdev_ops->ndo_stop() may only stop HW
and stack queue, the service base on the netdev can still work.

Fixes: 5668abda ("net: hns3: add support for set_ringparam")
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fc0c174f

net: hns3: fix VF dump register issue · 8e1445a6

Jian Shen authored Jan 31, 2019

In original codes, the .get_regs_len and .get_regs were missed
assigned. This patch fixes it.

Fixes: 1600c3e5 ("net: hns3: Support "ethtool -d" for HNS3 VF driver")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8e1445a6

net: hns3: reuse the definition of l3 and l4 header info union · 1a6e552d

liyongxin authored Jan 31, 2019

Union l3_hdr_info and l4_hdr_info have already been defined in
the hns3_enet.h, so it is unnecessary to define them elsewhere.

This patch removes the redundant definition, and reuses the one
defined in the hns3_enet.h.
Signed-off-by: liyongxin <liyongxin1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1a6e552d

Merge branch 'net-dsa-mt7530-support-MT7530-in-the-MT7621-SoC' · a82a3fe0

David S. Miller authored Jan 30, 2019

Greg Ungerer says:

====================
net: dsa: mt7530: support MT7530 in the MT7621 SoC

This is the fourth version of a patch series supporting the MT7530 switch
as used in the MediaTek MT7621 SoC. Unlike the MediaTek MT7623 the MT7621
is built around a dual core MIPS CPU architecture. But inside it uses
basically the same 7530 switch.

This series resolves all issues I had with previous versions, and I can
now reliably use the driver on a 7621 SoC platform. These patches were
generated against linux-5.0-rc4.

The first patch enables support for the existing kernel mediatek ethernet
driver on the MT7621 SoC. This support is from Bjørn Mork, with an update
and fix by me. Using this driver fixed a number of problems I had
(TX checksums, large RX packet drop) over the staging driver
(drivers/staging/mt7621-eth).

Patch 2 modifies the mt7530 DSA driver to support the 7530 switch as
implemented in the Mediatek MT7621 SoC. The last patch updates the
devicetree bindings to reflect the new support in the mt7530 driver.

There is no real dependencies between the patches, so they can be taken
independantly.

Creating a new binding for the MT7621 seems like the only viable approach
to distinguish between a stand alone 7530 switch, the silicon module
in the MT7623 SoC and the silicon in the MT7621. Certainly the 7530 ID
register in the MT7623 and MT7621 returns the same value, "0x7530001".

Looking at the mt7530.c DSA driver it might make some sense to convert
the existing "mediatek,mcm" binding to something like "mediatek,mt7623"
to be consistent with this new MT7621 support. As far as I can tell
this is the intention of this binding.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

a82a3fe0

dt-bindings: net: dsa: add new MT7530 binding to support MT7621 · 9389b5e9

Greg Ungerer authored Jan 30, 2019

Add devicetree binding to support the compatible mt7530 switch as used
in the MediaTek MT7621 SoC.
Signed-off-by: Greg Ungerer <gerg@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Sean Wang <sean.wang@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9389b5e9

net: dsa: mt7530: support the 7530 switch on the Mediatek MT7621 SoC · ddda1ac1

Greg Ungerer authored Jan 30, 2019

The MediaTek MT7621 SoC device contains a 7530 switch, and the existing
linux kernel 7530 DSA switch driver can be used with it.

The bulk of the changes required stem from the 7621 having different
regulator and pad setup. The existing setup of these in the 7530
driver appears to be very specific to its implemtation in the Mediatek
7623 SoC. (Not entirely surprising given the 7623 is a quad core ARM
based SoC, and the 7621 is a dual core, dual thread MIPS based SoC).

Create a new devicetree type, "mediatek,mt7621", to support the 7530
switch in the 7621 SoC. There appears to be no usable ID register to
distinguish it from a 7530 in other hardware at runtime. This is used
to carry out the appropriate configuration and setup.
Signed-off-by: Greg Ungerer <gerg@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Sean Wang <sean.wang@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ddda1ac1