Commits · b42699547fc9fb1057795bccc21a6445743a7fde · nexedi / linux

30 Nov, 2018 1 commit

tools/bpf: make libbpf _GNU_SOURCE friendly · b4269954

Yonghong Song authored Nov 29, 2018

During porting libbpf to bcc, I got some warnings like below:
  ...
  [  2%] Building C object src/cc/CMakeFiles/bpf-shared.dir/libbpf/src/libbpf.c.o
  /home/yhs/work/bcc2/src/cc/libbpf/src/libbpf.c:12:0:
  warning: "_GNU_SOURCE" redefined [enabled by default]
   #define _GNU_SOURCE
  ...
  [  3%] Building C object src/cc/CMakeFiles/bpf-shared.dir/libbpf/src/libbpf_errno.c.o
  /home/yhs/work/bcc2/src/cc/libbpf/src/libbpf_errno.c: In function ‘libbpf_strerror’:
  /home/yhs/work/bcc2/src/cc/libbpf/src/libbpf_errno.c:45:7:
  warning: assignment makes integer from pointer without a cast [enabled by default]
     ret = strerror_r(err, buf, size);
  ...

bcc is built with _GNU_SOURCE defined and this caused the above warning.
This patch intends to make libpf _GNU_SOURCE friendly by
  . define _GNU_SOURCE in libbpf.c unless it is not defined
  . undefine _GNU_SOURCE as non-gnu version of strerror_r is expected.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

b4269954

29 Nov, 2018 1 commit

bpf: Fix various lib and testsuite build failures on 32-bit. · 1ad93ab1

David Miller authored Nov 28, 2018

Cannot cast a u64 to a pointer on 32-bit without an intervening (long)
cast otherwise GCC warns.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

1ad93ab1

28 Nov, 2018 5 commits

selftests/bpf: add config fragment CONFIG_FTRACE_SYSCALLS · 295daee4

Naresh Kamboju authored Nov 27, 2018

CONFIG_FTRACE_SYSCALLS=y is required for get_cgroup_id_user test case
this test reads a file from debug trace path
/sys/kernel/debug/tracing/events/syscalls/sys_enter_nanosleep/id
Signed-off-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

295daee4

Merge branch 'bpf-sk-msg-pop-data' · 36dbe571

Daniel Borkmann authored Nov 28, 2018

John Fastabend says:

====================
After being able to add metadata to messages with sk_msg_push_data we
have also found it useful to be able to "pop" this metadata off before
sending it to applications in some cases. This series adds a new helper
sk_msg_pop_data() and the associated patches to add tests and tools/lib
support.

Thanks!

v2: Daniel caught that we missed adding sk_msg_pop_data to the changes
    data helper so that the verifier ensures BPF programs revalidate
    data after using this helper. Also improve documentation adding a
    return description and using RST syntax per Quentin's comment. And
    delta calculations for DROP with pop'd data (albeit a strange set
    of operations for a program to be doing) had potential to be
    incorrect possibly confusing user space applications, so fix it.
====================
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

36dbe571

bpf: test_sockmap, add options for msg_pop_data() helper · 1ade9aba

John Fastabend authored Nov 26, 2018

Similar to msg_pull_data and msg_push_data add a set of options to
have msg_pop_data() exercised.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

1ade9aba

bpf: add msg_pop_data helper to tools · d913a227

John Fastabend authored Nov 26, 2018

Add the necessary header definitions to tools for new
msg_pop_data_helper.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

d913a227

bpf: helper to pop data from messages · 7246d8ed

John Fastabend authored Nov 26, 2018

This adds a BPF SK_MSG program helper so that we can pop data from a
msg. We use this to pop metadata from a previous push data call.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

7246d8ed

27 Nov, 2018 8 commits

Merge branch 'libbpf-versioning-doc' · 17d95e42

Alexei Starovoitov authored Nov 26, 2018

Andrey Ignatov says:

====================
This patch set adds ABI versioning and documentation to libbpf.

Patch 1 renames btf_get_from_id to btf__get_from_id to follow naming
convention.
Patch 2 adds version script and has more details on ABI versioning.
Patch 3 adds simple check that all global symbols are versioned.
Patch 4 documents a few aspects of libbpf API and ABI in dev process.

v1->v2:
* add patch from Martin KaFai Lau <kafai@fb.com> to rename btf_get_from_id;
* add documentation for libbpf API and ABI.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

17d95e42

libbpf: Document API and ABI conventions · 76d1b894

Andrey Ignatov authored Nov 23, 2018

Document API and ABI for libbpf: naming convention, symbol visibility,
ABI versioning.

This is just a starting point. Documentation can be significantly
extended in the future to cover more topics.

ABI versioning section touches only a few basic points with a link to
more comprehensive documentation from Ulrich Drepper. This section can
be extended in the future when there is better understanding what works
well and what not so well in libbpf development process and production
usage.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

76d1b894

libbpf: Verify versioned symbols · 306b267c

Andrey Ignatov authored Nov 23, 2018

Since ABI versioning info is kept separately from the code it's easy to
forget to update it while adding a new API.

Add simple verification that all global symbols exported with LIBBPF_API
are versioned in libbpf.map version script.

The idea is to check that number of global symbols in libbpf-in.o, that
is the input to the linker, matches with number of unique versioned
symbols in libbpf.so, that is the output of the linker. If these numbers
don't match, it may mean some symbol was not versioned and make will
fail.

"Unique" means that if a symbol is present in more than one version of
ABI due to ABI changes, it'll be counted once.

Another option to calculate number of global symbols in the "input"
could be to count number of LIBBPF_ABI entries in C headers but it seems
to be fragile.

Example of output when a symbol is missing in version script:

    ...
    LD       libbpf-in.o
    LINK     libbpf.a
    LINK     libbpf.so
  Warning: Num of global symbols in libbpf-in.o (115) does NOT match
  with num of versioned symbols in libbpf.so (114). Please make sure all
  LIBBPF_API symbols are versioned in libbpf.map.
  make: *** [check_abi] Error 1
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

306b267c

libbpf: Add version script for DSO · 16192a77

Andrey Ignatov authored Nov 23, 2018

More and more projects use libbpf and one day it'll likely be packaged
and distributed as DSO and that requires ABI versioning so that both
compatible and incompatible changes to ABI can be introduced in a safe
way in the future without breaking executables dynamically linked with a
previous version of the library.

Usual way to do ABI versioning is version script for the linker. Add
such a script for libbpf. All global symbols currently exported via
LIBBPF_API macro are added to the version script libbpf.map.

The version name LIBBPF_0.0.1 is constructed from the name of the
library + version specified by $(LIBBPF_VERSION) in Makefile.

Version script does not duplicate the work done by LIBBPF_API macro, it
rather complements it. The macro is used at compile time and can be used
by compiler to do optimization that can't be done at link time, it is
purely about global symbol visibility. The version script, in turn, is
used at link time and takes care of ABI versioning. Both techniques are
described in details in [1].

Whenever ABI is changed in the future, version script should be changed
appropriately.

[1] https://www.akkadia.org/drepper/dsohowto.pdfSigned-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

16192a77

libbpf: Name changing for btf_get_from_id · 1d2f44ca

Martin KaFai Lau authored Nov 23, 2018

s/btf_get_from_id/btf__get_from_id/ to restore the API naming convention.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

1d2f44ca

Merge branch 'non-jit-btf-func_info' · b89c2998

Alexei Starovoitov authored Nov 26, 2018

Yonghong Song says:

====================
Commit 838e9690 ("bpf: Introduce bpf_func_info")
added bpf func info support. The userspace is able
to get better ksym's for bpf programs with jit, and
is able to print out func prototypes.

For a program containing func-to-func calls, the existing
implementation returns user specified number of function
calls and BTF types if jit is enabled. If the jit is not
enabled, it only returns the type for the main function.

This is undesirable. Interpreter may still be used
and we should keep feature identical regardless of
whether jit is enabled or not.
This patch fixed this discrepancy.

The following example shows bpftool output for
the bpf program in selftests test_btf_haskv.o when jit
is disabled:
  $ bpftool prog dump xlated id 1490
  int _dummy_tracepoint(struct dummy_tracepoint_args * arg):
     0: (85) call pc+2#__bpf_prog_run_args32
     1: (b7) r0 = 0
     2: (95) exit
  int test_long_fname_1(struct dummy_tracepoint_args * arg):
     3: (85) call pc+1#__bpf_prog_run_args32
     4: (95) exit
  int test_long_fname_2(struct dummy_tracepoint_args * arg):
     5: (b7) r2 = 0
     6: (63) *(u32 *)(r10 -4) = r2
     7: (79) r1 = *(u64 *)(r1 +8)
     8: (15) if r1 == 0x0 goto pc+9
     9: (bf) r2 = r10
    10: (07) r2 += -4
    11: (18) r1 = map[id:1173]
    13: (85) call bpf_map_lookup_elem#77088
    14: (15) if r0 == 0x0 goto pc+3
    15: (61) r1 = *(u32 *)(r0 +4)
    16: (07) r1 += 1
    17: (63) *(u32 *)(r0 +4) = r1
    18: (95) exit
  $ bpftool prog dump jited id 1490
    no instructions returned
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

b89c2998

tools/bpf: change selftest test_btf for both jit and non-jit · 812dd689

Yonghong Song authored Nov 24, 2018

The selftest test_btf is changed to test both jit and non-jit.
The test result should be the same regardless of whether jit
is enabled or not.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

812dd689

bpf: btf: support proper non-jit func info · ba64e7d8

Yonghong Song authored Nov 24, 2018

Commit 838e9690 ("bpf: Introduce bpf_func_info")
added bpf func info support. The userspace is able
to get better ksym's for bpf programs with jit, and
is able to print out func prototypes.

For a program containing func-to-func calls, the existing
implementation returns user specified number of function
calls and BTF types if jit is enabled. If the jit is not
enabled, it only returns the type for the main function.

This is undesirable. Interpreter may still be used
and we should keep feature identical regardless of
whether jit is enabled or not.
This patch fixed this discrepancy.

Fixes: 838e9690 ("bpf: Introduce bpf_func_info")
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

ba64e7d8

26 Nov, 2018 5 commits

bpf: Avoid unnecessary instruction in convert_bpf_ld_abs() · d8f3e978

David Miller authored Nov 26, 2018

'offset' is constant and if it is zero, no need to subtract it
from BPF_REG_TMP.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

d8f3e978

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 4afe60a9

David S. Miller authored Nov 26, 2018

Daniel Borkmann says:

====================
pull-request: bpf-next 2018-11-26

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Extend BTF to support function call types and improve the BPF
   symbol handling with this info for kallsyms and bpftool program
   dump to make debugging easier, from Martin and Yonghong.

2) Optimize LPM lookups by making longest_prefix_match() handle
   multiple bytes at a time, from Eric.

3) Adds support for loading and attaching flow dissector BPF progs
   from bpftool, from Stanislav.

4) Extend the sk_lookup() helper to be supported from XDP, from Nitin.

5) Enable verifier to support narrow context loads with offset > 0
   to adapt to LLVM code generation (currently only offset of 0 was
   supported). Add test cases as well, from Andrey.

6) Simplify passing device functions for offloaded BPF progs by
   adding callbacks to bpf_prog_offload_ops instead of ndo_bpf.
   Also convert nfp and netdevsim to make use of them, from Quentin.

7) Add support for sock_ops based BPF programs to send events to
   the perf ring-buffer through perf_event_output helper, from
   Sowmini and Daniel.

8) Add read / write support for skb->tstamp from tc BPF and cg BPF
   programs to allow for supporting rate-limiting in EDT qdiscs
   like fq from BPF side, from Vlad.

9) Extend libbpf API to support map in map types and add test cases
   for it as well to BPF kselftests, from Nikita.

10) Account the maximum packet offset accessed by a BPF program in
    the verifier and use it for optimizing nfp JIT, from Jiong.

11) Fix error handling regarding kprobe_events in BPF sample loader,
    from Daniel T.

12) Add support for queue and stack map type in bpftool, from David.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

4afe60a9

bpf: align map type names formatting. · ffac28f9

David Calavera authored Nov 23, 2018

Make the formatting for map_type_name array consistent.
Signed-off-by: David Calavera <david.calavera@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

ffac28f9

bpf: btf: fix spelling mistake "Memmber" -> "Member" · 311fe1a8

Colin Ian King authored Nov 25, 2018

There is a spelling mistake in a btf_verifier_log_member message,
fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

311fe1a8

bpf, tags: Fix DEFINE_PER_CPU expansion · cf0dd411

Rustam Kovhaev authored Nov 23, 2018

Building tags produces warning:

  ctags: Warning: kernel/bpf/local_storage.c:10: null expansion of name pattern "\1"

Let's use the same fix as in commit 25528213 ("tags: Fix DEFINE_PER_CPU
expansions"), even though it violates the usual code style.
Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

cf0dd411

25 Nov, 2018 10 commits

net: remove unsafe skb_insert() · 4bffc669

Eric Dumazet authored Nov 25, 2018

I do not see how one can effectively use skb_insert() without holding
some kind of lock. Otherwise other cpus could have changed the list
right before we have a chance of acquiring list->lock.

Only existing user is in drivers/infiniband/hw/nes/nes_mgt.c and this
one probably meant to use __skb_insert() since it appears nesqp->pau_list
is protected by nesqp->pau_lock. This looks like nesqp->pau_lock
could be removed, since nesqp->pau_list.lock could be used instead.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Faisal Latif <faisal.latif@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma <linux-rdma@vger.kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

4bffc669

net: bridge: remove redundant checks for null p->dev and p->br · 40b1c813

Colin Ian King authored Nov 25, 2018

A recent change added a null check on p->dev after p->dev was being
dereferenced by the ns_capable check on p->dev. It turns out that
neither the p->dev and p->br null checks are necessary, and can be
removed, which cleans up a static analyis warning.

As Nikolay Aleksandrov noted, these checks can be removed because:

"My reasoning of why it shouldn't be possible:
- On port add new_nbp() sets both p->dev and p->br before creating
  kobj/sysfs

- On port del (trickier) del_nbp() calls kobject_del() before call_rcu()
  to destroy the port which in turn calls sysfs_remove_dir() which uses
  kernfs_remove() which deactivates (shouldn't be able to open new
  files) and calls kernfs_drain() to drain current open/mmaped files in
  the respective dir before continuing, thus making it impossible to
  open a bridge port sysfs file with p->dev and p->br equal to NULL.

So I think it's safe to remove those checks altogether. It'd be nice to
get a second look over my reasoning as I might be missing something in
sysfs/kernfs call path."

Thanks to Nikolay Aleksandrov's suggestion to remove the check and
David Miller for sanity checking this.

Detected by CoverityScan, CID#751490 ("Dereference before null check")

Fixes: a5f3ea54 ("net: bridge: add support for raw sysfs port options")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

40b1c813

Merge branch 'r8169-xmit_more' · a1f2d60a

David S. Miller authored Nov 25, 2018

Heiner Kallweit says:

====================
r8169: make use of xmit_more and __netdev_sent_queue

This series adds helper __netdev_sent_queue to the core and makes use
of it in the r8169 driver.

Heiner Kallweit (2):
  net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue
  r8169: make use of xmit_more and __netdev_sent_queue

v2:
- fix minor style issue
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

a1f2d60a

r8169: make use of xmit_more and __netdev_sent_queue · 2e6eedb4

Heiner Kallweit authored Nov 25, 2018

Make use of xmit_more and add the functionality introduced with
3e59020a ("net: bql: add __netdev_tx_sent_queue()").
I used the mlx4 driver as template.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2e6eedb4

net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue · 620344c4

Heiner Kallweit authored Nov 25, 2018

Similar to netdev_sent_queue add helper __netdev_sent_queue as variant
of __netdev_tx_sent_queue.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

620344c4

selftests/net: add txring_overwrite · 358be656

Willem de Bruijn authored Nov 24, 2018

Packet sockets with PACKET_TX_RING send skbs with user data in frags.

Before commit 5cd8d46e ("packet: copy user buffers before orphan
or clone") ring slots could be released prematurely, possibly allowing
a process to overwrite data still in flight.

This test opens two packet sockets, one to send and one to read.
The sender has a tx ring of one slot. It sends two packets with
different payload, then reads both and verifies their payload.

Before the above commit, both receive calls return the same data as
the send calls use the same buffer. From the commit, the clone
needed for looping onto a packet socket triggers an skb_copy_ubufs
to create a private copy. The separate sends each arrive correctly.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

358be656

net: qualcomm: rmnet: move null check on dev before dereferecing it · 3c18aa14

Colin Ian King authored Nov 24, 2018

Currently dev is dereferenced by the call dev_net(dev) before dev is null
checked.  Fix this by null checking dev before the potential null
pointer dereference.

Detected by CoverityScan, CID#1462955 ("Dereference before null check")

Fixes: 23790ef1 ("net: qualcomm: rmnet: Allow to configure flags for existing devices")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c18aa14

cxgb4: remove set but not used variables 'multitrc, speed' · 21ab664a

YueHaibing authored Nov 24, 2018

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:5883:6:
 warning: variable 'multitrc' set but not used [-Wunused-but-set-variable]

drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:8585:32:
 warning: variable 'speed' set but not used [-Wunused-but-set-variable]

'multitrc' never used since introduction in
commit 8e3d04fd ("cxgb4: Add MPS tracing support")

'speed' never used since introduction in
commit c3168cab ("cxgb4/cxgbvf: Handle 32-bit fw port capabilities")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

21ab664a

net: fixup type in netdev_start_xmit() · 2183435c

Alexey Dobriyan authored Nov 24, 2018

Return code should be formally "netdev_tx_t".
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2183435c

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · b1bf78bf
David S. Miller authored Nov 24, 2018

b1bf78bf

24 Nov, 2018 10 commits

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · d146194f

Linus Torvalds authored Nov 24, 2018

Pull arm64 fixes from Catalin Marinas::

 - Fix wrong conflict resolution around CONFIG_ARM64_SSBD

 - Fix sparse warning on unsigned long constant

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: cpufeature: Fix mismerge of CONFIG_ARM64_SSBD block
  arm64: sysreg: fix sparse warnings

d146194f

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 857fa628

Linus Torvalds authored Nov 24, 2018

Pull networking fixes from David Miller:

 1) Need to take mutex in ath9k_add_interface(), from Dan Carpenter.

 2) Fix mt76 build without CONFIG_LEDS_CLASS, from Arnd Bergmann.

 3) Fix socket wmem accounting in SCTP, from Xin Long.

 4) Fix failed resume crash in ena driver, from Arthur Kiyanovski.

 5) qed driver passes bytes instead of bits into second arg of
    bitmap_weight(). From Denis Bolotin.

 6) Fix reset deadlock in ibmvnic, from Juliet Kim.

 7) skb_scrube_packet() needs to scrub the fwd marks too, from Petr
    Machata.

 8) Make sure older TCP stacks see enough dup ACKs, and avoid doing SACK
    compression during this period, from Eric Dumazet.

 9) Add atomicity to SMC protocol cursor handling, from Ursula Braun.

10) Don't leave dangling error pointer if bpf_prog_add() fails in
    thunderx driver, from Lorenzo Bianconi. Also, when we unmap TSO
    headers, set sq->tso_hdrs to NULL.

11) Fix race condition over state variables in act_police, from Davide
    Caratti.

12) Disable guest csum in the presence of XDP in virtio_net, from Jason
    Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (64 commits)
  net: gemini: Fix copy/paste error
  net: phy: mscc: fix deadlock in vsc85xx_default_config
  dt-bindings: dsa: Fix typo in "probed"
  net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue
  net: amd: add missing of_node_put()
  team: no need to do team_notify_peers or team_mcast_rejoin when disabling port
  virtio-net: fail XDP set if guest csum is negotiated
  virtio-net: disable guest csum during XDP set
  net/sched: act_police: add missing spinlock initialization
  net: don't keep lonely packets forever in the gro hash
  net/ipv6: re-do dad when interface has IFF_NOARP flag change
  packet: copy user buffers before orphan or clone
  ibmvnic: Update driver queues after change in ring size support
  ibmvnic: Fix RX queue buffer cleanup
  net: thunderx: set xdp_prog to NULL if bpf_prog_add fails
  net/dim: Update DIM start sample after each DIM iteration
  net: faraday: ftmac100: remove netif_running(netdev) check before disabling interrupts
  net/smc: use after free fix in smc_wr_tx_put_slot()
  net/smc: atomic SMCD cursor handling
  net/smc: add SMC-D shutdown signal
  ...

857fa628

Merge tag 'xfs-4.20-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · abe72ff4

Linus Torvalds authored Nov 24, 2018

Pull xfs fixes from Darrick Wong:
 "Dave and I have continued our work fixing corruption problems that can
  be found when running long-term burn-in exercisers on xfs. Here are
  some patches fixing most of the problems, but there will likely be
  more. :/

   - Numerous corruption fixes for copy on write

   - Numerous corruption fixes for blocksize < pagesize writes

   - Don't miscalculate AG reservations for small final AGs

   - Fix page cache truncation to work properly for reflink and extent
     shifting

   - Fix use-after-free when retrying failed inode/dquot buffer logging

   - Fix corruptions seen when using copy_file_range in directio mode"

* tag 'xfs-4.20-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: readpages doesn't zero page tail beyond EOF
  vfs: vfs_dedupe_file_range() doesn't return EOPNOTSUPP
  iomap: dio data corruption and spurious errors when pipes fill
  iomap: sub-block dio needs to zeroout beyond EOF
  iomap: FUA is wrong for DIO O_DSYNC writes into unwritten extents
  xfs: delalloc -> unwritten COW fork allocation can go wrong
  xfs: flush removing page cache in xfs_reflink_remap_prep
  xfs: extent shifting doesn't fully invalidate page cache
  xfs: finobt AG reserves don't consider last AG can be a runt
  xfs: fix transient reference count error in xfs_buf_resubmit_failed_buffers
  xfs: uncached buffer tracing needs to print bno
  xfs: make xfs_file_remap_range() static
  xfs: fix shared extent data corruption due to missing cow reservation

abe72ff4

net: gemini: Fix copy/paste error · 07093b76

Andreas Fiedler authored Nov 24, 2018

The TX stats should be started with the tx_stats_syncp,
there seems to be a copy/paste error in the driver.
Signed-off-by: Andreas Fiedler <andreas.fiedler@gmx.net>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

07093b76

net: phy: mscc: fix deadlock in vsc85xx_default_config · 3fa528b7

Quentin Schulz authored Nov 23, 2018

The vsc85xx_default_config function called in the vsc85xx_config_init
function which is used by VSC8530, VSC8531, VSC8540 and VSC8541 PHYs
mistakenly calls phy_read and phy_write in-between phy_select_page and
phy_restore_page.

phy_select_page and phy_restore_page actually take and release the MDIO
bus lock and phy_write and phy_read take and release the lock to write
or read to a PHY register.

Let's fix this deadlock by using phy_modify_paged which handles
correctly a read followed by a write in a non-standard page.

Fixes: 6a0bfbbe ("net: phy: mscc: migrate to phy_select/restore_page functions")
Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

3fa528b7

dt-bindings: dsa: Fix typo in "probed" · e7b9fb4f

Fabio Estevam authored Nov 23, 2018

The correct form is "can be probed", so fix the typo.
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

e7b9fb4f

net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue · ef2a7cf1

Lorenzo Bianconi authored Nov 23, 2018

Reset snd_queue tso_hdrs pointer to NULL in nicvf_free_snd_queue routine
since it is used to check if tso dma descriptor queue has been previously
allocated. The issue can be triggered with the following reproducer:

$ip link set dev enP2p1s0v0 xdpdrv obj xdp_dummy.o
$ip link set dev enP2p1s0v0 xdpdrv off

[  341.467649] WARNING: CPU: 74 PID: 2158 at mm/vmalloc.c:1511 __vunmap+0x98/0xe0
[  341.515010] Hardware name: GIGABYTE H270-T70/MT70-HD0, BIOS T49 02/02/2018
[  341.521874] pstate: 60400005 (nZCv daif +PAN -UAO)
[  341.526654] pc : __vunmap+0x98/0xe0
[  341.530132] lr : __vunmap+0x98/0xe0
[  341.533609] sp : ffff00001c5db860
[  341.536913] x29: ffff00001c5db860 x28: 0000000000020000
[  341.542214] x27: ffff810feb5090b0 x26: ffff000017e57000
[  341.547515] x25: 0000000000000000 x24: 00000000fbd00000
[  341.552816] x23: 0000000000000000 x22: ffff810feb5090b0
[  341.558117] x21: 0000000000000000 x20: 0000000000000000
[  341.563418] x19: ffff000017e57000 x18: 0000000000000000
[  341.568719] x17: 0000000000000000 x16: 0000000000000000
[  341.574020] x15: 0000000000000010 x14: ffffffffffffffff
[  341.579321] x13: ffff00008985eb27 x12: ffff00000985eb2f
[  341.584622] x11: ffff0000096b3000 x10: ffff00001c5db510
[  341.589923] x9 : 00000000ffffffd0 x8 : ffff0000086868e8
[  341.595224] x7 : 3430303030303030 x6 : 00000000000006ef
[  341.600525] x5 : 00000000003fffff x4 : 0000000000000000
[  341.605825] x3 : 0000000000000000 x2 : ffffffffffffffff
[  341.611126] x1 : ffff0000096b3728 x0 : 0000000000000038
[  341.616428] Call trace:
[  341.618866]  __vunmap+0x98/0xe0
[  341.621997]  vunmap+0x3c/0x50
[  341.624961]  arch_dma_free+0x68/0xa0
[  341.628534]  dma_direct_free+0x50/0x80
[  341.632285]  nicvf_free_resources+0x160/0x2d8 [nicvf]
[  341.637327]  nicvf_config_data_transfer+0x174/0x5e8 [nicvf]
[  341.642890]  nicvf_stop+0x298/0x340 [nicvf]
[  341.647066]  __dev_close_many+0x9c/0x108
[  341.650977]  dev_close_many+0xa4/0x158
[  341.654720]  rollback_registered_many+0x140/0x530
[  341.659414]  rollback_registered+0x54/0x80
[  341.663499]  unregister_netdevice_queue+0x9c/0xe8
[  341.668192]  unregister_netdev+0x28/0x38
[  341.672106]  nicvf_remove+0xa4/0xa8 [nicvf]
[  341.676280]  nicvf_shutdown+0x20/0x30 [nicvf]
[  341.680630]  pci_device_shutdown+0x44/0x88
[  341.684720]  device_shutdown+0x144/0x250
[  341.688640]  kernel_restart_prepare+0x44/0x50
[  341.692986]  kernel_restart+0x20/0x68
[  341.696638]  __se_sys_reboot+0x210/0x238
[  341.700550]  __arm64_sys_reboot+0x24/0x30
[  341.704555]  el0_svc_handler+0x94/0x110
[  341.708382]  el0_svc+0x8/0xc
[  341.711252] ---[ end trace 3f4019c8439959c9 ]---
[  341.715874] page:ffff7e0003ef4000 count:0 mapcount:0 mapping:0000000000000000 index:0x4
[  341.723872] flags: 0x1fffe000000000()
[  341.727527] raw: 001fffe000000000 ffff7e0003f1a008 ffff7e0003ef4048 0000000000000000
[  341.735263] raw: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
[  341.742994] page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)

where xdp_dummy.c is a simple bpf program that forwards the incoming
frames to the network stack (available here:
https://github.com/altoor/xdp_walkthrough_examples/blob/master/sample_1/xdp_dummy.c)

Fixes: 05c773f5 ("net: thunderx: Add basic XDP support")
Fixes: 4863dea3 ("net: Adding support for Cavium ThunderX network controller")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ef2a7cf1

ptp: Fix pass zero to ERR_PTR() in ptp_clock_register · aea0a897

YueHaibing authored Nov 23, 2018

Fix smatch warning:

drivers/ptp/ptp_clock.c:298 ptp_clock_register() warn:
 passing zero to 'ERR_PTR'

'err' should be set while device_create_with_groups and
pps_register_source fails

Fixes: 85a66e55 ("ptp: create "pins" together with the rest of attributes")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aea0a897

Merge branch 'switchdev-blocking-notifiers' · 06d21290

David S. Miller authored Nov 23, 2018

Petr Machata says:

====================
switchdev: Convert switchdev_port_obj_{add,del}() to notifiers

An offloading driver may need to have access to switchdev events on
ports that aren't directly under its control. An example is a VXLAN port
attached to a bridge offloaded by a driver. The driver needs to know
about VLANs configured on the VXLAN device. However the VXLAN device
isn't stashed between the bridge and a front-panel-port device (such as
is the case e.g. for LAG devices), so the usual switchdev ops don't
reach the driver.

VXLAN is likely not the only device type like this: in theory any L2
tunnel device that needs offloading will prompt requirement of this
sort.

A way to fix this is to give up the notion of port object addition /
deletion as a switchdev operation, which assumes somewhat tight coupling
between the message producer and consumer. And instead send the message
over a notifier chain.

The series starts with a clean-up patch #1, where
SWITCHDEV_OBJ_PORT_{VLAN, MDB}() are fixed up to lift the constraint
that the passed-in argument be a simple variable named "obj".

switchdev_port_obj_add and _del are invoked in a context that permits
blocking. Not only that, at least for the VLAN notification, being able
to signal failure is actually important. Therefore introduce a new
blocking notifier chain that the new events will be sent on. That's done
in patch #2. Retain the current (atomic) notifier chain for the
preexisting notifications.

In patch #3, introduce two new switchdev notifier types,
SWITCHDEV_PORT_OBJ_ADD and SWITCHDEV_PORT_OBJ_DEL. These notifier types
communicate the same event as the corresponding switchdev op, except in
a form of a notification. struct switchdev_notifier_port_obj_info was
added to carry the fields that correspond to the switchdev op arguments.
An additional field, handled, will be used to communicate back to
switchdev that the event has reached an interested party, which will be
important for the two-phase commit.

In patches #4, #5, and #7, rocker, DSA resp. ethsw are updated to
subscribe to the switchdev blocking notifier chain, and handle the new
notifier types. #6 introduces a helper to determine whether a
netdevice corresponds to a front panel port.

What these three drivers have in common is that their ports don't
support any uppers besides bridge. That makes it possible to ignore any
notifiers that don't reference a front-panel port device, because they
are certainly out of scope.

Unlike the previous three, mlxsw and ocelot drivers admit stacked
devices as uppers. While the current switchdev code recursively descends
through layers of lower devices, eventually calling the op on a
front-panel port device, the notifier would reference a stacking device
that's one of front-panel ports uppers. The filtering is thus more
complex.

For ocelot, such iteration is currently pretty much required, because
there's no bookkeeping of LAG devices. mlxsw does keep the list of LAGs,
however it iterates the lower devices anyway when deciding whether an
event on a tunnel device pertains to the driver or not.

Therefore this patch set instead introduces, in patch #8, a helper to
iterate through lowers, much like the current switchdev code does,
looking for devices that match a given predicate.

Then in patches #9 and #10, first mlxsw and then ocelot are updated to
dispatch the newly-added notifier types to the preexisting
port_obj_add/_del handlers. The dispatch is done via the new helper, to
recursively descend through lower devices.

Finally in patch #11, the actual switch is made, retiring the current
SDO-based code in favor of a notifier.

Now that the event is distributed through a notifier, the explicit
netdevice check in rocker, DSA and ethsw doesn't let through any events
except those done on a front-panel port itself. It is therefore
unnecessary to check in VLAN-handling code whether a VLAN was added to
the bridge itself: such events will simply be ignored much sooner.
Therefore remove it in patch #12.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

06d21290

rocker, dsa, ethsw: Don't filter VLAN events on bridge itself · ab4a1686

Petr Machata authored Nov 22, 2018

Due to an explicit check in rocker_world_port_obj_vlan_add(),
dsa_slave_switchdev_event() resp. port_switchdev_event(), VLAN objects
that are added to a device that is not a front-panel port device are
ignored. Therefore this check is immaterial.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ab4a1686