Commits · db4374f48a6c31c02f6ad1d19b257c186b443c0c · Kirill Smelkov / linux

16 Mar, 2015 22 commits

rhashtable: Annotate RCU locking of walkers · db4374f4

Thomas Graf authored Mar 16, 2015

Fixes the following sparse warnings:

lib/rhashtable.c:767:5: warning: context imbalance in 'rhashtable_walk_start' - wrong count at exit
lib/rhashtable.c:849:6: warning: context imbalance in 'rhashtable_walk_stop' - unexpected unlock

Fixes: f2dba9c6 ("rhashtable: Introduce rhashtable_walk_*")
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

db4374f4

rocker: replace fixed stack allocation with dynamic allocation · 04f49faf

Scott Feldman authored Mar 15, 2015

In hast to fix some sparse warning, I hard-coded a fix-sized array on the stack
which is probably too big for kernel standards.  Fix this by converting array
to dynamic allocation.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

04f49faf

Merge branch 'listener_refactor' · b35f504a

David S. Miller authored Mar 16, 2015

Eric Dumazet says:

====================
inet: tcp listener refactoring, part 10

We are getting close to the point where request sockets will be hashed
into generic hash table. Some followups are needed for netfilter and
will be handled in next patch series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b35f504a

inet: add proper refcounting to request sock · 13854e5a

Eric Dumazet authored Mar 15, 2015

reqsk_put() is the generic function that should be used
to release a refcount (and automatically call reqsk_free())

reqsk_free() might be called if refcount is known to be 0
or undefined.

refcnt is set to one in inet_csk_reqsk_queue_add()

As request socks are not yet in global ehash table,
I added temporary debugging checks in reqsk_put() and reqsk_free()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

13854e5a

inet: factorize sock_edemux()/sock_gen_put() code · 2c13270b

Eric Dumazet authored Mar 15, 2015

sock_edemux() is not used in fast path, and should
really call sock_gen_put() to save some code.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2c13270b

inet_diag: allow sk_diag_fill() to handle request socks · a58917f5

Eric Dumazet authored Mar 15, 2015

inet_diag_fill_req() is renamed to inet_req_diag_fill()
and moved up, so that it can be called fom sk_diag_fill()

inet_diag_bc_sk() is ready to handle request socks.

inet_twsk_diag_dump() is no longer needed.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a58917f5

inet: ip early demux should avoid request sockets · f7e4eb03

Eric Dumazet authored Mar 15, 2015

When a request socket is created, we do not cache ip route
dst entry, like for timewait sockets.

Let's use sk_fullsock() helper.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f7e4eb03

net: add sk_fullsock() helper · 1d0ab253

Eric Dumazet authored Mar 15, 2015

We have many places where we want to check if a socket is
not a timewait or request socket. Use a helper to avoid
hard coding this.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1d0ab253

Merge branch 'swdev_ops' · f00bbd21

David S. Miller authored Mar 16, 2015

Scott Feldman says:

====================
switchdev: add swdev ops

v3:

 - Fix missing include for DSA build

v2:

 - Per Simon's review, squash some of the dependent commits into one to
   make series git bisect safe.

v1:

Per discussions at netconf, move switchdev ndo ops to a new swdev_ops to
keep ndo namespace clean and maintain switchdev-related ops into one place.

There are no functional changes here; just shuffling ops around for better
organization.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f00bbd21

netdev: remove ndo ops for switchdev · 812a1c3f

Scott Feldman authored Mar 15, 2015

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

812a1c3f

switchdev: use new swdev ops · 98237d43

Scott Feldman authored Mar 15, 2015

Move swdev wrappers over to new swdev ops (from previous ndo ops).  No
functional changes to the implementation.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>

rocker: move to new swdev ops
Signed-off-by: Scott Feldman <sfeldma@gmail.com>

dsa: move to new swdev ops
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

98237d43

switchdev: add swdev ops · 4170604f

Scott Feldman authored Mar 15, 2015

As discussed at netconf, introduce swdev_ops as first step to move switchdev
ops from ndo to swdev.  This will keep switchdev from cluttering up ndo ops
space.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4170604f

Merge branch 'rhashtable-fixes-next' · 7993d44e

David S. Miller authored Mar 15, 2015

Herbert Xu says:

====================
rhashtable: Fix two bugs caused by multiple rehash preparation

While testing some new patches over the weekend I discovered a
couple of bugs in the series that had just been merged.  These
two patches fix them:

1) A use-after-free in the walker that can cause crashes when
walking during a rehash.

2) When a second rehash starts during a single rhashtable_remove
call the remove may fail when it shouldn't.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7993d44e

rhashtable: Fix rhashtable_remove failures · 565e8640

Herbert Xu authored Mar 15, 2015

The commit 9d901bc0 ("rhashtable:
Free bucket tables asynchronously after rehash") causes gratuitous
failures in rhashtable_remove.

The reason is that it inadvertently introduced multiple rehashing
from the perspective of readers.  IOW it is now possible to see
more than two tables during a single RCU critical section.

Fortunately the other reader rhashtable_lookup already deals with
this correctly thanks to c4db8848
("rhashtable: rhashtable: Move future_tbl into struct bucket_table")
so only rhashtable_remove is broken by this change.

This patch fixes this by looping over every table from the first
one to the last or until we find the element that we were trying
to delete.

Incidentally the simple test for detecting rehashing to prevent
starting another shrinking no longer works.  Since it isn't needed
anyway (the work queue and the mutex serves as a natural barrier
to unnecessary rehashes) I've simply killed the test.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

565e8640

rhashtable: Fix use-after-free in rhashtable_walk_stop · 963ecbd4

Herbert Xu authored Mar 15, 2015

The commit c4db8848 ("rhashtable:
Move future_tbl into struct bucket_table") introduced a use-after-
free bug in rhashtable_walk_stop because it dereferences tbl after
droping the RCU read lock.

This patch fixes it by moving the RCU read unlock down to the bottom
of rhashtable_walk_stop.  In fact this was how I had it originally
but it got dropped while rearranging patches because this one
depended on the async freeing of bucket_table.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

963ecbd4

net: bcmgenet: add support for Hardware Filter Block · 0034de41

Petri Gynther authored Mar 13, 2015

Add support for Hardware Filter Block (HFB) so that incoming Rx traffic
can be matched and directed to desired Rx queues.
Signed-off-by: Petri Gynther <pgynther@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0034de41

Merge branch 'ebpf_skb_fields' · 70006af9

David S. Miller authored Mar 15, 2015

Alexei Starovoitov says:

====================
bpf: allow eBPF access skb fields

V1->V2:
- refactored field access converter into common helper convert_skb_access()
  used in both classic and extended BPF
- added missing build_bug_on for field 'len'
- added comment to uapi/linux/bpf.h as suggested by Daniel
- dropped exposing 'ifindex' field for now

classic BPF has a way to access skb fields, whereas extended BPF didn't.
This patch introduces this ability.

Classic BPF can access fields via negative SKF_AD_OFF offset.
Positive bpf_ld_abs N is treated as load from packet, whereas
bpf_ld_abs -0x1000 + N is treated as skb fields access.
Many offsets were hard coded over years: SKF_AD_PROTOCOL, SKF_AD_PKTTYPE, etc.
The problem with this approach was that for every new field classic bpf
assembler had to be tweaked.

I've considered doing the same for extended, but for every new field LLVM
compiler would have to be modifed. Since it would need to add a new intrinsic.
It could be done with single intrinsic and magic offset or use of inline
assembler, but neither are clean from compiler backend point of view, since
they look like calls but shouldn't scratch caller-saved registers.

Another approach was to introduce a new helper functions like bpf_get_pkt_type()
for every field that we want to access, but that is equally ugly for kernel
and slow, since helpers are calls and they are slower then just loads.
In theory helper calls can be 'inlined' inside kernel into direct loads, but
since they were calls for user space, compiler would have to spill registers
around such calls anyway. Teaching compiler to treat such helpers differently
is even uglier.

They were few other ideas considered. At the end the best seems to be to
introduce a user accessible mirror of in-kernel sk_buff structure:

struct __sk_buff {
    __u32 len;
    __u32 pkt_type;
    __u32 mark;
    __u32 queue_mapping;
};

bpf programs will do:

int bpf_prog1(struct __sk_buff *skb)
{
    __u32 var = skb->pkt_type;

which will be compiled to bpf assembler as:

dst_reg = *(u32 *)(src_reg + 4) // 4 == offsetof(struct __sk_buff, pkt_type)

bpf verifier will check validity of access and will convert it to:

dst_reg = *(u8 *)(src_reg + offsetof(struct sk_buff, __pkt_type_offset))
dst_reg &= 7

since 'pkt_type' is a bitfield.

No new instructions added. LLVM doesn't need to be modified.
JITs don't change and verifier already knows when it accesses 'ctx' pointer.
The only thing needed was to convert user visible offset within __sk_buff
to kernel internal offset within sk_buff.
For 'len' and other fields conversion is trivial.
Converting 'pkt_type' takes 2 or 3 instructions depending on endianness.
More fields can be exposed by adding to the end of the 'struct __sk_buff'.
Like vlan_tci and others can be added later.

When pkt_type field is moved around, goes into different structure, removed or
its size changes, the function convert_skb_access() would need to updated and
it will cover both classic and extended.

Patch 2 updates examples to demonstrates how fields are accessed and
adds new tests for verifier, since it needs to detect a corner case when
attacker is using single bpf instruction in two branches with different
register types.

The 4 fields of __sk_buff are already exposed to user space via classic bpf and
I believe they're useful in extended as well.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

70006af9

samples: bpf: add skb->field examples and tests · 614cd3bd

Alexei Starovoitov authored Mar 13, 2015

- modify sockex1 example to count number of bytes in outgoing packets
- modify sockex2 example to count number of bytes and packets per flow
- add 4 stress tests that exercise 'skb->field' code path of verifier
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

614cd3bd

bpf: allow extended BPF programs access skb fields · 9bac3d6d

Alexei Starovoitov authored Mar 13, 2015

introduce user accessible mirror of in-kernel 'struct sk_buff':
struct __sk_buff {
    __u32 len;
    __u32 pkt_type;
    __u32 mark;
    __u32 queue_mapping;
};

bpf programs can do:

int bpf_prog(struct __sk_buff *skb)
{
    __u32 var = skb->pkt_type;

which will be compiled to bpf assembler as:

dst_reg = *(u32 *)(src_reg + 4) // 4 == offsetof(struct __sk_buff, pkt_type)

bpf verifier will check validity of access and will convert it to:

dst_reg = *(u8 *)(src_reg + offsetof(struct sk_buff, __pkt_type_offset))
dst_reg &= 7

since skb->pkt_type is a bitfield.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9bac3d6d

Merge branch 'ebpf_helpers' · a498cfe9

David S. Miller authored Mar 15, 2015

Daniel Borkmann says:

====================
eBPF updates

Two small eBPF helper additions to better match up with ancillary
classic BPF functionality.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

a498cfe9

ebpf: add helper for obtaining current processor id · c04167ce

Daniel Borkmann authored Mar 14, 2015

This patch adds the possibility to obtain raw_smp_processor_id() in
eBPF. Currently, this is only possible in classic BPF where commit
da2033c2 ("filter: add SKF_AD_RXHASH and SKF_AD_CPU") has added
facilities for this.

Perhaps most importantly, this would also allow us to track per CPU
statistics with eBPF maps, or to implement a poor-man's per CPU data
structure through eBPF maps.

Example function proto-type looks like:

  u32 (*smp_processor_id)(void) = (void *)BPF_FUNC_get_smp_processor_id;
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

c04167ce

ebpf: add prandom helper for packet sampling · 03e69b50

Daniel Borkmann authored Mar 14, 2015

This work is similar to commit 4cd3675e ("filter: added BPF
random opcode") and adds a possibility for packet sampling in eBPF.

Currently, this is only possible in classic BPF and useful to
combine sampling with f.e. packet sockets, possible also with tc.

Example function proto-type looks like:

  u32 (*prandom_u32)(void) = (void *)BPF_FUNC_get_prandom_u32;
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

03e69b50

15 Mar, 2015 11 commits

Merge branch 'gianfar-next' · 12028820

David S. Miller authored Mar 15, 2015

Claudiu Manoil says:

====================
gianfar: ARM port driver updates (2/2)

The 2nd round of driver updates to make gianfar portable on ARM,
for the ARM based SoC that integrates eTSEC - "ls1021a".
The patches address the bulk of remaining endianess issues -
handling DMA fields (BD and FCB), and device tree properties.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

12028820

gianfar: Consider dts property endianess on handling · 55917641

Jingchang Lu authored Mar 13, 2015

Use of_property_read*() to get arch endian consistent
property values. Do some refactoring in the process.
Signed-off-by: Jingchang Lu <jingchang.lu@freescale.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

55917641

gianfar: Make FCB access endian safe · 26eb9374

Claudiu Manoil authored Mar 13, 2015

Use conversion macros to correctly access the BE
fields of the Rx and Tx Frame Control Block on LE CPUs.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

26eb9374

gianfar: Make BDs access endian safe · a7312d58

Claudiu Manoil authored Mar 13, 2015

Use conversion macros to correctly access the BE
fields of the Rx and Tx Buffer Descriptors on LE CPUs.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7312d58

Merge branch 'rhashtable-next' · 5a2f78dd

David S. Miller authored Mar 15, 2015

Herbert Xu says:

====================
rhashtable: Fixes + cleanups + preparation for multiple rehash

Patch 1 fixes the walker so that it behaves properly even during
a resize.

Patch 2-3 are cleanups.

Patch 4-6 lays some ground work for the upcoming multiple rehashing.

This revision fixes the warning coming from the bucket_table->size
downsize and improves its changelog.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

5a2f78dd

rhashtable: Move future_tbl into struct bucket_table · c4db8848