Commits · 34666d467cbf1e2e3c7bb15a63eccfb582cdd71f · nexedi / linux

26 Sep, 2014 3 commits

netfilter: bridge: move br_netfilter out of the core · 34666d46

Pablo Neira Ayuso authored Sep 18, 2014

Jesper reported that br_netfilter always registers the hooks since
this is part of the bridge core. This harms performance for people that
don't need this.

This patch modularizes br_netfilter so it can be rmmod'ed, thus,
the hooks can be unregistered. I think the bridge netfilter should have
been a separate module from the beginning; Patrick agreed on that.

Note that this is breaking compatibility for users that expect that
bridge netfilter is going to be available after explicitly 'modprobe
bridge' or via automatic load through brctl.

However, the damage can be easily undone by modprobing br_netfilter.
The bridge core also prints a message to provide a clue to people that
didn't notice that this has been deprecated.

On top of that, the plan is that nftables will not rely on this software
layer, but integrate the connection tracking into the bridge layer to
enable stateful filtering and NAT, which is what bridge netfilter users
seem to require.

This patch still keeps the fake_dst_ops in the bridge core, since this
is required when the bridge port is initialized. So we can safely
modprobe/rmmod br_netfilter anytime.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Florian Westphal <fw@strlen.de>


netfilter: bridge: nf_bridge_copy_header as static inline in header · 7276ca3f

Pablo Neira Ayuso authored Sep 22, 2014

Move nf_bridge_copy_header() as static inline in netfilter_bridge.h
header file. This patch prepares the modularization of the br_netfilter
code.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


net/netfilter/x_tables.c: use __seq_open_private() · 772476df

Rob Jones authored Sep 19, 2014

Reduce boilerplate code by using __seq_open_private() instead of seq_open()
in xt_match_open() and xt_target_open().
Signed-off-by: Rob Jones <rob.jones@codethink.co.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


19 Sep, 2014 2 commits

netfilter: nf_tables: export rule-set generation ID · 84d7fce6

Pablo Neira Ayuso authored Sep 04, 2014

This patch exposes the ruleset generation ID in three ways:

1) The new command NFT_MSG_GETGEN exposes the 32-bit ruleset
   generation ID. This ID is incremented on every commit and it
   should be large enough to avoid wraparound problems.

2) The least significant 16 bits of the generation ID are exposed through
   the nfgenmsg->res_id header field. This allows us to quickly catch
   if the ruleset has changed between two consecutive list dumps from
   different object lists (in this specific case I think wraparound
   is unlikely).

3) Userspace subscribers may receive notifications of new rule-set
   generation after every commit. This also provides an alternative
   way to monitor the generation ID. If the events are lost, the
   userspace process hits an overrun error, so it knows that it is
   working with a stale ruleset anyway.

Patrick spotted that rule-set transformations in userspace may take
quite some time. In that case, userspace records the 32-bit generation ID
before fetching the rule-set, then:

1) it compares it to what we obtain after the transformation to
   make sure it is not working with a stale rule-set and no wraparound
   has occurred.

2) it subscribes to ruleset notifications, so it can watch for new
   generation ID.

This is complementary to the NLM_F_DUMP_INTR approach, which allows
us to detect interference in the middle of a single list dump.
There is no way to explicitly check that an interference has occurred
between two list dumps from the kernel, since it doesn't know how
many lists the userspace client is actually going to dump.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
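
The stale-ruleset check described above can be sketched in userspace terms. This is a minimal illustration, not the kernel or libnftnl code: the function names are hypothetical, the generation IDs are passed in as plain integers rather than fetched via NFT_MSG_GETGEN, and byte order of the res_id field is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the low 16 bits of the generation ID, i.e. the part
 * the commit message says is carried in nfgenmsg->res_id. */
static uint16_t gen_id_low16(uint32_t gen_id)
{
	return (uint16_t)(gen_id & 0xffffu);
}

/* Quick check between two consecutive list dumps: differing res_id values
 * mean a commit happened in between. Equal values do not prove the opposite,
 * because only 16 bits are compared. */
static bool dumps_interleaved(uint16_t res_id_a, uint16_t res_id_b)
{
	return res_id_a != res_id_b;
}

/* Full check around a slow userspace transformation: compare the 32-bit
 * generation ID recorded before fetching with the one read afterwards. */
static bool ruleset_is_stale(uint32_t gen_before, uint32_t gen_after)
{
	return gen_before != gen_after;
}
```

For instance, `ruleset_is_stale(7, 8)` is true, while `dumps_interleaved(gen_id_low16(0x00010002), gen_id_low16(0x00020002))` is false even though 65536 commits separate the two IDs, which is why the 32-bit comparison is the one to rely on for long-running transformations.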


netfilter: nfnetlink: use original skbuff when committing/aborting · fc04733a

Pablo Neira Ayuso authored Sep 11, 2014

This allows us to access the original content of the batch from
the commit and the abort paths.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


18 Sep, 2014 1 commit

Merge branch 'ipvs-next' · fcfa8f49

Pablo Neira Ayuso authored Sep 18, 2014

Simon Horman says:

====================
This pull request makes the following changes:

* Add simple weighted fail-over scheduler.
  - Unlike other IPVS schedulers this offers fail-over rather than load
    balancing. Connections are directed to the appropriate server based
    solely on highest weight value and server availability.
  - Thanks to Kenny Mathis

* Support IPv6 real servers in IPv4 virtual-services and vice versa
  - This feature is supported in conjunction with the tunnel (IPIP)
    forwarding mechanism. That is, IPv4 may be forwarded in IPv6 and
    vice versa.
  - The motivation for this is to allow more flexibility in the
    choice of IP version offered by both virtual-servers and
    real-servers as they no longer need to match: An IPv4 connection from an
    end-user may be forwarded to a real-server using IPv6 and vice versa.
  - Further work needs to be done to support this feature in conjunction
    with connection synchronisation. For now such configurations are
    not allowed.
  - This change includes an update to the netlink protocol, adding a new
    destination address family attribute, and the necessary changes
    to plumb this information throughout IPVS.
  - Thanks to Alex Gartrell and Julian Anastasov
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


17 Sep, 2014 3 commits

ipvs: Allow heterogeneous pools now that we support them · bc18d37f

Alex Gartrell authored Sep 09, 2014

Remove the temporary consistency check and add a case statement to only
allow ipip mixed dests.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>


ipvs: use the new dest addr family field · f18ae720

Julian Anastasov authored Sep 09, 2014

Use the new address family field cp->daf when printing
cp->daddr in logs or connection listing.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>


ipvs: use correct address family in scheduler logs · 4d316f3f

Julian Anastasov authored Sep 17, 2014

Needed to support svc->af != dest->af.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>


16 Sep, 2014 12 commits

ipvs: address family of LBLCR entry depends on svc family · cf34e646

Julian Anastasov authored Sep 09, 2014

The LBLCR entries should use svc->af, not dest->af.
Needed to support svc->af != dest->af.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>


ipvs: address family of LBLC entry depends on svc family · f7fa3800

Julian Anastasov authored Sep 09, 2014

The LBLC entries should use svc->af, not dest->af.
Needed to support svc->af != dest->af.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>


ipvs: support ipv4 in ipv6 and ipv6 in ipv4 tunnel forwarding · 8052ba29

Alex Gartrell authored Sep 09, 2014

Pull the common logic for preparing an skb to prepend the header into a
single function and then set fields such that they can be used in either
case (generalize tos and tclass to dscp, hop_limit and ttl to ttl, etc)
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>


ipvs: Add generic ensure_mtu_is_adequate to handle mixed pools · c63e4de2

Alex Gartrell authored Sep 09, 2014

The out_rt functions check to see if the mtu is large enough for the packet
and, if not, send icmp messages (TOOBIG or DEST_UNREACH) to the source and
bail out.  We needed the ability to send ICMP from the out_rt_v6 function
and DEST_UNREACH from the out_rt function, so we just pulled it out into a
common function.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
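
A rough sketch of the decision such a shared MTU check has to make, assuming the ICMP variant is chosen by the address family of the incoming packet rather than by the family of the outgoing route. The names are hypothetical, and the rule that an IPv4 packet without the DF bit is let through for fragmentation is an assumption made for this example, not something stated in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

/* Hypothetical verdicts for the sketch. */
enum mtu_verdict {
	MTU_OK,				/* packet may be forwarded */
	MTU_SEND_ICMPV6_TOOBIG,		/* answer an IPv6 sender and bail out */
	MTU_SEND_ICMP_DEST_UNREACH,	/* answer an IPv4 sender and bail out */
};

/* skb_af is the family of the packet being forwarded, which may differ from
 * the family of the tunnel route in a mixed pool. */
static enum mtu_verdict check_mtu(int skb_af, size_t pkt_len, size_t mtu,
				  bool dont_fragment)
{
	assert(skb_af == AF_INET || skb_af == AF_INET6);

	if (pkt_len <= mtu)
		return MTU_OK;
	if (skb_af == AF_INET6)
		return MTU_SEND_ICMPV6_TOOBIG;
	/* Assumption: IPv4 without DF can still be fragmented downstream. */
	return dont_fragment ? MTU_SEND_ICMP_DEST_UNREACH : MTU_OK;
}
```

Keeping the family of the packet as an explicit parameter is what lets a single function serve both out_rt paths.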


ipvs: Pull out update_pmtu code · 919aa0b2

Alex Gartrell authored Sep 09, 2014

Another step toward heterogeneous pools, this removes another piece of
functionality currently specific to each address family type.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>


ipvs: Pull out crosses_local_route_boundary logic · 4a4739d5

Alex Gartrell authored Sep 09, 2014

This logic is repeated in both out_rt functions so it was redundant.
Additionally, we'll need to be able to do checks to route v4 to v6 and vice
versa in order to deal with heterogeneous pools.

This patch also updates the callsites to add an additional parameter to the
out route functions.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>


ipvs: prevent mixing heterogeneous pools and synchronization · 391f503d

Alex Gartrell authored Sep 09, 2014

The synchronization protocol is not compatible with heterogeneous pools, so
we need to verify that we're not turning both on at the same time.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>


ipvs: Supply destination address family to ip_vs_conn_new · ba38528a

Alex Gartrell authored Sep 09, 2014

The assumption that dest af is equal to service af is now unreliable, so we
must specify it manually so as not to copy just the first 4 bytes of a v6
address or do an illegal read of 16 bytes on a v4 address.

We "lie" in two places: for synchronization (which we will explicitly
disallow from happening when we have heterogeneous pools) and for black
hole addresses where there's no real dest.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
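
To illustrate why the copy length has to follow the destination's own address family, here is a small userspace sketch. The union and function names are hypothetical stand-ins, not the IPVS definitions; the only point shown is that the number of bytes copied is derived from the family passed in by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

/* Stand-in for an address that can hold either family. */
union sketch_addr {
	unsigned char v4[4];
	unsigned char v6[16];
};

/* Number of meaningful bytes for a given family. */
static size_t addr_len_for_family(int af)
{
	return af == AF_INET6 ? sizeof(((union sketch_addr *)0)->v6)
			      : sizeof(((union sketch_addr *)0)->v4);
}

/* Copy exactly as many bytes as the given family occupies and zero the
 * rest, so a v4 copy never carries stale trailing bytes. */
static void sketch_addr_copy(int af, union sketch_addr *dst,
			     const union sketch_addr *src)
{
	assert(af == AF_INET || af == AF_INET6);
	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, addr_len_for_family(af));
}
```

Passing the service family for an IPv6 destination would truncate the address to 4 bytes, which is the failure mode the commit message describes.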


ipvs: Pass destination address family to ip_vs_trash_get_dest · ad147aa4

Alex Gartrell authored Sep 09, 2014

Part of a series of diffs to tease out destination family from virtual
family.  This diff just adds a parameter to ip_vs_trash_get_dest and then uses
it for comparison rather than svc->af.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>


ipvs: Supply destination addr family to ip_vs_{lookup_dest,find_dest} · 655eef10

Alex Gartrell authored Sep 09, 2014

We need to remove the assumption that virtual address family is the same as
real address family in order to support heterogeneous services (that is,
services with v4 vips and v6 backends or the opposite).
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>


ipvs: Add destination address family to netlink interface · 6cff339b

Alex Gartrell authored Sep 09, 2014

This is necessary to support heterogeneous pools.  For example, if you have
an ipv6 addressed network, you'll want to be able to forward ipv4 traffic
into it.

This patch enforces that destination address family is the same as service
family, as none of the forwarding mechanisms support anything else.

For the old setsockopt mechanism, we simply set the dest address family to
AF_INET as we do with the service.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>


ipvs: Add simple weighted failover scheduler · 616a9be2

Kenny Mathis authored Sep 09, 2014

Add simple weighted IPVS failover support to the Linux kernel. All
other scheduling modules implement some form of load balancing, while
this offers a simple failover solution. Connections are directed to
the appropriate server based solely on highest weight value and server
availability. Tested functionality with keepalived.
Signed-off-by: Kenny Mathis <kmathis@chokepoint.net>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
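
The selection rule described above (highest weight among available servers) can be sketched as follows. This is an illustration only: the struct and function names are hypothetical, availability is reduced to a single flag, and skipping servers with a weight of zero or less is an assumption of this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a real server. */
struct sketch_dest {
	int weight;
	bool available;
};

/* Return the available destination with the highest weight, or NULL when
 * none qualifies. No load is balanced: the same server keeps winning until
 * it becomes unavailable. */
static const struct sketch_dest *fo_pick(const struct sketch_dest *dests,
					 size_t n)
{
	const struct sketch_dest *best = NULL;

	assert(dests != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* Assumption: weight <= 0 means "do not schedule". */
		if (!dests[i].available || dests[i].weight <= 0)
			continue;
		if (best == NULL || dests[i].weight > best->weight)
			best = &dests[i];
	}
	return best;
}
```

With destinations weighted 10 (down), 5 (up) and 7 (up), the sketch picks the one weighted 7, and it would switch to the one weighted 10 as soon as that server is marked available again.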


15 Sep, 2014 8 commits

netfilter: ipset: hash:mac type added to ipset · 07034aea

Jozsef Kadlecsik authored Sep 15, 2014

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>


netfilter: ipset: send nonzero skbinfo extensions only · aef96193

Jozsef Kadlecsik authored Sep 15, 2014

Do not send zero valued skbinfo extensions to userspace at listing.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>


netfilter: ipset: Add skbinfo extension support to SET target. · 76cea410

Anton Danilov authored Sep 02, 2014

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>


netfilter: ipset: Add skbinfo extension kernel support for the list set type. · cbee93d7

Anton Danilov authored Aug 28, 2014

Add skbinfo extension kernel support for the list set type.
Introduce the new revision of the list set type.
Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>


netfilter: ipset: Add skbinfo extension kernel support for the hash set types. · af331419

Anton Danilov authored Aug 28, 2014

Add skbinfo extension kernel support for the hash set types.
Introduce the new revisions of all hash set types.
Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>


netfilter: ipset: Add skbinfo extension kernel support for the bitmap set types. · 39d1ecf1

Anton Danilov authored Aug 28, 2014

Add skbinfo extension kernel support for the bitmap set types.
Introduce the new revisions of bitmap_ip, bitmap_ipmac and bitmap_port set types.
Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>


netfilter: ipset: Add skbinfo extension kernel support in the ipset core. · 0e9871e3

Anton Danilov authored Aug 28, 2014

The skbinfo extension provides a mapping of metainformation to entries looked up in the ipset tables.
This patch defines the flags, the constants, the functions and the structures
for the data type independent support of the extension.
Note that the firewall mark is stored in the kernel structures as two 32-bit values,
but transferred through netlink as one 64-bit value.
Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>


netfilter: ipset: Fix static checker warning in ip_set_core.c · 73e64e18

Jozsef Kadlecsik authored Sep 15, 2014

Dan Carpenter reported the following static checker warning:

        net/netfilter/ipset/ip_set_core.c:1414 call_ad()
        error: 'nlh->nlmsg_len' from user is not capped properly

The payload size is limited now by the max size of size_t.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>


12 Sep, 2014 2 commits

netfilter: masquerading needs to be independent of x_tables in Kconfig · 0bbe80e5

Pablo Neira Ayuso authored Sep 11, 2014

Users are starting to test nf_tables with no x_tables support. Therefore,
masquerading needs to be independent of it in Kconfig.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


netfilter: NFT_CHAIN_NAT_IPV* is independent of NFT_NAT · 3e8dc212

Pablo Neira Ayuso authored Sep 11, 2014

Now that we have masquerading support in nf_tables, the NAT chain can
be used with it, not only for SNAT/DNAT. So make this chain type
independent of it.

While at it, move it inside the scope of 'if NF_NAT_IPV*' to simplify
dependencies.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


11 Sep, 2014 2 commits

netfilter: nf_tables: add NFTA_MASQ_UNSPEC to nft_masq_attributes · 39e393bb

Pablo Neira Ayuso authored Sep 11, 2014

To keep this consistent with other nft_*_attributes.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


netfilter: fix compilation of masquerading without IP_NF_TARGET_MASQUERADE · 67981fef

Pablo Neira Ayuso authored Sep 11, 2014

 CONFIG_NF_NAT_MASQUERADE_IPV6=m
 # CONFIG_IP6_NF_TARGET_MASQUERADE is not set

results in:

net/ipv6/netfilter/nf_nat_masquerade_ipv6.c: In function ‘nf_nat_masquerade_ipv6’:
net/ipv6/netfilter/nf_nat_masquerade_ipv6.c:41:14: error: ‘struct nf_conn_nat’ has no member named ‘masq_index’
  nfct_nat(ct)->masq_index = out->ifindex;
              ^
net/ipv6/netfilter/nf_nat_masquerade_ipv6.c: In function ‘device_cmp’:
net/ipv6/netfilter/nf_nat_masquerade_ipv6.c:61:12: error: ‘const struct nf_conn_nat’ has no member named ‘masq_index’
  return nat->masq_index == (int)(long)ifindex;
            ^
net/ipv6/netfilter/nf_nat_masquerade_ipv6.c:62:1: warning: control
reaches end of non-void function [-Wreturn-type]
 }
 ^
make[3]: *** [net/ipv6/netfilter/nf_nat_masquerade_ipv6.o] Error 1

Fix this by using the new NF_NAT_MASQUERADE_IPV4 and _IPV6 symbols
in include/net/netfilter/nf_nat.h.
Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


10 Sep, 2014 7 commits

net: bpf: only build bpf_jit_binary_{alloc, free}() when jit selected · b954d834

Daniel Borkmann authored Sep 10, 2014

Since BPF JIT depends on the availability of module_alloc() and
module_free() helpers (HAVE_BPF_JIT and MODULES), we better build
that code only in case we have BPF_JIT in our config enabled, just
like with other JIT code. Fixes builds for arm/marzen_defconfig
and sh/rsk7269_defconfig.

====================
kernel/built-in.o: In function `bpf_jit_binary_alloc':
/home/cwang/linux/kernel/bpf/core.c:144: undefined reference to `module_alloc'
kernel/built-in.o: In function `bpf_jit_binary_free':
/home/cwang/linux/kernel/bpf/core.c:164: undefined reference to `module_free'
make: *** [vmlinux] Error 1
====================
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Fixes: 738cbe72 ("net: bpf: consolidate JIT binary allocator")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


Merge branch 'cxgb4-next' · 17fa1f98

David S. Miller authored Sep 10, 2014

Hariprasad Shenai says:

====================
cxgb4: Allow FW size up to 1MB, support for S25FL032P flash and misc. fixes

This patch series allows FW sizes up to 1MB and adds support for the S25FL032P
flash. It fixes t4_flash_erase_sectors to throw an error when the requested
erase sectors aren't in the flash, and adds a warning message when adapters
have flashes smaller than 2Mb. It also adds the device ID of a new adapter and
removes the device ID of the debug adapter.

This patch series is created against the 'net-next' tree and includes patches
for the cxgb4 and cxgb4vf drivers.

We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>


cxgb4/cxgb4vf: Add device ID for new adapter and remove for dbg adapter · 56e03e51

Hariprasad Shenai authored Sep 10, 2014

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


cxgb4: Add warning msg when attaching to adapters which have FLASHes smaller than 2Mb · c290607e

Hariprasad Shenai authored Sep 10, 2014

Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


cxgb4: Fix t4_flash_erase_sectors() to throw an error when requested to erase... · c0d5b8cf

Hariprasad Shenai authored Sep 10, 2014

cxgb4: Fix t4_flash_erase_sectors() to throw an error when requested to erase sectors which aren't in the FLASH

Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
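
A minimal sketch of the kind of range validation the title describes, assuming an inclusive sector range and a known sector count. The function name and parameters are hypothetical stand-ins, not the driver's signature, and the erase itself is left out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for an erase routine: reject any inclusive range
 * [start, end] that is reversed or reaches past the last sector of a flash
 * part with nsectors sectors. */
static int sketch_flash_erase_sectors(unsigned int nsectors,
				      unsigned int start, unsigned int end)
{
	if (start > end || end >= nsectors)
		return -EINVAL;

	/* A real driver would issue the per-sector erase commands here. */
	return 0;
}
```

Returning an error up front, instead of silently erasing whatever part of the range exists, lets the caller notice that the firmware image does not fit the flash.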


cxgb4: Add support to S25FL032P flash · fe2ee139

Hariprasad Shenai authored Sep 10, 2014

Add support for Spansion S25FL032P flash
Based on original work by Dimitris Michailidis
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


cxgb4: Allow T4/T5 firmware sizes up to 1MB · 60d42bf6

Hariprasad Shenai authored Sep 10, 2014

Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
