- 29 Apr, 2013 16 commits
-
-
Florian Westphal authored
Once we allow userspace to receive gso/gro packets, userspace needs to be able to determine when checksums appear to be broken, but are not. NFQA_SKB_CSUMNOTREADY means 'checksums will be fixed in kernel later, pretend they are ok'. NFQA_SKB_GSO could be used for statistics, or to determine when packet size exceeds mtu. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
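A minimal userspace sketch of how these flags might be interpreted once the NFQA_SKB_INFO attribute has been parsed out of the queue message; only the NFQA_SKB_* flag names come from the kernel uapi header, while the helper function and its payload_len/mtu parameters are illustrative.

#include <stdint.h>
#include <stdio.h>
#include <linux/netfilter/nfnetlink_queue.h>	/* NFQA_SKB_CSUMNOTREADY, NFQA_SKB_GSO */

/* skbinfo is assumed to hold the u32 payload of the NFQA_SKB_INFO attribute */
static void inspect_skb_flags(uint32_t skbinfo, unsigned int payload_len,
			      unsigned int mtu)
{
	if (skbinfo & NFQA_SKB_CSUMNOTREADY)
		printf("checksums will be fixed in kernel later, treat as ok\n");

	if ((skbinfo & NFQA_SKB_GSO) && payload_len > mtu)
		printf("GSO packet: payload %u exceeds mtu %u\n", payload_len, mtu);
}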
-
Florian Westphal authored
skb_gso_segment is expensive, so it would be nice if we could avoid it in the future. However, userspace needs to be prepared to receive larger-than-mtu packets (which will also have incorrect l3/l4 checksums), so we cannot simply remove it. The plan is to add a per-queue feature flag that userspace can set when binding the queue. The problem is that in nf_queue we only have a queue number, not the queue context/configuration settings. This patch should have no impact other than the skb_gso_segment call now being in a function that has access to the queue config data. A new size attribute in nf_queue_entry is needed so nfnetlink_queue can duplicate the entry of the gso skb when segmenting the skb while also copying the route key. The follow-up patch adds a switch to disable skb_gso_segment when the queue config says so. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
Required by a future patch that will need to duplicate the nf_queue_entry, bumping the refcounts of the copy. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
The new revision of the set match supports matching on the counters and suppressing counter updates at match time as well. For the set:list types, updating of the subcounters can be suppressed too. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Introduce extensions to elements in the core and prepare timeout as the first one. This patch also modifies the em_ipset classifier to use the new extension struct layout. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 25 Apr, 2013 24 commits
-
-
Vlad Yasevich authored
Commit 6681712d ("vxlan: generalize forwarding tables") relaxed the address checks in rtnl_fdb_add() to use is_zero_ether_addr(). This allows users to add multicast addresses using the fdb API. However, the check in rtnl_fdb_del() still uses the stricter is_valid_ether_addr(), which rejects multicast addresses. Thus it is possible to add an fdb that cannot later be removed. Relax the check in rtnl_fdb_del() as well. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
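A rough sketch of what the relaxed check amounts to (not the exact upstream hunk): only the all-zero address is rejected, so multicast addresses, which is_valid_ether_addr() would refuse, can reach the deletion path. The helper name is illustrative.

#include <linux/etherdevice.h>

/* illustrative validation helper mirroring the relaxed check */
static bool fdb_addr_deletable(const unsigned char *addr)
{
	/* is_valid_ether_addr() would also reject multicast addresses,
	 * which is too strict here: only the all-zero address is invalid */
	return !is_zero_ether_addr(addr);
}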
-
Sebastian Siewior authored
During high throughput it is likely that we receive both an RX and a TX interrupt. The normal behaviour is that once we enter the ISR, the interrupts are disabled in the IRQ chip, so the ISR is invoked only once and the interrupt line is disabled once. It will be re-enabled after napi completes. With threaded interrupts, on the other hand, the interrupt is disabled immediately and the ISR is marked for "later". With both TX and RX interrupts marked pending we invoke them both and disable the interrupt line twice. The napi callback is still executed only once, so after it completes we remain with interrupts disabled. The initial patch simply removed the cpsw_{enable|disable}_irq() calls and it worked well on my AM335X ES1.0 (beagle bone). On ES2.0 (beagle bone black) it caused a never-ending interrupt (even after the mask via cpsw_intr_disable()) according to Mugunthan V N. Since I don't have the ES2.0 and have no idea what is going on, this patch tracks the state of the irq_disable() call and executes it only when not yet done. The book keeping is done on the first struct, since with dual_emac we can have two of those and only one interrupt line. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
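A minimal sketch of the book-keeping described above; the field names (irq_enabled, irqs_table, num_irqs) are illustrative rather than necessarily those used in cpsw.c.

/* disable the interrupt line only if we have not already done so */
static void cpsw_disable_irq_once(struct cpsw_priv *priv)
{
	int i;

	if (!priv->irq_enabled)		/* illustrative flag on the first priv struct */
		return;

	for (i = 0; i < priv->num_irqs; i++)
		disable_irq_nosync(priv->irqs_table[i]);
	priv->irq_enabled = false;
}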
-
Sebastian Siewior authored
   text   data  bss    dec    hex  filename
  15530     92    4  15626   3d0a  cpsw.o.before
  15478     92    4  15574   3cd6  cpsw.o.after

52 bytes smaller, 13 for each invocation. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sebastian Siewior authored
This driver does not clean up properly after leaving. Here is a list: - Use unregister_netdev(). free_netdev() is a good start but not enough. - Use the above also on the other ndev in case of dual mac. - Free data.slave_data. The name of the structure makes it look like it is platform_data but it is not. It is just a trick! - Free all irqs. Again: freeing one irq is a good start, but freeing all of them is better. With this, rmmod & modprobe of cpsw seem to work. The remaining issue is: |WARNING: at fs/sysfs/dir.c:536 sysfs_add_one+0x9c/0xd4() |sysfs: cannot create duplicate filename '/devices/ocp.2/4a100000.ethernet/4a101000.mdio' |WARNING: at lib/kobject.c:196 kobject_add_internal+0x1a4/0x1c8() coming from of_platform_populate(), and I am not sure that this belongs here. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sebastian Siewior authored
If compiled as modules each one of these modules is missing something. With this patch the modules are loaded on demand and don't taint the kernel due to license issues. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
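Roughly the kind of module metadata this refers to (the description and alias strings are illustrative): a proper MODULE_LICENSE avoids the license taint, and a MODULE_ALIAS lets udev/modprobe pull the module in on demand when the matching platform device appears.

#include <linux/module.h>

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("TI CPSW Ethernet driver");	/* illustrative */
MODULE_ALIAS("platform:cpsw");			/* illustrative: enables on-demand loading */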
-
Sebastian Siewior authored
In case we run into OOM during the allocation of the new rx skb, we don't get one and we have one skb less than we used to have. If this continues to happen, we end up with no rx skbs at all. This patch changes the following: - if we fail to allocate the new skb, we treat the currently completed skb as the new one and thus drop the currently received data. - instead of testing multiple times whether the device is gone, we rely on the status field, which is set to -ENOSYS in case the channel is going down and incomplete requests are purged. cpdma_chan_stop() removes most of the packets with -ENOSYS. The currently active packet which is removed has the "tear down" bit set. So if that bit is set, we send ENOSYS as well; otherwise we pass the status bits which are required to figure out which of the two possibilities just finished. Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sebastian Siewior authored
The gfp_mask argument is not used in cpdma_chan_submit() and is always set to GFP_KERNEL, even in atomic sections. This patch drops it since it is unused. Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sebastian Siewior authored
netif_running() reports false before the ->ndo_stop() callback is called. That means if one executes "ifconfig down" and the system receives an interrupt before the interrupt source has been disabled, we hang forever for two reasons: - we never disable the interrupt source because the devices claim to be already inactive and don't feel responsible. - since the ISR always reports IRQ_HANDLED, the line is never deactivated because it looks like the ISR feels responsible. This patch changes the logic in the ISR a little: - If none of the status registers reports an active source (RX or TX; misc is ignored because it is not activated), we leave with IRQ_NONE. - the interrupt is deactivated. - The first active network device is taken and napi is scheduled. If none is active (a small race window between ndo_down() and the interrupt), then we leave and should not come back because the source is off. There is no need to schedule the second NAPI because both share the same dma queue. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
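A condensed sketch of the ISR flow being described; cpsw_rx_pending() and cpsw_tx_pending() stand in for reading the RX/TX status registers and are hypothetical names, as is the napi field layout.

static irqreturn_t cpsw_interrupt(int irq, void *dev_id)
{
	struct cpsw_priv *priv = dev_id;

	/* neither RX nor TX status indicates an active source:
	 * not ours (or already shut down), so leave with IRQ_NONE */
	if (!cpsw_rx_pending(priv) && !cpsw_tx_pending(priv))
		return IRQ_NONE;

	cpsw_intr_disable(priv);	/* deactivate the interrupt */

	/* schedule napi on the first active device; a second napi run is
	 * not needed because both devices share the same dma queue */
	if (netif_running(priv->ndev))
		napi_schedule(&priv->napi);

	return IRQ_HANDLED;
}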
-
Sebastian Siewior authored
if during "ifconfig up" we run out of mem we continue regardless how many skbs we got. In worst case we have zero RX skbs and can't ever receive further packets since the RX skbs are never reallocated. If cpdma_chan_submit() fails we even leak the skb. This patch changes the behavior here: If we fail to allocate an skb during bring up we don't continue and report that error. Same goes for errors from cpdma_chan_submit(). While here I changed to __netdev_alloc_skb_ip_align() so GFP_KERNEL can be used. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sebastian Siewior authored
__cpdma_chan_process() holds the lock with interrupts off (and so does its caller); the same goes for cpdma_ctlr_start(). With interrupts off, jiffies will not make any progress, and if the wait condition never becomes true we wait forever. This patch adds a simple udelay and a bounded number of attempts counted down. Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
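A generic sketch of the busy-wait pattern this describes: since jiffies cannot advance with interrupts off, the loop counts attempts around udelay() instead of comparing against a jiffies deadline. The condition helper and the timeout budget are illustrative.

#include <linux/delay.h>
#include <linux/errno.h>

/* wait up to ~100ms for hw_condition_met() without relying on jiffies */
static int wait_for_condition_atomic(void)
{
	int timeout = 100 * 100;		/* 10us steps, illustrative budget */

	while (!hw_condition_met()) {		/* hypothetical hardware poll */
		if (--timeout == 0)
			return -ETIMEDOUT;
		udelay(10);
	}
	return 0;
}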
-
Chen Gang authored
Remove an erroneous semicolon, found with EXTRA_CFLAGS=-W; the related commit is c5441932 ("GRE: Refactor GRE tunneling code"). Signed-off-by: Chen Gang <gang.chen@asianux.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Bhanu Prakash Gollapudi authored
The firmware supports a maximum of 4K FCoE exchanges. In 4-port devices, or when working in multi-function mode, this resource needs to be distributed between the various possible FCoE functions. This information needs to be calculated by bnx2x and propagated into bnx2fc via cnic. bnx2fc can then use this value to calculate corresponding xid resources instead of using global constants. Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jim Baxter authored
Enable hardware generation of IP header and protocol-specific checksums for transmitted packets. Enable hardware discarding of received packets with an invalid IP header or protocol-specific checksum. The feature is enabled by default but can be enabled/disabled via ethtool. Signed-off-by: Fugang Duan <B38611@freescale.com> Signed-off-by: Jim Baxter <jim_baxter@mentor.com> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
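In netdev-driver terms, making an offload toggleable via ethtool generally means advertising it in hw_features and enabling it in features at probe time; a generic sketch, not the exact fec_main.c hunk (the helper name is illustrative).

#include <linux/netdevice.h>

static void fec_enet_set_offloads(struct net_device *ndev)
{
	/* advertise the offloads so `ethtool -K` can toggle them,
	 * and turn them on by default */
	ndev->hw_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM;
	ndev->features |= ndev->hw_features;
}

From userspace the offloads can then be flipped with `ethtool -K <iface> rx on|off tx on|off`.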
-
Dan Carpenter authored
The "changed" variable should be a 64 bit type, otherwise it can't store all the features. The way the code is now the test for whether NETIF_F_RXCSUM changed is always false and we return immediately. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pravin B Shelar authored
OVS locking was recently changed to use a private OVS lock, which simplified overall locking. Therefore there is no need to have another global genl lock to protect OVS data structures. The following patch makes use of a parallel_ops genl family for OVS. This also allows more granular OVS locking using ovs_mutex for protecting OVS data structures, which gives more concurrency. E.g., multiple OVS_PACKET_CMD_EXECUTE genl operations can run in parallel, etc. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pravin B Shelar authored
All genl callbacks are serialized by the genl mutex. This can become a bottleneck in the multi-threaded case. The following patch adds a parameter to genl_family so that a particular family can get concurrent netlink callbacks without genl_lock held. A new rw-semaphore is used to protect genl callbacks from genl family unregistration: for a parallel_ops genl family the read lock is taken for callbacks, and the write lock is taken for registration or unregistration of any family. For a locked genl family, the semaphore and the genl mutex are taken for any operation. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
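A minimal sketch of how a family might opt in to the concurrent behavior; everything other than the .parallel_ops flag itself (the family name, version, attribute count) is illustrative.

#include <net/genetlink.h>

static struct genl_family demo_genl_family = {
	.id		= GENL_ID_GENERATE,
	.hdrsize	= 0,
	.name		= "demo_family",	/* illustrative */
	.version	= 1,
	.maxattr	= 3,
	.parallel_ops	= true,	/* callbacks run without the global genl mutex */
};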
-
Wu Fengguang authored
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Acked-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
This reverts commit 068a2de5 ("net: release dst entry while cache-hot for GSO case too"). Before GSO packet segmentation, we already take care of skb->dst if it can be released. There is no point adding an extra test for every segment in the gso loop. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Currently, packet_sock has a struct tpacket_stats stats member for TPACKET_V1 and TPACKET_V2 statistics accounting, and with TPACKET_V3 ``union tpacket_stats_u stats_u'' was introduced, which, however, only holds statistics for TPACKET_V3; when copying to user space, TPACKET_V3 does some hackery and also accesses tpacket_stats' stats, although everything could have been done within the union itself. Unify accounting within the tpacket_stats_u union so that we can remove 8 bytes from packet_sock that are there unnecessarily. Note that even if we switch to TPACKET_V3 and use the non-mmap(2)ed option, this still works due to the union having the same types and offsets that are exposed to user space. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
There's a 4 byte hole in the packet_ring_buffer structure before prb_bdqc that can be filled with the 'pending' member, thus reducing the overall structure size from 224 bytes to 216 bytes. This also has the side effect that in struct packet_sock the 2*4 byte holes after the embedded packet_ring_buffer members are removed, and overall, packet_sock can be reduced by one cacheline: Before: size: 1344, cachelines: 21, members: 24. After: size: 1280, cachelines: 20, members: 24. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Daniel Borkmann says: ==================== This is a joint effort with Willem to bring optional i) tx hw/sw timestamping into PF_PACKET, that was reported by Paul Chavent, and ii) to expose the type of timestamp to the user, which is in the current situation not possible to distinguish with the RX_RING and TX_RING API (but distinguishable through the normal timestamping API), reported by Richard Cochran. This set is based on top of ``packet: account statistics only in tpacket_stats_u''. Related discussion can be found in: http://patchwork.ozlabs.org/patch/238125/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Bring the timestamping section in sync with the implementation. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Currently, there is no way to find out which timestamp is reported in tpacket{,2,3}_hdr's tp_sec, tp_{n,u}sec members. It can be one of SOF_TIMESTAMPING_SYS_HARDWARE, SOF_TIMESTAMPING_RAW_HARDWARE, SOF_TIMESTAMPING_SOFTWARE, or a software fallback taken late in the PF_PACKET code. Therefore, report in the tp_status member of the ring buffer which timestamp has been reported for the RX and TX path. This should not break anything, for the following reasons: i) in the RX ring path, the user needs to test for tp_status & TP_STATUS_USER, and later for other flags as well such as TP_STATUS_VLAN_VALID et al, so adding further flags does no harm; ii) in the TX ring path, time stamps with the PACKET_TIMESTAMP socket option were not available, i.e. they had no effect, so an application setting it was buggy anyway. Besides TP_STATUS_AVAILABLE, the user should also check for other flags such as TP_STATUS_WRONG_FORMAT to reclaim frames to the application. Thus, in case TX timestamps are turned off (the default case), nothing happens to the application logic, and in case we want to use this new feature, we can now also check which of the timestamp sources is reported in the status field as provided in the docs. Reported-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
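A small userspace sketch of checking the reported timestamp source on the RX ring, using the TP_STATUS_TS_* bits this patch exposes; the frame-walking details are omitted, and hdr is assumed to point at a tpacket2_hdr already handed to userspace.

#include <stdio.h>
#include <linux/if_packet.h>

static void report_ts_source(const struct tpacket2_hdr *hdr)
{
	if (!(hdr->tp_status & TP_STATUS_USER))
		return;		/* frame still owned by the kernel */

	if (hdr->tp_status & TP_STATUS_TS_RAW_HARDWARE)
		printf("raw hardware timestamp: %u.%09u\n", hdr->tp_sec, hdr->tp_nsec);
	else if (hdr->tp_status & TP_STATUS_TS_SYS_HARDWARE)
		printf("hardware timestamp converted to system time\n");
	else if (hdr->tp_status & TP_STATUS_TS_SOFTWARE)
		printf("software timestamp\n");
	else
		printf("fallback timestamp taken late in PF_PACKET\n");
}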
-
Daniel Borkmann authored
This makes it more readable and clearer what bits are still free to use. The compiler reduces this to a constant for us anyway. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-