- 29 Apr, 2013 16 commits
-
-
Florian Westphal authored
Once we allow userspace to receive gso/gro packets, userspace needs to be able to determine when checksums appear to be broken, but are not. NFQA_SKB_CSUMNOTREADY means 'checksums will be fixed in kernel later, pretend they are ok'. NFQA_SKB_GSO could be used for statistics, or to determine when packet size exceeds mtu. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
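A minimal userspace sketch of how these flags might be interpreted once the NFQA_SKB_INFO attribute has been parsed out of the queue message; only the NFQA_SKB_* flag names come from the kernel uapi header, while the helper function and its payload_len/mtu parameters are illustrative.

#include <stdint.h>
#include <stdio.h>
#include <linux/netfilter/nfnetlink_queue.h>	/* NFQA_SKB_CSUMNOTREADY, NFQA_SKB_GSO */

/* skbinfo is assumed to hold the u32 payload of the NFQA_SKB_INFO attribute */
static void inspect_skb_flags(uint32_t skbinfo, unsigned int payload_len,
			      unsigned int mtu)
{
	if (skbinfo & NFQA_SKB_CSUMNOTREADY)
		printf("checksums will be fixed in kernel later, treat as ok\n");

	if ((skbinfo & NFQA_SKB_GSO) && payload_len > mtu)
		printf("GSO packet: payload %u exceeds mtu %u\n", payload_len, mtu);
}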
-
Florian Westphal authored
skb_gso_segment is expensive, so it would be nice if we could avoid it in the future. However, userspace needs to be prepared to receive larger-than-mtu packets (which will also have incorrect l3/l4 checksums), so we cannot simply remove it. The plan is to add a per-queue feature flag that userspace can set when binding the queue. The problem is that in nf_queue we only have a queue number, not the queue context/configuration settings. This patch should have no impact other than the skb_gso_segment call now being in a function that has access to the queue config data. A new size attribute in nf_queue_entry is needed so nfnetlink_queue can duplicate the entry of the gso skb when segmenting the skb while also copying the route key. The follow-up patch adds a switch to disable skb_gso_segment when the queue config says so. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
Required by a future patch that will need to duplicate the nf_queue_entry, bumping the refcounts of the copy. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
The new revision of the set match supports matching on the counters and suppressing counter updates at match time as well. For the set:list types, updating of the subcounters can be suppressed too. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Introduce extensions to elements in the core and prepare timeout as the first one. This patch also modifies the em_ipset classifier to use the new extension struct layout. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 25 Apr, 2013 24 commits
-
-
Vlad Yasevich authored
Commit 6681712d ("vxlan: generalize forwarding tables") relaxed the address checks in rtnl_fdb_add() to use is_zero_ether_addr(). This allows users to add multicast addresses using the fdb API. However, the check in rtnl_fdb_del() still uses the stricter is_valid_ether_addr(), which rejects multicast addresses. Thus it is possible to add an fdb that cannot later be removed. Relax the check in rtnl_fdb_del() as well. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
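A rough sketch of what the relaxed check amounts to (not the exact upstream hunk): only the all-zero address is rejected, so multicast addresses, which is_valid_ether_addr() would refuse, can reach the deletion path. The helper name is illustrative.

#include <linux/etherdevice.h>

/* illustrative validation helper mirroring the relaxed check */
static bool fdb_addr_deletable(const unsigned char *addr)
{
	/* is_valid_ether_addr() would also reject multicast addresses,
	 * which is too strict here: only the all-zero address is invalid */
	return !is_zero_ether_addr(addr);
}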
-
Sebastian Siewior authored
During high throughput it is likely that we receive both an RX and a TX interrupt. The normal behaviour is that once we enter the ISR, the interrupts are disabled in the IRQ chip, so the ISR is invoked only once and the interrupt line is disabled once. It will be re-enabled after napi completes. With threaded interrupts, on the other hand, the interrupt is disabled immediately and the ISR is marked for "later". With both TX and RX interrupts marked pending we invoke them both and disable the interrupt line twice. The napi callback is still executed only once, so after it completes we remain with interrupts disabled. The initial patch simply removed the cpsw_{enable|disable}_irq() calls and it worked well on my AM335X ES1.0 (beagle bone). On ES2.0 (beagle bone black) it caused a never-ending interrupt (even after the mask via cpsw_intr_disable()) according to Mugunthan V N. Since I don't have the ES2.0 and have no idea what is going on, this patch tracks the state of the irq_disable() call and executes it only when not yet done. The book keeping is done on the first struct, since with dual_emac we can have two of those and only one interrupt line. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
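A minimal sketch of the book-keeping described above; the field names (irq_enabled, irqs_table, num_irqs) are illustrative rather than necessarily those used in cpsw.c.

/* disable the interrupt line only if we have not already done so */
static void cpsw_disable_irq_once(struct cpsw_priv *priv)
{
	int i;

	if (!priv->irq_enabled)		/* illustrative flag on the first priv struct */
		return;

	for (i = 0; i < priv->num_irqs; i++)
		disable_irq_nosync(priv->irqs_table[i]);
	priv->irq_enabled = false;
}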
-
Sebastian Siewior authored
   text   data  bss    dec    hex  filename
  15530     92    4  15626   3d0a  cpsw.o.before
  15478     92    4  15574   3cd6  cpsw.o.after

52 bytes smaller, 13 for each invocation. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sebastian Siewior authored
This driver does not clean up properly after leaving. Here is a list: - Use unregister_netdev(). free_netdev() is a good start but not enough. - Use the above also on the other ndev in case of dual mac. - Free data.slave_data. The name of the structure makes it look like it is platform_data but it is not. It is just a trick! - Free all irqs. Again: freeing one irq is a good start, but freeing all of them is better. With this, rmmod & modprobe of cpsw seem to work. The remaining issue is: |WARNING: at fs/sysfs/dir.c:536 sysfs_add_one+0x9c/0xd4() |sysfs: cannot create duplicate filename '/devices/ocp.2/4a100000.ethernet/4a101000.mdio' |WARNING: at lib/kobject.c:196 kobject_add_internal+0x1a4/0x1c8() coming from of_platform_populate(), and I am not sure that this belongs here. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sebastian Siewior authored
If compiled as modules each one of these modules is missing something. With this patch the modules are loaded on demand and don't taint the kernel due to license issues. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
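Roughly the kind of module metadata this refers to (the description and alias strings are illustrative): a proper MODULE_LICENSE avoids the license taint, and a MODULE_ALIAS lets udev/modprobe pull the module in on demand when the matching platform device appears.

#include <linux/module.h>

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("TI CPSW Ethernet driver");	/* illustrative */
MODULE_ALIAS("platform:cpsw");			/* illustrative: enables on-demand loading */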
-
Sebastian Siewior authored
In case we run into OOM during the allocation of the new rx skb, we don't get one and we have one skb less than we used to have. If this continues to happen, we end up with no rx skbs at all. This patch changes the following: - if we fail to allocate the new skb, we treat the currently completed skb as the new one and thus drop the currently received data. - instead of testing multiple times whether the device is gone, we rely on the status field, which is set to -ENOSYS in case the channel is going down and incomplete requests are purged. cpdma_chan_stop() removes most of the packets with -ENOSYS. The currently active packet which is removed has the "tear down" bit set. So if that bit is set, we send ENOSYS as well; otherwise we pass the status bits which are required to figure out which of the two possibilities just finished. Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sebastian Siewior authored
The gfp_mask argument is not used in cpdma_chan_submit() and is always set to GFP_KERNEL, even in atomic sections. This patch drops it since it is unused. Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sebastian Siewior authored
netif_running() reports false before the ->ndo_stop() callback is called. That means if one executes "ifconfig down" and the system receives an interrupt before the interrupt source has been disabled, we hang forever for two reasons: - we never disable the interrupt source because the devices claim to be already inactive and don't feel responsible. - since the ISR always reports IRQ_HANDLED, the line is never deactivated because it looks like the ISR feels responsible. This patch changes the logic in the ISR a little: - If none of the status registers reports an active source (RX or TX; misc is ignored because it is not activated), we leave with IRQ_NONE. - the interrupt is deactivated. - The first active network device is taken and napi is scheduled. If none is active (a small race window between ndo_down() and the interrupt), then we leave and should not come back because the source is off. There is no need to schedule the second NAPI because both share the same dma queue. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
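A condensed sketch of the ISR flow being described; cpsw_rx_pending() and cpsw_tx_pending() stand in for reading the RX/TX status registers and are hypothetical names, as is the napi field layout.

static irqreturn_t cpsw_interrupt(int irq, void *dev_id)
{
	struct cpsw_priv *priv = dev_id;

	/* neither RX nor TX status indicates an active source:
	 * not ours (or already shut down), so leave with IRQ_NONE */
	if (!cpsw_rx_pending(priv) && !cpsw_tx_pending(priv))
		return IRQ_NONE;

	cpsw_intr_disable(priv);	/* deactivate the interrupt */

	/* schedule napi on the first active device; a second napi run is
	 * not needed because both devices share the same dma queue */
	if (netif_running(priv->ndev))
		napi_schedule(&priv->napi);

	return IRQ_HANDLED;
}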
-
Sebastian Siewior authored
if during "ifconfig up" we run out of mem we continue regardless how many skbs we got. In worst case we have zero RX skbs and can't ever receive further packets since the RX skbs are never reallocated. If cpdma_chan_submit() fails we even leak the skb. This patch changes the behavior here: If we fail to allocate an skb during bring up we don't continue and report that error. Same goes for errors from cpdma_chan_submit(). While here I changed to __netdev_alloc_skb_ip_align() so GFP_KERNEL can be used. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sebastian Siewior authored
__cpdma_chan_process() holds the lock with interrupts off (and so does its caller); the same goes for cpdma_ctlr_start(). With interrupts off, jiffies will not make any progress, and if the wait condition never becomes true we wait forever. This patch adds a simple udelay and a bounded number of attempts counted down. Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
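A generic sketch of the busy-wait pattern this describes: since jiffies cannot advance with interrupts off, the loop counts attempts around udelay() instead of comparing against a jiffies deadline. The condition helper and the timeout budget are illustrative.

#include <linux/delay.h>
#include <linux/errno.h>

/* wait up to ~100ms for hw_condition_met() without relying on jiffies */
static int wait_for_condition_atomic(void)
{
	int timeout = 100 * 100;		/* 10us steps, illustrative budget */

	while (!hw_condition_met()) {		/* hypothetical hardware poll */
		if (--timeout == 0)
			return -ETIMEDOUT;
		udelay(10);
	}
	return 0;
}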
-
Chen Gang authored
Remove an erroneous semicolon, found with EXTRA_CFLAGS=-W; the related commit is c5441932 ("GRE: Refactor GRE tunneling code"). Signed-off-by: Chen Gang <gang.chen@asianux.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Bhanu Prakash Gollapudi authored
The firmware supports a maximum of 4K FCoE exchanges. In 4-port devices, or when working in multi-function mode, this resource needs to be distributed between the various possible FCoE functions. This information needs to be calculated by bnx2x and propagated into bnx2fc via cnic. bnx2fc can then use this value to calculate corresponding xid resources instead of using global constants. Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jim Baxter authored
Enable hardware generation of IP header and protocol-specific checksums for transmitted packets. Enable hardware discarding of received packets with an invalid IP header or protocol-specific checksum. The feature is enabled by default but can be enabled/disabled via ethtool. Signed-off-by: Fugang Duan <B38611@freescale.com> Signed-off-by: Jim Baxter <jim_baxter@mentor.com> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
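In netdev-driver terms, making an offload toggleable via ethtool generally means advertising it in hw_features and enabling it in features at probe time; a generic sketch, not the exact fec_main.c hunk (the helper name is illustrative).

#include <linux/netdevice.h>

static void fec_enet_set_offloads(struct net_device *ndev)
{
	/* advertise the offloads so `ethtool -K` can toggle them,
	 * and turn them on by default */
	ndev->hw_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM;
	ndev->features |= ndev->hw_features;
}

From userspace the offloads can then be flipped with `ethtool -K <iface> rx on|off tx on|off`.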
-
Dan Carpenter authored
The "changed" variable should be a 64 bit type, otherwise it can't store all the features. The way the code is now the test for whether NETIF_F_RXCSUM changed is always false and we return immediately. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pravin B Shelar authored
OVS locking was recently changed to use a private OVS lock, which simplified overall locking. Therefore there is no need to have another global genl lock to protect OVS data structures. The following patch makes use of a parallel_ops genl family for OVS. This also allows more granular OVS locking using ovs_mutex for protecting OVS data structures, which gives more concurrency. E.g., multiple OVS_PACKET_CMD_EXECUTE genl operations can run in parallel, etc. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pravin B Shelar authored
All genl callbacks are serialized by the genl mutex. This can become a bottleneck in the multi-threaded case. The following patch adds a parameter to genl_family so that a particular family can get concurrent netlink callbacks without genl_lock held. A new rw-semaphore is used to protect genl callbacks from genl family unregistration: for a parallel_ops genl family the read lock is taken for callbacks, and the write lock is taken for registration or unregistration of any family. For a locked genl family, the semaphore and the genl mutex are taken for any operation. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
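A minimal sketch of how a family might opt in to the concurrent behavior; everything other than the .parallel_ops flag itself (the family name, version, attribute count) is illustrative.

#include <net/genetlink.h>

static struct genl_family demo_genl_family = {
	.id		= GENL_ID_GENERATE,
	.hdrsize	= 0,
	.name		= "demo_family",	/* illustrative */
	.version	= 1,
	.maxattr	= 3,
	.parallel_ops	= true,	/* callbacks run without the global genl mutex */
};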
-
Wu Fengguang authored
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Acked-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
This reverts commit 068a2de5 ("net: release dst entry while cache-hot for GSO case too"). Before GSO packet segmentation, we already take care of skb->dst if it can be released. There is no point adding an extra test for every segment in the gso loop. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Currently, packet_sock has a struct tpacket_stats stats member for TPACKET_V1 and TPACKET_V2 statistics accounting, and with TPACKET_V3 ``union tpacket_stats_u stats_u'' was introduced, which, however, only holds statistics for TPACKET_V3; when copying to user space, TPACKET_V3 does some hackery and also accesses tpacket_stats' stats, although everything could have been done within the union itself. Unify accounting within the tpacket_stats_u union so that we can remove 8 bytes from packet_sock that are there unnecessarily. Note that even if we switch to TPACKET_V3 and use the non-mmap(2)ed option, this still works due to the union having the same types and offsets that are exposed to user space. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
There's a 4 byte hole in the packet_ring_buffer structure before prb_bdqc that can be filled with the 'pending' member, thus reducing the overall structure size from 224 bytes to 216 bytes. This also has the side effect that in struct packet_sock the 2*4 byte holes after the embedded packet_ring_buffer members are removed, and overall, packet_sock can be reduced by one cacheline: Before: size: 1344, cachelines: 21, members: 24. After: size: 1280, cachelines: 20, members: 24. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Daniel Borkmann says: ==================== This is a joint effort with Willem to bring optional i) tx hw/sw timestamping into PF_PACKET, that was reported by Paul Chavent, and ii) to expose the type of timestamp to the user, which is in the current situation not possible to distinguish with the RX_RING and TX_RING API (but distinguishable through the normal timestamping API), reported by Richard Cochran. This set is based on top of ``packet: account statistics only in tpacket_stats_u''. Related discussion can be found in: http://patchwork.ozlabs.org/patch/238125/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Bring the timestamping section in sync with the implementation. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Currently, there is no way to find out which timestamp is reported in tpacket{,2,3}_hdr's tp_sec, tp_{n,u}sec members. It can be one of SOF_TIMESTAMPING_SYS_HARDWARE, SOF_TIMESTAMPING_RAW_HARDWARE, SOF_TIMESTAMPING_SOFTWARE, or a software fallback taken late in the PF_PACKET code. Therefore, report in the tp_status member of the ring buffer which timestamp has been reported for the RX and TX path. This should not break anything, for the following reasons: i) in the RX ring path, the user needs to test for tp_status & TP_STATUS_USER, and later for other flags as well such as TP_STATUS_VLAN_VALID et al, so adding further flags does no harm; ii) in the TX ring path, time stamps with the PACKET_TIMESTAMP socket option were not available, i.e. they had no effect, so an application setting it was buggy anyway. Besides TP_STATUS_AVAILABLE, the user should also check for other flags such as TP_STATUS_WRONG_FORMAT to reclaim frames to the application. Thus, in case TX timestamps are turned off (the default case), nothing happens to the application logic, and in case we want to use this new feature, we can now also check which of the timestamp sources is reported in the status field as provided in the docs. Reported-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
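A small userspace sketch of checking the reported timestamp source on the RX ring, using the TP_STATUS_TS_* bits this patch exposes; the frame-walking details are omitted, and hdr is assumed to point at a tpacket2_hdr already handed to userspace.

#include <stdio.h>
#include <linux/if_packet.h>

static void report_ts_source(const struct tpacket2_hdr *hdr)
{
	if (!(hdr->tp_status & TP_STATUS_USER))
		return;		/* frame still owned by the kernel */

	if (hdr->tp_status & TP_STATUS_TS_RAW_HARDWARE)
		printf("raw hardware timestamp: %u.%09u\n", hdr->tp_sec, hdr->tp_nsec);
	else if (hdr->tp_status & TP_STATUS_TS_SYS_HARDWARE)
		printf("hardware timestamp converted to system time\n");
	else if (hdr->tp_status & TP_STATUS_TS_SOFTWARE)
		printf("software timestamp\n");
	else
		printf("fallback timestamp taken late in PF_PACKET\n");
}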
-
Daniel Borkmann authored
This makes it more readable and clearer what bits are still free to use. The compiler reduces this to a constant for us anyway. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-