- 19 Mar, 2010 2 commits
-
-
Jiri Pirko authored
Since generally there could be more netdevices changing type other than bonding, making this event type name "bonding-unrelated" Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 18 Mar, 2010 1 commit
-
-
Rafael J. Wysocki authored
The recent PCI runtime PM patch broke build for CONFIG_PM_RUNTIME and CONFIG_PM_SLEEP undefined. Fix that by moving the PM callbacks under suitable #ifdefs. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 17 Mar, 2010 28 commits
-
-
David S. Miller authored
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
Convert DPRINTK, commonly used for debugging, to netif_<level> Remove #define PFX Use #define pr_fmt Consistently use no periods for non-sentence logging messages Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jason Gunthorpe authored
IEEE 802.3ae clause 45 specifies a somewhat modified MDIO protocol for use by 10GIGE phys. The main change is a 21 bit address split into a 5 bit device ID and a 16 bit register offset. The definition is designed so that normal and extended devices can run on the same MDIO bus. Extend mdio-bitbang to do the new protocol. At the MDIO bus level the protocol is requested by or'ing MII_ADDR_C45 into the register offset. Make phy_read/phy_write/etc pass a full 32 bit register offset. This does not attempt to make the phy layer support C45 style PHYs, just to provide the MDIO bus support. Tested against a Broadcom 10GE phy with ID 0x206034, and several Broadcom 10/100/1000 Phys in normal mode. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Rafael J. Wysocki authored
Use the PCI runtime power management framework to add basic PCI runtime PM support to the e1000e driver. Namely, make the driver suspend the device when the link is off and set it up for generating a wakeup event after the link has been detected again. [This feature is disabled until the user space enables it with the help of the /sys/devices/.../power/contol device attribute.] Based on a patch from Matthew Garrett. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Rafael J. Wysocki authored
Use the PCI runtime power management framework to add basic PCI runtime PM support to the r8169 driver. Namely, make the driver suspend the device when the link is not present and set it up for generating a wakeup event after the link has been detected again. [This feature is disabled until the user space enables it with the help of the /sys/devices/.../power/contol device attribute.] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
In mlx4, using char * to store mc address in private structure instead. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Perches authored
I'm not sure this is correct. It changes logging macros from: dev_<level>(&ks->spidev->dev, to netdev_<level>(ks->netdev, Comments? Use netdev_<level> Use netif_<level> Use pr_<level> Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt Add missing line to message in ks8851_remove Change kmalloc/memset(,0) to kzalloc Remove ks_<level> macros Consolidation code into set_media_state Signed-off-by: David S. Miller <davem@davemloft.net>
-
Neil Horman authored
Forward port commit fc477e160af086f6e30c3d4fdf5f5c000d29beb5 from git://tipc.cslab.ericsson.net/pub/git/people/allan/tipc.git Origional commit message: Allow retransmission of cloned buffers This patch fixes an issue with TIPC's message retransmission logic that prevented retransmission of clone sk_buffs. Originally intended as a means of avoiding wasted work in retransmitting messages that were still on the driver's outbound queue, it also prevented TIPC from retransmitting messages through other means -- such as the secondary bearer of the broadcast link, or another interface in a set of bonded interfaces. This fix removes existing checks for cloned sk_buffs that prevented such retransmission. Origionally-Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Neil Horman authored
Forward port commit 29eb572941501c40ac6e62dbc5043bf9ee76ee56 from git://tipc.cslab.ericsson.net/pub/git/people/allan/tipc.git Origional commit message: Increase frequency of load distribution over broadcast link This patch enhances the behavior of TIPC's broadcast link so that it alternates between redundant bearers (if available) after every message sent, rather than after every 10 messages. This change helps to speed up delivery of retransmitted messages by ensuring that they are not sent repeatedly over a bearer that is no longer working, but not yet recognized as failed. Tested by myself in the latest net-2.6 tree using the tipc sanity test suite Origionally-signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Neil Horman <nhorman@tuxdriver.com> bcast.c | 35 ++++++++++++++--------------------- 1 file changed, 14 insertions(+), 21 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jan Engelhardt authored
`ip -s link` shows interface counters truncated to 32 bit. This is because interface statistics are transported only in 32-bit quantity to userspace. This commit adds a new IFLA_STATS64 attribute that exports them in full 64 bit. References: http://lkml.indiana.edu/hypermail/linux/kernel/0307.3/0215.htmlSigned-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jan Engelhardt authored
Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jan Engelhardt authored
Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
We hold RTNL at this point and dont use RCU variants of list traversals, we dont need rcu_read_lock()/rcu_read_unlock() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
stephen hemminger authored
The shared packet statistics are a potential source of slow down on bridged traffic. Convert to per-cpu array, but only keep those statistics which change per-packet. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
This patch implements software receive side packet steering (RPS). RPS distributes the load of received packet processing across multiple CPUs. Problem statement: Protocol processing done in the NAPI context for received packets is serialized per device queue and becomes a bottleneck under high packet load. This substantially limits pps that can be achieved on a single queue NIC and provides no scaling with multiple cores. This solution queues packets early on in the receive path on the backlog queues of other CPUs. This allows protocol processing (e.g. IP and TCP) to be performed on packets in parallel. For each device (or each receive queue in a multi-queue device) a mask of CPUs is set to indicate the CPUs that can process packets. A CPU is selected on a per packet basis by hashing contents of the packet header (e.g. the TCP or UDP 4-tuple) and using the result to index into the CPU mask. The IPI mechanism is used to raise networking receive softirqs between CPUs. This effectively emulates in software what a multi-queue NIC can provide, but is generic requiring no device support. Many devices now provide a hash over the 4-tuple on a per packet basis (e.g. the Toeplitz hash). This patch allow drivers to set the HW reported hash in an skb field, and that value in turn is used to index into the RPS maps. Using the HW generated hash can avoid cache misses on the packet when steering it to a remote CPU. The CPU mask is set on a per device and per queue basis in the sysfs variable /sys/class/net/<device>/queues/rx-<n>/rps_cpus. This is a set of canonical bit maps for receive queues in the device (numbered by <n>). If a device does not support multi-queue, a single variable is used for the device (rx-0). Generally, we have found this technique increases pps capabilities of a single queue device with good CPU utilization. Optimal settings for the CPU mask seem to depend on architectures and cache hierarcy. Below are some results running 500 instances of netperf TCP_RR test with 1 byte req. and resp. Results show cumulative transaction rate and system CPU utilization. e1000e on 8 core Intel Without RPS: 108K tps at 33% CPU With RPS: 311K tps at 64% CPU forcedeth on 16 core AMD Without RPS: 156K tps at 15% CPU With RPS: 404K tps at 49% CPU bnx2x on 16 core AMD Without RPS 567K tps at 61% CPU (4 HW RX queues) Without RPS 738K tps at 96% CPU (8 HW RX queues) With RPS: 854K tps at 76% CPU (4 HW RX queues) Caveats: - The benefits of this patch are dependent on architecture and cache hierarchy. Tuning the masks to get best performance is probably necessary. - This patch adds overhead in the path for processing a single packet. In a lightly loaded server this overhead may eliminate the advantages of increased parallelism, and possibly cause some relative performance degradation. We have found that masks that are cache aware (share same caches with the interrupting CPU) mitigate much of this. - The RPS masks can be changed dynamically, however whenever the mask is changed this introduces the possibility of generating out of order packets. It's probably best not change the masks too frequently. Signed-off-by: Tom Herbert <therbert@google.com> include/linux/netdevice.h | 32 ++++- include/linux/skbuff.h | 3 + net/core/dev.c | 335 +++++++++++++++++++++++++++++++++++++-------- net/core/net-sysfs.c | 225 ++++++++++++++++++++++++++++++- net/core/skbuff.c | 2 + 5 files changed, 538 insertions(+), 59 deletions(-) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tina Yang authored
Create per-cpu workqueue threads instead of a single krdsd thread. This is a step towards better scalability. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andy Grover authored
set_page_dirty() unconditionally re-enables interrupts, so if we call it with irqs off, they will be on after the call, and that's bad. This patch moves the call after we've re-enabled interrupts in send_drop_to(), so it's safe. Also, add BUG_ONs to let us know if we ever do call set_page_dirty with interrupts off. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sherman Pun authored
If the RDMA op has aborted with a remote access error, in addition to what we already do (tell userspace it has completed with an error) also unmap it and put() the rm. Otherwise, hangs may occur on arches that track maps and will not exit without proper cleanup. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andy Grover authored
rds_poll_waitq's listeners will be awoken if we receive a congestion notification. Bad performance may result because *all* polled sockets contend for this single lock. However, it should not be necessary to wake pollers when a congestion update arrives if they have never experienced congestion, and not putting these on the waitq will hopefully greatly reduce contention. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tina Yang authored
It seems rds_send_drop_to() called __rds_rdma_send_complete(rs, rm, RDS_RDMA_CANCELED) with only rds_sock lock, but not rds_message lock. It raced with other threads that is attempting to modify the rds_message as well, such as from within rds_rdma_send_complete(). Signed-off-by: Tina Yang <tina.yang@oracle.com> Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andy Grover authored
RDS's error messages when a connection goes down are a little extreme. A connection may go down, and it will be re-established, and everything is fine. This patch links these messages through rdsdebug(), instead of to printk directly. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andy Grover authored
if a machine is shut down without closing sockets properly, and freeing all MRs, then a BUG_ON will bring it down. This patch changes these to WARN_ONs -- leaking MRs is not fatal (although not ideal, and there is more work to do here for a proper fix.) Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tina Yang authored
Fix a deadlock between rds_rdma_send_complete() and rds_send_remove_from_sock() when rds socket lock and rds message lock are acquired out-of-order. Signed-off-by: Tina Yang <Tina.Yang@oracle.com> Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andy Grover authored
We have two kinds of loopback: software (via loop transport) and hardware (via IB). sw is used for 127.0.0.1, and doesn't support rdma ops. hw is used for sends to local device IPs, and supports rdma. Both are used in different cases. For both of these, when there is a congestion map update, we want to call rds_cong_map_updated() but not actually send anything -- since loopback local and foreign congestion maps point to the same spot, they're already in sync. The old code never called sw loop's xmit_cong_map(),so rds_cong_map_updated() wasn't being called for it. sw loop ports would not work right with the congestion monitor. Fixing that meant that hw loopback now would send congestion maps to itself. This is also undesirable (racy), so we check for this case in the ib-specific xmit code. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andy Grover authored
Instead of waking the send thread whenever any send space is available, wait until it is at least half empty. This is modeled on how sock_def_write_space() does it, and may help to minimize context switches. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andy Grover authored
Other transports use rds_page_copy_user, which updates our s_copy_to_user counter. TCP doesn't, so it needs to explicity call rds_stats_add(). Reported-by: Richard Frank <richard.frank@oracle.com> Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andy Grover authored
Most likely cut n paste error - sendmsg() was checking sock_rcvtimeo. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andy Grover authored
BUGging on a runtime error code should be avoided. This patch also eliminates all other BUG()s that have no real reason to exist. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 16 Mar, 2010 9 commits
-
-
David S. Miller authored
Otherwise we get a warning from the call in br_forward(). Signed-off-by: David S. Miller <davem@davemloft.net>
-
YOSHIFUJI Hideaki / 吉藤英明 authored
Without CONFIG_BRIDGE_IGMP_SNOOPING, BR_INPUT_SKB_CB(skb)->mrouters_only is not appropriately initialized, so we can see garbage. A clear option to fix this is to set it even without that config, but we cannot optimize out the branch. Let's introduce a macro that returns value of mrouters_only and let it return 0 without CONFIG_BRIDGE_IGMP_SNOOPING. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vitaliy Gusev authored
route: Fix caught BUG_ON during rt_secret_rebuild_oneshot() Call rt_secret_rebuild can cause BUG_ON(timer_pending(&net->ipv4.rt_secret_timer)) in add_timer as there is not any synchronization for call rt_secret_rebuild_oneshot() for the same net namespace. Also this issue affects to rt_secret_reschedule(). Thus use mod_timer enstead. Signed-off-by: Vitaliy Gusev <vgusev@openvz.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
YOSHIFUJI Hideaki / 吉藤英明 authored
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
YOSHIFUJI Hideaki / 吉藤英明 authored
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Slaby authored
Stanse found that one error path in netpoll_setup dereferences npinfo even though it is NULL. Avoid that by adding new label and go to that instead. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Daniel Borkmann <danborkmann@googlemail.com> Cc: David S. Miller <davem@davemloft.net> Acked-by: chavey@google.com Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Neil Horman authored
So in the forward porting of various tipc packages, I was constantly getting this lockdep warning everytime I used tipc-config to set a network address for the protocol: [ INFO: possible circular locking dependency detected ] 2.6.33 #1 tipc-config/1326 is trying to acquire lock: (ref_table_lock){+.-...}, at: [<ffffffffa0315148>] tipc_ref_discard+0x53/0xd4 [tipc] but task is already holding lock: (&(&entry->lock)->rlock#2){+.-...}, at: [<ffffffffa03150d5>] tipc_ref_lock+0x43/0x63 [tipc] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&(&entry->lock)->rlock#2){+.-...}: [<ffffffff8107b508>] __lock_acquire+0xb67/0xd0f [<ffffffff8107b78c>] lock_acquire+0xdc/0x102 [<ffffffff8145471e>] _raw_spin_lock_bh+0x3b/0x6e [<ffffffffa03152b1>] tipc_ref_acquire+0xe8/0x11b [tipc] [<ffffffffa031433f>] tipc_createport_raw+0x78/0x1b9 [tipc] [<ffffffffa031450b>] tipc_createport+0x8b/0x125 [tipc] [<ffffffffa030f221>] tipc_subscr_start+0xce/0x126 [tipc] [<ffffffffa0308fb2>] process_signal_queue+0x47/0x7d [tipc] [<ffffffff81053e0c>] tasklet_action+0x8c/0xf4 [<ffffffff81054bd8>] __do_softirq+0xf8/0x1cd [<ffffffff8100aadc>] call_softirq+0x1c/0x30 [<ffffffff810549f4>] _local_bh_enable_ip+0xb8/0xd7 [<ffffffff81054a21>] local_bh_enable_ip+0xe/0x10 [<ffffffff81454d31>] _raw_spin_unlock_bh+0x34/0x39 [<ffffffffa0308eb8>] spin_unlock_bh.clone.0+0x15/0x17 [tipc] [<ffffffffa0308f47>] tipc_k_signal+0x8d/0xb1 [tipc] [<ffffffffa0308dd9>] tipc_core_start+0x8a/0xad [tipc] [<ffffffffa01b1087>] 0xffffffffa01b1087 [<ffffffff8100207d>] do_one_initcall+0x72/0x18a [<ffffffff810872fb>] sys_init_module+0xd8/0x23a [<ffffffff81009b42>] system_call_fastpath+0x16/0x1b -> #0 (ref_table_lock){+.-...}: [<ffffffff8107b3b2>] __lock_acquire+0xa11/0xd0f [<ffffffff8107b78c>] lock_acquire+0xdc/0x102 [<ffffffff81454836>] _raw_write_lock_bh+0x3b/0x6e [<ffffffffa0315148>] tipc_ref_discard+0x53/0xd4 [tipc] [<ffffffffa03141ee>] tipc_deleteport+0x40/0x119 [tipc] [<ffffffffa0316e35>] release+0xeb/0x137 [tipc] [<ffffffff8139dbf4>] sock_release+0x1f/0x6f [<ffffffff8139dc6b>] sock_close+0x27/0x2b [<ffffffff811116f6>] __fput+0x12a/0x1df [<ffffffff811117c5>] fput+0x1a/0x1c [<ffffffff8110e49b>] filp_close+0x68/0x72 [<ffffffff8110e552>] sys_close+0xad/0xe7 [<ffffffff81009b42>] system_call_fastpath+0x16/0x1b Finally decided I should fix this. Its a straightforward inversion, tipc_ref_acquire takes two locks in this order: ref_table_lock entry->lock while tipc_deleteport takes them in this order: entry->lock (via tipc_port_lock()) ref_table_lock (via tipc_ref_discard()) when the same entry is referenced, we get the above warning. The fix is equally straightforward. Theres no real relation between the entry->lock and the ref_table_lock (they just are needed at the same time), so move the entry->lock aquisition in tipc_ref_acquire down, after we unlock ref_table_lock (this is safe since the ref_table_lock guards changes to the reference table, and we've already claimed a slot there. I've tested the below fix and confirmed that it clears up the lockdep issue Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
James Chapman authored
This patch fixes UDP socket refcnt bugs in the pppol2tp driver. A bug can cause a kernel stack trace when a tunnel socket is closed. A way to reproduce the issue is to prepare the UDP socket for L2TP (by opening a tunnel pppol2tp socket) and then close it before any L2TP sessions are added to it. The sequence is Create UDP socket Create tunnel pppol2tp socket to prepare UDP socket for L2TP pppol2tp_connect: session_id=0, peer_session_id=0 L2TP SCCRP control frame received (tunnel_id==0) pppol2tp_recv_core: sock_hold() pppol2tp_recv_core: sock_put L2TP ZLB control frame received (tunnel_id=nnn) pppol2tp_recv_core: sock_hold() pppol2tp_recv_core: sock_put Close tunnel management socket pppol2tp_release: session_id=0, peer_session_id=0 Close UDP socket udp_lib_close: BUG The addition of sock_hold() in pppol2tp_connect() solves the problem. For data frames, two sock_put() calls were added to plug a refcnt leak per received data frame. The ref that is grabbed at the top of pppol2tp_recv_core() must always be released, but this wasn't done for accepted data frames or data frames discarded because of bad UDP checksums. This leak meant that any UDP socket that had passed L2TP data traffic (i.e. L2TP data frames, not just L2TP control frames) using pppol2tp would not be released by the kernel. WARNING: at include/net/sock.h:435 udp_lib_unhash+0x117/0x120() Pid: 1086, comm: openl2tpd Not tainted 2.6.33-rc1 #8 Call Trace: [<c119e9b7>] ? udp_lib_unhash+0x117/0x120 [<c101b871>] ? warn_slowpath_common+0x71/0xd0 [<c119e9b7>] ? udp_lib_unhash+0x117/0x120 [<c101b8e3>] ? warn_slowpath_null+0x13/0x20 [<c119e9b7>] ? udp_lib_unhash+0x117/0x120 [<c11598a7>] ? sk_common_release+0x17/0x90 [<c11a5e33>] ? inet_release+0x33/0x60 [<c11577b0>] ? sock_release+0x10/0x60 [<c115780f>] ? sock_close+0xf/0x30 [<c106e542>] ? __fput+0x52/0x150 [<c106b68e>] ? filp_close+0x3e/0x70 [<c101d2e2>] ? put_files_struct+0x62/0xb0 [<c101eaf7>] ? do_exit+0x5e7/0x650 [<c1081623>] ? mntput_no_expire+0x13/0x70 [<c106b68e>] ? filp_close+0x3e/0x70 [<c101eb8a>] ? do_group_exit+0x2a/0x70 [<c101ebe1>] ? sys_exit_group+0x11/0x20 [<c10029b0>] ? sysenter_do_call+0x12/0x26 Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Steve Glendinning authored
This patch ensures the PHY correctly completes its reset before setting register values. Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-