Commits · 643566d4b47e2956110e79c0e6f65db9b9ea42c6 · Kirill Smelkov / linux

18 Oct, 2014 7 commits

tipc: fix bug in bundled buffer reception · 643566d4

Jon Paul Maloy authored Oct 17, 2014

In commit ec8a2e56 ("tipc: same receive
code path for connection protocol and data messages") we omitted the
the possiblilty that an arriving message extracted from a bundle buffer
may be a multicast message. Such messages need to be to be delivered to
the socket via a separate function, tipc_sk_mcast_rcv(). As a result,
small multicast messages arriving as members of a bundle buffer will be
silently dropped.

This commit corrects the error by considering this case in the function
tipc_link_bundle_rcv().
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

643566d4

ipv6: introduce tcp_v6_iif() · 870c3151

Eric Dumazet authored Oct 17, 2014

Commit 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line
misses") added a regression for SO_BINDTODEVICE on IPv6.

This is because we still use inet6_iif() which expects that IP6 control
block is still at the beginning of skb->cb[]

This patch adds tcp_v6_iif() helper and uses it where necessary.

Because __inet6_lookup_skb() is used by TCP and DCCP, we add an iif
parameter to it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

870c3151

sfc: add support for skb->xmit_more · 70b33fb0

Edward Cree authored Oct 17, 2014

Don't ring the doorbell, and don't do PIO.  This will also prevent
 TX Push, because there will be more than one buffer waiting when
 the doorbell is rung.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

70b33fb0

r8152: return -EBUSY for runtime suspend · 6cc69f2a

hayeswang authored Oct 17, 2014

Remove calling cancel_delayed_work_sync() for runtime suspend,
because it would cause dead lock. Instead, return -EBUSY to
avoid the device enters suspending if the net is running and
the delayed work is pending or running. The delayed work would
try to wake up the device later, so the suspending is not
necessary.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6cc69f2a

ipv4: fix a potential use after free in fou.c · d8f00d27

Li RongQing authored Oct 17, 2014

pskb_may_pull() maybe change skb->data and make uh pointer oboslete,
so reload uh and guehdr

Fixes: 37dd0247 ("gue: Receive side for Generic UDP Encapsulation")
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d8f00d27

ipv4: fix a potential use after free in ip_tunnel_core.c · 1245dfc8

Li RongQing authored Oct 17, 2014

pskb_may_pull() maybe change skb->data and make eth pointer oboslete,
so set eth after pskb_may_pull()

Fixes:3d7b46cd("ip_tunnel: push generic protocol handling to ip_tunnel module")
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1245dfc8

hyperv: Add handling of IP header with option field in netvsc_set_hash() · f88e6714

Haiyang Zhang authored Oct 16, 2014

In case that the IP header has optional field at the end, this patch will
get the port numbers after that field, and compute the hash. The general
parser skb_flow_dissect() is used here.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f88e6714

17 Oct, 2014 8 commits

openvswitch: Create right mask with disabled megaflows · f47de068

Pravin B Shelar authored Oct 16, 2014

If megaflows are disabled, the userspace does not send the netlink attribute
OVS_FLOW_ATTR_MASK, and the kernel must create an exact match mask.

sw_flow_mask_set() sets every bytes (in 'range') of the mask to 0xff, even the
bytes that represent padding for struct sw_flow, or the bytes that represent
fields that may not be set during ovs_flow_extract().
This is a problem, because when we extract a flow from a packet,
we do not memset() anymore the struct sw_flow to 0.

This commit gets rid of sw_flow_mask_set() and introduces mask_set_nlattr(),
which operates on the netlink attributes rather than on the mask key. Using
this approach we are sure that only the bytes that the user provided in the
flow are matched.

Also, if the parse_flow_mask_nlattrs() for the mask ENCAP attribute fails, we
now return with an error.

This bug is introduced by commit 07148121
("openvswitch: Eliminate memset() from flow_extract").
Reported-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f47de068

vxlan: fix a free after use · 7a9f526f

Li RongQing authored Oct 17, 2014

pskb_may_pull maybe change skb->data and make eth pointer oboslete,
so eth needs to reload

Fixes: 91269e39 ("vxlan: using pskb_may_pull as early as possible")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7a9f526f

openvswitch: fix a use after free · 389f4894

Li RongQing authored Oct 17, 2014

pskb_may_pull() called by arphdr_ok can change skb->data, so put the arp
setting after arphdr_ok to avoid the use the freed memory

Fixes: 07148121 ("openvswitch: Eliminate memset() from flow_extract.")
Cc: Jesse Gross <jesse@nicira.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

389f4894

ipv4: dst_entry leak in ip_send_unicast_reply() · 4062090e

Vasily Averin authored Oct 15, 2014

ip_setup_cork() called inside ip_append_data() steals dst entry from rt to cork
and in case errors in __ip_append_data() nobody frees stolen dst entry

Fixes: 2e77d89b ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
Signed-off-by: Vasily Averin <vvs@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4062090e

ipv4: clean up cookie_v4_check() · 461b74c3

Cong Wang authored Oct 15, 2014

We can retrieve opt from skb, no need to pass it as a parameter.
And opt should always be non-NULL, no need to check.

Cc: Krzysztof Kolasa <kkolasa@winsoft.pl>
Cc: Eric Dumazet <edumazet@google.com>
Tested-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

461b74c3

ipv4: share tcp_v4_save_options() with cookie_v4_check() · e25f866f

Cong Wang authored Oct 15, 2014

cookie_v4_check() allocates ip_options_rcu in the same way
with tcp_v4_save_options(), we can just make it a helper function.

Cc: Krzysztof Kolasa <kkolasa@winsoft.pl>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e25f866f

ipv4: call __ip_options_echo() in cookie_v4_check() · 2077eebf

Cong Wang authored Oct 15, 2014

commit 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
missed that cookie_v4_check() still calls ip_options_echo() which uses
IPCB(). It should use TCPCB() at TCP layer, so call __ip_options_echo()
instead.

Fixes: commit 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Cc: Krzysztof Kolasa <kkolasa@winsoft.pl>
Cc: Eric Dumazet <edumazet@google.com>
Reported-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
Tested-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2077eebf

atm: simplify lanai.c by using module_pci_driver · b7983e3f

Michael Opdenacker authored Oct 15, 2014

This simplifies the lanai.c driver by using
the module_pci_driver() macro, at the expense
of losing only debugging messages.
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b7983e3f

16 Oct, 2014 10 commits

netlink: fix description of portid · 2c6ba4b1

Nicolas Dichtel authored Oct 16, 2014

Avoid confusion between pid and portid.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2c6ba4b1

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net · 3331177c

David S. Miller authored Oct 16, 2014

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2014-10-16

This series contains updates to fm10k and ixgbe.

Matthew provides two fixes for fm10k, first sets the flag to fetch the
host state before kicking off the service task that reads the host
state when bringing the interface up.  The second makes sure that we
release the mailbox lock after detecting an error and before we return
the error code.

Andy Zhou provides a compile fix for fm10k, when the driver is compiled
into the kernel and the VXLAN driver is compiled as a module.

Emil provides a fix for ixgbe to prevent against a panic by trying
to dereference a NULL pointer in ixgbe_ndo_set_vf_spoofchk().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

3331177c

ixgbe: check for vfs outside of sriov_num_vfs before dereference · 600a507d

Emil Tantilov authored Oct 16, 2014

The check for vfinfo is not sufficient because it does not protect
against specifying vf that is outside of sriov_num_vfs range.
All of the ndo functions have a check for it except for
ixgbevf_ndo_set_spoofcheck().

The following patch is all we need to protect against this panic:

ip link set p96p1 vf 0 spoofchk off
BUG: unable to handle kernel NULL pointer dereference at 0000000000000052
IP: [<ffffffffa044a1c1>]
ixgbe_ndo_set_vf_spoofchk+0x51/0x150 [ixgbe]
Reported-by: Thierry Herbelot <thierry.herbelot@6wind.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Acked-by: Thierry Herbelot <thierry.herbelot@6wind.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

600a507d

fm10k: Add CONFIG_FM10K_VXLAN configuration option · f6b03c10

Andy Zhou authored Oct 04, 2014

Compiling with CONFIG_FM10K=y and VXLAN=m resulting in linking error:

   drivers/built-in.o: In function `fm10k_open':
   (.text+0x1f9d7a): undefined reference to `vxlan_get_rx_port'
   make: *** [vmlinux] Error 1

The fix follows the same strategy as I40E.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

f6b03c10

fm10k: Unlock mailbox on VLAN addition failures · 13cb2dad

Matthew Vick authored Oct 03, 2014

After grabbing the mailbox lock and detecting an error, the lock must be
released before the error code can be returned.
Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

13cb2dad

fm10k: Check the host state when bringing the interface up · 4d419156

Matthew Vick authored Oct 02, 2014

Set the flag to fetch the host state before kicking off the service task
that reads the host state when bringing the interface back up.
Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

4d419156

vxlan: using pskb_may_pull as early as possible · 91269e39

Li RongQing authored Oct 16, 2014

pskb_may_pull should be used to check if skb->data has enough space,
skb->len can not ensure that.

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

91269e39

vxlan: fix a use after free in vxlan_encap_bypass · ce6502a8

Li RongQing authored Oct 16, 2014

when netif_rx() is done, the netif_rx handled skb maybe be freed,
and should not be used.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ce6502a8

openvswitch: use vport instead of p · 4e8febd0

Fabian Frederick authored Oct 15, 2014

All functions used struct vport *vport except
ovs_vport_find_upcall_portid.

This fixes 1 kerneldoc warning
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4e8febd0

openvswitch: kerneldoc warning fix · 7e78cc46

Fabian Frederick authored Oct 15, 2014

s/sock/gs
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7e78cc46

15 Oct, 2014 11 commits

gianfar: Add FCS to rx buffer size (fix) · f5b720b8

Claudiu Manoil authored Oct 15, 2014

For each Rx frame the eTSEC writes its FCS (Frame Check Sequence)
to the Rx buffer.

The eTSEC h/w manual states in the "Receive Buffer Descriptor Field
Descriptions" table:
"Data length is the number of octets written by the eTSEC into this BD's
data buffer if L is cleared (the value is equal to MRBLR), or, if L is
set, the length of the frame including *CRC*, FCB (if RCTRL[PRSDEP > 00),
preamble (if MACCFG2[PreAmRxEn]=1), time stamp (if RCTRL[TS] = 1) and
any padding (RCTRL[PAL])."

Though the FCS bytes are removed by the driver before passing the skb
to the net stack, the Rx buffer size computation does not currently
take into account the FCS bytes (4 bytes).
Because the Rx buffer size is multiple of 512 bytes, leaving out the
FCS is not a problem for the default MTU of 1500, as the Rx buffer size
is 1536 in this case.  However, for custom MTUs, where the difference
between the MTU size and the Rx buffer size is less, this can be a
problem as the computed Rx buffer size won't be enough to accomodate
the FCS for a received frame that is big enough (close to MTU size).
In such case the received frame is considered to be incomplete (L flag
not set in the RxBD status) and silently dropped.

Note that the driver does not currently support S/G on Rx, so it has to
compute its Rx buffer size based on the MTU of the device.
Reported-by: Kristian Otnes <kotnes@cisco.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f5b720b8

virtio_net: fix use after free · 4b7fd2e6

Michael S. Tsirkin authored Oct 15, 2014

commit 0b725a2c
    net: Remove ndo_xmit_flush netdev operation, use signalling instead.

added code that looks at skb->xmit_more after the skb has
been put in TX VQ. Since some paths process the ring and free the skb
immediately, this can cause use after free.

Fix by storing xmit_more in a local variable.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4b7fd2e6

net: fec: ptp: fix convergence issue to support LinuxPTP stack · 28b5f058

Nimrod Andy authored Oct 15, 2014

iMX6SX IEEE 1588 module has one hw issue in capturing the ATVR register.
The current SW flow is:
		ENET0->ATCR |= ENET_ATCR_CAPTURE_MASK;
		ts_counter_ns = ENET0->ATVR;
The ATVR value is not expected value that cause LinuxPTP stack cannot be convergent.

ENET Block Guide/ Chapter for the iMX6SX (PELE) address the issue:
After set ENET_ATCR[Capture], there need some time cycles before the counter
value is capture in the register clock domain. The wait-time-cycles is at least
6 clock cycles of the slower clock between the register clock and the 1588 clock.
So need something like:
		ENET0->ATCR |= ENET_ATCR_CAPTURE_MASK;
		wait();
		ts_counter_ns = ENET0->ATVR;

For iMX6SX, the 1588 ts_clk is fixed to 25Mhz, register clock is 66Mhz, so the
wait-time-cycles must be greater than 240ns (40ns * 6). The patch add 1us delay
before cpu read ATVR register.

Changes V2:
Modify the commit/comments log to describe the issue clearly.
Signed-off-by: Fugang Duan <B38611@freescale.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

28b5f058

cxgb4i : Fix -Wmaybe-uninitialized warning. · 001586a7

Anish Bhatt authored Oct 15, 2014

Identified by kbuild test robot. csk family is always set to be AF_INET or
AF_INET6, so skb will always be initialized to some value but there is no harm
in silencing the warning anyways.
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Fixes : f42bb57c ('cxgb4i : Fix -Wunused-function warning')
Signed-off-by: David S. Miller <davem@davemloft.net>

001586a7

net: Add ndo_gso_check · 04ffcb25

Tom Herbert authored Oct 14, 2014

Add ndo_gso_check which a device can define to indicate whether is
is capable of doing GSO on a packet. This funciton would be called from
the stack to determine whether software GSO is needed to be done. A
driver should populate this function if it advertises GSO types for
which there are combinations that it wouldn't be able to handle. For
instance a device that performs UDP tunneling might only implement
support for transparent Ethernet bridging type of inner packets
or might have limitations on lengths of inner headers.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04ffcb25

stmmac: fix sti compatibililies · 71ae8f52

Giuseppe CAVALLARO authored Oct 15, 2014

this patch is to fix the stmmac data compatibilities for
all the SoCs inside the platform file.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

71ae8f52

Merge branch 'cxgb4' · 2ef1e9ef

David S. Miller authored Oct 15, 2014

Anish Bhatt says:

====================
ipv6 and related cleanup for cxgb4/cxgb4i

This patch set removes some duplicated/extraneous code from cxgb4i, guards
cxgb4 against compilation failure based on ipv6 tristate, make ipv6 related
code no longer be enabled by default irrespective of ipv6 tristate and fixes
a refcnt issue.
-Anish

v2 : Provide more detailed commit messages, make subject more concise as
recommended by Dave Miller.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

2ef1e9ef

cxgb4i: Remove duplicate call to dst_neigh_lookup() · c5bbcb58

Anish Bhatt authored Oct 14, 2014

There is an extra call to dst_neigh_lookup() leftover in cxgb4i that can cause
an unreleased refcnt issue. Remove extraneous call.
Signed-off-by: Anish Bhatt <anish@chelsio.com>

Fixes : 759a0cc5 ('cxgb4i: Add ipv6 code to driver, call into libcxgbi ipv6 api')
Signed-off-by: David S. Miller <davem@davemloft.net>

c5bbcb58

cxgb4i : Fix -Wunused-function warning · f42bb57c

Anish Bhatt authored Oct 14, 2014

A bunch of ipv6 related code is left on by default. While this causes no
compilation issues, there is no need to have this enabled by default. Guard
with an ipv6 check, which also takes care of a -Wunused-function warning.
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f42bb57c

cxgb4 : Fix build failure in cxgb4 when ipv6 is disabled/not in-built · 1bb60376

Anish Bhatt authored Oct 14, 2014

cxgb4 ipv6 does not guard against ipv6 being disabled, or the standard
ipv6 module vs inbuilt tri-state issue. This was fixed for cxgb4i & iw_cxgb4
but missed for cxgb4.
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1bb60376

cxgb4i : Remove duplicated CLIP handling code · 587ddfe2

Anish Bhatt authored Oct 14, 2014

cxgb4 already handles CLIP updates from a previous changeset for iw_cxgb4,
there is no need to have this functionality in cxgb4i. Remove duplicated code
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

587ddfe2

14 Oct, 2014 4 commits

tcp: TCP Small Queues and strange attractors · 9b462d02

Eric Dumazet authored Oct 13, 2014

TCP Small queues tries to keep number of packets in qdisc
as small as possible, and depends on a tasklet to feed following
packets at TX completion time.
Choice of tasklet was driven by latencies requirements.

Then, TCP stack tries to avoid reorders, by locking flows with
outstanding packets in qdisc in a given TX queue.

What can happen is that many flows get attracted by a low performing
TX queue, and cpu servicing TX completion has to feed packets for all of
them, making this cpu 100% busy in softirq mode.

This became particularly visible with latest skb->xmit_more support

Strategy adopted in this patch is to detect when tcp_wfree() is called
from ksoftirqd and let the outstanding queue for this flow being drained
before feeding additional packets, so that skb->ooo_okay can be set
to allow select_queue() to select the optimal queue :

Incoming ACKS are normally handled by different cpus, so this patch
gives more chance for these cpus to take over the burden of feeding
qdisc with future packets.

Tested:

lpaa23:~# ./super_netperf 1400 --google-pacing-rate 3028000 -H lpaa24 -l 3600 &

lpaa23:~# sar -n DEV 1 10 | grep eth1
06:16:18 AM eth1 595448.00 1190564.00 38381.09 1760253.12 0.00 0.00 1.00
06:16:19 AM eth1 594858.00 1189686e.00 38340.76 1758952.72 0.00 0.00 0.00
06:16:20 AM eth1 597017.00 1194019.00 38480.79 1765370.29 0.00 0.00 1.00
06:16:21 AM eth1 595450.00 1190936.00 38380.19 1760805.05 0.00 0.00 0.00
06:16:22 AM eth1 596385.00 1193096.00 38442.56 1763976.29 0.00 0.00 1.00
06:16:23 AM eth1 598155.00 1195978.00 38552.97 1768264.60 0.00 0.00 0.00
06:16:24 AM eth1 594405.00 1188643.00 38312.57 1757414.89 0.00 0.00 1.00
06:16:25 AM eth1 593366.00 1187154.00 38252.16 1755195.83 0.00 0.00 0.00
06:16:26 AM eth1 593188.00 1186118.00 38232.88 1753682.57 0.00 0.00 1.00
06:16:27 AM eth1 596301.00 1192241.00 38440.94 1762733.09 0.00 0.00 0.00
Average: eth1 595457.30 1190843.50 38381.69 1760664.84 0.00 0.00 0.50
lpaa23:~# ./tc -s -d qd sh dev eth1 | grep backlog
backlog 7606336b 2513p requeues 167982
backlog 224072b 74p requeues 566
backlog 581376b 192p requeues 5598
backlog 181680b 60p requeues 1070
backlog 5305056b 1753p requeues 110166 // Here, this TX queue is attracting flows
backlog 157456b 52p requeues 1758
backlog 672216b 222p requeues 3025
backlog 60560b 20p requeues 24541
backlog 448144b 148p requeues 21258

lpaa23:~# echo 1 >/proc/sys/net/ipv4/tcp_tsq_enable_tcp_wfree_ksoftirqd_detect

Immediate jump to full bandwidth, and traffic is properly
shard on all tx queues.

lpaa23:~# sar -n DEV 1 10 | grep eth1
06:16:46 AM eth1 1397632.00 2795397.00 90081.87 4133031.26 0.00 0.00 1.00
06:16:47 AM eth1 1396874.00 2793614.00 90032.99 4130385.46 0.00 0.00 0.00
06:16:48 AM eth1 1395842.00 2791600.00 89966.46 4127409.67 0.00 0.00 1.00
06:16:49 AM eth1 1395528.00 2791017.00 89946.17 4126551.24 0.00 0.00 0.00
06:16:50 AM eth1 1397891.00 2795716.00 90098.74 4133497.39 0.00 0.00 1.00
06:16:51 AM eth1 1394951.00 2789984.00 89908.96 4125022.51 0.00 0.00 0.00
06:16:52 AM eth1 1394608.00 2789190.00 89886.90 4123851.36 0.00 0.00 1.00
06:16:53 AM eth1 1395314.00 2790653.00 89934.33 4125983.09 0.00 0.00 0.00
06:16:54 AM eth1 1396115.00 2792276.00 89984.25 4128411.21 0.00 0.00 1.00
06:16:55 AM eth1 1396829.00 2793523.00 90030.19 4130250.28 0.00 0.00 0.00
Average: eth1 1396158.40 2792297.00 89987.09 4128439.35 0.00 0.00 0.50

lpaa23:~# tc -s -d qd sh dev eth1 | grep backlog
backlog 7900052b 2609p requeues 173287
backlog 878120b 290p requeues 589
backlog 1068884b 354p requeues 5621
backlog 996212b 329p requeues 1088
backlog 984100b 325p requeues 115316
backlog 956848b 316p requeues 1781
backlog 1080996b 357p requeues 3047
backlog 975016b 322p requeues 24571
backlog 990156b 327p requeues 21274

(All 8 TX queues get a fair share of the traffic)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9b462d02

Merge branch 'qlcnic' · 82b009f9

David S. Miller authored Oct 14, 2014

Rajesh Borundia says:

====================
qlcnic: Bug fixes

This series fixes following issues.

* We were programming maximum number of arguments supported by
  adapter instead of required in a command.
* Destroy tx command requires three arguments instead of two.

Please apply these patches to net.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

82b009f9

qlcnic: Fix number of arguments in destroy tx context command · d47d2fdd

Rajesh Borundia authored Oct 14, 2014

o Number of arguments taken by destroy tx command is three
  instead of two.
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d47d2fdd

qlcnic: Fix programming number of arguments in a command. · 2a1ef4b5

Rajesh Borundia authored Oct 14, 2014

o Initially we were programming maximum number of arguments.
  Instead we should program number of arguments required in
  a command.
o Maximum number of arguments for 82xx adapter is four. Fix it
  for GET_ESWITCH_STATS command.
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2a1ef4b5