Commits · 349ce993ac706869d553a1816426d3a4bfda02b1 · Kirill Smelkov / linux

25 Oct, 2014 5 commits

tcp: md5: do not use alloc_percpu() · 349ce993

Eric Dumazet authored Oct 23, 2014

percpu tcp_md5sig_pool contains memory blobs that ultimately
go through sg_set_buf().

-> sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));

This requires that whole area is in a physically contiguous portion
of memory. And that @buf is not backed by vmalloc().

Given that alloc_percpu() can use vmalloc() areas, this does not
fit the requirements.

Replace alloc_percpu() by a static DEFINE_PER_CPU(): tcp_md5sig_pool
is small anyway, so there is no gain in allocating it dynamically.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 765cf997 ("tcp: md5: remove one indirection level in tcp_md5sig_pool")
Reported-by: Crestez Dan Leonard <cdleonard@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'xen-netback' · 4cc40af0

David S. Miller authored Oct 25, 2014

David Vrabel says:

====================
xen-netback: guest Rx queue drain and stall fixes

This series fixes two critical xen-netback bugs.

1. Netback may consume all of host memory by queuing an unlimited
   number of skb on the internal guest Rx queue.  This behaviour is
   guest triggerable.

2. Carrier flapping under high traffic rates which reduces
   performance.

The first patch is a prerequisite.  Removing support for frontends without
feature-rx-notify makes it easier to reason about the correctness of
netback since it no longer has to support this outdated and broken
mode.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netback: reintroduce guest Rx stall detection · ecf08d2d

David Vrabel authored Oct 22, 2014

If a frontend is not receiving packets it is useful to detect this and
turn off the carrier so packets are dropped early instead of being
queued and drained when they expire.

A to-guest queue is stalled if it doesn't have enough free slots for
an extended period of time (default 60 s).

If at least one queue is stalled, the carrier is turned off (in the
expectation that the other queues will soon stall as well).  The
carrier is only turned on once all queues are ready.

When the frontend connects, all the queues start in the stalled state
and only become ready once the frontend queues enough Rx requests.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netback: fix unlimited guest Rx internal queue and carrier flapping · f48da8b1

David Vrabel authored Oct 22, 2014

Netback needs to discard old to-guest skb's (guest Rx queue drain) and
it needs to detect guest Rx stalls (to disable the carrier so packets are
discarded earlier), but the current implementation is very broken.

1. The check in hard_start_xmit of the slot availability did not
   consider the number of packets that were already in the guest Rx
   queue.  This could allow the queue to grow without bound.

   The guest stops consuming packets and the ring is allowed to fill,
   leaving S slots free.  Netback queues a packet requiring more than S
   slots (ensuring that the ring stays with S slots free).  Netback
   then queues packets indefinitely, provided that they require S or
   fewer slots.

2. The Rx stall detection is not triggered in this case since the
   (host) Tx queue is not stopped.

3. If the Tx queue is stopped and a guest Rx interrupt occurs, netback
   will consider this an Rx purge event which may result in it taking
   the carrier down unnecessarily.  It also considers a queue with
   only 1 slot free as unstalled (even though the next packet might
   not fit in this).

The internal guest Rx queue is limited by a byte length (to 512 KiB,
enough for half the ring).  The (host) Tx queue is stopped and started
based on this limit.  This sets an upper bound on the amount of memory
used by packets on the internal queue.

This allows the estimation of the number of slots for an skb to be
removed (it wasn't a very good estimate anyway).  Instead, the guest
Rx thread just waits for enough free slots for a maximum sized packet.

skbs queued on the internal queue have an 'expires' time (set to the
current time plus the drain timeout).  The guest Rx thread will detect
when the skb at the head of the queue has expired and discard expired
skbs.  This sets a clear upper bound on the length of time an skb can
be queued for.  For a guest being destroyed the maximum time needed to
wait for all the packets it sent to be dropped is still the drain
timeout (10 s) since it will not be sending new packets.

Rx stall detection is reintroduced in a later commit.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
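
A minimal userspace sketch of the queueing policy described above: a byte
limit that stops and restarts Tx, plus a per-packet expiry that bounds how
long anything stays queued.  This is a toy model under stated assumptions,
not the netback code; struct rx_queue, rx_queue_enqueue() and
rx_queue_drop_expired() are hypothetical names, and time is a plain counter
in seconds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names come from netback. */
#define RX_QUEUE_MAX_BYTES (512u * 1024u)  /* byte limit from the commit text */
#define RX_DRAIN_TIMEOUT   10u             /* drain timeout in seconds */
#define RX_QUEUE_SLOTS     1024u           /* capacity of this toy ring buffer */

struct pkt {
    size_t   len;      /* bytes accounted against the limit */
    uint64_t expires;  /* enqueue time plus the drain timeout */
};

struct rx_queue {
    struct pkt pkts[RX_QUEUE_SLOTS];
    size_t head, count;
    size_t bytes;       /* total bytes currently queued */
    bool   tx_stopped;  /* models the stopped (host) Tx queue */
};

/* Queue a packet unless Tx is stopped; stop Tx once the byte limit is hit. */
static bool rx_queue_enqueue(struct rx_queue *q, size_t len, uint64_t now)
{
    struct pkt *p;

    if (q->tx_stopped || q->count == RX_QUEUE_SLOTS)
        return false;

    p = &q->pkts[(q->head + q->count) % RX_QUEUE_SLOTS];
    p->len = len;
    p->expires = now + RX_DRAIN_TIMEOUT;
    q->count++;
    q->bytes += len;

    if (q->bytes >= RX_QUEUE_MAX_BYTES)
        q->tx_stopped = true;
    return true;
}

/* Drop expired packets from the head and restart Tx when below the limit.
 * Returns the number of packets dropped. */
static size_t rx_queue_drop_expired(struct rx_queue *q, uint64_t now)
{
    size_t dropped = 0;

    while (q->count > 0 && q->pkts[q->head].expires <= now) {
        q->bytes -= q->pkts[q->head].len;
        q->head = (q->head + 1) % RX_QUEUE_SLOTS;
        q->count--;
        dropped++;
    }
    if (q->bytes < RX_QUEUE_MAX_BYTES)
        q->tx_stopped = false;
    return dropped;
}
```

Because every packet gets the same timeout and packets are queued in time
order, only the head of the queue ever needs to be checked for expiry.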

xen-netback: make feature-rx-notify mandatory · bc96f648

David Vrabel authored Oct 22, 2014

Frontends that do not provide feature-rx-notify may stall because
netback depends on the notification from frontend to wake the guest Rx
thread (even if can_queue is false).

This could be fixed but feature-rx-notify was introduced in 2006 and I
am not aware of any frontends that do not implement this.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24 Oct, 2014 1 commit

ptp: restore the makefile for building the test program. · 5345c1d4

Richard Cochran authored Oct 22, 2014

This patch brings back the makefile called testptp.mk which was removed
in commit adb19fb6 (Documentation: add makefiles for more targets).

While the idea of that commit was to improve build coverage of the
examples, the new Makefile is unable to cross compile the testptp program.
In contrast, the deleted makefile was able to do this just fine.

This patch fixes the regression by restoring the original makefile.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Peter Foley <pefoley2@pefoley.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22 Oct, 2014 13 commits

hyperv: Fix the total_data_buflen in send path · 942396b0

Haiyang Zhang authored Oct 22, 2014

total_data_buflen is used by netvsc_send() to decide if a packet can be put
into send buffer. It should also include the size of RNDIS message before the
Ethernet frame. Otherwise, a message with total size bigger than send_section_size
may be copied into the send buffer, and cause data corruption.

[Request to include this patch to the Stable branches]
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'amd-xgbe' · f765678e

David S. Miller authored Oct 22, 2014

Tom Lendacky says:

====================
amd-xgbe: AMD XGBE driver fixes 2014-10-22

The following series of patches includes fixes to the driver.

- Properly handle feature changes via ethtool by using correctly sized
  variables
- Perform proper napi packet counting and budget checking

This patch series is based on net.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Fix napi Rx budget accounting · 55ca6bcd

Lendacky, Thomas authored Oct 22, 2014

Currently the amd-xgbe driver increments the packets processed counter
each time a descriptor is processed.  Since a packet can be represented
by more than one descriptor incrementing the counter in this way is not
appropriate.  Also, since multiple descriptors cause the budget check
to be short circuited, sometimes the returned value from the poll
function would be larger than the budget value resulting in a WARN_ONCE
being triggered.

Update the polling logic to properly account for the number of packets
processed and exit when the budget value is reached.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
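
A rough sketch of the accounting described above, assuming a hypothetical
descriptor with a 'last' flag that marks the end of a packet.  struct desc
and rx_poll() are illustrative names, not the driver's; the point is only
that the counter advances once per packet and the loop stops at the budget.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor: 'last' marks the final descriptor of a packet. */
struct desc {
    bool last;
};

/*
 * Walk the ring and return the number of complete packets processed.
 * The counter advances once per packet, not once per descriptor, and
 * the loop exits as soon as the budget is reached, so the return value
 * can never exceed the budget.
 */
static int rx_poll(const struct desc *ring, size_t ndesc, int budget,
                   size_t *consumed)
{
    int packets = 0;
    size_t i = 0;

    while (i < ndesc && packets < budget) {
        if (ring[i].last)
            packets++;
        i++;
    }
    *consumed = i;  /* descriptors walked, which may exceed 'packets' */
    return packets;
}
```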

amd-xgbe: Properly handle feature changes via ethtool · 386f1c96

Lendacky, Thomas authored Oct 22, 2014

The ndo_set_features callback function was improperly using an unsigned
int to save the current feature value for features such as NETIF_F_RXCSUM.
Since that feature is in the upper 32 bits of a 64 bit variable the
result was always 0 making it not possible to actually turn off the
hardware RX checksum support. Change the unsigned int type to the
netdev_features_t type in order to properly capture the current value
and perform the proper operation.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
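
The truncation described above can be reproduced in a few lines of
userspace C.  This is only an illustration: features_t stands in for
netdev_features_t, and FEATURE_RXCSUM is a hypothetical flag placed at
bit 35 simply because it lies above bit 31.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t features_t;                  /* stand-in for netdev_features_t */
#define FEATURE_RXCSUM ((features_t)1 << 35)  /* hypothetical bit above bit 31 */

/* Buggy pattern: the masked value is narrowed to 32 bits, so the flag is lost. */
static bool rxcsum_enabled_truncated(features_t current)
{
    uint32_t rxcsum = (uint32_t)(current & FEATURE_RXCSUM);

    return rxcsum != 0;
}

/* Fixed pattern: keep the full-width type for the saved value. */
static bool rxcsum_enabled(features_t current)
{
    features_t rxcsum = current & FEATURE_RXCSUM;

    return rxcsum != 0;
}
```

With the narrow type the saved value is 0 whether or not the feature is
set, which matches the symptom in the commit message.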

net: fec: ptp: fix NULL pointer dereference if ptp_clock is not set · 81f35ffd

Philipp Zabel authored Oct 22, 2014

Since commit 278d2404 (net: fec: ptp: Enable PPS output based on ptp clock)
fec_enet_interrupt calls fec_ptp_check_pps_event unconditionally, which calls
into ptp_clock_event. If fep->ptp_clock is NULL, ptp_clock_event tries to
dereference the NULL pointer.
Since on i.MX53 fep->bufdesc_ex is not set, fec_ptp_init is never called,
and fep->ptp_clock is NULL, which reliably causes a kernel panic.

This patch adds a check for fep->ptp_clock == NULL in fec_enet_interrupt.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix saving TX flow hash in sock for outgoing connections · 9e7ceb06

Sathya Perla authored Oct 22, 2014

The commit "net: Save TX flow hash in sock and set in skbuf on xmit"
introduced the inet_set_txhash() and ip6_set_txhash() routines to calculate
and record flow hash(sk_txhash) in the socket structure. sk_txhash is used
to set skb->hash which is used to spread flows across multiple TXQs.

But, the above routines are invoked before the source port of the connection
is created. Because of this all outgoing connections that just differ in the
source port get hashed into the same TXQ.

This patch fixes this problem for IPv4/6 by invoking the above routines
after the source port is available for the socket.

Fixes: b73c3d0e ("net: Save TX flow hash in sock and set in skbuf on xmit")
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
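
A small sketch of why the ordering matters, assuming a toy 4-tuple and a
toy mixing function.  struct flow, flow_hash() and the two connect_*
helpers are hypothetical and unrelated to the kernel's actual flow hash;
they only show that hashing before the source port is chosen gives every
connection to the same destination the same value.

```c
#include <stdint.h>

/* Hypothetical 4-tuple and mixing function; not the kernel's flow hash. */
struct flow {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

static uint32_t flow_hash(const struct flow *f)
{
    uint32_t h = 2166136261u;  /* FNV-style mixing, one 32-bit word at a time */
    uint32_t words[3] = { f->saddr, f->daddr,
                          ((uint32_t)f->sport << 16) | f->dport };

    for (int i = 0; i < 3; i++) {
        h ^= words[i];
        h *= 16777619u;
    }
    return h;
}

/* Buggy order: hash first, then pick the source port. */
static uint32_t connect_hash_early(struct flow *f, uint16_t sport)
{
    uint32_t h = flow_hash(f);  /* sport is still 0 here */

    f->sport = sport;
    return h;
}

/* Fixed order: pick the source port, then hash. */
static uint32_t connect_hash_late(struct flow *f, uint16_t sport)
{
    f->sport = sport;
    return flow_hash(f);
}
```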

xfrm6: fix a potential use after free in xfrm6_policy.c · 789f2023

Li RongQing authored Oct 22, 2014

pskb_may_pull() may change skb->data and make the nh and exthdr pointers
obsolete, so recompute nh and exthdr.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
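
The general pattern can be sketched in userspace as follows.  struct buf
and buf_may_pull() are hypothetical stand-ins that only model "the data
pointer may move"; they are not the skb API.  The rule being illustrated
is to keep an offset and rebuild the pointer from the base after any call
that may relocate the data.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer whose storage may move, loosely modelling skb->data. */
struct buf {
    uint8_t *data;
    size_t   len;
};

/* Grow the buffer to at least 'need' bytes; the data pointer may change. */
static bool buf_may_pull(struct buf *b, size_t need)
{
    uint8_t *p;

    if (need <= b->len)
        return true;

    /* Always move to fresh storage so that stale pointers are really stale. */
    p = calloc(1, need);
    if (!p)
        return false;
    memcpy(p, b->data, b->len);
    free(b->data);
    b->data = p;
    b->len = need;
    return true;
}

/* Read the byte at 'offset' after making sure it is present. */
static int buf_read_at(struct buf *b, size_t offset)
{
    const uint8_t *hdr;

    if (!buf_may_pull(b, offset + 1))
        return -1;

    /* Recompute the pointer from the (possibly new) base after the pull. */
    hdr = b->data + offset;
    return *hdr;
}
```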

net: fs_enet: set back promiscuity mode after restart · 8751b12c

LEROY Christophe authored Oct 22, 2014

After interface restart (eg: after link disconnection/reconnection), the bridge
function doesn't work anymore. This is due to the promiscuous mode being cleared
by the restart.

The mac-fcc already includes code to set the promiscuous mode back during the restart.
This patch adds the same handling to mac-fec and mac-scc.

Tested with bridge function on MPC885 with FEC.
Reported-by: Germain Montoies <germain.montoies@c-s.fr>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: tso: fix unaligned access to crafted TCP header in helper API · a63ba13e

Karl Beldan authored Oct 21, 2014

The crafted header start address is from a driver supplied buffer, which
one can reasonably expect to be aligned on a 4-byte boundary.
However, at the moment the TSO helper API is only used by Ethernet drivers,
and the TCP header will then be aligned only to a 2-byte boundary from the
header start address.
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
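
The arithmetic behind this can be sketched as follows, assuming a plain
14-byte Ethernet header and a 20-byte IPv4 header without options.  The
byte-wise store is one common way to avoid an aligned 32-bit access;
store_be32_unaligned() and the enum names are hypothetical, and the commit
message does not say which helper the patch itself uses.

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value big-endian, one byte at a time,
 * so that no aligned 32-bit store is ever issued. */
static void store_be32_unaligned(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

enum {
    ETH_HDR_LEN = 14,  /* Ethernet header without VLAN tag (assumption) */
    IP_HDR_LEN  = 20,  /* IPv4 header without options (assumption) */
    TCP_SEQ_OFF = 4,   /* offset of the sequence number in the TCP header */
};

/* Offset of the TCP sequence number from the start of the crafted header. */
static unsigned int tcp_seq_offset(void)
{
    return ETH_HDR_LEN + IP_HDR_LEN + TCP_SEQ_OFF;
}
```

With a 4-byte aligned header start, the sequence number lands at offset
38, which is 2 modulo 4, so a direct 32-bit store there is unaligned.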

sfc: remove incorrect EFX_BUG_ON_PARANOID check · 8fc96351

Jon Cooper authored Oct 21, 2014

write_count and insert_count can wrap around, making > check invalid.

Fixes: 70b33fb0 ("sfc: add support for skb->xmit_more").
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
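
A short illustration of why a plain '>' comparison on free-running
counters is unreliable.  The commit simply removes the check; the
wrap-safe variant below is a common alternative shown for contrast, not
what the patch does, and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Naive check: gives the wrong answer once insert_count has wrapped past zero. */
static bool ahead_naive(uint32_t insert_count, uint32_t write_count)
{
    return insert_count > write_count;
}

/* Wrap-safe check: unsigned subtraction is modulo 2^32, so the distance is
 * correct as long as insert_count is less than 2^31 ahead of write_count. */
static bool ahead_wrap_safe(uint32_t insert_count, uint32_t write_count)
{
    uint32_t distance = insert_count - write_count;

    return distance != 0 && distance < 0x80000000u;
}
```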

net: sched: initialize bstats syncp · 7c1c97d5

Sabrina Dubroca authored Oct 21, 2014

Use netdev_alloc_pcpu_stats to allocate percpu stats and initialize syncp.

Fixes: 22e0f8b9 "net: sched: make bstats per cpu and estimator RCU safe"
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: fix bug in eBPF verifier · 32bf08a6

Alexei Starovoitov authored Oct 20, 2014

While comparing verifier states for equivalence, the comparison
was missing a check for uninitialized registers.
Make sure it performs that check and add a testcase.

Fixes: f1bca824 ("bpf: add search pruning optimization to verifier")
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: Re-add locking to netlink_lookup() and seq walker · 78fd1d0a

Thomas Graf authored Oct 21, 2014

The synchronize_rcu() in netlink_release() introduces unacceptable
latency. Reintroduce minimal lookup so we can drop the
synchronize_rcu() until socket destruction has been RCUfied.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com>
Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

21 Oct, 2014 5 commits

tipc: fix lockdep warning when intra-node messages are delivered · 1a194c2d

Ying Xue authored Oct 20, 2014

When running the tipcTC and tipcTS test suites, the following lockdep
unsafe locking scenario is reported:

[ 1109.997854]
[ 1109.997988] =================================
[ 1109.998290] [ INFO: inconsistent lock state ]
[ 1109.998575] 3.17.0-rc1+ #113 Not tainted
[ 1109.998762] ---------------------------------
[ 1109.998762] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 1109.998762] swapper/7/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 1109.998762]  (slock-AF_TIPC){+.?...}, at: [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762] {SOFTIRQ-ON-W} state was registered at:
[ 1109.998762]   [<ffffffff810a4770>] __lock_acquire+0x6a0/0x1d80
[ 1109.998762]   [<ffffffff810a6555>] lock_acquire+0x95/0x1e0
[ 1109.998762]   [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80
[ 1109.998762]   [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762]   [<ffffffffa0004fe8>] tipc_link_xmit+0xa8/0xc0 [tipc]
[ 1109.998762]   [<ffffffffa000ec6f>] tipc_sendmsg+0x15f/0x550 [tipc]
[ 1109.998762]   [<ffffffffa000f165>] tipc_connect+0x105/0x140 [tipc]
[ 1109.998762]   [<ffffffff817676ee>] SYSC_connect+0xae/0xc0
[ 1109.998762]   [<ffffffff81767b7e>] SyS_connect+0xe/0x10
[ 1109.998762]   [<ffffffff817a9788>] compat_SyS_socketcall+0xb8/0x200
[ 1109.998762]   [<ffffffff81a306e5>] sysenter_dispatch+0x7/0x1f
[ 1109.998762] irq event stamp: 241060
[ 1109.998762] hardirqs last  enabled at (241060): [<ffffffff8105a4ad>] __local_bh_enable_ip+0x6d/0xd0
[ 1109.998762] hardirqs last disabled at (241059): [<ffffffff8105a46f>] __local_bh_enable_ip+0x2f/0xd0
[ 1109.998762] softirqs last  enabled at (241020): [<ffffffff81059a52>] _local_bh_enable+0x22/0x50
[ 1109.998762] softirqs last disabled at (241021): [<ffffffff8105a626>] irq_exit+0x96/0xc0
[ 1109.998762]
[ 1109.998762] other info that might help us debug this:
[ 1109.998762]  Possible unsafe locking scenario:
[ 1109.998762]
[ 1109.998762]        CPU0
[ 1109.998762]        ----
[ 1109.998762]   lock(slock-AF_TIPC);
[ 1109.998762]   <Interrupt>
[ 1109.998762]     lock(slock-AF_TIPC);
[ 1109.998762]
[ 1109.998762]  *** DEADLOCK ***
[ 1109.998762]
[ 1109.998762] 2 locks held by swapper/7/0:
[ 1109.998762]  #0:  (rcu_read_lock){......}, at: [<ffffffff81782dc9>] __netif_receive_skb_core+0x69/0xb70
[ 1109.998762]  #1:  (rcu_read_lock){......}, at: [<ffffffffa0001c90>] tipc_l2_rcv_msg+0x40/0x260 [tipc]
[ 1109.998762]
[ 1109.998762] stack backtrace:
[ 1109.998762] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.17.0-rc1+ #113
[ 1109.998762] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 1109.998762]  ffffffff82745830 ffff880016c03828 ffffffff81a209eb 0000000000000007
[ 1109.998762]  ffff880017b3cac0 ffff880016c03888 ffffffff81a1c5ef 0000000000000001
[ 1109.998762]  ffff880000000001 ffff880000000000 ffffffff81012d4f 0000000000000000
[ 1109.998762] Call Trace:
[ 1109.998762]  <IRQ>  [<ffffffff81a209eb>] dump_stack+0x4e/0x68
[ 1109.998762]  [<ffffffff81a1c5ef>] print_usage_bug+0x1f1/0x202
[ 1109.998762]  [<ffffffff81012d4f>] ? save_stack_trace+0x2f/0x50
[ 1109.998762]  [<ffffffff810a406c>] mark_lock+0x28c/0x2f0
[ 1109.998762]  [<ffffffff810a3440>] ? print_irq_inversion_bug.part.46+0x1f0/0x1f0
[ 1109.998762]  [<ffffffff810a467d>] __lock_acquire+0x5ad/0x1d80
[ 1109.998762]  [<ffffffff810a70dd>] ? trace_hardirqs_on+0xd/0x10
[ 1109.998762]  [<ffffffff8108ace8>] ? sched_clock_cpu+0x98/0xc0
[ 1109.998762]  [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30
[ 1109.998762]  [<ffffffff810a10dc>] ? lock_release_holdtime.part.29+0x1c/0x1a0
[ 1109.998762]  [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90
[ 1109.998762]  [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc]
[ 1109.998762]  [<ffffffff810a6555>] lock_acquire+0x95/0x1e0
[ 1109.998762]  [<ffffffffa0011969>] ? tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762]  [<ffffffff810a6fb6>] ? trace_hardirqs_on_caller+0xa6/0x1c0
[ 1109.998762]  [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80
[ 1109.998762]  [<ffffffffa0011969>] ? tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762]  [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc]
[ 1109.998762]  [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc]
[ 1109.998762]  [<ffffffffa00076bd>] tipc_rcv+0x5ed/0x960 [tipc]
[ 1109.998762]  [<ffffffffa0001d1c>] tipc_l2_rcv_msg+0xcc/0x260 [tipc]
[ 1109.998762]  [<ffffffffa0001c90>] ? tipc_l2_rcv_msg+0x40/0x260 [tipc]
[ 1109.998762]  [<ffffffff81783345>] __netif_receive_skb_core+0x5e5/0xb70
[ 1109.998762]  [<ffffffff81782dc9>] ? __netif_receive_skb_core+0x69/0xb70
[ 1109.998762]  [<ffffffff81784eb9>] ? dev_gro_receive+0x259/0x4e0
[ 1109.998762]  [<ffffffff817838f6>] __netif_receive_skb+0x26/0x70
[ 1109.998762]  [<ffffffff81783acd>] netif_receive_skb_internal+0x2d/0x1f0
[ 1109.998762]  [<ffffffff81785518>] napi_gro_receive+0xd8/0x240
[ 1109.998762]  [<ffffffff815bf854>] e1000_clean_rx_irq+0x2c4/0x530
[ 1109.998762]  [<ffffffff815c1a46>] e1000_clean+0x266/0x9c0
[ 1109.998762]  [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30
[ 1109.998762]  [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90
[ 1109.998762]  [<ffffffff817842b1>] net_rx_action+0x141/0x310
[ 1109.998762]  [<ffffffff810bd710>] ? handle_fasteoi_irq+0xe0/0x150
[ 1109.998762]  [<ffffffff81059fa6>] __do_softirq+0x116/0x4d0
[ 1109.998762]  [<ffffffff8105a626>] irq_exit+0x96/0xc0
[ 1109.998762]  [<ffffffff81a30d07>] do_IRQ+0x67/0x110
[ 1109.998762]  [<ffffffff81a2ee2f>] common_interrupt+0x6f/0x6f
[ 1109.998762]  <EOI>  [<ffffffff8100d2b7>] ? default_idle+0x37/0x250
[ 1109.998762]  [<ffffffff8100d2b5>] ? default_idle+0x35/0x250
[ 1109.998762]  [<ffffffff8100dd1f>] arch_cpu_idle+0xf/0x20
[ 1109.998762]  [<ffffffff810999fd>] cpu_startup_entry+0x27d/0x4d0
[ 1109.998762]  [<ffffffff81034c78>] start_secondary+0x188/0x1f0

When intra-node messages are delivered from one process to another
process, tipc_link_xmit() doesn't disable BH before it directly calls
tipc_sk_rcv() in process context to forward messages to the destination
socket. Meanwhile, if messages delivered by a remote node arrive at the
node and their destination is the same socket, tipc_sk_rcv()
running in process context might be preempted by tipc_sk_rcv() running
in BH context. As a result, the latter cannot obtain the socket lock as
the lock was obtained by the former; however, the former has no chance
to run as the latter owns the CPU now, so a deadlock happens. To
avoid it, BH should always be disabled in tipc_sk_rcv().
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: fix a potential deadlock · 7b8613e0

Ying Xue authored Oct 20, 2014

The lock dependency checker detected the possible unsafe locking scenario below:

           CPU0                          CPU1
T0:  tipc_named_rcv()                tipc_rcv()
T1:  [grab nametbl write lock]*      [grab node lock]*
T2:  tipc_update_nametbl()           tipc_node_link_up()
T3:  tipc_nodesub_subscribe()        tipc_nametbl_publish()
T4:  [grab node lock]*               [grab nametbl write lock]*

The opposite order of holding nametbl write lock and node lock on
above two different paths may result in a deadlock. If we move the
updating of the name table after link state named out of node
lock, the reverse order of holding locks will be eliminated, and
with it the deadlock risk.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'enic' · 73829bf6

David S. Miller authored Oct 21, 2014

Govindarajulu Varadarajan says:

====================
enic: Bug fixes

This series fixes the following problem.

Please apply this to net.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

enic: Do not call napi_disable when preemption is disabled. · 39dc90c1

Govindarajulu Varadarajan authored Oct 19, 2014

In enic_stop, we disable preemption using local_bh_disable(). We disable
preemption to wait for busy_poll to finish.

napi_disable should not be called here as it might sleep.

Move the napi_disable() call outside of the local_bh_disable() section.

BUG: sleeping function called from invalid context at include/linux/netdevice.h:477
in_atomic(): 1, irqs_disabled(): 0, pid: 443, name: ifconfig
INFO: lockdep is turned off.
Preemption disabled at:[<ffffffffa029c5c4>] enic_rfs_flw_tbl_free+0x34/0xd0 [enic]

CPU: 31 PID: 443 Comm: ifconfig Not tainted 3.17.0-netnext-05504-g59f35b81 #268
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 ffff8800dac10000 ffff88020b8dfcb8 ffffffff8148a57c 0000000000000000
 ffff88020b8dfcd0 ffffffff8107e253 ffff8800dac12a40 ffff88020b8dfd10
 ffffffffa029305b ffff88020b8dfd48 ffff8800dac10000 ffff88020b8dfd48
Call Trace:
 [<ffffffff8148a57c>] dump_stack+0x4e/0x7a
 [<ffffffff8107e253>] __might_sleep+0x123/0x1a0
 [<ffffffffa029305b>] enic_stop+0xdb/0x4d0 [enic]
 [<ffffffff8138ed7d>] __dev_close_many+0x9d/0xf0
 [<ffffffff8138ef81>] __dev_close+0x31/0x50
 [<ffffffff813974a8>] __dev_change_flags+0x98/0x160
 [<ffffffff81397594>] dev_change_flags+0x24/0x60
 [<ffffffff814085fd>] devinet_ioctl+0x63d/0x710
 [<ffffffff81139c16>] ? might_fault+0x56/0xc0
 [<ffffffff81409ef5>] inet_ioctl+0x65/0x90
 [<ffffffff813768e0>] sock_do_ioctl+0x20/0x50
 [<ffffffff81376ebb>] sock_ioctl+0x20b/0x2e0
 [<ffffffff81197250>] do_vfs_ioctl+0x2e0/0x500
 [<ffffffff81492619>] ? sysret_check+0x22/0x5d
 [<ffffffff81285f23>] ? __this_cpu_preempt_check+0x13/0x20
 [<ffffffff8109fe19>] ? trace_hardirqs_on_caller+0x119/0x270
 [<ffffffff811974ac>] SyS_ioctl+0x3c/0x80
 [<ffffffff814925ed>] system_call_fastpath+0x1a/0x1f
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

enic: fix possible deadlock in enic_stop/ enic_rfs_flw_tbl_free · b6931c9b

Govindarajulu Varadarajan authored Oct 19, 2014

The following warning is shown when spinlock debug is enabled.

This occurs when enic_flow_may_expire timer function is running and
enic_stop is called on same CPU.

Fix this by using spin_lock_bh().

=================================
[ INFO: inconsistent lock state ]
3.17.0-netnext-05504-g59f35b81 #268 Not tainted
---------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
ifconfig/443 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (&(&enic->rfs_h.lock)->rlock){+.?...}, at:
enic_rfs_flw_tbl_free+0x34/0xd0 [enic]
{IN-SOFTIRQ-W} state was registered at:
  [<ffffffff810a25af>] __lock_acquire+0x83f/0x21c0
  [<ffffffff810a45f2>] lock_acquire+0xa2/0xd0
  [<ffffffff814913fc>] _raw_spin_lock+0x3c/0x80
  [<ffffffffa029c3d5>] enic_flow_may_expire+0x25/0x130[enic]
  [<ffffffff810bcd07>] call_timer_fn+0x77/0x100
  [<ffffffff810bd8e3>] run_timer_softirq+0x1e3/0x270
  [<ffffffff8105f9ae>] __do_softirq+0x14e/0x280
  [<ffffffff8105fdae>] irq_exit+0x8e/0xb0
  [<ffffffff8103da0f>] smp_apic_timer_interrupt+0x3f/0x50
  [<ffffffff81493742>] apic_timer_interrupt+0x72/0x80
  [<ffffffff81018143>] default_idle+0x13/0x20
  [<ffffffff81018a6a>] arch_cpu_idle+0xa/0x10
  [<ffffffff81097676>] cpu_startup_entry+0x2c6/0x330
  [<ffffffff8103b7ad>] start_secondary+0x21d/0x290
irq event stamp: 2997
hardirqs last  enabled at (2997): [<ffffffff81491865>] _raw_spin_unlock_irqrestore+0x65/0x90
hardirqs last disabled at (2996): [<ffffffff814915e6>] _raw_spin_lock_irqsave+0x26/0x90
softirqs last  enabled at (2968): [<ffffffff813b57a3>] dev_deactivate_many+0x213/0x260
softirqs last disabled at (2966): [<ffffffff813b5783>] dev_deactivate_many+0x1f3/0x260

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&enic->rfs_h.lock)->rlock);
  <Interrupt>
    lock(&(&enic->rfs_h.lock)->rlock);

 *** DEADLOCK ***
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Oct, 2014 6 commits

Merge branch 'gso_encap_fixes' · d10845fc

David S. Miller authored Oct 20, 2014

Florian Westphal says:

====================
net: minor gso encapsulation fixes

The following series fixes a minor bug in the gso segmentation handlers
when encapsulation offload is used.

Theoretically this could cause kernel panic when the stack tries
to software-segment such a GRE offload packet, but it looks like there
is only one affected call site (tbf scheduler) and it handles NULL
return value.

I've included a followup patch to add IS_ERR_OR_NULL checks where needed.

While looking into this, I also found that size computation of the individual
segments is incorrect if skb->encapsulation is set.

Please see individual patches for delta vs. v1.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: core: handle encapsulation offloads when computing segment lengths · f993bc25

Florian Westphal authored Oct 20, 2014

if ->encapsulation is set we have to use inner_tcp_hdrlen and add the
size of the inner network headers too.

This is 'mostly harmless'; tbf might send skb that is slightly over
quota or drop skb even if it would have fit.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: make skb_gso_segment error handling more robust · 330966e5

Florian Westphal authored Oct 20, 2014

skb_gso_segment has three possible return values:
1. a pointer to the first segmented skb
2. an errno value (IS_ERR())
3. NULL.  This can happen when GSO is used for header verification.

However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL
and would oops when NULL is returned.

Note that these call sites should never actually see such a NULL return
value; all callers mask out the GSO bits in the feature argument.

However, there have been issues with some protocol handlers erroneously not
respecting the specified feature mask in some cases.

It is preferable to get 'have to turn off hw offloading, else slow' reports
rather than 'kernel crashes'.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
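
The three return cases listed above can be modelled in userspace as
follows.  The ERR_PTR(), IS_ERR() and IS_ERR_OR_NULL() definitions here
are simplified stand-ins modelled on the kernel helpers of the same name,
and segment() is a hypothetical function used only to produce each case.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins modelled on the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline bool IS_ERR(const void *p)
{
    /* Error pointers occupy the top MAX_ERRNO values of the address space. */
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

static inline bool IS_ERR_OR_NULL(const void *p)
{
    return !p || IS_ERR(p);
}

struct seg {
    int len;
};

/* Hypothetical segmenter with the three outcomes listed above. */
static struct seg *segment(int mode, struct seg *first)
{
    if (mode < 0)
        return ERR_PTR(-EINVAL);  /* errno value encoded in the pointer */
    if (mode == 0)
        return NULL;              /* nothing was segmented */
    return first;                 /* pointer to the first segment */
}

/* The result is safe to dereference only when this returns true. */
static bool segment_ok(int mode, struct seg *first)
{
    struct seg *segs = segment(mode, first);

    return !IS_ERR_OR_NULL(segs);
}
```

A caller that tests only IS_ERR() would treat the NULL case as a valid
pointer, which is the oops the commit guards against.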

net: gso: use feature flag argument in all protocol gso handlers · 1e16aa3d

Florian Westphal authored Oct 20, 2014

skb_gso_segment() has a 'features' argument representing offload features
available to the output path.

A few handlers, e.g. GRE, instead re-fetch the features of skb->dev and use
those instead of the provided ones when handling encapsulation/tunnels.

Depending on dev->hw_enc_features of the output device skb_gso_segment() can
then return NULL even when the caller has disabled all GSO feature bits,
as segmentation of inner header thinks device will take care of segmentation.

This e.g. affects the tbf scheduler, which will silently drop GRE-encap GSO skbs
that did not fit the remaining token quota as the segmentation does not work
when device supports corresponding hw offload capabilities.

Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · ce8ec489

David S. Miller authored Oct 20, 2014

Pablo Neira Ayuso says:

====================
netfilter fixes for net

The following patchset contains netfilter fixes for your net tree,
they are:

1) Fix missing MODULE_LICENSE() in the new nf_reject_ipv{4,6} modules.

2) Restrict nat and masq expressions to the nat chain type. Otherwise,
   users may crash their kernel if they attach a nat/masq rule to a non
   nat chain.

3) Fix hook validation in nft_compat when non-base chains are used.
   Basically, initialize hook_mask to zero.

4) Make sure you use match/targets in nft_compat from the right chain
   type. The existing validation relies on the table name which can be
   avoided by

5) Better netlink attribute validation in nft_nat. This expression has
   to reject the configuration when no address and proto configurations
   are specified.

6) Interpret NFTA_NAT_REG_*_MAX only if NFTA_NAT_REG_*_MIN is set.
   Yet another sanity check to reject incorrect configurations from
   userspace.

7) Conditional NAT attribute dumping depending on the existing
   configuration.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ax88179_178a: fix bonding failure · 95ff8868

Ian Morgan authored Oct 19, 2014

The following patch fixes a bug which causes the ax88179_178a driver to be
incapable of being added to a bond.

When I brought up the issue with the bonding maintainers, they indicated
that the real problem was with the NIC driver which must return zero for
success (of setting the MAC address). I see that several other NIC drivers
follow that pattern by either simply always returning zero, or by passing
through a negative (error) result while rewriting any positive return code
to zero. With that same philosophy applied to the ax88179_178a driver, it
allows it to work correctly with the bonding driver.

I believe this is suitable for queuing in -stable, as it's a small, simple,
and obvious fix that corrects a defect with no other known workaround.

This patch is against vanilla 3.17(.0).
Signed-off-by: Ian Morgan <imorgan@primordial.ca>

drivers/net/usb/ax88179_178a.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
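
The return-code convention described above amounts to the following
pattern.  normalize_set_mac_result() is a hypothetical helper written for
illustration; 'ret' stands for whatever the lower-level write returned.

```c
/* Hypothetical wrapper: 'ret' is whatever the lower-level write returned. */
static int normalize_set_mac_result(int ret)
{
    if (ret < 0)
        return ret;  /* pass errors through unchanged */
    return 0;        /* any non-negative result is reported as success */
}
```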

19 Oct, 2014 10 commits

Merge tag 'ntb-3.18' of git://github.com/jonmason/ntb · 61ed53de

Linus Torvalds authored Oct 19, 2014

Pull ntb (non-transparent bridge) updates from Jon Mason:
 "Add support for Haswell NTB split BARs, a debugfs entry for basic
  debugging info, and some code clean-ups"

* tag 'ntb-3.18' of git://github.com/jonmason/ntb:
  ntb: Adding split BAR support for Haswell platforms
  ntb: use errata flag set via DID to implement workaround
  ntb: consolidate reading of PPD to move platform detection earlier
  ntb: move platform detection to separate function
  NTB: debugfs device entry

61ed53de

Merge branch 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 278f1d07

Linus Torvalds authored Oct 19, 2014

Pull i2c updates from Wolfram Sang:
 "Highlights from the I2C subsystem for 3.18:

   - new drivers for Axxia AM55xx, and Hisilicon hix5hd2 SoC.

   - designware driver gained AMD support, exynos gained exynos7 support

  The rest is usual driver stuff.  Hopefully no lowlights this time"

* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: i801: Add Device IDs for Intel Sunrise Point PCH
  i2c: hix5hd2: add i2c controller driver
  i2c-imx: Disable the clock on probe failure
  i2c: designware: Add support for AMD I2C controller
  i2c: designware: Rework probe() to get clock a bit later
  i2c: designware: Default to fast mode in case of ACPI
  i2c: axxia: Add I2C driver for AXM55xx
  i2c: exynos: add support for HSI2C module on Exynos7
  i2c: mxs: detect No Slave Ack on SELECT in PIO mode
  i2c: cros_ec: Remove EC_I2C_FLAG_10BIT
  i2c: cros-ec-tunnel: Add of match table
  i2c: rcar: remove sign-compare flaw
  i2c: ismt: Use minimum descriptor size
  i2c: imx: Add arbitration lost check
  i2c: rk3x: Remove unlikely() annotations
  i2c: rcar: check for no IRQ in rcar_i2c_irq()
  i2c: rcar: make rcar_i2c_prepare_msg() *void*
  i2c: rcar: simplify check for last message
  i2c: designware: add support of platform data to set I2C mode
  i2c: designware: add support of I2C standard mode

278f1d07

Merge tag 'sound-fix-3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · d590c6cd

Linus Torvalds authored Oct 19, 2014

Pull sound fixes from Takashi Iwai:
 "Here is a collection of small fixes after 3.18 merge.

  The urgent one is the fix for kernel panics with linked PCM substream
  triggered by the recent nonatomic PCM ops support.  Other two fixes
  (emu10k1 and bebob) are stable fixes, and one easy PCI ID addition for
  a new Intel HD-audio controller"

* tag 'sound-fix-3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda_intel: Add Device IDs for Intel Sunrise Point PCH
  ALSA: emu10k1: Fix deadlock in synth voice lookup
  ALSA: pcm: Fix referred substream in snd_pcm_action_group() unlock loop
  ALSA: bebob: Fix failure to detect source of clock for Terratec Phase 88

d590c6cd

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · fb378df5

Linus Torvalds authored Oct 19, 2014

Pull second round of input updates from Dmitry Torokhov:
 "Mostly simple bug fixes, although we do have one brand new driver for
  Microchip AR1021 i2c touchscreen.

  Also there is the change to stop trying to use i8042 active
  multiplexing by default (it is still possible to activate it via
  i8042.nomux=0 on boxes that implement it)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: xpad - add Thrustmaster as Xbox 360 controller vendor
  Input: xpad - add USB ID for Thrustmaster Ferrari 458 Racing Wheel
  Input: max77693-haptic - fix state check in imax77693_haptic_disable()
  Input: xen-kbdfront - free grant table entry in xenkbd_disconnect_backend
  Input: alps - fix v4 button press recognition
  Input: i8042 - disable active multiplexing by default
  Input: i8042 - add noloop quirk for Asus X750LN
  Input: synaptics - gate forcepad support by DMI check
  Input: Add Microchip AR1021 i2c touchscreen
  Input: cros_ec_keyb - add of match table
  Input: serio - avoid negative serio device numbers
  Input: avoid negative input device numbers
  Input: automatically set EV_ABS bit in input_set_abs_params
  Input: adp5588-keys - cancel workqueue in failure path
  Input: opencores-kbd - switch to using managed resources
  Input: evdev - fix EVIOCG{type} ioctl

fb378df5

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband · 2eb7f910

Linus Torvalds authored Oct 19, 2014

Pull infiniband/RDMA updates from Roland Dreier:
 - large set of iSER initiator improvements
 - hardware driver fixes for cxgb4, mlx5 and ocrdma
 - small fixes to core midlayer

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (47 commits)
  RDMA/cxgb4: Fix ntuple calculation for ipv6 and remove duplicate line
  RDMA/cxgb4: Add missing neigh_release in find_route
  RDMA/cxgb4: Take IPv6 into account for best_mtu and set_emss
  RDMA/cxgb4: Make c4iw_wr_log_size_order static
  IB/core: Fix XRC race condition in ib_uverbs_open_qp
  IB/core: Clear AH attr variable to prevent garbage data
  RDMA/ocrdma: Save the bit environment, spare unncessary parenthesis
  RDMA/ocrdma: The kernel has a perfectly good BIT() macro - use it
  RDMA/ocrdma: Don't memset() buffers we just allocated with kzalloc()
  RDMA/ocrdma: Remove a unused-label warning
  RDMA/ocrdma: Convert kernel VA to PA for mmap in user
  RDMA/ocrdma: Get vlan tag from ib_qp_attrs
  RDMA/ocrdma: Add default GID at index 0
  IB/mlx5, iser, isert: Add Signature API additions
  Target/iser: Centralize ib_sig_domain setting
  IB/iser: Centralize ib_sig_domain settings
  IB/mlx5: Use extended internal signature layout
  IB/iser: Set IP_CSUM as default guard type
  IB/iser: Remove redundant assignment
  IB/mlx5: Use enumerations for PI copy mask
  ...

2eb7f910

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1f6075f9

Linus Torvalds authored Oct 19, 2014

Pull more perf updates from Ingo Molnar:
 "A second (and last) round of late coming fixes and changes, almost all
  of them in perf tooling:

  User visible tooling changes:

   - Add period data column and make it default in 'perf script' (Jiri
     Olsa)

   - Add a visual cue for toggle zeroing of samples in 'perf top'
     (Taeung Song)

   - Improve callchains when using libunwind (Namhyung Kim)

  Tooling fixes and infrastructure changes:

   - Fix for double free in 'perf stat' when using some specific invalid
     command line combo (Yasser Shalabi)

   - Fix off-by-one bugs in map->end handling (Stephane Eranian)

   - Fix off-by-one bug in maps__find(), also related to map->end
     handling (Namhyung Kim)

   - Make struct symbol->end be the first addr after the symbol range,
     to make it match the convention used for struct map->end.  (Arnaldo
     Carvalho de Melo)

   - Fix perf_evlist__add_pollfd() error handling in 'perf kvm stat
     live' (Jiri Olsa)

   - Fix python test build by moving callchain_param to an object linked
     into the python binding (Jiri Olsa)

   - Document sysfs events/ interfaces (Cody P Schafer)

   - Fix typos in perf/Documentation (Masanari Iida)

   - Add missing 'struct option' forward declaration (Arnaldo Carvalho
     de Melo)

   - Add option to copy events when queuing for sorting across cpu
     buffers and enable it for 'perf kvm stat live', to avoid having
     events left in the queue pointing to the ring buffer be rewritten
     in high volume sessions.  (Alexander Yarygin, improving work done
     by David Ahern)

   - Do not include a struct hists per perf_evsel, untangling the
     histogram code from perf_evsel, to pave the way for exporting a
     minimalistic tools/lib/api/perf/ library usable by tools/perf and
     initially by the rasd daemon being developed by Borislav Petkov,
     Robert Richter and Jean Pihet.  (Arnaldo Carvalho de Melo)

   - Make perf_evlist__open(evlist, NULL, NULL), i.e. without cpu and
     thread maps mean syswide monitoring, reducing the boilerplate for
     tools that only want system wide mode.  (Arnaldo Carvalho de Melo)

   - Move exit stuff from perf_evsel__delete to perf_evsel__exit, delete
     should be just a front end for exit + free (Arnaldo Carvalho de
     Melo)

   - Add support to new style format of kernel PMU event.  (Kan Liang)

  and other misc fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
  perf script: Add period as a default output column
  perf script: Add period data column
  perf evsel: No need to drag util/cgroup.h
  perf evlist: Add missing 'struct option' forward declaration
  perf evsel: Move exit stuff from __delete to __exit
  kprobes/x86: Remove stale ARCH_SUPPORTS_KPROBES_ON_FTRACE define
  perf kvm stat live: Enable events copying
  perf session: Add option to copy events when queueing
  perf Documentation: Fix typos in perf/Documentation
  perf trace: Use thread_{,_set}_priv helpers
  perf kvm: Use thread_{,_set}_priv helpers
  perf callchain: Create an address space per thread
  perf report: Set callchain_param.record_mode for future use
  perf evlist: Fix for double free in tools/perf stat
  perf test: Add test case for pmu event new style format
  perf tools: Add support to new style format of kernel PMU event
  perf tools: Parse the pmu event prefix and suffix
  Revert "perf tools: Default to cpu// for events v5"
  perf Documentation: Remove Ruplicated docs for powerpc cpu specific events
  perf Documentation: sysfs events/ interfaces
  ...
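
The map->end and symbol->end items in the pull message above describe a convention where `end` is the first address after the range. A minimal sketch of that half-open convention follows. The `struct range` type and the helper names are hypothetical and are not the perf data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative range type: 'end' is the first address AFTER the range,
 * so the range is the half-open interval [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

static bool range_contains(const struct range *r, uint64_t addr)
{
	return addr >= r->start && addr < r->end;	/* '<', not '<=' */
}

static uint64_t range_size(const struct range *r)
{
	return r->end - r->start;	/* no '+ 1' correction needed */
}

/* Linear lookup; with half-open ranges, adjacent entries never both match. */
static const struct range *ranges_find(const struct range *rs, size_t n,
				       uint64_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (range_contains(&rs[i], addr))
			return &rs[i];
	return NULL;
}
```

Mixing this convention with an inclusive `end` in the same code base is a typical source of off-by-one errors: an address equal to `end` either matches two adjacent ranges or none.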

1f6075f9

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 5e2ee7cd

Linus Torvalds authored Oct 19, 2014

Pull sparc fixes from David Miller:
 "Here we have two bug fixes:

  1) The current thread's fault_code is not setup properly upon entry to
     do_sparc64_fault() in some paths, leading to spurious SIGBUS.

  2) Don't use a zero length array at the end of thread_info on sparc64,
     otherwise end_of_stack() isn't right"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Do not define thread fpregs save area as zero-length array.
  sparc64: Fix corrupted thread fault code.

5e2ee7cd

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · e25b4927

Linus Torvalds authored Oct 19, 2014

Pull networking fixes from David Miller:
 "A quick batch of bug fixes:

  1) Fix build with IPV6 disabled, from Eric Dumazet.

  2) Several more cases of caching SKB data pointers across calls to
     pskb_may_pull(), thus referencing potentially free'd memory.  From
     Li RongQing.

  3) DSA phy code tests operation presence improperly, instead of going:

        if (x->ops->foo)
                r = x->ops->foo(args);

     it was going:

        if (x->ops->foo(args))
                r = x->ops->foo(args);

     Fix from Andrew Lunn"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  Net: DSA: Fix checking for get_phy_flags function
  ipv6: fix a potential use after free in sit.c
  ipv6: fix a potential use after free in ip6_offload.c
  ipv4: fix a potential use after free in gre_offload.c
  tcp: fix build error if IPv6 is not enabled
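
Item 2 of the pull message above concerns pointers cached across a call that may move the underlying data. The following is a user-space analogy, not kernel code: `struct buf`, `buf_may_pull()` and `read_first_byte()` are hypothetical, and the helper always relocates the storage so that a stale pointer would reliably be invalid.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer whose storage may move, loosely modelled on the way
 * pskb_may_pull() may reallocate an skb's data area. */
struct buf {
	unsigned char *data;
	size_t len;
};

/* Make at least 'need' bytes available. This sketch always moves the
 * storage. Returns 1 on success, 0 on allocation failure. */
static int buf_may_pull(struct buf *b, size_t need)
{
	size_t new_len = need > b->len ? need : b->len;
	unsigned char *p = calloc(1, new_len);

	if (p == NULL)
		return 0;
	memcpy(p, b->data, b->len);
	free(b->data);		/* old pointers into b->data now dangle */
	b->data = p;
	b->len = new_len;
	return 1;
}

/* Return the first header byte once 'need' bytes are present, or -1. */
static int read_first_byte(struct buf *b, size_t need)
{
	/* WRONG: const unsigned char *hdr = b->data;  (cached before the call) */
	if (!buf_may_pull(b, need))
		return -1;

	/* RIGHT: derive the pointer after the call that may move the data. */
	const unsigned char *hdr = b->data;
	return hdr[0];
}
```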

e25b4927

Net: DSA: Fix checking for get_phy_flags function · 228b16cb

Andrew Lunn authored Oct 19, 2014

The check for the presence or not of the optional switch function
get_phy_flags() called the function, rather than checked to see if it
is a NULL pointer. This causes a dereference of a NULL pointer on all
switch chips except the sf2, the only switch to implement this call.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 6819563e ("net: dsa: allow switch drivers to specify phy_device::dev_flags")
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
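
The difference between testing an optional callback pointer and calling it can be shown with a small self-contained sketch. The types and names below (`struct switch_ops`, `struct switch_dev`, `phy_flags_for()`, `example_get_phy_flags()`) are hypothetical and only mirror the shape of the problem described in this message.

```c
#include <stddef.h>

/* Hypothetical ops table with one optional callback. */
struct switch_ops {
	unsigned int (*get_phy_flags)(int port);	/* may be NULL */
};

struct switch_dev {
	const struct switch_ops *ops;
};

/* Example implementation for a device that does provide the callback. */
static unsigned int example_get_phy_flags(int port)
{
	return 0x100u | (unsigned int)port;
}

/* Test the pointer itself and call it only when it is non-NULL. */
static unsigned int phy_flags_for(const struct switch_dev *sw, int port)
{
	unsigned int flags = 0;

	if (sw->ops->get_phy_flags)		/* presence check, no call */
		flags = sw->ops->get_phy_flags(port);
	return flags;
}
```

Writing the condition as `if (sw->ops->get_phy_flags(port))` would invoke the callback unconditionally, which dereferences a NULL pointer for every device that leaves it unset.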

228b16cb

sparc64: Do not define thread fpregs save area as zero-length array. · e2653143

David S. Miller authored Oct 18, 2014

This breaks the stack end corruption detection facility.

What that facility does is write a magic value to "end_of_stack()"
and check to see if it gets overwritten.

"end_of_stack()" is "task_thread_info(p) + 1", which for sparc64 is
the beginning of the FPU register save area.

So once the user uses the FPU, the magic value is overwritten and the
debug checks trigger.

Fix this by making the size explicit.

Due to the size we use for the fpsaved[], gsr[], and xfsr[] arrays we
are limited to 7 levels of FPU state saves.  Since each FPU register set
is 256 bytes, allocate 256 * 7 bytes for the fpregs area.
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
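
The layout problem in this message can be reproduced in miniature. The sketch below uses hypothetical structures, not the sparc64 thread_info, and substitutes a C99 flexible array member for the GNU zero-length array. In both cases `sizeof` excludes the trailing array, so "structure pointer + 1" lands on the start of the save area.

```c
#include <stddef.h>

#define FPREGS_BYTES (256 * 7)	/* 7 save levels of 256 bytes, per the message */

/* Trailing array with no declared size: sizeof() does not include it. */
struct ti_flexible {
	unsigned long flags;
	unsigned long fpregs[];
};

/* Explicitly sized save area: sizeof() now covers it. */
struct ti_sized {
	unsigned long flags;
	unsigned long fpregs[FPREGS_BYTES / sizeof(unsigned long)];
};

/* Model of "thread_info + 1": the first byte past the structure. */
#define END_OF(ti) ((unsigned char *)((ti) + 1))
```

With `struct ti_flexible`, `END_OF()` points at `fpregs[0]`, so a magic value written there is overwritten by the first register save. With `struct ti_sized`, `END_OF()` points past the whole save area.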

e2653143