Commits · 48856286b64e4b66ec62b94e504d0b29c1ade664 · Kirill Smelkov / linux

08 Feb, 2013 1 commit

xen/netback: shutdown the ring if it contains garbage. · 48856286

Ian Campbell authored Feb 06, 2013

A buggy or malicious frontend should not be able to confuse netback.
If we spot anything which is not as it should be then shutdown the
device and don't try to continue with the ring in a potentially
hostile state. Well behaved and non-hostile frontends will not be
penalised.

As well as making the existing checks for such errors fatal also add a
new check that ensures that there isn't an insane number of requests
on the ring (i.e. more than would fit in the ring). If the ring
contains garbage then previously is was possible to loop over this
insane number, getting an error each time and therefore not generating
any more pending requests and therefore not exiting the loop in
xen_netbk_tx_build_gops for an externded period.

Also turn various netdev_dbg calls which no precipitate a fatal error
into netdev_err, they are rate limited because the device is shutdown
afterwards.

This fixes at least one known DoS/softlockup of the backend domain.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

48856286

04 Feb, 2013 3 commits

net: usbnet: fix tx_dropped statistics · bf414b36

Bjørn Mork authored Jan 31, 2013

It is normal for minidrivers accumulating frames to return NULL
from their tx_fixup function. We do not want to count this as a
drop, or log any debug messages.  A different exit path is
therefore chosen for such drivers, skipping the debug message
and the tx_dropped increment.

The test for accumulating drivers was however completely bogus,
making the exit path selection depend on whether the user had
enabled tx_err logging or not. This would arbitrarily mess up
accounting for both accumulating and non-accumulating minidrivers,
and would result in unwanted debug messages for the accumulating
drivers.

Fix by testing for FLAG_MULTI_PACKET instead, which probably was
the intention from the beginning.  This usage match the documented
behaviour of this flag:

 Indicates to usbnet, that USB driver accumulates multiple IP packets.
 Affects statistic (counters) and short packet handling.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

bf414b36

tcp: ipv6: Update MIB counters for drops · 5f1e942c

Vijay Subramanian authored Jan 31, 2013

This patch updates LINUX_MIB_LISTENDROPS and LINUX_MIB_LISTENOVERFLOWS in
tcp_v6_conn_request() and tcp_v6_err(). tcp_v6_conn_request() in particular can
drop SYNs for various reasons which are not currently tracked.
Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5f1e942c

tcp: Update MIB counters for drops · 848bf15f

Vijay Subramanian authored Jan 31, 2013

This patch updates LINUX_MIB_LISTENDROPS in tcp_v4_conn_request() and
tcp_v4_err(). tcp_v4_conn_request() in particular can drop SYNs for various
reasons which are not currently tracked.
Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

848bf15f

03 Feb, 2013 6 commits

packet: fix leakage of tx_ring memory · 9665d5d6

Phil Sutter authored Feb 01, 2013

When releasing a packet socket, the routine packet_set_ring() is reused
to free rings instead of allocating them. But when calling it for the
first time, it fills req->tp_block_nr with the value of rb->pg_vec_len
which in the second invocation makes it bail out since req->tp_block_nr
is greater zero but req->tp_block_size is zero.

This patch solves the problem by passing a zeroed auto-variable to
packet_set_ring() upon each invocation from packet_release().

As far as I can tell, this issue exists even since 69e3c75f (net: TX_RING
and packet mmap), i.e. the original inclusion of TX ring support into
af_packet, but applies only to sockets with both RX and TX ring
allocated, which is probably why this was unnoticed all the time.
Signed-off-by: Phil Sutter <phil.sutter@viprinet.com>
Cc: Johann Baudy <johann.baudy@gnu-log.net>
Cc: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9665d5d6

net: Fix inner_network_header assignment in skb-copy. · 92df9b21

Pravin B Shelar authored Feb 01, 2013

Use correct inner offset to set inner_network_offset.
Found by inspection.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

92df9b21

tcp: frto should not set snd_cwnd to 0 · 2e5f4212

Eric Dumazet authored Feb 03, 2013

Commit 9dc27415 (tcp: fix ABC in tcp_slow_start())
uncovered a bug in FRTO code :
tcp_process_frto() is setting snd_cwnd to 0 if the number
of in flight packets is 0.

As Neal pointed out, if no packet is in flight we lost our
chance to disambiguate whether a loss timeout was spurious.

We should assume it was a proper loss.
Reported-by: Pasi Kärkkäinen <pasik@iki.fi>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2e5f4212

tcp: fix an infinite loop in tcp_slow_start() · 973ec449

Eric Dumazet authored Feb 02, 2013

Since commit 9dc27415 (tcp: fix ABC in tcp_slow_start()),
a nul snd_cwnd triggers an infinite loop in tcp_slow_start()

Avoid this infinite loop and log a one time error for further
analysis. FRTO code is suspected to cause this bug.
Reported-by: Pasi Kärkkäinen <pasik@iki.fi>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

973ec449

Merge branch 'fixes-for-3.8' of git://gitorious.org/linux-can/linux-can · 59fa5348

David S. Miller authored Feb 02, 2013

Marc Kleine-Budde says:

====================
here's a patch for net for the v3.8 release cycle. Alexander Stein noticed that
the c_can hardware has a fixed bit in the IFx_MASK2 register. His patch fixes
writing of this register by always setting this bit.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

59fa5348

via-rhine: Fix bugs in NAPI support. · 559bcac3

David S. Miller authored Jan 29, 2013

1) rhine_tx() should use dev_kfree_skb() not dev_kfree_skb_irq()

2) rhine_slow_event_task's NAPI triggering logic is racey, it
   should just hit the interrupt mask register.  This is the
   same as commit 7dbb4918
   ("r8169: avoid NAPI scheduling delay.") made to fix the same
   problem in the r8169 driver.  From Francois Romieu.
Reported-by: Jamie Gloudon <jamie.gloudon@gmail.com>
Tested-by: Jamie Gloudon <jamie.gloudon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

559bcac3

01 Feb, 2013 3 commits

Merge branch 'wireless' · 9165bf27

David S. Miller authored Feb 01, 2013

John W. Linville says:

====================
This is a small batch of fixes intended for the 3.8 stream...

There are two pulls from Johannes.  Regarding mac80211, Johannes says:

"One fix from Dan for a possible memory overrun."

Regarding iwlwifi,  Johannes says:

"I have one fix from Emmanuel reverting a previous fix that caused
more trouble than it's worth."

Along with those:

Arend van Spriel fixes a fatal error in brcsmac related to tx status processing.

Bing Zhao corrects a problem where mwifiex would fail to complete a scan
in the event of an IE processing error.

Larry Finger fixes a thinko in rtlwifi in which the wrong skb variable
was being used in some cases.

Rafał Miłecki fixes a thinko in an ID check in the bcma flash code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9165bf27

Merge branch 'master' of... · ed6882ac

John W. Linville authored Feb 01, 2013

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem

ed6882ac

can: c_can: Set reserved bit in IFx_MASK2 to 1 on write · 2bd3bc4e

Alexander Stein authored Dec 13, 2012

According to C_CAN documentation, the reserved bit in IFx_MASK2 register is
fixed 1.

Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

2bd3bc4e

31 Jan, 2013 6 commits

tcp: detect SYN/data drop when F-RTO is disabled · 66555e92

Yuchung Cheng authored Jan 31, 2013

On receiving the SYN-ACK, Fast Open checks icsk_retransmit for SYN
retransmission to detect SYN/data drops. But if F-RTO is disabled,
icsk_retransmit is reset at step D of tcp_fastretrans_alert() (
under tcp_ack()) before tcp_rcv_fastopen_synack(). The fix is to use
total_retrans instead which accounts for SYN retransmission regardless
the use of F-RTO.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

66555e92

l2tp: correctly handle ancillary data in the ip6 recv path · 700163db

Tom Parkin authored Jan 31, 2013

l2tp_ip6 is incorrectly using the IPv4-specific ip_cmsg_recv to handle
ancillary data.  This means that socket options such as IPV6_RECVPKTINFO are
not honoured in userspace.

Convert l2tp_ip6 to use the IPv6-specific handler.

Ref: net/ipv6/udp.c
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

700163db

ipv6: export ip6_datagram_recv_ctl · 8e72d37e

Tom Parkin authored Jan 31, 2013

ip6_datagram_recv_ctl and ip6_datagram_send_ctl are used for handling IPv6
ancillary data.  Since ip6_datagram_send_ctl is already publicly exported for
use in modules, ip6_datagram_recv_ctl should also be available to support
ancillary data in the receive path.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8e72d37e

ipv6: rename datagram_send_ctl and datagram_recv_ctl · 73df66f8

Tom Parkin authored Jan 31, 2013

The datagram_*_ctl functions in net/ipv6/datagram.c are IPv6-specific.  Since
datagram_send_ctl is publicly exported it should be appropriately named to
reflect the fact that it's for IPv6 only.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

73df66f8

NET: qmi_wwan: add Telit LE920 support · 3d6d7ab5

Daniele Palmas authored Jan 30, 2013

Add VID, PID and fixed interface for Telit LE920
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

3d6d7ab5

ipv6: do not create neighbor entries for local delivery · bd30e947

Marcelo Ricardo Leitner authored Jan 29, 2013

They will be created at output, if ever needed. This avoids creating
empty neighbor entries when TPROXYing/Forwarding packets for addresses
that are not even directly reachable.

Note that IPv4 already handles it this way. No neighbor entries are
created for local input.

Tested by myself and customer.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bd30e947

30 Jan, 2013 2 commits

net: usbnet: prevent buggy devices from killing us · 70c37bf9

Bjørn Mork authored Jan 28, 2013

A device sending 0 length frames as fast as it can has been
observed killing the host system due to the resulting memory
pressure.

Temporarily disable RX skb allocation and URB submission when
the current error ratio is high, preventing us from trying to
allocate an infinite number of skbs.  Reenable as soon as we
are finished processing the done queue, allowing the device
to continue working after short error bursts.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

70c37bf9

mwifiex: fix incomplete scan in case of IE parsing error · 8a7d7cbf

Bing Zhao authored Jan 29, 2013

A scan request is split into multiple scan commands queued in
scan_pending_q. Each scan command will be sent to firmware and
its response is handlded one after another.

If any error is detected while parsing IE in command response
buffer the remaining data will be ignored and error is returned.

We should check if there is any more scan commands pending in
the queue before returning error. This ensures that we will call
cfg80211_scan_done if this is the last scan command, or send
next scan command in scan_pending_q to firmware.

Cc: "3.6+" <stable@vger.kernel.org>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

8a7d7cbf

29 Jan, 2013 14 commits

vmxnet3: set carrier state properly on probe · 6cdd20c3

Neil Horman authored Jan 29, 2013

vmxnet3 fails to set netif_carrier_off on probe, meaning that when an interface
is opened the __LINK_STATE_NOCARRIER bit is already cleared, and so
/sys/class/net/<ifname>/operstate remains in the unknown state.  Correct this by
setting netif_carrier_off on probe, like other drivers do.

Also, while we're at it, lets remove the netif_carrier_ok checks from the
link_state_update function, as that check is atomically contained within the
netif_carrier_[on|off] functions anyway

Tested successfully by myself
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6cdd20c3

e1000e: enable ECC on I217/I218 to catch packet buffer memory errors · 28600304

Bruce Allan authored Jan 28, 2013

In rare instances, memory errors have been detected in the internal packet
buffer memory on I217/I218 when stressed under certain environmental
conditions. Enable Error Correcting Code (ECC) in hardware to catch both
correctable and uncorrectable errors. Correctable errors will be handled
by the hardware. Uncorrectable errors in the packet buffer will cause the
packet to be received with an error indication in the buffer descriptor
causing the packet to be discarded. If the uncorrectable error is in the
descriptor itself, the hardware will stop and interrupt the driver
indicating the error. The driver will then reset the hardware in order to
clear the error and restart.

Both types of errors will be accounted for in statistics counters.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Cc: <stable@vger.kernel.org> # 3.5.x & 3.6.x
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

28600304

bonding: unset primary slave via sysfs · eb492f74

Milos Vyletel authored Jan 29, 2013

When bonding module is loaded with primary parameter and one decides to unset
primary slave using sysfs these settings are not preserved during bond device
restart. Primary slave is only unset once and it's not remembered in
bond->params structure. Below is example of recreation.

 grep OPTS /etc/sysconfig/network-scripts/ifcfg-bond0
BONDING_OPTS="mode=active-backup miimon=100 primary=eth01"
 grep "Primary Slave" /proc/net/bonding/bond0
Primary Slave: eth01 (primary_reselect always)

 echo "" > /sys/class/net/bond0/bonding/primary
 grep "Primary Slave" /proc/net/bonding/bond0
Primary Slave: None

 sed -i -e 's/primary=eth01//' /etc/sysconfig/network-scripts/ifcfg-bond0
 grep OPTS /etc/sysconfig/network-scripts/ifcfg-bond
BONDING_OPTS="mode=active-backup miimon=100 "
 ifdown bond0 && ifup bond0

without patch:
 grep "Primary Slave" /proc/net/bonding/bond0
Primary Slave: eth01 (primary_reselect always)

with patch:
 grep "Primary Slave" /proc/net/bonding/bond0
Primary Slave: None
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Milos Vyletel <milos.vyletel@sde.cz>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

eb492f74

tcp: Increment LISTENOVERFLOW and LISTENDROPS in tcp_v4_conn_request() · 2aeef18d

Nivedita Singhvi authored Jan 28, 2013

We drop a connection request if the accept backlog is full and there are
sufficient packets in the syn queue to warrant starting drops. Increment the
appropriate counters so this isn't silent, for accurate stats and help in
debugging.

This patch assumes LINUX_MIB_LISTENDROPS is a superset of/includes the
counter LINUX_MIB_LISTENOVERFLOWS.
Signed-off-by: Nivedita Singhvi <niv@us.ibm.com>
Acked-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2aeef18d

ipv6 addrconf: Fix interface identifiers of 802.15.4 devices. · 5e98a36e

YOSHIFUJI Hideaki / 吉藤英明 authored Jan 28, 2013

The "Universal/Local" (U/L) bit must be complmented according to RFC4944
and RFC2464.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e98a36e

be2net: Updating Module Author string and log message string to "Emulex Corporation" · 00d3d51e
Sarveshwar Bandi authored Jan 28, 2013
```
Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
00d3d51e

tuntap: allow polling/writing/reading when detached · 9e85722d

Jason Wang authored Jan 28, 2013

We forbid polling, writing and reading when the file were detached, this may
complex the user in several cases:

- when guest pass some buffers to vhost/qemu and then disable some queues,
  host/qemu needs to do its own cleanup on those buffers which is complex
  sometimes. We can do this simply by allowing a user can still write to an
  disabled queue. Write to an disabled queue will cause the packet pass to the
  kernel and read will get nothing.
- align the polling behavior with macvtap which never fails when the queue is
  created. This can simplify the polling errors handling of its user (e.g vhost)

We can simply achieve this by don't assign NULL to tfile->tun when detached.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9e85722d

vhost_net: handle polling errors when setting backend · 2b8b328b

Jason Wang authored Jan 28, 2013

Currently, the polling errors were ignored, which can lead following issues:

- vhost remove itself unconditionally from waitqueue when stopping the poll,
  this may crash the kernel since the previous attempt of starting may fail to
  add itself to the waitqueue
- userspace may think the backend were successfully set even when the polling
  failed.

Solve this by:

- check poll->wqh before trying to remove from waitqueue
- report polling errors in vhost_poll_start(), tx_poll_start(), the return value
  will be checked and returned when userspace want to set the backend

After this fix, there still could be a polling failure after backend is set, it
will addressed by the next patch.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2b8b328b

vhost_net: correct error handling in vhost_net_set_backend() · 692a998b

Jason Wang authored Jan 28, 2013

Currently, when vhost_init_used() fails the sock refcnt and ubufs were
leaked. Correct this by calling vhost_init_used() before assign ubufs and
restore the oldsock when it fails.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

692a998b

tun: fix carrier on/off status · af668b3c

Michael S. Tsirkin authored Jan 28, 2013

Commit c8d68e6b removed carrier off call
from tun_detach since it's now called on queue disable and not only on
tun close.  This confuses userspace which used this flag to detect a
free tun. To fix, put this back but under if (clean).
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Jason Wang <jasowang@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Tested-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

af668b3c

pktgen: correctly handle failures when adding a device · 604dfd6e

Cong Wang authored Jan 27, 2013

The return value of pktgen_add_device() is not checked, so
even if we fail to add some device, for example, non-exist one,
we still see "OK:...". This patch fixes it.

After this patch, I got:

	# echo "add_device non-exist" > /proc/net/pktgen/kpktgend_0
	-bash: echo: write error: No such device
	# cat /proc/net/pktgen/kpktgend_0
	Running:
	Stopped:
	Result: ERROR: can not add device non-exist
	# echo "add_device eth0" > /proc/net/pktgen/kpktgend_0
	# cat /proc/net/pktgen/kpktgend_0
	Running:
	Stopped: eth0
	Result: OK: add_device=eth0

(Candidate for -stable)

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

604dfd6e

netem: fix delay calculation in rate extension · a13d3104

Johannes Naab authored Jan 23, 2013

The delay calculation with the rate extension introduces in v3.3 does
not properly work, if other packets are still queued for transmission.
For the delay calculation to work, both delay types (latency and delay
introduces by rate limitation) have to be handled differently. The
latency delay for a packet can overlap with the delay of other packets.
The delay introduced by the rate however is separate, and can only
start, once all other rate-introduced delays finished.

Latency delay is from same distribution for each packet, rate delay
depends on the packet size.

.: latency delay
-: rate delay
x: additional delay we have to wait since another packet is currently
   transmitted

  .....----                    Packet 1
    .....xx------              Packet 2
               .....------     Packet 3
    ^^^^^
    latency stacks
         ^^
         rate delay doesn't stack
               ^^
               latency stacks

  -----> time

When a packet is enqueued, we first consider the latency delay. If other
packets are already queued, we can reduce the latency delay until the
last packet in the queue is send, however the latency delay cannot be
<0, since this would mean that the rate is overcommitted.  The new
reference point is the time at which the last packet will be send. To
find the time, when the packet should be send, the rate introduces delay
has to be added on top of that.
Signed-off-by: Johannes Naab <jn@stusta.de>
Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

a13d3104

l2tp: prevent l2tp_tunnel_delete racing with userspace close · 80d84ef3

Tom Parkin authored Jan 22, 2013

If a tunnel socket is created by userspace, l2tp hooks the socket destructor
in order to clean up resources if userspace closes the socket or crashes. It
also caches a pointer to the struct sock for use in the data path and in the
netlink interface.

While it is safe to use the cached sock pointer in the data path, where the
skb references keep the socket alive, it is not safe to use it elsewhere as
such access introduces a race with userspace closing the socket. In
particular, l2tp_tunnel_delete is prone to oopsing if a multithreaded
userspace application closes a socket at the same time as sending a netlink
delete command for the tunnel.

This patch fixes this oops by forcing l2tp_tunnel_delete to explicitly look up
a tunnel socket held by userspace using sockfd_lookup().
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

80d84ef3

Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · fc16e884

Linus Torvalds authored Jan 28, 2013

Pull powerpc fixes from Benjamin Herrenschmidt:
 "Whenever you have a chance between two dives, you might want to
  consider pulling my merge branch to pickup a few fixes for 3.8 that
  have been accumulating for the last couple of weeks (I was myself
  travelling then on vacation).

  Nothing major, just a handful of powerpc bug fixes that I consider
  worth getting in before 3.8 goes final."

And I'll have everybody know that I'm not diving for several days yet.
Snif.

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Max next_tb to prevent from replaying timer interrupt
  powerpc: kernel/kgdb.c: Fix memory leakage
  powerpc/book3e: Disable interrupt after preempt_schedule_irq
  powerpc/oprofile: Fix error in oprofile power7_marked_instr_event() function
  powerpc/pasemi: Fix crash on reboot
  powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning for ppc32

fc16e884

28 Jan, 2013 5 commits

powerpc: Max next_tb to prevent from replaying timer interrupt · 689dfa89

Tiejun Chen authored Jan 15, 2013

With lazy interrupt, we always call __check_irq_replaysome with
decrementers_next_tb to check if we need to replay timer interrupt.
So in hotplug case we also need to set decrementers_next_tb as MAX
to make sure __check_irq_replay don't replay timer interrupt
when return as we expect, otherwise we'll trap here infinitely.
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

689dfa89

powerpc: kernel/kgdb.c: Fix memory leakage · fefd9e6f

Cong Ding authored Jan 14, 2013

the variable backup_current_thread_info isn't freed before existing the
function.
Signed-off-by: Cong Ding <dinggnu@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

fefd9e6f

powerpc/book3e: Disable interrupt after preempt_schedule_irq · 572177d7

Tiejun Chen authored Jan 06, 2013

In preempt case current arch_local_irq_restore() from
preempt_schedule_irq() may enable hard interrupt but we really
should disable interrupts when we return from the interrupt,
and so that we don't get interrupted after loading SRR0/1.
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

572177d7

powerpc/oprofile: Fix error in oprofile power7_marked_instr_event() function · 46ed7a76

Carl E. Love authored Nov 29, 2012

The calculation for the left shift of the mask OPROFILE_PM_PMCSEL_MSK has an
error.  The calculation is should be to shift left by (max_cntrs - cntr) times
the width of the pmsel field width.  However, the #define OPROFILE_MAX_PMC_NUM
was used instead of OPROFILE_PMSEL_FIELD_WIDTH.  This patch fixes the
calculation.
Signed-off-by: Carl Love <cel@us.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

46ed7a76

powerpc/pasemi: Fix crash on reboot · 72640d88

Steven Rostedt authored Jan 21, 2013

commit f96972f2 "kernel/sys.c: call disable_nonboot_cpus() in
kernel_restart()"

added a call to disable_nonboot_cpus() on kernel_restart(), which tries
to shutdown all the CPUs except the first one. The issue with the PA
Semi, is that it does not support CPU hotplug.

When the call is made to __cpu_down(), it calls the notifiers
CPU_DOWN_PREPARE, and then tries to take the CPU down.

One of the notifiers to the CPU hotplug code, is the cpufreq. The
DOWN_PREPARE will call __cpufreq_remove_dev() which calls
cpufreq_driver->exit. The PA Semi exit handler unmaps regions of I/O
that is used by an interrupt that goes off constantly
(system_reset_common, but it goes off during normal system operations
too). I'm not sure exactly what this interrupt does.

Running a simple function trace, you can see it goes off quite a bit:

# tracer: function
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
          <idle>-0     [001]  1558.859363: .pasemi_system_reset_exception <-.system_reset_exception
          <idle>-0     [000]  1558.860112: .pasemi_system_reset_exception <-.system_reset_exception
          <idle>-0     [000]  1558.861109: .pasemi_system_reset_exception <-.system_reset_exception
          <idle>-0     [001]  1558.861361: .pasemi_system_reset_exception <-.system_reset_exception
          <idle>-0     [000]  1558.861437: .pasemi_system_reset_exception <-.system_reset_exception

When the region is unmapped, the system crashes with:

Disabling non-boot CPUs ...
Error taking CPU1 down: -38
Unable to handle kernel paging request for data at address 0xd0000800903a0100
Faulting instruction address: 0xc000000000055fcc
Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT SMP NR_CPUS=64 NUMA PA Semi PWRficient
Modules linked in: shpchp
NIP: c000000000055fcc LR: c000000000055fb4 CTR: c0000000000df1fc
REGS: c0000000012175d0 TRAP: 0300   Not tainted  (3.8.0-rc4-test-dirty)
MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR: 24000088  XER: 00000000
SOFTE: 0
DAR: d0000800903a0100, DSISR: 42000000
TASK = c0000000010e9008[0] 'swapper/0' THREAD: c000000001214000 CPU: 0
GPR00: d0000800903a0000 c000000001217850 c0000000012167e0 0000000000000000
GPR04: 0000000000000000 0000000000000724 0000000000000724 0000000000000000
GPR08: 0000000000000000 0000000000000000 0000000000000001 0000000000a70000
GPR12: 0000000024000080 c00000000fff0000 ffffffffffffffff 000000003ffffae0
GPR16: ffffffffffffffff 0000000000a21198 0000000000000060 0000000000000000
GPR20: 00000000008fdd35 0000000000a21258 000000003ffffaf0 0000000000000417
GPR24: 0000000000a226d0 c000000000000000 0000000000000000 0000000000000000
GPR28: c00000000138b358 0000000000000000 c000000001144818 d0000800903a0100
NIP [c000000000055fcc] .set_astate+0x5c/0xa4
LR [c000000000055fb4] .set_astate+0x44/0xa4
Call Trace:
[c000000001217850] [c000000000055fb4] .set_astate+0x44/0xa4 (unreliable)
[c0000000012178f0] [c00000000005647c] .restore_astate+0x2c/0x34
[c000000001217980] [c000000000054668] .pasemi_system_reset_exception+0x6c/0x88
[c000000001217a00] [c000000000019ef0] .system_reset_exception+0x48/0x84
[c000000001217a80] [c000000000001e40] system_reset_common+0x140/0x180
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

72640d88