Commits · 3cdf02807d7d96dbf120809c7c1a4196ff812e4e · Kirill Smelkov / linux

28 Jan, 2015 11 commits

Richard Weinberger authored Oct 27, 2014

commit f38aed97 upstream.

The logic of vfree()'ing vol->upd_buf is tied to vol->updating.
In ubi_start_update() vol->updating is set long before vmalloc()'ing
vol->upd_buf. If we encounter a write failure in ubi_start_update()
before vmalloc() the UBI device release function will try to vfree()
vol->upd_buf because vol->updating is set.
Fix this by allocating vol->upd_buf directly after setting vol->updating.

Fixes:
[   31.559338] UBI warning: vol_cdev_release: update of volume 2 not finished, volume is damaged
[   31.559340] ------------[ cut here ]------------
[   31.559343] WARNING: CPU: 1 PID: 2747 at mm/vmalloc.c:1446 __vunmap+0xe3/0x110()
[   31.559344] Trying to vfree() nonexistent vm area (ffffc90001f2b000)
[   31.559345] Modules linked in:
[   31.565620]  0000000000000bba ffff88002a0cbdb0 ffffffff818f0497 ffff88003b9ba148
[   31.566347]  ffff88002a0cbde0 ffffffff8156f515 ffff88003b9ba148 0000000000000bba
[   31.567073]  0000000000000000 0000000000000000 ffff88002a0cbe88 ffffffff8156c10a
[   31.567793] Call Trace:
[   31.568034]  [<ffffffff818f0497>] dump_stack+0x4e/0x7a
[   31.568510]  [<ffffffff8156f515>] ubi_io_write_vid_hdr+0x155/0x160
[   31.569084]  [<ffffffff8156c10a>] ubi_eba_write_leb+0x23a/0x870
[   31.569628]  [<ffffffff81569b36>] vol_cdev_write+0x226/0x380
[   31.570155]  [<ffffffff81179265>] vfs_write+0xb5/0x1f0
[   31.570627]  [<ffffffff81179f8a>] SyS_pwrite64+0x6a/0xa0
[   31.571123]  [<ffffffff818fde12>] system_call_fastpath+0x16/0x1b
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

3cdf0280

UBI: Fix double free after do_sync_erase() · 6b8d849d

Richard Weinberger authored Nov 06, 2014

commit aa5ad3b6 upstream.

If the erase worker is unable to erase a PEB it will
free the ubi_wl_entry itself.
The failing ubi_wl_entry must not free()'d again after
do_sync_erase() returns.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

6b8d849d

KVM: s390: flush CPU on load control · 6898a386

Christian Borntraeger authored Oct 31, 2014

commit 2dca485f upstream.

some control register changes will flush some aspects of the CPU, e.g.
POP explicitely mentions that for CR9-CR11 "TLBs may be cleared".
Instead of trying to be clever and only flush on specific CRs, let
play safe and flush on all lctl(g) as future machines might define
new bits in CRs. Load control intercept should not happen that often.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

6898a386

usb: renesas_usbhs: gadget: fix NULL pointer dereference in ep_disable() · 0a271e44

Kazuya Mizuguchi authored Nov 04, 2014

commit 11432050 upstream.

This patch fixes an issue that the NULL pointer dereference happens
when we uses g_audio driver. Since the g_audio driver will call
usb_ep_disable() in afunc_set_alt() before it calls usb_ep_enable(),
the uep->pipe of renesas usbhs driver will be NULL. So, this patch
adds a condition to avoid the oops.
Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
Signed-off-by: Takeshi Kihara <takeshi.kihara.df@renesas.com>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Fixes: 2f98382d (usb: renesas_usbhs: Add Renesas USBHS Gadget)
Signed-off-by: Felipe Balbi <balbi@ti.com>
[ kamal: backport to 3.13-stable ]
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

0a271e44

writeback: fix a subtle race condition in I_DIRTY clearing · 14d5d1ea

Tejun Heo authored Oct 24, 2014

commit 9c6ac78e upstream.

After invoking ->dirty_inode(), __mark_inode_dirty() does smp_mb() and
tests inode->i_state locklessly to see whether it already has all the
necessary I_DIRTY bits set.  The comment above the barrier doesn't
contain any useful information - memory barriers can't ensure "changes
are seen by all cpus" by itself.

And it sure enough was broken.  Please consider the following
scenario.

 CPU 0					CPU 1
 -------------------------------------------------------------------------------

					enters __writeback_single_inode()
					grabs inode->i_lock
					tests PAGECACHE_TAG_DIRTY which is clear
 enters __set_page_dirty()
 grabs mapping->tree_lock
 sets PAGECACHE_TAG_DIRTY
 releases mapping->tree_lock
 leaves __set_page_dirty()

 enters __mark_inode_dirty()
 smp_mb()
 sees I_DIRTY_PAGES set
 leaves __mark_inode_dirty()
					clears I_DIRTY_PAGES
					releases inode->i_lock

Now @inode has dirty pages w/ I_DIRTY_PAGES clear.  This doesn't seem
to lead to an immediately critical problem because requeue_inode()
later checks PAGECACHE_TAG_DIRTY instead of I_DIRTY_PAGES when
deciding whether the inode needs to be requeued for IO and there are
enough unintentional memory barriers inbetween, so while the inode
ends up with inconsistent I_DIRTY_PAGES flag, it doesn't fall off the
IO list.

The lack of explicit barrier may also theoretically affect the other
I_DIRTY bits which deal with metadata dirtiness.  There is no
guarantee that a strong enough barrier exists between
I_DIRTY_[DATA]SYNC clearing and write_inode() writing out the dirtied
inode.  Filesystem inode writeout path likely has enough stuff which
can behave as full barrier but it's theoretically possible that the
writeout may not see all the updates from ->dirty_inode().

Fix it by adding an explicit smp_mb() after I_DIRTY clearing.  Note
that I_DIRTY_PAGES needs a special treatment as it always needs to be
cleared to be interlocked with the lockless test on
__mark_inode_dirty() side.  It's cleared unconditionally and
reinstated after smp_mb() if the mapping still has dirty pages.

Also add comments explaining how and why the barriers are paired.

Lightly tested.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

14d5d1ea

[media] af9005: fix kernel panic on init if compiled without IR · a1de9d04

Frank Schaefer authored Sep 29, 2014

commit 22799487 upstream.

This patches fixes an ancient bug in the dvb_usb_af9005 driver, which
has been reported at least in the following threads:
https://lkml.org/lkml/2009/2/4/350
https://lkml.org/lkml/2014/9/18/558

If the driver is compiled in without any IR support (neither
DVB_USB_AF9005_REMOTE nor custom symbols), the symbol_request calls in
af9005_usb_module_init() return pointers != NULL although the IR
symbols are not available.

This leads to the following oops:
...
[    8.529751] usbcore: registered new interface driver dvb_usb_af9005
[    8.531584] BUG: unable to handle kernel paging request at 02e00000
[    8.533385] IP: [<7d9d67c6>] af9005_usb_module_init+0x6b/0x9d
[    8.535613] *pde = 00000000
[    8.536416] Oops: 0000 [#1] PREEMPT PREEMPT DEBUG_PAGEALLOCDEBUG_PAGEALLOC
[    8.537863] CPU: 0 PID: 1 Comm: swapper Not tainted 3.15.0-rc6-00151-ga5c075cf #1
[    8.539827] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[    8.541519] task: 89c9a670 ti: 89c9c000 task.ti: 89c9c000
[    8.541519] EIP: 0060:[<7d9d67c6>] EFLAGS: 00010206 CPU: 0
[    8.541519] EIP is at af9005_usb_module_init+0x6b/0x9d
[    8.541519] EAX: 02e00000 EBX: 00000000 ECX: 00000006 EDX: 00000000
[    8.541519] ESI: 00000000 EDI: 7da33ec8 EBP: 89c9df30 ESP: 89c9df2c
[    8.541519]  DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
[    8.541519] CR0: 8005003b CR2: 02e00000 CR3: 05a54000 CR4: 00000690
[    8.541519] Stack:
[    8.541519]  7d9d675b 89c9df90 7d992a49 7d7d5914 89c9df4c 7be3a800 7d08c58c 8a4c3968
[    8.541519]  89c9df80 7be3a966 00000192 00000006 00000006 7d7d3ff4 8a4c397a 00000200
[    8.541519]  7d6b1280 8a4c3979 00000006 000009a6 7da32db8 b13eec81 00000006 000009a6
[    8.541519] Call Trace:
[    8.541519]  [<7d9d675b>] ? ttusb2_driver_init+0x16/0x16
[    8.541519]  [<7d992a49>] do_one_initcall+0x77/0x106
[    8.541519]  [<7be3a800>] ? parameqn+0x2/0x35
[    8.541519]  [<7be3a966>] ? parse_args+0x113/0x25c
[    8.541519]  [<7d992bc2>] kernel_init_freeable+0xea/0x167
[    8.541519]  [<7cf01070>] kernel_init+0x8/0xb8
[    8.541519]  [<7cf27ec0>] ret_from_kernel_thread+0x20/0x30
[    8.541519]  [<7cf01068>] ? rest_init+0x10c/0x10c
[    8.541519] Code: 08 c2 c7 05 44 ed f9 7d 00 00 e0 02 c7 05 40 ed f9 7d 00 00 e0 02 c7 05 3c ed f9 7d 00 00 e0 02 75 1f b8 00 00 e0 02 85 c0 74 16 <a1> 00 00 e0 02 c7 05 54 84 8e 7d 00 00 e0 02 a3 58 84 8e 7d eb
[    8.541519] EIP: [<7d9d67c6>] af9005_usb_module_init+0x6b/0x9d SS:ESP 0068:89c9df2c
[    8.541519] CR2: 0000000002e00000
[    8.541519] ---[ end trace 768b6faf51370fc7 ]---

The prefered fix would be to convert the whole IR code to use the kernel IR
infrastructure (which wasn't available at the time this driver had been created).

Until anyone who still has this old hardware steps up an does the conversion,
fix it by not calling the symbol_request calls if the driver is compiled in
without the default IR symbols (CONFIG_DVB_USB_AF9005_REMOTE).
Due to the IR related pointers beeing NULL by default, IR support will then be disabled.

The downside of this solution is, that it will no longer be possible to
compile custom IR symbols (not using CONFIG_DVB_USB_AF9005_REMOTE) in.

Please note that this patch has NOT been tested with all possible cases.
I don't have the hardware and could only verify that it fixes the reported
bug.
Reported-by: Fengguag Wu <fengguang.wu@intel.com>
Signed-off-by: Frank Schäfer <fschaefer.oss@googlemail.com>
Acked-by: Luca Olivetti <luca@ventoso.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

a1de9d04

[media] sound: Update au0828 quirks table · 1a0c4262

Mauro Carvalho Chehab authored Oct 30, 2014

commit 678fa12f upstream.

The au0828 quirks table is currently not in sync with the au0828
media driver.

Syncronize it and put them on the same order as found at au0828
driver, as all the au0828 devices with analog TV need the
same quirks.
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

1a0c4262

[media] sound: simplify au0828 quirk table · a7e70b78

Mauro Carvalho Chehab authored Oct 30, 2014

commit 5d1f00a2 upstream.

Add a macro to simplify au0828 quirk table. That makes easier
to check it against the USB IDs at drivers/media/usb/au0828/au0828-cards.c.
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
[ kamal: backport to 3.13-stable: context (removed entries order) ]
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

a7e70b78

[media] smiapp-pll: Correct clock debug prints · bc805ad5

Sakari Ailus authored Apr 01, 2014

commit bc47150a upstream.

The PLL flags were not used correctly.
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

bc805ad5

[media] smiapp: Take mutex during PLL update in sensor initialisation · e7f44ca0

Sakari Ailus authored Sep 16, 2014

commit f85698cd upstream.

The mutex does not serialise anything in this case but avoids a lockdep
warning from the control framework.
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

e7f44ca0

eCryptfs: Force RO mount when encrypted view is enabled · 99fff9f3

Tyler Hicks authored Oct 07, 2014

commit 332b122d upstream.

The ecryptfs_encrypted_view mount option greatly changes the
functionality of an eCryptfs mount. Instead of encrypting and decrypting
lower files, it provides a unified view of the encrypted files in the
lower filesystem. The presence of the ecryptfs_encrypted_view mount
option is intended to force a read-only mount and modifying files is not
supported when the feature is in use. See the following commit for more
information:

  e77a56dd [PATCH] eCryptfs: Encrypted passthrough

This patch forces the mount to be read-only when the
ecryptfs_encrypted_view mount option is specified by setting the
MS_RDONLY flag on the superblock. Additionally, this patch removes some
broken logic in ecryptfs_open() that attempted to prevent modifications
of files when the encrypted view feature was in use. The check in
ecryptfs_open() was not sufficient to prevent file modifications using
system calls that do not operate on a file descriptor.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Priya Bansal <p.bansal@samsung.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

99fff9f3

21 Jan, 2015 19 commits

xen-netfront: use correct linear area after linearizing an skb · db76bea3

David Vrabel authored Dec 09, 2014

commit 11d3d2a1 upstream.

Commit 97a6d1bb (xen-netfront: Fix
handling packets on compound pages with skb_linearize) attempted to
fix a problem where an skb that would have required too many slots
would be dropped causing TCP connections to stall.

However, it filled in the first slot using the original buffer and not
the new one and would use the wrong offset and grant access to the
wrong page.

Netback would notice the malformed request and stop all traffic on the
VIF, reporting:

    vif vif-3-0 vif3.0: txreq.offset: 85e, size: 4002, end: 6144
    vif vif-3-0 vif3.0: fatal error; disabling device
Reported-by: Anthony Wright <anthony@overnetdata.com>
Tested-by: Anthony Wright <anthony@overnetdata.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BugLink: http://bugs.launchpad.net/bugs/1317811Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

db76bea3

xen-netfront: Fix handling packets on compound pages with skb_linearize · b35b8e78

Zoltan Kiss authored Aug 11, 2014

commit 97a6d1bb upstream.

There is a long known problem with the netfront/netback interface: if the guest
tries to send a packet which constitues more than MAX_SKB_FRAGS + 1 ring slots,
it gets dropped. The reason is that netback maps these slots to a frag in the
frags array, which is limited by size. Having so many slots can occur since
compound pages were introduced, as the ring protocol slice them up into
individual (non-compound) page aligned slots. The theoretical worst case
scenario looks like this (note, skbs are limited to 64 Kb here):
linear buffer: at most PAGE_SIZE - 17 * 2 bytes, overlapping page boundary,
using 2 slots
first 15 frags: 1 + PAGE_SIZE + 1 bytes long, first and last bytes are at the
end and the beginning of a page, therefore they use 3 * 15 = 45 slots
last 2 frags: 1 + 1 bytes, overlapping page boundary, 2 * 2 = 4 slots
Although I don't think this 51 slots skb can really happen, we need a solution
which can deal with every scenario. In real life there is only a few slots
overdue, but usually it causes the TCP stream to be blocked, as the retry will
most likely have the same buffer layout.
This patch solves this problem by linearizing the packet. This is not the
fastest way, and it can fail much easier as it tries to allocate a big linear
area for the whole packet, but probably easier by an order of magnitude than
anything else. Probably this code path is not touched very frequently anyway.
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Signed-off-by: David S. Miller <davem@davemloft.net>
BugLink: http://bugs.launchpad.net/bugs/1317811Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

b35b8e78

netfilter: conntrack: disable generic tracking for known protocols · 25bde5dd

Florian Westphal authored Sep 26, 2014

commit db29a950 upstream.

Given following iptables ruleset:

-P FORWARD DROP
-A FORWARD -m sctp --dport 9 -j ACCEPT
-A FORWARD -p tcp --dport 80 -j ACCEPT
-A FORWARD -p tcp -m conntrack -m state ESTABLISHED,RELATED -j ACCEPT

One would assume that this allows SCTP on port 9 and TCP on port 80.
Unfortunately, if the SCTP conntrack module is not loaded, this allows
*all* SCTP communication, to pass though, i.e. -p sctp -j ACCEPT,
which we think is a security issue.

This is because on the first SCTP packet on port 9, we create a dummy
"generic l4" conntrack entry without any port information (since
conntrack doesn't know how to extract this information).

All subsequent packets that are unknown will then be in established
state since they will fallback to proto_generic and will match the
'generic' entry.

Our originally proposed version [1] completely disabled generic protocol
tracking, but Jozsef suggests to not track protocols for which a more
suitable helper is available, hence we now mitigate the issue for in
tree known ct protocol helpers only, so that at least NAT and direction
information will still be preserved for others.

 [1] http://www.spinics.net/lists/netfilter-devel/msg33430.html

Joint work with Daniel Borkmann.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

25bde5dd

macvlan: unregister net device when netdev_upper_dev_link() fails · 1bfbf30d

Cong Wang authored Feb 11, 2014

commit da37705c upstream.

rtnl_newlink() doesn't unregister it for us on failure.

Cc: Patrick McHardy <kaber@trash.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Zefan Li <lizefan@huawei.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

1bfbf30d

net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding · 4fdd841d

Jay Vosburgh authored Dec 19, 2014

[ Upstream commit 2c26d34b ]

When using VXLAN tunnels and a sky2 device, I have experienced
checksum failures of the following type:

[ 4297.761899] eth0: hw csum failure
[...]
[ 4297.765223] Call Trace:
[ 4297.765224]  <IRQ>  [<ffffffff8172f026>] dump_stack+0x46/0x58
[ 4297.765235]  [<ffffffff8162ba52>] netdev_rx_csum_fault+0x42/0x50
[ 4297.765238]  [<ffffffff8161c1a0>] ? skb_push+0x40/0x40
[ 4297.765240]  [<ffffffff8162325c>] __skb_checksum_complete+0xbc/0xd0
[ 4297.765243]  [<ffffffff8168c602>] tcp_v4_rcv+0x2e2/0x950
[ 4297.765246]  [<ffffffff81666ca0>] ? ip_rcv_finish+0x360/0x360

	These are reliably reproduced in a network topology of:

container:eth0 == host(OVS VXLAN on VLAN) == bond0 == eth0 (sky2) -> switch

	When VXLAN encapsulated traffic is received from a similarly
configured peer, the above warning is generated in the receive
processing of the encapsulated packet.  Note that the warning is
associated with the container eth0.

        The skbs from sky2 have ip_summed set to CHECKSUM_COMPLETE, and
because the packet is an encapsulated Ethernet frame, the checksum
generated by the hardware includes the inner protocol and Ethernet
headers.

	The receive code is careful to update the skb->csum, except in
__dev_forward_skb, as called by dev_forward_skb.  __dev_forward_skb
calls eth_type_trans, which in turn calls skb_pull_inline(skb, ETH_HLEN)
to skip over the Ethernet header, but does not update skb->csum when
doing so.

	This patch resolves the problem by adding a call to
skb_postpull_rcsum to update the skb->csum after the call to
eth_type_trans.
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

4fdd841d

enic: fix rx skb checksum · aad9a4c0

Govindarajulu Varadarajan authored Dec 18, 2014

[ Upstream commit 17e96834 ]

Hardware always provides compliment of IP pseudo checksum. Stack expects
whole packet checksum without pseudo checksum if CHECKSUM_COMPLETE is set.

This causes checksum error in nf & ovs.

kernel: qg-19546f09-f2: hw csum failure
kernel: CPU: 9 PID: 0 Comm: swapper/9 Tainted: GF          O--------------   3.10.0-123.8.1.el7.x86_64 #1
kernel: Hardware name: Cisco Systems Inc UCSB-B200-M3/UCSB-B200-M3, BIOS B200M3.2.2.3.0.080820141339 08/08/2014
kernel: ffff881218f40000 df68243feb35e3a8 ffff881237a43ab8 ffffffff815e237b
kernel: ffff881237a43ad0 ffffffff814cd4ca ffff8829ec71eb00 ffff881237a43af0
kernel: ffffffff814c6232 0000000000000286 ffff8829ec71eb00 ffff881237a43b00
kernel: Call Trace:
kernel: <IRQ>  [<ffffffff815e237b>] dump_stack+0x19/0x1b
kernel: [<ffffffff814cd4ca>] netdev_rx_csum_fault+0x3a/0x40
kernel: [<ffffffff814c6232>] __skb_checksum_complete_head+0x62/0x70
kernel: [<ffffffff814c6251>] __skb_checksum_complete+0x11/0x20
kernel: [<ffffffff8155a20c>] nf_ip_checksum+0xcc/0x100
kernel: [<ffffffffa049edc7>] icmp_error+0x1f7/0x35c [nf_conntrack_ipv4]
kernel: [<ffffffff814cf419>] ? netif_rx+0xb9/0x1d0
kernel: [<ffffffffa040eb7b>] ? internal_dev_recv+0xdb/0x130 [openvswitch]
kernel: [<ffffffffa04c8330>] nf_conntrack_in+0xf0/0xa80 [nf_conntrack]
kernel: [<ffffffff81509380>] ? inet_del_offload+0x40/0x40
kernel: [<ffffffffa049e302>] ipv4_conntrack_in+0x22/0x30 [nf_conntrack_ipv4]
kernel: [<ffffffff815005ca>] nf_iterate+0xaa/0xc0
kernel: [<ffffffff81509380>] ? inet_del_offload+0x40/0x40
kernel: [<ffffffff81500664>] nf_hook_slow+0x84/0x140
kernel: [<ffffffff81509380>] ? inet_del_offload+0x40/0x40
kernel: [<ffffffff81509dd4>] ip_rcv+0x344/0x380

Hardware verifies IP & tcp/udp header checksum but does not provide payload
checksum, use CHECKSUM_UNNECESSARY. Set it only if its valid IP tcp/udp packet.

Cc: Jiri Benc <jbenc@redhat.com>
Cc: Stefan Assmann <sassmann@redhat.com>
Reported-by: Sunil Choudhary <schoudha@redhat.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

aad9a4c0

team: avoid possible underflow of count_pending value for notify_peers and mcast_rejoin · 57bdf691

Jiri Pirko authored Jan 14, 2015

[ Upstream commit b0d11b42 ]

This patch is fixing a race condition that may cause setting
count_pending to -1, which results in unwanted big bulk of arp messages
(in case of "notify peers").

Consider following scenario:

count_pending == 2
   CPU0                                           CPU1
					team_notify_peers_work
					  atomic_dec_and_test (dec count_pending to 1)
					  schedule_delayed_work
 team_notify_peers
   atomic_add (adding 1 to count_pending)
					team_notify_peers_work
					  atomic_dec_and_test (dec count_pending to 1)
					  schedule_delayed_work
					team_notify_peers_work
					  atomic_dec_and_test (dec count_pending to 0)
   schedule_delayed_work
					team_notify_peers_work
					  atomic_dec_and_test (dec count_pending to -1)

Fix this race by using atomic_dec_if_positive - that will prevent
count_pending running under 0.

Fixes: fc423ff0 ("team: add peer notification")
Fixes: 492b200e  ("team: add support for sending multicast rejoins")
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

57bdf691

alx: fix alx_poll() · 644bfce0

Eric Dumazet authored Jan 11, 2015

[ Upstream commit 7a05dc64 ]

Commit d75b1ade ("net: less interrupt masking in NAPI") uncovered
wrong alx_poll() behavior.

A NAPI poll() handler is supposed to return exactly the budget when/if
napi_complete() has not been called.

It is also supposed to return number of frames that were received, so
that netdev_budget can have a meaning.

Also, in case of TX pressure, we still have to dequeue received
packets : alx_clean_rx_irq() has to be called even if
alx_clean_tx_irq(alx) returns false, otherwise device is half duplex.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: d75b1ade ("net: less interrupt masking in NAPI")
Reported-by: Oded Gabbay <oded.gabbay@amd.com>
Bisected-by: Oded Gabbay <oded.gabbay@amd.com>
Tested-by: Oded Gabbay <oded.gabbay@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

644bfce0

tcp: Do not apply TSO segment limit to non-TSO packets · a32cd17b

Herbert Xu authored Jan 01, 2015

[ Upstream commit 843925f3 ]

Thomas Jarosch reported IPsec TCP stalls when a PMTU event occurs.

In fact the problem was completely unrelated to IPsec.  The bug is
also reproducible if you just disable TSO/GSO.

The problem is that when the MSS goes down, existing queued packet
on the TX queue that have not been transmitted yet all look like
TSO packets and get treated as such.

This then triggers a bug where tcp_mss_split_point tells us to
generate a zero-sized packet on the TX queue.  Once that happens
we're screwed because the zero-sized packet can never be removed
by ACKs.

Fixes: 1485348d ("tcp: Apply device TSO segment limit earlier")
Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

a32cd17b

net: Reset secmark when scrubbing packet · 375e61d0

Thomas Graf authored Dec 23, 2014

[ Upstream commit b8fb4e06 ]

skb_scrub_packet() is called when a packet switches between a context
such as between underlay and overlay, between namespaces, or between
L3 subnets.

While we already scrub the packet mark, connection tracking entry,
and cached destination, the security mark/context is left intact.

It seems wrong to inherit the security context of a packet when going
from overlay to underlay or across forwarding paths.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

375e61d0

net: Fix stacked vlan offload features computation · 11a78d0f

Toshiaki Makita authored Dec 22, 2014

[ Upstream commit 796f2da8 ]

When vlan tags are stacked, it is very likely that the outer tag is stored
in skb->vlan_tci and skb->protocol shows the inner tag's vlan_proto.
Currently netif_skb_features() first looks at skb->protocol even if there
is the outer tag in vlan_tci, thus it incorrectly retrieves the protocol
encapsulated by the inner vlan instead of the inner vlan protocol.
This allows GSO packets to be passed to HW and they end up being
corrupted.

Fixes: 58e998c6 ("offloading: Force software GSO for multiple vlan tags.")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

11a78d0f

batman-adv: avoid NULL dereferences and fix if check · 030b8e65

Antonio Quartulli authored Dec 20, 2014

[ Upstream commit 0d164491 ]

Gateway having bandwidth_down equal to zero are not accepted
at all and so never added to the Gateway list.
For this reason checking the bandwidth_down member in
batadv_gw_out_of_range() is useless.

This is probably a copy/paste error and this check was supposed
to be "!gw_node" only. Moreover, the way the check is written
now may also lead to a NULL dereference.

Fix this by rewriting the if-condition properly.

Introduced by 414254e3
("batman-adv: tvlv - gateway download/upload bandwidth container")
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

030b8e65

batman-adv: Unify fragment size calculation · 01fadf3a

Sven Eckelmann authored Dec 20, 2014

[ Upstream commit 0402e444 ]

The fragmentation code was replaced in 610bfc6b
("batman-adv: Receive fragmented packets and merge") by an implementation which
can handle up to 16 fragments of a packet. The packet is prepared for the split
in fragments by the function batadv_frag_send_packet and the actual split is
done by batadv_frag_create.

Both functions calculate the size of a fragment themself. But their calculation
differs because batadv_frag_send_packet also subtracts ETH_HLEN. Therefore,
the check in batadv_frag_send_packet "can a full fragment can be created?" may
return true even when batadv_frag_create cannot create a full fragment.

The function batadv_frag_create doesn't check the size of the skb before
splitting it and therefore might try to create a larger fragment than the
remaining buffer. This creates an integer underflow and an invalid len is given
to skb_split.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

01fadf3a

tg3: tg3_disable_ints using uninitialized mailbox value to disable interrupts · 49f195ad

Prashant Sreedharan authored Dec 20, 2014

[ Upstream commit 05b0aa57 ]

During driver load in tg3_init_one, if the driver detects DMA activity before
intializing the chip tg3_halt is called. As part of tg3_halt interrupts are
disabled using routine tg3_disable_ints. This routine was using mailbox value
which was not initialized (default value is 0). As a result driver was writing
0x00000001 to pci config space register 0, which is the vendor id / device id.

This driver bug was exposed because of the commit a7877b17a667 (PCI: Check only
the Vendor ID to identify Configuration Request Retry). Also this issue is only
seen in older generation chipsets like 5722 because config space write to offset
0 from driver is possible. The newer generation chips ignore writes to offset 0.
Also without commit a7877b17a667, for these older chips when a GRC reset is
issued the Bootcode would reprogram the vendor id/device id, which is the reason
this bug was masked earlier.

Fixed by initializing the interrupt mailbox registers before calling tg3_halt.

Please queue for -stable.
Reported-by: Nils Holland <nholland@tisys.org>
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

49f195ad

in6: fix conflict with glibc · 8f6ad7c4

stephen hemminger authored Dec 20, 2014

[ Upstream commit 6d08acd2 ]

Resolve conflicts between glibc definition of IPV6 socket options
and those defined in Linux headers. Looks like earlier efforts to
solve this did not cover all the definitions.

It resolves warnings during iproute2 build.
Please consider for stable as well.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

8f6ad7c4

netlink: Don't reorder loads/stores before marking mmap netlink frame as available · 3eeeea50

Thomas Graf authored Dec 18, 2014

[ Upstream commit a18e6a18 ]

Each mmap Netlink frame contains a status field which indicates
whether the frame is unused, reserved, contains data or needs to
be skipped. Both loads and stores may not be reordeded and must
complete before the status field is changed and another CPU might
pick up the frame for use. Use an smp_mb() to cover needs of both
types of callers to netlink_set_status(), callers which have been
reading data frame from the frame, and callers which have been
filling or releasing and thus writing to the frame.

- Example code path requiring a smp_rmb():
  memcpy(skb->data, (void *)hdr + NL_MMAP_HDRLEN, hdr->nm_len);
  netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED);

- Example code path requiring a smp_wmb():
  hdr->nm_uid	= from_kuid(sk_user_ns(sk), NETLINK_CB(skb).creds.uid);
  hdr->nm_gid	= from_kgid(sk_user_ns(sk), NETLINK_CB(skb).creds.gid);
  netlink_frame_flush_dcache(hdr);
  netlink_set_status(hdr, NL_MMAP_STATUS_VALID);

Fixes: f9c228 ("netlink: implement memory mapped recvmsg()")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

3eeeea50

netlink: Always copy on mmap TX. · 357d462e

David Miller authored Dec 16, 2014

[ Upstream commit 4682a035 ]

Checking the file f_count and the nlk->mapped count is not completely
sufficient to prevent the mmap'd area contents from changing from
under us during netlink mmap sendmsg() operations.

Be careful to sample the header's length field only once, because this
could change from under us as well.

Fixes: 5fd96123 ("netlink: implement memory mapped sendmsg()")
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

357d462e

gre: fix the inner mac header in nbma tunnel xmit path · 6c7142a1

Timo Teräs authored Dec 15, 2014

[ Upstream commit 8a0033a9 ]

The NBMA GRE tunnels temporarily push GRE header that contain the
per-packet NBMA destination on the skb via header ops early in xmit
path. It is the later pulled before the real GRE header is constructed.

The inner mac was thus set differently in nbma case: the GRE header
has been pushed by neighbor layer, and mac header points to beginning
of the temporary gre header (set by dev_queue_xmit).

Now that the offloads expect mac header to point to the gre payload,
fix the xmit patch to:
 - pull first the temporary gre header away
 - and reset mac header to point to gre payload

This fixes tso to work again with nbma tunnels.

Fixes: 14051f04 ("gre: Use inner mac length when computing tunnel length")
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Cc: Tom Herbert <therbert@google.com>
Cc: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

6c7142a1

Linux 3.13.11-ckt14 · 38e21c0c
Kamal Mostafa authored Jan 21, 2015
```
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
```
38e21c0c

14 Jan, 2015 5 commits

deal with deadlock in d_walk() · 7b70ac31

Al Viro authored Oct 26, 2014

commit ca5358ef upstream.

... by not hitting rename_retry for reasons other than rename having
happened.  In other words, do _not_ restart when finding that
between unlocking the child and locking the parent the former got
into __dentry_kill().  Skip the killed siblings instead...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ kamal: backport to 3.13-stable: __dentry_kill() change applied to d_kill() ]
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

7b70ac31

move d_rcu from overlapping d_child to overlapping d_alias · 563ba5a9

Al Viro authored Oct 26, 2014

commit 946e51f2 upstream.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.16:
 - Apply name changes in all the different places we use d_alias and d_child
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Moritz Muehlenhoff <jmm@inutil.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

563ba5a9

userns: Don't allow unprivileged creation of gid mappings · 035ec337

Eric W. Biederman authored Dec 05, 2014

commit be7c6dba upstream.

As any gid mapping will allow and must allow for backwards
compatibility dropping groups don't allow any gid mappings to be
established without CAP_SETGID in the parent user namespace.

For a small class of applications this change breaks userspace
and removes useful functionality.  This small class of applications
includes tools/testing/selftests/mount/unprivilged-remount-test.c

Most of the removed functionality will be added back with the addition
of a one way knob to disable setgroups.  Once setgroups is disabled
setting the gid_map becomes as safe as setting the uid_map.

For more common applications that set the uid_map and the gid_map
with privilege this change will have no affect.

This is part of a fix for CVE-2014-8989.
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

035ec337

userns: Don't allow setgroups until a gid mapping has been setablished · a0092873

Eric W. Biederman authored Dec 05, 2014

commit 273d2c67 upstream.

setgroups is unique in not needing a valid mapping before it can be called,
in the case of setgroups(0, NULL) which drops all supplemental groups.

The design of the user namespace assumes that CAP_SETGID can not actually
be used until a gid mapping is established.  Therefore add a helper function
to see if the user namespace gid mapping has been established and call
that function in the setgroups permission check.

This is part of the fix for CVE-2014-8989, being able to drop groups
without privilege using user namespaces.
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

a0092873

groups: Consolidate the setgroups permission checks · ae6a146b

Eric W. Biederman authored Dec 05, 2014

commit 7ff4d90b upstream.

Today there are 3 instances of setgroups and due to an oversight their
permission checking has diverged.  Add a common function so that
they may all share the same permission checking code.

This corrects the current oversight in the current permission checks
and adds a helper to avoid this in the future.

A user namespace security fix will update this new helper, shortly.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

ae6a146b

13 Jan, 2015 5 commits

x86_64, vdso: Fix the vdso address randomization algorithm · ff845a15

Andy Lutomirski authored Dec 19, 2014

commit 394f56fe upstream.

The theory behind vdso randomization is that it's mapped at a random
offset above the top of the stack.  To avoid wasting a page of
memory for an extra page table, the vdso isn't supposed to extend
past the lowest PMD into which it can fit.  Other than that, the
address should be a uniformly distributed address that meets all of
the alignment requirements.

The current algorithm is buggy: the vdso has about a 50% probability
of being at the very end of a PMD.  The current algorithm also has a
decent chance of failing outright due to incorrect handling of the
case where the top of the stack is near the top of its PMD.

This fixes the implementation.  The paxtest estimate of vdso
"randomisation" improves from 11 bits to 18 bits.  (Disclaimer: I
don't know what the paxtest code is actually calculating.)

It's worth noting that this algorithm is inherently biased: the vdso
is more likely to end up near the end of its PMD than near the
beginning.  Ideally we would either nix the PMD sharing requirement
or jointly randomize the vdso and the stack to reduce the bias.

In the mean time, this is a considerable improvement with basically
no risk of compatibility issues, since the allowed outputs of the
algorithm are unchanged.

As an easy test, doing this:

for i in `seq 10000`
  do grep -P vdso /proc/self/maps |cut -d- -f1
done |sort |uniq -d

used to produce lots of output (1445 lines on my most recent run).
A tiny subset looks like this:

7fffdfffe000
7fffe01fe000
7fffe05fe000
7fffe07fe000
7fffe09fe000
7fffe0bfe000
7fffe0dfe000

Note the suspicious fe000 endings.  With the fix, I get a much more
palatable 76 repeated addresses.
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

ff845a15

isofs: Fix unchecked printing of ER records · 24c7fcc3

Jan Kara authored Dec 18, 2014

commit 4e202462 upstream.

We didn't check length of rock ridge ER records before printing them.
Thus corrupted isofs image can cause us to access and print some memory
behind the buffer with obvious consequences.
Reported-and-tested-by: Carl Henrik Lunde <chlunde@ping.uio.no>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

24c7fcc3

KEYS: close race between key lookup and freeing · 55036ae4

Sasha Levin authored Dec 29, 2014

commit a3a87844 upstream.

When a key is being garbage collected, it's key->user would get put before
the ->destroy() callback is called, where the key is removed from it's
respective tracking structures.

This leaves a key hanging in a semi-invalid state which leaves a window open
for a different task to try an access key->user. An example is
find_keyring_by_name() which would dereference key->user for a key that is
in the process of being garbage collected (where key->user was freed but
->destroy() wasn't called yet - so it's still present in the linked list).

This would cause either a panic, or corrupt memory.

Fixes CVE-2014-9529.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

55036ae4

batman-adv: Calculate extra tail size based on queued fragments · e6e75eaa

Sven Eckelmann authored Dec 20, 2014

commit 5b6698b0 upstream.

The fragmentation code was replaced in 610bfc6b
("batman-adv: Receive fragmented packets and merge"). The new code provided a
mostly unused parameter skb for the merging function. It is used inside the
function to calculate the additionally needed skb tailroom. But instead of
increasing its own tailroom, it is only increasing the tailroom of the first
queued skb. This is not correct in some situations because the first queued
entry can be a different one than the parameter.

An observed problem was:

1. packet with size 104, total_size 1464, fragno 1 was received
   - packet is queued
2. packet with size 1400, total_size 1464, fragno 0 was received
   - packet is queued at the end of the list
3. enough data was received and can be given to the merge function
   (1464 == (1400 - 20) + (104 - 20))
   - merge functions gets 1400 byte large packet as skb argument
4. merge function gets first entry in queue (104 byte)
   - stored as skb_out
5. merge function calculates the required extra tail as total_size - skb->len
   - pskb_expand_head tail of skb_out with 64 bytes
6. merge function tries to squeeze the extra 1380 bytes from the second queued
   skb (1400 byte aka skb parameter) in the 64 extra tail bytes of skb_out

Instead calculate the extra required tail bytes for skb_out also using skb_out
instead of using the parameter skb. The skb parameter is only used to get the
total_size from the last received packet. This is also the total_size used to
decide that all fragments were received.
Reported-by: Philipp Psurek <philipp.psurek@gmail.com>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

e6e75eaa

isofs: Fix infinite looping over CE entries · f5034d91

Jan Kara authored Dec 15, 2014

commit f54e18f1 upstream.

Rock Ridge extensions define so called Continuation Entries (CE) which
define where is further space with Rock Ridge data. Corrupted isofs
image can contain arbitrarily long chain of these, including a one
containing loop and thus causing kernel to end in an infinite loop when
traversing these entries.

Limit the traversal to 32 entries which should be more than enough space
to store all the Rock Ridge data.
Reported-by: P J P <ppandit@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

f5034d91