Commits · 526d1f369185226566cdc5718650006e8e6fff73 · Kirill Smelkov / linux

05 Oct, 2014 40 commits

blk-mq: Avoid race condition with uninitialized requests · 526d1f36

David Hildenbrand authored Sep 18, 2014

commit 683d0e12 upstream.

This patch should fix the bug reported in
https://lkml.org/lkml/2014/9/11/249.

We have to initialize at least the atomic_flags and the cmd_flags when
allocating storage for the requests.

Otherwise blk_mq_timeout_check() might dereference uninitialized
pointers when racing with the creation of a request.

Also move the reset of cmd_flags for the initializing code to the point
where a request is freed. So we will never end up with pending flush
request indicators that might trigger dereferences of invalid pointers
in blk_mq_timeout_check().
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reported-by: Paulo De Rezende Pinatti <ppinatti@linux.vnet.ibm.com>
Tested-by: Paulo De Rezende Pinatti <ppinatti@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

526d1f36

Fix nasty 32-bit overflow bug in buffer i/o code. · c1672332

Anton Altaparmakov authored Sep 22, 2014

commit f2d5a944 upstream.

On 32-bit architectures, the legacy buffer_head functions are not always
handling the sector number with the proper 64-bit types, and will thus
fail on 4TB+ disks.

Any code that uses __getblk() (and thus bread(), breadahead(),
sb_bread(), sb_breadahead(), sb_getblk()), and calls it using a 64-bit
block on a 32-bit arch (where "long" is 32-bit) causes an inifinite loop
in __getblk_slow() with an infinite stream of errors logged to dmesg
like this:

  __find_get_block_slow() failed. block=6740375944, b_blocknr=2445408648
  b_state=0x00000020, b_size=512
  device sda1 blocksize: 512

Note how in hex block is 0x191C1F988 and b_blocknr is 0x91C1F988 i.e. the
top 32-bits are missing (in this case the 0x1 at the top).

This is because grow_dev_page() is broken and has a 32-bit overflow due
to shifting the page index value (a pgoff_t - which is just 32 bits on
32-bit architectures) left-shifted as the block number.  But the top
bits to get lost as the pgoff_t is not type cast to sector_t / 64-bit
before the shift.

This patch fixes this issue by type casting "index" to sector_t before
doing the left shift.

Note this is not a theoretical bug but has been seen in the field on a
4TiB hard drive with logical sector size 512 bytes.

This patch has been verified to fix the infinite loop problem on 3.17-rc5
kernel using a 4TB disk image mounted using "-o loop".  Without this patch
doing a "find /nt" where /nt is an NTFS volume causes the inifinite loop
100% reproducibly whilst with the patch it works fine as expected.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

c1672332

drm/radeon/px: fix module unload · bd1328b4

Alex Deucher authored Sep 12, 2014

commit 2e97140d upstream.

Use the new vga_switcheroo_fini_domain_pm_ops function
to unregister the pm ops.

Based on a patch from:
Pali Rohár <pali.rohar@gmail.com>

bug:
https://bugzilla.kernel.org/show_bug.cgi?id=84431Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bd1328b4

drm/nouveau/runpm: fix module unload · 97d30fa3

Alex Deucher authored Sep 12, 2014

commit 53beaa01 upstream.

Use the new vga_switcheroo_fini_domain_pm_ops function
to unregister the pm ops.

Based on a patch from:
Pali Rohár <pali.rohar@gmail.com>

bug:
https://bugzilla.kernel.org/show_bug.cgi?id=84431Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

97d30fa3

vgaswitcheroo: add vga_switcheroo_fini_domain_pm_ops · 4659be27

Alex Deucher authored Sep 12, 2014

commit 766a53d0 upstream.

Drivers should call this on unload to unregister pmops.

Bug:
https://bugzilla.kernel.org/show_bug.cgi?id=84431Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

4659be27

Revert "PCI: Don't scan random busses in pci_scan_bridge()" · 4e3a0d9d

Bjorn Helgaas authored Sep 19, 2014

commit 7a0b33d4 upstream.

This reverts commit fc1b2531 ("PCI: Don't scan random busses in
pci_scan_bridge()") because it breaks CardBus on some machines.

David tested a Dell Latitude D505 that worked like this prior to
fc1b2531:

pci 0000:00:1e.0: PCI bridge to [bus 01]
pci 0000:01:01.0: CardBus bridge to [bus 02-05]

Note that the 01:01.0 CardBus bridge has a bus number aperture of
[bus 02-05], but those buses are all outside the 00:1e.0 PCI bridge bus
number aperture, so accesses to buses 02-05 never reach CardBus. This is
later patched up by yenta_fixup_parent_bridge(), which changes the
subordinate bus number of the 00:1e.0 PCI bridge:

pci_bus 0000:01: Raising subordinate bus# of parent bus (#01) from #01 to #05

With fc1b2531, pci_scan_bridge() fails immediately when it notices that
we can't allocate a valid secondary bus number for the CardBus bridge, and
CardBus doesn't work at all:

pci 0000:01:01.0: can't allocate child bus 01 from [bus 01]

I'd prefer to fix this by integrating the yenta_fixup_parent_bridge() logic
into pci_scan_bridge() so we fix the bus number apertures up front. But
I don't think we can do that before v3.17, so I'm going to revert this to
avoid the problem while we're working on the long-term fix.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=83441
Link: http://lkml.kernel.org/r/1409303414-5196-1-git-send-email-david.henningsson@canonical.comReported-by: David Henningsson <david.henningsson@canonical.com>
Tested-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

4e3a0d9d

PCI: Add pci_ignore_hotplug() to ignore hotplug events for a device · a4c5f39c

Bjorn Helgaas authored Sep 10, 2014

commit b440bde7 upstream.

Powering off a hot-pluggable device, e.g., with pci_set_power_state(D3cold),
normally generates a hot-remove event that unbinds the driver.

Some drivers expect to remain bound to a device even while they power it
off and back on again. This can be dangerous, because if the device is
removed or replaced while it is powered off, the driver doesn't know that
anything changed. But some drivers accept that risk.

Add pci_ignore_hotplug() for use by drivers that know their device cannot
be removed. Using pci_ignore_hotplug() tells the PCI core that hot-plug
events for the device should be ignored.

The radeon and nouveau drivers use this to switch between a low-power,
integrated GPU and a higher-power, higher-performance discrete GPU. They
power off the unused GPU, but they want to remain bound to it.

This is a reimplementation of f244d8b6 ("ACPIPHP / radeon / nouveau:
Fix VGA switcheroo problem related to hotplug") but extends it to work with
both acpiphp and pciehp.

This fixes a problem where systems with dual GPUs using the radeon drivers
become unusable, freezing every few seconds (see bugzillas below). The
resume of the radeon device may also fail, e.g.,

This fixes problems on dual GPU systems where the radeon driver becomes
unusable because of problems while suspending the device, as in bug 79701:

[drm] radeon: finishing device.
radeon 0000:01:00.0: Userspace still has active objects !
radeon 0000:01:00.0: ffff8800cb4ec288 ffff8800cb4ec000 16384 4294967297 force free
...
WARNING: CPU: 0 PID: 67 at /home/apw/COD/linux/drivers/gpu/drm/radeon/radeon_gart.c:234 radeon_gart_unbind+0xd2/0xe0 [radeon]()
trying to unbind memory from uninitialized GART !

or while resuming it, as in bug 77261:

radeon 0000:01:00.0: ring 0 stalled for more than 10158msec
radeon 0000:01:00.0: GPU lockup ...
radeon 0000:01:00.0: GPU pci config reset
pciehp 0000:00:01.0:pcie04: Card not present on Slot(1-1)
radeon 0000:01:00.0: GPU reset succeeded, trying to resume
*ERROR* radeon: dpm resume failed
radeon 0000:01:00.0: Wait for MC idle timedout !

Link: https://bugzilla.kernel.org/show_bug.cgi?id=77261
Link: https://bugzilla.kernel.org/show_bug.cgi?id=79701Reported-by: Shawn Starr <shawn.starr@rogers.com>
Reported-by: Jose P. <lbdkmjdf@sharklasers.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Rajat Jain <rajatxjain@gmail.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

a4c5f39c

arm: armv7: perf: fix armv7 ref-cycles error · e7a0374e

Zhiqiang Zhang authored Sep 26, 2014

ref-cycles event is specially to Intel core, but can still used in arm
architecture with the wrong return value with 3.10 stable. this patch fix the
bug and make it return NOT SUPPORTED distinctly.

In upstream this bug has been fixed by other way, which changes more than one
file and more than 1000 lines. the primary commit is
6b7658ec.  besides we can not simply
cherry-pick.
Signed-off-by: Zhiqiang Zhang <zhangzhiqiang.zhang@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com
Cc: Will Deacon <will.deacon@arm.com>
Cc: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

e7a0374e

perf: Fix a race condition in perf_remove_from_context() · c850c078

Cong Wang authored Sep 02, 2014

commit 3577af70 upstream.

We saw a kernel soft lockup in perf_remove_from_context(),
it looks like the `perf` process, when exiting, could not go
out of the retry loop. Meanwhile, the target process was forking
a child. So either the target process should execute the smp
function call to deactive the event (if it was running) or it should
do a context switch which deactives the event.

It seems we optimize out a context switch in perf_event_context_sched_out(),
and what's more important, we still test an obsolete task pointer when
retrying, so no one actually would deactive that event in this situation.
Fix it directly by reloading the task pointer in perf_remove_from_context().

This should cure the above soft lockup.
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1409696840-843-1-git-send-email-xiyou.wangcong@gmail.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

c850c078

Fix unbalanced mutex in dma_pool_create(). · f673015a

Krzysztof Hałasa authored Sep 18, 2014

commit 153a9f13 upstream.

dma_pool_create() needs to unlock the mutex in error case.  The bug was
introduced in the 3.16 by commit cc6b664a ("mm/dmapool.c: remove
redundant NULL check for dev in dma_pool_create()")/
Signed-off-by: Krzysztof Hałasa <khc@piap.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f673015a

spi: sirf: enable RX_IO_DMA_INT interrupt · 30b9f42a

Qipan Li authored Sep 02, 2014

commit f2a08b40 upstream.

in spi interrupt handler, we need check RX_IO_DMA status to ensure
rx fifo have received the specify count data.

if not set, the while statement in spi isr function will keep loop,
at last, make the kernel hang.

[The code is actually there in the interrupt handler but apparently it
needs the interrupt unmasking so the handler sees the status -- broonie]
Signed-off-by: Qipan Li <Qipan.Li@csr.com>
Signed-off-by: Barry Song <Baohua.Song@csr.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

30b9f42a

spi: dw: Don't use devm_kzalloc in master->setup callback · 2ac3e493

Axel Lin authored Aug 31, 2014

commit a97c883a upstream.

device_add() expects that any memory allocated via devm_* API is only
done in the device's probe function.

Fix below boot warning:
WARNING: CPU: 1 PID: 1 at drivers/base/dd.c:286 driver_probe_device+0x2b4/0x2f4()
Modules linked in:
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.16.0-10474-g835c90b-dirty #160
[<c0016364>] (unwind_backtrace) from [<c001251c>] (show_stack+0x20/0x24)
[<c001251c>] (show_stack) from [<c04eaefc>] (dump_stack+0x7c/0x98)
[<c04eaefc>] (dump_stack) from [<c0023d4c>] (warn_slowpath_common+0x78/0x9c)
[<c0023d4c>] (warn_slowpath_common) from [<c0023d9c>] (warn_slowpath_null+0x2c/0x34)
[<c0023d9c>] (warn_slowpath_null) from [<c0302c60>] (driver_probe_device+0x2b4/0x2f4)
[<c0302c60>] (driver_probe_device) from [<c0302d90>] (__device_attach+0x50/0x54)
[<c0302d90>] (__device_attach) from [<c0300e60>] (bus_for_each_drv+0x54/0x9c)
[<c0300e60>] (bus_for_each_drv) from [<c0302958>] (device_attach+0x84/0x90)
[<c0302958>] (device_attach) from [<c0301f10>] (bus_probe_device+0x94/0xb8)
[<c0301f10>] (bus_probe_device) from [<c03000c0>] (device_add+0x434/0x4fc)
[<c03000c0>] (device_add) from [<c0342dd4>] (spi_add_device+0x98/0x164)
[<c0342dd4>] (spi_add_device) from [<c03444a4>] (spi_register_master+0x598/0x768)
[<c03444a4>] (spi_register_master) from [<c03446b4>] (devm_spi_register_master+0x40/0x80)
[<c03446b4>] (devm_spi_register_master) from [<c0346214>] (dw_spi_add_host+0x1a8/0x258)
[<c0346214>] (dw_spi_add_host) from [<c0346920>] (dw_spi_mmio_probe+0x1d4/0x294)
[<c0346920>] (dw_spi_mmio_probe) from [<c0304560>] (platform_drv_probe+0x3c/0x6c)
[<c0304560>] (platform_drv_probe) from [<c0302a98>] (driver_probe_device+0xec/0x2f4)
[<c0302a98>] (driver_probe_device) from [<c0302d3c>] (__driver_attach+0x9c/0xa0)
[<c0302d3c>] (__driver_attach) from [<c0300f0c>] (bus_for_each_dev+0x64/0x98)
[<c0300f0c>] (bus_for_each_dev) from [<c0302518>] (driver_attach+0x2c/0x30)
[<c0302518>] (driver_attach) from [<c0302134>] (bus_add_driver+0xdc/0x1f4)
[<c0302134>] (bus_add_driver) from [<c03035c8>] (driver_register+0x88/0x104)
[<c03035c8>] (driver_register) from [<c030445c>] (__platform_driver_register+0x58/0x6c)
[<c030445c>] (__platform_driver_register) from [<c0700f00>] (dw_spi_mmio_driver_init+0x18/0x20)
[<c0700f00>] (dw_spi_mmio_driver_init) from [<c0008914>] (do_one_initcall+0x90/0x1d4)
[<c0008914>] (do_one_initcall) from [<c06d7d90>] (kernel_init_freeable+0x178/0x248)
[<c06d7d90>] (kernel_init_freeable) from [<c04e687c>] (kernel_init+0x18/0xfc)
[<c04e687c>] (kernel_init) from [<c000ecd8>] (ret_from_fork+0x14/0x20)
Reported-by: Thor Thayer <tthayer@opensource.altera.com>
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2ac3e493

spi: fsl: Don't use devm_kzalloc in master->setup callback · f9078a24

Axel Lin authored Aug 31, 2014

commit d9f26748 upstream.

device_add() expects that any memory allocated via devm_* API is only
done in the device's probe function.

Fix below boot warning:
[    3.092348] WARNING: at drivers/base/dd.c:286
[    3.096637] Modules linked in:
[    3.099697] CPU: 0 PID: 25 Comm: kworker/u2:1 Tainted: G W 3.16.1-s3k-drv-999-svn5771_knld-999 #158
[ 3.109610] Workqueue: deferwq deferred_probe_work_func
[    3.114736] task: c787f020 ti: c790c000 task.ti: c790c000
[    3.120062] NIP: c01df158 LR: c01df144 CTR: 00000000
[    3.124983] REGS: c790db30 TRAP: 0700   Tainted: G        W (3.16.1-s3k-drv-999-svn5771_knld-999)
[    3.134162] MSR: 00029032 <EE,ME,IR,DR,RI>  CR: 22002082 XER: 20000000
[    3.140703]
[    3.140703] GPR00: 00000001 c790dbe0 c787f020 00000044 00000054 00000308 c056da0e 20737069
[    3.140703] GPR08: 33323736 000ebfe0 00000308 000ebfdf 22002082 00000000 c046c5a0 c046c608
[    3.140703] GPR16: c046c614 c046c620 c046c62c c046c638 c046c648 c046c654 c046c68c c046c6c4
[    3.140703] GPR24: 00000000 00000000 00000003 c0401aa0 c0596638 c059662c c054e7a8 c7996800
[    3.170102] NIP [c01df158] driver_probe_device+0xf8/0x334
[    3.175431] LR [c01df144] driver_probe_device+0xe4/0x334
[    3.180633] Call Trace:
[    3.183093] [c790dbe0] [c01df144] driver_probe_device+0xe4/0x334 (unreliable)
[    3.190147] [c790dc10] [c01dd15c] bus_for_each_drv+0x7c/0xc0
[    3.195741] [c790dc40] [c01df5fc] device_attach+0xcc/0xf8
[    3.201076] [c790dc60] [c01dd6d4] bus_probe_device+0xb4/0xc4
[    3.206666] [c790dc80] [c01db9f8] device_add+0x270/0x564
[    3.211923] [c790dcc0] [c0219e84] spi_add_device+0xc0/0x190
[    3.217427] [c790dce0] [c021a79c] spi_register_master+0x720/0x834
[    3.223455] [c790dd40] [c021cb48] of_fsl_spi_probe+0x55c/0x614
[    3.229234] [c790dda0] [c01e0d2c] platform_drv_probe+0x30/0x74
[    3.234987] [c790ddb0] [c01df18c] driver_probe_device+0x12c/0x334
[    3.241008] [c790dde0] [c01dd15c] bus_for_each_drv+0x7c/0xc0
[    3.246602] [c790de10] [c01df5fc] device_attach+0xcc/0xf8
[    3.251937] [c790de30] [c01dd6d4] bus_probe_device+0xb4/0xc4
[    3.257536] [c790de50] [c01de9d8] deferred_probe_work_func+0x98/0xe0
[    3.263816] [c790de70] [c00305b8] process_one_work+0x18c/0x440
[    3.269577] [c790dea0] [c0030a00] worker_thread+0x194/0x67c
[    3.275105] [c790def0] [c0039198] kthread+0xd0/0xe4
[    3.279911] [c790df40] [c000c6d0] ret_from_kernel_thread+0x5c/0x64
[    3.285970] Instruction dump:
[    3.288900] 80de0000 419e01d0 3b7b0038 3c60c046 7f65db78 38635264 48211b99 813f00a0
[    3.296559] 381f00a0 7d290278 3169ffff 7c0b4910 <0f000000> 93df0044 7fe3fb78 4bfffd4d
Reported-by: leroy christophe <christophe.leroy@c-s.fr>
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f9078a24

IB/core: When marshaling uverbs path, clear unused fields · e60309c8

Matan Barak authored Sep 02, 2014

commit a59c5850 upstream.

When marsheling a user path to the kernel struct ib_sa_path, need
to zero smac, dmac and set the vlan id to the "no vlan" value.

Fixes: dd5f03be ("IB/core: Ethernet L2 attributes in verbs/cm structures")
Reported-by: Aleksey Senin <alekseys@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

e60309c8

IB/mlx4: Don't duplicate the default RoCE GID · a85dce19

Moni Shoua authored Aug 21, 2014

commit f5c4834d upstream.

When reading the IPv6 addresses from the net-device, make sure to
avoid adding a duplicate entry to the GID table because of equality
between the default GID we generate and the default IPv6 link-local
address of the device.

Fixes: acc4fccf ("IB/mlx4: Make sure GID index 0 is always occupied")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

a85dce19

IB/mlx4: Avoid null pointer dereference in mlx4_ib_scan_netdevs() · 23636720

Moni Shoua authored Aug 21, 2014

commit e381835c upstream.

When Ethernet netdev is not present for a port (e.g. when the link
layer type of the port is InfiniBand) it's possible to dereference a
null pointer when we do netdevice scanning.

To fix that, we move a section of code that needs to run only when
netdev is present to a proper if () statement.

Fixes: ad4885d2 ("IB/mlx4: Build the port IBoE GID table properly under bonding")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

23636720

IB/qib: Correct reference counting in debugfs qp_stats · f59c9cae

Mike Marciniszyn authored Sep 19, 2014

commit 85cbb7c7 upstream.

This particular reference count is not needed with the rcu protection,
and the current code leaks a reference count, causing a hang in
qib_qp_destroy().
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f59c9cae

GFS2: fix d_splice_alias() misuses · 778c2935

Al Viro authored Sep 12, 2014

commit cfb2f9d5 upstream.

Callers of d_splice_alias(dentry, inode) don't need iput(), neither
on success nor on failure.  Either the reference to inode is stored
in a previously negative dentry, or it's dropped.  In either case
inode reference the caller used to hold is consumed.

__gfs2_lookup() does iput() in case when d_splice_alias() has failed.
Double iput() if we ever hit that.  And gfs2_create_inode() ends up
not only with double iput(), but with link count dropped to zero - on
an inode it has just found in directory.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

778c2935

Revert "hwrng: virtio - ensure reads happen after successful probe" · 60ede3e7

Amit Shah authored Jul 27, 2014

commit eeec6263 upstream.

This reverts commit e052dbf5.

Now that we use the virtio ->scan() function to register with the hwrng
core, we will not get read requests till probe is successfully finished.

So revert the workaround we had in place to refuse read requests while
we were not yet setup completely.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

60ede3e7

virtio: rng: delay hwrng_register() till driver is ready · 5590b04b

Amit Shah authored Jul 27, 2014

commit 5c062734 upstream.

Instead of calling hwrng_register() in the probe routing, call it in the
scan routine.  This ensures that when hwrng_register() is successful,
and it requests a few random bytes to seed the kernel's pool at init,
we're ready to service that request.

This will also enable us to remove the workaround added previously to
check whether probe was completed, and only then ask for data from the
host.  The revert follows in the next commit.

There's a slight behaviour change here on unsuccessful hwrng_register().
Previously, when hwrng_register() failed, the probe() routine would
fail, and the vqs would be torn down, and driver would be marked not
initialized.  Now, the vqs will remain initialized, driver would be
marked initialized as well, but won't be available in the list of RNGs
available to hwrng core.  To fix the failures, the procedure remains the
same, i.e. unload and re-load the module, and hope things succeed the
next time around.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

5590b04b

alarmtimer: Lock k_itimer during timer callback · 6ba3934d

Richard Larocque authored Sep 09, 2014

commit 474e941b upstream.

Locks the k_itimer's it_lock member when handling the alarm timer's
expiry callback.

The regular posix timers defined in posix-timers.c have this lock held
during timout processing because their callbacks are routed through
posix_timer_fn().  The alarm timers follow a different path, so they
ought to grab the lock somewhere else.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Sharvil Nanavati <sharvil@google.com>
Signed-off-by: Richard Larocque <rlarocque@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

6ba3934d

alarmtimer: Do not signal SIGEV_NONE timers · f11d2259

Richard Larocque authored Sep 09, 2014

commit 265b81d2 upstream.

Avoids sending a signal to alarm timers created with sigev_notify set to
SIGEV_NONE by checking for that special case in the timeout callback.

The regular posix timers avoid sending signals to SIGEV_NONE timers by
not scheduling any callbacks for them in the first place.  Although it
would be possible to do something similar for alarm timers, it's simpler
to handle this as a special case in the timeout.

Prior to this patch, the alarm timer would ignore the sigev_notify value
and try to deliver signals to the process anyway.  Even worse, the
sanity check for the value of sigev_signo is skipped when SIGEV_NONE was
specified, so the signal number could be bogus.  If sigev_signo was an
unitialized value (as it often would be if SIGEV_NONE is used), then
it's hard to predict which signal will be sent.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Sharvil Nanavati <sharvil@google.com>
Signed-off-by: Richard Larocque <rlarocque@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f11d2259

alarmtimer: Return relative times in timer_gettime · bffe4248

Richard Larocque authored Sep 09, 2014

commit e86fea76 upstream.

Returns the time remaining for an alarm timer, rather than the time at
which it is scheduled to expire.  If the timer has already expired or it
is not currently scheduled, the it_value's members are set to zero.

This new behavior matches that of the other posix-timers and the POSIX
specifications.

This is a change in user-visible behavior, and may break existing
applications.  Hopefully, few users rely on the old incorrect behavior.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Sharvil Nanavati <sharvil@google.com>
Signed-off-by: Richard Larocque <rlarocque@google.com>
[jstultz: minor style tweak]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bffe4248

parisc: Only use -mfast-indirect-calls option for 32-bit kernel builds · 79b9b729

John David Anglin authored Sep 22, 2014

commit d26a7730 upstream.

In spite of what the GCC manual says, the -mfast-indirect-calls has
never been supported in the 64-bit parisc compiler. Indirect calls have
always been done using function descriptors irrespective of the
-mfast-indirect-calls option.

Recently, it was noticed that a function descriptor was always requested
when the -mfast-indirect-calls option was specified. This caused
problems when the option was used in  application code and doesn't make
any sense because the whole point of the option is to avoid using a
function descriptor for indirect calls.

Fixing this broke 64-bit kernel builds.

I will fix GCC but for now we need the attached change. This results in
the same kernel code as before.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

79b9b729

parisc: Implement new LWS CAS supporting 64 bit operations. · 43ed39e4

Guy Martin authored Sep 12, 2014

commit 89206491 upstream.

The current LWS cas only works correctly for 32bit. The new LWS allows
for CAS operations of variable size.
Signed-off-by: Guy Martin <gmsoft@tuxicoman.be>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

43ed39e4

don't bugger nd->seq on set_root_rcu() from follow_dotdot_rcu() · 8ed0282d

Al Viro authored Sep 13, 2014

commit 7bd88377 upstream.

return the value instead, and have path_init() do the assignment.  Broken by
"vfs: Fix absolute RCU path walk failures due to uninitialized seq number",
which was Cc-stable with 2.6.38+ as destination.  This one should go where
it went.

To avoid dummy value returned in case when root is already set (it would do
no harm, actually, since the only caller that doesn't ignore the return value
is guaranteed to have nd->root *not* set, but it's more obvious that way),
lift the check into callers.  And do the same to set_root(), to keep them
in sync.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

8ed0282d

tty/serial: at91: BUG: disable interrupts when !UART_ENABLE_MS() · f60133a9

Richard Genoud authored Sep 03, 2014

commit 35b675b9 upstream.

In set_termios(), interrupts where not disabled if UART_ENABLE_MS() was
false.

Tested on at91sam9g35.
Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f60133a9

powerpc: Add smp_mb()s to arch_spin_unlock_wait() · 4e43bbd4

Michael Ellerman authored Aug 07, 2014

commit 78e05b14 upstream.

Similar to the previous commit which described why we need to add a
barrier to arch_spin_is_locked(), we have a similar problem with
spin_unlock_wait().

We need a barrier on entry to ensure any spinlock we have previously
taken is visibly locked prior to the load of lock->slock.

It's also not clear if spin_unlock_wait() is intended to have ACQUIRE
semantics. For now be conservative and add a barrier on exit to give it
ACQUIRE semantics.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

4e43bbd4

powerpc: Add smp_mb() to arch_spin_is_locked() · 2dd10ce8

Michael Ellerman authored Aug 07, 2014

commit 51d7d520 upstream.

The kernel defines the function spin_is_locked(), which can be used to
check if a spinlock is currently locked.

Using spin_is_locked() on a lock you don't hold is obviously racy. That
is, even though you may observe that the lock is unlocked, it may become
locked at any time.

There is (at least) one exception to that, which is if two locks are
used as a pair, and the holder of each checks the status of the other
before doing any update.

Assuming *A and *B are two locks, and *COUNTER is a shared non-atomic
value:

The first CPU does:

	spin_lock(*A)

	if spin_is_locked(*B)
		# nothing
	else
		smp_mb()
		LOAD r = *COUNTER
		r++
		STORE *COUNTER = r

	spin_unlock(*A)

And the second CPU does:

	spin_lock(*B)

	if spin_is_locked(*A)
		# nothing
	else
		smp_mb()
		LOAD r = *COUNTER
		r++
		STORE *COUNTER = r

	spin_unlock(*B)

Although this is a strange locking construct, it should work.

It seems to be understood, but not documented, that spin_is_locked() is
not a memory barrier, so in the examples above and below the caller
inserts its own memory barrier before acting on the result of
spin_is_locked().

For now we assume spin_is_locked() is implemented as below, and we break
it out in our examples:

	bool spin_is_locked(*LOCK) {
		LOAD l = *LOCK
		return l.locked
	}

Our intuition is that there should be no problem even if the two code
sequences run simultaneously such as:

	CPU 0			CPU 1
	==================================================
	spin_lock(*A)		spin_lock(*B)
	LOAD b = *B		LOAD a = *A
	if b.locked # true	if a.locked # true
	# nothing		# nothing
	spin_unlock(*A)		spin_unlock(*B)

If one CPU gets the lock before the other then it will do the update and
the other CPU will back off:

	CPU 0			CPU 1
	==================================================
	spin_lock(*A)
	LOAD b = *B
				spin_lock(*B)
	if b.locked # false	LOAD a = *A
	else			if a.locked # true
	smp_mb()		# nothing
	LOAD r1 = *COUNTER	spin_unlock(*B)
	r1++
	STORE *COUNTER = r1
	spin_unlock(*A)

However in reality spin_lock() itself is not indivisible. On powerpc we
implement it as a load-and-reserve and store-conditional.

Ignoring the retry logic for the lost reservation case, it boils down to:
	spin_lock(*LOCK) {
		LOAD l = *LOCK
		l.locked = true
		STORE *LOCK = l
		ACQUIRE_BARRIER
	}

The ACQUIRE_BARRIER is required to give spin_lock() ACQUIRE semantics as
defined in memory-barriers.txt:

     This acts as a one-way permeable barrier.  It guarantees that all
     memory operations after the ACQUIRE operation will appear to happen
     after the ACQUIRE operation with respect to the other components of
     the system.

On modern powerpc systems we use lwsync for ACQUIRE_BARRIER. lwsync is
also know as "lightweight sync", or "sync 1".

As described in Power ISA v2.07 section B.2.1.1, in this scenario the
lwsync is not the barrier itself. It instead causes the LOAD of *LOCK to
act as the barrier, preventing any loads or stores in the locked region
from occurring prior to the load of *LOCK.

Whether this behaviour is in accordance with the definition of ACQUIRE
semantics in memory-barriers.txt is open to discussion, we may switch to
a different barrier in future.

What this means in practice is that the following can occur:

	CPU 0			CPU 1
	==================================================
	LOAD a = *A 		LOAD b = *B
	a.locked = true		b.locked = true
	LOAD b = *B		LOAD a = *A
	STORE *A = a		STORE *B = b
	if b.locked # false	if a.locked # false
	else			else
	smp_mb()		smp_mb()
	LOAD r1 = *COUNTER	LOAD r2 = *COUNTER
	r1++			r2++
	STORE *COUNTER = r1
				STORE *COUNTER = r2	# Lost update
	spin_unlock(*A)		spin_unlock(*B)

That is, the load of *B can occur prior to the store that makes *A
visibly locked. And similarly for CPU 1. The result is both CPUs hold
their lock and believe the other lock is unlocked.

The easiest fix for this is to add a full memory barrier to the start of
spin_is_locked(), so adding to our previous definition would give us:

	bool spin_is_locked(*LOCK) {
		smp_mb()
		LOAD l = *LOCK
		return l.locked
	}

The new barrier orders the store to the lock we are locking vs the load
of the other lock:

	CPU 0			CPU 1
	==================================================
	LOAD a = *A 		LOAD b = *B
	a.locked = true		b.locked = true
	STORE *A = a		STORE *B = b
	smp_mb()		smp_mb()
	LOAD b = *B		LOAD a = *A
	if b.locked # true	if a.locked # true
	# nothing		# nothing
	spin_unlock(*A)		spin_unlock(*B)

Although the above example is theoretical, there is code similar to this
example in sem_lock() in ipc/sem.c. This commit in addition to the next
commit appears to be a fix for crashes we are seeing in that code where
we believe this race happens in practice.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2dd10ce8

powerpc/perf: Fix ABIv2 kernel backtraces · cc8dcb69

Anton Blanchard authored Aug 26, 2014

commit 85101af1 upstream.

ABIv2 kernels are failing to backtrace through the kernel. An example:

39.30%  readseek2_proce  [kernel.kallsyms]    [k] find_get_entry
            |
            --- find_get_entry
               __GI___libc_read

The problem is in valid_next_sp() where we check that the new stack
pointer is at least STACK_FRAME_OVERHEAD below the previous one.

ABIv1 has a minimum stack frame size of 112 bytes consisting of 48 bytes
and 64 bytes of parameter save area. ABIv2 changes that to 32 bytes
with no paramter save area.

STACK_FRAME_OVERHEAD is in theory the minimum stack frame size,
but we over 240 uses of it, some of which assume that it includes
space for the parameter area.

We need to work through all our stack defines and rationalise them
but let's fix perf now by creating STACK_FRAME_MIN_SIZE and using
in valid_next_sp(). This fixes the issue:

30.64%  readseek2_proce  [kernel.kallsyms]    [k] find_get_entry
            |
            --- find_get_entry
               pagecache_get_page
               generic_file_read_iter
               new_sync_read
               vfs_read
               sys_read
               syscall_exit
               __GI___libc_read
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cc8dcb69

ath9k_htc: fix random decryption failure · 981b2611

Johannes Stezenbach authored Sep 12, 2014

commit d21ccfd0 upstream.

In v3.15 the driver stopped to accept network packets after successful
authentification, which could be worked around by passing the
nohwcrypt=1 module parameter.  This was not reproducible by
everyone, and showed random behaviour in some tests.
It was caused by an uninitialized variable introduced
in 4ed1a8d4 ("ath9k_htc: use ath9k_cmn_rx_accept") and
used in 341b29b9 ("ath9k_htc: use ath9k_cmn_rx_skb_postprocess").

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=78581
Fixes: 341b29b9 ("ath9k_htc: use ath9k_cmn_rx_skb_postprocess")
Signed-off-by: Johannes Stezenbach <js@sig21.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

981b2611

brcmfmac: handle IF event for P2P_DEVICE interface · 6b2a8bf7

Arend van Spriel authored Sep 12, 2014

commit 87c47903 upstream.

The firmware notifies about interface changes through the IF event
which has a NO_IF flag that means host can ignore the event. This
behaviour was introduced in the driver by:

  commit 2ee8382f
  Author: Arend van Spriel <arend@broadcom.com>
  Date:   Sat Aug 10 12:27:24 2013 +0200

      brcmfmac: ignore IF event if firmware indicates it

It turns out that the IF event for the P2P_DEVICE also has this
flag set, but the event should not be ignored in this scenario.
The mentioned commit caused a regression in 3.12 kernel in creation
of the P2P_DEVICE interface.
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Reviewed-by: Franky (Zhenhui) Lin <frankyl@broadcom.com>
Reviewed-by: Daniel (Deognyoun) Kim <dekim@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

6b2a8bf7

sched: Fix unreleased llc_shared_mask bit during CPU hotplug · f99234e1

Wanpeng Li authored Sep 24, 2014

commit 03bd4e1f upstream.

The following bug can be triggered by hot adding and removing a large number of
xen domain0's vcpus repeatedly:

	BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [..] find_busiest_group
	PGD 5a9d5067 PUD 13067 PMD 0
	Oops: 0000 [#3] SMP
	[...]
	Call Trace:
	load_balance
	? _raw_spin_unlock_irqrestore
	idle_balance
	__schedule
	schedule
	schedule_timeout
	? lock_timer_base
	schedule_timeout_uninterruptible
	msleep
	lock_device_hotplug_sysfs
	online_store
	dev_attr_store
	sysfs_write_file
	vfs_write
	SyS_write
	system_call_fastpath

Last level cache shared mask is built during CPU up and the
build_sched_domain() routine takes advantage of it to setup
the sched domain CPU topology.

However, llc_shared_mask is not released during CPU disable,
which leads to an invalid sched domainCPU topology.

This patch fix it by releasing the llc_shared_mask correctly
during CPU disable.

Yasuaki also reported that this can happen on real hardware:

  https://lkml.org/lkml/2014/7/22/1018

His case is here:

	==
	Here is an example on my system.
	My system has 4 sockets and each socket has 15 cores and HT is
	enabled. In this case, each core of sockes is numbered as
	follows:

		 | CPU#
	Socket#0 | 0-14 , 60-74
	Socket#1 | 15-29, 75-89
	Socket#2 | 30-44, 90-104
	Socket#3 | 45-59, 105-119

	Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000.

	It means that last level cache of Socket#2 is shared with
	CPU#30-44 and 90-104.

	When hot-removing socket#2 and #3, each core of sockets is
	numbered as follows:

		 | CPU#
	Socket#0 | 0-14 , 60-74
	Socket#1 | 15-29, 75-89

	But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30
	remains having 0x3fff80000001fffc0000000.

	After that, when hot-adding socket#2 and #3, each core of
	sockets is numbered as follows:

		 | CPU#
	Socket#0 | 0-14 , 60-74
	Socket#1 | 15-29, 75-89
	Socket#2 | 30-59
	Socket#3 | 90-119

	Then llc_shared_mask of CPU#30 becomes
	0x3fff8000fffffffc0000000. It means that last level cache of
	Socket#2 is shared with CPU#30-59 and 90-104. So the mask has
	the wrong value.
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Tested-by: Linn Crosetto <linn@hp.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1411547885-48165-1-git-send-email-wanpeng.li@linux.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f99234e1

mm: softdirty: keep bit when zapping file pte · ffa16dcb

Peter Feiner authored Sep 25, 2014

commit dbab31aa upstream.

This fixes the same bug as b43790ee ("mm: softdirty: don't forget to
save file map softdiry bit on unmap") and 9aed8614 ("mm/memory.c:
don't forget to set softdirty on file mapped fault") where the return
value of pte_*mksoft_dirty was being ignored.

To be sure that no other pte/pmd "mk" function return values were being
ignored, I annotated the functions in arch/x86/include/asm/pgtable.h
with __must_check and rebuilt.

The userspace effect of this bug is that the softdirty mark might be
lost if a file mapped pte get zapped.
Signed-off-by: Peter Feiner <pfeiner@google.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Jamie Liu <jamieliu@google.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ffa16dcb

fs/cachefiles: add missing \n to kerror conversions · 152f5bed

Fabian Frederick authored Sep 25, 2014

commit 6ff66ac7 upstream.

Commit 0227d6ab ("fs/cachefiles: replace kerror by pr_err") didn't
include newline featuring in original kerror definition
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reported-by: David Howells <dhowells@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

152f5bed

mm, slab: initialize object alignment on cache creation · f5eae161

David Rientjes authored Sep 25, 2014

commit d4a5fca5 upstream.

Since commit 45906855 ("mm/sl[aou]b: Common alignment code"), the
"ralign" automatic variable in __kmem_cache_create() may be used as
uninitialized.

The proper alignment defaults to BYTES_PER_WORD and can be overridden by
SLAB_RED_ZONE or the alignment specified by the caller.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=85031Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Andrei Elovikov <a.elovikov@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f5eae161

ocfs2/dlm: do not get resource spinlock if lockres is new · a5d969ea

Joseph Qi authored Sep 25, 2014

commit 5760a97c upstream.

There is a deadlock case which reported by Guozhonghua:
  https://oss.oracle.com/pipermail/ocfs2-devel/2014-September/010079.html

This case is caused by &res->spinlock and &dlm->master_lock
misordering in different threads.

It was introduced by commit 8d400b81 ("ocfs2/dlm: Clean up refmap
helpers").  Since lockres is new, it doesn't not require the
&res->spinlock.  So remove it.

Fixes: 8d400b81 ("ocfs2/dlm: Clean up refmap helpers")
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: joyce.xue <xuejiufei@huawei.com>
Reported-by: Guozhonghua <guozhonghua@h3c.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

a5d969ea

nilfs2: fix data loss with mmap() · 1ba8582b

Andreas Rohner authored Sep 25, 2014

commit 56d7acc7 upstream.

This bug leads to reproducible silent data loss, despite the use of
msync(), sync() and a clean unmount of the file system.  It is easily
reproducible with the following script:

  ----------------[BEGIN SCRIPT]--------------------
  mkfs.nilfs2 -f /dev/sdb
  mount /dev/sdb /mnt

  dd if=/dev/zero bs=1M count=30 of=/mnt/testfile

  umount /mnt
  mount /dev/sdb /mnt
  CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"

  /root/mmaptest/mmaptest /mnt/testfile 30 10 5

  sync
  CHECKSUM_AFTER="$(md5sum /mnt/testfile)"
  umount /mnt
  mount /dev/sdb /mnt
  CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)"
  umount /mnt

  echo "BEFORE MMAP:\t$CHECKSUM_BEFORE"
  echo "AFTER MMAP:\t$CHECKSUM_AFTER"
  echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT"
  ----------------[END SCRIPT]--------------------

The mmaptest tool looks something like this (very simplified, with
error checking removed):

  ----------------[BEGIN mmaptest]--------------------
  data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE,
              MAP_SHARED, fd, file_offset);

  for (i = 0; i < write_count; ++i) {
        memcpy(data + i * 4096, buf, sizeof(buf));
        msync(data, file_size - file_offset, MS_SYNC))
  }
  ----------------[END mmaptest]--------------------

The output of the script looks something like this:

  BEFORE MMAP:    281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile
  AFTER MMAP:     6604a1c31f10780331a6850371b3a313  /mnt/testfile
  AFTER REMOUNT:  281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile

So it is clear, that the changes done using mmap() do not survive a
remount.  This can be reproduced a 100% of the time.  The problem was
introduced in commit 136e8770 ("nilfs2: fix issue of
nilfs_set_page_dirty() for page at EOF boundary").

If the page was read with mpage_readpage() or mpage_readpages() for
example, then it has no buffers attached to it.  In that case
page_has_buffers(page) in nilfs_set_page_dirty() will be false.
Therefore nilfs_set_file_dirty() is never called and the pages are never
collected and never written to disk.

This patch fixes the problem by also calling nilfs_set_file_dirty() if the
page has no buffers attached to it.

[akpm@linux-foundation.org: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/]
Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Tested-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

1ba8582b

fs/notify: don't show f_handle if exportfs_encode_inode_fh failed · 312a707d

Andrey Vagin authored Sep 09, 2014

commit 7e882481 upstream.

Currently we handle only ENOSPC.  In case of other errors the file_handle
variable isn't filled properly and we will show a part of stack.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

312a707d

fsnotify/fdinfo: use named constants instead of hardcoded values · 009adfc8

Andrey Vagin authored Sep 09, 2014

commit 1fc98d11 upstream.

MAX_HANDLE_SZ is equal to 128, but currently the size of pad is only 64
bytes, so exportfs_encode_inode_fh can return an error.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

009adfc8