- 18 Sep, 2015 40 commits
-
Gabriele Mazzotta authored
commit 09c5b480 upstream. When the LPM policy is set to ATA_LPM_MAX_POWER, the device might generate a spurious PHY event that causes errors on the link. Ignore this event if it occurred within 10s after the policy change. The timeout was chosen after observing that on a Dell XPS13 9333 these spurious events can occur up to roughly 6s after the policy change. Link: http://lkml.kernel.org/g/3352987.ugV1Ipy7Z5@xps13 Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
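A minimal sketch of the time-window check described above; the field and constant names (last_lpm_change, ATA_TMOUT_SPURIOUS_PHY) are assumptions for illustration, not necessarily the exact identifiers used by the patch:

    #include <linux/types.h>
    #include <linux/jiffies.h>

    /* assumed 10s window after an LPM policy change, in milliseconds */
    #define ATA_TMOUT_SPURIOUS_PHY	10000

    /* Return true if a PHY event should be ignored because the LPM policy
     * was changed less than ATA_TMOUT_SPURIOUS_PHY ms ago. */
    static bool lpm_ignore_phy_event(unsigned long last_lpm_change)
    {
            return time_before(jiffies,
                               last_lpm_change + msecs_to_jiffies(ATA_TMOUT_SPURIOUS_PHY));
    }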
-
Gabriele Mazzotta authored
commit 8393b811 upstream. This is a preparation commit that will allow adding other criteria according to which PHY events should be dropped. Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Junxiao Bi authored
commit b1432a2a upstream. There is a race window in dlm_get_lock_resource(), which may return a lock resource which has been purged. This will cause the process to hang forever in dlmlock() as the ast msg can't be handled due to its lock resource not existing.

    dlm_get_lock_resource {
        ...
        spin_lock(&dlm->spinlock);
        tmpres = __dlm_lookup_lockres_full(dlm, lockid, namelen, hash);
        if (tmpres) {
            spin_unlock(&dlm->spinlock);
            >>>>>>>> race window, dlm_run_purge_list() may run and purge
                     the lock resource
            spin_lock(&tmpres->spinlock);
            ...
            spin_unlock(&tmpres->spinlock);
        }
    }

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
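One way to close this kind of window, sketched below with the function names quoted above, is to take the per-resource spinlock before dropping dlm->spinlock, so the purge list cannot free the resource in between. This is only an illustration of the locking pattern, not the actual ocfs2/dlm patch:

    spin_lock(&dlm->spinlock);
    tmpres = __dlm_lookup_lockres_full(dlm, lockid, namelen, hash);
    if (tmpres) {
        /* grab the resource lock while still holding dlm->spinlock,
         * so dlm_run_purge_list() cannot purge tmpres in between */
        spin_lock(&tmpres->spinlock);
        spin_unlock(&dlm->spinlock);
        /* ... use tmpres ... */
        spin_unlock(&tmpres->spinlock);
    }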
-
Ryusuke Konishi authored
commit d8fd150f upstream. The range check for b-tree level parameter in nilfs_btree_root_broken() is wrong; it accepts the case of "level == NILFS_BTREE_LEVEL_MAX" even though the level is limited to values in the range of 0 to (NILFS_BTREE_LEVEL_MAX - 1). Since the level parameter is read from storage device and used to index nilfs_btree_path array whose element count is NILFS_BTREE_LEVEL_MAX, it can cause memory overrun during btree operations if the boundary value is set to the level parameter on device. This fixes the broken sanity check and adds a comment to clarify that the upper bound NILFS_BTREE_LEVEL_MAX is exclusive. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
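The corrected check amounts to treating the upper bound as exclusive, because the level indexes an array of NILFS_BTREE_LEVEL_MAX elements. A minimal standalone sketch (the constant value and the lower bound shown are illustrative, not taken from the nilfs2 source):

    #include <stdbool.h>

    #define NILFS_BTREE_LEVEL_MAX 14   /* illustrative array size */

    /* valid levels are 0 .. NILFS_BTREE_LEVEL_MAX - 1 */
    static bool btree_level_valid(int level)
    {
            return level >= 0 && level < NILFS_BTREE_LEVEL_MAX;  /* '<', not '<=' */
    }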
-
Naoya Horiguchi authored
commit 09789e5d upstream. Currently memory_failure() calls shake_page() to sweep pages out from pcplists only when the victim page is a 4kB LRU page or a thp head page. But we should do this for a thp tail page too. Consider that a memory error hits a thp tail page whose head page is on a pcplist when memory_failure() runs. Then the current kernel skips the shake_page() part, so hwpoison_user_mappings() returns without calling split_huge_page() or try_to_unmap(), because PageLRU of the thp head is still cleared due to the skipped shake_page(). As a result, me_huge_page() runs for the thp, which is broken behavior. One effect is a leak of the thp. Another is a failure to isolate the memory error, so a later access to the error address causes another MCE, which kills the processes that used the thp. This patch fixes the problem by calling shake_page() for the thp tail case. Fixes: 385de357 ("thp: allow a hwpoisoned head page to be put back to LRU") Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Dean Nelson <dnelson@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Jin Dongming <jin.dongming@np.css.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Boris Ostrovsky authored
commit 16e6bd59 upstream. .. because bind_evtchn_to_cpu(evtchn, cpu) will map evtchn to 'info' and pass 'info' down to xen_evtchn_port_bind_to_cpu(). Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Tested-by: Annie Li <annie.li@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> [lizf: Backported to 3.4: adjust filename and context] Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Boris Ostrovsky authored
commit b9d934f2 upstream. After a resume the hypervisor/tools may change console event channel number. We should re-query it. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Jason Gunthorpe authored
commit 28521440 upstream. When accepting a new IPv4 connect to an IPv6 socket, the CMA tries to canonize the address family to IPv4, but does not properly process the listening sockaddr to get the listening port, and does not properly set the address family of the canonized sockaddr. Fixes: e51060f0 ("IB: IP address based RDMA connection manager") Reported-By: Yotam Kenneth <yotamke@mellanox.com> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> [lizf: Backported to 3.4: - there's no cma_save_ip4_info() and cma_save_ip6_info(), and instead we apply the changes to cma_save_net_info()] Signed-off-by: Zefan Li <lizefan@huawei.com>
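A rough standalone illustration of the two pieces the commit says were missing: taking the port from the listening sockaddr and explicitly setting the address family on the canonized IPv4 address. The helper below is hypothetical and only models the idea; it is not the cma.c code:

    #include <netinet/in.h>
    #include <string.h>

    /* Build the canonical IPv4 sockaddr for an IPv4 peer that connected to an
     * IPv4-mapped (::ffff:a.b.c.d) IPv6 listener. */
    static void canonize_v4(const struct sockaddr_in6 *listen6, struct sockaddr_in *out)
    {
            memset(out, 0, sizeof(*out));
            out->sin_family = AF_INET;                  /* family must be set explicitly */
            out->sin_port = listen6->sin6_port;         /* listening port, network order */
            memcpy(&out->sin_addr,
                   &listen6->sin6_addr.s6_addr[12], 4); /* low 32 bits of the mapped address */
    }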
-
Grygorii Strashko authored
commit 184af16b upstream. The PM_RESTORE_PREPARE case is not handled in mmc_pm_notify(), and as a result mmc_rescan() can be scheduled and executed during the late hibernation-restore stages, when the MMC device is already suspended. This, in turn, leads to a system crash on the TI dra7-evm board:

    WARNING: CPU: 0 PID: 3188 at drivers/bus/omap_l3_noc.c:148 l3_interrupt_handler+0x258/0x374()
    44000000.ocp:L3 Custom Error: MASTER MPU TARGET L4_PER1_P3 (Idle): Data Access in User mode during Functional access

Hence, add the missed PM_RESTORE_PREPARE PM event in mmc_pm_notify(). Fixes: 4c2ef25f (mmc: fix all hangs related to mmc/sd card...) Signed-off-by: Grygorii Strashko <Grygorii.Strashko@linaro.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
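A schematic sketch of the kind of change described, with the new event added alongside the existing prepare cases; the case bodies are elided and the structure of the real handler may differ:

    #include <linux/suspend.h>
    #include <linux/notifier.h>

    static int mmc_pm_notify(struct notifier_block *nb, unsigned long mode, void *unused)
    {
            switch (mode) {
            case PM_HIBERNATION_PREPARE:
            case PM_SUSPEND_PREPARE:
            case PM_RESTORE_PREPARE:        /* the previously missing event */
                    /* stop scheduling mmc_rescan() before the devices suspend */
                    break;
            case PM_POST_SUSPEND:
            case PM_POST_HIBERNATION:
            case PM_POST_RESTORE:
                    /* allow rescans again once the devices have resumed */
                    break;
            }
            return 0;
    }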
-
Robert Jarzmik authored
commit fc9e38c0 upstream. As the interrupt handling was transferred to the pxa_cplds driver, make the switch in lubbock platform code. Fixes: 157d2644 ("ARM: pxa: change gpio to platform device") Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Robert Jarzmik authored
commit 27768863 upstream. As the interrupt handling was transferred to the pxa_cplds driver, make the switch in mainstone platform code. Fixes: 157d2644 ("ARM: pxa: change gpio to platform device") Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Robert Jarzmik authored
commit aa8d6b73 upstream. Historically, this support was in arch/arm/mach-pxa/lubbock.c and arch/arm/mach-pxa/mainstone.c. When gpio-pxa was moved to drivers/pxa, it became a driver, and its initialization and probing happen at postcore initcall time. The lubbock code used to install the chained lubbock interrupt handler at init_irq() time. The consequence of the gpio-pxa change is that the installed chained irq handler lubbock_irq_handler() was overwritten in pxa_gpio_probe(_dt)(), removing:
 - the handler
 - the falling-edge detection setting of GPIO0, which revealed the interrupt request from the lubbock IO board.
As a fix, move the gpio0 chained handler setup to a place where we have the guarantee that pxa_gpio_probe() was called before, so that the lubbock handler becomes the true chained IRQ handler of GPIO0, demuxing the lubbock IO board interrupts. This patch moves all that handling to an mfd driver. Its only purpose for the time being is interrupt handling, but in the future it should encompass all the motherboard CPLD handling:
 - leds
 - switches
 - hexleds
The same logic applies to the mainstone board. Fixes: 157d2644 ("ARM: pxa: change gpio to platform device") Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Davide Italiano authored
commit 280227a7 upstream. fallocate() checks that the file is extent-based and returns EOPNOTSUPP in case it is not. Other tasks can convert from and to indirect and extent mapping, so it is only safe to check after grabbing the inode mutex. Signed-off-by: Davide Italiano <dccitaliano@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> [lizf: Backported to 3.4: - adjust context - return -EOPNOTSUPP instead of jumping to the "out" label] Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Nathan Fontenot authored
commit f32393c9 upstream. The incorrect ordering of operations during cpu dlpar add results in invalid affinity for the cpu being added. The ibm,associativity property in the device tree is populated with all zeroes for the added cpu, which results in invalid affinity mappings and all cpus appearing to belong to node 0. This occurs because rtas configure-connector is called prior to making the rtas set-indicator calls. Phyp does not assign affinity information for a cpu until the rtas set-indicator calls are made to set the isolation and allocation state. Correct the order of operations to make the rtas set-indicator calls (done in dlpar_acquire_drc) before calling rtas configure-connector. Fixes: 1a8061c4 ("powerpc/pseries: Add kernel based CPU DLPAR handling") Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> [lizf: Backported to 3.4: - adjust context - jump to the "out" label instead of returning -EINVAL] Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Peter Zubaj authored
commit 7241ea55 upstream. It looks like the audigy emu10k2 (and probably the emu10k1 - SB Live - too) supports two modes for DMA. The second mode is useful for a 64-bit OS with more than 2 GB of RAM (it fixes problems with loading big soundfonts):
 1) 32MB from the 2 GB address space using 8192 pages (used now as the default)
 2) 16MB from the 4 GB address space using 4096 pages
The mode is set using the HCFG_EXPANDED_MEM flag in the HCFG register. The format of the emu10k2 page table also differs between the modes. Signed-off-by: Peter Zubaj <pzubaj@marticonet.sk> Tested-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Takashi Iwai authored
commit 1c94e65c upstream. The OSS emulation in the synth-emux helper has a potential AB/BA deadlock between simultaneous closing and opening:

    close ->
      snd_seq_release() ->
      snd_seq_free_client() ->
      snd_seq_delete_all_ports(): takes client->ports_mutex ->
      port_delete() ->
      snd_emux_unuse(): takes emux->register_mutex

    open ->
      snd_seq_oss_open() ->
      snd_emux_open_seq_oss(): takes emux->register_mutex ->
      snd_seq_event_port_attach() ->
      snd_seq_create_port(): takes client->ports_mutex

This patch addresses the deadlock by reducing the range over which emux->register_mutex is taken in snd_emux_open_seq_oss(). The lock is only needed for the refcount handling, so take it locally around that. The calls in emux_seq.c are already made with the mutex held, so they are replaced with the version without mutex lock/unlock. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Michal Simek authored
commit 6befa9d8 upstream. Do not let of_serial.c probe every serial device that uses the device_type = "serial"; property. Only devices whose compatible strings are listed in the driver should be probed. When PORT_UNKNOWN is set up, the probe will fail anyway. Arnd's quotation about the driver's historical background: "when I wrote that driver initially, the idea was that it would get used as a stub to hook up all other serial drivers but after that, the common code learned to create platform devices from DT" This patch fixes a problem on systems with both xilinx_uartps and a 16550a, where of_serial failed to register for the xilinx_uartps port and its irq_dispose_mapping() call removed the irq_desc. When xilinx_uartps then asked for the irq with request_irq(), EINVAL was returned. Signed-off-by: Michal Simek <michal.simek@xilinx.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Michal Simek authored
commit 5c90c07b upstream. On systems with CONFIG_SERIAL_OF_PLATFORM=y and a device_type = "serial"; property in the DT, the of_serial.c driver maps and then unmaps the IRQ (because its probe fails). The real driver is then called, but the irq mapping no longer exists, which is why it fails again and again on request_irq(). Based on this, use platform_get_irq() instead of platform_get_resource(); the former performs the irq_desc allocation, so the driver itself can request the IRQ. Fix both Xilinx serial drivers in the tree. Signed-off-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
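A minimal sketch of the probe-time difference described above (an illustrative probe function, not the actual Xilinx driver diff): platform_get_irq() creates the IRQ mapping if needed, whereas reading the raw IORESOURCE_IRQ resource does not.

    #include <linux/platform_device.h>

    static int example_serial_probe(struct platform_device *pdev)
    {
            /* creates the irq_desc/mapping on demand and returns the virq */
            int irq = platform_get_irq(pdev, 0);

            if (irq < 0)
                    return irq;     /* no usable IRQ; fail the probe */

            /* irq can now be handed to request_irq() by the driver itself */
            return 0;
    }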
-
Christoph Hellwig authored
commit 118c855b upstream. The 3w-9xxx driver needs to tear down the dma mappings before returning the command to the midlayer, as there is no guarantee the sglist and count are valid after that point. Also remove the dma mapping helpers which have another inherent race due to the request_id index. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Adam Radford <aradford@gmail.com> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Christoph Hellwig authored
commit 9cd95546 upstream. The 3w-xxxx driver needs to tear down the dma mappings before returning the command to the midlayer, as there is no guarantee the sglist and count are valid after that point. Also remove the dma mapping helpers which have another inherent race due to the request_id index. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Adam Radford <aradford@gmail.com> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Christoph Hellwig authored
commit 579d69bc upstream. The 3w-sas driver needs to tear down the dma mappings before returning the command to the midlayer, as there is no guarantee the sglist and count are valid after that point. Also remove the dma mapping helpers which have another inherent race due to the request_id index. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Torsten Luettgert <ml-lkml@enda.eu> Tested-by: Bernd Kardatzki <Bernd.Kardatzki@med.uni-tuebingen.de> Acked-by: Adam Radford <aradford@gmail.com> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
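For all three 3ware drivers above the essential ordering is the same; a hedged two-line illustration using the generic SCSI midlayer helpers (the real completion paths carry more driver state):

    /* tear down the DMA mapping while the sglist is still guaranteed valid ... */
    scsi_dma_unmap(cmd);
    /* ... and only then hand the command back to the midlayer */
    cmd->scsi_done(cmd);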
-
Mike Christie authored
commit 35e9a9f9 upstream. This works around an issue with qnap iscsi targets not handling large IOs very well. The target returns:

    VPD INQUIRY: Block limits page (SBC)
      Maximum compare and write length: 1 blocks
      Optimal transfer length granularity: 1 blocks
      Maximum transfer length: 4294967295 blocks
      Optimal transfer length: 4294967295 blocks
      Maximum prefetch, xdread, xdwrite transfer length: 0 blocks
      Maximum unmap LBA count: 8388607
      Maximum unmap block descriptor count: 1
      Optimal unmap granularity: 16383
      Unmap granularity alignment valid: 0
      Unmap granularity alignment: 0
      Maximum write same length: 0xffffffff blocks
      Maximum atomic transfer length: 0
      Atomic alignment: 0
      Atomic transfer length granularity: 0

and it is *sometimes* able to handle at least one IO of size up to 8 MB. Traces show that it sometimes works, but at other times it fails, and it appears to return failures when multiple large IOs are sent. It can also return two different errors: sometimes iscsi reject errors indicating out of resources, and sometimes invalid-cdb illegal-request check conditions. When it sends iscsi rejects it does not seem to handle retries when there are command sequence holes, so I could not just add code to try and gracefully handle that error code. The problem is that we do not have a good contact at the company, so we are not able to determine under what conditions it returns which error and why it sometimes works. So, this patch just adds a new blacklist flag to set targets like this to the old max safe sectors of 1024. The max_hw_sectors changes added in 3.19 caused this regression, so I am also cc'ing stable. Reported-by: Christian Hesse <list@eworm.de> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Odin.com> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Michel Dänzer authored
commit b421ed15 upstream. The number of relocs is passed in by userspace and can be large. It has been observed to cause kcalloc failures in the wild. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Takashi Iwai authored
commit 07b0e5d4 upstream. The emux-synth driver has a possible AB/BA mutex deadlock when unloading the emu10k1 driver:

    snd_emux_free() ->
      snd_emux_detach_seq(): mutex_lock(&emu->register_mutex) ->
      snd_seq_delete_kernel_client() ->
      snd_seq_free_client(): mutex_lock(&register_mutex)

    snd_seq_release() ->
      snd_seq_free_client(): mutex_lock(&register_mutex) ->
      snd_seq_delete_all_ports() ->
      snd_emux_unuse(): mutex_lock(&emu->register_mutex)

Basically snd_emux_detach_seq() doesn't need the protection of emu->register_mutex, as the client is already being unregistered. So we can get rid of it there, avoiding the deadlock. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Takashi Iwai authored
commit d0226082 upstream. Some models provide too long a string for the shortname, which is 32 bytes including the terminator, and it results in a non-terminated string being exposed to user space. This isn't too critical, though, as the string is stopped at the succeeding longname string. This patch fixes such entries by dropping the "SB" prefix (which is enough, so far, to fit within 32 bytes). Meanwhile, it also replaces strcpy() with strlcpy() to make sure that this kind of problem won't happen in the future, either. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Takashi Iwai authored
commit ee52e56e upstream. The mute-LED mode control has the fixed on/off states that are supposed to remain on/off regardless of the master switch. However, this doesn't work actually because the vmaster hook is called in the vmaster code itself. This patch fixes it by calling the hook indirectly after checking the mute LED mode. Reported-and-tested-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Al Viro authored
commit 3cab989a upstream. Calling unlazy_walk() in walk_component() and do_last() when we find a symlink that needs to be followed doesn't acquire a reference to vfsmount. That's fine when the symlink is on the same vfsmount as the parent directory (which is almost always the case), but it's not always true - one _can_ manage to bind a symlink on top of something. And in such cases we end up with excessive mntput(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [lizf: Backported to 3.4: drop the changes to do_last()] Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Jeff Layton authored
commit 5d05e54a upstream. Chuck pointed out a problem that crept in with commit 6ffa30d3 (nfs: don't call blocking operations while !TASK_RUNNING). Linux counts tasks in uninterruptible sleep against the load average, so this caused the system's load average to be pinned at 1 or higher whenever there was an NFSv4.1+ mount active. Not a huge problem, but it's probably worth fixing before we get too many complaints about it. This patch converts the code back to using a TASK_INTERRUPTIBLE sleep, and simply has it flush any signals on each loop iteration. In practice no one should really be signalling this thread at all, so I think this is reasonably safe. With this change, there's also no need to game the hung task watchdog, so we can also convert the schedule_timeout call back to a normal schedule. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com> Fixes: commit 6ffa30d3 ("nfs: don't call blocking . . .") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
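A rough sketch of the loop shape described above (the helpers have_work()/process_work() and the wait-queue names are placeholders, not the nfs callback code): sleep interruptibly so the thread is not counted in the load average, return to TASK_RUNNING before doing anything that may block, and discard stray signals on each iteration.

    for (;;) {
            prepare_to_wait(&waitq, &wait, TASK_INTERRUPTIBLE);
            if (have_work()) {
                    /* back to TASK_RUNNING before doing anything that may block */
                    finish_wait(&waitq, &wait);
                    process_work();
            } else {
                    schedule();             /* interruptible: not counted in loadavg */
                    finish_wait(&waitq, &wait);
            }
            if (signal_pending(current))
                    flush_signals(current); /* nobody should signal this kthread */
    }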
-
Jeff Layton authored
commit 6ffa30d3 upstream. Bruce reported seeing this warning pop when mounting using v4.1:

    ------------[ cut here ]------------
    WARNING: CPU: 1 PID: 1121 at kernel/sched/core.c:7300 __might_sleep+0xbd/0xd0()
    do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810ff58f>] prepare_to_wait+0x2f/0x90
    Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_timer ppdev joydev snd virtio_console virtio_balloon pcspkr serio_raw parport_pc parport pvpanic floppy soundcore i2c_piix4 virtio_blk virtio_net qxl drm_kms_helper ttm drm virtio_pci virtio_ring ata_generic virtio pata_acpi
    CPU: 1 PID: 1121 Comm: nfsv4.1-svc Not tainted 3.19.0-rc4+ #25
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
     0000000000000000 000000004e5e3f73 ffff8800b998fb48 ffffffff8186ac78
     0000000000000000 ffff8800b998fba0 ffff8800b998fb88 ffffffff810ac9da
     ffff8800b998fb68 ffffffff81c923e7 00000000000004d9 0000000000000000
    Call Trace:
     [<ffffffff8186ac78>] dump_stack+0x4c/0x65
     [<ffffffff810ac9da>] warn_slowpath_common+0x8a/0xc0
     [<ffffffff810aca65>] warn_slowpath_fmt+0x55/0x70
     [<ffffffff810ff58f>] ? prepare_to_wait+0x2f/0x90
     [<ffffffff810ff58f>] ? prepare_to_wait+0x2f/0x90
     [<ffffffff810dd2ad>] __might_sleep+0xbd/0xd0
     [<ffffffff8124c973>] kmem_cache_alloc_trace+0x243/0x430
     [<ffffffff810d941e>] ? groups_alloc+0x3e/0x130
     [<ffffffff810d941e>] groups_alloc+0x3e/0x130
     [<ffffffffa0301b1e>] svcauth_unix_accept+0x16e/0x290 [sunrpc]
     [<ffffffffa0300571>] svc_authenticate+0xe1/0xf0 [sunrpc]
     [<ffffffffa02fc564>] svc_process_common+0x244/0x6a0 [sunrpc]
     [<ffffffffa02fd044>] bc_svc_process+0x1c4/0x260 [sunrpc]
     [<ffffffffa03d5478>] nfs41_callback_svc+0x128/0x1f0 [nfsv4]
     [<ffffffff810ff970>] ? wait_woken+0xc0/0xc0
     [<ffffffffa03d5350>] ? nfs4_callback_svc+0x60/0x60 [nfsv4]
     [<ffffffff810d45bf>] kthread+0x11f/0x140
     [<ffffffff810ea815>] ? local_clock+0x15/0x30
     [<ffffffff810d44a0>] ? kthread_create_on_node+0x250/0x250
     [<ffffffff81874bfc>] ret_from_fork+0x7c/0xb0
     [<ffffffff810d44a0>] ? kthread_create_on_node+0x250/0x250
    ---[ end trace 675220a11e30f4f2 ]---

nfs41_callback_svc does most of its work while in TASK_INTERRUPTIBLE, which is just wrong. Fix that by finishing the wait immediately if we've found that the list has something on it. Also, we don't expect this kthread to accept signals, so we should be using a TASK_UNINTERRUPTIBLE sleep instead. That, however, opens us up to hung task warnings from the watchdog, so have the schedule_timeout wake up every 60s if there's no callback activity. Reported-by: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Giuseppe Cantavenera authored
commit bb7ffbf2 upstream. nfsd triggered a BUG_ON in net_generic(...) when rpc_pipefs_event(...) in fs/nfsd/nfs4recover.c was called before nfsd_net_id had been assigned. The following was observed on a MIPS 32-core processor:

    kernel: Call Trace:
    kernel: [<ffffffffc00bc5e4>] rpc_pipefs_event+0x7c/0x158 [nfsd]
    kernel: [<ffffffff8017a2a0>] notifier_call_chain+0x70/0xb8
    kernel: [<ffffffff8017a4e4>] __blocking_notifier_call_chain+0x4c/0x70
    kernel: [<ffffffff8053aff8>] rpc_fill_super+0xf8/0x1a0
    kernel: [<ffffffff8022204c>] mount_ns+0xb4/0xf0
    kernel: [<ffffffff80222b48>] mount_fs+0x50/0x1f8
    kernel: [<ffffffff8023dc00>] vfs_kern_mount+0x58/0xf0
    kernel: [<ffffffff802404ac>] do_mount+0x27c/0xa28
    kernel: [<ffffffff80240cf0>] SyS_mount+0x98/0xe8
    kernel: [<ffffffff80135d24>] handle_sys64+0x44/0x68
    kernel:
    kernel: Code: 0040f809 00000000 2e020001 <00020336> 3c12c00d 3c02801a de100000 6442eb98 0040f809
    kernel: ---[ end trace 7471374335809536 ]---

Fixed this behaviour by calling register_pernet_subsys(&nfsd_net_ops) before registering rpc_pipefs_event(...) with the notifier chain. Signed-off-by: Giuseppe Cantavenera <giuseppe.cantavenera.ext@nokia.com> Signed-off-by: Lorenzo Restelli <lorenzo.restelli.ext@nokia.com> Reviewed-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Dan Carpenter authored
commit 13f6b191 upstream. Using the indenting we can see the curly braces were obviously intended. This is a static checker fix, but my guess is that we don't read enough bytes, because we don't calculate "t_len" correctly. Fixes: f1d82698 ('memstick: use fully asynchronous request processing') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Alex Dubov <oakad@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
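A generic illustration of this bug class (not the actual memstick code): without braces the indentation lies, and only the first statement is guarded by the if.

    /* buggy: the last line runs unconditionally, despite the indentation */
    if (data_dir == READ)
            for (cnt = 0; cnt < nseg; cnt++)
                    total += sg[cnt].length;
            t_len = total / page_size;      /* executes even when data_dir != READ */

    /* intended: braces make the whole block conditional */
    if (data_dir == READ) {
            for (cnt = 0; cnt < nseg; cnt++)
                    total += sg[cnt].length;
            t_len = total / page_size;
    }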
-
Oleg Nesterov authored
commit b72c1869 upstream. ptrace_resume() is called when the tracee is still __TASK_TRACED. We set tracee->exit_code and then wake_up_state() changes tracee->state. If the tracer's sub-thread does wait() in between, task_stopped_code(ptrace => T) wrongly looks like another report from tracee. This confuses debugger, and since wait_task_stopped() clears ->exit_code the tracee can miss a signal. Test-case:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>
    #include <sys/ptrace.h>
    #include <pthread.h>
    #include <assert.h>

    int pid;

    void *waiter(void *arg)
    {
            int stat;

            for (;;) {
                    assert(pid == wait(&stat));
                    assert(WIFSTOPPED(stat));
                    if (WSTOPSIG(stat) == SIGHUP)
                            continue;
                    assert(WSTOPSIG(stat) == SIGCONT);
                    printf("ERR! extra/wrong report:%x\n", stat);
            }
    }

    int main(void)
    {
            pthread_t thread;

            pid = fork();
            if (!pid) {
                    assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
                    for (;;)
                            kill(getpid(), SIGHUP);
            }

            assert(pthread_create(&thread, NULL, waiter, NULL) == 0);
            for (;;)
                    ptrace(PTRACE_CONT, pid, 0, SIGCONT);

            return 0;
    }

Note for stable: the bug is very old, but without 9899d11f "ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL" the fix should use lock_task_sighand(child). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Pavel Labath <labath@google.com> Tested-by: Pavel Labath <labath@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Nicolas Iooss authored
commit d43698e8 upstream. Commit 2473238e ("ihex: add support for CS:IP/EIP records") removes the "default:" statement in the switch block, making the "return usage();" line dead code and ihex2fw silently ignoring unknown options. Restore this statement. This bug was found by building with HOSTCC=clang and adding -Wunreachable-code-return to HOSTCFLAGS. Fixes: 2473238e ("ihex: add support for CS:IP/EIP records") Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Christoph Hellwig authored
commit 16b8528d upstream. We only want to steer the I/O completion towards a queue, but don't actually access any per-CPU data, so the raw_ version is fine to use and avoids the warnings when using smp_processor_id(). Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Andy Lutomirski <luto@kernel.org> Tested-by: Andy Lutomirski <luto@kernel.org> Acked-by: Sumit Saxena <sumit.saxena@avagotech.com> Signed-off-by: James Bottomley <JBottomley@Odin.com> [lizf: Backported to 3.4: drop the changes to megasas_build_dcdb_fusion()] Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Erez Shitrit authored
commit ca9b590c upstream. The current code subtracts the size of the packet headers from the mss (which is the gso_size from the kernel skb). It shouldn't do that, because the mss that comes from the stack (e.g. IPoIB) includes only the tcp payload without the headers. The result is that the HW is told each packet it sends is smaller than it could be, so too many packets are sent for big messages. An easy way to demonstrate one more aspect of the problem is by configuring the ipoib mtu to less than 2*hlen (2*56) and then running an app that sends big TCP messages. This tells the HW to send packets with a giant length (a negative value which under unsigned arithmetic becomes a huge positive one), and the QP moves to the SQE state. Fixes: b832be1e ('IB/mlx4: Add IPoIB LSO support') Reported-by: Matthew Finlay <matt@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Yann Droneaud authored
commit 8abaae62 upstream. If ib_umem_get() is called with a size equal to 0 and an non-page aligned address, one page will be pinned and a 0-sized umem will be returned to the caller. This should not be allowed: it's not expected for a memory region to have a size equal to 0. This patch adds a check to explicitly refuse to register a 0-sized region. Link: http://mid.gmane.org/cover.1428929103.git.ydroneaud@opteya.com Cc: Shachar Raindel <raindel@mellanox.com> Cc: Jack Morgenstein <jackm@mellanox.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Zefan Li <lizefan@huawei.com>
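The guard amounts to a single early check; a hedged sketch (the surrounding ib_umem_get() code is omitted and the exact placement may differ):

    /* refuse to register a 0-sized memory region */
    if (!size)
            return ERR_PTR(-EINVAL);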
-
Ben Collins authored
commit 0618764c upstream. I suspect this doesn't show up for most anyone because software algorithms typically don't have a sense of being too busy. However, when working with the Freescale CAAM driver it will return -EBUSY on occasion under heavy load -- which resulted in a dm-crypt deadlock. After checking the logic in some other drivers, the scheme for crypt_convert() and its callback, kcryptd_async_done(), was not correctly laid out to properly handle -EBUSY or -EINPROGRESS. Fix this by using the completion for both -EBUSY and -EINPROGRESS. Now crypt_convert()'s use of completion is comparable to af_alg_wait_for_completion(). Similarly, kcryptd_async_done() follows the pattern used in af_alg_complete(). Before this fix dm-crypt would lock up within 1-2 minutes running with the CAAM driver. The fix was regression-tested against software algorithms on PPC32 and x86_64, and things seem perfectly happy there as well. Signed-off-by: Ben Collins <ben.c@servergy.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Michael Davidson authored
commit a87938b2 upstream. With CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE enabled, and a normal top-down address allocation strategy, load_elf_binary() will attempt to map a PIE binary into an address range immediately below mm->mmap_base. Unfortunately, load_elf_binary() does not take into account the need to allocate sufficient space for the entire binary, which means that, while the first PT_LOAD segment is mapped below mm->mmap_base, the subsequent PT_LOAD segment(s) end up being mapped above mm->mmap_base, into the area that is supposed to be the "gap" between the stack and the binary. Since the size of the "gap" on x86_64 is only guaranteed to be 128MB, this means that binaries with data segments > 128MB can end up mapping part of their data segment over their stack, resulting in corruption of the stack (and of the data segment once the binary starts to run). Any PIE binary with a data segment > 128MB is vulnerable to this, although address randomization means that the actual gap between the stack and the end of the binary is normally greater than 128MB. The larger the data segment of the binary, the higher the probability of failure. Fix this by calculating the total size of the binary in the same way as load_elf_interp(). Signed-off-by: Michael Davidson <md@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Zefan Li <lizefan@huawei.com>
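The "total size" idea is the span from the first PT_LOAD segment's page-aligned start to the last PT_LOAD segment's end, so that the initial mmap can reserve room for the whole image. A simplified sketch modelled on load_elf_interp()'s calculation (the helper name and details are illustrative, not the exact patch):

    /* span covered by all PT_LOAD segments, so the initial mapping can
     * reserve enough room for the whole binary (simplified sketch) */
    static unsigned long total_mapping_size(const struct elf_phdr *phdr, int nr)
    {
            int i, first = -1, last = -1;

            for (i = 0; i < nr; i++) {
                    if (phdr[i].p_type != PT_LOAD)
                            continue;
                    if (first == -1)
                            first = i;
                    last = i;
            }
            if (first == -1)
                    return 0;

            return phdr[last].p_vaddr + phdr[last].p_memsz -
                   ELF_PAGESTART(phdr[first].p_vaddr);
    }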
-
Lv Zheng authored
commit 2b876010 upstream. ACPICA commit aacf863cfffd46338e268b7415f7435cae93b451 It is reported that on a physically 64-bit addressed machine, 32-bit kernel can trigger crashes in accessing the memory regions that are beyond the 32-bit boundary. The region field's start address should still be 32-bit compliant, but after a calculation (adding some offsets), it may exceed the 32-bit boundary. This case is rare and buggy, but there are real BIOSes leaked with such issues (see References below). This patch fixes this gap by always defining IO addresses as 64-bit, and allows OSPMs to optimize it for a real 32-bit machine to reduce the size of the internal objects. Internal acpi_physical_address usages in the structures that can be fixed by this change include:

    1. struct acpi_object_region:
       acpi_physical_address address;
    2. struct acpi_address_range:
       acpi_physical_address start_address;
       acpi_physical_address end_address;
    3. struct acpi_mem_space_context;
       acpi_physical_address address;
    4. struct acpi_table_desc
       acpi_physical_address address;

See known issues 1 for other usages. Note that acpi_io_address which is used for ACPI_PROCESSOR may also suffer from same problem, so this patch changes it accordingly. For iasl, it will enforce acpi_physical_address as 32-bit to generate 32-bit OSPM compatible tables on 32-bit platforms, we need to define ACPI_32BIT_PHYSICAL_ADDRESS for it in acenv.h.

Known issues:

    1. Cleanup of mapped virtual address
       In struct acpi_mem_space_context, acpi_physical_address is used as a virtual address:
           acpi_physical_address mapped_physical_address;
       It is better to introduce acpi_virtual_address or use acpi_size instead. This patch doesn't make such a change. Because this should be done along with a change to acpi_os_map_memory()/acpi_os_unmap_memory(). There should be no functional problem to leave this unchanged except that only this structure is enlarged unexpectedly.

Link: https://github.com/acpica/acpica/commit/aacf863c Reference: https://bugzilla.kernel.org/show_bug.cgi?id=87971 Reference: https://bugzilla.kernel.org/show_bug.cgi?id=79501 Reported-and-tested-by: Paul Menzel <paulepanter@users.sourceforge.net> Reported-and-tested-by: Sial Nije <sialnije@gmail.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <lizefan@huawei.com>
-
Anton Blanchard authored
commit 9a5cbce4 upstream. We cap 32bit userspace backtraces to PERF_MAX_STACK_DEPTH (currently 127), but we forgot to do the same for 64bit backtraces. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Zefan Li <lizefan@huawei.com>
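A hedged sketch of what the cap looks like in an unwinder loop; PERF_MAX_STACK_DEPTH is the real limit named above, while the loop body shown (read_user_frame()) is only illustrative:

    /* unwind the 64-bit user stack, but never record more than
     * PERF_MAX_STACK_DEPTH entries (as the 32-bit path already does) */
    while (entry->nr < PERF_MAX_STACK_DEPTH) {
            if (read_user_frame(&frame))            /* illustrative helper */
                    break;
            perf_callchain_store(entry, frame.return_address);
    }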
-