Commits · a18887a24eaea649d6891d4049ef07bf826d6179 · Kirill Smelkov / linux

13 Dec, 2015 40 commits

x86/setup: Extend low identity map to cover whole kernel range · a18887a2

Paolo Bonzini authored Oct 14, 2015

commit f5f3497c upstream.

On 32-bit systems, the initial_page_table is reused by
efi_call_phys_prolog as an identity map to call
SetVirtualAddressMap.  efi_call_phys_prolog takes care of
converting the current CPU's GDT to a physical address too.

For PAE kernels the identity mapping is achieved by aliasing the
first PDPE for the kernel memory mapping into the first PDPE
of initial_page_table.  This makes the EFI stub's trick "just work".

However, for non-PAE kernels there is no guarantee that the identity
mapping in the initial_page_table extends as far as the GDT; in this
case, accesses to the GDT will cause a page fault (which quickly becomes
a triple fault).  Fix this by copying the kernel mappings from
swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at
identity mapping.

For some reason, this is only reproducible with QEMU's dynamic translation
mode, and not for example with KVM.  However, even under KVM one can clearly
see that the page table is bogus:

    $ qemu-system-i386 -pflash OVMF.fd -M q35 vmlinuz0 -s -S -daemonize
    $ gdb
    (gdb) target remote localhost:1234
    (gdb) hb *0x02858f6f
    Hardware assisted breakpoint 1 at 0x2858f6f
    (gdb) c
    Continuing.

    Breakpoint 1, 0x02858f6f in ?? ()
    (gdb) monitor info registers
    ...
    GDT=     0724e000 000000ff
    IDT=     fffbb000 000007ff
    CR0=0005003b CR2=ff896000 CR3=032b7000 CR4=00000690
    ...

The page directory is sane:

    (gdb) x/4wx 0x32b7000
    0x32b7000:	0x03398063	0x03399063	0x0339a063	0x0339b063
    (gdb) x/4wx 0x3398000
    0x3398000:	0x00000163	0x00001163	0x00002163	0x00003163
    (gdb) x/4wx 0x3399000
    0x3399000:	0x00400003	0x00401003	0x00402003	0x00403003

but our particular page directory entry is empty:

    (gdb) x/1wx 0x32b7000 + (0x724e000 >> 22) * 4
    0x32b7070:	0x00000000

[ It appears that you can skate past this issue if you don't receive
  any interrupts while the bogus GDT pointer is loaded, or if you avoid
  reloading the segment registers in general.

  Andy Lutomirski provides some additional insight:

   "AFAICT it's entirely permissible for the GDTR and/or LDT
    descriptor to point to unmapped memory.  Any attempt to use them
    (segment loads, interrupts, IRET, etc) will try to access that memory
    as if the access came from CPL 0 and, if the access fails, will
    generate a valid page fault with CR2 pointing into the GDT or
    LDT."

  Up until commit 23a0d4e8 ("efi: Disable interrupts around EFI
  calls, not in the epilog/prolog calls") interrupts were disabled
  around the prolog and epilog calls, and the functional GDT was
  re-installed before interrupts were re-enabled.

  Which explains why no one has hit this issue until now. ]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: Laszlo Ersek <lersek@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
[ Updated changelog. ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

a18887a2

proc: actually make proc_fd_permission() thread-friendly · 720ae734

Oleg Nesterov authored Nov 06, 2015

commit 54708d28 upstream.

The commit 96d0df79 ("proc: make proc_fd_permission() thread-friendly")
fixed the access to /proc/self/fd from sub-threads, but introduced another
problem: a sub-thread can't access /proc/<tid>/fd/ or /proc/thread-self/fd
if generic_permission() fails.

Change proc_fd_permission() to check same_thread_group(pid_task(), current).

Fixes: 96d0df79 ("proc: make proc_fd_permission() thread-friendly")
Reported-by: "Jin, Yihua" <yihua.jin@intel.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

720ae734

Input: elantech - add Fujitsu Lifebook U745 to force crc_enabled · edd540a6

Takashi Iwai authored Nov 06, 2015

commit 60603950 upstream.

Another Lifebook machine that needs the same quirk as other similar
models to make the driver working.

Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=883192Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

edd540a6

mm: slab: only move management objects off-slab for sizes larger than KMALLOC_MIN_SIZE · ca317d04

Catalin Marinas authored Nov 05, 2015

commit d4322d88 upstream.

On systems with a KMALLOC_MIN_SIZE of 128 (arm64, some mips and powerpc
configurations defining ARCH_DMA_MINALIGN to 128), the first
kmalloc_caches[] entry to be initialised after slab_early_init = 0 is
"kmalloc-128" with index 7.  Depending on the debug kernel configuration,
sizeof(struct kmem_cache) can be larger than 128 resulting in an
INDEX_NODE of 8.

Commit 8fc9cf42 ("slab: make more slab management structure off the
slab") enables off-slab management objects for sizes starting with
PAGE_SIZE >> 5 (128 bytes for a 4KB page configuration) and the creation
of the "kmalloc-128" cache would try to place the management objects
off-slab.  However, since KMALLOC_MIN_SIZE is already 128 and
freelist_size == 32 in __kmem_cache_create(), kmalloc_slab(freelist_size)
returns NULL (kmalloc_caches[7] not populated yet).  This triggers the
following bug on arm64:

  kernel BUG at /work/Linux/linux-2.6-aarch64/mm/slab.c:2283!
  Internal error: Oops - BUG: 0 [#1] SMP
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper Not tainted 4.3.0-rc4+ #540
  Hardware name: Juno (DT)
  PC is at __kmem_cache_create+0x21c/0x280
  LR is at __kmem_cache_create+0x210/0x280
  [...]
  Call trace:
    __kmem_cache_create+0x21c/0x280
    create_boot_cache+0x48/0x80
    create_kmalloc_cache+0x50/0x88
    create_kmalloc_caches+0x4c/0xf4
    kmem_cache_init+0x100/0x118
    start_kernel+0x214/0x33c

This patch introduces an OFF_SLAB_MIN_SIZE definition to avoid off-slab
management objects for sizes equal to or smaller than KMALLOC_MIN_SIZE.

Fixes: 8fc9cf42 ("slab: make more slab management structure off the slab")
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

ca317d04

scsi: restart list search after unlock in scsi_remove_target · 0c422a84

Christoph Hellwig authored Oct 19, 2015

commit 40998193 upstream.

When dropping a lock while iterating a list we must restart the search
as other threads could have manipulated the list under us.  Without this
we can get stuck in an endless loop.  This bug was introduced by

commit bc3f02a7
Author: Dan Williams <djbw@fb.com>
Date:   Tue Aug 28 22:12:10 2012 -0700

    [SCSI] scsi_remove_target: fix softlockup regression on hot remove

Which was itself trying to fix a reported soft lockup issue

http://thread.gmane.org/gmane.linux.kernel/1348679

However, we believe even with this revert of the original patch, the soft
lockup problem has been fixed by

commit f2495e22
Author: James Bottomley <JBottomley@Parallels.com>
Date:   Tue Jan 21 07:01:41 2014 -0800

    [SCSI] dual scan thread bug fix

Thanks go to Dan Williams <dan.j.williams@intel.com> for tracking all this
prior history down.
Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Fixes: bc3f02a7Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

0c422a84

firewire: ohci: fix JMicron JMB38x IT context discovery · bf60a52f

Stefan Richter authored Nov 03, 2015

commit 100ceb66 upstream.

Reported by Clifford and Craig for JMicron OHCI-1394 + SDHCI combo
controllers:  Often or even most of the time, the controller is
initialized with the message "added OHCI v1.10 device as card 0, 4 IR +
0 IT contexts, quirks 0x10".  With 0 isochronous transmit DMA contexts
(IT contexts), applications like audio output are impossible.

However, OHCI-1394 demands that at least 4 IT contexts are implemented
by the link layer controller, and indeed JMicron JMB38x do implement
four of them.  Only their IsoXmitIntMask register is unreliable at early
access.

With my own JMB381 single function controller I found:
  - I can reproduce the problem with a lower probability than Craig's.
  - If I put a loop around the section which clears and reads
    IsoXmitIntMask, then either the first or the second attempt will
    return the correct initial mask of 0x0000000f.  I never encountered
    a case of needing more than a second attempt.
  - Consequently, if I put a dummy reg_read(...IsoXmitIntMaskSet)
    before the first write, the subsequent read will return the correct
    result.
  - If I merely ignore a wrong read result and force the known real
    result, later isochronous transmit DMA usage works just fine.

So let's just fix this chip bug up by the latter method.  Tested with
JMB381 on kernel 3.13 and 4.3.

Since OHCI-1394 generally requires 4 IT contexts at a minium, this
workaround is simply applied whenever the initial read of IsoXmitIntMask
returns 0, regardless whether it's a JMicron chip or not.  I never heard
of this issue together with any other chip though.

I am not 100% sure that this fix works on the OHCI-1394 part of JMB380
and JMB388 combo controllers exactly the same as on the JMB381 single-
function controller, but so far I haven't had a chance to let an owner
of a combo chip run a patched kernel.

Strangely enough, IsoRecvIntMask is always reported correctly, even
though it is probed right before IsoXmitIntMask.

Reported-by: Clifford Dunn
Reported-by: Craig Moore <craig.moore@qenos.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

bf60a52f

ALSA: hda - Add Intel Lewisburg device IDs Audio · 2bec64a0

Alexandra Yates authored Nov 04, 2015

commit 5cf92c8b upstream.

Adding Intel codename Lewisburg platform device IDs for audio.

[rearranged the position by tiwai]
Signed-off-by: Alexandra Yates <alexandra.yates@linux.intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

2bec64a0

ALSA: hda - Apply pin fixup for HP ProBook 6550b · 9927f021

Takashi Iwai authored Nov 04, 2015

commit c932b98c upstream.

HP ProBook 6550b needs the same pin fixup applied to other HP B-series
laptops with docks for making its headphone and dock headphone jacks
working properly. We just need to add the codec SSID to the list.

Bugzilla: https://bugzilla.kernel.org/attachment.cgi?id=191971Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

9927f021

thermal: exynos: Fix unbalanced regulator disable on probe failure · b6f74a6e

Krzysztof Kozlowski authored Oct 08, 2015

commit 824ead03 upstream.

During probe if the regulator could not be enabled, the error exit path
would still disable it. This could lead to unbalanced counter of
regulator enable/disable.

The patch moves code for getting and enabling the regulator from
exynos_map_dt_data() to probe function because it is really not a part
of getting Device Tree properties.
Acked-by: Lukasz Majewski <l.majewski@samsung.com>
Tested-by: Lukasz Majewski <l.majewski@samsung.com>
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Fixes: 5f09a5cb ("thermal: exynos: Disable the regulator on probe failure")
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

b6f74a6e

KVM: VMX: fix SMEP and SMAP without EPT · 2628ad5d

Radim Krčmář authored Nov 02, 2015

commit 656ec4a4 upstream.

The comment in code had it mostly right, but we enable paging for
emulated real mode regardless of EPT.

Without EPT (which implies emulated real mode), secondary VCPUs won't
start unless we disable SM[AE]P when the guest doesn't use paging.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

2628ad5d

recordmcount: Fix endianness handling bug for nop_mcount · 811f0bd9

libin authored Nov 03, 2015

commit c84da8b9 upstream.

In nop_mcount, shdr->sh_offset and welp->r_offset should handle
endianness properly, otherwise it will trigger Segmentation fault
if the recordmcount main and file.o have different endianness.

Link: http://lkml.kernel.org/r/563806C7.7070606@huawei.comSigned-off-by: Li Bin <huawei.libin@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

811f0bd9

xtensa: fix secondary core boot in SMP · fd5507ba

Max Filippov authored Oct 16, 2015

commit ab45fb14 upstream.

There are multiple factors adding to the issue in different
configurations:

- commit 17290231 ("xtensa: add fixup for double exception raised
  in window overflow") added function window_overflow_restore_a0_fixup to
  double exception vector overlapping reset vector location of secondary
  processor cores.
- on MMUv2 cores RESET_VECTOR1_VADDR may point to uncached kernel memory
  making code overlapping depend on cache type and size, so that without
  cache or with WT cache reset vector code overwrites double exception
  code, making issue even harder to detect.
- on MMUv3 cores RESET_VECTOR1_VADDR may point to unmapped area, as
  MMUv3 cores change virtual address map to match MMUv2 layout, but
  reset vector virtual address is given for the original MMUv3 mapping.
- physical memory region of the secondary reset vector is not reserved
  in the physical memory map, and thus may be allocated and overwritten
  at arbitrary moment.

Fix it as follows:

- move window_overflow_restore_a0_fixup code to .text section.
- define RESET_VECTOR1_VADDR so that it points to reset vector in the
  cacheable MMUv2 map for cores with MMU.
- reserve reset vector region in the physical memory map. Drop separate
  literal section and build mxhead.S with text section literals.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

fd5507ba

mac80211: allow null chandef in tracing · 50025953

Arik Nemtsov authored Oct 25, 2015

commit 254d3dfe upstream.

In TDLS channel-switch operations the chandef can sometimes be NULL.
Avoid an oops in the trace code for these cases and just print a
chandef full of zeros.

Fixes: a7a6bdd0 ("mac80211: introduce TDLS channel switch ops")
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

50025953

mac80211: fix divide by zero when NOA update · f5e1cc75

Janusz.Dziedzic@tieto.com authored Oct 27, 2015

commit 519ee691 upstream.

In case of one shot NOA the interval can be 0, catch that
instead of potentially (depending on the driver) crashing
like this:

divide error: 0000 [#1] SMP
[...]
Call Trace:
<IRQ>
[<ffffffffc08e891c>] ieee80211_extend_absent_time+0x6c/0xb0 [mac80211]
[<ffffffffc08e8a17>] ieee80211_update_p2p_noa+0xb7/0xe0 [mac80211]
[<ffffffffc069cc30>] ath9k_p2p_ps_timer+0x170/0x190 [ath9k]
[<ffffffffc070adf8>] ath_gen_timer_isr+0xc8/0xf0 [ath9k_hw]
[<ffffffffc0691156>] ath9k_tasklet+0x296/0x2f0 [ath9k]
[<ffffffff8107ad65>] tasklet_action+0xe5/0xf0
[...]
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

f5e1cc75

megaraid_sas : SMAP restriction--do not access user memory from IOCTL code · bdd07ab7

sumit.saxena@avagotech.com authored Oct 15, 2015

commit 323c4a02 upstream.

This is an issue on SMAP enabled CPUs and 32 bit apps running on 64 bit
OS. Do not access user memory from kernel code. The SMAP bit restricts
accessing user memory from kernel code.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

bdd07ab7

xtensa: fixes for configs without loop option · 84e3da3b

Max Filippov authored Sep 24, 2015

commit 5029615e upstream.

Build-time fixes:
- make lbeg/lend/lcount save/restore conditional on kernel entry;
- don't clear lcount in platform_restart functions unconditionally.

Run-time fixes:
- use correct end of range register in __endla paired with __loopt, not
  the unused temporary register. This fixes .bss zero-initialization.
  Update comments in asmmacro.h;
- don't clobber a10 in the usercopy that leads to access to unmapped
  memory.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

84e3da3b

crypto: algif_hash - Only export and import on sockets with data · 214fcf79

Herbert Xu authored Nov 01, 2015

commit 4afa5f96 upstream.

The hash_accept call fails to work on sockets that have not received
any data.  For some algorithm implementations it may cause crashes.

This patch fixes this by ensuring that we only export and import on
sockets that have received data.
Reported-by: Harsh Jain <harshjain.prof@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

214fcf79

drm/i915: add quirk to enable backlight on Dell Chromebook 11 (2015) · e46963e0

Jani Nikula authored Oct 30, 2015

commit 9be64eee upstream.
Reported-by: Keith Webb <khwebb@gmail.com>
Suggested-by: Keith Webb <khwebb@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=106671Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1446209424-28801-1-git-send-email-jani.nikula@intel.comSigned-off-by: Luis Henriques <luis.henriques@canonical.com>

e46963e0

Revert "dm mpath: fix stalls when handling invalid ioctls" · f5699565

Mauricio Faria de Oliveira authored Oct 29, 2015

commit 47796938 upstream.

This reverts commit a1989b33.

That commit introduced a regression at least for the case of the SG_IO ioctl()
running without CAP_SYS_RAWIO capability (e.g., unprivileged users) when there
are no active paths: the ioctl() fails with the ENOTTY errno immediately rather
than blocking due to queue_if_no_path until a path becomes active, for example.

That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices
(qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2])
from multipath devices; which leads to SCSI/filesystem errors in such a guest.

More general scenarios can hit that regression too. The following demonstration
employs a SG_IO ioctl() with a standard SCSI INQUIRY command for this objective
(some output & user changes omitted for brevity and comments added for clarity).

Reverting that commit restores normal operation (queueing) in failing scenarios;
tested on linux-next (next-20151022).

1) Test-case is based on sg_simple0 [3] (just SG_IO; remove SG_GET_VERSION_NUM)

    $ cat sg_simple0.c
    ... see [3] ...
    $ sed '/SG_GET_VERSION_NUM/,/}/d' sg_simple0.c > sgio_inquiry.c
    $ gcc sgio_inquiry.c -o sgio_inquiry

2) The ioctl() works fine with active paths present.

    # multipath -l 85ag56
    85ag56 (...) dm-19 IBM     ,2145
    size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
    |-+- policy='service-time 0' prio=0 status=active
    | |- 8:0:11:0  sdz  65:144  active undef running
    | `- 9:0:9:0   sdbf 67:144  active undef running
    `-+- policy='service-time 0' prio=0 status=enabled
      |- 8:0:12:0  sdae 65:224  active undef running
      `- 9:0:12:0  sdbo 68:32   active undef running

    $ ./sgio_inquiry /dev/mapper/85ag56
    Some of the INQUIRY command's response:
        IBM       2145              0000
    INQUIRY duration=0 millisecs, resid=0

3) The ioctl() fails with ENOTTY errno with _no_ active paths present,
   for unprivileged users (rather than blocking due to queue_if_no_path).

    # for path in $(multipath -l 85ag56 | grep -o 'sd[a-z]\+'); \
          do multipathd -k"fail path $path"; done

    # multipath -l 85ag56
    85ag56 (...) dm-19 IBM     ,2145
    size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
    |-+- policy='service-time 0' prio=0 status=enabled
    | |- 8:0:11:0  sdz  65:144  failed undef running
    | `- 9:0:9:0   sdbf 67:144  failed undef running
    `-+- policy='service-time 0' prio=0 status=enabled
      |- 8:0:12:0  sdae 65:224  failed undef running
      `- 9:0:12:0  sdbo 68:32   failed undef running

    $ ./sgio_inquiry /dev/mapper/85ag56
    sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device

4) dmesg shows that scsi_verify_blk_ioctl() failed for SG_IO (0x2285);
   it returns -ENOIOCTLCMD, later replaced with -ENOTTY in vfs_ioctl().

    $ dmesg
    <...>
    [] device-mapper: multipath: Failing path 65:144.
    [] device-mapper: multipath: Failing path 67:144.
    [] device-mapper: multipath: Failing path 65:224.
    [] device-mapper: multipath: Failing path 68:32.
    [] sgio_inquiry: sending ioctl 2285 to a partition!

5) The ioctl() only works if the SYS_CAP_RAWIO capability is present
   (then queueing happens -- in this example, queue_if_no_path is set);
   this is due to a conditional check in scsi_verify_blk_ioctl().

    # capsh --drop=cap_sys_rawio -- -c './sgio_inquiry /dev/mapper/85ag56'
    sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device

    # ./sgio_inquiry /dev/mapper/85ag56 &
    [1] 72830

    # cat /proc/72830/stack
    [<c00000171c0df700>] 0xc00000171c0df700
    [<c000000000015934>] __switch_to+0x204/0x350
    [<c000000000152d4c>] msleep+0x5c/0x80
    [<c00000000077dfb0>] dm_blk_ioctl+0x70/0x170
    [<c000000000487c40>] blkdev_ioctl+0x2b0/0x9b0
    [<c0000000003128e4>] block_ioctl+0x64/0xd0
    [<c0000000002dd3b0>] do_vfs_ioctl+0x490/0x780
    [<c0000000002dd774>] SyS_ioctl+0xd4/0xf0
    [<c000000000009358>] system_call+0x38/0xd0

6) This is the function call chain exercised in this analysis:

SYSCALL_DEFINE3(ioctl, <...>) @ fs/ioctl.c
    -> do_vfs_ioctl()
        -> vfs_ioctl()
            ...
            error = filp->f_op->unlocked_ioctl(filp, cmd, arg);
            ...
                -> dm_blk_ioctl() @ drivers/md/dm.c
                    -> multipath_ioctl() @ drivers/md/dm-mpath.c
                        ...
                        (bdev = NULL, due to no active paths)
                        ...
                        if (!bdev || <...>) {
                            int err = scsi_verify_blk_ioctl(NULL, cmd);
                            if (err)
                                r = err;
                        }
                        ...
                            -> scsi_verify_blk_ioctl() @ block/scsi_ioctl.c
                                ...
                                if (bd && bd == bd->bd_contains) // not taken (bd = NULL)
                                    return 0;
                                ...
                                if (capable(CAP_SYS_RAWIO)) // not taken (unprivileged user)
                                    return 0;
                                ...
                                printk_ratelimited(KERN_WARNING
                                           "%s: sending ioctl %x to a partition!\n" <...>);

                                return -ENOIOCTLCMD;
                            <-
                        ...
                        return r ? : <...>
                    <-
            ...
            if (error == -ENOIOCTLCMD)
                error = -ENOTTY;
             out:
                return error;
            ...

Links:
[1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52
[2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device')
[3] http://tldp.org/HOWTO/SCSI-Generic-HOWTO/pexample.html (Revision 1.2, 2002-05-03)
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

f5699565

mtd: blkdevs: fix potential deadlock + lockdep warnings · 9ea06026

Brian Norris authored Oct 26, 2015

commit f3c63795 upstream.

Commit 073db4a5 ("mtd: fix: avoid race condition when accessing
mtd->usecount") fixed a race condition but due to poor ordering of the
mutex acquisition, introduced a potential deadlock.

The deadlock can occur, for example, when rmmod'ing the m25p80 module, which
will delete one or more MTDs, along with any corresponding mtdblock
devices. This could potentially race with an acquisition of the block
device as follows.

 -> blktrans_open()
    ->  mutex_lock(&dev->lock);
    ->  mutex_lock(&mtd_table_mutex);

 -> del_mtd_device()
    ->  mutex_lock(&mtd_table_mutex);
    ->  blktrans_notify_remove() -> del_mtd_blktrans_dev()
       ->  mutex_lock(&dev->lock);

This is a classic (potential) ABBA deadlock, which can be fixed by
making the A->B ordering consistent everywhere. There was no real
purpose to the ordering in the original patch, AFAIR, so this shouldn't
be a problem. This ordering was actually already present in
del_mtd_blktrans_dev(), for one, where the function tried to ensure that
its caller already held mtd_table_mutex before it acquired &dev->lock:

        if (mutex_trylock(&mtd_table_mutex)) {
                mutex_unlock(&mtd_table_mutex);
                BUG();
        }

So, reverse the ordering of acquisition of &dev->lock and &mtd_table_mutex so
we always acquire mtd_table_mutex first.

Snippets of the lockdep output follow:

  # modprobe -r m25p80
  [   53.419251]
  [   53.420838] ======================================================
  [   53.427300] [ INFO: possible circular locking dependency detected ]
  [   53.433865] 4.3.0-rc6 #96 Not tainted
  [   53.437686] -------------------------------------------------------
  [   53.444220] modprobe/372 is trying to acquire lock:
  [   53.449320]  (&new->lock){+.+...}, at: [<c043fe4c>] del_mtd_blktrans_dev+0x80/0xdc
  [   53.457271]
  [   53.457271] but task is already holding lock:
  [   53.463372]  (mtd_table_mutex){+.+.+.}, at: [<c0439994>] del_mtd_device+0x18/0x100
  [   53.471321]
  [   53.471321] which lock already depends on the new lock.
  [   53.471321]
  [   53.479856]
  [   53.479856] the existing dependency chain (in reverse order) is:
  [   53.487660]
  -> #1 (mtd_table_mutex){+.+.+.}:
  [   53.492331]        [<c043fc5c>] blktrans_open+0x34/0x1a4
  [   53.497879]        [<c01afce0>] __blkdev_get+0xc4/0x3b0
  [   53.503364]        [<c01b0bb8>] blkdev_get+0x108/0x320
  [   53.508743]        [<c01713c0>] do_dentry_open+0x218/0x314
  [   53.514496]        [<c0180454>] path_openat+0x4c0/0xf9c
  [   53.519959]        [<c0182044>] do_filp_open+0x5c/0xc0
  [   53.525336]        [<c0172758>] do_sys_open+0xfc/0x1cc
  [   53.530716]        [<c000f740>] ret_fast_syscall+0x0/0x1c
  [   53.536375]
  -> #0 (&new->lock){+.+...}:
  [   53.540587]        [<c063f124>] mutex_lock_nested+0x38/0x3cc
  [   53.546504]        [<c043fe4c>] del_mtd_blktrans_dev+0x80/0xdc
  [   53.552606]        [<c043f164>] blktrans_notify_remove+0x7c/0x84
  [   53.558891]        [<c04399f0>] del_mtd_device+0x74/0x100
  [   53.564544]        [<c043c670>] del_mtd_partitions+0x80/0xc8
  [   53.570451]        [<c0439aa0>] mtd_device_unregister+0x24/0x48
  [   53.576637]        [<c046ce6c>] spi_drv_remove+0x1c/0x34
  [   53.582207]        [<c03de0f0>] __device_release_driver+0x88/0x114
  [   53.588663]        [<c03de19c>] device_release_driver+0x20/0x2c
  [   53.594843]        [<c03dd9e8>] bus_remove_device+0xd8/0x108
  [   53.600748]        [<c03dacc0>] device_del+0x10c/0x210
  [   53.606127]        [<c03dadd0>] device_unregister+0xc/0x20
  [   53.611849]        [<c046d878>] __unregister+0x10/0x20
  [   53.617211]        [<c03da868>] device_for_each_child+0x50/0x7c
  [   53.623387]        [<c046eae8>] spi_unregister_master+0x58/0x8c
  [   53.629578]        [<c03e12f0>] release_nodes+0x15c/0x1c8
  [   53.635223]        [<c03de0f8>] __device_release_driver+0x90/0x114
  [   53.641689]        [<c03de900>] driver_detach+0xb4/0xb8
  [   53.647147]        [<c03ddc78>] bus_remove_driver+0x4c/0xa0
  [   53.652970]        [<c00cab50>] SyS_delete_module+0x11c/0x1e4
  [   53.658976]        [<c000f740>] ret_fast_syscall+0x0/0x1c
  [   53.664621]
  [   53.664621] other info that might help us debug this:
  [   53.664621]
  [   53.672979]  Possible unsafe locking scenario:
  [   53.672979]
  [   53.679169]        CPU0                    CPU1
  [   53.683900]        ----                    ----
  [   53.688633]   lock(mtd_table_mutex);
  [   53.692383]                                lock(&new->lock);
  [   53.698306]                                lock(mtd_table_mutex);
  [   53.704658]   lock(&new->lock);
  [   53.707946]
  [   53.707946]  *** DEADLOCK ***

Fixes: 073db4a5 ("mtd: fix: avoid race condition when accessing mtd->usecount")
Reported-by: Felipe Balbi <balbi@ti.com>
Tested-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

9ea06026

can: Use correct type in sizeof() in nla_put() · a4d95541

Marek Vasut authored Oct 30, 2015

commit 562b103a upstream.

The sizeof() is invoked on an incorrect variable, likely due to some
copy-paste error, and this might result in memory corruption. Fix this.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

a4d95541

arm64: Fix compat register mappings · 25a0fff8

Robin Murphy authored Oct 22, 2015

commit 5accd17d upstream.

For reasons not entirely apparent, but now enshrined in history, the
architectural mapping of AArch32 banked registers to AArch64 registers
actually orders SP_<mode> and LR_<mode> backwards compared to the
intuitive r13/r14 order, for all modes except FIQ.

Fix the compat_<reg>_<mode> macros accordingly, in the hope of avoiding
subtle bugs with KVM and AArch32 guests.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

25a0fff8

KVM: s390: SCA must not cross page boundaries · 7c724edb

David Hildenbrand authored Oct 26, 2015

commit c5c2c393 upstream.

We seemed to have missed a few corner cases in commit f6c137ff
("KVM: s390: randomize sca address").

The SCA has a maximum size of 2112 bytes. By setting the sca_offset to
some unlucky numbers, we exceed the page.

0x7c0 (1984) -> Fits exactly
0x7d0 (2000) -> 16 bytes out
0x7e0 (2016) -> 32 bytes out
0x7f0 (2032) -> 48 bytes out

One VCPU entry is 32 bytes long.

For the last two cases, we actually write data to the other page.
1. The address of the VCPU.
2. Injection/delivery/clearing of SIGP externall calls via SIGP IF.

Especially the 2. happens regularly. So this could produce two problems:
1. The guest losing/getting external calls.
2. Random memory overwrites in the host.

So this problem happens on every 127 + 128 created VM with 64 VCPUs.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

7c724edb

megaraid_sas: Do not use PAGE_SIZE for max_sectors · dea853b1

sumit.saxena@avagotech.com authored Oct 15, 2015

commit 357ae967 upstream.

Do not use PAGE_SIZE marco to calculate max_sectors per I/O
request. Driver code assumes PAGE_SIZE will be always 4096 which can
lead to wrongly calculated value if PAGE_SIZE is not 4096. This issue
was reported in Ubuntu Bugzilla Bug #1475166.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

dea853b1

MAINTAINERS: Add public mailing list for ARC · fae7751f

Vineet Gupta authored Oct 26, 2015

commit 9acdc911 upstream.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

fae7751f

ALSA: hda/realtek - Dell XPS one ALC3260 speaker no sound after resume back · c1fc5009

Kailang Yang authored Oct 26, 2015

commit 6ed1131f upstream.

This machine had I2S codec for speaker output.
It need to refill the I2S codec initial verb after resume back.
Signed-off-by: Kailang Yang <kailang@realtek.com>
Reported-and-tested-by: George Gugulea <gugulea@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

c1fc5009

ACPI: Use correct IRQ when uninstalling ACPI interrupt handler · 79f08032

Chen Yu authored Oct 25, 2015

commit 49e4b843 upstream.

Currently when the system is trying to uninstall the ACPI interrupt
handler, it uses acpi_gbl_FADT.sci_interrupt as the IRQ number.
However, the IRQ number that the ACPI interrupt handled is installed
for comes from acpi_gsi_to_irq() and that is the number that should
be used for the handler removal.

Fix this problem by using the mapped IRQ returned from acpi_gsi_to_irq()
as appropriate.
Acked-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

79f08032

staging: rtl8712: Add device ID for Sitecom WLA2100 · 9bd08031

Larry Finger authored Oct 18, 2015

commit 1e6e6328 upstream.

This adds the USB ID for the Sitecom WLA2100. The Windows 10 inf file
was checked to verify that the addition is correct.
Reported-by: Frans van de Wiel <fvdw@fvdw.eu>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Frans van de Wiel <fvdw@fvdw.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

9bd08031

USB: qcserial: add Sierra Wireless MC74xx/EM74xx · 05e877eb

Bjørn Mork authored Oct 22, 2015

commit f504ab18 upstream.

New device IDs shamelessly lifted from the vendor driver.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

05e877eb

spi: atmel: Fix DMA-setup for transfers with more than 8 bits per word · ec024552

David Mosberger-Tang authored Oct 20, 2015

commit 06515f83 upstream.

The DMA-slave configuration depends on the whether <= 8 or > 8 bits
are transferred per word, so we need to call
atmel_spi_dma_slave_config() with the correct value.
Signed-off-by: David Mosberger <davidm@egauge.net>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

ec024552

Bluetooth: ath3k: Add support of AR3012 0cf3:817b device · 6cadc9ae

Dmitry Tunin authored Oct 16, 2015

commit 18e0afab upstream.

T: Bus=04 Lev=02 Prnt=02 Port=04 Cnt=01 Dev#= 3 Spd=12 MxCh= 0
D: Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1
P: Vendor=0cf3 ProdID=817b Rev=00.02
C: #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I: If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
I: If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb

BugLink: https://bugs.launchpad.net/bugs/1506615Signed-off-by: Dmitry Tunin <hanipouspilot@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

6cadc9ae

Bluetooth: ath3k: Add new AR3012 0930:021c id · 59d383d8

Dmitry Tunin authored Oct 05, 2015

commit cd355ff0 upstream.

This adapter works with the existing linux-firmware.

T:  Bus=01 Lev=01 Prnt=01 Port=03 Cnt=02 Dev#=  3 Spd=12  MxCh= 0
D:  Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=0930 ProdID=021c Rev=00.01
C:  #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I:  If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
I:  If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb

BugLink: https://bugs.launchpad.net/bugs/1502781Signed-off-by: Dmitry Tunin <hanipouspilot@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

59d383d8

Bluetooth: hidp: fix device disconnect on idle timeout · 38cfa4dc

David Herrmann authored Sep 07, 2015

commit 660f0fc0 upstream.

The HIDP specs define an idle-timeout which automatically disconnects a
device. This has always been implemented in the HIDP layer and forced a
synchronous shutdown of the hidp-scheduler. This works just fine, but
lacks a forced disconnect on the underlying l2cap channels. This has been
broken since:

    commit 5205185d
    Author: David Herrmann <dh.herrmann@gmail.com>
    Date:   Sat Apr 6 20:28:47 2013 +0200

        Bluetooth: hidp: remove old session-management

The old session-management always forced an l2cap error on the ctrl/intr
channels when shutting down. The new session-management skips this, as we
don't want to enforce channel policy on the caller. In other words, if
user-space removes an HIDP device, the underlying channels (which are
*owned* and *referenced* by user-space) are still left active. User-space
needs to call shutdown(2) or close(2) to release them.

Unfortunately, this does not work with idle-timeouts. There is no way to
signal user-space that the HIDP layer has been stopped. The API simply
does not support any event-passing except for poll(2). Hence, we restore
old behavior and force EUNATCH on the sockets if the HIDP layer is
disconnected due to idle-timeouts (behavior of explicit disconnects
remains unmodified). User-space can still call

    getsockopt(..., SO_ERROR, ...)

..to retrieve the EUNATCH error and clear sk_err. Hence, the channels can
still be re-used (which nobody does so far, though). Therefore, the API
still supports the new behavior, but with this patch it's also compatible
to the old implicit channel shutdown.
Reported-by: Mark Haun <haunma@keteu.org>
Reported-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

38cfa4dc

[media] media: vb2 dma-contig: Fully cache synchronise buffers in prepare and finish · 5eb6e56e

Tiffany Lin authored Sep 24, 2015

commit d9a98588 upstream.

In videobuf2 dma-contig memory type the prepare and finish ops, instead of
passing the number of entries in the original scatterlist as the "nents"
parameter to dma_sync_sg_for_device() and dma_sync_sg_for_cpu(), the value
returned by dma_map_sg() was used. Albeit this has been suggested in
comments of some implementations (which have since been corrected), this
is wrong.

Fixes: 199d101e ("v4l: vb2-dma-contig: add prepare/finish to dma-contig allocator")
Signed-off-by: Tiffany Lin <tiffany.lin@mediatek.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

5eb6e56e

spi: dw: explicitly free IRQ handler in dw_spi_remove_host() · d599fd12

Andy Shevchenko authored Oct 20, 2015

commit 02f20387 upstream.

The following warning occurs when DW SPI is compiled as a module and it's a PCI
device. On the removal stage pcibios_free_irq() is called earlier than
free_irq() due to the latter is called at managed resources free strage.

------------[ cut here ]------------
WARNING: CPU: 1 PID: 1003 at /home/andy/prj/linux/fs/proc/generic.c:575 remove_proc_entry+0x118/0x150()
remove_proc_entry: removing non-empty directory 'irq/38', leaking at least 'dw_spi1'
Modules linked in: spi_dw_midpci(-) spi_dw [last unloaded: dw_dmac_core]
CPU: 1 PID: 1003 Comm: modprobe Not tainted 4.3.0-rc5-next-20151013+ #32
 00000000 00000000 f5535d70 c12dc220 f5535db0 f5535da0 c104e912 c198a6bc
 f5535dcc 000003eb c198a638 0000023f c11b4098 c11b4098 f54f1ec8 f54f1ea0
 f642ba20 f5535db8 c104e96e 00000009 f5535db0 c198a6bc f5535dcc f5535df0
Call Trace:
 [<c12dc220>] dump_stack+0x41/0x61
 [<c104e912>] warn_slowpath_common+0x82/0xb0
 [<c11b4098>] ? remove_proc_entry+0x118/0x150
 [<c11b4098>] ? remove_proc_entry+0x118/0x150
 [<c104e96e>] warn_slowpath_fmt+0x2e/0x30
 [<c11b4098>] remove_proc_entry+0x118/0x150
 [<c109b96a>] unregister_irq_proc+0xaa/0xc0
 [<c109575e>] free_desc+0x1e/0x60
 [<c10957d2>] irq_free_descs+0x32/0x70
 [<c109b1a0>] irq_domain_free_irqs+0x120/0x150
 [<c1039e8c>] mp_unmap_irq+0x5c/0x60
 [<c16277b0>] intel_mid_pci_irq_disable+0x20/0x40
 [<c1627c7f>] pcibios_free_irq+0xf/0x20
 [<c13189f2>] pci_device_remove+0x52/0xb0
 [<c13f6367>] __device_release_driver+0x77/0x100
 [<c13f6da7>] driver_detach+0x87/0x90
 [<c13f5eaa>] bus_remove_driver+0x4a/0xc0
 [<c128bf0d>] ? selinux_capable+0xd/0x10
 [<c13f7483>] driver_unregister+0x23/0x60
 [<c10bad8a>] ? find_module_all+0x5a/0x80
 [<c1317413>] pci_unregister_driver+0x13/0x60
 [<f80ac654>] dw_spi_driver_exit+0xd/0xf [spi_dw_midpci]
 [<c10bce9a>] SyS_delete_module+0x17a/0x210

Explicitly call free_irq() at removal stage of the DW SPI driver.

Fixes: 04f421e7 (spi: dw: use managed resources)
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

d599fd12

$Hon Ching \$Vicky\$ Lo's avatar$

vTPM: fix memory allocation flag for rtce buffer at kernel boot · 32c939ce

Hon Ching \$Vicky\$ Lo authored Oct 07, 2015

commit 60ecd86c upstream.

At ibm vtpm initialzation, tpm_ibmvtpm_probe() registers its interrupt
handler, ibmvtpm_interrupt, which calls ibmvtpm_crq_process to allocate
memory for rtce buffer.  The current code uses 'GFP_KERNEL' as the
type of kernel memory allocation, which resulted a warning at
kernel/lockdep.c.  This patch uses 'GFP_ATOMIC' instead so that the
allocation is high-priority and does not sleep.
Signed-off-by: Hon Ching(Vicky) Lo <honclo@linux.vnet.ibm.com>
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

32c939ce

ext4, jbd2: ensure entering into panic after recording an error in superblock · abf7bef5

Daeho Jeong authored Oct 18, 2015

commit 4327ba52 upstream.

If a EXT4 filesystem utilizes JBD2 journaling and an error occurs, the
journaling will be aborted first and the error number will be recorded
into JBD2 superblock and, finally, the system will enter into the
panic state in "errors=panic" option.  But, in the rare case, this
sequence is little twisted like the below figure and it will happen
that the system enters into panic state, which means the system reset
in mobile environment, before completion of recording an error in the
journal superblock. In this case, e2fsck cannot recognize that the
filesystem failure occurred in the previous run and the corruption
wouldn't be fixed.

Task A                        Task B
ext4_handle_error()
-> jbd2_journal_abort()
  -> __journal_abort_soft()
    -> __jbd2_journal_abort_hard()
    | -> journal->j_flags |= JBD2_ABORT;
    |
    |                         __ext4_abort()
    |                         -> jbd2_journal_abort()
    |                         | -> __journal_abort_soft()
    |                         |   -> if (journal->j_flags & JBD2_ABORT)
    |                         |           return;
    |                         -> panic()
    |
    -> jbd2_journal_update_sb_errno()
Tested-by: Hobin Woo <hobin.woo@samsung.com>
Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

abf7bef5

[PATCH] fix calculation of meta_bg descriptor backups · ea9460a6

Andy Leiserson authored Oct 18, 2015

commit 904dad47 upstream.

"group" is the group where the backup will be placed, and is
initialized to zero in the declaration. This meant that backups for
meta_bg descriptors were erroneously written to the backup block group
descriptors in groups 1 and (desc_per_block-1).

Reproduction information:
  mke2fs -Fq -t ext4 -b 1024 -O ^resize_inode /tmp/foo.img 16G
  truncate -s 24G /tmp/foo.img
  losetup /dev/loop0 /tmp/foo.img
  mount /dev/loop0 /mnt
  resize2fs /dev/loop0
  umount /dev/loop0
  dd if=/dev/zero of=/dev/loop0 bs=1024 count=2
  e2fsck -fy /dev/loop0
  losetup -d /dev/loop0
Signed-off-by: Andy Leiserson <andy@leiserson.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

ea9460a6

ext4: fix potential use after free in __ext4_journal_stop · 13ea5f86

Lukas Czerner authored Oct 17, 2015

commit 6934da92 upstream.

There is a use-after-free possibility in __ext4_journal_stop() in the
case that we free the handle in the first jbd2_journal_stop() because
we're referencing handle->h_err afterwards. This was introduced in
9705acd6 and it is wrong. Fix it by
storing the handle->h_err value beforehand and avoid referencing
potentially freed handle.

Fixes: 9705acd6Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

13ea5f86

Btrfs: fix truncation of compressed and inlined extents · c40009c4

Filipe Manana authored Oct 16, 2015

commit 0305cd5f upstream.

When truncating a file to a smaller size which consists of an inline
extent that is compressed, we did not discard (or made unusable) the
data between the new file size and the old file size, wasting metadata
space and allowing for the truncated data to be leaked and the data
corruption/loss mentioned below.
We were also not correctly decrementing the number of bytes used by the
inode, we were setting it to zero, giving a wrong report for callers of
the stat(2) syscall. The fsck tool also reported an error about a mismatch
between the nbytes of the file versus the real space used by the file.

Now because we weren't discarding the truncated region of the file, it
was possible for a caller of the clone ioctl to actually read the data
that was truncated, allowing for a security breach without requiring root
access to the system, using only standard filesystem operations. The
scenario is the following:

   1) User A creates a file which consists of an inline and compressed
      extent with a size of 2000 bytes - the file is not accessible to
      any other users (no read, write or execution permission for anyone
      else);

   2) The user truncates the file to a size of 1000 bytes;

   3) User A makes the file world readable;

   4) User B creates a file consisting of an inline extent of 2000 bytes;

   5) User B issues a clone operation from user A's file into its own
      file (using a length argument of 0, clone the whole range);

   6) User B now gets to see the 1000 bytes that user A truncated from
      its file before it made its file world readbale. User B also lost
      the bytes in the range [1000, 2000[ bytes from its own file, but
      that might be ok if his/her intention was reading stale data from
      user A that was never supposed to be public.

Note that this contrasts with the case where we truncate a file from 2000
bytes to 1000 bytes and then truncate it back from 1000 to 2000 bytes. In
this case reading any byte from the range [1000, 2000[ will return a value
of 0x00, instead of the original data.

This problem exists since the clone ioctl was added and happens both with
and without my recent data loss and file corruption fixes for the clone
ioctl (patch "Btrfs: fix file corruption and data loss after cloning
inline extents").

So fix this by truncating the compressed inline extents as we do for the
non-compressed case, which involves decompressing, if the data isn't already
in the page cache, compressing the truncated version of the extent, writing
the compressed content into the inline extent and then truncate it.

The following test case for fstests reproduces the problem. In order for
the test to pass both this fix and my previous fix for the clone ioctl
that forbids cloning a smaller inline extent into a larger one,
which is titled "Btrfs: fix file corruption and data loss after cloning
inline extents", are needed. Without that other fix the test fails in a
different way that does not leak the truncated data, instead part of
destination file gets replaced with zeroes (because the destination file
has a larger inline extent than the source).

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"
  tmp=/tmp/$$
  status=1	# failure is the default!
  trap "_cleanup; exit \$status" 0 1 2 3 15

  _cleanup()
  {
      rm -f $tmp.*
  }

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter

  # real QA test starts here
  _need_to_be_root
  _supported_fs btrfs
  _supported_os Linux
  _require_scratch
  _require_cloner

  rm -f $seqres.full

  _scratch_mkfs >>$seqres.full 2>&1
  _scratch_mount "-o compress"

  # Create our test files. File foo is going to be the source of a clone operation
  # and consists of a single inline extent with an uncompressed size of 512 bytes,
  # while file bar consists of a single inline extent with an uncompressed size of
  # 256 bytes. For our test's purpose, it's important that file bar has an inline
  # extent with a size smaller than foo's inline extent.
  $XFS_IO_PROG -f -c "pwrite -S 0xa1 0 128"   \
          -c "pwrite -S 0x2a 128 384" \
          $SCRATCH_MNT/foo | _filter_xfs_io
  $XFS_IO_PROG -f -c "pwrite -S 0xbb 0 256" $SCRATCH_MNT/bar | _filter_xfs_io

  # Now durably persist all metadata and data. We do this to make sure that we get
  # on disk an inline extent with a size of 512 bytes for file foo.
  sync

  # Now truncate our file foo to a smaller size. Because it consists of a
  # compressed and inline extent, btrfs did not shrink the inline extent to the
  # new size (if the extent was not compressed, btrfs would shrink it to 128
  # bytes), it only updates the inode's i_size to 128 bytes.
  $XFS_IO_PROG -c "truncate 128" $SCRATCH_MNT/foo

  # Now clone foo's inline extent into bar.
  # This clone operation should fail with errno EOPNOTSUPP because the source
  # file consists only of an inline extent and the file's size is smaller than
  # the inline extent of the destination (128 bytes < 256 bytes). However the
  # clone ioctl was not prepared to deal with a file that has a size smaller
  # than the size of its inline extent (something that happens only for compressed
  # inline extents), resulting in copying the full inline extent from the source
  # file into the destination file.
  #
  # Note that btrfs' clone operation for inline extents consists of removing the
  # inline extent from the destination inode and copy the inline extent from the
  # source inode into the destination inode, meaning that if the destination
  # inode's inline extent is larger (N bytes) than the source inode's inline
  # extent (M bytes), some bytes (N - M bytes) will be lost from the destination
  # file. Btrfs could copy the source inline extent's data into the destination's
  # inline extent so that we would not lose any data, but that's currently not
  # done due to the complexity that would be needed to deal with such cases
  # (specially when one or both extents are compressed), returning EOPNOTSUPP, as
  # it's normally not a very common case to clone very small files (only case
  # where we get inline extents) and copying inline extents does not save any
  # space (unlike for normal, non-inlined extents).
  $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/foo $SCRATCH_MNT/bar

  # Now because the above clone operation used to succeed, and due to foo's inline
  # extent not being shinked by the truncate operation, our file bar got the whole
  # inline extent copied from foo, making us lose the last 128 bytes from bar
  # which got replaced by the bytes in range [128, 256[ from foo before foo was
  # truncated - in other words, data loss from bar and being able to read old and
  # stale data from foo that should not be possible to read anymore through normal
  # filesystem operations. Contrast with the case where we truncate a file from a
  # size N to a smaller size M, truncate it back to size N and then read the range
  # [M, N[, we should always get the value 0x00 for all the bytes in that range.

  # We expected the clone operation to fail with errno EOPNOTSUPP and therefore
  # not modify our file's bar data/metadata. So its content should be 256 bytes
  # long with all bytes having the value 0xbb.
  #
  # Without the btrfs bug fix, the clone operation succeeded and resulted in
  # leaking truncated data from foo, the bytes that belonged to its range
  # [128, 256[, and losing data from bar in that same range. So reading the
  # file gave us the following content:
  #
  # 0000000 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1
  # *
  # 0000200 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a
  # *
  # 0000400
  echo "File bar's content after the clone operation:"
  od -t x1 $SCRATCH_MNT/bar

  # Also because the foo's inline extent was not shrunk by the truncate
  # operation, btrfs' fsck, which is run by the fstests framework everytime a
  # test completes, failed reporting the following error:
  #
  #  root 5 inode 257 errors 400, nbytes wrong

  status=0
  exit
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

c40009c4