1. 30 Nov, 2015 40 commits
    • Filipe Manana's avatar
      Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow · 657299fc
      Filipe Manana authored
      commit 1d512cb7 upstream.
      
      If we are using the NO_HOLES feature, we have a tiny time window when
      running delalloc for a nodatacow inode where we can race with a concurrent
      link or xattr add operation leading to a BUG_ON.
      
      This happens because at run_delalloc_nocow() we end up casting a leaf item
      of type BTRFS_INODE_[REF|EXTREF]_KEY or of type BTRFS_XATTR_ITEM_KEY to a
      file extent item (struct btrfs_file_extent_item) and then analyse its
      extent type field, which won't match any of the expected extent types
      (values BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]) and therefore trigger an
      explicit BUG_ON(1).
      
      The following sequence diagram shows how the race happens when running a
      no-cow dellaloc range [4K, 8K[ for inode 257 and we have the following
      neighbour leafs:
      
                   Leaf X (has N items)                    Leaf Y
      
       [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 EXTENT_DATA 8192), ... ]
                    slot N - 2         slot N - 1              slot 0
      
       (Note the implicit hole for inode 257 regarding the [0, 8K[ range)
      
             CPU 1                                         CPU 2
      
       run_dealloc_nocow()
         btrfs_lookup_file_extent()
           --> searches for a key with value
               (257 EXTENT_DATA 4096) in the
               fs/subvol tree
           --> returns us a path with
               path->nodes[0] == leaf X and
               path->slots[0] == N
      
         because path->slots[0] is >=
         btrfs_header_nritems(leaf X), it
         calls btrfs_next_leaf()
      
         btrfs_next_leaf()
           --> releases the path
      
                                                    hard link added to our inode,
                                                    with key (257 INODE_REF 500)
                                                    added to the end of leaf X,
                                                    so leaf X now has N + 1 keys
      
           --> searches for the key
               (257 INODE_REF 256), because
               it was the last key in leaf X
               before it released the path,
               with path->keep_locks set to 1
      
           --> ends up at leaf X again and
               it verifies that the key
               (257 INODE_REF 256) is no longer
               the last key in the leaf, so it
               returns with path->nodes[0] ==
               leaf X and path->slots[0] == N,
               pointing to the new item with
               key (257 INODE_REF 500)
      
         the loop iteration of run_dealloc_nocow()
         does not break out the loop and continues
         because the key referenced in the path
         at path->nodes[0] and path->slots[0] is
         for inode 257, its type is < BTRFS_EXTENT_DATA_KEY
         and its offset (500) is less then our delalloc
         range's end (8192)
      
         the item pointed by the path, an inode reference item,
         is (incorrectly) interpreted as a file extent item and
         we get an invalid extent type, leading to the BUG_ON(1):
      
         if (extent_type == BTRFS_FILE_EXTENT_REG ||
            extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
             (...)
         } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
             (...)
         } else {
             BUG_ON(1)
         }
      
      The same can happen if a xattr is added concurrently and ends up having
      a key with an offset smaller then the delalloc's range end.
      
      So fix this by skipping keys with a type smaller than
      BTRFS_EXTENT_DATA_KEY.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      657299fc
    • Filipe Manana's avatar
      Btrfs: fix race leading to incorrect item deletion when dropping extents · 097ea2dc
      Filipe Manana authored
      commit aeafbf84 upstream.
      
      While running a stress test I got the following warning triggered:
      
        [191627.672810] ------------[ cut here ]------------
        [191627.673949] WARNING: CPU: 8 PID: 8447 at fs/btrfs/file.c:779 __btrfs_drop_extents+0x391/0xa50 [btrfs]()
        (...)
        [191627.701485] Call Trace:
        [191627.702037]  [<ffffffff8145f077>] dump_stack+0x4f/0x7b
        [191627.702992]  [<ffffffff81095de5>] ? console_unlock+0x356/0x3a2
        [191627.704091]  [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb
        [191627.705380]  [<ffffffffa0664499>] ? __btrfs_drop_extents+0x391/0xa50 [btrfs]
        [191627.706637]  [<ffffffff8104b46d>] warn_slowpath_null+0x1a/0x1c
        [191627.707789]  [<ffffffffa0664499>] __btrfs_drop_extents+0x391/0xa50 [btrfs]
        [191627.709155]  [<ffffffff8115663c>] ? cache_alloc_debugcheck_after.isra.32+0x171/0x1d0
        [191627.712444]  [<ffffffff81155007>] ? kmemleak_alloc_recursive.constprop.40+0x16/0x18
        [191627.714162]  [<ffffffffa06570c9>] insert_reserved_file_extent.constprop.40+0x83/0x24e [btrfs]
        [191627.715887]  [<ffffffffa065422b>] ? start_transaction+0x3bb/0x610 [btrfs]
        [191627.717287]  [<ffffffffa065b604>] btrfs_finish_ordered_io+0x273/0x4e2 [btrfs]
        [191627.728865]  [<ffffffffa065b888>] finish_ordered_fn+0x15/0x17 [btrfs]
        [191627.730045]  [<ffffffffa067d688>] normal_work_helper+0x14c/0x32c [btrfs]
        [191627.731256]  [<ffffffffa067d96a>] btrfs_endio_write_helper+0x12/0x14 [btrfs]
        [191627.732661]  [<ffffffff81061119>] process_one_work+0x24c/0x4ae
        [191627.733822]  [<ffffffff810615b0>] worker_thread+0x206/0x2c2
        [191627.734857]  [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
        [191627.736052]  [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
        [191627.737349]  [<ffffffff810669a6>] kthread+0xef/0xf7
        [191627.738267]  [<ffffffff810f3b3a>] ? time_hardirqs_on+0x15/0x28
        [191627.739330]  [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
        [191627.741976]  [<ffffffff81465592>] ret_from_fork+0x42/0x70
        [191627.743080]  [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
        [191627.744206] ---[ end trace bbfddacb7aaada8d ]---
      
        $ cat -n fs/btrfs/file.c
        691  int __btrfs_drop_extents(struct btrfs_trans_handle *trans,
        (...)
        758                  btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
        759                  if (key.objectid > ino ||
        760                      key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
        761                          break;
        762
        763                  fi = btrfs_item_ptr(leaf, path->slots[0],
        764                                      struct btrfs_file_extent_item);
        765                  extent_type = btrfs_file_extent_type(leaf, fi);
        766
        767                  if (extent_type == BTRFS_FILE_EXTENT_REG ||
        768                      extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
        (...)
        774                  } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
        (...)
        778                  } else {
        779                          WARN_ON(1);
        780                          extent_end = search_start;
        781                  }
        (...)
      
      This happened because the item we were processing did not match a file
      extent item (its key type != BTRFS_EXTENT_DATA_KEY), and even on this
      case we cast the item to a struct btrfs_file_extent_item pointer and
      then find a type field value that does not match any of the expected
      values (BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]). This scenario happens
      due to a tiny time window where a race can happen as exemplified below.
      For example, consider the following scenario where we're using the
      NO_HOLES feature and we have the following two neighbour leafs:
      
                     Leaf X (has N items)                    Leaf Y
      
      [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 EXTENT_DATA 8192), ... ]
                slot N - 2         slot N - 1              slot 0
      
      Our inode 257 has an implicit hole in the range [0, 8K[ (implicit rather
      than explicit because NO_HOLES is enabled). Now if our inode has an
      ordered extent for the range [4K, 8K[ that is finishing, the following
      can happen:
      
                CPU 1                                       CPU 2
      
        btrfs_finish_ordered_io()
          insert_reserved_file_extent()
            __btrfs_drop_extents()
               Searches for the key
                (257 EXTENT_DATA 4096) through
                btrfs_lookup_file_extent()
      
               Key not found and we get a path where
               path->nodes[0] == leaf X and
               path->slots[0] == N
      
               Because path->slots[0] is >=
               btrfs_header_nritems(leaf X), we call
               btrfs_next_leaf()
      
               btrfs_next_leaf() releases the path
      
                                                        inserts key
                                                        (257 INODE_REF 4096)
                                                        at the end of leaf X,
                                                        leaf X now has N + 1 keys,
                                                        and the new key is at
                                                        slot N
      
               btrfs_next_leaf() searches for
               key (257 INODE_REF 256), with
               path->keep_locks set to 1,
               because it was the last key it
               saw in leaf X
      
                 finds it in leaf X again and
                 notices it's no longer the last
                 key of the leaf, so it returns 0
                 with path->nodes[0] == leaf X and
                 path->slots[0] == N (which is now
                 < btrfs_header_nritems(leaf X)),
                 pointing to the new key
                 (257 INODE_REF 4096)
      
               __btrfs_drop_extents() casts the
               item at path->nodes[0], slot
               path->slots[0], to a struct
               btrfs_file_extent_item - it does
               not skip keys for the target
               inode with a type less than
               BTRFS_EXTENT_DATA_KEY
               (BTRFS_INODE_REF_KEY < BTRFS_EXTENT_DATA_KEY)
      
               sees a bogus value for the type
               field triggering the WARN_ON in
               the trace shown above, and sets
               extent_end = search_start (4096)
      
               does the if-then-else logic to
               fixup 0 length extent items created
               by a past bug from hole punching:
      
                 if (extent_end == key.offset &&
                     extent_end >= search_start)
                     goto delete_extent_item;
      
               that evaluates to true and it ends
               up deleting the key pointed to by
               path->slots[0], (257 INODE_REF 4096),
               from leaf X
      
      The same could happen for example for a xattr that ends up having a key
      with an offset value that matches search_start (very unlikely but not
      impossible).
      
      So fix this by ensuring that keys smaller than BTRFS_EXTENT_DATA_KEY are
      skipped, never casted to struct btrfs_file_extent_item and never deleted
      by accident. Also protect against the unexpected case of getting a key
      for a lower inode number by skipping that key and issuing a warning.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      097ea2dc
    • Helge Deller's avatar
      parisc: Fixes and cleanups in kernel uapi header files · 7da5559a
      Helge Deller authored
      commit d0cf62fb upstream.
      
      This patch fixes some bugs and partly cleans up the parisc uapi header
      files to what glibc defined:
      - compat_semid64_ds was wrong and did not take the endianess into
        account
      - ipc64_perm exported userspace types which broke building userspace
        packages on debian (e.g. trinity)
      - ipc64_perm needs to use a 32bit mode_t on 64bit kernel
      - msqid64_ds and semid64_ds needs unsigned longs for various struct members
      - shmid64_ds exported size_t instead of __kernel_size_t
      
      And finally add some compile-time checks for the sizes of those structs
      to avoid future breakage.
      
      Runtime-tested with the Linux Test Project (LTP) testsuite.
      Reviewed-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      7da5559a
    • Borislav Petkov's avatar
      x86/cpu: Call verify_cpu() after having entered long mode too · 0cafa822
      Borislav Petkov authored
      commit 04633df0 upstream.
      
      When we get loaded by a 64-bit bootloader, kernel entry point is
      startup_64 in head_64.S. We don't trust any and all bootloaders because
      some will fiddle with CPU configuration so we go ahead and massage each
      CPU into sanity again.
      
      For example, some dell BIOSes have this XD disable feature which set
      IA32_MISC_ENABLE[34] and disable NX. This might be some dumb workaround
      for other OSes but Linux sure doesn't need it.
      
      A similar thing is present in the Surface 3 firmware - see
      https://bugzilla.kernel.org/show_bug.cgi?id=106051 - which sets this bit
      only on the BSP:
      
        # rdmsr -a 0x1a0
        400850089
        850089
        850089
        850089
      
      I know, right?!
      
      There's not even an off switch in there.
      
      So fix all those cases by sanitizing the 64-bit entry point too. For
      that, make verify_cpu() callable in 64-bit mode also.
      Requested-and-debugged-by: default avatar"H. Peter Anvin" <hpa@zytor.com>
      Reported-and-tested-by: default avatarBastien Nocera <bugzilla@hadess.net>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1446739076-21303-1-git-send-email-bp@alien8.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      0cafa822
    • Greg Thelen's avatar
      fs, seqfile: always allow oom killer · 6f3916c7
      Greg Thelen authored
      commit 0f930902 upstream.
      
      Since 5cec38ac ("fs, seq_file: fallback to vmalloc instead of oom kill
      processes") seq_buf_alloc() avoids calling the oom killer for PAGE_SIZE or
      smaller allocations; but larger allocations can use the oom killer via
      vmalloc().  Thus reads of small files can return ENOMEM, but larger files
      use the oom killer to avoid ENOMEM.
      
      The effect of this bug is that reads from /proc and other virtual
      filesystems can return ENOMEM instead of the preferred behavior - oom
      killing something (possibly the calling process).  I don't know of anyone
      except Google who has noticed the issue.
      
      I suspect the fix is more needed in smaller systems where there isn't any
      reclaimable memory.  But these seem like the kinds of systems which
      probably don't use the oom killer for production situations.
      
      Memory overcommit requires use of the oom killer to select a victim
      regardless of file size.
      
      Enable oom killer for small seq_buf_alloc() allocations.
      
      Fixes: 5cec38ac ("fs, seq_file: fallback to vmalloc instead of oom kill processes")
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarGreg Thelen <gthelen@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      6f3916c7
    • Mathias Krause's avatar
      printk: prevent userland from spoofing kernel messages · 03bebbbe
      Mathias Krause authored
      commit 3824657c upstream.
      
      The following statement of ABI/testing/dev-kmsg is not quite right:
      
         It is not possible to inject messages from userspace with the
         facility number LOG_KERN (0), to make sure that the origin of the
         messages can always be reliably determined.
      
      Userland actually can inject messages with a facility of 0 by abusing the
      fact that the facility is stored in a u8 data type.  By using a facility
      which is a multiple of 256 the assignment of msg->facility in log_store()
      implicitly truncates it to 0, i.e.  LOG_KERN, allowing users of /dev/kmsg
      to spoof kernel messages as shown below:
      
      The following call...
         # printf '<%d>Kernel panic - not syncing: beer empty\n' 0 >/dev/kmsg
      ...leads to the following log entry (dmesg -x | tail -n 1):
         user  :emerg : [   66.137758] Kernel panic - not syncing: beer empty
      
      However, this call...
         # printf '<%d>Kernel panic - not syncing: beer empty\n' 0x800 >/dev/kmsg
      ...leads to the slightly different log entry (note the kernel facility):
         kern  :emerg : [   74.177343] Kernel panic - not syncing: beer empty
      
      Fix that by limiting the user provided facility to 8 bit right from the
      beginning and catch the truncation early.
      
      Fixes: 7ff9554b ("printk: convert byte-buffer to variable-length...")
      Signed-off-by: default avatarMathias Krause <minipli@googlemail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Petr Mladek <pmladek@suse.cz>
      Cc: Alex Elder <elder@linaro.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      03bebbbe
    • Oleg Nesterov's avatar
      proc: actually make proc_fd_permission() thread-friendly · 1d6a67f1
      Oleg Nesterov authored
      commit 54708d28 upstream.
      
      The commit 96d0df79 ("proc: make proc_fd_permission() thread-friendly")
      fixed the access to /proc/self/fd from sub-threads, but introduced another
      problem: a sub-thread can't access /proc/<tid>/fd/ or /proc/thread-self/fd
      if generic_permission() fails.
      
      Change proc_fd_permission() to check same_thread_group(pid_task(), current).
      
      Fixes: 96d0df79 ("proc: make proc_fd_permission() thread-friendly")
      Reported-by: default avatar"Jin, Yihua" <yihua.jin@intel.com>
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      1d6a67f1
    • Takashi Iwai's avatar
      Input: elantech - add Fujitsu Lifebook U745 to force crc_enabled · c2bc43c2
      Takashi Iwai authored
      commit 60603950 upstream.
      
      Another Lifebook machine that needs the same quirk as other similar
      models to make the driver working.
      
      Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=883192Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      c2bc43c2
    • Zi Shen Lim's avatar
      arm64: bpf: fix mod-by-zero case · 856b41c2
      Zi Shen Lim authored
      commit 14e589ff upstream.
      
      Turns out in the case of modulo by zero in a BPF program:
      	A = A % X;  (X == 0)
      the expected behavior is to terminate with return value 0.
      
      The bug in JIT is exposed by a new test case [1].
      
      [1] https://lkml.org/lkml/2015/11/4/499Signed-off-by: default avatarZi Shen Lim <zlim.lnx@gmail.com>
      Reported-by: default avatarYang Shi <yang.shi@linaro.org>
      Reported-by: default avatarXi Wang <xi.wang@gmail.com>
      CC: Alexei Starovoitov <ast@plumgrid.com>
      Fixes: e54bcde3 ("arm64: eBPF JIT compiler")
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      856b41c2
    • Zi Shen Lim's avatar
      arm64: bpf: fix div-by-zero case · eb0c40a4
      Zi Shen Lim authored
      commit 251599e1 upstream.
      
      In the case of division by zero in a BPF program:
      	A = A / X;  (X == 0)
      the expected behavior is to terminate with return value 0.
      
      This is confirmed by the test case introduced in commit 86bf1721
      ("test_bpf: add tests checking that JIT/interpreter sets A and X to 0.").
      Reported-by: default avatarYang Shi <yang.shi@linaro.org>
      Tested-by: default avatarYang Shi <yang.shi@linaro.org>
      CC: Xi Wang <xi.wang@gmail.com>
      CC: Alexei Starovoitov <ast@plumgrid.com>
      CC: linux-arm-kernel@lists.infradead.org
      CC: linux-kernel@vger.kernel.org
      Fixes: e54bcde3 ("arm64: eBPF JIT compiler")
      Signed-off-by: default avatarZi Shen Lim <zlim.lnx@gmail.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      eb0c40a4
    • Michal Hocko's avatar
      memcg: fix thresholds for 32b architectures. · 0079424f
      Michal Hocko authored
      commit c12176d3 upstream.
      
      Commit 424cdc14 ("memcg: convert threshold to bytes") has fixed a
      regression introduced by 3e32cb2e ("mm: memcontrol: lockless page
      counters") where thresholds were silently converted to use page units
      rather than bytes when interpreting the user input.
      
      The fix is not complete, though, as properly pointed out by Ben Hutchings
      during stable backport review.  The page count is converted to bytes but
      unsigned long is used to hold the value which would be obviously not
      sufficient for 32b systems with more than 4G thresholds.  The same applies
      to usage as taken from mem_cgroup_usage which might overflow.
      
      Let's remove this bytes vs.  pages internal tracking differences and
      handle thresholds in page units internally.  Chage mem_cgroup_usage() to
      return the value in page units and revert 424cdc14 because this should
      be sufficient for the consistent handling.  mem_cgroup_read_u64 as the
      only users of mem_cgroup_usage outside of the threshold handling code is
      converted to give the proper in bytes result.  It is doing that already
      for page_counter output so this is more consistent as well.
      
      The value presented to the userspace is still in bytes units.
      
      Fixes: 424cdc14 ("memcg: convert threshold to bytes")
      Fixes: 3e32cb2e ("mm: memcontrol: lockless page counters")
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reported-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Reviewed-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      From: Michal Hocko <mhocko@kernel.org>
      Subject: memcg-fix-thresholds-for-32b-architectures-fix
      
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      From: Andrew Morton <akpm@linux-foundation.org>
      Subject: memcg-fix-thresholds-for-32b-architectures-fix-fix
      
      don't attempt to inline mem_cgroup_usage()
      
      The compiler ignores the inline anwyay.  And __always_inlining it adds 600
      bytes of goop to the .o file.
      
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      0079424f
    • Catalin Marinas's avatar
      mm: slab: only move management objects off-slab for sizes larger than KMALLOC_MIN_SIZE · e028ec21
      Catalin Marinas authored
      commit d4322d88 upstream.
      
      On systems with a KMALLOC_MIN_SIZE of 128 (arm64, some mips and powerpc
      configurations defining ARCH_DMA_MINALIGN to 128), the first
      kmalloc_caches[] entry to be initialised after slab_early_init = 0 is
      "kmalloc-128" with index 7.  Depending on the debug kernel configuration,
      sizeof(struct kmem_cache) can be larger than 128 resulting in an
      INDEX_NODE of 8.
      
      Commit 8fc9cf42 ("slab: make more slab management structure off the
      slab") enables off-slab management objects for sizes starting with
      PAGE_SIZE >> 5 (128 bytes for a 4KB page configuration) and the creation
      of the "kmalloc-128" cache would try to place the management objects
      off-slab.  However, since KMALLOC_MIN_SIZE is already 128 and
      freelist_size == 32 in __kmem_cache_create(), kmalloc_slab(freelist_size)
      returns NULL (kmalloc_caches[7] not populated yet).  This triggers the
      following bug on arm64:
      
        kernel BUG at /work/Linux/linux-2.6-aarch64/mm/slab.c:2283!
        Internal error: Oops - BUG: 0 [#1] SMP
        Modules linked in:
        CPU: 0 PID: 0 Comm: swapper Not tainted 4.3.0-rc4+ #540
        Hardware name: Juno (DT)
        PC is at __kmem_cache_create+0x21c/0x280
        LR is at __kmem_cache_create+0x210/0x280
        [...]
        Call trace:
          __kmem_cache_create+0x21c/0x280
          create_boot_cache+0x48/0x80
          create_kmalloc_cache+0x50/0x88
          create_kmalloc_caches+0x4c/0xf4
          kmem_cache_init+0x100/0x118
          start_kernel+0x214/0x33c
      
      This patch introduces an OFF_SLAB_MIN_SIZE definition to avoid off-slab
      management objects for sizes equal to or smaller than KMALLOC_MIN_SIZE.
      
      Fixes: 8fc9cf42 ("slab: make more slab management structure off the slab")
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Reported-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      e028ec21
    • Christoph Hellwig's avatar
      scsi: restart list search after unlock in scsi_remove_target · 2ba3333e
      Christoph Hellwig authored
      commit 40998193 upstream.
      
      When dropping a lock while iterating a list we must restart the search
      as other threads could have manipulated the list under us.  Without this
      we can get stuck in an endless loop.  This bug was introduced by
      
      commit bc3f02a7
      Author: Dan Williams <djbw@fb.com>
      Date:   Tue Aug 28 22:12:10 2012 -0700
      
          [SCSI] scsi_remove_target: fix softlockup regression on hot remove
      
      Which was itself trying to fix a reported soft lockup issue
      
      http://thread.gmane.org/gmane.linux.kernel/1348679
      
      However, we believe even with this revert of the original patch, the soft
      lockup problem has been fixed by
      
      commit f2495e22
      Author: James Bottomley <JBottomley@Parallels.com>
      Date:   Tue Jan 21 07:01:41 2014 -0800
      
          [SCSI] dual scan thread bug fix
      
      Thanks go to Dan Williams <dan.j.williams@intel.com> for tracking all this
      prior history down.
      Reported-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Tested-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Fixes: bc3f02a7Signed-off-by: default avatarJames Bottomley <JBottomley@Odin.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      2ba3333e
    • Stefan Richter's avatar
      firewire: ohci: fix JMicron JMB38x IT context discovery · 8926d851
      Stefan Richter authored
      commit 100ceb66 upstream.
      
      Reported by Clifford and Craig for JMicron OHCI-1394 + SDHCI combo
      controllers:  Often or even most of the time, the controller is
      initialized with the message "added OHCI v1.10 device as card 0, 4 IR +
      0 IT contexts, quirks 0x10".  With 0 isochronous transmit DMA contexts
      (IT contexts), applications like audio output are impossible.
      
      However, OHCI-1394 demands that at least 4 IT contexts are implemented
      by the link layer controller, and indeed JMicron JMB38x do implement
      four of them.  Only their IsoXmitIntMask register is unreliable at early
      access.
      
      With my own JMB381 single function controller I found:
        - I can reproduce the problem with a lower probability than Craig's.
        - If I put a loop around the section which clears and reads
          IsoXmitIntMask, then either the first or the second attempt will
          return the correct initial mask of 0x0000000f.  I never encountered
          a case of needing more than a second attempt.
        - Consequently, if I put a dummy reg_read(...IsoXmitIntMaskSet)
          before the first write, the subsequent read will return the correct
          result.
        - If I merely ignore a wrong read result and force the known real
          result, later isochronous transmit DMA usage works just fine.
      
      So let's just fix this chip bug up by the latter method.  Tested with
      JMB381 on kernel 3.13 and 4.3.
      
      Since OHCI-1394 generally requires 4 IT contexts at a minium, this
      workaround is simply applied whenever the initial read of IsoXmitIntMask
      returns 0, regardless whether it's a JMicron chip or not.  I never heard
      of this issue together with any other chip though.
      
      I am not 100% sure that this fix works on the OHCI-1394 part of JMB380
      and JMB388 combo controllers exactly the same as on the JMB381 single-
      function controller, but so far I haven't had a chance to let an owner
      of a combo chip run a patched kernel.
      
      Strangely enough, IsoRecvIntMask is always reported correctly, even
      though it is probed right before IsoXmitIntMask.
      
      Reported-by: Clifford Dunn
      Reported-by: default avatarCraig Moore <craig.moore@qenos.com>
      Signed-off-by: default avatarStefan Richter <stefanr@s5r6.in-berlin.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      8926d851
    • Alexandra Yates's avatar
      ALSA: hda - Add Intel Lewisburg device IDs Audio · 1efbcb95
      Alexandra Yates authored
      commit 5cf92c8b upstream.
      
      Adding Intel codename Lewisburg platform device IDs for audio.
      
      [rearranged the position by tiwai]
      Signed-off-by: default avatarAlexandra Yates <alexandra.yates@linux.intel.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      1efbcb95
    • Takashi Iwai's avatar
      ALSA: hda - Apply pin fixup for HP ProBook 6550b · 28e2272a
      Takashi Iwai authored
      commit c932b98c upstream.
      
      HP ProBook 6550b needs the same pin fixup applied to other HP B-series
      laptops with docks for making its headphone and dock headphone jacks
      working properly.  We just need to add the codec SSID to the list.
      
      Bugzilla: https://bugzilla.kernel.org/attachment.cgi?id=191971Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      28e2272a
    • Krzysztof Kozlowski's avatar
      thermal: exynos: Fix unbalanced regulator disable on probe failure · 402b7ecc
      Krzysztof Kozlowski authored
      commit 824ead03 upstream.
      
      During probe if the regulator could not be enabled, the error exit path
      would still disable it. This could lead to unbalanced counter of
      regulator enable/disable.
      
      The patch moves code for getting and enabling the regulator from
      exynos_map_dt_data() to probe function because it is really not a part
      of getting Device Tree properties.
      Acked-by: default avatarLukasz Majewski <l.majewski@samsung.com>
      Tested-by: default avatarLukasz Majewski <l.majewski@samsung.com>
      Reviewed-by: default avatarAlim Akhtar <alim.akhtar@samsung.com>
      Signed-off-by: default avatarKrzysztof Kozlowski <k.kozlowski@samsung.com>
      Fixes: 5f09a5cb ("thermal: exynos: Disable the regulator on probe failure")
      Signed-off-by: default avatarEduardo Valentin <edubezval@gmail.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      402b7ecc
    • Radim Krčmář's avatar
      KVM: VMX: fix SMEP and SMAP without EPT · 2d4290b7
      Radim Krčmář authored
      commit 656ec4a4 upstream.
      
      The comment in code had it mostly right, but we enable paging for
      emulated real mode regardless of EPT.
      
      Without EPT (which implies emulated real mode), secondary VCPUs won't
      start unless we disable SM[AE]P when the guest doesn't use paging.
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      2d4290b7
    • Li Bin's avatar
      recordmcount: arm64: Replace the ignored mcount call into nop · 44a41198
      Li Bin authored
      commit 2ee8a74f upstream.
      
      By now, the recordmcount only records the function that in
      following sections:
      .text/.ref.text/.sched.text/.spinlock.text/.irqentry.text/
      .kprobes.text/.text.unlikely
      
      For the function that not in these sections, the call mcount
      will be in place and not be replaced when kernel boot up. And
      it will bring performance overhead, such as do_mem_abort (in
      .exception.text section). This patch make the call mcount to
      nop for this case in recordmcount.
      
      Link: http://lkml.kernel.org/r/1446019445-14421-1-git-send-email-huawei.libin@huawei.com
      Link: http://lkml.kernel.org/r/1446193864-24593-4-git-send-email-huawei.libin@huawei.com
      
      Cc: <lkp@intel.com>
      Cc: <catalin.marinas@arm.com>
      Cc: <takahiro.akashi@linaro.org>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarLi Bin <huawei.libin@huawei.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      44a41198
    • libin's avatar
      recordmcount: Fix endianness handling bug for nop_mcount · aa7dc98c
      libin authored
      commit c84da8b9 upstream.
      
      In nop_mcount, shdr->sh_offset and welp->r_offset should handle
      endianness properly, otherwise it will trigger Segmentation fault
      if the recordmcount main and file.o have different endianness.
      
      Link: http://lkml.kernel.org/r/563806C7.7070606@huawei.comSigned-off-by: default avatarLi Bin <huawei.libin@huawei.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      aa7dc98c
    • Max Filippov's avatar
      xtensa: fix secondary core boot in SMP · ab5e118a
      Max Filippov authored
      commit ab45fb14 upstream.
      
      There are multiple factors adding to the issue in different
      configurations:
      
      - commit 17290231 ("xtensa: add fixup for double exception raised
        in window overflow") added function window_overflow_restore_a0_fixup to
        double exception vector overlapping reset vector location of secondary
        processor cores.
      - on MMUv2 cores RESET_VECTOR1_VADDR may point to uncached kernel memory
        making code overlapping depend on cache type and size, so that without
        cache or with WT cache reset vector code overwrites double exception
        code, making issue even harder to detect.
      - on MMUv3 cores RESET_VECTOR1_VADDR may point to unmapped area, as
        MMUv3 cores change virtual address map to match MMUv2 layout, but
        reset vector virtual address is given for the original MMUv3 mapping.
      - physical memory region of the secondary reset vector is not reserved
        in the physical memory map, and thus may be allocated and overwritten
        at arbitrary moment.
      
      Fix it as follows:
      
      - move window_overflow_restore_a0_fixup code to .text section.
      - define RESET_VECTOR1_VADDR so that it points to reset vector in the
        cacheable MMUv2 map for cores with MMU.
      - reserve reset vector region in the physical memory map. Drop separate
        literal section and build mxhead.S with text section literals.
      Signed-off-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ab5e118a
    • Arik Nemtsov's avatar
      mac80211: allow null chandef in tracing · c5be932d
      Arik Nemtsov authored
      commit 254d3dfe upstream.
      
      In TDLS channel-switch operations the chandef can sometimes be NULL.
      Avoid an oops in the trace code for these cases and just print a
      chandef full of zeros.
      
      Fixes: a7a6bdd0 ("mac80211: introduce TDLS channel switch ops")
      Signed-off-by: default avatarArik Nemtsov <arikx.nemtsov@intel.com>
      Signed-off-by: default avatarEmmanuel Grumbach <emmanuel.grumbach@intel.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      c5be932d
    • Ola Olsson's avatar
      nl80211: Fix potential memory leak from parse_acl_data · 2d359552
      Ola Olsson authored
      commit 4baf6bea upstream.
      
      If parse_acl_data succeeds but the subsequent parsing of smps
      attributes fails, there will be a memory leak due to early returns.
      Fix that by moving the ACL parsing later.
      
      Fixes: 18998c38 ("cfg80211: allow requesting SMPS mode on ap start")
      Signed-off-by: default avatarOla Olsson <ola.olsson@sonymobile.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      2d359552
    • Janusz.Dziedzic@tieto.com's avatar
      mac80211: fix divide by zero when NOA update · 3d69b3f0
      Janusz.Dziedzic@tieto.com authored
      commit 519ee691 upstream.
      
      In case of one shot NOA the interval can be 0, catch that
      instead of potentially (depending on the driver) crashing
      like this:
      
      divide error: 0000 [#1] SMP
      [...]
      Call Trace:
      <IRQ>
      [<ffffffffc08e891c>] ieee80211_extend_absent_time+0x6c/0xb0 [mac80211]
      [<ffffffffc08e8a17>] ieee80211_update_p2p_noa+0xb7/0xe0 [mac80211]
      [<ffffffffc069cc30>] ath9k_p2p_ps_timer+0x170/0x190 [ath9k]
      [<ffffffffc070adf8>] ath_gen_timer_isr+0xc8/0xf0 [ath9k_hw]
      [<ffffffffc0691156>] ath9k_tasklet+0x296/0x2f0 [ath9k]
      [<ffffffff8107ad65>] tasklet_action+0xe5/0xf0
      [...]
      Signed-off-by: default avatarJanusz Dziedzic <janusz.dziedzic@tieto.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      3d69b3f0
    • sumit.saxena@avagotech.com's avatar
      megaraid_sas : SMAP restriction--do not access user memory from IOCTL code · 278313ea
      sumit.saxena@avagotech.com authored
      commit 323c4a02 upstream.
      
      This is an issue on SMAP enabled CPUs and 32 bit apps running on 64 bit
      OS. Do not access user memory from kernel code. The SMAP bit restricts
      accessing user memory from kernel code.
      Signed-off-by: default avatarSumit Saxena <sumit.saxena@avagotech.com>
      Signed-off-by: default avatarKashyap Desai <kashyap.desai@avagotech.com>
      Reviewed-by: default avatarTomas Henzl <thenzl@redhat.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      278313ea
    • Gabriele Paoloni's avatar
      PCI: spear: Fix dw_pcie_cfg_read/write() usage · f556c316
      Gabriele Paoloni authored
      commit fa3b7cba upstream.
      
      The first argument of dw_pcie_cfg_read/write() is a 32-bit aligned address.
      The second argument is the byte offset into a 32-bit word, and
      dw_pcie_cfg_read/write() only look at the low two bits.
      
      SPEAr13xx used dw_pcie_cfg_read() and dw_pcie_cfg_write() incorrectly: it
      passed important address bits in the second argument, where they were
      ignored.
      
      Pass the complete 32-bit word address in the first argument and only the
      2-bit offset into that word in the second argument.
      
      Without this fix, SPEAr13xx host will never work with few buggy gen1 card
      which connects with only gen1 host and also with any endpoint which would
      generate a read request of more than 128 bytes.
      
      [bhelgaas: changelog]
      Reported-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarPratyush Anand <panand@redhat.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      f556c316
    • Max Filippov's avatar
      xtensa: fixes for configs without loop option · e8cf8ad9
      Max Filippov authored
      commit 5029615e upstream.
      
      Build-time fixes:
      - make lbeg/lend/lcount save/restore conditional on kernel entry;
      - don't clear lcount in platform_restart functions unconditionally.
      
      Run-time fixes:
      - use correct end of range register in __endla paired with __loopt, not
        the unused temporary register. This fixes .bss zero-initialization.
        Update comments in asmmacro.h;
      - don't clobber a10 in the usercopy that leads to access to unmapped
        memory.
      Signed-off-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      e8cf8ad9
    • Herbert Xu's avatar
      crypto: algif_hash - Only export and import on sockets with data · b1bc7c79
      Herbert Xu authored
      commit 4afa5f96 upstream.
      
      The hash_accept call fails to work on sockets that have not received
      any data.  For some algorithm implementations it may cause crashes.
      
      This patch fixes this by ensuring that we only export and import on
      sockets that have received data.
      Reported-by: default avatarHarsh Jain <harshjain.prof@gmail.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Tested-by: default avatarStephan Mueller <smueller@chronox.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      b1bc7c79
    • Jani Nikula's avatar
    • Mauricio Faria de Oliveira's avatar
      Revert "dm mpath: fix stalls when handling invalid ioctls" · cdc3e315
      Mauricio Faria de Oliveira authored
      commit 47796938 upstream.
      
      This reverts commit a1989b33.
      
      That commit introduced a regression at least for the case of the SG_IO ioctl()
      running without CAP_SYS_RAWIO capability (e.g., unprivileged users) when there
      are no active paths: the ioctl() fails with the ENOTTY errno immediately rather
      than blocking due to queue_if_no_path until a path becomes active, for example.
      
      That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices
      (qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2])
      from multipath devices; which leads to SCSI/filesystem errors in such a guest.
      
      More general scenarios can hit that regression too. The following demonstration
      employs a SG_IO ioctl() with a standard SCSI INQUIRY command for this objective
      (some output & user changes omitted for brevity and comments added for clarity).
      
      Reverting that commit restores normal operation (queueing) in failing scenarios;
      tested on linux-next (next-20151022).
      
      1) Test-case is based on sg_simple0 [3] (just SG_IO; remove SG_GET_VERSION_NUM)
      
          $ cat sg_simple0.c
          ... see [3] ...
          $ sed '/SG_GET_VERSION_NUM/,/}/d' sg_simple0.c > sgio_inquiry.c
          $ gcc sgio_inquiry.c -o sgio_inquiry
      
      2) The ioctl() works fine with active paths present.
      
          # multipath -l 85ag56
          85ag56 (...) dm-19 IBM     ,2145
          size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
          |-+- policy='service-time 0' prio=0 status=active
          | |- 8:0:11:0  sdz  65:144  active undef running
          | `- 9:0:9:0   sdbf 67:144  active undef running
          `-+- policy='service-time 0' prio=0 status=enabled
            |- 8:0:12:0  sdae 65:224  active undef running
            `- 9:0:12:0  sdbo 68:32   active undef running
      
          $ ./sgio_inquiry /dev/mapper/85ag56
          Some of the INQUIRY command's response:
              IBM       2145              0000
          INQUIRY duration=0 millisecs, resid=0
      
      3) The ioctl() fails with ENOTTY errno with _no_ active paths present,
         for unprivileged users (rather than blocking due to queue_if_no_path).
      
          # for path in $(multipath -l 85ag56 | grep -o 'sd[a-z]\+'); \
                do multipathd -k"fail path $path"; done
      
          # multipath -l 85ag56
          85ag56 (...) dm-19 IBM     ,2145
          size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
          |-+- policy='service-time 0' prio=0 status=enabled
          | |- 8:0:11:0  sdz  65:144  failed undef running
          | `- 9:0:9:0   sdbf 67:144  failed undef running
          `-+- policy='service-time 0' prio=0 status=enabled
            |- 8:0:12:0  sdae 65:224  failed undef running
            `- 9:0:12:0  sdbo 68:32   failed undef running
      
          $ ./sgio_inquiry /dev/mapper/85ag56
          sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device
      
      4) dmesg shows that scsi_verify_blk_ioctl() failed for SG_IO (0x2285);
         it returns -ENOIOCTLCMD, later replaced with -ENOTTY in vfs_ioctl().
      
          $ dmesg
          <...>
          [] device-mapper: multipath: Failing path 65:144.
          [] device-mapper: multipath: Failing path 67:144.
          [] device-mapper: multipath: Failing path 65:224.
          [] device-mapper: multipath: Failing path 68:32.
          [] sgio_inquiry: sending ioctl 2285 to a partition!
      
      5) The ioctl() only works if the SYS_CAP_RAWIO capability is present
         (then queueing happens -- in this example, queue_if_no_path is set);
         this is due to a conditional check in scsi_verify_blk_ioctl().
      
          # capsh --drop=cap_sys_rawio -- -c './sgio_inquiry /dev/mapper/85ag56'
          sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device
      
          # ./sgio_inquiry /dev/mapper/85ag56 &
          [1] 72830
      
          # cat /proc/72830/stack
          [<c00000171c0df700>] 0xc00000171c0df700
          [<c000000000015934>] __switch_to+0x204/0x350
          [<c000000000152d4c>] msleep+0x5c/0x80
          [<c00000000077dfb0>] dm_blk_ioctl+0x70/0x170
          [<c000000000487c40>] blkdev_ioctl+0x2b0/0x9b0
          [<c0000000003128e4>] block_ioctl+0x64/0xd0
          [<c0000000002dd3b0>] do_vfs_ioctl+0x490/0x780
          [<c0000000002dd774>] SyS_ioctl+0xd4/0xf0
          [<c000000000009358>] system_call+0x38/0xd0
      
      6) This is the function call chain exercised in this analysis:
      
      SYSCALL_DEFINE3(ioctl, <...>) @ fs/ioctl.c
          -> do_vfs_ioctl()
              -> vfs_ioctl()
                  ...
                  error = filp->f_op->unlocked_ioctl(filp, cmd, arg);
                  ...
                      -> dm_blk_ioctl() @ drivers/md/dm.c
                          -> multipath_ioctl() @ drivers/md/dm-mpath.c
                              ...
                              (bdev = NULL, due to no active paths)
                              ...
                              if (!bdev || <...>) {
                                  int err = scsi_verify_blk_ioctl(NULL, cmd);
                                  if (err)
                                      r = err;
                              }
                              ...
                                  -> scsi_verify_blk_ioctl() @ block/scsi_ioctl.c
                                      ...
                                      if (bd && bd == bd->bd_contains) // not taken (bd = NULL)
                                          return 0;
                                      ...
                                      if (capable(CAP_SYS_RAWIO)) // not taken (unprivileged user)
                                          return 0;
                                      ...
                                      printk_ratelimited(KERN_WARNING
                                                 "%s: sending ioctl %x to a partition!\n" <...>);
      
                                      return -ENOIOCTLCMD;
                                  <-
                              ...
                              return r ? : <...>
                          <-
                  ...
                  if (error == -ENOIOCTLCMD)
                      error = -ENOTTY;
                   out:
                      return error;
                  ...
      
      Links:
      [1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52
      [2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device')
      [3] http://tldp.org/HOWTO/SCSI-Generic-HOWTO/pexample.html (Revision 1.2, 2002-05-03)
      Signed-off-by: default avatarMauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      cdc3e315
    • Brian Norris's avatar
      mtd: blkdevs: fix potential deadlock + lockdep warnings · 0daab66d
      Brian Norris authored
      commit f3c63795 upstream.
      
      Commit 073db4a5 ("mtd: fix: avoid race condition when accessing
      mtd->usecount") fixed a race condition but due to poor ordering of the
      mutex acquisition, introduced a potential deadlock.
      
      The deadlock can occur, for example, when rmmod'ing the m25p80 module, which
      will delete one or more MTDs, along with any corresponding mtdblock
      devices. This could potentially race with an acquisition of the block
      device as follows.
      
       -> blktrans_open()
          ->  mutex_lock(&dev->lock);
          ->  mutex_lock(&mtd_table_mutex);
      
       -> del_mtd_device()
          ->  mutex_lock(&mtd_table_mutex);
          ->  blktrans_notify_remove() -> del_mtd_blktrans_dev()
             ->  mutex_lock(&dev->lock);
      
      This is a classic (potential) ABBA deadlock, which can be fixed by
      making the A->B ordering consistent everywhere. There was no real
      purpose to the ordering in the original patch, AFAIR, so this shouldn't
      be a problem. This ordering was actually already present in
      del_mtd_blktrans_dev(), for one, where the function tried to ensure that
      its caller already held mtd_table_mutex before it acquired &dev->lock:
      
              if (mutex_trylock(&mtd_table_mutex)) {
                      mutex_unlock(&mtd_table_mutex);
                      BUG();
              }
      
      So, reverse the ordering of acquisition of &dev->lock and &mtd_table_mutex so
      we always acquire mtd_table_mutex first.
      
      Snippets of the lockdep output follow:
      
        # modprobe -r m25p80
        [   53.419251]
        [   53.420838] ======================================================
        [   53.427300] [ INFO: possible circular locking dependency detected ]
        [   53.433865] 4.3.0-rc6 #96 Not tainted
        [   53.437686] -------------------------------------------------------
        [   53.444220] modprobe/372 is trying to acquire lock:
        [   53.449320]  (&new->lock){+.+...}, at: [<c043fe4c>] del_mtd_blktrans_dev+0x80/0xdc
        [   53.457271]
        [   53.457271] but task is already holding lock:
        [   53.463372]  (mtd_table_mutex){+.+.+.}, at: [<c0439994>] del_mtd_device+0x18/0x100
        [   53.471321]
        [   53.471321] which lock already depends on the new lock.
        [   53.471321]
        [   53.479856]
        [   53.479856] the existing dependency chain (in reverse order) is:
        [   53.487660]
        -> #1 (mtd_table_mutex){+.+.+.}:
        [   53.492331]        [<c043fc5c>] blktrans_open+0x34/0x1a4
        [   53.497879]        [<c01afce0>] __blkdev_get+0xc4/0x3b0
        [   53.503364]        [<c01b0bb8>] blkdev_get+0x108/0x320
        [   53.508743]        [<c01713c0>] do_dentry_open+0x218/0x314
        [   53.514496]        [<c0180454>] path_openat+0x4c0/0xf9c
        [   53.519959]        [<c0182044>] do_filp_open+0x5c/0xc0
        [   53.525336]        [<c0172758>] do_sys_open+0xfc/0x1cc
        [   53.530716]        [<c000f740>] ret_fast_syscall+0x0/0x1c
        [   53.536375]
        -> #0 (&new->lock){+.+...}:
        [   53.540587]        [<c063f124>] mutex_lock_nested+0x38/0x3cc
        [   53.546504]        [<c043fe4c>] del_mtd_blktrans_dev+0x80/0xdc
        [   53.552606]        [<c043f164>] blktrans_notify_remove+0x7c/0x84
        [   53.558891]        [<c04399f0>] del_mtd_device+0x74/0x100
        [   53.564544]        [<c043c670>] del_mtd_partitions+0x80/0xc8
        [   53.570451]        [<c0439aa0>] mtd_device_unregister+0x24/0x48
        [   53.576637]        [<c046ce6c>] spi_drv_remove+0x1c/0x34
        [   53.582207]        [<c03de0f0>] __device_release_driver+0x88/0x114
        [   53.588663]        [<c03de19c>] device_release_driver+0x20/0x2c
        [   53.594843]        [<c03dd9e8>] bus_remove_device+0xd8/0x108
        [   53.600748]        [<c03dacc0>] device_del+0x10c/0x210
        [   53.606127]        [<c03dadd0>] device_unregister+0xc/0x20
        [   53.611849]        [<c046d878>] __unregister+0x10/0x20
        [   53.617211]        [<c03da868>] device_for_each_child+0x50/0x7c
        [   53.623387]        [<c046eae8>] spi_unregister_master+0x58/0x8c
        [   53.629578]        [<c03e12f0>] release_nodes+0x15c/0x1c8
        [   53.635223]        [<c03de0f8>] __device_release_driver+0x90/0x114
        [   53.641689]        [<c03de900>] driver_detach+0xb4/0xb8
        [   53.647147]        [<c03ddc78>] bus_remove_driver+0x4c/0xa0
        [   53.652970]        [<c00cab50>] SyS_delete_module+0x11c/0x1e4
        [   53.658976]        [<c000f740>] ret_fast_syscall+0x0/0x1c
        [   53.664621]
        [   53.664621] other info that might help us debug this:
        [   53.664621]
        [   53.672979]  Possible unsafe locking scenario:
        [   53.672979]
        [   53.679169]        CPU0                    CPU1
        [   53.683900]        ----                    ----
        [   53.688633]   lock(mtd_table_mutex);
        [   53.692383]                                lock(&new->lock);
        [   53.698306]                                lock(mtd_table_mutex);
        [   53.704658]   lock(&new->lock);
        [   53.707946]
        [   53.707946]  *** DEADLOCK ***
      
      Fixes: 073db4a5 ("mtd: fix: avoid race condition when accessing mtd->usecount")
      Reported-by: default avatarFelipe Balbi <balbi@ti.com>
      Tested-by: default avatarFelipe Balbi <balbi@ti.com>
      Signed-off-by: default avatarBrian Norris <computersforpeace@gmail.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      0daab66d
    • Marek Vasut's avatar
      can: Use correct type in sizeof() in nla_put() · 6226f407
      Marek Vasut authored
      commit 562b103a upstream.
      
      The sizeof() is invoked on an incorrect variable, likely due to some
      copy-paste error, and this might result in memory corruption. Fix this.
      Signed-off-by: default avatarMarek Vasut <marex@denx.de>
      Cc: Wolfgang Grandegger <wg@grandegger.com>
      Cc: netdev@vger.kernel.org
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      6226f407
    • Robin Murphy's avatar
      arm64: Fix compat register mappings · 8ee06d7a
      Robin Murphy authored
      commit 5accd17d upstream.
      
      For reasons not entirely apparent, but now enshrined in history, the
      architectural mapping of AArch32 banked registers to AArch64 registers
      actually orders SP_<mode> and LR_<mode> backwards compared to the
      intuitive r13/r14 order, for all modes except FIQ.
      
      Fix the compat_<reg>_<mode> macros accordingly, in the hope of avoiding
      subtle bugs with KVM and AArch32 guests.
      Signed-off-by: default avatarRobin Murphy <robin.murphy@arm.com>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      8ee06d7a
    • David Hildenbrand's avatar
      KVM: s390: SCA must not cross page boundaries · 314e9525
      David Hildenbrand authored
      commit c5c2c393 upstream.
      
      We seemed to have missed a few corner cases in commit f6c137ff
      ("KVM: s390: randomize sca address").
      
      The SCA has a maximum size of 2112 bytes. By setting the sca_offset to
      some unlucky numbers, we exceed the page.
      
      0x7c0 (1984) -> Fits exactly
      0x7d0 (2000) -> 16 bytes out
      0x7e0 (2016) -> 32 bytes out
      0x7f0 (2032) -> 48 bytes out
      
      One VCPU entry is 32 bytes long.
      
      For the last two cases, we actually write data to the other page.
      1. The address of the VCPU.
      2. Injection/delivery/clearing of SIGP externall calls via SIGP IF.
      
      Especially the 2. happens regularly. So this could produce two problems:
      1. The guest losing/getting external calls.
      2. Random memory overwrites in the host.
      
      So this problem happens on every 127 + 128 created VM with 64 VCPUs.
      Acked-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarDavid Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      314e9525
    • sumit.saxena@avagotech.com's avatar
      megaraid_sas: Do not use PAGE_SIZE for max_sectors · c1f1362d
      sumit.saxena@avagotech.com authored
      commit 357ae967 upstream.
      
      Do not use PAGE_SIZE marco to calculate max_sectors per I/O
      request. Driver code assumes PAGE_SIZE will be always 4096 which can
      lead to wrongly calculated value if PAGE_SIZE is not 4096. This issue
      was reported in Ubuntu Bugzilla Bug #1475166.
      Signed-off-by: default avatarSumit Saxena <sumit.saxena@avagotech.com>
      Signed-off-by: default avatarKashyap Desai <kashyap.desai@avagotech.com>
      Reviewed-by: default avatarTomas Henzl <thenzl@redhat.com>
      Reviewed-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      c1f1362d
    • sumit.saxena@avagotech.com's avatar
      megaraid_sas: Expose TAPE drives unconditionally · dbefeb69
      sumit.saxena@avagotech.com authored
      commit 0d5b47a7 upstream.
      
      Expose non-disk (TAPE drive, CD-ROM) unconditionally.
      Signed-off-by: default avatarSumit Saxena <sumit.saxena@avagotech.com>
      Signed-off-by: default avatarKashyap Desai <kashyap.desai@avagotech.com>
      Reviewed-by: default avatarTomas Henzl <thenzl@redhat.com>
      Reviewed-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      dbefeb69
    • Vineet Gupta's avatar
      MAINTAINERS: Add public mailing list for ARC · 25a5ac85
      Vineet Gupta authored
      commit 9acdc911 upstream.
      Signed-off-by: default avatarVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      25a5ac85
    • Takashi Iwai's avatar
      ALSA: hda - Disable 64bit address for Creative HDA controllers · 92e03aea
      Takashi Iwai authored
      commit cadd16ea upstream.
      
      We've had many reports that some Creative sound cards with CA0132
      don't work well.  Some reported that it starts working after reloading
      the module, while some reported it starts working when a 32bit kernel
      is used.  All these facts seem implying that the chip fails to
      communicate when the buffer is located in 64bit address.
      
      This patch addresses these issues by just adding AZX_DCAPS_NO_64BIT
      flag to the corresponding PCI entries.  I casually had a chance to
      test an SB Recon3D board, and indeed this seems helping.
      
      Although this hasn't been tested on all Creative devices, it's safer
      to assume that this restriction applies to the rest of them, too.  So
      the flag is applied to all Creative entries.
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      92e03aea
    • Kailang Yang's avatar
      ALSA: hda/realtek - Dell XPS one ALC3260 speaker no sound after resume back · 021f85fa
      Kailang Yang authored
      commit 6ed1131f upstream.
      
      This machine had I2S codec for speaker output.
      It need to refill the I2S codec initial verb after resume back.
      Signed-off-by: default avatarKailang Yang <kailang@realtek.com>
      Reported-and-tested-by: default avatarGeorge Gugulea <gugulea@gmail.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      021f85fa
    • Chen Yu's avatar
      ACPI / PM: Fix incorrect wakeup IRQ setting during suspend-to-idle · 26b36c5b
      Chen Yu authored
      commit 8c01275e upstream.
      
      For an ACPI compatible system, the SCI (ACPI System Control
      Interrupt) is used to wake the system up from suspend-to-idle.
      Once the CPU is woken up by the SCI, the interrupt handler will
      first check if the current IRQ has been configured for system
      wakeup, so irq_pm_check_wakeup() is invoked to validate the IRQ
      number.  However, during suspend-to-idle, enable_irq_wake() is
      called for acpi_gbl_FADT.sci_interrupt, although the IRQ number
      that the SCI handler has been installed for should be passed to
      it instead.  Thus, if acpi_gbl_FADT.sci_interrupt happens to be
      different from that number, ACPI interrupts will not be able to
      wake up the system from sleep.
      
      Fix this problem by passing the IRQ number returned by
      acpi_gsi_to_irq() to enable_irq_wake() instead of
      acpi_gbl_FADT.sci_interrupt.
      Acked-by: default avatarLv Zheng <lv.zheng@intel.com>
      Signed-off-by: default avatarChen Yu <yu.c.chen@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      26b36c5b