1. 02 Feb, 2015 40 commits
    • Michal Hocko's avatar
      OOM, PM: OOM killed task shouldn't escape PM suspend · 7655f855
      Michal Hocko authored
      commit 5695be14 upstream.
      
      PM freezer relies on having all tasks frozen by the time devices are
      getting frozen so that no task will touch them while they are getting
      frozen. But OOM killer is allowed to kill an already frozen task in
      order to handle OOM situtation. In order to protect from late wake ups
      OOM killer is disabled after all tasks are frozen. This, however, still
      keeps a window open when a killed task didn't manage to die by the time
      freeze_processes finishes.
      
      Reduce the race window by checking all tasks after OOM killer has been
      disabled. This is still not race free completely unfortunately because
      oom_killer_disable cannot stop an already ongoing OOM killer so a task
      might still wake up from the fridge and get killed without
      freeze_processes noticing. Full synchronization of OOM and freezer is,
      however, too heavy weight for this highly unlikely case.
      
      Introduce and check oom_kills counter which gets incremented early when
      the allocator enters __alloc_pages_may_oom path and only check all the
      tasks if the counter changes during the freezing attempt. The counter
      is updated so early to reduce the race window since allocator checked
      oom_killer_disabled which is set by PM-freezing code. A false positive
      will push the PM-freezer into a slow path but that is not a big deal.
      
      Changes since v1
      - push the re-check loop out of freeze_processes into
        check_frozen_processes and invert the condition to make the code more
        readable as per Rafael
      
      Fixes: f660daac (oom: thaw threads if oom killed thread is frozen before deferring)
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      7655f855
    • Oleg Nesterov's avatar
      introduce for_each_thread() to replace the buggy while_each_thread() · b71ec075
      Oleg Nesterov authored
      commit 0c740d0a upstream.
      
      while_each_thread() and next_thread() should die, almost every lockless
      usage is wrong.
      
      1. Unless g == current, the lockless while_each_thread() is not safe.
      
         while_each_thread(g, t) can loop forever if g exits, next_thread()
         can't reach the unhashed thread in this case. Note that this can
         happen even if g is the group leader, it can exec.
      
      2. Even if while_each_thread() itself was correct, people often use
         it wrongly.
      
         It was never safe to just take rcu_read_lock() and loop unless
         you verify that pid_alive(g) == T, even the first next_thread()
         can point to the already freed/reused memory.
      
      This patch adds signal_struct->thread_head and task->thread_node to
      create the normal rcu-safe list with the stable head.  The new
      for_each_thread(g, t) helper is always safe under rcu_read_lock() as
      long as this task_struct can't go away.
      
      Note: of course it is ugly to have both task_struct->thread_node and the
      old task_struct->thread_group, we will kill it later, after we change
      the users of while_each_thread() to use for_each_thread().
      
      Perhaps we can kill it even before we convert all users, we can
      reimplement next_thread(t) using the new thread_head/thread_node.  But
      we can't do this right now because this will lead to subtle behavioural
      changes.  For example, do/while_each_thread() always sees at least one
      task, while for_each_thread() can do nothing if the whole thread group
      has died.  Or thread_group_empty(), currently its semantics is not clear
      unless thread_group_leader(p) and we need to audit the callers before we
      can change it.
      
      So this patch adds the new interface which has to coexist with the old
      one for some time, hopefully the next changes will be more or less
      straightforward and the old one will go away soon.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reviewed-by: default avatarSergey Dyasly <dserrg@gmail.com>
      Tested-by: default avatarSergey Dyasly <dserrg@gmail.com>
      Reviewed-by: default avatarSameer Nanda <snanda@chromium.org>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mandeep Singh Baines <msb@chromium.org>
      Cc: "Ma, Xindong" <xindong.ma@intel.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      b71ec075
    • Oleg Nesterov's avatar
      kernel/fork.c:copy_process(): unify CLONE_THREAD-or-thread_group_leader code · ace595fd
      Oleg Nesterov authored
      commit 80628ca0 upstream.
      
      Cleanup and preparation for the next changes.
      
      Move the "if (clone_flags & CLONE_THREAD)" code down under "if
      (likely(p->pid))" and turn it into into the "else" branch.  This makes the
      process/thread initialization more symmetrical and removes one check.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Sergey Dyasly <dserrg@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      ace595fd
    • Cong Wang's avatar
      freezer: Do not freeze tasks killed by OOM killer · a2ca02f1
      Cong Wang authored
      commit 51fae6da upstream.
      
      Since f660daac (oom: thaw threads if oom killed thread is frozen
      before deferring) OOM killer relies on being able to thaw a frozen task
      to handle OOM situation but a3201227 (freezer: make freezing() test
      freeze conditions in effect instead of TIF_FREEZE) has reorganized the
      code and stopped clearing freeze flag in __thaw_task. This means that
      the target task only wakes up and goes into the fridge again because the
      freezing condition hasn't changed for it. This reintroduces the bug
      fixed by f660daac.
      
      Fix the issue by checking for TIF_MEMDIE thread flag in
      freezing_slow_path and exclude the task from freezing completely. If a
      task was already frozen it would get woken by __thaw_task from OOM killer
      and get out of freezer after rechecking freezing().
      
      Changes since v1
      - put TIF_MEMDIE check into freezing_slowpath rather than in __refrigerator
        as per Oleg
      - return __thaw_task into oom_scan_process_thread because
        oom_kill_process will not wake task in the fridge because it is
        sleeping uninterruptible
      
      [mhocko@suse.cz: rewrote the changelog]
      Fixes: a3201227 (freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE)
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      a2ca02f1
    • Vlad Catoi's avatar
      ALSA: usb-audio: Add support for Steinberg UR22 USB interface · 7e551647
      Vlad Catoi authored
      commit f0b127fb upstream.
      
      Adding support for Steinberg UR22 USB interface via quirks table patch
      
      See Ubuntu bug report:
      https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1317244
      Also see threads:
      http://linux-audio.4202.n7.nabble.com/Support-for-Steinberg-UR22-Yamaha-USB-chipset-0499-1509-tc82888.html#a82917
      http://www.steinberg.net/forums/viewtopic.php?t=62290
      
      Tested by at least 4 people judging by the threads.
      Did not test MIDI interface, but audio output and capture both are
      functional. Built 3.17 kernel with this driver on Ubuntu 14.04 & tested with mpg123
      Patch applied to 3.13 Ubuntu kernel works well enough for daily use.
      Signed-off-by: default avatarVlad Catoi <vladcatoi@gmail.com>
      Acked-by: default avatarClemens Ladisch <clemens@ladisch.de>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      7e551647
    • Anatol Pomozov's avatar
      ALSA: pcm: use the same dma mmap codepath both for arm and arm64 · 7a13f726
      Anatol Pomozov authored
      commit a011e213 upstream.
      
      This avoids following kernel crash when try to playback on arm64
      
      [  107.497203] [<ffffffc00046b310>] snd_pcm_mmap_data_fault+0x90/0xd4
      [  107.503405] [<ffffffc0001541ac>] __do_fault+0xb0/0x498
      [  107.508565] [<ffffffc0001576a0>] handle_mm_fault+0x224/0x7b0
      [  107.514246] [<ffffffc000092640>] do_page_fault+0x11c/0x310
      [  107.519738] [<ffffffc000081100>] do_mem_abort+0x38/0x98
      
      Tested: backported to 3.14 and tried to playback on arm64 machine
      Signed-off-by: default avatarAnatol Pomozov <anatol.pomozov@gmail.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      7a13f726
    • Daniel Borkmann's avatar
      random: add and use memzero_explicit() for clearing data · 6a16b0d0
      Daniel Borkmann authored
      commit d4c5efdb upstream.
      
      zatimend has reported that in his environment (3.16/gcc4.8.3/corei7)
      memset() calls which clear out sensitive data in extract_{buf,entropy,
      entropy_user}() in random driver are being optimized away by gcc.
      
      Add a helper memzero_explicit() (similarly as explicit_bzero() variants)
      that can be used in such cases where a variable with sensitive data is
      being cleared out in the end. Other use cases might also be in crypto
      code. [ I have put this into lib/string.c though, as it's always built-in
      and doesn't need any dependencies then. ]
      
      Fixes kernel bugzilla: 82041
      
      Reported-by: zatimend@hotmail.co.uk
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      [lizf: Backported to 3.4:
       - adjust context
       - another memset() in extract_buf() needs to be converted]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      6a16b0d0
    • Cesar Eduardo Barros's avatar
      crypto: more robust crypto_memneq · e4425815
      Cesar Eduardo Barros authored
      commit fe8c8a12 upstream.
      
      [Only use the compiler.h portion of this patch, to get the
      OPTIMIZER_HIDE_VAR() macro, which we need for other -stable patches
      - gregkh]
      
      Disabling compiler optimizations can be fragile, since a new
      optimization could be added to -O0 or -Os that breaks the assumptions
      the code is making.
      
      Instead of disabling compiler optimizations, use a dummy inline assembly
      (based on RELOC_HIDE) to block the problematic kinds of optimization,
      while still allowing other optimizations to be applied to the code.
      
      The dummy inline assembly is added after every OR, and has the
      accumulator variable as its input and output. The compiler is forced to
      assume that the dummy inline assembly could both depend on the
      accumulator variable and change the accumulator variable, so it is
      forced to compute the value correctly before the inline assembly, and
      cannot assume anything about its value after the inline assembly.
      
      This change should be enough to make crypto_memneq work correctly (with
      data-independent timing) even if it is inlined at its call sites. That
      can be done later in a followup patch.
      
      Compile-tested on x86_64.
      Signed-off-by: default avatarCesar Eduardo Barros <cesarb@cesarb.eti.br>
      Acked-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      e4425815
    • Eric Sandeen's avatar
      ext4: fix reservation overflow in ext4_da_write_begin · e7ce7b47
      Eric Sandeen authored
      commit 0ff8947f upstream.
      
      Delalloc write journal reservations only reserve 1 credit,
      to update the inode if necessary.  However, it may happen
      once in a filesystem's lifetime that a file will cross
      the 2G threshold, and require the LARGE_FILE feature to
      be set in the superblock as well, if it was not set already.
      
      This overruns the transaction reservation, and can be
      demonstrated simply on any ext4 filesystem without the LARGE_FILE
      feature already set:
      
      dd if=/dev/zero of=testfile bs=1 seek=2147483646 count=1 \
      	conv=notrunc of=testfile
      sync
      dd if=/dev/zero of=testfile bs=1 seek=2147483647 count=1 \
      	conv=notrunc of=testfile
      
      leads to:
      
      EXT4-fs: ext4_do_update_inode:4296: aborting transaction: error 28 in __ext4_handle_dirty_super
      EXT4-fs error (device loop0) in ext4_do_update_inode:4301: error 28
      EXT4-fs error (device loop0) in ext4_reserve_inode_write:4757: Readonly filesystem
      EXT4-fs error (device loop0) in ext4_dirty_inode:4876: error 28
      EXT4-fs error (device loop0) in ext4_da_write_end:2685: error 28
      
      Adjust the number of credits based on whether the flag is
      already set, and whether the current write may extend past the
      LARGE_FILE limit.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarAndreas Dilger <adilger@dilger.ca>
      [lizf: Backported to 3.4:
       - adjust context
       - ext4_journal_start() has no parameter type]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      e7ce7b47
    • Theodore Ts'o's avatar
      ext4: add ext4_iget_normal() which is to be used for dir tree lookups · 07cf4db3
      Theodore Ts'o authored
      commit f4bb2981 upstream.
      
      If there is a corrupted file system which has directory entries that
      point at reserved, metadata inodes, prohibit them from being used by
      treating them the same way we treat Boot Loader inodes --- that is,
      mark them to be bad inodes.  This prohibits them from being opened,
      deleted, or modified via chmod, chown, utimes, etc.
      
      In particular, this prevents a corrupted file system which has a
      directory entry which points at the journal inode from being deleted
      and its blocks released, after which point Much Hilarity Ensues.
      Reported-by: default avatarSami Liedes <sami.liedes@iki.fi>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      07cf4db3
    • Theodore Ts'o's avatar
      ext4: don't orphan or truncate the boot loader inode · 07048f9e
      Theodore Ts'o authored
      commit e2bfb088 upstream.
      
      The boot loader inode (inode #5) should never be visible in the
      directory hierarchy, but it's possible if the file system is corrupted
      that there will be a directory entry that points at inode #5.  In
      order to avoid accidentally trashing it, when such a directory inode
      is opened, the inode will be marked as a bad inode, so that it's not
      possible to modify (or read) the inode from userspace.
      
      Unfortunately, when we unlink this (invalid/illegal) directory entry,
      we will put the bad inode on the ophan list, and then when try to
      unlink the directory, we don't actually remove the bad inode from the
      orphan list before freeing in-memory inode structure.  This means the
      in-memory orphan list is corrupted, leading to a kernel oops.
      
      In addition, avoid truncating a bad inode in ext4_destroy_inode(),
      since truncating the boot loader inode is not a smart thing to do.
      Reported-by: default avatarSami Liedes <sami.liedes@iki.fi>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      07048f9e
    • Jan Kara's avatar
      vfs: fix data corruption when blocksize < pagesize for mmaped data · 60e7100a
      Jan Kara authored
      commit 90a80202 upstream.
      
      ->page_mkwrite() is used by filesystems to allocate blocks under a page
      which is becoming writeably mmapped in some process' address space. This
      allows a filesystem to return a page fault if there is not enough space
      available, user exceeds quota or similar problem happens, rather than
      silently discarding data later when writepage is called.
      
      However VFS fails to call ->page_mkwrite() in all the cases where
      filesystems need it when blocksize < pagesize. For example when
      blocksize = 1024, pagesize = 4096 the following is problematic:
        ftruncate(fd, 0);
        pwrite(fd, buf, 1024, 0);
        map = mmap(NULL, 1024, PROT_WRITE, MAP_SHARED, fd, 0);
        map[0] = 'a';       ----> page_mkwrite() for index 0 is called
        ftruncate(fd, 10000); /* or even pwrite(fd, buf, 1, 10000) */
        mremap(map, 1024, 10000, 0);
        map[4095] = 'a';    ----> no page_mkwrite() called
      
      At the moment ->page_mkwrite() is called, filesystem can allocate only
      one block for the page because i_size == 1024. Otherwise it would create
      blocks beyond i_size which is generally undesirable. But later at
      ->writepage() time, we also need to store data at offset 4095 but we
      don't have block allocated for it.
      
      This patch introduces a helper function filesystems can use to have
      ->page_mkwrite() called at all the necessary moments.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      [lizf: Backported to 3.4:
       - adjust context
       - truncate_setsize() already has an oldsize variable]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      60e7100a
    • Quinn Tran's avatar
      target: Fix queue full status NULL pointer for SCF_TRANSPORT_TASK_SENSE · e306b0da
      Quinn Tran authored
      commit 082f58ac upstream.
      
      During temporary resource starvation at lower transport layer, command
      is placed on queue full retry path, which expose this problem.  The TCM
      queue full handling of SCF_TRANSPORT_TASK_SENSE currently sends the same
      cmd twice to lower layer.  The 1st time led to cmd normal free path.
      The 2nd time cause Null pointer access.
      
      This regression bug was originally introduced v3.1-rc code in the
      following commit:
      
      commit e057f533
      Author: Christoph Hellwig <hch@infradead.org>
      Date:   Mon Oct 17 13:56:41 2011 -0400
      
          target: remove the transport_qf_callback se_cmd callback
      Signed-off-by: default avatarQuinn Tran <quinn.tran@qlogic.com>
      Signed-off-by: default avatarSaurav Kashyap <saurav.kashyap@qlogic.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      e306b0da
    • Jan Kara's avatar
      ext4: don't check quota format when there are no quota files · a38c4d8a
      Jan Kara authored
      commit 279bf6d3 upstream.
      
      The check whether quota format is set even though there are no
      quota files with journalled quota is pointless and it actually
      makes it impossible to turn off journalled quotas (as there's
      no way to unset journalled quota format). Just remove the check.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      a38c4d8a
    • Darrick J. Wong's avatar
      ext4: check EA value offset when loading · b0fea9c1
      Darrick J. Wong authored
      commit a0626e75 upstream.
      
      When loading extended attributes, check each entry's value offset to
      make sure it doesn't collide with the entries.
      
      Without this check it is easy to crash the kernel by mounting a
      malicious FS containing a file with an EA wherein e_value_offs = 0 and
      e_value_size > 0 and then deleting the EA, which corrupts the name
      list.
      
      (See the f_ea_value_crash test's FS image in e2fsprogs for an example.)
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      b0fea9c1
    • Andy Lutomirski's avatar
      x86,kvm,vmx: Preserve CR4 across VM entry · 4e2c6422
      Andy Lutomirski authored
      commit d974baa3 upstream.
      
      CR4 isn't constant; at least the TSD and PCE bits can vary.
      
      TBH, treating CR0 and CR3 as constant scares me a bit, too, but it looks
      like it's correct.
      
      This adds a branch and a read from cr4 to each vm entry.  Because it is
      extremely likely that consecutive entries into the same vcpu will have
      the same host cr4 value, this fixes up the vmcs instead of restoring cr4
      after the fact.  A subsequent patch will add a kernel-wide cr4 shadow,
      reducing the overhead in the common case to just two memory reads and a
      branch.
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Acked-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Cc: Petr Matousek <pmatouse@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [lizf: Backported to 3.4:
       - adjust context
       - add parameter struct vcpu_vmx *vmx to vmx_set_constant_host_state()]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      4e2c6422
    • Catalin Marinas's avatar
      futex: Ensure get_futex_key_refs() always implies a barrier · 9e9aab5d
      Catalin Marinas authored
      commit 76835b0e upstream.
      
      Commit b0c29f79 (futexes: Avoid taking the hb->lock if there's
      nothing to wake up) changes the futex code to avoid taking a lock when
      there are no waiters. This code has been subsequently fixed in commit
      11d4616b (futex: revert back to the explicit waiter counting code).
      Both the original commit and the fix-up rely on get_futex_key_refs() to
      always imply a barrier.
      
      However, for private futexes, none of the cases in the switch statement
      of get_futex_key_refs() would be hit and the function completes without
      a memory barrier as required before checking the "waiters" in
      futex_wake() -> hb_waiters_pending(). The consequence is a race with a
      thread waiting on a futex on another CPU, allowing the waker thread to
      read "waiters == 0" while the waiter thread to have read "futex_val ==
      locked" (in kernel).
      
      Without this fix, the problem (user space deadlocks) can be seen with
      Android bionic's mutex implementation on an arm64 multi-cluster system.
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Reported-by: default avatarMatteo Franchin <Matteo.Franchin@arm.com>
      Fixes: b0c29f79 (futexes: Avoid taking the hb->lock if there's nothing to wake up)
      Acked-by: default avatarDavidlohr Bueso <dave@stgolabs.net>
      Tested-by: default avatarMike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Darren Hart <dvhart@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      9e9aab5d
    • Stephen Smalley's avatar
      selinux: fix inode security list corruption · e8ab53a5
      Stephen Smalley authored
      commit 923190d3 upstream.
      
      sb_finish_set_opts() can race with inode_free_security()
      when initializing inode security structures for inodes
      created prior to initial policy load or by the filesystem
      during ->mount().   This appears to have always been
      a possible race, but commit 3dc91d43 ("SELinux:  Fix possible
      NULL pointer dereference in selinux_inode_permission()")
      made it more evident by immediately reusing the unioned
      list/rcu element  of the inode security structure for call_rcu()
      upon an inode_free_security().  But the underlying issue
      was already present before that commit as a possible use-after-free
      of isec.
      
      Shivnandan Kumar reported the list corruption and proposed
      a patch to split the list and rcu elements out of the union
      as separate fields of the inode_security_struct so that setting
      the rcu element would not affect the list element.  However,
      this would merely hide the issue and not truly fix the code.
      
      This patch instead moves up the deletion of the list entry
      prior to dropping the sbsec->isec_lock initially.  Then,
      if the inode is dropped subsequently, there will be no further
      references to the isec.
      Reported-by: default avatarShivnandan Kumar <shivnandan.k@samsung.com>
      Signed-off-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: default avatarPaul Moore <pmoore@redhat.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      e8ab53a5
    • Michael S. Tsirkin's avatar
      virtio_pci: fix virtio spec compliance on restore · fb7eb2f7
      Michael S. Tsirkin authored
      commit 6fbc198c upstream.
      
      On restore, virtio pci does the following:
      + set features
      + init vqs etc - device can be used at this point!
      + set ACKNOWLEDGE,DRIVER and DRIVER_OK status bits
      
      This is in violation of the virtio spec, which
      requires the following order:
      - ACKNOWLEDGE
      - DRIVER
      - init vqs
      - DRIVER_OK
      
      This behaviour will break with hypervisors that assume spec compliant
      behaviour.  It seems like a good idea to have this patch applied to
      stable branches to reduce the support butden for the hypervisors.
      
      Cc: Amit Shah <amit.shah@redhat.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      fb7eb2f7
    • Eric W. Biederman's avatar
      mnt: Prevent pivot_root from creating a loop in the mount tree · 9f7d53c0
      Eric W. Biederman authored
      commit 0d082601 upstream.
      
      Andy Lutomirski recently demonstrated that when chroot is used to set
      the root path below the path for the new ``root'' passed to pivot_root
      the pivot_root system call succeeds and leaks mounts.
      
      In examining the code I see that starting with a new root that is
      below the current root in the mount tree will result in a loop in the
      mount tree after the mounts are detached and then reattached to one
      another.  Resulting in all kinds of ugliness including a leak of that
      mounts involved in the leak of the mount loop.
      
      Prevent this problem by ensuring that the new mount is reachable from
      the current root of the mount tree.
      
      [Added stable cc.  Fixes CVE-2014-7970.  --Andy]
      Reported-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Reviewed-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Link: http://lkml.kernel.org/r/87bnpmihks.fsf@x220.int.ebiederm.orgSigned-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      9f7d53c0
    • Takashi Iwai's avatar
      ALSA: emu10k1: Fix deadlock in synth voice lookup · e65c1a23
      Takashi Iwai authored
      commit 95926035 upstream.
      
      The emu10k1 voice allocator takes voice_lock spinlock.  When there is
      no empty stream available, it tries to release a voice used by synth,
      and calls get_synth_voice.  The callback function,
      snd_emu10k1_synth_get_voice(), however, also takes the voice_lock,
      thus it deadlocks.
      
      The fix is simply removing the voice_lock holds in
      snd_emu10k1_synth_get_voice(), as this is always called in the
      spinlock context.
      Reported-and-tested-by: default avatarArthur Marsh <arthur.marsh@internode.on.net>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      e65c1a23
    • Sasha Levin's avatar
      kernel: add support for gcc 5 · bf70aaaa
      Sasha Levin authored
      commit 71458cfc upstream.
      
      We're missing include/linux/compiler-gcc5.h which is required now
      because gcc branched off to v5 in trunk.
      
      Just copy the relevant bits out of include/linux/compiler-gcc4.h,
      no new code is added as of now.
      
      This fixes a build error when using gcc 5.
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      bf70aaaa
    • Hans de Goede's avatar
      Input: i8042 - add noloop quirk for Asus X750LN · fbc94908
      Hans de Goede authored
      commit 9ff84a17 upstream.
      
      Without this the aux port does not get detected, and consequently the
      touchpad will not work.
      
      https://bugzilla.redhat.com/show_bug.cgi?id=1110011Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      fbc94908
    • Dmitry Torokhov's avatar
      Input: synaptics - gate forcepad support by DMI check · 8000fbfa
      Dmitry Torokhov authored
      commit aa972409 upstream.
      
      Unfortunately, ForcePad capability is not actually exported over PS/2, so
      we have to resort to DMI checks.
      Reported-by: default avatarNicole Faerber <nicole.faerber@kernelconcepts.de>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      8000fbfa
    • Yann Droneaud's avatar
      fanotify: enable close-on-exec on events' fd when requested in fanotify_init() · 7baa56f6
      Yann Droneaud authored
      commit 0b37e097 upstream.
      
      According to commit 80af2588 ("fanotify: groups can specify their
      f_flags for new fd"), file descriptors created as part of file access
      notification events inherit flags from the event_f_flags argument passed
      to syscall fanotify_init(2)[1].
      
      Unfortunately O_CLOEXEC is currently silently ignored.
      
      Indeed, event_f_flags are only given to dentry_open(), which only seems to
      care about O_ACCMODE and O_PATH in do_dentry_open(), O_DIRECT in
      open_check_o_direct() and O_LARGEFILE in generic_file_open().
      
      It's a pity, since, according to some lookup on various search engines and
      http://codesearch.debian.net/, there's already some userspace code which
      use O_CLOEXEC:
      
      - in systemd's readahead[2]:
      
          fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
      
      - in clsync[3]:
      
          #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)
      
          int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);
      
      - in examples [4] from "Filesystem monitoring in the Linux
        kernel" article[5] by Aleksander Morgado:
      
          if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
                                            O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)
      
      Additionally, since commit 48149e9d ("fanotify: check file flags
      passed in fanotify_init").  having O_CLOEXEC as part of fanotify_init()
      second argument is expressly allowed.
      
      So it seems expected to set close-on-exec flag on the file descriptors if
      userspace is allowed to request it with O_CLOEXEC.
      
      But Andrew Morton raised[6] the concern that enabling now close-on-exec
      might break existing applications which ask for O_CLOEXEC but expect the
      file descriptor to be inherited across exec().
      
      In the other hand, as reported by Mihai Dontu[7] close-on-exec on the file
      descriptor returned as part of file access notify can break applications
      due to deadlock.  So close-on-exec is needed for most applications.
      
      More, applications asking for close-on-exec are likely expecting it to be
      enabled, relying on O_CLOEXEC being effective.  If not, it might weaken
      their security, as noted by Jan Kara[8].
      
      So this patch replaces call to macro get_unused_fd() by a call to function
      get_unused_fd_flags() with event_f_flags value as argument.  This way
      O_CLOEXEC flag in the second argument of fanotify_init(2) syscall is
      interpreted and close-on-exec get enabled when requested.
      
      [1] http://man7.org/linux/man-pages/man2/fanotify_init.2.html
      [2] http://cgit.freedesktop.org/systemd/systemd/tree/src/readahead/readahead-collect.c?id=v208#n294
      [3] https://github.com/xaionaro/clsync/blob/v0.2.1/sync.c#L1631
          https://github.com/xaionaro/clsync/blob/v0.2.1/configuration.h#L38
      [4] http://www.lanedo.com/~aleksander/fanotify/fanotify-example.c
      [5] http://www.lanedo.com/2013/filesystem-monitoring-linux-kernel/
      [6] http://lkml.kernel.org/r/20141001153621.65e9258e65a6167bf2e4cb50@linux-foundation.org
      [7] http://lkml.kernel.org/r/20141002095046.3715eb69@mdontu-l
      [8] http://lkml.kernel.org/r/20141002104410.GB19748@quack.suse.cz
      
      Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.comSigned-off-by: default avatarYann Droneaud <ydroneaud@opteya.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Reviewed by: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Tested-by: default avatarHeinrich Schuchardt <xypron.glpk@gmx.de>
      Cc: Mihai Don\u021bu <mihai.dontu@gmail.com>
      Cc: Pádraig Brady <P@draigBrady.com>
      Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Cc: Michael Kerrisk-manpages <mtk.manpages@gmail.com>
      Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de>
      Cc: Richard Guy Briggs <rgb@redhat.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      7baa56f6
    • Mike Snitzer's avatar
      block: fix alignment_offset math that assumes io_min is a power-of-2 · c76a73b3
      Mike Snitzer authored
      commit b8839b8c upstream.
      
      The math in both blk_stack_limits() and queue_limit_alignment_offset()
      assume that a block device's io_min (aka minimum_io_size) is always a
      power-of-2.  Fix the math such that it works for non-power-of-2 io_min.
      
      This issue (of alignment_offset != 0) became apparent when testing
      dm-thinp with a thinp blocksize that matches a RAID6 stripesize of
      1280K.  Commit fdfb4c8c ("dm thin: set minimum_io_size to pool's data
      block size") unlocked the potential for alignment_offset != 0 due to
      the dm-thin-pool's io_min possibly being a non-power-of-2.
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      c76a73b3
    • Al Viro's avatar
      fix misuses of f_count() in ppp and netlink · 15c0af9c
      Al Viro authored
      commit 24dff96a upstream.
      
      we used to check for "nobody else could start doing anything with
      that opened file" by checking that refcount was 2 or less - one
      for descriptor table and one we'd acquired in fget() on the way to
      wherever we are.  That was race-prone (somebody else might have
      had a reference to descriptor table and do fget() just as we'd
      been checking) and it had become flat-out incorrect back when
      we switched to fget_light() on those codepaths - unlike fget(),
      it doesn't grab an extra reference unless the descriptor table
      is shared.  The same change allowed a race-free check, though -
      we are safe exactly when refcount is less than 2.
      
      It was a long time ago; pre-2.6.12 for ioctl() (the codepath leading
      to ppp one) and 2.6.17 for sendmsg() (netlink one).  OTOH,
      netlink hadn't grown that check until 3.9 and ppp used to live
      in drivers/net, not drivers/net/ppp until 3.1.  The bug existed
      well before that, though, and the same fix used to apply in old
      location of file.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      [lizf: Backported to 3.4: drop the change to netlink_mmap_sendmsg()]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      15c0af9c
    • Mikulas Patocka's avatar
      fs: make cont_expand_zero interruptible · 2551b5ed
      Mikulas Patocka authored
      commit c2ca0fcd upstream.
      
      This patch makes it possible to kill a process looping in
      cont_expand_zero. A process may spend a lot of time in this function, so
      it is desirable to be able to kill it.
      
      It happened to me that I wanted to copy a piece data from the disk to a
      file. By mistake, I used the "seek" parameter to dd instead of "skip". Due
      to the "seek" parameter, dd attempted to extend the file and became stuck
      doing so - the only possibility was to reset the machine or wait many
      hours until the filesystem runs out of space and cont_expand_zero fails.
      We need this patch to be able to terminate the process.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      2551b5ed
    • Tetsuo Handa's avatar
      fs: Fix theoretical division by 0 in super_cache_scan(). · 2ea17e67
      Tetsuo Handa authored
      commit 475d0db7 upstream.
      
      total_objects could be 0 and is used as a denom.
      
      While total_objects is a "long", total_objects == 0 unlikely happens for
      3.12 and later kernels because 32-bit architectures would not be able to
      hold (1 << 32) objects. However, total_objects == 0 may happen for kernels
      between 3.1 and 3.11 because total_objects in prune_super() was an "int"
      and (e.g.) x86_64 architecture might be able to hold (1 << 32) objects.
      Signed-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      2ea17e67
    • Ben Hutchings's avatar
      x86: Reject x32 executables if x32 ABI not supported · db55550d
      Ben Hutchings authored
      commit 0e6d3112 upstream.
      
      It is currently possible to execve() an x32 executable on an x86_64
      kernel that has only ia32 compat enabled.  However all its syscalls
      will fail, even _exit().  This usually causes it to segfault.
      
      Change the ELF compat architecture check so that x32 executables are
      rejected if we don't support the x32 ABI.
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Link: http://lkml.kernel.org/r/1410120305.6822.9.camel@decadent.org.ukSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      db55550d
    • Scott Carter's avatar
      pata_serverworks: disable 64-KB DMA transfers on Broadcom OSB4 IDE Controller · 9be2cb10
      Scott Carter authored
      commit 37017ac6 upstream.
      
      The Broadcom OSB4 IDE Controller (vendor and device IDs: 1166:0211)
      does not support 64-KB DMA transfers.
      Whenever a 64-KB DMA transfer is attempted,
      the transfer fails and messages similar to the following
      are written to the console log:
      
         [ 2431.851125] sr 0:0:0:0: [sr0] Unhandled sense code
         [ 2431.851139] sr 0:0:0:0: [sr0]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
         [ 2431.851152] sr 0:0:0:0: [sr0]  Sense Key : Hardware Error [current]
         [ 2431.851166] sr 0:0:0:0: [sr0]  Add. Sense: Logical unit communication time-out
         [ 2431.851182] sr 0:0:0:0: [sr0] CDB: Read(10): 28 00 00 00 76 f4 00 00 40 00
         [ 2431.851210] end_request: I/O error, dev sr0, sector 121808
      
      When the libata and pata_serverworks modules
      are recompiled with ATA_DEBUG and ATA_VERBOSE_DEBUG defined in libata.h,
      the 64-KB transfer size in the scatter-gather list can be seen
      in the console log:
      
         [ 2664.897267] sr 9:0:0:0: [sr0] Send:
         [ 2664.897274] 0xf63d85e0
         [ 2664.897283] sr 9:0:0:0: [sr0] CDB:
         [ 2664.897288] Read(10): 28 00 00 00 7f b4 00 00 40 00
         [ 2664.897319] buffer = 0xf6d6fbc0, bufflen = 131072, queuecommand 0xf81b7700
         [ 2664.897331] ata_scsi_dump_cdb: CDB (1:0,0,0) 28 00 00 00 7f b4 00 00 40
         [ 2664.897338] ata_scsi_translate: ENTER
         [ 2664.897345] ata_sg_setup: ENTER, ata1
         [ 2664.897356] ata_sg_setup: 3 sg elements mapped
         [ 2664.897364] ata_bmdma_fill_sg: PRD[0] = (0x66FD2000, 0xE000)
         [ 2664.897371] ata_bmdma_fill_sg: PRD[1] = (0x65000000, 0x10000)
         ------------------------------------------------------> =======
         [ 2664.897378] ata_bmdma_fill_sg: PRD[2] = (0x66A10000, 0x2000)
         [ 2664.897386] ata1: ata_dev_select: ENTER, device 0, wait 1
         [ 2664.897422] ata_sff_tf_load: feat 0x1 nsect 0x0 lba 0x0 0x0 0xFC
         [ 2664.897428] ata_sff_tf_load: device 0xA0
         [ 2664.897448] ata_sff_exec_command: ata1: cmd 0xA0
         [ 2664.897457] ata_scsi_translate: EXIT
         [ 2664.897462] leaving scsi_dispatch_cmnd()
         [ 2664.897497] Doing sr request, dev = sr0, block = 0
         [ 2664.897507] sr0 : reading 64/256 512 byte blocks.
         [ 2664.897553] ata_sff_hsm_move: ata1: protocol 7 task_state 1 (dev_stat 0x58)
         [ 2664.897560] atapi_send_cdb: send cdb
         [ 2666.910058] ata_bmdma_port_intr: ata1: host_stat 0x64
         [ 2666.910079] __ata_sff_port_intr: ata1: protocol 7 task_state 3
         [ 2666.910093] ata_sff_hsm_move: ata1: protocol 7 task_state 3 (dev_stat 0x51)
         [ 2666.910101] ata_sff_hsm_move: ata1: protocol 7 task_state 4 (dev_stat 0x51)
         [ 2666.910129] sr 9:0:0:0: [sr0] Done:
         [ 2666.910136] 0xf63d85e0 TIMEOUT
      
      lspci shows that the driver used for the Broadcom OSB4 IDE Controller is
      pata_serverworks:
      
         00:0f.1 IDE interface: Broadcom OSB4 IDE Controller (prog-if 8e [Master SecP SecO PriP])
                 Flags: bus master, medium devsel, latency 64
                 [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8]
                 [virtual] Memory at 000003f0 (type 3, non-prefetchable) [size=1]
                 I/O ports at 0170 [size=8]
                 I/O ports at 0374 [size=4]
                 I/O ports at 1440 [size=16]
                 Kernel driver in use: pata_serverworks
      
      The pata_serverworks driver supports five distinct device IDs,
      one being the OSB4 and the other four belonging to the CSB series.
      The CSB series appears to support 64-KB DMA transfers,
      as tests on a machine with an SAI2 motherboard
      containing a Broadcom CSB5 IDE Controller (vendor and device IDs: 1166:0212)
      showed no problems with 64-KB DMA transfers.
      
      This problem was first discovered when attempting to install openSUSE
      from a DVD on a machine with an STL2 motherboard.
      Using the pata_serverworks module,
      older releases of openSUSE will not install at all due to the timeouts.
      Releases of openSUSE prior to 11.3 can be installed by disabling
      the pata_serverworks module using the brokenmodules boot parameter,
      which causes the serverworks module to be used instead.
      Recent releases of openSUSE (12.2 and later) include better error recovery and
      will install, though very slowly.
      On all openSUSE releases, the problem can be recreated
      on a machine containing a Broadcom OSB4 IDE Controller
      by mounting an install DVD and running a command similar to the following:
      
         find /mnt -type f -print | xargs cat > /dev/null
      
      The patch below corrects the problem.
      Similar to the other ATA drivers that do not support 64-KB DMA transfers,
      the patch changes the ata_port_operations qc_prep vector to point to a routine
      that breaks any 64-KB segment into two 32-KB segments and
      changes the scsi_host_template sg_tablesize element to reduce by half
      the number of scatter/gather elements allowed.
      These two changes affect only the OSB4.
      Signed-off-by: default avatarScott Carter <ccscott@funsoft.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      9be2cb10
    • Chao Yu's avatar
      ecryptfs: avoid to access NULL pointer when write metadata in xattr · 771f8a87
      Chao Yu authored
      commit 35425ea2 upstream.
      
      Christopher Head 2014-06-28 05:26:20 UTC described:
      "I tried to reproduce this on 3.12.21. Instead, when I do "echo hello > foo"
      in an ecryptfs mount with ecryptfs_xattr specified, I get a kernel crash:
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff8110eb39>] fsstack_copy_attr_all+0x2/0x61
      PGD d7840067 PUD b2c3c067 PMD 0
      Oops: 0002 [#1] SMP
      Modules linked in: nvidia(PO)
      CPU: 3 PID: 3566 Comm: bash Tainted: P           O 3.12.21-gentoo-r1 #2
      Hardware name: ASUSTek Computer Inc. G60JX/G60JX, BIOS 206 03/15/2010
      task: ffff8801948944c0 ti: ffff8800bad70000 task.ti: ffff8800bad70000
      RIP: 0010:[<ffffffff8110eb39>]  [<ffffffff8110eb39>] fsstack_copy_attr_all+0x2/0x61
      RSP: 0018:ffff8800bad71c10  EFLAGS: 00010246
      RAX: 00000000000181a4 RBX: ffff880198648480 RCX: 0000000000000000
      RDX: 0000000000000004 RSI: ffff880172010450 RDI: 0000000000000000
      RBP: ffff880198490e40 R08: 0000000000000000 R09: 0000000000000000
      R10: ffff880172010450 R11: ffffea0002c51e80 R12: 0000000000002000
      R13: 000000000000001a R14: 0000000000000000 R15: ffff880198490e40
      FS:  00007ff224caa700(0000) GS:ffff88019fcc0000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 00000000bb07f000 CR4: 00000000000007e0
      Stack:
      ffffffff811826e8 ffff8800a39d8000 0000000000000000 000000000000001a
      ffff8800a01d0000 ffff8800a39d8000 ffffffff81185fd5 ffffffff81082c2c
      00000001a39d8000 53d0abbc98490e40 0000000000000037 ffff8800a39d8220
      Call Trace:
      [<ffffffff811826e8>] ? ecryptfs_setxattr+0x40/0x52
      [<ffffffff81185fd5>] ? ecryptfs_write_metadata+0x1b3/0x223
      [<ffffffff81082c2c>] ? should_resched+0x5/0x23
      [<ffffffff8118322b>] ? ecryptfs_initialize_file+0xaf/0xd4
      [<ffffffff81183344>] ? ecryptfs_create+0xf4/0x142
      [<ffffffff810f8c0d>] ? vfs_create+0x48/0x71
      [<ffffffff810f9c86>] ? do_last.isra.68+0x559/0x952
      [<ffffffff810f7ce7>] ? link_path_walk+0xbd/0x458
      [<ffffffff810fa2a3>] ? path_openat+0x224/0x472
      [<ffffffff810fa7bd>] ? do_filp_open+0x2b/0x6f
      [<ffffffff81103606>] ? __alloc_fd+0xd6/0xe7
      [<ffffffff810ee6ab>] ? do_sys_open+0x65/0xe9
      [<ffffffff8157d022>] ? system_call_fastpath+0x16/0x1b
      RIP  [<ffffffff8110eb39>] fsstack_copy_attr_all+0x2/0x61
      RSP <ffff8800bad71c10>
      CR2: 0000000000000000
      ---[ end trace df9dba5f1ddb8565 ]---"
      
      If we create a file when we mount with ecryptfs_xattr_metadata option, we will
      encounter a crash in this path:
      ->ecryptfs_create
        ->ecryptfs_initialize_file
          ->ecryptfs_write_metadata
            ->ecryptfs_write_metadata_to_xattr
              ->ecryptfs_setxattr
                ->fsstack_copy_attr_all
      It's because our dentry->d_inode used in fsstack_copy_attr_all is NULL, and it
      will be initialized when ecryptfs_initialize_file finish.
      
      So we should skip copying attr from lower inode when the value of ->d_inode is
      invalid.
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarTyler Hicks <tyhicks@canonical.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      771f8a87
    • Alexey Khoroshilov's avatar
      dm log userspace: fix memory leak in dm_ulog_tfr_init failure path · dbd43539
      Alexey Khoroshilov authored
      commit 56ec16cb upstream.
      
      If cn_add_callback() fails in dm_ulog_tfr_init(), it does not
      deallocate prealloced memory but calls cn_del_callback().
      
      Found by Linux Driver Verification project (linuxtesting.org).
      Signed-off-by: default avatarAlexey Khoroshilov <khoroshilov@ispras.ru>
      Reviewed-by: default avatarJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      dbd43539
    • Joe Thornber's avatar
      dm bufio: update last_accessed when relinking a buffer · adcbd2e5
      Joe Thornber authored
      commit eb76faf5 upstream.
      
      The 'last_accessed' member of the dm_buffer structure was only set when
      the the buffer was created.  This led to each buffer being discarded
      after dm_bufio_max_age time even if it was used recently.  In practice
      this resulted in all thinp metadata being evicted soon after being read
      -- this is particularly problematic for metadata intensive workloads
      like multithreaded small random IO.
      
      'last_accessed' is now updated each time the buffer is moved to the head
      of the LRU list, so the buffer is now properly discarded if it was not
      used in dm_bufio_max_age time.
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      adcbd2e5
    • Geert Uytterhoeven's avatar
      m68k: Disable/restore interrupts in hwreg_present()/hwreg_write() · 82556daf
      Geert Uytterhoeven authored
      commit e4dc601b upstream.
      
      hwreg_present() and hwreg_write() temporarily change the VBR register to
      another vector table. This table contains a valid bus error handler
      only, all other entries point to arbitrary addresses.
      
      If an interrupt comes in while the temporary table is active, the
      processor will start executing at such an arbitrary address, and the
      kernel will crash.
      
      While most callers run early, before interrupts are enabled, or
      explicitly disable interrupts, Finn Thain pointed out that macsonic has
      one callsite that doesn't, causing intermittent boot crashes.
      There's another unsafe callsite in hilkbd.
      
      Fix this for good by disabling and restoring interrupts inside
      hwreg_present() and hwreg_write().
      
      Explicitly disabling interrupts can be removed from the callsites later.
      Reported-by: default avatarFinn Thain <fthain@telegraphics.com.au>
      Signed-off-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      82556daf
    • Andy Adamson's avatar
      NFSv4.1: Fix an NFSv4.1 state renewal regression · 2c9d556d
      Andy Adamson authored
      commit d1f456b0 upstream.
      
      Commit 2f60ea6b ("NFSv4: The NFSv4.0 client must send RENEW calls if it holds a delegation") set the NFS4_RENEW_TIMEOUT flag in nfs4_renew_state, and does
      not put an nfs41_proc_async_sequence call, the NFSv4.1 lease renewal heartbeat
      call, on the wire to renew the NFSv4.1 state if the flag was not set.
      
      The NFS4_RENEW_TIMEOUT flag is set when "now" is after the last renewal
      (cl_last_renewal) plus the lease time divided by 3. This is arbitrary and
      sometimes does the following:
      
      In normal operation, the only way a future state renewal call is put on the
      wire is via a call to nfs4_schedule_state_renewal, which schedules a
      nfs4_renew_state workqueue task. nfs4_renew_state determines if the
      NFS4_RENEW_TIMEOUT should be set, and the calls nfs41_proc_async_sequence,
      which only gets sent if the NFS4_RENEW_TIMEOUT flag is set.
      Then the nfs41_proc_async_sequence rpc_release function schedules
      another state remewal via nfs4_schedule_state_renewal.
      
      Without this change we can get into a state where an application stops
      accessing the NFSv4.1 share, state renewal calls stop due to the
      NFS4_RENEW_TIMEOUT flag _not_ being set. The only way to recover
      from this situation is with a clientid re-establishment, once the application
      resumes and the server has timed out the lease and so returns
      NFS4ERR_BAD_SESSION on the subsequent SEQUENCE operation.
      
      An example application:
      open, lock, write a file.
      
      sleep for 6 * lease (could be less)
      
      ulock, close.
      
      In the above example with NFSv4.1 delegations enabled, without this change,
      there are no OP_SEQUENCE state renewal calls during the sleep, and the
      clientid is recovered due to lease expiration on the close.
      
      This issue does not occur with NFSv4.1 delegations disabled, nor with
      NFSv4.0, with or without delegations enabled.
      Signed-off-by: default avatarAndy Adamson <andros@netapp.com>
      Link: http://lkml.kernel.org/r/1411486536-23401-1-git-send-email-andros@netapp.com
      Fixes: 2f60ea6b (NFSv4: The NFSv4.0 client must send RENEW calls...)
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      2c9d556d
    • Borislav Petkov's avatar
      mpc85xx_edac: Make L2 interrupt shared too · f4c4b923
      Borislav Petkov authored
      commit a18c3f16 upstream.
      
      The other two interrupt handlers in this driver are shared, except this
      one. When loading the driver, it fails like this.
      
      So make the IRQ line shared.
      
      Freescale(R) MPC85xx EDAC driver, (C) 2006 Montavista Software
      mpc85xx_mc_err_probe: No ECC DIMMs discovered
      EDAC DEVICE0: Giving out device to module MPC85xx_edac controller mpc85xx_l2_err: DEV mpc85xx_l2_err (INTERRUPT)
      genirq: Flags mismatch irq 16. 00000000 ([EDAC] L2 err) vs. 00000080 ([EDAC] PCI err)
      mpc85xx_l2_err_probe: Unable to request irq 16 for MPC85xx L2 err
      remove_proc_entry: removing non-empty directory 'irq/16', leaking at least 'aerdrv'
      ------------[ cut here ]------------
      WARNING: at fs/proc/generic.c:521
      Modules linked in:
      CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.0-rc5-dirty #1
      task: ee058000 ti: ee046000 task.ti: ee046000
      NIP: c016c0c4 LR: c016c0c4 CTR: c037b51c
      REGS: ee047c10 TRAP: 0700 Not tainted (3.17.0-rc5-dirty)
      MSR: 00029000 <CE,EE,ME> CR: 22008022 XER: 20000000
      
      GPR00: c016c0c4 ee047cc0 ee058000 00000053 00029000 00000000 c037c744 00000003
      GPR08: c09aab28 c09aab24 c09aab28 00000156 20008028 00000000 c0002ac8 00000000
      GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000139 c0950394
      GPR24: c09f0000 ee5585b0 ee047d08 c0a10000 ee047d08 ee15f808 00000002 ee03f660
      NIP [c016c0c4] remove_proc_entry
      LR [c016c0c4] remove_proc_entry
      Call Trace:
      remove_proc_entry (unreliable)
      unregister_irq_proc
      free_desc
      irq_free_descs
      mpc85xx_l2_err_probe
      platform_drv_probe
      really_probe
      __driver_attach
      bus_for_each_dev
      bus_add_driver
      driver_register
      mpc85xx_mc_init
      do_one_initcall
      kernel_init_freeable
      kernel_init
      ret_from_kernel_thread
      Instruction dump: ...
      
      Reported-and-tested-by: <lpb_098@163.com>
      Acked-by: default avatarJohannes Thumshirn <johannes.thumshirn@men.de>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      [lizf: Backported to 3.4: IRQF_DISABLED hasn't been removed in 3.4]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      f4c4b923
    • Mikulas Patocka's avatar
      framebuffer: fix border color · ab766b86
      Mikulas Patocka authored
      commit f74a289b upstream.
      
      The framebuffer code uses the current background color to fill the border
      when switching consoles, however, this results in inconsistent behavior.
      For example:
      - start Midnigh Commander
      - the border is black
      - switch to another console and switch back
      - the border is cyan
      - type something into the command line in mc
      - the border is cyan
      - switch to another console and switch back
      - the border is black
      - press F9 to go to menu
      - the border is black
      - switch to another console and switch back
      - the border is dark blue
      
      When switching to a console with Midnight Commander, the border is random
      color that was left selected by the slang subsystem.
      
      This patch fixes this inconsistency by always using black as the
      background color when switching consoles.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarTomi Valkeinen <tomi.valkeinen@ti.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      ab766b86
    • Bryan O'Donoghue's avatar
      serial: 8250: Add Quark X1000 to 8250_pci.c · a06f6a5d
      Bryan O'Donoghue authored
      commit 1ede7dcc upstream.
      
      Quark X1000 contains two designware derived 8250 serial ports.
      Each port has a unique PCI configuration space consisting of
      BAR0:UART BAR1:DMA respectively.
      
      Unlike the standard 8250 the register width is 32 bits for RHR,IER etc
      The Quark UART has a fundamental clock @ 44.2368 MHz allowing for a
      bitrate of up to about 2.76 megabits per second.
      
      This patch enables standard 8250 mode
      Signed-off-by: default avatarBryan O'Donoghue <pure.logic@nexus-software.ie>
      Reviewed-by: default avatarHeikki Krogerus <heikki.krogerus@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      a06f6a5d
    • Trond Myklebust's avatar
      NFSv4: fix open/lock state recovery error handling · 5a0b8b70
      Trond Myklebust authored
      commit df817ba3 upstream.
      
      The current open/lock state recovery unfortunately does not handle errors
      such as NFS4ERR_CONN_NOT_BOUND_TO_SESSION correctly. Instead of looping,
      just proceeds as if the state manager is finished recovering.
      This patch ensures that we loop back, handle higher priority errors
      and complete the open/lock state recovery.
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      5a0b8b70