1. 02 Feb, 2015 40 commits
    • Nadav Amit's avatar
      KVM: x86: Emulator fixes for eip canonical checks on near branches · 858f9338
      Nadav Amit authored
      commit 234f3ce4 upstream.
      
      Before changing rip (during jmp, call, ret, etc.) the target should be asserted
      to be canonical one, as real CPUs do.  During sysret, both target rsp and rip
      should be canonical. If any of these values is noncanonical, a #GP exception
      should occur.  The exception to this rule are syscall and sysenter instructions
      in which the assigned rip is checked during the assignment to the relevant
      MSRs.
      
      This patch fixes the emulator to behave as real CPUs do for near branches.
      Far branches are handled by the next patch.
      
      This fixes CVE-2014-3647.
      Signed-off-by: default avatarNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      [lizf: Backported to 3.4:
       - adjust context
       - use ctxt->regs rather than reg_read() and reg_write()]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      858f9338
    • Nadav Amit's avatar
      KVM: x86: Fix wrong masking on relative jump/call · 973be4a7
      Nadav Amit authored
      commit 05c83ec9 upstream.
      
      Relative jumps and calls do the masking according to the operand size, and not
      according to the address size as the KVM emulator does today.
      
      This patch fixes KVM behavior.
      Signed-off-by: default avatarNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      973be4a7
    • Andy Honig's avatar
      KVM: x86: Improve thread safety in pit · 375c51cd
      Andy Honig authored
      commit 2febc839 upstream.
      
      There's a race condition in the PIT emulation code in KVM.  In
      __kvm_migrate_pit_timer the pit_timer object is accessed without
      synchronization.  If the race condition occurs at the wrong time this
      can crash the host kernel.
      
      This fixes CVE-2014-3611.
      Signed-off-by: default avatarAndrew Honig <ahonig@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      375c51cd
    • Andy Honig's avatar
      KVM: x86: Prevent host from panicking on shared MSR writes. · 6a607be8
      Andy Honig authored
      commit 8b3c3104 upstream.
      
      The previous patch blocked invalid writes directly when the MSR
      is written.  As a precaution, prevent future similar mistakes by
      gracefulling handle GPs caused by writes to shared MSRs.
      Signed-off-by: default avatarAndrew Honig <ahonig@google.com>
      [Remove parts obsoleted by Nadav's patch. - Paolo]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      [lizf: Backported to 3.4:
       - adjust context
       - s/wrmsrl_safe/checking_wrmsrl/]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      6a607be8
    • Nadav Amit's avatar
      KVM: x86: Check non-canonical addresses upon WRMSR · ab998b3c
      Nadav Amit authored
      commit 854e8bb1 upstream.
      
      Upon WRMSR, the CPU should inject #GP if a non-canonical value (address) is
      written to certain MSRs. The behavior is "almost" identical for AMD and Intel
      (ignoring MSRs that are not implemented in either architecture since they would
      anyhow #GP). However, IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if
      non-canonical address is written on Intel but not on AMD (which ignores the top
      32-bits).
      
      Accordingly, this patch injects a #GP on the MSRs which behave identically on
      Intel and AMD.  To eliminate the differences between the architecutres, the
      value which is written to IA32_SYSENTER_ESP and IA32_SYSENTER_EIP is turned to
      canonical value before writing instead of injecting a #GP.
      
      Some references from Intel and AMD manuals:
      
      According to Intel SDM description of WRMSR instruction #GP is expected on
      WRMSR "If the source register contains a non-canonical address and ECX
      specifies one of the following MSRs: IA32_DS_AREA, IA32_FS_BASE, IA32_GS_BASE,
      IA32_KERNEL_GS_BASE, IA32_LSTAR, IA32_SYSENTER_EIP, IA32_SYSENTER_ESP."
      
      According to AMD manual instruction manual:
      LSTAR/CSTAR (SYSCALL): "The WRMSR instruction loads the target RIP into the
      LSTAR and CSTAR registers.  If an RIP written by WRMSR is not in canonical
      form, a general-protection exception (#GP) occurs."
      IA32_GS_BASE and IA32_FS_BASE (WRFSBASE/WRGSBASE): "The address written to the
      base field must be in canonical form or a #GP fault will occur."
      IA32_KERNEL_GS_BASE (SWAPGS): "The address stored in the KernelGSbase MSR must
      be in canonical form."
      
      This patch fixes CVE-2014-3610.
      Signed-off-by: default avatarNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      [lizf: Backported to 3.4:
       - adjust context
       - s/msr->index/msr_index and s/msr->data/data]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      ab998b3c
    • Dirk Brandewie's avatar
      cpufreq: expose scaling_cur_freq sysfs file for set_policy() drivers · aea44610
      Dirk Brandewie authored
      commit c034b02e upstream.
      
      Currently the core does not expose scaling_cur_freq for set_policy()
      drivers this breaks some userspace monitoring tools.
      Change the core to expose this file for all drivers and if the
      set_policy() driver supports the get() callback use it to retrieve the
      current frequency.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=73741Signed-off-by: default avatarDirk Brandewie <dirk.j.brandewie@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      aea44610
    • David Daney's avatar
      MIPS: tlbex: Properly fix HUGE TLB Refill exception handler · e9305166
      David Daney authored
      commit 9e0f162a upstream.
      
      In commit 8393c524 (MIPS: tlbex: Fix a missing statement for
      HUGETLB), the TLB Refill handler was fixed so that non-OCTEON targets
      would work properly with huge pages.  The change was incorrect in that
      it broke the OCTEON case.
      
      The problem is shown here:
      
          xxx0:	df7a0000 	ld	k0,0(k1)
          .
          .
          .
          xxxc0:	df610000 	ld	at,0(k1)
          xxxc4:	335a0ff0 	andi	k0,k0,0xff0
          xxxc8:	e825ffcd 	bbit1	at,0x5,0x0
          xxxcc:	003ad82d 	daddu	k1,at,k0
          .
          .
          .
      
      In the non-octeon case there is a destructive test for the huge PTE
      bit, and then at 0, $k0 is reloaded (that is what the 8393c524
      patch added).
      
      In the octeon case, we modify k1 in the branch delay slot, but we
      never need k0 again, so the new load is not needed, but since k1 is
      modified, if we do the load, we load from a garbage location and then
      get a nested TLB Refill, which is seen in userspace as either SIGBUS
      or SIGSEGV (depending on the garbage).
      
      The real fix is to only do this reloading if it is needed, and never
      where it is harmful.
      Signed-off-by: default avatarDavid Daney <david.daney@cavium.com>
      Cc: Huacai Chen <chenhc@lemote.com>
      Cc: Fuxin Zhang <zhangfx@lemote.com>
      Cc: Zhangjin Wu <wuzhangjin@gmail.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/8151/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      e9305166
    • Huacai Chen's avatar
      MIPS: tlbex: Fix a missing statement for HUGETLB · b58f0431
      Huacai Chen authored
      commit 8393c524 upstream.
      
      In commit 2c8c53e2 (MIPS: Optimize TLB handlers for Octeon CPUs)
      build_r4000_tlb_refill_handler() is modified. But it doesn't compatible
      with the original code in HUGETLB case. Because there is a copy & paste
      error and one line of code is missing. It is very easy to produce a bug
      with LTP's hugemmap05 test.
      Signed-off-by: default avatarHuacai Chen <chenhc@lemote.com>
      Signed-off-by: default avatarBinbin Zhou <zhoubb@lemote.com>
      Cc: John Crispin <john@phrozen.org>
      Cc: Steven J. Hill <Steven.Hill@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: Fuxin Zhang <zhangfx@lemote.com>
      Cc: Zhangjin Wu <wuzhangjin@gmail.com>
      Patchwork: https://patchwork.linux-mips.org/patch/7496/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      b58f0431
    • Michal Hocko's avatar
      OOM, PM: OOM killed task shouldn't escape PM suspend · 7655f855
      Michal Hocko authored
      commit 5695be14 upstream.
      
      PM freezer relies on having all tasks frozen by the time devices are
      getting frozen so that no task will touch them while they are getting
      frozen. But OOM killer is allowed to kill an already frozen task in
      order to handle OOM situtation. In order to protect from late wake ups
      OOM killer is disabled after all tasks are frozen. This, however, still
      keeps a window open when a killed task didn't manage to die by the time
      freeze_processes finishes.
      
      Reduce the race window by checking all tasks after OOM killer has been
      disabled. This is still not race free completely unfortunately because
      oom_killer_disable cannot stop an already ongoing OOM killer so a task
      might still wake up from the fridge and get killed without
      freeze_processes noticing. Full synchronization of OOM and freezer is,
      however, too heavy weight for this highly unlikely case.
      
      Introduce and check oom_kills counter which gets incremented early when
      the allocator enters __alloc_pages_may_oom path and only check all the
      tasks if the counter changes during the freezing attempt. The counter
      is updated so early to reduce the race window since allocator checked
      oom_killer_disabled which is set by PM-freezing code. A false positive
      will push the PM-freezer into a slow path but that is not a big deal.
      
      Changes since v1
      - push the re-check loop out of freeze_processes into
        check_frozen_processes and invert the condition to make the code more
        readable as per Rafael
      
      Fixes: f660daac (oom: thaw threads if oom killed thread is frozen before deferring)
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      7655f855
    • Oleg Nesterov's avatar
      introduce for_each_thread() to replace the buggy while_each_thread() · b71ec075
      Oleg Nesterov authored
      commit 0c740d0a upstream.
      
      while_each_thread() and next_thread() should die, almost every lockless
      usage is wrong.
      
      1. Unless g == current, the lockless while_each_thread() is not safe.
      
         while_each_thread(g, t) can loop forever if g exits, next_thread()
         can't reach the unhashed thread in this case. Note that this can
         happen even if g is the group leader, it can exec.
      
      2. Even if while_each_thread() itself was correct, people often use
         it wrongly.
      
         It was never safe to just take rcu_read_lock() and loop unless
         you verify that pid_alive(g) == T, even the first next_thread()
         can point to the already freed/reused memory.
      
      This patch adds signal_struct->thread_head and task->thread_node to
      create the normal rcu-safe list with the stable head.  The new
      for_each_thread(g, t) helper is always safe under rcu_read_lock() as
      long as this task_struct can't go away.
      
      Note: of course it is ugly to have both task_struct->thread_node and the
      old task_struct->thread_group, we will kill it later, after we change
      the users of while_each_thread() to use for_each_thread().
      
      Perhaps we can kill it even before we convert all users, we can
      reimplement next_thread(t) using the new thread_head/thread_node.  But
      we can't do this right now because this will lead to subtle behavioural
      changes.  For example, do/while_each_thread() always sees at least one
      task, while for_each_thread() can do nothing if the whole thread group
      has died.  Or thread_group_empty(), currently its semantics is not clear
      unless thread_group_leader(p) and we need to audit the callers before we
      can change it.
      
      So this patch adds the new interface which has to coexist with the old
      one for some time, hopefully the next changes will be more or less
      straightforward and the old one will go away soon.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reviewed-by: default avatarSergey Dyasly <dserrg@gmail.com>
      Tested-by: default avatarSergey Dyasly <dserrg@gmail.com>
      Reviewed-by: default avatarSameer Nanda <snanda@chromium.org>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mandeep Singh Baines <msb@chromium.org>
      Cc: "Ma, Xindong" <xindong.ma@intel.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      b71ec075
    • Oleg Nesterov's avatar
      kernel/fork.c:copy_process(): unify CLONE_THREAD-or-thread_group_leader code · ace595fd
      Oleg Nesterov authored
      commit 80628ca0 upstream.
      
      Cleanup and preparation for the next changes.
      
      Move the "if (clone_flags & CLONE_THREAD)" code down under "if
      (likely(p->pid))" and turn it into into the "else" branch.  This makes the
      process/thread initialization more symmetrical and removes one check.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Sergey Dyasly <dserrg@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      ace595fd
    • Cong Wang's avatar
      freezer: Do not freeze tasks killed by OOM killer · a2ca02f1
      Cong Wang authored
      commit 51fae6da upstream.
      
      Since f660daac (oom: thaw threads if oom killed thread is frozen
      before deferring) OOM killer relies on being able to thaw a frozen task
      to handle OOM situation but a3201227 (freezer: make freezing() test
      freeze conditions in effect instead of TIF_FREEZE) has reorganized the
      code and stopped clearing freeze flag in __thaw_task. This means that
      the target task only wakes up and goes into the fridge again because the
      freezing condition hasn't changed for it. This reintroduces the bug
      fixed by f660daac.
      
      Fix the issue by checking for TIF_MEMDIE thread flag in
      freezing_slow_path and exclude the task from freezing completely. If a
      task was already frozen it would get woken by __thaw_task from OOM killer
      and get out of freezer after rechecking freezing().
      
      Changes since v1
      - put TIF_MEMDIE check into freezing_slowpath rather than in __refrigerator
        as per Oleg
      - return __thaw_task into oom_scan_process_thread because
        oom_kill_process will not wake task in the fridge because it is
        sleeping uninterruptible
      
      [mhocko@suse.cz: rewrote the changelog]
      Fixes: a3201227 (freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE)
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      a2ca02f1
    • Vlad Catoi's avatar
      ALSA: usb-audio: Add support for Steinberg UR22 USB interface · 7e551647
      Vlad Catoi authored
      commit f0b127fb upstream.
      
      Adding support for Steinberg UR22 USB interface via quirks table patch
      
      See Ubuntu bug report:
      https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1317244
      Also see threads:
      http://linux-audio.4202.n7.nabble.com/Support-for-Steinberg-UR22-Yamaha-USB-chipset-0499-1509-tc82888.html#a82917
      http://www.steinberg.net/forums/viewtopic.php?t=62290
      
      Tested by at least 4 people judging by the threads.
      Did not test MIDI interface, but audio output and capture both are
      functional. Built 3.17 kernel with this driver on Ubuntu 14.04 & tested with mpg123
      Patch applied to 3.13 Ubuntu kernel works well enough for daily use.
      Signed-off-by: default avatarVlad Catoi <vladcatoi@gmail.com>
      Acked-by: default avatarClemens Ladisch <clemens@ladisch.de>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      7e551647
    • Anatol Pomozov's avatar
      ALSA: pcm: use the same dma mmap codepath both for arm and arm64 · 7a13f726
      Anatol Pomozov authored
      commit a011e213 upstream.
      
      This avoids following kernel crash when try to playback on arm64
      
      [  107.497203] [<ffffffc00046b310>] snd_pcm_mmap_data_fault+0x90/0xd4
      [  107.503405] [<ffffffc0001541ac>] __do_fault+0xb0/0x498
      [  107.508565] [<ffffffc0001576a0>] handle_mm_fault+0x224/0x7b0
      [  107.514246] [<ffffffc000092640>] do_page_fault+0x11c/0x310
      [  107.519738] [<ffffffc000081100>] do_mem_abort+0x38/0x98
      
      Tested: backported to 3.14 and tried to playback on arm64 machine
      Signed-off-by: default avatarAnatol Pomozov <anatol.pomozov@gmail.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      7a13f726
    • Daniel Borkmann's avatar
      random: add and use memzero_explicit() for clearing data · 6a16b0d0
      Daniel Borkmann authored
      commit d4c5efdb upstream.
      
      zatimend has reported that in his environment (3.16/gcc4.8.3/corei7)
      memset() calls which clear out sensitive data in extract_{buf,entropy,
      entropy_user}() in random driver are being optimized away by gcc.
      
      Add a helper memzero_explicit() (similarly as explicit_bzero() variants)
      that can be used in such cases where a variable with sensitive data is
      being cleared out in the end. Other use cases might also be in crypto
      code. [ I have put this into lib/string.c though, as it's always built-in
      and doesn't need any dependencies then. ]
      
      Fixes kernel bugzilla: 82041
      
      Reported-by: zatimend@hotmail.co.uk
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      [lizf: Backported to 3.4:
       - adjust context
       - another memset() in extract_buf() needs to be converted]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      6a16b0d0
    • Cesar Eduardo Barros's avatar
      crypto: more robust crypto_memneq · e4425815
      Cesar Eduardo Barros authored
      commit fe8c8a12 upstream.
      
      [Only use the compiler.h portion of this patch, to get the
      OPTIMIZER_HIDE_VAR() macro, which we need for other -stable patches
      - gregkh]
      
      Disabling compiler optimizations can be fragile, since a new
      optimization could be added to -O0 or -Os that breaks the assumptions
      the code is making.
      
      Instead of disabling compiler optimizations, use a dummy inline assembly
      (based on RELOC_HIDE) to block the problematic kinds of optimization,
      while still allowing other optimizations to be applied to the code.
      
      The dummy inline assembly is added after every OR, and has the
      accumulator variable as its input and output. The compiler is forced to
      assume that the dummy inline assembly could both depend on the
      accumulator variable and change the accumulator variable, so it is
      forced to compute the value correctly before the inline assembly, and
      cannot assume anything about its value after the inline assembly.
      
      This change should be enough to make crypto_memneq work correctly (with
      data-independent timing) even if it is inlined at its call sites. That
      can be done later in a followup patch.
      
      Compile-tested on x86_64.
      Signed-off-by: default avatarCesar Eduardo Barros <cesarb@cesarb.eti.br>
      Acked-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      e4425815
    • Eric Sandeen's avatar
      ext4: fix reservation overflow in ext4_da_write_begin · e7ce7b47
      Eric Sandeen authored
      commit 0ff8947f upstream.
      
      Delalloc write journal reservations only reserve 1 credit,
      to update the inode if necessary.  However, it may happen
      once in a filesystem's lifetime that a file will cross
      the 2G threshold, and require the LARGE_FILE feature to
      be set in the superblock as well, if it was not set already.
      
      This overruns the transaction reservation, and can be
      demonstrated simply on any ext4 filesystem without the LARGE_FILE
      feature already set:
      
      dd if=/dev/zero of=testfile bs=1 seek=2147483646 count=1 \
      	conv=notrunc of=testfile
      sync
      dd if=/dev/zero of=testfile bs=1 seek=2147483647 count=1 \
      	conv=notrunc of=testfile
      
      leads to:
      
      EXT4-fs: ext4_do_update_inode:4296: aborting transaction: error 28 in __ext4_handle_dirty_super
      EXT4-fs error (device loop0) in ext4_do_update_inode:4301: error 28
      EXT4-fs error (device loop0) in ext4_reserve_inode_write:4757: Readonly filesystem
      EXT4-fs error (device loop0) in ext4_dirty_inode:4876: error 28
      EXT4-fs error (device loop0) in ext4_da_write_end:2685: error 28
      
      Adjust the number of credits based on whether the flag is
      already set, and whether the current write may extend past the
      LARGE_FILE limit.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarAndreas Dilger <adilger@dilger.ca>
      [lizf: Backported to 3.4:
       - adjust context
       - ext4_journal_start() has no parameter type]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      e7ce7b47
    • Theodore Ts'o's avatar
      ext4: add ext4_iget_normal() which is to be used for dir tree lookups · 07cf4db3
      Theodore Ts'o authored
      commit f4bb2981 upstream.
      
      If there is a corrupted file system which has directory entries that
      point at reserved, metadata inodes, prohibit them from being used by
      treating them the same way we treat Boot Loader inodes --- that is,
      mark them to be bad inodes.  This prohibits them from being opened,
      deleted, or modified via chmod, chown, utimes, etc.
      
      In particular, this prevents a corrupted file system which has a
      directory entry which points at the journal inode from being deleted
      and its blocks released, after which point Much Hilarity Ensues.
      Reported-by: default avatarSami Liedes <sami.liedes@iki.fi>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      07cf4db3
    • Theodore Ts'o's avatar
      ext4: don't orphan or truncate the boot loader inode · 07048f9e
      Theodore Ts'o authored
      commit e2bfb088 upstream.
      
      The boot loader inode (inode #5) should never be visible in the
      directory hierarchy, but it's possible if the file system is corrupted
      that there will be a directory entry that points at inode #5.  In
      order to avoid accidentally trashing it, when such a directory inode
      is opened, the inode will be marked as a bad inode, so that it's not
      possible to modify (or read) the inode from userspace.
      
      Unfortunately, when we unlink this (invalid/illegal) directory entry,
      we will put the bad inode on the ophan list, and then when try to
      unlink the directory, we don't actually remove the bad inode from the
      orphan list before freeing in-memory inode structure.  This means the
      in-memory orphan list is corrupted, leading to a kernel oops.
      
      In addition, avoid truncating a bad inode in ext4_destroy_inode(),
      since truncating the boot loader inode is not a smart thing to do.
      Reported-by: default avatarSami Liedes <sami.liedes@iki.fi>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      07048f9e
    • Jan Kara's avatar
      vfs: fix data corruption when blocksize < pagesize for mmaped data · 60e7100a
      Jan Kara authored
      commit 90a80202 upstream.
      
      ->page_mkwrite() is used by filesystems to allocate blocks under a page
      which is becoming writeably mmapped in some process' address space. This
      allows a filesystem to return a page fault if there is not enough space
      available, user exceeds quota or similar problem happens, rather than
      silently discarding data later when writepage is called.
      
      However VFS fails to call ->page_mkwrite() in all the cases where
      filesystems need it when blocksize < pagesize. For example when
      blocksize = 1024, pagesize = 4096 the following is problematic:
        ftruncate(fd, 0);
        pwrite(fd, buf, 1024, 0);
        map = mmap(NULL, 1024, PROT_WRITE, MAP_SHARED, fd, 0);
        map[0] = 'a';       ----> page_mkwrite() for index 0 is called
        ftruncate(fd, 10000); /* or even pwrite(fd, buf, 1, 10000) */
        mremap(map, 1024, 10000, 0);
        map[4095] = 'a';    ----> no page_mkwrite() called
      
      At the moment ->page_mkwrite() is called, filesystem can allocate only
      one block for the page because i_size == 1024. Otherwise it would create
      blocks beyond i_size which is generally undesirable. But later at
      ->writepage() time, we also need to store data at offset 4095 but we
      don't have block allocated for it.
      
      This patch introduces a helper function filesystems can use to have
      ->page_mkwrite() called at all the necessary moments.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      [lizf: Backported to 3.4:
       - adjust context
       - truncate_setsize() already has an oldsize variable]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      60e7100a
    • Quinn Tran's avatar
      target: Fix queue full status NULL pointer for SCF_TRANSPORT_TASK_SENSE · e306b0da
      Quinn Tran authored
      commit 082f58ac upstream.
      
      During temporary resource starvation at lower transport layer, command
      is placed on queue full retry path, which expose this problem.  The TCM
      queue full handling of SCF_TRANSPORT_TASK_SENSE currently sends the same
      cmd twice to lower layer.  The 1st time led to cmd normal free path.
      The 2nd time cause Null pointer access.
      
      This regression bug was originally introduced v3.1-rc code in the
      following commit:
      
      commit e057f533
      Author: Christoph Hellwig <hch@infradead.org>
      Date:   Mon Oct 17 13:56:41 2011 -0400
      
          target: remove the transport_qf_callback se_cmd callback
      Signed-off-by: default avatarQuinn Tran <quinn.tran@qlogic.com>
      Signed-off-by: default avatarSaurav Kashyap <saurav.kashyap@qlogic.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      e306b0da
    • Jan Kara's avatar
      ext4: don't check quota format when there are no quota files · a38c4d8a
      Jan Kara authored
      commit 279bf6d3 upstream.
      
      The check whether quota format is set even though there are no
      quota files with journalled quota is pointless and it actually
      makes it impossible to turn off journalled quotas (as there's
      no way to unset journalled quota format). Just remove the check.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      a38c4d8a
    • Darrick J. Wong's avatar
      ext4: check EA value offset when loading · b0fea9c1
      Darrick J. Wong authored
      commit a0626e75 upstream.
      
      When loading extended attributes, check each entry's value offset to
      make sure it doesn't collide with the entries.
      
      Without this check it is easy to crash the kernel by mounting a
      malicious FS containing a file with an EA wherein e_value_offs = 0 and
      e_value_size > 0 and then deleting the EA, which corrupts the name
      list.
      
      (See the f_ea_value_crash test's FS image in e2fsprogs for an example.)
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      b0fea9c1
    • Andy Lutomirski's avatar
      x86,kvm,vmx: Preserve CR4 across VM entry · 4e2c6422
      Andy Lutomirski authored
      commit d974baa3 upstream.
      
      CR4 isn't constant; at least the TSD and PCE bits can vary.
      
      TBH, treating CR0 and CR3 as constant scares me a bit, too, but it looks
      like it's correct.
      
      This adds a branch and a read from cr4 to each vm entry.  Because it is
      extremely likely that consecutive entries into the same vcpu will have
      the same host cr4 value, this fixes up the vmcs instead of restoring cr4
      after the fact.  A subsequent patch will add a kernel-wide cr4 shadow,
      reducing the overhead in the common case to just two memory reads and a
      branch.
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Acked-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Cc: Petr Matousek <pmatouse@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [lizf: Backported to 3.4:
       - adjust context
       - add parameter struct vcpu_vmx *vmx to vmx_set_constant_host_state()]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      4e2c6422
    • Catalin Marinas's avatar
      futex: Ensure get_futex_key_refs() always implies a barrier · 9e9aab5d
      Catalin Marinas authored
      commit 76835b0e upstream.
      
      Commit b0c29f79 (futexes: Avoid taking the hb->lock if there's
      nothing to wake up) changes the futex code to avoid taking a lock when
      there are no waiters. This code has been subsequently fixed in commit
      11d4616b (futex: revert back to the explicit waiter counting code).
      Both the original commit and the fix-up rely on get_futex_key_refs() to
      always imply a barrier.
      
      However, for private futexes, none of the cases in the switch statement
      of get_futex_key_refs() would be hit and the function completes without
      a memory barrier as required before checking the "waiters" in
      futex_wake() -> hb_waiters_pending(). The consequence is a race with a
      thread waiting on a futex on another CPU, allowing the waker thread to
      read "waiters == 0" while the waiter thread to have read "futex_val ==
      locked" (in kernel).
      
      Without this fix, the problem (user space deadlocks) can be seen with
      Android bionic's mutex implementation on an arm64 multi-cluster system.
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Reported-by: default avatarMatteo Franchin <Matteo.Franchin@arm.com>
      Fixes: b0c29f79 (futexes: Avoid taking the hb->lock if there's nothing to wake up)
      Acked-by: default avatarDavidlohr Bueso <dave@stgolabs.net>
      Tested-by: default avatarMike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Darren Hart <dvhart@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      9e9aab5d
    • Stephen Smalley's avatar
      selinux: fix inode security list corruption · e8ab53a5
      Stephen Smalley authored
      commit 923190d3 upstream.
      
      sb_finish_set_opts() can race with inode_free_security()
      when initializing inode security structures for inodes
      created prior to initial policy load or by the filesystem
      during ->mount().   This appears to have always been
      a possible race, but commit 3dc91d43 ("SELinux:  Fix possible
      NULL pointer dereference in selinux_inode_permission()")
      made it more evident by immediately reusing the unioned
      list/rcu element  of the inode security structure for call_rcu()
      upon an inode_free_security().  But the underlying issue
      was already present before that commit as a possible use-after-free
      of isec.
      
      Shivnandan Kumar reported the list corruption and proposed
      a patch to split the list and rcu elements out of the union
      as separate fields of the inode_security_struct so that setting
      the rcu element would not affect the list element.  However,
      this would merely hide the issue and not truly fix the code.
      
      This patch instead moves up the deletion of the list entry
      prior to dropping the sbsec->isec_lock initially.  Then,
      if the inode is dropped subsequently, there will be no further
      references to the isec.
      Reported-by: default avatarShivnandan Kumar <shivnandan.k@samsung.com>
      Signed-off-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: default avatarPaul Moore <pmoore@redhat.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      e8ab53a5
    • Michael S. Tsirkin's avatar
      virtio_pci: fix virtio spec compliance on restore · fb7eb2f7
      Michael S. Tsirkin authored
      commit 6fbc198c upstream.
      
      On restore, virtio pci does the following:
      + set features
      + init vqs etc - device can be used at this point!
      + set ACKNOWLEDGE,DRIVER and DRIVER_OK status bits
      
      This is in violation of the virtio spec, which
      requires the following order:
      - ACKNOWLEDGE
      - DRIVER
      - init vqs
      - DRIVER_OK
      
      This behaviour will break with hypervisors that assume spec compliant
      behaviour.  It seems like a good idea to have this patch applied to
      stable branches to reduce the support butden for the hypervisors.
      
      Cc: Amit Shah <amit.shah@redhat.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      fb7eb2f7
    • Eric W. Biederman's avatar
      mnt: Prevent pivot_root from creating a loop in the mount tree · 9f7d53c0
      Eric W. Biederman authored
      commit 0d082601 upstream.
      
      Andy Lutomirski recently demonstrated that when chroot is used to set
      the root path below the path for the new ``root'' passed to pivot_root
      the pivot_root system call succeeds and leaks mounts.
      
      In examining the code I see that starting with a new root that is
      below the current root in the mount tree will result in a loop in the
      mount tree after the mounts are detached and then reattached to one
      another.  Resulting in all kinds of ugliness including a leak of that
      mounts involved in the leak of the mount loop.
      
      Prevent this problem by ensuring that the new mount is reachable from
      the current root of the mount tree.
      
      [Added stable cc.  Fixes CVE-2014-7970.  --Andy]
      Reported-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Reviewed-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Link: http://lkml.kernel.org/r/87bnpmihks.fsf@x220.int.ebiederm.orgSigned-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      9f7d53c0
    • Takashi Iwai's avatar
      ALSA: emu10k1: Fix deadlock in synth voice lookup · e65c1a23
      Takashi Iwai authored
      commit 95926035 upstream.
      
      The emu10k1 voice allocator takes voice_lock spinlock.  When there is
      no empty stream available, it tries to release a voice used by synth,
      and calls get_synth_voice.  The callback function,
      snd_emu10k1_synth_get_voice(), however, also takes the voice_lock,
      thus it deadlocks.
      
      The fix is simply removing the voice_lock holds in
      snd_emu10k1_synth_get_voice(), as this is always called in the
      spinlock context.
      Reported-and-tested-by: default avatarArthur Marsh <arthur.marsh@internode.on.net>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      e65c1a23
    • Sasha Levin's avatar
      kernel: add support for gcc 5 · bf70aaaa
      Sasha Levin authored
      commit 71458cfc upstream.
      
      We're missing include/linux/compiler-gcc5.h which is required now
      because gcc branched off to v5 in trunk.
      
      Just copy the relevant bits out of include/linux/compiler-gcc4.h,
      no new code is added as of now.
      
      This fixes a build error when using gcc 5.
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      bf70aaaa
    • Hans de Goede's avatar
      Input: i8042 - add noloop quirk for Asus X750LN · fbc94908
      Hans de Goede authored
      commit 9ff84a17 upstream.
      
      Without this the aux port does not get detected, and consequently the
      touchpad will not work.
      
      https://bugzilla.redhat.com/show_bug.cgi?id=1110011Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      fbc94908
    • Dmitry Torokhov's avatar
      Input: synaptics - gate forcepad support by DMI check · 8000fbfa
      Dmitry Torokhov authored
      commit aa972409 upstream.
      
      Unfortunately, ForcePad capability is not actually exported over PS/2, so
      we have to resort to DMI checks.
      Reported-by: default avatarNicole Faerber <nicole.faerber@kernelconcepts.de>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      8000fbfa
    • Yann Droneaud's avatar
      fanotify: enable close-on-exec on events' fd when requested in fanotify_init() · 7baa56f6
      Yann Droneaud authored
      commit 0b37e097 upstream.
      
      According to commit 80af2588 ("fanotify: groups can specify their
      f_flags for new fd"), file descriptors created as part of file access
      notification events inherit flags from the event_f_flags argument passed
      to syscall fanotify_init(2)[1].
      
      Unfortunately O_CLOEXEC is currently silently ignored.
      
      Indeed, event_f_flags are only given to dentry_open(), which only seems to
      care about O_ACCMODE and O_PATH in do_dentry_open(), O_DIRECT in
      open_check_o_direct() and O_LARGEFILE in generic_file_open().
      
      It's a pity, since, according to some lookup on various search engines and
      http://codesearch.debian.net/, there's already some userspace code which
      use O_CLOEXEC:
      
      - in systemd's readahead[2]:
      
          fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);
      
      - in clsync[3]:
      
          #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)
      
          int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);
      
      - in examples [4] from "Filesystem monitoring in the Linux
        kernel" article[5] by Aleksander Morgado:
      
          if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
                                            O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)
      
      Additionally, since commit 48149e9d ("fanotify: check file flags
      passed in fanotify_init").  having O_CLOEXEC as part of fanotify_init()
      second argument is expressly allowed.
      
      So it seems expected to set close-on-exec flag on the file descriptors if
      userspace is allowed to request it with O_CLOEXEC.
      
      But Andrew Morton raised[6] the concern that enabling now close-on-exec
      might break existing applications which ask for O_CLOEXEC but expect the
      file descriptor to be inherited across exec().
      
      In the other hand, as reported by Mihai Dontu[7] close-on-exec on the file
      descriptor returned as part of file access notify can break applications
      due to deadlock.  So close-on-exec is needed for most applications.
      
      More, applications asking for close-on-exec are likely expecting it to be
      enabled, relying on O_CLOEXEC being effective.  If not, it might weaken
      their security, as noted by Jan Kara[8].
      
      So this patch replaces call to macro get_unused_fd() by a call to function
      get_unused_fd_flags() with event_f_flags value as argument.  This way
      O_CLOEXEC flag in the second argument of fanotify_init(2) syscall is
      interpreted and close-on-exec get enabled when requested.
      
      [1] http://man7.org/linux/man-pages/man2/fanotify_init.2.html
      [2] http://cgit.freedesktop.org/systemd/systemd/tree/src/readahead/readahead-collect.c?id=v208#n294
      [3] https://github.com/xaionaro/clsync/blob/v0.2.1/sync.c#L1631
          https://github.com/xaionaro/clsync/blob/v0.2.1/configuration.h#L38
      [4] http://www.lanedo.com/~aleksander/fanotify/fanotify-example.c
      [5] http://www.lanedo.com/2013/filesystem-monitoring-linux-kernel/
      [6] http://lkml.kernel.org/r/20141001153621.65e9258e65a6167bf2e4cb50@linux-foundation.org
      [7] http://lkml.kernel.org/r/20141002095046.3715eb69@mdontu-l
      [8] http://lkml.kernel.org/r/20141002104410.GB19748@quack.suse.cz
      
      Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.comSigned-off-by: default avatarYann Droneaud <ydroneaud@opteya.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Reviewed by: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Tested-by: default avatarHeinrich Schuchardt <xypron.glpk@gmx.de>
      Cc: Mihai Don\u021bu <mihai.dontu@gmail.com>
      Cc: Pádraig Brady <P@draigBrady.com>
      Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Cc: Michael Kerrisk-manpages <mtk.manpages@gmail.com>
      Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de>
      Cc: Richard Guy Briggs <rgb@redhat.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      7baa56f6
    • Mike Snitzer's avatar
      block: fix alignment_offset math that assumes io_min is a power-of-2 · c76a73b3
      Mike Snitzer authored
      commit b8839b8c upstream.
      
      The math in both blk_stack_limits() and queue_limit_alignment_offset()
      assume that a block device's io_min (aka minimum_io_size) is always a
      power-of-2.  Fix the math such that it works for non-power-of-2 io_min.
      
      This issue (of alignment_offset != 0) became apparent when testing
      dm-thinp with a thinp blocksize that matches a RAID6 stripesize of
      1280K.  Commit fdfb4c8c ("dm thin: set minimum_io_size to pool's data
      block size") unlocked the potential for alignment_offset != 0 due to
      the dm-thin-pool's io_min possibly being a non-power-of-2.
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      c76a73b3
    • Al Viro's avatar
      fix misuses of f_count() in ppp and netlink · 15c0af9c
      Al Viro authored
      commit 24dff96a upstream.
      
      we used to check for "nobody else could start doing anything with
      that opened file" by checking that refcount was 2 or less - one
      for descriptor table and one we'd acquired in fget() on the way to
      wherever we are.  That was race-prone (somebody else might have
      had a reference to descriptor table and do fget() just as we'd
      been checking) and it had become flat-out incorrect back when
      we switched to fget_light() on those codepaths - unlike fget(),
      it doesn't grab an extra reference unless the descriptor table
      is shared.  The same change allowed a race-free check, though -
      we are safe exactly when refcount is less than 2.
      
      It was a long time ago; pre-2.6.12 for ioctl() (the codepath leading
      to ppp one) and 2.6.17 for sendmsg() (netlink one).  OTOH,
      netlink hadn't grown that check until 3.9 and ppp used to live
      in drivers/net, not drivers/net/ppp until 3.1.  The bug existed
      well before that, though, and the same fix used to apply in old
      location of file.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      [lizf: Backported to 3.4: drop the change to netlink_mmap_sendmsg()]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      15c0af9c
    • Mikulas Patocka's avatar
      fs: make cont_expand_zero interruptible · 2551b5ed
      Mikulas Patocka authored
      commit c2ca0fcd upstream.
      
      This patch makes it possible to kill a process looping in
      cont_expand_zero. A process may spend a lot of time in this function, so
      it is desirable to be able to kill it.
      
      It happened to me that I wanted to copy a piece data from the disk to a
      file. By mistake, I used the "seek" parameter to dd instead of "skip". Due
      to the "seek" parameter, dd attempted to extend the file and became stuck
      doing so - the only possibility was to reset the machine or wait many
      hours until the filesystem runs out of space and cont_expand_zero fails.
      We need this patch to be able to terminate the process.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      2551b5ed
    • Tetsuo Handa's avatar
      fs: Fix theoretical division by 0 in super_cache_scan(). · 2ea17e67
      Tetsuo Handa authored
      commit 475d0db7 upstream.
      
      total_objects could be 0 and is used as a denom.
      
      While total_objects is a "long", total_objects == 0 unlikely happens for
      3.12 and later kernels because 32-bit architectures would not be able to
      hold (1 << 32) objects. However, total_objects == 0 may happen for kernels
      between 3.1 and 3.11 because total_objects in prune_super() was an "int"
      and (e.g.) x86_64 architecture might be able to hold (1 << 32) objects.
      Signed-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      2ea17e67
    • Ben Hutchings's avatar
      x86: Reject x32 executables if x32 ABI not supported · db55550d
      Ben Hutchings authored
      commit 0e6d3112 upstream.
      
      It is currently possible to execve() an x32 executable on an x86_64
      kernel that has only ia32 compat enabled.  However all its syscalls
      will fail, even _exit().  This usually causes it to segfault.
      
      Change the ELF compat architecture check so that x32 executables are
      rejected if we don't support the x32 ABI.
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Link: http://lkml.kernel.org/r/1410120305.6822.9.camel@decadent.org.ukSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      db55550d
    • Scott Carter's avatar
      pata_serverworks: disable 64-KB DMA transfers on Broadcom OSB4 IDE Controller · 9be2cb10
      Scott Carter authored
      commit 37017ac6 upstream.
      
      The Broadcom OSB4 IDE Controller (vendor and device IDs: 1166:0211)
      does not support 64-KB DMA transfers.
      Whenever a 64-KB DMA transfer is attempted,
      the transfer fails and messages similar to the following
      are written to the console log:
      
         [ 2431.851125] sr 0:0:0:0: [sr0] Unhandled sense code
         [ 2431.851139] sr 0:0:0:0: [sr0]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
         [ 2431.851152] sr 0:0:0:0: [sr0]  Sense Key : Hardware Error [current]
         [ 2431.851166] sr 0:0:0:0: [sr0]  Add. Sense: Logical unit communication time-out
         [ 2431.851182] sr 0:0:0:0: [sr0] CDB: Read(10): 28 00 00 00 76 f4 00 00 40 00
         [ 2431.851210] end_request: I/O error, dev sr0, sector 121808
      
      When the libata and pata_serverworks modules
      are recompiled with ATA_DEBUG and ATA_VERBOSE_DEBUG defined in libata.h,
      the 64-KB transfer size in the scatter-gather list can be seen
      in the console log:
      
         [ 2664.897267] sr 9:0:0:0: [sr0] Send:
         [ 2664.897274] 0xf63d85e0
         [ 2664.897283] sr 9:0:0:0: [sr0] CDB:
         [ 2664.897288] Read(10): 28 00 00 00 7f b4 00 00 40 00
         [ 2664.897319] buffer = 0xf6d6fbc0, bufflen = 131072, queuecommand 0xf81b7700
         [ 2664.897331] ata_scsi_dump_cdb: CDB (1:0,0,0) 28 00 00 00 7f b4 00 00 40
         [ 2664.897338] ata_scsi_translate: ENTER
         [ 2664.897345] ata_sg_setup: ENTER, ata1
         [ 2664.897356] ata_sg_setup: 3 sg elements mapped
         [ 2664.897364] ata_bmdma_fill_sg: PRD[0] = (0x66FD2000, 0xE000)
         [ 2664.897371] ata_bmdma_fill_sg: PRD[1] = (0x65000000, 0x10000)
         ------------------------------------------------------> =======
         [ 2664.897378] ata_bmdma_fill_sg: PRD[2] = (0x66A10000, 0x2000)
         [ 2664.897386] ata1: ata_dev_select: ENTER, device 0, wait 1
         [ 2664.897422] ata_sff_tf_load: feat 0x1 nsect 0x0 lba 0x0 0x0 0xFC
         [ 2664.897428] ata_sff_tf_load: device 0xA0
         [ 2664.897448] ata_sff_exec_command: ata1: cmd 0xA0
         [ 2664.897457] ata_scsi_translate: EXIT
         [ 2664.897462] leaving scsi_dispatch_cmnd()
         [ 2664.897497] Doing sr request, dev = sr0, block = 0
         [ 2664.897507] sr0 : reading 64/256 512 byte blocks.
         [ 2664.897553] ata_sff_hsm_move: ata1: protocol 7 task_state 1 (dev_stat 0x58)
         [ 2664.897560] atapi_send_cdb: send cdb
         [ 2666.910058] ata_bmdma_port_intr: ata1: host_stat 0x64
         [ 2666.910079] __ata_sff_port_intr: ata1: protocol 7 task_state 3
         [ 2666.910093] ata_sff_hsm_move: ata1: protocol 7 task_state 3 (dev_stat 0x51)
         [ 2666.910101] ata_sff_hsm_move: ata1: protocol 7 task_state 4 (dev_stat 0x51)
         [ 2666.910129] sr 9:0:0:0: [sr0] Done:
         [ 2666.910136] 0xf63d85e0 TIMEOUT
      
      lspci shows that the driver used for the Broadcom OSB4 IDE Controller is
      pata_serverworks:
      
         00:0f.1 IDE interface: Broadcom OSB4 IDE Controller (prog-if 8e [Master SecP SecO PriP])
                 Flags: bus master, medium devsel, latency 64
                 [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8]
                 [virtual] Memory at 000003f0 (type 3, non-prefetchable) [size=1]
                 I/O ports at 0170 [size=8]
                 I/O ports at 0374 [size=4]
                 I/O ports at 1440 [size=16]
                 Kernel driver in use: pata_serverworks
      
      The pata_serverworks driver supports five distinct device IDs,
      one being the OSB4 and the other four belonging to the CSB series.
      The CSB series appears to support 64-KB DMA transfers,
      as tests on a machine with an SAI2 motherboard
      containing a Broadcom CSB5 IDE Controller (vendor and device IDs: 1166:0212)
      showed no problems with 64-KB DMA transfers.
      
      This problem was first discovered when attempting to install openSUSE
      from a DVD on a machine with an STL2 motherboard.
      Using the pata_serverworks module,
      older releases of openSUSE will not install at all due to the timeouts.
      Releases of openSUSE prior to 11.3 can be installed by disabling
      the pata_serverworks module using the brokenmodules boot parameter,
      which causes the serverworks module to be used instead.
      Recent releases of openSUSE (12.2 and later) include better error recovery and
      will install, though very slowly.
      On all openSUSE releases, the problem can be recreated
      on a machine containing a Broadcom OSB4 IDE Controller
      by mounting an install DVD and running a command similar to the following:
      
         find /mnt -type f -print | xargs cat > /dev/null
      
      The patch below corrects the problem.
      Similar to the other ATA drivers that do not support 64-KB DMA transfers,
      the patch changes the ata_port_operations qc_prep vector to point to a routine
      that breaks any 64-KB segment into two 32-KB segments and
      changes the scsi_host_template sg_tablesize element to reduce by half
      the number of scatter/gather elements allowed.
      These two changes affect only the OSB4.
      Signed-off-by: default avatarScott Carter <ccscott@funsoft.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      9be2cb10
    • Chao Yu's avatar
      ecryptfs: avoid to access NULL pointer when write metadata in xattr · 771f8a87
      Chao Yu authored
      commit 35425ea2 upstream.
      
      Christopher Head 2014-06-28 05:26:20 UTC described:
      "I tried to reproduce this on 3.12.21. Instead, when I do "echo hello > foo"
      in an ecryptfs mount with ecryptfs_xattr specified, I get a kernel crash:
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff8110eb39>] fsstack_copy_attr_all+0x2/0x61
      PGD d7840067 PUD b2c3c067 PMD 0
      Oops: 0002 [#1] SMP
      Modules linked in: nvidia(PO)
      CPU: 3 PID: 3566 Comm: bash Tainted: P           O 3.12.21-gentoo-r1 #2
      Hardware name: ASUSTek Computer Inc. G60JX/G60JX, BIOS 206 03/15/2010
      task: ffff8801948944c0 ti: ffff8800bad70000 task.ti: ffff8800bad70000
      RIP: 0010:[<ffffffff8110eb39>]  [<ffffffff8110eb39>] fsstack_copy_attr_all+0x2/0x61
      RSP: 0018:ffff8800bad71c10  EFLAGS: 00010246
      RAX: 00000000000181a4 RBX: ffff880198648480 RCX: 0000000000000000
      RDX: 0000000000000004 RSI: ffff880172010450 RDI: 0000000000000000
      RBP: ffff880198490e40 R08: 0000000000000000 R09: 0000000000000000
      R10: ffff880172010450 R11: ffffea0002c51e80 R12: 0000000000002000
      R13: 000000000000001a R14: 0000000000000000 R15: ffff880198490e40
      FS:  00007ff224caa700(0000) GS:ffff88019fcc0000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 00000000bb07f000 CR4: 00000000000007e0
      Stack:
      ffffffff811826e8 ffff8800a39d8000 0000000000000000 000000000000001a
      ffff8800a01d0000 ffff8800a39d8000 ffffffff81185fd5 ffffffff81082c2c
      00000001a39d8000 53d0abbc98490e40 0000000000000037 ffff8800a39d8220
      Call Trace:
      [<ffffffff811826e8>] ? ecryptfs_setxattr+0x40/0x52
      [<ffffffff81185fd5>] ? ecryptfs_write_metadata+0x1b3/0x223
      [<ffffffff81082c2c>] ? should_resched+0x5/0x23
      [<ffffffff8118322b>] ? ecryptfs_initialize_file+0xaf/0xd4
      [<ffffffff81183344>] ? ecryptfs_create+0xf4/0x142
      [<ffffffff810f8c0d>] ? vfs_create+0x48/0x71
      [<ffffffff810f9c86>] ? do_last.isra.68+0x559/0x952
      [<ffffffff810f7ce7>] ? link_path_walk+0xbd/0x458
      [<ffffffff810fa2a3>] ? path_openat+0x224/0x472
      [<ffffffff810fa7bd>] ? do_filp_open+0x2b/0x6f
      [<ffffffff81103606>] ? __alloc_fd+0xd6/0xe7
      [<ffffffff810ee6ab>] ? do_sys_open+0x65/0xe9
      [<ffffffff8157d022>] ? system_call_fastpath+0x16/0x1b
      RIP  [<ffffffff8110eb39>] fsstack_copy_attr_all+0x2/0x61
      RSP <ffff8800bad71c10>
      CR2: 0000000000000000
      ---[ end trace df9dba5f1ddb8565 ]---"
      
      If we create a file when we mount with ecryptfs_xattr_metadata option, we will
      encounter a crash in this path:
      ->ecryptfs_create
        ->ecryptfs_initialize_file
          ->ecryptfs_write_metadata
            ->ecryptfs_write_metadata_to_xattr
              ->ecryptfs_setxattr
                ->fsstack_copy_attr_all
      It's because our dentry->d_inode used in fsstack_copy_attr_all is NULL, and it
      will be initialized when ecryptfs_initialize_file finish.
      
      So we should skip copying attr from lower inode when the value of ->d_inode is
      invalid.
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarTyler Hicks <tyhicks@canonical.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      771f8a87