1. 19 Aug, 2015 5 commits
    • Thomas Gleixner's avatar
      genirq: Prevent resend to interrupts marked IRQ_NESTED_THREAD · 351ee276
      Thomas Gleixner authored
      commit 75a06189 upstream.
      
      The resend mechanism happily calls the interrupt handler of interrupts
      which are marked IRQ_NESTED_THREAD from softirq context. This can
      result in crashes because the interrupt handler is not the proper way
      to invoke the device handlers. They must be invoked via
      handle_nested_irq.
      
      Prevent the resend even if the interrupt has no valid parent irq
      set. Its better to have a lost interrupt than a crashing machine.
      Reported-by: default avatarUwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      351ee276
    • Alexey Brodkin's avatar
      ARC: make sure instruction_pointer() returns unsigned value · b9d5df39
      Alexey Brodkin authored
      commit f51e2f19 upstream.
      
      Currently instruction_pointer() returns pt_regs->ret and so return value
      is of type "long", which implicitly stands for "signed long".
      
      While that's perfectly fine when dealing with 32-bit values if return
      value of instruction_pointer() gets assigned to 64-bit variable sign
      extension may happen.
      
      And at least in one real use-case it happens already.
      In perf_prepare_sample() return value of perf_instruction_pointer()
      (which is an alias to instruction_pointer() in case of ARC) is assigned
      to (struct perf_sample_data)->ip (which type is "u64").
      
      And what we see if instuction pointer points to user-space application
      that in case of ARC lays below 0x8000_0000 "ip" gets set properly with
      leading 32 zeros. But if instruction pointer points to kernel address
      space that starts from 0x8000_0000 then "ip" is set with 32 leadig
      "f"-s. I.e. id instruction_pointer() returns 0x8100_0000, "ip" will be
      assigned with 0xffff_ffff__8100_0000. Which is obviously wrong.
      
      In particular that issuse broke output of perf, because perf was unable
      to associate addresses like 0xffff_ffff__8100_0000 with anything from
      /proc/kallsyms.
      
      That's what we used to see:
       ----------->8----------
        6.27%  ls       [unknown]                [k] 0xffffffff8046c5cc
        2.96%  ls       libuClibc-0.9.34-git.so  [.] memcpy
        2.25%  ls       libuClibc-0.9.34-git.so  [.] memset
        1.66%  ls       [unknown]                [k] 0xffffffff80666536
        1.54%  ls       libuClibc-0.9.34-git.so  [.] 0x000224d6
        1.18%  ls       libuClibc-0.9.34-git.so  [.] 0x00022472
       ----------->8----------
      
      With that change perf output looks much better now:
       ----------->8----------
        8.21%  ls       [kernel.kallsyms]        [k] memset
        3.52%  ls       libuClibc-0.9.34-git.so  [.] memcpy
        2.11%  ls       libuClibc-0.9.34-git.so  [.] malloc
        1.88%  ls       libuClibc-0.9.34-git.so  [.] memset
        1.64%  ls       [kernel.kallsyms]        [k] _raw_spin_unlock_irqrestore
        1.41%  ls       [kernel.kallsyms]        [k] __d_lookup_rcu
       ----------->8----------
      Signed-off-by: default avatarAlexey Brodkin <abrodkin@synopsys.com>
      Cc: arc-linux-dev@synopsys.com
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      b9d5df39
    • Martin Schwidefsky's avatar
      s390/sclp: clear upper register halves in _sclp_print_early · 496398df
      Martin Schwidefsky authored
      commit f9c87a6f upstream.
      
      If the kernel is compiled with gcc 5.1 and the XZ compression option
      the decompress_kernel function calls _sclp_print_early in 64-bit mode
      while the content of the upper register half of %r6 is non-zero.
      This causes a specification exception on the servc instruction in
      _sclp_servc.
      
      The _sclp_print_early function saves and restores the upper registers
      halves but it fails to clear them for the 31-bit code of the mini sclp
      driver.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      496398df
    • Al Viro's avatar
      freeing unlinked file indefinitely delayed · f9c48e68
      Al Viro authored
      commit 75a6f82a upstream.
      
      	Normally opening a file, unlinking it and then closing will have
      the inode freed upon close() (provided that it's not otherwise busy and
      has no remaining links, of course).  However, there's one case where that
      does *not* happen.  Namely, if you open it by fhandle with cold dcache,
      then unlink() and close().
      
      	In normal case you get d_delete() in unlink(2) notice that dentry
      is busy and unhash it; on the final dput() it will be forcibly evicted from
      dcache, triggering iput() and inode removal.  In this case, though, we end
      up with *two* dentries - disconnected (created by open-by-fhandle) and
      regular one (used by unlink()).  The latter will have its reference to inode
      dropped just fine, but the former will not - it's considered hashed (it
      is on the ->s_anon list), so it will stay around until the memory pressure
      will finally do it in.  As the result, we have the final iput() delayed
      indefinitely.  It's trivial to reproduce -
      
      void flush_dcache(void)
      {
              system("mount -o remount,rw /");
      }
      
      static char buf[20 * 1024 * 1024];
      
      main()
      {
              int fd;
              union {
                      struct file_handle f;
                      char buf[MAX_HANDLE_SZ];
              } x;
              int m;
      
              x.f.handle_bytes = sizeof(x);
              chdir("/root");
              mkdir("foo", 0700);
              fd = open("foo/bar", O_CREAT | O_RDWR, 0600);
              close(fd);
              name_to_handle_at(AT_FDCWD, "foo/bar", &x.f, &m, 0);
              flush_dcache();
              fd = open_by_handle_at(AT_FDCWD, &x.f, O_RDWR);
              unlink("foo/bar");
              write(fd, buf, sizeof(buf));
              system("df .");			/* 20Mb eaten */
              close(fd);
              system("df .");			/* should've freed those 20Mb */
              flush_dcache();
              system("df .");			/* should be the same as #2 */
      }
      
      will spit out something like
      Filesystem     1K-blocks   Used Available Use% Mounted on
      /dev/root         322023 303843      1131 100% /
      Filesystem     1K-blocks   Used Available Use% Mounted on
      /dev/root         322023 303843      1131 100% /
      Filesystem     1K-blocks   Used Available Use% Mounted on
      /dev/root         322023 283282     21692  93% /
      - inode gets freed only when dentry is finally evicted (here we trigger
      than by remount; normally it would've happened in response to memory
      pressure hell knows when).
      Acked-by: default avatarJ. Bruce Fields <bfields@fieldses.org>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      f9c48e68
    • Kirill A. Shutemov's avatar
      mm: avoid setting up anonymous pages into file mapping · bf653833
      Kirill A. Shutemov authored
      commit 6b7339f4 upstream.
      
      Reading page fault handler code I've noticed that under right
      circumstances kernel would map anonymous pages into file mappings: if
      the VMA doesn't have vm_ops->fault() and the VMA wasn't fully populated
      on ->mmap(), kernel would handle page fault to not populated pte with
      do_anonymous_page().
      
      Let's change page fault handler to use do_anonymous_page() only on
      anonymous VMA (->vm_ops == NULL) and make sure that the VMA is not
      shared.
      
      For file mappings without vm_ops->fault() or shred VMA without vm_ops,
      page fault on pte_none() entry would lead to SIGBUS.
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      bf653833
  2. 06 Aug, 2015 2 commits
  3. 05 Aug, 2015 1 commit
  4. 04 Aug, 2015 32 commits