1. 29 Jul, 2015 4 commits
    • Michael Ellerman's avatar
      powerpc: Don't negate error in syscall_set_return_value() · 1b1a3702
      Michael Ellerman authored
      Currently the only caller of syscall_set_return_value() is seccomp
      filter, which is not enabled on powerpc.
      
      This means we have not noticed that our implementation of
      syscall_set_return_value() negates error, even though the value passed
      in is already negative.
      
      So remove the negation in syscall_set_return_value(), and expect the
      caller to do it like all other implementations do.
      
      Also add a comment about the ccr handling.
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      1b1a3702
    • Michael Ellerman's avatar
      powerpc: Drop unused syscall_get_error() · 2923e6d5
      Michael Ellerman authored
      syscall_get_error() is unused, and never has been.
      
      It's also probably wrong, as it negates r3 before returning it, but that
      depends on what the caller is expecting.
      
      It also doesn't deal with compat, and doesn't deal with TIF_NOERROR.
      
      Although we could fix those, until it has a caller and it's clear what
      semantics the caller wants it's just untested code. So drop it.
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      2923e6d5
    • Michael Ellerman's avatar
      powerpc/kernel: Change the do_syscall_trace_enter() API · d3837414
      Michael Ellerman authored
      The API for calling do_syscall_trace_enter() is currently sensible
      enough, it just returns the (modified) syscall number.
      
      However once we enable seccomp filter it will get more complicated. When
      seccomp filter runs, the seccomp kernel code (via SECCOMP_RET_ERRNO), or
      a ptracer (via SECCOMP_RET_TRACE), may reject the syscall and *may* or may
      *not* set a return value in r3.
      
      That means the assembler that calls do_syscall_trace_enter() can not
      blindly return ENOSYS, it needs to only return ENOSYS if a return value
      has not already been set.
      
      There is no way to implement that logic with the current API. So change
      the do_syscall_trace_enter() API to make it deal with the return code
      juggling, and the assembler can then just return whatever return code it
      is given.
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      d3837414
    • Michael Ellerman's avatar
      powerpc/kernel: Switch to using MAX_ERRNO · c3525940
      Michael Ellerman authored
      Currently on powerpc we have our own #define for the highest (negative)
      errno value, called _LAST_ERRNO. This is defined to be 516, for reasons
      which are not clear.
      
      The generic code, and x86, use MAX_ERRNO, which is defined to be 4095.
      
      In particular seccomp uses MAX_ERRNO to restrict the value that a
      seccomp filter can return.
      
      Currently with the mismatch between _LAST_ERRNO and MAX_ERRNO, a seccomp
      tracer wanting to return 600, expecting it to be seen as an error, would
      instead find on powerpc that userspace sees a successful syscall with a
      return value of 600.
      
      To avoid this inconsistency, switch powerpc to use MAX_ERRNO.
      
      We are somewhat confident that generic syscalls that can return a
      non-error value above negative MAX_ERRNO have already been updated to
      use force_successful_syscall_return().
      
      I have also checked all the powerpc specific syscalls, and believe that
      none of them expect to return a non-error value between -MAX_ERRNO and
      -516. So this change should be safe ...
      Acked-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      c3525940
  2. 27 Jul, 2015 1 commit
  3. 25 Jul, 2015 2 commits
  4. 23 Jul, 2015 3 commits
    • Paul Mackerras's avatar
      powerpc: Use hardware RNG for arch_get_random_seed_* not arch_get_random_* · 01c9348c
      Paul Mackerras authored
      The hardware RNG on POWER8 and POWER7+ can be relatively slow, since
      it can only supply one 64-bit value per microsecond.  Currently we
      read it in arch_get_random_long(), but that slows down reading from
      /dev/urandom since the code in random.c calls arch_get_random_long()
      for every longword read from /dev/urandom.
      
      Since the hardware RNG supplies high-quality entropy on every read, it
      matches the semantics of arch_get_random_seed_long() better than those
      of arch_get_random_long().  Therefore this commit makes the code use
      the POWER8/7+ hardware RNG only for arch_get_random_seed_{long,int}
      and not for arch_get_random_{long,int}.
      
      This won't affect any other PowerPC-based platforms because none of
      them currently support a hardware RNG.  To make it clear that the
      ppc_md function pointer is used for arch_get_random_seed_*, we rename
      it from get_random_long to get_random_seed.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      01c9348c
    • Thomas Huth's avatar
      powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers · 1c2cb594
      Thomas Huth authored
      The EPOW interrupt handler uses rtas_get_sensor(), which in turn
      uses rtas_busy_delay() to wait for RTAS becoming ready in case it
      is necessary. But rtas_busy_delay() is annotated with might_sleep()
      and thus may not be used by interrupts handlers like the EPOW handler!
      This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is
      enabled:
      
       BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:496
       in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
       CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6
       Call Trace:
       [c00000007ffe7b90] [c000000000807670] dump_stack+0xa0/0xdc (unreliable)
       [c00000007ffe7bc0] [c0000000000e1f14] ___might_sleep+0x134/0x180
       [c00000007ffe7c20] [c00000000002aec0] rtas_busy_delay+0x30/0xd0
       [c00000007ffe7c50] [c00000000002bde4] rtas_get_sensor+0x74/0xe0
       [c00000007ffe7ce0] [c000000000083264] ras_epow_interrupt+0x44/0x450
       [c00000007ffe7d90] [c000000000120260] handle_irq_event_percpu+0xa0/0x300
       [c00000007ffe7e70] [c000000000120524] handle_irq_event+0x64/0xc0
       [c00000007ffe7eb0] [c000000000124dbc] handle_fasteoi_irq+0xec/0x260
       [c00000007ffe7ef0] [c00000000011f4f0] generic_handle_irq+0x50/0x80
       [c00000007ffe7f20] [c000000000010f3c] __do_irq+0x8c/0x200
       [c00000007ffe7f90] [c0000000000236cc] call_do_irq+0x14/0x24
       [c00000007e6f39e0] [c000000000011144] do_IRQ+0x94/0x110
       [c00000007e6f3a30] [c000000000002594] hardware_interrupt_common+0x114/0x180
      
      Fix this issue by introducing a new rtas_get_sensor_fast() function
      that does not use rtas_busy_delay() - and thus can only be used for
      sensors that do not cause a BUSY condition - known as "fast" sensors.
      
      The EPOW sensor is defined to be "fast" in sPAPR - mpe.
      
      Fixes: 587f83e8 ("powerpc/pseries: Use rtas_get_sensor in RAS code")
      Signed-off-by: default avatarThomas Huth <thuth@redhat.com>
      Reviewed-by: default avatarNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      1c2cb594
    • Thomas Huth's avatar
      powerpc/rtas: Replace magic values with defines · 9ef03193
      Thomas Huth authored
      rtas.h already has some nice #defines for RTAS return status
      codes - let's use them instead of hard-coded "magic" values!
      Signed-off-by: default avatarThomas Huth <thuth@redhat.com>
      Reviewed-by: default avatarTyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      9ef03193
  5. 21 Jul, 2015 3 commits
  6. 16 Jul, 2015 5 commits
  7. 13 Jul, 2015 13 commits
  8. 12 Jul, 2015 9 commits
    • Linus Torvalds's avatar
      Linux 4.2-rc2 · bc0195aa
      Linus Torvalds authored
      bc0195aa
    • Linus Torvalds's avatar
      Revert "drm/i915: Use crtc_state->active in primary check_plane func" · 01e2d062
      Linus Torvalds authored
      This reverts commit dec4f799.
      
      Jörg Otte reports a NULL pointder dereference due to this commit, as
      'crtc_state' very much can be NULL:
      
              crtc_state = state->base.state ?
                      intel_atomic_get_crtc_state(state->base.state, intel_crtc) : NULL;
      
      So the change to test 'crtc_state->base.active' cannot possibly be
      correct as-is.
      
      There may be some other minimal fix (like just checking crtc_state for
      NULL), but I'm just reverting it now for the rc2 release, and people
      like Daniel Vetter who actually know this code will figure out what the
      right solution is in the longer term.
      Reported-and-bisected-by: default avatarJörg Otte <jrg.otte@gmail.com>
      Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@intel.com>
      CC: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      01e2d062
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · c83727a6
      Linus Torvalds authored
      Pull VFS fixes from Al Viro:
       "Fixes for this cycle regression in overlayfs and a couple of
        long-standing (== all the way back to 2.6.12, at least) bugs"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        freeing unlinked file indefinitely delayed
        fix a braino in ovl_d_select_inode()
        9p: don't leave a half-initialized inode sitting around
      c83727a6
    • Linus Torvalds's avatar
      Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · 7fbb58a0
      Linus Torvalds authored
      Pull MIPS fixes from Ralf Baechle:
       "A fair number of 4.2 fixes also because Markos opened the flood gates.
      
         - Patch up the math used calculate the location for the page bitmap.
      
         - The FDC (Not what you think, FDC stands for Fast Debug Channel) IRQ
           around was causing issues on non-Malta platforms, so move the code
           to a Malta specific location.
      
         - A spelling fix replicated through several files.
      
         - Fix to the emulation of an R2 instruction for R6 cores.
      
         - Fix the JR emulation for R6.
      
         - Further patching of mindless 64 bit issues.
      
         - Ensure the kernel won't crash on CPUs with L2 caches with >= 8
           ways.
      
         - Use compat_sys_getsockopt for O32 ABI on 64 bit kernels.
      
         - Fix cache flushing for multithreaded cores.
      
         - A build fix"
      
      * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
        MIPS: O32: Use compat_sys_getsockopt.
        MIPS: c-r4k: Extend way_string array
        MIPS: Pistachio: Support CDMM & Fast Debug Channel
        MIPS: Malta: Make GIC FDC IRQ workaround Malta specific
        MIPS: c-r4k: Fix cache flushing for MT cores
        Revert "MIPS: Kconfig: Disable SMP/CPS for 64-bit"
        MIPS: cps-vec: Use macros for various arithmetics and memory operations
        MIPS: kernel: cps-vec: Replace KSEG0 with CKSEG0
        MIPS: kernel: cps-vec: Use ta0-ta3 pseudo-registers for 64-bit
        MIPS: kernel: cps-vec: Replace mips32r2 ISA level with mips64r2
        MIPS: kernel: cps-vec: Replace 'la' macro with PTR_LA
        MIPS: kernel: smp-cps: Fix 64-bit compatibility errors due to pointer casting
        MIPS: Fix erroneous JR emulation for MIPS R6
        MIPS: Fix branch emulation for BLTC and BGEC instructions
        MIPS: kernel: traps: Fix broken indentation
        MIPS: bootmem: Don't use memory holes for page bitmap
        MIPS: O32: Do not handle require 32 bytes from the stack to be readable.
        MIPS, CPUFREQ: Fix spelling of Institute.
        MIPS: Lemote 2F: Fix build caused by recent mass rename.
      7fbb58a0
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1daa1cfb
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
      
       - the high latency PIT detection fix, which slipped through the cracks
         for rc1
      
       - a regression fix for the early printk mechanism
      
       - the x86 part to plug irq/vector related hotplug races
      
       - move the allocation of the espfix pages on cpu hotplug to non atomic
         context.  The current code triggers a might_sleep() warning.
      
       - a series of KASAN fixes addressing boot crashes and usability
      
       - a trivial typo fix for Kconfig help text
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/kconfig: Fix typo in the CONFIG_CMDLINE_BOOL help text
        x86/irq: Retrieve irq data after locking irq_desc
        x86/irq: Use proper locking in check_irq_vectors_for_cpu_disable()
        x86/irq: Plug irq vector hotplug race
        x86/earlyprintk: Allow early_printk() to use console style parameters like '115200n8'
        x86/espfix: Init espfix on the boot CPU side
        x86/espfix: Add 'cpu' parameter to init_espfix_ap()
        x86/kasan: Move KASAN_SHADOW_OFFSET to the arch Kconfig
        x86/kasan: Add message about KASAN being initialized
        x86/kasan: Fix boot crash on AMD processors
        x86/kasan: Flush TLBs after switching CR3
        x86/kasan: Fix KASAN shadow region page tables
        x86/init: Clear 'init_level4_pgt' earlier
        x86/tsc: Let high latency PIT fail fast in quick_pit_calibrate()
      1daa1cfb
    • Linus Torvalds's avatar
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7b732169
      Linus Torvalds authored
      Pull timer fixes from Thomas Gleixner:
       "This update from the timer departement contains:
      
         - A series of patches which address a shortcoming in the tick
           broadcast code.
      
           If the broadcast device is not available or an hrtimer emulated
           broadcast device, some of the original assumptions lead to boot
           failures.  I rather plugged all of the corner cases instead of only
           addressing the issue reported, so the change got a little larger.
      
           Has been extensivly tested on x86 and arm.
      
         - Get rid of the last holdouts using do_posix_clock_monotonic_gettime()
      
         - A regression fix for the imx clocksource driver
      
         - An update to the new state callbacks mechanism for clockevents.
           This is required to simplify the conversion, which will take place
           in 4.3"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        tick/broadcast: Prevent NULL pointer dereference
        time: Get rid of do_posix_clock_monotonic_gettime
        cris: Replace do_posix_clock_monotonic_gettime()
        tick/broadcast: Unbreak CONFIG_GENERIC_CLOCKEVENTS=n build
        tick/broadcast: Handle spurious interrupts gracefully
        tick/broadcast: Check for hrtimer broadcast active early
        tick/broadcast: Return busy when IPI is pending
        tick/broadcast: Return busy if periodic mode and hrtimer broadcast
        tick/broadcast: Move the check for periodic mode inside state handling
        tick/broadcast: Prevent deep idle if no broadcast device available
        tick/broadcast: Make idle check independent from mode and config
        tick/broadcast: Sanity check the shutdown of the local clock_event
        tick/broadcast: Prevent hrtimer recursion
        clockevents: Allow set-state callbacks to be optional
        clocksource/imx: Define clocksource for mx27
      7b732169
    • Linus Torvalds's avatar
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c4bc680c
      Linus Torvalds authored
      Pull irq fix from Thomas Gleixner:
       "A single fix for a cpu hotplug race vs. interrupt descriptors:
      
        Prevent irq setup/teardown across the cpu starting/dying parts of cpu
        hotplug so that the starting/dying cpu has a stable view of the
        descriptor space.  This has been an issue for all architectures in the
        cpu dying phase, where interrupts are migrated away from the dying
        cpu.  In the starting phase its mostly a x86 issue vs the vector space
        update"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        hotplug: Prevent alloc/free of irq descriptors during cpu up/down
      c4bc680c
    • Al Viro's avatar
      freeing unlinked file indefinitely delayed · 75a6f82a
      Al Viro authored
      	Normally opening a file, unlinking it and then closing will have
      the inode freed upon close() (provided that it's not otherwise busy and
      has no remaining links, of course).  However, there's one case where that
      does *not* happen.  Namely, if you open it by fhandle with cold dcache,
      then unlink() and close().
      
      	In normal case you get d_delete() in unlink(2) notice that dentry
      is busy and unhash it; on the final dput() it will be forcibly evicted from
      dcache, triggering iput() and inode removal.  In this case, though, we end
      up with *two* dentries - disconnected (created by open-by-fhandle) and
      regular one (used by unlink()).  The latter will have its reference to inode
      dropped just fine, but the former will not - it's considered hashed (it
      is on the ->s_anon list), so it will stay around until the memory pressure
      will finally do it in.  As the result, we have the final iput() delayed
      indefinitely.  It's trivial to reproduce -
      
      void flush_dcache(void)
      {
              system("mount -o remount,rw /");
      }
      
      static char buf[20 * 1024 * 1024];
      
      main()
      {
              int fd;
              union {
                      struct file_handle f;
                      char buf[MAX_HANDLE_SZ];
              } x;
              int m;
      
              x.f.handle_bytes = sizeof(x);
              chdir("/root");
              mkdir("foo", 0700);
              fd = open("foo/bar", O_CREAT | O_RDWR, 0600);
              close(fd);
              name_to_handle_at(AT_FDCWD, "foo/bar", &x.f, &m, 0);
              flush_dcache();
              fd = open_by_handle_at(AT_FDCWD, &x.f, O_RDWR);
              unlink("foo/bar");
              write(fd, buf, sizeof(buf));
              system("df .");			/* 20Mb eaten */
              close(fd);
              system("df .");			/* should've freed those 20Mb */
              flush_dcache();
              system("df .");			/* should be the same as #2 */
      }
      
      will spit out something like
      Filesystem     1K-blocks   Used Available Use% Mounted on
      /dev/root         322023 303843      1131 100% /
      Filesystem     1K-blocks   Used Available Use% Mounted on
      /dev/root         322023 303843      1131 100% /
      Filesystem     1K-blocks   Used Available Use% Mounted on
      /dev/root         322023 283282     21692  93% /
      - inode gets freed only when dentry is finally evicted (here we trigger
      than by remount; normally it would've happened in response to memory
      pressure hell knows when).
      
      Cc: stable@vger.kernel.org # v2.6.38+; earlier ones need s/kill_it/unhash_it/
      Acked-by: default avatarJ. Bruce Fields <bfields@fieldses.org>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      75a6f82a
    • Al Viro's avatar
      fix a braino in ovl_d_select_inode() · 9391dd00
      Al Viro authored
      when opening a directory we want the overlayfs inode, not one from
      the topmost layer.
      Reported-By: default avatarAndrey Jr. Melnikov <temnota.am@gmail.com>
      Tested-By: default avatarAndrey Jr. Melnikov <temnota.am@gmail.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      9391dd00