1. 30 Aug, 2017 40 commits
    • Jani Nikula's avatar
      drm/i915/vbt: ignore extraneous child devices for a port · bbb04b37
      Jani Nikula authored
      commit 7c648bde upstream.
      
      Ever since we've parsed VBT child devices, starting from 6acab15a
      ("drm/i915: use the HDMI DDI buffer translations from VBT"), we've
      ignored the child device information if more than one child device
      references the same port. The rationale for this seems lost in time.
      
      Since commit 311a2094 ("drm/i915: don't init DP or HDMI when not
      supported by DDI port") we started using this information more to skip
      HDMI/DP init if the port wasn't there per VBT child devices. However, at
      the same time it added port defaults without further explanation.
      
      Thus, if the child device info was skipped due to multiple child devices
      referencing the same port, the device info would be retrieved from the
      somewhat arbitrary defaults.
      
      Finally, when commit bb1d1329 ("drm/i915/vbt: split out defaults
      that are set when there is no VBT") stopped initializing the defaults
      whenever VBT is present, thus trusting the VBT more, we stopped
      initializing ports which were referenced by more than one child device.
      
      Apparently at least Asus UX305UA, UX305U, and UX306U laptops have VBT
      child device blocks which cause this behaviour. Arguably they were
      shipped with a broken VBT.
      
      Relax the rules for multiple references to the same port, and use the
      first child device info to reference a port. Retain the logic to debug
      log about this, though.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101745
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196233
      Fixes: bb1d1329 ("drm/i915/vbt: split out defaults that are set when there is no VBT")
      Tested-by: default avatarOliver Weißbarth <mail@oweissbarth.de>
      Reported-by: default avatarOliver Weißbarth <mail@oweissbarth.de>
      Reported-by: default avatarDidier G <didierg-divers@orange.fr>
      Reported-by: default avatarGiles Anderson <agander@gmail.com>
      Cc: Manasi Navare <manasi.d.navare@intel.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Reviewed-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20170811113907.6716-1-jani.nikula@intel.comSigned-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      (cherry picked from commit b5273d72)
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bbb04b37
    • Maarten Lankhorst's avatar
      drm/atomic: If the atomic check fails, return its value first · d76df456
      Maarten Lankhorst authored
      commit a0ffc51e upstream.
      
      The last part of drm_atomic_check_only is testing whether we need to
      fail with -EINVAL when modeset is not allowed, but forgets to return
      the value when atomic_check() fails first.
      
      This results in -EDEADLK being replaced by -EINVAL, and the sanity
      check in drm_modeset_drop_locks kicks in:
      
      [  308.531734] ------------[ cut here ]------------
      [  308.531791] WARNING: CPU: 0 PID: 1886 at drivers/gpu/drm/drm_modeset_lock.c:217 drm_modeset_drop_locks+0x33/0xc0 [drm]
      [  308.531828] Modules linked in:
      [  308.532050] CPU: 0 PID: 1886 Comm: kms_atomic Tainted: G     U  W 4.13.0-rc5-patser+ #5225
      [  308.532082] Hardware name: NUC5i7RYB, BIOS RYBDWi35.86A.0246.2015.0309.1355 03/09/2015
      [  308.532124] task: ffff8800cd9dae00 task.stack: ffff8800ca3b8000
      [  308.532168] RIP: 0010:drm_modeset_drop_locks+0x33/0xc0 [drm]
      [  308.532189] RSP: 0018:ffff8800ca3bf980 EFLAGS: 00010282
      [  308.532211] RAX: dffffc0000000000 RBX: ffff8800ca3bfaf8 RCX: 0000000013a171e6
      [  308.532235] RDX: 1ffff10019477f69 RSI: ffffffffa8ba4fa0 RDI: ffff8800ca3bfb48
      [  308.532258] RBP: ffff8800ca3bf998 R08: 0000000000000000 R09: 0000000000000003
      [  308.532281] R10: 0000000079dbe066 R11: 00000000f760b34b R12: 0000000000000001
      [  308.532304] R13: dffffc0000000000 R14: 00000000ffffffea R15: ffff880096889680
      [  308.532328] FS:  00007ff00959cec0(0000) GS:ffff8800d4e00000(0000) knlGS:0000000000000000
      [  308.532359] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  308.532380] CR2: 0000000000000008 CR3: 00000000ca2e3000 CR4: 00000000003406f0
      [  308.532402] Call Trace:
      [  308.532440]  drm_mode_atomic_ioctl+0x19fa/0x1c00 [drm]
      [  308.532488]  ? drm_atomic_set_property+0x1220/0x1220 [drm]
      [  308.532565]  ? avc_has_extended_perms+0xc39/0xff0
      [  308.532593]  ? lock_downgrade+0x610/0x610
      [  308.532640]  ? drm_atomic_set_property+0x1220/0x1220 [drm]
      [  308.532680]  drm_ioctl_kernel+0x154/0x1a0 [drm]
      [  308.532755]  drm_ioctl+0x624/0x8f0 [drm]
      [  308.532858]  ? drm_atomic_set_property+0x1220/0x1220 [drm]
      [  308.532976]  ? drm_getunique+0x210/0x210 [drm]
      [  308.533061]  do_vfs_ioctl+0xd92/0xe40
      [  308.533121]  ? ioctl_preallocate+0x1b0/0x1b0
      [  308.533160]  ? selinux_capable+0x20/0x20
      [  308.533191]  ? do_fcntl+0x1b1/0xbf0
      [  308.533219]  ? kasan_slab_free+0xa2/0xb0
      [  308.533249]  ? f_getown+0x4b/0xa0
      [  308.533278]  ? putname+0xcf/0xe0
      [  308.533309]  ? security_file_ioctl+0x57/0x90
      [  308.533342]  SyS_ioctl+0x4e/0x80
      [  308.533374]  entry_SYSCALL_64_fastpath+0x18/0xad
      [  308.533405] RIP: 0033:0x7ff00779e4d7
      [  308.533431] RSP: 002b:00007fff66a043d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [  308.533481] RAX: ffffffffffffffda RBX: 000000e7c7ca5910 RCX: 00007ff00779e4d7
      [  308.533560] RDX: 00007fff66a04430 RSI: 00000000c03864bc RDI: 0000000000000003
      [  308.533608] RBP: 00007ff007a5fb00 R08: 000000e7c7ca4620 R09: 000000e7c7ca5e60
      [  308.533647] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000070
      [  308.533685] R13: 0000000000000000 R14: 0000000000000000 R15: 000000e7c7ca5930
      [  308.533770] Code: ff df 55 48 89 e5 41 55 41 54 53 48 89 fb 48 83 c7
      50 48 89 fa 48 c1 ea 03 80 3c 02 00 74 05 e8 94 d4 16 e7 48 83 7b 50 00
      74 02 <0f> ff 4c 8d 6b 58 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1
      [  308.534086] ---[ end trace 77f11e53b1df44ad ]---
      
      Solve this by adding the missing return.
      
      This is also a bugfix because we could end up rejecting updates with
      -EINVAL because of a early -EDEADLK, while if atomic_check ran to
      completion it might have downgraded the modeset to a fastset.
      Signed-off-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Testcase: kms_atomic
      Link: https://patchwork.freedesktop.org/patch/msgid/20170815095706.23624-1-maarten.lankhorst@linux.intel.com
      Fixes: d34f20d6 ("drm: Atomic modeset ioctl")
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d76df456
    • Maarten Lankhorst's avatar
      drm/atomic: Handle -EDEADLK with out-fences correctly · 247122f1
      Maarten Lankhorst authored
      commit 7f5d6dac upstream.
      
      complete_crtc_signaling is freeing fence_state, but when retrying
      num_fences and fence_state are not zero'd. This caused duplicate
      fd's in the fence_state array, followed by a BUG_ON in fs/file.c
      because we reallocate freed memory, and installing over an existing
      fd, or potential other fun.
      
      Zero fence_state and num_fences correctly in the retry loop, which
      allows kms_atomic_transition to pass.
      
      Fixes: beaf5af4 ("drm/fence: add out-fences support")
      Cc: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
      Cc: Brian Starkey <brian.starkey@arm.com> (v10)
      Cc: Sean Paul <seanpaul@chromium.org>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: David Airlie <airlied@linux.ie>
      Signed-off-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Testcase: kms_atomic_transitions.plane-all-modeset-transition-fencing
      (with CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y)
      Link: https://patchwork.freedesktop.org/patch/msgid/20170814100721.13340-1-maarten.lankhorst@linux.intel.com
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> #intel-gfx on irc
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      247122f1
    • Jonathan Liu's avatar
      drm/sun4i: Implement drm_driver lastclose to restore fbdev console · d4ae641c
      Jonathan Liu authored
      commit 2a596fc9 upstream.
      
      The drm_driver lastclose callback is called when the last userspace
      DRM client has closed. Call drm_fbdev_cma_restore_mode to restore
      the fbdev console otherwise the fbdev console will stop working.
      
      Fixes: 9026e0d1 ("drm: Add Allwinner A10 Display Engine support")
      Tested-by: default avatarOlliver Schinagl <oliver@schinagl.nl>
      Reviewed-by: default avatarChen-Yu Tsai <wens@csie.org>
      Signed-off-by: default avatarJonathan Liu <net147@gmail.com>
      Signed-off-by: default avatarMaxime Ripard <maxime.ripard@free-electrons.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d4ae641c
    • Chris Wilson's avatar
      drm: Release driver tracking before making the object available again · 08353913
      Chris Wilson authored
      commit fe4600a5 upstream.
      
      This is the same bug as we fixed in commit f6cd7dae ("drm: Release
      driver references to handle before making it available again"), but now
      the exposure is via the PRIME lookup tables. If we remove the
      object/handle from the PRIME lut, then a new request for the same
      object/fd will generate a new handle, thus for a short window that
      object is known to userspace by two different handles. Fix this by
      releasing the driver tracking before PRIME.
      
      Fixes: 0ff926c7 ("drm/prime: add exported buffers to current fprivs
      imported buffer list (v2)")
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Daniel Vetter <daniel.vetter@intel.com>
      Cc: Rob Clark <robdclark@gmail.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Thierry Reding <treding@nvidia.com>
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20170819120558.6465-1-chris@chris-wilson.co.ukSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      08353913
    • Nikhil Mahale's avatar
      drm: Fix framebuffer leak · b96c1565
      Nikhil Mahale authored
      commit 491ab470 upstream.
      
      Do not leak framebuffer if client provided crtc id found invalid.
      Signed-off-by: default avatarNikhil Mahale <nmahale@nvidia.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/msgid/1502250781-5779-1-git-send-email-nmahale@nvidia.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b96c1565
    • Dave Martin's avatar
      arm64: fpsimd: Prevent registers leaking across exec · 865d89f8
      Dave Martin authored
      commit 09662210 upstream.
      
      There are some tricky dependencies between the different stages of
      flushing the FPSIMD register state during exec, and these can race
      with context switch in ways that can cause the old task's regs to
      leak across.  In particular, a context switch during the memset() can
      cause some of the task's old FPSIMD registers to reappear.
      
      Disabling preemption for this small window would be no big deal for
      performance: preemption is already disabled for similar scenarios
      like updating the FPSIMD registers in sigreturn.
      
      So, instead of rearranging things in ways that might swap existing
      subtle bugs for new ones, this patch just disables preemption
      around the FPSIMD state flushing so that races of this type can't
      occur here.  This brings fpsimd_flush_thread() into line with other
      code paths.
      
      Fixes: 674c242c ("arm64: flush FP/SIMD state correctly after execve()")
      Reviewed-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarDave Martin <Dave.Martin@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      865d89f8
    • Pavel Tatashin's avatar
      mm/memblock.c: reversed logic in memblock_discard() · 1c229d7a
      Pavel Tatashin authored
      commit 91b540f9 upstream.
      
      In recently introduced memblock_discard() there is a reversed logic bug.
      Memory is freed of static array instead of dynamically allocated one.
      
      Link: http://lkml.kernel.org/r/1503511441-95478-2-git-send-email-pasha.tatashin@oracle.com
      Fixes: 3010f876 ("mm: discard memblock data later")
      Signed-off-by: default avatarPavel Tatashin <pasha.tatashin@oracle.com>
      Reported-by: default avatarWoody Suwalski <terraluna977@gmail.com>
      Tested-by: default avatarWoody Suwalski <terraluna977@gmail.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1c229d7a
    • Eric Biggers's avatar
      fork: fix incorrect fput of ->exe_file causing use-after-free · f5024bb3
      Eric Biggers authored
      commit 2b7e8665 upstream.
      
      Commit 7c051267 ("mm, fork: make dup_mmap wait for mmap_sem for
      write killable") made it possible to kill a forking task while it is
      waiting to acquire its ->mmap_sem for write, in dup_mmap().
      
      However, it was overlooked that this introduced an new error path before
      a reference is taken on the mm_struct's ->exe_file.  Since the
      ->exe_file of the new mm_struct was already set to the old ->exe_file by
      the memcpy() in dup_mm(), it was possible for the mmput() in the error
      path of dup_mm() to drop a reference to ->exe_file which was never
      taken.
      
      This caused the struct file to later be freed prematurely.
      
      Fix it by updating mm_init() to NULL out the ->exe_file, in the same
      place it clears other things like the list of mmaps.
      
      This bug was found by syzkaller.  It can be reproduced using the
      following C program:
      
          #define _GNU_SOURCE
          #include <pthread.h>
          #include <stdlib.h>
          #include <sys/mman.h>
          #include <sys/syscall.h>
          #include <sys/wait.h>
          #include <unistd.h>
      
          static void *mmap_thread(void *_arg)
          {
              for (;;) {
                  mmap(NULL, 0x1000000, PROT_READ,
                       MAP_POPULATE|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
              }
          }
      
          static void *fork_thread(void *_arg)
          {
              usleep(rand() % 10000);
              fork();
          }
      
          int main(void)
          {
              fork();
              fork();
              fork();
              for (;;) {
                  if (fork() == 0) {
                      pthread_t t;
      
                      pthread_create(&t, NULL, mmap_thread, NULL);
                      pthread_create(&t, NULL, fork_thread, NULL);
                      usleep(rand() % 10000);
                      syscall(__NR_exit_group, 0);
                  }
                  wait(NULL);
              }
          }
      
      No special kernel config options are needed.  It usually causes a NULL
      pointer dereference in __remove_shared_vm_struct() during exit, or in
      dup_mmap() (which is usually inlined into copy_process()) during fork.
      Both are due to a vm_area_struct's ->vm_file being used after it's
      already been freed.
      
      Google Bug Id: 64772007
      
      Link: http://lkml.kernel.org/r/20170823211408.31198-1-ebiggers3@gmail.com
      Fixes: 7c051267 ("mm, fork: make dup_mmap wait for mmap_sem for write killable")
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Tested-by: default avatarMark Rutland <mark.rutland@arm.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5024bb3
    • Eric Biggers's avatar
      mm/madvise.c: fix freeing of locked page with MADV_FREE · 4823f463
      Eric Biggers authored
      commit 263630e8 upstream.
      
      If madvise(..., MADV_FREE) split a transparent hugepage, it called
      put_page() before unlock_page().
      
      This was wrong because put_page() can free the page, e.g. if a
      concurrent madvise(..., MADV_DONTNEED) has removed it from the memory
      mapping. put_page() then rightfully complained about freeing a locked
      page.
      
      Fix this by moving the unlock_page() before put_page().
      
      This bug was found by syzkaller, which encountered the following splat:
      
          BUG: Bad page state in process syzkaller412798  pfn:1bd800
          page:ffffea0006f60000 count:0 mapcount:0 mapping:          (null) index:0x20a00
          flags: 0x200000000040019(locked|uptodate|dirty|swapbacked)
          raw: 0200000000040019 0000000000000000 0000000000020a00 00000000ffffffff
          raw: ffffea0006f60020 ffffea0006f60020 0000000000000000 0000000000000000
          page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
          bad because of flags: 0x1(locked)
          Modules linked in:
          CPU: 1 PID: 3037 Comm: syzkaller412798 Not tainted 4.13.0-rc5+ #35
          Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
          Call Trace:
           __dump_stack lib/dump_stack.c:16 [inline]
           dump_stack+0x194/0x257 lib/dump_stack.c:52
           bad_page+0x230/0x2b0 mm/page_alloc.c:565
           free_pages_check_bad+0x1f0/0x2e0 mm/page_alloc.c:943
           free_pages_check mm/page_alloc.c:952 [inline]
           free_pages_prepare mm/page_alloc.c:1043 [inline]
           free_pcp_prepare mm/page_alloc.c:1068 [inline]
           free_hot_cold_page+0x8cf/0x12b0 mm/page_alloc.c:2584
           __put_single_page mm/swap.c:79 [inline]
           __put_page+0xfb/0x160 mm/swap.c:113
           put_page include/linux/mm.h:814 [inline]
           madvise_free_pte_range+0x137a/0x1ec0 mm/madvise.c:371
           walk_pmd_range mm/pagewalk.c:50 [inline]
           walk_pud_range mm/pagewalk.c:108 [inline]
           walk_p4d_range mm/pagewalk.c:134 [inline]
           walk_pgd_range mm/pagewalk.c:160 [inline]
           __walk_page_range+0xc3a/0x1450 mm/pagewalk.c:249
           walk_page_range+0x200/0x470 mm/pagewalk.c:326
           madvise_free_page_range.isra.9+0x17d/0x230 mm/madvise.c:444
           madvise_free_single_vma+0x353/0x580 mm/madvise.c:471
           madvise_dontneed_free mm/madvise.c:555 [inline]
           madvise_vma mm/madvise.c:664 [inline]
           SYSC_madvise mm/madvise.c:832 [inline]
           SyS_madvise+0x7d3/0x13c0 mm/madvise.c:760
           entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Here is a C reproducer:
      
          #define _GNU_SOURCE
          #include <pthread.h>
          #include <sys/mman.h>
          #include <unistd.h>
      
          #define MADV_FREE	8
          #define PAGE_SIZE	4096
      
          static void *mapping;
          static const size_t mapping_size = 0x1000000;
      
          static void *madvise_thrproc(void *arg)
          {
              madvise(mapping, mapping_size, (long)arg);
          }
      
          int main(void)
          {
              pthread_t t[2];
      
              for (;;) {
                  mapping = mmap(NULL, mapping_size, PROT_WRITE,
                                 MAP_POPULATE|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
      
                  munmap(mapping + mapping_size / 2, PAGE_SIZE);
      
                  pthread_create(&t[0], 0, madvise_thrproc, (void*)MADV_DONTNEED);
                  pthread_create(&t[1], 0, madvise_thrproc, (void*)MADV_FREE);
                  pthread_join(t[0], NULL);
                  pthread_join(t[1], NULL);
                  munmap(mapping, mapping_size);
              }
          }
      
      Note: to see the splat, CONFIG_TRANSPARENT_HUGEPAGE=y and
      CONFIG_DEBUG_VM=y are needed.
      
      Google Bug Id: 64696096
      
      Link: http://lkml.kernel.org/r/20170823205235.132061-1-ebiggers3@gmail.com
      Fixes: 854e9ed0 ("mm: support madvise(MADV_FREE)")
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Acked-by: default avatarMinchan Kim <minchan@kernel.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4823f463
    • Ulf Hansson's avatar
      i2c: designware: Fix system suspend · c237efed
      Ulf Hansson authored
      commit a23318fe upstream.
      
      The commit 8503ff16 ("i2c: designware: Avoid unnecessary resuming
      during system suspend"), may suggest to the PM core to try out the so
      called direct_complete path for system sleep. In this path, the PM core
      treats a runtime suspended device as it's already in a proper low power
      state for system sleep, which makes it skip calling the system sleep
      callbacks for the device, except for the ->prepare() and the ->complete()
      callbacks.
      
      However, the PM core may unset the direct_complete flag for a parent
      device, in case its child device are being system suspended before. In this
      scenario, the PM core invokes the system sleep callbacks, no matter if the
      device is runtime suspended or not.
      
      Particularly in cases of an existing i2c slave device, the above path is
      triggered, which breaks the assumption that the i2c device is always
      runtime resumed whenever the dw_i2c_plat_suspend() is being called.
      
      More precisely, dw_i2c_plat_suspend() calls clk_core_disable() and
      clk_core_unprepare(), for an already disabled/unprepared clock, leading to
      a splat in the log about clocks calls being wrongly balanced and breaking
      system sleep.
      
      To still allow the direct_complete path in cases when it's possible, but
      also to keep the fix simple, let's runtime resume the i2c device in the
      ->suspend() callback, before continuing to put the device into low power
      state.
      
      Note, in cases when the i2c device is attached to the ACPI PM domain, this
      problem doesn't occur, because ACPI's ->suspend() callback, assigned to
      acpi_subsys_suspend(), already calls pm_runtime_resume() for the device.
      
      It should also be noted that this change does not fix commit 8503ff16
      ("i2c: designware: Avoid unnecessary resuming during system suspend").
      Because for the non-ACPI case, the system sleep support was already broken
      prior that point.
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Acked-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Tested-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Tested-by: default avatarJarkko Nikula <jarkko.nikula@linux.intel.com>
      Acked-by: default avatarJarkko Nikula <jarkko.nikula@linux.intel.com>
      Reviewed-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: default avatarWolfram Sang <wsa@the-dreams.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c237efed
    • Ross Zwisler's avatar
      dax: fix deadlock due to misaligned PMD faults · 3a9495fd
      Ross Zwisler authored
      commit fffa281b upstream.
      
      In DAX there are two separate places where the 2MiB range of a PMD is
      defined.
      
      The first is in the page tables, where a PMD mapping inserted for a
      given address spans from (vmf->address & PMD_MASK) to ((vmf->address &
      PMD_MASK) + PMD_SIZE - 1).  That is, from the 2MiB boundary below the
      address to the 2MiB boundary above the address.
      
      So, for example, a fault at address 3MiB (0x30 0000) falls within the
      PMD that ranges from 2MiB (0x20 0000) to 4MiB (0x40 0000).
      
      The second PMD range is in the mapping->page_tree, where a given file
      offset is covered by a radix tree entry that spans from one 2MiB aligned
      file offset to another 2MiB aligned file offset.
      
      So, for example, the file offset for 3MiB (pgoff 768) falls within the
      PMD range for the order 9 radix tree entry that ranges from 2MiB (pgoff
      512) to 4MiB (pgoff 1024).
      
      This system works so long as the addresses and file offsets for a given
      mapping both have the same offsets relative to the start of each PMD.
      
      Consider the case where the starting address for a given file isn't 2MiB
      aligned - say our faulting address is 3 MiB (0x30 0000), but that
      corresponds to the beginning of our file (pgoff 0).  Now all the PMDs in
      the mapping are misaligned so that the 2MiB range defined in the page
      tables never matches up with the 2MiB range defined in the radix tree.
      
      The current code notices this case for DAX faults to storage with the
      following test in dax_pmd_insert_mapping():
      
      	if (pfn_t_to_pfn(pfn) & PG_PMD_COLOUR)
      		goto unlock_fallback;
      
      This test makes sure that the pfn we get from the driver is 2MiB
      aligned, and relies on the assumption that the 2MiB alignment of the pfn
      we get back from the driver matches the 2MiB alignment of the faulting
      address.
      
      However, faults to holes were not checked and we could hit the problem
      described above.
      
      This was reported in response to the NVML nvml/src/test/pmempool_sync
      TEST5:
      
      	$ cd nvml/src/test/pmempool_sync
      	$ make TEST5
      
      You can grab NVML here:
      
      	https://github.com/pmem/nvml/
      
      The dmesg warning you see when you hit this error is:
      
        WARNING: CPU: 13 PID: 2900 at fs/dax.c:641 dax_insert_mapping_entry+0x2df/0x310
      
      Where we notice in dax_insert_mapping_entry() that the radix tree entry
      we are about to replace doesn't match the locked entry that we had
      previously inserted into the tree.  This happens because the initial
      insertion was done in grab_mapping_entry() using a pgoff calculated from
      the faulting address (vmf->address), and the replacement in
      dax_pmd_load_hole() => dax_insert_mapping_entry() is done using
      vmf->pgoff.
      
      In our failure case those two page offsets (one calculated from
      vmf->address, one using vmf->pgoff) point to different order 9 radix
      tree entries.
      
      This failure case can result in a deadlock because the radix tree unlock
      also happens on the pgoff calculated from vmf->address.  This means that
      the locked radix tree entry that we swapped in to the tree in
      dax_insert_mapping_entry() using vmf->pgoff is never unlocked, so all
      future faults to that 2MiB range will block forever.
      
      Fix this by validating that the faulting address's PMD offset matches
      the PMD offset from the start of the file.  This check is done at the
      very beginning of the fault and covers faults that would have mapped to
      storage as well as faults to holes.  I left the COLOUR check in
      dax_pmd_insert_mapping() in place in case we ever hit the insanity
      condition where the alignment of the pfn we get from the driver doesn't
      match the alignment of the userspace address.
      
      Link: http://lkml.kernel.org/r/20170822222436.18926-1-ross.zwisler@linux.intel.comSigned-off-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: default avatar"Slusarz, Marcin" <marcin.slusarz@intel.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a9495fd
    • Kirill A. Shutemov's avatar
      mm, shmem: fix handling /sys/kernel/mm/transparent_hugepage/shmem_enabled · 735a252f
      Kirill A. Shutemov authored
      commit 435c0b87 upstream.
      
      /sys/kernel/mm/transparent_hugepage/shmem_enabled controls if we want
      to allocate huge pages when allocate pages for private in-kernel shmem
      mount.
      
      Unfortunately, as Dan noticed, I've screwed it up and the only way to
      make kernel allocate huge page for the mount is to use "force" there.
      All other values will be effectively ignored.
      
      Link: http://lkml.kernel.org/r/20170822144254.66431-1-kirill.shutemov@linux.intel.com
      Fixes: 5a6e75f8 ("shmem: prepare huge= mount option and sysfs knob")
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      735a252f
    • Chen Yu's avatar
      PM/hibernate: touch NMI watchdog when creating snapshot · b2719637
      Chen Yu authored
      commit 556b969a upstream.
      
      There is a problem that when counting the pages for creating the
      hibernation snapshot will take significant amount of time, especially on
      system with large memory.  Since the counting job is performed with irq
      disabled, this might lead to NMI lockup.  The following warning were
      found on a system with 1.5TB DRAM:
      
        Freezing user space processes ... (elapsed 0.002 seconds) done.
        OOM killer disabled.
        PM: Preallocating image memory...
        NMI watchdog: Watchdog detected hard LOCKUP on cpu 27
        CPU: 27 PID: 3128 Comm: systemd-sleep Not tainted 4.13.0-0.rc2.git0.1.fc27.x86_64 #1
        task: ffff9f01971ac000 task.stack: ffffb1a3f325c000
        RIP: 0010:memory_bm_find_bit+0xf4/0x100
        Call Trace:
         swsusp_set_page_free+0x2b/0x30
         mark_free_pages+0x147/0x1c0
         count_data_pages+0x41/0xa0
         hibernate_preallocate_memory+0x80/0x450
         hibernation_snapshot+0x58/0x410
         hibernate+0x17c/0x310
         state_store+0xdf/0xf0
         kobj_attr_store+0xf/0x20
         sysfs_kf_write+0x37/0x40
         kernfs_fop_write+0x11c/0x1a0
         __vfs_write+0x37/0x170
         vfs_write+0xb1/0x1a0
         SyS_write+0x55/0xc0
         entry_SYSCALL_64_fastpath+0x1a/0xa5
        ...
        done (allocated 6590003 pages)
        PM: Allocated 26360012 kbytes in 19.89 seconds (1325.28 MB/s)
      
      It has taken nearly 20 seconds(2.10GHz CPU) thus the NMI lockup was
      triggered.  In case the timeout of the NMI watch dog has been set to 1
      second, a safe interval should be 6590003/20 = 320k pages in theory.
      However there might also be some platforms running at a lower frequency,
      so feed the watchdog every 100k pages.
      
      [yu.c.chen@intel.com: simplification]
        Link: http://lkml.kernel.org/r/1503460079-29721-1-git-send-email-yu.c.chen@intel.com
      [yu.c.chen@intel.com: use interval of 128k instead of 100k to avoid modulus]
      Link: http://lkml.kernel.org/r/1503328098-5120-1-git-send-email-yu.c.chen@intel.comSigned-off-by: default avatarChen Yu <yu.c.chen@intel.com>
      Reported-by: default avatarJan Filipcewicz <jan.filipcewicz@intel.com>
      Suggested-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2719637
    • Vineet Gupta's avatar
      ARCv2: PAE40: set MSB even if !CONFIG_ARC_HAS_PAE40 but PAE exists in SoC · 8b366972
      Vineet Gupta authored
      commit b5ddb6d5 upstream.
      
      PAE40 confiuration in hardware extends some of the address registers
      for TLB/cache ops to 2 words.
      
      So far kernel was NOT setting the higher word if feature was not enabled
      in software which is wrong. Those need to be set to 0 in such case.
      
      Normally this would be done in the cache flush / tlb ops, however since
      these registers only exist conditionally, this would have to be
      conditional to a flag being set on boot which is expensive/ugly -
      specially for the more common case of PAE exists but not in use.
      Optimize that by zero'ing them once at boot - nobody will write to
      them afterwards
      Signed-off-by: default avatarVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b366972
    • Alexey Brodkin's avatar
      ARCv2: PAE40: Explicitly set MSB counterpart of SLC region ops addresses · fcedf2f2
      Alexey Brodkin authored
      commit 7d79cee2 upstream.
      
      It is necessary to explicitly set both SLC_AUX_RGN_START1 and SLC_AUX_RGN_END1
      which hold MSB bits of the physical address correspondingly of region start
      and end otherwise SLC region operation is executed in unpredictable manner
      
      Without this patch, SLC flushes on HSDK (IOC disabled) were taking
      seconds.
      Reported-by: default avatarVladimir Kondratiev <vladimir.kondratiev@intel.com>
      Signed-off-by: default avatarAlexey Brodkin <abrodkin@synopsys.com>
      Signed-off-by: default avatarVineet Gupta <vgupta@synopsys.com>
      [vgupta: PAR40 regs only written if PAE40 exist]
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fcedf2f2
    • Alexey Brodkin's avatar
      ARCv2: SLC: Make sure busy bit is set properly for region ops · 763ad317
      Alexey Brodkin authored
      commit b37174d9 upstream.
      
      c70c4733 "ARCv2: SLC: Make sure busy bit is set properly on SLC flushing"
      fixes problem for entire SLC operation where the problem was initially
      caught. But given a nature of the issue it is perfectly possible for
      busy bit to be read incorrectly even when region operation was started.
      
      So extending initial fix for regional operation as well.
      Signed-off-by: default avatarAlexey Brodkin <abrodkin@synopsys.com>
      Signed-off-by: default avatarVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      763ad317
    • Takashi Sakamoto's avatar
      ALSA: firewire-motu: destroy stream data surely at failure of card initialization · 8537b1e0
      Takashi Sakamoto authored
      commit dbd7396b upstream.
      
      When failing sound card registration after initializing stream data, this
      module leaves allocated data in stream data. This commit fixes the bug.
      
      Fixes: 9b2bb4f2 ('ALSA: firewire-motu: add stream management functionality')
      Signed-off-by: default avatarTakashi Sakamoto <o-takashi@sakamocchi.jp>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8537b1e0
    • Takashi Sakamoto's avatar
      ALSA: firewire: fix NULL pointer dereference when releasing uninitialized data of iso-resource · 59d00061
      Takashi Sakamoto authored
      commit 0c264af7 upstream.
      
      When calling 'iso_resource_free()' for uninitialized data, this function
      causes NULL pointer dereference due to its 'unit' member. This occurs when
      unplugging audio and music units on IEEE 1394 bus at failure of card
      registration.
      
      This commit fixes the bug. The bug exists since kernel v4.5.
      
      Fixes: 324540c4 ('ALSA: fireface: postpone sound card registration') at v4.12
      Fixes: 8865a31e ('ALSA: firewire-motu: postpone sound card registration') at v4.12
      Fixes: b610386c ('ALSA: firewire-tascam: deleyed registration of sound card') at v4.7
      Fixes: 86c8dd7f ('ALSA: firewire-digi00x: delayed registration of sound card') at v4.7
      Fixes: 6c29230e ('ALSA: oxfw: delayed registration of sound card') at v4.7
      Fixes: 7d3c1d59 ('ALSA: fireworks: delayed registration of sound card') at v4.7
      Fixes: 04a2c73c ('ALSA: bebob: delayed registration of sound card') at v4.7
      Fixes: b59fb190 ('ALSA: dice: postpone card registration') at v4.5
      Signed-off-by: default avatarTakashi Sakamoto <o-takashi@sakamocchi.jp>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      59d00061
    • Takashi Iwai's avatar
      ALSA: hda - Add stereo mic quirk for Lenovo G50-70 (17aa:3978) · 2f45c61b
      Takashi Iwai authored
      commit bbba6f9d upstream.
      
      Lenovo G50-70 (17aa:3978) with Conexant codec chip requires the
      similar workaround for the inverted stereo dmic like other Lenovo
      models.
      
      Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1020657Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2f45c61b
    • Takashi Iwai's avatar
      ALSA: core: Fix unexpected error at replacing user TLV · ba6b08b6
      Takashi Iwai authored
      commit 88c54cdf upstream.
      
      When user tries to replace the user-defined control TLV, the kernel
      checks the change of its content via memcmp().  The problem is that
      the kernel passes the return value from memcmp() as is.  memcmp()
      gives a non-zero negative value depending on the comparison result,
      and this shall be recognized as an error code.
      
      The patch covers that corner-case, return 1 properly for the changed
      TLV.
      
      Fixes: 8aa9b586 ("[ALSA] Control API - more robust TLV implementation")
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ba6b08b6
    • Joakim Tjernlund's avatar
      ALSA: usb-audio: Add delay quirk for H650e/Jabra 550a USB headsets · 1157dcda
      Joakim Tjernlund authored
      commit 07b3b5e9 upstream.
      
      These headsets reports a lot of: cannot set freq 44100 to ep 0x81
      and need a small delay between sample rate settings, just like
      Zoom R16/24. Add both headsets to the Zoom R16/24 quirk for
      a 1 ms delay between control msgs.
      Signed-off-by: default avatarJoakim Tjernlund <joakim.tjernlund@infinera.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1157dcda
    • Paolo Bonzini's avatar
      KVM: x86: block guest protection keys unless the host has them enabled · 2f76f62a
      Paolo Bonzini authored
      commit c469268c upstream.
      
      If the host has protection keys disabled, we cannot read and write the
      guest PKRU---RDPKRU and WRPKRU fail with #GP(0) if CR4.PKE=0.  Block
      the PKU cpuid bit in that case.
      
      This ensures that guest_CR4.PKE=1 implies host_CR4.PKE=1.
      
      Fixes: 1be0e61cReviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2f76f62a
    • Paolo Bonzini's avatar
      KVM, pkeys: do not use PKRU value in vcpu->arch.guest_fpu.state · 3c498d4b
      Paolo Bonzini authored
      commit 38cfd5e3 upstream.
      
      The host pkru is restored right after vcpu exit (commit 1be0e61c), so
      KVM_GET_XSAVE will return the host PKRU value instead.  Fix this by
      using the guest PKRU explicitly in fill_xsave and load_xsave.  This
      part is based on a patch by Junkang Fu.
      
      The host PKRU data may also not match the value in vcpu->arch.guest_fpu.state,
      because it could have been changed by userspace since the last time
      it was saved, so skip loading it in kvm_load_guest_fpu.
      Reported-by: default avatarJunkang Fu <junkang.fjk@alibaba-inc.com>
      Cc: Yang Zhang <zy107165@alibaba-inc.com>
      Fixes: 1be0e61cSigned-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3c498d4b
    • Paolo Bonzini's avatar
      KVM: x86: simplify handling of PKRU · d0e52c82
      Paolo Bonzini authored
      commit b9dd21e1 upstream.
      
      Move it to struct kvm_arch_vcpu, replacing guest_pkru_valid with a
      simple comparison against the host value of the register.  The write of
      PKRU in addition can be skipped if the guest has not enabled the feature.
      Once we do this, we need not test OSPKE in the host anymore, because
      guest_CR4.PKE=1 implies host_CR4.PKE=1.
      
      The static PKU test is kept to elide the code on older CPUs.
      Suggested-by: default avatarYang Zhang <zy107165@alibaba-inc.com>
      Fixes: 1be0e61cReviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d0e52c82
    • Heiko Carstens's avatar
      KVM: s390: sthyi: fix specification exception detection · 6dc06cd6
      Heiko Carstens authored
      commit 857b8de9 upstream.
      
      sthyi should only generate a specification exception if the function
      code is zero and the response buffer is not on a 4k boundary.
      
      The current code would also test for unknown function codes if the
      response buffer, that is currently only defined for function code 0,
      is not on a 4k boundary and incorrectly inject a specification
      exception instead of returning with condition code 3 and return code 4
      (unsupported function code).
      
      Fix this by moving the boundary check.
      
      Fixes: 95ca2cb5 ("KVM: s390: Add sthyi emulation")
      Reviewed-by: default avatarJanosch Frank <frankja@linux.vnet.ibm.com>
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarCornelia Huck <cohuck@redhat.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6dc06cd6
    • Heiko Carstens's avatar
      KVM: s390: sthyi: fix sthyi inline assembly · e516834a
      Heiko Carstens authored
      commit 4a4eefcd upstream.
      
      The sthyi inline assembly misses register r3 within the clobber
      list. The sthyi instruction will always write a return code to
      register "R2+1", which in this case would be r3. Due to that we may
      have register corruption and see host crashes or data corruption
      depending on how gcc decided to allocate and use registers during
      compile time.
      
      Fixes: 95ca2cb5 ("KVM: s390: Add sthyi emulation")
      Reviewed-by: default avatarJanosch Frank <frankja@linux.vnet.ibm.com>
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarCornelia Huck <cohuck@redhat.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e516834a
    • Masaki Ota's avatar
      Input: ALPS - fix two-finger scroll breakage in right side on ALPS touchpad · ddae9e6e
      Masaki Ota authored
      commit 4a646580 upstream.
      
      Fixed the issue that two finger scroll does not work correctly
      on V8 protocol. The cause is that V8 protocol X-coordinate decode
      is wrong at SS4 PLUS device. I added SS4 PLUS X decode definition.
      
      Mote notes:
      the problem manifests itself by the commit e7348396 ("Input: ALPS
      - fix V8+ protocol handling (73 03 28)"), where a fix for the V8+
      protocol was applied.  Although the culprit must have been present
      beforehand, the two-finger scroll worked casually even with the
      wrongly reported values by some reason.  It got broken by the commit
      above just because it changed x_max value, and this made libinput
      correctly figuring the MT events.  Since the X coord is reported as
      falsely doubled, the events on the right-half side go outside the
      boundary, thus they are no longer handled.  This resulted as a broken
      two-finger scroll.
      
      One finger event is decoded differently, and it didn't suffer from
      this problem.  The problem was only about MT events. --tiwai
      
      Fixes: e7348396 ("Input: ALPS - fix V8+ protocol handling (73 03 28)")
      Signed-off-by: default avatarMasaki Ota <masaki.ota@jp.alps.com>
      Tested-by: default avatarTakashi Iwai <tiwai@suse.de>
      Tested-by: default avatarPaul Donohue <linux-kernel@PaulSD.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ddae9e6e
    • KT Liao's avatar
      Input: elan_i2c - add ELAN0602 ACPI ID to support Lenovo Yoga310 · 8dcee8e8
      KT Liao authored
      commit 1d2226e4 upstream.
      
      Add ELAN0602 to the list of known ACPI IDs to enable support for ELAN
      touchpads found in Lenovo Yoga310.
      Signed-off-by: default avatarKT Liao <kt.liao@emc.com.tw>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8dcee8e8
    • Aaron Ma's avatar
      Input: trackpoint - add new trackpoint firmware ID · 38c36f9d
      Aaron Ma authored
      commit ec667683 upstream.
      
      Synaptics add new TP firmware ID: 0x2 and 0x3, for now both lower 2 bits
      are indicated as TP. Change the constant to bitwise values.
      
      This makes trackpoint to be recognized on Lenovo Carbon X1 Gen5 instead
      of it being identified as "PS/2 Generic Mouse".
      Signed-off-by: default avatarAaron Ma <aaron.ma@canonical.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      38c36f9d
    • Edward Cree's avatar
      bpf/verifier: fix min/max handling in BPF_SUB · c9c682f3
      Edward Cree authored
      
      [ Upstream commit 9305706c ]
      
      We have to subtract the src max from the dst min, and vice-versa, since
       (e.g.) the smallest result comes from the largest subtrahend.
      
      Fixes: 48461135 ("bpf: allow access into map value arrays")
      Signed-off-by: default avatarEdward Cree <ecree@solarflare.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c9c682f3
    • Daniel Borkmann's avatar
      bpf: fix mixed signed/unsigned derived min/max value bounds · eb6cf01c
      Daniel Borkmann authored
      
      [ Upstream commit 4cabc5b1 ]
      
      Edward reported that there's an issue in min/max value bounds
      tracking when signed and unsigned compares both provide hints
      on limits when having unknown variables. E.g. a program such
      as the following should have been rejected:
      
         0: (7a) *(u64 *)(r10 -8) = 0
         1: (bf) r2 = r10
         2: (07) r2 += -8
         3: (18) r1 = 0xffff8a94cda93400
         5: (85) call bpf_map_lookup_elem#1
         6: (15) if r0 == 0x0 goto pc+7
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
         7: (7a) *(u64 *)(r10 -16) = -8
         8: (79) r1 = *(u64 *)(r10 -16)
         9: (b7) r2 = -1
        10: (2d) if r1 > r2 goto pc+3
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0
        R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
        11: (65) if r1 s> 0x1 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0,max_value=1
        R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
        12: (0f) r0 += r1
        13: (72) *(u8 *)(r0 +0) = 0
        R0=map_value_adj(ks=8,vs=8,id=0),min_value=0,max_value=1 R1=inv,min_value=0,max_value=1
        R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
        14: (b7) r0 = 0
        15: (95) exit
      
      What happens is that in the first part ...
      
         8: (79) r1 = *(u64 *)(r10 -16)
         9: (b7) r2 = -1
        10: (2d) if r1 > r2 goto pc+3
      
      ... r1 carries an unsigned value, and is compared as unsigned
      against a register carrying an immediate. Verifier deduces in
      reg_set_min_max() that since the compare is unsigned and operation
      is greater than (>), that in the fall-through/false case, r1's
      minimum bound must be 0 and maximum bound must be r2. Latter is
      larger than the bound and thus max value is reset back to being
      'invalid' aka BPF_REGISTER_MAX_RANGE. Thus, r1 state is now
      'R1=inv,min_value=0'. The subsequent test ...
      
        11: (65) if r1 s> 0x1 goto pc+2
      
      ... is a signed compare of r1 with immediate value 1. Here,
      verifier deduces in reg_set_min_max() that since the compare
      is signed this time and operation is greater than (>), that
      in the fall-through/false case, we can deduce that r1's maximum
      bound must be 1, meaning with prior test, we result in r1 having
      the following state: R1=inv,min_value=0,max_value=1. Given that
      the actual value this holds is -8, the bounds are wrongly deduced.
      When this is being added to r0 which holds the map_value(_adj)
      type, then subsequent store access in above case will go through
      check_mem_access() which invokes check_map_access_adj(), that
      will then probe whether the map memory is in bounds based
      on the min_value and max_value as well as access size since
      the actual unknown value is min_value <= x <= max_value; commit
      fce366a9 ("bpf, verifier: fix alu ops against map_value{,
      _adj} register types") provides some more explanation on the
      semantics.
      
      It's worth to note in this context that in the current code,
      min_value and max_value tracking are used for two things, i)
      dynamic map value access via check_map_access_adj() and since
      commit 06c1c049 ("bpf: allow helpers access to variable memory")
      ii) also enforced at check_helper_mem_access() when passing a
      memory address (pointer to packet, map value, stack) and length
      pair to a helper and the length in this case is an unknown value
      defining an access range through min_value/max_value in that
      case. The min_value/max_value tracking is /not/ used in the
      direct packet access case to track ranges. However, the issue
      also affects case ii), for example, the following crafted program
      based on the same principle must be rejected as well:
      
         0: (b7) r2 = 0
         1: (bf) r3 = r10
         2: (07) r3 += -512
         3: (7a) *(u64 *)(r10 -16) = -8
         4: (79) r4 = *(u64 *)(r10 -16)
         5: (b7) r6 = -1
         6: (2d) if r4 > r6 goto pc+5
        R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512
        R4=inv,min_value=0 R6=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
         7: (65) if r4 s> 0x1 goto pc+4
        R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512
        R4=inv,min_value=0,max_value=1 R6=imm-1,max_value=18446744073709551615,min_align=1
        R10=fp
         8: (07) r4 += 1
         9: (b7) r5 = 0
        10: (6a) *(u16 *)(r10 -512) = 0
        11: (85) call bpf_skb_load_bytes#26
        12: (b7) r0 = 0
        13: (95) exit
      
      Meaning, while we initialize the max_value stack slot that the
      verifier thinks we access in the [1,2] range, in reality we
      pass -7 as length which is interpreted as u32 in the helper.
      Thus, this issue is relevant also for the case of helper ranges.
      Resetting both bounds in check_reg_overflow() in case only one
      of them exceeds limits is also not enough as similar test can be
      created that uses values which are within range, thus also here
      learned min value in r1 is incorrect when mixed with later signed
      test to create a range:
      
         0: (7a) *(u64 *)(r10 -8) = 0
         1: (bf) r2 = r10
         2: (07) r2 += -8
         3: (18) r1 = 0xffff880ad081fa00
         5: (85) call bpf_map_lookup_elem#1
         6: (15) if r0 == 0x0 goto pc+7
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
         7: (7a) *(u64 *)(r10 -16) = -8
         8: (79) r1 = *(u64 *)(r10 -16)
         9: (b7) r2 = 2
        10: (3d) if r2 >= r1 goto pc+3
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
        11: (65) if r1 s> 0x4 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0
        R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
        12: (0f) r0 += r1
        13: (72) *(u8 *)(r0 +0) = 0
        R0=map_value_adj(ks=8,vs=8,id=0),min_value=3,max_value=4
        R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
        14: (b7) r0 = 0
        15: (95) exit
      
      This leaves us with two options for fixing this: i) to invalidate
      all prior learned information once we switch signed context, ii)
      to track min/max signed and unsigned boundaries separately as
      done in [0]. (Given latter introduces major changes throughout
      the whole verifier, it's rather net-next material, thus this
      patch follows option i), meaning we can derive bounds either
      from only signed tests or only unsigned tests.) There is still the
      case of adjust_reg_min_max_vals(), where we adjust bounds on ALU
      operations, meaning programs like the following where boundaries
      on the reg get mixed in context later on when bounds are merged
      on the dst reg must get rejected, too:
      
         0: (7a) *(u64 *)(r10 -8) = 0
         1: (bf) r2 = r10
         2: (07) r2 += -8
         3: (18) r1 = 0xffff89b2bf87ce00
         5: (85) call bpf_map_lookup_elem#1
         6: (15) if r0 == 0x0 goto pc+6
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
         7: (7a) *(u64 *)(r10 -16) = -8
         8: (79) r1 = *(u64 *)(r10 -16)
         9: (b7) r2 = 2
        10: (3d) if r2 >= r1 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
        11: (b7) r7 = 1
        12: (65) if r7 s> 0x0 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,max_value=0 R10=fp
        13: (b7) r0 = 0
        14: (95) exit
      
        from 12 to 15: R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0
        R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,min_value=1 R10=fp
        15: (0f) r7 += r1
        16: (65) if r7 s> 0x4 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp
        17: (0f) r0 += r7
        18: (72) *(u8 *)(r0 +0) = 0
        R0=map_value_adj(ks=8,vs=8,id=0),min_value=4,max_value=4 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp
        19: (b7) r0 = 0
        20: (95) exit
      
      Meaning, in adjust_reg_min_max_vals() we must also reset range
      values on the dst when src/dst registers have mixed signed/
      unsigned derived min/max value bounds with one unbounded value
      as otherwise they can be added together deducing false boundaries.
      Once both boundaries are established from either ALU ops or
      compare operations w/o mixing signed/unsigned insns, then they
      can safely be added to other regs also having both boundaries
      established. Adding regs with one unbounded side to a map value
      where the bounded side has been learned w/o mixing ops is
      possible, but the resulting map value won't recover from that,
      meaning such op is considered invalid on the time of actual
      access. Invalid bounds are set on the dst reg in case i) src reg,
      or ii) in case dst reg already had them. The only way to recover
      would be to perform i) ALU ops but only 'add' is allowed on map
      value types or ii) comparisons, but these are disallowed on
      pointers in case they span a range. This is fine as only BPF_JEQ
      and BPF_JNE may be performed on PTR_TO_MAP_VALUE_OR_NULL registers
      which potentially turn them into PTR_TO_MAP_VALUE type depending
      on the branch, so only here min/max value cannot be invalidated
      for them.
      
      In terms of state pruning, value_from_signed is considered
      as well in states_equal() when dealing with adjusted map values.
      With regards to breaking existing programs, there is a small
      risk, but use-cases are rather quite narrow where this could
      occur and mixing compares probably unlikely.
      
      Joint work with Josef and Edward.
      
        [0] https://lists.iovisor.org/pipermail/iovisor-dev/2017-June/000822.html
      
      Fixes: 48461135 ("bpf: allow access into map value arrays")
      Reported-by: default avatarEdward Cree <ecree@solarflare.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarEdward Cree <ecree@solarflare.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eb6cf01c
    • John Fastabend's avatar
      bpf, verifier: add additional patterns to evaluate_reg_imm_alu · 659ee968
      John Fastabend authored
      
      [ Upstream commit 43188702 ]
      
      Currently the verifier does not track imm across alu operations when
      the source register is of unknown type. This adds additional pattern
      matching to catch this and track imm. We've seen LLVM generating this
      pattern while working on cilium.
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      659ee968
    • Konstantin Khlebnikov's avatar
      net_sched: fix order of queue length updates in qdisc_replace() · d8a4ae09
      Konstantin Khlebnikov authored
      
      [ Upstream commit 68a66d14 ]
      
      This important to call qdisc_tree_reduce_backlog() after changing queue
      length. Parent qdisc should deactivate class in ->qlen_notify() called from
      qdisc_tree_reduce_backlog() but this happens only if qdisc->q.qlen in zero.
      
      Missed class deactivations leads to crashes/warnings at picking packets
      from empty qdisc and corrupting state at reactivating this class in future.
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Fixes: 86a7996c ("net_sched: introduce qdisc_replace() helper")
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d8a4ae09
    • Xin Long's avatar
      net: sched: fix NULL pointer dereference when action calls some targets · 09e1d36d
      Xin Long authored
      
      [ Upstream commit 4f8a881a ]
      
      As we know in some target's checkentry it may dereference par.entryinfo
      to check entry stuff inside. But when sched action calls xt_check_target,
      par.entryinfo is set with NULL. It would cause kernel panic when calling
      some targets.
      
      It can be reproduce with:
        # tc qd add dev eth1 ingress handle ffff:
        # tc filter add dev eth1 parent ffff: u32 match u32 0 0 action xt \
          -j ECN --ecn-tcp-remove
      
      It could also crash kernel when using target CLUSTERIP or TPROXY.
      
      By now there's no proper value for par.entryinfo in ipt_init_target,
      but it can not be set with NULL. This patch is to void all these
      panics by setting it with an ipt_entry obj with all members = 0.
      
      Note that this issue has been there since the very beginning.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      09e1d36d
    • Colin Ian King's avatar
      irda: do not leak initialized list.dev to userspace · f4e4a296
      Colin Ian King authored
      
      [ Upstream commit b024d949 ]
      
      list.dev has not been initialized and so the copy_to_user is copying
      data from the stack back to user space which is a potential
      information leak. Fix this ensuring all of list is initialized to
      zero.
      
      Detected by CoverityScan, CID#1357894 ("Uninitialized scalar variable")
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4e4a296
    • Huy Nguyen's avatar
      net/mlx4_core: Enable 4K UAR if SRIOV module parameter is not enabled · 754df4da
      Huy Nguyen authored
      
      [ Upstream commit ca3d89a3 ]
      
      enable_4k_uar module parameter was added in patch cited below to
      address the backward compatibility issue in SRIOV when the VM has
      system's PAGE_SIZE uar implementation and the Hypervisor has 4k uar
      implementation.
      
      The above compatibility issue does not exist in the non SRIOV case.
      In this patch, we always enable 4k uar implementation if SRIOV
      is not enabled on mlx4's supported cards.
      
      Fixes: 76e39ccf ("net/mlx4_core: Fix backward compatibility on VFs")
      Signed-off-by: default avatarHuy Nguyen <huyn@mellanox.com>
      Reviewed-by: default avatarDaniel Jurgens <danielj@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      754df4da
    • Neal Cardwell's avatar
      tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP · 2d093adf
      Neal Cardwell authored
      
      [ Upstream commit cdbeb633 ]
      
      In some situations tcp_send_loss_probe() can realize that it's unable
      to send a loss probe (TLP), and falls back to calling tcp_rearm_rto()
      to schedule an RTO timer. In such cases, sometimes tcp_rearm_rto()
      realizes that the RTO was eligible to fire immediately or at some
      point in the past (delta_us <= 0). Previously in such cases
      tcp_rearm_rto() was scheduling such "overdue" RTOs to happen at now +
      icsk_rto, which caused needless delays of hundreds of milliseconds
      (and non-linear behavior that made reproducible testing
      difficult). This commit changes the logic to schedule "overdue" RTOs
      ASAP, rather than at now + icsk_rto.
      
      Fixes: 6ba8a3b1 ("tcp: Tail loss probe (TLP)")
      Suggested-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2d093adf
    • Wei Wang's avatar
      ipv6: repair fib6 tree in failure case · 7bbc60d9
      Wei Wang authored
      
      [ Upstream commit 348a4002 ]
      
      In fib6_add(), it is possible that fib6_add_1() picks an intermediate
      node and sets the node's fn->leaf to NULL in order to add this new
      route. However, if fib6_add_rt2node() fails to add the new
      route for some reason, fn->leaf will be left as NULL and could
      potentially cause crash when fn->leaf is accessed in fib6_locate().
      This patch makes sure fib6_repair_tree() is called to properly repair
      fn->leaf in the above failure case.
      
      Here is the syzkaller reported general protection fault in fib6_locate:
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 0 PID: 40937 Comm: syz-executor3 Not tainted
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff8801d7d64100 ti: ffff8801d01a0000 task.ti: ffff8801d01a0000
      RIP: 0010:[<ffffffff82a3e0e1>]  [<ffffffff82a3e0e1>] __ipv6_prefix_equal64_half include/net/ipv6.h:475 [inline]
      RIP: 0010:[<ffffffff82a3e0e1>]  [<ffffffff82a3e0e1>] ipv6_prefix_equal include/net/ipv6.h:492 [inline]
      RIP: 0010:[<ffffffff82a3e0e1>]  [<ffffffff82a3e0e1>] fib6_locate_1 net/ipv6/ip6_fib.c:1210 [inline]
      RIP: 0010:[<ffffffff82a3e0e1>]  [<ffffffff82a3e0e1>] fib6_locate+0x281/0x3c0 net/ipv6/ip6_fib.c:1233
      RSP: 0018:ffff8801d01a36a8  EFLAGS: 00010202
      RAX: 0000000000000020 RBX: ffff8801bc790e00 RCX: ffffc90002983000
      RDX: 0000000000001219 RSI: ffff8801d01a37a0 RDI: 0000000000000100
      RBP: ffff8801d01a36f0 R08: 00000000000000ff R09: 0000000000000000
      R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
      R13: dffffc0000000000 R14: ffff8801d01a37a0 R15: 0000000000000000
      FS:  00007f6afd68c700(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000004c6340 CR3: 00000000ba41f000 CR4: 00000000001426f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Stack:
       ffff8801d01a37a8 ffff8801d01a3780 ffffed003a0346f5 0000000c82a23ea0
       ffff8800b7bd7700 ffff8801d01a3780 ffff8800b6a1c940 ffffffff82a23ea0
       ffff8801d01a3920 ffff8801d01a3748 ffffffff82a223d6 ffff8801d7d64988
      Call Trace:
       [<ffffffff82a223d6>] ip6_route_del+0x106/0x570 net/ipv6/route.c:2109
       [<ffffffff82a23f9d>] inet6_rtm_delroute+0xfd/0x100 net/ipv6/route.c:3075
       [<ffffffff82621359>] rtnetlink_rcv_msg+0x549/0x7a0 net/core/rtnetlink.c:3450
       [<ffffffff8274c1d1>] netlink_rcv_skb+0x141/0x370 net/netlink/af_netlink.c:2281
       [<ffffffff82613ddf>] rtnetlink_rcv+0x2f/0x40 net/core/rtnetlink.c:3456
       [<ffffffff8274ad38>] netlink_unicast_kernel net/netlink/af_netlink.c:1206 [inline]
       [<ffffffff8274ad38>] netlink_unicast+0x518/0x750 net/netlink/af_netlink.c:1232
       [<ffffffff8274b83e>] netlink_sendmsg+0x8ce/0xc30 net/netlink/af_netlink.c:1778
       [<ffffffff82564aff>] sock_sendmsg_nosec net/socket.c:609 [inline]
       [<ffffffff82564aff>] sock_sendmsg+0xcf/0x110 net/socket.c:619
       [<ffffffff82564d62>] sock_write_iter+0x222/0x3a0 net/socket.c:834
       [<ffffffff8178523d>] new_sync_write+0x1dd/0x2b0 fs/read_write.c:478
       [<ffffffff817853f4>] __vfs_write+0xe4/0x110 fs/read_write.c:491
       [<ffffffff81786c38>] vfs_write+0x178/0x4b0 fs/read_write.c:538
       [<ffffffff817892a9>] SYSC_write fs/read_write.c:585 [inline]
       [<ffffffff817892a9>] SyS_write+0xd9/0x1b0 fs/read_write.c:577
       [<ffffffff82c71e32>] entry_SYSCALL_64_fastpath+0x12/0x17
      
      Note: there is no "Fixes" tag as this seems to be a bug introduced
      very early.
      Signed-off-by: default avatarWei Wang <weiwan@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7bbc60d9
    • Wei Wang's avatar
      ipv6: reset fn->rr_ptr when replacing route · 368129fe
      Wei Wang authored
      
      [ Upstream commit 383143f3 ]
      
      syzcaller reported the following use-after-free issue in rt6_select():
      BUG: KASAN: use-after-free in rt6_select net/ipv6/route.c:755 [inline] at addr ffff8800bc6994e8
      BUG: KASAN: use-after-free in ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084 at addr ffff8800bc6994e8
      Read of size 4 by task syz-executor1/439628
      CPU: 0 PID: 439628 Comm: syz-executor1 Not tainted 4.3.5+ #8
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
       0000000000000000 ffff88018fe435b0 ffffffff81ca384d ffff8801d3588c00
       ffff8800bc699380 ffff8800bc699500 dffffc0000000000 ffff8801d40a47c0
       ffff88018fe435d8 ffffffff81735751 ffff88018fe43660 ffff8800bc699380
      Call Trace:
       [<ffffffff81ca384d>] __dump_stack lib/dump_stack.c:15 [inline]
       [<ffffffff81ca384d>] dump_stack+0xc1/0x124 lib/dump_stack.c:51
      sctp: [Deprecated]: syz-executor0 (pid 439615) Use of struct sctp_assoc_value in delayed_ack socket option.
      Use struct sctp_sack_info instead
       [<ffffffff81735751>] kasan_object_err+0x21/0x70 mm/kasan/report.c:158
       [<ffffffff817359c4>] print_address_description mm/kasan/report.c:196 [inline]
       [<ffffffff817359c4>] kasan_report_error+0x1b4/0x4a0 mm/kasan/report.c:285
       [<ffffffff81735d93>] kasan_report mm/kasan/report.c:305 [inline]
       [<ffffffff81735d93>] __asan_report_load4_noabort+0x43/0x50 mm/kasan/report.c:325
       [<ffffffff82a28e39>] rt6_select net/ipv6/route.c:755 [inline]
       [<ffffffff82a28e39>] ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084
       [<ffffffff82a28fb1>] ip6_pol_route_output+0x81/0xb0 net/ipv6/route.c:1203
       [<ffffffff82ab0a50>] fib6_rule_action+0x1f0/0x680 net/ipv6/fib6_rules.c:95
       [<ffffffff8265cbb6>] fib_rules_lookup+0x2a6/0x7a0 net/core/fib_rules.c:223
       [<ffffffff82ab1430>] fib6_rule_lookup+0xd0/0x250 net/ipv6/fib6_rules.c:41
       [<ffffffff82a22006>] ip6_route_output+0x1d6/0x2c0 net/ipv6/route.c:1224
       [<ffffffff829e83d2>] ip6_dst_lookup_tail+0x4d2/0x890 net/ipv6/ip6_output.c:943
       [<ffffffff829e889a>] ip6_dst_lookup_flow+0x9a/0x250 net/ipv6/ip6_output.c:1079
       [<ffffffff82a9f7d8>] ip6_datagram_dst_update+0x538/0xd40 net/ipv6/datagram.c:91
       [<ffffffff82aa0978>] __ip6_datagram_connect net/ipv6/datagram.c:251 [inline]
       [<ffffffff82aa0978>] ip6_datagram_connect+0x518/0xe50 net/ipv6/datagram.c:272
       [<ffffffff82aa1313>] ip6_datagram_connect_v6_only+0x63/0x90 net/ipv6/datagram.c:284
       [<ffffffff8292f790>] inet_dgram_connect+0x170/0x1f0 net/ipv4/af_inet.c:564
       [<ffffffff82565547>] SYSC_connect+0x1a7/0x2f0 net/socket.c:1582
       [<ffffffff8256a649>] SyS_connect+0x29/0x30 net/socket.c:1563
       [<ffffffff82c72032>] entry_SYSCALL_64_fastpath+0x12/0x17
      Object at ffff8800bc699380, in cache ip6_dst_cache size: 384
      
      The root cause of it is that in fib6_add_rt2node(), when it replaces an
      existing route with the new one, it does not update fn->rr_ptr.
      This commit resets fn->rr_ptr to NULL when it points to a route which is
      replaced in fib6_add_rt2node().
      
      Fixes: 27596472 ("ipv6: fix ECMP route replacement")
      Signed-off-by: default avatarWei Wang <weiwan@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      368129fe