1. 29 Apr, 2018 6 commits
  2. 26 Apr, 2018 34 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.14.37 · 753be7e8
      Greg Kroah-Hartman authored
      753be7e8
    • Benjamin Beichler's avatar
      mac80211_hwsim: fix use-after-free bug in hwsim_exit_net · f606893f
      Benjamin Beichler authored
      commit 8cfd36a0 upstream.
      
      When destroying a net namespace, all hwsim interfaces, which are not
      created in default namespace are deleted. But the async deletion of the
      interfaces could last longer than the actual destruction of the
      namespace, which results to an use after free bug. Therefore use
      synchronous deletion in this case.
      
      Fixes: 100cb9ff ("mac80211_hwsim: Allow managing radios from non-initial namespaces")
      Reported-by: syzbot+70ce058e01259de7bb1d@syzkaller.appspotmail.com
      Signed-off-by: default avatarBenjamin Beichler <benjamin.beichler@uni-rostock.de>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f606893f
    • Sean Christopherson's avatar
      Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown" · 679833ea
      Sean Christopherson authored
      commit 2c151b25 upstream.
      
      The bug that led to commit 95e057e2
      was a benign warning (no adverse affects other than the warning
      itself) that was detected by syzkaller.  Further inspection shows
      that the WARN_ON in question, in handle_ept_misconfig(), is
      unnecessary and flawed (this was also briefly discussed in the
      original patch: https://patchwork.kernel.org/patch/10204649).
      
        * The WARN_ON is unnecessary as kvm_mmu_page_fault() will WARN
          if reserved bits are set in the SPTEs, i.e. it covers the case
          where an EPT misconfig occurred because of a KVM bug.
      
        * The WARN_ON is flawed because it will fire on any system error
          code that is hit while handling the fault, e.g. -ENOMEM can be
          returned by mmu_topup_memory_caches() while handling a legitmate
          MMIO EPT misconfig.
      
      The original behavior of returning -EFAULT when userspace munmaps
      an HVA without first removing the memslot is correct and desirable,
      i.e. KVM is letting userspace know it has generated a bad address.
      Returning RET_PF_EMULATE masks the WARN_ON in the EPT misconfig path,
      but does not fix the underlying bug, i.e. the WARN_ON is bogus.
      
      Furthermore, returning RET_PF_EMULATE has the unwanted side effect of
      causing KVM to attempt to emulate an instruction on any page fault
      with an invalid HVA translation, e.g. a not-present EPT violation
      on a VM_PFNMAP VMA whose fault handler failed to insert a PFN.
      
        * There is no guarantee that the fault is directly related to the
          instruction, i.e. the fault could have been triggered by a side
          effect memory access in the guest, e.g. while vectoring a #DB or
          writing a tracing record.  This could cause KVM to effectively
          mask the fault if KVM doesn't model the behavior leading to the
          fault, i.e. emulation could succeed and resume the guest.
      
        * If emulation does fail, KVM will return EMULATION_FAILED instead
          of -EFAULT, which is a red herring as the user will either debug
          a bogus emulation attempt or scratch their head wondering why we
          were attempting emulation in the first place.
      
      TL;DR: revert to returning -EFAULT and remove the bogus WARN_ON in
      handle_ept_misconfig in a future patch.
      
      This reverts commit 95e057e2.
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      679833ea
    • Leon Romanovsky's avatar
      RDMA/mlx5: Fix NULL dereference while accessing XRC_TGT QPs · 75dceb68
      Leon Romanovsky authored
      commit 75a45982 upstream.
      
      mlx5 modify_qp() relies on FW that the error will be thrown if wrong
      state is supplied. The missing check in FW causes the following crash
      while using XRC_TGT QPs.
      
      [   14.769632] BUG: unable to handle kernel NULL pointer dereference at (null)
      [   14.771085] IP: mlx5_ib_modify_qp+0xf60/0x13f0
      [   14.771894] PGD 800000001472e067 P4D 800000001472e067 PUD 14529067 PMD 0
      [   14.773126] Oops: 0002 [#1] SMP PTI
      [   14.773763] CPU: 0 PID: 365 Comm: ubsan Not tainted 4.16.0-rc1-00038-g8151138c0793 #119
      [   14.775192] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      [   14.777522] RIP: 0010:mlx5_ib_modify_qp+0xf60/0x13f0
      [   14.778417] RSP: 0018:ffffbf48001c7bd8 EFLAGS: 00010246
      [   14.779346] RAX: 0000000000000000 RBX: ffff9a8f9447d400 RCX: 0000000000000000
      [   14.780643] RDX: 0000000000000000 RSI: 000000000000000a RDI: 0000000000000000
      [   14.781930] RBP: 0000000000000000 R08: 00000000000217b0 R09: ffffffffbc9c1504
      [   14.783214] R10: fffff4a180519480 R11: ffff9a8f94523600 R12: ffff9a8f9493e240
      [   14.784507] R13: ffff9a8f9447d738 R14: 000000000000050a R15: 0000000000000000
      [   14.785800] FS:  00007f545b466700(0000) GS:ffff9a8f9fc00000(0000) knlGS:0000000000000000
      [   14.787073] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   14.787792] CR2: 0000000000000000 CR3: 00000000144be000 CR4: 00000000000006b0
      [   14.788689] Call Trace:
      [   14.789007]  _ib_modify_qp+0x71/0x120
      [   14.789475]  modify_qp.isra.20+0x207/0x2f0
      [   14.790010]  ib_uverbs_modify_qp+0x90/0xe0
      [   14.790532]  ib_uverbs_write+0x1d2/0x3c0
      [   14.791049]  ? __handle_mm_fault+0x93c/0xe40
      [   14.791644]  __vfs_write+0x36/0x180
      [   14.792096]  ? handle_mm_fault+0xc1/0x210
      [   14.792601]  vfs_write+0xad/0x1e0
      [   14.793018]  SyS_write+0x52/0xc0
      [   14.793422]  do_syscall_64+0x75/0x180
      [   14.793888]  entry_SYSCALL_64_after_hwframe+0x21/0x86
      [   14.794527] RIP: 0033:0x7f545ad76099
      [   14.794975] RSP: 002b:00007ffd78787468 EFLAGS: 00000287 ORIG_RAX: 0000000000000001
      [   14.795958] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f545ad76099
      [   14.797075] RDX: 0000000000000078 RSI: 0000000020009000 RDI: 0000000000000003
      [   14.798140] RBP: 00007ffd78787470 R08: 00007ffd78787480 R09: 00007ffd78787480
      [   14.799207] R10: 00007ffd78787480 R11: 0000000000000287 R12: 00005599ada98760
      [   14.800277] R13: 00007ffd78787560 R14: 0000000000000000 R15: 0000000000000000
      [   14.801341] Code: 4c 8b 1c 24 48 8b 83 70 02 00 00 48 c7 83 cc 02 00
      00 00 00 00 00 48 c7 83 24 03 00 00 00 00 00 00 c7 83 2c 03 00 00 00 00
      00 00 <c7> 00 00 00 00 00 48 8b 83 70 02 00 00 c7 40 04 00 00 00 00 4c
      [   14.804012] RIP: mlx5_ib_modify_qp+0xf60/0x13f0 RSP: ffffbf48001c7bd8
      [   14.804838] CR2: 0000000000000000
      [   14.805288] ---[ end trace 3f1da0df5c8b7c37 ]---
      
      Cc: syzkaller <syzkaller@googlegroups.com>
      Reported-by: default avatarMaor Gottlieb <maorg@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      75dceb68
    • Jiri Olsa's avatar
      perf: Return proper values for user stack errors · 01e71c21
      Jiri Olsa authored
      commit 78b562fb upstream.
      
      Return immediately when we find issue in the user stack checks. The
      error value could get overwritten by following check for
      PERF_SAMPLE_REGS_INTR.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: syzkaller-bugs@googlegroups.com
      Cc: x86@kernel.org
      Fixes: 60e2364e ("perf: Add ability to sample machine state on interrupt")
      Link: http://lkml.kernel.org/r/20180415092352.12403-1-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      01e71c21
    • Jiri Olsa's avatar
      perf: Fix sample_max_stack maximum check · 66038084
      Jiri Olsa authored
      commit 5af44ca5 upstream.
      
      The syzbot hit KASAN bug in perf_callchain_store having the entry stored
      behind the allocated bounds [1].
      
      We miss the sample_max_stack check for the initial event that allocates
      callchain buffers. This missing check allows to create an event with
      sample_max_stack value bigger than the global sysctl maximum:
      
        # sysctl -a | grep perf_event_max_stack
        kernel.perf_event_max_stack = 127
      
        # perf record -vv -C 1 -e cycles/max-stack=256/ kill
        ...
        perf_event_attr:
          size                             112
          ...
          sample_max_stack                 256
        ------------------------------------------------------------
        sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 4
      
      Note the '-C 1', which forces perf record to create just single event.
      Otherwise it opens event for every cpu, then the sample_max_stack check
      fails on the second event and all's fine.
      
      The fix is to run the sample_max_stack check also for the first event
      with callchains.
      
      [1] https://marc.info/?l=linux-kernel&m=152352732920874&w=2
      
      Reported-by: syzbot+7c449856228b63ac951e@syzkaller.appspotmail.com
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: syzkaller-bugs@googlegroups.com
      Cc: x86@kernel.org
      Fixes: 97c79a38 ("perf core: Per event callchain limit")
      Link: http://lkml.kernel.org/r/20180415092352.12403-2-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      66038084
    • Florian Westphal's avatar
      netfilter: x_tables: limit allocation requests for blob rule heads · 5bcf1694
      Florian Westphal authored
      commit 9d5c12a7 upstream.
      
      This is a very conservative limit (134217728 rules), but good
      enough to not trigger frequent oom from syzkaller.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5bcf1694
    • Florian Westphal's avatar
      netfilter: compat: reject huge allocation requests · 764f2162
      Florian Westphal authored
      commit 7d7d7e02 upstream.
      
      no need to bother even trying to allocating huge compat offset arrays,
      such ruleset is rejected later on anyway becaus we refuse to allocate
      overly large rule blobs.
      
      However, compat translation happens before blob allocation, so we should
      add a check there too.
      
      This is supposed to help with fuzzing by avoiding oom-killer.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      764f2162
    • Florian Westphal's avatar
      netfilter: compat: prepare xt_compat_init_offsets to return errors · 8d92d533
      Florian Westphal authored
      commit 9782a11e upstream.
      
      should have no impact, function still always returns 0.
      This patch is only to ease review.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d92d533
    • Florian Westphal's avatar
      netfilter: x_tables: add counters allocation wrapper · 82b68ecd
      Florian Westphal authored
      commit c84ca954 upstream.
      
      allows to have size checks in a single spot.
      This is supposed to reduce oom situations when fuzz-testing xtables.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      82b68ecd
    • Florian Westphal's avatar
      netfilter: x_tables: cap allocations at 512 mbyte · fab0b3ce
      Florian Westphal authored
      commit 19926968 upstream.
      
      Arbitrary limit, however, this still allows huge rulesets
      (> 1 million rules).  This helps with automated fuzzer as it prevents
      oom-killer invocation.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fab0b3ce
    • Thomas Gleixner's avatar
      alarmtimer: Init nanosleep alarm timer on stack · 89f3232c
      Thomas Gleixner authored
      commit bd031430 upstream.
      
      syszbot reported the following debugobjects splat:
      
       ODEBUG: object is on stack, but not annotated
       WARNING: CPU: 0 PID: 4185 at lib/debugobjects.c:328
      
       RIP: 0010:debug_object_is_on_stack lib/debugobjects.c:327 [inline]
       debug_object_init+0x17/0x20 lib/debugobjects.c:391
       debug_hrtimer_init kernel/time/hrtimer.c:410 [inline]
       debug_init kernel/time/hrtimer.c:458 [inline]
       hrtimer_init+0x8c/0x410 kernel/time/hrtimer.c:1259
       alarm_init kernel/time/alarmtimer.c:339 [inline]
       alarm_timer_nsleep+0x164/0x4d0 kernel/time/alarmtimer.c:787
       SYSC_clock_nanosleep kernel/time/posix-timers.c:1226 [inline]
       SyS_clock_nanosleep+0x235/0x330 kernel/time/posix-timers.c:1204
       do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      This happens because the hrtimer for the alarm nanosleep is on stack, but
      the code does not use the proper debug objects initialization.
      
      Split out the code for the allocated use cases and invoke
      hrtimer_init_on_stack() for the nanosleep related functions.
      
      Reported-by: syzbot+a3e0726462b2e346a31d@syzkaller.appspotmail.com
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: syzkaller-bugs@googlegroups.com
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1803261528270.1585@nanos.tec.linutronix.deSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      89f3232c
    • Max Gurtovoy's avatar
      RDMA/core: Reduce poll batch for direct cq polling · 76cd54fa
      Max Gurtovoy authored
      
      [ Upstream commit d3b9e8ad ]
      
      Fix warning limit for kernel stack consumption:
      
      drivers/infiniband/core/cq.c: In function 'ib_process_cq_direct':
      drivers/infiniband/core/cq.c:78:1: error: the frame size of 1032 bytes
      is larger than 1024 bytes [-Werror=frame-larger-than=]
      
      Using smaller ib_wc array on the stack brings us comfortably below that
      limit again.
      
      Fixes: 246d8b18 ("IB/cq: Don't force IB_POLL_DIRECT poll context for ib_process_cq_direct")
      Reported-by: default avatarArnd Bergmann <arnd@arndb.de>
      Reviewed-by: default avatarSergey Gorenko <sergeygo@mellanox.com>
      Signed-off-by: default avatarMax Gurtovoy <maxg@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Reviewed-by: default avatarBart Van Assche <bart.vanassche@wdc.com>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      76cd54fa
    • Mark Salter's avatar
      irqchip/gic-v3: Change pr_debug message to pr_devel · de16dfcc
      Mark Salter authored
      
      [ Upstream commit b6dd4d83 ]
      
      The pr_debug() in gic-v3 gic_send_sgi() can trigger a circular locking
      warning:
      
       GICv3: CPU10: ICC_SGI1R_EL1 5000400
       ======================================================
       WARNING: possible circular locking dependency detected
       4.15.0+ #1 Tainted: G        W
       ------------------------------------------------------
       dynamic_debug01/1873 is trying to acquire lock:
        ((console_sem).lock){-...}, at: [<0000000099c891ec>] down_trylock+0x20/0x4c
      
       but task is already holding lock:
        (&rq->lock){-.-.}, at: [<00000000842e1587>] __task_rq_lock+0x54/0xdc
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #2 (&rq->lock){-.-.}:
              __lock_acquire+0x3b4/0x6e0
              lock_acquire+0xf4/0x2a8
              _raw_spin_lock+0x4c/0x60
              task_fork_fair+0x3c/0x148
              sched_fork+0x10c/0x214
              copy_process.isra.32.part.33+0x4e8/0x14f0
              _do_fork+0xe8/0x78c
              kernel_thread+0x48/0x54
              rest_init+0x34/0x2a4
              start_kernel+0x45c/0x488
      
       -> #1 (&p->pi_lock){-.-.}:
              __lock_acquire+0x3b4/0x6e0
              lock_acquire+0xf4/0x2a8
              _raw_spin_lock_irqsave+0x58/0x70
              try_to_wake_up+0x48/0x600
              wake_up_process+0x28/0x34
              __up.isra.0+0x60/0x6c
              up+0x60/0x68
              __up_console_sem+0x4c/0x7c
              console_unlock+0x328/0x634
              vprintk_emit+0x25c/0x390
              dev_vprintk_emit+0xc4/0x1fc
              dev_printk_emit+0x88/0xa8
              __dev_printk+0x58/0x9c
              _dev_info+0x84/0xa8
              usb_new_device+0x100/0x474
              hub_port_connect+0x280/0x92c
              hub_event+0x740/0xa84
              process_one_work+0x240/0x70c
              worker_thread+0x60/0x400
              kthread+0x110/0x13c
              ret_from_fork+0x10/0x18
      
       -> #0 ((console_sem).lock){-...}:
              validate_chain.isra.34+0x6e4/0xa20
              __lock_acquire+0x3b4/0x6e0
              lock_acquire+0xf4/0x2a8
              _raw_spin_lock_irqsave+0x58/0x70
              down_trylock+0x20/0x4c
              __down_trylock_console_sem+0x3c/0x9c
              console_trylock+0x20/0xb0
              vprintk_emit+0x254/0x390
              vprintk_default+0x58/0x90
              vprintk_func+0xbc/0x164
              printk+0x80/0xa0
              __dynamic_pr_debug+0x84/0xac
              gic_raise_softirq+0x184/0x18c
              smp_cross_call+0xac/0x218
              smp_send_reschedule+0x3c/0x48
              resched_curr+0x60/0x9c
              check_preempt_curr+0x70/0xdc
              wake_up_new_task+0x310/0x470
              _do_fork+0x188/0x78c
              SyS_clone+0x44/0x50
              __sys_trace_return+0x0/0x4
      
       other info that might help us debug this:
      
       Chain exists of:
         (console_sem).lock --> &p->pi_lock --> &rq->lock
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(&rq->lock);
                                      lock(&p->pi_lock);
                                      lock(&rq->lock);
         lock((console_sem).lock);
      
        *** DEADLOCK ***
      
       2 locks held by dynamic_debug01/1873:
        #0:  (&p->pi_lock){-.-.}, at: [<000000001366df53>] wake_up_new_task+0x40/0x470
        #1:  (&rq->lock){-.-.}, at: [<00000000842e1587>] __task_rq_lock+0x54/0xdc
      
       stack backtrace:
       CPU: 10 PID: 1873 Comm: dynamic_debug01 Tainted: G        W        4.15.0+ #1
       Hardware name: GIGABYTE R120-T34-00/MT30-GS2-00, BIOS T48 10/02/2017
       Call trace:
        dump_backtrace+0x0/0x188
        show_stack+0x24/0x2c
        dump_stack+0xa4/0xe0
        print_circular_bug.isra.31+0x29c/0x2b8
        check_prev_add.constprop.39+0x6c8/0x6dc
        validate_chain.isra.34+0x6e4/0xa20
        __lock_acquire+0x3b4/0x6e0
        lock_acquire+0xf4/0x2a8
        _raw_spin_lock_irqsave+0x58/0x70
        down_trylock+0x20/0x4c
        __down_trylock_console_sem+0x3c/0x9c
        console_trylock+0x20/0xb0
        vprintk_emit+0x254/0x390
        vprintk_default+0x58/0x90
        vprintk_func+0xbc/0x164
        printk+0x80/0xa0
        __dynamic_pr_debug+0x84/0xac
        gic_raise_softirq+0x184/0x18c
        smp_cross_call+0xac/0x218
        smp_send_reschedule+0x3c/0x48
        resched_curr+0x60/0x9c
        check_preempt_curr+0x70/0xdc
        wake_up_new_task+0x310/0x470
        _do_fork+0x188/0x78c
        SyS_clone+0x44/0x50
        __sys_trace_return+0x0/0x4
       GICv3: CPU0: ICC_SGI1R_EL1 12000
      
      This could be fixed with printk_deferred() but that might lessen its
      usefulness for debugging. So change it to pr_devel to keep it out of
      production kernels. Developers working on gic-v3 can enable it as
      needed in their kernels.
      Signed-off-by: default avatarMark Salter <msalter@redhat.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      de16dfcc
    • Michael Kelley's avatar
      cpumask: Make for_each_cpu_wrap() available on UP as well · 4032cd4f
      Michael Kelley authored
      
      [ Upstream commit d207af2e ]
      
      for_each_cpu_wrap() was originally added in the #else half of a
      large "#if NR_CPUS == 1" statement, but was omitted in the #if
      half.  This patch adds the missing #if half to prevent compile
      errors when NR_CPUS is 1.
      Reported-by: default avatarkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: default avatarMichael Kelley <mhkelley@outlook.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kys@microsoft.com
      Cc: martin.petersen@oracle.com
      Cc: mikelley@microsoft.com
      Fixes: c743f0a5 ("sched/fair, cpumask: Export for_each_cpu_wrap()")
      Link: http://lkml.kernel.org/r/SN6PR1901MB2045F087F59450507D4FCC17CBF50@SN6PR1901MB2045.namprd19.prod.outlook.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4032cd4f
    • Stephen Boyd's avatar
      irqchip/gic-v3: Ignore disabled ITS nodes · c834b955
      Stephen Boyd authored
      
      [ Upstream commit 95a25625 ]
      
      On some platforms there's an ITS available but it's not enabled
      because reading or writing the registers is denied by the
      firmware. In fact, reading or writing them will cause the system
      to reset. We could remove the node from DT in such a case, but
      it's better to skip nodes that are marked as "disabled" in DT so
      that we can describe the hardware that exists and use the status
      property to indicate how the firmware has configured things.
      
      Cc: Stuart Yoder <stuyoder@gmail.com>
      Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Rajendra Nayak <rnayak@codeaurora.org>
      Signed-off-by: default avatarStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c834b955
    • Thomas Richter's avatar
      perf test: Fix test trace+probe_libc_inet_pton.sh for s390x · 2d8d8d23
      Thomas Richter authored
      
      [ Upstream commit 7a924536 ]
      
      On Intel test case trace+probe_libc_inet_pton.sh succeeds and the
      output is:
      
      [root@f27 perf]# ./perf trace --no-syscalls
                        -e probe_libc:inet_pton/max-stack=3/ ping -6 -c 1 ::1
      PING ::1(::1) 56 data bytes
      64 bytes from ::1: icmp_seq=1 ttl=64 time=0.037 ms
      
       --- ::1 ping statistics ---
      1 packets transmitted, 1 received, 0% packet loss, time 0ms
      rtt min/avg/max/mdev = 0.037/0.037/0.037/0.000 ms
           0.000 probe_libc:inet_pton:(7fa40ac618a0))
                    __GI___inet_pton (/usr/lib64/libc-2.26.so)
                    getaddrinfo (/usr/lib64/libc-2.26.so)
                    main (/usr/bin/ping)
      
      The kernel stack unwinder is used, it is specified implicitly
      as call-graph=fp (frame pointer).
      
      On s390x only dwarf is available for stack unwinding. It is also
      done in user space. This requires different parameter setup
      and result checking for s390x and Intel.
      
      This patch adds separate perf trace setup and result checking
      for Intel and s390x. On s390x specify this command line to
      get a call-graph and handle the different call graph result
      checking:
      
      [root@s35lp76 perf]# ./perf trace --no-syscalls
      	-e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
      PING ::1(::1) 56 data bytes
      64 bytes from ::1: icmp_seq=1 ttl=64 time=0.041 ms
      
       --- ::1 ping statistics ---
      1 packets transmitted, 1 received, 0% packet loss, time 0ms
      rtt min/avg/max/mdev = 0.041/0.041/0.041/0.000 ms
           0.000 probe_libc:inet_pton:(3ffb9942060))
                  __GI___inet_pton (/usr/lib64/libc-2.26.so)
                  gaih_inet (inlined)
                  __GI_getaddrinfo (inlined)
                  main (/usr/bin/ping)
                  __libc_start_main (/usr/lib64/libc-2.26.so)
                  _start (/usr/bin/ping)
      [root@s35lp76 perf]#
      
      Before:
      [root@s8360047 perf]# ./perf test -vv 58
      58: probe libc's inet_pton & backtrace it with ping       :
       --- start ---
      test child forked, pid 26349
      PING ::1(::1) 56 data bytes
      64 bytes from ::1: icmp_seq=1 ttl=64 time=0.079 ms
       --- ::1 ping statistics ---
      1 packets transmitted, 1 received, 0% packet loss, time 0ms
      rtt min/avg/max/mdev = 0.079/0.079/0.079/0.000 ms
      0.000 probe_libc:inet_pton:(3ff925c2060))
      test child finished with -1
       ---- end ----
      probe libc's inet_pton & backtrace it with ping: FAILED!
      [root@s8360047 perf]#
      
      After:
      [root@s35lp76 perf]# ./perf test -vv 57
      57: probe libc's inet_pton & backtrace it with ping       :
       --- start ---
      test child forked, pid 38708
      PING ::1(::1) 56 data bytes
      64 bytes from ::1: icmp_seq=1 ttl=64 time=0.038 ms
       --- ::1 ping statistics ---
      1 packets transmitted, 1 received, 0% packet loss, time 0ms
      rtt min/avg/max/mdev = 0.038/0.038/0.038/0.000 ms
      0.000 probe_libc:inet_pton:(3ff87342060))
      __GI___inet_pton (/usr/lib64/libc-2.26.so)
      gaih_inet (inlined)
      __GI_getaddrinfo (inlined)
      main (/usr/bin/ping)
      __libc_start_main (/usr/lib64/libc-2.26.so)
      _start (/usr/bin/ping)
      test child finished with 0
       ---- end ----
      probe libc's inet_pton & backtrace it with ping: Ok
      [root@s35lp76 perf]#
      
      On Intel the test case runs unchanged and succeeds.
      Signed-off-by: default avatarThomas Richter <tmricht@linux.vnet.ibm.com>
      Reviewed-by: default avatarHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20180117083831.101001-1-tmricht@linux.vnet.ibm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2d8d8d23
    • Nicholas Piggin's avatar
      powerpc/powernv: IMC fix out of bounds memory access at shutdown · 74cd9414
      Nicholas Piggin authored
      
      [ Upstream commit e7bde88c ]
      
      The OPAL IMC driver's shutdown handler disables nest PMU counters by
      walking nodes and taking the first CPU out of their cpumask, which is
      used to index into the paca (get_hard_smp_processor_id()). This does
      not always do the right thing, and in particular for CPU-less nodes it
      returns NR_CPUS and that overruns the paca and dereferences random
      memory.
      
      Fix it by being more careful about checking returned CPU, and only
      using online CPUs. It's not clear this shutdown code makes sense after
      commit 885dcd70 ("powerpc/perf: Add nest IMC PMU support"), but this
      should not make things worse
      
      Currently the bug causes us to call OPAL with a junk CPU number. A
      separate patch in development to change the way pacas are allocated
      escalates this bug into a crash:
      
        Unable to handle kernel paging request for data at address 0x2a21af1eeb000076
        Faulting instruction address: 0xc0000000000a5468
        Oops: Kernel access of bad area, sig: 11 [#1]
        ...
        NIP opal_imc_counters_shutdown+0x148/0x1d0
        LR  opal_imc_counters_shutdown+0x134/0x1d0
        Call Trace:
         opal_imc_counters_shutdown+0x134/0x1d0 (unreliable)
         platform_drv_shutdown+0x44/0x60
         device_shutdown+0x1f8/0x350
         kernel_restart_prepare+0x54/0x70
         kernel_restart+0x28/0xc0
         SyS_reboot+0x1d0/0x2c0
         system_call+0x58/0x6c
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      74cd9414
    • Will Deacon's avatar
      locking/qspinlock: Ensure node->count is updated before initialising node · c74e004c
      Will Deacon authored
      
      [ Upstream commit 11dc1322 ]
      
      When queuing on the qspinlock, the count field for the current CPU's head
      node is incremented. This needn't be atomic because locking in e.g. IRQ
      context is balanced and so an IRQ will return with node->count as it
      found it.
      
      However, the compiler could in theory reorder the initialisation of
      node[idx] before the increment of the head node->count, causing an
      IRQ to overwrite the initialised node and potentially corrupt the lock
      state.
      
      Avoid the potential for this harmful compiler reordering by placing a
      barrier() between the increment of the head node->count and the subsequent
      node initialisation.
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1518528177-19169-3-git-send-email-will.deacon@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c74e004c
    • mike.travis@hpe.com's avatar
      x86/platform/UV: Fix GAM Range Table entries less than 1GB · 5350cb01
      mike.travis@hpe.com authored
      
      [ Upstream commit c25d99d2 ]
      
      The latest UV platforms include the new ApachePass NVDIMMs into the
      UV address space.  This has introduced address ranges in the Global
      Address Map Table that are less than the previous lowest range, which
      was 2GB.  Fix the address calculation so it accommodates address ranges
      from bytes to exabytes.
      Signed-off-by: default avatarMike Travis <mike.travis@hpe.com>
      Reviewed-by: default avatarAndrew Banman <andrew.banman@hpe.com>
      Reviewed-by: default avatarDimitri Sivanich <dimitri.sivanich@hpe.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russ Anderson <russ.anderson@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180205221503.190219903@stormcage.americas.sgi.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5350cb01
    • Aneesh Kumar K.V's avatar
      powerpc/mm/hash64: Zero PGD pages on allocation · 288b3732
      Aneesh Kumar K.V authored
      
      [ Upstream commit fc5c2f4a ]
      
      On powerpc we allocate page table pages from slab caches of different
      sizes. Currently we have a constructor that zeroes out the objects when
      we allocate them for the first time.
      
      We expect the objects to be zeroed out when we free the the object
      back to slab cache. This happens in the unmap path. For hugetlb pages
      we call huge_pte_get_and_clear() to do that.
      
      With the current configuration of page table size, both PUD and PGD
      level tables are allocated from the same slab cache. At the PUD level,
      we use the second half of the table to store the slot information. But
      we never clear that when unmapping.
      
      When such a freed object is then allocated for a PGD page, the second
      half of the page table page will not be zeroed as expected. This
      results in a kernel crash.
      
      Fix it by always clearing PGD pages when they're allocated.
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [mpe: Change log wording and formatting, add whitespace]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      288b3732
    • Jia Zhang's avatar
      vfs/proc/kcore, x86/mm/kcore: Fix SMAP fault when dumping vsyscall user page · f4d6e459
      Jia Zhang authored
      
      [ Upstream commit 595dd46e ]
      
      Commit:
      
        df04abfd ("fs/proc/kcore.c: Add bounce buffer for ktext data")
      
      ... introduced a bounce buffer to work around CONFIG_HARDENED_USERCOPY=y.
      However, accessing the vsyscall user page will cause an SMAP fault.
      
      Replace memcpy() with copy_from_user() to fix this bug works, but adding
      a common way to handle this sort of user page may be useful for future.
      
      Currently, only vsyscall page requires KCORE_USER.
      Signed-off-by: default avatarJia Zhang <zhang.jia@linux.alibaba.com>
      Reviewed-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: jolsa@redhat.com
      Link: http://lkml.kernel.org/r/1518446694-21124-2-git-send-email-zhang.jia@linux.alibaba.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4d6e459
    • Tony Lindgren's avatar
      PM / wakeirq: Fix unbalanced IRQ enable for wakeirq · c064b7c1
      Tony Lindgren authored
      
      [ Upstream commit 69728051 ]
      
      If a device is runtime PM suspended when we enter suspend and has
      a dedicated wake IRQ, we can get the following warning:
      
      WARNING: CPU: 0 PID: 108 at kernel/irq/manage.c:526 enable_irq+0x40/0x94
      [  102.087860] Unbalanced enable for IRQ 147
      ...
      (enable_irq) from [<c06117a8>] (dev_pm_arm_wake_irq+0x4c/0x60)
      (dev_pm_arm_wake_irq) from [<c0618360>]
       (device_wakeup_arm_wake_irqs+0x58/0x9c)
      (device_wakeup_arm_wake_irqs) from [<c0615948>]
      (dpm_suspend_noirq+0x10/0x48)
      (dpm_suspend_noirq) from [<c01ac7ac>]
      (suspend_devices_and_enter+0x30c/0xf14)
      (suspend_devices_and_enter) from [<c01adf20>]
      (enter_state+0xad4/0xbd8)
      (enter_state) from [<c01ad3ec>] (pm_suspend+0x38/0x98)
      (pm_suspend) from [<c01ab3e8>] (state_store+0x68/0xc8)
      
      This is because the dedicated wake IRQ for the device may have been
      already enabled earlier by dev_pm_enable_wake_irq_check().  Fix the
      issue by checking for runtime PM suspended status.
      
      This issue can be easily reproduced by setting serial console log level
      to zero, letting the serial console idle, and suspend the system from
      an ssh terminal.  On resume, dmesg will have the warning above.
      
      The reason why I have not run into this issue earlier has been that I
      typically run my PM test cases from on a serial console instead over ssh.
      
      Fixes: c8434559 (PM / wakeirq: Enable dedicated wakeirq for suspend)
      Signed-off-by: default avatarTony Lindgren <tony@atomide.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c064b7c1
    • Rafael J. Wysocki's avatar
      ACPI / EC: Restore polling during noirq suspend/resume phases · afa0ce07
      Rafael J. Wysocki authored
      
      [ Upstream commit 3cd091a7 ]
      
      Commit 66259146 (ACPI / EC: Drop EC noirq hooks to fix a
      regression) modified the ACPI EC driver so that it doesn't switch
      over to busy polling mode during noirq stages of system suspend and
      resume in an attempt to fix an issue resulting from that behavior.
      
      However, that modification introduced a system resume regression on
      Thinkpad X240, so make the EC driver switch over to the polling mode
      during noirq stages of system suspend and resume again, which
      effectively reverts the problematic commit.
      
      Fixes: 66259146 (ACPI / EC: Drop EC noirq hooks to fix a regression)
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=197863Reported-by: default avatarMarkus Demleitner <m@tfiu.de>
      Tested-by: default avatarMarkus Demleitner <m@tfiu.de>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      afa0ce07
    • Daniel Borkmann's avatar
      bpf: fix rlimit in reuseport net selftest · 85bd5c68
      Daniel Borkmann authored
      
      [ Upstream commit 941ff6f1 ]
      
      Fix two issues in the reuseport_bpf selftests that were
      reported by Linaro CI:
      
        [...]
        + ./reuseport_bpf
        ---- IPv4 UDP ----
        Testing EBPF mod 10...
        Reprograming, testing mod 5...
        ./reuseport_bpf: ebpf error. log:
        0: (bf) r6 = r1
        1: (20) r0 = *(u32 *)skb[0]
        2: (97) r0 %= 10
        3: (95) exit
        processed 4 insns
        : Operation not permitted
        + echo FAIL
        [...]
        ---- IPv4 TCP ----
        Testing EBPF mod 10...
        ./reuseport_bpf: failed to bind send socket: Address already in use
        + echo FAIL
        [...]
      
      For the former adjust rlimit since this was the cause of
      failure for loading the BPF prog, and for the latter add
      SO_REUSEADDR.
      Reported-by: default avatarNaresh Kamboju <naresh.kamboju@linaro.org>
      Link: https://bugs.linaro.org/show_bug.cgi?id=3502Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      85bd5c68
    • Niklas Cassel's avatar
      net: stmmac: discard disabled flags in interrupt status register · ee5fe4bd
      Niklas Cassel authored
      
      [ Upstream commit 1b84ca18 ]
      
      The interrupt status register in both dwmac1000 and dwmac4 ignores
      interrupt enable (for dwmac4) / interrupt mask (for dwmac1000).
      Therefore, if we want to check only the bits that can actually trigger
      an irq, we have to filter the interrupt status register manually.
      
      Commit 0a764db1 ("stmmac: Discard masked flags in interrupt status
      register") fixed this for dwmac1000. Fix the same issue for dwmac4.
      
      Just like commit 0a764db1 ("stmmac: Discard masked flags in
      interrupt status register"), this makes sure that we do not get
      spurious link up/link down prints.
      Signed-off-by: default avatarNiklas Cassel <niklas.cassel@axis.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ee5fe4bd
    • Trond Myklebust's avatar
      SUNRPC: Don't call __UDPX_INC_STATS() from a preemptible context · 26bebd5a
      Trond Myklebust authored
      
      [ Upstream commit 0afa6b44 ]
      
      Calling __UDPX_INC_STATS() from a preemptible context leads to a
      warning of the form:
      
       BUG: using __this_cpu_add() in preemptible [00000000] code: kworker/u5:0/31
       caller is xs_udp_data_receive_workfn+0x194/0x270
       CPU: 1 PID: 31 Comm: kworker/u5:0 Not tainted 4.15.0-rc8-00076-g90ea9f1b #2
       Workqueue: xprtiod xs_udp_data_receive_workfn
       Call Trace:
        dump_stack+0x85/0xc1
        check_preemption_disabled+0xce/0xe0
        xs_udp_data_receive_workfn+0x194/0x270
        process_one_work+0x318/0x620
        worker_thread+0x20a/0x390
        ? process_one_work+0x620/0x620
        kthread+0x120/0x130
        ? __kthread_bind_mask+0x60/0x60
        ret_from_fork+0x24/0x30
      
      Since we're taking a spinlock in those functions anyway, let's fix the
      issue by moving the call so that it occurs under the spinlock.
      Reported-by: default avatarkernel test robot <fengguang.wu@intel.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      26bebd5a
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code · f58e4ecb
      Paul Mackerras authored
      
      [ Upstream commit 05f2bb03 ]
      
      This fixes the computation of the HPTE index to use when the HPT
      resizing code encounters a bolted HPTE which is stored in its
      secondary HPTE group.  The code inverts the HPTE group number, which
      is correct, but doesn't then mask it with new_hash_mask.  As a result,
      new_pteg will be effectively negative, resulting in new_hptep
      pointing before the new HPT, which will corrupt memory.
      
      In addition, this removes two BUG_ON statements.  The condition that
      the BUG_ONs were testing -- that we have computed the hash value
      incorrectly -- has never been observed in testing, and if it did
      occur, would only affect the guest, not the host.  Given that
      BUG_ON should only be used in conditions where the kernel (i.e.
      the host kernel, in this case) can't possibly continue execution,
      it is not appropriate here.
      Reviewed-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f58e4ecb
    • Jesper Dangaard Brouer's avatar
      tools/libbpf: handle issues with bpf ELF objects containing .eh_frames · d6b00490
      Jesper Dangaard Brouer authored
      
      [ Upstream commit e3d91b0c ]
      
      V3: More generic skipping of relo-section (suggested by Daniel)
      
      If clang >= 4.0.1 is missing the option '-target bpf', it will cause
      llc/llvm to create two ELF sections for "Exception Frames", with
      section names '.eh_frame' and '.rel.eh_frame'.
      
      The BPF ELF loader library libbpf fails when loading files with these
      sections.  The other in-kernel BPF ELF loader in samples/bpf/bpf_load.c,
      handle this gracefully. And iproute2 loader also seems to work with these
      "eh" sections.
      
      The issue in libbpf is caused by bpf_object__elf_collect() skipping
      some sections, and later when performing relocation it will be
      pointing to a skipped section, as these sections cannot be found by
      bpf_object__find_prog_by_idx() in bpf_object__collect_reloc().
      
      This is a general issue that also occurs for other sections, like
      debug sections which are also skipped and can have relo section.
      
      As suggested by Daniel.  To avoid keeping state about all skipped
      sections, instead perform a direct qlookup in the ELF object.  Lookup
      the section that the relo-section points to and check if it contains
      executable machine instructions (denoted by the sh_flags
      SHF_EXECINSTR).  Use this check to also skip irrelevant relo-sections.
      
      Note, for samples/bpf/ the '-target bpf' parameter to clang cannot be used
      due to incompatibility with asm embedded headers, that some of the samples
      include. This is explained in more details by Yonghong Song in bpf_devel_QA.
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6b00490
    • Mathieu Malaterre's avatar
      net: Extra '_get' in declaration of arch_get_platform_mac_address · 327aac8c
      Mathieu Malaterre authored
      
      [ Upstream commit e728789c ]
      
      In commit c7f5d105 ("net: Add eth_platform_get_mac_address() helper."),
      two declarations were added:
      
        int eth_platform_get_mac_address(struct device *dev, u8 *mac_addr);
        unsigned char *arch_get_platform_get_mac_address(void);
      
      An extra '_get' was introduced in arch_get_platform_get_mac_address, remove
      it. Fix compile warning using W=1:
      
        CC      net/ethernet/eth.o
      net/ethernet/eth.c:523:24: warning: no previous prototype for ‘arch_get_platform_mac_address’ [-Wmissing-prototypes]
       unsigned char * __weak arch_get_platform_mac_address(void)
                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        AR      net/ethernet/built-in.o
      Signed-off-by: default avatarMathieu Malaterre <malat@debian.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      327aac8c
    • Chuck Lever's avatar
      svcrdma: Fix Read chunk round-up · 0b1fa241
      Chuck Lever authored
      
      [ Upstream commit 175e0310 ]
      
      A single NFSv4 WRITE compound can often have three operations:
      PUTFH, WRITE, then GETATTR.
      
      When the WRITE payload is sent in a Read chunk, the client places
      the GETATTR in the inline part of the RPC/RDMA message, just after
      the WRITE operation (sans payload). The position value in the Read
      chunk enables the receiver to insert the Read chunk at the correct
      place in the received XDR stream; that is between the WRITE and
      GETATTR.
      
      According to RFC 8166, an NFS/RDMA client does not have to add XDR
      round-up to the Read chunk that carries the WRITE payload. The
      receiver adds XDR round-up padding if it is absent and the
      receiver's XDR decoder requires it to be present.
      
      Commit 193bcb7b ("svcrdma: Populate tail iovec when receiving")
      attempted to add support for receiving such a compound so that just
      the WRITE payload appears in rq_arg's page list, and the trailing
      GETATTR is placed in rq_arg's tail iovec. (TCP just strings the
      whole compound into the head iovec and page list, without regard
      to the alignment of the WRITE payload).
      
      The server transport logic also had to accommodate the optional XDR
      round-up of the Read chunk, which it did simply by lengthening the
      tail iovec when round-up was needed. This approach is adequate for
      the NFSv2 and NFSv3 WRITE decoders.
      
      Unfortunately it is not sufficient for nfsd4_decode_write. When the
      Read chunk length is a couple of bytes less than PAGE_SIZE, the
      computation at the end of nfsd4_decode_write allows argp->pagelen to
      go negative, which breaks the logic in read_buf that looks for the
      tail iovec.
      
      The result is that a WRITE operation whose payload length is just
      less than a multiple of a page succeeds, but the subsequent GETATTR
      in the same compound fails with NFS4ERR_OP_ILLEGAL because the XDR
      decoder can't find it. Clients ignore the error, but they must
      update their attribute cache via a separate round trip.
      
      As nfsd4_decode_write appears to expect the payload itself to always
      have appropriate XDR round-up, have svc_rdma_build_normal_read_chunk
      add the Read chunk XDR round-up to the page_len rather than
      lengthening the tail iovec.
      Reported-by: default avatarOlga Kornievskaia <kolga@netapp.com>
      Fixes: 193bcb7b ("svcrdma: Populate tail iovec when receiving")
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Tested-by: default avatarOlga Kornievskaia <kolga@netapp.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0b1fa241
    • David Howells's avatar
      rxrpc: Don't put crypto buffers on the stack · e781fff7
      David Howells authored
      
      [ Upstream commit 8c2f826d ]
      
      Don't put buffers of data to be handed to crypto on the stack as this may
      cause an assertion failure in the kernel (see below).  Fix this by using an
      kmalloc'd buffer instead.
      
      kernel BUG at ./include/linux/scatterlist.h:147!
      ...
      RIP: 0010:rxkad_encrypt_response.isra.6+0x191/0x1b0 [rxrpc]
      RSP: 0018:ffffbe2fc06cfca8 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff989277d59900 RCX: 0000000000000028
      RDX: 0000259dc06cfd88 RSI: 0000000000000025 RDI: ffffbe30406cfd88
      RBP: ffffbe2fc06cfd60 R08: ffffbe2fc06cfd08 R09: ffffbe2fc06cfd08
      R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff7c5f80d9f95
      R13: ffffbe2fc06cfd88 R14: ffff98927a3f7aa0 R15: ffffbe2fc06cfd08
      FS:  0000000000000000(0000) GS:ffff98927fc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055b1ff28f0f8 CR3: 000000001b412003 CR4: 00000000003606f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       rxkad_respond_to_challenge+0x297/0x330 [rxrpc]
       rxrpc_process_connection+0xd1/0x690 [rxrpc]
       ? process_one_work+0x1c3/0x680
       ? __lock_is_held+0x59/0xa0
       process_one_work+0x249/0x680
       worker_thread+0x3a/0x390
       ? process_one_work+0x680/0x680
       kthread+0x121/0x140
       ? kthread_create_worker_on_cpu+0x70/0x70
       ret_from_fork+0x3a/0x50
      Reported-by: default avatarJonathan Billings <jsbillings@jsbillings.org>
      Reported-by: default avatarMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Tested-by: default avatarJonathan Billings <jsbillings@jsbillings.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e781fff7
    • Steven Rostedt (VMware)'s avatar
      selftests/ftrace: Add some missing glob checks · c5ce9e5b
      Steven Rostedt (VMware) authored
      
      [ Upstream commit 97fe22ad ]
      
      Al Viro discovered a bug in the glob ftrace filtering code where "*a*b" is
      treated the same as "a*b", and functions that would be selected by "*a*b"
      but not "a*b" are not selected with "*a*b".
      
      Add tests for patterns "*a*b" and "a*b*" to the glob selftest.
      
      Link: http://lkml.kernel.org/r/20180127170748.GF13338@ZenIV.linux.org.uk
      
      Cc: Shuah Khan <shuah@kernel.org>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c5ce9e5b
    • Chen Yu's avatar
      cpufreq: intel_pstate: Enable HWP during system resume on CPU0 · ae9c78af
      Chen Yu authored
      
      [ Upstream commit 70f6bf2a ]
      
      When maxcpus=1 is in the kernel command line, the BP is responsible
      for re-enabling the HWP - because currently only the APs invoke
      intel_pstate_hwp_enable() during their online process - which might
      put the system into unstable state after resume.
      
      Fix this by enabling the HWP explicitly on BP during resume.
      Reported-by: default avatarDoug Smythies <dsmythies@telus.net>
      Suggested-by: default avatarSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: default avatarYu Chen <yu.c.chen@intel.com>
      [ rjw: Subject/changelog, minor modifications ]
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ae9c78af