1. 26 Apr, 2018 40 commits
    • Stephen Boyd's avatar
      irqchip/gic-v3: Ignore disabled ITS nodes · c834b955
      Stephen Boyd authored
      
      [ Upstream commit 95a25625 ]
      
      On some platforms there's an ITS available but it's not enabled
      because reading or writing the registers is denied by the
      firmware. In fact, reading or writing them will cause the system
      to reset. We could remove the node from DT in such a case, but
      it's better to skip nodes that are marked as "disabled" in DT so
      that we can describe the hardware that exists and use the status
      property to indicate how the firmware has configured things.
      
      Cc: Stuart Yoder <stuyoder@gmail.com>
      Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Rajendra Nayak <rnayak@codeaurora.org>
      Signed-off-by: default avatarStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c834b955
    • Thomas Richter's avatar
      perf test: Fix test trace+probe_libc_inet_pton.sh for s390x · 2d8d8d23
      Thomas Richter authored
      
      [ Upstream commit 7a924536 ]
      
      On Intel test case trace+probe_libc_inet_pton.sh succeeds and the
      output is:
      
      [root@f27 perf]# ./perf trace --no-syscalls
                        -e probe_libc:inet_pton/max-stack=3/ ping -6 -c 1 ::1
      PING ::1(::1) 56 data bytes
      64 bytes from ::1: icmp_seq=1 ttl=64 time=0.037 ms
      
       --- ::1 ping statistics ---
      1 packets transmitted, 1 received, 0% packet loss, time 0ms
      rtt min/avg/max/mdev = 0.037/0.037/0.037/0.000 ms
           0.000 probe_libc:inet_pton:(7fa40ac618a0))
                    __GI___inet_pton (/usr/lib64/libc-2.26.so)
                    getaddrinfo (/usr/lib64/libc-2.26.so)
                    main (/usr/bin/ping)
      
      The kernel stack unwinder is used, it is specified implicitly
      as call-graph=fp (frame pointer).
      
      On s390x only dwarf is available for stack unwinding. It is also
      done in user space. This requires different parameter setup
      and result checking for s390x and Intel.
      
      This patch adds separate perf trace setup and result checking
      for Intel and s390x. On s390x specify this command line to
      get a call-graph and handle the different call graph result
      checking:
      
      [root@s35lp76 perf]# ./perf trace --no-syscalls
      	-e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
      PING ::1(::1) 56 data bytes
      64 bytes from ::1: icmp_seq=1 ttl=64 time=0.041 ms
      
       --- ::1 ping statistics ---
      1 packets transmitted, 1 received, 0% packet loss, time 0ms
      rtt min/avg/max/mdev = 0.041/0.041/0.041/0.000 ms
           0.000 probe_libc:inet_pton:(3ffb9942060))
                  __GI___inet_pton (/usr/lib64/libc-2.26.so)
                  gaih_inet (inlined)
                  __GI_getaddrinfo (inlined)
                  main (/usr/bin/ping)
                  __libc_start_main (/usr/lib64/libc-2.26.so)
                  _start (/usr/bin/ping)
      [root@s35lp76 perf]#
      
      Before:
      [root@s8360047 perf]# ./perf test -vv 58
      58: probe libc's inet_pton & backtrace it with ping       :
       --- start ---
      test child forked, pid 26349
      PING ::1(::1) 56 data bytes
      64 bytes from ::1: icmp_seq=1 ttl=64 time=0.079 ms
       --- ::1 ping statistics ---
      1 packets transmitted, 1 received, 0% packet loss, time 0ms
      rtt min/avg/max/mdev = 0.079/0.079/0.079/0.000 ms
      0.000 probe_libc:inet_pton:(3ff925c2060))
      test child finished with -1
       ---- end ----
      probe libc's inet_pton & backtrace it with ping: FAILED!
      [root@s8360047 perf]#
      
      After:
      [root@s35lp76 perf]# ./perf test -vv 57
      57: probe libc's inet_pton & backtrace it with ping       :
       --- start ---
      test child forked, pid 38708
      PING ::1(::1) 56 data bytes
      64 bytes from ::1: icmp_seq=1 ttl=64 time=0.038 ms
       --- ::1 ping statistics ---
      1 packets transmitted, 1 received, 0% packet loss, time 0ms
      rtt min/avg/max/mdev = 0.038/0.038/0.038/0.000 ms
      0.000 probe_libc:inet_pton:(3ff87342060))
      __GI___inet_pton (/usr/lib64/libc-2.26.so)
      gaih_inet (inlined)
      __GI_getaddrinfo (inlined)
      main (/usr/bin/ping)
      __libc_start_main (/usr/lib64/libc-2.26.so)
      _start (/usr/bin/ping)
      test child finished with 0
       ---- end ----
      probe libc's inet_pton & backtrace it with ping: Ok
      [root@s35lp76 perf]#
      
      On Intel the test case runs unchanged and succeeds.
      Signed-off-by: default avatarThomas Richter <tmricht@linux.vnet.ibm.com>
      Reviewed-by: default avatarHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/20180117083831.101001-1-tmricht@linux.vnet.ibm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2d8d8d23
    • Nicholas Piggin's avatar
      powerpc/powernv: IMC fix out of bounds memory access at shutdown · 74cd9414
      Nicholas Piggin authored
      
      [ Upstream commit e7bde88c ]
      
      The OPAL IMC driver's shutdown handler disables nest PMU counters by
      walking nodes and taking the first CPU out of their cpumask, which is
      used to index into the paca (get_hard_smp_processor_id()). This does
      not always do the right thing, and in particular for CPU-less nodes it
      returns NR_CPUS and that overruns the paca and dereferences random
      memory.
      
      Fix it by being more careful about checking returned CPU, and only
      using online CPUs. It's not clear this shutdown code makes sense after
      commit 885dcd70 ("powerpc/perf: Add nest IMC PMU support"), but this
      should not make things worse
      
      Currently the bug causes us to call OPAL with a junk CPU number. A
      separate patch in development to change the way pacas are allocated
      escalates this bug into a crash:
      
        Unable to handle kernel paging request for data at address 0x2a21af1eeb000076
        Faulting instruction address: 0xc0000000000a5468
        Oops: Kernel access of bad area, sig: 11 [#1]
        ...
        NIP opal_imc_counters_shutdown+0x148/0x1d0
        LR  opal_imc_counters_shutdown+0x134/0x1d0
        Call Trace:
         opal_imc_counters_shutdown+0x134/0x1d0 (unreliable)
         platform_drv_shutdown+0x44/0x60
         device_shutdown+0x1f8/0x350
         kernel_restart_prepare+0x54/0x70
         kernel_restart+0x28/0xc0
         SyS_reboot+0x1d0/0x2c0
         system_call+0x58/0x6c
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      74cd9414
    • Will Deacon's avatar
      locking/qspinlock: Ensure node->count is updated before initialising node · c74e004c
      Will Deacon authored
      
      [ Upstream commit 11dc1322 ]
      
      When queuing on the qspinlock, the count field for the current CPU's head
      node is incremented. This needn't be atomic because locking in e.g. IRQ
      context is balanced and so an IRQ will return with node->count as it
      found it.
      
      However, the compiler could in theory reorder the initialisation of
      node[idx] before the increment of the head node->count, causing an
      IRQ to overwrite the initialised node and potentially corrupt the lock
      state.
      
      Avoid the potential for this harmful compiler reordering by placing a
      barrier() between the increment of the head node->count and the subsequent
      node initialisation.
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1518528177-19169-3-git-send-email-will.deacon@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c74e004c
    • mike.travis@hpe.com's avatar
      x86/platform/UV: Fix GAM Range Table entries less than 1GB · 5350cb01
      mike.travis@hpe.com authored
      
      [ Upstream commit c25d99d2 ]
      
      The latest UV platforms include the new ApachePass NVDIMMs into the
      UV address space.  This has introduced address ranges in the Global
      Address Map Table that are less than the previous lowest range, which
      was 2GB.  Fix the address calculation so it accommodates address ranges
      from bytes to exabytes.
      Signed-off-by: default avatarMike Travis <mike.travis@hpe.com>
      Reviewed-by: default avatarAndrew Banman <andrew.banman@hpe.com>
      Reviewed-by: default avatarDimitri Sivanich <dimitri.sivanich@hpe.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russ Anderson <russ.anderson@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180205221503.190219903@stormcage.americas.sgi.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5350cb01
    • Aneesh Kumar K.V's avatar
      powerpc/mm/hash64: Zero PGD pages on allocation · 288b3732
      Aneesh Kumar K.V authored
      
      [ Upstream commit fc5c2f4a ]
      
      On powerpc we allocate page table pages from slab caches of different
      sizes. Currently we have a constructor that zeroes out the objects when
      we allocate them for the first time.
      
      We expect the objects to be zeroed out when we free the the object
      back to slab cache. This happens in the unmap path. For hugetlb pages
      we call huge_pte_get_and_clear() to do that.
      
      With the current configuration of page table size, both PUD and PGD
      level tables are allocated from the same slab cache. At the PUD level,
      we use the second half of the table to store the slot information. But
      we never clear that when unmapping.
      
      When such a freed object is then allocated for a PGD page, the second
      half of the page table page will not be zeroed as expected. This
      results in a kernel crash.
      
      Fix it by always clearing PGD pages when they're allocated.
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [mpe: Change log wording and formatting, add whitespace]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      288b3732
    • Jia Zhang's avatar
      vfs/proc/kcore, x86/mm/kcore: Fix SMAP fault when dumping vsyscall user page · f4d6e459
      Jia Zhang authored
      
      [ Upstream commit 595dd46e ]
      
      Commit:
      
        df04abfd ("fs/proc/kcore.c: Add bounce buffer for ktext data")
      
      ... introduced a bounce buffer to work around CONFIG_HARDENED_USERCOPY=y.
      However, accessing the vsyscall user page will cause an SMAP fault.
      
      Replace memcpy() with copy_from_user() to fix this bug works, but adding
      a common way to handle this sort of user page may be useful for future.
      
      Currently, only vsyscall page requires KCORE_USER.
      Signed-off-by: default avatarJia Zhang <zhang.jia@linux.alibaba.com>
      Reviewed-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: jolsa@redhat.com
      Link: http://lkml.kernel.org/r/1518446694-21124-2-git-send-email-zhang.jia@linux.alibaba.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4d6e459
    • Tony Lindgren's avatar
      PM / wakeirq: Fix unbalanced IRQ enable for wakeirq · c064b7c1
      Tony Lindgren authored
      
      [ Upstream commit 69728051 ]
      
      If a device is runtime PM suspended when we enter suspend and has
      a dedicated wake IRQ, we can get the following warning:
      
      WARNING: CPU: 0 PID: 108 at kernel/irq/manage.c:526 enable_irq+0x40/0x94
      [  102.087860] Unbalanced enable for IRQ 147
      ...
      (enable_irq) from [<c06117a8>] (dev_pm_arm_wake_irq+0x4c/0x60)
      (dev_pm_arm_wake_irq) from [<c0618360>]
       (device_wakeup_arm_wake_irqs+0x58/0x9c)
      (device_wakeup_arm_wake_irqs) from [<c0615948>]
      (dpm_suspend_noirq+0x10/0x48)
      (dpm_suspend_noirq) from [<c01ac7ac>]
      (suspend_devices_and_enter+0x30c/0xf14)
      (suspend_devices_and_enter) from [<c01adf20>]
      (enter_state+0xad4/0xbd8)
      (enter_state) from [<c01ad3ec>] (pm_suspend+0x38/0x98)
      (pm_suspend) from [<c01ab3e8>] (state_store+0x68/0xc8)
      
      This is because the dedicated wake IRQ for the device may have been
      already enabled earlier by dev_pm_enable_wake_irq_check().  Fix the
      issue by checking for runtime PM suspended status.
      
      This issue can be easily reproduced by setting serial console log level
      to zero, letting the serial console idle, and suspend the system from
      an ssh terminal.  On resume, dmesg will have the warning above.
      
      The reason why I have not run into this issue earlier has been that I
      typically run my PM test cases from on a serial console instead over ssh.
      
      Fixes: c8434559 (PM / wakeirq: Enable dedicated wakeirq for suspend)
      Signed-off-by: default avatarTony Lindgren <tony@atomide.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c064b7c1
    • Rafael J. Wysocki's avatar
      ACPI / EC: Restore polling during noirq suspend/resume phases · afa0ce07
      Rafael J. Wysocki authored
      
      [ Upstream commit 3cd091a7 ]
      
      Commit 66259146 (ACPI / EC: Drop EC noirq hooks to fix a
      regression) modified the ACPI EC driver so that it doesn't switch
      over to busy polling mode during noirq stages of system suspend and
      resume in an attempt to fix an issue resulting from that behavior.
      
      However, that modification introduced a system resume regression on
      Thinkpad X240, so make the EC driver switch over to the polling mode
      during noirq stages of system suspend and resume again, which
      effectively reverts the problematic commit.
      
      Fixes: 66259146 (ACPI / EC: Drop EC noirq hooks to fix a regression)
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=197863Reported-by: default avatarMarkus Demleitner <m@tfiu.de>
      Tested-by: default avatarMarkus Demleitner <m@tfiu.de>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      afa0ce07
    • Daniel Borkmann's avatar
      bpf: fix rlimit in reuseport net selftest · 85bd5c68
      Daniel Borkmann authored
      
      [ Upstream commit 941ff6f1 ]
      
      Fix two issues in the reuseport_bpf selftests that were
      reported by Linaro CI:
      
        [...]
        + ./reuseport_bpf
        ---- IPv4 UDP ----
        Testing EBPF mod 10...
        Reprograming, testing mod 5...
        ./reuseport_bpf: ebpf error. log:
        0: (bf) r6 = r1
        1: (20) r0 = *(u32 *)skb[0]
        2: (97) r0 %= 10
        3: (95) exit
        processed 4 insns
        : Operation not permitted
        + echo FAIL
        [...]
        ---- IPv4 TCP ----
        Testing EBPF mod 10...
        ./reuseport_bpf: failed to bind send socket: Address already in use
        + echo FAIL
        [...]
      
      For the former adjust rlimit since this was the cause of
      failure for loading the BPF prog, and for the latter add
      SO_REUSEADDR.
      Reported-by: default avatarNaresh Kamboju <naresh.kamboju@linaro.org>
      Link: https://bugs.linaro.org/show_bug.cgi?id=3502Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      85bd5c68
    • Niklas Cassel's avatar
      net: stmmac: discard disabled flags in interrupt status register · ee5fe4bd
      Niklas Cassel authored
      
      [ Upstream commit 1b84ca18 ]
      
      The interrupt status register in both dwmac1000 and dwmac4 ignores
      interrupt enable (for dwmac4) / interrupt mask (for dwmac1000).
      Therefore, if we want to check only the bits that can actually trigger
      an irq, we have to filter the interrupt status register manually.
      
      Commit 0a764db1 ("stmmac: Discard masked flags in interrupt status
      register") fixed this for dwmac1000. Fix the same issue for dwmac4.
      
      Just like commit 0a764db1 ("stmmac: Discard masked flags in
      interrupt status register"), this makes sure that we do not get
      spurious link up/link down prints.
      Signed-off-by: default avatarNiklas Cassel <niklas.cassel@axis.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ee5fe4bd
    • Trond Myklebust's avatar
      SUNRPC: Don't call __UDPX_INC_STATS() from a preemptible context · 26bebd5a
      Trond Myklebust authored
      
      [ Upstream commit 0afa6b44 ]
      
      Calling __UDPX_INC_STATS() from a preemptible context leads to a
      warning of the form:
      
       BUG: using __this_cpu_add() in preemptible [00000000] code: kworker/u5:0/31
       caller is xs_udp_data_receive_workfn+0x194/0x270
       CPU: 1 PID: 31 Comm: kworker/u5:0 Not tainted 4.15.0-rc8-00076-g90ea9f1b #2
       Workqueue: xprtiod xs_udp_data_receive_workfn
       Call Trace:
        dump_stack+0x85/0xc1
        check_preemption_disabled+0xce/0xe0
        xs_udp_data_receive_workfn+0x194/0x270
        process_one_work+0x318/0x620
        worker_thread+0x20a/0x390
        ? process_one_work+0x620/0x620
        kthread+0x120/0x130
        ? __kthread_bind_mask+0x60/0x60
        ret_from_fork+0x24/0x30
      
      Since we're taking a spinlock in those functions anyway, let's fix the
      issue by moving the call so that it occurs under the spinlock.
      Reported-by: default avatarkernel test robot <fengguang.wu@intel.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      26bebd5a
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code · f58e4ecb
      Paul Mackerras authored
      
      [ Upstream commit 05f2bb03 ]
      
      This fixes the computation of the HPTE index to use when the HPT
      resizing code encounters a bolted HPTE which is stored in its
      secondary HPTE group.  The code inverts the HPTE group number, which
      is correct, but doesn't then mask it with new_hash_mask.  As a result,
      new_pteg will be effectively negative, resulting in new_hptep
      pointing before the new HPT, which will corrupt memory.
      
      In addition, this removes two BUG_ON statements.  The condition that
      the BUG_ONs were testing -- that we have computed the hash value
      incorrectly -- has never been observed in testing, and if it did
      occur, would only affect the guest, not the host.  Given that
      BUG_ON should only be used in conditions where the kernel (i.e.
      the host kernel, in this case) can't possibly continue execution,
      it is not appropriate here.
      Reviewed-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f58e4ecb
    • Jesper Dangaard Brouer's avatar
      tools/libbpf: handle issues with bpf ELF objects containing .eh_frames · d6b00490
      Jesper Dangaard Brouer authored
      
      [ Upstream commit e3d91b0c ]
      
      V3: More generic skipping of relo-section (suggested by Daniel)
      
      If clang >= 4.0.1 is missing the option '-target bpf', it will cause
      llc/llvm to create two ELF sections for "Exception Frames", with
      section names '.eh_frame' and '.rel.eh_frame'.
      
      The BPF ELF loader library libbpf fails when loading files with these
      sections.  The other in-kernel BPF ELF loader in samples/bpf/bpf_load.c,
      handle this gracefully. And iproute2 loader also seems to work with these
      "eh" sections.
      
      The issue in libbpf is caused by bpf_object__elf_collect() skipping
      some sections, and later when performing relocation it will be
      pointing to a skipped section, as these sections cannot be found by
      bpf_object__find_prog_by_idx() in bpf_object__collect_reloc().
      
      This is a general issue that also occurs for other sections, like
      debug sections which are also skipped and can have relo section.
      
      As suggested by Daniel.  To avoid keeping state about all skipped
      sections, instead perform a direct qlookup in the ELF object.  Lookup
      the section that the relo-section points to and check if it contains
      executable machine instructions (denoted by the sh_flags
      SHF_EXECINSTR).  Use this check to also skip irrelevant relo-sections.
      
      Note, for samples/bpf/ the '-target bpf' parameter to clang cannot be used
      due to incompatibility with asm embedded headers, that some of the samples
      include. This is explained in more details by Yonghong Song in bpf_devel_QA.
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6b00490
    • Mathieu Malaterre's avatar
      net: Extra '_get' in declaration of arch_get_platform_mac_address · 327aac8c
      Mathieu Malaterre authored
      
      [ Upstream commit e728789c ]
      
      In commit c7f5d105 ("net: Add eth_platform_get_mac_address() helper."),
      two declarations were added:
      
        int eth_platform_get_mac_address(struct device *dev, u8 *mac_addr);
        unsigned char *arch_get_platform_get_mac_address(void);
      
      An extra '_get' was introduced in arch_get_platform_get_mac_address, remove
      it. Fix compile warning using W=1:
      
        CC      net/ethernet/eth.o
      net/ethernet/eth.c:523:24: warning: no previous prototype for ‘arch_get_platform_mac_address’ [-Wmissing-prototypes]
       unsigned char * __weak arch_get_platform_mac_address(void)
                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        AR      net/ethernet/built-in.o
      Signed-off-by: default avatarMathieu Malaterre <malat@debian.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      327aac8c
    • Chuck Lever's avatar
      svcrdma: Fix Read chunk round-up · 0b1fa241
      Chuck Lever authored
      
      [ Upstream commit 175e0310 ]
      
      A single NFSv4 WRITE compound can often have three operations:
      PUTFH, WRITE, then GETATTR.
      
      When the WRITE payload is sent in a Read chunk, the client places
      the GETATTR in the inline part of the RPC/RDMA message, just after
      the WRITE operation (sans payload). The position value in the Read
      chunk enables the receiver to insert the Read chunk at the correct
      place in the received XDR stream; that is between the WRITE and
      GETATTR.
      
      According to RFC 8166, an NFS/RDMA client does not have to add XDR
      round-up to the Read chunk that carries the WRITE payload. The
      receiver adds XDR round-up padding if it is absent and the
      receiver's XDR decoder requires it to be present.
      
      Commit 193bcb7b ("svcrdma: Populate tail iovec when receiving")
      attempted to add support for receiving such a compound so that just
      the WRITE payload appears in rq_arg's page list, and the trailing
      GETATTR is placed in rq_arg's tail iovec. (TCP just strings the
      whole compound into the head iovec and page list, without regard
      to the alignment of the WRITE payload).
      
      The server transport logic also had to accommodate the optional XDR
      round-up of the Read chunk, which it did simply by lengthening the
      tail iovec when round-up was needed. This approach is adequate for
      the NFSv2 and NFSv3 WRITE decoders.
      
      Unfortunately it is not sufficient for nfsd4_decode_write. When the
      Read chunk length is a couple of bytes less than PAGE_SIZE, the
      computation at the end of nfsd4_decode_write allows argp->pagelen to
      go negative, which breaks the logic in read_buf that looks for the
      tail iovec.
      
      The result is that a WRITE operation whose payload length is just
      less than a multiple of a page succeeds, but the subsequent GETATTR
      in the same compound fails with NFS4ERR_OP_ILLEGAL because the XDR
      decoder can't find it. Clients ignore the error, but they must
      update their attribute cache via a separate round trip.
      
      As nfsd4_decode_write appears to expect the payload itself to always
      have appropriate XDR round-up, have svc_rdma_build_normal_read_chunk
      add the Read chunk XDR round-up to the page_len rather than
      lengthening the tail iovec.
      Reported-by: default avatarOlga Kornievskaia <kolga@netapp.com>
      Fixes: 193bcb7b ("svcrdma: Populate tail iovec when receiving")
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Tested-by: default avatarOlga Kornievskaia <kolga@netapp.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0b1fa241
    • David Howells's avatar
      rxrpc: Don't put crypto buffers on the stack · e781fff7
      David Howells authored
      
      [ Upstream commit 8c2f826d ]
      
      Don't put buffers of data to be handed to crypto on the stack as this may
      cause an assertion failure in the kernel (see below).  Fix this by using an
      kmalloc'd buffer instead.
      
      kernel BUG at ./include/linux/scatterlist.h:147!
      ...
      RIP: 0010:rxkad_encrypt_response.isra.6+0x191/0x1b0 [rxrpc]
      RSP: 0018:ffffbe2fc06cfca8 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff989277d59900 RCX: 0000000000000028
      RDX: 0000259dc06cfd88 RSI: 0000000000000025 RDI: ffffbe30406cfd88
      RBP: ffffbe2fc06cfd60 R08: ffffbe2fc06cfd08 R09: ffffbe2fc06cfd08
      R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff7c5f80d9f95
      R13: ffffbe2fc06cfd88 R14: ffff98927a3f7aa0 R15: ffffbe2fc06cfd08
      FS:  0000000000000000(0000) GS:ffff98927fc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055b1ff28f0f8 CR3: 000000001b412003 CR4: 00000000003606f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       rxkad_respond_to_challenge+0x297/0x330 [rxrpc]
       rxrpc_process_connection+0xd1/0x690 [rxrpc]
       ? process_one_work+0x1c3/0x680
       ? __lock_is_held+0x59/0xa0
       process_one_work+0x249/0x680
       worker_thread+0x3a/0x390
       ? process_one_work+0x680/0x680
       kthread+0x121/0x140
       ? kthread_create_worker_on_cpu+0x70/0x70
       ret_from_fork+0x3a/0x50
      Reported-by: default avatarJonathan Billings <jsbillings@jsbillings.org>
      Reported-by: default avatarMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Tested-by: default avatarJonathan Billings <jsbillings@jsbillings.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e781fff7
    • Steven Rostedt (VMware)'s avatar
      selftests/ftrace: Add some missing glob checks · c5ce9e5b
      Steven Rostedt (VMware) authored
      
      [ Upstream commit 97fe22ad ]
      
      Al Viro discovered a bug in the glob ftrace filtering code where "*a*b" is
      treated the same as "a*b", and functions that would be selected by "*a*b"
      but not "a*b" are not selected with "*a*b".
      
      Add tests for patterns "*a*b" and "a*b*" to the glob selftest.
      
      Link: http://lkml.kernel.org/r/20180127170748.GF13338@ZenIV.linux.org.uk
      
      Cc: Shuah Khan <shuah@kernel.org>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c5ce9e5b
    • Chen Yu's avatar
      cpufreq: intel_pstate: Enable HWP during system resume on CPU0 · ae9c78af
      Chen Yu authored
      
      [ Upstream commit 70f6bf2a ]
      
      When maxcpus=1 is in the kernel command line, the BP is responsible
      for re-enabling the HWP - because currently only the APs invoke
      intel_pstate_hwp_enable() during their online process - which might
      put the system into unstable state after resume.
      
      Fix this by enabling the HWP explicitly on BP during resume.
      Reported-by: default avatarDoug Smythies <dsmythies@telus.net>
      Suggested-by: default avatarSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: default avatarYu Chen <yu.c.chen@intel.com>
      [ rjw: Subject/changelog, minor modifications ]
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ae9c78af
    • Tang Junhui's avatar
      bcache: return attach error when no cache set exist · c4c9fd55
      Tang Junhui authored
      
      [ Upstream commit 7f4fc93d ]
      
      I attach a back-end device to a cache set, and the cache set is not
      registered yet, this back-end device did not attach successfully, and no
      error returned:
      [root]# echo 87859280-fec6-4bcc-20df7ca8f86b > /sys/block/sde/bcache/attach
      [root]#
      
      In sysfs_attach(), the return value "v" is initialized to "size" in
      the beginning, and if no cache set exist in bch_cache_sets, the "v" value
      would not change any more, and return to sysfs, sysfs regard it as success
      since the "size" is a positive number.
      
      This patch fixes this issue by assigning "v" with "-ENOENT" in the
      initialization.
      Signed-off-by: default avatarTang Junhui <tang.junhui@zte.com.cn>
      Reviewed-by: default avatarMichael Lyle <mlyle@lyle.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c4c9fd55
    • Tang Junhui's avatar
      bcache: fix for data collapse after re-attaching an attached device · 4c8e0270
      Tang Junhui authored
      
      [ Upstream commit 73ac105b ]
      
      back-end device sdm has already attached a cache_set with ID
      f67ebe1f-f8bc-4d73-bfe5-9dc88607f119, then try to attach with
      another cache set, and it returns with an error:
      [root]# cd /sys/block/sdm/bcache
      [root]# echo 5ccd0a63-148e-48b8-afa2-aca9cbd6279f > attach
      -bash: echo: write error: Invalid argument
      
      After that, execute a command to modify the label of bcache
      device:
      [root]# echo data_disk1 > label
      
      Then we reboot the system, when the system power on, the back-end
      device can not attach to cache_set, a messages show in the log:
      Feb  5 12:05:52 ceph152 kernel: [922385.508498] bcache:
      bch_cached_dev_attach() couldn't find uuid for sdm in set
      
      In sysfs_attach(), dc->sb.set_uuid was assigned to the value
      which input through sysfs, no matter whether it is success
      or not in bch_cached_dev_attach(). For example, If the back-end
      device has already attached to an cache set, bch_cached_dev_attach()
      would fail, but dc->sb.set_uuid was changed. Then modify the
      label of bcache device, it will call bch_write_bdev_super(),
      which would write the dc->sb.set_uuid to the super block, so we
      record a wrong cache set ID in the super block, after the system
      reboot, the cache set couldn't find the uuid of the back-end
      device, so the bcache device couldn't exist and use any more.
      
      In this patch, we don't assigned cache set ID to dc->sb.set_uuid
      in sysfs_attach() directly, but input it into bch_cached_dev_attach(),
      and assigned dc->sb.set_uuid to the cache set ID after the back-end
      device attached to the cache set successful.
      Signed-off-by: default avatarTang Junhui <tang.junhui@zte.com.cn>
      Reviewed-by: default avatarMichael Lyle <mlyle@lyle.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c8e0270
    • Tang Junhui's avatar
      bcache: fix for allocator and register thread race · 311e3141
      Tang Junhui authored
      
      [ Upstream commit 682811b3 ]
      
      After long time running of random small IO writing,
      I reboot the machine, and after the machine power on,
      I found bcache got stuck, the stack is:
      [root@ceph153 ~]# cat /proc/2510/task/*/stack
      [<ffffffffa06b2455>] closure_sync+0x25/0x90 [bcache]
      [<ffffffffa06b6be8>] bch_journal+0x118/0x2b0 [bcache]
      [<ffffffffa06b6dc7>] bch_journal_meta+0x47/0x70 [bcache]
      [<ffffffffa06be8f7>] bch_prio_write+0x237/0x340 [bcache]
      [<ffffffffa06a8018>] bch_allocator_thread+0x3c8/0x3d0 [bcache]
      [<ffffffff810a631f>] kthread+0xcf/0xe0
      [<ffffffff8164c318>] ret_from_fork+0x58/0x90
      [<ffffffffffffffff>] 0xffffffffffffffff
      [root@ceph153 ~]# cat /proc/2038/task/*/stack
      [<ffffffffa06b1abd>] __bch_btree_map_nodes+0x12d/0x150 [bcache]
      [<ffffffffa06b1bd1>] bch_btree_insert+0xf1/0x170 [bcache]
      [<ffffffffa06b637f>] bch_journal_replay+0x13f/0x230 [bcache]
      [<ffffffffa06c75fe>] run_cache_set+0x79a/0x7c2 [bcache]
      [<ffffffffa06c0cf8>] register_bcache+0xd48/0x1310 [bcache]
      [<ffffffff812f702f>] kobj_attr_store+0xf/0x20
      [<ffffffff8125b216>] sysfs_write_file+0xc6/0x140
      [<ffffffff811dfbfd>] vfs_write+0xbd/0x1e0
      [<ffffffff811e069f>] SyS_write+0x7f/0xe0
      [<ffffffff8164c3c9>] system_call_fastpath+0x16/0x1
      The stack shows the register thread and allocator thread
      were getting stuck when registering cache device.
      
      I reboot the machine several times, the issue always
      exsit in this machine.
      
      I debug the code, and found the call trace as bellow:
      register_bcache()
         ==>run_cache_set()
            ==>bch_journal_replay()
               ==>bch_btree_insert()
                  ==>__bch_btree_map_nodes()
                     ==>btree_insert_fn()
                        ==>btree_split() //node need split
                           ==>btree_check_reserve()
      In btree_check_reserve(), It will check if there is enough buckets
      of RESERVE_BTREE type, since allocator thread did not work yet, so
      no buckets of RESERVE_BTREE type allocated, so the register thread
      waits on c->btree_cache_wait, and goes to sleep.
      
      Then the allocator thread initialized, the call trace is bellow:
      bch_allocator_thread()
      ==>bch_prio_write()
         ==>bch_journal_meta()
            ==>bch_journal()
               ==>journal_wait_for_write()
      In journal_wait_for_write(), It will check if journal is full by
      journal_full(), but the long time random small IO writing
      causes the exhaustion of journal buckets(journal.blocks_free=0),
      In order to release the journal buckets,
      the allocator calls btree_flush_write() to flush keys to
      btree nodes, and waits on c->journal.wait until btree nodes writing
      over or there has already some journal buckets space, then the
      allocator thread goes to sleep. but in btree_flush_write(), since
      bch_journal_replay() is not finished, so no btree nodes have journal
      (condition "if (btree_current_write(b)->journal)" never satisfied),
      so we got no btree node to flush, no journal bucket released,
      and allocator sleep all the times.
      
      Through the above analysis, we can see that:
      1) Register thread wait for allocator thread to allocate buckets of
         RESERVE_BTREE type;
      2) Alloctor thread wait for register thread to replay journal, so it
         can flush btree nodes and get journal bucket.
         then they are all got stuck by waiting for each other.
      
      Hua Rui provided a patch for me, by allocating some buckets of
      RESERVE_BTREE type in advance, so the register thread can get bucket
      when btree node splitting and no need to waiting for the allocator
      thread. I tested it, it has effect, and register thread run a step
      forward, but finally are still got stuck, the reason is only 8 bucket
      of RESERVE_BTREE type were allocated, and in bch_journal_replay(),
      after 2 btree nodes splitting, only 4 bucket of RESERVE_BTREE type left,
      then btree_check_reserve() is not satisfied anymore, so it goes to sleep
      again, and in the same time, alloctor thread did not flush enough btree
      nodes to release a journal bucket, so they all got stuck again.
      
      So we need to allocate more buckets of RESERVE_BTREE type in advance,
      but how much is enough?  By experience and test, I think it should be
      as much as journal buckets. Then I modify the code as this patch,
      and test in the machine, and it works.
      
      This patch modified base on Hua Rui’s patch, and allocate more buckets
      of RESERVE_BTREE type in advance to avoid register thread and allocate
      thread going to wait for each other.
      
      [patch v2] ca->sb.njournal_buckets would be 0 in the first time after
      cache creation, and no journal exists, so just 8 btree buckets is OK.
      Signed-off-by: default avatarHua Rui <huarui.dev@gmail.com>
      Signed-off-by: default avatarTang Junhui <tang.junhui@zte.com.cn>
      Reviewed-by: default avatarMichael Lyle <mlyle@lyle.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      311e3141
    • Coly Li's avatar
      bcache: properly set task state in bch_writeback_thread() · f89edd17
      Coly Li authored
      
      [ Upstream commit 99361bbf ]
      
      Kernel thread routine bch_writeback_thread() has the following code block,
      
      447         down_write(&dc->writeback_lock);
      448~450     if (check conditions) {
      451                 up_write(&dc->writeback_lock);
      452                 set_current_state(TASK_INTERRUPTIBLE);
      453
      454                 if (kthread_should_stop())
      455                         return 0;
      456
      457                 schedule();
      458                 continue;
      459         }
      
      If condition check is true, its task state is set to TASK_INTERRUPTIBLE
      and call schedule() to wait for others to wake up it.
      
      There are 2 issues in current code,
      1, Task state is set to TASK_INTERRUPTIBLE after the condition checks, if
         another process changes the condition and call wake_up_process(dc->
         writeback_thread), then at line 452 task state is set back to
         TASK_INTERRUPTIBLE, the writeback kernel thread will lose a chance to be
         waken up.
      2, At line 454 if kthread_should_stop() is true, writeback kernel thread
         will return to kernel/kthread.c:kthread() with TASK_INTERRUPTIBLE and
         call do_exit(). It is not good to enter do_exit() with task state
         TASK_INTERRUPTIBLE, in following code path might_sleep() is called and a
         warning message is reported by __might_sleep(): "WARNING: do not call
         blocking ops when !TASK_RUNNING; state=1 set at [xxxx]".
      
      For the first issue, task state should be set before condition checks.
      Ineed because dc->writeback_lock is required when modifying all the
      conditions, calling set_current_state() inside code block where dc->
      writeback_lock is hold is safe. But this is quite implicit, so I still move
      set_current_state() before all the condition checks.
      
      For the second issue, frankley speaking it does not hurt when kernel thread
      exits with TASK_INTERRUPTIBLE state, but this warning message scares users,
      makes them feel there might be something risky with bcache and hurt their
      data.  Setting task state to TASK_RUNNING before returning fixes this
      problem.
      
      In alloc.c:allocator_wait(), there is also a similar issue, and is also
      fixed in this patch.
      
      Changelog:
      v3: merge two similar fixes into one patch
      v2: fix the race issue in v1 patch.
      v1: initial buggy fix.
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      Reviewed-by: default avatarMichael Lyle <mlyle@lyle.org>
      Cc: Michael Lyle <mlyle@lyle.org>
      Cc: Junhui Tang <tang.junhui@zte.com.cn>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f89edd17
    • Arnd Bergmann's avatar
      cifs: silence compiler warnings showing up with gcc-8.0.0 · 05921c49
      Arnd Bergmann authored
      
      [ Upstream commit ade7db99 ]
      
      This bug was fixed before, but came up again with the latest
      compiler in another function:
      
      fs/cifs/cifssmb.c: In function 'CIFSSMBSetEA':
      fs/cifs/cifssmb.c:6362:3: error: 'strncpy' offset 8 is out of the bounds [0, 4] [-Werror=array-bounds]
         strncpy(parm_data->list[0].name, ea_name, name_len);
      
      Let's apply the same fix that was used for the other instances.
      
      Fixes: b2a3ad9c ("cifs: silence compiler warnings showing up with gcc-4.7.0")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      05921c49
    • Ulf Hansson's avatar
      PM / domains: Fix up domain-idle-states OF parsing · 4b95781c
      Ulf Hansson authored
      
      [ Upstream commit a3381e3a ]
      
      Commit b539cc82 (PM / Domains: Ignore domain-idle-states that are
      not compatible), made it possible to ignore non-compatible
      domain-idle-states OF nodes. However, in case that happens while doing
      the OF parsing, the number of elements in the allocated array would
      exceed the numbers actually needed, thus wasting memory.
      
      Fix this by pre-iterating the genpd OF node and counting the number of
      compatible domain-idle-states nodes, before doing the allocation. While
      doing this, it makes sense to rework the code a bit to avoid open coding,
      of parts responsible for the OF node iteration.
      
      Let's also take the opportunity to clarify the function header for
      of_genpd_parse_idle_states(), about what is being returned in case of
      errors.
      
      Fixes: b539cc82 (PM / Domains: Ignore domain-idle-states that are not compatible)
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Reviewed-by: default avatarLina Iyer <ilina@codeaurora.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4b95781c
    • Alexey Dobriyan's avatar
      proc: fix /proc/*/map_files lookup · 05e52e5b
      Alexey Dobriyan authored
      
      [ Upstream commit ac7f1061 ]
      
      Current code does:
      
      	if (sscanf(dentry->d_name.name, "%lx-%lx", start, end) != 2)
      
      However sscanf() is broken garbage.
      
      It silently accepts whitespace between format specifiers
      (did you know that?).
      
      It silently accepts valid strings which result in integer overflow.
      
      Do not use sscanf() for any even remotely reliable parsing code.
      
      	OK
      	# readlink '/proc/1/map_files/55a23af39000-55a23b05b000'
      	/lib/systemd/systemd
      
      	broken
      	# readlink '/proc/1/map_files/               55a23af39000-55a23b05b000'
      	/lib/systemd/systemd
      
      	broken
      	# readlink '/proc/1/map_files/55a23af39000-55a23b05b000    '
      	/lib/systemd/systemd
      
      	very broken
      	# readlink '/proc/1/map_files/1000000000000000055a23af39000-55a23b05b000'
      	/lib/systemd/systemd
      
      Andrei said:
      
      : This patch breaks criu.  It was a bug in criu.  And this bug is on a minor
      : path, which works when memfd_create() isn't available.  It is a reason why
      : I ask to not backport this patch to stable kernels.
      :
      : In CRIU this bug can be triggered, only if this patch will be backported
      : to a kernel which version is lower than v3.16.
      
      Link: http://lkml.kernel.org/r/20171120212706.GA14325@avx2Signed-off-by: default avatarAlexey Dobriyan <adobriyan@gmail.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      05e52e5b
    • Will Deacon's avatar
      arm64: spinlock: Fix theoretical trylock() A-B-A with LSE atomics · 4ec317a4
      Will Deacon authored
      
      [ Upstream commit 202fb4ef ]
      
      If the spinlock "next" ticket wraps around between the initial LDR
      and the cmpxchg in the LSE version of spin_trylock, then we can erroneously
      think that we have successfuly acquired the lock because we only check
      whether the next ticket return by the cmpxchg is equal to the owner ticket
      in our updated lock word.
      
      This patch fixes the issue by performing a full 32-bit check of the lock
      word when trying to determine whether or not the CASA instruction updated
      memory.
      Reported-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4ec317a4
    • Guanglei Li's avatar
      RDS: IB: Fix null pointer issue · 693b9589
      Guanglei Li authored
      
      [ Upstream commit 2c0aa086 ]
      
      Scenario:
      1. Port down and do fail over
      2. Ap do rds_bind syscall
      
      PID: 47039  TASK: ffff89887e2fe640  CPU: 47  COMMAND: "kworker/u:6"
       #0 [ffff898e35f159f0] machine_kexec at ffffffff8103abf9
       #1 [ffff898e35f15a60] crash_kexec at ffffffff810b96e3
       #2 [ffff898e35f15b30] oops_end at ffffffff8150f518
       #3 [ffff898e35f15b60] no_context at ffffffff8104854c
       #4 [ffff898e35f15ba0] __bad_area_nosemaphore at ffffffff81048675
       #5 [ffff898e35f15bf0] bad_area_nosemaphore at ffffffff810487d3
       #6 [ffff898e35f15c00] do_page_fault at ffffffff815120b8
       #7 [ffff898e35f15d10] page_fault at ffffffff8150ea95
          [exception RIP: unknown or invalid address]
          RIP: 0000000000000000  RSP: ffff898e35f15dc8  RFLAGS: 00010282
          RAX: 00000000fffffffe  RBX: ffff889b77f6fc00  RCX:ffffffff81c99d88
          RDX: 0000000000000000  RSI: ffff896019ee08e8  RDI:ffff889b77f6fc00
          RBP: ffff898e35f15df0   R8: ffff896019ee08c8  R9:0000000000000000
          R10: 0000000000000400  R11: 0000000000000000  R12:ffff896019ee08c0
          R13: ffff889b77f6fe68  R14: ffffffff81c99d80  R15: ffffffffa022a1e0
          ORIG_RAX: ffffffffffffffff  CS: 0010 SS: 0018
       #8 [ffff898e35f15dc8] cma_ndev_work_handler at ffffffffa022a228 [rdma_cm]
       #9 [ffff898e35f15df8] process_one_work at ffffffff8108a7c6
       #10 [ffff898e35f15e58] worker_thread at ffffffff8108bda0
       #11 [ffff898e35f15ee8] kthread at ffffffff81090fe6
      
      PID: 45659  TASK: ffff880d313d2500  CPU: 31  COMMAND: "oracle_45659_ap"
       #0 [ffff881024ccfc98] __schedule at ffffffff8150bac4
       #1 [ffff881024ccfd40] schedule at ffffffff8150c2cf
       #2 [ffff881024ccfd50] __mutex_lock_slowpath at ffffffff8150cee7
       #3 [ffff881024ccfdc0] mutex_lock at ffffffff8150cdeb
       #4 [ffff881024ccfde0] rdma_destroy_id at ffffffffa022a027 [rdma_cm]
       #5 [ffff881024ccfe10] rds_ib_laddr_check at ffffffffa0357857 [rds_rdma]
       #6 [ffff881024ccfe50] rds_trans_get_preferred at ffffffffa0324c2a [rds]
       #7 [ffff881024ccfe80] rds_bind at ffffffffa031d690 [rds]
       #8 [ffff881024ccfeb0] sys_bind at ffffffff8142a670
      
      PID: 45659                          PID: 47039
      rds_ib_laddr_check
        /* create id_priv with a null event_handler */
        rdma_create_id
        rdma_bind_addr
          cma_acquire_dev
            /* add id_priv to cma_dev->id_list */
            cma_attach_to_dev
                                          cma_ndev_work_handler
                                            /* event_hanlder is null */
                                            id_priv->id.event_handler
      Signed-off-by: default avatarGuanglei Li <guanglei.li@oracle.com>
      Signed-off-by: default avatarHonglei Wang <honglei.wang@oracle.com>
      Reviewed-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: default avatarYanjun Zhu <yanjun.zhu@oracle.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Acked-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      693b9589
    • John Fastabend's avatar
      bpf: sockmap, fix leaking maps with attached but not detached progs · a8e7a4e2
      John Fastabend authored
      
      [ Upstream commit 3d9e9526 ]
      
      When a program is attached to a map we increment the program refcnt
      to ensure that the program is not removed while it is potentially
      being referenced from sockmap side. However, if this same program
      also references the map (this is a reasonably common pattern in
      my programs) then the verifier will also increment the maps refcnt
      from the verifier. This is to ensure the map doesn't get garbage
      collected while the program has a reference to it.
      
      So we are left in a state where the map holds the refcnt on the
      program stopping it from being removed and releasing the map refcnt.
      And vice versa the program holds a refcnt on the map stopping it
      from releasing the refcnt on the prog.
      
      All this is fine as long as users detach the program while the
      map fd is still around. But, if the user omits this detach command
      we are left with a dangling map we can no longer release.
      
      To resolve this when the map fd is released decrement the program
      references and remove any reference from the map to the program.
      This fixes the issue with possibly dangling map and creates a
      user side API constraint. That is, the map fd must be held open
      for programs to be attached to a map.
      
      Fixes: 174a79ff ("bpf: sockmap with sk redirect support")
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a8e7a4e2
    • Ross Lagerwall's avatar
      xen/grant-table: Use put_page instead of free_page · 05c062c3
      Ross Lagerwall authored
      
      [ Upstream commit 3ac7292a ]
      
      The page given to gnttab_end_foreign_access() to free could be a
      compound page so use put_page() instead of free_page() since it can
      handle both compound and single pages correctly.
      
      This bug was discovered when migrating a Xen VM with several VIFs and
      CONFIG_DEBUG_VM enabled. It hits a BUG usually after fewer than 10
      iterations. All netfront devices disconnect from the backend during a
      suspend/resume and this will call gnttab_end_foreign_access() if a
      netfront queue has an outstanding skb. The mismatch between calling
      get_page() and free_page() on a compound page causes a reference
      counting error which is detected when DEBUG_VM is enabled.
      Signed-off-by: default avatarRoss Lagerwall <ross.lagerwall@citrix.com>
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      05c062c3
    • Ross Lagerwall's avatar
      xen-netfront: Fix race between device setup and open · 70f3461c
      Ross Lagerwall authored
      
      [ Upstream commit f599c64f ]
      
      When a netfront device is set up it registers a netdev fairly early on,
      before it has set up the queues and is actually usable. A userspace tool
      like NetworkManager will immediately try to open it and access its state
      as soon as it appears. The bug can be reproduced by hotplugging VIFs
      until the VM runs out of grant refs. It registers the netdev but fails
      to set up any queues (since there are no more grant refs). In the
      meantime, NetworkManager opens the device and the kernel crashes trying
      to access the queues (of which there are none).
      
      Fix this in two ways:
      * For initial setup, register the netdev much later, after the queues
      are setup. This avoids the race entirely.
      * During a suspend/resume cycle, the frontend reconnects to the backend
      and the queues are recreated. It is possible (though highly unlikely) to
      race with something opening the device and accessing the queues after
      they have been destroyed but before they have been recreated. Extend the
      region covered by the rtnl semaphore to protect against this race. There
      is a possibility that we fail to recreate the queues so check for this
      in the open function.
      Signed-off-by: default avatarRoss Lagerwall <ross.lagerwall@citrix.com>
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      70f3461c
    • Jiri Olsa's avatar
      perf evsel: Fix period/freq terms setup · 2f79b5e5
      Jiri Olsa authored
      
      [ Upstream commit 49c0ae80 ]
      
      Stephane reported that we don't set properly PERIOD sample type for
      events with period term defined.
      
      Before:
        $ perf record -e cpu/cpu-cycles,period=1000/u ls
        $ perf evlist -v
        cpu/cpu-cycles,period=1000/u: ... sample_type: IP|TID|TIME|PERIOD, ...
      
      After:
        $ perf record -e cpu/cpu-cycles,period=1000/u ls
        $ perf evlist -v
        cpu/cpu-cycles,period=1000/u: ... sample_type: IP|TID|TIME, ...
      
      Setting PERIOD sample type based on period term setup.
      
      Committer note:
      
      When we use -c or a period=N term in the event definition, then we don't
      need to ask the kernel, for this event, via perf_event_attr.sample_type
      |= PERF_SAMPLE_PERIOD, to put the event period in each sample for this
      event, as we know it already, it is in perf_event_attr.sample_period.
      Reported-by: default avatarStephane Eranian <eranian@google.com>
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Tested-by: default avatarStephane Eranian <eranian@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180201083812.11359-2-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2f79b5e5
    • Matt Redfearn's avatar
      MIPS: Generic: Support GIC in EIC mode · b1f9f9fb
      Matt Redfearn authored
      
      [ Upstream commit 7bf8b16d ]
      
      The GIC supports running in External Interrupt Controller (EIC) mode,
      and will signal this via cpu_has_veic if enabled in hardware. Currently
      the generic kernel will panic if cpu_has_veic is set - but the GIC can
      legitimately set this flag if either configured to boot in EIC mode, or
      if the GIC driver enables this mode. Make the kernel not panic in this
      case, and instead just check if the GIC is present. If so, use it's CPU
      local interrupt routing functions. If an EIC is present, but it is not
      the GIC, then the kernel does not know how to get the VIRQ for the CPU
      local interrupts and should panic. Support for alternative EICs being
      present is needed here for the generic kernel to support them.
      Suggested-by: default avatarPaul Burton <paul.burton@mips.com>
      Signed-off-by: default avatarMatt Redfearn <matt.redfearn@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/18191/Signed-off-by: default avatarJames Hogan <jhogan@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b1f9f9fb
    • Jiri Olsa's avatar
      perf record: Fix period option handling · 76e3ea2f
      Jiri Olsa authored
      
      [ Upstream commit f290aa1f ]
      
      Stephan reported we don't unset PERIOD sample type when --no-period is
      specified. Adding the unset check and reset PERIOD if --no-period is
      specified.
      
      Committer notes:
      
      Check the sample_type, it shouldn't have PERF_SAMPLE_PERIOD there when
      --no-period is used.
      
      Before:
      
        # perf record --no-period sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.018 MB perf.data (7 samples) ]
        # perf evlist -v
        cycles:ppp: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
        #
      
      After:
      
      [root@jouet ~]# perf record --no-period sleep 1
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.019 MB perf.data (17 samples) ]
      [root@jouet ~]# perf evlist -v
      cycles:ppp: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
      [root@jouet ~]#
      Reported-by: default avatarStephane Eranian <eranian@google.com>
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: default avatarStephane Eranian <eranian@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180201083812.11359-3-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      76e3ea2f
    • Matt Redfearn's avatar
      MIPS: TXx9: use IS_BUILTIN() for CONFIG_LEDS_CLASS · f938c2ac
      Matt Redfearn authored
      
      [ Upstream commit 0cde5b44 ]
      
      When commit b27311e1 ("MIPS: TXx9: Add RBTX4939 board support")
      added board support for the RBTX4939, it added a call to
      led_classdev_register even if the LED class is built as a module.
      Built-in arch code cannot call module code directly like this. Commit
      b33b4407 ("MIPS: TXX9: use IS_ENABLED() macro") subsequently
      changed the inclusion of this code to a single check that
      CONFIG_LEDS_CLASS is either builtin or a module, but the same issue
      remains.
      
      This leads to MIPS allmodconfig builds failing when CONFIG_MACH_TX49XX=y
      is set:
      
      arch/mips/txx9/rbtx4939/setup.o: In function `rbtx4939_led_probe':
      setup.c:(.init.text+0xc0): undefined reference to `of_led_classdev_register'
      make: *** [Makefile:999: vmlinux] Error 1
      
      Fix this by using the IS_BUILTIN() macro instead.
      
      Fixes: b27311e1 ("MIPS: TXx9: Add RBTX4939 board support")
      Signed-off-by: default avatarMatt Redfearn <matt.redfearn@mips.com>
      Reviewed-by: default avatarJames Hogan <jhogan@kernel.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/18544/Signed-off-by: default avatarJames Hogan <jhogan@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f938c2ac
    • Yonghong Song's avatar
      bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y · 3e01c16d
      Yonghong Song authored
      
      [ Upstream commit 09584b40 ]
      
      With CONFIG_BPF_JIT_ALWAYS_ON is defined in the config file,
      tools/testing/selftests/bpf/test_kmod.sh failed like below:
        [root@localhost bpf]# ./test_kmod.sh
        sysctl: setting key "net.core.bpf_jit_enable": Invalid argument
        [ JIT enabled:0 hardened:0 ]
        [  132.175681] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
        [  132.458834] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
        [ JIT enabled:1 hardened:0 ]
        [  133.456025] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
        [  133.730935] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
        [ JIT enabled:1 hardened:1 ]
        [  134.769730] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
        [  135.050864] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
        [ JIT enabled:1 hardened:2 ]
        [  136.442882] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
        [  136.821810] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
        [root@localhost bpf]#
      
      The test_kmod.sh load/remove test_bpf.ko multiple times with different
      settings for sysctl net.core.bpf_jit_{enable,harden}. The failed test #297
      of test_bpf.ko is designed such that JIT always fails.
      
      Commit 290af866 (bpf: introduce BPF_JIT_ALWAYS_ON config)
      introduced the following tightening logic:
          ...
              if (!bpf_prog_is_dev_bound(fp->aux)) {
                      fp = bpf_int_jit_compile(fp);
          #ifdef CONFIG_BPF_JIT_ALWAYS_ON
                      if (!fp->jited) {
                              *err = -ENOTSUPP;
                              return fp;
                      }
          #endif
          ...
      With this logic, Test #297 always gets return value -ENOTSUPP
      when CONFIG_BPF_JIT_ALWAYS_ON is defined, causing the test failure.
      
      This patch fixed the failure by marking Test #297 as expected failure
      when CONFIG_BPF_JIT_ALWAYS_ON is defined.
      
      Fixes: 290af866 (bpf: introduce BPF_JIT_ALWAYS_ON config)
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3e01c16d
    • Hans de Goede's avatar
      ACPI / scan: Use acpi_bus_get_status() to initialize ACPI_TYPE_DEVICE devs · 74abca65
      Hans de Goede authored
      
      [ Upstream commit 63347db0 ]
      
      The acpi_get_bus_status wrapper for acpi_bus_get_status_handle has some
      code to handle certain device quirks, in some cases we also need this
      quirk handling for the initial _STA call.
      
      Specifically on some devices calling _STA before all _DEP dependencies
      are met results in errors like these:
      
      [    0.123579] ACPI Error: No handler for Region [ECRM] (00000000ba9edc4c)
                     [GenericSerialBus] (20170831/evregion-166)
      [    0.123601] ACPI Error: Region GenericSerialBus (ID=9) has no handler
                     (20170831/exfldio-299)
      [    0.123618] ACPI Error: Method parse/execution failed
                     \_SB.I2C1.BAT1._STA, AE_NOT_EXIST (20170831/psparse-550)
      
      acpi_get_bus_status already has code to avoid this, so by using it we
      also silence these errors from the initial _STA call.
      
      Note that in order for the acpi_get_bus_status handling for this to work,
      we initialize dep_unmet to 1 until acpi_device_dep_initialize gets called,
      this means that battery devices will be instantiated with an initial
      status of 0. This is not a problem, acpi_bus_attach will get called soon
      after the instantiation anyways and it will update the status as first
      point of order.
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      74abca65
    • Hans de Goede's avatar
      ACPI / bus: Do not call _STA on battery devices with unmet dependencies · f920e914
      Hans de Goede authored
      
      [ Upstream commit 54ddce70 ]
      
      The battery code uses acpi_device->dep_unmet to check for unmet deps and
      if there are unmet deps it does not bind to the device to avoid errors
      about missing OpRegions when calling ACPI methods on the device.
      
      The missing OpRegions when there are unmet deps problem also applies to
      the _STA method of some battery devices and calling it too early results
      in errors like these:
      
      [    0.123579] ACPI Error: No handler for Region [ECRM] (00000000ba9edc4c)
                     [GenericSerialBus] (20170831/evregion-166)
      [    0.123601] ACPI Error: Region GenericSerialBus (ID=9) has no handler
                     (20170831/exfldio-299)
      [    0.123618] ACPI Error: Method parse/execution failed
                     \_SB.I2C1.BAT1._STA, AE_NOT_EXIST (20170831/psparse-550)
      
      This commit fixes these errors happening when acpi_get_bus_status gets
      called by checking dep_unmet for battery devices and reporting a status
      of 0 until all dependencies are met.
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f920e914
    • Chen Yu's avatar
      ACPI: processor_perflib: Do not send _PPC change notification if not ready · 51939996
      Chen Yu authored
      
      [ Upstream commit ba1edb9a ]
      
      The following warning was triggered after resumed from S3 -
      if all the nonboot CPUs were put offline before suspend:
      
      [ 1840.329515] unchecked MSR access error: RDMSR from 0x771 at rIP: 0xffffffff86061e3a (native_read_msr+0xa/0x30)
      [ 1840.329516] Call Trace:
      [ 1840.329521]  __rdmsr_on_cpu+0x33/0x50
      [ 1840.329525]  generic_exec_single+0x81/0xb0
      [ 1840.329527]  smp_call_function_single+0xd2/0x100
      [ 1840.329530]  ? acpi_ds_result_pop+0xdd/0xf2
      [ 1840.329532]  ? acpi_ds_create_operand+0x215/0x23c
      [ 1840.329534]  rdmsrl_on_cpu+0x57/0x80
      [ 1840.329536]  ? cpumask_next+0x1b/0x20
      [ 1840.329538]  ? rdmsrl_on_cpu+0x57/0x80
      [ 1840.329541]  intel_pstate_update_perf_limits+0xf3/0x220
      [ 1840.329544]  ? notifier_call_chain+0x4a/0x70
      [ 1840.329546]  intel_pstate_set_policy+0x4e/0x150
      [ 1840.329548]  cpufreq_set_policy+0xcd/0x2f0
      [ 1840.329550]  cpufreq_update_policy+0xb2/0x130
      [ 1840.329552]  ? cpufreq_update_policy+0x130/0x130
      [ 1840.329556]  acpi_processor_ppc_has_changed+0x65/0x80
      [ 1840.329558]  acpi_processor_notify+0x80/0x100
      [ 1840.329561]  acpi_ev_notify_dispatch+0x44/0x5c
      [ 1840.329563]  acpi_os_execute_deferred+0x14/0x20
      [ 1840.329565]  process_one_work+0x193/0x3c0
      [ 1840.329567]  worker_thread+0x35/0x3b0
      [ 1840.329569]  kthread+0x125/0x140
      [ 1840.329571]  ? process_one_work+0x3c0/0x3c0
      [ 1840.329572]  ? kthread_park+0x60/0x60
      [ 1840.329575]  ? do_syscall_64+0x67/0x180
      [ 1840.329577]  ret_from_fork+0x25/0x30
      [ 1840.329585] unchecked MSR access error: WRMSR to 0x774 (tried to write 0x0000000000000000) at rIP: 0xffffffff86061f78 (native_write_msr+0x8/0x30)
      [ 1840.329586] Call Trace:
      [ 1840.329587]  __wrmsr_on_cpu+0x37/0x40
      [ 1840.329589]  generic_exec_single+0x81/0xb0
      [ 1840.329592]  smp_call_function_single+0xd2/0x100
      [ 1840.329594]  ? acpi_ds_create_operand+0x215/0x23c
      [ 1840.329595]  ? cpumask_next+0x1b/0x20
      [ 1840.329597]  wrmsrl_on_cpu+0x57/0x70
      [ 1840.329598]  ? rdmsrl_on_cpu+0x57/0x80
      [ 1840.329599]  ? wrmsrl_on_cpu+0x57/0x70
      [ 1840.329602]  intel_pstate_hwp_set+0xd3/0x150
      [ 1840.329604]  intel_pstate_set_policy+0x119/0x150
      [ 1840.329606]  cpufreq_set_policy+0xcd/0x2f0
      [ 1840.329607]  cpufreq_update_policy+0xb2/0x130
      [ 1840.329610]  ? cpufreq_update_policy+0x130/0x130
      [ 1840.329613]  acpi_processor_ppc_has_changed+0x65/0x80
      [ 1840.329615]  acpi_processor_notify+0x80/0x100
      [ 1840.329617]  acpi_ev_notify_dispatch+0x44/0x5c
      [ 1840.329619]  acpi_os_execute_deferred+0x14/0x20
      [ 1840.329620]  process_one_work+0x193/0x3c0
      [ 1840.329622]  worker_thread+0x35/0x3b0
      [ 1840.329624]  kthread+0x125/0x140
      [ 1840.329625]  ? process_one_work+0x3c0/0x3c0
      [ 1840.329626]  ? kthread_park+0x60/0x60
      [ 1840.329628]  ? do_syscall_64+0x67/0x180
      [ 1840.329631]  ret_from_fork+0x25/0x30
      
      This is because if there's only one online CPU, the MSR_PM_ENABLE
      (package wide)can not be enabled after resumed, due to
      intel_pstate_hwp_enable() will only be invoked on AP's online
      process after resumed - if there's no AP online, the HWP remains
      disabled after resumed (BIOS has disabled it in S3). Then if
      there comes a _PPC change notification which touches HWP register
      during this stage, the warning is triggered.
      
      Since we don't call acpi_processor_register_performance() when
      HWP is enabled, the pr->performance will be NULL. When this is
      NULL we don't need to do _PPC change notification.
      Reported-by: default avatarDoug Smythies <dsmythies@telus.net>
      Suggested-by: default avatarSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: default avatarYu Chen <yu.c.chen@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      51939996
    • Jean Delvare's avatar
      firmware: dmi_scan: Fix handling of empty DMI strings · 573cb560
      Jean Delvare authored
      
      [ Upstream commit a7770ae1 ]
      
      The handling of empty DMI strings looks quite broken to me:
      * Strings from 1 to 7 spaces are not considered empty.
      * True empty DMI strings (string index set to 0) are not considered
        empty, and result in allocating a 0-char string.
      * Strings with invalid index also result in allocating a 0-char
        string.
      * Strings starting with 8 spaces are all considered empty, even if
        non-space characters follow (sounds like a weird thing to do, but
        I have actually seen occurrences of this in DMI tables before.)
      * Strings which are considered empty are reported as 8 spaces,
        instead of being actually empty.
      
      Some of these issues are the result of an off-by-one error in memcmp,
      the rest is incorrect by design.
      
      So let's get it square: missing strings and strings made of only
      spaces, regardless of their length, should be treated as empty and
      no memory should be allocated for them. All other strings are
      non-empty and should be allocated.
      Signed-off-by: default avatarJean Delvare <jdelvare@suse.de>
      Fixes: 79da4721 ("x86: fix DMI out of memory problems")
      Cc: Parag Warudkar <parag.warudkar@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      573cb560